Checking out our examples
The hello-samza project contains several examples to help you create your Samza applications. To check out the hello-samza project:
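A minimal sketch of the checkout, assuming the project's public GitHub mirror and its `latest` branch (verify the branch name against the repository):

```shell
# Clone the hello-samza examples project
git clone https://github.com/apache/samza-hello-samza.git hello-samza
cd hello-samza

# "latest" is assumed to be the branch tracking the newest release
git checkout latest
```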
High-level API examples
The Samza Cookbook contains various recipes using the Samza high-level API. These include:
The Filter example demonstrates how to perform stateless operations on a stream.
The Join example demonstrates how you can join a Kafka stream of page-views with a stream of ad-clicks.
The Stream-Table Join example demonstrates how to use the Samza Table API. It joins a Kafka stream with a remote dataset accessed through a REST service.
In addition to the cookbook, the following examples are also available:
Wikipedia Parser: An advanced example that builds a streaming pipeline consuming a live feed of Wikipedia edits, parsing each message and generating statistics from them.
Low-level API examples
The Wikipedia Parser (low-level API): The same example as above, building a streaming pipeline that consumes a live feed of Wikipedia edits, parses each message, and generates statistics from them, but implemented with the low-level API.
Samza SQL API examples
You can easily create a Samza job declaratively using Samza SQL.
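As an illustrative sketch only: hello-samza ships a SQL console script, and a declarative job can be submitted as a single SQL statement. The script path and the stream names (`kafka.ProfileChangeStream`, `log.outputStream`) below are assumptions for illustration; substitute the streams defined in your own deployment:

```shell
# Submit a declarative Samza SQL job from the hello-samza deployment directory.
# Stream names here are hypothetical examples, not part of a default setup.
./deploy/samza/bin/samza-sql-console.sh \
    --sql "insert into log.outputStream select * from kafka.ProfileChangeStream"
```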
Apache Beam API examples
The easiest way to get a copy of the WordCount examples in Beam API is to use Apache Maven. After installing Maven, please run the following command:
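A sketch of the Maven archetype invocation, following the Beam Java quickstart; the Beam version is left as a placeholder, and `groupId`/`package` values are conventional choices you may change:

```shell
# Generate the word-count-beam project from the Beam examples archetype.
# Replace BEAM_VERSION with a current Apache Beam release (e.g. from beam.apache.org).
mvn archetype:generate \
    -DarchetypeGroupId=org.apache.beam \
    -DarchetypeArtifactId=beam-sdks-java-maven-archetypes-examples \
    -DarchetypeVersion=BEAM_VERSION \
    -DgroupId=org.example \
    -DartifactId=word-count-beam \
    -Dversion="0.1" \
    -Dpackage=org.apache.beam.examples \
    -DinteractiveMode=false
```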
This command creates a Maven project named word-count-beam, which contains a series of example pipelines that count words in text files.
To use the SamzaRunner, add the following samza-runner profile to pom.xml under the "profiles" section:
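A sketch of such a profile, assuming the standard `beam-runners-samza` artifact and a `beam.version` property already defined in the generated pom.xml:

```xml
<profile>
  <id>samza-runner</id>
  <dependencies>
    <dependency>
      <groupId>org.apache.beam</groupId>
      <artifactId>beam-runners-samza</artifactId>
      <version>${beam.version}</version>
      <scope>runtime</scope>
    </dependency>
  </dependencies>
</profile>
```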
Now we can run the WordCount example with Samza using the following command:
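A sketch of the invocation, following the Beam quickstart convention of counting the words in pom.xml itself; the input file and output path are illustrative choices:

```shell
# Run the WordCount pipeline on the SamzaRunner (requires the samza-runner
# profile in pom.xml). Input and output paths here are example values.
mvn compile exec:java \
    -Dexec.mainClass=org.apache.beam.examples.WordCount \
    -Dexec.args="--inputFile=pom.xml --output=/tmp/counts --runner=SamzaRunner" \
    -Psamza-runner
```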
After the pipeline finishes, you can check the output count files in the /tmp folder. Note that Beam generates multiple output files for parallel processing. If you prefer a single output file, please update the code to use TextIO.write().withoutSharding().
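For example, assuming the pipeline was run with an output prefix of /tmp/counts, the sharded files can be inspected like this:

```shell
# List the sharded output files and peek at the first few counts
ls /tmp/counts*
head /tmp/counts*
```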