Checking out our examples

The hello-samza project contains several examples to help you create your Samza applications. To check out the hello-samza project:

> git clone https://git.apache.org/samza-hello-samza.git hello-samza

High-level API examples

The Samza Cookbook contains various recipes using the Samza high-level API. These include:

  • The Filter example demonstrates how to perform stateless operations on a stream.

  • The Join example demonstrates how to join a Kafka stream of page-views with a stream of ad-clicks.

  • The Stream-Table Join example demonstrates how to use the Samza Table API. It joins a Kafka stream with a remote dataset accessed through a REST service.

  • The SessionWindow and TumblingWindow examples illustrate Samza’s rich windowing and triggering capabilities.
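
To make the filter and tumbling-window ideas in the list above concrete, here is a minimal plain-Java sketch. It deliberately does not use the Samza API: the names PageView, countPerWindow, and the 10-second window size are hypothetical and chosen only for illustration. It drops events with an empty page id (a stateless filter) and then counts the remaining events per fixed, non-overlapping time window.

```java
import java.util.Arrays;
import java.util.List;
import java.util.Map;
import java.util.TreeMap;
import java.util.stream.Collectors;

class TumblingWindowSketch {
    /** Hypothetical event type: a page view with an event time in milliseconds. */
    static final class PageView {
        final String pageId;
        final long timestampMs;

        PageView(String pageId, long timestampMs) {
            this.pageId = pageId;
            this.timestampMs = timestampMs;
        }
    }

    /**
     * Filters out events with an empty page id, then counts the remaining
     * events per tumbling window. The result is keyed by window start time.
     */
    static Map<Long, Long> countPerWindow(List<PageView> events, long windowMs) {
        return events.stream()
                // Stateless step: each event is kept or dropped on its own.
                .filter(e -> !e.pageId.isEmpty())
                // Stateful step: assign each event to the window that contains it.
                .collect(Collectors.groupingBy(
                        e -> (e.timestampMs / windowMs) * windowMs,
                        TreeMap::new,
                        Collectors.counting()));
    }

    public static void main(String[] args) {
        List<PageView> events = Arrays.asList(
                new PageView("home", 1_000),
                new PageView("", 2_000),
                new PageView("cart", 9_500),
                new PageView("home", 10_000),
                new PageView("home", 25_000));
        // Prints {0=2, 10000=1, 20000=1}
        System.out.println(countPerWindow(events, 10_000));
    }
}
```

The sketch processes a finite list in one pass, whereas a Samza job applies the same kind of operators to an unbounded stream and emits results as windows close. The cookbook examples show how this is expressed with the actual high-level API.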

In addition to the cookbook, you can also consult these:

  • Wikipedia Parser: An advanced example that builds a streaming pipeline that consumes a live feed of Wikipedia edits, parses each message, and generates statistics from them.

  • Amazon Kinesis and Azure Event Hubs examples that show how to consume input data from the respective systems.

Low-level API examples

The Wikipedia Parser (low-level API): The same pipeline as the high-level Wikipedia Parser example (it consumes a live feed of Wikipedia edits, parses each message, and generates statistics from them), but written with the low-level API.

Samza SQL API examples

You can easily create a Samza job declaratively using Samza SQL.

Apache Beam API examples

The easiest way to get a copy of the WordCount examples written with the Beam API is to use Apache Maven. After installing Maven, run the following command:

> mvn archetype:generate \
      -DarchetypeGroupId=org.apache.beam \
      -DarchetypeArtifactId=beam-sdks-java-maven-archetypes-examples \
      -DarchetypeVersion=2.9.0 \
      -DgroupId=org.example \
      -DartifactId=word-count-beam \
      -Dversion="0.1" \
      -Dpackage=org.apache.beam.examples \
      -DinteractiveMode=false

This command creates a Maven project named word-count-beam, which contains a series of example pipelines that count words in text files:

> cd word-count-beam/

> ls src/main/java/org/apache/beam/examples/
DebuggingWordCount.java WindowedWordCount.java  common
MinimalWordCount.java   WordCount.java

To use SamzaRunner, add the following samza-runner profile to the “profiles” section of pom.xml, as is done here.

    ...
    <profile>
      <id>samza-runner</id>
      <dependencies>
        <dependency>
          <groupId>org.apache.beam</groupId>
          <artifactId>beam-runners-samza</artifactId>
          <version>${beam.version}</version>
          <scope>runtime</scope>
        </dependency>
      </dependencies>
    </profile>
    ...

Now we can run the WordCount example with Samza using the following command:

> mvn compile exec:java -Dexec.mainClass=org.apache.beam.examples.WordCount \
     -Dexec.args="--inputFile=pom.xml --output=/tmp/counts --runner=SamzaRunner" -Psamza-runner

After the pipeline finishes, you can inspect the output count files in the /tmp folder. Note that Beam writes multiple output files (shards) because it processes data in parallel. If you prefer a single output file, update the code to use TextIO.write().withoutSharding().

> more /tmp/counts*
AS: 1
IO: 2
IS: 1
OF: 1
...
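
To see what these output lines represent, the following plain-Java sketch computes the same kind of result: it splits text into words and prints one "word: count" line per distinct word. It is an illustration only and uses no Beam or Samza API. The class name WordCountSketch, the countWords helper, and the choice to split on runs of non-letter characters are assumptions made for this example.

```java
import java.util.Map;
import java.util.TreeMap;

class WordCountSketch {
    /** Splits text on runs of non-letter characters and counts each word. */
    static Map<String, Long> countWords(String text) {
        // TreeMap keeps the words sorted so the printed output is deterministic.
        Map<String, Long> counts = new TreeMap<>();
        for (String word : text.split("[^\\p{L}]+")) {
            // split() can yield an empty first token when the text starts
            // with a separator, so skip empty strings.
            if (!word.isEmpty()) {
                counts.merge(word, 1L, Long::sum);
            }
        }
        return counts;
    }

    public static void main(String[] args) {
        countWords("IS AS IO OF IO")
                .forEach((word, count) -> System.out.println(word + ": " + count));
    }
}
```

Running main prints AS: 1, IO: 2, IS: 1, and OF: 1, one per line. Unlike this sketch, the Beam pipeline distributes the counting across workers, which is why its results may be spread over several output files in no particular order.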

A walkthrough of the example code can be found here. Feel free to play with other examples in the project or write your own. Please don’t hesitate to reach out if you encounter any issues.