How to use Samza tools

Get Samza tools

Please visit the Download page to download the Samza tools package, then extract it:

tar -xvzf samza-tools-*.tgz
cd samza-tools-<version>

Using Samza tools

Generate Kafka events

The generate-kafka-events tool inserts Avro-serialized events into Kafka topics. It currently supports two event types: PageViewEvent and ProfileChangeEvent.

Before you can generate Kafka events, follow the instructions here to start the ZooKeeper and Kafka servers on your local machine.
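If you already have an Apache Kafka distribution unpacked locally, starting both servers usually looks like the sketch below. It assumes a ZooKeeper-based Kafka release, whose tarball ships `bin/zookeeper-server-start.sh`, `bin/kafka-server-start.sh`, and the default `config/zookeeper.properties` and `config/server.properties` files. The `KAFKA_HOME` variable and the `start_local_kafka` function name are hypothetical and not part of Samza tools; the instructions referenced above remain the authoritative steps.

```shell
#!/bin/sh
# Sketch: start a single-node ZooKeeper and Kafka broker in the background.
# Assumes KAFKA_HOME (hypothetical variable) points at an unpacked,
# ZooKeeper-based Apache Kafka distribution.
start_local_kafka() {
  if [ -z "${KAFKA_HOME:-}" ]; then
    echo "start_local_kafka: set KAFKA_HOME to your Kafka installation" >&2
    return 1
  fi
  # -daemon runs each server in the background instead of the foreground.
  "$KAFKA_HOME/bin/zookeeper-server-start.sh" -daemon \
    "$KAFKA_HOME/config/zookeeper.properties" || return 1
  # The default server.properties listens on port 9092, which matches the
  # generator's default broker endpoint (localhost:9092).
  "$KAFKA_HOME/bin/kafka-server-start.sh" -daemon \
    "$KAFKA_HOME/config/server.properties"
}
```

For example, after `export KAFKA_HOME=/path/to/kafka`, calling `start_local_kafka` starts ZooKeeper first and then the broker, so the generator can be run without passing `-b`.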

The instructions below show how to use the generate-kafka-events tool.

# Usage of the tool

usage: Error: Missing required options: t, e
 -b,--broker <BROKER>               Kafka broker endpoint Default (localhost:9092).
 -n,--numEvents <NUM_EVENTS>        Number of events to be produced, 
                                    Default - Produces events continuously every second.
 -p,--partitions <NUM_PARTITIONS>   Number of partitions in the topic,
                                    Default (4).
 -t,--topic <TOPIC_NAME>            Name of the topic to write events to.
 -e,--eventtype <EVENT_TYPE>        Type of the event values can be (PageView|ProfileChange). 

# Example command to generate 100 events of type PageViewEvent into topic named PageViewStream

 ./scripts/ -t PageViewStream -e PageView -n 100

# Example command to generate ProfileChange events continuously into topic named ProfileChangeStream

 ./scripts/ -t ProfileChangeStream -e ProfileChange
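Because the SQL console reads from a topic while events are being produced, it can be convenient to leave the generator running in the background. The sketch below wraps that in two shell functions. `GEN_EVENTS` is a hypothetical variable that you set to the generate-kafka-events script under `./scripts/`; the function names and the per-topic `.log`/`.pid` files are likewise illustrative, not part of Samza tools. Only the `-t` and `-e` options, and the fact that omitting `-n` produces events continuously, come from the usage text above.

```shell
#!/bin/sh
# Sketch: run the event generator in the background and stop it later.
# GEN_EVENTS (hypothetical) must point at the generate-kafka-events script
# under ./scripts/. Omitting -n makes the tool produce events continuously.
start_generator() {
  topic=$1
  event_type=$2
  if [ -z "${GEN_EVENTS:-}" ]; then
    echo "start_generator: set GEN_EVENTS to the generator script" >&2
    return 1
  fi
  # Send the tool's output to a per-topic log and remember its PID.
  "$GEN_EVENTS" -t "$topic" -e "$event_type" > "generator-$topic.log" 2>&1 &
  echo $! > "generator-$topic.pid"
}

stop_generator() {
  topic=$1
  pid_file="generator-$topic.pid"
  [ -f "$pid_file" ] || return 1
  # Stop the background producer, then forget its PID.
  kill "$(cat "$pid_file")" 2>/dev/null
  rm -f "$pid_file"
}
```

For example, after exporting `GEN_EVENTS`, `start_generator ProfileChangeStream ProfileChange` keeps ProfileChange events flowing while you run queries, and `stop_generator ProfileChangeStream` ends the producer.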

Samza SQL console tool

Once you have generated events into the Kafka topic, you can use the samza-sql-console tool to process the events published to that topic.

There are two ways to use the tool:

  1. Pass the SQL statement directly as an argument to the tool.
  2. Write the SQL statement(s) to a file and pass the file as an argument to the tool.

The second option lets you execute multiple SQL statements, whereas the first executes only one statement at a time.

Samza SQL requires all events in a topic to share a uniform schema, and it needs access to the schema that describes those events. Organizations typically deploy a schema registry that maps topics to schemas.

In the absence of a schema registry, the Samza SQL console tool uses a naming convention to identify the schema associated with a topic: if the topic name contains the string "page", it assumes the topic holds PageViewEvents; otherwise it assumes ProfileChangeEvents.
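The convention can be illustrated with a small shell function. It only mirrors the rule as stated here and is not the console's actual implementation. The function name `schema_for_topic` is hypothetical, and treating the match as case-insensitive is an assumption, made because the example topic `PageViewStream` contains "Page" rather than "page".

```shell
#!/bin/sh
# Sketch of the topic-name convention: topics whose name contains "page"
# map to PageViewEvent, everything else to ProfileChangeEvent.
# Case-insensitive matching is an assumption, not confirmed behavior.
schema_for_topic() {
  lower=$(printf '%s' "$1" | tr '[:upper:]' '[:lower:]')
  case "$lower" in
    *page*) echo "PageViewEvent" ;;
    *)      echo "ProfileChangeEvent" ;;
  esac
}

for topic in PageViewStream ProfileChangeStream; do
  echo "$topic -> $(schema_for_topic "$topic")"
done
```

A practical consequence is that a topic such as `homepage_clicks` would be read with the PageViewEvent schema under either interpretation, so choose topic names with the convention in mind when no schema registry is available.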

# Usage of the tool

usage: Error: One of the (f or s) options needs to be set
 -f,--file <SQL_FILE>   Path to the SQL file to execute.
 -s,--sql <SQL_STMT>    SQL statement to execute.

# Example command to select all the users who have moved to LinkedIn

./scripts/ --sql "Insert into log.consoleOutput select Name, OldCompany from kafka.ProfileChangeStream where NewCompany = 'LinkedIn'"

Before running the command above, make sure the generate-kafka-events tool is generating events into ProfileChangeStream.
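To use the second option, put the statements in a file and pass it with `-f`. The sketch below writes two statements over ProfileChangeStream, one for users who joined LinkedIn and one for users who left it. Separating statements with semicolons is an assumption about the file format, as are the file path and the hypothetical `SQL_CONSOLE` variable, which you set to the samza-sql-console script under `./scripts/`. Only `-f`/`--file` and the fields `Name`, `OldCompany`, and `NewCompany` come from this page.

```shell
#!/bin/sh
# Sketch: run several Samza SQL statements from a file via -f.
# SQL_CONSOLE (hypothetical) must point at the samza-sql-console script
# under ./scripts/. Semicolon-separated statements are an assumption.
SQL_FILE=${SQL_FILE:-./profile-queries.sql}

# Write both statements; the quoted heredoc keeps the SQL literal.
cat > "$SQL_FILE" <<'EOF'
Insert into log.consoleOutput select Name, OldCompany from kafka.ProfileChangeStream where NewCompany = 'LinkedIn';
Insert into log.consoleOutput select Name, NewCompany from kafka.ProfileChangeStream where OldCompany = 'LinkedIn';
EOF

if [ -n "${SQL_CONSOLE:-}" ]; then
  "$SQL_CONSOLE" -f "$SQL_FILE"
else
  echo "SQL_CONSOLE is not set; wrote $SQL_FILE without running it" >&2
fi
```

As with the single-statement form, the generator should already be producing events into ProfileChangeStream when the file is executed.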