All Samza jobs have a configuration file that defines the job. A very basic configuration file looks like this:
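A minimal sketch of such a file follows. The property names are standard Samza configuration keys, but the job name, task class, system name, and system factory class (hello-world, samza.task.example.MyJavaStreamerTask, example-system, samza.stream.example.ExampleConsumerFactory) are hypothetical placeholders chosen for illustration.

```properties
# Job: name of the job and how it is launched
job.factory.class=org.apache.samza.job.local.ThreadJobFactory
job.name=hello-world

# Task: the StreamTask class and its input streams (system.stream)
# The class name below is a hypothetical placeholder.
task.class=samza.task.example.MyJavaStreamerTask
task.inputs=example-system.example-stream

# Serializers: register serde factories under short names
serializers.registry.json.class=org.apache.samza.serializers.JsonSerdeFactory
serializers.registry.string.class=org.apache.samza.serializers.StringSerdeFactory

# Systems: the system factory (hypothetical here) and the serdes
# used for keys and messages read from that system
systems.example-system.samza.factory=samza.stream.example.ExampleConsumerFactory
systems.example-system.samza.key.serde=string
systems.example-system.samza.msg.serde=json
```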
There are four major sections to a configuration file:
- The job section defines things like the name of the job and whether to use the YarnJobFactory or the ProcessJobFactory/ThreadJobFactory (see the job.factory.class property in the Configuration Table).
- The task section is where you specify the class name for your StreamTask. It’s also where you define what the input streams are for your task.
- The serializers section defines the classes of the serdes used for serialization and deserialization of specific objects that are received and sent along different streams.
- The system section defines the systems that your StreamTask can read from, along with the types of serdes used for sending keys and messages from each system. If you’re reading from Kafka, you’ll usually define a Kafka system, although you can also specify your own Samza-compatible system implementations. See the hello-samza example project’s Wikipedia system for a good example of a self-implemented system.
Configuration keys that absolutely must be defined for a Samza job are:
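Going by the job and task sections described above, the required keys are most likely the four shown below. Treat this as a sketch to check against the Configuration Table for your Samza version; the job name, task class, and input stream are hypothetical placeholders.

```properties
# How the job is launched (YARN in this sketch)
job.factory.class=org.apache.samza.job.yarn.YarnJobFactory
# Name of the job (hypothetical)
job.name=my-job
# Your StreamTask implementation (hypothetical class)
task.class=com.example.MyStreamTask
# Input streams, written as system.stream (hypothetical names)
task.inputs=kafka.my-input-topic
```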
A complete list of configuration keys can be found on the Configuration Table page. Note that configuration keys prefixed with “sensitive.” are treated specially: their values are masked in logs and in Samza’s YARN ApplicationMaster UI. This only prevents accidental disclosure; no encryption is done.
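As an illustration, a secret could be stored under a key like the one below. Only the sensitive. prefix comes from the description above; the rest of the key name and the value are hypothetical.

```properties
# Hypothetical key and value; the "sensitive." prefix is what causes masking.
# The value is still stored in plain text in the configuration itself.
sensitive.example.store.password=changeme
```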