Samza - Configuration

Configuration

Every Samza application has a properties-format file that defines its configuration. A complete list of configuration keys can be found on the Samza Configurations Table page.

A very basic configuration file looks like this:

# Application Configurations
job.factory.class=org.apache.samza.job.yarn.YarnJobFactory
app.name=hello-world
job.default.system=example-system
serializers.registry.json.class=org.apache.samza.serializers.JsonSerdeFactory
serializers.registry.string.class=org.apache.samza.serializers.StringSerdeFactory

# Systems & Streams Configurations
systems.example-system.samza.factory=samza.stream.example.ExampleConsumerFactory
systems.example-system.samza.key.serde=string
systems.example-system.samza.msg.serde=json

# Checkpointing
task.checkpoint.factory=org.apache.samza.checkpoint.kafka.KafkaCheckpointManagerFactory

# State Storage
stores.example-store.factory=org.apache.samza.storage.kv.RocksDbKeyValueStorageEngineFactory
stores.example-store.key.serde=string
stores.example-store.value.serde=json

# Metrics
metrics.reporter.example-reporter.class=org.apache.samza.metrics.reporter.JmxReporterFactory
metrics.reporters=example-reporter

A configuration file has six sections:

The Application section defines the name of the job, the job factory (see the job.factory.class property in the Configuration Table), the class name of your StreamTask, and the serializers and deserializers (serdes) for the objects received from and sent to the different streams.
The Systems & Streams section defines the systems that your StreamTask can read from, along with the serdes used for the keys and messages of each system. You can use any of the predefined systems that ship with Samza, or specify your own Samza-compatible system implementation. The Wikipedia system in the hello-samza example project is a good example of a self-implemented system.
The Checkpointing section defines how message-processing state is saved, which makes stream processing fault-tolerant (see Checkpointing for more details).
The State Storage section defines the settings for stateful stream processing in Samza.
The Deployment section defines how the Samza application is deployed, either to a cluster manager (YARN) or as a standalone library, along with the settings for each option. See Deployment Models for more details.
The Metrics section defines how the Samza application’s metrics are monitored and collected (see Monitoring).
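
The sample file above has no Deployment heading and does not name a StreamTask class. As a rough sketch, those settings might look like the following for a YARN deployment. The class name, stream name, and package path are hypothetical placeholders, and the keys shown (task.class, task.inputs, yarn.package.path, job.container.count) should be checked against the Configuration Table for your Samza version.

```properties
# Application: the StreamTask class and the streams it reads from
# (com.example.HelloWorldTask and example-stream are hypothetical names)
task.class=com.example.HelloWorldTask
task.inputs=example-system.example-stream

# Deployment (YARN): where the job package lives and how many containers to run
# (the path below is a placeholder, not a real location)
yarn.package.path=file:///path/to/hello-world-dist.tar.gz
job.container.count=1
```

A standalone deployment would use a different set of keys; see Deployment Models for those options.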

Note that configuration keys prefixed with sensitive. are treated specially: their values are masked in logs and in Samza’s YARN ApplicationMaster UI. This only prevents accidental disclosure; the values are not encrypted.
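
As an illustration, a key that holds a secret could be written with the sensitive. prefix as sketched below. The key name after the prefix and the value are hypothetical; only the prefix itself comes from the description above.

```properties
# Hypothetical key: the value is masked in logs and in the YARN ApplicationMaster UI,
# but it is still stored as plain text in this file
sensitive.example-system.password=not-a-real-password
```

Because the value is only masked and not encrypted, the configuration file itself still needs to be protected like any other file containing credentials.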