When you’re running a stream processing job in production, it’s important to have good metrics to track the health of your job. To make this easy, Samza includes a metrics library. It is used by Samza itself to generate some standard metrics, such as message throughput, but you can also use it in your task code to emit custom metrics.
Metrics can be reported in various ways. You can expose them via JMX, which is useful in development. In production, a common setup is for each Samza container to periodically publish its metrics to a “metrics” Kafka topic, in which the metrics from all Samza jobs are aggregated. You can then consume this stream in another Samza job, and send the metrics to your favorite graphing system such as Graphite.
To set up your job to publish metrics to Kafka, you can use the following configuration:
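A minimal sketch of that configuration, assuming your Kafka system is named `kafka` and the reporter publishes to a topic called `metrics` (adjust the system and stream names to match your setup):

```
# Register a metrics reporter named "snapshot" that periodically publishes
# a snapshot of all metrics to a Kafka topic.
metrics.reporters=snapshot
metrics.reporter.snapshot.class=org.apache.samza.metrics.reporter.MetricsSnapshotReporterFactory
metrics.reporter.snapshot.stream=kafka.metrics

# Serialize the metrics snapshots as JSON before writing them to the topic.
serializers.registry.metrics.class=org.apache.samza.serializers.MetricsSnapshotSerdeFactory
systems.kafka.streams.metrics.samza.msg.serde=metrics
```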
With this configuration, the job automatically sends several JSON-encoded messages to the “metrics” topic in Kafka every 60 seconds. The messages look something like this:
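An illustrative example of such a message follows; the exact fields and metric names vary by Samza version, and the job name, host, topic, and values shown here are placeholders:

```json
{
  "header": {
    "job-name": "my-samza-job",
    "job-id": "1",
    "container-name": "samza-container-0",
    "host": "samza-grid-1234.example.com",
    "source": "Partition-2",
    "time": 1401729420566,
    "reset-time": 1401729000347,
    "samza-version": "0.0.1",
    "version": "0.0.1"
  },
  "metrics": {
    "org.apache.samza.container.TaskInstanceMetrics": {
      "process-calls": 1093,
      "messages-sent": 985,
      "commit-calls": 7,
      "kafka-input-topic-0-offset": 1093
    }
  }
}
```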
There is a separate message for each task instance, and the header tells you the job name, job ID and partition of the task. The metrics allow you to see how many messages have been processed and sent, the current offset in the input stream partition, and other details. There are additional messages which give you metrics about the JVM (heap size, garbage collection information, threads, etc.), internal metrics of the Kafka producers and consumers, and more. The list of all metrics emitted by Samza is shown here.
It’s easy to generate custom metrics in your job, if there’s some value you want to keep an eye on. You can use Samza’s built-in metrics framework, which is similar in design to Coda Hale’s metrics library.
You can register your custom metrics through a MetricsRegistry. Your stream task needs to implement InitableTask, so that you can get the metrics registry from the TaskContext. This simple example shows how to count the number of messages processed by your task:
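A sketch of such a task, assuming the class and metric names shown here (the registry API — `newCounter` on the `MetricsRegistry` obtained from `TaskContext` — is the standard one, but the group and counter names are your choice):

```java
import org.apache.samza.config.Config;
import org.apache.samza.metrics.Counter;
import org.apache.samza.system.IncomingMessageEnvelope;
import org.apache.samza.task.InitableTask;
import org.apache.samza.task.MessageCollector;
import org.apache.samza.task.StreamTask;
import org.apache.samza.task.TaskContext;
import org.apache.samza.task.TaskCoordinator;

public class MyJavaStreamTask implements StreamTask, InitableTask {
  private Counter messageCount;

  @Override
  public void init(Config config, TaskContext context) {
    // Register a counter under a group named after this class.
    this.messageCount = context
        .getMetricsRegistry()
        .newCounter(getClass().getName(), "message-count");
  }

  @Override
  public void process(IncomingMessageEnvelope envelope,
                      MessageCollector collector,
                      TaskCoordinator coordinator) {
    // Increment the counter once per processed message.
    messageCount.inc();
  }
}
```

The counter is then included in the periodic metrics snapshots alongside Samza’s built-in metrics, under the group name you registered it with.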
Samza currently supports three kinds of metrics: counters, gauges and timers. Use a counter when you want to track how often something occurs, a gauge when you want to report the level of something, such as the size of a buffer, and a timer when you want to know how long a block of code takes to execute. Each task instance (for each input stream partition) gets its own set of metrics.
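Gauges and timers are registered the same way as counters. The fragment below is a sketch, assuming it runs inside an `InitableTask` with access to a `TaskContext` named `context`; the metric names and the `queue` variable are illustrative:

```java
// In init(): register a gauge (reports a current level, here with
// initial value 0) and a timer (records durations).
Gauge<Integer> bufferSize = context.getMetricsRegistry()
    .newGauge(getClass().getName(), "buffer-size", 0);
Timer processTime = context.getMetricsRegistry()
    .newTimer(getClass().getName(), "process-time");

// Later, e.g. in process():
bufferSize.set(queue.size());  // report the buffer's current size

long start = System.nanoTime();
// ... do the work you want to measure ...
processTime.update((System.nanoTime() - start) / 1000000L);  // duration in ms
```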
If you want to report metrics in some other way, e.g. directly to a graphing system (without going via Kafka), you can implement a MetricsReporterFactory and reference it in your job configuration.
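For example, a custom reporter is wired in through the same `metrics.reporters` configuration; the reporter name `graphite` and the factory class below are hypothetical placeholders for your own implementation:

```
# Hypothetical custom reporter: "graphite" is an arbitrary name,
# and the class must implement MetricsReporterFactory.
metrics.reporters=graphite
metrics.reporter.graphite.class=com.example.metrics.GraphiteReporterFactory
```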