Samza uses SLF4J for all of its logging. By default, Samza only depends on slf4j-api, so you must add an SLF4J runtime dependency to your Samza packages for whichever underlying logging platform you wish to use.
The hello-samza project shows how to use log4j with Samza. To turn on log4j logging, you just need to make sure slf4j-log4j12 is in your SamzaContainer’s classpath. In Maven, this can be done by adding the following dependency to your Samza package project.
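A minimal sketch of such a dependency entry follows. The groupId and artifactId are the standard coordinates of the SLF4J binding for log4j 1.2. The version shown is an assumption and should be aligned with whichever slf4j-api version your package already resolves.

```xml
<!-- Binds SLF4J to log4j 1.2 at runtime only; job code still compiles against slf4j-api. -->
<dependency>
  <groupId>org.slf4j</groupId>
  <artifactId>slf4j-log4j12</artifactId>
  <!-- Assumed version: match it to the slf4j-api version on your classpath. -->
  <version>1.7.36</version>
  <scope>runtime</scope>
</dependency>
```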
If you’re not using Maven, just make sure that slf4j-log4j12 ends up in your Samza package’s lib directory.
The run-class.sh script will also set the following Java system properties:
The run-container.sh script will also set:
Likewise, run-am.sh sets:
These settings are very useful if you’re using a file-based appender. For example, you can use a rolling appender to roll the log file over once it reaches a certain size by configuring log4j.xml like this:
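One way such a log4j.xml might look is sketched below. It assumes the system properties set by the scripts are named samza.log.dir and samza.container.name. The appender name, size limit, backup count, and pattern are illustrative choices, not required values.

```xml
<?xml version="1.0" encoding="UTF-8"?>
<!DOCTYPE log4j:configuration SYSTEM "log4j.dtd">
<log4j:configuration xmlns:log4j="http://jakarta.apache.org/log4j/">
  <!-- Rolls the file once it reaches MaxFileSize, keeping at most MaxBackupIndex old files. -->
  <appender name="RollingAppender" class="org.apache.log4j.RollingFileAppender">
    <!-- Assumed property names; each container writes to its own file in the log directory. -->
    <param name="File" value="${samza.log.dir}/${samza.container.name}.log" />
    <param name="MaxFileSize" value="256MB" />
    <param name="MaxBackupIndex" value="20" />
    <layout class="org.apache.log4j.PatternLayout">
      <param name="ConversionPattern" value="%d{yyyy-MM-dd HH:mm:ss} %c{1} [%p] %m%n" />
    </layout>
  </appender>
  <root>
    <priority value="info" />
    <appender-ref ref="RollingAppender" />
  </root>
</log4j:configuration>
```

With these illustrative limits, a container keeps at most 21 files (the active file plus 20 backups), so older output is eventually discarded.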
Setting up a file-based appender is recommended as a better alternative to using standard out. Standard out log files (see below) don’t roll, and can get quite large if used for logging.
When using a rolling file appender, it is common for a long-running job to exceed the max file size and count. In such cases, the beginning of the logs will be lost. Since the beginning of the logs includes some of the most critical information, such as the configuration, it is important not to lose it. To address this issue, Samza logs this critical information to a “startup logger” in addition to the normal logger. You can write these log messages to a separate, finite file by including the following snippet in your log4j.xml:
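A sketch of such a snippet is shown below, to be placed inside the log4j:configuration element. It assumes the startup logger is named STARTUP_LOGGER and that the samza.log.dir and samza.container.name system properties are available. The appender name and size limits are hypothetical.

```xml
<!-- Separate, size-bounded file that only receives startup messages. -->
<appender name="StartupAppender" class="org.apache.log4j.RollingFileAppender">
  <param name="File" value="${samza.log.dir}/${samza.container.name}-startup.log" />
  <param name="MaxFileSize" value="256MB" />
  <!-- A single backup keeps the file finite; startup output is written once per run. -->
  <param name="MaxBackupIndex" value="1" />
  <layout class="org.apache.log4j.PatternLayout">
    <param name="ConversionPattern" value="%d{yyyy-MM-dd HH:mm:ss} %c{1} [%p] %m%n" />
  </layout>
</appender>

<!-- Assumed logger name. additivity="false" keeps these messages from also going to the root appenders. -->
<logger name="STARTUP_LOGGER" additivity="false">
  <level value="info" />
  <appender-ref ref="StartupAppender" />
</logger>
```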
Changing log levels
Sometimes it’s desirable to raise the Log4j log level to DEBUG at runtime so that a developer can enable more logging for a Samza container that’s exhibiting undesirable behavior. Samza provides a Log4j class called JmxAppender, which allows you to modify log levels dynamically at runtime. The JmxAppender class is located in the samza-log4j package, and can be turned on by first adding a runtime dependency on the samza-log4j package:
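In Maven, that dependency could be declared roughly as follows. The coordinates assume the artifact is published as org.apache.samza:samza-log4j, and samza.version is a hypothetical property you would define to match the Samza release you deploy.

```xml
<!-- Runtime-only dependency that puts JmxAppender on the container's classpath. -->
<dependency>
  <groupId>org.apache.samza</groupId>
  <artifactId>samza-log4j</artifactId>
  <!-- Hypothetical property: define it in your pom to match your Samza release. -->
  <version>${samza.version}</version>
  <scope>runtime</scope>
</dependency>
```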
And then updating your log4j.xml to include the appender:
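A sketch of the appender entry, assuming the class lives at org.apache.samza.logging.log4j.JmxAppender; the appender name "jmx" is arbitrary. Note that log4j 1.2's XML configurator only instantiates appenders that a logger references, so the sketch also attaches it to the root logger.

```xml
<!-- Assumed fully qualified class name from the samza-log4j package. -->
<appender name="jmx" class="org.apache.samza.logging.log4j.JmxAppender" />

<root>
  <priority value="info" />
  <!-- The reference is what causes log4j to create the appender at startup. -->
  <appender-ref ref="jmx" />
</root>
```

If your log4j.xml already has a root element, add only the appender-ref line to it instead of declaring a second root.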
Stream Log4j Appender
Samza provides a StreamAppender to publish logs to a specific system. You specify the system name with the “task.log4j.system” config, and you can change the name of the log stream with the ‘StreamName’ param. The MDC contains the keys “containerName”, “jobName” and “jobId”, which help identify the source of each log message. To use this appender, add a StreamAppender definition to log4j.xml and set “task.log4j.system” in the job config to the name of the target system.
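The log4j.xml side of this setup might look like the sketch below. It assumes the class is org.apache.samza.logging.log4j.StreamAppender; the appender name and the StreamName value are hypothetical, and the StreamName param can be left out to use the default stream name. The task.log4j.system setting belongs in the job config, not in this file.

```xml
<appender name="StreamAppender" class="org.apache.samza.logging.log4j.StreamAppender">
  <!-- Optional and hypothetical: overrides the default log stream name. -->
  <param name="StreamName" value="my-job-logs" />
  <layout class="org.apache.log4j.PatternLayout">
    <!-- %X{...} reads the MDC keys named in the text, so each line identifies its source. -->
    <param name="ConversionPattern" value="%X{containerName} %X{jobName} %X{jobId} %d{yyyy-MM-dd HH:mm:ss} %c{1} [%p] %m%n" />
  </layout>
</appender>

<root>
  <priority value="info" />
  <!-- The appender only becomes active once a logger references it. -->
  <appender-ref ref="StreamAppender" />
</root>
```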
The default stream name for the logger is generated using the following convention, though you can override it with the StreamName property in log4j.xml:
"__samza_%s_%s_logs" format (jobName.replaceAll("_", "-"), jobId.replaceAll("_", "-"))
Configuring the StreamAppender will automatically encode messages using logstash’s Log4J JSON format. Samza also supports pluggable serialization for those who prefer non-JSON logging events. This can be configured the same way other stream serializers are defined:
The StreamAppender will always send messages to a job’s log stream keyed by the container name.
Samza will look for the SAMZA_LOG_DIR environment variable when it executes. If this variable is defined, all logs will be written to this directory. If the environment variable is empty, or not defined, then Samza will use $base_dir, which is the directory one level up from Samza’s run-class.sh script. This environment variable can also be referenced inside log4j.xml files (see above).
Garbage Collection Logging
Samza will automatically set the following garbage collection logging setting, and will output it to
In older versions of Java, it is impossible to have GC logs roll over based on time or size without the use of a secondary tool. This means that your GC logs will never be deleted until a Samza job ceases to run. As of Java 6 Update 34, and Java 7 Update 2, new GC command line switches have been added to support this functionality. If GC log file rotation is supported by the JVM, Samza will also set:
When a Samza job executes on a YARN grid, the $SAMZA_LOG_DIR environment variable will point to a directory that is secured such that only the user executing the Samza job can read and write to it, if YARN is securely configured.
Samza’s ApplicationMaster pipes all STDOUT and STDERR output to logs/stdout and logs/stderr, respectively. These files are never rotated.