YARN Jobs

When you define job.factory.class=org.apache.samza.job.yarn.YarnJobFactory in your job’s configuration, Samza will use YARN to execute your job. The YarnJobFactory reads the HADOOP_YARN_HOME environment variable on the machine where run-job.sh is executed to locate the appropriate YARN configuration, which defines where the YARN resource manager is. The YarnJob then works with the resource manager to get your job started on the YARN cluster.

If you want to use YARN to run your Samza job, you’ll also need to define the location of your Samza job’s package. For example, you might say:

yarn.package.path=http://my.http.server/jobs/ingraphs-package-0.0.55.tgz

This .tgz file follows the conventions outlined on the Packaging page (it has bin/run-am.sh and bin/run-container.sh). YARN NodeManagers take responsibility for downloading this .tgz file onto the appropriate machines and untarring it. From there, YARN executes run-am.sh for the Samza Application Master and run-container.sh for the SamzaContainer.
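
Because YARN depends on those two scripts being present in the package, it can be useful to check a .tgz before uploading it. The following is a minimal sketch of such a check, not part of Samza: the helper name missing_scripts is hypothetical, and it assumes the scripts sit at bin/ relative to the archive root, as described above.

```python
import tarfile

# Scripts that YARN runs from the unpacked package, per the Packaging conventions.
REQUIRED_SCRIPTS = ("bin/run-am.sh", "bin/run-container.sh")


def missing_scripts(tgz_file):
    """Return the required scripts that are absent from an open .tgz file object.

    Hypothetical helper for illustration; Samza does not ship this function.
    """
    with tarfile.open(fileobj=tgz_file, mode="r:gz") as archive:
        # Normalize member names such as "./bin/run-am.sh" to "bin/run-am.sh".
        names = {
            name[2:] if name.startswith("./") else name
            for name in archive.getnames()
        }
    return [script for script in REQUIRED_SCRIPTS if script not in names]
```

To use it, open the package in binary mode and pass the file object, for example missing_scripts(open("my-job-package.tgz", "rb")), where the file name is a placeholder. An empty list means both scripts were found.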

If you want to run your Samza job in a specific YARN queue rather than the default one, set the yarn.queue property in your job’s configuration. For example, with the following property setting,

yarn.queue=root.adhoc

the Samza job will run in the root.adhoc queue.
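
The properties discussed on this page can be pulled together and sanity-checked before a job is submitted. The sketch below is an illustration only and is not how Samza validates configuration: parse_properties and check_yarn_config are hypothetical helpers, the parser handles only simple key=value lines, and the example values are the ones used above.

```python
YARN_FACTORY = "org.apache.samza.job.yarn.YarnJobFactory"


def parse_properties(text):
    """Parse simple key=value lines, skipping blank lines and # comments."""
    config = {}
    for line in text.splitlines():
        line = line.strip()
        if not line or line.startswith("#"):
            continue
        key, _, value = line.partition("=")
        config[key.strip()] = value.strip()
    return config


def check_yarn_config(config):
    """Return a list of problems for a job that is meant to run on YARN."""
    problems = []
    if config.get("job.factory.class") != YARN_FACTORY:
        problems.append("job.factory.class is not set to YarnJobFactory")
    if not config.get("yarn.package.path"):
        problems.append("yarn.package.path is not set")
    return problems


EXAMPLE = """
job.factory.class=org.apache.samza.job.yarn.YarnJobFactory
yarn.package.path=http://my.http.server/jobs/ingraphs-package-0.0.55.tgz
yarn.queue=root.adhoc
"""

config = parse_properties(EXAMPLE)
print(check_yarn_config(config))
# yarn.queue is optional; without it the job runs in the default queue.
print(config.get("yarn.queue", "(default queue)"))
```

The check treats yarn.queue as optional, since a job without it simply runs in the default queue, while a missing yarn.package.path is reported as a problem.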
