Deploy Samza Job To CDH
This tutorial assumes you have successfully run hello-samza and now want to deploy the job to your Cloudera cluster running CDH (Cloudera's Distribution including Apache Hadoop). It is based on CDH 5.4.0 and uses hello-samza as the example job.
Compile Package for CDH 5.4.0
Build the hello-samza package for CDH 5.4.0 by overriding the `hadoop.version` Maven property:
mvn clean package -Dhadoop.version=cdh5.4.0
Upload Package to Cluster
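The upload can also be scripted once the package is on a cluster host (the hello-samza build normally writes it to `target/`). The following is a minimal sketch, not part of hello-samza or Samza: `upload_package` is a hypothetical helper name, and it assumes the `hadoop` CLI on that host is already configured for your cluster. It checks that the tarball exists, creates the HDFS directory, uploads the file, and confirms it arrived.

```shell
# Hypothetical helper (not part of hello-samza or Samza): upload a job
# package to HDFS and confirm it arrived. Assumes the `hadoop` CLI on this
# host is already configured for the target cluster.
upload_package() {
  local local_tgz="$1"   # local tarball, e.g. hello-samza-0.12.0-dist.tar.gz
  local hdfs_dir="$2"    # HDFS destination directory (placeholder path)

  if [ ! -f "$local_tgz" ]; then
    echo "package not found: $local_tgz (did the Maven build succeed?)" >&2
    return 1
  fi

  # -p creates missing parent directories and does not fail if the
  # directory already exists.
  hadoop fs -mkdir -p "$hdfs_dir" || return 1

  # Copy the tarball into the directory; this fails if a file with the
  # same name is already there.
  hadoop fs -put "$local_tgz" "$hdfs_dir/" || return 1

  # -test -e exits 0 only if the path exists on HDFS.
  hadoop fs -test -e "$hdfs_dir/$(basename "$local_tgz")"
}
```

For example, `upload_package hello-samza-0.12.0-dist.tar.gz /path/for/tgz` places the file at `/path/for/tgz/hello-samza-0.12.0-dist.tar.gz`; that full file path is what `yarn.package.path` should point to.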
There are a few ways of uploading the package to the cluster’s HDFS. If the job package is not already on the cluster, copy it from your local machine with scp, then run the following on a cluster host:
hadoop fs -put path/to/hello-samza-0.12.0-dist.tar.gz /path/for/tgz
Get Deployment Scripts
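The job package carries the script and configuration used in the rest of this tutorial, `bin/run-job.sh` and `config/wikipedia-parser.properties`. Before extracting it, you can list the archive to confirm both are present. A minimal sketch; `check_package` is a hypothetical helper name, and the archive is assumed to be gzip-compressed, as the `.tar.gz` suffix suggests:

```shell
# Hypothetical helper: confirm a job package contains the files the
# deployment steps rely on, without extracting it.
check_package() {
  local tgz="$1"   # e.g. path/to/hello-samza-0.12.0-dist.tar.gz
  local listing entry

  # List archive members; fail if tar cannot read the file.
  listing=$(tar -tzf "$tgz") || return 1

  for entry in bin/run-job.sh config/wikipedia-parser.properties; do
    # Members may be stored with or without a leading "./".
    if ! printf '%s\n' "$listing" | grep -qxF -e "$entry" -e "./$entry"; then
      echo "missing from $tgz: $entry" >&2
      return 1
    fi
  done
}
```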
Untar the job package (the remaining steps assume you run them from the current directory):
tar -xvf path/to/hello-samza-0.12.0-dist.tar.gz -C ./
Add Package Path to Properties File
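The only change needed in `config/wikipedia-parser.properties` is the `yarn.package.path` value. If you prefer not to edit the file interactively, a minimal sketch like the following sets it with `sed`. `set_package_path` is a hypothetical helper; it writes through a temporary file rather than `sed -i`, whose syntax differs between GNU and BSD sed.

```shell
# Hypothetical helper: set yarn.package.path in a Java properties file
# without opening an editor.
set_package_path() {
  local props="$1"   # e.g. config/wikipedia-parser.properties
  local url="$2"     # e.g. hdfs://<name node host>:<port>/path/to/tgz
  local tmp

  [ -f "$props" ] || { echo "no such file: $props" >&2; return 1; }
  tmp=$(mktemp) || return 1

  if grep -q '^yarn\.package\.path=' "$props"; then
    # Replace the existing value. "|" is the sed delimiter, so the slashes
    # in the URL need no escaping; URLs containing "|", "&" or "\" would.
    sed "s|^yarn\.package\.path=.*|yarn.package.path=${url}|" "$props" > "$tmp" \
      || { rm -f "$tmp"; return 1; }
  else
    # Property not present yet: keep the file as-is and append it.
    { cat "$props"; printf 'yarn.package.path=%s\n' "$url"; } > "$tmp" \
      || { rm -f "$tmp"; return 1; }
  fi

  # Copy back with cat (not mv) so the original file keeps its permissions.
  cat "$tmp" > "$props" || { rm -f "$tmp"; return 1; }
  rm -f "$tmp"
}
```

On a host with the cluster's client configuration, `hdfs getconf -confKey fs.defaultFS` typically prints the `hdfs://host:port` prefix to use in the URL.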
vim config/wikipedia-parser.properties
Change the YARN package path:
yarn.package.path=hdfs://<hdfs name node ip>:<hdfs name node port>/path/to/tgz
Set YARN Environment Variable
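`run-job.sh` submits the job through the Hadoop client libraries, which read the cluster's HDFS and ResourceManager settings from the directory named by `HADOOP_CONF_DIR`; on CDH gateway hosts this is usually `/etc/hadoop/conf`. A minimal pre-check sketch (`check_hadoop_conf` is a hypothetical helper) that confirms `core-site.xml` and `yarn-site.xml` are present before you export the variable:

```shell
# Hypothetical helper: confirm a Hadoop client configuration directory
# holds the core HDFS and YARN client files before exporting it.
check_hadoop_conf() {
  local dir="${1:-/etc/hadoop/conf}"  # defaults to the path used in this tutorial
  local f
  for f in core-site.xml yarn-site.xml; do
    if [ ! -r "$dir/$f" ]; then
      echo "$dir/$f is missing or unreadable" >&2
      return 1
    fi
  done
  echo "$dir looks like a Hadoop client configuration directory"
}
```

Usage: `check_hadoop_conf /etc/hadoop/conf && export HADOOP_CONF_DIR=/etc/hadoop/conf`.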
export HADOOP_CONF_DIR=/etc/hadoop/conf
Run Samza Job
bin/run-job.sh --config-factory=org.apache.samza.config.factories.PropertiesConfigFactory --config-path=file://$PWD/config/wikipedia-parser.properties
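After `run-job.sh` returns, the job should appear as a YARN application. A minimal sketch for waiting until it reaches the RUNNING state, using the standard `yarn application -list -appStates RUNNING` command. `wait_for_job` is a hypothetical helper, and matching on the job name assumes the YARN application name contains the `job.name` value from the properties file (assumed to be `wikipedia-parser` for hello-samza).

```shell
# Hypothetical helper: poll YARN until an application whose listing
# contains the given job name is RUNNING.
wait_for_job() {
  local name="$1"            # e.g. wikipedia-parser (job.name in the properties file)
  local attempts="${2:-30}"  # number of polls
  local interval="${3:-10}"  # seconds to wait between polls
  local i=1

  while [ "$i" -le "$attempts" ]; do
    # -appStates RUNNING limits the listing to running applications.
    if yarn application -list -appStates RUNNING 2>/dev/null | grep -qF -- "$name"; then
      echo "$name is running on YARN"
      return 0
    fi
    # Do not sleep after the final attempt.
    if [ "$i" -lt "$attempts" ]; then
      sleep "$interval"
    fi
    i=$((i + 1))
  done

  echo "$name not RUNNING after $attempts checks" >&2
  return 1
}
```

For example, `wait_for_job wikipedia-parser 30 10` polls for up to about five minutes.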