Run Hello-samza in Multi-node YARN
Before you begin, you should have successfully run the hello-samza project on a single-node YARN by following the hello-samza tutorial. Now it’s time to run the Samza job on a “real” YARN grid (one with more than one node).
Set Up Multi-node YARN
If you already have a multi-node YARN cluster (such as a CDH5 cluster), you can skip this set-up section.
Basic YARN Setting
1. Download YARN 2.6 to /tmp and untar it.
2. Set up environment variables.
3. Configure the YARN settings file.
Add the following property to yarn-site.xml:
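The property block itself is not reproduced above. As a minimal sketch, assuming the property in question is `yarn.resourcemanager.hostname` (which tells every NodeManager where the ResourceManager lives) and that `HADOOP_CONF_DIR` was exported in step 2, the file could be written from the shell as follows. `RM_HOST` is a hypothetical variable introduced here for illustration. Note that the snippet overwrites any existing yarn-site.xml, so merge the property by hand if your file already has content.

```shell
# Assumption: HADOOP_CONF_DIR was exported in step 2; fall back to ./conf otherwise.
HADOOP_CONF_DIR="${HADOOP_CONF_DIR:-$PWD/conf}"
# Hypothetical placeholder: replace with a hostname every NodeManager can reach.
RM_HOST="${RM_HOST:-yourHostname}"

mkdir -p "$HADOOP_CONF_DIR"
# Writes a fresh yarn-site.xml containing only the ResourceManager hostname.
cat > "$HADOOP_CONF_DIR/yarn-site.xml" <<EOF
<?xml version="1.0"?>
<configuration>
  <property>
    <name>yarn.resourcemanager.hostname</name>
    <value>${RM_HOST}</value>
  </property>
</configuration>
EOF
```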
Download and add capacity-scheduler.xml.
curl "http://svn.apache.org/viewvc/hadoop/common/trunk/hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-tests/src/test/resources/capacity-scheduler.xml?view=co" > conf/capacity-scheduler.xml
Set Up HTTP Filesystem for YARN
These steps configure YARN to read from an HTTP filesystem, because we will use an HTTP server to deploy the Samza job package. If you would rather deploy the job package from HDFS, skip steps 4 to 6 and follow Deploying a Samza Job from HDFS instead.
4. Download the Scala package and untar it.
5. Add Scala, its log jars, and Samza’s HttpFileSystem implementation.
6. Add the HTTP configuration to core-site.xml (create the core-site.xml file and add the content).
Add the following code:
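The snippet is not reproduced above. A minimal sketch, assuming the HttpFileSystem implementation added in step 5 is the class `org.apache.samza.util.hadoop.HttpFileSystem` and that Hadoop maps the `http` scheme to it through the `fs.http.impl` key (Hadoop's usual `fs.<scheme>.impl` convention), could look like this. It also assumes `HADOOP_CONF_DIR` from step 2.

```shell
# Assumption: HADOOP_CONF_DIR was exported in step 2; fall back to ./conf otherwise.
HADOOP_CONF_DIR="${HADOOP_CONF_DIR:-$PWD/conf}"
mkdir -p "$HADOOP_CONF_DIR"

# Create core-site.xml and register a FileSystem implementation for http:// URLs.
# The class name is an assumption based on Samza's samza-yarn module.
cat > "$HADOOP_CONF_DIR/core-site.xml" <<'EOF'
<?xml version="1.0"?>
<configuration>
  <property>
    <name>fs.http.impl</name>
    <value>org.apache.samza.util.hadoop.HttpFileSystem</value>
  </property>
</configuration>
EOF
```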
Distribute Hadoop File to Slaves
7. Copy the Hadoop directory from your host machine to each slave machine (172.21.100.35, in my case):
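The commands for this step are not reproduced above. One way to sketch it, assuming `HADOOP_YARN_HOME` and `HADOOP_CONF_DIR` come from step 2, that you have SSH access to the slave, and that this Hadoop 2.x layout reads its worker list from `conf/slaves` and ships `sbin/start-yarn.sh`, is the helper below. The function name `distribute_and_start` and the `SLAVE_HOST` variable are hypothetical.

```shell
# Slave address taken from the text; override SLAVE_HOST for your own cluster.
SLAVE_HOST="${SLAVE_HOST:-172.21.100.35}"

distribute_and_start() {
  : "${HADOOP_YARN_HOME:?export HADOOP_YARN_HOME first}"
  : "${HADOOP_CONF_DIR:?export HADOOP_CONF_DIR first}"
  # Copy the whole Hadoop directory into the same parent path on the slave.
  scp -r "$HADOOP_YARN_HOME" "$SLAVE_HOST:$(dirname "$HADOOP_YARN_HOME")" || return 1
  # List the slave so the start script launches a NodeManager there.
  echo "$SLAVE_HOST" > "$HADOOP_CONF_DIR/slaves" || return 1
  # Start the ResourceManager locally and NodeManagers on the listed slaves.
  "$HADOOP_YARN_HOME/sbin/start-yarn.sh"
}
```

Call `distribute_and_start` from the host machine once the configuration files from the earlier steps are in place.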
- If you get “172.21.100.35: Error: JAVA_HOME is not set and could not be found.”, you’ll need to add a conf/hadoop-env.sh file to the machine with the failure (172.21.100.35, in this case), which has “export JAVA_HOME=/export/apps/jdk/JDK-1_8_0_45” (or wherever your JAVA_HOME actually is).
8. Validate that your nodes are up by visiting http://yourHostname:8088/cluster/nodes.
Deploy Samza Job
Some of the following steps are identical to what you have seen in hello-samza. You may skip them if you have already done them.
1. Download Samza and publish it to your local Maven repository.
2. Download the hello-samza project and change the job properties file.
Change the yarn.package.path property to be:
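The new value is not reproduced above. As a sketch, assuming the job package will be served over HTTP from `yourHostname` on port 8000 and that the properties file lives at `src/main/config/wikipedia-feed.properties`, the edit can be scripted as follows. The helper `set_property`, the `HELLO_SAMZA_VERSION` placeholder, and the tarball name are illustrative assumptions; substitute the version your build actually produces.

```shell
# Replace (or append) one key=value line in a Java properties file.
# Assumes the plain key=value form with no spaces around '='.
set_property() {
  file=$1 key=$2 value=$3
  tmp=$(mktemp) || return 1
  # Keep every line whose key differs, then append the new definition.
  awk -F= -v k="$key" '$1 != k' "$file" > "$tmp" || return 1
  printf '%s=%s\n' "$key" "$value" >> "$tmp"
  mv "$tmp" "$file"
}

# Hypothetical values: adjust the path, hostname, port, and version to your setup.
PROPS="src/main/config/wikipedia-feed.properties"
HELLO_SAMZA_VERSION="${HELLO_SAMZA_VERSION:-x.y.z}"
PACKAGE_URL="http://yourHostname:8000/target/hello-samza-${HELLO_SAMZA_VERSION}-dist.tar.gz"

if [ -f "$PROPS" ]; then
  set_property "$PROPS" yarn.package.path "$PACKAGE_URL"
fi
```

Editing the file by hand works just as well; the point is that `yarn.package.path` must be a URL every NodeManager can fetch.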
3. Compile hello-samza.
4. Deploy the Samza job package to the HTTP server.
Open a new terminal, and run:
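The command is not reproduced above. Any static file server will do; the sketch below uses the `http.server` module from the Python 3 standard library, and assumes hello-samza was cloned to `/tmp/hello-samza` and that port 8000 is free. `PACKAGE_ROOT`, `HTTP_PORT`, and `serve_job_package` are hypothetical names.

```shell
# Assumptions: clone location and port; override either variable as needed.
PACKAGE_ROOT="${PACKAGE_ROOT:-/tmp/hello-samza}"
HTTP_PORT="${HTTP_PORT:-8000}"

serve_job_package() {
  # Serves PACKAGE_ROOT over HTTP in the foreground until interrupted with Ctrl-C.
  (cd "$PACKAGE_ROOT" && python3 -m http.server "$HTTP_PORT")
}
```

Call `serve_job_package` in the new terminal and leave it running while the job is deployed.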
Go back to the original terminal (not the one running the HTTP server):
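The command is not reproduced above. A minimal sketch, assuming the job package was extracted to `deploy/samza` as in the hello-samza tutorial and that your Samza release accepts the `--config-factory` and `--config-path` flags (newer releases may use different options), is:

```shell
# Assumption: run from the hello-samza directory, with the package under deploy/samza.
RUN_JOB="deploy/samza/bin/run-job.sh"
CONFIG_PATH="file://$PWD/deploy/samza/config/wikipedia-feed.properties"

if [ -x "$RUN_JOB" ]; then
  # Submit the wikipedia-feed job to YARN using the properties file.
  "$RUN_JOB" \
    --config-factory=org.apache.samza.config.factories.PropertiesConfigFactory \
    --config-path="$CONFIG_PATH"
else
  echo "run-job.sh not found under deploy/samza; extract the job package first" >&2
fi
```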
Go to http://yourHostname:8088 and find the wikipedia-feed job. Click on the ApplicationMaster link to see that it’s running.
Congratulations! You are now running the Samza job on a “real” YARN grid!