Overview
Samza provides a REST service that is deployable on any node in the cluster and has pluggable interfaces to add custom Resources and Monitors. It is intended to be a node-local delegate for common actions such as starting jobs, sampling the local state, measuring disk usage, taking heap dumps, verifying liveness, and so on.
The Samza REST Service does not yet support SSL or authentication, so it is initially geared toward backend and operations use cases. It would not be wise to expose it as a user-facing API in environments where, for example, users are not allowed to stop each other’s jobs.
Samza REST is packaged and configured very similarly to Samza jobs. A notable difference is that Samza REST must be deployed and executed on each host you want it to run on, whereas a Samza job is typically started on a master node in the cluster manager, and the master deploys the job to the other nodes.
Deployment
Samza REST is intended to be a proxy for all operations which need to be executed from the nodes of the Samza cluster. It can be deployed to all the hosts in the cluster and may serve different purposes on different hosts. In such cases it may be useful to deploy the same release tarball with different configs to customize the functionality for the role of the hosts. For example, Samza REST may be deployed on a YARN cluster with one config for the ResourceManager (RM) hosts and another config for the NodeManager (NM) hosts.
Deploying the service is very similar to running a Samza job. First, build the release tarball. Then, from the location where the tarball was extracted, run the service with the run-samza-rest-service.sh script, passing it a config factory and a config path. These two config parameters have the same purpose as they do for run-job.sh.
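As a rough sketch of the build and run steps, the script below prints the two commands instead of executing them, so the values can be reviewed first. The Gradle task name samza-shell:releaseRestServiceTar, the --config-factory and --config-path flag names (modeled on how run-job.sh is usually invoked), and the SAMZA_REST_HOME and CONFIG_FILE locations are assumptions; check them against your Samza version and package layout.

```shell
#!/usr/bin/env bash
# Dry-run sketch: prints the commands instead of running them.

# Hypothetical locations; adjust to where the tarball was extracted.
SAMZA_REST_HOME="${SAMZA_REST_HOME:-/opt/samza-rest}"
CONFIG_FILE="${CONFIG_FILE:-$SAMZA_REST_HOME/config/samza-rest.properties}"

# Assumed Gradle task that builds the REST service tarball
# (would be run from the root of the Samza source tree).
build_command() {
  echo "./gradlew samza-shell:releaseRestServiceTar"
}

# Assumed invocation of the service script with the two config parameters.
run_command() {
  echo "$SAMZA_REST_HOME/bin/run-samza-rest-service.sh" \
    "--config-factory=org.apache.samza.config.factories.PropertiesConfigFactory" \
    "--config-path=file://$CONFIG_FILE"
}

build_command
run_command
```

Once the printed commands look right for the host, they can be run directly. Keeping the paths in variables makes it easier to point the same package at a different config on ResourceManager and NodeManager hosts.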
Follow the getting started tutorial to quickly deploy and test the Samza REST Service for the first time.
Configuration
The Samza REST Service relies on the same configuration system as Samza Jobs. However, the Samza REST Service config file itself is completely separate and unrelated to the config files for your Samza jobs.
The configuration may provide values for the core configs as well as any additional configs needed for Resources or Monitors that you may have added to the service. A basic configuration file which includes configs for the core service as well as the JobsResource looks like this:
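A minimal sketch of such a file follows, written out through a shell heredoc so it can be generated per host role. Only services.rest.port comes from the core configuration described in this document. The port value, the job.proxy.factory.class and job.installations.path keys, their values, and the output file name are assumptions about the JobsResource configuration and should be verified against its documentation.

```shell
#!/usr/bin/env bash
# Writes an example Samza REST config file; the file name is a hypothetical choice.
CONFIG_OUT="${CONFIG_OUT:-samza-rest.properties}"

cat > "$CONFIG_OUT" <<'EOF'
# Core service: port on the local host. 0 selects an available port dynamically.
services.rest.port=9139

# JobsResource settings (assumed key names and values; verify before use).
job.proxy.factory.class=org.apache.samza.rest.proxy.job.SimpleYarnJobProxyFactory
job.installations.path=/path/to/samza/installations
EOF

echo "wrote $CONFIG_OUT"
```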
Core Configuration
Name | Default | Description |
---|---|---|
services.rest.port | | Required: The port to use on the local host for the Samza REST Service. If 0, an available port will be dynamically chosen. |
rest.resource.factory.classes | | A comma-delimited list of class names that implement ResourceFactory. These factories will be used to create specific instances of resources and can pull whatever properties they need from the provided server config. The instance returned will be used for the lifetime of the server. If no value is provided for this property or for rest.resource.classes, then org.apache.samza.rest.resources.DefaultResourceFactory will be used as a default. |
rest.resource.classes | | A comma-delimited list of class names of resources to register with the server. These classes may be instantiated as often as once per request; their life cycle is not guaranteed to match that of the server. Also, the instances do not receive any config. Note that the life cycle and the ability to receive config are the primary differences between resources added via this property and those added via rest.resource.factory.classes. |
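To illustrate the difference between the two resource properties, the fragment below appends both to a config file. The class names under com.example are hypothetical placeholders, not classes shipped with Samza, and the file name is likewise an assumption.

```shell
#!/usr/bin/env bash
# Appends resource registration settings to an example config file (hypothetical name).
CONFIG_OUT="${CONFIG_OUT:-samza-rest.properties}"

cat >> "$CONFIG_OUT" <<'EOF'
# Factories receive the server config; the instances they return live as long as the server.
rest.resource.factory.classes=com.example.rest.DiskUsageResourceFactory,com.example.rest.HeapDumpResourceFactory

# Plain resource classes receive no config and may be instantiated as often as once per request.
rest.resource.classes=com.example.rest.PingResource
EOF
```

A resource that needs settings from the server config, or that should hold state across requests, would go through a factory; a stateless resource can be listed directly.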
Logging
Samza REST uses SLF4J for logging. The run-samza-rest-service.sh script mentioned above by default expects a log4j.xml in the package’s bin directory and writes the logs to a logs directory in the package root. However, since the script invokes the same run-class.sh script used to run Samza jobs, it can be reconfigured very similarly to logging for Samza jobs.