Upgrading from 0.7.0 to 0.8.0

Samza’s checkpointing implementation changed between Samza 0.7.0 and 0.8.0. If you upgrade a job from 0.7.0 to 0.8.0, the job’s checkpoint offsets will be lost, and by default the job will start from the most recent message in each of its input streams. If a job needs to pick up where it left off instead, take the following steps:

  1. Shut down your job.
  2. Run the CheckpointMigrationTool.
  3. Start your job.

The CheckpointMigrationTool migrates your checkpoint topic from the 0.7.0-style format to the 0.8.0-style format. The tool works only against Kafka, so your job must be storing its checkpoints in Kafka with the KafkaCheckpointManager.
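Before migrating, it may help to confirm that the job really does checkpoint to Kafka. The sketch below assumes the job selects its checkpoint manager through the task.checkpoint.factory property and that the Kafka factory class is org.apache.samza.checkpoint.kafka.KafkaCheckpointManagerFactory; the function name uses_kafka_checkpoints is hypothetical and only for illustration. It handles the plain key=value form and ignores other syntax that .properties files allow (such as key:value or line continuations).

```shell
# Hypothetical helper (not part of Samza): succeeds only if the given
# .properties file sets task.checkpoint.factory to the Kafka checkpoint
# manager factory. Assumes the simple "key=value" form, with optional
# whitespace around the "=" sign.
uses_kafka_checkpoints() {
  grep -Eq '^[[:space:]]*task\.checkpoint\.factory[[:space:]]*=[[:space:]]*org\.apache\.samza\.checkpoint\.kafka\.KafkaCheckpointManagerFactory[[:space:]]*$' "$1"
}

# Example use (the path is a placeholder):
#   uses_kafka_checkpoints /path/to/job/config.properties \
#     || echo "job does not checkpoint to Kafka; the tool will not apply"
```

If the check fails, the job is probably using a different checkpoint manager, and the migration tool described here does not apply to it.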

Running CheckpointMigrationTool

Check out Samza 0.8.0:

git clone http://git-wip-us.apache.org/repos/asf/samza.git
cd samza
git fetch origin 0.8.0
git checkout 0.8.0

Run the checkpoint migration task:

./gradlew samza-shell:checkpointMigrationTool -PconfigPath=file:///path/to/job/config.properties

The configPath property should point to the .properties file for the job you wish to migrate. The tool uses the job’s properties file to connect to the Kafka cluster and migrates the checkpointed offsets to the 0.8.0 format. Once the tool finishes, restart the job so that it picks up the migrated offsets.
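Because configPath is given as a file:// URI rather than a bare path, a small helper can reduce mistakes when the properties file is referenced relative to the current directory. This is only a sketch: config_uri is a hypothetical name, and it does not escape spaces or other special characters in the path.

```shell
# Hypothetical helper (not part of Samza): print a file:// URI for a local
# path. Absolute paths are used as-is; relative paths are resolved against
# the current working directory.
config_uri() {
  case "$1" in
    /*) printf 'file://%s\n' "$1" ;;
    *)  printf 'file://%s/%s\n' "$PWD" "$1" ;;
  esac
}

# Example use, run from the samza checkout (the path is a placeholder):
#   ./gradlew samza-shell:checkpointMigrationTool \
#     -PconfigPath="$(config_uri /path/to/job/config.properties)"
```

For the placeholder path above, the helper prints file:///path/to/job/config.properties, which matches the form used in the command shown earlier.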

NOTE: The checkpointMigrationTool task must be run from a machine that can connect to the Kafka cluster.