Apache Samza 1.1

We’re thrilled to announce the release of Apache Samza 1.1.0.

Today, Samza forms the backbone of hundreds of real-time production applications at companies such as LinkedIn, VMware, Slack, and Redfin, among many others. Samza provides leading support for large-scale stateful stream processing with:

  • First-class support for local state (with a RocksDB store). This allows a stateful application to scale up to 1.1 million events/sec on a single machine with an SSD.

  • Support for incremental checkpointing of state instead of full snapshots. This enables Samza to scale to applications with very large state.

  • A fully asynchronous programming model that makes parallelizing remote calls efficient and effortless.

  • A high-level API for expressing complex stream processing pipelines in a few lines of code.

  • A Beam Samza Runner that marries Beam’s best-in-class support for EventTime-based windowed processing and sophisticated triggering with Samza’s stable and scalable stateful processing model.

  • A fully pluggable model for input sources (e.g. Kafka, Kinesis, DynamoDB streams etc.) and output systems (HDFS, Kafka, ElastiCache etc.).

  • A Table API that provides a common abstraction for accessing remote or local databases and allows developers to “join” an input event stream with such a Table.

  • A flexible deployment model for running applications in any hosting environment and with cluster managers other than YARN.

  • Features like canaries, upgrades and rollbacks that support extremely large deployments with minimal downtime.

New Features, Upgrades and Bug Fixes:

This release brings the following features, upgrades, and capabilities:

API enhancements and simplifications:

SAMZA-1981: Consolidate table descriptors to samza-api.

SAMZA-1998: Table API refactoring.

SAMZA-1980: Rename LocalStoreBackedTable to LocalTable.

SAMZA-2043: Consolidate ReadableTable and ReadWriteTable.

SAMZA-2012: Add API for wiring an external context through to application processing.

SAMZA-2026: Refactor remote table API to separate retry policy settings.

SAMZA-2041: Add system descriptors for HDFS and Kinesis.

SAMZA-2081: Samza SQL: Type system for Samza SQL.

SAMZA-2106: Samza App & Job Config Refactor.

State Store Restoration:

SAMZA-2018: State restore improvements using RocksDB writebatch API.

Standalone Improvements:

SAMZA-1973: Unify the TaskNameGrouper interface for yarn and standalone.

SAMZA-1952: StreamPartitionCountMonitor for standalone.

Other Upgrades and Bug-fixes:

SAMZA-1638: Recreate SystemProducer on KafkaCheckpointManager.writeCheckpoint failure.

SAMZA-1946: Problem with Race between TimerListener initialization and timers fired from init().

SAMZA-2004: Add ability to disable table metrics.

SAMZA-2013: Account for cycles in graph traversal within Execution Planner.

SAMZA-2015: Refactor timer handling in tables to be consistent with stores.

SAMZA-2072: Update guava to 23.0.

SAMZA-2090: Fix flush behavior for remote and hybrid tables.

SAMZA-2108: Check for host affinity config before resolving preferred host matching.

SAMZA-2109: Reduce default-buffer sizes for per-partition queues.

SAMZA-2118: Improve the shutdown sequence of AsyncRunLoop.

SAMZA-2119: Upgrading yarn-client version to 2.7.1.

SAMZA-2122: Fix the task caught-up logic which doesn’t handle no incoming messages.

The complete list of resolved Jira tickets for this release is found here.

Upgrading your application to Apache Samza 1.1.0

Thank you for your decision to upgrade to Samza 1.1.0!

API Updates

The following imports for Table API have been updated:

  • Rename the import org.apache.samza.storage.kv.descriptors.BaseLocalStoreBackedTableDescriptor to org.apache.samza.storage.kv.descriptors.BaseLocalTableDescriptor

  • Rename the import org.apache.samza.table.remote.descriptors.RemoteTableDescriptor to org.apache.samza.table.descriptors.RemoteTableDescriptor

  • Rename the import org.apache.samza.table.caching.descriptors.CachingTableDescriptor to org.apache.samza.table.descriptors.CachingTableDescriptor
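For a codebase with many files, the three renames above can be applied mechanically. The following is a minimal sketch, not part of Samza: the class TableImportMigrator and its migrate method are hypothetical names introduced here for illustration, and the sketch assumes that a literal text replacement of the fully qualified names is enough for your sources.

```java
import java.util.LinkedHashMap;
import java.util.Map;

// Hypothetical helper (not shipped with Samza): rewrites the three Table API
// imports renamed in 1.1.0 wherever their fully qualified names appear.
final class TableImportMigrator {
    // Old fully qualified name -> new fully qualified name, as listed above.
    static final Map<String, String> RENAMES = new LinkedHashMap<>();

    static {
        RENAMES.put(
            "org.apache.samza.storage.kv.descriptors.BaseLocalStoreBackedTableDescriptor",
            "org.apache.samza.storage.kv.descriptors.BaseLocalTableDescriptor");
        RENAMES.put(
            "org.apache.samza.table.remote.descriptors.RemoteTableDescriptor",
            "org.apache.samza.table.descriptors.RemoteTableDescriptor");
        RENAMES.put(
            "org.apache.samza.table.caching.descriptors.CachingTableDescriptor",
            "org.apache.samza.table.descriptors.CachingTableDescriptor");
    }

    // Returns the source text with every old fully qualified name replaced.
    // String.replace is a literal (non-regex) replacement of all occurrences.
    static String migrate(String source) {
        String result = source;
        for (Map.Entry<String, String> rename : RENAMES.entrySet()) {
            result = result.replace(rename.getKey(), rename.getValue());
        }
        return result;
    }

    public static void main(String[] args) {
        String before =
            "import org.apache.samza.table.remote.descriptors.RemoteTableDescriptor;";
        System.out.println(migrate(before));
    }
}
```

Note that the first rename also changes the simple class name (BaseLocalStoreBackedTableDescriptor becomes BaseLocalTableDescriptor), so any usage of the simple name in the body of a class still has to be updated separately; this sketch only touches fully qualified names.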

Configuration Updates

The job.name and job.id configs are now deprecated in favor of the app.name and app.id configs, respectively.
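If your configs are loaded as java.util.Properties, the rename can be sketched as below. This is an illustration under stated assumptions rather than a Samza utility: JobConfigMigrator is a hypothetical class, the sample values are made up, and the sketch assumes a plain one-to-one move of each value to its new key, keeping any app.* value that is already set.

```java
import java.util.Properties;

// Hypothetical helper (not shipped with Samza): moves the deprecated
// job.name / job.id values to app.name / app.id.
final class JobConfigMigrator {
    // Returns a copy of the given config with the deprecated keys renamed.
    static Properties migrate(Properties original) {
        Properties updated = new Properties();
        updated.putAll(original);
        moveKey(updated, "job.name", "app.name");
        moveKey(updated, "job.id", "app.id");
        return updated;
    }

    // Moves oldKey's value to newKey unless newKey is already set,
    // then removes oldKey. Does nothing if oldKey is absent.
    private static void moveKey(Properties props, String oldKey, String newKey) {
        String value = props.getProperty(oldKey);
        if (value == null) {
            return;
        }
        if (props.getProperty(newKey) == null) {
            props.setProperty(newKey, value);
        }
        props.remove(oldKey);
    }

    public static void main(String[] args) {
        Properties config = new Properties();
        config.setProperty("job.name", "example-job"); // sample value, assumed
        config.setProperty("job.id", "1");             // sample value, assumed
        Properties migrated = migrate(config);
        System.out.println("app.name=" + migrated.getProperty("app.name"));
        System.out.println("app.id=" + migrated.getProperty("app.id"));
    }
}
```

The input Properties object is left unmodified, so the old and new configs can be compared before the change is rolled out.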

A source download of Samza 1.1.0 is available here, and is also available in Apache’s Maven repository. See Samza’s download page for details and Samza’s feature preview for new features.

Community Developments

A Stream Processing with Apache Kafka & Apache Samza meetup/symposium was held on March 20th, with the following presentation on Samza:

  • Apache Samza 1.0: Recent Advances and Our Plans for the Future in Stream Processing

Contribute

It’s a great time to get involved. You can start by reviewing the tutorials, signing up for the mailing list, and grabbing some newbie JIRAs.

I’d like to close by thanking everyone who’s been involved in the project. It’s been a great experience to be involved in this community, and I look forward to its continued growth.