Getting Started
Documentation

Apache Samza 1.2 [Docs]

We’re thrilled to announce the release of Apache Samza 1.2.0.

Today Samza forms the backbone of hundreds of real-time production applications across a multitude of companies, such as LinkedIn, VMWare, Slack, Redfin among many others. Samza provides leading support for large-scale stateful stream processing with:

  • First class support for local state (with RocksDB store). This allows a stateful application to scale up to 1.1 Million events/sec on a single machine with SSD.

  • Support for incremental checkpointing of state instead of full snapshots. This enables Samza to scale to applications with very large state.

  • A fully asynchronous programming model that makes parallelizing remote calls efficient and effortless.

  • High level API for expressing complex stream processing pipelines in a few lines of code.

  • Beam Samza Runner that marries Beam’s best in class support for EventTime based windowed processing and sophisticated triggering with Samza’s stable and scalable stateful processing model.

  • A fully pluggable model for input sources (e.g. Kafka, Kinesis, DynamoDB streams etc.) and output systems (HDFS, Kafka, ElastiCache etc.).

  • A Table API that provides a common abstraction for accessing remote or local databases and allowing developers are able to “join” an input event stream with such a Table.

  • Flexible deployment model for running the applications in any hosting environment and with cluster managers other than YARN.

  • Features like canaries, upgrades and rollbacks that support extremely large deployments with minimal downtime.

New Features, Upgrades and Bug Fixes:

This release brings the following features, upgrades, and capabilities:

  • Upgrade to Kafka 2

  • Beam integration with tables and integration with CouchBase

  • Async high level API

  • Bug fixes

Full list of the jiras addressed in this release can be found here.

Upgrading your application to Apache Samza 1.2.0

Kafka upgrade

SAMZA-2127 Upgrade to Kafka 2.0

Async API for high level

SAMZA-2055 Design and Implement async API for high level

SAMZA-2172 Async High Level API does not schedule StreamOperatorTasks on separate threads

Startpoint

SAMZA-2192 Add StartpointVisitor implementation for EventHub.

SAMZA-2189 Integrate startpoint resolution workflow with SamzaContainer startup sequence.

SAMZA-2179 Move the StartpointVisitor abstraction to SystemAdmin interface.

SAMZA-2046 Startpoints - Fanout of SSP-only keyed Startpoints to SSP+TaskName

SAMZA-2132 Startpoint - flatten serialized key

Table API

SAMZA-2185 Ability to expose remote data source specific features in remote table

SAMZA-2156 Couchbase Table Support for Samza Table API

SAMZA-2153 Config for TableRetryPolicy

SAMZA-2134 Enable remote table rate limiter by default

SAMZA-2116 Make sendTo operators non-terminal

Bug Fixes, Testing and Stability improvments

SAMZA-2202 Modify topic creation s.t. all log compacted topics are created with a 5MB message size limit.

SAMZA-2181 Ensure consistency of coordinator store creation and initialization

SAMZA-2178 Utils to directly inject custom IME to InMemorySystem streams

SAMZA-2176 Ignore the configurations with serialized null values from coordinator stream.

SAMZA-2171 Encapsulate creation and loading of metadata streams

SAMZA-2170 Enabling writing of both new and old format offset files for stores and side-input-stores

SAMZA-2169 Preventing task-shuffle after task mode addition

SAMZA-2161 Move ChangelogPartitionManager and CoordinatorStream ConfigReader to MetadataStore

SAMZA-2135 Provide a way inject ExternalContext to TestRunner

Sources downloads

A source download of Samza 1.2.0 is available here, and is also available in Apache’s Maven repository. See Samza’s download page for details and Samza’s feature preview for new features.