Apache Samza

A distributed stream processing framework

Samza lets you build stateful applications that process data in real time from multiple sources, including Apache Kafka.

Battle-tested at scale, it supports flexible deployment options to run on YARN or as a standalone library.

Samza features
High performance
Samza provides very low latency and high throughput, so you can analyze your data as it arrives.
Horizontally scalable
Scales to several terabytes of state with features such as incremental checkpoints and host affinity.
Easy to operate
Samza is easy to operate, with flexible deployment options: YARN, Kubernetes, or standalone.
Powerful APIs
Rich APIs for building your applications: choose from the low-level, Streams DSL, Samza SQL, and Apache Beam APIs.
Write once, run anywhere
Run the same code to process both batch and streaming data.
Pluggable architecture
Integrates with several sources, including Kafka, HDFS, AWS Kinesis, Azure Event Hubs, key-value stores, and Elasticsearch.
Case studies