@InterfaceStability.Evolving public interface StreamApplication extends SamzaApplication<StreamApplicationDescriptor>
StreamApplication describes the inputs, outputs, state, configuration and the processing logic for the
application in Samza's High Level API.
A typical StreamApplication implementation consists of the following stages:
SystemDescriptors,
InputDescriptors, OutputDescriptors and TableDescriptors
MessageStreams, OutputStreams and Tables from the
provided StreamApplicationDescriptor.
MessageStream.filter(FilterFunction)
The following example StreamApplication removes page views older than 1 hour from the input stream:
public class PageViewFilter implements StreamApplication {
public void describe(StreamApplicationDescriptor appDescriptor) {
KafkaSystemDescriptor trackingSystemDescriptor = new KafkaSystemDescriptor("tracking");
KafkaInputDescriptor<PageViewEvent> inputStreamDescriptor =
trackingSystemDescriptor.getInputDescriptor("pageViewEvent", new JsonSerdeV2<>(PageViewEvent.class));
KafkaOutputDescriptor<PageViewEvent>> outputStreamDescriptor =
trackingSystemDescriptor.getOutputDescriptor("recentPageViewEvent", new JsonSerdeV2<>(PageViewEvent.class)));
MessageStream<PageViewEvent> pageViewEvents = appDescriptor.getInputStream(inputStreamDescriptor);
OutputStream<PageViewEvent> recentPageViewEvents = appDescriptor.getOutputStream(outputStreamDescriptor);
pageViewEvents
.filter(m -> m.getCreationTime() > System.currentTimeMillis() - Duration.ofHours(1).toMillis())
.sendTo(recentPageViewEvents);
}
}
All operator function implementations used in a StreamApplication must be Serializable. Any
context required within an operator function may be managed by implementing the InitableFunction.init(org.apache.samza.context.Context) and
ClosableFunction.close() methods in the function implementation.
Functions may implement the ScheduledFunction interface to schedule and receive periodic callbacks from the
Samza framework.
Implementation Notes: Currently StreamApplications are wrapped in a StreamTask during execution. The
execution planner will generate a serialized DAG which will be deserialized in each StreamTask instance used
for processing incoming messages. Execution is synchronous and thread-safe within each StreamTask. Multiple
tasks may process their messages concurrently depending on the job parallelism configuration.
describe