Amazon Managed Streaming for Apache Kafka (MSK) is a fully managed service that makes it easy to build and run…
In this article I will illustrate how to submit a Spark job programmatically using SparkLauncher. Let us take a use…
Let's say we want to send a custom object as the Kafka value type and we need to push this…
Batch and Window Sizes – The most common question is what minimum batch size Spark Streaming can use. In general,…
Checkpointing is the main mechanism that needs to be set up for fault tolerance in Spark Streaming. It allows Spark…
Stateful transformations are operations on DStreams that track data across time; that is, some data from previous batches is used…
Stateless transformations like map(), flatMap(), filter(), repartition(), reduceByKey(), and groupByKey() are simple RDD transformations applied to every batch. Keep in…
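
The difference between the stateless and stateful transformations described in the last two excerpts can be sketched without Spark. What follows is a plain-Python illustration, not the Spark Streaming API: each batch is modeled as a list of words, `stateless_count` is a hypothetical helper that counts within one batch only, and `update_state` is a hypothetical helper that folds each batch's counts into a running total, loosely mirroring what `updateStateByKey` does on a DStream.

```python
from collections import Counter


def stateless_count(batch):
    """Count words within a single batch; nothing is remembered afterwards."""
    return dict(Counter(batch))


def update_state(state, batch):
    """Merge one batch's counts into the running totals carried across batches."""
    new_state = dict(state)  # copy so earlier snapshots are not mutated
    for word, n in Counter(batch).items():
        new_state[word] = new_state.get(word, 0) + n
    return new_state


# Three micro-batches arriving one after another (made-up sample data).
batches = [["a", "b", "a"], ["b", "c"], ["a"]]

# Stateless: each result depends only on its own batch.
per_batch = [stateless_count(b) for b in batches]

# Stateful: each result depends on the current batch and all previous ones.
running = {}
history = []
for b in batches:
    running = update_state(running, b)
    history.append(running)
```

In this sketch `per_batch` holds one independent count per batch, while `history` holds the cumulative counts after each batch. In a real Spark Streaming job the running state would have to survive failures, which is why stateful operations require checkpointing to be enabled.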