
Big Data

Analytics And More

Category: stream processing

Producing events and handling credentials refresh for an IAM-enabled AWS MSK cluster using the aws-msk-iam-auth library

January 2023 adarsh

Amazon Managed Streaming for Apache Kafka (MSK) is a fully managed service that makes it easy to build and run…

Posted in: aws, Data Analytics, Design Patterns, stream processing
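The client-side setup this post covers boils down to pointing the Kafka producer at the IAM login module shipped in the aws-msk-iam-auth library, which also refreshes the underlying AWS credentials automatically. A minimal `client.properties` sketch (class names are from that library's documentation; broker addresses and any role-ARN options are left out):

```properties
# Authenticate to MSK over TLS using AWS IAM instead of SCRAM/mTLS
security.protocol=SASL_SSL
sasl.mechanism=AWS_MSK_IAM
sasl.jaas.config=software.amazon.msk.auth.iam.IAMLoginModule required;
sasl.client.callback.handler.class=software.amazon.msk.auth.iam.IAMClientCallbackHandler
```

With these properties on the classpath alongside the aws-msk-iam-auth jar, the callback handler sources credentials from the default AWS credential chain and refreshes them before they expire.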

submit a Spark job programmatically using SparkLauncher

March 2019 adarsh

In this article, I will illustrate how to submit a Spark job programmatically using SparkLauncher. Let us take a use…

Posted in: aws, Spark, stream processing Filed under: kafka, Spark Rdd, spark streaming

Kafka example for a custom serializer, deserializer and encoder with Spark Streaming integration

November 2017 adarsh

Let's say we want to send a custom object as the Kafka value type and we need to push this…

Posted in: Data Analytics, Spark, stream processing Filed under: kafka, Spark Rdd, spark streaming, streaming
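Setting the Kafka wiring aside, the heart of a custom serializer/deserializer pair is a byte round-trip for the domain object. A minimal Python sketch of that pattern (the `User` class and the JSON encoding are illustrative, not from the post):

```python
import json
from dataclasses import dataclass

# Hypothetical domain object we want to send as the Kafka value type.
@dataclass
class User:
    name: str
    age: int

def serialize(user: User) -> bytes:
    # Encode the object to bytes, as a Kafka value serializer would.
    return json.dumps({"name": user.name, "age": user.age}).encode("utf-8")

def deserialize(payload: bytes) -> User:
    # Decode bytes back into the object, as the matching deserializer would.
    d = json.loads(payload.decode("utf-8"))
    return User(name=d["name"], age=d["age"])

round_tripped = deserialize(serialize(User("alice", 30)))
print(round_tripped)  # User(name='alice', age=30)
```

In the real integration these two functions become implementations of Kafka's `Serializer`/`Deserializer` interfaces registered on the producer and consumer configs; the round-trip logic itself is unchanged.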

performance tuning in Spark Streaming

adarsh

Batch and Window Sizes – The most common question is what minimum batch size Spark Streaming can use. In general,…

Posted in: Data Analytics, performance tuning, Spark, stream processing Filed under: Spark Rdd, spark streaming, streaming

checkpointing and fault tolerance in Spark Streaming

adarsh

Checkpointing is the main mechanism that needs to be set up for fault tolerance in Spark Streaming. It allows Spark…

Posted in: Data Analytics, Spark, stream processing Filed under: Spark Rdd, spark streaming, streaming
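The core idea can be sketched without Spark: snapshot running state after every micro-batch so a restarted job resumes from the last snapshot instead of reprocessing the whole stream. A minimal stand-in (Spark Streaming checkpoints to a fault-tolerant store such as HDFS; a local temp file and a word-count loop are used here purely for illustration):

```python
import json
import os
import tempfile

# Local temp file stands in for a fault-tolerant checkpoint directory.
checkpoint_path = os.path.join(tempfile.gettempdir(), "wordcount_checkpoint.json")
if os.path.exists(checkpoint_path):
    os.remove(checkpoint_path)  # start the demo from a clean slate

def load_state():
    # On restart, recover the counts from the last checkpoint if one exists.
    if os.path.exists(checkpoint_path):
        with open(checkpoint_path) as f:
            return json.load(f)
    return {}

def save_checkpoint(state):
    # Snapshot the running counts so a crash loses at most one batch.
    with open(checkpoint_path, "w") as f:
        json.dump(state, f)

state = load_state()
for batch in [["error", "warn"], ["error"]]:  # two micro-batches of log levels
    for word in batch:
        state[word] = state.get(word, 0) + 1
    save_checkpoint(state)  # checkpoint after each batch

print(state)  # {'error': 2, 'warn': 1}
```

A second run of the job would call `load_state()` first and continue counting from `{'error': 2, 'warn': 1}` rather than from zero, which is exactly the recovery behavior checkpointing buys you.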

stateful transformation Spark Streaming example

adarsh

Stateful transformations are operations on DStreams that track data across time; that is, some data from previous batches is used…

Posted in: Data Analytics, Spark, stream processing Filed under: kafka, Spark Rdd, spark streaming, streaming
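The semantics can be mimicked in plain Python: fold each micro-batch's values for a key into state carried over from earlier batches, the way `updateStateByKey` does. A sketch using a hypothetical running count of log levels:

```python
# update_state plays the role of the update function passed to
# updateStateByKey: it combines the new batch's values with the
# previous state for that key (None on first sight of the key).
def update_state(new_values, previous):
    return (previous or 0) + sum(new_values)

batches = [
    [("error", 1), ("warn", 1)],   # micro-batch 1
    [("error", 1), ("error", 1)],  # micro-batch 2
]

state = {}
for batch in batches:
    # Group this batch's values by key, as Spark would before the update.
    grouped = {}
    for key, value in batch:
        grouped.setdefault(key, []).append(value)
    # Carry state forward: previous counts feed into the new ones.
    for key, values in grouped.items():
        state[key] = update_state(values, state.get(key))

print(state)  # {'error': 3, 'warn': 1}
```

The key point is that `state` outlives any single batch, which is exactly what distinguishes this from the stateless case below.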

stateless transformation Spark Streaming example

adarsh

Stateless transformations like map(), flatMap(), filter(), repartition(), reduceByKey(), groupByKey() are simple RDD transformations applied to every batch. Keep in…

Posted in: Data Analytics, Spark, stream processing Filed under: kafka, Spark Rdd, spark streaming, streaming
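In contrast to the stateful case, each batch here is processed in complete isolation. A plain-Python sketch of the per-batch semantics (lists stand in for RDD micro-batches; the doubling/filtering pipeline is illustrative):

```python
# Two micro-batches; nothing is carried between them.
batches = [[1, 2, 3], [4, 5]]

results = []
for batch in batches:
    doubled = map(lambda x: x * 2, batch)  # map() sees only this batch
    kept = [x for x in doubled if x > 2]   # filter() sees only this batch
    results.append(kept)

print(results)  # [[4, 6], [8, 10]]
```

Because no state crosses batch boundaries, the output for a batch depends only on that batch's input, which is why stateless transformations need no checkpointing of their own.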

Page 1 of 2

Copyright © 2017 Time Pass Techies