
Big Data

Analytics And More

Tag: Spark Rdd

inverted index spark example

November, 2017 adarsh

The inverted index pattern is used to generate an index from a data set to allow for faster searches or data…

Continue Reading →

Posted in: Data Analytics, Spark
Filed under: Spark Rdd
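The excerpt only names the pattern. Here is a minimal plain-Python sketch of an inverted index (not the post's Spark code). It follows the shape an RDD version usually takes: a flatMap-style step that emits (word, document id) pairs, then a groupByKey-style step that collects the ids for each word. The sample documents and the naive whitespace tokenizer are assumptions for illustration.

```python
from collections import defaultdict

def build_inverted_index(docs):
    """Return {word: sorted list of ids of the documents containing it}."""
    # Step 1 (like flatMap): emit one (word, doc_id) pair per distinct word in each document.
    pairs = [
        (word, doc_id)
        for doc_id, text in docs.items()
        for word in set(text.lower().split())  # naive whitespace tokenizer (assumption)
    ]
    # Step 2 (like groupByKey): collect the document ids for each word.
    grouped = defaultdict(list)
    for word, doc_id in pairs:
        grouped[word].append(doc_id)
    return {word: sorted(ids) for word, ids in grouped.items()}

# Hypothetical sample documents
docs = {
    "d1": "spark makes big data simple",
    "d2": "hadoop and spark process big data",
    "d3": "kafka streams data",
}

index = build_inverted_index(docs)
print(index["spark"])  # ['d1', 'd2']
print(index["data"])   # ['d1', 'd2', 'd3']
```

A search for a word is then a single dictionary lookup instead of a scan over every document, which is the speed-up the pattern is after.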

spark finding standard deviation and mean using rdd, dataframe and dataset

November, 2017 adarsh

A standard deviation shows how much the data varies around the average. Problem: Given a list of employee…

Continue Reading →

Posted in: Data Analytics, Spark
Filed under: datasets and dataframe, Spark Rdd
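As a rough illustration of the computation, here is a plain-Python sketch (not the post's Spark code) that groups hypothetical (department, salary) rows and computes the mean and the population standard deviation per department. Whether the post uses the population or the sample form is not visible in the excerpt. Note that in Spark the RDD `stdev()` method computes the population form, while `sampleStdev()` and the SQL `stddev` function (an alias for `stddev_samp`) use the sample form.

```python
import math
from collections import defaultdict

# Hypothetical sample data: (department, salary) pairs
employees = [
    ("sales", 1000.0), ("sales", 3000.0),
    ("hr", 2000.0), ("hr", 2000.0), ("hr", 5000.0),
]

def mean_and_stddev_by_department(rows):
    """Return {department: (mean, population standard deviation)}."""
    groups = defaultdict(list)
    for dept, salary in rows:
        groups[dept].append(salary)
    result = {}
    for dept, values in groups.items():
        n = len(values)
        mean = sum(values) / n
        # Population variance: average squared distance from the mean (divide by n, not n - 1).
        variance = sum((v - mean) ** 2 for v in values) / n
        result[dept] = (mean, math.sqrt(variance))
    return result

stats = mean_and_stddev_by_department(employees)
print(stats["sales"])  # (2000.0, 1000.0)
```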

spark finding average using rdd, dataframe and dataset

November, 2017 adarsh

Problem to solve: Given a list of employees with their department and salary, find the average salary in each…

Continue Reading →

Posted in: Data Analytics, Spark
Filed under: datasets and dataframe, Spark Rdd
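A per-department average is usually computed by carrying a (sum, count) pair per key and dividing only at the end, the way a reduceByKey over (salary, 1) values would. Below is a minimal plain-Python sketch of that idea, not the post's code; the sample rows are hypothetical.

```python
# Hypothetical sample data: (department, salary) pairs
employees = [
    ("sales", 1000.0), ("sales", 3000.0),
    ("hr", 2000.0), ("hr", 2000.0), ("hr", 5000.0),
]

def average_by_department(rows):
    """Return {department: average salary}."""
    totals = {}
    for dept, salary in rows:
        # Merge (salary, 1) into the running (sum, count) for this key, like reduceByKey.
        s, c = totals.get(dept, (0.0, 0))
        totals[dept] = (s + salary, c + 1)
    # Divide only once per key, after all partial sums are merged.
    return {dept: s / c for dept, (s, c) in totals.items()}

print(average_by_department(employees))  # {'sales': 2000.0, 'hr': 3000.0}
```

Keeping the sum and count separate matters in a distributed setting: averaging per-partition averages gives the wrong answer when partitions hold different numbers of rows, while summing sums and counts does not.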

spark finding minimum, maximum and count using rdd, dataframe and dataset

adarsh

Problem: 1. Given a list of employees with their department and salary, find the maximum and minimum salary in…

Continue Reading →

Posted in: Data Analytics, Spark
Filed under: datasets and dataframe, Spark Rdd
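Minimum, maximum and count can all be folded into one (min, max, count) tuple per department in a single pass, which is the kind of combined aggregation a reduceByKey or aggregateByKey would perform. This is a plain-Python sketch under that assumption, not the post's implementation; the rows are hypothetical.

```python
# Hypothetical sample data: (department, salary) pairs
employees = [
    ("sales", 1000.0), ("sales", 3000.0),
    ("hr", 2000.0), ("hr", 2000.0), ("hr", 5000.0),
]

def min_max_count_by_department(rows):
    """Return {department: (min salary, max salary, employee count)}."""
    stats = {}
    for dept, salary in rows:
        if dept not in stats:
            # First row for this department seeds all three values.
            stats[dept] = (salary, salary, 1)
        else:
            lo, hi, n = stats[dept]
            stats[dept] = (min(lo, salary), max(hi, salary), n + 1)
    return stats

print(min_max_count_by_department(employees))
# {'sales': (1000.0, 3000.0, 2), 'hr': (2000.0, 5000.0, 3)}
```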

kafka example for custom serializer, deserializer and encoder with spark streaming integration

November, 2017 adarsh

Let's say we want to send a custom object as the Kafka value type and we need to push this…

Continue Reading →

Posted in: Data Analytics, Spark, stream processing
Filed under: kafka, Spark Rdd, spark streaming, streaming
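Kafka stores record values as bytes, so a custom value type needs a serializer that turns the object into bytes and a deserializer that rebuilds it. Kafka's Java client expresses this as `Serializer<T>` (`byte[] serialize(String topic, T data)`) and `Deserializer<T>` (`T deserialize(String topic, byte[] data)`). The plain-Python sketch below only mirrors that shape and is not Kafka client code. The `Employee` class and the choice of JSON as the wire format are assumptions for illustration.

```python
import json
from dataclasses import dataclass, asdict

@dataclass
class Employee:
    """Hypothetical custom value type."""
    name: str
    department: str
    salary: float

def serialize(topic: str, value: Employee) -> bytes:
    # topic is unused here; it is kept only to mirror the Java interface shape.
    return json.dumps(asdict(value)).encode("utf-8")

def deserialize(topic: str, data: bytes) -> Employee:
    # Reverse of serialize: decode UTF-8 JSON and rebuild the dataclass.
    return Employee(**json.loads(data.decode("utf-8")))

original = Employee("alice", "sales", 1000.0)
payload = serialize("employees", original)
restored = deserialize("employees", payload)
print(restored == original)  # True
```

The producer and the consuming Spark Streaming job must agree on the same encoding; any change to the object's fields has to stay compatible on both sides.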

performance tuning in spark streaming

adarsh

Batch and Window Sizes – The most common question is what minimum batch size Spark Streaming can use. In general,…

Continue Reading →

Posted in: Data Analytics, performance tuning, Spark, stream processing
Filed under: Spark Rdd, spark streaming, streaming

checkpointing and fault tolerance in spark streaming

adarsh

Checkpointing is the main mechanism that needs to be set up for fault tolerance in Spark Streaming. It allows Spark…

Continue Reading →

Posted in: Data Analytics, Spark, stream processing
Filed under: Spark Rdd, spark streaming, streaming


Recent Posts

  • Optimization for Using AWS Lambda to Send Messages to Amazon MSK
  • Rebalancing a Kafka Cluster in AWS MSK using CLI Commands
  • Using StsAssumeRoleCredentialsProvider with Glue Schema Registry Integration in Kafka Producer
Copyright © 2017 Time Pass Techies