Big Data

Analytics And More
Category: Programming

spark secondary sorting example using rdd and dataframe

November, 2017 adarsh

We can do secondary sorting in Spark just as in MapReduce. We need to define a composite key when…

Posted in: Data Analytics, Spark Filed under: datasets and dataframe, Spark Rdd

inverted index spark example

November, 2017 adarsh

The inverted index pattern is used to generate an index from a data set to allow for faster searches or data…

Posted in: Data Analytics, Spark Filed under: Spark Rdd

spark finding standard deviation and mean using rdd, dataframe and dataset

November, 2017 adarsh

The standard deviation shows how much the data varies from the average. Problem: Given a list of employee…

Posted in: Data Analytics, Spark Filed under: datasets and dataframe, Spark Rdd

spark finding average using rdd, dataframe and dataset

November, 2017 adarsh

Problem to solve: Given a list of employees with their department and salary, find the average salary in each…

Posted in: Data Analytics, Spark Filed under: datasets and dataframe, Spark Rdd

spark finding minimum, maximum and count using rdd, dataframe and dataset

adarsh

Problem: 1. Given a list of employees with their department and salary, find the maximum and minimum salary in…

Posted in: Data Analytics, Spark Filed under: datasets and dataframe, Spark Rdd

kafka example for custom serializer, deserializer and encoder with spark streaming integration

November, 2017 adarsh 1 Comment

Let's say we want to send a custom object as the Kafka value type and we need to push this…

Posted in: Data Analytics, Spark, stream processing Filed under: kafka, Spark Rdd, spark streaming, streaming

performance tuning in spark streaming

adarsh

Batch and Window Sizes – The most common question is what minimum batch size Spark Streaming can use. In general,…

Posted in: Data Analytics, performance tuning, Spark, stream processing Filed under: Spark Rdd, spark streaming, streaming

Copyright © 2017 Time Pass Techies