Big Data

Analytics And More

Tag: Spark Rdd

oozie spark action workflow example

March, 2018 adarsh 1 Comment

Let's create an Oozie workflow with a Spark action for an inverted index use case. The inverted index pattern is used to…

Posted in: Data Analytics, Oozie, Spark Filed under: oozie workflow, Spark Rdd

reading orc file in spark

adarsh

We will use the hadoopFile method of the Spark context to read the ORC file. Below is the method…

Posted in: Spark Filed under: Spark Rdd

debugging a spark application

adarsh

Performance issues fall into two categories. 1. Distribution performance: the program is slow due to scheduling, coordination and…

Posted in: performance tuning, Spark Filed under: spark performance tuning, Spark Rdd

spark read avro file from hdfs example

December, 2017 adarsh 1 Comment

To load Avro data in Spark we need a few additional jars, and in the example below we are using the…

Posted in: Data Analytics, Spark Filed under: datasets and dataframe, Spark Rdd

spark distinct example for rdd, pairrdd and dataframe

November, 2017 adarsh

We often have duplicates in the data, and removing duplicates from a dataset is a common use case. If we want…

Posted in: Data Analytics, Spark Filed under: datasets and dataframe, Spark Rdd

spark top n records example in a sample data using rdd and dataframe

adarsh

Finding outliers is an important part of data analysis because these records are typically the most interesting and unique pieces…

Posted in: Data Analytics, Spark Filed under: datasets and dataframe, Spark Rdd

spark secondary sorting example using rdd and dataframe

adarsh

We can do secondary sorting in Spark just as in MapReduce. We need to define a composite key when…

Posted in: Data Analytics, Spark Filed under: datasets and dataframe, Spark Rdd

Copyright © 2017 Time Pass Techies