Programming Archives - Page 10 of 33

Oozie Spark action workflow example

March, 2018 adarsh

Let's create an Oozie workflow with a Spark action for an inverted index use case. The inverted index pattern is used to…

adarsh

We will use the hadoopFile method of the Spark context to read the ORC file. Below is the method…

adarsh

Performance issues can be categorized into two parts: 1. Distribution performance: the program is slow due to scheduling, coordination and…

December, 2017 adarsh

To load Avro data in Spark we need a few additional JARs, and in the example below we are using the…

November, 2017 adarsh

We can implement secondary sorting in MapReduce using the steps below: 1. Make the key a composite of the natural…

November, 2017 adarsh

We often have duplicates in the data, and removing duplicates from a dataset is a common use case. If we want…

adarsh

Finding outliers is an important part of data analysis because these records are typically the most interesting and unique pieces…