Data Analytics Archives - Page 10 of 26

Reading ORC file in Spark

March, 2018 adarsh

We will use the hadoopFile method of the Spark context to read the ORC file. Below is the method…

adarsh

Performance issues can be categorized into two parts: 1. Distribution performance – the program is slow due to scheduling, coordination and…

December, 2017 adarsh 1 Comment

To load Avro data in Spark we need a few additional JARs, and in the example below we are using the…

November, 2017 adarsh

We can implement secondary sorting in MapReduce using the steps below: 1. Make the key a composite of the natural…

November, 2017 adarsh

We often have duplicates in the data, and removing duplicates from a dataset is a common use case. If we want…

adarsh

Finding outliers is an important part of data analysis because these records are typically the most interesting and unique pieces…

adarsh

We can do secondary sorting in Spark as with MapReduce. We need to define a composite key when…