datasets and dataframe Archives - Page 3 of 4

Spark distinct example for RDD, PairRDD and DataFrame

November, 2017 adarsh

We often have duplicates in the data, and removing duplicates from a dataset is a common use case. If we want…

adarsh

Finding outliers is an important part of data analysis because these records are typically the most interesting and unique pieces…

adarsh

We can do secondary sorting in Spark, as with MapReduce. We need to define a composite key when…

November, 2017 adarsh

The standard deviation shows how much the data varies from the average. Problem: Given a list of employee…

November, 2017 adarsh

Problem to solve: Given a list of employees with their department and salary, find the average salary in each…

adarsh

Problem: 1. Given a list of employees with their department and salary, find the maximum and minimum salary in…

November, 2017 adarsh

A Dataset is a strongly typed collection of domain-specific objects that can be transformed in parallel using functional or relational…