Spark supports zipping RDDs using functions like zip, zipPartitions, zipWithIndex and zipWithUniqueId. Let's go through each of…
A window function calculates a return value for every input row of a table based on a group of rows,…
Here we want to find the difference between two DataFrames at a column level. We can use the dataframe1.except(dataframe2)…
Let's say we have a dataset as below and we want to split a single column into multiple columns using withColumn…
Let's create a DataFrame from a list of Row objects. First populate the list with Row objects and then we…
Let's create an Oozie workflow with a Spark action for an inverted index use case. The inverted index pattern is used to…
We will be using the hadoopFile method of the Spark context to read the ORC file. Below is the method…