Spark Archives - Page 12 of 12

Spark sortBy and sortByKey example in Java and Scala – tutorial 7

November, 2017 adarsh

We can sort an RDD with key/value pairs provided that there is an ordering defined on the key. Once we…

adarsh

Joining data together is probably one of the most common operations on a pair RDD, and Spark has a full range…

adarsh

The groupBy function works on unpaired data or data where we want to use a different condition besides equality on the…

November, 2017 adarsh

combineByKey is the most general of the per-key aggregation functions. Most of the other per-key combiners are implemented using it.…

adarsh

When datasets are described in terms of key/value pairs, it is common to want to aggregate statistics across all elements…

adarsh

There are a number of ways to get pair RDDs in Spark and many formats will directly load pair RDDs…

October, 2017 adarsh

An RDD in Spark is simply an immutable distributed collection of objects. Each RDD is split into multiple partitions, which…