While Spark’s HashPartitioner and RangePartitioner are well suited to many use cases, Spark also allows you to tune how an…
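One way to picture a custom partitioning function is the sketch below. It is plain Python, not Spark code, and it assumes the keys are URLs that should be grouped by domain; the name `domain_hash` and the sample URLs are hypothetical. The modulo step mirrors how a partition index is derived from a hash value.

```python
import zlib
from urllib.parse import urlparse

def domain_hash(url):
    """Hash only the domain part of a URL, so all pages of a site hash alike."""
    domain = urlparse(url).netloc
    # crc32 is stable across processes, unlike Python's salted hash() for str
    return zlib.crc32(domain.encode("utf-8"))

num_partitions = 20  # assumed partition count, chosen for illustration
urls = [
    "http://www.example.com/a",
    "http://www.example.com/b",
    "http://other.example.org/",
]
# Partition index = hash value modulo the number of partitions
partitions = [domain_hash(u) % num_partitions for u in urls]

# In PySpark, a function like this could be passed as the second argument:
#   rdd.partitionBy(num_partitions, domain_hash)
```

Hashing the domain rather than the full URL keeps pages from the same site in the same partition.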
Sometimes we want a different sort order entirely, and to support this we can provide our own comparison function. In…
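As a rough illustration of a custom sort order, the plain-Python sketch below sorts integer keys as if they were strings. The sample data is made up, and `sorted` stands in for Spark here; the PySpark call named in the comment takes a comparable `keyfunc` argument.

```python
# Key/value pairs with integer keys (made-up sample data)
pairs = [(10, "a"), (9, "b"), (100, "c")]

# Default ordering: numeric comparison of the keys
numeric_order = sorted(pairs, key=lambda kv: kv[0])

# Custom ordering: compare keys by their string form instead
string_order = sorted(pairs, key=lambda kv: str(kv[0]))

# In PySpark the same idea could be expressed on a pair RDD as:
#   rdd.sortByKey(ascending=True, keyfunc=lambda k: str(k))
```

With string comparison, 10 and 100 sort before 9, because "1" precedes "9".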
We can sort an RDD with key/value pairs provided that there is an ordering defined on the key. Once we…
Joining data together is probably one of the most common operations on a pair RDD, and Spark has a full range…
The groupBy function works on unpaired data, or on data where we want to use a condition other than equality on the…
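To make the join semantics concrete, here is a small plain-Python model of an inner join and a left outer join on key/value pairs. It is not Spark code: the helper names and sample data are hypothetical, and Spark does not guarantee the output order shown by these lists.

```python
from collections import defaultdict

def _index(pairs):
    """Group values by key so each key can be looked up quickly."""
    index = defaultdict(list)
    for key, value in pairs:
        index[key].append(value)
    return index

def inner_join(left, right):
    """Keep only keys present on both sides, one output per value pairing."""
    idx = _index(right)
    return [(k, (v, w)) for k, v in left for w in idx.get(k, [])]

def left_outer_join(left, right):
    """Keep every left key; a missing right value is represented as None."""
    idx = _index(right)
    return [(k, (v, w)) for k, v in left for w in idx.get(k, [None])]

left = [("a", 1), ("b", 2)]
right = [("a", "x"), ("a", "y"), ("c", "z")]

inner = inner_join(left, right)
left_outer = left_outer_join(left, right)

# Comparable PySpark calls on pair RDDs: left_rdd.join(right_rdd)
# and left_rdd.leftOuterJoin(right_rdd)
```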
combineByKey is the most general of the per-key aggregation functions. Most of the other per-key combiners are implemented using it.…
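The three functions that combineByKey takes can be modelled in plain Python as below. This is a sketch of the semantics rather than Spark's implementation: the helper `combine_by_key`, the explicit list of partitions, and the sample data are all assumptions made for illustration. It computes a per-key average by carrying a (sum, count) pair as the combiner.

```python
def combine_by_key(partitions, create_combiner, merge_value, merge_combiners):
    """Model of per-key aggregation: combine within partitions, then across."""
    per_partition = []
    for part in partitions:
        acc = {}
        for key, value in part:
            if key in acc:
                # Key already seen in this partition: fold the value in
                acc[key] = merge_value(acc[key], value)
            else:
                # First time this key appears in this partition
                acc[key] = create_combiner(value)
        per_partition.append(acc)

    result = {}
    for acc in per_partition:
        for key, combiner in acc.items():
            if key in result:
                # Same key came from another partition: merge the combiners
                result[key] = merge_combiners(result[key], combiner)
            else:
                result[key] = combiner
    return result

# Two hypothetical partitions of (key, value) pairs
partitions = [[("a", 1), ("b", 4), ("a", 3)], [("a", 5), ("b", 2)]]

sums_counts = combine_by_key(
    partitions,
    lambda v: (v, 1),                          # createCombiner
    lambda c, v: (c[0] + v, c[1] + 1),         # mergeValue
    lambda c1, c2: (c1[0] + c2[0], c1[1] + c2[1]),  # mergeCombiners
)
averages = {k: total / count for k, (total, count) in sums_counts.items()}

# Comparable PySpark call:
#   rdd.combineByKey(createCombiner, mergeValue, mergeCombiners)
```

Carrying (sum, count) instead of a running average is what lets partial results from different partitions be merged correctly.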
When datasets are described in terms of key/value pairs, it is common to want to aggregate statistics across all elements…