Big Data

MapReduce example of a map-side composite join with many very large formatted inputs

June, 2017 adarsh

Composite joins are particularly useful if you want to join very large data sets together. However, the data sets must…

adarsh

A replicated join is extremely useful, but has a strict size limit on all but one of the data…

June, 2017 adarsh

A reduce side join is arguably one of the easiest implementations of a join in MapReduce, and therefore is a…

June, 2017 adarsh

The shuffling pattern can be used when we want to randomize the data set for repeatable random sampling. For example, the…

adarsh

Sorting is easy in sequential programming. Sorting in MapReduce, or more generally in parallel, is not easy. This is because…

June, 2017 adarsh

Binning is very similar to partitioning and often can be used to solve the same problem. The major difference is…

adarsh

The partitioning pattern moves the records into categories, i.e., shards, partitions, or bins, but it doesn’t really care about the…