Big Data

MapReduce shuffle and sort phase

July, 2017 adarsh

MapReduce makes the guarantee that the input to every reducer is sorted by key. The process by which the system…

adarsh

Performance issues in MapReduce jobs are a common problem faced by Hadoop developers, and there are a few Hadoop…

July, 2017 adarsh

Sequence files, map files, and Avro datafiles are all row-oriented file formats, which means that the values for each row…

adarsh

Serialization is the process of turning structured objects into a byte stream for transmission over a network or for writing…

adarsh

File compression brings two major benefits: it reduces the space needed to store files, and it speeds up data transfer…

adarsh

HDFS transparently checksums all data written to it and by default verifies checksums when reading data. Datanodes are responsible for…

July, 2017 adarsh

Apache YARN (Yet Another Resource Negotiator) is Hadoop’s cluster resource management system. YARN was introduced in Hadoop 2 to improve…