Big Data

Analytics And More
  • Home
  • Map Reduce
  • Spark
  • Hive
  • Hdfs & Yarn
  • Pig
  • Oozie
  • Hbase
  • Design Patterns
  • Streaming

Category: Hdfs

life cycle of a mapreduce program – job submission, job initialization, task assignment, task execution, progress updates and job completion

July, 2017 adarsh

You can run a mapreduce job with a single method call submit() on a Job object or you can also…

Continue Reading →
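
A minimal sketch of that entry point, assuming the org.apache.hadoop.mapreduce API and using the framework's built-in identity Mapper and Reducer so the driver is self-contained:

import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.fs.Path;
import org.apache.hadoop.io.LongWritable;
import org.apache.hadoop.io.Text;
import org.apache.hadoop.mapreduce.Job;
import org.apache.hadoop.mapreduce.Mapper;
import org.apache.hadoop.mapreduce.Reducer;
import org.apache.hadoop.mapreduce.lib.input.FileInputFormat;
import org.apache.hadoop.mapreduce.lib.output.FileOutputFormat;

public class SubmitExample {
  public static void main(String[] args) throws Exception {
    Job job = Job.getInstance(new Configuration(), "identity pass-through");
    job.setJarByClass(SubmitExample.class);
    job.setMapperClass(Mapper.class);            // built-in identity mapper
    job.setReducerClass(Reducer.class);          // built-in identity reducer
    job.setOutputKeyClass(LongWritable.class);   // matches TextInputFormat keys
    job.setOutputValueClass(Text.class);         // matches TextInputFormat values
    FileInputFormat.addInputPath(job, new Path(args[0]));
    FileOutputFormat.setOutputPath(job, new Path(args[1]));

    // Asynchronous: submit() hands the job to the cluster and returns at once.
    // job.submit();

    // Synchronous: waitForCompletion() submits, polls progress and reports success.
    System.exit(job.waitForCompletion(true) ? 0 : 1);
  }
}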

Posted in: Data Analytics, Hdfs, Map Reduce, yarn Filed under: hdfs, map reduce, yarn

mapreduce shuffle and sort phase

adarsh

MapReduce makes the guarantee that the input to every reducer is sorted by key. The process by which the system…

Continue Reading →
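
As a small illustration of that guarantee, here is a hypothetical summing reducer: by the time reduce() runs, the shuffle has already merged and sorted the map outputs, so each call receives one key (keys arriving in sorted order) together with all of its values, and the reducer never sorts anything itself.

import java.io.IOException;
import org.apache.hadoop.io.IntWritable;
import org.apache.hadoop.io.Text;
import org.apache.hadoop.mapreduce.Reducer;

// Keys reach reduce() in sorted order, and each call carries every value the
// shuffle collected for that key, so the reducer only has to aggregate.
public class SumReducer extends Reducer<Text, IntWritable, Text, IntWritable> {
  private final IntWritable total = new IntWritable();

  @Override
  protected void reduce(Text key, Iterable<IntWritable> values, Context context)
      throws IOException, InterruptedException {
    int sum = 0;
    for (IntWritable value : values) {
      sum += value.get();   // values already grouped under this key by the shuffle
    }
    total.set(sum);
    context.write(key, total);
  }
}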

Posted in: Data Analytics, Hdfs, Map Reduce, yarn Filed under: hdfs, map reduce, yarn

performance tuning of mapreduce job, yarn resource manager and profiling

adarsh

Performance issues in map reduce jobs are a common problem faced by hadoop developers, and there are a few hadoop…

Continue Reading →
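
A hedged sketch of the kind of knobs involved, using standard Hadoop 2.x property names; the values below are only illustrative starting points, and the right settings depend on the job and the cluster.

import org.apache.hadoop.conf.Configuration;

// Illustrative values only; tune against the actual job counters and profiles.
public class TuningSketch {
  public static Configuration tunedConfiguration() {
    Configuration conf = new Configuration();
    conf.setInt("mapreduce.task.io.sort.mb", 256);               // larger map-side sort buffer, fewer spills
    conf.setInt("mapreduce.task.io.sort.factor", 50);            // merge more spill segments per pass
    conf.setBoolean("mapreduce.map.output.compress", true);      // smaller shuffle transfers
    conf.setInt("mapreduce.reduce.shuffle.parallelcopies", 10);  // more parallel map-output fetchers

    // Built-in profiling: run a few tasks under the JVM profiler and collect the output.
    conf.setBoolean("mapreduce.task.profile", true);
    conf.set("mapreduce.task.profile.maps", "0-2");
    conf.set("mapreduce.task.profile.reduces", "0-2");
    return conf;
  }
}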

Posted in: Data Analytics, Hdfs, Map Reduce, performance tuning, yarn Filed under: hdfs filesystem, map reduce, map reduce performance tuning, yarn

row-oriented and column-oriented file formats in hadoop

July, 2017 adarsh

Sequence files, map files, and Avro datafiles are all row-oriented file formats, which means that the values for each row…

Continue Reading →
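
For example, a SequenceFile is written record by record, each append() laying down one complete key/value pair; a minimal sketch, with the output path taken as an illustrative argument:

import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.fs.Path;
import org.apache.hadoop.io.IOUtils;
import org.apache.hadoop.io.IntWritable;
import org.apache.hadoop.io.SequenceFile;
import org.apache.hadoop.io.Text;

public class SequenceFileWriteDemo {
  public static void main(String[] args) throws Exception {
    Configuration conf = new Configuration();
    Path path = new Path(args[0]);   // e.g. an HDFS path for the output file

    SequenceFile.Writer writer = null;
    try {
      writer = SequenceFile.createWriter(conf,
          SequenceFile.Writer.file(path),
          SequenceFile.Writer.keyClass(IntWritable.class),
          SequenceFile.Writer.valueClass(Text.class));
      IntWritable key = new IntWritable();
      Text value = new Text();
      // Row-oriented: every append() stores a complete key/value record contiguously.
      for (int i = 0; i < 100; i++) {
        key.set(i);
        value.set("row-" + i);
        writer.append(key, value);
      }
    } finally {
      IOUtils.closeStream(writer);
    }
  }
}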

Posted in: Data Analytics, Hdfs, Map Reduce Filed under: hadoop input output, hdfs, hdfs filesystem, map reduce

Serialization in hadoop with writable interface

adarsh

Serialization is the process of turning structured objects into a byte stream for transmission over a network or for writing…

Continue Reading →
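
A minimal sketch of the contract, using a hypothetical IntPairWritable: write() serializes the fields to a byte stream and readFields() reconstructs the object from one (key types would additionally implement WritableComparable so they can be sorted during the shuffle).

import java.io.DataInput;
import java.io.DataOutput;
import java.io.IOException;
import org.apache.hadoop.io.Writable;

// A hypothetical pair of ints serialized with the Writable contract.
public class IntPairWritable implements Writable {
  private int first;
  private int second;

  public IntPairWritable() {}   // no-arg constructor required for framework reflection

  public IntPairWritable(int first, int second) {
    this.first = first;
    this.second = second;
  }

  @Override
  public void write(DataOutput out) throws IOException {
    out.writeInt(first);        // turn the fields into a byte stream
    out.writeInt(second);
  }

  @Override
  public void readFields(DataInput in) throws IOException {
    first = in.readInt();       // rebuild the object from the byte stream
    second = in.readInt();
  }

  public int getFirst()  { return first; }
  public int getSecond() { return second; }
}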

Posted in: Data Analytics, Hdfs, Map Reduce Filed under: hadoop input output, hdfs, hdfs filesystem, map reduce

compression formats and their effects in hdfs and map reduce program

adarsh

File compression brings two major benefits: it reduces the space needed to store files, and it speeds up data transfer…

Continue Reading →
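
As a sketch of where compression is switched on in a job, assuming the stock Gzip and Snappy codecs are available on the cluster:

import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.io.compress.CompressionCodec;
import org.apache.hadoop.io.compress.GzipCodec;
import org.apache.hadoop.io.compress.SnappyCodec;
import org.apache.hadoop.mapreduce.Job;
import org.apache.hadoop.mapreduce.lib.output.FileOutputFormat;

public class CompressionSetup {
  public static void configure(Job job) {
    Configuration conf = job.getConfiguration();

    // Compress intermediate map output to cut shuffle traffic; Snappy trades a
    // lower compression ratio for speed (assumes the native library is installed).
    conf.setBoolean("mapreduce.map.output.compress", true);
    conf.setClass("mapreduce.map.output.compress.codec", SnappyCodec.class, CompressionCodec.class);

    // Compress the final job output with gzip to save space in HDFS.
    // Note: plain gzip files are not splittable, which matters if they feed another job.
    FileOutputFormat.setCompressOutput(job, true);
    FileOutputFormat.setOutputCompressorClass(job, GzipCodec.class);
  }
}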

Posted in: Data Analytics, Hdfs, Map Reduce Filed under: hadoop input output, hdfs, hdfs filesystem, map reduce

Data Integrity in hadoop distributed file system

adarsh

HDFS transparently checksums all data written to it and by default verifies checksums when reading data. Datanodes are responsible for…

Continue Reading →
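
A small sketch of how this surfaces in the FileSystem API: the checksum of a stored file can be fetched, and client-side verification can be turned off when a (possibly corrupt) file must still be read; the path argument is illustrative.

import java.net.URI;
import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.fs.FileChecksum;
import org.apache.hadoop.fs.FileSystem;
import org.apache.hadoop.fs.Path;

public class ChecksumDemo {
  public static void main(String[] args) throws Exception {
    String uri = args[0];   // e.g. a file already stored in HDFS
    Configuration conf = new Configuration();
    FileSystem fs = FileSystem.get(URI.create(uri), conf);
    Path file = new Path(uri);

    // HDFS stores a checksum for every dfs.bytes-per-checksum bytes (512 by default)
    // and verifies them on read; this exposes the file-level checksum to the client.
    FileChecksum checksum = fs.getFileChecksum(file);
    System.out.println(checksum);

    // Verification can be disabled, e.g. to copy out a corrupt file for inspection
    // instead of failing the read.
    fs.setVerifyChecksum(false);
    fs.open(file).close();
  }
}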

Posted in: Data Analytics, Hdfs Filed under: hadoop input output, hdfs, hdfs filesystem, map reduce

Page 2 of 3

Recent Posts

  • spark sql consecutive sequence example
  • spark sql example to find second highest average
  • spark sql example to find max of average
Copyright © 2017 Time Pass Techies