Big Data

Analytics And More

Category: Data Analytics

Hadoop MapReduce: reading the entire file content without splitting the file, for example reading an XML file

July, 2017 adarsh

Some applications don’t want files to be split, as this allows a single mapper to process each input file in…

Posted in: Hdfs, Map Reduce Filed under: hdfs, hdfs filesystem, map reduce
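The excerpt above is about keeping a file in one piece so that a single mapper reads all of it. In Hadoop this is usually done by subclassing `FileInputFormat` and overriding `isSplitable(JobContext, Path)` to return `false`. The sketch below does not use Hadoop at all; it is a simplified plain-Java model of how a file is cut into byte-range splits, assuming the split size equals the HDFS block size and that the last split may run up to 10% over, as `FileInputFormat` allows. `SplitModel`, `Split` and `computeSplits` are hypothetical names.

```java
import java.util.ArrayList;
import java.util.List;

/**
 * Simplified plain-Java model of FileInputFormat-style split computation.
 * Not Hadoop code: all names here are hypothetical.
 */
public class SplitModel {

    /** A split is the byte range [start, start + length) of one input file. */
    public static final class Split {
        public final long start;
        public final long length;

        Split(long start, long length) {
            this.start = start;
            this.length = length;
        }

        @Override
        public String toString() {
            return "[" + start + ", +" + length + ")";
        }
    }

    // The final split may be up to 10% larger than splitSize instead of producing a tiny extra split.
    private static final double SPLIT_SLOP = 1.1;

    public static List<Split> computeSplits(long fileLength, long splitSize, boolean splitable) {
        List<Split> splits = new ArrayList<>();
        if (fileLength == 0) {
            return splits; // simplification: this model produces no splits for an empty file
        }
        if (!splitable) {
            // One split for the whole file: a single mapper reads it from start to end.
            splits.add(new Split(0, fileLength));
            return splits;
        }
        long remaining = fileLength;
        while ((double) remaining / splitSize > SPLIT_SLOP) {
            splits.add(new Split(fileLength - remaining, splitSize));
            remaining -= splitSize;
        }
        if (remaining > 0) {
            splits.add(new Split(fileLength - remaining, remaining));
        }
        return splits;
    }

    public static void main(String[] args) {
        long mb = 1024L * 1024L;
        System.out.println("splittable:     " + computeSplits(300 * mb, 128 * mb, true).size() + " splits");
        System.out.println("non-splittable: " + computeSplits(300 * mb, 128 * mb, false).size() + " split");
    }
}
```

With a 300 MB file and a 128 MB split size, the splittable case yields three splits (128 MB, 128 MB and 44 MB), while the non-splittable case yields one split covering the whole file, which is what a document that must be parsed as a unit, such as an XML file, needs.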

Handling failures in Hadoop, MapReduce and YARN

July, 2017 adarsh

In the real world, user code is buggy, processes crash, and machines fail. One of the major benefits of using…

Posted in: Data Analytics, Hdfs, Map Reduce, yarn Filed under: hdfs, map reduce, yarn
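One way Hadoop tolerates the failures described above is by retrying failed task attempts: by default a map or reduce task is attempted up to four times (`mapreduce.map.maxattempts` and `mapreduce.reduce.maxattempts`) before the task, and normally the job, is marked as failed. The toy model below only illustrates that retry loop; `TaskRetryModel` and `runWithRetries` are hypothetical names, and it ignores node selection, speculative execution and ApplicationMaster failure.

```java
import java.util.function.IntPredicate;

/**
 * Toy model of task retry: a task is re-attempted until it succeeds or
 * reaches maxAttempts. Not Hadoop code; names are hypothetical.
 */
public class TaskRetryModel {

    /** Returns the 1-based attempt number that succeeded, or -1 if every attempt failed. */
    public static int runWithRetries(IntPredicate attemptSucceeds, int maxAttempts) {
        for (int attempt = 1; attempt <= maxAttempts; attempt++) {
            if (attemptSucceeds.test(attempt)) {
                return attempt;
            }
            // A real ApplicationMaster would reschedule the attempt, avoiding the node where it failed.
        }
        return -1; // task failed permanently; by default this fails the whole job
    }

    public static void main(String[] args) {
        // Hypothetical flaky task: crashes on its first two attempts, then succeeds.
        System.out.println("flaky task succeeded on attempt " + runWithRetries(attempt -> attempt >= 3, 4));
        // Hypothetical buggy task that always fails: gives up after 4 attempts.
        System.out.println("buggy task result: " + runWithRetries(attempt -> false, 4));
    }
}
```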

Life cycle of a MapReduce program: job submission, job initialization, task assignment, task execution, progress updates and job completion

adarsh

You can run a MapReduce job with a single method call, submit(), on a Job object, or you can also…

Posted in: Data Analytics, Hdfs, Map Reduce, yarn Filed under: hdfs, map reduce, yarn
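The two ways of running a job differ in whether the caller blocks: in Hadoop, `Job.submit()` returns as soon as the job has been handed to the cluster, while `Job.waitForCompletion(boolean verbose)` submits the job if it is still in the `DEFINE` state and then polls its progress until it finishes. The sketch below is a toy, single-process model of that difference using `java.util.concurrent`; `ToyJob` and its methods are hypothetical and are not Hadoop's API.

```java
import java.util.concurrent.ExecutionException;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.Future;
import java.util.concurrent.TimeUnit;

/** Toy model (not Hadoop's API) of asynchronous submit() versus blocking waitForCompletion(). */
public class ToyJob {
    private enum State { DEFINE, RUNNING }

    // Daemon thread so an unfinished toy job never keeps the JVM alive.
    private final ExecutorService executor = Executors.newSingleThreadExecutor(r -> {
        Thread t = new Thread(r);
        t.setDaemon(true);
        return t;
    });
    private final Runnable work;
    private State state = State.DEFINE;
    private Future<?> running;

    public ToyJob(Runnable work) {
        this.work = work;
    }

    /** Non-blocking: hands the work off and returns at once. */
    public void submit() {
        if (state != State.DEFINE) {
            throw new IllegalStateException("job already submitted");
        }
        running = executor.submit(work);
        state = State.RUNNING;
    }

    public boolean isComplete() {
        return running != null && running.isDone();
    }

    /** Blocking: submits only if still in DEFINE, then polls until the job finishes. */
    public boolean waitForCompletion(long pollMillis) {
        if (state == State.DEFINE) {
            submit();
        }
        try {
            while (!isComplete()) {
                TimeUnit.MILLISECONDS.sleep(pollMillis); // a real client would report progress here
            }
            running.get(); // rethrows a failure in the work as ExecutionException
            return true;
        } catch (ExecutionException e) {
            return false;
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
            return false;
        } finally {
            executor.shutdown();
        }
    }
}
```

Calling `submit()` and then `waitForCompletion(...)` on the same object works here because, as in Hadoop, the second call only submits when the job has not been submitted yet.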

MapReduce shuffle and sort phase

adarsh

MapReduce makes the guarantee that the input to every reducer is sorted by key. The process by which the system…

Posted in: Data Analytics, Hdfs, Map Reduce, yarn Filed under: hdfs, map reduce, yarn
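The guarantee that each reducer sees its input sorted by key can be illustrated with an in-memory model. The sketch below assigns every map output pair to a reducer with the same formula as Hadoop's default `HashPartitioner`, `(key.hashCode() & Integer.MAX_VALUE) % numReduceTasks`, and keeps each partition in a `TreeMap` so keys come out sorted with their values grouped. This is a simplification: real MapReduce sorts spills on the map side and merges them on the reduce side, over serialized data on disk and across the network. `ShuffleSortModel` is a hypothetical name.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Map;
import java.util.TreeMap;

/** In-memory model of partition + sort + group; not Hadoop code. */
public class ShuffleSortModel {

    /** Same formula as Hadoop's default HashPartitioner. */
    static int partition(String key, int numReducers) {
        return (key.hashCode() & Integer.MAX_VALUE) % numReducers;
    }

    /** Returns one sorted, grouped map per reducer: key -> all values for that key. */
    public static List<TreeMap<String, List<Integer>>> shuffle(
            List<Map.Entry<String, Integer>> mapOutput, int numReducers) {
        List<TreeMap<String, List<Integer>>> partitions = new ArrayList<>();
        for (int i = 0; i < numReducers; i++) {
            partitions.add(new TreeMap<>()); // TreeMap keeps keys in ascending order
        }
        for (Map.Entry<String, Integer> kv : mapOutput) {
            partitions.get(partition(kv.getKey(), numReducers))
                    .computeIfAbsent(kv.getKey(), k -> new ArrayList<>())
                    .add(kv.getValue());
        }
        return partitions;
    }

    public static void main(String[] args) {
        List<Map.Entry<String, Integer>> mapOutput = List.of(
                Map.entry("pig", 1), Map.entry("hive", 1), Map.entry("hdfs", 1), Map.entry("hive", 1));
        List<TreeMap<String, List<Integer>>> reducers = shuffle(mapOutput, 2);
        for (int r = 0; r < reducers.size(); r++) {
            System.out.println("reducer " + r + " <- " + reducers.get(r));
        }
    }
}
```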

Performance tuning of MapReduce jobs, the YARN ResourceManager and profiling

adarsh

Performance issues in MapReduce jobs are a common problem faced by Hadoop developers, and there are a few Hadoop…

Posted in: Data Analytics, Hdfs, Map Reduce, performance tuning, yarn Filed under: hdfs filesystem, map reduce, map reduce performance tuning, yarn

Row-oriented and column-oriented file formats in Hadoop

July, 2017 adarsh

Sequence files, map files, and Avro datafiles are all row-oriented file formats, which means that the values for each row…

Posted in: Data Analytics, Hdfs, Map Reduce Filed under: hadoop input output, hdfs, hdfs filesystem, map reduce
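The difference between the two layouts is about which values sit next to each other in the file. The sketch below flattens the same small table both ways: a row-oriented format stores each row's values together, while a column-oriented format stores each column's values together, so reading a single column touches one contiguous run instead of every row. `LayoutModel` and its methods are hypothetical, and real formats add headers, compression and encodings on top of this ordering.

```java
import java.util.ArrayList;
import java.util.List;

/** Hypothetical illustration of row-oriented versus column-oriented value order. */
public class LayoutModel {

    /** Row-oriented: every value of row 0, then every value of row 1, and so on. */
    public static List<String> rowOriented(String[][] table) {
        List<String> out = new ArrayList<>();
        for (String[] row : table) {
            for (String value : row) {
                out.add(value);
            }
        }
        return out;
    }

    /** Column-oriented: every value of column 0, then column 1, and so on (assumes a rectangular table). */
    public static List<String> columnOriented(String[][] table) {
        List<String> out = new ArrayList<>();
        int columns = table.length == 0 ? 0 : table[0].length;
        for (int c = 0; c < columns; c++) {
            for (String[] row : table) {
                out.add(row[c]);
            }
        }
        return out;
    }

    public static void main(String[] args) {
        String[][] table = {
            {"1", "alice", "30"},
            {"2", "bob", "25"},
        };
        System.out.println("row-oriented:    " + rowOriented(table));
        System.out.println("column-oriented: " + columnOriented(table));
    }
}
```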

Serialization in Hadoop with the Writable interface

adarsh

Serialization is the process of turning structured objects into a byte stream for transmission over a network or for writing…

Posted in: Data Analytics, Hdfs, Map Reduce Filed under: hadoop input output, hdfs, hdfs filesystem, map reduce
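Hadoop's `Writable` interface has two methods, `write(DataOutput)` and `readFields(DataInput)`: a type serializes itself by writing its fields to a `DataOutput` and reads them back in the same order. To keep the sketch runnable without Hadoop on the classpath, it declares a local stand-in interface with those two methods; `SimpleWritable`, `IntPair`, `serialize` and `deserialize` are hypothetical names, and a real type would implement `org.apache.hadoop.io.Writable` (or `WritableComparable` if it is used as a key).

```java
import java.io.ByteArrayInputStream;
import java.io.ByteArrayOutputStream;
import java.io.DataInput;
import java.io.DataInputStream;
import java.io.DataOutput;
import java.io.DataOutputStream;
import java.io.IOException;
import java.io.UncheckedIOException;

public class WritableSketch {

    /** Local stand-in with the same two methods as org.apache.hadoop.io.Writable. */
    public interface SimpleWritable {
        void write(DataOutput out) throws IOException;
        void readFields(DataInput in) throws IOException;
    }

    /** Hypothetical pair-of-ints type. */
    public static class IntPair implements SimpleWritable {
        public int first;
        public int second;

        public IntPair() {} // no-arg constructor: Hadoop creates Writables reflectively

        public IntPair(int first, int second) {
            this.first = first;
            this.second = second;
        }

        @Override
        public void write(DataOutput out) throws IOException {
            out.writeInt(first); // 4 bytes, big-endian
            out.writeInt(second);
        }

        @Override
        public void readFields(DataInput in) throws IOException {
            first = in.readInt(); // read fields in the same order they were written
            second = in.readInt();
        }
    }

    public static byte[] serialize(SimpleWritable w) {
        ByteArrayOutputStream bytes = new ByteArrayOutputStream();
        try (DataOutputStream out = new DataOutputStream(bytes)) {
            w.write(out);
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
        return bytes.toByteArray();
    }

    /** Fills an existing instance, mirroring how Hadoop reuses Writable objects. */
    public static void deserialize(SimpleWritable w, byte[] data) {
        try (DataInputStream in = new DataInputStream(new ByteArrayInputStream(data))) {
            w.readFields(in);
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }
}
```

A pair serializes to 8 bytes (two 4-byte big-endian ints), and `deserialize` overwrites the fields of an object that already exists rather than allocating a new one.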

Copyright © 2017 Time Pass Techies