Big Data

Analytics And More
Category: Hdfs

spark copy files to s3 using hadoop api

May, 2019 adarsh

In this article I will illustrate how to copy raw files from S3 using Spark. Spark out of the box…

Continue Reading →

Posted in: Data Analytics, hadoop input/output, Hdfs, Spark Filed under: hadoop input output, s3, Spark Rdd

spark read many small files from S3 in java

December, 2018 adarsh

In Spark, if we use the textFile method to read the input data, Spark will make many recursive calls…

Continue Reading →

Posted in: aws, Hdfs, Spark Filed under: aws emr, Spark Rdd

oozie workflow example for hdfs file system action with end to end configuration

August, 2017 adarsh 1 Comment

Users can run HDFS commands using Oozie’s FS action. Not all HDFS commands are supported, but the following common operations…
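As a sketch of what such an action looks like: the Oozie FS action supports operations like delete, mkdir, move, chmod and touchz chained inside a single <fs> element. The paths, action name and transition targets below are hypothetical placeholders, not taken from the post:

```xml
<!-- Hypothetical workflow snippet: recreate an output directory before a job runs -->
<action name="fs-cleanup">
    <fs>
        <!-- delete removes the directory and its contents if present -->
        <delete path="${nameNode}/user/adarsh/output"/>
        <mkdir path="${nameNode}/user/adarsh/output"/>
        <chmod path="${nameNode}/user/adarsh/output" permissions="755"/>
    </fs>
    <ok to="next-action"/>
    <error to="fail"/>
</action>
```

Operations inside one <fs> element execute in order, so a delete-then-mkdir pair is a common idempotent cleanup pattern.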

Continue Reading →

Posted in: Data Analytics, Hdfs, Oozie Filed under: hdfs, hdfs filesystem, oozie workflow

input formats and output formats in hadoop and mapreduce

July, 2017 adarsh

There are many input and output formats supported in Hadoop out of the box, and we will explore them…

Continue Reading →

Posted in: Data Analytics, hadoop input/output, Hdfs, Map Reduce Filed under: hadoop input output, hdfs, map reduce

default mapper, reducer, partitioner, multithreadedmapper and split size configuration in hadoop and mapreduce

adarsh

What will be the mapper, reducer and the partitioner used in a mapreduce program if we don't specify any…
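As a hedged aside on one of those defaults: when no partitioner is configured, MapReduce falls back to HashPartitioner, whose entire logic is a sign-masked modulo of the key's hash code. The plain-Java sketch below mirrors that logic without any Hadoop dependency (the class name is mine, chosen for illustration):

```java
// Sketch of the default HashPartitioner behaviour used by MapReduce
// when no partitioner is configured.
public class DefaultPartitionerSketch {

    // Mirrors the logic of HashPartitioner#getPartition:
    // partition = (hash & Integer.MAX_VALUE) % numReduceTasks
    static int getPartition(Object key, int numReduceTasks) {
        // Masking with Integer.MAX_VALUE clears the sign bit, so keys with
        // negative hash codes still map to a valid (non-negative) partition.
        return (key.hashCode() & Integer.MAX_VALUE) % numReduceTasks;
    }

    public static void main(String[] args) {
        System.out.println(getPartition("hello", 4)); // → 2
    }
}
```

Because the partition is a pure function of the key's hash, all records with the same key are guaranteed to reach the same reducer, which is what makes the grouping contract of the shuffle work.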

Continue Reading →

Posted in: hadoop input/output, Hdfs, Map Reduce Filed under: hadoop input output, hdfs, map reduce

hadoop mapreduce reading the entire file content without splitting the file for example reading an xml file

adarsh 2 Comments

Some applications don’t want files to be split, as this allows a single mapper to process each input file in…

Continue Reading →

Posted in: Hdfs, Map Reduce Filed under: hdfs, hdfs filesystem, map reduce

handling failures in hadoop, mapreduce and yarn

July, 2017 adarsh 1 Comment

In the real world, user code is buggy, processes crash, and machines fail. One of the major benefits of using…

Continue Reading →

Posted in: Data Analytics, Hdfs, Map Reduce, yarn Filed under: hdfs, map reduce, yarn

Post navigation

Page 1 of 3
1 2 3 Next →

Recent Posts

  • Optimization for Using AWS Lambda to Send Messages to Amazon MSK
  • Rebalancing a Kafka Cluster in AWS MSK using CLI Commands
  • Using StsAssumeRoleCredentialsProvider with Glue Schema Registry Integration in Kafka Producer
Copyright © 2017 Time Pass Techies