Data Analytics Archives - Page 20 of 26

pig tutorial 5 – debugging pig with diagnostic operators like describe, dump, explain and illustrate

July, 2017 adarsh

DESCRIBE

Use the DESCRIBE operator to review the schema of a particular alias.

Input Service Data

1,NDATEST,/shelf=0/slot/port=1
2,NDATEST,/shelf=0/slot/port=2
3,NDATEST,/shelf=0/slot/port=3
4,NDATEST,/shelf=0/slot/port=4…

pig tutorial 4 – inner join, outer join, replicated join, skewed join

adarsh

Inner JOIN Use the JOIN operator to perform an inner equijoin of two or more relations based on common…

pig tutorial 3 – Flatten, GROUP, COGROUP, CROSS, DISTINCT, FILTER, FOREACH, LIMIT, Load, ORDER, SAMPLE, SPLIT, STORE, STREAM and UNION Operators

adarsh

Flatten Operator The FLATTEN operator, which is an arithmetic operator, looks like a UDF syntactically, but it is actually an…

pig tutorial 2 – pig data types, relations, bags, tuples, fields and parameter substitution

adarsh

Relations, Bags, Tuples, Fields Pig Latin statements work with relations. A relation is a bag and a bag is a…

pig tutorial 1 – multiquery execution, store, dump, dependencies and replicated, skewed, merge joins

adarsh

A Pig Latin statement is an operator that takes a relation as input and produces another relation as output. This…

input formats and output formats in hadoop and mapreduce

July, 2017 adarsh

Hadoop supports many input and output formats out of the box, and we will explore them…

default mapper, reducer, partitioner, multithreadedmapper and split size configuration in hadoop and mapreduce

adarsh

Which mapper, reducer and partitioner will be used in a MapReduce program if we don't specify any…
