Pig tutorial 9 – Pig example to implement a custom eval function for FOREACH

Eval is the most common type of function. It can be used in FOREACH statements. Let's take an example.

Input Data

ABCDDI,ABCDDI./shelf=0/port=21,OGEA02699269,08/03/2012
ABCDJC,ABCDJC./shelf=0/port=31,OGEA05357149,06/18/2016
ABCDEG,ABCDEG./shelf=0/port=15,OGEA16722054,07/24/2015
ABCDHG,ABCDHG./shelf=0/port=1,OGEA17217386,08/14/2015
ABCDJS,ABCDJS./shelf=0/port=24,OGEA04123702,09/29/2016
ABCDHF,ABCDHF./shelf=0/port=47,OGEA02497453,06/19/2012

In the data above, the sub_element_id (the second field) is prefixed with the element_id followed by a dot. We need to strip that prefix, so that the sub_element_id changes from ABCDDI./shelf=0/port=21 to /shelf=0/port=21. The script below uses the eval function CustomEval to make this modification.


set default_parallel 1;
raw_data = load 'service_mapping.txt' USING PigStorage(',') as (element_id:chararray,sub_element_id:chararray,service_id:chararray,update_date:chararray);
raw_data_valid_final = FOREACH raw_data GENERATE element_id,com.blog.pig.CustomEval(sub_element_id),service_id,update_date;
dump raw_data_valid_final;
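
To make the per-row effect of the FOREACH concrete, here is a minimal plain-Java sketch that applies the same rewrite to one input line. It does not use any Pig classes; the class name RowRewriteSketch and its method names are hypothetical and exist only for this illustration.

```java
// Plain-Java sketch of what the FOREACH does to each row; no Pig classes involved.
// RowRewriteSketch, stripElementPrefix and rewriteRow are hypothetical names.
final class RowRewriteSketch {

    // Same rule as the UDF: keep the part after the single dot, otherwise return "".
    static String stripElementPrefix(String subElementId) {
        String[] parts = subElementId.split("\\.");
        return parts.length == 2 ? parts[1] : "";
    }

    // Splits one comma-separated line, rewrites the second field, and rejoins the fields.
    static String rewriteRow(String line) {
        String[] fields = line.split(",");
        fields[1] = stripElementPrefix(fields[1]);
        return String.join(",", fields);
    }

    public static void main(String[] args) {
        System.out.println(rewriteRow("ABCDDI,ABCDDI./shelf=0/port=21,OGEA02699269,08/03/2012"));
        // prints ABCDDI,/shelf=0/port=21,OGEA02699269,08/03/2012
    }
}
```

Only the second field changes; element_id, service_id and update_date pass through untouched, just as they do in the GENERATE clause.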

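The name of the UDF has to be fully qualified with the package name (here com.blog.pig.CustomEval), or a java.io.IOException will be thrown. The jar containing the class must also be made available to Pig, typically with a REGISTER statement before the UDF is used.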

Custom Eval Code

The UDF class extends EvalFunc, the base class for all eval functions. It is parameterized with the return type of the UDF, which is a Java String in this case. The exec function is invoked on every input tuple. Its input is a tuple containing the parameters in the order they are passed to the function in the Pig script. In our example, it contains a single string field holding the sub_element_id. The function splits that value on the dot and returns the part after it, which is the conversion explained above.


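package com.blog.pig;

import java.io.IOException;
import org.apache.pig.EvalFunc;
import org.apache.pig.data.Tuple;

public class CustomEval extends EvalFunc<String> {

    // Called once per input tuple; returns the sub_element_id without the element_id prefix.
    @Override
    public String exec(Tuple input) throws IOException {
        if (input == null || input.size() == 0)
            return null;
        try {
            String str = (String) input.get(0);
            // A null field would otherwise fail on split(); pass it through as null.
            if (str == null)
                return null;
            // Split on the literal dot: "ABCDDI./shelf=0/port=21" -> ["ABCDDI", "/shelf=0/port=21"].
            String[] split = str.split("\\.");
            if (split.length == 2) {
                return split[1];
            } else {
                // No dot, or more than one dot: return an empty string.
                return "";
            }
        } catch (Exception e) {
            throw new IOException(e);
        }
    }

}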
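
One behaviour of the split-based check in CustomEval is worth noting: a sub_element_id that contains no dot, or more than one dot, comes back as an empty string. If the part after the dot could itself contain a dot, cutting at the first dot is one possible alternative. The sketch below is plain Java without Pig classes and assumes that everything after the first dot should be kept, which the example data does not confirm; FirstDotSketch and afterFirstDot are hypothetical names.

```java
// Hypothetical variant of the prefix-stripping rule that cuts at the first dot only.
// Assumption: the text after the first dot may legitimately contain further dots.
final class FirstDotSketch {

    // Returns the text after the first '.', "" when there is no dot, null for null input.
    static String afterFirstDot(String value) {
        if (value == null) {
            return null;
        }
        int dot = value.indexOf('.');
        return dot < 0 ? "" : value.substring(dot + 1);
    }

    public static void main(String[] args) {
        System.out.println(afterFirstDot("ABCDDI./shelf=0/port=21")); // prints /shelf=0/port=21
        System.out.println(afterFirstDot("ABCDDI./shelf=0/port=2.5")); // prints /shelf=0/port=2.5
    }
}
```

For the sample rows both approaches give the same result; they differ only for values with additional dots, where the split-based check returns an empty string.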