Pig Tutorial 11 – Pig Example to Implement Custom Filter Functions

Filter functions are eval functions that return a boolean value. Filter functions can be used anywhere a Boolean expression is appropriate, including the FILTER operator or bincond expression.

Input Data

ABCDDI,ABCDDI./shelf=0/port=21,OGEA02699269,08/03/2012
ABCDJC,ABCDJC./shelf=0/port=31,OGEA05357149,06/18/2016
ABCDEG,ABCDEG./shelf=0/svlan=15,OGEA16722054,07/24/2015

Here we want to keep only the records whose sub_element_id refers to a port and drop the rest (in the sample above, the svlan record).


raw_data = load 'src/main/resources/service_mapping.txt' USING PigStorage(',') as (element_id:chararray,sub_element_id:chararray,service_id:chararray,update_date:chararray);

-- The jar containing CustomFilter must be on the classpath or loaded with REGISTER before this line.
raw_data_filter = FILTER raw_data BY com.bt.haas.pig.CustomFilter(sub_element_id);

dump raw_data_filter;
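
With the three sample rows, the filter is expected to keep the two port records and drop the svlan record. The same rule can be sketched in plain Java without any Pig dependency, which is handy for checking the logic before packaging the UDF. The class and method names below (PortFilterSketch, isPort, keepPorts) are hypothetical and only for illustration.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

// Plain-Java sketch (no Pig dependency); class and method names are hypothetical.
class PortFilterSketch {

    // Same rule as the custom filter: keep the row when the
    // sub_element_id field contains the text "port".
    static boolean isPort(String subElementId) {
        return subElementId != null && subElementId.contains("port");
    }

    // Splits each line on commas and keeps rows whose second field passes isPort.
    static List<String> keepPorts(List<String> lines) {
        List<String> kept = new ArrayList<>();
        for (String line : lines) {
            String[] fields = line.split(",");
            if (fields.length > 1 && isPort(fields[1])) {
                kept.add(line);
            }
        }
        return kept;
    }

    public static void main(String[] args) {
        List<String> input = Arrays.asList(
            "ABCDDI,ABCDDI./shelf=0/port=21,OGEA02699269,08/03/2012",
            "ABCDJC,ABCDJC./shelf=0/port=31,OGEA05357149,06/18/2016",
            "ABCDEG,ABCDEG./shelf=0/svlan=15,OGEA16722054,07/24/2015");
        // Prints the two port rows; the svlan row is dropped.
        for (String row : keepPorts(input)) {
            System.out.println(row);
        }
    }
}
```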

Custom Filter Code

package com.bt.haas.pig;

import java.io.IOException;
import org.apache.pig.FilterFunc;
import org.apache.pig.data.Tuple;

// Keeps a record only when its first argument contains the text "port".
public class CustomFilter extends FilterFunc {

    @Override
    public Boolean exec(Tuple input) throws IOException {
        // Reject empty tuples and null fields instead of throwing an exception.
        if (input == null || input.size() == 0 || input.get(0) == null) {
            return false;
        }
        String str = (String) input.get(0);
        return str.contains("port");
    }
}
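
Note that a substring check on "port" accepts any value that contains those letters anywhere, for example a hypothetical element named "support1". If that matters for your data, a stricter rule is to look for a path segment that starts with "port=". The following is a minimal sketch in plain Java, assuming the "/"-separated layout seen in the sample input; StrictPortCheck and hasPortSegment are hypothetical names, and the same logic could be moved into exec.

```java
// Plain-Java sketch (no Pig dependency); class and method names are hypothetical.
class StrictPortCheck {

    // True only when one "/"-separated segment starts with "port=".
    static boolean hasPortSegment(String subElementId) {
        if (subElementId == null) {
            return false;
        }
        for (String segment : subElementId.split("/")) {
            if (segment.startsWith("port=")) {
                return true;
            }
        }
        return false;
    }

    public static void main(String[] args) {
        System.out.println(hasPortSegment("ABCDDI./shelf=0/port=21"));    // true
        System.out.println(hasPortSegment("ABCDEG./shelf=0/svlan=15"));   // false
        System.out.println(hasPortSegment("support1./shelf=0/svlan=15")); // false
    }
}
```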