hadoop - Flattening tuples in Pig
I have data in the following format:
(id, description)
1, xyz something. abc bcd & on.
1, xyz something. abc xyz & on.
2, abc something. abc xyz & on.
I need the output in this format:
id, word
I tried this:
a = LOAD './data.txt' USING PigStorage(',') AS (id:int, desc:chararray);
b = FOREACH a GENERATE id, FLATTEN(STRSPLIT(desc, '[,?:;\\s]'));
This results in output like this:
1, xyz, is, something, abc, bcd, so, on
What I want is:
1, xyz
1, is
1, something
and so on.
How can I do this in Pig (without writing a UDF)?
PS: I also tried:
b = FOREACH a GENERATE id, FLATTEN(datafu.pig.util.TransposeTupleToBag(STRSPLIT(desc, '[.&,?:;\\s]')));
You can use TOKENIZE in Pig. Please find the answer below.
Here is the input file:
cat file1
1,xyz is something
2,abc is something
a = LOAD 'file1' USING PigStorage(',');
b = FOREACH a GENERATE $0, FLATTEN(TOKENIZE($1));
DUMP b;
(1,xyz)
(1,is)
(1,something)
(2,abc)
(2,is)
(2,something)
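Why this works: STRSPLIT returns a tuple, and FLATTEN on a tuple spreads its fields across columns of the same row. TOKENIZE returns a bag, and FLATTEN on a bag emits one row per element, which gives the (id, word) shape asked for. As far as I know, TOKENIZE's default delimiters cover whitespace and a few characters such as commas and parentheses, but not periods or ampersands, so punctuation like that in the question's sample would stay attached to the words. A minimal sketch of one way around that, assuming the question's file layout and using the built-in REPLACE to turn the punctuation into spaces first (the relation names cleaned and words are only illustrative):

```pig
-- Assumes the layout from the question: one "id,description" record per line.
a = LOAD './data.txt' USING PigStorage(',') AS (id:int, desc:chararray);

-- Turn the punctuation set from the question into spaces so that
-- TOKENIZE's default splitting sees clean word boundaries.
cleaned = FOREACH a GENERATE id, REPLACE(desc, '[.&,?:;]', ' ') AS desc;

-- TOKENIZE returns a bag; FLATTEN on a bag yields one (id, word) row per token.
words = FOREACH cleaned GENERATE id, FLATTEN(TOKENIZE(desc)) AS word;

DUMP words;
```

Note that with PigStorage(',') any comma inside the description itself still ends the field, so this only fits data where the description contains no commas.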