Apache Pig: Pig writing too many records in a single file
I am reading a huge file of 4.4 million records. I want to split it into 2 relations, one of 4 million records and one of 0.4 million records, based on a condition, and then store them.
The logic is:
- Read the huge file of 4.4 million records.
- Split it into 2 relations, a and b (1:10 ratio).
- Union the relations: c = union a, b
- Store c.
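The four steps above can be sketched in Pig Latin roughly as follows. This is a minimal sketch, not the poster's actual script: the input and output paths, the comma delimiter, the schema (id, payload), and the split condition are all hypothetical placeholders, since the post does not give them. The condition simply routes about 1 in 11 records to a and the rest to b, which matches the 0.4 million to 4 million (1:10) ratio.

```pig
-- Hypothetical input path, delimiter, and schema; the post does not specify them.
raw = LOAD '/data/huge_input' USING PigStorage(',') AS (id:long, payload:chararray);

-- Hypothetical split condition: about 1 in 11 records go to a, the rest to b,
-- giving the 1:10 ratio (SPLIT ... OTHERWISE requires Pig 0.10 or later).
SPLIT raw INTO a IF (id % 11 == 0), b OTHERWISE;

-- Recombine the two relations.
c = UNION a, b;

-- Store the combined relation; in a map-only job each map task writes its own part-m-* file.
STORE c INTO '/data/output_c' USING PigStorage(',');
```

The faster variant described in the post would drop the UNION and replace the single STORE with two statements, one `STORE a INTO ...` and one `STORE b INTO ...`, each with its own output path.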
The issue is that c contains 2 part-m-* files: one with 0.4 million records and the other with 4 million records.
Now, the problem is that it takes a lot of time to write these files. If I don't union them and instead store a and b separately in 2 files, it is faster. I want the mappers' output to be written quickly.
For that, I need more mappers, and I want the mapper output for a specific relation to be written to multiple files.
How can I ensure this?
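One direction that may be worth testing, offered only as a hedged sketch and not as a confirmed fix: since a map-only job writes one part-m-* file per map task, getting more output files means getting more map tasks, which depends on how the input is split. The sketch assumes the input is a splittable file read through PigStorage (a FileInputFormat-based loader); the 64 MB value is an arbitrary example, and on newer Hadoop versions the split-size property is named `mapreduce.input.fileinputformat.split.maxsize`.

```pig
-- Assumption: splittable input read with PigStorage (FileInputFormat-based).
-- Stop Pig from combining several input splits into a single map task.
SET pig.splitCombination false;

-- Hadoop property capping the input split size (example value: 64 MB).
-- Smaller splits mean more map tasks, and each map task writes its own part-m-* file.
SET mapred.max.split.size 67108864;
```

These SET statements would go at the top of the script, before the LOAD. Whether they change the number of part files depends on the file size relative to the HDFS block size and on how Pig plans the SPLIT and UNION.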