apache pig - Pig writing too many records to a single file


I am reading a huge file of 4.4 million records. I want to split it into 2 relations of 4 million records and 0.4 million records based on a condition, and then store them.

The logic (sketched in Pig Latin after the list):

  1. Read the huge file of 4.4 million records.
  2. Split it into 2 relations, a and b (1:10 ratio).
  3. Union the relations: c = UNION a, b.
  4. Store c.
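
A minimal Pig Latin sketch of this flow; the input path, the schema, and the split condition on flag are placeholders, not details from the post:

  -- read the huge file of 4.4 million records (path and schema assumed)
  raw = LOAD '/data/input' USING PigStorage('\t')
        AS (id:int, flag:chararray, val:chararray);

  -- split into the two relations on an assumed condition
  SPLIT raw INTO a IF flag == 'big', b IF flag != 'big';

  -- map-only union, then store; this is what produces the part-m-* files
  c = UNION a, b;
  STORE c INTO '/data/output' USING PigStorage('\t');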

The issue is that c contains only 2 part-m-* files: one with 0.4 million records and the other with 4 million records.

Now, the thing is, it takes a lot of time to write these files. If I don't union them and instead store a and b separately into 2 files, it is much faster. I want the mappers to write their output fast.

For that, I need more mappers, and I want each relation's output to be written across multiple files, as in the sketch below.
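
One possible way to get more mappers (and therefore more part-m-* output files) is to stop Pig from combining small input splits, or to cap the combined split size. pig.splitCombination and pig.maxCombinedSplitSize are standard Pig properties, but the value below is only illustrative:

  -- either turn split combining off entirely...
  SET pig.splitCombination 'false';

  -- ...or keep it on but cap each combined split (here: 128 MB),
  -- so that more map tasks are launched against the input
  SET pig.maxCombinedSplitSize 134217728;

Since the UNION here stays map-only, each map task writes its own part-m-* file, so more input splits should spread both relations across more output files.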

How can I ensure this?

