Working with Multiple Large Files in Python
I have around 60 files, each containing around 900,000 lines, where each line is 17 tab-separated float numbers. For each line I need to do a calculation using the corresponding lines from all 60 files. Because of the huge sizes (each file is about 400 MB) and limited computation resources, this takes a long time. Is there a faster solution?
It depends on how you process them. If you have enough memory, you can read all the files first and convert them into Python data structures, then do your calculations.
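As a minimal sketch of the in-memory approach, assuming every file has the same number of rows and that the per-line calculation is a sum across files (the glob pattern and the sum are illustrative assumptions, not your actual formula):

```python
import glob

import numpy as np

# Load each file as a (900000, 17) float array. 60 such arrays need
# roughly 60 * 900000 * 17 * 8 bytes (~7 GB), so check your RAM first.
arrays = [np.loadtxt(path, delimiter="\t")
          for path in sorted(glob.glob("data/*.txt"))]

# Stack into one (60, 900000, 17) array so "corresponding lines" of the
# 60 files line up along axis 0.
stacked = np.stack(arrays)

# Example per-line calculation: sum the corresponding lines of all files.
result = stacked.sum(axis=0)  # shape (900000, 17)
```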
If the files don't fit in memory, the easiest way is to use a distributed computing mechanism (Hadoop, or other lighter alternatives; one such sketch follows).
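One possible "lighter alternative" on a single machine is to split the rows into chunks and process them across CPU cores with the standard multiprocessing module. A sketch, assuming the file list, the chunk size, and the per-chunk sum are placeholders for your real setup:

```python
import glob
from multiprocessing import Pool

import numpy as np

FILES = sorted(glob.glob("data/*.txt"))  # assumed location of the 60 files
CHUNK = 100_000                          # rows per work unit

def process_chunk(start):
    """Load the same row range from every file and combine it."""
    rows = [np.loadtxt(f, delimiter="\t", skiprows=start, max_rows=CHUNK)
            for f in FILES]
    return np.stack(rows).sum(axis=0)    # placeholder per-line calculation

if __name__ == "__main__":
    starts = range(0, 900_000, CHUNK)
    with Pool() as pool:
        partial_results = pool.map(process_chunk, starts)
    result = np.vstack(partial_results)  # (900000, 17)
```

Note that `skiprows` still scans the file from the beginning, so for many chunks per file an index of byte offsets would be more efficient; the sketch trades that for simplicity.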
Another, smaller improvement is to use the Linux posix_fadvise function call to declare how you will use the file (sequential reading or random access); it tells the operating system how to optimize file access.
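In Python this call is exposed as os.posix_fadvise (Python 3.3+, Unix only). A minimal sketch for sequential reading, with a hypothetical filename:

```python
import os

with open("data/file01.txt", "rb") as f:
    # Tell the kernel we will read this file front to back, so it can
    # read ahead aggressively and drop pages behind us.
    os.posix_fadvise(f.fileno(), 0, 0, os.POSIX_FADV_SEQUENTIAL)
    for line in f:
        values = [float(x) for x in line.split(b"\t")]
        ...  # per-line calculation goes here
```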
If your calculations fit common libraries such as NumPy or numexpr, which have a lot of optimizations, you can use them (this can help if your computations currently run through non-optimized Python code).
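For instance, a sketch of how numexpr can speed up an element-wise formula over large arrays compared with plain NumPy; the formula and the random data are stand-ins for your real calculation:

```python
import numexpr as ne
import numpy as np

a = np.random.rand(900_000, 17)  # stand-ins for data loaded from two files
b = np.random.rand(900_000, 17)

# Plain NumPy allocates a temporary array for each intermediate result.
slow = 2 * a ** 2 + 3 * b + 1

# numexpr compiles the whole expression and evaluates it in one
# multi-threaded pass, avoiding the temporaries.
fast = ne.evaluate("2 * a**2 + 3 * b + 1")

assert np.allclose(slow, fast)
```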