Working with multiple Large Files in Python


I have around 60 files, each containing around 900,000 lines, where each line is 17 tab-separated float numbers. For each line I need to run a calculation that uses the corresponding lines from all 60 files. Because of the huge sizes (each file is about 400 MB) and my limited computation resources, this takes a long time. Does anyone know a faster solution?

It depends on how you process them. If you have enough memory, you can read all the files first, convert them into Python data structures, and then run the calculations.
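
For illustration, here is a minimal sketch of that in-memory approach, assuming hypothetical file names data_00.txt .. data_59.txt and a placeholder mean calculation standing in for the real one:

    import numpy as np

    paths = [f"data_{i:02d}.txt" for i in range(60)]

    # Load every file into one (60, 900000, 17) float array.
    # 60 * 900000 * 17 * 8 bytes is roughly 7.3 GB, so this only
    # works with plenty of RAM; float32 would halve that.
    stack = np.stack([np.loadtxt(p, delimiter="\t") for p in paths])

    # Per-line calculation across the 60 corresponding lines; here an
    # element-wise mean over the file axis (replace with the real one).
    result = stack.mean(axis=0)  # shape (900000, 17)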

If the files don't fit in memory, the easiest way is to use a distributed computing mechanism (Hadoop or one of the lighter alternatives).
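
Before reaching for Hadoop, a lighter alternative worth trying (my suggestion, not part of the original answer) is to stream all 60 files in lockstep, so only one line per file is held in memory at any time:

    from contextlib import ExitStack

    paths = [f"data_{i:02d}.txt" for i in range(60)]  # hypothetical names

    with ExitStack() as stack, open("result.txt", "w") as out:
        files = [stack.enter_context(open(p)) for p in paths]
        # zip() yields the i-th line of every file together.
        for lines in zip(*files):
            rows = [[float(x) for x in line.split("\t")] for line in lines]
            # Placeholder calculation: column-wise mean over the 60 rows.
            means = [sum(col) / len(col) for col in zip(*rows)]
            out.write("\t".join(f"{m:.6g}" for m in means) + "\n")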

Another, smaller improvement is to use the Linux fadvise function call to declare how you will use a file (sequential reading or random access); it tells the operating system how to optimize file access.
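
Python exposes this call directly as os.posix_fadvise (available since Python 3.3 on Linux/POSIX systems); a sketch with a hypothetical file name:

    import os

    with open("data_00.txt", "rb") as f:
        # Tell the kernel the whole file (offset 0, length 0 = to EOF)
        # will be read sequentially, so it can read ahead aggressively.
        os.posix_fadvise(f.fileno(), 0, 0, os.POSIX_FADV_SEQUENTIAL)
        for line in f:
            pass  # process the line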

If the calculations fit into common libraries such as NumPy or numexpr, which contain a lot of optimizations, you can use them (this helps if your computation currently processes the data with non-optimized algorithms).
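
As a sketch of the numexpr route, assuming the per-line math can be written as a single array expression (the formula below is a made-up stand-in):

    import numpy as np
    import numexpr as ne

    a = np.random.rand(900_000, 17)  # stand-ins for two loaded files
    b = np.random.rand(900_000, 17)

    # numexpr compiles the expression and evaluates it in parallel,
    # avoiding the temporary arrays plain NumPy would allocate.
    result = ne.evaluate("a * b + 2 * a")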

