Slowdown in large Perl array
I'm running a Perl program that takes a 1-million-line text file and breaks it down into chunks (anywhere between 50 and 50,000 lines per chunk) to run calculations and such on them. Right now, I load all of the data into array1, then use array2 to pull out just the chunk of data I need. I perform the work on array2, then go back and grab the next set.

Example data:
a, blah1, blah2
a, blah6, blah7
a, blah4, blah5
b, blah2, blah2
So I would grab the first 3 lines into array2 (they share the same first field), sort them, and move on to the next set. The program works pretty well and efficiently to begin with, but experiences a severe slowdown later on.

50k lines takes 50 seconds, 100k takes 184 seconds, 150k takes 360 seconds, 200k takes 581 seconds, and it gets exponentially worse as the program continues (4,500 seconds by line 500k).

And no, I cannot use a database for this project. Any suggestions?
    my @rows1 = <file>;              # slurp the entire file into memory
    my $temp  = @rows1;              # number of lines
    for (my $k = 0; $k < $temp; $k++) {
        my @temp2array = ();
        my $temp2count = 0;
        my $thisrow    = $rows1[$k];
        my @thisarray  = split(',', $thisrow);
        my $currcode   = $thisarray[0];
        my $flag123    = 0;
        $temp2array[$temp2count] = $thisrow;
        $temp2count++;
        # gather every following row that shares the same first field
        while ($flag123 == 0) {
            last if $k + 1 >= $temp;          # guard added: stop at end of file
            my $nextrow   = $rows1[$k + 1];   # was $turows1 in the original, a typo
            my @nextarray = split(',', $nextrow);
            if ($currcode eq $nextarray[0]) {
                $temp2array[$temp2count] = $nextrow;
                $k++;
                $temp2count++;
            } else {
                $flag123 = 1;
            }
        }
        # ... calculations on @temp2array happen here ...
    }
I have since edited my code to more closely resemble the answer below, and I've got these times:

50k = 42, 100k = 133, 150k = 280, 200k = 467, 250k = 699, 300k = 978, 350k = 1313 (all in seconds)

It's still not staying linear, and at this trend the program will still take 14,000+ seconds to finish. I'll investigate the other parts of the code.
Loading an entire large file into memory will slow you down, as the OS will need to start swapping pages of virtual memory. In such cases, it is best to deal with only the section of the file that you need.

In your case, you seem to be processing together all lines that have the same value in the first field, so you could do something like this:
    my @lines       = ();
    my $current_key = '';

    while (<file>) {
        my ($key) = split /,/;       # first column
        if ($key ne $current_key) {
            # new key: process the lines from the previous key
            if (@lines > 0) {
                process(@lines);
            }
            @lines       = ();
            $current_key = $key;
        }
        push @lines, $_;
    }
    # don't forget the lines for the last key
    if (@lines > 0) {
        process(@lines);
    }
This way, you only ever store in memory enough lines to make up one group.
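For concreteness, here is a minimal sketch of what process might look like, assuming the per-chunk work is just the sort mentioned in the question (the actual calculations, and the choice of the second column as the sort key, are assumptions, not taken from the original post):

    sub process {
        my @chunk = @_;

        # Placeholder: sort the chunk by its second column; the real
        # calculations from the question would go here instead.
        my @sorted = sort { (split /,/, $a)[1] cmp (split /,/, $b)[1] } @chunk;

        print for @sorted;
    }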
(I am assuming that the input data is sorted or otherwise organized by key. If that's not the case, you could make multiple passes through the file: a first pass to see which keys you need to process, and subsequent passes to collect the lines associated with each key.)
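A rough sketch of that multi-pass idea, assuming the input lives in a file named data.txt and reusing the process sub from above (both names are placeholders, not from the original post):

    my %seen;
    open my $fh, '<', 'data.txt' or die "can't open data.txt: $!";
    while (<$fh>) {
        my ($key) = split /,/;
        $seen{$key} = 1;                 # first pass: record every distinct key
    }
    close $fh;

    for my $key (sort keys %seen) {
        open my $in, '<', 'data.txt' or die "can't open data.txt: $!";
        my @lines;
        while (<$in>) {
            my ($k) = split /,/;
            push @lines, $_ if $k eq $key;   # keep only this key's lines
        }
        close $in;
        process(@lines);                     # same process() as above
    }

Note that this re-reads the file once per distinct key, trading extra I/O for bounded memory; if there are many distinct keys, pre-sorting the file (e.g. with the system sort utility) and then using the single-pass loop above is likely faster.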