shell pipe "loses" parts of the data, was: help needed to debug Perl script
M. Fioretti
mfioretti at nexaima.net
Wed Oct 17 16:36:18 UTC 2018
Hello all, again
(sorry, this is a bit long, but it can't be helped)
In another thread today, I wrote how a Perl script I had been using
for weeks without problems had suddenly started to drive me mad, losing
~80% of the data it was supposed to print.
Basically the script was telling me "I have built a hash with ~26K keys,
and now I am going to print them all, one per line", but instead of ~26k
lines of data, it would only print ~4700. For details, please see my
other thread.
After some painful manual browsing of the input and output data, I have
realized that those missing lines were lost AFTER the script, in a shell
pipe. That Perl script (myscript.pl) is wrapped in a bash script that
filters its output this way:
myscript.pl 2> error.log | tee datadump | grep ^csvfinal | sort | cut -c10- > result.csv
What I realized only ten minutes ago is that, after weeks of working
without problems, the **PIPE** stopped working. Namely:
a) datadump was simply truncated: the script would print 30k+ lines,
   and datadump would contain only the first ~6k or so
b) on top of that, grep did not extract all the lines it was supposed to
Here is what I mean:
#> grep -c ^csvfinal datadump
5700 (this is because datadump itself was truncated, see above)
#> grep ^csvfinal datadump | wc -l
4700 (grep extracts FEWER lines than it can count)
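
The mismatch between "grep -c" and "grep | wc -l" can be reproduced in
isolation. What follows is only a minimal sketch, assuming GNU grep:
sample.txt is a made-up stand-in for datadump, and a NUL byte is used as
the "binary" data because it triggers the behaviour in any locale. When
grep decides that its input is binary, -c still counts every matching
line, but the normal output is suppressed or cut short unless -a is
given.

```shell
#!/bin/sh
# Hypothetical reproduction: sample.txt stands in for datadump.
tmpdir=$(mktemp -d)
sample="$tmpdir/sample.txt"

# Three matching lines; the second one contains a NUL byte (\000),
# which is enough for grep to classify the input as binary.
printf 'csvfinal,one\ncsvfinal,t\000wo\ncsvfinal,three\n' > "$sample"

# -c counts matching lines even when the file is treated as binary.
counted=$(grep -c '^csvfinal' "$sample")

# Without -a, the matching lines themselves are not all printed.
printed=$(grep '^csvfinal' "$sample" 2>/dev/null | wc -l | tr -d ' ')

# With -a, the file is processed as text and every match is printed.
printed_a=$(grep -a '^csvfinal' "$sample" | wc -l | tr -d ' ')

echo "grep -c:            $counted"
echo "grep | wc -l:       $printed"
echo "grep -a | wc -l:    $printed_a"

rm -rf "$tmpdir"
```

The first and third numbers should both be 3, while the middle one is
typically smaller; how much smaller depends on the grep version.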
If, instead, I run these commands manually, one at a time (note the -a
option!!!):
myscript.pl > manualdatadump
grep -a ^csvfinal manualdatadump | sort | cut -c10- > result.csv
then result.csv contains all the lines it was supposed to contain.
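
A cheap sanity check after the pipeline could catch this kind of silent
loss. The following is a sketch only: check_counts is a hypothetical
helper, not part of the actual wrapper, and it assumes that every line
starting with csvfinal in the dump should end up as exactly one line in
the result file.

```shell
#!/bin/sh
# check_counts DUMP RESULT
# Hypothetical helper: compares the number of csvfinal lines in DUMP
# (counted with -a, so binary data does not matter) with the number of
# lines in RESULT. Returns non-zero and complains on stderr if they differ.
check_counts() {
    dump=$1
    result=$2
    expected=$(grep -a -c '^csvfinal' "$dump")
    actual=$(wc -l < "$result" | tr -d ' ')
    if [ "$expected" -ne "$actual" ]; then
        echo "line count mismatch: $expected in $dump, $actual in $result" >&2
        return 1
    fi
    return 0
}

# Possible usage at the end of the wrapper script:
#   check_counts datadump result.csv || exit 1
```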
COMMENTS/QUESTIONS: in hindsight, I wasted time by not looking at the
pipe first simply because the script is much more complex, so I assumed
the fault could only be there. My fault, sorry :-(
BUT: what contributed to my confusion is the fact that everything, pipe
included, had been working for weeks without a hitch. Right now, my
explanation of what happened is that, by pure chance, yesterday:
a) the volume of input data processed and output by the script passed,
   for the first time, some threshold that makes the buffers used by
   shell pipes overflow
b) AND the data also contained, for the first time, non-ASCII characters
   that make grep fail unless the -a option is used
I am not at all sure about what I have just written, so every comment
or tip on how to make sure this does not happen again in some future
script is very welcome.
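
Hypothesis a) can be tested on its own. A pipe does not discard data
when its buffer fills up; the writer just blocks until the reader
catches up, so volume alone should not truncate anything. The sketch
below pushes a large number of plain ASCII lines through a pipeline of
the same shape and compares the counts. It assumes seq and sed are
available; the 200000 figure and the file names are arbitrary choices
for the test, not taken from the real data.

```shell
#!/bin/sh
# Volume test for a tee | grep | sort | cut pipeline, using generated
# ASCII-only lines so that binary detection in grep plays no role.
tmpdir=$(mktemp -d)
lines=200000

# Generate "csvfinal,1" ... "csvfinal,200000" and run them through the
# same chain of filters as the wrapper script.
seq 1 "$lines" | sed 's/^/csvfinal,/' \
    | tee "$tmpdir/dump" \
    | grep '^csvfinal' | sort | cut -c10- > "$tmpdir/result.csv"

dump_lines=$(wc -l < "$tmpdir/dump" | tr -d ' ')
result_lines=$(wc -l < "$tmpdir/result.csv" | tr -d ' ')

echo "generated: $lines"
echo "in dump:   $dump_lines"
echo "in result: $result_lines"

rm -rf "$tmpdir"
```

If all three numbers agree, the data volume by itself is unlikely to be
the cause, and hypothesis b) becomes the more plausible explanation.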
Marco
-------- Original Message --------
Subject: SOLVED (not completely...): OT: help needed to debug Perl script
Date: 2018-10-17 18:14
From: "M. Fioretti" <mfioretti at nexaima.net>
To: ubuntu-users at lists.ubuntu.com
Reply-To: mfioretti at nexaima.net
On 2018-10-17 12:06, Colin Law wrote:
> On Wed, 17 Oct 2018 at 06:07, M. Fioretti <mfioretti at nexaima.net>
> wrote:
>> ...
>> 157 print "\nADDINGURX: $url;\n";
>> 158 print "\nADDINGURQ: $qq;\n";
>> ...
>> ~4700 lines starting with ADDINGURX
>> ZERO lines starting with ADDINGURQ
>
> Do you mean that line 157 is printing ok but the output from line 158
> never appears?
> Are you sure there is not another line there somewhere printing
> ADDINGURX?
Also answering Joel (indirectly):
the snippet of script that I posted is part of the actual output of
#> cat -n myscript
So this code, from my original message:
147 my $keycounter = 1;
148
149 foreach my $qtq (sort keys %all) {
150
151 printf "\nALLCHECK: %6.6s >> %s;\n", $keycounter, $qtq;
152 $keycounter++;
153 }
154
155 foreach my $qq (sort keys %all) {
156 $url = $qq;
157 print "\nADDINGURX: $url;\n";
158 print "\nADDINGURQ: $qq;\n";
is lines 147 to 158 of the complete script, and consequently yes, I was
sure that there was no other Perl code at all playing tricks here.
What I have been trying to say, maybe badly, is:
a) the above is part of the actual code
b) I run the script dumping the output to a file, for further
   processing:
#> myscript > datadump
c) and I get different numbers of lines from the three statements
   (again, what follows is ACTUAL output of grep at the shell prompt):
#> grep -c ^ALLCHECK datadump (=line 151 prints 26080 keys from the hash)
26080
#> grep -c ^ADDINGURX datadump (=line 157 prints only 4732 keys from the hash)
4732
#> grep -c ^ADDINGURQ datadump (=line 158 prints only 4732 keys from the hash)
4732
Now the "solution":
After looking at the whole flow from scratch, I found out that the
problem seems to be 100% *outside* that specific Perl script, and
somehow even more confusing (for me at least). But that deserves a
different thread, coming in a few minutes.
Thanks!!!
Marco
--
http://mfioretti.com