Monday, September 15, 2008

Programmatically surrealistic

Earlier in the day, i had an argument with the update-manager of my test machine who was refusing to update my Dapper Drake because of some "hash mismatch" at the mirror. For some reason, i didn't trust the ubunutu forums that state that "this occur when you try to update while mirrors are synchronizing" and finally discovered that the way hashes (that confirm your softwares' authenticity) are managed has changed with Hardy Heron ... Doh. Fortunately enough, someone is still thinking that some user prefer to update from a text console rather than having a "user-friendly" graphical interface but only at the cost of burning a new ISO distribution and moving your feets two level down in the lab.

So my thanks fly to the maintainers of the update-manager-core package and the sudo do-release-upgrade -m desktop trick that did the whole upgrade without pop'ing any window and without silly hashing bugs.

You thought that would have been the weirdest thing around? You were wrong.

With the machine updated, i'm now able to test the new release of that European project i'm working on ... The software has a thing called "the minmex" which spawns thread for various sub-tasks and i thought it would be handy to catch those "logfile " lines it produces on stdout, and process them in a script that would automatically start following each new log. Easy as (s)hell :


# invoke this as:
# sudo bin/minmex | perl myscript.pl
while ($_=) {
print;  # just echo back the filename
chomp;
sleep 1; # just let the minmex actually create the file
system "tail -f $1 &" if (/logfile (.*)/);
print "waiting for more ...\n";
}


Except it didn't worked. Only the "master thread" displayed something. Of course, running a plain "sudo bin/minmex" show you all you wish to know on the console :-/
And of course, as usual with the latest libc6, you don't even see any write(1, ...) in the output of an strace bin/minmex. My colleague Cyril supposed something odd with termcaps, but there was absolutely nothing that sophisticated in the minmex. Just a very plain printf("logfile %s"); ...

Fairly puzzling, don't you think ?

Well, now it started getting funny when i realised that i actually was facing this in the 'master' thread:

mkdir("/tmp/log", 0777) = -1 EEXIST (File exists)
fstat64(1, {st_mode=S_IFCHR|0666, st_rdev=makedev(1, 3), ...}) = 0
ioctl(1, SNDCTL_TMR_TIMEBASE or TCGETS, 0xbf97285c) = -1 ENOTTY (Inappropriate ioctl for device)
mmap2(NULL, 4096, PROT_READ|PROT_WRITE, MAP_PRIVATE|MAP_ANONYMOUS, -1, 0) = 0xb7f7f000
write(1, "logfile /tmp/log/26274_"..., 71logfile /tmp/log/26274_anaMinmex
) = 71
brk(0)                                  = 0x806b000
brk(0x808c000)                          = 0x808c000
open("/tmp/log/26274_anaMinmex", O_WRONLY|O_CREAT|O_TRUNC, 0666) = 3
.

I'm not very comfortable with those termcaps things, but that ENOTTY error clearly suggested that Cyril was right and that the source code was wrong 0_o ... Using plain write(1,...) instead of printf(...) let me see all the stuff i wanted, but that can hardly be called a solution. After a bit crawling in the sourcecode of libc6, i could not find any direct relationship between tcgetattr (the function using that TCGETS argument) and printf, but some reference to glibc-2.6.1/misc/getpass.c did ring a bell in my mind ...
What if the problem was actually coming from the sudo not liking me to redirect the program's output ? what if, for some reason, it *had* to ensure the access to a terminal (e.g. to prompt me for a password) and failed to properly restore the output-to-next process ?

And indeed, running the whole chain in a regular root session has now shown a perfectly regular behaviour, in other terms:
  • either you shouldn't try to redirect stdout when using printf
  • or you shouldn't try to build multitasking programs
  • or you shouldn't try to run those programs as a sudoer
  • and you certainly shouldn't do all these three things "for chaos and maddness awaits thou at thee end" ...
For your mind's sanity, i just hope that you'll never ever find such a post useful. But in case you happen to struggle with such a debugging nightmare, i hope i've got a sufficiently high pagerank to get ahead all these mailing list that discuss about whether a bug is a bug or not and that you'll get a clear understanding of the situation faster than i did...

HAve a nice day,
/Pype, the crazy coder.

edit: the problem seems to be tcsh-specific (e.g. doing the plain sudo bin/minmex | perl myscript.pl in bash will work) and there are two possible (and both equally surrealistic) work-around for those who don't want to switch to bash:
  • sudo -c tcsh "bin/minmex | perl myscript.pl"
  • sudo -c tcsh "bin/minmex" | perl myscript.pl
I'm accepting any explanation for this, and myself strongly suspecting tcsh to do something wrong when building the pipes between minmex and perl...

No comments: