[buug] I/O redirection and named pipes

Michael Paoli Michael.Paoli at cal.berkeley.edu
Tue Aug 3 06:03:36 PDT 2010


A few things from the 2010-07-15 BUUG meeting,
... among other things discussed,
I/O redirection and named pipes

I/O redirection - much of what was said/covered, I've covered before,
even on this list - for starters, see:
http://www.weak.org/pipermail/buug/2004-June/002444.html
In addition to that, a few points were made/emphasized:
Order matters - I/O redirection is (mostly) processed left-to-right.
Though 2>&1 can be thought of as redirecting file descriptor 2 (stderr)
to file descriptor 1 (stdout), it more literally and properly means
make file descriptor 2 a copy of whatever file descriptor 1 refers to
at that point.
Here, with strace(1), we can see what happens at the system call level.
Using strace(1), and here showing only the specific calls of interest:
$ strace -fv -e trace=dup2,fcntl64 sh -c '2>&1 echo -n ""'
dup2(1, 2)                              = 2
$ strace -fv -e trace=dup2,fcntl64 dash -c '2>&1 echo -n ""'
fcntl64(1, F_DUPFD, 2)                  = 2
$
references: sh(1), strace(1), dup(2), fcntl(2)
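
A quick illustration of why that left-to-right order matters (a small
example, assuming a file named nosuchfile does not exist): in the first
command below, fd 1 is pointed at the file out and fd 2 is then made a
copy of it, so the error message lands in out; in the second, fd 2 is
made a copy of fd 1 while fd 1 still refers to the terminal, so the
error message stays on the terminal and out ends up empty.
$ ls nosuchfile > out 2>&1
$ ls nosuchfile 2>&1 > out
$ rm out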

And named pipes - rather like shell pipes, but they exist as a named file
on a filesystem.  A few points were made: something needs to be reading
the pipe before a write to it will get anywhere (an open for writing
blocks until a reader has the FIFO open).  The pipe stores no data on
disk (a file of type pipe/FIFO has no data blocks - though buffered data
might be subject to being paged or swapped to disk).
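A minimal sketch of that behavior (just illustrative - the filename demo
is a placeholder):
$ mknod demo p        # a FIFO: file type p, size 0, no data blocks
$ echo hi > demo &    # the writer's open blocks until a reader opens the FIFO
$ cat demo            # the reader arrives; the write then goes through
hi
$ rm demo
And a practical example.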
Let's say we've got a large ISO image file (e.g. a DVD image, though I'll
use a CD image in this example), and we want to compute hashes of the
file with multiple algorithms - but we don't want to read the file's
blocks from disk multiple times (which would be inefficient in its
redundant disk I/O).  We can use named pipes (along with a bit of tee(1)
and asynchronous job execution).  Let's say we've already validated (via
gpg) the *SUMS files that supply the expected hashes.
I'll introduce my comments with // at the start of the line:
//So, let's say we first snag information on mtime and length from the
//archive:
$ 2>>/dev/null curl -I http://releases.ubuntu.com/lucid/ubuntu-10.04-server-amd64.iso | fgrep 'Modified
Length'
Last-Modified: Tue, 27 Apr 2010 10:56:34 GMT
Content-Length: 710412288
//Next we determine block count (2 KiB blocks for CD-ROM/R/RW) and copy
//our data from CD to file:
$ echo '710412288/2048' | bc -l
346881.00000000000000000000
$ dd if=/media/cdrom0 bs=2048 count=346881 of=ubuntu-10.04-server-amd64.iso
//we then set our file's mtime to that of the archive copy:
$ TZ=GMT0 touch -t 201004271056.34 ubuntu-10.04-server-amd64.iso
//we examine our expected hashes:
$ fgrep ubuntu-10.04-server-amd64.iso *SUMS
MD5SUMS:8ee25c78f4c66610b6872a05ee9ad81b *ubuntu-10.04-server-amd64.iso
SHA1SUMS:74a8ee0a72a539d76dadb4ac2ed233e4cbf9b4df *ubuntu-10.04-server-amd64.iso
SHA256SUMS:212cdd71b95b8ee957b826782983890c536ba1fde547e42e9764ee5c12f43c2d *ubuntu-10.04-server-amd64.iso
//we create our named pipes:
$ mknod p p && mknod p2 p
//we start our processes reading those named pipes:
$ < p > md5 md5sum &
$ < p2 > sha1 sha1sum &
//we then use tee(1) to feed the named pipes:
$ < ubuntu-10.04-server-amd64.iso tee p | tee p2 | sha256sum > sha256
//we wait for our background processes to complete, then examine our
//results and remove the no-longer-needed named pipes:
$ wait; cat md5 sha1 sha256; rm p p2
8ee25c78f4c66610b6872a05ee9ad81b  -
74a8ee0a72a539d76dadb4ac2ed233e4cbf9b4df  -
212cdd71b95b8ee957b826782983890c536ba1fde547e42e9764ee5c12f43c2d  -
$
//We then compare - confirming all our hashes matched as expected.
Left as an exercise :-) ...  How could we be even more disk I/O
efficient?  Hint: did we really need to read the file we wrote to disk?
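One possible answer, sketched (reusing the filenames and block count from
above; not rerun here): compute the hashes while copying from the CD, so
the blocks get read just once and the freshly written file never has to
be re-read:
$ mknod p p && mknod p2 p
$ < p > md5 md5sum &
$ < p2 > sha1 sha1sum &
$ dd if=/media/cdrom0 bs=2048 count=346881 | tee ubuntu-10.04-server-amd64.iso | tee p | tee p2 | sha256sum > sha256
$ wait; cat md5 sha1 sha256; rm p p2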



