[buug] Fun (and practical) uses of named pipes (example)

Karen Hogoboom khogoboom at gmail.com
Sun May 29 08:20:58 PDT 2011


Hi Michael,

I am wondering if the way to establish my credentials is to write something
like this.

It is interesting that you refer to Named Pipes as a Unix concept.  I
thought that Microsoft had invented the name "Named Pipes" to try to adapt
to having its software work on a network of heterogeneous computers.

Karen

On Sat, May 28, 2011 at 11:26 PM, Michael Paoli <
Michael.Paoli at cal.berkeley.edu> wrote:

> So, not too long ago, I had a bit of a task I wanted to accomplish.  I
> had two quite large compressed (gzip and bzip2) files of a rather large
> hard drive image (120,031,511,040 bytes).  Due to size, time, etc., I
> wanted my manipulations of these files to be fairly I/O and time
> efficient.  E.g. I didn't want to read a source file or write a target
> file more than once, or unnecessarily read a target file.  I wanted to
> perform various hash calculations (md5, sha1, sha256 and sha512) on the
> uncompressed data from each of the compressed images, and also extract
> and save as files, the uncompressed images, and also compare them, and
> do a bit of custom block-wise and running-total hash calculations on the
> uncompressed images (the multisum.128M program I reference below is a
> bit of Perl code I wrote to do that - it simultaneously does both
> block-by-block and running cumulative (up through block) hash
> calculations - in this case by 128 MiB blocks; if the files differed, I
> wanted to know within which 128 MiB blocks they differed).  I also
> wanted to know if the two
> uncompressed files were byte-by-byte identical all the way through
> (regardless as to whether or not the various hash calculations may have
> also matched).
>
> So, ... to do all that rather efficiently, I used a bunch of named
> pipes.  The only processes that read from the compressed files were
> their uncompressing programs, and a single program wrote the
> uncompressed images - nothing else read those images.  All the other I/O
> - reading, writing (for comparison and calculating all the various hash
> functions) read from, or wrote to, a named pipe.  tee(1) was also
> used significantly to simultaneously write to multiple outputs (stdout
> and/or various "files" - mostly named pipes in our case here).
>
> What's a named pipe?  A First In First Out (FIFO) file.  One of the
> types of "special" files in Unix(/Linux, etc.) - but unlike block or
> character special devices, no special privilege is needed to create
> named pipes.  Named pipes are sort of like the shell's pipe (|), except
> they exist in the filesystem (and thus have a name), and are read from
> and written to, rather like ordinary files ... except they're not.  They
> have no disk data blocks - they're just a buffer - one generally has one
> process read from a named pipe, and another process write to the named
> pipe.  The data "comes out of" (is read from) the pipe, with the bytes
> coming out in the same order they were written to the pipe.  mknod(1)
> (or, more portably, mkfifo(1)) is used to create a named pipe, e.g.:
> $ mknod name p
> would generally create a named pipe named name, e.g.:
> $ mknod name p && ls -ond name && rm name
> prw-------  1 1003 0 May 28 23:14 name
> $
> That leading p in our ls -o (or -l) listing shows us that it's a named
> pipe.
>
> One generally needs to set something up to read from a named pipe
> before writing to it - opening a FIFO for writing normally blocks until
> some process has it open for reading.
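The reader-before-writer pattern can be sketched as follows (the pipe name
demo-pipe and the scratch directory are arbitrary choices for illustration):

```shell
# Sketch: start the reader first, then the writer unblocks it.
cd "$(mktemp -d)"              # scratch directory
mkfifo demo-pipe               # portable equivalent of: mknod demo-pipe p
wc -c < demo-pipe > count &    # reader, started first, in the background
printf 'hello' > demo-pipe     # writer; its open completes once a reader exists
wait                           # reader finishes after the writer closes the pipe
cat count                      # byte count of what passed through the pipe
```

Nothing ever touches the disk for the data itself; the five bytes flow
through the kernel's FIFO buffer from printf to wc.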
>
> Anyway, here's the example program I did a while back for the task at hand.
> I've tweaked it slightly (for readability - namely shortening some
> names/paths and folding some lines), and added some comments to describe
> a bit better what it does.
>
> #!/bin/sh
>
> set -e # exit non-zero if command fails
>
> cd /tmp/hd
>
> for f in gz bz2
> do
>    # create a pair of named pipes for each of our gz / bz2 flavors
>    mknod p-"$f" p
>    mknod p-multisum.128M-"$f" p
>    # launch our custom multisum.128M on each, saving stdout and stderr
>    ./multisum.128M p-multisum.128M-"$f" \
>    > P-multisum.128M-"$f".out \
>    2> P-multisum.128M-"$f".err &
> done
>
> # launch our cmp, and have it report results to file CMP, and save
> # stdout and stderr
> if >cmp.out 2>cmp.err cmp p-gz p-bz2; then
>    echo matched > CMP
> else
>    echo not matched > CMP
> fi &
>
> for s in md5 sha1 sha256 sha512
> do
>    for f in gz bz2
>    do
>        # make named pipes for each of our hash and file type
>        # combinations
>        mknod p-"$s"-"$f" p
>        # start the hash calculations on each, saving stdout
>        "$s"sum < p-"$s"-"$f" > P-"$s"sum-"$f" &
>    done
> done
>
> # start our uncompressions, pipe to tee to write our pipes and file for
> # each
> gzip -d < /tmp/sdb1/hd.gz | tee p*-gz > hd-gz &
> bzip2 -d < /tmp/sdb2/hd.bz2 | tee p*-bz2 > hd-bz2 &
>
> # essentially all the preceding read/write stuff was started in
> # background, with reads started before writes
>
> # we just then wait for the preceding background stuff to all finish,
> # at which point we should be done
> wait
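A hypothetical follow-up check, run in the same directory after the script
has finished (file names match those the script created; since each hash
program read from stdin, matching images should yield byte-identical
per-hash output files):

```shell
# Hypothetical post-run check - not part of the original script.
cat CMP                         # "matched" or "not matched", from cmp(1)
for s in md5 sha1 sha256 sha512
do
    # compare the saved hash outputs for the gz- and bz2-sourced images
    if cmp -s P-"$s"sum-gz P-"$s"sum-bz2
    then echo "$s: same"
    else echo "$s: differ"
    fi
done
```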
>
> I'll commonly use a similar technique when I wish to calculate multiple
> hash values on a CD or DVD or image thereof.  E.g. I'll create named
> pipe file(s), start background process(es) to calculate hash(es) on the
> named pipe(s), redirecting their output to file(s), then I'll read the
> CD/DVD/image, and typically via tee(1), write it to the named pipes -
> and typically also pipe (|) tee(1)'s stdout to one of the hash programs
> I wish to use.  In that way, I read the input CD/DVD/image just once,
> rather than rereading and doing that I/O on the media or disk repeatedly
> for each hash type I wish to calculate.
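That CD/DVD technique, scaled down to two pipes and one image, might look
like this (image.iso is a stand-in for the actual disc device or image path):

```shell
# Hypothetical sketch: read image.iso once, hash it three ways at once.
mkfifo p-md5 p-sha1                  # named pipes, one per background hash
md5sum  < p-md5  > md5.out  &        # readers started before any writer
sha1sum < p-sha1 > sha1.out &
# tee writes a copy to every pipe; its stdout feeds one more hash directly
tee p-md5 p-sha1 < image.iso | sha256sum > sha256.out
wait                                 # let the background hashes finish
```

The image is read exactly once; tee fans the bytes out to the two pipes
and to the sha256sum at the end of the ordinary (|) pipeline.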
>
> _______________________________________________
> Buug mailing list
> Buug at weak.org
> http://www.weak.org/mailman/listinfo/buug
>



-- 
Karen Lee Hogoboom
Computer Programmer
Phone:  (510) 666-8298
Mobile:  (510) 407-4363

khogoboom at gmail.com
http://www.linkedin.com/in/karenlhogoboom