Hi Michael,<br><br>I am wondering if the way to establish my credentials is to write something like this.<br><br>It is interesting that you refer to Named Pipes as a Unix concept.  I thought that Microsoft had invented the name "Named Pipes" to try to adapt to having its software work on a network of heterogeneous computers.<br>

<br>Karen<br><br><div class="gmail_quote">On Sat, May 28, 2011 at 11:26 PM, Michael Paoli <span dir="ltr"><<a href="mailto:Michael.Paoli@cal.berkeley.edu">Michael.Paoli@cal.berkeley.edu</a>></span> wrote:<br><blockquote class="gmail_quote" style="margin:0 0 0 .8ex;border-left:1px #ccc solid;padding-left:1ex;">

So, not too long ago, had bit of a task I wanted to accomplish.  I had<br>

two quite large compressed (gzip and bzip2) files of a rather larger<br>

hard drive image (120,031,511,040 bytes).  Due to size, time, etc., I<br>

wanted my manipulations of these files to be fairly I/O and time<br>

efficient.  E.g. I didn't want to read a source file or write a target<br>

file more than once, or unnecessarily read a target file.  I wanted to<br>

perform various hash calculations (md5, sha1, sha256 and sha512) on the<br>

uncompressed data from each of the compressed images, and also extract<br>

and save as files, the uncompressed images, and also compare them, and<br>

do a bit of custom block-wise and running total hash calculations on the<br>

uncompressed images (the multisum.128M program I reference below is a bit<br>

of Perl code I wrote to do that - it simultaneously does both block-by-block<br>

and running cumulative (up through block) hash calculations - in this<br>

case by 128 MiB blocks (if the files differed, I wanted to know within<br>

which 128 MiB blocks they differed)).  I also wanted to know if the two<br>

uncompressed files were byte-by-byte identical all the way through<br>

(regardless as to whether or not the various hash calculations may have<br>

also matched).<br>

<br>

So, ... to do all that rather efficiently, I used a bunch of named<br>

pipes.  The only processes that read from the compressed files were<br>

their uncompressing programs, and a single program wrote the<br>

uncompressed images - nothing else read those images.  All the other I/O<br>

- reading, writing (for comparison and calculating all the various hash<br>

functions) read from, or wrote to a named pipe.  tee(1) was also<br>

significantly used to simultaneously write to multiple outputs (stdout<br>

and/or various "files" - mostly named pipes in our case here).<br>

<br>

What's a named pipe?  A First In First Out (FIFO) file.  One of the<br>

types of "special" files in Unix(/Linux, etc.) - but unlike block or<br>

character special devices, no special privilege is needed to create<br>

named pipes.  Named pipes are sort of like the shell's pipe (|), except<br>

they exist in the filesystem (and thus have a name), and are read from<br>

and written to, rather like ordinary files ... except they're not.  They<br>

have no disk data blocks - they're just a buffer - one generally has one<br>

process read from a named pipe, and another process write to the named<br>

pipe.  The data "comes out from" (is read from) the pipe, with the bytes<br>

coming out in the same order they were written to the pipe.  mknod(1) is<br>

used to create a named pipe, e.g.:<br>

$ mknod name p<br>

would generally create a named pipe of name name, e.g.:<br>

$ mknod name p && ls -ond name && rm name<br>

prw-------  1 1003 0 May 28 23:14 name<br>

$<br>

That leading p in our ls -o (or -l) listing shows us that it's a named<br>

pipe.<br>
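<br>
A minimal sketch of the first-in, first-out behavior described above, under<br>
a few assumptions: it uses mkfifo(1), the POSIX utility for creating FIFOs,<br>
in place of mknod, it assumes mktemp(1) is available, and the scratch<br>
directory and the name demo.fifo are made up purely for illustration.<br>

```shell
#!/bin/sh
set -e # exit non-zero if command fails

# scratch directory and FIFO name are hypothetical, for illustration only
d=$(mktemp -d)
mkfifo "$d/demo.fifo"

# start the reader first, in background, saving whatever it reads
cat < "$d/demo.fifo" > "$d/out" &

# write three lines to the named pipe
printf '%s\n' one two three > "$d/demo.fifo"

# wait for the background reader to see end-of-file and finish
wait

# the lines should come out in the order they were written
fifo_out=$(cat "$d/out")
rm -r "$d"
printf '%s\n' "$fifo_out"
```

Nothing here touches disk data blocks for the pipe itself; only the<br>
reader's output file is an ordinary file.<br>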

<br>

One generally needs to set something up to read from a named pipe before<br>

writing to it, since opening a named pipe for writing normally blocks<br>

until something has opened it for reading.<br>
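<br>
The ordering point above can be seen in a small sketch like this one. It<br>
assumes mkfifo(1) and mktemp(1) are available; the names demo.fifo and<br>
written are hypothetical. A background writer is started with no reader<br>
present, and a marker file records whether the write has completed.<br>

```shell
#!/bin/sh
set -e # exit non-zero if command fails

# scratch directory and file names are hypothetical, for illustration only
d=$(mktemp -d)
mkfifo "$d/demo.fifo"

# background writer: its open of the pipe blocks until a reader shows up;
# the marker file is created only after the write has gone through
( echo hello > "$d/demo.fifo"; : > "$d/written" ) &

# give the writer time to run; with no reader yet it should still be stuck
sleep 1
if [ -e "$d/written" ]; then before=finished; else before=blocked; fi

# now attach a reader, which lets the writer proceed
got=$(cat < "$d/demo.fifo")
wait
if [ -e "$d/written" ]; then after=finished; else after=blocked; fi

rm -r "$d"
echo "before reader: $before; read: $got; after reader: $after"
```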

<br>

Anyway, bit of example program I did a while back for the task at hand.<br>

I've tweaked it slightly (for readability - namely shortening some<br>

names/paths and folding some lines), and added some comments to describe<br>

a bit better what it does.<br>

<br>

#!/bin/sh<br>

<br>

set -e # exit non-zero if command fails<br>

<br>

cd /tmp/hd<br>

<br>

for f in gz bz2<br>

do<br>

    # create pair of named pipes for each of our gz / bz2 flavors we'll use<br>

    mknod p-"$f" p<br>

    mknod p-multisum.128M-"$f" p<br>

    # launch our custom multisum.128M on each, saving stdout and stderr<br>

    ./multisum.128M p-multisum.128M-"$f" \<br>

    > P-multisum.128M-"$f".out \<br>

    2> P-multisum.128M-"$f".err &<br>

done<br>

<br>

# launch our cmp, and have it report results to file CMP, and save<br>

# stdout and stderr<br>

if cmp p-gz p-bz2 >cmp.out 2>cmp.err; then<br>

    echo matched > CMP<br>

else<br>

    echo not matched > CMP<br>

fi &<br>

<br>

for s in md5 sha1 sha256 sha512<br>

do<br>

    for f in gz bz2<br>

    do<br>

        # make named pipes for each of our hash and file type<br>

        # combinations<br>

        mknod p-"$s"-"$f" p<br>

        # start the hash calculations on each, saving stdout<br>

        "$s"sum < p-"$s"-"$f" > P-"$s"sum-"$f" &<br>

    done<br>

done<br>

<br>

# start our uncompressions, pipe to tee to write our pipes and file for<br>

# each<br>

gzip -d < /tmp/sdb1/hd.gz | tee p*-gz > hd-gz &<br>

bzip2 -d < /tmp/sdb2/hd.bz2 | tee p*-bz2 > hd-bz2 &<br>

<br>

# essentially all the preceding read/write stuff was started in<br>

# background, with reads started before writes<br>

<br>

# we just then wait for the preceding background stuff to all finish,<br>

# at which point we should be done<br>

wait<br>

<br>

I'll commonly use a similar technique when I wish to calculate multiple<br>

hash values on a CD or DVD or image thereof.  E.g. I'll create named<br>

pipe file(s), start background process(es) to calculate hash(es) on the<br>

named pipe(s), redirecting their output to file(s), then I'll read the<br>

CD/DVD/image, and typically via tee(1), write it to the named pipes -<br>

and typically also pipe (|) tee(1)'s stdout to one of the hash programs<br>

I wish to use.  In that way, I read the input CD/DVD/image just once,<br>

rather than rereading and doing that I/O on the media or disk repeatedly<br>

for each hash type I wish to calculate.<br>
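<br>
A rough sketch of that single-pass technique, not the exact commands used<br>
on any real media. Assumptions: a small sample file named image stands in<br>
for the CD/DVD/image, mkfifo(1) and mktemp(1) are available, and the<br>
md5sum, sha1sum and sha256sum programs are installed. The closing<br>
comparison re-reads the sample only to check the result; it is not part of<br>
the technique.<br>

```shell
#!/bin/sh
set -e # exit non-zero if command fails

# hypothetical scratch directory and sample "image", for illustration only
d=$(mktemp -d)
cd "$d"
printf 'hello, named pipes\n' > image

for s in md5 sha1
do
    # one named pipe per hash, with a background reader started on each
    mkfifo p-"$s"
    "$s"sum < p-"$s" > P-"$s"sum &
done

# read the image just once: tee writes it to each named pipe, and tee's
# stdout is piped (|) to one more hash program
tee p-md5 p-sha1 < image | sha256sum > P-sha256sum

# wait for the background hash calculations to finish
wait

# cross-check against hashing the sample separately (extra reads, check only)
one_pass=$(cat P-md5sum P-sha1sum P-sha256sum)
separate=$(md5sum < image; sha1sum < image; sha256sum < image)

cd /
rm -r "$d"
printf '%s\n' "$one_pass"
```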

<br>

_______________________________________________<br>

Buug mailing list<br>

<a href="mailto:Buug@weak.org" target="_blank">Buug@weak.org</a><br>

<a href="http://www.weak.org/mailman/listinfo/buug" target="_blank">http://www.weak.org/mailman/listinfo/buug</a><br>

</blockquote></div><br><br clear="all"><br>-- <br>Karen Lee Hogoboom 

<div>Computer Programmer</div>

<div>Phone:  (510) 666-8298<br>Mobile:  (510) 407-4363</div>

<div> </div>

<div><a href="mailto:khogoboom@gmail.com" target="_blank">khogoboom@gmail.com</a><br><a href="http://www.linkedin.com/in/karenlhogoboom" target="_blank">http://www.linkedin.com/in/karenlhogoboom</a></div><br>