[buug] On missing data

Claude Rubinson cmsclaud at arches.uga.edu
Fri Nov 29 20:46:59 PST 2002


One of the issues that I've been dealing with for the survey that I'm
administering is handling missing data.  Coding missing data hasn't
posed much of a problem -- basically, I just code the fact that the
data is missing and then the reason why.
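
To make that concrete, the coding scheme looks roughly like the Python
sketch below.  The names (MissingReason, Response, mean_of_present) and
the reason categories are made up for illustration; they are
assumptions, not the codes from the actual survey.

```python
from dataclasses import dataclass
from enum import Enum
from typing import Optional


class MissingReason(Enum):
    # Hypothetical reason codes; a real survey would define its own.
    REFUSED = "refused"
    DONT_KNOW = "don't know"
    NOT_APPLICABLE = "not applicable"


@dataclass
class Response:
    value: Optional[float] = None
    missing: Optional[MissingReason] = None

    def __post_init__(self):
        # Exactly one of value / missing must be set.
        if (self.value is None) == (self.missing is None):
            raise ValueError("need exactly one of value or missing reason")


def mean_of_present(responses):
    """Mean over answered items only; None if nothing was answered."""
    present = [r.value for r in responses if r.missing is None]
    return sum(present) / len(present) if present else None


answers = [Response(4.0), Response(missing=MissingReason.REFUSED), Response(2.0)]
print(mean_of_present(answers))  # 3.0
```

Keeping the reason separate from the value means an analysis can skip
missing items, or count them by reason, without reserving a magic
number such as -99 that could be mistaken for a real answer.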

But now I'm beginning to face the issue of handling the missing data
in my data analysis programs and I'm wondering about the intersection
between missing data and computer programming.  I haven't run into any
specific problems yet; rather, I'm curious about the theory of missing
data within computer science (in other words, at this point, it's more
of an academic curiosity than a pressing problem).

If I sound a bit vague, it's because I'm a bit unclear as to how to
phrase my question.  Basically, in both survey design and database
design, a great deal of attention is given to the issue of missing
data (e.g., What constitutes missing data?  How do you code it?  How
does one type of missing data differ from another type?  And how does
all of this affect your analysis?).  I assume that there are similar
discussions within computer science but, outside of discussions of
error-handling, I haven't found any references in any of my texts
(e.g., _Practice of Programming_, _Code Complete_, etc.).  In one of my
database texts, I did find a reference to a compsci paper identifying
"14 different types of incomplete data...including overflows,
underflows, errors, and other problems in trying to represent the real
world within the limits of a computer."  I think that I'm looking for
something along those lines -- how does missing data present itself
within computer programming and how do you respond to it?

If anyone could provide any pointers or references, that would be
great.  Thanks.

Claude


