Newsletter of the International
Human Dimensions Programme on
Global Environmental Change
No. 2/2002

Conflicting demands
Confidentiality Promises and Data Availability
by Ronald R. Rindfuss

As the human dimensions research community moves towards fine-grain,
spatially-explicit studies, we face a confidentiality conflict which arises
from the need to serve three quite different goals: a) link people and
the environments they affect, b) protect the confidentiality of respondents,
and c) make data available to the entire scientific community. Linking
data on people and their environments is at the very core of IHDP. Protecting
the confidentiality of respondents is a moral imperative of those involved
in collecting data from humans. Making data available to the entire scientific
community is a goal increasingly shared by those in the IHDP community.
All three are reasonable, indeed laudable, goals, yet their commingling
produces a fundamental conflict that we need to resolve; otherwise the quantity
and quality of research will suffer. As a research community, we need
to understand fully the underpinnings of this conflict among reasonable
scientific goals before building momentum for solutions. This is
a discussion begun in People and Pixels (Liverman, Moran, Rindfuss, and
Stern, 1998). To simplify, I write from the perspective of land use change
research, but extension to other research areas using geographically explicit
data is straightforward. Linking people and environment Understanding land use change involves linking data on human behaviour/intentions
to data on land cover. This linkage can occur at a variety of scales ranging
from individuals to households, to political or administrative units,
to business, religious or other social institutions. Perhaps the most
common example in the current research literature is linkage at the level
of political or administrative units, using census data that has been
aggregated from the household level to the district, county, or municipality
level, along with information showing the boundaries of these administrative
units. Such administrative boundaries can be overlaid on land cover data,
producing the link between land cover and population data.

Increasingly, we see studies that link specific households to the land
that they own or use. These fine-grain studies are designed to examine
how the characteristics of households, and their individual members, might
impact land use. They are typically based on the supposition that households
are a critical land use decision making unit, and an understanding that
making inferences about household level behaviour from more aggregated
data runs the risk of suffering from the well-known ecological correlation
fallacy (Robinson, 1950). While these studies vary enormously in detail,
a common feature is that they contain geographically explicit information
on the location of the household's residence, as well as the land that it
owns or uses. Fig. 1 shows, for a hypothetical household, ID# 7421, the location
of its residence in Bangkok on a coverage derived from IKONOS data.

Protecting confidentiality

Data on human behaviour can come from a variety of sources. Typically,
the researcher-respondent exchange begins with establishing the ground
rules for the collection and use of the data. A face-to-face or a telephone
interview will begin with the interviewer explaining the purpose of the
study and how the data will be used. If it is a mail-out/mail-back or
web-based survey, there will be a written description. With few exceptions,
the researcher is promising the respondent that under no circumstances
will the information be released to a third party in such a manner that
the third party could know about the information, identity or location
of the respondent.

Why protect the confidentiality of respondents? It is a general ethical
principle that researchers should do everything in their power to ensure
that respondents are not harmed as a result of participating in the study.
Since a third party might misuse the data provided, the best protection
for the respondent is to discard (or protect, in the case of a longitudinal
study) the link between the respondent's identity and the information
provided. Increasingly, universities and funding agencies are setting up
review boards to enforce adherence to ethical standards. The ability to collect
data rests on the respondents' expectation that the information is
needed for some important scientific or administrative purpose, and that
the information will not be used to harm the respondent in any way. If
these expectations are not met, respondents will not provide the data,
creating a serious crisis for the research community. Finally, with the
increasingly easy use of computers, publicly released data is available
outside the scientific community, hence increasing the risk of a confidentiality
breach.

Sharing data

Although norms vary across disciplines, data collected by researchers
should be shared with other scientists. Collecting data has become so
expensive that we need to obtain maximum payoff by allowing and encouraging
the entire scientific community to analyse the data. Another rationale
is that if we are troubling respondents for their time and information,
then this information should be put to the greatest scientific good. Sharing
data facilitates comparative studies. Some journals are requiring authors
to make their data available to allow for replication and verification
of findings. For some researchers, their own academic reputation is influenced
by the quality of the data they provide to the research community.

The conflict

The conflict is most intense with fine-grain, geographically explicit
data. Reconsider Fig. 1: here, a household survey is linked to IKONOS
data having one-meter resolution. We can literally "see" the
household's dwelling unit; even if the names of household members were
suppressed, it would be very easy to determine who lived there. The link
to IKONOS data provides a road map to the dwelling unit.

What are the risks to respondents if such data were put in the public
domain? Clearly, it depends on what was included in the survey, but even
if it were seemingly innocuous data, there are several ways that it could
place household members at risk. An estranged spouse could use the information
in a manner detrimental to one or more household members. The military
could use the information to coerce household members into the army. Unscrupulous
merchants could use the information for their own advantage. The exact
nature of the risk depends on the characteristics of the household, the
nature of the data collected, and social, political and legal conditions.
However, it is difficult to imagine any situation where the risk would
be zero.

Potential resolutions

The risk is greatest when some third party can identify with certainty
a specific household. Doing so in the Fig. 1 example is child's play.
But even if the spatial data were of coarser resolution, such linkage would
make it relatively easy to find a specific household. There are several
directions in which the confidentiality conflict can be resolved, but
all have drawbacks.

The simplest solution is not collecting geographically explicit survey
data. This, however, would profoundly inhibit the analysis of land use
decision-making. The alternative of collecting fine-grained linked data
but not releasing it to the broader scientific community also would inhibit
scientific progress. Introducing a substantial amount of random error
or spatial transformations in a public data set would reduce the ability
to identify specific households with certainty, but it also would diminish
the scientific value of the data.

The classic solution, favoured by census agencies around the globe, is
to aggregate up to geographic units that contain a sufficiently large
number of households so that no single household can be identified. The
drawback is that using such aggregated data to make inferences at the
household level runs aground on the ecological correlation fallacy. An alternative
solution, now used for data sets with contextual data, is to release the
linked data only after a researcher has assured that the confidentiality
of respondents will be protected. One problem with this solution for fine-grain,
spatially linked data is that a researcher could find a respondent
considerably more easily than with the usual contextual variables.
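
As a rough illustration of the aggregation approach described above, the following Python sketch groups household records by geographic unit and releases a summary only for units that contain enough households. It is a minimal sketch under stated assumptions, not a description of how any census agency actually works: the function name `aggregate_by_unit`, the threshold of five households, and the example records are all hypothetical and do not come from the article.

```python
from collections import defaultdict

# Hypothetical disclosure threshold; the article does not specify a number.
MIN_HOUSEHOLDS = 5


def aggregate_by_unit(households, min_households=MIN_HOUSEHOLDS):
    """Aggregate household records up to geographic units.

    households: iterable of (unit_id, hectares) tuples, one per household.
    Returns a dict mapping unit_id to (household_count, mean_hectares) for
    units with at least min_households households, and to None for smaller
    units, whose values are suppressed.
    """
    totals = defaultdict(lambda: [0, 0.0])  # unit_id -> [count, sum]
    for unit_id, hectares in households:
        totals[unit_id][0] += 1
        totals[unit_id][1] += hectares

    released = {}
    for unit_id, (count, total) in totals.items():
        if count >= min_households:
            released[unit_id] = (count, total / count)
        else:
            released[unit_id] = None  # suppressed: too few households
    return released


# Made-up example records: (district, hectares used by one household).
records = [
    ("A", 2.0), ("A", 4.0), ("A", 3.0), ("A", 1.0), ("A", 5.0),
    ("B", 10.0), ("B", 12.0),
]
summary = aggregate_by_unit(records)
print(summary)
```

In this made-up example, district A has five households and is released as a count and a mean, while district B has only two and is suppressed. The released figures describe districts, not households, which is exactly why inferences about household behaviour drawn from them risk the ecological correlation fallacy.
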
The alternative of inviting interested researchers to spend time at the
institution that collected the data allows the original investigators
some measure of ensuring that confidentiality is protected, but not a
failsafe assurance.

Finally, a solution that has been discussed but, to the best of my knowledge, never implemented is to keep the data at an institution that will protect the confidentiality of respondents, allow interested researchers to access the fine-grain, geographically explicit data on the host institution's computer system, and have a system that screens output to ensure that individual respondents cannot be identified from the totality of output generated by the outside researcher. For example, an outside researcher might create new variables based on the available data, use these variables in a statistical model, and have the results returned to the researcher. Such a solution is untested and is likely to be expensive. Considerable effort would be needed to implement and test such a system, but if available, it would serve to reduce the conflict among the three reasonable goals enunciated at the beginning of this article.

Ronald R. Rindfuss is a member of the Scientific Steering Committee of the IHDP/IGBP Project on Land-Use and Land-Cover Change (LUCC); he is a researcher at the University of North Carolina, Department of Sociology, Chapel Hill, NC, USA; ron_rindfuss@unc.edu; http://www.unc.edu

© IHDP, Walter-Flex-Str. 3, D-53113 Bonn, Germany, Tel. +49 (0) 228 73 90 50, E-mail: ihdp@uni-bonn.de, http://www.ihdp.org