Newsletter of the International Human Dimensions Programme on
Global Environmental Change
 
 
Nr. 2/2002
 
     
 
Conflicting demands
Confidentiality Promises and Data Availability
by Ronald R. Rindfuss


As the human dimensions research community moves towards fine-grain, spatially explicit studies, we face a confidentiality conflict which arises from the need to serve three quite different goals: a) link people and the environments they affect, b) protect the confidentiality of respondents, and c) make data available to the entire scientific community. Linking data on people and their environments is at the very core of IHDP. Protecting the confidentiality of respondents is a moral imperative for those involved in collecting data from humans. Making data available to the entire scientific community is a goal increasingly shared by those in the IHDP community. All three are reasonable, indeed laudable, goals, yet their commingling produces a fundamental conflict that we need to resolve; otherwise the quantity and quality of research will suffer. As a research community, we need to understand fully the underpinnings of this conflict among reasonable scientific goals before building momentum for solutions. This discussion was begun in People and Pixels (Liverman, Moran, Rindfuss, and Stern, 1998). To simplify, I write from the perspective of land use change research, but extension to other research areas using geographically explicit data is straightforward.

Linking people and environment

Understanding land use change involves linking data on human behaviour/intentions to data on land cover. This linkage can occur at a variety of scales ranging from individuals to households, to political or administrative units, to business, religious or other social institutions. Perhaps the most common example in the current research literature is linkage at the level of political or administrative units, using census data that has been aggregated from the household level to the district, county, or municipality level, along with information showing the boundaries of these administrative units. Such administrative boundaries can be overlaid on land coverage, producing the link between land cover and population data.

Increasingly we see studies that link specific households to the land that they own or use. These fine-grain studies are designed to examine how the characteristics of households, and of their individual members, might affect land use. They are typically based on the supposition that households are a critical land use decision-making unit, and on an understanding that making inferences about household-level behaviour from more aggregated data runs the risk of committing the well-known ecological correlation fallacy (Robinson, 1950). While these studies vary enormously in detail, a common feature is that they contain geographically explicit information on the location of the household's residence, as well as on the land that the household owns or uses. Fig. 1 shows, for a hypothetical household, ID# 7421, the location of its residence in Bangkok on a coverage derived from IKONOS data.

Protecting confidentiality

Data on human behaviour can come from a variety of sources. Typically, the researcher-respondent exchange begins with establishing the ground rules for the collection and use of the data. A face-to-face or a telephone interview will begin with the interviewer explaining the purpose of the study and how the data will be used. If it is a mail-out, mail-back or web-based interview, there will be a printed description. With few exceptions, the researcher is promising the respondent that under no circumstances will the information be released to a third party in such a manner that the third party could know about the information, identity or location of the respondent.

Why protect the confidentiality of respondents? It is a general ethical principle that researchers should do everything in their power to ensure that respondents are not harmed as a result of participating in the study. Since a third party might misuse the data provided, the best protection for the respondent is to discard (or protect, in the case of a longitudinal study) the link between the respondent's identity and the information provided. Increasingly, universities and funding agencies are setting up review boards to compel ethical adherence. The ability to collect data rests on the respondents' expectation that the information is needed for some important scientific or administrative purpose, and that it will not be used to harm the respondent in any way. If these expectations are not met, respondents will not provide the data, creating a serious crisis for the research community. Finally, with the increasingly easy use of computers, publicly released data is available outside the scientific community, which increases the risk of a confidentiality breach.

Sharing data

Although norms vary across disciplines, data collected by researchers should be shared with other scientists. Collecting data has become so expensive that we need to obtain maximum payoff by allowing and encouraging the entire scientific community to analyse the data. Another rationale is that if we are troubling respondents for their time and information, then this information should be put to the greatest scientific good. Sharing data also facilitates comparative studies. Some journals now require authors to make their data available to allow for replication and verification of findings. For some researchers, their own academic reputation is influenced by the quality of the data they provide to the research community.

The conflict

The conflict is most intense with fine-grain, geographically explicit data. Reconsider Fig. 1: here, a household survey is linked to IKONOS data having one-meter resolution. We can literally »see« the household's dwelling unit; even if the names of household members were suppressed, it would be very easy to determine who lived there. The link to IKONOS data provides a road map to the dwelling unit.

What are the risks to respondents if such data were put in the public domain? Clearly, it depends on what was included in the survey, but even if it were seemingly innocuous data, there are several ways that it could place household members at risk. An estranged spouse could use the information in a manner detrimental to one or more household members. The military could use the information to coerce household members into the army. Unscrupulous merchants could use the information for their own advantage. The exact nature of the risk depends on the characteristics of the household, the nature of the data collected, and social, political and legal conditions. However, it is difficult to imagine any situation where the risk would be zero.

Potential resolutions

The risk is greatest when some third party can identify a specific household with certainty. Doing so in the Fig. 1 example is child's play. But even if the spatial data were of coarser resolution, such linkage would make it relatively easy to find a specific household. There are several directions in which the confidentiality conflict can be resolved, but all have drawbacks.

The simplest solution is not collecting geographically explicit survey data. This, however, would profoundly inhibit the analysis of land use decision-making. The alternative of collecting fine-grained linked data but not releasing it to the broader scientific community also would inhibit scientific progress. Introducing a substantial amount of random error or spatial transformations in a public data set would reduce the ability to identify specific households with certainty, but it also would diminish the scientific value of the data.
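The random-error option mentioned above can be illustrated with a minimal sketch. It assumes household locations are stored as projected coordinates in metres; the function name `jitter_location`, the 500 m maximum offset, and the coordinates given for household 7421 are hypothetical choices made for illustration, not values from any actual study.

```python
import math
import random

def jitter_location(x, y, max_offset_m, rng):
    """Move a point a random distance (below max_offset_m) in a random direction."""
    angle = rng.uniform(0.0, 2.0 * math.pi)
    # The square root spreads points evenly over the disc
    # instead of clustering them near the true location.
    distance = max_offset_m * math.sqrt(rng.random())
    return x + distance * math.cos(angle), y + distance * math.sin(angle)

rng = random.Random(42)  # fixed seed only so the illustration is repeatable

# Hypothetical household ID mapped to hypothetical projected coordinates (metres).
households = {7421: (1000.0, 2000.0)}

# Public-release version: every residence is displaced by up to 500 m.
masked = {
    hid: jitter_location(x, y, 500.0, rng)
    for hid, (x, y) in households.items()
}
```

The trade-off described in the text is visible in the single parameter: a larger `max_offset_m` makes a dwelling harder to pinpoint, but it also weakens the link between the household and the land it actually uses.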

The classic solution, favoured by census agencies around the globe, is to aggregate up to geographic units that contain a sufficiently large number of households so that no single household can be identified. The drawback is that using such aggregated data to make inferences at the household level runs aground on the ecological correlation fallacy. An alternative solution, now used for data sets with contextual data, is to release the linked data only after a researcher has given assurances that the confidentiality of respondents will be protected. One problem with this solution for fine-grain, spatially linked data is that it is considerably easier for a researcher to find a respondent than it is with the usual contextual variables. The alternative of inviting interested researchers to spend time at the institution that collected the data gives the original investigators some means of ensuring that confidentiality is protected, but not a failsafe assurance.
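The aggregation approach described at the start of this paragraph can be sketched as follows. The sketch assumes a simple square grid in place of real administrative units, and the function name `aggregate_to_cells`, the 100 m cell size, and the threshold of three households are all hypothetical; statistical agencies set their own units and thresholds.

```python
from collections import defaultdict

def aggregate_to_cells(records, cell_size_m, min_households):
    """Group (x, y, value) records into square cells and release only
    cells containing at least min_households households.

    Returns {cell: (household_count, mean_value)}.
    """
    cells = defaultdict(list)
    for x, y, value in records:
        cell = (int(x // cell_size_m), int(y // cell_size_m))
        cells[cell].append(value)

    released = {}
    for cell, values in cells.items():
        if len(values) >= min_households:  # small cells are suppressed
            released[cell] = (len(values), sum(values) / len(values))
    return released

# Hypothetical records: (x in metres, y in metres, hectares farmed).
records = [
    (10.0, 10.0, 2.0),
    (20.0, 30.0, 4.0),
    (90.0, 90.0, 6.0),
    (150.0, 20.0, 8.0),  # alone in its cell, so it is withheld
]
summary = aggregate_to_cells(records, cell_size_m=100.0, min_households=3)
# summary == {(0, 0): (3, 4.0)}
```

Only the cell mean survives in `summary`, which is exactly why household-level inference from such a release is exposed to the ecological correlation fallacy.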

Finally, a solution that has been discussed but, to the best of my knowledge, never implemented is to keep the data at an institution that will protect the confidentiality of respondents, allow interested researchers to have access to fine-grain, geographically explicit data on the host institution's computer system, and then have a system that screens output to ensure that individual respondents cannot be identified from the totality of output generated by the outside researcher. For example, an outside researcher might create new variables based on the available data, use these variables in a statistical model, and have the results returned to the researcher. Such a solution is untested and is likely to be expensive. Considerable effort would be needed to implement and test such a system, but if available, it would serve to reduce the conflict among the three reasonable goals enunciated at the beginning of this article.
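To make the idea of output screening concrete, here is a deliberately simplified sketch of one possible check. It is not a description of any existing system: the function name `screened_crosstab`, the field names, and the minimum cell count of five are hypothetical. It also screens only a single cross-tabulation, whereas the system envisaged above would have to consider the totality of output an outside researcher has received.

```python
from collections import Counter

def screened_crosstab(rows, var_a, var_b, min_cell_count=5):
    """Cross-tabulate two variables; return the table only if every
    non-empty cell holds at least min_cell_count respondents.

    Returns None when the table must be withheld.
    """
    counts = Counter((row[var_a], row[var_b]) for row in rows)
    if any(n < min_cell_count for n in counts.values()):
        return None  # a small cell could point to individual households
    return dict(counts)

# Hypothetical survey rows held on the host institution's system.
rows = (
    [{"tenure": "owner", "crop": "rice"}] * 6
    + [{"tenure": "renter", "crop": "rice"}] * 5
)
safe = screened_crosstab(rows, "tenure", "crop")

# Adding one unusual household creates a cell of size 1, so nothing is released.
blocked = screened_crosstab(
    rows + [{"tenure": "renter", "crop": "orchard"}], "tenure", "crop"
)
```

A real screening system would need much more than this, for instance tracking successive requests so that two individually safe tables cannot be differenced to isolate one household.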


Ronald R. Rindfuss is a member of the Scientific Steering Committee of the IHDP/IGBP Project on Land-Use and Land-Cover Change (LUCC); he is a researcher at the University of North Carolina, Department of Sociology, Chapel Hill, NC, USA; ron_rindfuss@unc.edu; http://www.unc.edu

 

 
  © IHDP, Walter-Flex-Str. 3, D - 53113 Bonn, Germany, Tel. +49 (0) 228 73 90 50
E-mail: ihdp@uni-bonn.de   http://www.ihdp.org