As per ethical codes, which is the most likely case in which one can break confidentiality

What I refer to as the “dominant approach” is, arguably, the most common approach to protecting respondent confidentiality in sociology. Under the dominant approach, if data cannot be collected anonymously, i.e., without any identifying information (Sieber, 1992), researchers must collect, analyze and report data without compromising the identities of their respondents. The ultimate goal is complete confidentiality for every research participant, which Baez (2002) refers to as the “convention of confidentiality.” The convention of confidentiality is primarily upheld as a means to protect research participants from harm. Respondents with stigmatizing traits or behaviors, such as drug users, would be harmed if their identities were revealed in conjunction with reports of their undesirable behavior. Vulnerable populations such as minors or subordinates in the workplace might face negative consequences if their identities are revealed (Baez, 2002). The emphasis on protection from harm is consistent with The Belmont Report’s emphasis on “beneficence”—researchers must not harm their study participants. The convention of confidentiality is upheld as a means to protect the privacy of all persons, to build trust and rapport with study participants, and to maintain ethical standards and the integrity of the research process (Baez, 2002).

Under the dominant approach, confidentiality is addressed during research planning (i.e., proposal writing and securing approval from ethics review boards) and at three points during the research process: data collection, data cleaning, and dissemination of research results. Guillemin and Gillam (2004) refer to the process of obtaining approval to conduct research as “procedural ethics.” They note that procedural ethics, while useful for prompting researchers to think about ethical issues, is largely a formality that cannot address the specific ethical dilemmas that arise in qualitative research. Thus, I focus primarily on addressing confidentiality during data collection, data cleaning, and dissemination, although I return to the issue of review boards below.

First, issues of confidentiality are addressed at the time of data collection. At this point, sociologists make assurances of confidentiality, typically via consent form statements such as, “All identifying characteristics, such as occupation, city, and ethnic background, will be changed.” (Sieber, 1992, p. 52) Researchers typically present confidentiality agreements at the beginning of the data collection process. Discussing confidentiality at the outset is necessary for acquiring informed consent and building trust with respondents (Crow et al., 2006). However, these discussions occur without knowledge of the specific information subsequently shared by the respondent. Furthermore, discussions about informed consent and confidentiality are rarely ongoing; once the consent form is signed researchers lack a standardized way of returning to the issue of confidentiality and data use with respondents.

Second, confidentiality is addressed during data cleaning. Researchers remove identifiers to create a “clean” data set. A clean data set does not contain information that identifies respondents, such as a name or address (such identifying information might be stored elsewhere, in separate, protected files). Some identifiers are easily recognized and dealt with. For example, the names of respondents can be replaced with pseudonyms. Addresses can be deleted from the file once they are no longer needed. However, for both quantitative and qualitative data sets, unique combinations of traits can be used to identify respondents. For example, in quantitative studies of cancer, individuals with rare forms of cancer, such as brain tumors, can be identified with a few pieces of information such as census tract, cancer type and gender (Howe et al., 2007). In quantitative studies, computer programs such as Record Uniqueness (Howe et al., 2007) can automatically identify and alter cases with unique attributes or sets of attributes in variable-based data sets. For qualitative data sets, such as interview transcripts, researchers often rely on the find and replace tool in word processing programs to change specific names of people and places. However, qualitative data sets will likely contain references to specific places and persons that are difficult to capture because they vary across respondents and occur randomly throughout transcripts or notes. Work by Sweeney (1996) illustrates this challenge. Sweeney (1996) showed that using find and replace only captured 30–60% of personally identifying information in medical records. References to specific people (e.g., family members or physicians), nicknames, or additional phone numbers were overlooked. Sweeney created the Scrub program, which uses a complex set of algorithms to find and replace personal identifiers to cleanse medical records of personal information. In her tests, the Scrub program identified 100% of personal identifiers.
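The two cleaning tasks described above, replacing known names with pseudonyms and flagging cases whose combination of traits is unique, can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the function names, the sample transcript, and the sample records are hypothetical, and the code is not the Record Uniqueness or Scrub program.

```python
import re
from collections import Counter


def pseudonymize(text, pseudonyms):
    """Replace each known name with its pseudonym (whole words only, any case)."""
    for real, fake in pseudonyms.items():
        text = re.sub(rf"\b{re.escape(real)}\b", fake, text, flags=re.IGNORECASE)
    return text


def unique_records(records, quasi_identifiers):
    """Return the records whose combination of quasi-identifier values occurs once."""
    def key(record):
        return tuple(record[q] for q in quasi_identifiers)

    counts = Counter(key(r) for r in records)
    return [r for r in records if counts[key(r)] == 1]


# Hypothetical transcript excerpt and name-to-pseudonym mapping.
transcript = "Anna said her sister Beth drove her to the clinic. Annabel was not there."
cleaned = pseudonymize(transcript, {"Anna": "Carol", "Beth": "Dana"})

# Hypothetical variable-based records; tract, cancer type and gender act as
# quasi-identifiers, as in the cancer example above.
records = [
    {"id": 1, "tract": "0101", "cancer": "breast", "gender": "F"},
    {"id": 2, "tract": "0101", "cancer": "breast", "gender": "F"},
    {"id": 3, "tract": "0101", "cancer": "brain", "gender": "M"},
]
at_risk = unique_records(records, ["tract", "cancer", "gender"])
```

In this sketch only names listed in the mapping are replaced, so relational references such as “her sister” and unlisted names such as “Annabel” pass through untouched. That is the kind of gap in simple find and replace that the paragraph above describes.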

Although meticulous data cleaning can remove personal identifiers such as names, the contextual identifiers in individuals’ life stories will remain. This is particularly true for respondents who have faced unusual life events or who are unique in some way, as the case of Rachel illustrates. As such, researchers must also consider whether the specific quotations and examples they present when disseminating research results could lead their respondents to be identified via deductive disclosure. If so, details in the data will need to be modified. As Tolich (2004) notes, the primary concern is whether the people with whom respondents have relationships will be able to identify the respondent given their knowledge of him or her. Inevitably, the researcher takes responsibility for deciding what aspects of a person’s stories or life circumstances need to be changed to maintain confidentiality (Parry & Mauthner, 2004; Wiles et al., 2008). Researchers vary in how much they are willing to change. I changed very few details in my respondents’ quotations (Kaiser, 2006; 2008). Weiss (1994) alters non-essential information, such as a respondent’s specific occupation or the number of children she has to render her unrecognizable to others. Hopkins (1993) creates entirely new “characters” and scenes that are a composite of many people and events she witnessed in her fieldwork but which represent no single person.

However, unlike changing a specific name, changing additional details to render data unidentifiable can alter or destroy the original meaning of the data. For example, in a study of work-family policies, removing or altering details of employer size, industry, policies, and family structure might protect individual and employer identities, but these changes make the data useless for addressing the research questions at hand (McKee et al., 2000; Parry & Mauthner, 2004). Readers are typically unaware of how data has been altered and therefore unable to consider the significance of changes for their interpretations of the data or for the validity of the data (Wiles et al., 2008). Moreover, although little is known about how study participants respond to having their data altered, Corden and Sainsbury (2006) report that respondents have strong feelings about how their words or their personal characteristics are altered in research reports.

As an alternative to altering key details in data, researchers leave data unpublished because of fears that publication will lead to deductive disclosure (Wiles et al., 2008). This is the option I chose with Rachel’s data. It is also the option chosen by Goodwin and colleagues (2003) and by Baez (2002), who decided he could not report the specific examples of discrimination shared by minority faculty members for fear that they would be recognized by their peers and face negative consequences. Losing the insights of a small number of respondents or even of one respondent may be particularly consequential for researchers seeking to impact clinical practice as the experiences of one or two patients can hold key insights for improving clinic care (Karnieli-Miller et al., 2009).

Despite emphasizing the importance of maintaining confidentiality (Grinyer, 2002), the literature on research design and the ethical codes of professional associations offer virtually no specific, practical guidance on disguising respondents’ identities and preventing deductive disclosure in qualitative research (Giordano et al., 2007; Wiles et al., 2008). For example, the United States Code of Federal Regulations (Department of Health and Human Services, 2005) section on the protection of human subjects simply states that researchers and review boards must ensure adequate provisions to protect respondent privacy and maintain confidentiality. The National Institutes of Health Guidelines for the Conduct of Research Involving Human Subjects (2004) mandates that privacy and confidentiality be “maximized.” The ASA Code is more specific about the practical handling of data and confidentiality. Per the ASA Code of Ethics (1999), confidentiality should be addressed with research participants at the beginning of the research relationship. The code also states that “Sociologists have an obligation to protect confidential information and must remove personal identifiers or employ other methods to mask individual identities. In cases where identifying information cannot be removed from the data, sociologists must obtain consent to release such data.” As such the code is calling for obtaining additional consent when data cannot be altered. However, researchers lack a standardized process for obtaining additional consent; therefore, it is unlikely to be obtained. Thus, it is not surprising that researchers perceive maintaining confidentiality as challenging and as an area of great uncertainty (Wiles et al., 2008).

The dominant approach has several weaknesses. First, the dominant approach is designed to ensure what Tolich (2004) calls “external confidentiality,” i.e., confidentiality to the outside world. But this approach does little to ensure that persons with whom respondents have relationships such as spouses, co-workers, or neighbors will be unable to identify respondents. Second, under the dominant approach, researchers carry the burden of deciding which data could identify a respondent and deciding how to alter the data. Third, the dominant approach lacks standardized practices for dialoguing with respondents about confidentiality after the data has been collected. Thus, the approach dissuades researchers from having ongoing discussions with respondents about the use of their data and assumes all respondents want complete confidentiality. Fourth, although assurances of confidentiality function to build trust with respondents, the promise of confidentiality might prevent the researcher from using the rich data received when respondents open up to the researcher. Finally, the dominant approach to confidentiality assumes that details in our data can be changed to protect our respondents without compromising the meaning of the data.

Next, I describe an alternative to the dominant approach. The alternative approach that I propose draws insights from anthropology and emphasizes a greater consideration of the audience for one’s research and a re-envisioned informed consent process. The goal of the alternative approach is to be able to share detailed, rich data while maintaining the essence of the data and respecting our respondents’ perspectives on how their data should be used.

The dominant approach to maintaining confidentiality, while commonly used, is not mandated nor is it the only way of handling data that might identify respondents. The alternative approach provides practical guidelines to reduce the uncertainty surrounding the use of detailed data that might lead to deductive disclosure. The alternative approach addresses the shortcomings of the dominant approach by 1) making respondents better informed of the use of data (i.e., who is the audience for the study results and how will the study results be disseminated), and 2) by instituting practical steps to facilitate dialogue with respondents about how their data can be used (i.e., revising the informed consent process). I discuss these two dimensions of the alternative approach below, followed by a discussion of internal review boards.

Determining Audience and Dissemination Plans

Every project has a number of potential audiences. For example, a sociologist conducting an ethnography on student-teacher interaction in a high school might share her findings with students, parents, staff, the school board, administrators, state or city policy makers, other academics, and the general public. Results can be shared via presentations, drafts read by colleagues, journal articles, radio commentaries, newspaper or magazine articles, and books.

Making assurances of confidentiality (or knowing that you cannot promise confidentiality) is easier when the intended use of the data is clear and specific. As I developed my dissertation topic, wrote my proposal, and gained approval from my university’s ethics review board, I hoped that the work would someday be disseminated as journal articles or as a book. I assumed that I would present my findings at professional conferences. Beyond these vague ideas I had not thought carefully about the outlets for my research. If I had thought clearly about audience, I would have been better equipped to have informed discussions about confidentiality with my respondents. In particular, I could have spoken with Rachel about sharing her views with the physicians and nurses affiliated with the Edgewater Center. As a patient participating in a study on cancer, Rachel may have assumed that I worked closely with her physicians; she might have seen us as one and the same (Karnieli-Miller et al., 2009). Thus, in health research, speaking with respondents specifically about how data will be shared with physicians is especially important. Moreover, because many respondents are driven by a desire to help others, dialoguing with them about the use of their data can help them to grasp the outcomes of their participation (Beck, 2005; Carter et al., 2008; Dyregrov, 2004; Hynson et al., 2006).

Anticipating one’s audience presents challenges. While the formal process of gaining approval to conduct research prompts researchers to consider confidentiality, most often this process emphasizes data storage and cleaning over specific, thoughtful considerations of how data will be disseminated. Thus, the responsibility lies with the researcher to carefully consider future data use. Moreover, anticipating one’s audience challenges the inductive framework of qualitative research (Morse, 2008). The degree to which we can anticipate the use of qualitative data is debatable given the inductive and emergent nature of qualitative inquiry (James & Platzer, 1999; Parry & Mauthner, 2004). “The reflexive nature of qualitative research, its use of unexpected ideas that arise through data collection and its focus upon respondents’ meanings and interpretations renders the commitment to informing respondents of the exact path of the research unrealistic.” (Parry & Mauthner, 2004, p. 146; see also Merrel & Williams, 1994) Nonetheless, most outlets for research can be anticipated. If, however, the data lead us towards different forms of dissemination, it may be necessary to re-contact participants to request permission to use their data in these unanticipated ways. As an additional challenge, stating our specific plans for the data might influence what respondents say or how they behave (Crow et al., 2006; Morse, 2008). However, we can avoid biasing our respondents by discussing the specifics of audience and confidentiality after data collection.

Despite these challenges, serious considerations of audience can improve research. As Weiss (1994) notes, writing for more than one audience in any given piece is difficult. Identifying one’s primary audience can help focus analyses and writing. Weighing potential audiences also leads to a consideration of priorities. Who am I indebted to? What are my goals? Am I striving to further scientific knowledge? Do I hope to impact policy? Or is it my goal to alter clinical practice? Or engage in transformative research (Baez, 2002)? In the case of my research, I readily agreed to present findings to the doctors and nurses out of a feeling of indebtedness to them. As a new researcher, I was grateful to be granted access to patients. My quick concession to work with the doctors and nurses also revealed my priorities. I valued an exchange of knowledge with the hospital staff. But, I had not considered whether I wanted to impact the support services available to women with breast cancer. As a young researcher I had perhaps thought too little about my broader motivations.

Participants as Audience

It is important to remember that our respondents constitute one potential audience. Ellis (1986) assumed that her respondents would not have access to her research findings. In hindsight, Ellis (1995) acknowledged that the problems that followed the publication of Fisher Folk could have been prevented by approaching the respondents with the data she planned to publish, allowing them to know what would become of their data, and making them aware of how they would be portrayed in the final research. In sociology, sharing results with our respondents is not standard practice. In contrast, anthropologists commonly share their research findings with study participants and solicit their feedback. I first became aware of this disciplinary difference while participating in a National Science Foundation multi-discipline ethics workshop (National Science Foundation, 2005). Each member of the workshop wrote an ethics case. The sociologists in attendance drafted cases related to confidentiality concerns. The anthropologists, in contrast, did not seem as plagued by confidentiality concerns and wrote cases on a variety of ethical issues. As the group discussed each case, disciplinary differences in approaches to confidentiality emerged. The sociologists largely followed the dominant approach, assuming that respondents want confidentiality and taking responsibility to edit the data to ensure confidentiality. Their study participants were not given a role in resolving confidentiality dilemmas. The anthropologists, in contrast, viewed confidentiality as the choice of their research participants. They dialogued with research participants to determine if they wanted to remain anonymous or if they would like to be identified in the research. Their approach is reflected in their ethical guidelines (American Anthropological Association, 1998):

Anthropological researchers must determine in advance whether their hosts/providers of information wish to remain anonymous or receive recognition, and make every effort to comply with those wishes. Researchers must present to their research participants the possible impacts of the choices, and make clear that despite their best efforts, anonymity may be compromised or recognition fail to materialize.

Sharing our work with our study participants can be challenging. Our participants might be uninterested, especially in academic writing. Jay MacLeod attempted to share his book, Ain’t No Making It (1995), with some of the young men who were the focus of his ethnography; however, the young men showed little interest in reading his book or discussing it with him. Our respondents might not like how we use their data or how we choose to portray them (Corden & Sainsbury, 2006; Ellis, 1995; Lawton, 2001). Additional challenges arise when respondents are part of a community, rather than individuals with no connection to each other (Ellis, 1995; Hopkins, 1993). As sociologists, we lack standard practices for using respondent input. Do we acknowledge their suggestions and insights in our text? In footnotes? Or not at all? Moreover, researchers who know that respondents will see their work might alter their descriptions of the people and social setting (Hopkins, 1994), which raises questions about the validity of research. However, sharing our conclusions with respondents can also enhance the validity of our research by allowing respondents to comment on the accuracy of our data and interpretations (Maxwell, 1996).

Obtaining Informed Consent

Considering one’s audience is a first step in improving dialogue with respondents about data use and confidentiality. The second step involves modifying the informed consent process. Many of the weaknesses of the dominant approach to confidentiality can be avoided via a re-envisioned informed consent process. A re-envisioned informed consent process should include greater detail about the audience for one’s research, be ongoing, and present respondents with a wider range of confidentiality options.

First, although most consent forms provide general information about what will be done with the data, stating specifically who you plan to share results with allows respondents to make informed choices about the use of their information. Communication about audience can be verbal, as it largely is in anthropology, or part of an informed consent document. Second, discussions about data use and confidentiality should be ongoing. Conversations about consent and confidentiality rarely extend beyond what occurs at the outset of a study as part of internal review board requirements. In fact, respondents may pay little attention to the consent form at the start of data collection as they are anxious to begin (Wiles et al., 2006). Discussions of data use and confidentiality need not be limited to the start of the research relationship. For example, in her study of inpatient hospice patients, Lawton (2001) had to view consent as a process and as under continual reevaluation as the health status of her study participants changed. After data collection, researchers can discuss data outlets with the respondent in light of the data that has been shared. In order to facilitate ongoing discussions, the researcher should first make the respondent aware of the fact that confidentiality might be further discussed at a later point in time. This can be done in the written consent form or verbally before, during or after data collection. Knowing that the respondent is aware of the possibility of follow-up discussions about confidentiality makes re-contacting respondents for this purpose less daunting. Researchers can also facilitate ongoing communication by securing contact information at the outset of the project. Of course, some respondents do not want additional contact; this can be discussed and documented as well.

Third, re-envisioning consent means viewing confidentiality in a more nuanced way and providing a wider range of confidentiality options. A more nuanced view of consent means moving away from the assumption that every respondent desires complete confidentiality and instead recognizing that a research participant might want to receive recognition for some or all of what he or she contributes. By assuming all study participants want complete confidentiality researchers risk becoming paternalistic and denying participants their voice and the freedom to choose how their data is handled (Giordano et al., 2007; Ryen, 2004). Existing literature suggests that respondents, even those typically considered to be “vulnerable,” may want to be identified in our reports. For example, in one study of parents of young adults with cancer, 75% of the parents chose to have their real names used rather than a pseudonym in the resulting article published from the data (Grinyer, 2004).

Appendix A contains an example of a document that could be used to give respondents a wider range of confidentiality options. This post-interview confidentiality form could be presented to respondents at the conclusion of data collection, therefore helping to extend confidentiality conversations beyond the signing of the consent form at the start of data collection. Although I have not used a post-interview confidentiality form in a real study, it serves as an example of a tool that could be used to alleviate the uncertainty surrounding data use. The key feature of the document is that it considers confidentiality in light of the actual data that has been collected. Furthermore, it gives respondents the option to be identified and it allows respondents to pinpoint which pieces of data they feel must be handled most carefully. Thus, the document gives respondents greater control of their data and is therefore consistent with feminist and other nonpositivist paradigms (Karnieli-Miller et al., 2009). The form also removes the burden of deciding how to identify and handle particularly sensitive data from the researcher. Notably, this represents a shift in power (Giordano et al., 2007) that may be uncomfortable for some researchers and for some respondents. As Giordano and colleagues (2007) note, when respondents opt to be identified in research, they need to be made aware that the final presentation of their views may not be entirely what they envisioned and that once something is in print it cannot be changed. Furthermore, respondents should be advised that regardless of their preferences, their data may not appear in final reports.

Introducing a post-interview confidentiality form entails additional work for the researcher. However, as noted above, the payoff comes in having clearer input from the respondent on whether data can be published or shared with others. Using the form could cause respondents to designate much of their data as off-limits for publication. But, some research suggests the opposite may occur—respondents may express a desire to publish data that researchers would have deemed too sensitive for publication. Researchers tend to respond to displays of painful emotion by respondents as an indication that sharing their data would harm them; however, respondents may want this data published since sharing the data makes them feel empowered or feel that they are helping others in some way (Beck, 2005; Carter et al., 2008; Dyregrov, 2004; Hynson et al., 2006; James & Platzer, 1999; Wiles et al., 2006).

Additional discussions and paperwork surrounding consent may overwhelm respondents, tax their patience, or cause them to feel alienated from the researcher (Crow et al., 2006). Thus, researchers must carefully consider the best way to initiate these more elaborate data use conversations with their particular study population and research setting. The guide shown in Appendix A should therefore not be seen as a rigid, one-size-fits-all procedure, but should be adapted as needed to address real ethical dilemmas (Goodwin et al., 2003). Because the alternative approach gives the respondent a more active role, it may work best in longer research relationships where respondents feel comfortable with the researcher (Carter et al., 2008). Using a post-interview confidentiality form also increases the time needed to conduct a study. However, respondents may enjoy chatting and reflecting on the interview. Again, different study populations will be more or less suited for a post-interview confidentiality discussion. However, with all populations, advising study participants up front about the time needed for the study and the format of the study will make such discussions easier.

Presenting the Alternative Approach to Ethics Review Boards

There are several steps researchers can take to facilitate review board acceptance of the alternative approach to confidentiality. First, researchers should present the alternative approach in conjunction with a clear plan for documenting respondent views on data use and for coding and cleaning data to match those views. Second, researchers should emphasize that the alternative approach still assures complete confidentiality as the standard given to every respondent. The alternative approach is consistent with the Belmont Report’s emphasis on beneficence and respect for persons and represents ethical research practices. Third, researchers can stress that using an additional confidentiality form, such as the example shown in Appendix A, increases ethical compliance by giving respondents greater voice, in essence treating them as autonomous agents in accordance with the Belmont Report’s guidelines requiring respect for persons. Fourth, researchers can emphasize that using a post-interview confidentiality form for studies of sensitive topics ensures that respondents have an opportunity to express their views of data use and are comfortable with the confidentiality agreement (Carter et al., 2008). In particular, the third option on the form, which allows respondents to specify particular pieces of their data that should remain confidential, provides an opportunity for researchers to discuss sensitive areas with respondents.
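One way a plan for documenting respondent views and cleaning data to match them might look is sketched below in Python. Everything here is an assumption made for illustration: the three preference categories, the field names, and the rule that restricted segments are withheld entirely are hypothetical and do not reproduce the wording of the form in Appendix A.

```python
from dataclasses import dataclass, field
from enum import Enum


class Preference(Enum):
    """Hypothetical confidentiality options recorded after the interview."""
    CONFIDENTIAL = "confidential"   # default: complete confidentiality
    IDENTIFIED = "identified"       # respondent asks to be named
    RESTRICT = "restrict"           # respondent flags specific segments


@dataclass
class ConsentRecord:
    respondent_id: str
    pseudonym: str
    real_name: str
    preference: Preference = Preference.CONFIDENTIAL
    restricted_segments: set = field(default_factory=set)
    allow_recontact: bool = True    # some respondents decline further contact


def publishable(record, segment_id, text):
    """Return (speaker label, text) for a quotation, or None if it is withheld."""
    if segment_id in record.restricted_segments:
        return None                 # assumption: flagged segments are never quoted
    if record.preference is Preference.IDENTIFIED:
        return (record.real_name, text)
    return (record.pseudonym, text)  # complete confidentiality remains the default


# Hypothetical respondent who flagged one segment as off-limits.
r = ConsentRecord("R07", "Dana", "Jane Doe", Preference.RESTRICT, {"seg-12"})
withheld = publishable(r, "seg-12", "A story about my employer.")
quoted = publishable(r, "seg-03", "The nurses were kind.")
```

Keeping complete confidentiality as the default value mirrors the point that it remains the standard given to every respondent; any departure from it has to be recorded explicitly for that respondent.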