
De-identifying Your Data

This page provides some basic information and links to resources for researchers and data managers on the topic of de-identification of data.

What is data de-identification?

Data de-identification, anonymisation and pseudonymisation are processes for removing identifying information from datasets, most commonly to protect the privacy of individuals. Data de-identification may also be used to protect organisations, such as businesses included in statistical surveys, or other information such as the spatial location of mineral or archaeological finds or endangered species. Data de-identification may be mandated by legislation or ethical guidelines governing research.

The National Statement on Ethical Conduct in Human Research (2009), published by the National Health and Medical Research Council (NH&MRC), does not advocate use of the term ‘de-identified data’, but suggests the term ‘non-identified’ in preference.

This National Statement avoids the term ‘de-identified data’, as its meaning is unclear. While it is sometimes used to refer to a record that cannot be linked to an individual (‘non-identifiable’), it is also used to refer to a record in which identifying information has been removed but the means still exist to re-identify the individual. When the term ‘de-identified data’ is used, researchers and those reviewing research need to establish precisely which of these possible meanings is intended. (http://www.nhmrc.gov.au/publications/ethics/2007_humans/section3.2.htm#c)

Techniques

Identifying information, such as identifiers, names, addresses, gender or date of birth, can be removed from datasets entirely, or coded or encrypted. Information can also be masked by changing data values or by aggregation.
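As an illustration only, the following Python sketch shows how three of these techniques might look for a single record: removal of identifying fields, coding of an identifier, and aggregation of date of birth into an age band. The field names (participant_id, name, address, date_of_birth), the secret key and the ten-year band width are assumptions made for this example and are not drawn from any guideline. Coding with a keyed hash is pseudonymisation: anyone who holds the key can regenerate the codes, so the result is re-identifiable rather than non-identifiable.

```python
import hashlib
import hmac

# Hypothetical field names, chosen for this example only.
FIELDS_TO_REMOVE = {"name", "address"}


def pseudonymise(value: str, secret_key: bytes) -> str:
    """Code an identifier as a keyed hash, so the same input always gives the same code."""
    digest = hmac.new(secret_key, value.encode("utf-8"), hashlib.sha256)
    return digest.hexdigest()[:16]


def age_band(year_of_birth: int, reference_year: int, width: int = 10) -> str:
    """Aggregate a year of birth into an approximate age band such as '30-39'."""
    age = reference_year - year_of_birth
    lower = (age // width) * width
    return f"{lower}-{lower + width - 1}"


def deidentify(record: dict, secret_key: bytes, reference_year: int) -> dict:
    """Return a copy of the record with identifying fields removed, coded or aggregated."""
    # Removal: drop fields that identify a person directly.
    out = {k: v for k, v in record.items() if k not in FIELDS_TO_REMOVE}
    # Coding: replace the original identifier with a keyed hash.
    out["participant_id"] = pseudonymise(str(record["participant_id"]), secret_key)
    # Aggregation: keep only an age band (assumes dates in YYYY-MM-DD form).
    year_of_birth = int(record["date_of_birth"][:4])
    del out["date_of_birth"]
    out["age_band"] = age_band(year_of_birth, reference_year)
    return out


# Fictitious record, for illustration only.
example = {
    "participant_id": "P001",
    "name": "Jane Citizen",
    "address": "1 Example Street",
    "date_of_birth": "1975-03-14",
    "gender": "F",
    "score": 42,
}
result = deidentify(example, secret_key=b"keep-this-key-secure", reference_year=2010)
print(result["age_band"])  # 30-39
```

Which fields to remove, code or aggregate depends on the dataset and on the legislation and ethics approvals that apply; the choices above are placeholders.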

For a brief overview of techniques, see the UK Data Archive resource (but note that different legislation applies in the UK).

The British Medical Journal suggests a minimum standard of data de-identification for authors who are sharing raw, unprocessed data. This is designed to ensure patient privacy when sharing clinical research data. See:
Iain Hrynaszkiewicz, Melissa L Norton, Andrew J Vickers, Douglas G Altman, ‘Preparing raw clinical data for publication: guidance for journal editors, authors, and peer reviewers’, British Medical Journal, 28 January 2010. doi:10.1136/bmj.c181.

Implications for re-use

The purpose of de-identifying data is to allow it to be used by others without the possibility of individuals being identified. The loss of individual identities, however, means that it will not be possible to incorporate the data into other datasets which may include information about the same individuals. For an overview of the potential for sharing data without linking it, see the Australian Bureau of Statistics' A Good Practice Guide to Sharing your Data with Others.

When to de-identify data

The need for data de-identification arises when data is published, shared or re-used. Researchers need to consider the legislation, policies and ethical guidelines that apply to them, as well as any undertakings made to funders or research participants and any informed consent obtained from participants.

If data is only being stored in its original form by the researcher who created it, and is not being shared or published, ethics and privacy requirements are usually met through access control and data security, rather than through data de-identification. Identifiers are usually needed for analysis of research data by the original researcher.

Avoiding re-identification

When de-identifying data, it is important to keep in mind the possibility of re-identification. This usually occurs with large datasets, which can be subject to data mining or other analytical techniques. For a lay guide to some of these issues, see "Anonymized" data really isn't - and here's why not.
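One simple check that is sometimes run before release, and which is not described in the resources on this page, is to count how many records share each combination of indirectly identifying values (the idea behind k-anonymity). A combination held by only one or two records may point to an individual even though names have been removed. The Python sketch below is a minimal illustration; the field names and the threshold k are assumptions made for the example, and passing such a check does not by itself show that a dataset is safe to share.

```python
from collections import Counter


def combination_counts(records, quasi_identifiers):
    """Count how many records share each combination of the given fields."""
    return Counter(tuple(r[q] for q in quasi_identifiers) for r in records)


def rare_combinations(records, quasi_identifiers, k):
    """Return the combinations that occur in fewer than k records."""
    counts = combination_counts(records, quasi_identifiers)
    return {combo: n for combo, n in counts.items() if n < k}


def smallest_group_size(records, quasi_identifiers):
    """Return the size of the smallest group, or 0 if there are no records."""
    counts = combination_counts(records, quasi_identifiers)
    return min(counts.values()) if counts else 0


# Fictitious records with hypothetical field names, for illustration only.
sample = [
    {"age_band": "30-39", "gender": "F", "postcode": "3800"},
    {"age_band": "30-39", "gender": "F", "postcode": "3800"},
    {"age_band": "70-79", "gender": "M", "postcode": "3800"},
]
fields = ["age_band", "gender", "postcode"]

# The single 70-79 / M / 3800 record is flagged because it is unique.
print(rare_combinations(sample, fields, k=2))
```

Records flagged in this way could be aggregated further (for example, with wider age bands or coarser locations) or withheld, depending on the requirements that apply.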

Legislation

De-identification is also affected by legal requirements. In Australia, in addition to the Commonwealth legislation (the Privacy Act 1988), each state and territory (except the ACT) has its own privacy legislation. The Office of the Privacy Commissioner's website offers links to all this legislation, and to other material: http://www.privacy.gov.au

National guidelines

Other resources

Examples of guidelines, or discussion of issues around de-identification (this is not a comprehensive list):

Feedback

We are keen to get feedback on this page. If you have any comments, constructive criticism, suggestions for improvements or additions, please send them to guides@ands.org.au.

 
Australian National Data Service, Monash University, Victoria 3800, Australia: Telephone +61 3 9902 0585; Facsimile +61 3 9902 0599