Data management planning
Level: awareness
Who should read this?
This document is likely to be of particular interest to researchers and research administrators who are responsible for preparing a data management plan for a research project or an institution.
What is data management planning?
Data management includes all activities associated with data other than the direct use of the data. It may include:
- data organisation;
- backups;
- archiving data for long-term preservation;
- data sharing or publishing;
- ensuring security of confidential data; and
- data synchronisation.
A data management plan is a document that describes what data will be created, what policies will apply to the data, who will own and have access to the data, what data management practices will be used, what facilities and equipment will be required, and who will be responsible for each of these activities.
Why do I need a data management plan?
The carrot: Data management in some form is an unavoidable consequence of working with data. As it is an activity that is not rewarded in any way, it makes sense to do the job with as little time, effort and cost as possible. In practice, data management is typically done at the last minute, using the first method that comes to mind, and this approach is usually time-consuming and error-prone. Taking time at the start of a research project to put in place robust, easy-to-use data management procedures will usually pay off several times over in the later stages of the project. To sum up: a good plan improves the efficiency of your work and the protection, quality and exposure of your data.
The stick: Basic data management is required by the Australian Code for the Responsible Conduct of Research. Compliance with the Code is already a requirement for ARC and NHMRC funding and is likely to be mandated by other funding bodies, government and institutions in the near future.
Inadequate data management can also lead to catastrophes like the loss of data or the violation of people's privacy.
What does a data management plan need to cover?
The following list of topics can be treated as a checklist:
- Survey of existing data: What existing data will need to be managed?
- Data to be created: What data will your project create?
- Data owners & stakeholders: Who will own the data created, and who would be interested in it?
- File formats: What file formats will you use for your data?
- Metadata: What metadata will you keep? What format or standard will you follow?
- Access & security: Who will have access to your data? If the data is sensitive, how will you protect it from unauthorised access?
- Data organisation: How will you name your data files? How will you organise your data into folders? How will you manage transfers and synchronisation of data between different machines? How will you manage collaborative writing with your colleagues? How will you keep track of the different versions of your data files and documents?
- Storage: Where will your data be stored? Who will pay for the hardware? Who will manage it?
- Backups: This is probably the single most important item on this list. Hard drives on desktop and laptop computers fail regularly. You must have a credible strategy of regular backups, and of course you must then follow it. Include an off-site backup if you can, so that your data will not be lost if your building burns down. Rather than relying on memory, consider an automated backup process.
- Bibliography management: What bibliography management tools will you use? How will you share references with the other members of your group?
- Data sharing, publishing and archiving: What data will you share with others? How will you do this?
- Destruction: What data will you destroy? When? How?
- Responsibilities: Who will be responsible for each of the items in this plan?
- Budget: What will this plan cost? Possible costs include hardware for backups and research assistant time for data curation, metadata creation and archiving.
- Anything else: Don't restrict yourself to the items above. Stop and think. What is missing from this list? (And if you think of something, please let us know so that we can include it in the next version of this document.)
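The Backups item above recommends an automated process rather than one that relies on memory. As an illustration only, here is a minimal Python sketch of one way to do that: it copies a project folder into a new timestamped folder and then checks every copy against its original using SHA-256 checksums. The function names (sha256_of, verify_copy, backup_tree) and the folder naming pattern are assumptions made for this example, not part of any institutional tool or policy.

```python
import hashlib
import shutil
from datetime import datetime
from pathlib import Path
from typing import List, Tuple


def sha256_of(path: Path) -> str:
    """Return the SHA-256 hex digest of a file, reading it in 1 MiB chunks."""
    digest = hashlib.sha256()
    with path.open("rb") as handle:
        for chunk in iter(lambda: handle.read(1 << 20), b""):
            digest.update(chunk)
    return digest.hexdigest()


def verify_copy(source: Path, target: Path) -> List[Path]:
    """Return the relative paths of files that are missing or differ in target."""
    mismatches = []
    for original in sorted(source.rglob("*")):
        if original.is_file():
            relative = original.relative_to(source)
            copy = target / relative
            if not copy.is_file() or sha256_of(original) != sha256_of(copy):
                mismatches.append(relative)
    return mismatches


def backup_tree(source: Path, backup_root: Path) -> Tuple[Path, List[Path]]:
    """Copy source into a new timestamped folder under backup_root, then verify it.

    The folder name pattern <name>_<YYYY-MM-DD_HHMMSS> is a hypothetical
    convention chosen for this sketch.
    """
    stamp = datetime.now().strftime("%Y-%m-%d_%H%M%S")
    target = backup_root / f"{source.name}_{stamp}"
    shutil.copytree(source, target)  # fails if target already exists
    return target, verify_copy(source, target)
```

A script that calls backup_tree could be run on a schedule by the operating system's task scheduler, with backup_root pointing at a different device or an off-site location. A non-empty list of mismatches should be treated as a failed backup. Each run makes a full copy, so for large data sets a dedicated backup tool is likely to be a better fit than this sketch.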
Other issues to consider
Funding bodies and governments are moving rapidly to require sound data management. You have a responsibility to make yourself aware of any relevant codes and to comply with them. Failure to comply with requirements from funding bodies like the ARC or NHMRC may jeopardise future research funding. Failure to comply with legal requirements, such as those that safeguard the privacy of participants in medical research, may lead to prosecution.
Different disciplines have different conventions. In order to facilitate cooperation, you should make sure that your data management is compatible with the prevailing standards in your discipline. (This mostly applies to file formats and metadata standards.)
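To make the point about metadata standards concrete, the following minimal Python sketch writes a small metadata record next to a data file and refuses to do so if agreed fields are missing. The field names are borrowed from the Dublin Core element set purely as an example; your discipline may prescribe a different standard. The choice of which fields are required, the ".meta.json" suffix and the function names are all assumptions made for this illustration.

```python
import json
from pathlib import Path
from typing import Dict, List

# Hypothetical project rule: these Dublin Core element names must be filled in.
REQUIRED_FIELDS = ("title", "creator", "date", "format", "rights")


def missing_fields(record: Dict[str, str]) -> List[str]:
    """Return the required field names that are absent or blank in record."""
    return [
        name
        for name in REQUIRED_FIELDS
        if not str(record.get(name, "")).strip()
    ]


def write_sidecar(data_file: Path, record: Dict[str, str]) -> Path:
    """Write record as JSON beside data_file and return the new file's path.

    Raises ValueError if any required field is missing, so incomplete
    metadata is caught when the data is saved rather than years later.
    """
    missing = missing_fields(record)
    if missing:
        raise ValueError("metadata record is missing: " + ", ".join(missing))
    sidecar = data_file.with_name(data_file.name + ".meta.json")
    sidecar.write_text(
        json.dumps(record, indent=2, sort_keys=True), encoding="utf-8"
    )
    return sidecar
```

Keeping the record in a plain, widely readable format such as JSON means it can later be converted to whatever schema a repository or archive requires.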
Further information
ANU Data Management Manual. http://ilp.anu.edu.au/dm/ANU_DM_Manual_v10.09.17-63_2010-09-17.pdf
ANU Information Literacy Program, data management training page, with links to resources including a data management plan template. http://ilp.anu.edu.au/dm/
Monash University Research Data Management web site. http://www.researchdata.monash.edu.au/
ARC Funding Agreement for Discovery Projects. http://www.arc.gov.au/ncgp/dp/dp_fundingagreement.htm (In the agreement for 2010 projects, data management requirements are in Section 20, on page 18.)
Australian Code for the Responsible Conduct of Research. http://www.nhmrc.gov.au/publications/synopses/r39syn.htm
Thanks to Mark Euston, who wrote the ANU Data Management Manual and Template cited above, on which this document is based.
V2




