Important Aspects of Managing Data
Data Security (Security/Backup [Purdue University Libraries] 2015)
Physical security and computer security of data must be considered in good data management. While it is encouraged to make scientific data available to the public, sometimes confidential or sensitive information must be kept secure. Keep lab notebooks safe and secure as well!
i. Network Security
- Keep confidential data off the Internet
- Put sensitive materials on computers not connected to the Internet
ii. Physical Security
- Restrict access to buildings and rooms where computers or media are kept
- Only let trusted individuals troubleshoot computer problems
iii. Computer Systems & Files
- Keep virus protection up to date
- Don’t send confidential data via e-mail or FTP; if you must send data, use encryption
- Use passwords on files and computers
Data Backup (Security/Backup [Purdue University Libraries] 2015)
Making backups of collected data is critically important in data management and the data lifecycle. Backups protect against human errors, hardware failure, virus attacks, power failure, and natural disasters. Backups can help save time and money if these failures occur. Don't keep backups in the same place!
Use the 3-2-1 rule:
- 3 copies of your data - 2 copies are not enough
- 2 different formats - e.g., hard drive + tape backup, or DVD (short term) + flash drive
- 1 off-site backup - have 2 physical backups and one in the cloud
Backup storage options include:
- Hard drives - personal or work computer
- Departmental or institution server
- External hard drives
- Tape backups
- Disciplinary archives (repositories)
- Cloud storage
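The copy-and-verify step behind the 3-2-1 rule can be sketched in a few lines of Python. This is a minimal illustration, not a recommended backup tool: the function names `sha256_of` and `back_up` and the destination folders are hypothetical. In practice the destinations would be different media (for example an external drive and a synced cloud folder) rather than folders on one disk.

```python
import hashlib
import shutil
from pathlib import Path


def sha256_of(path: Path) -> str:
    """Return the SHA-256 hex digest of a file, reading it in 1 MiB chunks."""
    digest = hashlib.sha256()
    with path.open("rb") as handle:
        for chunk in iter(lambda: handle.read(1024 * 1024), b""):
            digest.update(chunk)
    return digest.hexdigest()


def back_up(source: Path, destinations: list[Path]) -> dict[str, bool]:
    """Copy `source` into each destination folder and verify every copy.

    Returns a mapping from the path of each copy to True if its checksum
    matches the checksum of the original file.
    """
    expected = sha256_of(source)
    results = {}
    for folder in destinations:
        folder.mkdir(parents=True, exist_ok=True)
        # copy2 also preserves timestamps and returns the new file's path.
        copy_path = Path(shutil.copy2(source, folder))
        results[str(copy_path)] = sha256_of(copy_path) == expected
    return results
```

Calling `back_up(Path("results.csv"), [Path("/mnt/external"), Path("/mnt/cloud_sync")])` (hypothetical paths) would leave the original plus two verified copies, which gives the three copies the rule asks for.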
i. Short Term Storage—NevadaBox
For short-term storage, data can be kept in the cloud on NevadaBox. NevadaBox (box.unr.edu) is a file storage and sharing system hosted by Box.com. It is similar to Dropbox, OneDrive, Google Drive, and ownCloud. NevadaBox comes with the following benefits:
- No sign-up required. If you have a NetID, you have an account
- Secure sign-in with your NetID
- All current University employees have access
- Free apps that sync files between mobile devices, computers, and Box
- Ability to share and collaborate with others within and outside the University
- Ability to store sensitive data
- 50 GB storage limit for all individuals
More information can be found at unr.edu/it/data-web-and-cloud-systems/box
ii. Existing Free repositories
Open Access Directory for Data Repositories
Depending on the research area, data can often be deposited in one or more open repositories, which then provide access to your data. The Open Access Directory maintains a list of available repositories and databases, sorted by research area.
NSF Public Access Repository
The NSF Public Access Repository (NSF-PAR) is the designated repository where NSF-funded investigators deposit peer-reviewed, published journal articles and juried conference papers. NSF-PAR also provides search mechanisms to enable you to find and use these articles and papers. Download FAQs on NSF-PAR (PDF).
iii. University ScholarWorks Repository
The ScholarWorks repository is an open-access, public repository available to University faculty. It serves as a place to archive research datasets and publications for the long term.
University researchers who are interested in depositing their data in this public repository should contact the repository administrator (Rohit Patil) by email at email@example.com. After reviewing the data, an intellectual property librarian will determine whether you have the rights to deposit your research material or whether we need to ask the rightsholder for permission. For each item submitted to the repository, the rightsholder must agree to the non-exclusive ScholarWorks repository license.
University ScholarWorks Repository License (IUScholarWorks Repository License [Indiana University Libraries] 2015)
By signing and submitting this license, you (the creator or copyright owner) grant to University of Nevada, Reno a non-exclusive, perpetual, irrevocable right to reproduce, translate (as defined below), and/or distribute your submission (including the abstract) worldwide in print and electronic format and in any medium, including but not limited to audio or video.
You agree that University of Nevada, Reno may, without changing the content, translate the submission to any medium or format, now known or later developed, for preservation or access, and provide basic metadata that describes the contents for discovery.
You also agree that University of Nevada, Reno may keep more than one copy of this submission for security, back-up and preservation.
You represent that the submission is your original work, and that you have the right to grant the rights contained in this license. You also represent that your submission does not, to the best of your knowledge, infringe upon anyone's copyright.
If the submission contains material for which you do not hold copyright, you represent that you have obtained the unrestricted permission of the copyright owner to grant University of Nevada, Reno the rights required by this license, and that such third-party owned material is clearly identified and acknowledged within the text or content of the submission.
If the submission is based upon work that has been sponsored or supported by an agency or organization other than University of Nevada, Reno, you represent that you have fulfilled any right of review or other obligations required by such contract or agreement.
University of Nevada, Reno will clearly identify your name as the creator and/or copyright owner of the submission, and will not make any alterations, other than as allowed by this license, to your submission. We agree to not make available any files that are embargoed until the embargo has expired.
If you are submitting this item on behalf of the rightsholder, you must have the rights owner's written permission to accept this license on his/her behalf.
i. Data Sharing Essentials (Strategies for Data Sharing [Cornell University] 2015)
Sharing data makes it possible to conduct synthetic and comparative studies, to validate research results, and to reuse data for teaching and further research. Funders seek to maximize the impact of the research they fund by encouraging or requiring data sharing.
See also Best practices: Sharing data.
A data sharing plan should:
- Describe how the data will be made available (via a disciplinary data center or repository, an institutional repository, as supplementary material supporting a publication, or other strategy).
- Include a description of file formats to be used for the data that will be shared. Select file formats for sharing and archiving that maximize the potential for reuse and longevity, and describe the plans for conversion to those formats, if necessary.
- Include a plan for creating metadata to describe the data. Indicate who will create metadata and when they will do so. Identify the standards that will be used. If no applicable standards exist, indicate this in the data management plan and describe what supplementary documentation you will make available to make publicly shared data understandable and usable by others.
- Describe how users will discover the data (via a specific repository, references in publications, project website, Internet search engines, or other means).
- Describe how users will obtain the data (direct download, registration and download, upon request).
- If acquiring data from another source, describe whether the data or derived versions of the data will be shared, and under what conditions.
- If data will not be made immediately available, indicate when data will be shared.
- Indicate who will have primary responsibility for the data and who owns the data (for sponsored research at Cornell, the university is usually considered the owner; see Introduction to intellectual property rights in data management for more information).
- Describe how your data sharing strategy will maximize the value of the data to the audiences of interest (a particular research community, the general public, etc.).
ii. Sharing with collaborators while working on the project
University researchers interested in sharing ongoing project data with off-campus researchers or collaborators can deposit the data on NevadaBox and follow the procedures described at Research Data Sharing to provide secure access to their non-public data.
iii. Public sharing after completion of project
University researchers who want to archive their published research materials and make them publicly available can deposit the data in the University repository. To get started, visit the University of Nevada, Reno ScholarWorks institutional repository.
File Naming Conventions (File Naming Conventions [Carnegie Mellon University Libraries] 2015)
Data set titles should be as descriptive as possible. These data sets may be accessed many years in the future by people who will be unaware of the details of the project. Data set titles should contain the type of data and other information such as the date range, the location, and the instrument used.
Examples of bad titles are:
- The Aerostar 100 Data Set
- Respiration Data
Some great titles are:
- SAFARI 2000 Upper Air Meteorological Profiles, Skukuza, Dry Seasons 1999-2000
- NACP Integrated Wildland and Cropland 30-m Fuel Characteristics Map, U.S.A., 2010
- Global Fire Emissions Database, Version 2 (GFEDv2.2)
In order for others to use your data, they must fully understand the contents of the data set, including the parameter names, units of measure, formats, and definitions of coded values. Parameters, units, and other coded values may be required to follow certain naming standards as defined in experiment plans and the destination archive.
File names should reflect the contents of the file and include enough information to uniquely identify the data file. File names may contain information such as:
- project acronym
- study title
- year(s) of study
- data type
- version number
- file type
Select a consistent format that can be read well into the future and is independent of changes in applications. If your data collection process used proprietary file formats, convert those files into a stable, well-documented, and non-proprietary format to maximize others' abilities to use and build upon your data.
Examples of bad file names:
Good file name examples:
From data set NACP New England and Sierra National Forests Biophysical Measurements: 2008-2010
Sevilleta_LTER is the project name
NM is the state abbreviation
2001 is the calendar year
NPP represents Net Primary Productivity data
csv stands for the file type (ASCII comma-separated values)
Avoid very long file names: aim for no more than 64 characters. Avoid spaces as well; instead of "data May2011", use "data_May2011" or "data-May2011".
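The naming guidance above can be checked mechanically. The sketch below is illustrative only: `build_file_name` and `check_file_name` are hypothetical helpers, and joining components with underscores is an assumption about the separator, not a rule stated in the guidance.

```python
import re

MAX_LENGTH = 64  # upper bound suggested in the guidance above


def build_file_name(project: str, location: str, year: int,
                    data_type: str, extension: str) -> str:
    """Join the name components with underscores and append the file type."""
    parts = [project, location, str(year), data_type]
    # Replace any run of whitespace inside a component with one underscore.
    cleaned = [re.sub(r"\s+", "_", part.strip()) for part in parts]
    return "_".join(cleaned) + "." + extension.lstrip(".")


def check_file_name(name: str) -> list[str]:
    """Return the problems found in a file name; an empty list means none."""
    problems = []
    if len(name) > MAX_LENGTH:
        problems.append(f"longer than {MAX_LENGTH} characters")
    if " " in name:
        problems.append("contains spaces")
    return problems


name = build_file_name("Sevilleta_LTER", "NM", 2001, "NPP", "csv")
print(name, check_file_name(name))
print(check_file_name("data May2011.csv"))
```

Building names from a fixed set of components keeps every file in a project consistent, and running the check before deposit catches spaces or overlong names early.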
Metadata (Metadata [Carnegie Mellon University Libraries] 2015)
Metadata is data that provides descriptive information (content, context, quality, structure, and accessibility) about a data product and enables others to search for and use the data product. In a lab setting, much of the content used to describe data is initially collected in a notebook; metadata is a more formal, sharable expression of this information. It can include content such as contact information, geographic locations, details about units of measure, abbreviations or codes used in the dataset, instrument and protocol information, survey tool details, provenance and version information, and much more. Where no appropriate formal metadata standard exists, writing “readme” style metadata for internal use is an appropriate strategy.
The Digital Curation Centre provides a catalog of common metadata standards, organized by discipline.
Some specific examples of metadata standards, both general and domain-specific, are:
- Dublin Core
Domain agnostic, basic and widely used metadata standard
- DDI (Data Documentation Initiative)
Common standard for social, behavioral and economic sciences, including survey data
- EML (Ecological Metadata Language)
Specific for ecology disciplines
- ISO 19115
For describing geospatial information
- FGDC-CSDGM (Federal Geographic Data Committee's Content Standard for Digital Geospatial Metadata)
For describing geospatial information
- MINSEQE (MINimal information about high throughput SEQuencing Experiments)
- FITS (Flexible Image Transport System)
Astronomy digital file standard that includes structured, embedded metadata
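As a rough illustration of readme-style metadata, the sketch below stores a record whose keys are restricted to the 15 element names of the Dublin Core Metadata Element Set. The helper `make_record`, the choice of JSON as the file format, and every field value are assumptions made for illustration; the values are placeholders and do not describe a real dataset.

```python
import json

# The 15 element names of the Dublin Core Metadata Element Set.
DUBLIN_CORE_ELEMENTS = {
    "title", "creator", "subject", "description", "publisher",
    "contributor", "date", "type", "format", "identifier",
    "source", "language", "relation", "coverage", "rights",
}


def make_record(**fields: str) -> dict[str, str]:
    """Build a metadata record, rejecting keys that are not Dublin Core elements."""
    unknown = set(fields) - DUBLIN_CORE_ELEMENTS
    if unknown:
        raise ValueError(f"not Dublin Core elements: {sorted(unknown)}")
    return dict(fields)


# Placeholder values only (hypothetical project and researcher).
record = make_record(
    title="Example Project Soil Respiration, Site A, 2011",
    creator="Example Researcher",
    date="2011",
    type="Dataset",
    format="text/csv",
    description="Placeholder description of variables, units, and methods.",
)

# Serialize the record so it can be saved next to the data files.
readme_text = json.dumps(record, indent=2, sort_keys=True)
print(readme_text)
```

Keeping the record in a plain-text, non-proprietary format next to the data means it can still be read when the software used to create the data is gone.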
Make sure all data generated and/or collected is easy to understand and analyze. If someone were to look at the data in 20 years, they should be able to understand what was done and why. Stable, non-proprietary software should be used. Documentation may include, but is not limited to:
- lab notebooks
- methodology reports
- codebooks with full variable and value labels
- documenting decisions about software
- tracking changes to different versions of the dataset
- recording assumptions made during analysis
Research Data Policies
i. NevadaBox
The NevadaBox data storage policies can be found at NevadaBox Data Policies.
ii. University ScholarWorks Repository
- Storage Policy
Each project will be allotted a starting storage space of 5 GB to store datasets, reports, publications, and other digital content. If additional storage is needed, it can be purchased for $5 per additional GB.
- Data retention
Data will be archived and available in the University ScholarWorks repository for public access for a period of 5 years. If the funding agency requires the datasets to be available for longer, additional storage time can be purchased at $1 per GB for each additional year.
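For budgeting, the storage and retention fees above can be combined into a rough estimate. The sketch below rests on two assumptions the policy text does not settle: that the per-year retention fee applies to every GB stored (not only the purchased GB), and that fractional GB are billed proportionally. The function name `storage_cost` is hypothetical; confirm the actual billing rules with the repository administrator.

```python
FREE_STORAGE_GB = 5           # starting allotment per project
EXTRA_STORAGE_USD_PER_GB = 5  # price of each additional GB
FREE_RETENTION_YEARS = 5      # default public-access period
EXTRA_YEAR_USD_PER_GB = 1     # price per GB for each additional year


def storage_cost(total_gb: float, retention_years: int) -> float:
    """Estimate the cost in USD of storing `total_gb` for `retention_years`."""
    extra_gb = max(0, total_gb - FREE_STORAGE_GB)
    extra_years = max(0, retention_years - FREE_RETENTION_YEARS)
    storage_fee = extra_gb * EXTRA_STORAGE_USD_PER_GB
    # Assumption: the yearly fee is charged on all stored GB.
    retention_fee = total_gb * extra_years * EXTRA_YEAR_USD_PER_GB
    return storage_fee + retention_fee


# 12 GB kept for 8 years: 7 extra GB * $5 + 12 GB * 3 extra years * $1
print(storage_cost(12, 8))  # 71
```

Under these assumptions, a project that stays within 5 GB and 5 years pays nothing.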