Preservation of data is the responsibility of every scientist. However, hard drives crash, files are misplaced, specific details required to use data are forgotten, and even the most conscientious scientists may move, retire, or die. Fortunately, technological advances and the advent of publicly available archives have made long-term data preservation easier and more reliable.
The many benefits of data archiving have been detailed repeatedly (see Whitlock 2011 for a comprehensive essay and list of citations that highlight the need for, and utility of, ubiquitous data archiving policies). Data archiving allows for more responsible, reproducible, and transparent science. It allows for the fuller use of data, including follow-up research, meta-analyses, research using analyses not available or considered when data were collected, development of new methods, and use in teaching. It increases the opportunities for acknowledging the contribution of scientists who collected the data, as the data themselves become citable entities. It encourages implementation of improved quality control and data assurance standards, ultimately improving the quality of the resulting findings. Archived data, saved for posterity in usable form and made available to scientists beyond the original collectors, can continue to contribute to our understanding and scientific development for decades—consider the value of baseline data for detecting long-term trends. Finally, papers with archived data have more impact, evidenced by the fact that they are more often cited by other scientists (Piwowar et al. 2007).
Perhaps the best example of a public data archive that is already of great utility to the scientific community is GenBank (http://www.ncbi.nlm.nih.gov/genbank), a comprehensive database that contains publicly available nucleotide sequences for more than 380,000 organisms, obtained primarily through submissions from individual laboratories and large-scale sequencing projects. The near-universal use of GenBank for DNA sequence data is in large part due to the communal decision by journals to archive all DNA sequence data. Public archives like GenBank are available for a variety of specialized data, such as phylogenetic trees (TreeBASE; http://www.treebase.org), microarrays (GEO; http://www.ncbi.nlm.nih.gov/geo), and vegetation plots (VegBank; http://www.vegbank.org), to name a few. Recently, new data repositories have been developed that provide a more flexible framework for eclectic datasets, such as the National Science Foundation-sponsored Dryad archive (http://www.datadryad.org) and the Knowledge Network for Biocomplexity (http://knb.ecoinformatics.org). The communal adoption of data archiving policies by journals, this time for all data types, not just sequence data, will hopefully lead to near-universal implementation of comprehensive data archiving in the peer-reviewed literature.
As foreshadowed in a previous editorial (Wenburg 2010), the Journal of Fish and Wildlife Management has officially adopted a data archiving policy, similar to many other top ecology and evolutionary biology journals and funding agencies (e.g., American Naturalist; Evolution; Molecular Ecology; Journal of Evolutionary Biology; Heredity; Molecular Biology and Evolution; Ecological Society of America; National Science Foundation; National Institutes of Health; Natural Environment Research Council; Biotechnology and Biological Sciences Research Council; see Whitlock 2011 for details and citations). In keeping with the suggested language of the Joint Data Archiving Policy (Whitlock et al. 2010), the data archiving policy for the Journal of Fish and Wildlife Management is effective immediately and states the following:
The Journal of Fish and Wildlife Management requires, as a condition for publication, that data supporting the results in papers published be provided either directly in the paper, in the associated supplemental materials (electronic files that provide information associated with a paper; Internet links to these files are given in the published paper), or archived in an appropriate public archive. Data are important products of the scientific enterprise, and they should be preserved and usable for decades in the future. Exceptions, especially for sensitive information such as human subject data or the location of endangered species, and short-term embargoes, may be granted at the discretion of the Editor-In-Chief.
All data required to recreate results in the paper (not necessarily all the data from the overall project, although authors are encouraged to be as comprehensive as possible) should be provided. Data should be in the form of individual data points used in the study, not just summaries such as means. As described by Whitlock (2011, p. 63), “The data should be archived at a level ready for a statistical analysis program, and should be given at the individual level. For example, a behavior trial in a Y-maze might have been videotaped; the archived data should record the choice made by the fish and (if relevant) the time taken, but not necessarily the movie file itself.” As another example, genetic studies using microsatellites should provide individual genotypes, but the electronic images used to determine the scores (i.e., electropherograms) need not be provided. In determining exactly which data should be archived, authors should take care to provide enough information so that someone else, unknown to them and with no other knowledge of their research, can reproduce the results in 10 or 20 years, perhaps more. To accomplish this, it is important to provide necessary methodological details not provided in the paper, including definitions for all terms, variables, row and column headings, and precise locations. A short readme file that accompanies the data is often the best way to convey this information. Nonproprietary file formats are preferred, such as text files and comma-delimited text, as opposed to Microsoft® Word and Excel files, respectively. There are other recommended standard formats for metadata that can further aid interpretation of the work (e.g., http://knb.ecoinformatics.org/software/eml/, http://repositories.lib.utexas.edu/recommended_file_formats). The authors of all papers published in this issue of the Journal of Fish and Wildlife Management have complied with the data archiving policy.
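As an illustration only, the recommendations above (individual-level records, a comma-delimited text file, and a short readme defining every column) might be sketched in Python as follows. The column names, the example records, and the file names data.csv and README.txt are hypothetical and are not prescribed by the policy; authors should adapt them to their own datasets and chosen archive.

```python
import csv
from pathlib import Path

# Hypothetical column definitions for an illustrative Y-maze trial dataset.
COLUMNS = {
    "fish_id": "Unique identifier assigned to each fish",
    "arm_chosen": "Arm of the Y-maze the fish entered (left or right)",
    "time_to_choice_s": "Time from release to choice, in seconds",
}

# Made-up example records: one row per individual trial, not summary means.
RECORDS = [
    {"fish_id": "F001", "arm_chosen": "left", "time_to_choice_s": 12.4},
    {"fish_id": "F002", "arm_chosen": "right", "time_to_choice_s": 8.9},
    {"fish_id": "F003", "arm_chosen": "left", "time_to_choice_s": 15.1},
]


def write_archive(records, columns, out_dir):
    """Write records to a comma-delimited file plus a plain-text readme."""
    out = Path(out_dir)
    out.mkdir(parents=True, exist_ok=True)

    # Individual-level data in a nonproprietary, comma-delimited format.
    data_path = out / "data.csv"
    with data_path.open("w", newline="", encoding="utf-8") as fh:
        writer = csv.DictWriter(fh, fieldnames=list(columns))
        writer.writeheader()
        writer.writerows(records)

    # Readme that defines every column heading used in the data file.
    readme_path = out / "README.txt"
    lines = ["Column definitions for data.csv", ""]
    lines += [f"{name}: {desc}" for name, desc in columns.items()]
    readme_path.write_text("\n".join(lines) + "\n", encoding="utf-8")

    return data_path, readme_path
```

Calling `write_archive(RECORDS, COLUMNS, "archive")` would produce the two files in a folder named archive. A real readme would normally also describe methods, units, and precise locations that are not covered in the paper.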
Authors are required to agree to this policy during the online submission process. Authors should provide details in a cover letter on how they plan to submit their data for review and how they plan to have them appear in the final publication. We allow authors to choose whether they want to provide the necessary data (not already included directly in the paper) as supplemental materials or in public archives, or both. However, we encourage the use of public archives because they provide more search features, are more easily cited, and provide data in more usable formats than supplemental materials. For those public archives that have the technology to make the data available during the review process, authors should provide details on how reviewers are to access those data, including all website addresses and security codes. Some archives do not yet allow data to be submitted before acceptance of the paper. If authors intend to use such archives, they should provide the data upon submission as supplemental materials, for review only, and detail their intentions for eventual archiving of the data if the paper is accepted. Any exceptions to the data archiving policy must be granted by the Editor-In-Chief before submission and documented in a cover letter.
To simplify the data archiving process for authors choosing to use the Dryad database (http://www.datadryad.org), we have integrated our manuscript submission process with their system. Dryad offers a wide array of formats and is ideal for a variety of idiosyncratic datasets. However, authors are free to choose where to archive their data. For authors choosing to submit some or all of their data to Dryad, the acceptance letter for their paper will include website links to the proper location in Dryad, where much of the author and paper information required will be automatically transferred for them, streamlining the process.
The processes and repositories for data archiving are not perfect and are evolving quickly. We will continue to adapt to the available technology, as well as the needs of the scientific community and our authors. Our goal is to take advantage of technology to maintain an efficient, common-sense approach to maximize the usability of data, without being too onerous or prescriptive. As scientists and authors, you know your data and what it takes to reproduce your results better than anyone else, so your acceptance of, and good-faith participation in, this policy are critical to making it a success. We urge authors to embrace this effort and work with us for the greater good of the scientific conservation community.