The types of research data receiving scholarly credit within and across the science, engineering, and mathematics fields


Hyoungjoo Park


This study examined the types of data that receive formal scholarly credit within and across the science, engineering, and mathematics (SEM) fields. Whether data types are used in a way that encourages data reuse has not been actively studied. Because formal data citation is a relatively new area, this study applied an exploratory method. The Data Citation Index (DCI) of the Web of Science (WoS) was selected because it provides a single access point to 400 data repositories worldwide across multiple disciplines. Nearly all citations were of quantitative data. The types that received the most credit were, in descending order, ribonucleic acid (RNA), crystal structure, protein sequence data, crystallographic data, Sequence Read Archive (SRA), genomic, images, nucleotide sequencing information, molecular structure, and crystallographic information, though citation patterns varied across the disciplines within these fields. Notably, qualitative data received no scholarly credit. This study contributes to a better understanding of data types for data reuse.



Article Details

How to Cite
Park, H. (2023). The types of research data receiving scholarly credit within and across the science, engineering, and mathematics fields. Malaysian Journal of Library and Information Science, 28(1), 1–14.

