Session 2 - Collect and Create
Differences in the etymological roots of the terms data and capta make the distinction between constructivist and realist approaches clear. Capta is “taken” actively while data is assumed to be a “given” able to be recorded and observed. From this distinction, a world of differences arises. Humanistic inquiry acknowledges the situated, partial, and constitutive character of knowledge production, the recognition that knowledge is constructed, taken, not simply given as a natural representation of pre-existing fact. (Drucker, 2011)
Description
The collection and creation stage of the lifecycle is the first moment when both the CARE and FAIR principles are critical. Research data is created or collected to answer a question or to provide new or additional information, but this work isn’t done in a vacuum: the implications for the “subject(s)”, the data creator/collector(s), and the data user/re-user must be evaluated. Documentation is also an important component of this stage of the lifecycle. Without proper and full documentation of the data collected or created, discovery is limited, it is difficult for other researchers to reuse or reproduce the data, and, importantly, the context is lost.
Questions
- Have you considered which method(s) of data collection will gather the relevant data without also gathering unnecessary data, or data that may be useful for a future research project but falls outside the scope of the current research and the scope of consent?
- If gathering consent from a potential participant, have you thoroughly explained what kind of data will be collected, how it will be collected, and how it will be potentially used?
- Is there an option to opt out of certain activities or moments of participation?
- When organizing collected data, is there documentation explaining and/or supporting the reasoning or method for organizing?
- If conventional/traditional methods of documentation fail to adequately describe the content and context of the data, how will you go about finding the best way to document or explain the gap/issue with the available documentation method(s)?
- Are there specific protocols or vocabularies that could be used?
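One possible way to keep answers to the questions above alongside the data is a small, structured documentation record. The following is a minimal sketch in Python, not a standard or a required format: the `CollectionRecord` class, its field names, and the example values are all hypothetical and would need to be adapted to the project, its community agreements, and any protocols or vocabularies in use.

```python
from dataclasses import dataclass, field, asdict
import json


@dataclass
class CollectionRecord:
    """Hypothetical documentation record kept alongside a dataset."""
    title: str
    collection_method: str = ""       # how the data was gathered
    consent_scope: str = ""           # what participants agreed to
    opt_out_options: list = field(default_factory=list)  # activities participants may decline
    organization_rationale: str = ""  # why the data is organized the way it is
    vocabularies: list = field(default_factory=list)     # protocols or vocabularies used
    known_gaps: str = ""              # where available documentation methods fall short


def missing_fields(record):
    """Return the names of fields that have not been filled in yet."""
    return [name for name, value in asdict(record).items() if not value]


# Illustrative values only; not drawn from a real project.
record = CollectionRecord(
    title="Oral history interviews (example)",
    collection_method="Semi-structured interviews, audio recorded",
    consent_scope="Use in the current project only",
)

# List what still needs to be documented, then save the record as JSON text.
print(missing_fields(record))
print(json.dumps(asdict(record), indent=2))
```

A record like this does not replace fuller approaches such as the datasheets and data cards listed under Resources; it only makes it easier to notice which contextual details have not yet been written down.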
Resources
- Leon, Sharon M., “The Peril and Promise of Historians as Data Creators: Perspective, Structure, and the Problem of Representation,” in Journal of Slavery and Data Preservation 4, no. 5, 1–17 (December 2023)
- Brown, N. M., Mendenhall, R., Black, M., Moer, M. V. et al., “In Search of Zora/When Metadata Isn’t Enough: Rescuing the Experiences of Black Women Through Statistical Modeling,” in Journal of Library Metadata, 19(3–4), 141–162 (August 2019)
- Henrietta Lacks, the Tuskegee Experiment, and Ethical Data Collection: Crash Course Statistics #12 (April 2019) [YouTube video]
- Gebru, T., Morgenstern, J., Vecchione, B., Vaughan, J. W. et al., Datasheets for Datasets (March 2018)
- Pushkarna, M., Zaldivar, A., & Kjartansson, O., Data Cards: Purposeful and Transparent Dataset Documentation for Responsible AI (April 2022)