Do Your Data Management and Curation Practices Support Data ...

Harvard University, Boston, MA, February 3, 2017

Do your data management and curation practices support data reuse?

Ixchel M. Faniel, PhD
Research Scientist, OCLC Research
@imfaniel

Dissemination Information Packages for Information Reuse (DIPIR)

The DIPIR Project was made possible by a National Leadership Grant from the Institute of Museum and Library Services, LG-06-10-0140-10, Dissemination Information Packages for Information Reuse, and support from OCLC Online Computer Library Center, Inc., and the University of Michigan.

HOW DOES CONTEXTUAL INFORMATION SERVE TO MEANINGFULLY COMMUNICATE DATA TO REUSERS?

Data collection information
Sometimes they'll simply declare we were only interested in broad-based information. We were only collecting broad-based artifacts... So, they're walking huge tracts of land, but they're only hitting big things... I've heard of things like shoulder surveys, where they literally walk side by side and pick those little things, but then, again, you've only, you're doing a very narrow tract. So there are procedures.
- Archaeologist 01

Artifact Information
I actually mean what strata it's from. I was talking about the importance of having a clear stratigraphy. And so, if they had labeled stratigraphy, let's say, A, B, C, D, E, and if they're comparing the fauna from E to A, that tells me that when they excavated, they were really careful about preserving that information.
- Archaeologist 06

Repository Information
I don't give [the archaeological repository] a sort of blanket trust that all the data in there is correct... they provide enough metadata for me to check that on my own... I sort of trust going there because I know that I can find the information I need to validate it.


STANDARDIZATION
Everything is turned into intentionally difficult codes... hundreds of lines you have to translate. It was really important to streamline that translation process.
- Data Producer 3

INTEGRATION
I don't know what kind of format the datasets had before they got integrated, but I believe there was a lot of work.
- Data Reuser 9

Repositories can't reverse producer practices

DATA DOCUMENTATION
Just so you have an idea of the issue with tooth wear... the following [seven] sites all record a single field Payne Wear... [four datasets] all code each tooth in a separate field with a Payne number. But they don't come up with a letter code for the entire specimen... The remaining datasets don't provide Payne Stages.
- Repository Staff 2

Data sharing influence on repositories & reuse

DATA CONDITION
It took 10 times longer to deal with those [coded] datasets.
- Repository Staff 2

DATA PRODUCERS SELECTION
I did think quite carefully about [including] those big subjective descriptions we write about the units..., but I decided to... I couldn't really be sure if people would necessarily want them out but they are an important part of the data set.
- Data Producer 10

Data reuse influence on repositories & sharing

REPOSITORY PROCEDURES
There are some inherent issues with CSV... the simplicity is why it is preferred for interoperability and longevity... we need to give users a few tips on working with CSV. I'm also looking into other open spreadsheet formats.
- Repository Staff 1

DATA PRODUCERS DOCUMENTATION
I had a completely different recording system for [teeth, now I'm] just using... Payne.
- Data Reuser 6
[I'm] just dropping numeric codes, not doing... numeric codes anymore.
- Data Reuser 10

Next Steps
SLO-data

Some Conclusions
Stakeholders' and reusers' needs must be considered throughout the data lifecycle
How do we do it?
Research data management takes time and effort
Who should bear the burden?
How can we lessen the burden?


Institute of Museum and Library Services
Co-PI: Elizabeth Yakel (University of Michigan)
Partners: Nancy McGovern, Ph.D. (MIT), Eric Kansa, Ph.D. (Alexandria Archive, Open Context), William Fink, Ph.D. (University of Michigan Museum of Zoology), Sarah Whitcher Kansa (Alexandria Archive, Open Context)
OCLC Fellow: Julianna Barrera-Gomez
Doctoral Students: Rebecca Frank, Adam Kriesberg, Morgan Daniels, Ayoung Yoon
Masters Students: Alexa Hagen, Jessica Schaengold, Gavin Strassel, Michele DeLia, Kathleen Fear, Mallory Hood, Annelise Doll, Monique Lowe
Undergraduates: Molly Haig

Select References

Faniel, Ixchel M., and Elizabeth Yakel. 2017. Practices Do Not Make Perfect: Disciplinary Data Sharing and Reuse Practices and Their Implications for Repository Data Curation. In Curating Research Data Volume 1: Practical Strategies for Your Digital Repository, 103-126. Chicago, IL: Association of College and Research Libraries Press.

Frank, R., Yakel, E., & Faniel, I. M. (2015). Destruction/reconstruction: Preservation of archaeological and zoological research data. Archival Science, 15(2), 141-167. doi: 10.1007/s10502-014-9238-9

Frank, R. D., Kriesberg, A., Yakel, E., & Faniel, I. M. (2015). Looting hoards of gold and poaching spotted owls: Data confidentiality among archaeologists & zoologists. Proceedings of the Association for Information Science and Technology (ASIS&T), 52.

Kriesberg, A., Frank, R., Faniel, I., & Yakel, E. (2013). The role of data reuse in the apprenticeship process. Proceedings of the Association for Information Science and Technology (ASIS&T), 50.

Faniel, I., Kansa, E., Whitcher Kansa, S., Barrera-Gomez, J., & Yakel, E. (2013). The challenges of digging data: A study of context in archaeological data reuse. Proceedings of the Joint Conference on Digital Libraries (JCDL), 295-304.

Yakel, Elizabeth, Ixchel Faniel, Adam Kriesberg, and Ayoung Yoon. 2013. Trust in Digital Repositories. International Journal of Digital Curation 8 (1): 143-56. doi:10.2218/ijdc.v8i1.251.

Daniels, M., Faniel, I., Fear, K., & Yakel, E. (2012). Managing fixity and fluidity in data repositories. In Mai, J. (Ed.), Proceedings of the 2012 iConference (pp. 279-286). New York: ACM.

Additional references for the DIPIR project:

Critical Perspectives on the Practice of Digital Archaeology

Thank you

Ixchel M. Faniel, PhD
Research Scientist
@imfaniel
