Do you ever wonder what happens to the vast treasure trove of data on which researchers rely for some of their most startling discoveries? Most of it goes "dark" and is never seen again after a research project is over.
CHE is reporting that researchers at the University of North Carolina at Chapel Hill are leading an effort to create a one-stop shop for data sets that would otherwise be lost to the public once the papers they were produced for are published. The goal of the project, called DataBridge, is to extend the life cycle of so-called dark data. It will serve as an archive for data sets and metadata, and will group them into clusters of information to make relevant data easier to find.
The hope is that eventually researchers from around the country will submit their data after publishing their findings.
Ideally, this is a great way to share data that is often very time-consuming and expensive to extract.
Ultimately, the researchers are also interested in including another type of “dark data”: archives of social-media posts. For example, the group has imagined creating algorithms to sort through tweets posted during the Arab Spring, for researchers studying the role of social media in the movement.
And in some cases, the project could serve as a model for libraries at research institutions that are looking to better track data in line with federal requirements.
But as commenters on the article noted, there are issues with reusing data sets. What about authenticity and ownership rights?
To that end, librarians tried to get involved with the MLA beta repository, which allows its members to post such data sets, as well as blog posts and conference papers; to assign a suitable license to them; and to receive a DOI for them (thereby going some way toward solving the authenticity and ownership issues). It was developed in collaboration with the librarians at Columbia's CDRS, and while only members can deposit their work, anyone can view and download it.
This is a good start toward organizing data in a searchable format. Disruptive technologies such as the Internet of Things are capable of producing vast data sets, and librarians should be at the forefront of sifting through them.