Tuesday, March 3, 2015

Wrangling The Wild Web Through Deliberate Searching

The Washington Post recently shared an article about a historian who tried to use the wild-web (aka the Internet archive) to do historical research and mostly failed.

As the researcher described, "[historians] use anything we can to get a view of how humans behaved in the past. In the 21st century, the web gives us a unique window onto society. Never before has humanity produced so much data about public and private lives – and never before have we been able to get at it in one place."

The British Library and Institute of Historical Research created a research project to mine this data. The researchers were "among the first in the world to use the web archive for academic research." They expected that searching the web archive would be as simple as a Google search: "[S]ince we could navigate Google reasonably easily, we thought we could use the archive in the same way. Do a search. Get a group of webpages on a particular subject. Read them. Draw some conclusions. How hard could it be?" As the researchers soon learned: "Very."

The way that researchers must search the web archive is very different from, "say, the Library of Congress. There (and elsewhere), professional archivists have sorted and cataloged the material." In a traditional archive, "[i]f the archivist has chosen to keep [the documents], they’re probably of interest to [the researcher]. With the internet, we have everything. Nobody has – or can – read through it. And so what is 'relevant' is completely in the eye of the beholder." In other words, there is no librarian doing the behind-the-scenes heavy lifting.

If researchers want to truly search the wild-web instead of merely using Google as the gatekeeper, they must take new approaches to the data. They have to know how to use deliberate search techniques so that they understand what they are searching and what results a search will generate. For example, "[s]maller samples of Web sites, specifically chosen for their historical importance" may be used. Similarly, "much more focused searches on smaller time periods, more marginal topics, or specific cultural groups can produce a more manageable 'corpus' for reading and manipulating in the same way we would on our trips to traditional archives."

This is where the new role of the librarian comes in. Librarians must teach the deliberate search techniques that make this type of in-depth research possible. We need to explain to our students why they are retrieving certain results and show them how to use various search techniques effectively to make research manageable.

As the researcher in the Washington Post article explained, "[t]his mass of data we have, far from rendering the [Internet] archive unintelligible, may give us richer and more fruitful answers. We just need to work out the right questions to ask."

