Tuesday, December 1, 2015

Bias in Machine Reading & Artificial Intelligence

In August, The Wall Street Journal ran an interesting article on social bias in web technology (sub. req'd). The article noted that [w]hile automation is often thought to eliminate flaws in human judgment, bias—or the tendency to favor one outcome over another, in potentially unfair ways—can creep into complex computer code. Programmers may embed biases without realizing it, and they can be difficult to spot and root out. The results can alienate customers and expose companies to legal risk. Computer scientists are just starting to study the problem and devise ways to guard against it.

One common error is endemic to a popular software technique called machine learning, said Andrew Selbst, co-author of “Big Data’s Disparate Impact,” a paper to be published next year by the California Law Review. Programs that are designed to “learn” begin with a limited set of training data and then refine what they’ve learned based on data they encounter in the real world, such as on the Internet. Machine-learning software adopts and often amplifies biases in either data set.

In other words, machine learning deals with designing and developing algorithms to evolve behaviors based on empirical data. One key goal of machine learning is to be able to generalize from limited sets of data (paraphrased). Machine learning is the specific capability to "adapt to new circumstances and to detect and extrapolate patterns."

This differs from artificial intelligence in that AI encompasses other areas apart from machine learning, including knowledge representation, natural language processing/understanding, planning, robotics, etc.

When it comes to the social bias embedded in web technology, [t]ake recent research from Carnegie Mellon that found male Web users were far more likely than female users to be shown Google ads for high-paying jobs. The researchers couldn’t say whether this outcome was the fault of advertisers—who may have chosen to target ads for higher-paying jobs to male users—or of Google algorithms, which tend to display similar ads to similar people. If Google’s software notices men gravitating toward ads for high-paying jobs, the company’s algorithm will automatically show that type of ad to men, the researchers said.
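The feedback loop the researchers describe can be illustrated with a toy simulation. This is only a sketch under made-up assumptions: the function name, the starting click rates, and the update rule are all hypothetical, and none of it represents Google's algorithm or the Carnegie Mellon team's method. It simply shows how a small initial difference in clicks can grow once a system shows more of an ad to whichever group clicked on it more.

```python
def simulate_ad_feedback(rounds=10, learning_rate=0.5):
    """Toy model: each round, the share of high-paying job ads shown to a
    group is nudged toward that group's share of the observed clicks."""
    # Hypothetical starting point: both groups see the ad equally often,
    # but one group clicks slightly more often than the other.
    show_share = {"men": 0.5, "women": 0.5}
    click_rate = {"men": 0.11, "women": 0.10}

    history = [dict(show_share)]
    for _ in range(rounds):
        # Expected clicks are proportional to exposure times click rate.
        clicks = {g: show_share[g] * click_rate[g] for g in show_share}
        total = sum(clicks.values())
        for g in show_share:
            observed = clicks[g] / total
            # Move the exposure share part of the way toward the click share.
            show_share[g] += learning_rate * (observed - show_share[g])
        history.append(dict(show_share))
    return history


if __name__ == "__main__":
    for round_number, shares in enumerate(simulate_ad_feedback(rounds=5)):
        print(round_number, round(shares["men"], 3), round(shares["women"], 3))
```

In this toy model the gap widens every round even though nobody ever tells the system to treat the two groups differently, which is the kind of unintended amplification the article warns about.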

From this work, an emerging discipline known as algorithmic accountability is taking shape. These academics, who hail from computer science, law and sociology, try to pinpoint what causes software to produce these types of flaws, and find ways to mitigate them. Researchers at Princeton University’s Web Transparency and Accountability Project, for example, have created software robots that surf the Web in patterns designed to make them appear to be human users who are rich or poor, male or female, or suffering from mental-health issues. The researchers are trying to determine whether search results, ads, job postings and the like differ depending on these classifications.

One of the biggest challenges, they say, is that it isn’t always clear that the powerful correlations revealed by data-mining may be biased. Xerox Corp., for example, quit looking at job applicants’ commuting time even though software showed that customer-service employees with the shortest commutes were likely to keep their jobs at Xerox longer. Xerox managers ultimately decided that the information could put applicants from minority neighborhoods at a disadvantage in the hiring process.

This is an important consideration as we start to rely more and more on machine learning and artificial intelligence to do our thinking for us. While we should be using machine learning to augment our intelligence rather than to replace our analysis, if machine learning is resulting in biased data for our decision making, it can lead to disastrous results. And if we start to rely on machines to do our thinking for us, there is no system of checks and balances. Kudos to the academics focusing on this area.

Monday, November 30, 2015

Nextgen Wayback Machine Slated For 2017

As a big fan of the Internet Archive's Wayback Machine (blogged here, here, and here), I was excited to hear that there is a nextgen Wayback Machine in the works.

The Wayback Machine, a service used by millions to access 19 years of the Web’s history, is about to get an update. When completed in 2017, the next generation Wayback Machine will have more and better webpages that are easier to find.

Today, people’s work, and to some extent their lives, are conducted and shared largely online. That means a portion of the world’s cultural heritage now resides only on the Web. And we estimate the average life of a Web page is only one hundred days before it is either altered or deleted.  “The Internet Archive is helping to preserve the world’s digital history in a transformational way,” said Kelli Rhee, LJAF Vice President of Venture Development. “Taking the Wayback Machine to the next level will make the entire Web more reliable, transparent and accessible for everyone.”

Project goals include:

– Highlighting the provenance of pages found in the Wayback Machine. Hundreds of organizations and individuals participate in building web collections at the Internet Archive. Patrons will be able to see the partner that selected websites or webpages for collection by the Internet Archive.

– Rewriting the Wayback Machine code. This will enable us to improve reliability and functionality.

– Optimizing the scope and quality of pages we crawl. We now capture about 1 billion pages per week. This project will help us improve what we capture.

– Improving the playback of media-rich and interactive websites. Supporting new formats while maintaining older ones is a key challenge for keeping as many webpages visible as possible.

– Updating the user interface. Making it easier for patrons to discover archived websites and learn from our digital history.  

– Finding websites based on keywords. While indexing all of the pages in the Wayback Machine is beyond what we can do, we will index homepages of websites so that patrons won’t have to enter specific URLs to dive into the Wayback Machine.

– Partnering with other services to repair broken links by pointing to the Wayback Machine. For example, we are working with the Wikimedia Foundation to identify broken links in Wikipedia sites and replace them with links to archived pages from the Wayback Machine.
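The link-repair goal above can be sketched in a few lines. The sketch assumes the Wayback Machine's public URL pattern, https://web.archive.org/web/<timestamp>/<original URL>, where the service resolves the timestamp to the closest available capture. The function names, the sample timestamp, and the hand-supplied list of dead links are hypothetical, and this is not the code the Archive or the Wikimedia Foundation actually uses.

```python
WAYBACK_PREFIX = "https://web.archive.org/web"


def wayback_url(original_url, timestamp="20151130000000"):
    """Build a Wayback Machine URL for a page.

    The timestamp uses the YYYYMMDDhhmmss form; the default value here is
    an arbitrary example, not a known capture date.
    """
    return f"{WAYBACK_PREFIX}/{timestamp}/{original_url}"


def repair_links(text, dead_links, timestamp="20151130000000"):
    """Replace each known-dead URL in the text with an archived equivalent.

    Naive on purpose: it assumes no dead link is a prefix of another and
    that the list contains no duplicates.
    """
    for url in dead_links:
        text = text.replace(url, wayback_url(url, timestamp))
    return text


if __name__ == "__main__":
    sample = "See http://example.com/old-page for details."
    print(repair_links(sample, ["http://example.com/old-page"]))
```

A real tool would first have to detect which links are broken and confirm that a capture exists; the sketch leaves both steps out.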

Please help us make the Wayback Machine better by sending suggestions for features and capabilities you would like to see to info@archive.org.

This is a huge undertaking from the Internet's biggest archive. As mentioned, please email the Archive with suggested features or capabilities to make the nextgen service even better.

Tuesday, November 24, 2015

Polish Copyright Law Accomplishes What US Is Trying To Do Via Litigation

The Polish have figured something out that the United States can't seem to get right. While the United States is slowly allowing digitization of print after long, drawn-out litigation (e.g., HathiTrust and Google Books), the Polish have revised their copyright law to account for the digitization of materials.

The new Polish Copyright Act enters into force on 20th November 2015 bringing library services in Poland into the twenty-first century.

Major new provisions enabling digitization for socially beneficial purposes, such as education and preservation of cultural heritage, are the centrepiece for libraries of the new law.

The law also implements a European Directive enabling the use of orphan works (in-copyright works where the copyright holder cannot be identified or found to obtain permission), and an EU Memorandum of Understanding on the use of works that are no longer commercially available. In addition, the introduction of public lending right is limited to works in public libraries.

As a result, library services in Poland can be said to have entered the twenty-first century. Crucially, the library community participated for the first time in high-level policy discussions on copyright, and librarians became recognized as important stakeholders in a national reform process.

This is a wonderful step forward to allow access to information and potentially pave the way for a global library. United States - take note!

Monday, November 23, 2015

Libraries In The Year 2100

Libraries have been around for a very long time, and they will continue to be around for a lot longer, albeit in a different form than what we are used to seeing today.

So what will libraries look like in 85 years? Jim O'Donnell from Slate put it into perspective:

That’s not so very far away. The next time you see a tiny baby, bear in mind that she or he has a very good chance of living to see the 22nd century. What will the world of libraries look like then? Nobody can know—but perhaps we can talk about what libraries should be in that imaginable future.

O'Donnell posits three variations of libraries in the future:

1. One Global Library: 

Once an encyclopedia or a book or a journal or a database is in digital form, there is no good reason why it should not be made as universally and freely available as possible, and no good reason why it should not be centrally held and maintained. Right now, major university libraries harbor knowledge riches galore, astonishing things, really—and we cannot share them. Most people who live on the planet today are unable to have access to sources of knowledge that, from a technical point of view, could be reached on their smartphones today—literally today, within the next hour of the moment you read this, if the provider made the choice to allow the access.

If that has to change, it will change. We will see the consolidation of collections and a consolidation of the technical infrastructure of presenting those collections. (Oh, there will be redundancy and backups, just as there is now for things like Google searches, hosted on many servers in many locations, transparently sharing the load. Such distribution speeds service and improves the resilience in case of disaster or emergency.) And we will see the emergence of business models for paying for what we now think of as “publishing” that allow completely free and open access to the contents of this global library.

2. Many Small Libraries: 

Physical collections will all be what we now call “special collections”: unique materials they possess uniquely because of where they are and what their history might be. Readers will still make their way to [the various] libraries to see whatever unique collections they have, but readers will also find in those places much of what they now go there to find: intelligent people engaged in the work of knowledge and the work of community. Librarians will be there as coaches, mentors, guides, facilitators, and other members of the public will be there as knowledge-seekers, knowledge-sharers, entrepreneurs of the spirit, and entrepreneurs of the world of business. Libraries are the ideal “third place” for a free society and will never lose that powerful attraction.

3. No Libraries:

We could also lose libraries to hubris and shortsightedness. “We don’t need libraries any more; it’s all digital”—we’ve all heard some version of that peremptory dismissal, entirely worthy to be heard on the stage of a debate among presidential candidates.

But we do need libraries. In a world of superabundant information, they curate and collect and discriminate and care for the good stuff—the stuff really smart people have worked to create and preserve, the stuff you can rely on when you want to understand the world deeply and accurately, the stuff too complicated to come into existence by crowdsourcing, too unpopular to be foisted on us by corporations or politicians. Librarians—smart, professional, dispassionate about everything but the truth—are the Jedi knights of our culture’s future and deserve to be respected for that.

If we let ourselves be taken in by techno-optimism and carelessness and if we then let libraries fade away, we will be in a poorer place. There are many historical explanations offered for the disappearance of the great ancient library of Alexandria, but my personal judgment is that it did not fall victim to Julius Caesar or Christian monks or Islamic warriors. Libraries are more likely to disappear because the responsible leaders of a community deprive them of support, take them for granted, treat them dismissively.

Like O'Donnell, I aspire to something closer to numbers 1 & 2, but I fear that number 3 will happen anyway. If we are already at a point of the public questioning a library's existence, then what will it be like when the algorithms are finding material and thinking for us? My hope is that we will stay sophisticated enough to realize that access to reputable, unbiased information is the key to an informed citizenry and a truly democratic society. 

Friday, November 20, 2015

Google Truth Rankings: Vetting or Gatekeeping?

Salon is reporting on a proposed Knowledge-Based Trust score that Google might implement to keep "bad information" at bay.

Google could launch an effort to keep trolls and bad information at bay, with a program that would rank websites according to veracity, and sort results according to those rankings. Currently, the search engine ranks pages according to popularity, which means that pages containing unsubstantiated celebrity gossip or conspiracy theories, for example, show up very high.

New Scientist’s Hal Hodson reports on the proposed Knowledge-Based Trust score:

The software works by tapping into the Knowledge Vault, the vast store of facts that Google has pulled off the internet. Facts the web unanimously agrees on are considered a reasonable proxy for truth. Web pages that contain contradictory information are bumped down the rankings.
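As a rough illustration of the idea Hodson describes, a trust score can be thought of as the share of a page's factual claims that agree with a reference store of facts. The sketch below is a toy under stated assumptions: the vault dictionary, the (subject, attribute) keys, the sample pages, and the scoring rule are all hypothetical stand-ins, not Google's actual method.

```python
def trust_score(page_claims, vault):
    """Fraction of a page's checkable claims that match the reference store.

    Returns None when the page makes no claim the store can check.
    """
    checkable = [(key, value) for key, value in page_claims.items() if key in vault]
    if not checkable:
        return None
    agreeing = sum(1 for key, value in checkable if vault[key] == value)
    return agreeing / len(checkable)


def rank_pages(pages, vault):
    """Order page names from highest to lowest score; unscored pages go last."""
    scores = {name: trust_score(claims, vault) for name, claims in pages.items()}
    return sorted(
        scores,
        key=lambda name: (scores[name] is not None, scores[name] or 0.0),
        reverse=True,
    )


if __name__ == "__main__":
    # Hypothetical reference facts and pages, for illustration only.
    vault = {("Paris", "country"): "France", ("Tokyo", "country"): "Japan"}
    pages = {
        "accurate": {("Paris", "country"): "France", ("Tokyo", "country"): "Japan"},
        "mixed": {("Paris", "country"): "France", ("Tokyo", "country"): "China"},
    }
    print(rank_pages(pages, vault))
```

Note that the score is only as good as the reference store: a claim that contradicts the store is penalized whether or not the store itself is right.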

Vetting for truth is a good thing. It seems as though people will believe anything on the Internet so long as it gets enough views or shares. Because Google's current algorithm takes popularity into consideration, there is a chance for "bad information" to rise to the top of the search results, which has a circular effect of causing more people to believe in the truth of the story.

However, technology acting as a gatekeeper to information is not so good, particularly when it comes to science, where what was a "truth" yesterday can be "bad information" today. We need the ability to vet information for ourselves and to bring information together in a way that creates breakthroughs. If we have a machine do this for us, we lose the ability to use our critical thinking skills and make connections between various sources of information.

Vetting information is such an integral part of the research process. And it often leads to new ideas. I'm not so sure that this is a function that we want to relieve ourselves of in favor of artificial intelligence for many reasons.

Tuesday, November 17, 2015

AALL Rebranding Initiative

The American Association of Law Libraries (AALL) is currently investigating rebranding the name of the Association.

AALL's comprehensive, Association-wide rebranding initiative is steadily moving forward. At its November 7 meeting, the AALL Executive Board voted unanimously to recommend to the membership a new name, "Association for Legal Information." This is our opportunity to redefine and reinvigorate the value of the law librarians and legal information professionals and to shape the brand to align with and support our strategic goals.

From the FAQs:

Why the name Association for Legal Information?

Association: An organization of people who work for a common purpose (legal information).

For: With the object or purpose of (legal information).

Legal Information: Knowledge concerning a particular subject.

Why is AALL undertaking a branding project?

AALL, its members, and the legal profession have undergone significant changes in recent years. Rapid advances in technology, the proliferation of information, and the compression of the legal profession have transformed what it means to be a law librarian. As physical law libraries shift to virtual information hubs, new skills and expertise are required. 

Today, 51 percent of AALL members do not have “librarian” in their titles, and 57 percent work in an organization that does not have “library” in the name. AALL has a tremendous opportunity to be at the forefront of this change. 

This project allows us to clarify who we are and what we do, and to tell the story about our work and profession in a way that makes it clear and compelling. 

My first question: why is AALL spending up to $185,000 on this? We should stop worrying about how others perceive us and let our actions speak for us. We should use this money, instead, to explore comprehensive consortiums that take into account the fact that most law libraries are starting to license the majority of their information from publishers. We need a plan in place to deal with the inevitable roadblocks that we will face, etc.

Even if we do rebrand, why ALI? Who are we trying to kid, here? "Librarian," in today's world, can mean legal information professional. We don't have to be originalists. The meaning of our name can evolve. AALL has such a strong, rich identity. It was founded in 1906. When I talk about "AALL" with my law school colleagues, they know what I mean. I don't feel good about saying that I am a member of ALI (even if pronounced "ally"). I already feel like I spend a lot of time defending what I do, and I don't feel like explaining this new ALI affiliation - as in, "no, I am not a member of the American Law Institute."

Ultimately, I chose this profession because I want to be a law librarian - one who is cutting edge and deals in the legal information profession.

The AALL member discussion board on the topic seems to reiterate our unease with this change. And, in some ways, this plays into the stereotype of librarians as old curmudgeons who are averse to change. Well, in this case, it may just be true - although the consensus seems to be that we are not opposed to a change, rather, just this particular change.

What if the members do not vote for the name change?

The rebranding project will proceed to phase two, creative development, using the name American Association of Law Libraries. A new visual identity and comprehensive messaging will continue to be developed and implemented.

Phase two of the project should have been phase one. Now let's move forward.

For a great post on why the name should change, see Dewey B. Strategic.

Monday, November 16, 2015

NELLCO & MALLCO Webinars: Nerd Know How Series

NELLCO & MALLCO are hosting a "Nerd Know-How Series" with Beth Ziesenis.

Author Beth Ziesenis is a technology expert who speaks to 60-plus groups a year about the best free and bargain apps and online resources that will help you Release Your Inner Nerd to become more organized, efficient and awesome at work and home.

Each 90-minute webinar is based on one chapter of Beth's most-recent book, Nerd Know How: The 27+ Best Apps for Work and How to Use 'Em! In each session, attendees will learn about low- or no-cost technology tools to help you maximize your efficiency and effectiveness in each of the 8 areas. Attendees can register for one or all of the sessions by clicking on the register now button below.

The eight sessions are:

Webinar 1: Organize - Tuesday, December 1 - The webinar series kicks off with the building blocks of organization. Learn how to get your ducks in a row with in-depth looks at Dropbox, Evernote, IFTTT and LastPass.

Webinar 2: Collaborate - Thursday, December 17 - How can you play well with others? This webinar has the answer. You'll discover collaboration tools such as Trello, ScheduleOnce, Join.me and Zoom.

Webinar 3: Share - Tuesday, January 12 - Everyone suffers from information overload, so you need to be clear and precise when you communicate. This webinar focuses on four tools that help you share information with technology: Jing, Adobe Reader, Issuu and Prezi.

Webinar 4: Design - Monday, January 25 - Graphic designers have talent that many of us don't. This webinar gives you tools that will put the power of someone else's genius into your hands. Join us for facts about Pixlr, Canva, 123RF and Dafont.

Webinar 5: Create - Tuesday, February 2 - Building on the Design Webinar, you'll learn more about how to create professional-quality graphics and multimedia for next to nothing with Animoto, Piktochart and Tagxedo.

Webinar 6: Travel - Friday, February 19 - If you've ever found yourself in an airport taxi trying to dig out the reservation confirmation for the hotel you need to go to, you need this webinar. We'll look at travel tools such as TripIt and Waze that get you where you need to go without the hassle.

Webinar 7: Outsource - Tuesday, March 1 - So maybe you have attended all the webinars in this series, and you're thinking, "How am I going to find time to put these apps into action?" The answer... outsource! This webinar shares services that let you outsource tasks and projects without breaking the bank. Focus: Fancy Hands, Fiverr and Upwork (formerly Elance).

Webinar 8: Google - Tuesday, March 22 - For the final webinar in the series, you'll discover free Google services that can streamline your organization, email, collaboration, research and much, much, MUCH more.

Thursday, November 12, 2015

Legal Writing Institute: Program On Teaching International Law Students

As my first semester teaching legal research and writing to international LL.M. students comes to a close, I am excited to review my course materials to improve for the next go-round.

The Global Legal Writing Skills Committee of the Legal Writing Institute provides invaluable resources for teaching research and writing to international law students.

In addition to its bibliography of resources, the committee has a video series on Teaching International Law Students that is hosted by Michigan State University College of Law.

At the website you will find presentations such as:

  • Adapting Classroom Techniques & Materials for International LL.M. Students

The beauty of teaching is that you can review and revise and incorporate new methods for comprehension each time. These presentations will be invaluable as I contemplate ways to make my class even better.