Library of Congress Debuts Newspaper Navigator to Search Historical Newspaper Images
Usually when searching historic newspapers for genealogy, we look for birth, death, and marriage announcements and similar articles. But now, through a machine learning experiment at the Library of Congress (LOC), you can search over 1.5 MILLION historic newspaper images for free. The name of the tool and project? Newspaper Navigator.
Newspaper Navigator and Chronicling America
The Chronicling America collection of over 17 million historic American newspaper pages, dating back to 1789, now has a new way to search images, including photographs and advertisements. Here’s how it works, according to the LOC press release of 15 September 2020:
- The user begins by entering a keyword that returns a selection of photos. Then the user can choose photos to search against, allowing the discovery of related images that were previously undetectable by search engines.
- The text of the newspapers is searchable thanks to optical character recognition technology, but users looking for specific images previously had to page through the individual issues. Through the creative ingenuity of Innovator in Residence Benjamin Lee and advances in machine learning, Newspaper Navigator now makes the images in these newspapers searchable by visual similarity.
- The Newspaper Navigator dataset consists of extracted visual content for 16,358,041 historic newspaper pages in Chronicling America. The visual content was identified using an object detection model trained on annotations of World War I-era Chronicling America pages, including annotations made by volunteers as part of the Beyond Words crowdsourcing project.
- The dataset also includes text corresponding to the visual content, identified by extracting the optical character recognition (OCR) text within each predicted bounding box. For example, if the visual content recognition model predicted a bounding box around a headline, the corresponding textual content provides a machine-readable version of the headline; likewise, for a photograph, illustration, or map, this textual representation often contains the title and caption.
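To make the bounding-box idea a little more concrete, here is a toy Python sketch of pulling OCR words out of a predicted box. It assumes each OCR word comes with simple pixel coordinates (left, top, width, height) and keeps a word if its center falls inside the box; the `Word` class, `text_in_box` function, and sample coordinates are all hypothetical and are not the Library of Congress's actual data format or code.

```python
from dataclasses import dataclass


@dataclass
class Word:
    """Hypothetical OCR word with a pixel-coordinate box (not LOC's real format)."""
    text: str
    x: float  # left edge
    y: float  # top edge
    w: float  # width
    h: float  # height


def text_in_box(words, box):
    """Join the OCR words whose centers fall inside box = (x0, y0, x1, y1)."""
    x0, y0, x1, y1 = box
    inside = [
        wd.text
        for wd in words
        if x0 <= wd.x + wd.w / 2 <= x1 and y0 <= wd.y + wd.h / 2 <= y1
    ]
    return " ".join(inside)


# Invented sample page: two headline words and one word from the article body.
words = [
    Word("GREAT", 100, 50, 80, 20),
    Word("FIRE", 190, 50, 60, 20),
    Word("Yesterday", 100, 300, 90, 12),
]
headline_box = (90, 40, 260, 80)  # a box the detection model might predict
print(text_in_box(words, headline_box))  # GREAT FIRE
```

Checking word centers rather than requiring a word to sit entirely inside the box is one simple way to tolerate slightly loose boxes; the real pipeline may use a different rule.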
The following types of visual content are part of the Newspaper Navigator dataset:
- Photograph
- Illustration
- Map
- Comics/Cartoon
- Editorial Cartoon
- Headline
- Advertisement
Using Newspaper Navigator for Genealogy Research
I’ll admit, my first time using Newspaper Navigator was not easy, at least not when it came to getting the best results and taking full advantage of the features. I highly recommend watching the brief introductory video on Newspaper Navigator.
- Tip #1: Don’t search using location names, given names, surnames, etc. The images were tagged by machine learning, and the tagging focused on basic terms such as factory or worker.
- Tip #2: Take advantage of the AI Navigator. This is where the power of Newspaper Navigator comes through. Add at least one image to your Collection and then click Train My AI Navigator. This is where you can start to narrow down the image types: click the minus sign on an image to exclude similar images from the search, and click the plus sign on an image to bring in more images like it.
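The plus/minus training in Tip #2 resembles a classic search technique called relevance feedback. The Python sketch below is only a toy illustration under stated assumptions: each image is represented by a numeric feature vector (the three-number vectors here are invented), "plus" images pull the query toward them and "minus" images push it away (a Rocchio-style update), and the remaining images are ranked by cosine similarity. The function names and weights are hypothetical, and Newspaper Navigator's actual method may well differ.

```python
import math


def cosine(a, b):
    """Cosine similarity between two equal-length vectors (0.0 if either is all zeros)."""
    dot = sum(x * y for x, y in zip(a, b))
    norm = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b))
    return dot / norm if norm else 0.0


def centroid(vectors):
    """Average of a non-empty list of vectors."""
    n = len(vectors)
    return [sum(col) / n for col in zip(*vectors)]


def rank_images(library, liked, disliked, alpha=1.0, beta=0.5):
    """Hypothetical Rocchio-style ranking: move the query toward 'plus' images,
    away from 'minus' images, then sort the other images by similarity."""
    dim = len(next(iter(library.values())))  # assumes a non-empty library
    pos = centroid([library[i] for i in liked]) if liked else [0.0] * dim
    neg = centroid([library[i] for i in disliked]) if disliked else [0.0] * dim
    query = [alpha * p - beta * n for p, n in zip(pos, neg)]
    candidates = [i for i in library if i not in liked and i not in disliked]
    return sorted(candidates, key=lambda i: cosine(query, library[i]), reverse=True)


# Invented feature vectors: [people, machinery, outdoors]
library = {
    "portrait_1": [0.9, 0.1, 0.2],
    "portrait_2": [0.8, 0.0, 0.3],
    "factory_1": [0.2, 0.9, 0.1],
    "factory_2": [0.3, 0.8, 0.4],
    "landscape_1": [0.1, 0.1, 0.9],
}
# Plus sign on a factory photo, minus sign on a landscape.
print(rank_images(library, liked=["factory_1"], disliked=["landscape_1"]))
# ['factory_2', 'portrait_1', 'portrait_2']
```

Giving the "minus" images a smaller weight (`beta`) than the "plus" images (`alpha`) is a common choice in relevance feedback, since what you like is usually a stronger signal than what you reject.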
Once you understand the search syntax and how the AI Navigator works, it is pretty easy. Now, I can’t guarantee you’ll find anything useful for specific genealogy research, at least not right away. Remember, Newspaper Navigator is a work in progress, and over time more images will be added and more robust tagging will be used. For genealogy research, it would be so cool to someday be able to search images by location, ship name, profession, and more.
For more assistance with Newspaper Navigator, see the About section of the Newspaper Navigator website.
©2020, copyright Thomas MacEntee. All rights reserved.