Digital Treasures Of The New York Public Library

A stereograph featuring “Prospectors returning to camp. 62 degrees below zero, Alaska.” The Miriam and Ira D. Wallach Division of Art, Prints and Photographs: Photography Collection, The New York Public Library (1898–1900)

Searching for a 14th-century manuscript for a school report? How about an old baseball photo for your stash of sports memorabilia? You might try the New York Public Library’s Digital Collections. Last week, the library made more than 187,000 digitized, public-domain items more easily accessible in the highest resolution available.

Specifically, the library removed the permission and payment processes that had encumbered access to this material, according to Ben Vershbow, director of NYPL Labs, one of the departments involved in the project. The institution, which celebrated its 120th anniversary last year, also updated its API and GitHub account to enable further applications of its content.

Science Friday recently spoke with Vershbow about the library’s archives, its approach to digitization, and the importance of making digital collections available to the public.

GIF made with the NYPL Labs Stereogranimator

Science Friday: The NYPL has been digitizing for a while, right?
Ben Vershbow: The library’s digitization story started at some scale around ’99, roughly. In 2005, we launched the predecessor [called the Digital Gallery] to the current Digital Collections website. That was really the library’s first big move at scale, with over a quarter of a million items: in copyright, out of copyright, a whole mix. That collection has grown over the years, and it will undoubtedly undergo further evolutions.

What kind of equipment do you use to digitize?
Obviously, the digitization that happened in the earlier days was done using somewhat different tools, and what was high resolution then is not so high resolution now. A lot of the early-wave digitization that the library, and many other libraries and cultural heritage organizations, did involved flatbed scanners.

The digitization we do today is overhead photography in a copy-stand setup, and there are variations on that. We have a lot of what you’d call ‘transmissive media materials,’ meaning materials that are not reflective but that light passes through, like slides and glass plate negatives, and that work requires its own kind of modification to a copy stand. There’s also book-scanning equipment, of course, which can vary widely for different kinds of books. And other high-volume apparatuses are being developed. What we are making available through this public domain project certainly represents a lot of different types of digitization.

The library’s in-house digitization lab is in our department at NYPL Labs, in an awesome space in a non-public facility that the library runs in Queens, and it’s really very meticulous, expert work.

What can we find when we plunge into this archive of high-resolution, public-domain images?
Well, it’s incredibly diverse. You’ll see there are a lot of maps, a lot of stereographs, a lot of sheet music, a lot of other kinds of photography, and a lot of correspondence and manuscript material. I think the single biggest collection, and maybe genre category, is the stereoscopic views. These are 3-D images, and in their day, the late 19th and early 20th centuries, they were an incredibly popular form of entertainment and, in a sense, virtual sightseeing.

Is there a trend among cultural institutions to digitally open their collections to the public?
There’s a whole movement for opening up collections in GLAMs [an acronym for galleries, libraries, archives, and museums]. The web has become a vibrant cultural commons, and I think we’ve seen more institutions offering unrestricted open content on the web, whether they’re legacy cultural institutions like libraries, museums, and archives, or more Internet-native public institutions like Wikipedia, Wikimedia, and the Internet Archive.

We’re trying to share data so that people can build aggregators, and then you don’t have to go to each institution’s web presence to search. Wikimedia itself, for example, is fed by a lot of different institutions that are releasing content into that commons. And the Digital Public Library of America (DPLA) is a great way to expose what’s been digitized. It points you to the local collection and web property, and then you can work through that institution’s particular use parameters [if the item still has use restrictions]. [There’s also a European forerunner to the DPLA, called Europeana.]

Why has it become important to GLAM institutions to get this content out to the public?
It’s a very powerful thing. It creates a common resource base that everyone can draw on and repurpose very freely, leading to all kinds of new uses and illuminating projects. [The library created a few examples of how users might repurpose its content. Here’s one.]

When you think of the web and the Internet in general as a cultural medium, and as a place that is not just about finding your way to resources that live elsewhere, but is in fact made itself of resources—of materials that can be used in a digitally native context, even if they make their way back into physical forms and other forms of distribution—it does start to feel quite limited if your materials are mostly there as a reference that points you back to something that you either need to pay for or require permission to use.

Now, copyright and all kinds of other things that we have to work through, respect, and abide by do require us to place restrictions on certain materials. But for the ones that don’t have those constraints, the ones that are out of copyright, I think people are starting to realize: let’s just make those as freely available as possible, because then you’re really able to achieve greater impact, with these things being used in ways that are both expected and unexpected. For all of us working in this space, it’s obvious that we have to do this for any material we can. Let’s just get it out there and see what people can do.

How would you describe NYPL Labs?
NYPL Labs is a new kind of what was traditionally called a ‘digital library program.’ We’re really looking at the entire life cycle of bringing our research collections onto the Internet, and even working proactively to engage new users and create contexts where things can be used. For example, we host hackathons to explore ways we can engage local technologists and creators to work with us on projects or show us how these things can be used in new ways. We feel like we’re sketching a new kind of organ of research libraries, in a way, one that supports people working in a new mode with the cultural data that libraries have collected.

What other projects are you excited about?
The NYC Space/Time Directory. It’s an initiative made up of a lot of different projects, but with a unifying dream of opening up historical geographic data about New York City. That data is a resource in itself, because I think having a record of the city’s changes is very important for people trying to understand the city’s development, and it’s certainly of historical interest to a wide number of people. But it’s also an organizing framework for aggregating other information, such as photographs of these past places. It’s great to be able to search our digital collections site and just find photos, but what if you could browse a map and find them geographically and also temporally? Or you might search for a place that doesn’t exist anymore. Or you might want to see what the layout of the city was at a certain time. We have coverage of the city at these different time periods.

This is a project we’re undertaking over the next two years, and we just got a Knight Foundation grant [through the Knight News Challenge] to do it.

This interview has been edited for length and clarity.

*This article was updated on January 12, 2016, to add one more Internet-native public institution, the Internet Archive, to Ben Vershbow’s discussion about GLAMs and open content.

Meet the Writer

About Julie Leibach

@julieleibach

Julie Leibach is a freelance science journalist and the former managing editor of online content for Science Friday.
