nlp4arc 2017

Event Information
3 February 2017 9:00am – 5:00pm
Student Union rooms 3206A and 3206B, University of North Carolina, Chapel Hill, North Carolina

Suggested hashtag: #nlp4arc

About the Symposium
The symposium consisted of short talks and unconference-style breakout sessions on applying natural language processing (NLP) to support the use, access, and analysis of digital primary source materials.

A rapidly growing body of materials with significant cultural value is “born digital.” Information professionals must be prepared to extract digital materials from their original environments and media in ways that preserve their rich metadata and ensure their integrity. They must also support new forms of access that allow users to make sense of the materials and understand their context.

Many types of contextual information can be vital to making sense of digital objects and using them meaningfully. These include objects, agents, occurrences, purposes, times, places, forms of expression, concepts/abstractions, and relationships.

Many existing open-source tools can help libraries, archives, and museums (LAMs) identify, extract, and expose such contextual entities from the wide diversity of born-digital materials that LAMs already hold and continue to receive. NLP tools and methods can help both to (1) facilitate curatorial decision making and description, and (2) generate access points to present to end users.

Program

9:00–9:15	Welcome and introduction – Cal Lee
9:15–10:45	Challenges and Opportunities in Applying NLP to Digital Collections
	Daniel Pitti, University of Virginia – Observations on the Challenge of Identity
	Don Mennerich, New York University – NLP in Archival Processing
	Mary Elings, University of California, Berkeley – Using NLP to Support Dynamic Arrangement, Description, and Discovery of Born Digital Collections: The ArchExtract Experiment
	Ryan Shaw, University of North Carolina at Chapel Hill – Discourse Processing of Oral History Transcripts
10:45–11:00	Break
11:00–12:30	From Projects to Programs
	Josh Schneider, Stanford University Libraries – ePADD: Opening the World of Email Research through NLP
	Jeremy Gibson and Nitin Arora, North Carolina Department of Natural and Cultural Resources – Processing the Unprocessable, Accessing the Inaccessible: T.O.M.E.S, NLP, and Government Email
	Hugh Cayless, Duke University Libraries – Getting the Books to Talk to each Other: Projects at DC3
	Carl Wilson, Open Preservation Foundation – Not Just Building Tools: Strategies for Sustaining Software and Associated Communities
12:30–1:30	Lunch
1:30–2:00	Kam Woods, University of North Carolina at Chapel Hill – BitCurator NLP Development and Plans
2:00–2:30	Generation of Breakout Topics
2:30–2:45	Break
2:45–3:30	Breakout Sessions
3:30–4:00	Reporting Back from Breakout Sessions
4:00–5:00	Wrap Up and Next Steps