Offensive Language In Catalogue Legacy Descriptions

A workshop on using corpus linguistics techniques to identify problematic and offensive language in legacy archive catalogue descriptions.

This event will present the results of a three-month National Archives-funded Testbed project that aims to develop methods for automatically identifying bias and offensive language in legacy archive descriptions.

Anecdotal evidence suggests that offensive language and bias exist in legacy descriptions. In light of conversations taking place within the heritage sector on appropriate language in catalogue descriptions, the project has built a proof-of-concept prototype methodology from work conducted upon descriptions from Special Collections in Leeds.

Using corpus analysis of sets of legacy descriptions, we have developed computational methods that will aid the discovery of bias and offensive language for revision purposes.

Along with presenting the methods developed by the project, we will outline the broader discussions on descriptions that have informed our work. We will also present work from a data collection exercise conducted during the project that sought to gather experiences with legacy descriptions from archive professionals and researchers.

We will outline how the methods developed in Leeds could have wider application in revising legacy descriptions from the National Archives and other major collections to mitigate harm caused by offensive and discriminatory language to users and archive staff.

The event is organised by the University of Leeds Special Collections Team and hosted by the Leeds Arts and Humanities Research Institute.

Register to attend now via Eventbrite