Workshop: AI for Data Rescue

Registration

Registration for the online workshop is free via this Eventbrite link. Once you have signed up via Eventbrite, you will receive an MS Teams link by email the week before the virtual session on March 26, 2024.

Scope

Data rescue is a critical activity for any research area that requires access to historical data that is accurate and has good spatial and temporal coverage. Historical data, especially data that pre-dates the 1900s, is typically recorded as typed or handwritten notes and logbooks in a wide variety of formats. Over the years, or centuries, documents are lost or damaged, and the contextual metadata around data recordings often becomes separated from the data recordings themselves. To avoid losing this data, many institutions, such as the UK Met Office, British Antarctic Survey and British Library, have run digitization projects in which pages of historical documents are scanned into a digital image format. However, this work has created a big challenge: how to extract, or rescue, data from archives of hundreds of thousands of scanned document pages so that it can be used for scientific research such as climate science and weather forecasting.

Excitingly, recent work in citizen science and AI, specifically image processing and natural language processing, is starting to tackle this big challenge effectively. One of the largest examples of citizen science is the 2020 [Rainfall Rescue] project involving the UK Met Office and University of Reading. This project developed a web platform for citizen science and coordinated an incredible 16,000 volunteers over 16 days, who manually transcribed over 66,000 pages of weather measurement data spanning 1677 to 1960. Recent examples of AI being used for data rescue include the [Sphaera] project, which uses Large Language Models (LLMs) to extract pictures and diagrams from scanned historical texts, and the [GloSAT] project, which uses Region-based Convolutional Neural Network (RCNN) deep learning models to detect table cells containing measurement data within scanned measurement logbooks.

This workshop seeks to bring together researchers and practitioners working on data rescue from both the AI and citizen science communities, with the aim of sharing experience gained and lessons learnt between these often separate disciplines. We will discuss the key challenges AI and citizen science face today in the context of data rescue and attempt to signpost the current directions of travel that might overcome these challenges in the short and medium term.

This online workshop is an AI UK-2024 fringe event. It is supported by the Natural Environment Research Council (grant NE/S015604/1) project GloSAT and the Centre for Machine Intelligence (CMI).

This event is part of the AI UK Fringe 2024, a series of events exploring key topics around data science and AI. AI UK is the UK’s showcase of data science and AI from the Alan Turing Institute.

Participants

We invite participants from a wide range of areas, as we expect the findings and wider AI challenges identified during this workshop to transfer well beyond the area of data rescue.

Organising Committee

Dr Stuart E. Middleton, University of Southampton, UK
Prof Ed Hawkins MBE, NCAS, University of Reading, UK
Dr Gyanendro Loitongbam, University of Southampton, UK
Dr Praveen Teleti, NCAS, University of Reading, UK

Confirmed Speakers

Dr Gyanendro Loitongbam, University of Southampton, UK
Dr Praveen Teleti, University of Reading, UK
Prof Matteo Valleriani, Max Planck Institute for the History of Science, Germany
Dr Linden Ashcroft, University of Melbourne, AU
Dr Philip Brohan, UK Met Office, UK
Dr Gilbert Compo, U. of Colorado/CIRES & NOAA Affiliate, USA

Schedule

Timezone UK (GMT+0)

Welcome (10 mins) – 10:00

Dr Stuart E. Middleton
Prof Ed Hawkins MBE

Keynotes – 10:10 (chair Prof Ed Hawkins MBE)

Dr Linden Ashcroft - Weather Data Rescue in the Southern Hemisphere
Dr Praveen Teleti - The Value of Ships’ Logs and the Role of Citizen Science in their Recovery
Dr Philip Brohan - Using Amazon Textract and Google Vision for Data Rescue
Format is a 15 min keynote followed by 5 min of Q&A for each speaker

Panel Discussion – 11:10

Format is a 20 min 'fireside chat' with the keynote speakers, then 20 min of open questions and discussion from the floor

Lunch break – 12:00

Keynotes – 13:00 (chair Dr Stuart E. Middleton)

Dr Gyanendro Loitongbam - Tabular Data Reconstruction and NLP Techniques for Data Rescue
Prof Matteo Valleriani - Datasets of Knowledge Atoms: AI for Data Rescue
Dr Gilbert Compo - The Need for Massive Data Rescue to Improve Weather Reconstructions
Format is a 15 min keynote followed by 5 min of Q&A for each speaker

Panel Discussion – 14:00

Format is a 20 min 'fireside chat' with the keynote speakers, then 20 min of open questions and discussion from the floor

Closing remarks – 14:40

Dr Stuart E. Middleton
Prof Ed Hawkins MBE

Finish – 15:00

Contact

Questions about the workshop and registration should be emailed to Stuart E. Middleton at {sem03}@soton.ac.uk