University of Southampton Electronics and Computer Science


Research areas


Stuart's research interest is in Natural Language Processing (NLP), specifically information extraction and human-in-the-loop NLP. In contrast to NLP areas where web-scale datasets are available, his research focuses on developing novel solutions to problems where training datasets are small, evolving, sparse or fragmented. This can involve both finding new ways to fine-tune Large Language Models (LLMs) and researching novel methods to get the most out of smaller models.

Natural Language Processing

Information Extraction - text/behaviour classification, few/zero shot learning, graph-based models, location extraction, IR-augmented QA, event/topic extraction, multi-modal tabular data extraction, argument mining.
Human-in-the-loop NLP - rationale-based learning, active learning, adversarial training, adversarial prompting, interactive sense making.

Domain expertise

Law enforcement, Defence, Mental Health, Environmental Science, Legal, Misinformation.

Examples of Impact and Outreach

Commercial : Innovate UK, Tackling challenges, building prosperity: The Industrial Strategy Challenge Fund, Orbital Witness: New technology spots key legal issues in real estate transactions. NLP research transferred to the two-person startup Orbital Witness, helping them win £3.85 million of venture capital during the project lifetime, LPLP project 2021 Innovate UK link, PDF (see page 47)

Software : Middleton, S.E., geoparsing algorithm 'geoparsepy' is available open source from PyPI, averaging 1,500 downloads a month in 2021 [source pypistats.org] PyPI stats geoparsepy

Policy : Middleton, S.E., Invited AI Expert, UK Cabinet Office, London, Ministerial AI Roundtable: use of AI in policing, chaired by Policing Minister Nick Hurd, July 2019 FloraGuard outputs

Outreach : Cowell, C. Sajeva, M. Lavorgna, A. Middleton, S.E. Clarke, G. FloraGuard webinar, Royal Botanic Gardens, Kew, 2020, stakeholder analysis [314 registered, 170 attended live, 50 countries, major stakeholders such as DEFRA, WWF, US Dept of Justice, UN Office on Drugs and Crime (UNODC), European Commission and CITES] vimeo 1h 30mins duration

Steering Committees, Panels, Session Chair, Editorial Positions

Visiting Researcher - The Institute for Experimental AI, Northeastern University, USA 2024+
Deputy Director - UKRI MINDS Centre for Doctoral Training [internship/sponsorship lead 2022 to 2024, deputy director 2024+]
Board Member - Centre for Machine Intelligence (CMI) 2023+
Full Member - EPSRC Peer Review College 2021+
Invited Expert - Roundtable Discussion with Senior Civil Servants, ‘Exploring the Role of AI in the Armed Forces’, King's College London, 2024
Organising Committee and Panel - RAI UK 2024 Workshop, Responsible AI for Mental Health
Organising Committee and Workshop Co-chair - AIUK 2024 Workshop: AI for Data Rescue
Invited Expert - TAS/RUSI workshop 2024 on 'Using AI in an Intelligence Context: Future Scenario Workshop'
Area Chair - ACL 2023
Organising Committee and Workshop Co-chair - AIUK 2023 & AI Fest 5 Workshop - AI and Defence: Readiness, Resilience and Mental Health
Turing Fellow - 2021 to 2023
Organising Committee and Workshop Co-chair - RUSI and UKRI TAS Hub conference, Trusting Machines? Cross-sector Lessons from Healthcare and Security 2021
Sector Leads Committee - UKRI Trustworthy Autonomous Systems (TAS) Hub - 2020 to 2024
Guest Editor - MDPI Sensors journal 2021 special issue 'Sensors Application on Early Warning System'
Session Chair - ECAI 2020
Steering Committee (chair) - ACM WebSci'20 Workshop 2020, Socio-technical AI systems for defence, cybercrime and cybersecurity
Session Chair - ACM WebSci 2020
Invited Expert - UK Cabinet Office Ministerial AI Roundtable event 2019 on 'use of AI in policing'
Invited Expert - ATI/DSTL workshop 2019 on 'Decision Support for Military Commanders'
Steering Committee - RGS-IBG Annual Conference 2018, Using New Forms of Data in Research Session Convenor
Steering Committee Short paper/demo Chair - IEEE International Conference on Intelligent Environments [IE] 2016 Posters & Short Paper Track Chair
Steering Committee - MediaEval Benchmarking Initiative for Multimedia Evaluation [MediaEval] 2016 Verifying Multimedia Use Task Committee
Invited Expert - BBC South Today

Collaborations

The Institute for Experimental AI, Northeastern University

Centre for Machine Intelligence

Rebooting Democracy: Democratic Innovation for the Information Age

Centre for Democratic Futures

Research grants - data from 2015 (£739k UoS as PI; £8.1M UoS as CoI, £53.5M total project funding)

UKRI grants for Stuart E. Middleton

DR-Africa: Data Rescue Africa : a Met Office funded grant (PI £201k UoS £358k total). DR-Africa will explore NLP models for automated observation transcription in the context of downstream applications such as climate change modelling.

Development of Advanced Wing Solutions (DAWS2) : an AIT/InnovateUK funded Grant (CoI UoS £1.1M £42.1M total 10079510). Objectives include Large Language Model (LLM) based engineering digital assistant and co-pilot applications to support the next generation of aircraft wing design.

Exploring Fairness and Bias of Multimodal Natural Language Processing for Mental Health, an International Partnerships Project (CoI £64k UoS, RAI UK EP/Y009800/1). Partnership project between the University of Southampton and Northeastern University focused on the responsible use of AI in addressing mental health issues.

UKRI Centre for Doctoral Training in Machine Intelligence for Nano-electronic Devices and Systems (MINDS CDT) : an EPSRC funded Grant (CoI £6.1M UoS EP/S024298/1). The MINDS CDT operates as a centre of training excellence for the next generation of systems that employ Artificial Intelligence (AI) algorithms in low-cost/low-power device technologies (hardware-enabled AI).

ProTechThem : an ESRC funded project (CoI £757k UoS ES/V011278/1). ProTechThem will explore sharenting (parents sharing online information about minors). Motivation for sharenting and automated detection of risk behaviours online will be explored through online ethnography, criminological analysis and multi-lingual few-shot NLP algorithms to support improvement to cybersecurity behaviours.

SafeSpacesNLP : a UKRI TAS Hub funded project (PI 1.25 FTE, UKRI TASHub). Behaviour classification NLP in a socio-technical AI setting for online harmful behaviours for children and young people. Exploring human-in-the-loop and graph-based NLP models for behaviour classification of online forum posts.

GloSAT : a UK NERC platform grant (PI UoS £260k £3.3M total NE/S015604/1). Global Surface Air Temperature (GloSAT) aims to improve understanding of climate variability and change. Objectives include multi-modal NLP for information extraction and data rescue of climate change sensor data from historical texts.

Gendered body language and speech styles in UK Parliament using machine learning : an Interdisciplinary Research Pump-Priming Fund project (CoI). NLP, audio processing and computer vision will be combined with political science methodologies to explore how gender mediates body language and speech styles in Parliamentary debates.

Multimodal audio-textual argumentation mining of political debates : a Web Science Institute grant (CoI £13k UoS). Development of a multimodal dataset for training NLP models to perform argument mining of political debates.

CYShadowWatch : a UK DSTL funded project (PI £116k UoS £133k total ACC2005442). Automated Multilingual Information Extraction for Online Cybercrime Sites. CYShadowWatch explored NLP methods of statistical machine translation and information extraction applied to online Russian cybercrime forums.

FloraGuard project : a UK ESRC funded project (CoI £240k UoS ES/R003254/1). FloraGuard examined and mapped from a multidisciplinary perspective the criminal market in endangered plants affecting the UK, exploring human-in-the-loop NLP for interactive sensemaking to support law enforcement. Quantitative evidence came from a combination of surface (web forums, social media) and dark web (TOR forums) crawling of cyber-criminal activity; NLP and machine learning were used to socio-economically map this activity at a community level.

Legal & Property Language Processing (LPLP) project : an Innovate UK funded project (PI £142k UoS 104875). LPLP developed cutting-edge NLP techniques to extract and analyse legal rights and obligations related to property and land. Objectives included the development of NLP algorithms to extract legal rights and obligations from Land Registry documents and the development of machine learning based legal risk models for property and land.

Intel-Analysis DSTL : a UK DSTL funded project (CoI £83k UoS ACC102157). Intel-Analysis DSTL used argumentation schemes and evidential reasoning to support teams of analysts trying to evaluate conflicting hypotheses during real-time events. Evidence was obtained in real-time from a combination of human intelligence reports and information extraction from social media via NLP.

REVEAL project : an EU funded FP7 project (CoI €688k UoS €6.5M total 610928). REVEAL advanced the technologies necessary for higher-level analysis of social media. It focussed on social media verification, including NLP for digital text forensics, trust and credibility analytics and decision support for journalists verifying user generated content.

Digital Police Officer (DPO) project : a UK WSI funded project (PI). The DPO project aimed to apply linguistic analysis, using NLP techniques guided by insights from criminology, to identify cyber criminals operating under pseudonyms on different online forums and within the same forum.

See the publications link for details of the above work.
