Research areas

Stuart's research interest is in Natural Language Processing (NLP), specifically information extraction and human-in-the-loop NLP. In contrast to NLP areas where web-scale datasets are available, his research focuses on developing novel solutions to problems where training datasets are small, evolving, sparse or fragmented. This can involve both finding new ways to fine-tune Large Language Models (LLMs) and researching novel methods to get the most out of smaller models.

Natural Language Processing

Information Extraction - text/behaviour classification, few/zero shot learning, graph-based models, location extraction, IR-augmented QA, event/topic extraction, multi-modal tabular data extraction, argument mining.

Human-in-the-loop NLP - rationale-based learning, active learning, adversarial training, adversarial prompting, interactive sense making.

Domain expertise

Law enforcement, Defence, Mental Health, Environmental Science, Legal, Misinformation.

Examples of Impact and Outreach

Commercial : Innovate UK, Tackling challenges, building prosperity: The Industrial Strategy Challenge Fund, Orbital Witness: New technology spots key legal issues in real estate transactions. NLP research transferred to the two-person startup Orbital Witness, helping them win £3.85 million in venture capital during the project lifetime. Links: LPLP project 2021, Innovate UK link, PDF (see page 47).

Software : Middleton, S.E., geoparsing algorithm 'geoparsepy' is available open source from PyPI, averaging 1,500 downloads a month in 2021 [source pypistats.org]. Link: PyPI stats geoparsepy.

Policy : Middleton, S.E., Invited AI Expert, UK Cabinet Office, London, Ministerial AI Roundtable: use of AI in policing, chaired by Policing Minister Nick Hurd, July 2019. Link: FloraGuard outputs.

Outreach : Cowell, C., Sajeva, M., Lavorgna, A., Middleton, S.E., Clarke, G., FloraGuard webinar, Royal Botanic Gardens, Kew, 2020, stakeholder analysis [314 registered, 170 attended live, 50 countries, major stakeholders such as DEFRA, WWF, US Dept of Justice, UN Office on Drugs and Crime (UNODC), European Commission and CITES]. Link: vimeo, 1h 30mins duration.

Steering Committees, Panels, Session Chair, Editorial Positions
Visiting Researcher - The Institute for Experimental AI, Northeastern University, USA 2024+

Collaborations

The Institute for Experimental AI, Northeastern University
Centre for Machine Intelligence
Rebooting Democracy: Democratic Innovation for the Information Age

Research grants - data from 2015 (£739k UoS as PI; £8.1M UoS as CoI; £53.5M total project funding)

Link: UKRI grants for Stuart E. Middleton

DR-Africa: Data Rescue Africa : a Met Office funded grant (PI, £201k UoS, £358k total). DR-Africa will explore NLP models for automated observation transcription in the context of downstream applications such as climate change modelling.

Development of Advanced Wing Solutions (DAWS2) : an AIT/InnovateUK funded grant (CoI, £1.1M UoS, £42.1M total, 10079510). Objectives include Large Language Model (LLM) based engineering digital assistant and co-pilot applications to support the next generation of aircraft wing design.

Exploring Fairness and Bias of Multimodal Natural Language Processing for Mental Health : an International Partnerships Project (CoI, £64k UoS, RAI UK EP/Y009800/1). Partnership project between the University of Southampton and Northeastern University focused on the responsible use of AI in addressing mental health issues.

UKRI Centre for Doctoral Training in Machine Intelligence for Nano-electronic Devices and Systems (MINDS CDT) : an EPSRC funded grant (CoI, £6.1M UoS, EP/S024298/1). The MINDS CDT operates as a centre of training excellence for the next generation of systems that employ Artificial Intelligence (AI) algorithms in low-cost/low-power device technologies (hardware-enabled AI).

ProTechThem : an ESRC funded project (CoI, £757k UoS, ES/V011278/1). ProTechThem will explore sharenting (parents sharing online information about minors). Motivations for sharenting and the automated detection of risk behaviours online will be explored through online ethnography, criminological analysis and multi-lingual few-shot NLP algorithms, to support improvements in cybersecurity behaviours.

SafeSpacesNLP : a UKRI TAS Hub funded project (PI, 1.25 FTE, UKRI TASHub). Behaviour classification NLP in a socio-technical AI setting for online harmful behaviours for children and young people. The project explores human-in-the-loop and graph-based NLP models for behaviour classification of online forum posts.

GloSAT : a UK NERC platform grant (PI, £260k UoS, £3.3M total, NE/S015604/1). Global Surface Air Temperature (GloSAT) aims to improve understanding of climate variability and change. Objectives include multi-modal NLP for information extraction and data rescue of climate change sensor data from historical texts.

Gendered body language and speech styles in UK Parliament using machine learning : an Interdisciplinary Research Pump-Priming Fund project (CoI). NLP, audio processing and computer vision will be combined with political science methodologies to explore how gender mediates body language and speech styles in Parliamentary debates.

Multimodal audio-textual argumentation mining of political debates : a Web Science Institute grant (CoI, £13k UoS). Development of a multimodal dataset for training NLP models to perform argument mining of political debates.

CYShadowWatch : a UK DSTL funded project (PI, £116k UoS, £133k total, ACC2005442). Automated Multilingual Information Extraction for Online Cybercrime Sites. CYShadowWatch explored NLP methods of statistical machine translation and information extraction applied to online Russian cybercrime forums.

FloraGuard project : a UK ESRC funded project (CoI, £240k UoS, ES/R003254/1). FloraGuard examined and mapped, from a multidisciplinary perspective, the criminal market in endangered plants affecting the UK, exploring human-in-the-loop NLP for interactive sensemaking to support law enforcement. Quantitative evidence came from a combination of surface web (web forums, social media) and dark web (TOR forums) crawling of cyber-criminal activity; NLP and machine learning were used to socio-economically map this activity at a community level.

Legal & Property Language Processing (LPLP) project : an Innovate UK funded project (PI, £142k UoS, 104875). LPLP developed cutting-edge NLP techniques to extract and analyse legal rights and obligations related to property and land. Objectives included the development of NLP algorithms to extract legal rights and obligations from Land Registry documents and the development of machine learning based legal risk models for property and land.

Intel-Analysis DSTL : a UK DSTL funded project (CoI, £83k UoS, ACC102157). Intel-Analysis DSTL used argumentation schemes and evidential reasoning to support teams of analysts trying to evaluate conflicting hypotheses during real-time events. Evidence was obtained in real time from a combination of human intelligence reports and information extraction from social media via NLP.

REVEAL project : an EU funded FP7 project (CoI, €688k UoS, €6.5M total, 610928). REVEAL advanced the technologies needed to make higher-level analysis of social media possible. The work focussed on social media verification, including NLP for digital text forensics, trust and credibility analytics and decision support for journalists verifying user generated content.

Digital Police Officer (DPO) project : a UK WSI funded project (PI). The DPO project aimed to apply linguistic analysis, using NLP techniques guided by insights from criminology, to identify cyber criminals operating under pseudonyms on different online forums and within the same forum.

See the publications link for details of the above work.
Electronics and Computer Science |