About the project
This project aims to systematically investigate the effects of various quantization methods on the training and inference of Large Language Models (LLMs).
LLMs have revolutionized natural language processing by achieving unprecedented levels of understanding and generation capabilities. However, their substantial computational and memory demands pose significant challenges for training and deployment, especially in resource-constrained environments. Quantization techniques, which reduce the precision of model parameters and activations, offer a promising avenue to enhance computational efficiency and reduce memory footprint. Despite this potential, the impact of quantization on the training dynamics and inference performance of LLMs is not fully understood.
We will explore uniform and non-uniform quantization schemes (sketched briefly below), analyze their influence on model convergence during training through differential testing and verification, and assess the trade-offs between computational efficiency and performance metrics such as accuracy and generalization through hyperparameter optimization. By conducting extensive experiments across different model architectures and datasets, we seek to identify quantization strategies that minimize performance degradation while maximizing efficiency gains.
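To make the distinction between the two families of schemes concrete, the following NumPy sketch contrasts a uniform (affine) quantizer, whose levels are evenly spaced over the value range, with a simple non-uniform one whose levels follow the quantiles of the weight distribution. This is an illustrative example only: the function names, the 4-bit width, and the quantile-based level placement are assumptions for the sketch, not the project's actual method.

import numpy as np

def uniform_quantize(x, num_bits=4):
    # Uniform (affine) quantization: 2**num_bits evenly spaced levels
    # covering the range [x.min(), x.max()].
    qmin, qmax = 0, 2 ** num_bits - 1
    scale = (x.max() - x.min()) / (qmax - qmin)
    zero_point = np.round(-x.min() / scale)
    q = np.clip(np.round(x / scale) + zero_point, qmin, qmax)
    return (q - zero_point) * scale  # dequantize back to float for comparison

def nonuniform_quantize(x, num_bits=4):
    # Non-uniform quantization: levels placed at quantiles of the data, so
    # densely populated regions of the distribution get finer resolution.
    levels = np.quantile(x, np.linspace(0.0, 1.0, 2 ** num_bits))
    idx = np.abs(x[..., None] - levels).argmin(axis=-1)
    return levels[idx]

weights = np.random.randn(4, 4).astype(np.float32)
print(np.abs(weights - uniform_quantize(weights)).mean())     # uniform error
print(np.abs(weights - nonuniform_quantize(weights)).mean())  # non-uniform error

On bell-shaped weight distributions, quantile-based levels typically track the data more closely than an evenly spaced grid, at the cost of storing a codebook; quantifying such trade-offs for LLM weights and activations is part of what the planned experiments would measure.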
The outcomes of this research will provide valuable insights into the practical implementation of quantization with verifiable guarantees in LLMs, guiding both academia and industry in developing more efficient models. This will facilitate the broader deployment of advanced language technologies, making them accessible and sustainable across diverse computational platforms.