About the project
This project aims to pioneer advancements in the efficiency of Generative AI (GenAI) models, focusing on achieving lower latency and smaller model sizes without compromising performance.
As GenAI models become increasingly central to a wide range of applications, from image generation to video and music synthesis, their computational demands and the time required for training and inference have escalated.
This research seeks to address these challenges by developing innovative techniques for efficiency, including architectural innovations, compression strategies, algorithmic improvements, and system-level optimizations.
The goal of this project is to enable the deployment of state-of-the-art GenAI models across a broader range of computing environments, from high-end servers to consumer-grade machines.
You will contribute to making GenAI more accessible, efficient, and scalable, paving the way for its application in real-time and resource-constrained scenarios.
The overall objectives of your research will be:
- to develop cutting-edge techniques for model compression, such as pruning, quantization, and knowledge distillation, tailored for GenAI models
- to design and experiment with new GenAI architectures that are more efficient, requiring less computational power and memory
- to create new algorithms and system-level optimizations to accelerate both training and inference for GenAI models, making them more suitable for deployment across a variety of computing environments
- to develop and utilize benchmarks and metrics specifically designed to evaluate the efficiency and performance of GenAI under various computational constraints.
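To make the compression objective above concrete, here is a minimal sketch of one of the named techniques, post-training quantization. It maps float32 weights to int8 with a single per-tensor scale; the function names and the toy weight matrix are illustrative assumptions, not part of the project's codebase.

```python
import numpy as np

def quantize_int8(w):
    """Symmetric per-tensor quantization: float32 weights -> int8 plus a scale.
    (Illustrative sketch; real GenAI compression pipelines typically use
    per-channel scales and calibration data.)"""
    scale = np.max(np.abs(w)) / 127.0
    q = np.clip(np.round(w / scale), -127, 127).astype(np.int8)
    return q, scale

def dequantize(q, scale):
    """Recover approximate float32 weights from the int8 representation."""
    return q.astype(np.float32) * scale

# Toy example: a small random weight matrix standing in for a model layer.
rng = np.random.default_rng(0)
w = rng.normal(size=(4, 4)).astype(np.float32)
q, s = quantize_int8(w)
w_hat = dequantize(q, s)
# Rounding error per weight is bounded by half the quantization step.
err = np.max(np.abs(w - w_hat))
```

The storage saving here is 4x (int8 vs. float32 per weight); the trade-off the project investigates is how such reductions interact with generation quality and latency at scale.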