Biography
I am Yeqi Huang, a PhD student specializing in AI systems at the Edinburgh-AISys Group. My research centers on enhancing system capabilities to support large-scale AI applications, covering both training and inference. I enjoy engaging with a wide range of AI projects. However, today's AI is still not strong enough. My objective is to make AI accessible to everyone and to make it genuinely useful in practical industrial applications.
This ambitious objective calls for a methodical approach. My prior research centered on problems in High Performance Computing, so I know the cutting-edge problems in real science and industry well. To bring AI into those fields, we need to serve larger AI models and to improve the performance of model training and inference. To address this, I approach the problem at a more granular level by improving the system and compiler infrastructure. My recent research has centered on the latest generation of 2D mesh architecture AI chips, such as TPU and Cerebras.
BTW, I am an open-source developer. I love taking part in hackathons and sharing my ideas on GitHub. And this is my blog: Personal Blog
Skills
TECHNICAL
HOBBIES
Interests
- Computer System
- Distributed Machine Learning
- Serverless System
Education
University of Edinburgh
University of Science and Technology of China
Recent Publications
MICA: An End-to-End Compiler Stack for Mesh Accelerators
Submitted to 20th USENIX Symposium on Operating Systems Design and Implementation (OSDI 26) OSDI 2026 (Under Review)
SwarmX: A Scheduler Agent Framework for Large Agentic Workflow Clusters
Submitted to 20th USENIX Symposium on Operating Systems Design and Implementation (OSDI 26) OSDI 2026 (Under Review)
ContextPilot: Efficient Retrieval-Augmented Generation with Accuracy-Preserving Context Reuse
9th Conference on Machine Learning and Systems (MLSys) MLSys 2026
WaferLLM: Large Language Model Inference at Wafer Scale
19th USENIX Symposium on Operating Systems Design and Implementation (OSDI 25) OSDI 2025
BioVLM: Evidence-Enriched Data Synthesis from Biomedical Papers
Submitted to ACL 2025 System Demonstration ACL 2025 Demo (Under Review)
LLM-Monitor: Efficient Privacy Violation Monitoring for LLMs
Submitted to ACL 2025 System Demonstration ACL 2025 Demo (Under Review)
MoE-CAP: Benchmarking Cost, Accuracy and Performance of Sparse Mixture-of-Experts Systems
Advances in Neural Information Processing Systems (NeurIPS) NeurIPS 2025
(OSDI 2024) ServerlessLLM: Locality-Enhanced Serverless Inference for Large Language Models
Symplectic Structure-Preserving Particle-in-Cell Whole-Volume Simulation of Tokamak Plasmas
SC21: International Conference for High Performance Computing, Networking, Storage and Analysis SC 2021
Projects
BioVLM
Evidence-enriched data synthesis pipeline from biomedical papers, training BioVLM-8B that surpasses GPT-5.2 on LAB-Bench.
LLM-Monitor
Low-cost, evolvable privacy violation monitoring system for LLMs, 15x faster than GPT-5.2.
AI4Math Sketchpad
Grant-funded system for converting mathematical proofs into structured data for auto-formalization.
BTMR-Paper
Insanely Fast Paper Reading Tool: an AI-powered web application for extracting, analyzing, and summarizing academic papers.
Bili-Investigate
A Streamlit-based web application for tracking Bilibili content creators' video updates with smart incremental fetching.
YA-PapersWithCode
Yet Another Papers With Code: a modern recreation of the Papers With Code platform with AI-powered semantic search.
Contact
yeqi.huang@ed.ac.uk
10 Crichton Street, Edinburgh, United Kingdom
Informatics Forum Room 1.43