SwarmX: A Scheduler Agent Framework for Large Agentic Workflow Clusters

Yeqi Huang, Yanwei Ye, Guomin Chen, Wenhao Su, Bin Gong, Jialian Li, Yao Fu, Yinsicheng Jiang, Xuan Sun, Le Xu, Luo Mai

Submitted to the 20th USENIX Symposium on Operating Systems Design and Implementation (OSDI 26), under review

2026

MICA: An Efficient Compiler for Mesh-Based AI Accelerators

Yeqi Huang, Congjie He, Haocheng Xiao, Yanwei Ye, Yi-Chieh Wang, Boyao Song, Ziming Miao, Lingxiao Ma, Fan Yang, Luo Mai

Submitted to the 20th USENIX Symposium on Operating Systems Design and Implementation (OSDI 26), under review

2026

RAGBoost: Efficient Retrieval-Augmented Generation with Accuracy-Preserving Context Reuse

Yinsicheng Jiang, Yeqi Huang, Liang Cheng, Cheng Deng, Xuan Sun, Luo Mai

Submitted to the 9th Conference on Machine Learning and Systems (MLSys 2026), under review

2025

WaferLLM: Large Language Model Inference at Wafer Scale

Congjie He, Yeqi Huang, Pei Mu, Ziming Miao, Jilong Xue, Lingxiao Ma, Fan Yang, Luo Mai

19th USENIX Symposium on Operating Systems Design and Implementation (OSDI 25)

2025

MoE-CAP: Benchmarking Cost, Accuracy and Performance of Sparse Mixture-of-Experts Systems

Yinsicheng Jiang, Yao Fu, Yeqi Huang, Ping Nie, Zhan Lu, Leyang Xue, Congjie He, Man-Kit Sit, Jilong Xue, Li Dong, Ziming Miao, Dayou Du, Tairan Xu, Kai Zou, Edoardo Ponti, Luo Mai

arXiv preprint

2024

ServerlessLLM: Locality-Enhanced Serverless Inference for Large Language Models

Yao Fu, Leyang Xue, Yeqi Huang, Andrei-Octavian Brabete, Dmitrii Ustiugov, Yuvraj Patel, Luo Mai

18th USENIX Symposium on Operating Systems Design and Implementation (OSDI 24)

2024