Publications

(2025). WaferLLM: Large Language Model Inference at Wafer Scale. OSDI 2025.

Code GitHub OSDI 2025

(2024). MoE-CAP: Benchmarking Cost, Accuracy and Performance of Sparse Mixture-of-Experts Systems. arXiv.

PDF DOI arXiv

(2024). ServerlessLLM: Locality-Enhanced Serverless Inference for Large Language Models. OSDI 2024.

PDF DOI OSDI