Publications
WaferLLM: Large Language Model Inference at Wafer Scale
19th USENIX Symposium on Operating Systems Design and Implementation (OSDI 25)
2025
MoE-CAP: Benchmarking Cost, Accuracy and Performance of Sparse Mixture-of-Experts Systems
arXiv preprint arXiv
2024