Publications
RAGBoost: Efficient Retrieval-Augmented Generation with Accuracy-Preserving Context Reuse
Submitted to the 9th Conference on Machine Learning and Systems (MLSys 2026) (Under Review)
2025
WaferLLM: Large Language Model Inference at Wafer Scale
19th USENIX Symposium on Operating Systems Design and Implementation (OSDI 25)
2025
MoE-CAP: Benchmarking Cost, Accuracy and Performance of Sparse Mixture-of-Experts Systems
arXiv preprint
2024