Dr. Dong Li is an associate professor at the University of California, Merced. He is the director of the Parallel Architecture, System, and Algorithm Lab (PASA) and a co-director of the High Performance Computing Systems and Architecture Group at UC Merced. Previously (2011-2014), he was a research scientist at Oak Ridge National Laboratory (ORNL). Before that, he earned a PhD in computer science from Virginia Tech. He is an associate editor for IEEE Transactions on Parallel and Distributed Systems (TPDS). Dong's research focuses on high performance computing (HPC) and maintains strong relevance to computer systems.
UC Merced is ranked #7 in high performance computing (HPC) and #59 in computer science overall by CSRankings.
Recent Research Impacts
- Our work on debugging persistent memory programs won the Distinguished Artifact Award at ASPLOS'21 and has been integrated into Intel PMDK, the de facto development kit for programming persistent memory.
- Our work on training large machine learning models using heterogeneous memory has been integrated into Microsoft DeepSpeed. This is collaborative work with Microsoft. It has been widely reported in the media (Link1, Link2, Link3, Link4, etc.) and is widely used in industry (e.g., NVIDIA, HP, and Microsoft).
- Our work on accelerating power grid simulation using machine learning has been highlighted by the U.S. Department of Energy.
- Our work on an MPI fault tolerance benchmark suite and on understanding natural error resilience in HPC applications was reported by HPCwire (Link1 and Link2). HPCwire is the #1 news and information resource covering HPC.
Selected Awards
- ASPLOS Distinguished Artifact Award, 2021
- Berkeley Lab University Faculty Fellowship, 2016
- NSF CAREER Award, 2016
- SC best poster nomination (2.9% of all poster submissions), 2016
- SC best student paper nomination, 2014
- Oak Ridge National Lab (CSMD) Distinguished Contributor Award, 2013
Current Research Topics
- System support for persistent memory-based big memory platforms
- Memory-centric system optimization for machine learning training and inference
- Scientific machine learning
Selected Recent Publications (a complete list of publications can be found here)
- [VLDB'21] Jie Liu, Wenqian Dong, Qingqing Zhou, and Dong Li. Fauce: Fast and Accurate Deep Ensembles with Uncertainty for Cardinality Estimation. In the 47th International Conference on Very Large Data Bases, 2021
- [ATC'21] Jie Ren, Samyam Rajbhandari, Reza Yazdani Aminabadi, Olatunji Ruwase, Shuangyan Yang, Minjia Zhang, Dong Li, and Yuxiong He. ZeRO-Offload: Democratizing Billion-Scale Model Training. In the USENIX Annual Technical Conference, 2021
- [EuroSys'21] Zhen Xie, Wenqian Dong, Jiawen Liu, Hang Liu, and Dong Li. Tahoe: Tree Structure-Aware High Performance Inference Engine for Decision Tree Ensemble on GPU. In the European Conference on Computer Systems, 2021
- [FAST'21] Kai Wu, Jie Ren, Ivy Peng, and Dong Li. ArchTM: Architecture-Aware, High Performance Transaction for Persistent Memory. In the 19th USENIX Conference on File and Storage Technologies, 2021
- [ASPLOS'21] Bang Di, Jiawen Liu, Hao Chen, and Dong Li. Fast, Flexible and Comprehensive Bug Detection for Persistent Memory Programs. In the 26th International Conference on Architectural Support for Programming Languages and Operating Systems, 2021 (Distinguished Artifact Award)
- [PPoPP'21] Jiawen Liu, Jie Ren, Roberto Gioiosa, Dong Li, and Jiajia Li. Sparta: High-Performance, Element-Wise Sparse Tensor Contraction on Heterogeneous Memory. In the 26th Symposium on Principles and Practice of Parallel Programming, 2021
- [HPCA'21] Jie Ren, Jiaolin Luo, Kai Wu, Minjia Zhang, Hyeran Jeon, and Dong Li. Sentinel: Efficient Tensor Migration and Allocation on Heterogeneous Memory Systems for Deep Learning. In the 27th IEEE International Symposium on High-Performance Computer Architecture, 2021
- [ICS'21] Zhen Xie, Wenqian Dong, Jie Liu, Ivy Peng, Yanbao Ma, and Dong Li. MD-HM: Memoization-based Molecular Dynamics Simulations on Big Memory System. In the 35th International Conference on Supercomputing, 2021
- [ICS'21] Xin He, Jiawen Liu, Zhen Xie, Hao Chen, Guoyang Chen, Weifeng Zhang, and Dong Li. Enabling Energy-Efficient DNN Training on Hybrid GPU-FPGA Accelerators. In the 35th International Conference on Supercomputing, 2021
- [ICS'21] Jie Ren, Jiaolin Luo, Ivy Peng, Kai Wu, and Dong Li. Optimizing Large-Scale Plasma Simulations on Persistent Memory-based Heterogeneous Memory with Effective Data Placement Across Memory Hierarchy. In the 35th International Conference on Supercomputing, 2021
- [ICS'21] Jiawen Liu, Dong Li, Roberto Gioiosa, and Jiajia Li. Athena: High-Performance Sparse Tensor Contraction Sequence on Heterogeneous Memory. In the 35th International Conference on Supercomputing, 2021
- [NeurIPS'20] Jie Ren, Minjia Zhang, and Dong Li. HM-ANN: Efficient Billion-Point Nearest Neighbor Search on Heterogeneous Memory. In the 34th Conference on Neural Information Processing Systems, 2020
- [MICRO'18] Jiawen Liu, Hengyu Zhao, Matheus Ogleari, Dong Li, and Jishen Zhao. Processing-in-Memory for Energy-efficient Neural Network Training: A Heterogeneous Approach. In the 51st IEEE/ACM International Symposium on Microarchitecture, 2018
News
- [9/2021] A paper, "Flame: A Self-Adaptive Auto-Labeling System for Heterogeneous Mobile Processors", was accepted to SEC'21.
- [9/2021] Thanks to ANL and SK Hynix for supporting our research on machine learning systems and big memory!
- [8/2021] Welcome to our new PhD student, Dong Xu :-)
- [6/2021] Dong was invited to give a talk and serve as a panelist at the 2nd Workshop on Heterogeneous Memory Systems (HMEM).
- [6/2021] A paper, "Fauce: Fast and Accurate Deep Ensembles with Uncertainty for Cardinality Estimation", was accepted to VLDB'21.
- [6/2021] Dong was selected as an associate editor for IEEE Transactions on Parallel and Distributed Systems (TPDS).
- [5/2021] Dong was invited to give a keynote at the Eleventh International Workshop on Accelerators and Hybrid Exascale Systems (AsHES).
- [5/2021] An NSF grant was awarded to support our research on big memory for HPC.
- [4/2021] Our work on training billion-scale NLP models on heterogeneous memory was accepted to USENIX ATC'21! This is collaborative work with Microsoft.
- [4/2021] Our ASPLOS'21 paper won the Distinguished Artifact Award! Only two papers received this award.
- [3/2021] Four papers were accepted to ICS'21!
- [3/2021] Our collaborative work with LLNL on an MPI fault tolerance benchmark suite was reported by HPCwire.
- [3/2021] Wenqian received an internship offer! She will spend the summer at HP Labs working on scientific machine learning.
- [1/2021] Our collaborative work with Microsoft on training large NLP models with heterogeneous memory drew attention from the media (see 1 and 2). :)
- [1/2021] A paper, "Tahoe: Tree Structure-Aware High Performance Inference Engine for Decision Tree Ensemble on GPU", was accepted to EuroSys'21.
- [12/2020] Dong was invited to give a talk on heterogeneous memory at IBM Research Almaden.
Professional Services
- PC member: ICS'21, SC'21, ICML'21, HPDC'21 (poster co-chair), IPDPS'21, IPDPS'20 (primary PC), NeurIPS'20, HPDC'18-20 (travel chair), ICPP'19-20, SC'18, ASPLOS'18 (shadow PC), Cluster'18 (poster chair), IPDPS'17, NAS (2016-2017), SC'15, CCGrid (2012-2018), Cluster (2015-2020), ISC (2013-2016), etc.
- External reviewers: PPoPP'14, SC'13, Euro-par'12, HPDC'11, etc.
- Editorial positions: TPDS review board member; co-editor, Special Issue of the International Journal of High Performance Computing Applications for the Fourth International Workshop on Accelerators and Hybrid Exascale Systems
- PC (co-) chair: PASA (2012, 2013, and 2016), HP-PAC (2013 and 2014)
- Steering committee: PASA (2014 and 2015)
- Technical reviewer for major journals: IEEE Transactions on Parallel and Distributed Systems (TPDS), IEEE Transactions on Reliability (TR), IEEE Transactions on Computers (TC), Journal of Parallel and Distributed Computing (JPDC), Journal of Supercomputing, International Journal of High Performance Computing Applications, etc.
Current Students
- Jiawen Liu (PhD student, since 2017 Fall)
- Wenqian Dong (PhD student, since 2017 Fall)
- Jie Ren (PhD student, since 2017 Fall)
- Jie Liu (PhD student, since 2019 Spring)
- Shuangyan Yang (PhD student, since 2021 Spring)
- Dong Xu (PhD student, since 2021 Fall)
Alumni
- Zhen Xie (Postdoc, left in May 2021. First employment: Argonne National Lab)
- Kai Wu (PhD student, graduated in May 2021. First employment: ByteDance)
- Luanzheng Guo (PhD student, graduated in Oct 2020. First employment: Pacific Northwest National Lab)
- Neelam Sinha (Master student, graduated in 2020. First employment: National Cancer Institute)
- Hanlin He (Master student, graduated in 2018. First employment: Byton)
- Wei Liu (Master student, graduated in 2017. First employment: Ctrip)
- Himanshu Pillai (Master student, graduated in 2016. First employment: Barcelona Supercomputer Center)
- Armando Montanez (Undergraduate student, graduated in 2018 as a UC Merced outstanding student. First employment: Google)
- Jing Liang (Undergraduate student, graduated in 2017)
- Nigel Tan (Undergraduate student, graduated in 2017. Joined Rice as a PhD student)
- Hanlin He (Undergraduate student, graduated in 2016 as a UC Merced outstanding student. Joined PASA as a PhD student)
- Zachary Canann (Undergraduate student, graduated in 2016 as a UC Merced outstanding student. First employment: PayPal)
- Kevin Song (Undergraduate student, graduated in 2015 as a UC Merced outstanding student. Joined UT Austin as a PhD student)
Sponsors
- National Science Foundation
- Lawrence Livermore National Lab
- Argonne National Lab
- Lawrence Berkeley National Lab
- SK Hynix
- Intel (Equipment donation)
- Xilinx (Equipment donation)
- University of California, Merced