Shaohuai Shi
2020 – today
- 2024
- [c35] Hucheng Liu, Shaohuai Shi, Xuan Wang, Zoe L. Jiang, Qian Chen: Performance Analysis and Optimizations of Matrix Multiplications on ARMv8 Processors. DATE 2024: 1-6
- [c34] Shaohuai Shi, Xinglin Pan, Qiang Wang, Chengjian Liu, Xiaozhe Ren, Zhongzhe Hu, Yu Yang, Bo Li, Xiaowen Chu: ScheMoE: An Extensible Mixture-of-Experts Distributed Training System with Tasks Scheduling. EuroSys 2024: 236-249
- [c33] Zhenheng Tang, Yonggang Zhang, Shaohuai Shi, Xinmei Tian, Tongliang Liu, Bo Han, Xiaowen Chu: FedImpro: Measuring and Improving Client Update in Federated Learning. ICLR 2024
- [c32] Jing Peng, Zihan Li, Shaohuai Shi, Bo Li: Sparse Gradient Communication with AlltoAll for Accelerating Distributed Deep Learning. ICPP 2024: 148-157
- [c31] Zichen Tang, Junlin Huang, Rudan Yan, Yuxin Wang, Zhenheng Tang, Shaohuai Shi, Amelie Chi Zhou, Xiaowen Chu: Bandwidth-Aware and Overlap-Weighted Compression for Communication-Efficient Federated Learning. ICPP 2024: 866-875
- [c30] Xinglin Pan, Wenxiang Lin, Shaohuai Shi, Xiaowen Chu, Weinong Sun, Bo Li: Parm: Efficient Training of Large Sparsely-Activated Models with Dedicated Schedules. INFOCOM 2024: 1880-1889
- [c29] Yizhou Luo, Qiang Wang, Shaohuai Shi, Jiaxin Lai, Shuhan Qi, Jiajia Zhang, Xuan Wang: Scheduling Deep Learning Jobs in Multi-Tenant GPU Clusters via Wise Resource Sharing. IWQoS 2024: 1-10
- [i41] Zhenheng Tang, Yonggang Zhang, Shaohuai Shi, Xinmei Tian, Tongliang Liu, Bo Han, Xiaowen Chu: FedImpro: Measuring and Improving Client Update in Federated Learning. CoRR abs/2402.07011 (2024)
- [i40] Xinglin Pan, Wenxiang Lin, Shaohuai Shi, Xiaowen Chu, Weinong Sun, Bo Li: Parm: Efficient Training of Large Sparsely-Activated Models with Dedicated Schedules. CoRR abs/2407.00599 (2024)
- [i39] Yizhou Luo, Qiang Wang, Shaohuai Shi, Jiaxin Lai, Shuhan Qi, Jiajia Zhang, Xuan Wang: Scheduling Deep Learning Jobs in Multi-Tenant GPU Clusters via Wise Resource Sharing. CoRR abs/2407.13088 (2024)
- [i38] Zichen Tang, Junlin Huang, Rudan Yan, Yuxin Wang, Zhenheng Tang, Shaohuai Shi, Amelie Chi Zhou, Xiaowen Chu: Bandwidth-Aware and Overlap-Weighted Compression for Communication-Efficient Federated Learning. CoRR abs/2408.14736 (2024)
- 2023
- [j4] Lin Zhang, Shaohuai Shi, Wei Wang, Bo Li: Scalable K-FAC Training for Deep Neural Networks With Distributed Preconditioning. IEEE Trans. Cloud Comput. 11(3): 2365-2378 (2023)
- [j3] Zhenheng Tang, Shaohuai Shi, Bo Li, Xiaowen Chu: GossipFL: A Decentralized Federated Learning Framework With Sparsified and Adaptive Communication. IEEE Trans. Parallel Distributed Syst. 34(3): 909-922 (2023)
- [c28] Lin Zhang, Shaohuai Shi, Xiaowen Chu, Wei Wang, Bo Li, Chengjian Liu: DeAR: Accelerating Distributed Deep Learning with Fine-Grained All-Reduce Pipelining. ICDCS 2023: 142-153
- [c27] Lin Zhang, Longteng Zhang, Shaohuai Shi, Xiaowen Chu, Bo Li: Evaluation and Optimization of Gradient Compression for Distributed Deep Learning. ICDCS 2023: 361-371
- [c26] Lin Zhang, Shaohuai Shi, Bo Li: Eva: Practical Second-order Optimization with Kronecker-vectorized Approximation. ICLR 2023
- [c25] Shaohuai Shi, Xinglin Pan, Xiaowen Chu, Bo Li: PipeMoE: Accelerating Mixture-of-Experts through Adaptive Pipelining. INFOCOM 2023: 1-10
- [c24] Lin Zhang, Shaohuai Shi, Bo Li: Accelerating Distributed K-FAC with Efficient Collective Communication and Scheduling. INFOCOM 2023: 1-10
- [i37] Lin Zhang, Shaohuai Shi, Xiaowen Chu, Wei Wang, Bo Li, Chengjian Liu: Decoupling the All-Reduce Primitive for Accelerating Distributed Deep Learning. CoRR abs/2302.12445 (2023)
- [i36] Zhenheng Tang, Xiaowen Chu, Ryan Yide Ran, Sunwoo Lee, Shaohuai Shi, Yonggang Zhang, Yuxin Wang, Alex Qiaozhong Liang, Salman Avestimehr, Chaoyang He: FedML Parrot: A Scalable Federated Learning System via Heterogeneity-aware Scheduling on Sequential and Hierarchical Training. CoRR abs/2303.01778 (2023)
- [i35] Lin Zhang, Longteng Zhang, Shaohuai Shi, Xiaowen Chu, Bo Li: Evaluation and Optimization of Gradient Compression for Distributed Deep Learning. CoRR abs/2306.08881 (2023)
- [i34] Chen Qiu, Yulin Wu, Weixin Huang, Botao Liu, Shaohuai Shi, Xuan Wang: A Generic Multi-Player Transformation Algorithm for Solving Large-Scale Zero-Sum Extensive-Form Adversarial Team Games. CoRR abs/2307.01441 (2023)
- [i33] Lin Zhang, Shaohuai Shi, Bo Li: Eva: A General Vectorized Approximation Framework for Second-order Optimization. CoRR abs/2308.02123 (2023)
- [i32] Longteng Zhang, Lin Zhang, Shaohuai Shi, Xiaowen Chu, Bo Li: LoRA-FA: Memory-efficient Low-rank Adaptation for Large Language Models Fine-tuning. CoRR abs/2308.03303 (2023)
- [i31] Zhenheng Tang, Yuxin Wang, Xin He, Longteng Zhang, Xinglin Pan, Qiang Wang, Rongfei Zeng, Kaiyong Zhao, Shaohuai Shi, Bingsheng He, Xiaowen Chu: FusionAI: Decentralized Training and Deploying LLMs with Massive Consumer-Level GPUs. CoRR abs/2309.01172 (2023)
- [i30] Yuxin Wang, Shaohuai Shi, Xin He, Zhenheng Tang, Xinglin Pan, Yang Zheng, Xiaoyu Wu, Amelie Chi Zhou, Bingsheng He, Xiaowen Chu: Reliable and Efficient In-Memory Fault Tolerance of Large Language Model Pretraining. CoRR abs/2310.12670 (2023)
- [i29] Longteng Zhang, Xiang Liu, Zeyu Li, Xinglin Pan, Peijie Dong, Ruibo Fan, Rui Guo, Xin Wang, Qiong Luo, Shaohuai Shi, Xiaowen Chu: Dissecting the Runtime Performance of the Training, Fine-tuning, and Inference of Large Language Models. CoRR abs/2311.03687 (2023)
- 2022
- [c23] Qiang Wang, Shaohuai Shi, Kaiyong Zhao, Xiaowen Chu: EASNet: Searching Elastic and Accurate Network Architecture for Stereo Matching. ECCV (32) 2022: 437-453
- [c22] Zhenheng Tang, Yonggang Zhang, Shaohuai Shi, Xin He, Bo Han, Xiaowen Chu: Virtual Homogeneity Learning: Defending against Data Heterogeneity in Federated Learning. ICML 2022: 21111-21132
- [i28] Yang Xiang, Zhihua Wu, Weibao Gong, Siyu Ding, Xianjie Mo, Yuang Liu, Shuohuan Wang, Peng Liu, Yongshuai Hou, Long Li, Bin Wang, Shaohuai Shi, Yaqian Han, Yue Yu, Ge Li, Yu Sun, Yanjun Ma, Dianhai Yu: Nebula-I: A General Framework for Collaboratively Training Deep Learning Models on Low-Bandwidth Cloud Clusters. CoRR abs/2205.09470 (2022)
- [i27] Zhenheng Tang, Yonggang Zhang, Shaohuai Shi, Xin He, Bo Han, Xiaowen Chu: Virtual Homogeneity Learning: Defending against Data Heterogeneity in Federated Learning. CoRR abs/2206.02465 (2022)
- [i26] Lin Zhang, Shaohuai Shi, Wei Wang, Bo Li: Scalable K-FAC Training for Deep Neural Networks with Distributed Preconditioning. CoRR abs/2206.15143 (2022)
- [i25] Qiang Wang, Shaohuai Shi, Kaiyong Zhao, Xiaowen Chu: EASNet: Searching Elastic and Accurate Network Architecture for Stereo Matching. CoRR abs/2207.09796 (2022)
- [i24] Shaohuai Shi, Qing Yang, Yang Xiang, Shuhan Qi, Xuan Wang: An Efficient Split Fine-tuning Framework for Edge and Cloud Collaborative Learning. CoRR abs/2211.16703 (2022)
- 2021
- [j2] Shaohuai Shi, Zhenheng Tang, Xiaowen Chu, Chengjian Liu, Wei Wang, Bo Li: A Quantitative Survey of Communication Optimizations in Distributed Deep Learning. IEEE Netw. 35(3): 230-237 (2021)
- [j1] Shaohuai Shi, Xiaowen Chu, Bo Li: MG-WFBP: Merging Gradients Wisely for Efficient Communication in Distributed Deep Learning. IEEE Trans. Parallel Distributed Syst. 32(8): 1903-1917 (2021)
- [c21] Xin He, Shihao Wang, Xiaowen Chu, Shaohuai Shi, Jiangping Tang, Xin Liu, Chenggang Yan, Jiyong Zhang, Guiguang Ding: Automated Model Design and Benchmarking of Deep Learning Models for COVID-19 Detection with Chest CT Scans. AAAI 2021: 4821-4829
- [c20] Shaohuai Shi, Lin Zhang, Bo Li: Accelerating Distributed K-FAC with Smart Parallelism of Computing and Communication Tasks. ICDCS 2021: 550-560
- [c19] Shaohuai Shi, Xiaowen Chu, Bo Li: Exploiting Simultaneous Communications to Accelerate Data Parallel Distributed Deep Learning. INFOCOM 2021: 1-10
- [c18] Shaohuai Shi, Xianhao Zhou, Shutao Song, Xingyao Wang, Zilin Zhu, Xue Huang, Xinan Jiang, Feihu Zhou, Zhenyu Guo, Liqiang Xie, Rui Lan, Xianbin Ouyang, Yan Zhang, Jieqian Wei, Jing Gong, Weiliang Lin, Ping Gao, Peng Meng, Xiaomin Xu, Chenyang Guo, Bo Yang, Zhibo Chen, Yongjian Wu, Xiaowen Chu: Towards Scalable Distributed Training of Deep Learning on Public Cloud Clusters. MLSys 2021
- [i23] Xin He, Shihao Wang, Xiaowen Chu, Shaohuai Shi, Jiangping Tang, Xin Liu, Chenggang Yan, Jiyong Zhang, Guiguang Ding: Automated Model Design and Benchmarking of 3D Deep Learning Models for COVID-19 Detection with Chest CT Scans. CoRR abs/2101.05442 (2021)
- [i22] Shaohuai Shi, Lin Zhang, Bo Li: Accelerating Distributed K-FAC with Smart Parallelism of Computing and Communication Tasks. CoRR abs/2107.06533 (2021)
- [i21] Qiang Wang, Shaohuai Shi, Shizhen Zheng, Kaiyong Zhao, Xiaowen Chu: FADNet++: Real-Time and Accurate Disparity Estimation with Configurable Networks. CoRR abs/2110.02582 (2021)
- 2020
- [c17] Yuxin Wang, Qiang Wang, Shaohuai Shi, Xin He, Zhenheng Tang, Kaiyong Zhao, Xiaowen Chu: Benchmarking the Performance and Energy Efficiency of AI Accelerators for AI Training. CCGRID 2020: 744-751
- [c16] Shaohuai Shi, Zhenheng Tang, Qiang Wang, Kaiyong Zhao, Xiaowen Chu: Layer-Wise Adaptive Gradient Sparsification for Distributed Deep Learning with Convergence Guarantees. ECAI 2020: 1467-1474
- [c15] Zhenheng Tang, Shaohuai Shi, Xiaowen Chu: Communication-Efficient Decentralized Learning with Sparsification and Adaptive Peer Selection. ICDCS 2020: 1207-1208
- [c14] Shaohuai Shi, Qiang Wang, Xiaowen Chu: Efficient Sparse-Dense Matrix-Matrix Multiplication on GPUs Using the Customized Sparse Storage Format. ICPADS 2020: 19-26
- [c13] Qiang Wang, Shaohuai Shi, Shizhen Zheng, Kaiyong Zhao, Xiaowen Chu: FADNet: A Fast and Accurate Network for Disparity Estimation. ICRA 2020: 101-107
- [c12] Shaohuai Shi, Qiang Wang, Xiaowen Chu, Bo Li, Yang Qin, Ruihao Liu, Xinxiao Zhao: Communication-Efficient Distributed Deep Learning with Merged Gradient Sparsification on GPUs. INFOCOM 2020: 406-415
- [i20] Zhenheng Tang, Shaohuai Shi, Xiaowen Chu: Communication-Efficient Decentralized Learning with Sparsification and Adaptive Peer Selection. CoRR abs/2002.09692 (2020)
- [i19] Qiang Wang, Shaohuai Shi, Canhui Wang, Xiaowen Chu: Communication Contention Aware Scheduling of Multiple Deep Learning Training Jobs. CoRR abs/2002.10105 (2020)
- [i18] Zhenheng Tang, Shaohuai Shi, Xiaowen Chu, Wei Wang, Bo Li: Communication-Efficient Distributed Deep Learning: A Comprehensive Survey. CoRR abs/2003.06307 (2020)
- [i17] Qiang Wang, Shaohuai Shi, Shizhen Zheng, Kaiyong Zhao, Xiaowen Chu: FADNet: A Fast and Accurate Network for Disparity Estimation. CoRR abs/2003.10758 (2020)
- [i16] Shaohuai Shi, Zhenheng Tang, Xiaowen Chu, Chengjian Liu, Wei Wang, Bo Li: Communication-Efficient Distributed Deep Learning: Survey, Evaluation, and Challenges. CoRR abs/2005.13247 (2020)
- [i15] Shaohuai Shi, Qiang Wang, Xiaowen Chu: Efficient Sparse-Dense Matrix-Matrix Multiplication on GPUs Using the Customized Sparse Storage Format. CoRR abs/2005.14469 (2020)
- [i14] Shaohuai Shi, Xianhao Zhou, Shutao Song, Xingyao Wang, Zilin Zhu, Xue Huang, Xinan Jiang, Feihu Zhou, Zhenyu Guo, Liqiang Xie, Rui Lan, Xianbin Ouyang, Yan Zhang, Jieqian Wei, Jing Gong, Weiliang Lin, Ping Gao, Peng Meng, Xiaomin Xu, Chenyang Guo, Bo Yang, Zhibo Chen, Yongjian Wu, Xiaowen Chu: Towards Scalable Distributed Training of Deep Learning on Public Cloud Clusters. CoRR abs/2010.10458 (2020)
2010 – 2019
- 2019
- [c11] Xin He, Xiaoming Liu, Zhili Wu, Wu Yu, Xiaowen Chu, Shihao Wang, Shaohuai Shi, Zhenheng Tang, Yuxin Wang, Zhihao Zhao, Jing Dai, Ronghao Ni, Xiaofeng Zhang: Computer-Aided Clinical Skin Disease Diagnosis Using CNN and Object Detection Models. IEEE BigData 2019: 4839-4844
- [c10] Shaohuai Shi, Qiang Wang, Kaiyong Zhao, Zhenheng Tang, Yuxin Wang, Xiang Huang, Xiaowen Chu: A Distributed Synchronous SGD Algorithm with Global Top-k Sparsification for Low Bandwidth Networks. ICDCS 2019: 2238-2247
- [c9] Shaohuai Shi, Kaiyong Zhao, Qiang Wang, Zhenheng Tang, Xiaowen Chu: A Convergence Analysis of Distributed SGD with Communication-Efficient Gradient Sparsification. IJCAI 2019: 3411-3417
- [c8] Shaohuai Shi, Xiaowen Chu, Bo Li: MG-WFBP: Efficient Data Communication for Distributed Synchronous SGD Algorithms. INFOCOM 2019: 172-180
- [i13] Shaohuai Shi, Qiang Wang, Kaiyong Zhao, Zhenheng Tang, Yuxin Wang, Xiang Huang, Xiaowen Chu: A Distributed Synchronous SGD Algorithm with Global Top-k Sparsification for Low Bandwidth Networks. CoRR abs/1901.04359 (2019)
- [i12] Xin He, Shihao Wang, Shaohuai Shi, Zhenheng Tang, Yuxin Wang, Zhihao Zhao, Jing Dai, Ronghao Ni, Xiaofeng Zhang, Xiaoming Liu, Zhili Wu, Wu Yu, Xiaowen Chu: Computer-Aided Clinical Skin Disease Diagnosis Using CNN and Object Detection Models. CoRR abs/1911.08705 (2019)
- [i11] Shaohuai Shi, Zhenheng Tang, Qiang Wang, Kaiyong Zhao, Xiaowen Chu: Layer-wise Adaptive Gradient Sparsification for Distributed Deep Learning with Convergence Guarantees. CoRR abs/1911.08727 (2019)
- [i10] Shaohuai Shi, Xiaowen Chu, Ka Chun Cheung, Simon See: Understanding Top-k Sparsification in Distributed Deep Learning. CoRR abs/1911.08772 (2019)
- [i9] Shaohuai Shi, Xiaowen Chu, Bo Li: MG-WFBP: Merging Gradients Wisely for Efficient Communication in Distributed Deep Learning. CoRR abs/1912.09268 (2019)
- 2018
- [c7] Shaohuai Shi, Qiang Wang, Xiaowen Chu: Performance Modeling and Evaluation of Distributed Deep Learning Frameworks on GPUs. DASC/PiCom/DataCom/CyberSciTech 2018: 949-957
- [c6] Shaohuai Shi, Qiang Wang, Xiaowen Chu, Bo Li: A DAG Model of Synchronous Stochastic Gradient Descent in Distributed Deep Learning. ICPADS 2018: 425-432
- [i8] Shaohuai Shi, Qiang Wang, Xiaowen Chu, Bo Li: Modeling and Evaluation of Synchronous Stochastic Gradient Descent in Distributed Deep Learning on Multiple GPUs. CoRR abs/1805.03812 (2018)
- [i7] Xianyan Jia, Shutao Song, Wei He, Yangzihao Wang, Haidong Rong, Feihu Zhou, Liqiang Xie, Zhenyu Guo, Yuanzhou Yang, Liwei Yu, Tiegang Chen, Guangxiao Hu, Shaohuai Shi, Xiaowen Chu: Highly Scalable Deep Learning Training System with Mixed-Precision: Training ImageNet in Four Minutes. CoRR abs/1807.11205 (2018)
- [i6] Shaohuai Shi, Xiaowen Chu: MG-WFBP: Efficient Data Communication for Distributed Synchronous SGD Algorithms. CoRR abs/1811.11141 (2018)
- 2017
- [c5] Pengfei Xu, Shaohuai Shi, Xiaowen Chu: Performance Evaluation of Deep Learning Tools in Docker Containers. BigCom 2017: 395-403
- [c4] Shaohuai Shi, Pengfei Xu, Xiaowen Chu: Supervised Learning Based Algorithm Selection for Deep Neural Networks. ICPADS 2017: 344-351
- [i5] Shaohuai Shi, Pengfei Xu, Xiaowen Chu: Improving the Performance of Fully Connected Neural Networks by Out-of-Place Matrix Transpose. CoRR abs/1702.03192 (2017)
- [i4] Shaohuai Shi, Xiaowen Chu: Speeding up Convolutional Neural Networks By Exploiting the Sparsity of Rectifier Units. CoRR abs/1704.07724 (2017)
- [i3] Pengfei Xu, Shaohuai Shi, Xiaowen Chu: Performance Evaluation of Deep Learning Tools in Docker Containers. CoRR abs/1711.03386 (2017)
- [i2] Shaohuai Shi, Xiaowen Chu: Performance Modeling and Evaluation of Distributed Deep Learning Frameworks on GPUs. CoRR abs/1711.05979 (2017)
- 2016
- [c3] Shaohuai Shi, Qiang Wang, Pengfei Xu, Xiaowen Chu: Benchmarking State-of-the-Art Deep Learning Software Tools. CCBD 2016: 99-104
- [i1] Shaohuai Shi, Qiang Wang, Pengfei Xu, Xiaowen Chu: Benchmarking State-of-the-Art Deep Learning Software Tools. CoRR abs/1608.07249 (2016)
- 2011
- [c2] Shuhan Qi, Xuan Wang, Shaohuai Shi: Mixed Precision Method for GPU-based FFT. CSE 2011: 580-586
- 2010
- [c1] Jiang-Feng Peng, Hu Chen, Shaohuai Shi: The GPU-based String Matching System in Advanced AC Algorithm. CIT 2010: 1158-1163
last updated on 2024-10-15 00:20 CEST by the dblp team
all metadata released as open data under CC0 1.0 license