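The Suzhou Key Laboratory of Embedded System Software was established in 2008. Its stated goal is to "address major national needs by building an energy-efficient device-edge-cloud intelligent computing platform capable of processing massive data, in support of teaching, research, and enterprise technology innovation in fields such as big data and artificial intelligence." The laboratory's main tasks are: targeting artificial intelligence applications and centering on the design of heterogeneous computing hardware/software systems, to study performance and power consumption at the application, system, and architecture levels, and to provide hardware/software co-optimization methods for terminal and cloud computing environments; building on domestically developed intelligent processors, to develop energy-efficient intelligent computing platforms and promote their adoption in smart terminal devices, intelligent servers, and cloud computing centers; and to build energy-efficient computing infrastructure that forms an industry-academia-research innovation platform serving AI technology innovation, talent development, and industrial growth.
The laboratory occupies more than 600 square meters in total. It has over 20 staff members and over 50 enrolled graduate students; its researchers come mainly from the University of Science and Technology of China. The laboratory focuses on system-level design and optimization. Its main research directions include domain-specific heterogeneous multicore architectures (including intelligent processors and heterogeneous computing systems), embedded systems (including real-time systems and reconfigurable computing systems), and parallel and distributed systems (including high-performance computing systems and cloud computing systems).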
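Principal researchers:
No. | Name | Title | Main research directions
1 | Xi Li (李曦) | Professor | Real-time operating systems, time-predictable systems
2 | Xuehai Zhou (周学海) | Professor | Heterogeneous multicore architectures, reconfigurable systems, domain-specific acceleration systems
3 | Huaping Chen (陈华平) | Professor | High-performance computing, parallel and distributed computing
4 | Chao Wang (王超) | Associate Professor | Intelligent computer architecture, FPGA hardware acceleration systems, neural network processors
5 | Zongwei Zhu (朱宗卫) | Associate Research Fellow | Edge intelligent computing, mobile device operating systems
6 | Lei Gong (宫磊) | Associate Research Fellow | FPGA hardware acceleration systems, neural network processors
7 | Teng Wang (王腾) | Associate Research Fellow | Intelligent computer architecture, FPGA hardware acceleration systems, neural network processors
8 | Xianglan Chen (陈香兰) | Lecturer | Operating systems, time-predictable computing
9 | Jing Li (李京) | Professor | Compositional software technology, large-scale network systems, and distributed algorithms
10 | Yinlong Xu (许胤龙) | Professor | Storage systems, data processing, high-performance computing
11 | Guangzhong Sun (孙广中) | Professor | High-performance computing and algorithm optimization, big data processing and applications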
Selected recent publications:
[1]. Lei Gong, Chao Wang, Haojun Xia, Xianglan Chen, Xi Li, Xuehai Zhou: Enabling Fast and Memory-Efficient Acceleration for Pattern Matching Workloads: The Lightweight Automata Processing Engine. IEEE Trans. Computers 72(4): 1011-1025 (2023)
[2]. Wenqi Lou, Lei Gong, Chao Wang, Zidong Du, Xuehai Zhou: OctCNN: A High Throughput FPGA Accelerator for CNNs Using Octave Convolution Algorithm. IEEE Trans. Computers 71(8): 1847-1859 (2022)
[3]. Yuanbo Wen, Qi Guo, Zidong Du, Jianxing Xu, Zhenxing Zhang, Xing Hu, Wei Li, Rui Zhang, Chao Wang, Xuehai Zhou, Tianshi Chen: Enabling One-Size-Fits-All Compilation Optimization for Inference across Machine Learning Computers. IEEE Trans. Computers 71(9): 2313-2326 (2022)
[4]. Teng Wang, Lei Gong, Chao Wang, Yang Yang, Yingxue Gao, Xuehai Zhou, Huaping Chen: ViA: A Novel Vision-Transformer Accelerator Based on FPGA. IEEE Trans. Comput. Aided Des. Integr. Circuits Syst. 41(11): 4088-4099 (2022)
[5]. Weihong Liu, Jiawei Geng, Zongwei Zhu, Jing Cao, Zirui Lian: Sniper: cloud-edge collaborative inference scheduling with neural network similarity modeling. DAC 2022: 505-510
[6]. Yuanbo Wen, Qi Guo, Qiang Fu, Xiaqing Li, Jianxing Xu, Yanlin Tang, Yongwei Zhao, Xing Hu, Zidong Du, Ling Li, Chao Wang, Xuehai Zhou, Yunji Chen: BabelTower: Learning to Auto-parallelized Program Translation. ICML 2022: 23685-23700
[7]. Chao Wang, Lei Gong, Fahui Jia, Xuehai Zhou: An FPGA Based Accelerator for Clustering Algorithms With Custom Instructions. IEEE Trans. Computers 70(5): 725-732 (2021)
[8]. Chao Wang, Lihui Jin, Lei Gong, Chongchong Xu, Yahui Hu, Luchao Tan, Xuehai Zhou: Tinker: A Middleware for Deploying Multiple NN-Based Applications on a Single Machine. IEEE Trans. Comput. Aided Des. Integr. Circuits Syst. 40(7): 1495-1499 (2021)
[9]. Lei Gong, Chao Wang, Xi Li, Xuehai Zhou: Improving HW/SW Adaptability for Accelerating CNNs on FPGAs through a Dynamic/Static Co-Reconfiguration Approach. IEEE Trans. Parallel Distributed Syst. 32(7): 1854-1865 (2021)
[10]. Chao Wang, Lei Gong, Xi Li, Qi Yu, Aili Wang, Patrick Hung, Xuehai Zhou: SOLAR: Services-Oriented Deep Learning Architectures-Deep Learning as a Service. IEEE Trans. Serv. Comput. 14(1): 262-273 (2021)
[11]. Changlong Li, Hang Zhuang, Qingfeng Wang, Chao Wang, Xuehai Zhou: LKSM: Light Weight Key-Value Store for Efficient Application Services on Local Distributed Mobile Devices. IEEE Trans. Serv. Comput. 14(4): 1026-1039 (2021)
[12]. Jing Cao, Zirui Lian, Weihong Liu, Zongwei Zhu, Cheng Ji: HADFL: Heterogeneity-aware Decentralized Federated Learning Framework. DAC 2021: 1-6
[13]. Xi Zeng, Tian Zhi, Xuda Zhou, Zidong Du, Qi Guo, Shaoli Liu, Bingrui Wang, Yuanbo Wen, Chao Wang, Xuehai Zhou, Ling Li, Tianshi Chen, Ninghui Sun, Yunji Chen: Addressing Irregularity in Sparse Neural Networks through a Cooperative Software/Hardware Approach. IEEE Trans. Computers 69(7): 968-985 (2020)
[14]. Shengyuan Zhou, Qi Guo, Zidong Du, Dao-Fu Liu, Tianshi Chen, Ling Li, Shaoli Liu, Jinhong Zhou, Olivier Temam, Xiaobing Feng, Xuehai Zhou, Yunji Chen: ParaML: A Polyvalent Multicore Accelerator for Machine Learning. IEEE Trans. Comput. Aided Des. Integr. Circuits Syst. 39(9): 1764-1777 (2020)
[15]. Chao Wang, Lei Gong, Xiang Ma, Xi Li, Xuehai Zhou: WooKong: A Ubiquitous Accelerator for Recommendation Algorithms with Custom Instruction Sets on FPGA. IEEE Trans. Computers 69(7): 1071-1082 (2020)
[16]. Xuan Wang, Chao Wang, Jing Cao, Lei Gong, Xuehai Zhou: WinoNN: Optimizing FPGA-Based Convolutional Neural Network Accelerators Using Sparse Winograd Algorithm. IEEE Trans. Comput. Aided Des. Integr. Circuits Syst. 39(11): 4290-4302 (2020)
[17]. Chao Wang, Lei Gong, Xi Li, Xuehai Zhou: A Ubiquitous Machine Learning Accelerator with Automatic Parallelization on FPGA. IEEE Trans. Parallel Distributed Syst. 31(10): 2346-2359 (2020)
[18]. Lei Gong, Chao Wang, Xi Li, Huaping Chen, Xuehai Zhou: MALOC: A Fully Pipelined FPGA Accelerator for Convolutional Neural Networks with All Layers Mapped on Chip. IEEE Trans. Comput. Aided Des. Integr. Circuits Syst. 37(11): 2601-2612 (2018)
[19]. Xuda Zhou, Zidong Du, Qi Guo, Shaoli Liu, Chengsi Liu, Chao Wang, Xuehai Zhou, Ling Li, Tianshi Chen, Yunji Chen: Cambricon-S: Addressing Irregularity in Sparse Neural Networks through a Cooperative Software/Hardware Approach. MICRO 2018: 15-28
[20]. Chao Wang, Xi Li, Aili Wang, Xuehai Zhou: A Classroom Scheduling Service for Smart Classes. IEEE Trans. Serv. Comput. 10(2): 155-164 (2017)
[21]. Chao Wang, Xi Li, Yunji Chen, Youhui Zhang, Oliver Diessel, Xuehai Zhou: Service-Oriented Architecture on FPGA-Based MPSoC. IEEE Trans. Parallel Distributed Syst. 28(10): 2993-3006 (2017)
[22]. Chao Wang, Lei Gong, Qi Yu, Xi Li, Yuan Xie, Xuehai Zhou: DLAU: A Scalable Deep Learning Accelerator Unit on FPGA. IEEE Trans. Comput. Aided Des. Integr. Circuits Syst. 36(3): 513-517 (2017)
[23]. Bo Wan, Xi Li, Haizhao Luo, Chao Wang, Xianglan Chen, Xuehai Zhou: Work-in-Progress: TTI: A Timing ISA for LET Model in Safety-Critical Systems. RTSS 2017: 363-365
[24]. Chao Wang, Junneng Zhang, Xi Li, Aili Wang, Xuehai Zhou: Hardware Implementation on FPGA for Task-Level Parallel Dataflow Execution Engine. IEEE Trans. Parallel Distributed Syst. 27(8): 2303-2315 (2016)
[25]. Chao Wang, Xi Li, Junneng Zhang, Peng Chen, Yunji Chen, Xuehai Zhou, Ray C. C. Cheung: Architecture Support for Task Out-of-Order Execution in MPSoCs. IEEE Trans. Computers 64(5): 1296-1310 (2015)
[26]. Shaoli Liu, Tianshi Chen, Ling Li, Xi Li, Mingzhe Zhang, Chao Wang, Haibo Meng, Xuehai Zhou, Yunji Chen: FreeRider: Non-Local Adaptive Network-on-Chip Routing with Packet-Carried Propagation of Congestion Information. IEEE Trans. Parallel Distributed Syst. 26(8): 2272-2285 (2015)
[27]. Dao-Fu Liu, Tianshi Chen, Shaoli Liu, Jinhong Zhou, Shengyuan Zhou, Olivier Temam, Xiaobing Feng, Xuehai Zhou, Yunji Chen: PuDianNao: A Polyvalent Machine Learning Accelerator. ASPLOS 2015: 369-381
[28]. Chao Wang, Xi Li, Junneng Zhang, Xuehai Zhou, Xiaoning Nie: MP-Tomasulo: A Dependency-Aware Automatic Parallel Execution Engine for Sequential Programs. ACM Trans. Archit. Code Optim. 10(2): 9:1-9:26 (2013)
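Energy-Efficient Intelligent Computing Infrastructure
Building on domestically developed intelligent chips, the laboratory has developed and built an independent, secure, and controllable energy-efficient intelligent computing cluster; designed and deployed an ultra-large-capacity distributed storage system; and completed the development of several representative demonstration application systems.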
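Cooperation and Services
The laboratory offers the following cooperation and services to external partners:
1. Energy-efficient artificial intelligence cloud computing services;
2. Development and promotion of representative demonstration applications;
3. Collaborative research and academic exchange with research institutions;
4. Cooperation on AI innovation, R&D, and the commercialization of research results.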
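Contact
Address: 188 Ren'ai Road, Suzhou Industrial Park, Suzhou, Jiangsu Province Postal code: 215123
Suzhou Institute for Advanced Research, University of Science and Technology of China
Embedded Systems Laboratory: Chao Wang (王超) cswang@ustc.edu.cn 0512-62888062
Joint Laboratory for Energy-Efficient Intelligent Computing: Zongwei Zhu (朱宗卫) zzw1988@ustc.edu.cn 0512-62886872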