Dr. Yanzhi Wang
Assistant Professor, 2015-present
329 Dana, 360 Huntington Avenue
Boston, MA 02115
Yanzhi Wang is currently an assistant professor in the Department of Electrical and Computer Engineering, with an affiliated appointment in the Khoury College of Computer Sciences, at Northeastern University. He received his Ph.D. degree in Computer Engineering from the University of Southern California (USC) in 2014, under the supervision of Prof. Massoud Pedram, and received the Ming Hsieh Scholar Award (the highest honor in the EE Dept. of USC) for his Ph.D. study. He received his B.S. degree in Electronic Engineering from Tsinghua University in 2009, with distinction from both the university and Beijing city.
Dr. Wang’s current research interests are listed below. His group works on both algorithms and actual implementations (mobile and embedded systems, FPGAs, circuit tapeouts, GPUs, emerging devices, and UAVs).
- Real-time and energy-efficient deep learning and artificial intelligence systems
- Model compression of deep neural networks (DNNs)
- Neuromorphic computing and non-von Neumann computing paradigms
- Cyber-security in deep learning systems
Briefly, his research (i) has achieved and maintained the highest model compression rates on representative DNNs since 09/2018 (ECCV18, ASPLOS19, ICCV19, ISLPED19, ASP-DAC20, AAAI20-1, AAAI20-2, etc.); (ii) achieved, for the first time, real-time and the fastest execution of representative large-scale DNNs on a mobile device (ASPLOS20, AAAI20, ICML19); and (iii) achieved the highest performance and energy efficiency in DNN implementations on many platforms (FPGA19, ISLPED19, AAAI19, HPCA19, ISSCC19, ASP-DAC20, DATE20, AAAI20, PLDI20, ICS20, IJCAI20). It is worth mentioning that his work on AQFP superconducting-based DNN inference acceleration, validated through cryogenic testing, has by far the highest energy efficiency among all hardware devices (ISCA19, ICCAD18).
His research has been published broadly in top conference and journal venues, including (i) EDA, solid-state circuit, and system conferences such as DAC, ICCAD, DATE, ISLPED, FPGA, LCTES, ISSCC, etc.; (ii) architecture and computer systems conferences such as ASPLOS, ISCA, MICRO, HPCA, CCS, VLDB, PLDI, ICS, INFOCOM, ICDCS, etc.; (iii) machine learning conferences such as AAAI, CVPR, ICML, ICCV, ICLR, IJCAI, ECCV, ACM MM, ICDM, etc.; and (iv) IEEE and ACM transactions with impact factors up to 11.7. He ranks No. 2 in CSRankings at Northeastern University over the past 10 years, and around No. 35 in the U.S. His work has been cited around 6,600 times according to Google Scholar, with an h-index of 36. He has received four Best Paper Awards, ten Best Paper Nominations, and four Popular Papers in IEEE TCAD. He has also received the Massachusetts Acorn Innovation Award, Google Equipment Research Award, MathWorks Faculty Award, MIT Tech Review TR35 China Finalist, Ming Hsieh Scholar Award, and the Young Student Support Award of DAC (for himself and six of his Ph.D. students), among others.
Yanzhi has delivered over 90 invited technical presentations on real-time and efficient deep learning systems. His work has been featured and cited in around 300 media outlets, including the Boston Globe, Communications of the ACM, VentureBeat, The Register, Medium, The New Yorker, Wired, NEU News, Import AI, Italian National TV, Quartz, ODSC, MIT Tech Review, TechTalks, IBM Research Blog, ScienceDaily, AAAS, CNET, ZDNet, New Atlas, Tencent News, and Sina News, to name a few.
Yanzhi’s first Ph.D. student, Dr. Caiwen Ding, graduated in June 2019 and is now a tenure-track assistant professor in the Dept. of CSE at the University of Connecticut. The second Ph.D. student, Ning Liu, will join DiDi AI Research (DiDi Inc.). The third Ph.D. student, Ao Ren, will become a tenure-track assistant professor in the Dept. of ECE at Clemson University. The fourth Ph.D. student, Ruizhe Cai, will join Facebook Infrastructure. The postdoc/visiting scholar Chen Pan will join the Dept. of CSE at Texas A&M Corpus Christi as a tenure-track assistant professor.
Ph.D., Postdoc, and Visiting Scholar/Student Positions Available: Northeastern University has been rising thanks to the strong leadership and efforts of its faculty members. The university is located between the famous Museum of Fine Arts (MFA), Boston Symphony Hall, and the Berklee College of Music: the best location in Boston! Please apply to NEU.
CoCoPIE (the Most Important Contribution):
Assuming hardware is the major constraint on enabling real mobile intelligence, industry has mainly dedicated its efforts to developing specialized hardware accelerators for machine learning inference, and billions of dollars have been spent to fuel this intelligent hardware race. We challenge this assumption. Drawing on our recent real-time AI optimization framework CoCoPIE, we maintain that with effective compression-compiler co-design, it is possible to enable real-time artificial intelligence (AI) on mainstream end devices without special hardware.
The principle of compression-compilation co-design is to design the compression of deep learning models and their compilation to executables hand in hand. This synergistic approach can effectively optimize both the size and speed of deep learning models, and can also dramatically shorten the tuning time of the compression process, greatly reducing the time to market of AI products. CoCoPIE holds numerous records in mobile AI:
- the first framework to support all major kinds of DNNs, including CNNs, RNNs, transformer and language models, etc.;
- the fastest DNN pruning and acceleration framework, up to 180X faster than existing frameworks such as TensorFlow-Lite;
- real-time execution, for the first time, of a majority of representative DNNs and applications on off-the-shelf mobile devices;
- on general-purpose mobile devices, CoCoPIE even outperforms a number of representative ASIC and FPGA solutions in terms of energy efficiency and/or performance.
Two Representative Contributions:
Yanzhi’s group has made the following two key contributions to DNN model compression and acceleration. The first is a systematic, unified DNN model compression framework (ECCV18, ASPLOS19, ICCV19, AAAI20-1, AAAI20-2, HPCA19, etc.) based on the powerful mathematical optimization tool ADMM (Alternating Direction Method of Multipliers), which applies to non-structured and various types of structured weight pruning, as well as weight quantization of DNNs. It achieves unprecedented model compression rates on representative DNNs, consistently outperforming competing methods. When weight pruning and quantization are combined, we achieve up to 6,645X weight storage reduction without accuracy loss, two orders of magnitude higher than prior methods. Our most recent results (on arXiv) suggest that non-structured weight pruning is not desirable on any hardware platform.
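The ADMM-style pruning loop can be sketched as follows. This is a minimal toy illustration on a quadratic loss with full-batch gradient descent, not the group's actual training pipeline; the function names and hyperparameters are illustrative assumptions. The key structure is the alternation between a gradient step on the regularized loss, a Euclidean projection onto the sparsity constraint, and a dual update.

```python
import numpy as np

def project_to_sparsity(w, k):
    """Euclidean projection onto {w : ||w||_0 <= k}:
    keep the k largest-magnitude entries, zero the rest."""
    z = np.zeros_like(w)
    idx = np.argsort(np.abs(w))[-k:]
    z[idx] = w[idx]
    return z

def admm_prune(w0, grad_loss, k, rho=1e-2, lr=1e-2, outer=50, inner=20):
    """Toy ADMM-based weight pruning (hypothetical sketch).

    Approximately solves: minimize loss(w) s.t. ||w||_0 <= k, by alternating
      w-step: gradient descent on loss(w) + (rho/2)||w - z + u||^2
      z-step: projection of (w + u) onto the sparsity constraint
      u-step: dual update u += w - z
    """
    w = w0.copy()
    z = project_to_sparsity(w, k)
    u = np.zeros_like(w)
    for _ in range(outer):
        for _ in range(inner):
            g = grad_loss(w) + rho * (w - z + u)
            w -= lr * g
        z = project_to_sparsity(w + u, k)
        u += w - z
    return project_to_sparsity(w, k)  # hard-prune the final weights

# toy quadratic loss: (1/2)||A w - b||^2, standing in for the DNN loss
rng = np.random.default_rng(0)
A = rng.standard_normal((20, 10))
b = rng.standard_normal(20)
grad = lambda w: A.T @ (A @ w - b)
w = admm_prune(np.zeros(10), grad, k=3)
print(np.count_nonzero(w))  # at most k=3 nonzero weights remain
```

In the real framework, the w-step is ordinary DNN training (SGD on the loss plus the quadratic ADMM term), which is why the method scales to large networks: the hard combinatorial constraint is handled entirely by the cheap projection step.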
Recently, the second major contribution has been made (ASPLOS20, AAAI20, ICML19, IJCAI20, etc.) building on the ADMM solution framework. We identified the compiler as the bridge between DNN algorithm-level compression and hardware-level acceleration, maintaining the highest possible degree of parallelism without compromising accuracy. Using mobile devices (embedded CPU/GPU) as an example, we developed a combination of pattern and connectivity pruning techniques that possesses both flexibility (and thus high accuracy) and regularity (and thus hardware parallelism and acceleration). Accuracy and hardware performance are no longer a tradeoff; rather, DNN model compression can be desirable at the theory, algorithm, compiler, and hardware levels simultaneously. On mobile devices, we achieve undoubtedly the fastest DNN acceleration (e.g., 18.9 ms inference time for VGG-16, 26 ms for ResNet-50, and 5.4 ms for MobileNet-V2 on a smartphone without accuracy loss), even outperforming prior FPGA and ASIC work in many cases. All DNNs can potentially run in real time on mobile devices through our algorithm-compiler-hardware co-design.
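The pattern-pruning idea can be sketched as below: every 3x3 convolution kernel is forced to adopt one mask from a small, shared set of sparsity patterns, and each kernel picks the pattern that preserves the most weight magnitude. Because the pattern set is small and fixed, the compiler can generate a dense, branch-free inner loop per pattern. Note that the candidate pattern set and the scoring rule here are illustrative assumptions, not the exact design from the papers.

```python
import numpy as np
from itertools import combinations

def candidate_patterns(n_keep=4):
    """All 3x3 binary masks keeping the center weight plus (n_keep - 1)
    other entries. (Illustrative pattern set; real designs use far fewer.)"""
    others = [i for i in range(9) if i != 4]  # flat index 4 = kernel center
    masks = []
    for combo in combinations(others, n_keep - 1):
        m = np.zeros(9)
        m[list(combo)] = 1.0
        m[4] = 1.0
        masks.append(m.reshape(3, 3))
    return masks

def pattern_prune(kernels, patterns):
    """Assign each 3x3 kernel the pattern retaining the largest L2 mass."""
    pruned = np.empty_like(kernels)
    for i, k in enumerate(kernels):
        scores = [np.sum((k * p) ** 2) for p in patterns]
        pruned[i] = k * patterns[int(np.argmax(scores))]
    return pruned

rng = np.random.default_rng(1)
kernels = rng.standard_normal((8, 3, 3))  # 8 kernels of one conv layer
pruned = pattern_prune(kernels, candidate_patterns())
print((pruned != 0).sum(axis=(1, 2)))     # each kernel keeps 4 weights
```

This is where flexibility and regularity meet: each kernel chooses its own pattern (flexibility, preserving accuracy), yet every pruned kernel has exactly the same nonzero count and one of a few known shapes (regularity, enabling compiler-generated parallel code).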
- 05/2020 “CoCoPIE: A software solution for putting real artificial intelligence in smaller spaces” reported in W&M News, also in TechXplore.
- 05/2020 The CoCoPIE acceleration framework has been reported in Xinzhiyuan (新智元), also cited in Tencent (腾讯快报), Sohu (搜狐). Another report is in Jiqizhixin (机器之心), also cited in Sina (新浪财经), thepaper.cn (澎湃).
- 05/2020 The adversarial T-shirt work has been reported in “This ugly T-shirt makes you invisible to facial recognition tech” by Wired (UK), and also in Dazed, MIT News, LatestTechNews (UK), TechPowerNews, NEU News, and HeadTopics (UK).
- 05/2020 The CoCoPIE Bilibili Channel is open here. Please check it out and share your suggestions.
- 04/2020 The CoCoPIE system and demonstration paper “Towards Real-Time DNN Inference on Mobile Platforms with Model Pruning and Compiler Optimization” has been accepted to IJCAI 2020 (proceedings paper in the demonstration track). It introduces CoCoPIE mobile acceleration of three key applications: automatic style transfer, super-resolution, and auto-coloring.
- 04/2020 The CoCoPIE team and framework have been reported by Medium, and also in WebSystemer, MC.AI.
- 04/2020 The CoCoPIE YouTube Channel is open here. Please check it out and share your suggestions!
- 04/2020 The key conceptual paper of CoCoPIE, “CoCoPIE: Making Mobile AI Sweet as PIE — Compression-Compilation Co-Design Goes a Long Way”, is on arXiv. It introduces our key idea of compression-compilation co-design of DNNs, achieving real-time execution of the most representative DNNs on off-the-shelf mobile devices and outperforming existing frameworks by up to 180X. With the CoCoPIE solution, this pure software-based framework even outperforms representative ASIC and FPGA solutions for DNNs in terms of energy efficiency and performance.
- 04/2020 The postdoc/visiting scholar, Chen Pan, will join Dept. of CSE at Texas A&M Corpus Christi, as tenure-track assistant professor.
- 04/2020 Yanzhi will serve as PC Member/Reviewer for NeurIPS, 2020.
- 04/2020 Yanzhi will serve as PC Member for ICCAD, 2020.
- More news