Dr. Yanzhi Wang
Assistant Professor, 2015-present
329 Dana, 360 Huntington Avenue
Boston, MA 02115
YouTube channel and Bilibili channel:
Yanzhi Wang is currently an assistant professor in the Department of Electrical and Computer Engineering, and Khoury College of Computer Science (Affiliated), at Northeastern University. He received his Ph.D. degree in Computer Engineering from the University of Southern California (USC) in 2014, under the supervision of Prof. Massoud Pedram. He received the Ming Hsieh Scholar Award (the highest honor in the EE Dept. of USC) for his Ph.D. study. He received his B.S. degree in Electronic Engineering from Tsinghua University in 2009, with distinction from both the university and Beijing city.
Dr. Wang’s group works on both algorithms and actual implementations (mobile and embedded systems, FPGAs, circuit tapeouts, GPUs, emerging devices, and UAVs). His current research interests are the following:
- Real-time and energy-efficient deep learning and artificial intelligence systems
- Model compression and mobile acceleration of deep neural networks (DNNs)
- Deep learning acceleration for autonomous driving
- Neuromorphic computing and non-von Neumann computing paradigms
- Cyber-security in deep learning systems
As a brief list of technical achievements, his research (i) has achieved and maintained the highest model compression rates on representative DNNs since 09/2018 (ECCV18, ASPLOS19, ICCV19, ISLPED19, ASP-DAC20, AAAI20-1, AAAI20-2, etc.), (ii) achieves, for the first time, real-time and the fastest execution of representative large-scale DNNs on an off-the-shelf mobile device (ASPLOS20, AAAI20, ICML19, IJCAI20, ECCV20, DAC20, AAAI21-1, AAAI21-2, CACM, TPAMI, etc.), and (iii) achieves the highest performance/energy efficiency in DNN implementations on many platforms (FPGA19, ISLPED19, AAAI19, HPCA19, ISSCC19, ASP-DAC20, DATE20, AAAI20, PLDI20, ICS20, IJCAI20, PACT20, JSSC21). It is worth mentioning that his work on AQFP superconducting-based DNN inference acceleration, which is validated through cryogenic testing, has by far the highest energy efficiency among all hardware devices (ISCA19, ICCAD18).
His research has been published broadly in top conference and journal venues, including (i) EDA, solid-state circuit, and system conferences such as DAC, ICCAD, DATE, ISLPED, FPGA, LCTES, ISSCC, RTAS, etc., (ii) architecture and computer system conferences such as ASPLOS, ISCA, MICRO, HPCA, CCS, VLDB, PLDI, ICS, PACT, INFOCOM, ICDCS, etc., (iii) machine learning algorithm conferences such as AAAI, CVPR, ICML, ICCV, ICLR, IJCAI, ECCV, ACM MM, ICDM, etc., and (iv) IEEE and ACM transactions (including Communications of the ACM, Proc. of the IEEE, JSSC, TPAMI, etc.) and Nature and Science series journals. He ranks No. 2 in CSRankings at Northeastern University over the past 10 years, and around No. 35 throughout the U.S. His research has been cited around 9,300 times according to Google Scholar, with an h-index of 44.
He has received six Best Paper or Top Paper Awards (ISLPED’14, IEEE CLOUD’14, ISVLSI’14, ICASSP’17, KDD Workshop’19, ICLR Workshop’21), one Communications of the ACM Featured Article (Article) (Interview Video), another 11 Best Paper Nominations (GLS-VLSI’13, IEEE TCAD’13, ASP-DAC’15, ISLPED’17, ASP-DAC’17, ISQED’18, ASP-DAC’18, DATE’19, ICCAD’19, DATE’20, DATE’21), and four Popular Papers in IEEE TCAD. He received the U.S. Army Research Office Young Investigator Award, the IEEE TC-SDM Early Career Award, the Martin W. Essigmann Excellence in Teaching Award, etc. In addition, his group has received the Massachusetts Acorn Innovation Award, Google Equipment Research Award, MathWorks Faculty Award, MIT Tech Review TR35 China Finalist, Ming Hsieh Scholar Award, Young Student Support Award of DAC (for himself and six of his Ph.D. students), DAC Service Award, etc. His group and students have won first place in the ISLPED Design Contest twice (2012, 2020), first place in the Student Research Competition at CGO 2021, and awards in multiple other contests such as the Low Power Computer Vision Challenge 2019 and the NeurIPS MicroNet Challenge 2019.
Yanzhi has delivered over 100 invited technical presentations on research in real-time and efficient deep learning systems. His research has been broadly featured and cited in around 500 media outlets, including Boston Globe, Communications of the ACM (three times), VentureBeat, The Register, Medium, The New Yorker, Wired, NEU News, Import AI, Italian National TV, MRS TV, Quartz, ODSC, MIT Tech Review, TechTalks, IBM Research Blog, ScienceDaily, AAAS, CNET, ZDNet, New Atlas, Tencent News, and Sina News, to name a few.
Yanzhi’s first Ph.D. student, Caiwen Ding, graduated in June 2019 and became a tenure-track assistant professor in the Dept. of CSE at the University of Connecticut. The second Ph.D. student, Ning Liu, will start as a superstar employee at DiDi AI Research (DiDi Inc.). The third Ph.D. student, Ao Ren, is joining the School of CS at Chongqing University as a full professor (with tenure). The fourth Ph.D. student, Ruizhe Cai, has joined Facebook Infrastructure. The fifth Ph.D. student, Sheng Lin, will join Tencent U.S. as a research scientist. The postdoc/visiting scholar, Chen Pan, will join the Dept. of CSE at Texas A&M Corpus Christi as a tenure-track assistant professor. His co-advised Ph.D. student, Tianyun Zhang, will join the Dept. of ECE at Cleveland State University as an assistant professor.
Ph.D., Postdoc, and Visiting Scholar/Student Positions Available: Northeastern University has been rising thanks to the strong leadership and efforts of its faculty members. The university is located between the famous Museum of Fine Arts (MFA), Boston Symphony, and Berklee College of Music, the Best Location in Boston! Please apply to NEU.
CoCoPIE (the Most Important Contribution):
Assuming hardware is the major constraint for enabling real mobile intelligence, the industry has mainly dedicated its efforts to developing specialized hardware accelerators for machine learning inference. Billions of dollars have been spent to fuel this intelligent hardware race. We challenge this assumption. Drawing on the recent real-time AI optimization framework CoCoPIE, we maintain that with effective compression-compiler co-design, it is possible to enable real-time artificial intelligence (AI) on mainstream end devices without special hardware.
The principle of compression-compilation co-design is to design the compression of deep learning models and their compilation to executables hand in hand. This synergistic method can effectively optimize both the size and speed of deep learning models, and can also dramatically shorten the tuning time of the compression process, largely reducing the time to market of AI products. CoCoPIE holds numerous records on mobile AI: it is the first framework to support all kinds of DNNs, including CNNs, RNNs, transformers, and language models; it is the fastest DNN pruning and acceleration framework, up to 180X faster than current frameworks such as TensorFlow-Lite; it executes a majority of representative DNNs and applications in real time, for the first time, on off-the-shelf mobile devices; and on general-purpose mobile devices it even outperforms a number of representative ASIC and FPGA solutions in terms of energy efficiency and/or performance.
Two Representative Contributions:
Yanzhi’s group has made the following two key contributions on DNN model compression and acceleration. The first is a systematic, unified DNN model compression framework (ECCV18, ASPLOS19, ICCV19, AAAI20-1, AAAI20-2, HPCA19, etc.) based on the powerful mathematical optimization tool ADMM (Alternating Direction Method of Multipliers), which applies to non-structured and various types of structured weight pruning as well as to weight quantization of DNNs. It achieves unprecedented model compression rates on representative DNNs, consistently outperforming competing methods. When weight pruning and quantization are combined, we achieve up to 6,645X weight storage reduction without accuracy loss, which is two orders of magnitude higher than prior methods. Our most recent results (on arXiv) suggest that non-structured weight pruning is not desirable on any hardware platform.
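As a rough illustration of how ADMM can handle a sparsity constraint, the sketch below alternates between a loss-minimization step and a projection that keeps the k largest-magnitude weights. It is a toy under stated assumptions, not the group's framework: the quadratic loss 0.5*||w - target||^2 stands in for a real DNN training loss, and the names `project_top_k` and `admm_prune` are hypothetical.

```python
def project_top_k(values, k):
    """Euclidean projection onto vectors with at most k nonzeros:
    keep the k largest-magnitude entries and zero the rest."""
    order = sorted(range(len(values)), key=lambda i: abs(values[i]), reverse=True)
    keep = set(order[:k])
    return [v if i in keep else 0.0 for i, v in enumerate(values)]


def admm_prune(target, k, rho=1.0, iterations=50):
    """Toy ADMM for: minimize 0.5*||w - target||^2 subject to w having <= k nonzeros.

    ASSUMPTION: the quadratic loss is a stand-in for a DNN loss, chosen so that
    the w-update has a closed form instead of requiring gradient training.
    """
    z = project_top_k(target, k)   # auxiliary variable that satisfies the constraint
    u = [0.0] * len(target)        # scaled dual variable
    w = list(target)
    for _ in range(iterations):
        # w-step: minimize loss + (rho/2)*||w - z + u||^2 (closed form for this loss)
        w = [(t + rho * (zi - ui)) / (1.0 + rho) for t, zi, ui in zip(target, z, u)]
        # z-step: project w + u onto the sparsity constraint set
        z = project_top_k([wi + ui for wi, ui in zip(w, u)], k)
        # dual update: accumulate the remaining constraint violation
        u = [ui + wi - zi for ui, wi, zi in zip(u, w, z)]
    return w, z


weights, pruned = admm_prune([0.9, -0.1, 0.4, 0.05, -0.7], k=2)
print(pruned)
```

In this toy case the dense weights `w` are pulled toward the sparse copy `z` over the iterations, so hard pruning at the end changes little. Structured pruning or quantization would, under the same scheme, only swap in a different projection.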
The second, more recent major contribution (ASPLOS20, AAAI20, ICML19, IJCAI20, ECCV20, DAC20, AAAI21-1, AAAI21-2, CACM, TPAMI, etc.) builds on the ADMM solution framework. The compiler has been identified as the bridge between DNN algorithm-level compression and hardware-level acceleration, maintaining the highest possible degree of parallelism without compromising accuracy. Using mobile devices (embedded CPU/GPU) as an example, we have developed a novel category of fine-grained structured pruning schemes, such as pattern-based and block-based pruning, possessing both flexibility (and hence high accuracy) and regularity (and hence hardware parallelism and acceleration). Accuracy and hardware performance are no longer a tradeoff. Rather, it is possible for DNN model compression to be desirable at all of the theory, algorithm, compiler, and hardware levels. For mobile devices, we achieve undoubtedly the fastest DNN acceleration (e.g., 6.7ms on a Samsung S10 phone with 78.2% ImageNet Top-1 accuracy, or 3.9ms with over 70% ImageNet accuracy), even outperforming prior work on FPGA and ASIC in many cases. All DNNs can potentially run in real time on mobile devices through our algorithm-compiler-hardware co-design.
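To make the idea of pattern-based pruning concrete, the following minimal sketch assigns each 3x3 convolution kernel the pattern, from a small fixed library, that preserves the most weight energy. The four-entry pattern library, the energy criterion, and the name `prune_kernel` are illustrative assumptions, not the exact scheme used in the cited papers.

```python
# ASSUMPTION: a hypothetical library of four patterns, each keeping 4 of the
# 9 positions in a row-major 3x3 kernel (position 4 is the center).
PATTERNS = [
    (0, 1, 3, 4),
    (1, 2, 4, 5),
    (3, 4, 6, 7),
    (4, 5, 7, 8),
]


def prune_kernel(kernel):
    """Pick the pattern that keeps the largest sum of squared weights.

    kernel: flat list of 9 weights in row-major order.
    Returns (pattern_index, pruned_kernel).
    """
    def kept_energy(pattern):
        return sum(kernel[i] ** 2 for i in pattern)

    best = max(range(len(PATTERNS)), key=lambda p: kept_energy(PATTERNS[p]))
    keep = set(PATTERNS[best])
    pruned = [w if i in keep else 0.0 for i, w in enumerate(kernel)]
    return best, pruned


index, pruned = prune_kernel([0.1, 0.8, -0.6, 0.0, 0.5, 0.7, 0.2, -0.1, 0.05])
print(index, pruned)
```

Because every kernel ends up with one of only a few known sparsity shapes, a compiler can in principle generate a specialized code path per pattern. This is the regularity that the paragraph above contrasts with non-structured pruning.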
- 06/2021 [Talk] Yanzhi remotely presented compression-compilation co-design for real-time DNN acceleration to Texas Instruments Inc.
- 06/2021 [Talk] Yanzhi will give an invited presentation at the Workshop on Energy Efficient Machine Learning and Cognitive Computing (EMC^2), 2021.
- 06/2021 [Student] Co-advised Ph.D. student Tianyun Zhang has accepted an offer as a Tenure-Track Assistant Professor in the Department of Electrical and Computer Engineering at Cleveland State University, starting in Fall 2021.
- 06/2021 [Committee] Yanzhi will serve as a committee member of ICLR 2022.
- 06/2021 [Committee] Yanzhi will serve as publication chair of NanoArch 2022.
- 06/2021 [Talk] Yanzhi remotely presented compression-compilation co-design for real-time DNN acceleration to Leidos (joint meeting between Leidos and Northeastern Univ.).
- 06/2021 [Talk] Yanzhi remotely presented compression-compilation co-design for real-time DNN acceleration to OPPO/Zeku Inc.
- 06/2021 [Talk] Yanzhi remotely presented compression-compilation co-design for real-time DNN acceleration at SRC Meeting.
- 06/2021 [Talk] Yanzhi remotely presented compression-compilation co-design for real-time DNN acceleration to NIO Automobiles Inc.
- 05/2021 [Paper] Our paper “GRIM: A General, Real-Time Deep Learning Inference Framework for Mobile Devices based on Fine-Grained Structured Weight Sparsity” has been accepted in IEEE Trans. on Pattern Analysis and Machine Intelligence (TPAMI). (Impact Factor 17.86)
- 05/2021 [Award] The CoCoPIE acceleration framework “CoCoPIE: Enabling Real-Time AI on Off-the-Shelf Mobile Devices via Compression-Compilation Co-Design” has been selected as Featured Article by Communications of the ACM, announced together with the Turing Award winners. (Article) (Interview Video)
- 05/2021 [Paper] Our paper “Lottery ticket preserves weight correlation: Is it desirable or not?” is accepted in ICML 2021.
- 05/2021 [Paper] Our papers “Towards fast and accurate multi-person pose estimation on mobile devices” and “A compression-compilation framework for on-mobile real-time BERT applications” are accepted in IJCAI 2021 Demonstration Track.
- 05/2021 [Committee] Yanzhi serves as an organizer of the ROAD4NN workshop co-located with DAC 2021.
- 04/2021 [Committee] Yanzhi serves as an organizer of the HALO workshop co-located with ICCAD 2021.
- 04/2021 [Award] Yanzhi’s group has received the Best Paper Award in HAET Workshop at ICLR 2021.
- 04/2021 [Award] Yanzhi has received the Martin W. Essigmann Excellence in Teaching Award.
- 04/2021 [Award] Yanzhi receives the most welcomed speaker award from TechBeat.
- 04/2021 [Paper] Our collaborative paper “EXTRA: An Experience-driven Control Framework for Distributed Stream Data Processing with a Variable Number of Threads” is accepted in IWQoS 2021.
- 04/2021 [Paper] Our collaborative paper “PnP-DRL: a plug-and-play deep reinforcement learning approach for experience-driven networking” is accepted in IEEE Journal on Selected Areas in Communications (JSAC).
- 04/2021 [Paper] One collaborative paper “NS-FDN: Near-Sensor processing architecture of Feature-configurable Distributed Network for beyond-real-time always-on keyword spotting” is accepted in IEEE TCAS-I.
- 04/2021 [Award] Yanzhi has received the IEEE TC-SDM Early Career Award.
- 03/2021 [Paper] Our paper “ClickTrain: Efficient and Accurate End-to-End Deep Learning Training via Fine-Grained Architecture-Preserving Pruning” is accepted in ICS 2021 (acceptance ratio 24%).
- 03/2021 [Student] Xue’s Ph.D. student Kaidi Xu has accepted an offer as a Tenure-Track Assistant Professor in the Department of Computer Science at Drexel University, starting in Fall 2021. (News here)
- 03/2021 [Paper] Multiple papers “Towards Real-Time 3D Object Detection for Autonomous Vehicles with Pruning Search”, “An Infrastructure-Aided High Definition Map Data Provisioning Service for Autonomous Driving”, and “Mobile or FPGA? A Comprehensive Evaluation on Energy Efficiency and a Unified Optimization Framework” are accepted in RTAS 2021.
- 03/2021 [Paper] Our collaborative paper “Distributed Graph Processing System and Processing-In-Memory Architecture with Precise Loop-Carried Dependency Guarantee” is accepted in ACM Trans. on Computer Systems (TOCS) 2021.
- 03/2021 [Talk] Yanzhi remotely presented compression-compilation co-design for real-time DNN acceleration to Amazon.
- 03/2021 [Paper] Our paper “FORMS: Fine-grained Polarized ReRAM-based In-situ Computation for Mixed-signal DNN Accelerator” is accepted in ISCA 2021.
- 03/2021 [Award] Malith has won the 1st Place in the Student Research Competition at CGO 2021.
- 03/2021 [Paper] Our paper “NPAS: A Compiler-aware Framework of Unified Network Pruning and Architecture Search for Beyond Real-Time Mobile Acceleration” will appear as Oral Paper at CVPR 2021.
- 03/2021 [Talk] Yanzhi remotely presented compression-compilation co-design for real-time DNN acceleration to Rutgers University.
- 03/2021 [Talk] Yanzhi remotely presented compression-compilation co-design for real-time DNN acceleration to Honor Inc.
- 02/2021 [Paper] Our papers “NPAS: A Compiler-aware Framework of Unified Network Pruning and Architecture Search for Beyond Real-Time Mobile Acceleration” and “Teachers Do More Than Teach: Compressing Image-to-Image Models” are accepted in CVPR 2021.
- 02/2021 [Paper] Collaborative paper “DNNFusion: Accelerating Deep Neural Networks Execution with Advanced Operator Fusion” is accepted in PLDI 2021.
- 02/2021 [Paper] Collaborative paper “Radio Frequency Fingerprinting on the Edge” is accepted in IEEE Trans. on Mobile Computing.
- 02/2021 [Paper] Our work “Non-Structured DNN Weight Pruning Considered Harmful” is accepted by IEEE TNNLS (Impact Factor 12.18). It reaches the strong conclusion that non-structured DNN weight pruning is not preferred on any platform. We suggest not continuing work on sparsity-aware DNN acceleration with non-structured weight pruning.
- 02/2021 [Paper] Two papers accepted in DAC 2021: “A unified DNN weight pruning framework using reweighted optimization methods”, and “Neural pruning search for real-time object detection of autonomous vehicles”.
- 02/2021 [Award] The CoCoPIE acceleration framework “CoCoPIE: Enabling Real-Time AI on Off-the-Shelf Mobile Devices via Compression-Compilation Co-Design” has been selected as Featured Article by Communications of the ACM.
- 02/2021 [Grant] Yanzhi’s group has received a research gift grant from Snap Inc. U.S. Thanks Snap!
- 02/2021 [Talk] Yanzhi remotely presented compression-compilation co-design for real-time DNN acceleration to Jiangmen.com.
- 02/2021 [Talk] Yanzhi remotely presented compression-compilation co-design for real-time DNN acceleration to a seminar jointly held by Chinese Academy of Sciences and Beijing Institute of Technology.
- 01/2021 [Talk] Yanzhi remotely presented compression-compilation co-design for real-time DNN acceleration to University of Illinois Chicago.
- 01/2021 [Media] Our Mix-and-Match FPGA quantization and acceleration is reported in CSDN.
- 01/2021 [Paper] Collaborative paper on SRAM-based processing-in-memory for DNN acceleration is accepted in JSSC (Journal of Solid-State Circuits).
- 01/2021 [Paper] Invited paper in ACM Trans. on Design Automation of Electronic Systems (TODAES).
- 01/2021 [Committee] Yanzhi will serve as a committee member of ICCV 2021.
- 01/2020 [Grant] The CoCoPIE Team receives the Alpha Fund at Northeastern University. Thanks!
- 01/2021 [Paper] Congratulations to Kaidi with two papers accepted in ICLR 2021.
- 01/2021 [Award] Our paper “TinyADC: Peripheral Circuit-aware Weight Pruning Framework for Mixed-signal DNN Accelerators” receives best paper nomination at DATE 2021.
- 01/2021 [Paper] One collaborative paper “NS-FDN: Near-Sensor processing architecture of Feature-configurable Distributed Network for beyond-real-time always-on keyword spotting” has been conditionally accepted in IEEE TCAS-I.
- 12/2020 [Talk] Yanzhi remotely presented compression-compilation co-design for real-time DNN acceleration to Sohu Inc.
- 12/2020 [Talk] Yanzhi remotely presented compression-compilation co-design for real-time DNN acceleration to XPeng Inc.
- 12/2020 [Media] CoCoPIE for Achieving Real-Time LiDAR 3D Object Detection on a Mobile Device is reported in Technology.org (link), also in Onread (link).
- 12/2020 [Media] CoCoPIE for YOLObile: real-time YOLO-v4 acceleration on mobile devices is reported in Jiqizhixin (link), also in Sohu (link).
- 12/2020 [Project] The CoCoPIE Team receives an operator licensing project from Tencent USA. Thanks Tencent!
- 12/2020 [Paper] Two papers on extreme on-device acceleration and on-device DNN for autonomous driving have been accepted in the Workshop on Accelerated Machine Learning (AccML), co-located with HiPEAC 2021.
- 12/2020 [Paper] Three papers accepted in AAAI 2021, including “RT3D: Achieving Real-Time Execution of 3D Convolutional Neural Networks on Mobile Devices”, “YOLObile: Real-Time Object Detection on Mobile Devices via Compression-Compilation Co-Design”, and “A Compression-Compilation Co-Design Framework Towards Real-Time Object Detection on Mobile Devices” (Demonstration Paper). The acceptance rate is 21%.