CoCoPIE (the Most Important Contribution)

The demonstrations above are from the CoCoPIE YouTube channel here and Bilibili channel here. You are welcome to check them out and offer feedback. It is worth noting that, for the first time, on-mobile real-time acceleration has been achieved for YOLO-v4-based object detection and for 3D activity detection networks (e.g., C3D, R(2+1)D, S3D) using off-the-shelf mobile devices. For object detection, we achieve 19 FPS on a Samsung Galaxy S10 phone with higher mAP than YOLO-v3. For activity detection, we achieve 6.8 ms per frame without accuracy loss, a 40X speedup over existing frameworks.

There is a consensus that the company that enables real intelligence on end devices (such as mobile and IoT devices) will define the future of computing. Racing toward this goal, many companies, from giant technology firms such as Google, Microsoft, Amazon, Apple, and Facebook to startups, spend tens of billions of dollars each year on R&D. Assuming that hardware is the major constraint on real mobile intelligence, the industry has dedicated most of its effort to developing specialized hardware accelerators for machine learning inference. Billions of dollars have been spent to fuel this intelligent-hardware race.

We challenge this assumption. Drawing on CoCoPIE, a recent real-time AI optimization framework, we maintain that with effective compression-compiler co-design, it is possible to enable real-time artificial intelligence (AI) on mainstream end devices without special hardware. The principle of compression-compilation co-design is to design the compression of deep learning models and their compilation to executables hand in hand. This synergistic method can effectively optimize both the size and the speed of deep learning models, and can also dramatically shorten the tuning time of the compression process, largely reducing the time to market of AI products. When applied to models running on mainstream end devices, the method delivers real-time experience across a set of AI applications that had been broadly perceived as possible only with special AI accelerators.

CoCoPIE stands for Compression-Compilation co-design for Performance, Intelligence, and Efficiency. CoCoPIE holds numerous records in mobile AI: it is the first framework to support all major kinds of DNNs, including CNNs, RNNs, and transformer-based language models; it is the fastest DNN pruning and acceleration framework, up to 180X faster than existing frameworks such as TensorFlow-Lite; it achieves an unprecedented 6.7 ms inference on a Samsung Galaxy S10 phone with 78.2% ImageNet Top-1 accuracy, or 3.9 ms with over 70% ImageNet accuracy (see Figure 1); it runs a majority of representative DNNs and applications in real time, for the first time, on off-the-shelf mobile devices; and on general-purpose mobile devices it even outperforms a number of representative ASIC and FPGA solutions in energy efficiency and/or performance (see Figure 2).

Figure 1. Top-1 ImageNet accuracy vs. latency on mobile CPU (left) and mobile GPU (right) of a Samsung Galaxy S10 phone.
Figure 2. Comparison with representative ASIC and FPGA solutions. (a) Comparison of energy efficiency and inference latency with Google cloud TPU and edge TPU. (b) Comparison of energy efficiency with Eyeriss. (c) Comparison of energy efficiency with NVIDIA Jetson AGX Xavier. (d) Comparison of energy efficiency with FPGA solution ESE.

CoCoPIE consists of two main components, both of which reflect the compression-compilation co-design principle. The first, CoCo-Gen, generates efficient DNN execution code via a synergy of fine-grained structured DNN pruning schemes (e.g., pattern-based pruning, block-based pruning) and automatic compiler-level code generation. The second, CoCo-Tune, dramatically shortens the process of identifying the appropriate set of DNN parameters to prune, via a composability-based compiler framework.
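To give a flavor of what pattern-based pruning means in CoCo-Gen, the sketch below prunes each 3x3 convolution kernel down to one of a small library of fixed sparsity patterns. This is only an illustration: the pattern shapes, the magnitude-based selection rule, and the function names here are our illustrative assumptions, not CoCoPIE's actual pattern set or API.

```python
import numpy as np

# A small library of 3x3 patterns, each keeping 4 of the 9 weights.
# The real patterns used by CoCo-Gen are carefully designed/selected;
# these are illustrative placeholders.
PATTERNS = [
    np.array([[1, 1, 0], [1, 1, 0], [0, 0, 0]]),
    np.array([[0, 1, 1], [0, 1, 1], [0, 0, 0]]),
    np.array([[0, 0, 0], [1, 1, 0], [1, 1, 0]]),
    np.array([[0, 0, 0], [0, 1, 1], [0, 1, 1]]),
]

def pattern_prune(weights):
    """Prune each 3x3 kernel to the pattern that preserves the most L1 magnitude.

    weights: array of shape (out_ch, in_ch, 3, 3).
    Returns the pruned weights plus the chosen pattern index per kernel;
    a compiler can use these indices to emit specialized per-pattern code.
    """
    out_ch, in_ch, _, _ = weights.shape
    pruned = np.empty_like(weights)
    choices = np.empty((out_ch, in_ch), dtype=int)
    for o in range(out_ch):
        for i in range(in_ch):
            kernel = weights[o, i]
            # Score each pattern by how much weight magnitude it keeps.
            scores = [np.abs(kernel * p).sum() for p in PATTERNS]
            best = int(np.argmax(scores))
            pruned[o, i] = kernel * PATTERNS[best]
            choices[o, i] = best
    return pruned, choices
```

Because every kernel ends up in one of a handful of known shapes, the compiler can group kernels by pattern and generate dense, branch-free inner loops for each group, which is where the compression-compilation synergy comes from.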

Figure 3. Examples of style transfer, automatic coloring, and super resolution implemented on off-the-shelf mobile device using CoCoPIE framework.

Demonstrations: Comprehensive real-time demonstrations of the CoCoPIE framework can be found at the CoCoPIE YouTube channel here, covering broad applications such as real-time style transfer, super-resolution (resolution enhancement), automatic coloring, object detection, 3D activity detection, background segmentation, healthcare, neural machine translation, natural language processing, and GAN-based applications. Sample applications are shown in Figure 3 above. Notably, the CoCoPIE compiler code generation is by far the fastest even without the aid of DNN compression. The following demos show real-time style transfer from the CoCoPIE compiler (left) and a reference implementation (Tencent NCNN, right), using the same DNN model on the same mobile device (Samsung Galaxy S10); the advantage of the CoCoPIE compiler is clear. We also summarize the inference times of representative DNNs (original models, without pruning) under our framework in comparison with TFLite, TVM, MNN, and PyTorch Mobile in the following table.
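Per-frame inference-time comparisons like the one summarized above are typically collected with a warm-up-then-average timing harness. A minimal sketch of such a harness is below; the function names and parameters are illustrative, not part of CoCoPIE or any of the compared frameworks.

```python
import time

def benchmark(run_inference, warmup=10, iters=100):
    """Measure average per-frame latency (ms) and FPS of an inference callable.

    run_inference: a zero-argument callable that performs one forward pass.
    Warm-up runs are discarded so caches, JIT compilation, and frequency
    scaling settle before timing begins.
    """
    for _ in range(warmup):
        run_inference()
    start = time.perf_counter()
    for _ in range(iters):
        run_inference()
    elapsed = time.perf_counter() - start
    ms_per_frame = elapsed / iters * 1000.0
    return ms_per_frame, 1000.0 / ms_per_frame
```

For example, `benchmark(lambda: model(frame))` would return the average latency in milliseconds and the implied frames per second for whatever `model` and `frame` you supply.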

References: The key conceptual paper of CoCoPIE, “CoCoPIE: Making Mobile AI Sweet as PIE — Compression-Compilation Co-Design Goes a Long Way”, is on arXiv. More detailed descriptions are in the following papers:

[AAAI 2021] Yuxuan Cai, Hongjia Li, Geng Yuan, Wei Niu, Yanyu Li, Xulong Tang, Bin Ren, and Yanzhi Wang, “YOLObile: Real-Time Object Detection on Mobile Devices via Compression-Compilation Co-Design”.

[AAAI 2021] Wei Niu, Mengshu Sun, Zhengang Li, Jou-An Chen, Jiexiong Guan, Xipeng Shen, Xue Lin, Bin Ren, and Yanzhi Wang, “RT3D: Achieving Real-Time Execution of 3D Convolutional Neural Networks on Mobile Devices”.

[ECCV 2020] Xiaolong Ma, Wei Niu, Tianyun Zhang, Sijia Liu, Sheng Lin, Hongjia Li, Xiang Chen, Jian Tang, Kaisheng Ma, Bin Ren, and Yanzhi Wang, “An Image Enhancing Pattern-based Sparsity for Real-Time Inference on Mobile Devices”.

[DAC 2020] Peiyan Dong, Siyue Wang, Wei Niu, Chengming Zhang, Sheng Lin, Zhengang Li, Yifan Gong, Bin Ren, Xue Lin, Yanzhi Wang, and Dingwen Tao, “RTMobile: Beyond Real-time Mobile Acceleration of RNNs for Speech Recognition”.

[AAAI 2020] Xiaolong Ma, Fu-Ming Guo, Wei Niu, Xue Lin, Jian Tang, Kaisheng Ma, Bin Ren, and Yanzhi Wang, “PCONV: The Missing but Desirable Sparsity in DNN Weight Pruning for Real-Time Execution on Mobile Devices”.

[ASPLOS 2020] Wei Niu, Xiaolong Ma, Sheng Lin, Shihao Wang, Xuehai Qian, Xue Lin, Yanzhi Wang, and Bin Ren, “PatDNN: Achieving Real-Time DNN Execution on Mobile Devices with Pattern-based Weight Pruning”.

[IJCAI 2020] Wei Niu, Pu Zhao, Zheng Zhan, Xue Lin, Yanzhi Wang, and Bin Ren, “Towards Real-Time DNN Inference on Mobile Platforms with Model Pruning and Compiler Optimization”.

[PLDI 2019] Hui Guan, Xipeng Shen, and Seung-Hwan Lim, “Wootz: A Compiler-based Framework for Fast CNN Pruning via Composability”.

CoCoPIE News:

  • 02/2021 [Award] The CoCoPIE acceleration framework “CoCoPIE: Enabling Real-Time AI on Off-the-Shelf Mobile Devices via Compression-Compilation Co-Design” has been selected as a Featured Article by Communications of the ACM.
  • 02/2021 [Talk] Yanzhi remotely presented compression-compilation co-design for real-time DNN acceleration to Jiangmen.com.
  • 02/2021 [Talk] Yanzhi remotely presented compression-compilation co-design for real-time DNN acceleration to a seminar jointly held by Chinese Academy of Sciences and Beijing Institute of Technology.
  • 01/2021 [Talk] Yanzhi remotely presented compression-compilation co-design for real-time DNN acceleration to University of Illinois Chicago.
  • 01/2021 [Grant] The CoCoPIE Team receives the Alpha Fund at Northeastern University. Thanks!
  • 12/2020 [Talk] Yanzhi remotely presented compression-compilation co-design for real-time DNN acceleration to Sohu Inc.
  • 12/2020 [Talk] Yanzhi remotely presented compression-compilation co-design for real-time DNN acceleration to XPeng Inc.
  • 12/2020 [Media] CoCoPIE for Achieving Real-Time LiDAR 3D Object Detection on a Mobile Device is reported in Technology.org (link), also in Onread (link).
  • 12/2020 [Media] CoCoPIE for YOLObile: real-time YOLO-v4 acceleration on mobile devices is reported in Jiqizhixin (link), also in Sohu (link).
  • 12/2020 [Project] The CoCoPIE Team receives an operator licensing project from Tencent USA. Thanks Tencent!
  • 12/2020 [Paper] Two papers on extreme on-device acceleration and on-device DNN for autodriving have been accepted in Workshop on Accelerated Machine Learning (AccML), co-located with the HiPEAC 2021.
  • 12/2020 [Talk] Yanzhi remotely presented compression-compilation co-design for real-time DNN acceleration to Kwai Inc.
  • 12/2020 [Paper] Three papers accepted in AAAI 2021, including “RT3D: Achieving Real-Time Execution of 3D Convolutional Neural Networks on Mobile Devices”, “YOLObile: Real-Time Object Detection on Mobile Devices via Compression-Compilation Co-Design”, and “A Compression-Compilation Co-Design Framework Towards Real-Time Object Detection on Mobile Devices” (Demonstration Paper). The acceptance rate is 21%.
  • 11/2020 [Grant] The CoCoPIE Team receives a pilot customer discovery grant by NSF iCorps program at Northeastern University. Thanks!
  • 11/2020 [Talk] Wei Niu remotely presented compression-compilation co-design to Google Brain.
  • 11/2020 [Paper] Tianyun’s paper “StructADMM: Achieving Ultra-High Efficiency in Structured Pruning for DNNs” has been accepted by IEEE TNNLS (Impact Factor 12.18).
  • 11/2020 [Media] Our research on mobile deep learning acceleration has been featured in the SMART Center interview on MRS TV at the 2020 MRS Virtual Spring/Fall Meeting & Exhibit (Link).
  • 11/2020 [Talk] Yanzhi remotely presented compression-compilation co-design for real-time DNN acceleration to Baidu.
  • 11/2020 [Talk] Yanzhi remotely presented compression-compilation co-design for real-time DNN acceleration at HALO workshop with ICCAD, 2020.
  • 10/2020 [Paper] One demonstration paper accepted in AAAI 2021.
  • 10/2020 [Paper] One paper accepted at the NeurIPS 2020 workshop on autonomous driving.
  • 10/2020 [Media] Our CoCoPIE for real-time BERT acceleration on mobile devices is reported in CSDN (link).
  • 10/2020 [Talk] Yanzhi remotely presented compression-compilation co-design at ByteDance.
  • 09/2020 [Media] Our CoCoPIE for YOLObile: real-time YOLO-v4 acceleration on mobile devices is reported in CVer (link), also cited in Tencent News.
  • 09/2020 [Media] Our CoCoPIE for real-time BERT acceleration on mobile devices is reported in CSDN (link), also cited in Tencent News.
  • 09/2020 [Talk] Yanzhi remotely presented compression-compilation co-design at Chongqing University (Youtube Link, BiliBili Link).
  • 09/2020 [Talk] Yanzhi remotely presented compression-compilation co-design to the Young Scholar seminar at Zhejiang University (Youtube Link, BiliBili Link).
  • 09/2020 [Paper] Student Zhenglun Kong’s paper on BERT model compression is accepted by EMNLP 2020.
  • 08/2020 [Media] Our CoCoPIE mobile acceleration framework is reported in CSDN (link), also cited in Tencent News, Zhuanzhi.AI, KKNews, etc.
  • 08/2020 [Award] Our CoCoPIE mobile acceleration framework received first place in ISLPED Design Contest 2020 (Youtube Link) (BiliBili Link).
  • 08/2020 [Paper] Two demonstration papers accepted in ECCV 2020, one on the CoCoPIE real-time acceleration on superresolution, style transfer, automatic coloring, and object detection applications, and the other on real-time mobile acceleration of 3D CNN for activity detection.
  • 07/2020 The automatic pattern generation and mobile DNN acceleration led by Xiaolong has been accepted in ECCV. Congrats!
  • 07/2020 The CoCoPIE acceleration framework “CoCoPIE: Enabling Real-Time AI on Off-the-Shelf Mobile Devices via Compression-Compilation Co-Design” has been conditionally accepted by Communications of the ACM (CACM). Congrats!
  • 06/2020 The real-time on-mobile 3D activity detection has been reported in Medium.
  • 06/2020 Yanzhi has received the U.S. Army Research Office Young Investigator Award.
  • 06/2020 The CoCoPIE acceleration framework enables, for the first time, on-mobile real-time acceleration of 3D activity detection networks (e.g., C3D, R(2+1)D, S3D) using off-the-shelf mobile devices. We achieve 9 ms per frame without accuracy loss, a 30X speedup over existing frameworks. Please see our demos.
  • 06/2020 Yanzhi presented his work on DNN model compression and mobile acceleration at the HealthDL Workshop, co-located with MobiSys 2020.
  • 06/2020 The CoCoPIE system and demonstration has been awarded in IEEE ISLPED Design Contest 2020.
  • 05/2020 “CoCoPIE: A software solution for putting real artificial intelligence in smaller spaces” reported in W&M News, also in TechXplore.
  • 05/2020 The CoCoPIE acceleration framework has been reported in Xinzhiyuan (新智元), also cited in Tencent (腾讯快报), Sohu (搜狐). Another report is in Jiqizhixin (机器之心), also cited in Sina (新浪财经), thepaper.cn (澎湃).
  • 05/2020 The adversarial T-shirt work has been reported in “This ugly T-shirt makes you invisible to facial recognition tech” by Wired (UK), also in Dazed, MIT News, LatestTechNews (UK), TechPowerNews, NEU News, and HeadTopics (UK).
  • 05/2020 The CoCoPIE Bilibili Channel is open here. Welcome to check and advise.
  • 04/2020 The CoCoPIE system and demonstration paper “Towards Real-Time DNN Inference on Mobile Platforms with Model Pruning and Compiler Optimization” has been accepted in IJCAI 2020 (proceedings paper in the demonstration track). It introduces the CoCoPIE mobile acceleration of three key applications: automatic style transfer, superresolution, and auto-coloring.
  • 04/2020 The CoCoPIE team and framework have been reported by Medium, and also in WebSystemer and MC.AI.