CoCoPIE (the Most Important Contribution)

The above demonstrations are from the CoCoPIE YouTube channel here and Bilibili channel here. We welcome feedback and suggestions. It is worth noting that, for the first time, CoCoPIE enables on-mobile real-time acceleration of 3D activity detection networks (e.g., C3D, R(2+1)D, S3D) on off-the-shelf mobile devices. We achieve as little as 9 ms per frame without accuracy loss, a 30X speedup over current frameworks. This is shown in the demo on the right.

It has become a consensus that the company that enables real intelligence on end devices (such as mobile and IoT devices) will define the future of computing. Racing toward this goal, many companies, from giant technology firms such as Google, Microsoft, Amazon, Apple, and Facebook to startups, spend tens of billions of dollars each year on R&D. Assuming hardware is the major constraint on enabling real mobile intelligence, the industry has dedicated most of its efforts to developing specialized hardware accelerators for machine learning inference. Billions of dollars have been spent to fuel this intelligent-hardware race.

We challenge this assumption. Drawing on CoCoPIE, a recent real-time AI optimization framework, we maintain that with effective compression-compiler co-design, it is possible to enable real-time artificial intelligence (AI) on mainstream end devices without special hardware. The principle of compression-compilation co-design is to design the compression of deep learning models and their compilation to executables hand in hand. This synergistic method can effectively optimize both the size and speed of deep learning models, and can also dramatically shorten the tuning time of the compression process, greatly reducing the time to market of AI products. Applied to models running on mainstream end devices, the method produces real-time experiences across a set of AI applications that had been broadly perceived as possible only with special AI accelerators.

CoCoPIE stands for Compression-Compilation co-design for Performance, Intelligence, and Efficiency. CoCoPIE holds numerous records in mobile AI: it is the first framework to support all major kinds of DNNs, including CNNs, RNNs, transformers, and language models; it is the fastest DNN pruning and acceleration framework, up to 180X faster than current frameworks such as TensorFlow Lite (see Figure 1); for the first time, a majority of representative DNNs and applications can be executed in real time on off-the-shelf mobile devices; and on general-purpose mobile devices, CoCoPIE even outperforms a number of representative ASIC and FPGA solutions in energy efficiency and/or performance (see Figure 2).

Figure 1. Execution time comparison with SOTA mobile acceleration frameworks (TFLite, TVM, Alibaba MNN) on VGG-16, ResNet-50, and MobileNet-V2 DNN models on ImageNet and CIFAR-10 datasets.
Figure 2. Comparison with representative ASIC and FPGA solutions. (a) Comparison of energy efficiency and inference latency with Google Cloud TPU and Edge TPU. (b) Comparison of energy efficiency with Eyeriss. (c) Comparison of energy efficiency with NVIDIA Jetson AGX Xavier. (d) Comparison of energy efficiency with the FPGA solution ESE.

CoCoPIE consists of two main components, both of which reflect the compression-compilation co-design principle. The first component, CoCo-Gen, generates efficient DNN execution code via a synergy of pattern-based DNN pruning and pattern-aware code generation. The second component, CoCo-Tune, dramatically shortens the process of identifying the appropriate set of DNN parameters to prune, through a composability-based compiler framework.
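The pattern-based pruning idea behind CoCo-Gen can be sketched as follows. This is an illustrative toy, not CoCoPIE's implementation: the four masks in `PATTERNS` and the magnitude-based pattern selection heuristic are assumptions made for this sketch, and a real framework would use a carefully designed pattern library plus retraining to recover accuracy.

```python
# Illustrative sketch of pattern-based pruning (NOT the CoCoPIE code):
# every 3x3 convolution kernel is assigned one pattern from a small fixed
# library, and each pattern keeps only 4 of the 9 weights.
import numpy as np

# Hypothetical pattern library: boolean 3x3 masks, each keeping 4 entries.
PATTERNS = [
    np.array([[1, 1, 0], [1, 1, 0], [0, 0, 0]], dtype=bool),
    np.array([[0, 1, 1], [0, 1, 1], [0, 0, 0]], dtype=bool),
    np.array([[0, 0, 0], [1, 1, 0], [1, 1, 0]], dtype=bool),
    np.array([[0, 0, 0], [0, 1, 1], [0, 1, 1]], dtype=bool),
]

def prune_kernel(kernel):
    """Pick the pattern that preserves the most weight magnitude,
    then zero out all weights outside that pattern."""
    scores = [np.abs(kernel[m]).sum() for m in PATTERNS]
    best = PATTERNS[int(np.argmax(scores))]
    return kernel * best, best

rng = np.random.default_rng(0)
weights = rng.standard_normal((8, 3, 3, 3))  # (out_ch, in_ch, 3, 3)
pruned = np.empty_like(weights)
for o in range(weights.shape[0]):
    for i in range(weights.shape[1]):
        pruned[o, i], _ = prune_kernel(weights[o, i])

# Every kernel now has at most 4 nonzeros, laid out in one of a few
# fixed shapes -- regular enough for compiler code specialization.
assert all((pruned[o, i] != 0).sum() <= 4
           for o in range(8) for i in range(3))
```

Because every kernel ends up in one of a few fixed layouts, a pattern-aware code generator can emit a specialized, branch-free loop body per pattern, rather than paying the indexing overhead of arbitrary unstructured sparsity.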

Figure 3. Examples of style transfer, automatic coloring, and super resolution implemented on off-the-shelf mobile device using CoCoPIE framework.

Demonstrations: Comprehensive real-time demonstrations of the CoCoPIE framework can be found on the CoCoPIE YouTube channel here, covering broad applications such as real-time style transfer, super-resolution (resolution enhancement), automatic coloring, and GAN-based applications. Sample applications are shown in Figure 3 above. It is worth noting that the CoCoPIE compiler's code generation is by far the strongest even without DNN compression. The following demos show real-time style transfer with the CoCoPIE compiler (left) and a reference framework (Tencent NCNN, right), using the same DNN model on the same mobile device (Samsung Galaxy S10). The advantage of the CoCoPIE compiler is clearly visible.

References: The key conceptual paper of CoCoPIE, “CoCoPIE: Making Mobile AI Sweet as PIE — Compression-Compilation Co-Design Goes a Long Way”, is on arXiv. More detailed descriptions appear in the following papers:

[AAAI 2020]: Xiaolong Ma, Fu-Ming Guo, Wei Niu, Xue Lin, Jian Tang, Kaisheng Ma, Bin Ren, and Yanzhi Wang, “PCONV: The missing but desirable sparsity in DNN weight pruning for real-time execution on mobile device”.

[ASPLOS 2020]: Wei Niu, Xiaolong Ma, Sheng Lin, Shihao Wang, Xuehai Qian, Xue Lin, Yanzhi Wang, and Bin Ren, “PatDNN: Achieving real-time DNN execution on mobile devices with pattern-based weight pruning”.

[IJCAI 2020]: Wei Niu, Pu Zhao, Zheng Zhan, Xue Lin, Yanzhi Wang, and Bin Ren, “Towards Real-Time DNN Inference on Mobile Platforms with Model Pruning and Compiler Optimization”.

[PLDI 2019]: Hui Guan, Xipeng Shen, and Seung-Hwan Lim, “Wootz: A Compiler-based Framework for Fast CNN Pruning via Composability”.

CoCoPIE News:

  • 07/2020 The work on automatic pattern generation and mobile DNN acceleration led by Xiaolong has been accepted at ECCV. Congrats!
  • 07/2020 The CoCoPIE acceleration framework “CoCoPIE: Enabling Real-Time AI on Off-the-Shelf Mobile Devices via Compression-Compilation Co-Design” has been conditionally accepted by Communications of the ACM (CACM). Congrats!
  • 06/2020 The real-time on-mobile 3D activity detection has been reported in Medium.
  • 06/2020 Yanzhi has received the U.S. Army Research Office Young Investigator Award.
  • 06/2020 The CoCoPIE acceleration framework enables, for the first time, on-mobile real-time acceleration of 3D activity detection networks (e.g., C3D, R(2+1)D, S3D) on off-the-shelf mobile devices. We achieve 9 ms per frame without accuracy loss, a 30X speedup over current frameworks. Please see our demos.
  • 06/2020 Yanzhi presented his work on DNN model compression and mobile acceleration at the HealthDL Workshop, held with MobiSys 2020.
  • 06/2020 The CoCoPIE system and demonstration won an award in the IEEE ISLPED Design Contest 2020.
  • 05/2020 “CoCoPIE: A software solution for putting real artificial intelligence in smaller spaces” reported in W&M News, also in TechXplore.
  • 05/2020 The CoCoPIE acceleration framework has been reported in Xinzhiyuan (新智元), also cited in Tencent (腾讯快报), Sohu (搜狐). Another report is in Jiqizhixin (机器之心), also cited in Sina (新浪财经), thepaper.cn (澎湃).
  • 05/2020 The adversarial T-shirt work has been reported in “This ugly T-shirt makes you invisible to facial recognition tech” by Wired (UK), and also in Dazed, MIT News, LatestTechNews (UK), TechPowerNews, NEU News, and HeadTopics (UK).
  • 05/2020 The CoCoPIE Bilibili channel is now open here. We welcome feedback and suggestions.
  • 04/2020 The CoCoPIE system and demonstration paper “Towards Real-Time DNN Inference on Mobile Platforms with Model Pruning and Compiler Optimization” has been accepted at IJCAI 2020 (proceedings paper in the demonstration track). It introduces CoCoPIE's mobile acceleration of three key applications: automatic style transfer, super-resolution, and auto-coloring.
  • 04/2020 The CoCoPIE team and framework have been covered by Medium, and also on WebSystemer and MC.AI.