Research
I am dedicated to making AI implementation general and practical. My research is particularly focused on the following domains and methodologies:
Efficient Deep Learning: accelerating pre-training/fine-tuning and inference, data/model compression, and fast, robust DNN design for Large Language Models (GPTs, LLaMA, BERT, etc.) and Vision Models (ViTs, Diffusion, DETR, ResNet, etc.).
Methodologies: Token/weight pruning, quantization, sparse training, data distillation, and latency-aware neural architecture search.
AI for Health: Efficient algorithms for AI agents and cell foundation models.
News
- 09/2024: Three papers (token reduction for SSMs, video diffusion, and efficient LLM search) are accepted to NeurIPS'24.
- 09/2024: One main paper (Token reduction for SSM) and one findings paper (Pruning LLMs without retraining) are accepted to EMNLP'24.
- 05/2024: Honored to be selected as a 2024 Machine Learning and Systems Rising Star.
- 04/2024: I will join Harvard as a Postdoctoral Research Fellow in August.
- 01/2024: One paper (Activation-Guided Quantization for LLMs) is accepted to AAAI'24.
- 01/2024: I gave a talk at the Robotics Institute at CMU during the VASC Seminar. The topic was "Towards Efficient Techniques and Applications for Universal AI Implementation."
- 09/2023: One paper (Hardware-oriented 3D Detector) is accepted to NeurIPS'23, see you in New Orleans!
Experiences
- Harvard University, August 2024 - present, Postdoctoral Research Fellow
  Mentors: Marinka Zitnik, Manolis Kellis
- Northeastern University, September 2019 - July 2024, Research Assistant
  Advisor: Prof. Yanzhi Wang
- Microsoft, June 2022 - August 2022, Research Intern
  Mentor: Subhabrata Mukherjee
- Samsung Research America, October 2021 - December 2021, Research Intern
  Mentors: Avik Ray, Yilin Shen, Hongxia Jin
- ARM, June 2021 - August 2021, Research Intern
  Mentors: Lingchuan Meng, Danny Loh
Selected Publications
* means equal contribution
Exploring Token Pruning in Vision State Space Models
Zheng Zhan*, Zhenglun Kong*, Yifan Gong, Yushu Wu, Zichong Meng, Hangyu Zheng, Xuan Shen, Stratis Ioannidis, Wei Niu, Pu Zhao, Yanzhi Wang
[NeurIPS 2024] Advances in Neural Information Processing Systems
We revisit the unique computational characteristics of SSMs and find that naively applying token pruning disrupts the sequential token positions that SSMs rely on. This insight motivates us to design a novel, general token pruning method specifically for SSM-based vision models.
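To make the order-preservation point concrete, here is a minimal, hypothetical NumPy sketch (not the paper's algorithm): after selecting tokens by importance, the kept indices are re-sorted into their original positions, so the pruned sequence remains valid input for a sequential (SSM-style) model.

```python
import numpy as np

def prune_tokens_keep_order(tokens, scores, keep_ratio=0.5):
    """Keep the top-scoring tokens but restore their original order,
    so the pruned sequence is still valid input for a sequential model."""
    n = tokens.shape[0]
    k = max(1, int(n * keep_ratio))
    top = np.argsort(scores)[-k:]   # indices of the k most important tokens
    kept = np.sort(top)             # re-sort so sequence order is preserved
    return tokens[kept], kept

# toy example: 6 tokens with 4 features each
rng = np.random.default_rng(0)
tokens = rng.normal(size=(6, 4))
scores = np.array([0.9, 0.1, 0.8, 0.2, 0.7, 0.3])
pruned, kept = prune_tokens_keep_order(tokens, scores, keep_ratio=0.5)
print(kept)  # kept indices stay in ascending (sequential) order
```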
Rethinking Token Reduction for State Space Models
Zheng Zhan*, Yushu Wu*, Zhenglun Kong*, Changdi Yang, Yifan Gong, Xuan Shen, Xue Lin, Pu Zhao, Yanzhi Wang
[EMNLP 2024] Conference on Empirical Methods in Natural Language Processing
We propose a tailored, unified post-training token reduction method for SSMs. Our approach integrates token importance and similarity, thus taking advantage of both pruning and merging, to devise a fine-grained intra-layer token reduction strategy.
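A toy sketch of the pruning-plus-merging idea (illustrative only; the function and scores below are hypothetical, not the paper's algorithm): low-importance tokens are merged into their most similar kept token rather than discarded outright.

```python
import numpy as np

def reduce_tokens(tokens, importance, keep_ratio=0.5):
    """Toy token reduction: keep important tokens, and instead of
    discarding the rest, merge each dropped token into its most
    similar kept token (by averaging), so information is retained."""
    n = tokens.shape[0]
    k = max(1, int(n * keep_ratio))
    order = np.argsort(importance)
    kept_idx = np.sort(order[-k:])
    drop_idx = np.sort(order[:-k])
    kept = tokens[kept_idx].copy()
    for i in drop_idx:
        # cosine similarity between the dropped token and each kept token
        sims = kept @ tokens[i] / (
            np.linalg.norm(kept, axis=1) * np.linalg.norm(tokens[i]) + 1e-8)
        j = int(np.argmax(sims))
        kept[j] = (kept[j] + tokens[i]) / 2  # merge by averaging
    return kept

tokens = np.array([[1.0, 0.0], [0.9, 0.1], [0.0, 1.0], [0.1, 0.9]])
importance = np.array([0.9, 0.2, 0.8, 0.1])
reduced = reduce_tokens(tokens, importance, keep_ratio=0.5)
print(reduced.shape)  # (2, 2): four tokens reduced to two
```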
EdgeQAT: Entropy and Distribution Guided Quantization-Aware Training for the Acceleration of Lightweight LLMs on the Edge
Xuan Shen, Zhenglun Kong, Changdi Yang, Zhaoyang Han, Lei Lu, Peiyan Dong, Cheng Lyu, Chih-hsiang Li, Xuehang Guo, Zhihao Shu, Wei Niu, Miriam Leeser, Pu Zhao, Yanzhi Wang
arXiv:2402.10787
PDF / Code
We design an entropy- and distribution-guided quantization method that reduces information distortion in the quantized query, key, and attention maps, tackling the main bottleneck of QAT for LLMs.
Agile-Quant: Activation-Guided Quantization for Faster Inference of LLMs on the Edge
Xuan Shen, Peiyan Dong, Lei Lu, Zhenglun Kong, Zhengang Li, Ming Lin, Chao Wu, Yanzhi Wang
[AAAI 2024] The Thirty-Eighth AAAI Conference on Artificial Intelligence
PDF
We propose an activation-guided quantization framework for popular Large Language Models (LLMs), using 4-bit or 8-bit activation quantization together with 4-bit weight quantization.
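For context, here is a minimal sketch of generic uniform symmetric quantization, the kind of low-bit mapping such frameworks build on (a textbook scheme, not Agile-Quant's activation-guided method):

```python
import numpy as np

def quantize_symmetric(w, bits=4):
    """Uniform symmetric quantization: map float weights to signed
    integers in [-(2^(b-1)-1), 2^(b-1)-1] with a single scale factor."""
    qmax = 2 ** (bits - 1) - 1  # 7 for 4-bit
    scale = np.abs(w).max() / qmax
    q = np.clip(np.round(w / scale), -qmax, qmax).astype(np.int8)
    return q, scale

def dequantize(q, scale):
    return q.astype(np.float32) * scale

w = np.array([0.30, -0.70, 0.06, 0.21], dtype=np.float32)
q, scale = quantize_symmetric(w, bits=4)
w_hat = dequantize(q, scale)
print(q)                          # integer codes
print(np.max(np.abs(w - w_hat)))  # error bounded by scale / 2
```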
Lightweight Vision Transformer Coarse-to-Fine Search via Latency Profiling
Zhenglun Kong, Dongkuan Xu, Zhengang Li, Peiyan Dong, Hao Tang, Yanzhi Wang, Subhabrata Mukherjee
[TMLR] Transactions on Machine Learning Research
PDF
We introduce an efficient, hardware-oriented approach for searching for lightweight vision transformer structures. The search adapts to the constraints of the target hardware and fulfills specific speed requirements.
HotBEV: Hardware-oriented Transformer-based Multi-View 3D Detector for BEV Perception
Zhenglun Kong*, Peiyan Dong*, Xin Meng, Pinrui Yu, Yanyue Xie, Yifan Gong, Geng Yuan, Fei Sun, Hao Tang, Yanzhi Wang
[NeurIPS 2023] Advances in Neural Information Processing Systems
PDF / Code
We present a hardware-oriented transformer-based framework for 3D detection tasks, which achieves higher detection precision and remarkable speedup across high-end and low-end GPUs.
SpeedDETR: Speed-aware Transformers for End-to-end Object Detection
Peiyan Dong*, Zhenglun Kong*, Xin Meng, Peng Zhang, Hao Tang, Yanzhi Wang, Chih-Hsien Chou
[ICML 2023] International Conference on Machine Learning
PDF / Code
We propose a novel speed-aware transformer for end-to-end object detection, achieving high-speed inference on multiple devices.
Peeling the Onion: Hierarchical Reduction of Data Redundancy for Efficient Vision Transformer Training
Zhenglun Kong, Haoyu Ma, Geng Yuan, Mengshu Sun, Yanyue Xie, Peiyan Dong, Yanzhi Wang, et al.
[AAAI 2023 Oral] The Thirty-Seventh AAAI Conference on Artificial Intelligence
PDF / Code
We introduce sparsity into data and propose an end-to-end efficient training framework to accelerate ViT training and inference.
Data Level Lottery Ticket Hypothesis for Vision Transformers
Xuan Shen, Zhenglun Kong, Minghai Qin, Peiyan Dong, Geng Yuan, Xin Meng, Hao Tang, Xiaolong Ma, Yanzhi Wang
[IJCAI 2023 Oral] The 32nd International Joint Conference on Artificial Intelligence
PDF / Code
We generalize the lottery ticket hypothesis (LTH) for ViTs to input data consisting of image patches, inspired by the input dependence of ViTs: there exists a subset of input image patches such that a ViT can be trained from scratch using only this subset and achieve accuracy similar to a ViT trained on all image patches.
SPViT: Enabling Faster Vision Transformers via Latency-aware Soft Token Pruning
Zhenglun Kong, Peiyan Dong, Xiaolong Ma, Xin Meng, Mengshu Sun, Wei Niu, Xuan Shen, Geng Yuan, Bin Ren, Minghai Qin, Hao Tang, Yanzhi Wang
[ECCV 2022] European Conference on Computer Vision
CVPRW 2022 Spotlight
PDF / Code
We propose a dynamic, latency-aware soft token pruning framework for Vision Transformers. Our framework significantly reduces the computation cost of ViTs while maintaining comparable image-classification performance.
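One common way to realize "soft" pruning, sketched below with hypothetical names (a simplification, not the exact SPViT procedure): less informative tokens are aggregated into a single package token instead of being discarded.

```python
import numpy as np

def soft_prune(tokens, keep_mask):
    """'Soft' pruning: kept tokens pass through unchanged, while all
    pruned tokens are aggregated into a single extra 'package' token
    instead of being discarded, so their information is not lost."""
    kept = tokens[keep_mask]
    pruned = tokens[~keep_mask]
    if pruned.size:
        package = pruned.mean(axis=0, keepdims=True)
        kept = np.concatenate([kept, package], axis=0)
    return kept

tokens = np.arange(12, dtype=np.float32).reshape(4, 3)  # 4 tokens, dim 3
keep_mask = np.array([True, False, True, False])
out = soft_prune(tokens, keep_mask)
print(out.shape)  # (3, 3): two kept tokens plus one package token
```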
Efficient Transformer-based Large Scale Language Representations using Hardware-friendly Block Structured Pruning
Bingbing Li*, Zhenglun Kong*, Tianyun Zhang, Ji Li, Zhengang Li, Hang Liu, Caiwen Ding
[EMNLP 2020 Findings] Conference on Empirical Methods in Natural Language Processing
PDF / Code
We propose efficient transformer-based large-scale language representations using hardware-friendly block-structured pruning, incorporating the reweighted group Lasso into the block-structured pruning optimization.
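As a rough illustration of what block-structured sparsity looks like (a magnitude-based sketch with hypothetical names; the paper instead optimizes with a reweighted group Lasso): weights are zeroed in whole blocks, so the sparsity pattern stays hardware-friendly.

```python
import numpy as np

def block_prune(W, block=(2, 2), sparsity=0.5):
    """Zero out the weight blocks with the smallest Frobenius norm.
    Surviving nonzeros form aligned blocks, which map well to
    hardware-friendly structured sparse kernels."""
    bh, bw = block
    H, W_ = W.shape
    assert H % bh == 0 and W_ % bw == 0
    blocks = W.reshape(H // bh, bh, W_ // bw, bw)
    norms = np.sqrt((blocks ** 2).sum(axis=(1, 3)))  # per-block Frobenius norm
    k = int(norms.size * sparsity)                   # number of blocks to drop
    thresh = np.sort(norms, axis=None)[k - 1] if k else -np.inf
    mask = (norms > thresh).astype(W.dtype)          # keep blocks above threshold
    return (blocks * mask[:, None, :, None]).reshape(H, W_)

rng = np.random.default_rng(0)
W = rng.normal(size=(4, 4))
Wp = block_prune(W, block=(2, 2), sparsity=0.5)
print((Wp == 0).sum())  # two of the four 2x2 blocks are zeroed
```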
A Compression-Compilation Framework for On-mobile Real-time BERT Applications
Wei Niu*, Zhenglun Kong*, Geng Yuan, Weiwen Jiang, Jiexiong Guan, Caiwen Ding, Pu Zhao, Sijia Liu, Bin Ren, Yanzhi Wang
[IJCAI 2021 Demo] 30th International Joint Conference on Artificial Intelligence
PDF
We propose a compression-compilation co-design framework that guarantees BERT models meet both the resource and real-time constraints of mobile devices.
Automatic Tissue Image Segmentation Based on Image Processing and Deep Learning
Zhenglun Kong, Ting Li, Junyi Luo, Shengpu Xu
Journal of Healthcare Engineering
PDF
We realize automatic image segmentation with a convolutional neural network to extract accurate contours of four tissues: the skull, cerebrospinal fluid (CSF), grey matter (GM), and white matter (WM), on five MRI head-image datasets.
Teaching Experiences
- Teaching Assistant at Northeastern University
  EECE 7205 Fundamentals of Computer Engineering, Fall 2021 (Instructor: Prof. Xue Lin)
  EECE 5552 Assistive Robotics, Fall 2019 (Instructor: Prof. Alireza Ramezani)
  EECE 5644 Machine Learning/Pattern Recognition, Spring 2019 (Instructor: Prof. Jennifer Dy)
Professional Talks
Towards Efficient Deep Learning for Practical AI Implementation
Carnegie Mellon University, Pittsburgh, PA, Jan. 2024.
The Lottery Ticket Hypothesis for Vision Transformers
IJCAI, Macao SAR, Aug. 2023.
Peeling the Onion: Hierarchical Reduction of Data Redundancy for Efficient Vision Transformer Training
AAAI, Washington, DC, Feb. 2023.
Enabling Faster Vision Transformers via Soft Token Pruning
The 8th EMC2 - Energy Efficient Training and Inference of Transformer Based Models, Washington, DC, Feb. 2023.
Vision Transformer Optimization
ARM, San Jose, CA, Aug. 2021.
A Compression-Compilation Framework for On-mobile Real-time BERT Applications
IJCAI, virtual (Montreal), Aug. 2021.
Hardware-friendly Block Structured Pruning for Transformer
ARM, San Jose, CA, Jun. 2021.
Compiler-aware Neural Architecture Optimization for Transformer
Samsung Research America, Mountain View, CA, Oct. 2020.
Professional Services
- Conference Reviewer: ICML 2022, ECCV 2022, NeurIPS 2022, AAAI 2023, CVPR 2023, KDD 2023, IJCAI 2023, ICML 2023, NeurIPS 2023
- Journal Reviewer:
  - IEEE Transactions on Pattern Analysis and Machine Intelligence (TPAMI)
  - IEEE Transactions on Neural Networks and Learning Systems (TNNLS)
  - IEEE Transactions on Image Processing (TIP)
  - Pattern Recognition
  - Neurocomputing
- Academic Committee Member: