Zhenglun Kong

I am currently a postdoctoral researcher at Harvard, working closely with Marinka Zitnik at Harvard and Manolis Kellis at MIT. I received my PhD from Northeastern University in 2024, supervised by Prof. Yanzhi Wang. Prior to that, I earned my master's degree from Northeastern University in 2019 and my B.E. from Huazhong University of Science and Technology (HUST), China, in 2017. I was a research intern at Microsoft Research, ARM, and Samsung Research. My research focuses on developing efficient deep learning methods for real-world scenarios in computer vision and natural language processing. I was selected as a 2024 Machine Learning and Systems Rising Star.

Email  /  CV  /  Google Scholar  /  Github  /  LinkedIn  /  Twitter

Research

I work toward general and practical implementations of AI. My research focuses on the following domains and methodologies:

  • Efficient Deep Learning: Accelerating pre-training/fine-tuning and inference, data/model compression, and fast & robust DNN design for Large Language Models (GPTs, LLaMA, BERT, etc.) and Vision Models (ViTs, Diffusion, DETR, ResNet, etc.).

  • Methodologies: Token/weight pruning, quantization, sparse training, data distillation, and latency-aware neural architecture search.

  • AI for Health: Efficient algorithms for AI agents and cell foundation models.

News

  • 09/2024: Three papers (token pruning for vision SSMs, video diffusion, and efficient LLM search) are accepted to NeurIPS'24.
  • 09/2024: One main-conference paper (token reduction for SSMs) and one Findings paper (pruning LLMs without retraining) are accepted to EMNLP'24.
  • 05/2024: Honored to be selected as a 2024 Machine Learning and Systems Rising Star.
  • 04/2024: I will join Harvard as a Postdoctoral Research Fellow in August.
  • 01/2024: One paper (Activation-Guided Quantization for LLMs) is accepted to AAAI'24.
  • 01/2024: I gave a talk at the Robotics Institute at CMU during the VASC Seminar. The topic was "Towards Efficient Techniques and Applications for Universal AI Implementation."
  • 09/2023: One paper (Hardware-oriented 3D Detector) is accepted to NeurIPS'23, see you in New Orleans!

Selected Publications

* denotes equal contribution
Exploring Token Pruning in Vision State Space Models
Zheng Zhan*, Zhenglun Kong*, Yifan Gong, Yushu Wu, Zichong Meng, Hangyu Zheng, Xuan Shen, Stratis Ioannidis, Wei Niu, Pu Zhao, Yanzhi Wang
[NeurIPS 2024] Advances in Neural Information Processing Systems

We revisit the unique computational characteristics of SSMs and find that naively applying existing token pruning disrupts the sequential token positions. This insight motivates us to design a novel, general token pruning method tailored to SSM-based vision models.

Rethinking Token Reduction for State Space Models
Zheng Zhan*, Yushu Wu*, Zhenglun Kong*, Changdi Yang, Yifan Gong, Xuan Shen, Xue Lin, Pu Zhao, Yanzhi Wang
[EMNLP 2024] Conference on Empirical Methods in Natural Language Processing

We propose a tailored, unified post-training token reduction method for SSMs. Our approach integrates token importance and similarity, combining the advantages of both pruning and merging to devise a fine-grained intra-layer token reduction strategy.

EdgeQAT: Entropy and Distribution Guided Quantization-Aware Training for the Acceleration of Lightweight LLMs on the Edge
Xuan Shen, Zhenglun Kong, Changdi Yang, Zhaoyang Han, Lei Lu, Peiyan Dong, Cheng Lyu, Chih-hsiang Li, Xuehang Guo, Zhihao Shu, Wei Niu, Miriam Leeser, Pu Zhao, Yanzhi Wang
arXiv:2402.10787
PDF / Code

We design an entropy and distribution guided quantization method to reduce information distortion in quantized queries, keys, and attention maps, tackling the bottleneck of QAT for LLMs.

Agile-Quant: Activation-Guided Quantization for Faster Inference of LLMs on the Edge
Xuan Shen, Peiyan Dong, Lei Lu, Zhenglun Kong, Zhengang Li, Ming Lin, Chao Wu, Yanzhi Wang
[AAAI 2024] The Thirty-Eighth AAAI Conference on Artificial Intelligence
PDF

We propose an activation-guided quantization framework for popular Large Language Models (LLMs), using 4-bit or 8-bit activation quantization together with 4-bit weight quantization.

Lightweight Vision Transformer Coarse-to-Fine Search via Latency Profiling
Zhenglun Kong, Dongkuan Xu, Zhengang Li, Peiyan Dong, Hao Tang, Yanzhi Wang, Subhabrata Mukherjee
[TMLR] Transactions on Machine Learning Research
PDF

We introduce an efficient, hardware-oriented approach for searching lightweight vision transformer architectures, optimized to adapt to the constraints of the target hardware and meet specific speed requirements.

HotBEV: Hardware-oriented Transformer-based Multi-View 3D Detector for BEV Perception
Zhenglun Kong*, Peiyan Dong*, Xin Meng, Pinrui Yu, Yanyue Xie, Yifan Gong, Geng Yuan, Fei Sun, Hao Tang, Yanzhi Wang
[NeurIPS 2023] Advances in Neural Information Processing Systems
PDF / Code

We present a hardware-oriented transformer-based framework for 3D detection that achieves higher detection precision and significant speedups on both high-end and low-end GPUs.

SpeedDETR: Speed-aware Transformers for End-to-end Object Detection
Peiyan Dong*, Zhenglun Kong*, Xin Meng, Peng Zhang, Hao Tang, Yanzhi Wang, Chih-Hsien Chou
[ICML 2023] International Conference on Machine Learning
PDF / Code

We propose a novel speed-aware transformer for end-to-end object detectors, achieving high-speed inference on multiple devices.

Peeling the Onion: Hierarchical Reduction of Data Redundancy for Efficient Vision Transformer Training
Zhenglun Kong, Haoyu Ma, Geng Yuan, Mengshu Sun, Yanyue Xie, Peiyan Dong, Yanzhi Wang, et al.
[AAAI 2023 Oral] The Thirty-Seventh AAAI Conference on Artificial Intelligence
PDF / Code

We introduce sparsity into data and propose an end-to-end efficient training framework to accelerate ViT training and inference.

Data Level Lottery Ticket Hypothesis for Vision Transformers
Xuan Shen, Zhenglun Kong, Minghai Qin, Peiyan Dong, Geng Yuan, Xin Meng, Hao Tang, Xiaolong Ma, Yanzhi Wang
[IJCAI 2023 Oral] The 32nd International Joint Conference on Artificial Intelligence
PDF / Code

Inspired by the input dependence of ViTs, we generalize the lottery ticket hypothesis (LTH) for ViTs to input data consisting of image patches: there exists a subset of input image patches such that a ViT trained from scratch on only this subset achieves accuracy similar to ViTs trained on all image patches.

SPViT: Enabling Faster Vision Transformers via Latency-aware Soft Token Pruning
Zhenglun Kong, Peiyan Dong, Xiaolong Ma, Xin Meng, Mengshu Sun, Wei Niu, Xuan Shen, Geng Yuan, Bin Ren, Minghai Qin, Hao Tang, Yanzhi Wang
[ECCV 2022] European Conference on Computer Vision
CVPRW 2022 Spotlight
PDF / Code

We propose a dynamic, latency-aware soft token pruning framework for Vision Transformers. Our framework significantly reduces the computation cost of ViTs while maintaining comparable performance on image classification.

Efficient Transformer-based Large Scale Language Representations using Hardware-friendly Block Structured Pruning
Bingbing Li*, Zhenglun Kong*, Tianyun Zhang, Ji Li, Zhengang Li, Hang Liu, Caiwen Ding
[EMNLP 2020 Findings] Conference on Empirical Methods in Natural Language Processing
PDF / Code

We propose efficient transformer-based large-scale language representations using hardware-friendly block-structured pruning, incorporating the reweighted group Lasso into the pruning optimization.

A Compression-Compilation Framework for On-mobile Real-time BERT Applications
Wei Niu*, Zhenglun Kong*, Geng Yuan, Weiwen Jiang, Jiexiong Guan, Caiwen Ding, Pu Zhao, Sijia Liu, Bin Ren, Yanzhi Wang
[IJCAI 2021 Demo] 30th International Joint Conference on Artificial Intelligence
PDF

We propose a compression-compilation co-design framework that guarantees BERT models meet both the resource and real-time constraints of mobile devices.

Automatic Tissue Image Segmentation Based on Image Processing and Deep Learning
Zhenglun Kong, Ting Li, Junyi Luo, Shengpu Xu
Journal of Healthcare Engineering
PDF

We realize automatic image segmentation with a convolutional neural network to extract accurate contours of four tissues: the skull, cerebrospinal fluid (CSF), grey matter (GM), and white matter (WM), on five MRI head image datasets.

Education

  • Northeastern University, Sep 2019 - 2024
    PhD in Computer Engineering
    Advisor: Prof. Yanzhi Wang
  • Northeastern University, Sep 2017 - May 2019
    M.S. in Computer Engineering
    Advisor: Prof. Yun Raymond Fu
  • Huazhong University of Science and Technology, China, Sep 2013 - July 2017
    B.E. in Optoelectronic Information Science and Engineering

Teaching Experiences
  • Teaching Assistant at Northeastern University
    • EECE 7205 Fundamentals of Computer Engineering, Fall 2021
      Instructor: Prof. Xue Lin   

    • EECE 5552 Assistive Robotics, Fall 2019
      Instructor: Prof. Alireza Ramezani   

    • EECE 5644 Machine Learning/Pattern Recognition, Spring 2019
      Instructor: Prof. Jennifer Dy   

  • Guest Lecturer
    • CSC 791/591 Advanced Topics in Efficient Deep Learning
      North Carolina State University, Fall 2022
      Instructor: Prof. Dongkuan Xu   

Professional Talks
  • Towards Efficient Deep Learning for Practical AI Implementation
    Carnegie Mellon University, Pittsburgh, PA, Jan. 2024.

  • The Lottery Ticket Hypothesis for Vision Transformers
    IJCAI, Macao SAR, Aug. 2023.

  • Peeling the Onion: Hierarchical Reduction of Data Redundancy for Efficient Vision Transformer Training
    AAAI, Washington, DC, Feb. 2023.

  • Enabling Faster Vision Transformers via Soft Token Pruning (link)
    The 8th EMC2 - Energy Efficient Training and Inference of Transformer Based Models, Washington, DC, Feb. 2023.

  • Vision Transformer Optimization
    ARM, San Jose, CA, Aug. 2021.

  • A Compression-Compilation Framework for On-mobile Real-time BERT Applications
    IJCAI, Montreal-themed virtual reality, Aug. 2021.

  • Hardware-friendly Block Structured Pruning for Transformer
    ARM, San Jose, CA, Jun. 2021.

  • Compiler-aware Neural Architecture Optimization for Transformer
    Samsung Research America, Mountain View, CA, Oct. 2020.

Professional Services
  • Conference Reviewer:
    • ICML 2022, ECCV 2022, NeurIPS 2022, AAAI 2023, CVPR 2023, KDD 2023, IJCAI 2023, ICML 2023, NeurIPS 2023
  • Journal Reviewer:
    • IEEE Transactions on Pattern Analysis and Machine Intelligence (TPAMI)
    • IEEE Transactions on Neural Networks and Learning Systems (TNNLS)
    • IEEE Transactions on Image Processing (TIP)
    • Pattern Recognition
    • Neurocomputing
  • Academic Committee Member:
    • MLNLP