Zhi-Qi Cheng, Ph.D.
About
Dr. Zhi-Qi Cheng is an Assistant Professor of Computer Science & Systems. His research focuses on multimodal artificial intelligence, computer vision, and foundation models for perception and decision support.
Before joining the University of Washington, Dr. Cheng was part of Carnegie Mellon University’s School of Computer Science, where he advanced from Research Associate and Postdoctoral Researcher to Project Scientist and Instructor at the Language Technologies Institute (LTI). He played key roles in several U.S. government–funded AI programs, serving as Technical Lead for the DARPA KAIROS project and contributing to IARPA DIVA and NIST PSIAP. His team developed large-scale video analytics and schema-based reasoning systems for multimodal event understanding.
Dr. Cheng’s work has also had real-world impact: his video analytics pipelines supported the investigations for which The Washington Post won the 2022 Pulitzer Prize for Public Service. His research has been published in top-tier venues including CVPR, ICCV, NeurIPS, ICLR, AAAI, IJCAI, and ACM Multimedia.
He previously held research internships at Alibaba DAMO Academy, Google Brain, and Microsoft Research, where he focused on multimodal learning, scalable perception systems, and foundation models. His contributions have been recognized with the Intel Ph.D. Fellowship and the IBM Outstanding Student Scholarship.
Dr. Cheng’s research and public-facing work have been featured in leading media outlets including The Washington Post, The New York Times, and CBS News.
Selected recent publications (full list on Google Scholar).
- MaxSup: Overcoming Representation Collapse in Label Smoothing (NeurIPS 2025, Oral; acceptance: 77/21,575 ≈ 0.36%)
- Human-Aware Vision-and-Language Navigation (NeurIPS 2024, Spotlight) [Code] [Project]
- Emotion-LLaMA: Multimodal Emotion Recognition and Reasoning with Instruction Tuning (NeurIPS 2024) [Code] [Project]
- Towards Calibrated Robust Fine-Tuning of Vision-Language Models (NeurIPS 2024)
- Rethinking Spatial Invariance of Convolutional Networks for Object Counting (CVPR 2022) — methods used in The Washington Post investigations that won the 2022 Pulitzer Prize for Public Service [Code]
- StableAnimator: High-Quality Identity-Preserving Human Image Animation (CVPR 2025) [Code] [Project]
- Video2Shop: Exact Matching Clothes in Videos to Online Shopping Images (CVPR 2017)
- ChartReader: A Unified Framework for Chart Derendering and Comprehension (ICCV 2023) [Code]
- MetaDesigner: AI-Driven, User-Centric, Multilingual WordArt Synthesis (ICLR 2025)
- SHIELD: LLM-Driven Schema Induction for Predictive Analytics in EV Battery Supply-Chain Disruptions (EMNLP 2024, Oral – Industry Track) [Project]
- GSRFormer: Grounded Situation Recognition Transformer with Alternate Semantic Attention Refinement (ACM MM 2022, Oral) [Code]
- Securing the Skies: A Comprehensive Survey on Anti-UAV Methods (CVPR 2025 Anti-UAV Workshop, Best Paper)
* Abridged list.