Zhi-Qi Cheng, Ph.D.

About
Dr. Zhi-Qi Cheng is an AI/CS researcher whose career prior to the University of Washington spanned Carnegie Mellon University’s School of Computer Science, national research programs, and leading industry labs. At Carnegie Mellon’s Language Technologies Institute (LTI), he advanced from Research Associate and Postdoctoral Researcher to Project Scientist / Instructor, contributing to multimodal perception, video understanding, and decision-support systems. He played a central role in U.S. government–sponsored initiatives, serving as technical lead for the DARPA KAIROS program and contributing to IARPA DIVA and NIST PSIAP, where he helped deliver large-scale video analytics and schema-based reasoning systems.
Dr. Cheng’s technical work also supported The Washington Post investigations that won the 2022 Pulitzer Prize for Public Service, providing video analytics pipelines that shaped the reporting. His research has appeared in premier conferences including CVPR, ICCV, NeurIPS, ICLR, AAAI, IJCAI, and ACM Multimedia, establishing him as an active contributor to both fundamental and applied AI research.
Beyond academia, Dr. Cheng gained industry experience through research internships at Alibaba DAMO Academy, Google Brain, and Microsoft Research, focusing on multimodal learning, scalable perception systems, and foundation models. His contributions have been recognized with prestigious awards such as the Intel Ph.D. Fellowship and the IBM Outstanding Student Scholarship. His research and public-facing work have been featured widely in media outlets including The Washington Post, The New York Times, and CBS News.
Selected recent publications (full list on Google Scholar):
- MaxSup: Overcoming Representation Collapse in Label Smoothing (NeurIPS 2025, Oral; acceptance: 77/21,575 ≈ 0.36%)
- Human-Aware Vision-and-Language Navigation (NeurIPS 2024, Spotlight) [Code] [Project]
- Emotion-LLaMA: Multimodal Emotion Recognition and Reasoning with Instruction Tuning (NeurIPS 2024) [Code] [Project]
- Towards Calibrated Robust Fine-Tuning of Vision-Language Models (NeurIPS 2024)
- Rethinking Spatial Invariance of Convolutional Networks for Object Counting (CVPR 2022) — methods used in The Washington Post investigations that won the 2022 Pulitzer Prize for Public Service [Code]
- StableAnimator: High-Quality Identity-Preserving Human Image Animation (CVPR 2025) [Code] [Project]
- Video2Shop: Exact Matching Clothes in Videos to Online Shopping Images (CVPR 2017)
- ChartReader: A Unified Framework for Chart Derendering and Comprehension (ICCV 2023) [Code]
- MetaDesigner: AI-Driven, User-Centric, Multilingual WordArt Synthesis (ICLR 2025)
- SHIELD: LLM-Driven Schema Induction for Predictive Analytics in EV Battery Supply-Chain Disruptions (EMNLP 2024, Oral – Industry Track) [Project]
- GSRFormer: Grounded Situation Recognition Transformer with Alternate Semantic Attention Refinement (ACM MM 2022, Oral) [Code]
- Securing the Skies: A Comprehensive Survey on Anti-UAV Methods (CVPR 2025 Anti-UAV Workshop, Best Paper)
* Abridged list.