Zhi-Qi Cheng, Ph.D.
About
Dr. Zhi-Qi Cheng is a tenure-track Assistant Professor of Computer Science & Systems at the University of Washington Tacoma and a Graduate Faculty member with doctoral endorsement through the University of Washington Graduate School. He directs the Multimodal Intelligence Lab (MILab), where his research focuses on multimodal foundation models, embodied AI, and intelligent systems for open-world robotics, mobility, public safety, and real-world decision-making.
Before joining the University of Washington, Dr. Cheng spent seven years at Carnegie Mellon University’s School of Computer Science, primarily in the Language Technologies Institute (LTI), where he served as a Research Associate (2017–2019), Postdoctoral Research Associate (2019–2022), and Project Scientist (2022–2024). His work focused on multimodal understanding, event-centric reasoning, and large-scale AI systems that integrate video, language, audio, maps, and knowledge sources for complex real-world environments. During this period, he was mentored by Prof. Alexander G. Hauptmann and Prof. Teruko Mitamura, whose guidance helped shape his research direction, system-building experience, and contributions to large-scale AI programs.
From 2019 to 2024, Dr. Cheng served as a core technical and system-delivery lead for CMU’s DARPA KAIROS system, contributing to multimodal event understanding, schema-guided reasoning, and integrated AI system development. KAIROS was a long-running collaborative CMU effort spanning language technologies, multimodal reasoning, speech, knowledge representation, and system integration. He also contributed to U.S. government-funded AI programs including DARPA AIDA, KAIROS-Plus, IARPA DIVA, and NIST PSIAP, focusing on multimodal perception, reasoning, and deployable intelligent systems.
Dr. Cheng’s work spans foundational AI research and real-world applications. His technical analysis contributed to investigations by The Washington Post that were included in the coverage awarded the 2022 Pulitzer Prize for Public Service. He has published at leading AI conferences including NeurIPS, ICLR, CVPR, ICCV, ACL, AAAI, and ACM Multimedia. He has held visiting research appointments or internships at Meta AI, Alibaba DAMO Academy, and Microsoft Research. His work has been recognized with the Intel Ph.D. Fellowship and the CSC-IBM Outstanding Student Scholarship, and has been featured by The Washington Post, The New York Times, and CBS News.
Research Areas:
- Multimodal Foundation Models
- Embodied AI & World Models
- Mobility, Public Safety & Secure Deployment
Teaching:
Dr. Cheng teaches undergraduate and graduate courses in machine learning, algorithms, computer graphics, robotics, vision-language models, and multimodal AI systems. His courses emphasize technical depth, hands-on implementation, empirical evaluation, reproducible experimentation, and real-world AI systems. Courses are available to students across Seattle, Tacoma, and Bothell through UW cross-campus registration, subject to course capacity, prerequisites, registration periods, and applicable campus policies. Undergraduate cross-campus enrollment is subject to UW policy, including home-campus credit requirements and cross-campus credit limits; graduate and graduate non-matriculated students have no cross-campus registration restrictions.
- TCSS 437 — Mobile Robotics
- TCSS 455 — Introduction to Machine Learning
- TCSS 458 — Computer Graphics
- TCSS 543 — Advanced Algorithms
- TCSS 590 — Vision-Language Models
- Independent Research, Thesis, and Design Project Supervision — TCSS 499 / 600 / 700 / 702
Research Supervision & Independent Study:
Dr. Cheng advises undergraduate and M.S. students through lab-based research in MILab, independent study, supervised research credits, capstone projects, and master’s thesis/design project supervision. Current UW students across Seattle, Tacoma, and Bothell interested in working with Dr. Cheng should contact him before registration to discuss research fit, project scope, supervision capacity, expected deliverables, quarter timeline, and credit pathway. Relevant pathways include TCSS 499 for undergraduate research, TCSS 600 for graduate independent study or research, and TCSS 700 / TCSS 702 for master’s thesis or design project supervision. Individualized research credits require instructor approval and may require a faculty number or departmental registration support.
Ph.D. Advising & Recruiting:
As a UW Graduate Faculty member with doctoral endorsement, Dr. Cheng can participate in doctoral supervision, dissertation advising, and doctoral committee service through the University of Washington Graduate School. His primary Ph.D. recruiting pathway is the Computer Science & Systems — School of Engineering & Technology (Tacoma) — Ph.D. program. Prospective Ph.D. students are encouraged to contact Dr. Cheng before applying to discuss research fit and potential advising. Competitive applicants may be considered for program-nominated UW Graduate School recruitment awards, including the GSEE Doctoral Recruitment Fellowship and Top-Off Funding, and the GSFEI Top Scholar Awards, subject to program procedures, eligibility requirements, nomination rules, and funding availability.
Computer Science & Systems — School of Engineering & Technology (Tacoma) — Ph.D. program:
https://apply.grad.uw.edu/portal/prog_detail_find_a_program?progid=2-Z-TCSCI-00-41
Full publication list on Google Scholar.
Multimodal Foundation Models, Generative Modeling & Efficient AI Systems
- Emotion-LLaMA: Multimodal Emotion Recognition and Reasoning with Instruction Tuning (NeurIPS 2024) [ Code ] [ Project ]
- MaxSup: Overcoming Representation Collapse in Label Smoothing (NeurIPS 2025, Oral) [ Code ]
- Overcoming Joint Intractability with Lossless Hierarchical Speculative Decoding (ICLR 2026, Oral) [ Code ]
- Towards Calibrated Robust Fine-Tuning of Vision-Language Models (NeurIPS 2024) [ Code ]
- MetaDesigner: AI-Driven, User-Centric, Multilingual WordArt Synthesis (ICLR 2025) [ Demo ]
- SHIELD: LLM-Driven Schema Induction for Predictive Analytics in EV Battery Supply-Chain Disruptions (EMNLP 2024, Industry Track Oral) [ Code ] [ Project ]
- ChartReader: A Unified Framework for Chart Derendering and Comprehension (ICCV 2023) [ Code ]
- FaceChain-ImagineID: Freely Crafting High-Fidelity Diverse Talking Faces from Disentangled Audio (CVPR 2024) [ Code ]
- StableAnimator: High-Quality Identity-Preserving Human Image Animation (CVPR 2025) [ Code ] [ Project ]
Embodied AI, World Models & Vision-Language Learning
- Human-Aware Vision-and-Language Navigation (NeurIPS 2024, Spotlight) [ V2 Code ] [ V1 Code ] [ Project ]
- Towards Unified World Models for Visual Navigation via Memory-Augmented Planning and Foresight (arXiv 2025) [ Code ]
- Language-Conditioned World Modeling for Visual Navigation (arXiv 2026) [ Code ]
- GSRFormer: Grounded Situation Recognition Transformer with Alternate Semantic Attention Refinement (ACM Multimedia 2022, Oral) [ Code ]
- A Video-grounded Dialogue Dataset and Metric for Event-driven Activities (AAAI 2025, Oral) [ Code ]
- ProMQA: Question Answering Dataset for Multimodal Procedural Activity Understanding (NAACL 2025, Oral) [ Code & Data ]
Mobility, Public Safety & Secure Deployment
- Rethinking Spatial Invariance of Convolutional Networks for Object Counting (CVPR 2022) [ Code ]
- BlockGCN: Redefine Topology Awareness for Skeleton-Based Action Recognition (CVPR 2024) [ Code ]
- DAMO-StreamNet: Optimizing Streaming Perception in Autonomous Driving (IJCAI 2023) [ Code ]
- SHIELD: LLM-Driven Schema Induction for Predictive Analytics in EV Battery Supply-Chain Disruptions (EMNLP 2024, Industry Track Oral) [ Code ] [ Project ]
- POPoS: Improving Efficient and Robust Facial Landmark Detection with Parallel Optimal Position Search (AAAI 2025) [ Code ]
- Securing the Skies: A Comprehensive Survey on Anti-UAV Methods, Benchmarking, and Future Directions (CVPR 2025 Anti-UAV Workshop, Best Paper)
Project Reports & Technical Reports
- Robust Automatic Detection of Traffic Activity (U.S. DOT / Mobility21 Final Research Report, 2023)
- CHRONOS-KAIROS Final Systems Description (DARPA KAIROS Final Research Report, 2024)