On Mon June 09, 2025

Speaker

Hyunwoo J. Kim


Title

Efficient Deep Video Understanding Towards AGI


Abstract

Video has become one of the most popular modalities that modern individuals consume and produce. However, developing AI systems that deeply understand videos remains challenging due to the difficulty of annotation, the sheer volume of data, and the substantial computational burden of training and inference for video models. To address these problems, I introduce new strategies for pre-training and fine-tuning video foundation models, including parameter-efficient fine-tuning (PEFT). Additionally, to deploy video models to users, I present training-free, cost-efficient inference techniques for video transformers. To demonstrate the generalizability of video foundation models, I highlight our recent work on Video Question Answering (Video QA), which implicitly requires tackling various subtasks and achieving a deeper understanding of videos. Lastly, I discuss how Video QA and Multimodal QA systems can serve as stepping stones towards artificial general intelligence, and outline future research directions.


Bio

Hyunwoo J. Kim is an associate professor in the School of Computing (SoC) at Korea Advanced Institute of Science & Technology (KAIST). He is also an affiliated faculty member in the Kim Jaechul Graduate School of AI at KAIST. Prior to this position, he led his lab at Korea University (Mar. 2019 to Jan. 2025). Earlier in his career, he worked at Amazon Lab126 in Sunnyvale, California. In 2017, he earned his Ph.D. from the Department of Computer Sciences at the University of Wisconsin-Madison (Ph.D. minor: Statistics) under the supervision of Dr. Vikas Singh. In 2013, he completed an internship in the Machine Learning Analytics Team at Amazon in Seattle, Washington.


Language

English