Rongkun Zheng

I am a third-year Ph.D. student (since 2022) at the University of Hong Kong, supervised by Prof. Hengshuang Zhao. Before that, I received my B.Eng. from Tsinghua University in 2022. I have done internships at SenseTime and Shlab.

My research interests lie in deep learning and computer vision. I have published multiple research works on video perception, open-world multi-modal learning, and video understanding. My current interests include multi-modal large language models (MLLMs) and reinforcement learning.

Email  /  Scholar  /  Github

profile photo
Selected Publications

Google Scholar

ViLLa: Video Reasoning Segmentation with Large Language Model
Rongkun Zheng, Lu Qi, Xi Chen, Yi Wang, Kun Wang, Yu Qiao, Hengshuang Zhao
ICCV, 2025
pdf / code

ViLLa is an effective and efficient large multi-modal model capable of segmenting and tracking objects with reasoning capabilities. It handles complex video reasoning segmentation tasks, such as: (a) segmenting objects with complex interactions; (b) segmenting objects with complex motion; (c) segmenting objects in long videos with occlusions.

SyncVIS: Synchronized Video Instance Segmentation
Rongkun Zheng, Lu Qi, Xi Chen, Yi Wang, Kun Wang, Yu Qiao, Hengshuang Zhao
NeurIPS, 2024
pdf / code

In this work, we analyze the limitations of current video instance segmentation solutions and propose synchronized modeling via a new framework named SyncVIS.

TMT-VIS: Taxonomy-aware Multi-dataset Joint Training for Video Instance Segmentation
Rongkun Zheng, Lu Qi, Xi Chen, Yi Wang, Kun Wang, Yu Qiao, Hengshuang Zhao
NeurIPS, 2023
pdf / code

In this work, we show that providing extra taxonomy information helps models concentrate on specific taxonomies, and propose TMT-VIS, a taxonomy-aware multi-dataset joint training framework for video instance segmentation.

Academic Service

Reviewer

  • CVPR (2023, 2024, 2025)
  • ICCV (2023)
  • ECCV (2024)
  • NeurIPS (2023, 2024, 2025)
  • ICLR (2023)

Design and source code from Jon Barron's website