Rongkun Zheng
I am a third-year (starting in 2022) Ph.D. student at the University of Hong Kong, supervised by Prof. Hengshuang Zhao. Before that, I received my B.Eng. from Tsinghua University in 2022. I have done internships at SenseTime and Shlab.
My research interests lie in deep learning and computer vision. I have published multiple research works on video perception, open-world multi-modal learning, and video understanding. My current interests include multi-modal large language models (MLLMs) and reinforcement learning.
Email /
Scholar /
Github
ViLLa: Video Reasoning Segmentation with Large Language Model
Rongkun Zheng, Lu Qi, Xi Chen, Yi Wang, Kun Wang, Yu Qiao, Hengshuang Zhao
ICCV, 2025
pdf /
code
ViLLa is an effective and efficient LMM that segments and tracks objects with reasoning capabilities. It handles complex video reasoning segmentation tasks, such as:
(a) segmenting objects with complex interactions; (b) segmenting objects with complex motion; (c) segmenting objects in long videos with occlusions.
SyncVIS: Synchronized Video Instance Segmentation
Rongkun Zheng, Lu Qi, Xi Chen, Yi Wang, Kun Wang, Yu Qiao, Hengshuang Zhao
NeurIPS, 2024
pdf /
code
In this work, we analyze the limitations of current asynchronous video instance segmentation solutions and propose synchronized modeling via a new framework named SyncVIS.
TMT-VIS: Taxonomy-aware Multi-dataset Joint Training for Video Instance Segmentation
Rongkun Zheng, Lu Qi, Xi Chen, Yi Wang, Kun Wang, Yu Qiao, Hengshuang Zhao
NeurIPS, 2023
pdf /
code
In this work, we show that providing extra taxonomy information helps models concentrate on specific taxonomies, and propose TMT-VIS, a taxonomy-aware multi-dataset joint training framework for video instance segmentation.
Academic Service
Reviewer
- CVPR (2023, 2024, 2025)
- ICCV (2023)
- ECCV (2024)
- NeurIPS (2023, 2024, 2025)
- ICLR (2023)