Abstract
Recent years have seen substantial progress in diffusion-based controllable video generation. However, achieving precise control in complex scenarios, including fine-grained object parts, sophisticated motion trajectories, and coherent background movement, remains a challenge. In this paper, we introduce TrackGo, a novel approach that leverages free-form masks and arrows for conditional video generation. This method offers users a flexible and precise mechanism for manipulating video content. For control implementation, we also propose TrackAdapter, an efficient and lightweight adapter designed to be seamlessly integrated into the temporal self-attention layers of a pretrained video generation model. This design leverages our observation that the attention maps of these layers can accurately activate the regions corresponding to motion in videos. Our experimental results demonstrate that our new approach, enhanced by the TrackAdapter, achieves state-of-the-art performance on key metrics such as FVD, FID, and ObjMC.
Video (about 1 min)
Methodology
Our method consists of two parts: an image-to-video (I2V) model and the TrackAdapter, a lightweight module injected into the temporal self-attention layers of the pretrained I2V model.
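The snippet below is not the released implementation; it is a minimal PyTorch sketch, under assumed module and tensor names, of the general adapter idea: a frozen temporal self-attention branch plus a trainable attention branch driven by the encoded masks and arrows, fused by a residual sum.

```python
import torch
import torch.nn as nn

class TrackAdapterBlock(nn.Module):
    """Hedged sketch: a trainable attention branch runs in parallel with a
    frozen temporal self-attention layer and is added back as a residual.
    All names and shapes here are illustrative assumptions."""

    def __init__(self, hidden_dim: int = 320, num_heads: int = 8):
        super().__init__()
        # Pretrained temporal self-attention (kept frozen during fine-tuning).
        self.temporal_attn = nn.MultiheadAttention(hidden_dim, num_heads, batch_first=True)
        for p in self.temporal_attn.parameters():
            p.requires_grad = False
        # Lightweight adapter branch conditioned on the encoded masks/arrows.
        self.adapter_attn = nn.MultiheadAttention(hidden_dim, num_heads, batch_first=True)
        self.cond_proj = nn.Linear(hidden_dim, hidden_dim)

    def forward(self, x: torch.Tensor, motion_cond: torch.Tensor) -> torch.Tensor:
        # x, motion_cond: (batch * height * width, num_frames, hidden_dim)
        base, _ = self.temporal_attn(x, x, x)        # frozen pretrained branch
        cond = self.cond_proj(motion_cond)
        delta, _ = self.adapter_attn(x, cond, cond)  # trainable adapter branch
        return base + delta                          # residual fusion


# Toy usage: 4 spatial positions, 16 frames, 320-dim features.
block = TrackAdapterBlock()
x = torch.randn(4, 16, 320)
cond = torch.randn(4, 16, 320)
print(block(x, cond).shape)  # torch.Size([4, 16, 320])
```

Only the adapter branch is trained, so the pretrained I2V backbone stays intact and the added parameter count remains small.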
Camera Motion
TrackGo can also produce camera-motion effects: by selecting the entire frame as the motion region and drawing a trajectory, the whole scene moves along the specified path.
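As a hedged illustration of this usage (the actual conditioning format consumed by TrackGo is not shown here, and all names below are assumptions), a camera pan could be requested by pairing a mask covering the whole frame with a single trajectory:

```python
import numpy as np

# Assumed resolution and clip length for the sketch.
H, W, num_frames = 576, 1024, 16

# Select the entire frame as the motion region.
full_frame_mask = np.ones((H, W), dtype=np.uint8)

# A rightward pan: translate the frame center by 200 px over the clip.
start = np.array([W / 2, H / 2])
end = start + np.array([200.0, 0.0])
trajectory = [tuple(start + (end - start) * t / (num_frames - 1))
              for t in range(num_frames)]

# Hypothetical conditioning dictionary: one mask, one trajectory.
condition = {"masks": [full_frame_mask], "trajectories": [trajectory]}
```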
Experiments
We compare our approach with prior methods on the VIPSeg dataset and our internal dataset, using FVD, FID, and ObjMC as evaluation metrics.
| Method | Base Arch | VIPSeg FVD ↓ | VIPSeg FID ↓ | VIPSeg ObjMC ↓ | Internal FVD ↓ | Internal FID ↓ | Internal ObjMC ↓ |
|---|---|---|---|---|---|---|---|
| DragNUWA | SVD | 321.31 | 30.15 | 298.98 | 178.37 | 38.07 | 129.80 |
| DragAnything | SVD | 294.91 | 28.16 | 236.02 | 169.73 | 32.85 | 133.89 |
| TrackGo | SVD | 248.27 | 25.60 | 191.15 | 136.11 | 29.19 | 79.52 |
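For reference, the sketch below shows an ObjMC-style score under the commonly used definition of the mean Euclidean distance between points tracked in the generated video and the target trajectories; the exact evaluation protocol (point tracker, normalization) follows the respective papers and is only assumed here.

```python
import numpy as np

def objmc_sketch(pred_tracks: np.ndarray, gt_tracks: np.ndarray) -> float:
    """Hedged sketch of an ObjMC-style score.
    Assumed shapes: (num_points, num_frames, 2) pixel coordinates.
    Lower is better: the generated motion stays closer to the target."""
    return float(np.linalg.norm(pred_tracks - gt_tracks, axis=-1).mean())

# Toy example: 5 points tracked over 16 frames, ~3 px of jitter.
gt = np.random.rand(5, 16, 2) * 256
pred = gt + np.random.randn(5, 16, 2) * 3.0
print(objmc_sketch(pred, gt))  # roughly 3-4 (pixel units)
```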
Citation
@article{zhou2024trackgo,
  title={TrackGo: A Flexible and Efficient Method for Controllable Video Generation},
  author={Zhou, Haitao and Wang, Chuang and Nie, Rui and Lin, Jinxiao and Yu, Dongdong and Yu, Qian and Wang, Changhu},
  journal={arXiv preprint},
  year={2024}
}