JavisDiT: Joint Audio-Video Diffusion Transformer with Hierarchical Spatio-Temporal Prior Synchronization
Kai Liu1,2,
Wei Li3,
Lai Chen1,
Shengqiong Wu2,
Yanhao Zheng1,
Jiayi Ji2,
Fan Zhou1,
Rongxin Jiang1,
Jiebo Luo4,
Hao Fei2*,
Tat-Seng Chua2
1 Zhejiang University,
2 National University of Singapore,
3 University of Science and Technology of China,
4 University of Rochester
Work in progress (*Correspondence)