publications

Sorted by year.

2024

  1. MotionFollower: Editing Video Motion via Lightweight Score-Guided Diffusion
    Shuyuan Tu, Qi Dai, Zihao Zhang, Sicheng Xie, Zhi-Qi Cheng, Chong Luo, and 3 more authors
    arXiv preprint arXiv:2405.20325, 2024
  2. MicroCinema: A Divide-and-Conquer Approach for Text-to-Video Generation
    Yanhui Wang, Jianmin Bao, Wenming Weng, Ruoyu Feng, Dacheng Yin, Tao Yang, and 9 more authors
    In CVPR, 2024
  3. MotionEditor: Editing Video Motion via Content-Aware Diffusion
    Shuyuan Tu, Qi Dai, Zhi-Qi Cheng, Han Hu, Xintong Han, Zuxuan Wu, and 1 more author
    In CVPR, 2024
  4. SimDA: Simple Diffusion Adapter for Efficient Video Generation
    Zhen Xing, Qi Dai, Han Hu, Zuxuan Wu, and Yu-Gang Jiang
    In CVPR, 2024
  5. BlockGCN: Redefine Topology Awareness for Skeleton-Based Action Recognition
    Yuxuan Zhou, Xudong Yan, Zhi-Qi Cheng, Yan Yan, Qi Dai, and Xian-Sheng Hua
    In CVPR, 2024
  6. ARTV: Auto-Regressive Text-to-Video Generation with Diffusion Models
    Wenming Weng, Ruoyu Feng, Yanhui Wang, Qi Dai, Wang Chunyu, Dacheng Yin, and 7 more authors
    In CVPRW, 2024

2023

  1. SVFormer: Semi-supervised Video Transformer for Action Recognition
    Zhen Xing, Qi Dai, Han Hu, Jingjing Chen, Zuxuan Wu, and Yu-Gang Jiang
    In CVPR, 2023
  2. ResFormer: Scaling ViTs with Multi-Resolution Training
    Rui Tian, Zuxuan Wu, Qi Dai, Han Hu, Yu Qiao, and Yu-Gang Jiang
    In CVPR, 2023
  3. On Data Scaling in Masked Image Modeling
    Zhenda Xie, Zheng Zhang, Yue Cao, Yutong Lin, Yixuan Wei, Qi Dai, and 1 more author
    In CVPR, 2023
  4. HiVit: A simpler and more efficient design of hierarchical vision transformer
    Xiaosong Zhang, Yunjie Tian, Lingxi Xie, Wei Huang, Qi Dai, Qixiang Ye, and 1 more author
    In ICLR, 2023
  5. Implicit Temporal Modeling with Learnable Alignment for Video Recognition
    Shuyuan Tu, Qi Dai, Zuxuan Wu, Zhi-Qi Cheng, Han Hu, and Yu-Gang Jiang
    In ICCV, 2023
  6. All in Tokens: Unifying Output Space of Visual Tasks via Soft Token
    Jia Ning, Chen Li, Zheng Zhang, Chunyu Wang, Zigang Geng, Qi Dai, and 2 more authors
    In ICCV, 2023
  7. ChartReader: A Unified Framework for Chart Derendering and Comprehension without Heuristic Rules
    Zhi-Qi Cheng, Qi Dai, and Alexander G Hauptmann
    In ICCV, 2023
  8. VIDiff: Translating Videos via Multi-Modal Instructions with Diffusion Models
    Zhen Xing, Qi Dai, Zihao Zhang, Hui Zhang, Han Hu, Zuxuan Wu, and 1 more author
    arXiv preprint arXiv:2311.18837, 2023
  9. A survey on video diffusion models
    Zhen Xing, Qijun Feng, Haoran Chen, Qi Dai, Han Hu, Hang Xu, and 2 more authors
    arXiv preprint arXiv:2310.10647, 2023
  10. Parallel sentence-level explanation generation for real-world low-resource scenarios
    Yan Liu, Xiaokang Chen, and Qi Dai
    In ICASSP, 2023
  11. Deep Uncoupled Discrete Hashing via Similarity Matrix Decomposition
    Dayan Wu, Qi Dai, Bo Li, and Weiping Wang
    ACM TOMM, 2023

2022

  1. SimMIM: A Simple Framework for Masked Image Modeling
    Zhenda Xie, Zheng Zhang, Yue Cao, Yutong Lin, Jianmin Bao, Zhuliang Yao, and 2 more authors
    In CVPR, 2022
  2. Rethinking Spatial Invariance of Convolutional Networks for Object counting
    Zhi-Qi Cheng, Qi Dai, Hong Li, Jingkuan Song, Xiao Wu, and Alexander G Hauptmann
    In CVPR, 2022
  3. On the Connection between Local Attention and Dynamic Depth-Wise Convolution
    Qi Han, Zejia Fan, Qi Dai, Lei Sun, Ming-Ming Cheng, Jiaying Liu, and 1 more author
    In ICLR, 2022
  4. GSRFormer: Grounded Situation Recognition Transformer with Alternate Semantic Attention Refinement
    Zhi-Qi Cheng, Qi Dai, Siyao Li, Teruko Mitamura, and Alexander Hauptmann
    In ACM Multimedia, 2022
  5. MPII: Multi-level Mutual Promotion for Inference and Interpretation
    Yan Liu, Sanyuan Chen, Yazheng Yang, and Qi Dai
    In ACL, 2022

2021

  1. Temporal Action Detection with Multi-Level Supervision
    Baifeng Shi, Qi Dai, Judy Hoffman, Kate Saenko, Trevor Darrell, and Huijuan Xu
    In ICCV, 2021
  2. Self-Supervised Learning with Swin Transformers
    Zhenda Xie, Yutong Lin, Zhuliang Yao, Zheng Zhang, Qi Dai, Yue Cao, and 1 more author
    arXiv preprint arXiv:2105.04553, 2021
  3. Cross-Modal Attention Consistency for Video-Audio Unsupervised Learning
    Shaobo Min, Qi Dai, Hongtao Xie, Chuang Gan, Yongdong Zhang, and Jingdong Wang
    arXiv preprint arXiv:2106.06939, 2021
  4. A Novel Class Restriction Loss for Unsupervised Domain Adaptation
    Qi He, Qi Dai, Xiao Wu, and Jun-Yan He
    Neurocomputing, 2021

2020

  1. Informative Dropout for Robust Representation Learning: A Shape-bias Perspective
    Baifeng Shi, Dinghuai Zhang, Qi Dai, Zhanxing Zhu, Yadong Mu, and Jingdong Wang
    In ICML, 2020
  2. Weakly-Supervised Action Localization by Generative Attention Modeling
    Baifeng Shi, Qi Dai, Yadong Mu, and Jingdong Wang
    In CVPR, 2020
  3. Reinforced Short-length Hashing
    Xingbo Liu, Xiushan Nie, Qi Dai, Yupan Huang, Li Lian, and Yilong Yin
    IEEE TCSVT, 2020

2019

  1. Deep Incremental Hashing Network for Efficient Image Retrieval
    Dayan Wu, Qi Dai, Jing Liu, Bo Li, and Weiping Wang
    In CVPR, 2019
  2. Learning Spatial Awareness to Improve Crowd Counting
    Zhi-Qi Cheng, Jun-Xiu Li, Qi Dai, Xiao Wu, and Alexander G Hauptmann
    In ICCV, 2019
  3. Improving the Learning of Multi-Column Convolutional Neural Network for Crowd Counting
    Zhi-Qi Cheng, Jun-Xiu Li, Qi Dai, Xiao Wu, Jun-Yan He, and Alexander G Hauptmann
    In ACM Multimedia, 2019
  4. Decoupling Localization and Classification in Single Shot Temporal Action Detection
    Yupan Huang, Qi Dai, and Yutong Lu
    In ICME, 2019

2018

  1. Recurrent Tubelet Proposal and Recognition Networks for Action Detection
    Dong Li, Zhaofan Qiu, Qi Dai, Ting Yao, and Tao Mei
    In ECCV, 2018
  2. Deep Domain Adaptation Hashing with Adversarial Learning
    Fuchen Long, Ting Yao, Qi Dai, Xinmei Tian, Jiebo Luo, and Tao Mei
    In SIGIR, 2018

2016

  1. Binary Optimized Hashing
    In ACM Multimedia, 2016
  2. A Bayesian Hashing Approach and its Application to Face Recognition
    Qi Dai, Jianguo Li, Jun Wang, Yurong Chen, and Yu-Gang Jiang
    Neurocomputing, 2016

2015

  1. Optimal Bayesian Hashing for Efficient Face Recognition
    Qi Dai, Jianguo Li, Jun Wang, Yurong Chen, and Yu-Gang Jiang
    In IJCAI, 2015
  2. Human Action Recognition in Unconstrained Videos by Explicit Motion Modeling
    Yu-Gang Jiang, Qi Dai, Wei Liu, Xiangyang Xue, and Chong-Wah Ngo
    IEEE TIP, 2015
  3. Super Fast Event Recognition in Internet Videos
    Yu-Gang Jiang, Qi Dai, Tao Mei, Yong Rui, and Shih-Fu Chang
    IEEE TMM, 2015
  4. Fudan-Huawei at MediaEval 2015: Detecting Violent Scenes and Affective Impact in Movies with Deep Learning
    Qi Dai, Rui-Wei Zhao, Zuxuan Wu, Xi Wang, Zichen Gu, Wenhai Wu, and 1 more author
    In MediaEval, 2015

2014

  1. Fudan-NJUST at MediaEval 2014: Violent Scenes Detection Using Deep Neural Networks
    Qi Dai, Zuxuan Wu, Yu-Gang Jiang, Xiangyang Xue, and Jinhui Tang
    In MediaEval, 2014
  2. Challenge Huawei challenge: Fusing Multimodal Features with Deep Neural Networks for Mobile Video Annotation
    Jian Tu, Zuxuan Wu, Qi Dai, Yu-Gang Jiang, and Xiangyang Xue
    In ICMEW, 2014

2013

  1. Beauty is here: Evaluating Aesthetics in Videos using Multimodal Features and Free Training Data
    Yanran Wang, Qi Dai, Rui Feng, and Yu-Gang Jiang
    In ACM Multimedia, 2013
  2. Fudan at MediaEval 2013: Violent Scenes Detection Using Motion Features and Part-Level Attributes
    Qi Dai, Jian Tu, Ziqiang Shi, Yu-Gang Jiang, and Xiangyang Xue
    In MediaEval, 2013

2012

  1. Trajectory-based Modeling of Human Actions with Motion Reference Points
    Yu-Gang Jiang, Qi Dai, Xiangyang Xue, Wei Liu, and Chong-Wah Ngo
    In ECCV, 2012
  2. Fast Semantic Diffusion for Large-scale Context-based Image and Video Annotation
    Yu-Gang Jiang, Qi Dai, Jun Wang, Chong-Wah Ngo, Xiangyang Xue, and Shih-Fu Chang
    IEEE TIP, 2012
  3. A Fast Video Event Recognition System and its Application to Video Search
    Yu-Gang Jiang, Qi Dai, Yingbin Zheng, Xiangyang Xue, Jie Liu, and Dong Wang
    In ACM Multimedia (Demo), 2012
  4. The Shanghai-Hongkong team at MediaEval2012: Violent Scene Detection using Trajectory-based Features
    Yu-Gang JiangQi Dai, Chun Chet Tan, Xiangyang Xue, and Chong-Wah Ngo
    In MediaEval, 2012