1st Workshop on Long Multi-Scene Video Foundations

Generation, Understanding and Evaluation

Cutting-edge video modeling techniques have achieved impressive results in computer vision, especially in understanding and generating video content. However, these techniques are usually limited to short, single-scene videos and struggle with real-world scenarios involving complex, long-form narratives spanning multiple dynamic scenes. This workshop aims to bring together experts in long, multi-scene video modeling to discuss generation, understanding, evaluation, and ethical considerations. The workshop will establish a collaborative platform for exchanging recent breakthroughs and deliberating on the future direction of visual computing models capable of handling extended video content. Through this exchange of ideas and insights, we hope to advance the creation and understanding of long video narratives and contribute to their practical application in fields such as entertainment, education, and healthcare.


This workshop will be held in Honolulu, Hawaii, as part of ICCV 2025.


Call for Papers

We invite contributions that explore technical, methodological, and societal aspects of working with complex video data that spans extended temporal durations and diverse content. Topics of interest include, but are not limited to:

  • Multi-scene video generation, including text-to-video (T2V) approaches
  • Vision-language models designed for long-form video understanding and generation
  • Efficient training and inference strategies for large-scale video models
  • Techniques for editing long-form or multi-scene video content
  • Representation learning tailored to long videos
  • Ensuring long-term temporal consistency in generated or analyzed video sequences
  • Long-range reasoning and semantic understanding across scenes
  • Factuality and grounding in video generation and comprehension tasks
  • Development of evaluation metrics and benchmarks for long-form video
  • Analysis of ethical and societal implications of large-scale long video models

Submission Tracks

1. Proceedings Track

Papers submitted to the Proceedings Track must present original, unpublished work and will undergo double-blind review; authors must remove all identifying information from their submissions. Submissions must follow the formatting guidelines in the ICCV 2025 Author Kit and be between 4 and 8 pages long, with additional pages allowed only for references. Accepted papers will be published in the official ICCV 2025 Workshop Proceedings, and their authors will have the opportunity to present their work in person at the workshop.

Submit to Proceedings Track via: OpenReview

2. No Proceedings Track

The No Proceedings Track offers a more flexible avenue for presenting a wider range of contributions. This track is well-suited for:

  • Extended abstracts and short papers: We welcome submissions of up to 4 pages that describe work still in progress, report negative findings that are valuable to the community, or articulate insightful position papers on relevant topics.
  • Previously published work: We also accept submissions of work that has already been published elsewhere. This includes papers that may have been accepted at the main ICCV 2025 conference itself.

Submit to No Proceedings Track via: NOT OPEN (submissions are not yet open)

Important Dates

Proceedings Track

  • Submission Deadline: June 8, 2025
  • Preliminary Author Notification: June 26, 2025
  • Camera-ready Deadline: July 12, 2025

No Proceedings Track

  • Submission Deadline: August 30, 2025
  • Preliminary Author Notification: September 14, 2025

All dates are in GMT.

Speakers

Katerina Fragkiadaki

JPMorgan Chase Associate Professor of Computer Science, Carnegie Mellon University

Ishan Misra

Research Director and Tech Lead of MovieGen, Meta

Sayak Paul

Research Engineer, Hugging Face

Jiajun Wu

Assistant Professor of Computer Science, Stanford University

Event Schedule

Coming Soon!

Stay tuned for the official schedule.


Organizers

Vasco Ramos

PhD Candidate, NOVA University of Lisbon

Regev Cohen

Senior Research Scientist, Google

Hritik Bansal

PhD Candidate, University of California

Sivan Doveh

Student Researcher, Google

Jehanzeb Mirza

Postdoctoral Researcher, MIT CSAIL

Hila Chefer

PhD Candidate, Tel Aviv University

Inbar Mosseri

Team Lead and Senior Staff Research Scientist, Google DeepMind

Joao Magalhaes

Full Professor, NOVA University of Lisbon