Recent advances in Human-Centric Foundation Models (HFMs) have revolutionized the understanding and synthesis of human-related data across perception, generation, and embodied intelligence. Trained on vast multimodal datasets, these models achieve robust perception of human-centric information and high-fidelity synthesis of digital humans. Despite these successes, however, current HFMs remain fundamentally passive: trained on static datasets with limited support for real-time interaction, they struggle to adapt to live human feedback or changing environments. These limitations hinder their deployment in real-world applications such as video generation, gaming, and robotics.
This challenge highlights the need for Interactive Human-Centric Foundation Models (I-HFMs), a new research direction in the AI community, designed to perceive, generate, and act in ways that seamlessly align with human needs and behaviors through bidirectional engagement among humans, environments, and AI systems. I-HFMs cover three critical dimensions of interactivity:
(a) Interacting with users, where I-HFMs empower users to intuitively guide the creation and refinement of human-centric content (e.g., personalized avatars, animations, or virtual try-on) through natural language, gestures, or iterative feedback.
(b) Interacting with physical or virtual environments, where I-HFMs learn skills from environmental feedback as humans do, integrating perception, reasoning, and action to adapt to dynamic scenarios.
(c) Interacting with other agents (including humans) to collaborate on complex, socially embedded tasks in the real world.
This interactivity is not merely a technical advancement but a fundamental step toward achieving AGI. By enabling continuous adaptation to human feedback and real-world dynamics, I-HFMs transform AI from passive perception models into proactive, human-like autonomous agents. This ability to perceive, reason, and act in environments as humans do bridges the gap between static, data-driven systems and truly responsive, socially intelligent AI, paving the way for AGI that seamlessly integrates into human societies.
Call for Papers
We welcome full paper submissions, subject to the following requirements:
- Paper Length: Minimum of 5 pages, Maximum of 8 pages (excluding references).
- Format: Submit as PDF following the official ICCV 2025 template and guidelines.
- Review Policy: Submissions must be anonymous and follow ICCV 2025 double-blind review rules.
- Dual Submission: Not permitted under ICCV 2025 and I-HFM 2025 guidelines.
- Supplementary Materials: Optional supplementary materials (videos, images, etc.) may be uploaded as a separate zip file; the deadline matches the paper submission deadline.
- Presentation Requirement: At least one author of each accepted paper must attend and present the work in person.
- Presentation Format: Accepted papers will be presented either as oral or poster presentations.
- Conference Policy: Presentation rules follow the ICCV 2025 main conference policy.
- Compliance: Failure to meet these rules may result in removal from the workshop program.
Submission Portal
Via OpenReview
- Submission deadline (archival papers): July 7, 2025, 11:59 PM AoE
- Notification to authors (archival papers): July 11, 2025
- Camera-ready deadline (archival papers): Aug 18, 2025, 11:59 PM AoE
Via OpenReview (stay tuned)
- Submission deadline (non-archival papers): Sept 1, 2025, 11:59 PM AoE
- Notification to authors (non-archival papers): Sept 15, 2025
Topics
We welcome submissions addressing the construction, analysis, and application of interactive human-centric foundation models, including topics such as:
- Establishing metrics, datasets, and benchmarks to evaluate the interactivity, multimodal integration, and real-world performance of human-centric foundation models.
- Developing scalable methods that integrate human-centric priors into perceptual, generative, and embodiment tasks, enabling dynamic user engagement and real-time control.
- Creating systems that leverage human-centric foundation models to support real-time interactions, allowing users to explore, intervene, and adapt within complex environments.
- Leveraging diverse modalities (vision, language, audio, motion) to build comprehensive models that capture the rich complexity of human behavior through efficient learning strategies.
- Investigating the deployment of interactive human-centric foundation models in social robotics, autonomous systems, digital content creation, and beyond to enhance decision-making and user engagement.
Speakers and Panelists

Michael Black
Director, Max Planck Institute for Intelligent Systems.
Ziwei Liu
Associate Professor, Nanyang Technological University.
Evonne Ng
Research Scientist, Meta.
Angjoo Kanazawa
Assistant Professor, UC Berkeley.
Lan Xu
Assistant Professor, ShanghaiTech University.
Ailing Zeng
Technical staff member, Anuttacon.
Workshop Schedule
Time | Session | Duration | Details |
---|---|---|---|
9:00 - 9:10 | Opening Remarks | 10 min | Welcome and introduction to the workshop |
9:10 - 9:30 | Invited Talk #1 | 20 min | Talk 1 |
9:40 - 10:00 | Invited Talk #2 | 20 min | Talk 2 |
10:10 - 10:30 | Invited Talk #3 | 20 min | Talk 3 |
10:40 - 11:00 | Invited Talk #4 | 20 min | Talk 4 |
11:00 - 12:00 | Poster Session & Coffee Social #1 | 60 min | Networking and refreshments |
12:00 - 13:00 | Lunch Break | 60 min | Time for lunch and informal discussions |
13:00 - 13:20 | Invited Talk #5 | 20 min | Talk 5 |
13:20 - 13:50 | Oral Presentations (2) | 2 × 15 min | Two oral presentations |
13:50 - 14:20 | Oral Presentations (2) | 2 × 15 min | Two oral presentations |
14:25 - 15:25 | Poster Session & Coffee Social #2 | 60 min | Networking and refreshments |
15:30 - 15:50 | Invited Talk #6 | 20 min | Talk 6 |
16:00 - 16:20 | Invited Talk #7 | 20 min | Talk 7 |
16:30 - 17:00 | Panel Discussion | 30 min | Interactive session with panelists |
17:00 - 17:15 | Awards and Closing Remarks | 15 min | Concluding the workshop and award announcements |
Organization
Organizing Committee

Shixiang Tang
Postdoctoral researcher, The Chinese University of Hong Kong
Yizhou Wang
Ph.D. student, The Chinese University of Hong Kong
Xin Chen
Research Scientist, ByteDance
Wanli Ouyang
Professor, The Chinese University of Hong Kong
Shiyao Xu
Ph.D. student, University of Trento
Jing Liu
Research Scientist, ByteDance
Emily Kim
Ph.D., Carnegie Mellon University
Xiaowei Zhou
Professor, Zhejiang University
Taku Komura
Professor, The University of Hong Kong
Gül Varol
Permanent researcher, École des Ponts ParisTech
Nicu Sebe
Professor, The University of Trento