World Models
The hub for next-generation World Models.
From Dreamer to Sora, exploring agents that learn by imagining.
Humans don't process raw sensory data directly to make decisions. We build a mental model of our environment—a World Model.
In AI, a World Model learns a compressed representation of the spatial and temporal dynamics of the environment. The agent can then "hallucinate" or simulate possible futures in this latent space to learn optimal policies without trial-and-error in the real world.
The "V-M-C" framework established by Ha & Schmidhuber (2018) remains the blueprint for modern predictive agents.
V (Vision): Usually a VAE (Variational Autoencoder). It compresses high-dimensional raw sensory data (like pixels) into a compact latent vector z.
z_t = V(obs_t)
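
A minimal PyTorch sketch of what V might look like; the layer sizes, the 64x64 input resolution, and all names here are illustrative assumptions rather than details from the paper:

import torch
import torch.nn as nn

class VisionVAE(nn.Module):
    # Encoder half of a VAE: compresses a 64x64 RGB frame into a latent vector z.
    def __init__(self, latent_dim=32):
        super().__init__()
        self.encoder = nn.Sequential(
            nn.Conv2d(3, 32, kernel_size=4, stride=2), nn.ReLU(),   # 64x64 -> 31x31
            nn.Conv2d(32, 64, kernel_size=4, stride=2), nn.ReLU(),  # 31x31 -> 14x14
            nn.Conv2d(64, 128, kernel_size=4, stride=2), nn.ReLU(), # 14x14 -> 6x6
            nn.Flatten(),
        )
        self.mu = nn.Linear(128 * 6 * 6, latent_dim)      # mean of q(z | obs)
        self.logvar = nn.Linear(128 * 6 * 6, latent_dim)  # log-variance of q(z | obs)

    def forward(self, obs):
        feats = self.encoder(obs)
        mu, logvar = self.mu(feats), self.logvar(feats)
        # Reparameterization trick: sample z while keeping the graph differentiable.
        return mu + torch.exp(0.5 * logvar) * torch.randn_like(mu)

In the full model, a deconvolutional decoder reconstructs the frame from z, and the reconstruction and KL losses train the encoder.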
M (Memory): An RNN or Transformer that predicts the future latent state. This is the "World Model" itself: it captures the environment's dynamics and causality.
h_{t+1} = M(h_t, z_t, a_t)
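
A sketch of M in the same spirit. The original paper uses a mixture-density RNN (MDN-RNN); this is simplified here to a deterministic GRU, and all names and sizes are assumptions:

import torch
import torch.nn as nn

class DynamicsRNN(nn.Module):
    # Implements h_{t+1} = M(h_t, z_t, a_t) and predicts the next latent z_{t+1}.
    def __init__(self, latent_dim=32, action_dim=3, hidden_dim=256):
        super().__init__()
        self.cell = nn.GRUCell(latent_dim + action_dim, hidden_dim)
        self.to_z = nn.Linear(hidden_dim, latent_dim)  # readout for the predicted next latent

    def forward(self, h, z, a):
        h_next = self.cell(torch.cat([z, a], dim=-1), h)
        z_pred = self.to_z(h_next)  # trained against the real z_{t+1}, e.g. with an MSE loss
        return h_next, z_pred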
C (Controller): The agent itself. It uses the representations from V and M to select actions that maximize expected future reward, and is often trained purely in simulation.
a_t = C(z_t, h_t)
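
A sketch of C, together with a short "dream" rollout that chains all three components entirely in latent space. Reward prediction and policy optimization are omitted, and every name here is illustrative:

import torch
import torch.nn as nn

class Controller(nn.Module):
    # Implements a_t = C(z_t, h_t); the original paper uses a single linear layer.
    def __init__(self, latent_dim=32, hidden_dim=256, action_dim=3):
        super().__init__()
        self.policy = nn.Linear(latent_dim + hidden_dim, action_dim)

    def forward(self, z, h):
        # The tanh squashing to [-1, 1] is an illustrative choice for continuous actions.
        return torch.tanh(self.policy(torch.cat([z, h], dim=-1)))

def dream_rollout(vision, dynamics, controller, obs, hidden_dim=256, steps=15):
    # Roll the world model forward in imagination, never touching the real environment.
    z = vision(obs)                           # encode one real observation
    h = torch.zeros(obs.size(0), hidden_dim)  # blank initial memory
    imagined = []
    for _ in range(steps):
        a = controller(z, h)                  # act on the imagined state
        h, z = dynamics(h, z, a)              # predict the next latent state
        imagined.append((z, a))               # imagined experience for training C
    return imagined

Because the rollout only calls M, imagined steps can be generated far faster (and more safely) than real environment interaction, which is what allows C to be trained purely in simulation.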
Key milestones in the development of World Models.
World Models (Ha & Schmidhuber, 2018)
The foundational paper demonstrating that agents can learn inside their own dreams (latent space) to solve tasks like CarRacing and Doom.
Read Paper

DreamerV3 (DeepMind, 2023)
Scalable reinforcement learning using world models. DreamerV3 is the first algorithm to collect diamonds in Minecraft from scratch.
View Project

JEPA (Meta AI)
Joint Embedding Predictive Architecture. Moves away from pixel reconstruction to predicting in abstract representation space.
Read Blog

Sora (OpenAI, 2024)
Large-scale video generation models acting as generalist world simulators, understanding physics, object permanence, and causality.
View Showcase