Simulating the future to plan the present

Internal Representations of Reality

The hub for next-generation World Models.
From Dreamer to Sora, exploring agents that learn by imagining.

  • 2018: First Paper
  • SOTA: Atari Performance
  • Video: Gen AI Frontier
  • AGI: Path Forward

01. The Mental Model

Humans don't process raw sensory data directly to make decisions. We build a mental model of our environment—a World Model.

In AI, a World Model learns a compressed representation of the environment's spatial and temporal dynamics. The agent can then "hallucinate", or simulate, possible futures in this latent space and use those imagined rollouts to improve its policy, with far less trial-and-error in the real world.

  • Sample Efficiency: Learning from imagination requires fewer real-world interactions.
  • Safety: Test dangerous scenarios in simulation before deployment.
[Diagram: Input, Latent State, Action, Internal Loop (Dreaming)]
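
To make the "dreaming" loop concrete, here is a minimal sketch of comparing two policies purely by imagined rollouts. Everything in it is an assumption made for illustration: the `encode`, `dynamics`, and `reward` functions are hand-written stand-ins for components that would normally be learned, and the two candidate policies are hypothetical.

```python
def encode(obs):
    """Hypothetical encoder: average neighbouring pairs to halve the dimension."""
    return [(obs[i] + obs[i + 1]) / 2.0 for i in range(0, len(obs) - 1, 2)]

def dynamics(z, action):
    """Hypothetical latent dynamics: decay the state and nudge it by the action."""
    return [0.9 * zi + 0.1 * action for zi in z]

def reward(z):
    """Hypothetical reward: higher when the latent state is near the origin."""
    return -sum(zi * zi for zi in z)

def imagine_return(z0, policy, horizon=10, gamma=0.99):
    """Roll the model forward in latent space and sum discounted imagined rewards."""
    z, total = list(z0), 0.0
    for t in range(horizon):
        a = policy(z)
        z = dynamics(z, a)  # the real environment is never called here
        total += (gamma ** t) * reward(z)
    return total

def recenter(z):
    """Candidate policy: push the mean of the latent state toward zero."""
    return -sum(z) / len(z)

def do_nothing(z):
    """Candidate policy: always output a zero action."""
    return 0.0

# One real observation is encoded once; every later step is imagined.
z0 = encode([1.0, 3.0, -2.0, 0.0])
candidates = {"recenter": recenter, "do_nothing": do_nothing}
scores = {name: imagine_return(z0, pi) for name, pi in candidates.items()}
best = max(scores, key=scores.get)
```

The point of the sketch is only the shape of the loop: a single real observation, many simulated steps, and a policy choice made from the simulated returns.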

02. Core Architecture

The "V-M-C" framework established by Ha & Schmidhuber (2018) remains the blueprint for modern predictive agents.

V Model (Vision)

Usually a VAE (Variational Autoencoder). It compresses high-dimensional raw sensory data (like pixels) into a compact latent vector z.

z_t = V(obs_t)
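
As a rough illustration of z_t = V(obs_t), the sketch below shows the sampling step of a VAE-style encoder. It assumes a single linear layer in place of the usual convolutional encoder, random untrained weights, and an 8-value stand-in for a flattened image; the names `encode` and `kl_to_standard_normal` are hypothetical.

```python
import math
import random

def encode(obs, w_mu, w_logvar, rng):
    """Map an observation to a sampled latent vector z (VAE-style)."""
    mu = [sum(w * x for w, x in zip(row, obs)) for row in w_mu]
    logvar = [sum(w * x for w, x in zip(row, obs)) for row in w_logvar]
    # Reparameterization: z = mu + sigma * eps, with eps drawn from N(0, 1)
    z = [m + math.exp(0.5 * lv) * rng.gauss(0.0, 1.0) for m, lv in zip(mu, logvar)]
    return z, mu, logvar

def kl_to_standard_normal(mu, logvar):
    """KL(N(mu, sigma^2) || N(0, 1)), summed over latent dimensions."""
    return 0.5 * sum(m * m + math.exp(lv) - lv - 1.0 for m, lv in zip(mu, logvar))

rng = random.Random(0)
obs_dim, latent_dim = 8, 2  # assumed sizes, far smaller than real pixel inputs
obs_t = [rng.random() for _ in range(obs_dim)]  # stand-in for flattened pixels
w_mu = [[rng.uniform(-0.5, 0.5) for _ in range(obs_dim)] for _ in range(latent_dim)]
w_logvar = [[rng.uniform(-0.5, 0.5) for _ in range(obs_dim)] for _ in range(latent_dim)]

z_t, mu, logvar = encode(obs_t, w_mu, w_logvar, rng)
kl = kl_to_standard_normal(mu, logvar)  # regularizer used during VAE training
```

In a trained VAE this KL term is combined with a reconstruction loss; the sketch stops at the encoder because that is the part the controller and memory model consume.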

M Model (Memory)

An RNN or Transformer that predicts the next latent state from the current state and action. This is the "World Model" itself: it captures how the environment evolves and how actions change it.

h_{t+1} = M(h_t, z_t, a_t)
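
The recurrence can be sketched as a single update function. Ha & Schmidhuber use an LSTM with a mixture-density output; the version below swaps in a plain tanh RNN cell to keep the example short, and the weight matrices and dimensions are made-up values, not trained parameters.

```python
import math

def m_step(h, z, a, W_h, W_z, W_a):
    """One recurrent update: h_{t+1} = tanh(W_h h_t + W_z z_t + W_a a_t)."""
    def matvec(W, v):
        return [sum(w * x for w, x in zip(row, v)) for row in W]
    pre = [p + q + r for p, q, r in zip(matvec(W_h, h), matvec(W_z, z), matvec(W_a, a))]
    return [math.tanh(x) for x in pre]

# Assumed sizes: 2-d hidden state, 2-d latent, 1-d action
W_h = [[0.5, 0.0], [0.0, 0.5]]
W_z = [[1.0, 0.0], [0.0, 1.0]]
W_a = [[0.1], [-0.1]]

h = [0.0, 0.0]
latents = [[0.2, -0.1], [0.3, 0.0], [0.1, 0.4]]
actions = [[1.0], [0.0], [-1.0]]
for z_t, a_t in zip(latents, actions):
    h = m_step(h, z_t, a_t, W_h, W_z, W_a)  # h now summarizes the history so far
```

Because the hidden state is carried across steps, h ends up encoding the whole sequence of latents and actions, which is what lets the controller react to more than the current frame.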

C Model (Controller)

The Agent. It uses the representations from V and M to select actions that maximize expected future reward, often trained purely in simulation.

a_t = C(z_t, h_t)
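
A minimal sketch of a_t = C(z_t, h_t) follows. In the original paper the controller is a single linear layer on the concatenation of z_t and h_t, optimized with an evolution strategy; the example mirrors that shape but adds a tanh squashing step and uses hand-picked weights, both of which are assumptions for illustration.

```python
import math

def controller(z, h, W_c, b_c):
    """Linear policy on the concatenated features [z_t, h_t], squashed to [-1, 1]."""
    features = z + h  # list concatenation
    return [math.tanh(sum(w * x for w, x in zip(row, features)) + b)
            for row, b in zip(W_c, b_c)]

z_t = [0.2, -0.1]        # latent vector from V
h_t = [0.5, 0.0, -0.3]   # hidden state from M
# Hypothetical weights: one row per action dimension
W_c = [[1.0, 0.0, 0.5, 0.0, 0.0],
       [0.0, 1.0, 0.0, 0.0, -1.0]]
b_c = [0.0, 0.1]

a_t = controller(z_t, h_t, W_c, b_c)
```

Keeping the controller this small is a deliberate design choice in the V-M-C setup: most of the capacity lives in V and M, so the policy has few parameters to search over.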

03. Evolution & Research

Key milestones in the development of World Models.

2018

World Models

Ha & Schmidhuber

The foundational paper demonstrating that agents can learn inside their own dreams (latent space) to solve tasks like CarRacing and Doom.

2020

Dreamer V1/V2/V3

Danijar Hafner et al.

Scalable reinforcement learning using world models. DreamerV3 is the first algorithm to collect diamonds in Minecraft from scratch, without human data or curricula.

2022

I-JEPA

Yann LeCun (Meta)

Joint Embedding Predictive Architecture. Moves away from pixel reconstruction: the model predicts the representations of masked regions in an abstract embedding space rather than their raw pixels.

2024

Sora / Gen-3

OpenAI / Runway

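Large-scale video generation models positioned as general-purpose world simulators. Their outputs show emergent handling of physics, object permanence, and causality, although how faithfully they model these remains an open question.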
