Separating Knowledge and Perception with Procedural Data

MIT CSAIL
ICML 2025
TL;DR: Train embeddings on procedural data, then apply them to real images through a visual memory, separating perception from knowledge.

Abstract

We train representation models with procedural data only, and apply them to visual similarity, classification, and semantic segmentation tasks without further training by using a visual memory: an explicit database of reference image embeddings. Unlike prior work on visual memory, our approach achieves full separation of perception and knowledge with respect to all real-world images while retaining strong performance. In standard approaches, perceptual skills and knowledge are entangled: there is no boundary between what the model is able to do (e.g. classification, segmentation) and what it knows (e.g. people, dogs). Even in visual-memory approaches, the embedding model is trained on real data, leaking knowledge into the representation. Our method addresses these limitations with efficient training (no real data), easy editability (data addition and removal through database modification), and explainability (decisions traced back to specific examples).


In standard approaches, perceptual skills and knowledge are entangled

  • There is no boundary between what the model is able to do (e.g. classification, segmentation) and what it knows (e.g. people, dogs).
  • Even in visual memory approaches, the embedding model is trained on real data, leaking knowledge into the representation.

Approach: Procedural Training + Visual Memory

Train representation models with procedural data only, and apply them without further training by using a visual memory (a code sketch follows the figure below):
  • Efficient: No training with real data
  • Editable: Easy data addition/removal through database modification
  • Explainable: Can trace decisions back to specific examples
Comparison of classifier vs visual memory perception
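
To make the database explicit, here is a minimal sketch of a visual memory in Python, assuming a frozen encoder embed() trained only on procedural data (embed is a hypothetical name; cosine similarity and majority voting are illustrative choices, not necessarily the paper's exact rules):

import numpy as np

class VisualMemory:
    """Explicit database of reference embeddings: all knowledge lives here,
    while the frozen encoder provides only perception."""

    def __init__(self):
        self.embeddings = []  # one (D,) vector per reference image
        self.labels = []      # parallel list of class labels

    def add(self, embedding, label):
        self.embeddings.append(np.asarray(embedding, dtype=np.float32))
        self.labels.append(label)

    def remove(self, label):
        # Unlearning a concept is a pure database edit; no retraining.
        keep = [i for i, l in enumerate(self.labels) if l != label]
        self.embeddings = [self.embeddings[i] for i in keep]
        self.labels = [self.labels[i] for i in keep]

    def classify(self, query, k=5):
        # Cosine-similarity k-nearest-neighbour vote over the database.
        db = np.stack(self.embeddings)
        db /= np.linalg.norm(db, axis=1, keepdims=True)
        q = query / np.linalg.norm(query)
        topk = np.argsort(-(db @ q))[:k]
        votes = [self.labels[i] for i in topk]
        # topk indexes the supporting examples, making the decision traceable.
        return max(set(votes), key=votes.count), topk

The same k-nearest-neighbour lookup drives the classification experiments below; adding or removing a class never touches the encoder weights.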

Key Properties of Procedural Data

Fundamental visual features without privacy risk
  • Procedural data = simple OpenGL programs
  • Zero privacy risk: no personal information or real objects
  • Models trained on it learn fundamental visual features that generalize to real images (a toy generator is sketched below)
Examples of procedural data
Examples of procedural data from prior work, our new Shaders KML dataset, and the real datasets Places and ImageNet.
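
The datasets themselves are rendered with OpenGL fragment shaders; as a toy analogue of the idea (not the paper's generator), a procedural image can be a random composition of smooth functions of pixel coordinates:

import numpy as np

def procedural_image(size=256, rng=None):
    """Toy stand-in for a fragment shader: sums of random plane waves,
    one set of parameters per colour channel. No real-world content is
    involved, so there is nothing private to leak."""
    rng = rng if rng is not None else np.random.default_rng()
    y, x = np.mgrid[0:size, 0:size] / size
    img = np.zeros((size, size, 3))
    for c in range(3):
        for _ in range(int(rng.integers(2, 6))):
            fx, fy = rng.uniform(-20.0, 20.0, size=2)  # spatial frequencies
            phase = rng.uniform(0.0, 2.0 * np.pi)
            img[..., c] += np.sin(2.0 * np.pi * (fx * x + fy * y) + phase)
    img -= img.min()
    return img / img.max()  # normalized to [0, 1]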

Novel Procedural Dataset: Shaders KML

The Shaders KML process
The Shaders KML process: (1) Sample shaders, (2) Extract mask via K-means in RGB space, (3) Mix shaders using mask.
Innovation: Data-driven mixing masks
  • Prior work: constant mixing masks (standard Mixup)
  • Our approach: extract masks from shader patterns using K-means
  • Result: greater diversity and stronger representations (see the sketch after this list)
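
A minimal sketch of the recipe, assuming pre-rendered shader images as (H, W, 3) float arrays in [0, 1]; the number of clusters, which cluster becomes foreground, and the choice of mask source are illustrative assumptions:

import numpy as np
from sklearn.cluster import KMeans

def kmeans_mask(img, k=2, seed=0):
    """Step 2: cluster the pixels of a shader image in RGB space and
    keep one cluster as a binary mixing mask."""
    h, w, _ = img.shape
    labels = KMeans(n_clusters=k, n_init=4, random_state=seed) \
        .fit_predict(img.reshape(-1, 3))
    return (labels.reshape(h, w) == 0).astype(np.float32)

def mix_shaders(shader_a, shader_b, mask_source):
    """Steps 1 + 3: composite two sampled shader images through a
    data-driven mask, instead of a constant Mixup weight."""
    m = kmeans_mask(mask_source)[..., None]  # (H, W, 1) broadcasts over RGB
    return m * shader_a + (1.0 - m) * shader_b

Because the masks inherit the spatial structure of the shaders themselves, each mix introduces new region shapes rather than a uniform blend.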

Human Visual Similarity Results

Key finding: the best procedural datasets match the performance of training on the real dataset Places!
Performance on NIGHTS benchmark
Examples and performance on NIGHTS benchmark. Our procedural models match the performance of models trained on real data.
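
NIGHTS is a two-alternative forced-choice (2AFC) benchmark: given a reference image and two variations, the model should pick the one humans judge more similar. With a frozen embedding this reduces to a distance comparison; a sketch, reusing the hypothetical embed() encoder from above:

import numpy as np

def twoafc_choice(embed, ref, img_a, img_b):
    """Pick which of two candidate images is perceptually closer to the
    reference, by cosine similarity in the frozen embedding space."""
    unit = lambda v: v / np.linalg.norm(v)
    e_ref, e_a, e_b = (unit(embed(im)) for im in (ref, img_a, img_b))
    return "A" if e_ref @ e_a > e_ref @ e_b else "B"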

Nearest Neighbours

Qualitative results: Procedural models select good neighbours even within ImageNet, a large-scale pool on the order of 1M images.
Nearest neighbours visualization
Nearest neighbours of ImageNet query images according to the strongest procedural model.
Quantitative results: Procedural models beat out-of-distribution real-data models on fine-grained classification. On ImageNet-1K, they come within 10% of models trained on real data.
Classification accuracy comparison
Nearest neighbours of a bird query image from CUB (left) and KNN classification accuracy (right) on various datasets.

Zero-shot Semantic Segmentation

Remarkable ability: Despite never seeing real images, procedural models can segment bikes and other objects!
Zero-shot segmentation pipeline
Zero-shot segmentation pipeline using procedural models. Features are extracted and projected using PCA to obtain segmentations.
Segmentation results
Zero-shot segmentation using PCA features. In the examples (left), procedural models clearly separate objects from backgrounds. In the quantitative results (right), the R² between PCA features and human-labelled segmentations confirms the qualitative observations.
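
A sketch of the PCA step, assuming a dense (H, W, D) patch-feature map from the frozen encoder; thresholding the first component at its mean is an illustrative rule, not necessarily the paper's exact one:

import numpy as np

def pca_segment(features):
    """Project dense (H, W, D) features onto their first principal
    component and threshold it to split foreground from background."""
    h, w, d = features.shape
    flat = features.reshape(-1, d)
    flat = flat - flat.mean(axis=0)           # centre before PCA
    _, _, vt = np.linalg.svd(flat, full_matrices=False)
    pc1 = (flat @ vt[0]).reshape(h, w)        # first principal component
    return pc1 > pc1.mean()                   # boolean mask (sign is arbitrary)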

In-Context Semantic Segmentation

Remarkable ability: Despite never training on real images, procedural models can segment arbitrary classes from a single exemplar prompt!
In-context segmentation pipeline
In-context segmentation pipeline: given a single exemplar prompt, procedural models can segment arbitrary classes.
In-context segmentation results
In-context segmentation on ADE20K (Zhou et al.). Procedural models can segment arbitrary classes given a single exemplar prompt, demonstrating strong generalization despite never training on real images.
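
One simple realization of this prompting scheme, sketched under the assumption of dense patch features (not necessarily the paper's exact matching rule): label each query patch with the mask value of its nearest exemplar patch in feature space.

import numpy as np

def in_context_segment(query_feats, exemplar_feats, exemplar_mask):
    """Propagate the exemplar's binary mask to the query image via
    nearest-neighbour matching between (H, W, D) patch features."""
    h, w, d = query_feats.shape
    q = query_feats.reshape(-1, d)
    e = exemplar_feats.reshape(-1, d)
    q = q / np.linalg.norm(q, axis=1, keepdims=True)
    e = e / np.linalg.norm(e, axis=1, keepdims=True)
    nn = np.argmax(q @ e.T, axis=1)  # nearest exemplar patch per query patch
    return exemplar_mask.reshape(-1)[nn].reshape(h, w)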

BibTeX

@inproceedings{rodriguezmunoz2025separating,
  title = {Separating Knowledge and Perception with Procedural Data},
  author = {Rodríguez-Muñoz, Adrián and Baradad, Manel and Isola, Phillip and Torralba, Antonio},
  booktitle = {Proceedings of the International Conference on Machine Learning (ICML)},
  year = {2025},
}

Acknowledgements

This work was supported by the La Caixa Foundation and DSTA Singapore.