We train representation models with procedural data only and apply them to visual similarity, classification, and semantic segmentation tasks
without further training by using visual memory—an explicit database of reference image embeddings. Unlike prior work on visual memory, our
approach achieves full separation of perception and knowledge with respect to all real-world images while retaining strong performance.
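To make the inference procedure concrete, the sketch below labels a query by nearest-neighbour lookup over the visual memory, an explicit database of reference embeddings. The cosine-similarity metric, the value of k, and the function names are illustrative assumptions, not the exact configuration used here.

```python
import numpy as np

def classify_with_visual_memory(query_emb, memory_embs, memory_labels, k=5):
    """Label a query embedding by a k-nearest-neighbour vote over the
    explicit database (visual memory) of reference image embeddings."""
    q = query_emb / np.linalg.norm(query_emb)
    m = memory_embs / np.linalg.norm(memory_embs, axis=1, keepdims=True)
    sims = m @ q                      # cosine similarity to every reference
    top = np.argsort(-sims)[:k]       # indices of the k closest references
    votes = memory_labels[top]
    label = np.bincount(votes).argmax()
    return label, top                 # `top` traces the decision to examples
```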
In standard approaches, perceptual skills and knowledge are entangled: there is no boundary between what the model is able to do
(e.g., classification, segmentation) and what it knows (e.g., people, dogs). Even in prior visual memory approaches, the embedding model
is trained on real data, which leaks knowledge into the representation. Our method addresses these limitations while offering efficient training
(no real data), easy editability (data addition/removal through database modification), and explainability (decisions traced to specific examples).
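As an illustration of the editability claim, a minimal sketch of such a database is given below; the class and method names are hypothetical and only show that adding or forgetting knowledge amounts to editing the stored arrays, with no retraining of the embedding model.

```python
import numpy as np

class VisualMemory:
    """Explicit database of reference embeddings and labels; editing what the
    system knows means editing these arrays, not retraining the encoder."""
    def __init__(self, dim):
        self.embs = np.empty((0, dim), dtype=np.float32)
        self.labels = np.empty((0,), dtype=np.int64)

    def add(self, embs, labels):
        # Teach new concepts by appending reference embeddings and labels.
        self.embs = np.concatenate([self.embs, embs.astype(np.float32)])
        self.labels = np.concatenate([self.labels, labels.astype(np.int64)])

    def remove(self, indices):
        # Forget specific examples by deleting their database entries.
        keep = np.setdiff1d(np.arange(len(self.labels)), indices)
        self.embs, self.labels = self.embs[keep], self.labels[keep]
```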