Compartmentalizing Knowledge with Procedural Data

MIT CSAIL

Abstract

We train representation models on procedural data only and apply them to visual similarity, classification, and semantic segmentation tasks without further training by using a visual memory: an explicit database of reference image embeddings. Unlike prior work on visual memory, our approach achieves full compartmentalization with respect to all real-world images while retaining strong performance. Compared to a model trained on Places, our procedural model performs within 1% on NIGHTS visual similarity, outperforms it by 8% and 15% on CUB200 and Flowers102 fine-grained classification, and is within 10% on ImageNet-1K classification. It also demonstrates strong zero-shot segmentation, achieving an \(R^2\) on COCO within 10% of models trained on real data. Finally, we analyze procedural versus real-data models, showing that parts of the same object have dissimilar representations in procedural models, which leads to incorrect searches in memory and explains the remaining performance gap.



Our approach is as follows: first, an embedding model is trained with self-supervised learning (SSL) on procedural data generated by OpenGL code. In this stage, unlearning and attribution are difficult, but procedural data carries much less privacy and bias risk and little real-world semantics. Next, we apply the embedding model to real-world tasks using only a visual memory of reference image embeddings, without any extra training. Real data, unlike procedural data, does carry privacy and bias risks as well as real-world semantics; however, confining all real data to the memory makes efficient unlearning and privacy analysis possible. The overall system retains complete control over all real-world data while approximating the performance of standard training. A minimal sketch of this two-stage design is given below.
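
The sketch below is written against assumed interfaces rather than the released code: `encoder` stands in for the SSL model trained only on procedural images (any frozen feature extractor over image batches would do), and the memory is simply a matrix of unit-normalized reference embeddings with their labels.

    import torch
    import torch.nn.functional as F

    class VisualMemory:
        """Explicit database of reference image embeddings: the only place real data lives."""

        def __init__(self, encoder):
            self.encoder = encoder.eval()   # frozen; never updated on real images
            self.feats = None               # (N, D) unit-norm reference embeddings
            self.labels = None              # (N,) integer labels

        @torch.no_grad()
        def embed(self, images):
            return F.normalize(self.encoder(images), dim=-1)

        @torch.no_grad()
        def add(self, images, labels):
            feats = self.embed(images)
            self.feats = feats if self.feats is None else torch.cat([self.feats, feats])
            self.labels = labels if self.labels is None else torch.cat([self.labels, labels])

Classification and retrieval then reduce to nearest-neighbor lookups in this memory, as sketched in the KNN section below.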


Limitations of prior visual memory approaches

Prior visual memory approaches can efficiently unlearn data held in the memory, but not data used to train the embedding model; training on procedural data makes the latter need much less likely to arise.
Unlike real-world data, procedural data is non-realistic and generated by simple code, and is therefore far less exposed to the privacy and bias risks that motivate unlearning. As a result, the only data that may ever need to be unlearned lives in the memory, where removal is immediate (see the sketch below).
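
As a minimal illustration, assuming the memory stores an embedding matrix alongside parallel label and source-id arrays (hypothetical names), unlearning a sample or an entire identity reduces to dropping its rows; the encoder itself needs no retraining because it never saw real data.

    import torch

    def unlearn(mem_feats, mem_labels, mem_ids, ids_to_forget):
        """Drop every memory entry whose source id is in ids_to_forget (e.g. one user's images)."""
        keep = ~torch.isin(mem_ids, torch.as_tensor(ids_to_forget))
        # The embedding model is untouched: it was trained on procedural data only,
        # so removing these rows deletes the samples' entire influence on the system.
        return mem_feats[keep], mem_labels[keep], mem_ids[keep]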

KNN classification with procedural embeddings and visual memory

Despite never being trained on real images, not even through linear probes, procedural models are effective KNN classifiers.

Performance: On fine-grained datasets, procedural models outperform models trained on real data whose classes have no semantic overlap with the target dataset. When there is semantic overlap, procedural models remain within 10%.
Visualization: Procedural models effectively retrieve perceptually similar images across a wide variety of datasets, despite never seeing real-world data during training. A sketch of the KNN classifier and retrieval step follows below.
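
A minimal sketch of such a KNN classifier, assuming the memory from the sketch above (mem_feats, mem_labels) and a similarity-weighted vote; the exact k and weighting used in the experiments are not reproduced here.

    import torch
    import torch.nn.functional as F

    @torch.no_grad()
    def knn_classify(query_images, encoder, mem_feats, mem_labels, k=20, num_classes=1000):
        """Nearest-neighbor classification against the visual memory; no training on real data."""
        q = F.normalize(encoder(query_images), dim=-1)           # (B, D) query embeddings
        sims = q @ mem_feats.T                                   # cosine similarities (B, N)
        topk_sims, topk_idx = sims.topk(k, dim=-1)               # k nearest memory entries
        votes = torch.zeros(q.shape[0], num_classes, device=q.device)
        votes.scatter_add_(1, mem_labels[topk_idx], topk_sims)   # similarity-weighted class vote
        return votes.argmax(dim=-1), topk_idx                    # predictions + neighbors to display

Returning topk_idx also gives the retrieved reference images used for the similarity visualizations.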

Semantic segmentation with procedural embeddings and visual memory

Additionally, procedural models have remarkable semantic segmentation ability.

Performance: measured as the \(R^2\) between PCA-projected features and human-labeled segmentations. The best procedural model is within 10% of real-data models and far above random and raw-RGB features.
Visualization: Procedurally trained models are effective at zero-shot and in-context segmentation, but struggle at KNN segmentation because they never see real-world objects during training. A sketch of the \(R^2\) evaluation follows below.
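
The sketch below shows one way such an \(R^2\) can be computed: PCA-project the dense patch features of an image and regress the one-hot human label map onto them; the number of components and the feature layer are assumptions, not the paper's exact protocol.

    import numpy as np
    from sklearn.decomposition import PCA
    from sklearn.linear_model import LinearRegression
    from sklearn.metrics import r2_score

    def segmentation_r2(patch_feats, label_map, n_components=3):
        """R^2 between PCA-projected patch features and a human-labeled segmentation.

        patch_feats: (H, W, D) dense features from the frozen encoder, upsampled to image size.
        label_map:   (H, W) integer class ids for the same image.
        """
        H, W, D = patch_feats.shape
        X = PCA(n_components=n_components).fit_transform(patch_feats.reshape(-1, D))
        Y = np.eye(int(label_map.max()) + 1)[label_map.reshape(-1)]  # one-hot targets (H*W, C)
        Y_hat = LinearRegression().fit(X, Y).predict(X)
        return r2_score(Y, Y_hat)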

Handling of sensitive data

Sensitive data is information that must, legally or ethically, be handled with high standards of care and control, such as facial identity or medical records. In this scenario, directly training on the data is often not acceptable; procedural models with a visual memory therefore offer an elegant solution.

Despite never training on faces, a model trained on the Shaders KML Mixup procedural dataset can match faces by appearance and facial expression on CelebA.
On the MedMNIST datasets, KNN classification with procedural models matches or exceeds the best result from the original paper (Yang et al., 2022), a conventionally trained ResNet, on 7 of 10 datasets.

Related Work

[1] Baradad, M., Wulff, J., Wang, T., Isola, P., and Torralba, A. Learning to See by Looking at Noise
[2] Baradad, M., Chen, C.-F., Wulff, J., Wang, T., Feris, R., Torralba, A., and Isola, P. Procedural Image Programs for Representation Learning
[3] Geirhos, R., Jaini, P., Stone, A., Medapati, S., Yi, X., Toderici, G., Ogale, A., and Shlens, J. Towards flexible perception with visual memory
[4] Caron, M., Touvron, H., Misra, I., Jegou, H., Mairal, J., Bojanowski, P., and Joulin, A. Emerging Properties in Self-Supervised Vision Transformers
[5] Wah, C., Branson, S., Welinder, P., Perona, P., and Belongie, S. The Caltech-UCSD Birds-200-2011 Dataset. Technical Report CNS-TR-2011-001
[6] Nilsback, M.-E. and Zisserman, A. Automated Flower Classification over a Large Number of Classes
[7] Bossard, L., Guillaumin, M., and Van Gool, L. Food-101 – Mining Discriminative Components with Random Forests
[8] Russakovsky, O., Deng, J., Su, H., Krause, J., Satheesh, S., Ma, S., Huang, Z., Karpathy, A., Khosla, A., Bernstein, M., Berg, A. C., and Fei-Fei, L. ImageNet Large Scale Visual Recognition Challenge
[9] Lin, T.-Y., Maire, M., Belongie, S., Bourdev, L., Girshick, R., Hays, J., Perona, P., Ramanan, D., Zitnick, C. L., and Dollar, P. Microsoft COCO: Common Objects in Context
[10] Liu, Z., Luo, P., Wang, X., and Tang, X. Deep Learning Face Attributes in the Wild

BibTeX


    @misc{rodriguezmunoz2025compartmentalizing,
      title  = {Compartmentalizing Knowledge with Procedural Data},
      author = {Rodríguez-Muñoz, Adrián and Baradad, Manel and Isola, Phillip and Torralba, Antonio},
      year   = {2025},
    }

Acknowledgements

Adrián Rodríguez-Muñoz is supported by the LaCaixa fellowship. The Torralba and Isola labs are supported by DSTA Singapore and the ONR MURI program.