Compartmentalizing Knowledge with Procedural Data

MIT CSAIL

Abstract

We train representation models on procedural data only and apply them to visual similarity, classification, and semantic segmentation tasks without further training, using a visual memory: an explicit database of reference image embeddings. Unlike prior work on visual memory, our approach achieves full compartmentalization with respect to all real-world images while retaining strong performance. Compared to a model trained on Places, our procedural model performs within 1% on NIGHTS visual similarity, outperforms it by 8% and 15% on CUB200 and Flowers102 fine-grained classification, and is within 10% on ImageNet-1K classification. It also demonstrates strong zero-shot segmentation, achieving an \(R^2\) on COCO within 10% of models trained on real data. Finally, we analyze procedural versus real-data models, showing that parts of the same object have dissimilar representations in procedural models, which leads to incorrect searches in memory and explains the remaining performance gap.



Our approach is as follows: first, an embedding model is trained with self-supervised learning (SSL) on procedural data generated by OpenGL code. In this stage, unlearning and attribution are difficult, but procedural data is far less exposed to privacy/bias risks and real-world semantics. Next, we apply the embedding model to real-world tasks using only a visual memory of reference image embeddings, with no extra training. Real data, unlike procedural data, does carry privacy/bias risk and real-world semantics; however, isolating all real data to the memory alone makes efficient data unlearning and privacy analysis possible. The overall system retains full control over all real-world data while approximating the performance of standard training.
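
The inference stage reduces to nearest-neighbor lookups against the memory. Below is a minimal sketch of that stage in PyTorch, assuming an already-trained embedding network and tensor inputs; the names (embed_model, reference_images, reference_labels) are placeholders for illustration, not the released interface.

    import torch

    @torch.no_grad()
    def build_memory(embed_model, reference_images, reference_labels):
        """Embed reference images into an explicit, editable visual memory."""
        feats = embed_model(reference_images)                     # (N, D)
        feats = torch.nn.functional.normalize(feats, dim=-1)      # cosine geometry
        return {"embeddings": feats, "labels": reference_labels}

    @torch.no_grad()
    def query_memory(embed_model, memory, query_images, k=5):
        """Return the labels of the k nearest references for each query image."""
        q = torch.nn.functional.normalize(embed_model(query_images), dim=-1)
        sims = q @ memory["embeddings"].T                         # (Q, N) cosine similarities
        topk = sims.topk(k, dim=-1).indices                       # (Q, k)
        return memory["labels"][topk]

    def unlearn(memory, indices_to_forget):
        """Exact unlearning: drop the forgotten rows from the memory."""
        keep = torch.ones(len(memory["labels"]), dtype=torch.bool)
        keep[indices_to_forget] = False
        return {"embeddings": memory["embeddings"][keep],
                "labels": memory["labels"][keep]}

Because every real image enters the system only through the memory, deleting its row is an exact and immediate unlearning operation; no retraining or gradient tracing is required.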


Limitations of prior visual memory approaches

Prior visual memory approaches can efficiently unlearn data stored in the memory, but not data used to train the embedding model; our proposal to train the model on procedural data makes the latter case far less likely to arise.
Unlike real-world data, procedural data is non-realistic and generated by simple code, and is therefore much less exposed to the privacy and bias risks that motivate unlearning.

Procedurally trained embeddings are effective KNN classifiers of real images

Performance: On fine-grained datasets, procedural data outperforms real data that has no semantic overlap with the task; when the real data does overlap semantically, procedural models come within 10% (sketched below).
Visualization: Procedural models effectively retrieve perceptually similar images across a wide variety of datasets, despite never seeing real-world data during training.
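
A hedged sketch of such a KNN classifier, following the (hypothetical) memory layout from the earlier sketch; the similarity-weighted vote and temperature are common choices for embedding-based KNN, not necessarily the exact protocol used here.

    import torch

    @torch.no_grad()
    def knn_classify(embed_model, memory, query_images, k=20, temperature=0.07):
        """Classify real query images by voting over their k nearest reference embeddings."""
        q = torch.nn.functional.normalize(embed_model(query_images), dim=-1)
        sims = q @ memory["embeddings"].T                        # (Q, N)
        topk_sims, topk_idx = sims.topk(k, dim=-1)
        topk_labels = memory["labels"][topk_idx]                 # (Q, k) class ids of neighbors
        num_classes = int(memory["labels"].max()) + 1
        # Similarity-weighted (soft) vote; a plain majority vote also works.
        weights = (topk_sims / temperature).softmax(dim=-1)
        votes = torch.zeros(q.shape[0], num_classes)
        votes.scatter_add_(1, topk_labels, weights)
        return votes.argmax(dim=-1)                              # (Q,) predicted class per query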

Procedural models are also capable of semantic segmentation

Performance: \(R^2\) between PCA-projected features and human-labeled segmentations (see the sketch below). The best procedural model is within 10% of real-data models and well above random and raw-RGB features.
Visualization: Procedurally trained models are effective at zero-shot and in-context segmentation, but struggle at KNN segmentation because they never see real-world objects during training.
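
A rough sketch of how such an \(R^2\) can be computed: project dense patch features onto a few PCA components, fit a least-squares linear map to one-hot human segmentation labels, and report the \(R^2\) of the fit. The function names and the exact protocol (number of components, per-image versus pooled fitting) are assumptions for illustration.

    import torch

    def pca_project(features, n_components=3):
        """features: (num_patches, D) dense features from the embedding model."""
        centered = features - features.mean(dim=0, keepdim=True)
        _, _, V = torch.pca_lowrank(centered, q=n_components)    # principal directions in V
        return centered @ V[:, :n_components]                    # (num_patches, n_components)

    def segmentation_r2(features, labels, n_components=3):
        """labels: (num_patches,) integer human segmentation label per patch."""
        X = pca_project(features, n_components)
        X = torch.cat([X, torch.ones(X.shape[0], 1)], dim=1)     # add a bias column
        Y = torch.nn.functional.one_hot(labels).float()          # (num_patches, num_classes)
        W = torch.linalg.lstsq(X, Y).solution                    # least-squares linear probe
        pred = X @ W
        ss_res = ((Y - pred) ** 2).sum()
        ss_tot = ((Y - Y.mean(dim=0, keepdim=True)) ** 2).sum()
        return 1.0 - ss_res / ss_tot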

Handling of sensitive data

Sensitive data is information that legally or ethically must be handled with high standards of care and control, such that training on it directly is often unacceptable. Despite never training on faces, a model trained on the Shaders KML Mixup procedural dataset can retrieve matches by appearance and facial expression on CelebA.
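
The face-matching demo amounts to the same nearest-neighbor machinery applied to a memory of CelebA embeddings; a toy sketch, with embed_model and face_memory as assumed names:

    import torch

    @torch.no_grad()
    def match_faces(embed_model, face_memory, query_faces, k=4):
        """face_memory: (N, D) CelebA reference embeddings; returns (Q, k) match indices."""
        q = torch.nn.functional.normalize(embed_model(query_faces), dim=-1)
        refs = torch.nn.functional.normalize(face_memory, dim=-1)
        return (q @ refs.T).topk(k, dim=-1).indices

Because the face images themselves are never trained on, removing a person's rows from the memory removes them from every retrieval result without touching the model.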


BibTeX


    @inproceedings{rodriguezmunoz2024characterizing,
      title={Characterizing model robustness via natural input gradients},
      author={Adrián Rodríguez-Muñoz and Tongzhou Wang and Antonio Torralba},
      booktitle={Proceedings of the European Conference on Computer Vision (ECCV)},
      year={2024},
      url={}
    }
  

Acknowledgements

Adrián Rodríguez-Muñoz is supported by the LaCaixa fellowship. Tongzhou Wang is supported by the ONR MURI program.