Visual DNA:
Representing and Comparing Images using Distributions of Neuron Activations

Oxford Robotics Institute
University of Oxford
CVPR 2023

Overview

What are Visual DNAs useful for?

Illustration of dataset comparisons

Example usage comparing different datasets.

Illustration of image to dataset and image pairs comparisons

Example usage comparing an image to a dataset or pairs of images.

Visual DNAs allow you to generate compact, granular representations of images and datasets. Thanks to this granularity, Visual DNAs can be compared conditionally, focusing only on the attributes of interest.

What is a Visual DNA?

Visualisation of Visual DNA composition


Visual DNAs consist of the Distributions of Neuron Activations of a frozen, pre-trained feature extractor when it is fed the images to represent. Activations from different images are accumulated per neuron, either in histograms or by fitting a Gaussian.
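To make the construction concrete, below is a minimal sketch in PyTorch. The ResNet-18 extractor, the hooked layer, the histogram range, and the image paths are all illustrative assumptions, not the exact choices used in the paper; the released library handles multiple extractors and layers for you.

import torch
import torchvision.models as models
import torchvision.transforms as T
from PIL import Image

# Frozen pre-trained extractor. ResNet-18 and the hooked layer are
# illustrative stand-ins; the paper's extractors and layers may differ.
extractor = models.resnet18(weights=models.ResNet18_Weights.DEFAULT).eval()
for p in extractor.parameters():
    p.requires_grad_(False)

# Record the activations of one layer with a forward hook.
activations = []
extractor.layer3.register_forward_hook(
    lambda module, inputs, output: activations.append(output)
)

preprocess = T.Compose([
    T.Resize((224, 224)),
    T.ToTensor(),
    T.Normalize(mean=[0.485, 0.456, 0.406], std=[0.229, 0.224, 0.225]),
])

# Accumulate one histogram per neuron (channel) over a set of images.
n_bins, lo, hi = 64, 0.0, 8.0            # bin count and range are assumptions
hist = None
for path in ["img0.png", "img1.png"]:    # placeholder image paths
    activations.clear()
    x = preprocess(Image.open(path).convert("RGB")).unsqueeze(0)
    with torch.no_grad():
        extractor(x)
    feats = activations[0][0]            # (C, H, W) activations for this image
    if hist is None:
        hist = torch.zeros(feats.shape[0], n_bins)
    for c in range(feats.shape[0]):      # one distribution per neuron
        hist[c] += torch.histc(feats[c], bins=n_bins, min=lo, max=hi)

dna = hist / hist.sum(dim=1, keepdim=True)  # normalise each neuron's histogram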

Comparing Visual DNAs comes down to comparing distributions (histograms or Gaussians) for each neuron. Since we don't expect every neuron of the pre-trained network to be sensitive to the particular properties we want to compare, this granularity allows us to build custom comparisons that rely only on neurons of interest.
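For histogram DNAs, such a comparison reduces to a per-neuron distance, for example an Earth Mover's Distance, combined over a chosen weighting of neurons. A minimal sketch, assuming the (C, n_bins) normalised histograms built in the sketch above:

import torch

def neuron_emd(h1, h2):
    # 1-D Earth Mover's Distance between normalised histograms,
    # computed as the L1 distance between their cumulative distributions.
    return (torch.cumsum(h1, -1) - torch.cumsum(h2, -1)).abs().sum(-1)

def compare_dnas(dna_a, dna_b, weights=None):
    # dna_a, dna_b: (C, n_bins) per-neuron histograms.
    # weights: optional (C,) vector selecting/weighting neurons of interest.
    per_neuron = neuron_emd(dna_a, dna_b)          # (C,) distance per neuron
    if weights is None:
        weights = torch.ones_like(per_neuron) / per_neuron.numel()
    return (weights * per_neuron).sum()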

How can I use it?

pip install vdna

Documentation available here.
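For illustration, a typical workflow with the package might look like the sketch below. The names shown reflect our reading of the package and should be verified against the documentation; the comparison call is deliberately left as a comment.

# Sketch of the intended workflow with the vdna package. The names used
# here (VDNAProcessor, make_vdna) are assumptions to be checked against
# the documentation linked above.
from vdna import VDNAProcessor

proc = VDNAProcessor()

# Build DNAs from folders of images (paths are placeholders).
dna_a = proc.make_vdna(source="path/to/dataset_a")
dna_b = proc.make_vdna(source="path/to/dataset_b")

# Comparison utilities (e.g. per-neuron EMD between the two DNAs) are
# provided by the package; see the documentation for the exact calls.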

Video of example usage of the library

Abstract

Selecting appropriate datasets is critical in modern computer vision. However, no general-purpose tools exist to evaluate the extent to which two datasets differ.

For this, we propose representing images – and by extension datasets – using Distributions of Neuron Activations (DNAs). DNAs fit distributions, such as histograms or Gaussians, to activations of neurons in a pre-trained feature extractor through which we pass the image(s) to represent. This extractor is frozen for all datasets, and we rely on its generally expressive power in feature space. By comparing two DNAs, we can evaluate the extent to which two datasets differ with granular control over the comparison attributes of interest, providing the ability to customise the way distances are measured to suit the requirements of the task at hand. Furthermore, DNAs are compact, representing datasets of any size with less than 15 megabytes.

We demonstrate the value of DNAs by evaluating their applicability on several tasks, including conditional dataset comparison, synthetic image evaluation, and transfer learning, and across diverse datasets, ranging from synthetic cat images to celebrity faces and urban driving scenes.

Results

Number of images required to represent a dataset

Plot of dataset comparison values using FID against the number of samples

Comparing datasets with FID requires thousands of samples to stabilise.

Plot of dataset comparison values using DNAs against the number of samples

Comparing datasets with DNAs can provide a stable result with a few hundred samples.

Finding most similar images to a reference dataset

Illustration of dataset comparisons

We can generate DNAs of datasets and of individual images. In this experiment, we generate one DNA from multiple Cityscapes images and one DNA per image from other datasets. We then visualise how individual images rank when their DNAs are compared against the Cityscapes DNA.
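Ranking, in this setting, is simply sorting single-image DNAs by their distance to the dataset DNA. A minimal sketch, reusing the per-neuron EMD from above; the names and uniform neuron averaging are illustrative assumptions:

import torch

def neuron_emd(h1, h2):
    # per-neuron 1-D EMD between normalised histograms (L1 of CDFs)
    return (torch.cumsum(h1, -1) - torch.cumsum(h2, -1)).abs().sum(-1)

def rank_images(dataset_dna, image_dnas):
    # dataset_dna: (C, n_bins) DNA built from many reference images.
    # image_dnas: dict mapping image name -> (C, n_bins) single-image DNA.
    scores = {name: neuron_emd(dataset_dna, dna).mean().item()
              for name, dna in image_dnas.items()}
    return sorted(scores, key=scores.get)  # most similar first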

Finding most similar images to a reference image

Illustration of image pair comparison

Comparing DNAs of images with very few neurons can be sufficient to find images with similar semantic attributes.

Removing sensitivity to attributes when comparing DNAs

Illustration of sensitivity removal.

When comparing two DNAs, we can use weighted combinations of neurons to control what the comparison is sensitive to. Here, we show the results of removing sensitivity to specific attributes. For each of the 40 attributes in CelebA, we use a weighted combination of distances over different layers or neurons, with weights optimised such that the resulting distance between DNAs of images with and without the attribute becomes zero, while preventing the sensitivity to the 39 other attributes from deviating. This highlights the power of the granularity of our proposed representation.
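One simple way to realise such an optimisation is sketched below; this is an illustrative objective under our assumptions, not the paper's exact formulation. Non-negative neuron weights are fitted by gradient descent so that the weighted distance for the target attribute vanishes while the weighted distances for the other attributes stay near their uniform-weight values. The per-neuron distance vectors are assumed precomputed (e.g. with the per-neuron EMD above).

import torch

def fit_insensitive_weights(d_target, d_others, steps=2000, lr=1e-2):
    # d_target: (C,) per-neuron distances between DNAs of images with and
    # without the attribute to ignore; d_others: (K, C) the same for the
    # K other attributes. Both are assumed precomputed.
    C = d_target.shape[0]
    baseline = d_others.mean(dim=1)            # distances under uniform weights
    w = torch.full((C,), 1.0 / C, requires_grad=True)
    opt = torch.optim.Adam([w], lr=lr)
    for _ in range(steps):
        opt.zero_grad()
        wp = torch.relu(w)                     # keep weights non-negative
        loss = (wp @ d_target) ** 2            # drive target sensitivity to zero
        loss = loss + ((d_others @ wp - baseline) ** 2).sum()  # preserve others
        loss.backward()
        opt.step()
    return torch.relu(w).detach()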

Selecting neurons for realism

Illustration of fake image ranking

When comparing DNAs of synthetic StyleGANv2 images to DNAs of the corresponding real datasets, we find that evaluations improve when we select the neurons that are most sensitive to differences between real and synthetic images.
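A simple selection criterion, assuming precomputed histogram DNAs and not necessarily the paper's exact procedure, is to keep the k neurons whose activation distributions differ most between real and synthetic data:

import torch

def select_sensitive_neurons(real_dna, fake_dna, k=100):
    # real_dna, fake_dna: (C, n_bins) histogram DNAs of real and
    # synthetic images; per-neuron EMD as before.
    d = (torch.cumsum(real_dna, -1) - torch.cumsum(fake_dna, -1)).abs().sum(-1)
    return torch.topk(d, k).indices            # the k most sensitive neurons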

BibTeX

@inproceedings{ramtoula2023vdna,
  author    = {Ramtoula, Benjamin and Gadd, Matthew and Newman, Paul and De Martini, Daniele},
  title     = {Visual DNA: Representing and Comparing Images using Distributions of Neuron Activations},
  booktitle = {Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR)},
  year      = {2023},
}