Selecting appropriate datasets is critical in modern computer vision. However, no general-purpose tools exist to evaluate the extent to which two datasets differ.
For this, we propose representing images – and by extension datasets – using Distributions of Neuron Activations (DNAs). A DNA fits distributions, such as histograms or Gaussians, to the activations of neurons in a pre-trained feature extractor through which we pass the image(s) being represented. This extractor is frozen for all datasets, and we rely on the general expressive power of its feature space. By comparing two DNAs, we can evaluate the extent to which two datasets differ, with granular control over the attributes of interest, allowing the way distances are measured to be customised to the requirements of the task at hand. Furthermore, DNAs are compact, representing datasets of any size in less than 15 megabytes.
We demonstrate the value of DNAs by evaluating their applicability to several tasks, including conditional dataset comparison, synthetic image evaluation, and transfer learning, and across diverse datasets, ranging from synthetic cat images to celebrity faces and urban driving scenes.
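As a rough illustration of the idea (not the released vDNA implementation), the sketch below fits one histogram per neuron to activations of a frozen backbone and compares two DNAs with a per-neuron earth mover's distance. The choice of ResNet-18, the single hooked layer, the bin count, and the activation range are all illustrative assumptions.

```python
import torch
import torchvision.models as models

# Illustrative sketch only: a frozen backbone shared across all datasets
# (hypothetical choice of ResNet-18 and a single layer; a full DNA would
# cover many layers with its own histogram settings).
backbone = models.resnet18(weights=models.ResNet18_Weights.DEFAULT).eval()
layer = backbone.layer3

def neuron_histograms(images, n_bins=64, value_range=(0.0, 10.0)):
    """Fit one 1-D histogram per neuron (channel) over all spatial positions."""
    acts = []

    def hook(_, __, output):
        c = output.shape[1]
        # Every spatial position contributes one activation per neuron.
        acts.append(output.permute(0, 2, 3, 1).reshape(-1, c))

    handle = layer.register_forward_hook(hook)
    with torch.no_grad():
        backbone(images)
    handle.remove()

    acts = torch.cat(acts)  # (num_samples, num_neurons)
    hists = torch.stack([
        torch.histc(acts[:, n], bins=n_bins,
                    min=value_range[0], max=value_range[1])
        for n in range(acts.shape[1])
    ])  # (num_neurons, n_bins)
    return hists / hists.sum(dim=1, keepdim=True)  # normalise to distributions

def dna_distance(hists_a, hists_b):
    """Average per-neuron earth mover's distance between two DNAs."""
    emd = (hists_a.cumsum(dim=1) - hists_b.cumsum(dim=1)).abs().sum(dim=1)
    return emd.mean()
```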
Comparing datasets with FID requires thousands of samples to stabilise.
Comparing datasets with DNAs can provide a stable result with a few hundred samples.
We can generate DNAs of both datasets and individual images. In this experiment, we generate one DNA from multiple Cityscapes images and a DNA for each individual image from other datasets. We visualise how the individual image DNAs rank when compared against the Cityscapes DNA.
Comparing image DNAs over very few neurons can be sufficient to find images with similar semantic attributes.
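As a rough usage sketch of this ranking, reusing the hypothetical `neuron_histograms` and `dna_distance` helpers from above, with `cityscapes_batch` and `other_images` as placeholder tensors:

```python
# Placeholder inputs: cityscapes_batch is an (N, 3, H, W) tensor, other_images
# is a list of (3, H, W) images from other datasets.
reference_dna = neuron_histograms(cityscapes_batch)                  # one DNA, many images
image_dnas = [neuron_histograms(img.unsqueeze(0)) for img in other_images]

# Rank images from most to least similar to the Cityscapes DNA.
ranking = sorted(range(len(image_dnas)),
                 key=lambda i: dna_distance(reference_dna, image_dnas[i]))
```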
When comparing two DNAs, we can use weighted combinations of neurons to control what the comparison is sensitive to. Here, we show the results of removing sensitivity to specific attributes. For each of the 40 attributes in CelebA, we use a weighted combination of distances over different layers or neurons, with weights optimised so that the resulting distance between DNAs of images with and without the attribute becomes zero, while keeping the sensitivity to the other 39 attributes unchanged. This highlights the granularity of our proposed representation.
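A minimal sketch of this attribute-removal idea, assuming per-neuron DNA distances have already been computed (`d_target` for images with/without the target attribute, `d_others` for the other 39 attributes); the optimisation below is illustrative and not the exact procedure used in the paper:

```python
import torch

def weighted_distance(per_neuron_dists, weights):
    # Softmax keeps the neuron weights positive and summing to one.
    return (torch.softmax(weights, dim=0) * per_neuron_dists).sum()

# d_target: (num_neurons,) per-neuron distance between DNAs with / without the attribute
# d_others: (39, num_neurons) the same distances for each of the other attributes
weights = torch.zeros(d_target.shape[0], requires_grad=True)
baseline = d_others.mean(dim=1)          # uniform-weight distances to preserve
optimiser = torch.optim.Adam([weights], lr=1e-2)

for _ in range(1000):
    optimiser.zero_grad()
    # Drive the target-attribute distance to zero...
    loss_target = weighted_distance(d_target, weights) ** 2
    # ...while keeping the other attribute distances near their uniform values.
    others = torch.stack([weighted_distance(d, weights) for d in d_others])
    loss_others = ((others - baseline) ** 2).mean()
    (loss_target + loss_others).backward()
    optimiser.step()
```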
When comparing DNAs of synthetic StyleGANv2 images to DNAs of the corresponding real datasets, we observe that we can achieve better evaluations by selecting neurons that are more sensitive to differences between real and synthetic images.
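One way to realise such a neuron selection, sketched under the assumption that per-neuron histograms are available for held-out real and synthetic reference sets (`real_ref_dna`, `synth_ref_dna`) as well as for the sets being evaluated (`real_dna`, `candidate_dna`), all shaped `(num_neurons, n_bins)`:

```python
# Score each neuron by how differently it responds to real vs. synthetic images
# (per-neuron earth mover's distance on the reference sets), then evaluate a
# candidate synthetic set using only the most sensitive neurons.
per_neuron_gap = (real_ref_dna.cumsum(dim=1)
                  - synth_ref_dna.cumsum(dim=1)).abs().sum(dim=1)
sensitive = per_neuron_gap.topk(k=100).indices   # k is an illustrative choice

score = dna_distance(real_dna[sensitive], candidate_dna[sensitive])
```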
@inproceedings{ramtoula2023vdna,
  author    = {Ramtoula, Benjamin and Gadd, Matthew and Newman, Paul and De Martini, Daniele},
  title     = {Visual DNA: Representing and Comparing Images using Distributions of Neuron Activations},
  booktitle = {Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR)},
  year      = {2023},
}