The ImageNetV2 dataset contains new test data for the ImageNet benchmark. It consists of three test sets with 10,000 new images each. Importantly, these test sets were sampled after a decade of progress on the original ImageNet dataset. This makes the new test data independent of existing models and guarantees that the accuracy scores are not affected by adaptive overfitting. We designed the data collection process for ImageNetV2 so that the resulting distribution is as similar as possible to the original ImageNet dataset. Our paper "Do ImageNet Classifiers Generalize to ImageNet?" describes ImageNetV2 and the associated experiments in detail.

In addition to the three test sets, we also release the pool of candidate images from which the test sets were assembled. Each image comes with rich metadata, such as the corresponding Flickr search queries and the annotations from MTurk workers.

The same paper also describes CIFAR-10.1, a new test set for CIFAR-10.
Model accuracy on the original test sets vs. our new test sets. Each data point corresponds to one model in our testbed (shown with 95% Clopper-Pearson confidence intervals).
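For readers who want to reproduce intervals like the ones in the plot, the following is a minimal sketch of a Clopper-Pearson (exact binomial) interval for a single accuracy measurement. It is not the code used for the paper: the function name `clopper_pearson`, the bisection approach, and the example counts are assumptions made for illustration, and only the Python standard library is used.

```python
import math


def clopper_pearson(k, n, alpha=0.05, iters=60):
    """Exact (Clopper-Pearson) confidence interval for a binomial proportion.

    k: number of correct predictions, n: number of test images.
    Returns (lower, upper) with nominal coverage 1 - alpha.
    Illustrative helper, not part of any ImageNetV2 tooling.
    """
    if n <= 0 or not 0 <= k <= n:
        raise ValueError("need n > 0 and 0 <= k <= n")

    # log C(n, i) for i = 0..k, computed once and reused by every CDF call.
    log_coef = [
        math.lgamma(n + 1) - math.lgamma(i + 1) - math.lgamma(n - i + 1)
        for i in range(k + 1)
    ]

    def cdf(upto, p):
        """P(X <= upto) for X ~ Binomial(n, p), summed in log space."""
        if upto < 0:
            return 0.0
        if upto >= n or p <= 0.0:
            return 1.0
        if p >= 1.0:
            return 0.0
        log_p, log_q = math.log(p), math.log1p(-p)
        total = sum(
            math.exp(log_coef[i] + i * log_p + (n - i) * log_q)
            for i in range(upto + 1)
        )
        return min(1.0, total)

    def bisect(f, lo, hi):
        """Root of a function that increases from negative to positive."""
        for _ in range(iters):
            mid = (lo + hi) / 2
            if f(mid) < 0:
                lo = mid
            else:
                hi = mid
        return (lo + hi) / 2

    # Lower bound solves P(X >= k | p) = alpha / 2 (increasing in p).
    lower = 0.0 if k == 0 else bisect(
        lambda p: (1.0 - cdf(k - 1, p)) - alpha / 2, 0.0, 1.0)
    # Upper bound solves P(X <= k | p) = alpha / 2; the CDF decreases in p,
    # so the sign is flipped to keep the function increasing.
    upper = 1.0 if k == n else bisect(
        lambda p: alpha / 2 - cdf(k, p), 0.0, 1.0)
    return lower, upper


# Hypothetical model: 7,000 correct out of the 10,000 images in one test set.
low, high = clopper_pearson(7000, 10000)
print(f"accuracy 70.00%, 95% CI [{low:.2%}, {high:.2%}]")
```

Bisection on the binomial tail probabilities avoids a dependency on a beta-quantile routine. If SciPy is available, the same bounds can be obtained from quantiles of the beta distribution instead.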