Hashing


  • SIFT1M[161MB]  GIST1M[2.6GB]

    Source:  Aude Oliva, Antonio Torralba
    Description:

    We provide several evaluation sets for assessing the quality of approximate nearest-neighbor (ANN) search algorithms on different kinds of data and at varying database sizes. In particular, we provide a very large set of 1 billion vectors; to our knowledge this is the largest set available for evaluating ANN methods.

    Each dataset comprises three subsets of vectors:

    (1) base vectors: the vectors in which the search is performed

    (2) query vectors

    (3) learning vectors: used to estimate the parameters of a particular method

    In addition, we provide the groundtruth for each set, in the form of the pre-computed k nearest neighbors and their squared Euclidean distances.

    We use three different file formats:

    (1) The vector files are stored in .bvecs or .fvecs format.

    (2) The groundtruth files are stored in .ivecs format.

    Vector set  dimension  base vectors  query vectors  learn vectors  file format
    SIFT1M         128      1,000,000       10,000         100,000       fvecs
    GIST1M         960      1,000,000        1,000         500,000       fvecs
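    The .fvecs/.ivecs layouts are simple: each vector is stored as a little-endian int32 dimension d followed by d components (float32 for fvecs, int32 for ivecs). The sketch below is a minimal NumPy reader, with a tiny synthetic round-trip demo (not real SIFT1M data); the function names are illustrative, not part of the dataset distribution.

```python
import struct
import tempfile

import numpy as np

def read_fvecs(path):
    """Read an .fvecs file: each vector is an int32 dimension d
    followed by d float32 components; the dimension column is dropped."""
    data = np.fromfile(path, dtype=np.float32)
    d = data[:1].view(np.int32)[0]          # reinterpret first 4 bytes as the dimension
    return data.reshape(-1, d + 1)[:, 1:].copy()

def read_ivecs(path):
    """Read an .ivecs file (groundtruth): int32 dimension, then d int32 ids."""
    raw = np.fromfile(path, dtype=np.int32)
    d = raw[0]
    return raw.reshape(-1, d + 1)[:, 1:].copy()

# Round-trip demo: write a synthetic 2x3 .fvecs file and read it back.
demo = np.arange(6, dtype=np.float32).reshape(2, 3)
tmp = tempfile.NamedTemporaryFile(suffix=".fvecs", delete=False)
with open(tmp.name, "wb") as f:
    for v in demo:
        f.write(struct.pack("<i", v.size))  # dimension header
        f.write(v.tobytes())                # float32 components
loaded = read_fvecs(tmp.name)
```

The same pattern extends to .bvecs (int32 dimension followed by d uint8 bytes), which the billion-scale sets use to keep file sizes manageable.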
  • CIFAR-10[160MB]

    Source:  Alex Krizhevsky, Vinod Nair, and Geoffrey Hinton. 2009
    Description:

    The CIFAR-10 dataset consists of 60000 32x32 colour images in 10 classes, with 6000 images per class. There are 50000 training images and 10000 test images.

    The dataset is divided into five training batches and one test batch, each with 10000 images. The test batch contains exactly 1000 randomly-selected images from each class. The training batches contain the remaining images in random order, but some training batches may contain more images from one class than another. Between them, the training batches contain exactly 5000 images from each class. Here are the classes in the dataset: airplane, automobile, bird, cat, deer, dog, frog, horse, ship, truck.

    The classes are completely mutually exclusive. There is no overlap between automobiles and trucks. "Automobile" includes sedans, SUVs, things of that sort. "Truck" includes only big trucks. Neither includes pickup trucks.
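    The python version of CIFAR-10 ships each batch as a pickled dict with b'data' (10000 x 3072 uint8, the three 32x32 color planes stored row-major) and b'labels'. A minimal loader sketch, demonstrated on a small synthetic stand-in batch rather than a real downloaded file:

```python
import pickle
import tempfile

import numpy as np

def load_cifar_batch(path):
    """Load one CIFAR-10 'python'-format batch: b'data' holds the three
    color planes per image (R, then G, then B); reshape and transpose
    into the conventional (N, 32, 32, 3) image layout."""
    with open(path, "rb") as f:
        batch = pickle.load(f, encoding="bytes")  # batches are Python-2 pickles
    images = batch[b"data"].reshape(-1, 3, 32, 32).transpose(0, 2, 3, 1)
    return images, np.array(batch[b"labels"])

# Synthetic stand-in batch with 4 images (real batches hold 10000 each).
fake = {b"data": np.zeros((4, 3072), dtype=np.uint8),
        b"labels": [3, 8, 8, 0]}
tmp = tempfile.NamedTemporaryFile(suffix=".pkl", delete=False)
with open(tmp.name, "wb") as f:
    pickle.dump(fake, f)
images, labels = load_cifar_batch(tmp.name)
```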

  • NUSWIDE[1.2GB]

    Source:  Lab for Media Search in National University of Singapore. 2009
    Description:

    A real-world web image dataset created by the Lab for Media Search at the National University of Singapore. The dataset includes:

    (1) 269,648 images and the associated tags from Flickr, with a total number of 5,018 unique tags;

    (2) six types of low-level features extracted from these images, including 64-D color histogram, 144-D color correlogram, 73-D edge direction histogram, 128-D wavelet texture, 225-D block-wise color moments and 500-D bag of words based on SIFT descriptors;

    (3) ground-truth for 81 concepts that can be used for evaluation.

    Based on this dataset, we identify several research issues in web image annotation and retrieval, and provide baseline results for web image annotation by learning from the tags with the traditional k-NN algorithm. The benchmark results show that it is possible to learn models from these data that help general image retrieval.
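    The k-NN annotation baseline mentioned above can be sketched in a few lines: score each candidate tag for a query image by the fraction of the query's nearest neighbors (in one of the low-level feature spaces) that carry the tag. This is a generic illustration on toy data; the function name and the tiny feature/tag matrices are invented for the example, not part of NUS-WIDE.

```python
import numpy as np

def knn_tag_scores(base_feats, base_tags, query, k=3):
    """Score each tag for a query as the fraction of the query's k
    nearest neighbours (squared Euclidean distance in feature space)
    that carry the tag -- the classic k-NN annotation baseline."""
    d2 = ((base_feats - query) ** 2).sum(axis=1)  # squared distances to all base images
    nn = np.argsort(d2)[:k]                       # indices of the k closest images
    return base_tags[nn].mean(axis=0)             # per-tag neighbour vote

# Toy example: 6 "images" with 2-D features and 2 candidate tags.
feats = np.array([[0.0, 0.0], [0.1, 0.0], [0.0, 0.1],
                  [5.0, 5.0], [5.1, 5.0], [5.0, 5.1]])
tags = np.array([[1, 0], [1, 0], [1, 0],
                 [0, 1], [0, 1], [0, 1]], dtype=float)
scores = knn_tag_scores(feats, tags, np.array([0.05, 0.05]), k=3)
```

A query near the first cluster gets a score of 1.0 for the first tag and 0.0 for the second, since all three of its neighbors carry only the first tag.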

  • ImageNet[1TB]

    Source:  The ImageNet Large Scale Visual Recognition Challenge. 2017
    Description:

    ImageNet is an image database organized according to the WordNet hierarchy (currently only the nouns), in which each node of the hierarchy is depicted by hundreds and thousands of images. Currently we have an average of over five hundred images per node. We hope ImageNet will become a useful resource for researchers, educators, students and all of you who share our passion for pictures.

  • YouTubeFace

    Source:  Lior Wolf, Tal Hassner, and Itay Maoz. Tel Aviv University. 2011
    Description:

    The data set contains 3,425 videos of 1,595 different people. All the videos were downloaded from YouTube. An average of 2.15 videos are available for each subject. The shortest clip duration is 48 frames, the longest clip is 6,070 frames, and the average length of a video clip is 181.3 frames.

    Number of videos per person:
    videos    1    2    3    4    5    6
    people  591  471  307  167   51    8

    In designing our video dataset and benchmarks we follow the example of the 'Labeled Faces in the Wild' (LFW) image collection. Specifically, our goal is to produce a large-scale collection of videos along with labels indicating the identity of the person appearing in each video. In addition, we publish benchmark tests intended to measure the performance of video pair-matching techniques on these videos. Finally, we provide descriptor encodings for the faces appearing in these videos, using well-established descriptor methods.
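    A common, simple baseline for the pair-matching benchmark is to pool each clip's per-frame descriptors into one video-level descriptor and threshold the cosine similarity of the two pooled descriptors. The sketch below assumes mean-pooling with L2 normalization and an arbitrary threshold; it is an illustration, not the encoding or protocol shipped with the dataset.

```python
import numpy as np

def video_descriptor(frame_descs):
    """Pool per-frame face descriptors into one video-level descriptor
    by mean-pooling followed by L2 normalisation (a simple baseline)."""
    v = np.asarray(frame_descs, dtype=float).mean(axis=0)
    return v / np.linalg.norm(v)

def same_person(frames_a, frames_b, threshold=0.8):
    """Pair-matching decision: 'same' when the cosine similarity of the
    pooled descriptors exceeds a threshold. In a real benchmark run the
    threshold is tuned on held-out splits; 0.8 is arbitrary here."""
    sim = float(video_descriptor(frames_a) @ video_descriptor(frames_b))
    return sim >= threshold

# Toy clips of 2-D "descriptors": a and b point the same way, c does not.
clip_a = [[1.0, 0.0], [1.0, 0.1]]
clip_b = [[1.0, 0.05]]
clip_c = [[0.0, 1.0], [0.1, 1.0]]
```

Because clips vary from 48 to 6,070 frames, pooling to a fixed-length descriptor is what makes a single similarity threshold applicable to every pair.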