Datasets
MNIST Series
MNIST
- Image of handwritten digit
- 60000 training examples
- 10000 test examples
- 28*28 grayscale images
- 10 classes
Fashion-MNIST
- Images of clothing and accessories
- 60000 training examples (same as mnist)
- 10000 test examples (same as mnist)
- 28*28 grayscale images. (still the same as mnist)
- 10 classes. (same as mnist)
- Designed to be a direct drop-in replacement for MNIST
Kuzushiji MNIST (KMNIST)
- Images of Kanji characters.
- Also a drop-in replacement of MNIST.
- Provide Kuzushiji-49 with 49 classes.
- Provide Kuzushiji-Kanji for all Kanji characters, but less complete.
EMNIST
- Images of handwritten digits & characters
- Same format as MNIST, but with more classes
- Designed to be more challenging than MNIST
QMNIST
- Recover missing testing data in MNIST
Images
COCO Datasets
- Large scale image dataset object detection, segmentation, and captioning
- Includes segmentations and bounding boxes of objects in images
- Includes image captions
LSUN
- Images of scenes or objects
- 10 scene categories
- 20 object categories
- Partially automated labeled.
- Huge amount (1 million) images per category
ImageNet
- Images organized following WordNet hierarchy (only nouns)
- Largest image datasets
- 15 millions of images
- 22 thousands of classes
CIFAR
- Tiny images of 32*32 coloured images
- CIFAR-10 : 10 classes with 6000 images each
- CIFAR-100: 100 classes with 600 images each
STL-10
- Images of 10 classes
- Inspired by CIFAR-10 but more - focused on unsupervised tasks
- Higher resolution: 96*96 coloured images
- Images acquired from ImageNet
- Assistance to building priors
SVHN
- Images of street view house numbers
- Two formats
- Original images with character level bounding boxes
- MNIST-like 32*32 image center around a single character
SBU Captioned Photo
- 1 million images with associated visually relevant captions
- Resources
- Data
- 1 million image urls and their captions
- A script to crawl images
- Search tool: search for images using text queries
- Data
The Flickr30k Dataset
Pascal VOC Data Sets
- 20 classes
- 11530 images
- 27450 ROI annotated objects
- 6929 segmentations
- Challenges
Cityscapes Dataset
- Urban street scenes
- Features
- Annotations: semantic, instance-wise, dance pixel
- 30 Classes
- 50 cities
- Different seasons, daytime, and weather conditions
- 5000 fine annotated images
- 20000 coarse annotated images
Semantic Boundaries Dataset
- Mark up the pixels that lie on the boundary of the object
Videos
Kinetics-400 Dataset
- 650000 video clip url links
- 700 human cation classes
- Each action class has at least 600 video clips
HMDB Dataset
- 1,000,000,000 videos of human motions
- General facial actions: smile, laugh, chew, talk
- Facial actions with object manipulation: smoke, eat, drink
- General body movements
- Body movements with object interaction
- Body movements for human interaction
UCF101 Dataset
- An action recognition data set of realistic action videos
- 5 action categories
- Human-Object interaction
- Body-Motion only
- Human human interaction
- Playing musical instruments
- Sports