The images in the training set are randomly altered in order to improve the generalization of the network.
Cubuk et al. (2018), who evaluate a number of different data augmentation techniques, use the following transforms:
- Blur - The entire image is blurred by a random amount.
- Color balance
- Cropping - The image is randomly cropped and the result is fed into the network instead.
- Cutout - Mask a random square region of the image, replacing it with grey. It was used to obtain state-of-the-art results on the CIFAR-10, CIFAR-100 and SVHN datasets. Proposed in Improved Regularization of Convolutional Neural Networks with Cutout, DeVries and Taylor (2017).
- Equalize - Perform histogram equalization on the image. This adjusts the contrast.
- Flipping - The image is flipped with probability 0.5 and left as it is otherwise. Normally only horizontal flipping is used, but vertical flipping can be used where it makes sense, for example with satellite imagery.
- Posterize - Decrease the number of bits per pixel.
- Sample pairing - Combine two random images into a new synthetic image. See Data Augmentation by Pairing Samples for Images Classification, Inoue (2018).
- Solarize - Pixels above a random value are inverted.
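Several of the transforms above are simple enough to sketch directly. The following is a minimal illustration, not the implementation used by Cubuk et al.: it assumes an image is a nested list of 8-bit grey values, keeps the cutout square fully inside the image, and combines paired samples by plain averaging. All function names and parameters are hypothetical; real pipelines would normally use an image or array library.

```python
import random

# Images are nested lists of 8-bit grey values: image[row][col] in 0..255.

def flip_horizontal(image, rng, p=0.5):
    """Mirror the image left-to-right with probability p, else return a copy."""
    if rng.random() < p:
        return [row[::-1] for row in image]
    return [row[:] for row in image]

def cutout(image, rng, size, fill=128):
    """Replace a randomly placed size x size square with a constant grey value.

    Assumption: the square always lies fully inside the image.
    """
    height, width = len(image), len(image[0])
    top = rng.randrange(height - size + 1)
    left = rng.randrange(width - size + 1)
    out = [row[:] for row in image]
    for r in range(top, top + size):
        for c in range(left, left + size):
            out[r][c] = fill
    return out

def solarize(image, threshold):
    """Invert every pixel whose value is above the threshold."""
    return [[255 - v if v > threshold else v for v in row] for row in image]

def posterize(image, bits):
    """Keep only the `bits` most significant bits of each 8-bit pixel."""
    mask = 0xFF & ~((1 << (8 - bits)) - 1)
    return [[v & mask for v in row] for row in image]

def sample_pairing(image_a, image_b):
    """Average two equally sized images pixel by pixel (assumed combination rule)."""
    return [[(a + b) // 2 for a, b in zip(row_a, row_b)]
            for row_a, row_b in zip(image_a, image_b)]
```

In training, the random parameters (the flip decision, the cutout position, the solarize threshold) would be redrawn each time an image is sampled, so the network rarely sees exactly the same input twice.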
60000 32x32 colour images in 10 classes with 6000 images each (CIFAR-10) or 100 classes with 600 images each (CIFAR-100). Each dataset has 50000 images in the training set and 10000 in the test set.
Notable results - CIFAR-10
- 98.5% - AutoAugment: Learning Augmentation Strategies from Data, Cubuk et al. (2018)
- 97.6% - Learning Transferable Architectures for Scalable Image Recognition, Zoph et al. (2017)
- 97.4% - Improved Regularization of Convolutional Neural Networks with Cutout, DeVries and Taylor (2017)
- 96.1% - Wide Residual Networks, Zagoruyko and Komodakis (2016)
- 94.2% - All you need is a good init, Mishkin and Matas (2015)
- 93.6% - Deep Residual Learning for Image Recognition, He et al. (2015)
- 93.5% - Fast and Accurate Deep Network Learning by Exponential Linear Units, Clevert et al. (2015)
Notable results - CIFAR-100
- 89.3% - AutoAugment: Learning Augmentation Strategies from Data, Cubuk et al. (2018)
- 84.8% - Improved Regularization of Convolutional Neural Networks with Cutout, DeVries and Taylor (2017)
- 81.1% - Wide Residual Networks, Zagoruyko and Komodakis (2016)
- 75.7% - Fast and Accurate Deep Network Learning by Exponential Linear Units, Clevert et al. (2015)
- 72.3% - All you need is a good init, Mishkin and Matas (2015)
Common Objects in COntext. A dataset for image recognition, segmentation and captioning.
Detection task - Notable results (mAP):
ImageNet Large Scale Visual Recognition Challenge. A popular image classification task in which the algorithm must learn from a dataset of ~1.4m images to classify images into 1000 classes.
Notable results (top-1 accuracy):
- 85.0% - RandAugment: Practical data augmentation with no separate search, Cubuk et al. (2019)
- 83.9% - Regularized Evolution for Image Classifier Architecture Search, Real et al. (2018)
- 83.5% - AutoAugment: Learning Augmentation Strategies from Data, Cubuk et al. (2018)
- 82.7% - Learning Transferable Architectures for Scalable Image Recognition, Zoph et al. (2017)
- 78.6% - Deep Residual Learning for Image Recognition, He et al. (2015)
- 76.3% - Very deep convolutional networks for large-scale image recognition, Simonyan and Zisserman (2014)
- 62.5% - ImageNet Classification with Deep Convolutional Neural Networks, Krizhevsky et al. (2012)
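Top-1 accuracy counts a prediction as correct only when the highest-scoring class is the true label. The sketch below shows the calculation on made-up scores; the function name `top_k_accuracy` and the data are illustrative assumptions, not taken from any of the papers listed.

```python
def top_k_accuracy(scores, labels, k=1):
    """Fraction of examples whose true label is among the k highest-scoring classes."""
    hits = 0
    for class_scores, label in zip(scores, labels):
        # Class indices sorted from highest to lowest score.
        ranked = sorted(range(len(class_scores)),
                        key=lambda i: class_scores[i], reverse=True)
        if label in ranked[:k]:
            hits += 1
    return hits / len(labels)

# Made-up scores for four images over three classes.
scores = [
    [0.1, 0.7, 0.2],  # highest score: class 1
    [0.5, 0.3, 0.2],  # highest score: class 0
    [0.2, 0.3, 0.5],  # highest score: class 2
    [0.6, 0.3, 0.1],  # highest score: class 0
]
labels = [1, 0, 1, 2]

print(top_k_accuracy(scores, labels, k=1))  # 0.5
print(top_k_accuracy(scores, labels, k=2))  # 0.75
```

Setting `k` above 1 gives the more lenient top-k variant, in which the true label only has to appear among the k best guesses.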
70000 28x28 pixel grayscale images of handwritten digits (10 classes), 60000 in the training set and 10000 in the test set.
Street View House Numbers.
The name of the general topic. Includes face identification and verification.
The normal face recognition pipeline is:
- Face detection - Identifying the area of the photo that corresponds to the face.
- Face alignment - Often done by detecting facial landmarks like the nose, eyes and mouth.
- Feature extraction and similarity calculation
Challenges in face recognition include:
- Photos being taken at different angles.
- Different lighting conditions.
- Changes in facial hair.
- People aging over time.
Multiclass classification problem. Given an image of a face, determine the identity of the person.
Binary classification problem. Given two images of faces, assess whether they are from the same person or not.
Commonly used architectures for solving this problem include Siamese and Triplet networks.
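Both kinds of network map a face image to an embedding vector, so verification reduces to comparing two embeddings. The sketch below assumes the embeddings have already been computed by some network and shows a distance threshold for verification together with a triplet loss on squared Euclidean distances. The function names, the threshold and the margin are illustrative assumptions.

```python
import math

def euclidean(u, v):
    """Euclidean distance between two embedding vectors."""
    return math.sqrt(sum((a - b) ** 2 for a, b in zip(u, v)))

def same_person(embedding_a, embedding_b, threshold):
    """Verification: declare a match when the embeddings are close enough."""
    return euclidean(embedding_a, embedding_b) < threshold

def triplet_loss(anchor, positive, negative, margin=0.2):
    """Zero once the negative is at least `margin` further from the anchor
    than the positive is (measured in squared distance)."""
    d_pos = euclidean(anchor, positive) ** 2
    d_neg = euclidean(anchor, negative) ** 2
    return max(0.0, d_pos - d_neg + margin)

# Toy 2-D embeddings: the positive is near the anchor, the negative is far away.
anchor, positive, negative = [0.0, 0.0], [0.1, 0.0], [1.0, 0.0]
print(same_person(anchor, positive, threshold=0.5))  # True
print(same_person(anchor, negative, threshold=0.5))  # False
print(triplet_loss(anchor, positive, negative))      # 0.0
```

The threshold would in practice be tuned on held-out pairs to trade off false matches against false rejections.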
Partitions an object into meaningful parts with associated labels. May also be referred to as per-pixel classification.
Unlike semantic segmentation, different instances of the same object type have to be labelled as separate objects (e.g. person 1, person 2). This is harder than semantic segmentation.
Unlike instance segmentation, in semantic segmentation it is only necessary to predict what class each pixel belongs to, not separate out different instances of the same class.
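The difference between the two label formats can be made concrete with a small sketch. It assumes a model outputs a list of per-class scores for every pixel, and that an instance mask stores one integer id per pixel; both representations and all names are illustrative assumptions.

```python
def semantic_mask(class_scores):
    """Per-pixel classification: pick the highest-scoring class for each pixel.

    class_scores[row][col] is a list with one score per class.
    """
    return [[max(range(len(px)), key=lambda k: px[k]) for px in row]
            for row in class_scores]

def instances_to_semantic(instance_mask, instance_class):
    """Discard instance identity by mapping each instance id to its class id."""
    return [[instance_class[i] for i in row] for row in instance_mask]

# Scores for a 1x3 image over two classes (0 = background, 1 = person).
scores = [[[0.9, 0.1], [0.2, 0.8], [0.4, 0.6]]]
print(semantic_mask(scores))  # [[0, 1, 1]]

# Instance ids: 0 = background, 1 = person 1, 2 = person 2.
instance_mask = [[0, 1, 2]]
instance_class = {0: 0, 1: 1, 2: 1}
print(instances_to_semantic(instance_mask, instance_class))  # [[0, 1, 1]]
```

Both people collapse to the same class in the semantic mask, which is exactly the information instance segmentation has to keep.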
Learning to segment from only image-level labels. The labels will describe the classes that exist within the image but not what the class is for every pixel.
The results from weak supervision are generally poorer than those from full supervision, but the datasets tend to be much cheaper to acquire.
When the dataset is only weakly supervised it can be very hard to correctly label highly correlated objects that are usually only seen together, such as a train and rails.
- Daytime to nighttime
- Greyscale to colour
- Streetmap to satellite view
Contrast with two-stage detectors.
Region of interest
See ‘region proposal’.
A region in an image (usually defined by a rectangle) identified as containing an object of interest with high probability, relative to the background.
The first stage proposes regions that may contain objects of interest. The second stage classifies these regions as either background or one of the classes.
There is often a significant class-imbalance problem since background regions greatly outnumber the other classes.
Contrast with one-stage detectors.
A heatmap over an image which shows each pixel’s importance for the classification.
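One way to build such a heatmap without access to gradients is occlusion: mask a small patch, re-run the classifier, and record how much the class score drops. The sketch below uses this occlusion approach as a stand-in (gradient-based methods are a common alternative), and `toy_score` is a hypothetical placeholder for a real classifier's score for one class.

```python
def occlusion_saliency(image, score_fn, patch=1, fill=0):
    """Heatmap where each pixel holds the score drop seen when its patch is masked."""
    height, width = len(image), len(image[0])
    base = score_fn(image)
    heat = [[0.0] * width for _ in range(height)]
    for top in range(0, height, patch):
        for left in range(0, width, patch):
            rows = range(top, min(top + patch, height))
            cols = range(left, min(left + patch, width))
            # Copy the image and blank out the current patch.
            occluded = [row[:] for row in image]
            for r in rows:
                for c in cols:
                    occluded[r][c] = fill
            drop = base - score_fn(occluded)
            for r in rows:
                for c in cols:
                    heat[r][c] = drop
    return heat

def toy_score(image):
    """Hypothetical classifier score: the mean of the centre 2x2 block of a 4x4 image."""
    return (image[1][1] + image[1][2] + image[2][1] + image[2][2]) / 4.0

image = [[8] * 4 for _ in range(4)]
heat = occlusion_saliency(image, toy_score)
print(heat[0][0], heat[1][1])  # 0.0 2.0
```

Only the centre pixels receive a non-zero value because they are the only ones the toy score depends on; a larger `patch` trades resolution for fewer classifier evaluations.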