Training with limited data¶
Active learning¶
The learning algorithm requests examples to be labelled as part of the training process. Useful when there is a small set of labelled examples, a much larger set of unlabelled examples, and labelling is expensive.
Class imbalance problem¶
When one or more classes occur much more frequently in the dataset than others. This can lead to classifiers maximising their objective by always predicting the majority class(es) and ignoring the input features.
Methods for addressing the problem include:
- Focal loss
- Weight the loss function (increase the weight for the minority class)
- Oversampling the minority class
- Undersampling the majority class
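As an illustration of the two loss-based approaches above, here is a minimal NumPy sketch of class-weighted cross-entropy and focal loss (the probabilities, labels and weights are made up for the example):

```python
import numpy as np

def weighted_cross_entropy(probs, labels, class_weights):
    """Cross-entropy where each example is scaled by the weight of its true class."""
    p_true = probs[np.arange(len(labels)), labels]
    return np.mean(class_weights[labels] * -np.log(p_true))

def focal_loss(probs, labels, gamma=2.0):
    """Focal loss: down-weights already well-classified examples by (1 - p)^gamma."""
    p_true = probs[np.arange(len(labels)), labels]
    return np.mean(-((1 - p_true) ** gamma) * np.log(p_true))

# Two classes where class 1 is the minority, so it gets a larger weight.
probs = np.array([[0.9, 0.1], [0.2, 0.8], [0.7, 0.3]])
labels = np.array([0, 1, 1])
weights = np.array([1.0, 5.0])
```

Setting `gamma = 0` recovers the standard cross-entropy; larger values of `gamma` concentrate the loss on hard, misclassified examples.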
Omniglot¶
1623 handwritten characters from 50 alphabets, with 20 examples of each character. Useful for one-shot learning. Introduced in One shot learning of simple visual concepts, Lake et al. (2011).
Few-shot learning¶
Classification where only a few (normally fewer than 20) examples of each class have been seen before.
Meta-learning¶
Learning from a set of tasks with the goal of using that knowledge to solve other, unseen tasks.
One-shot learning¶
Classification where only one example of the class has been seen before. Matching Networks achieve 93.2% top-5 accuracy on ImageNet compared to 96.5% for Inception v3.
Semi-supervised learning¶
Training using a limited set of labelled data and a (usually much larger) set of unlabelled data.
Ladder network¶
A network designed for semi-supervised learning that also works very well on permutation-invariant MNIST.
The ladder network simultaneously minimizes the sum of supervised and unsupervised cost functions by backpropagation, avoiding the need for layer-wise pre-training. The learning task is similar to that of a denoising autoencoder, except that the reconstruction error is minimized at every layer, not just at the inputs. Each layer contributes a term to the loss function.
The architecture is an autoencoder with skip-connections from the encoder to the decoder. Can work with both fully-connected and convolutional layers.
There are two encoders: one for clean and one for noisy data. The clean encoder is used to predict labels, giving the supervised loss. The noisy encoder links with the decoder and produces the unsupervised losses. Both encoders share the same parameters.
The loss is the sum of the supervised and unsupervised losses. The supervised cost is the usual cross-entropy loss. The unsupervised cost (reconstruction error) is the squared difference between each layer's clean activations and the decoder's reconstruction of them.
The hyperparameters are the weight for the denoising cost of each layer as well as the amount of noise to be added within the corrupted encoder.
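The combined objective can be sketched as follows. This is a simplified NumPy illustration only: the per-layer weights, activations and reconstructions are placeholders, and the batch normalisation and noise injection of the full architecture are omitted.

```python
import numpy as np

def ladder_loss(logits, labels, clean_activations, reconstructions, layer_weights):
    """Supervised cross-entropy plus weighted per-layer reconstruction errors."""
    # Supervised term: cross-entropy on the labelled examples.
    shifted = logits - logits.max(axis=1, keepdims=True)
    probs = np.exp(shifted) / np.exp(shifted).sum(axis=1, keepdims=True)
    supervised = np.mean(-np.log(probs[np.arange(len(labels)), labels]))
    # Unsupervised term: squared difference between each layer's clean
    # activations and the decoder's reconstruction, one weighted term per layer.
    unsupervised = sum(
        w * np.mean((z - z_hat) ** 2)
        for w, z, z_hat in zip(layer_weights, clean_activations, reconstructions)
    )
    return supervised + unsupervised
```

With perfect reconstructions the unsupervised term vanishes and only the cross-entropy remains; the per-layer weights are the denoising-cost hyperparameters described above.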
Achieved state-of-the-art performance on semi-supervised MNIST, semi-supervised CIFAR-10 and permutation-invariant MNIST.
Self-training¶
Method for semi-supervised learning. A model is trained on the labelled data and then used to classify the unlabelled data, creating more labelled examples. This process continues iteratively, usually keeping only the most confident predictions at each stage.
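A minimal sketch of that loop, using a nearest-centroid classifier with a softmax over negative distances as a stand-in for any probabilistic model (the classifier and the confidence threshold are illustrative assumptions):

```python
import numpy as np

def self_train(X_lab, y_lab, X_unlab, threshold=0.9, iterations=5):
    """Iteratively pseudo-label the most confident unlabelled points."""
    X_lab, y_lab, unlab = X_lab.copy(), y_lab.copy(), X_unlab.copy()
    for _ in range(iterations):
        if len(unlab) == 0:
            break
        classes = np.unique(y_lab)
        # Stand-in "model": nearest centroid, softmax over negative distances.
        centroids = np.stack([X_lab[y_lab == c].mean(axis=0) for c in classes])
        dists = np.linalg.norm(unlab[:, None, :] - centroids[None, :, :], axis=2)
        probs = np.exp(-dists) / np.exp(-dists).sum(axis=1, keepdims=True)
        confident = probs.max(axis=1) >= threshold
        if not confident.any():
            break
        # Move the confidently-predicted points into the labelled set.
        X_lab = np.vstack([X_lab, unlab[confident]])
        y_lab = np.concatenate([y_lab, classes[probs[confident].argmax(axis=1)]])
        unlab = unlab[~confident]
    return X_lab, y_lab
```

In practice the model would be retrained from scratch (or fine-tuned) on the enlarged labelled set at each iteration.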
Unsupervised pre-training¶
Layers are first trained as an autoencoder and then fine-tuned on labelled data. This improves the initialization of the weights, making optimization faster and reducing overfitting. Most useful in semi-supervised learning.
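As a toy illustration of the idea, the sketch below pre-trains a single linear layer as a tied-weight autoencoder on unlabelled data; the learned weights would then initialise that layer before supervised fine-tuning (the dimensions, learning rate and step count are arbitrary):

```python
import numpy as np

rng = np.random.default_rng(0)
X = rng.normal(size=(200, 8))  # unlabelled data

# Pre-train one layer as a linear autoencoder with tied weights:
# learn W so that X @ W @ W.T approximately reconstructs X.
W = rng.normal(scale=0.1, size=(8, 4))
mse_before = np.mean((X @ W @ W.T - X) ** 2)
for _ in range(500):
    err = X @ W @ W.T - X
    # Gradient of the squared reconstruction error w.r.t. the tied weights.
    grad = (X.T @ err @ W + err.T @ X @ W) / len(X)
    W -= 0.05 * grad
mse_after = np.mean((X @ W @ W.T - X) ** 2)
# W now serves as the initialization of this layer for supervised training.
```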
Transfer learning¶
The process of taking results (usually weights) obtained on one dataset and applying them to another in order to improve accuracy there.
Useful for reducing the amount of training time and data required.
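A minimal sketch of the weight-reuse step, assuming the model is a simple dictionary of parameters (all names and shapes here are invented for the example):

```python
import numpy as np

rng = np.random.default_rng(0)

# Weights learned on a large source dataset (shapes are illustrative).
pretrained = {
    "features.w": rng.normal(size=(64, 32)),
    "features.b": np.zeros(32),
    "head.w": rng.normal(size=(32, 1000)),  # 1000 source classes
}

# Initialise the target model: reuse the feature extractor unchanged and
# replace the head so its output matches the new task's 10 classes.
target = {k: v.copy() for k, v in pretrained.items() if k.startswith("features.")}
target["head.w"] = rng.normal(scale=0.01, size=(32, 10))
```

The reused feature layers can then either be frozen or fine-tuned at a low learning rate while the new head is trained on the target data.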
Zero-shot learning¶
Learning without any training examples of the target class. This is made possible by generalising from a wider dataset.
An example is learning to recognise a cat having only read information about cats; no images of cats are ever seen. This could be done by using Wikipedia together with a dataset like ImageNet to learn a joint embedding between words and images.
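A toy sketch of classification in such a joint embedding space: the image embedding is matched to the nearest class word embedding by cosine similarity (all vectors below are invented for the example):

```python
import numpy as np

def zero_shot_classify(image_emb, class_word_embs, class_names):
    """Pick the class whose word embedding is closest (cosine) to the image embedding."""
    def normalise(v):
        return v / np.linalg.norm(v, axis=-1, keepdims=True)
    sims = normalise(class_word_embs) @ normalise(image_emb)
    return class_names[int(np.argmax(sims))]

# Toy joint embedding space: "cat" was never seen in the image training set,
# but its word vector lies close to where the image encoder places the photo.
classes = ["dog", "cat", "car"]
word_embs = np.array([[1.0, 0.0], [0.8, 0.6], [0.0, 1.0]])
image_emb = np.array([0.75, 0.65])
```

The word embeddings come from text alone, which is what lets the model assign a label it has never seen an image of.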