The task of retrieving the most relevant documents from a set given a query. If the ranking is personalized, a context including user history or location may also be taken into account. Often referred to as ‘learning to rank’.

Ranking problems tend to be hard to optimise for since the performance of the algorithm depends on the order of the documents when they are sorted according to their predicted scores. This is non-differentiable.


An instance where two documents have been ranked in the wrong order given the ground truth. That is to say the less relevant document is ranked above the more relevant one.


Builds upon LambdaRank.


Combines the boosted tree model MART (Friedman, 1999) with LambdaRank.


Builds upon RankNet.

The loss is a function of the labeled relevance y and the predicted score s, summing over pairs of relevance labels where y_i > y_j:

L(y,s) = \sum_{y_i > y_j} \Delta NDCG(i,j) \log(1 + \exp(-\sigma(s_i - s_j)))

where \Delta NDCG(i,j) is the change in NDCG that would result from the ranking of documents i and j being swapped:

\Delta NDCG(i,j) = |G_i - G_j| |\frac{1}{D_i} - \frac{1}{D_j}|

G and D are the gain and discount functions:

G_i = \frac{2^{y_i} - 1}{maxDCG}

D_i = \log(1+i)

Listwise ranking

The loss function is defined over the list of documents, as opposed to just a pair of documents for example.

Pairwise ranking

Learning to rank is seen as a classification problem where the task is to predict whether a document A is more relevant than some other document B given a query.

Simple to train using the cross-entropy loss but requires more computational effort at inference time since there are O(n^2) possible comparisons in a list of n items.

Pointwise ranking

Poses learning to rank as a regression problem where a relevance score must be predicted given a document and query. A squared loss is typically used to minimise the difference between the predicted and target relevance.

Example papers


A pairwise ranking algorithm. Can be built using any differentiable model such as neural networks or boosted trees. For a given pair of documents i and j the model computes the probability that i should be ranked higher than j:

P_{ij} = P(y_i > y_j) = \frac{1}{1 + \exp(-\sigma(s_i - s_j))}

Given the prediction, the model is then trained using the cross-entropy loss.


A listwise ranking algorithm. Optimises a smoothed approximation of NDCG which is obtained by treating the scores as random variables.

Each score is viewed as being sampled from a Gaussian distribution centered on the true score.