Ranking

The task of retrieving the most relevant documents from a set given a query. If the ranking is personalised, a context including user history or location may also be taken into account. When the ranking function is learned from data, the task is often referred to as 'learning to rank'.

Ranking problems tend to be hard to optimise because performance depends on the order of the documents once they are sorted by their predicted scores, and the sorting operation is non-differentiable, so ranking metrics cannot be optimised directly with gradient-based methods.

Inversion

An instance where two documents have been ranked in the wrong order relative to the ground truth: the less relevant document is ranked above the more relevant one.
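
A minimal sketch of counting inversions for one query, assuming graded relevance labels y and predicted scores s (the toy values are made up):

```python
import numpy as np

# Graded relevance labels and predicted scores for one query
# (toy values, purely illustrative).
y = np.array([3, 1, 2, 0])
s = np.array([0.2, 0.9, 0.5, 0.1])

inversions = 0
for i in range(len(y)):
    for j in range(len(y)):
        # An inversion: document i is truly more relevant than j,
        # but the model scores it lower.
        if y[i] > y[j] and s[i] < s[j]:
            inversions += 1

print(inversions)  # 3 wrongly ordered pairs in this example
```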

LambdaLoss

Builds upon LambdaRank.

LambdaMART

Combines the boosted tree model MART (Friedman, 1999) with LambdaRank.

LambdaRank

Builds upon RankNet.

The loss is a function of the labelled relevances y and the predicted scores s, summing over pairs of documents (i, j) whose labels satisfy y_i > y_j:

L(y,s) = \sum_{y_i > y_j} \Delta NDCG(i,j) \log(1 + \exp(-\sigma(s_i - s_j)))

where \Delta NDCG(i,j) is the change in NDCG that would result from the ranking of documents i and j being swapped:

\Delta NDCG(i,j) = |G_i - G_j| |\frac{1}{D_i} - \frac{1}{D_j}|

G and D are the gain and discount functions:

G_i = \frac{2^{y_i} - 1}{maxDCG}

D_i = \log_2(1 + i)

where i is taken to be the document's rank position when the list is sorted by predicted score.
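
A minimal NumPy sketch of the loss written above for a single query; the function name, toy values and \sigma are illustrative, and real implementations usually work with the implied pairwise gradients (the lambdas) rather than evaluating the loss directly:

```python
import numpy as np

def lambdarank_loss(y, s, sigma=1.0):
    """LambdaRank loss for one query, following the formulas above.

    y : graded relevance labels, shape (n,)
    s : predicted scores, shape (n,)
    """
    n = len(y)
    # Rank positions (1-based) induced by sorting on the predicted scores.
    order = np.argsort(-s)
    rank = np.empty(n, dtype=int)
    rank[order] = np.arange(1, n + 1)

    # maxDCG: DCG of the ideal ordering, used to normalise the gains.
    positions = np.arange(1, n + 1)
    max_dcg = np.sum((2.0 ** np.sort(y)[::-1] - 1) / np.log2(1 + positions))

    G = (2.0 ** y - 1) / max_dcg       # gain
    D = np.log2(1 + rank)              # discount at the document's rank

    loss = 0.0
    for i in range(n):
        for j in range(n):
            if y[i] > y[j]:
                delta_ndcg = np.abs(G[i] - G[j]) * np.abs(1 / D[i] - 1 / D[j])
                loss += delta_ndcg * np.log1p(np.exp(-sigma * (s[i] - s[j])))
    return loss

# Toy usage: three documents for a single query.
print(lambdarank_loss(np.array([2.0, 1.0, 0.0]), np.array([0.1, 0.4, 0.2])))
```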

Listwise ranking

The loss function is defined over the entire list of documents returned for a query, rather than over single documents or document pairs.

Pairwise ranking

Poses learning to rank as a binary classification problem: given a query, predict whether document A is more relevant than document B.

Simple to train with a cross-entropy loss, but a model that only predicts pairwise preferences requires more computational effort at inference time, since there are O(n^2) possible comparisons in a list of n items.
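
To make the inference cost concrete, here is a small sketch that ranks documents using nothing but a pairwise preference function, counting wins over all ordered pairs; `prefers` stands in for a trained pairwise classifier and is purely illustrative:

```python
def rank_by_pairwise_wins(docs, prefers):
    """Rank documents using only a pairwise preference function.

    `prefers(a, b)` should return True if a ought to be ranked above b.
    Every ordered pair is queried, so ranking n documents costs
    O(n^2) calls to the underlying model.
    """
    wins = [0] * len(docs)
    for i, a in enumerate(docs):
        for j, b in enumerate(docs):
            if i != j and prefers(a, b):
                wins[i] += 1
    # Most pairwise wins first.
    return [doc for _, doc in sorted(zip(wins, docs), key=lambda t: -t[0])]

# Toy preference function standing in for a trained pairwise classifier:
# documents are plain numbers and larger means more relevant.
print(rank_by_pairwise_wins([0.3, 0.9, 0.1], lambda a, b: a > b))  # [0.9, 0.3, 0.1]
```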

Pointwise ranking

Poses learning to rank as a regression problem where a relevance score must be predicted given a document and query. A squared loss is typically used to minimise the difference between the predicted and target relevance.
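
A minimal sketch of the pointwise setup, fitting a linear scorer to relevance targets with a squared loss via gradient descent; the features, targets, learning rate and number of steps are all made up:

```python
import numpy as np

# Query-document feature vectors and graded relevance targets
# (toy values, purely illustrative).
X = np.array([[1.0, 0.2],
              [0.3, 0.8],
              [0.5, 0.5]])
y = np.array([2.0, 0.0, 1.0])

# Fit a linear scorer by gradient descent on the mean squared error.
w = np.zeros(2)
learning_rate = 0.1
for _ in range(500):
    pred = X @ w
    grad = 2 * X.T @ (pred - y) / len(y)
    w -= learning_rate * grad

# At query time the documents are simply ordered by their predicted scores.
print(X @ w)
```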

Example papers

RankNet

A pairwise ranking algorithm. Can be built using any differentiable model such as neural networks or boosted trees. For a given pair of documents i and j the model computes the probability that i should be ranked higher than j:

P_{ij} = P(y_i > y_j) = \frac{1}{1 + \exp(-\sigma(s_i - s_j))}

The model is then trained with the cross-entropy loss between this probability and a target probability derived from the labels (1 if document i is more relevant than j, 0 if it is less relevant, and 0.5 for ties).
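
A small sketch of the per-pair cross-entropy loss implied by the probability above; the function name and toy scores are illustrative:

```python
import numpy as np

def ranknet_pair_loss(s_i, s_j, p_target, sigma=1.0):
    """Cross-entropy loss for one document pair, using the probability
    defined above.  p_target is 1 if document i is truly more relevant
    than j, 0 if it is less relevant, and 0.5 for ties."""
    p_ij = 1.0 / (1.0 + np.exp(-sigma * (s_i - s_j)))
    return -(p_target * np.log(p_ij) + (1 - p_target) * np.log(1 - p_ij))

# Document i is more relevant but currently scored lower, so the loss is
# large and its gradient pushes s_i above s_j during training.
print(ranknet_pair_loss(s_i=0.2, s_j=0.7, p_target=1.0))
```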

SoftRank

A listwise ranking algorithm. Optimises a smoothed approximation of NDCG, obtained by treating the scores as random variables.

Each score is viewed as a sample from a Gaussian distribution centred on the model's predicted score, so the document ranks, and hence NDCG, become random variables whose expectation is smooth in the scores.
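
A Monte Carlo sketch of the idea: sample the Gaussian scores, rank by each sample, and average NDCG over the samples. This only illustrates the smoothing effect; the SoftRank paper itself computes the rank distributions analytically rather than by sampling, and the names and values below are made up:

```python
import numpy as np

def smoothed_ndcg(s, y, noise=0.1, n_samples=2000, seed=0):
    """Estimate the expected NDCG when each score is perturbed by
    Gaussian noise, by averaging over sampled rankings."""
    rng = np.random.default_rng(seed)
    n = len(s)
    gains = 2.0 ** y - 1
    discounts = 1.0 / np.log2(1 + np.arange(1, n + 1))
    ideal_dcg = np.sum(np.sort(gains)[::-1] * discounts)

    total = 0.0
    for _ in range(n_samples):
        sampled = rng.normal(s, noise)    # scores treated as Gaussian variables
        order = np.argsort(-sampled)      # ranking induced by this sample
        total += np.sum(gains[order] * discounts) / ideal_dcg
    return total / n_samples

print(smoothed_ndcg(np.array([0.3, 0.25, 0.1]), np.array([2.0, 1.0, 0.0])))
```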