Beyond Exceedance-Based Backtesting of Value-at-Risk Models

Instructor  Micky Midha

Learning Objectives

  • Identify the properties of an exceedance-based backtest that indicate a VaR model is accurate, and describe how these properties are reflected in a PIT-based backtest.
  • Explain how to derive probability integral transforms (PITs) in the context of validating a VaR model.
  • Describe how the shape of the distribution of PITs can be used as an indicator of the quality of a VaR model.
  • Describe backtesting using PITs, and compare the various goodness-of-fit tests that can be used to evaluate the distribution of PITs: the Kolmogorov-Smirnov test, the Anderson-Darling test, and the Cramér-von Mises test.

Properties of Exceedance-Based Backtest

  • Exceedance-based backtesting evaluates the accuracy of a Value-at-Risk (VaR) model by comparing the actual number of exceedances (i.e., the number of times the loss exceeds the VaR) against the expected number of exceedances implied by the model’s confidence level. A well-calibrated VaR model should align with observed risk outcomes, neither underestimating nor overestimating market risk. This evaluation is typically performed using exceedance-based backtesting, which analyzes the frequency of VaR breaches, and Probability Integral Transform (PIT)-based backtesting, which examines the entire distribution of predicted losses.
  1. Properties of an Exceedance-Based Backtest Indicating Model Accuracy
    An exceedance-based backtest evaluates a VaR model by analyzing how often actual losses exceed the predicted VaR threshold. The following properties indicate a well-calibrated model:

    • Unconditional Coverage: The proportion of times the observed loss exceeds the VaR forecast should match the expected probability. For example, a 95% VaR model should produce exceedances in about 5% of cases.
    • Independence of Exceedances: Exceedances should occur randomly and not exhibit clustering. If exceedances are correlated, the model fails to capture risk properly.
    • No Systematic Bias: A low exceedance rate suggests the model overestimates risk, while a high exceedance rate suggests underestimation. A well-calibrated model should not consistently overpredict or underpredict risks.
  2.  Reflection in a Probability Integral Transform (PIT)-Based Backtest
    A PIT-based backtest provides a more detailed evaluation than an exceedance-based test by examining the entire forecasted loss distribution, rather than just the exceedance threshold. It assesses the calibration and predictive accuracy of the model in the following ways:

    • Uniformity of PIT Values: If the VaR model is correctly specified, PIT values should follow a uniform distribution (0,1). A non-uniform PIT distribution may suggest that the risk estimates are either overly conservative, resulting in excessively wide predictions, or overly aggressive, leading to overly narrow forecasts.
    • No Autocorrelation in PIT Values: Just as exceedances should be independent, PIT values should not show patterns or trends over time. If they do, it suggests misspecification in the risk model.
    • Correct Tail Behavior: A PIT-based backtest ensures tail risk is appropriately captured. This was a major limitation of exceedance-based backtesting, which only evaluates a single quantile (e.g., 95%) rather than the full loss distribution.
  • Overall, while exceedance-based backtesting ensures that the frequency and independence of VaR breaches align with expectations, PIT-based backtesting provides a more robust validation by examining the entire loss distribution. PIT-based methods can identify subtle deficiencies in risk models, such as poor tail estimation or incorrect distribution assumptions, that exceedance-based tests might overlook. Combining both approaches provides a comprehensive validation framework for market risk models.
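The exceedance-counting idea above can be sketched in a few lines of Python. This is illustrative only: the loss data are simulated, a N(0, 2) loss model is assumed, and the unconditional-coverage check is phrased as a simple binomial test on the breach count.

```python
import numpy as np
from scipy import stats

# Illustrative only: simulate one year of daily losses from an assumed N(0, 2) model
rng = np.random.default_rng(0)
losses = rng.normal(loc=0.0, scale=2.0, size=250)

# 95% VaR under the assumed model: the loss level exceeded 5% of the time
var_95 = stats.norm.ppf(0.95, loc=0.0, scale=2.0)

# Unconditional coverage: the breach count should be consistent with Binomial(250, 0.05)
breaches = int((losses > var_95).sum())
p_value = stats.binomtest(breaches, n=250, p=0.05).pvalue
```

A small p-value would indicate that the observed breach frequency is inconsistent with the model's stated confidence level; a separate test (e.g., on the gaps between breaches) would be needed to check independence.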

Deriving Probability Integral Transforms

  • Probability Integral Transforms (PITs) are used in VaR model validation to assess whether the entire distribution of predicted losses aligns with observed outcomes. Unlike traditional exceedance-based backtesting, which only evaluates whether the number of breaches aligns with expectations, PIT-based methods analyze the full probability distribution of losses, providing a more comprehensive diagnostic tool for assessing model performance. The derivation of PITs follows these steps:
  1. Obtain the Forecasted VaR Distribution
    A VaR model provides a distribution forecast for future losses, typically based on parametric, historical, or Monte Carlo simulation approaches. The model estimates the probability density function (PDF) or cumulative distribution function (CDF) of losses at a given time.
  2. Compute the CDF of the Forecasted Distribution
    Let F_t ​(x) be the cumulative distribution function of the forecasted loss distribution at time t, which represents the probability that the realized loss X_t​ will be less than or equal to a given threshold.
  3. Apply the Probability Integral Transform
    PITs are derived by transforming realized loss values into their corresponding probabilities under the model’s predicted CDF. The PIT value for each time period t is calculated as:

    \[
    \text{PIT}_t = F_t (X_t)
    \]

    Here, \( X_t \) is the actual observed loss at time t, and \( F_t (X_t) \) gives the probability of observing a loss equal to or smaller than \( X_t \) under the predicted distribution.

  4. Interpretation of PIT Values
    • If the VaR model is well-calibrated, the PIT values should follow a uniform distribution between 0 and 1.
    • A non-uniform PIT distribution suggests model misspecification, meaning the predicted loss distribution is either too conservative (overestimating risk) or too aggressive (underestimating risk).
    • If the PIT values cluster toward 0 or 1, the model fails to capture tail risks accurately.
    • If PIT values exhibit autocorrelation, it suggests systematic bias or temporal dependencies in risk estimates.
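The derivation steps above reduce to evaluating each day's forecast CDF at that day's realized loss. The sketch below assumes normal forecast distributions with hypothetical parameters, purely for illustration.

```python
from scipy.stats import norm

# Hypothetical one-step-ahead forecasts: on day t the model predicts N(mu_t, sigma_t)
forecasts = [(0.0, 2.0), (0.1, 1.8), (-0.2, 2.2)]
observed_losses = [3.0, 1.5, 5.2]

# PIT_t = F_t(X_t): evaluate each day's forecast CDF at that day's realized loss
pits = [norm.cdf(x, loc=mu, scale=sigma)
        for (mu, sigma), x in zip(forecasts, observed_losses)]
```

Repeated over many days under a well-calibrated model, these PIT values should look like independent draws from Uniform(0,1).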


  • Application in VaR Model Validation
    • By plotting the empirical distribution of PIT values and testing for uniformity (e.g., using Kolmogorov-Smirnov tests or QQ plots), financial institutions can evaluate whether the VaR model provides accurate probability forecasts across different quantiles, not just at a fixed confidence level. This approach ensures that the model captures both central tendencies and extreme tail risks, addressing potential weaknesses in standard exceedance-based tests.
  • Example: Deriving Probability Integral Transforms (PITs) for Validating a VaR Model
    Let’s take a simple example of a bank monitoring daily portfolio losses using a Value-at-Risk (VaR) model. The bank estimates the 1-day-ahead loss distribution and checks whether its model is correctly predicting risks.

    Step 1: Obtain the Forecasted VaR Distribution
    Suppose the bank’s risk model forecasts that daily losses will follow a normal distribution with:
    Expected daily loss (mean) = 0
    Expected daily variation (standard deviation) = 2
    The model predicts that:
    The 95% VaR is 3.29 (i.e., 1.645 times 2)
     The 99% VaR (exceeded only 1% of the time) is 4.65 (i.e., 2.326 times 2)
     This means the model expects 95% of daily losses to be below 3.29 and 99% of losses to be below 4.65.

    Step 2: Compute the CDF of the Forecasted Distribution

    The bank tracks actual losses over time and compares them with the predicted distribution.
    Day 1: The actual observed loss is 3.0
    The model estimates that 93.3% of predicted losses should be less than or equal to 3.0 (since, under N(0, 2), Φ(3.0/2) = Φ(1.5) ≈ 0.933).
    This means that, according to the model, 3.0 falls at the 93.3rd percentile of the expected loss distribution.

    Step 3: Apply the Probability Integral Transform (PIT)

    The PIT value is simply the percentile rank of the observed loss within the forecasted distribution.
    For Day 1, we calculated:
    The PIT value = 0.933 (or 93.3%)
    This means the actual loss of 3.0 was larger than expected, but not extreme—it falls just below the 95% VaR threshold.
    For other days:
    Day 2: Observed loss = 1.5, PIT = 0.77 (loss is smaller than expected, in the lower 77% of the forecasted range).
    Day 3: Observed loss = 5.2, PIT = 0.995 (this is higher than 99% of expected losses, an extreme case).

    Step 4: Interpretation of PIT Values

    If the model is accurate, the PIT values for multiple days should be uniformly distributed between 0 and 1, meaning:
    Some PIT values should be low (around 0.1), some mid-range (0.5), and some high (0.9).
    This is discussed in detail in the next section as part of the 3rd learning objective of this reading.
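The worked example above can be checked directly with scipy, assuming (as in Step 1) a N(0, 2) forecast distribution; small differences from the quoted VaR figures are rounding.

```python
from scipy.stats import norm

mu, sigma = 0.0, 2.0  # Step 1: assumed forecast distribution N(0, 2)

# VaR thresholds implied by the model (approximately 3.29 and 4.65)
var_95 = norm.ppf(0.95, loc=mu, scale=sigma)
var_99 = norm.ppf(0.99, loc=mu, scale=sigma)

# Steps 2-3: PIT = percentile rank of each observed loss under the forecast CDF
pits = {day: norm.cdf(loss, loc=mu, scale=sigma)
        for day, loss in [("Day 1", 3.0), ("Day 2", 1.5), ("Day 3", 5.2)]}
```

Day 1 lands near 0.93 (below the 95% VaR), Day 2 near 0.77, and Day 3 near 0.995, matching the interpretations above.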

Backtesting with PITs

  • Backtesting with Probability Integral Transforms (PITs) evaluates how well a VaR model forecasts the full loss distribution, rather than just checking exceedances at a single confidence level (as in traditional exceedance-based backtesting).
  • A well-calibrated model should produce PIT values that are uniformly distributed between 0 and 1. If PITs deviate significantly from uniformity, it signals model misspecification, meaning the model either overestimates or underestimates risk.
  • Backtesting using PITs involves:
    1. Computing PIT values for each observed loss using the cumulative distribution function (CDF) of the forecasted loss distribution.
    2. Plotting the PIT distribution to visually assess uniformity.
    3. Applying statistical goodness-of-fit tests to formally check if the PIT distribution significantly deviates from uniformity.
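A minimal sketch of these three steps, with simulated Uniform(0,1) draws standing in for the PITs of a well-calibrated model:

```python
import numpy as np
from scipy import stats

# Stand-in for step 1: PITs from a (here, perfectly calibrated) model
rng = np.random.default_rng(1)
pits = rng.uniform(size=500)

# Step 2: a roughly flat histogram across the bins suggests uniformity
hist, _ = np.histogram(pits, bins=10, range=(0.0, 1.0))

# Step 3: formal goodness-of-fit test against Uniform(0,1)
ks = stats.kstest(pits, "uniform")
```

For a real model, `pits` would come from evaluating each day's forecast CDF at the realized loss, and a small `ks.pvalue` would flag miscalibration.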

Goodness-of-Fit Tests to Evaluate PITs

  • Several statistical tests can be used to assess whether PIT values are uniformly distributed. The three most common tests are:
  1. Kolmogorov-Smirnov (K-S) Test
    • Purpose: Measures the largest absolute difference between the empirical CDF of the PIT values and the expected uniform CDF.
    • Test Statistic: The KS test statistic is given by
      \[D = \max_i \left| z_i - A_i \right|\]
      where:

      • \( z_i \) is the theoretical CDF value under the null hypothesis (Uniform[0,1] distribution) at the i-th ordered PIT value.
      • \( A_i \) is the empirical CDF of the observed PIT values at that point.
      • \( D \) is the maximum absolute difference between the two distributions. If the PIT values follow a true uniform distribution, D should be close to zero under the null hypothesis.
    • Strengths:
      • Simple and widely used.
      • Sensitive to large deviations from uniformity.
    • Weaknesses:
      • Less sensitive to differences in the tails of the distribution.
      • Only captures the maximum deviation, ignoring overall distribution shape.
  2. Anderson-Darling (A-D) Test
    • Purpose: A refinement of the K-S test that gives more weight to deviations in the tails of the distribution. 
    • Test Statistic: The AD test statistic is given by
      \[
      A^2 = -n - \frac{1}{n} \sum_{j=1}^{n} (2j - 1) \left[ \log(z_j) + \log(1 - z_{n+1-j}) \right]
      \]
      where:

      • \( z_j \) are the ordered PIT values.
      • \( n \) is the sample size.
      • The logarithm terms focus on tail deviations, making the test more sensitive to extreme values.
    • Strengths:
      • Better at detecting misspecifications in the tails, which are crucial in risk management.
      • More powerful than K-S when testing uniformity over the entire range.
    •  Weaknesses:
      • Slightly more complex than K-S.
      • Still not optimal for capturing small, systematic deviations across the whole range.
  3. Cramér-von Mises (CVM) Test
    • Purpose: Measures the average squared difference between the empirical and expected CDFs.
    • Test Statistic: The CVM test statistic is given by
      \[
      W^2 = \sum_{j=1}^{n} \left[ z_j - \frac{2j-1}{2n} \right]^2 + \frac{1}{12n}
      \]
      where:

      • \( z_j \) are the ordered PIT values.
      • \( n \) is the sample size.
      • The first term captures squared deviations from the expected CDF, while the second term ensures proper scaling.
    • Strengths:
      • Captures deviations across the entire distribution, not just the extremes.
      • More balanced than K-S and A-D in detecting general non-uniformity.
    •  Weaknesses:
      • Less sensitive to specific deviations in the tails compared to A-D.
      • Computationally more intensive than K-S.
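All three statistics can be computed in Python. K-S and Cramér-von Mises tests against a uniform null are available in `scipy.stats`; `scipy.stats.anderson` does not support the uniform distribution, so the A-D statistic is coded directly from the formula above. The PITs here are simulated for illustration.

```python
import numpy as np
from scipy import stats

def anderson_darling_uniform(pits):
    """A^2 statistic for H0: PITs ~ Uniform(0,1), from the formula in the text."""
    z = np.sort(np.asarray(pits))
    n = len(z)
    j = np.arange(1, n + 1)
    # z[::-1] supplies z_{n+1-j} for j = 1..n
    return -n - np.mean((2 * j - 1) * (np.log(z) + np.log(1 - z[::-1])))

rng = np.random.default_rng(2)
pits = rng.uniform(size=1000)  # simulated PITs from a well-calibrated model

D = stats.kstest(pits, "uniform").statistic            # Kolmogorov-Smirnov
A2 = anderson_darling_uniform(pits)                    # Anderson-Darling
W2 = stats.cramervonmises(pits, "uniform").statistic   # Cramér-von Mises
```

In practice, each statistic is compared against its own critical values; for well-calibrated PITs all three should be small, while tail misspecification tends to inflate A² first.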

