This article describes techniques for determining the best prediction model using the skill score method.
A Climate Seasonal Forecast Assessment refers to the systematic evaluation of seasonal climate forecasts, which typically predict weather patterns over several months. This process involves analyzing the accuracy, skill, and reliability of these forecasts to determine how closely they match the actual weather outcomes (observation). The primary goal of this assessment is to verify whether the forecasted seasonal conditions, such as temperature, precipitation, and wind patterns, align with the observed climate data for the given period.
The verification aspect plays a crucial role in this assessment, as it measures the forecast’s skill and consistency. Verification involves comparing the predicted values with real-world data, often using statistical techniques to determine the forecast’s accuracy. The results of the verification help meteorologists refine forecasting models, identify areas for improvement, and better understand the underlying climatic variables that influence seasonal patterns. Validation is the confirmation, through testing and the provision of objective evidence, that a model meets the requirements for a specific purpose. Verification, on the other hand, is the process of comparing model calculations (forecasts) with actual values (observations), and is generally equated with validation (Jolliffe and Stephenson 2011).
The type of verification is adapted to the type of forecast; here are some examples of forecast types and the corresponding verification methods (WWRP 2017):
Nature of forecast: | Example(s): | Verification methods: |
---|---|---|
deterministic (non-probabilistic) | quantitative precipitation forecast | visual, dichotomous, multi-category, continuous, spatial |
probabilistic | probability of precipitation, ensemble forecast | visual, probabilistic, ensemble |
qualitative (worded) | 5-day outlook | visual, dichotomous, multi-category |
Space-time domain: | ||
time series | daily maximum temperature forecasts for a city | visual, dichotomous, multi-category, continuous, probabilistic |
spatial distribution | map of geopotential height, rainfall chart | visual, dichotomous, multi-category, continuous, probabilistic, spatial, ensemble |
pooled space and time | monthly average global temperature anomaly | dichotomous, multi-category, continuous, probabilistic, ensemble |
Specificity of forecast: | ||
dichotomous (yes/no) | occurrence of fog | visual, dichotomous, probabilistic, spatial, ensemble |
multi-category | cold, normal, or warm conditions | visual, multi-category, probabilistic, spatial, ensemble |
continuous | maximum temperature | visual, continuous, probabilistic, spatial, ensemble |
object- or event-oriented | tropical cyclone motion and intensity | visual, dichotomous, multi-category, continuous, probabilistic, spatial |
Allan Murphy (Murphy 1993), a pioneer in the field of forecast verification, distinguishes three types of ‘goodness’ of a forecast: consistency (agreement between the forecast and the forecaster’s judgement), quality (agreement between the forecast and the observations), and value (the benefit the forecast provides to its users).
In addition, Murphy also mentioned that there are 8 aspects (attributes) that show the quality of a forecast, including:
No. | Aspects/Attributes | Description | Scores/metrics that can be used |
---|---|---|---|
1 | Bias | Difference between predicted and observed means | Mean Error (ME), Mean Square Error (MSE), Scatter plot |
2 | Accuracy | The degree of agreement between prediction and observation | Continuous Rank Probability Score (CRPS), Taylor Diagram |
3 | Uncertainty | The variability of the observed values; the greater the uncertainty in the observations, the more difficult the event is to predict | Verification Rank Histogram (VRH) |
4 | Sharpness | The ability of the forecast to predict extreme values. Sharpness is “only” possessed by predictions (not observations); even poor predictions still have the attribute of sharpness | Ensemble Spread (SPRD), Spread Skill Relationship (SSR) |
5 | Resolution | The ability of the forecast to distinguish situations whose outcomes differ from the climatological probability of the event, and whether the prediction system predicts these differences correctly; often measured through the mean squared probabilistic prediction error | Brier Score (BS), Relative Operating Characteristic (ROC) |
6 | Discrimination | The forecast’s ability to clearly distinguish a situation that leads to the occurrence or non-occurrence of an event | Relative Operating Characteristic (ROC) |
7 | Reliability | Statistical consistency between the probabilistic prediction of an event and the actual frequency of occurrence | Reliability Diagram (RD), Brier Score (BS) |
8 | Skill | The accuracy of a prediction model relative to a reference (e.g. climatology), or the improvement in accuracy gained from an improved prediction system; skill measures the superiority of a prediction system over a baseline of past observations | Skill Score (SS): BSS, CRPSS, ROCSS |
Some meteorological agencies in other countries have evaluated their predictions using several verification metrics, for example:
\(~\)
There are several approaches that are commonly used to assess seasonal climate forecasts (Wilks 2011; Jolliffe and Stephenson 2011; WWRP 2017), such as:
No. | Metric | Formulation | Description |
---|---|---|---|
1 | Mean Error (ME) | \(\text{ME}=\frac{\sum_{i=1}^{n}\left(f_i-o_i\right)}{n}\) | \(n=\) Number of data pairs (forecast & observation); \(f_i=\) Forecast value; \(o_i=\) Observation value; Best Score \(=0\); Verification method = continuous; |
2 | Mean Absolute Error (MAE) | \(\text{MAE}=\frac{\sum_{i=1}^{n}\left|f_i-o_i\right|}{n}\) | \(n=\) Number of data pairs (forecast & observation); \(f_i=\) Forecast value; \(o_i=\) Observation value; Best Score \(=0\); Verification method = continuous; |
3 | Root Mean Square Error (RMSE) | \(\text{RMSE}=\sqrt{\frac{\sum_{i=1}^{n} (f_i - o_i)^2}{n}}\) | \(n=\) Number of data pairs (forecast & observation); \(f_i=\) Forecast value; \(o_i=\) Observation value; Best Score \(=0\); Verification method = continuous; |
4 | Boxplot | (boxplot illustration) | \(Q_1=(n+1)*0.25\); \(Q_2=(n+1)*0.5\); \(Q_3=(n+1)*0.75\); \(IQR=Q_3-Q_1\); Best Score = Closest to Observation; Verification method = continuous, visual; |
5 | Scatter plot | (scatter plot illustration) | Best Score = points gather around the diagonal; Verification method = continuous, visual; |
6 | Correlation Coefficient \((r)\) | \(r=\frac{\sum{(f_i-\overline{f})(o_i-\overline{o})}}{\sqrt{\sum{(f_i-\overline{f})^2}}\sqrt{\sum{(o_i-\overline{o})^2}}}\) | \(f_i=\) i-th forecast value; \(\overline{f}=\) mean of forecast values; \(o_i=\) i-th observation value; \(\overline{o}=\) mean of observation values; Best Score = 1; Verification method = continuous; |
7 | Dichotomous Contingency Table | (2×2 contingency table: \(a\) = hits, \(b\) = false alarms, \(c\) = misses, \(d\) = correct negatives, \(n = a+b+c+d\)) | \(\text{HR} = \frac{a}{a + c} = \hat{p}(\hat{x} = 1 \mid x = 1)\); \(\text{PC or Accuracy} = \frac{a + d}{n} = \hat{p}\left[(\hat{x} = 1, x = 1) \, \text{or} \, (\hat{x} = 0, x = 0)\right]\); \(\text{FAR} = \frac{b}{a + b} = \hat{p}(x = 0 \mid \hat{x} = 1)\); \(\text{CSI} = \frac{a}{a + b + c}\); ROC |
8 | Multi-Category Contingency Table | (\(K \times K\) contingency table of forecast vs. observed categories) | \(\text{Accuracy} = \frac{1}{N} \sum_{i=1}^{K}n(F_i, O_i)\); \(\text{HSS} = \frac{\frac{1}{N} \sum_{i=1}^{K} n(F_i, O_i)-\frac{1}{N^2} \sum_{i=1}^{K} N(F_i) N(O_i)}{1 - \frac{1}{N^2} \sum_{i=1}^{K} N(F_i) N(O_i)}\); Best Score Accuracy = 1; Best Score HSS = 1 |
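To make the formulas above concrete, here is a minimal base-R sketch of the continuous scores and the dichotomous contingency-table scores; the `fcst` and `obs` vectors and the 50 mm event threshold are hypothetical illustration values, not data from this study.

```r
# Hypothetical paired forecast and observation values (illustration only)
fcst <- c(12, 30, 55, 0, 8, 100)
obs  <- c(10, 42, 60, 5, 0, 85)

me   <- mean(fcst - obs)            # Mean Error (bias); best score = 0
mae  <- mean(abs(fcst - obs))       # Mean Absolute Error; best score = 0
rmse <- sqrt(mean((fcst - obs)^2))  # Root Mean Square Error; best score = 0
r    <- cor(fcst, obs)              # Correlation coefficient; best score = 1

# Dichotomous scores for a yes/no event, here "rainfall >= 50 mm" (assumed threshold)
event_f <- fcst >= 50
event_o <- obs  >= 50
hits        <- sum( event_f &  event_o)   # a
false_alarm <- sum( event_f & !event_o)   # b
misses      <- sum(!event_f &  event_o)   # c
corr_neg    <- sum(!event_f & !event_o)   # d
HR  <- hits / (hits + misses)               # Hit Rate
PC  <- (hits + corr_neg) / length(obs)      # Proportion Correct (accuracy)
FAR <- false_alarm / (hits + false_alarm)   # False Alarm Ratio
CSI <- hits / (hits + false_alarm + misses) # Critical Success Index
```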
BMKG’s prediction of the onset of the Indonesian wet and dry seasons is based on rainfall values obtained from the output of several models. These models comprise dynamical and statistical models, each with its own strengths and weaknesses. The number of models used inevitably adds uncertainty to the prediction of season onset. To address this, several approaches can be taken, for example selecting the single best model on the basis of a skill score, or combining all model outputs using skill-score-based weights; both are demonstrated below.
Some of the metrics described above can be used to evaluate the rainfall output of the models used in seasonal prediction. Here we use an evaluation metric called the Taylor Skill Score (TSS) (Xiaoli Yang and Sheffield 2020), commonly called the Skill Score (SS), which is based on the Taylor Diagram. The Taylor Diagram displays three quantities (Correlation Coefficient, RMSE, and \({NSD}_m\)) in a single graph; judging each of these outputs separately makes it difficult to determine the best model. The TSS combines them into a single value that expresses the skill of the rainfall prediction model being evaluated.
\[\begin{equation} \tag{1} SS = \frac{4(1 + CC)^4}{\left( \text{NSD}_m + \frac{1}{\text{NSD}_m} \right)^2 (1 + CC_0)^4} \end{equation}\] \[\begin{equation} \tag{2} \text{NSD}_m = \frac{\sigma_{\text{mod}}}{\sigma_{\text{obs}}} \end{equation}\]
Where: \[ \text{NSD}_m = \text{Normalized standard deviation of the simulation (model)} \]
\[ \sigma_{\text{mod}} = \text{Standard deviation of the simulation (model)} \]
\[ \sigma_{\text{obs}} = \text{Standard deviation of the observations} \]
\[ m = \text{subscript denoting the model (simulation)} \]
\[ \text{CC}_0 = \text{Maximum attainable correlation coefficient (usually taken as 1)} \]
\[ \text{CC} = \text{Correlation coefficient between the simulated (model) and observed data} \]
The closer the SS is to 1, the better the ability of the individual model to represent the observations.
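As an illustration, equations (1) and (2) can be written as a small R helper; `taylor_ss` is a hypothetical function name used only in this sketch, with `mod` and `obs` as paired numeric vectors of model and observed values.

```r
# Sketch of eq. (1) and (2); `taylor_ss` is a hypothetical helper name
taylor_ss <- function(mod, obs, cc0 = 1) {
  cc   <- cor(mod, obs)        # CC: correlation between model and observations
  nsdm <- sd(mod) / sd(obs)    # NSDm: normalized standard deviation, eq. (2)
  4 * (1 + cc)^4 / ((nsdm + 1 / nsdm)^2 * (1 + cc0)^4)  # SS, eq. (1)
}
```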
\(~\)
Here is an example of rainfall forecast data from several models (initialized in January 2011) together with observations, covering 21 dasarian (10-day periods) for ZOM Aceh_01:
CODE TO CREATE A TAYLOR DIAGRAM
```r
library(openair)   # TaylorDiagram()
library(Metrics)   # rmse()
library(tidyr)     # pivot_longer()
#data <- read.csv('path/to/your/data.csv')
# Reshape to long format: one row per (dekad, model), keeping 'Jan_2011' and 'OBS'
t_dt3 <- pivot_longer(data = data,
                      cols = colnames(data[!colnames(data) %in% c('Jan_2011', 'OBS')]),
                      names_to = 'MODEL', values_to = 'CH')
t_dt3$MODEL <- factor(t_dt3$MODEL, levels = unique(t_dt3$MODEL))
# Normalised Taylor Diagram comparing each model (CH) against the observations (OBS)
p2 <- TaylorDiagram(t_dt3, obs = "OBS", mod = "CH", group = "MODEL", normalise = TRUE,
                    cols = c('cornflowerblue', 'blue', 'yellow2', 'deeppink',
                             'deeppink4', 'orange1', 'orange4'),
                    text.obs = 'Observation', key = TRUE, main = 'Taylor Diagram')
```
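The summary table below (RMSE, CC, NSDm, SS, and the weight W) can be derived along the following lines. This is a sketch that assumes `data` is the data frame loaded above (one rainfall column per model plus `OBS` and the `Jan_2011` dekad column) and reuses the hypothetical `taylor_ss()` helper defined earlier.

```r
# Per-model verification summary (sketch)
models <- setdiff(colnames(data), c('Jan_2011', 'OBS'))
summary_tab <- sapply(models, function(m) {
  c(RMSE = Metrics::rmse(data$OBS, data[[m]]),   # Metrics::rmse(actual, predicted)
    CC   = cor(data[[m]], data$OBS),
    NSDm = sd(data[[m]]) / sd(data$OBS),
    SS   = taylor_ss(data[[m]], data$OBS))
})
# Weight of each model relative to the others (see eq. (3) further below)
summary_tab <- rbind(summary_tab, W = summary_tab['SS', ] / sum(summary_tab['SS', ]))
round(summary_tab, 7)
```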
Metric | ECMWF_RAW | ECMWF_COR | MME1 | CFSv2_RAW | CFSv2_COR | ARIMA | WARIMA |
---|---|---|---|---|---|---|---|
RMSE | 42.9393894 | 32.8570424 | 38.5247122 | 46.4539831 | 31.4249754 | 43.5126742 | 42.0122118 |
CC | 0.6972082 | 0.7312276 | 0.5420013 | 0.4109492 | 0.7422360 | 0.3165167 | 0.4026214 |
NSDm | 0.6112191 | 0.5841298 | 0.6217545 | 0.6297243 | 0.7395452 | 0.3083050 | 0.3535605 |
SS | 0.4107342 | 0.4259743 | 0.2842024 | 0.2014521 | 0.5264507 | 0.0595302 | 0.0955697 |
W | 0.2049660 | 0.2125712 | 0.1418237 | 0.1005293 | 0.2627113 | 0.0297070 | 0.0476915 |
Based on the example above, the best model for ZOM ACEH_01 at the January 2011 initialization is CFSv2_COR. The SS calculation should be repeated over the whole available data period (in this example 2011 - 2020) and for every initialization month (January - December), so that an SS is obtained for each month of each of the 10 years. From all of these values, the 10-year average SS can be computed and the model with the highest average SS selected as the best model.
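A sketch of that averaging step, assuming `ss_aceh01` is the data frame used in the boxplot code further below (metadata in the first two columns and one SS column per model, one row per year; the exact layout is an assumption):

```r
# Average SS per model over all years for one ZOM and one initialization month
mean_ss    <- colMeans(ss_aceh01[, 3:ncol(ss_aceh01)], na.rm = TRUE)
best_model <- names(which.max(mean_ss))   # model with the highest 10-year average SS
```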
BEST MODEL FOR EACH ZOM BASED ON AVERAGE SS 2011-2020
Based on the highest Skill Score (maximum 1), we obtain the best model in each ZOM for each initialization. From the tabular results above, we can display the Skill Score and the best model for each ZOM spatially.
BEST MODEL FOR EACH ZOM BASED ON AVERAGE SS 2011-2020 ON INITIAL JANUARY
Then, we can also examine the distribution of Skill Scores over the 10 years (2011 - 2020) using boxplots, to see where the skill scores are concentrated.
EXAMPLE OF SKILL SCORE DISTRIBUTION IN SEVERAL ZOMS IN A GIVEN MONTH (2011-2020)
SS FOR 2011-2020 AT ZOM ACEH_01
CODE TO CREATE A BOXPLOT
```r
#ss_aceh01 <- read.csv('path/to/your/data.csv')
plot.new()
# Boxplots of the yearly skill scores for each model (SS values start in column 3)
boxplot(x = ss_aceh01[, 3:ncol(ss_aceh01)], lwd = 2, yaxt = 'n', ylim = c(0, 1),
        col = 'white', medlwd = 2, cex.axis = 0.5)
# Mark the mean SS of each model with a blue circle
points(x = 1:ncol(ss_aceh01[, 3:ncol(ss_aceh01)]),
       apply(X = ss_aceh01[, 3:ncol(ss_aceh01)], MARGIN = 2, FUN = mean),
       pch = 21, col = 'blue3', lwd = 2)
# Overlay the individual yearly SS values as jittered red dots
for (i in 1:(ncol(ss_aceh01) - 2)) {
  points(jitter(rep(i, nrow(ss_aceh01))), ss_aceh01[[i + 2]], col = "red", pch = 16, cex = 0.45)
}
# Annotate each box with its mean SS
text(x = 1:ncol(ss_aceh01[, 3:ncol(ss_aceh01)]),
     y = apply(X = ss_aceh01[, 3:ncol(ss_aceh01)], MARGIN = 2, FUN = mean) + 0.02,
     labels = paste0('Mean = ', round(apply(X = ss_aceh01[, 3:ncol(ss_aceh01)], MARGIN = 2, FUN = mean), 2)),
     cex = 0.5)
axis(side = 2, at = seq(0, 1, 0.1), labels = seq(0, 1, 0.1), cex.axis = 0.8)
# Dashed background gridlines
segments(x0 = seq(0, 7, 0.5), y0 = 0, y1 = 1, lty = 2, lwd = 0.5, col = 'lightgrey')
segments(x0 = 0, x1 = 7, y0 = seq(0, 1, 0.1), lty = 2, lwd = 0.5, col = 'lightgrey')
mtext(~ italic('Skill Score (SS)'), side = 2, line = 2, cex = 0.8)
title(main = 'Boxplot Skill Score ZOM : ACEH_01', line = 0.5)
mtext(~ italic('Period 2011 - 2020'), side = 1, line = par('usr')[3] - 1, col = "black", cex = 0.6, at = par('usr')[2] - 0.7)
mtext('Init : JAN', side = 3, font = 3, line = par('usr')[3] - 1, col = "darkblue", cex = 0.8, at = par('usr')[2] - 0.4)
```
From the highest Skill Score results, we know the best model for each initialization in each ZOM. For season-onset prediction, we can use the best model corresponding to the initialization month of the seasonal rainfall prediction; the season onset is then determined from the rainfall values predicted by this best model.
Another option, if we still want to use all of the seasonal rainfall model outputs, is to weight each model output. The weights can be derived from the Skill Score of each model calculated earlier. Mathematically, this can be formulated as follows:
\[\begin{equation} \tag{3} W_i = \frac{SS_i}{\sum_{j=1}^n SS_j} \end{equation}\]
\[\begin{equation} \tag{4} \sum_{i=1}^n W_i = 1 \end{equation}\]
Where: \[ SS_i = \text{$i$-th model Skill Score, where } i = 1, 2, \dots, n \]
\[ W_i = \text{$i$-th model Weight, where } i = 1, 2, \dots, n \]
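A minimal sketch of equations (3) and (4) in R; the `ss` values here are placeholders rather than the exact BALI_11 values tabulated below (whose weights may have been derived from the yearly skill scores rather than from the displayed averages).

```r
# Skill-score-based weights: each weight is proportional to the model's SS
ss <- c(MME1 = 0.48, ECMWF_RAW = 0.43, ECMWF_COR = 0.54, CFSv2_RAW = 0.07,
        CFSv2_COR = 0.51, ARIMA = 0.54, WARIMA = 0.47)   # hypothetical SS values
w  <- ss / sum(ss)   # eq. (3)
sum(w)               # = 1, eq. (4)
```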
As practice, we will do the weight calculation for ZOM BALI_11 with the February initialization. The resulting weights (W) sum to 1.
ZOM | TYPE | INIT | MME1 | ECMWF_RAW | ECMWF_COR | CFSv2_RAW | CFSv2_COR | ARIMA | WARIMA |
---|---|---|---|---|---|---|---|---|---|
BALI_11 | SS | Feb | 0.4786296 | 0.4334377 | 0.5389604 | 0.0695950 | 0.5097606 | 0.5386813 | 0.4729669 |
BALI_11 | W | Feb | 0.1543724 | 0.1430077 | 0.1771860 | 0.0249932 | 0.1750009 | 0.1737069 | 0.1517329 |
Next, we weight the seasonal rainfall prediction initialized in February 2024 (generally used for dry-season prediction), multiplying each model’s output by the weight (W) obtained in the previous calculation; a code sketch of this step follows the data table below.
SEASONAL RAINFALL PREDICTION DATA INITIAL FEBRUARY 2024
DAS | MME1 | ECMWF_RAW | ECMWF_COR | CFSv2_RAW | CFSv2_COR | ARIMA | WARIMA |
---|---|---|---|---|---|---|---|
Feb I | 163.8225 | 96.9621 | 131.9956 | 55.9008 | 202.7789 | 107.4265 | 92.3780 |
Feb II | 72.1478 | 83.7558 | 114.0584 | 29.5072 | 107.0710 | 99.2311 | 92.3780 |
Feb III | 81.4403 | 75.8253 | 103.3301 | 30.6811 | 111.3555 | 87.2150 | 85.6466 |
Mar I | 25.0803 | 97.1974 | 107.0056 | 49.4504 | 151.6484 | 75.1727 | 84.5403 |
Mar II | 120.4728 | 93.7277 | 103.1808 | 75.5555 | 231.6704 | 74.0638 | 80.1041 |
Mar III | 169.5966 | 85.7359 | 94.4199 | 83.6646 | 256.5225 | 63.1690 | 44.2504 |
Apr I | 29.1713 | 74.2389 | 60.4546 | 68.8359 | 74.1464 | 50.7099 | 40.5221 |
Apr II | 38.4456 | 63.8937 | 52.0237 | 70.8249 | 76.2927 | 50.6072 | 8.5760 |
Apr III | 60.3156 | 55.0750 | 44.8474 | 75.9045 | 81.7458 | 38.5234 | 38.8828 |
May I | 23.8206 | 44.4667 | 28.0870 | 72.5172 | 34.3506 | 28.6324 | 10.1355 |
May II | 31.2781 | 39.5156 | 24.9615 | 66.9831 | 31.7315 | 24.5754 | 24.9866 |
May III | 32.9556 | 29.2346 | 18.4672 | 123.3038 | 58.4087 | 28.1022 | 7.3104 |
Jun I | 19.5028 | 24.1109 | 15.9519 | 60.1589 | 27.3728 | 13.4558 | 7.2317 |
Jun II | 8.7416 | 22.2295 | 14.7116 | 39.5339 | 17.9883 | 12.1809 | 0.0000 |
Jun III | 5.8922 | 14.4046 | 9.5367 | 36.8783 | 16.7778 | 12.9941 | 2.9549 |
Jul I | 10.4091 | 9.8209 | 9.3945 | 32.3297 | 22.2831 | 8.1527 | 2.2763 |
Jul II | 15.2450 | 10.0609 | 9.6385 | 35.1482 | 24.2216 | 5.2905 | 14.9037 |
Jul III | 19.0328 | 10.5429 | 10.0975 | 27.4801 | 18.9454 | 0.0000 | 8.1212 |
Aug I | 1.3834 | 6.5196 | 3.7593 | 22.5320 | 8.8758 | 14.0602 | 13.5241 |
Aug II | 3.0947 | 6.9658 | 4.0088 | 17.2212 | 6.7787 | 12.4973 | 10.8369 |
Aug III | 7.1116 | 6.9975 | 4.0227 | 25.4537 | 9.9893 | 12.2050 | 3.6793 |
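The weighting itself (each model column multiplied by its weight, then summed per dasarian) could be coded along these lines; `pred` (the prediction table above) and the named weight vector `w` are assumed names for this sketch.

```r
# Multiply each model column by its weight and sum per dasarian (RRw)
model_cols <- names(w)                                          # model column order
weighted   <- sweep(pred[, model_cols], 2, w[model_cols], `*`)  # weight each column
pred_w     <- cbind(DAS = pred$DAS, weighted, RRw = rowSums(weighted))
```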
WEIGHTED FOR INITIAL FEBRUARY 2024
DAS | MME1 | ECMWF_RAW | ECMWF_COR | CFSv2_RAW | CFSv2_COR | ARIMA | WARIMA | RRw |
---|---|---|---|---|---|---|---|---|
Feb I | 25.2896762 | 13.8663244 | 23.3877699 | 1.3971397 | 35.486489 | 18.6607283 | 14.0167813 | 132.104908 |
Feb II | 11.1376307 | 11.9777222 | 20.2095495 | 0.7374793 | 18.737521 | 17.2371305 | 14.0167813 | 94.053814 |
Feb III | 12.5721364 | 10.8435998 | 18.3086451 | 0.7668188 | 19.487312 | 15.1498505 | 12.9954065 | 90.123769 |
Mar I | 3.8717067 | 13.8999741 | 18.9598922 | 1.2359236 | 26.538606 | 13.0580195 | 12.8275444 | 90.391666 |
Mar II | 18.5976780 | 13.4037804 | 18.2821913 | 1.8883735 | 40.542527 | 12.8653959 | 12.1544269 | 117.734373 |
Mar III | 26.1810380 | 12.2608917 | 16.7298826 | 2.0910458 | 44.891667 | 10.9728935 | 6.7142412 | 119.841660 |
Apr I | 4.5032442 | 10.6167324 | 10.7117076 | 1.7204292 | 12.975686 | 8.8086614 | 6.1485355 | 55.484997 |
Apr II | 5.9349404 | 9.1372895 | 9.2178703 | 1.7701407 | 13.351291 | 8.7908217 | 1.3012613 | 49.503615 |
Apr III | 9.3110653 | 7.8761477 | 7.9463306 | 1.8970961 | 14.305588 | 6.6917818 | 5.8997998 | 53.927809 |
May I | 3.6772437 | 6.3590794 | 4.9766226 | 1.8124367 | 6.011386 | 4.9736465 | 1.5378887 | 29.348303 |
May II | 4.8284761 | 5.6510341 | 4.4228279 | 1.6741218 | 5.553041 | 4.2689175 | 3.7912891 | 30.189707 |
May III | 5.0874358 | 4.1807722 | 3.2721289 | 3.0817562 | 10.221575 | 4.8815471 | 1.1092281 | 31.834443 |
Jun I | 3.0106945 | 3.4480437 | 2.8264530 | 1.5035632 | 4.790265 | 2.3373658 | 1.0972868 | 19.013672 |
Jun II | 1.3494620 | 3.1789891 | 2.6066893 | 0.9880786 | 3.147969 | 2.1159068 | 0.0000000 | 13.387094 |
Jun III | 0.9095932 | 2.0599683 | 1.6897695 | 0.9217066 | 2.936130 | 2.2571653 | 0.4483555 | 11.222689 |
Jul I | 1.6068780 | 1.4044641 | 1.6645737 | 0.8080226 | 3.899562 | 1.4161805 | 0.3453896 | 11.145071 |
Jul II | 2.3534076 | 1.4387859 | 1.7078071 | 0.8784659 | 4.238802 | 0.9189966 | 2.2613815 | 13.797646 |
Jul III | 2.9381394 | 1.5077156 | 1.7891354 | 0.6868156 | 3.315462 | 0.0000000 | 1.2322532 | 11.469521 |
Aug I | 0.2135588 | 0.9323528 | 0.6660953 | 0.5631467 | 1.553273 | 2.4423543 | 2.0520508 | 8.422832 |
Aug II | 0.4777363 | 0.9961629 | 0.7103032 | 0.4304128 | 1.186279 | 2.1708677 | 1.6443142 | 7.616076 |
Aug III | 1.0978349 | 1.0006962 | 0.7127660 | 0.6361693 | 1.748136 | 2.1200932 | 0.5582708 | 7.873967 |
The seasonal rainfall prediction value that we will use is the sum of the weighted model values in each dasarian, shown in the RRw column of the table above.