First look at model evaluation results

Early in the week I wrapped up TRAPS readmes. Afterwards, I began looking at the preliminary model results.

I first compared the modeled bottom DO to ORCA buoy data. Then I practiced running the model-model-data comparison scripts in Parker’s obsmod folder.

I used the model-model-data comparison scripts to compare the new model run to observational data as well as the GRC run (WWTPs as tiny river implementation).

Preliminary results suggest that the new model run appears to outperform the GRC run in some cases, but underperform in other. More details below.

Model evaluation

This week the model evaluation run has continued chugging along on klone, and so far it has made it to the middle of August.

ORCA buoy DO comparison

When I was first looking at preliminary results from our “WWTP as tiny river” tests for GRC, I compared modeled DO to ORCA buoy DO. In the timeseries, we observed that the model tended to overestimate bottom DO.

Now that we have new results with fully functional TRAPS and improved biogeochemistry, I have decided to re-make the same ORCA buoy comparison figure. Figure 1 shows a timeseries of bottom DO from the GRC run, the new run, and the ORCA buoy observations.

Note that the depths of comparison are not the same. ORCA buoy bottom DO data refers to the deepest DO measurement available. Model bottom DO is taken from the bottom sigma layer, and the average depth of the bottom sigma layer is the “model depth.”

Fig 1. Model-model-ORCA buoy comparison of bottom DO.

In general, the new results have a higher initial bottom DO compared to prior observations and the ORCA measurements. At Twanoh, Dabob Bay, and Hoodsport, the bottom DO in the new model run remains higher than observations and the GRC run. However, bottom DO at Point Wells, North Buoy, and Carr Inlet decreases more significantly in the new model run compared to the GRC run. At Point Wells and Carr Inlet, specifically, the new model run appears to do a better job capturing the obersved summer DO depletion than the GRC run.

This preliminary analysis suggests that the new model run has mixed performance depending on the location– either improving DO skill in Main Basin and South Sound, or overestimating DO in Hood Canal.

We can also conclude that the new results are influenced by a combination of both initial conditions and model dynamics (which is good, since we changed both).

obsmod

This week I also began exploring some of Parker’s model skill assessment scripts in his obsmod folder. In addition to creating modeled-observed plots, the script also calculates the following statistics:

\[\mathrm{Bias} = \mathrm{mean}\big( \mathrm{modeled} - \mathrm{observed}\big)\] \[\mathrm{RMSE} = \sqrt{\mathrm{mean}\big[ (\mathrm{modeled} - \mathrm{observed})^2\big]}\]

The figures below show model-model-data comparison plots. Every plot compares the new model run and the GRC run to one of the following set of observations:

NCEI Salish
- shallow bottle data
- deep bottle data
NCEI Coastal (no data for new run yet)
- shallow bottle data
- deep bottle data
Ecology
- shallow bottle data
- deep bottle data
DFO
- shallow bottle data
- deep bottle data

Fig 2. Model-model-data comparison. New run compared to GRC run compared to nceiSalish shallow bottle data.

Fig 3. Model-model-data comparison. New run compared to GRC run compared to nceiSalish deep bottle data.

Fig 4. Model-model-data comparison. New run compared to GRC run compared to nceiCoastal shallow bottle data.

Fig 5. Model-model-data comparison. New run compared to GRC run compared to nceiCoastal deep bottle data.

Fig 6. Model-model-data comparison. New run compared to GRC run compared to Ecology shallow bottle data.

Fig 7. Model-model-data comparison. New run compared to GRC run compared to Ecology deep bottle data.

Fig 8. Model-model-data comparison. New run compared to GRC run compared to DFO shallow bottle data.

Fig 9. Model-model-data comparison. New run compared to GRC run compared to DFO deep bottle data.

The table below provides a summary of whether the new model run has improved or worsened bias/RMSE compared to the GRC run. “Improved” implies that the new run has lower bias and RMSE compared to the GRC run. “Worsened” implies that the new run has higher bias and RMSE compared to the GRC run.

Variable	Salinity	Temp	DO	NO3	NH4	DIN	DIC	TA	Chl
NCEI Salish (shallow)	worsened	worsened RMSE same bias	improved	worsened	improved RMSE same bias	worsened	improved	improved	N/A
NCEI Salish (deep)	worsened	worsened RMSE same bias	worsened RMSE improved bias	worsened	improved	worsened	improved	improved	N/A
NCEI Coastal (shallow)	N/A	N/A	N/A	N/A	N/A	N/A	N/A	N/A	N/A
NCEI Coastal (deep)	N/A	N/A	N/A	N/A	N/A	N/A	N/A	N/A	N/A
Ecology (shallow)	worsened	worsened	worsened RMSE improved bias	worsened	immproved RMSE worsened bias	worsened	N/A	N/A	N/A
Ecology (deep)	worsened	worsened	worsened	worsened	improved	worsened	N/A	N/A	N/A
DFO (shallow)	worsened	improved RMSE worsened bias	improved		worsened	N/A	N/A	N/A	improved
DFO (deep)	worsened	worsened	improved	worsened	N/A	N/A	N/A	N/A	improved

In general, the new model appears to have less skill in capturing salinity, temperature, NO3, and DIN. The new model tends to improve DIC, TA, and Chl. There are mixed results for DO and NH4.