Quantifying Uncertainty in Ecosystem Studies


### Uncertainty in prediction of individuals vs. uncertainty in prediction


**Ruth Yanai** (New member)

- Registered: 2/19/2015
- Posts: 9

I now think we need to include error in individual tree predictions when we estimate uncertainty in biomass models. Pierre Bernier, David Pare, and their colleagues in Quebec have been including the error in individual trees (Bernier et al. 2010), and ignoring the error in the mean prediction (we did the opposite in Yanai et al. 2010). We finally got together at IUFRO with some biometricians, who said we are both right (and both wrong); we should do both.

*Last edited by quantifying uncertainty (3/16/2017 2:02 pm)*

**quantifying uncertainty** (Administrator)

- Registered: 2/19/2015
- Posts: 4

**Propagate both the PI and the CI**

*From Isabelle Auger, statistician in Canada, via Sylvie Tremblay (Oct 21, 2011). This email to Ruth Yanai refers to the equations in the Ecosystems paper by Yanai et al. (2010) (available from the QUEST web site or from Ecosystems, open access) and to the Excel sheet in which the calculations were made, also available under "Sample Code."*

Applying the mean error of ŷ at a specified value of x (sm, eq. 5) to each tree is equivalent to estimating the error of the equation.

However, 1) it underestimates the error of the equation, and 2) it doesn't include the error of a prediction.

1) it underestimates the error of the equation

Isabelle calculated the error of an equation, fitted to 853 observations, in two ways: 1) with sm, as you did, and 2) from the errors of the equation's parameters. She concluded that using sm underestimates the error of the equation, because the error of the parameters is a function of MSE/fct(X) (compared to MSE/n). The same conclusion held with a smaller sample (n = 30).
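For reference, the parameter errors Isabelle refers to come from the standard least-squares result (a general fact, not specific to this spreadsheet): the covariance of the fitted parameters depends on the full design matrix X, not simply on n:

```latex
\operatorname{Var}(\hat{\beta}) = \mathrm{MSE}\,(X^\top X)^{-1},
\qquad
\operatorname{Var}(\hat{y}_0) = x_0^\top \operatorname{Var}(\hat{\beta})\, x_0
```

That is, the equation error has the form MSE/fct(X) rather than MSE/n, which is the comparison Isabelle is making.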

2) it doesn't include the error of a prediction

Once the error of the equation is estimated, the prediction error for each tree remains to be estimated. This error is small when the number of trees being estimated is large, because the errors tend to cancel out, but it can be larger when there are only a few trees per plot.

Isabelle thinks that the way you calculate the error in your Excel sheet mixes up these two sources of error. In an equation there are two sources of error: one from the parameters (a constant error for all trees) and one from the residual error (different for each tree).

The first should be simulated from a multivariate normal distribution using the parameter errors and the correlations between them. If that information is not available, an error drawn from N(0, MSE) could be used (constant for all trees); see 1). This error does not depend on the Xs.

To this first error a second one should be added, from the residual error. This error applies at the tree level, so it depends on the Xs. It should come from N(0, Var), where Var = MSE*(1/n + (xi-xbar)^2/sum(xi-xbar)^2). In your Excel sheet you used: prediction + N(0, MSE)*sqrt(1/n + (xi-xbar)^2/sum(xi-xbar)^2). It is strange to replace the sigma^2 in a formula with a random number; the variance formula should be used to generate a random number. Isabelle thinks the formula should be: prediction + N(0, MSE) + N(0, Var), where the draw from N(0, MSE) is the same for all trees and the draw from N(0, Var) is different for each tree.

For these two reasons, Isabelle suggests that the error applied to each tree in a simulation should be: 1) the same equation error for all trees, plus 2) a different prediction error for each tree.
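Isabelle's suggestion can be sketched as a short Monte Carlo loop. This is a minimal illustration with made-up numbers; the MSE, sample size, regression coefficients, and tree sizes below are placeholders, not values from the Yanai et al. spreadsheet:

```python
import numpy as np

rng = np.random.default_rng(42)

# Placeholder regression summaries (illustrative values only)
n = 30                    # observations used to fit the equation
mse = 0.04                # residual mean squared error
xbar, sxx = 3.2, 25.0     # mean of x and sum of (xi - xbar)^2 from the fit

def predict(x):
    """Placeholder biomass equation; stands in for the fitted model."""
    return 1.5 + 0.9 * x

x_trees = rng.uniform(2.0, 4.5, size=50)   # trees on the plot being estimated
n_iter = 10_000
totals = np.empty(n_iter)

for i in range(n_iter):
    # 1) equation (parameter) error: ONE draw per iteration, shared by all trees
    eq_err = rng.normal(0.0, np.sqrt(mse))
    # 2) prediction error: a separate draw for EACH tree, with a
    #    variance that depends on each tree's x
    var_i = mse * (1/n + (x_trees - xbar)**2 / sxx)
    pred_err = rng.normal(0.0, np.sqrt(var_i))
    totals[i] = np.sum(predict(x_trees) + eq_err + pred_err)

print(f"plot total: {totals.mean():.2f} +/- {totals.std():.2f}")
```

Because `eq_err` is shared within an iteration, its contribution to the plot total does not average out as the number of trees grows, while the independent per-tree `pred_err` draws partially cancel, which is the behavior the email describes.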

**remarra** (New member)

- Registered: 3/16/2017
- Posts: 1

As I'm finishing up the uncertainty analysis on my dataset, I've run into a question about how you're calculating the confidence and prediction intervals (uncertainty in prediction of the mean and of an individual, respectively). Referring to the workshop spreadsheet ("Uncertaintyinbiomassregression5_3_000"), and to column G of the "Biomass Calculations" worksheet: is it correct to add both the mean-prediction error (col. E) and the individual-prediction error (col. F) to the regression, rather than just the latter? Looking at the equations for each, it seems that the mean-prediction error is already incorporated into the individual-prediction error; otherwise, the mean-prediction error is being counted twice. My understanding of the prediction interval is that it incorporates both the uncertainty in knowing the true (i.e., population) mean and the uncertainty in the model (regression) parameters.

I’ve been doing some reading on this, and I’m even further convinced on this point.

The point I tried to make is that uncertainty in knowing the value of the population mean is already being accounted for when accounting for the uncertainty in predicting an individual value. So by adding the mean-prediction-error term to the individual-prediction-error term, the former is being accounted for twice.

Put another way, when we calculate a confidence interval, we’re determining the upper and lower limits of the range within which the true mean lies. When we calculate a prediction interval, we’re determining the upper and lower limits of the range within which individual future observations lie; those observations are based on the estimate of the parameter itself.

Uncertainty in predicting an individual takes into account the variability in the conditional distribution as well as the uncertainty in the estimate of the conditional mean; the latter is an inextricable part of the former.
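This decomposition can be checked directly from the standard simple-regression interval formulas: at any x0, the prediction variance equals the confidence variance plus the residual MSE, so the PI already contains the CI. A quick numeric check with placeholder values (not values from the workshop spreadsheet):

```python
import numpy as np

# Placeholder regression summaries (illustrative values only)
n, mse = 30, 0.04         # sample size and residual mean squared error
xbar, sxx = 3.2, 25.0     # mean of x and sum of (xi - xbar)^2
x0 = 2.5                  # point at which the intervals are evaluated

# Variance of the MEAN prediction at x0 (confidence interval)
var_ci = mse * (1/n + (x0 - xbar)**2 / sxx)
# Variance of an INDIVIDUAL prediction at x0 (prediction interval)
var_pi = mse * (1 + 1/n + (x0 - xbar)**2 / sxx)

# The PI variance decomposes as CI variance + residual variance (MSE)
assert np.isclose(var_pi, var_ci + mse)
print(f"CI var = {var_ci:.5f}, PI var = {var_pi:.5f}, "
      f"difference = {var_pi - var_ci:.5f} (= MSE)")
```

So adding a separate mean-prediction error on top of the individual-prediction error does count the CI component twice, which is exactly the concern raised above.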

**Ruth Yanai** (New member)

- Registered: 2/19/2015
- Posts: 9

Thank you for thinking about this. It makes me uneasy that the combined uncertainty (PI and CI) has not been used before. The problem is that when propagating uncertainty using Monte Carlo, applying the PI to individuals results in decreasing uncertainty with increasing sample size. This is related to the Central Limit Theorem: with the PI randomly applied to each individual, and with the errors averaging zero, the mean error approaches zero as the sample size approaches infinity. Clearly, the uncertainty from the PI should never be smaller than that from the CI, yet that is what happens in the Monte Carlo. Perhaps what we need is a version of the PI that excludes, rather than includes, the CI. Then we could combine them and get the right answer.
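This shrinkage is easy to reproduce in a few lines: if independent N(0, s^2) errors are applied to each tree, the error of the plot *mean* shrinks like 1/sqrt(N) and eventually falls below any fixed equation-level error. The SEs below are placeholder numbers chosen only to show the effect:

```python
import numpy as np

rng = np.random.default_rng(0)
s_pi = 0.25    # placeholder per-tree prediction SE
s_ci = 0.05    # placeholder equation-level (mean prediction) SE

sds = []
for n_trees in (10, 100, 2500):
    # independent per-tree errors, averaged over the plot, repeated 2000 times
    mean_err = rng.normal(0.0, s_pi, size=(2000, n_trees)).mean(axis=1)
    sds.append(mean_err.std())
    print(f"N = {n_trees:5d}: SD of plot-mean error = {sds[-1]:.4f} "
          f"(equation-level SE = {s_ci})")
```

By N = 2500 the simulated uncertainty per tree is an order of magnitude below the fixed equation-level SE, which is the implausible behavior described above; a shared equation-level error draw does not cancel this way.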

This is related to a problem common in error propagation: measurement error already contributes to sampling error, so adding it counts it twice; we should subtract it instead. Both measurement error and sampling error contribute to model error.

Can you suggest how we should propagate error in the model? I can’t accept that we should use the PI alone, randomly applied to individuals, if this results in zero model error when taken ad absurdum to infinite samples.

I’ve tried asking statisticians about this problem. One in my department initially gave the answer you gave, that the PI includes the CI. But when I took him through the steps of the Monte Carlo, he agreed that the PI alone was not sufficient. The CI works in the case of large samples. He suggested that the fact that small samples have larger uncertainty is related (causally? or analogously? I wasn’t sure) to the t-distribution. I tried asking Jim Clark, but he wasn’t interested; maybe he will be more interested with a more explicit exposition of the problem. I’ll also send this back to the biometrician who is a co-author on the paper we are trying to finish. There was also a statistician in Canada who suggested the approach we are using now, some years ago, when I protested the use of the PI alone; I will consult her as well.

If there is a better way, I want to know before we publish our illustration of the problem.
