Computational Statistics 2024-03-21

4. Cross Validation

4.1 Training and Test Set

We impose the $\iid$ assumption, i.e. $(\rvec X_i, Y_i) \sim F_{\rvec X, Y}$ for $i = 1, \ldots, n$. Recall that the estimate $\mhat$ is constructed by applying some estimator to the data $(\rvec X_1, Y_1), \ldots, (\rvec X_n, Y_n)$. We would then like to evaluate the accuracy of the estimated target, which is based on the training data. A principal problem is that if we use the training data again to measure the predictive power of the estimated target, e.g. $\mhat$, the results will be overly optimistic. Thus, we could look at how well the estimated target predicts new test data that were not used for fitting.
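
The optimism of the training error can be illustrated with a small simulation. The following is a minimal sketch and not part of the original notes: the true regression function $\sin(3x)$, the Gaussian noise level, the polynomial estimator of degree 8, the sample sizes and the helper name `simulate_once` are all assumptions chosen for illustration. It fits $\mhat$ on a training sample and compares the mean squared error on that same sample with the error on an independent test sample from the same distribution.

```python
import numpy as np


def simulate_once(rng, n_train=30, n_test=1000, degree=8, sigma=0.3):
    """Fit a polynomial on training data; return (training MSE, test MSE).

    All settings here are illustrative assumptions, not taken from the notes.
    """
    def draw(n):
        # i.i.d. pairs (X_i, Y_i): X uniform on [-1, 1], Y = sin(3X) + noise
        x = rng.uniform(-1.0, 1.0, size=n)
        y = np.sin(3.0 * x) + rng.normal(0.0, sigma, size=n)
        return x, y

    x_train, y_train = draw(n_train)
    x_test, y_test = draw(n_test)

    # The estimate m-hat: a least-squares polynomial fit on the training data
    coeffs = np.polyfit(x_train, y_train, degree)

    def mse(x, y):
        return float(np.mean((np.polyval(coeffs, x) - y) ** 2))

    return mse(x_train, y_train), mse(x_test, y_test)


rng = np.random.default_rng(0)
# Repeat the experiment to average out the randomness of a single sample
results = np.array([simulate_once(rng) for _ in range(200)])
mean_train_mse, mean_test_mse = results.mean(axis=0)

print(f"mean training MSE: {mean_train_mse:.3f}")
print(f"mean test MSE:     {mean_test_mse:.3f}")
```

Under these assumptions the average training error should fall below the noise variance $0.3^2 = 0.09$, while the average test error should lie above it, because the fit has partly adapted to the noise in the training sample.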