I am currently using a Bayesian network model with 20 variables and 210 data points: 15 locations, each measured at 14 time points. There are also some restrictions on which types of connections are allowed.
I have looked at leave-one-out cross-validation (and arguably leave-two-out would be computationally feasible at this sample size), and that is the method I am currently using. However, I want to know which of the following, if any, are appropriate alternatives:
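For concreteness, the leave-one-out scheme I'm using looks roughly like the sketch below. `fit_model` and `score` are placeholders for whatever structure-learning routine and held-out scoring rule (e.g. predictive log-likelihood) you actually use:

```python
import numpy as np

def loo_cv(data, fit_model, score):
    """Leave-one-out cross-validation over N data points.

    fit_model(train) -> fitted model; score(model, held_out) -> float.
    Both are stand-ins for a Bayesian-network learner and scorer.
    """
    n = len(data)
    scores = []
    for i in range(n):
        train = np.delete(data, i, axis=0)   # drop the i-th row
        model = fit_model(train)
        scores.append(score(model, data[i:i + 1]))
    return np.mean(scores)
```

With 210 points this means 210 refits, which is why leave-two-out (about 22,000 refits) is near the edge of feasibility.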
k-fold cross-validation, leaving out a single location (14 points) per fold. My main concern is that if the model were to be applied to locations not sampled, correlations within a single location's set of measurements may be unduly influential on the model. Additionally, it has been suggested that k-fold cross-validation produces better results for simpler models such as linear regression [1]. Since my sample size is small, however, I'm not sure how much data I can afford to hold out.
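A leave-one-location-out fold structure would look something like this (a sketch, with `fit_model` and `score` again as placeholders, and `locations` an array of one label per row):

```python
import numpy as np

def leave_one_location_out(data, locations, fit_model, score):
    """Hold out all time points from one location per fold.

    locations: array of location labels, one per row of data.
    fit_model and score are placeholders for the BN learner/scorer.
    """
    scores = []
    for loc in np.unique(locations):
        test_mask = locations == loc
        model = fit_model(data[~test_mask])          # train on 14 locations
        scores.append(score(model, data[test_mask])) # test on the 15th
    return np.mean(scores)
```

This directly measures generalization to unsampled locations, since no fold ever shares a location between training and test sets.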
Bagging (bootstrap aggregating) [3]. I've mostly seen this method applied to decision trees, but it is used more generally to avoid overfitting.
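A minimal bagging sketch, with `fit_model` and `predict` hypothetical. Note that for BN structure learning along the lines of [3] one would aggregate structural features (e.g. edge frequencies) across bootstrap replicates rather than averaging predictions as shown here:

```python
import numpy as np

def bagged_predict(data, fit_model, predict, x, n_bags=50, seed=0):
    """Bootstrap-aggregate predictions from models fit on resampled data.

    fit_model(sample) -> model; predict(model, x) -> float.
    Both are stand-ins for the actual learner and query.
    """
    rng = np.random.default_rng(seed)
    n = len(data)
    preds = []
    for _ in range(n_bags):
        idx = rng.integers(0, n, size=n)   # resample rows with replacement
        model = fit_model(data[idx])
        preds.append(predict(model, x))
    return np.mean(preds)
```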
Jackknifing to determine the properties of the estimators being derived.
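The standard jackknife bias and standard-error estimates would look something like this, with `statistic` a placeholder for whichever estimator's sampling properties are of interest:

```python
import numpy as np

def jackknife(data, statistic):
    """Jackknife bias correction and standard error for `statistic`.

    statistic(sample) -> float, a stand-in for the estimator of interest.
    """
    n = len(data)
    full = statistic(data)
    # Recompute the statistic with each observation left out in turn.
    leave_one = np.array([statistic(np.delete(data, i, axis=0))
                          for i in range(n)])
    bias = (n - 1) * (leave_one.mean() - full)
    se = np.sqrt((n - 1) / n * np.sum((leave_one - leave_one.mean()) ** 2))
    return full - bias, se   # bias-corrected estimate, standard error
```

As with the location-based folds, a grouped variant (deleting a whole location rather than a single row) may be more honest given the within-location correlation.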
Additionally, one interesting property of Bayesian networks (at least for the datasets used in Zuk et al. [2]) is that underfitting decays exponentially fast with sample size, whereas overfitting decays only as a power of N. With that in mind, I am trying to decide which methods are most appropriate for my scenario.
Sources used:
1. FAQs.org, "What are cross-validation and bootstrapping?"
2. Zuk et al., "On the Number of Samples Needed to Learn the Correct Structure of a Bayesian Network".
3. Elidan, Gal, "Bagged Structure Learning of Bayesian Networks".
I asked this question on Cross Validated, but the only response was that leave-one-out cross-validation is preferable to bagging, and the explanation was not very detailed.