
Closed Form Missing Value Imputation Question


Sorry if this is a trivial question; I haven't had any applicable formal education. I've been looking into how to do missing value imputation, and I'm surprised that most methods seem to be iterative and involve the whole data set at each iteration. I was expecting the imputation of missing values to have a closed-form solution for linear model methods such as PCA.

I'm also not so much interested in imputing missing values in a fixed data set; instead, I keep getting new online samples with some missing values (usually about 1% to 5% missing), and would like to fill them in before their real values arrive. My data set so far contains about 8 million samples of 1000 values each (30 GB). All values have a mean of zero and a standard deviation of one, and each is highly correlated with at least a handful of other values.

Before I had read anything about imputation, I assumed I could first build a covariance matrix estimate from my data set (I'm OK with assuming the covariances are not changed by any missing data). After that I wouldn't need the data anymore, except perhaps a random sample of it that fits in memory to tune some hyper-parameters, such as the number of principal components to use in the case of PCA, or the noise and regularization for a marginalized denoising autoencoder.

I also expected that the missing values would generally be calculated with a closed-form solution as a function of the known values and the weights of the linear reconstruction model, possibly with a regularization term as well.

I was writing some code to handle the large amount of data, estimate the covariance matrix, and calculate a reconstruction matrix. Then I started reading to find the formula to do the imputation in closed form... but I couldn't find anything on that, just iterative methods.
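For the covariance step, here is a minimal chunked sketch of what I have in mind (the function name is mine; NaNs marking missing values are handled with pairwise counts, which is a simplifying assumption given the data is already standardized to zero mean):

```python
import numpy as np

def streaming_covariance(chunks, d):
    """Accumulate a covariance estimate over chunks of samples that do not
    all fit in memory together. Assumes values are already standardized
    (zero mean), so cov ~= X^T X / n. Missing values (NaN) are handled
    with pairwise counts of jointly observed entries."""
    S = np.zeros((d, d))      # running sum of pairwise products
    n = np.zeros((d, d))      # pairwise counts of jointly observed values
    for X in chunks:          # X: (batch, d) array with NaN for missing
        M = np.isfinite(X).astype(float)
        Z = np.nan_to_num(X)  # zero out missing entries
        S += Z.T @ Z          # missing entries contribute nothing
        n += M.T @ M          # count how often each pair was observed
    return S / np.maximum(n, 1)
```

With no missing values this reduces to the usual X^T X / n estimate, and each chunk only costs two d-by-d matrix products.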

Am I missing something here? Is this a bad or impossible way to approach the problem?

I would really like to know how to calculate the missing values from the known values and a weight matrix W (W = P * P^T, where P is a matrix whose columns are the first few principal components).
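Concretely, I imagine building W from the covariance estimate roughly like this (my own sketch; k, the number of components, is the hyper-parameter I'd tune on a sample that fits in memory):

```python
import numpy as np

def reconstruction_matrix(cov, k):
    """Build W = P @ P.T from the top-k principal components of a
    covariance matrix. W is then the orthogonal projector onto the
    k-dimensional PCA subspace."""
    vals, vecs = np.linalg.eigh(cov)   # eigenvalues in ascending order
    P = vecs[:, -k:]                   # top-k principal directions
    return P @ P.T                     # symmetric, idempotent, rank k
```

Because W is a projector, reconstructing a sample is just one matrix-vector product, a @ W.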

I think I would need to find the missing values of A that minimize something like

||A * W - A||^2 + lambda * ||A_missing||^2

where lambda is the regularization strength, A is a row vector of input values, and A_missing contains only the originally missing values.

I suspect the solution involves the differences between the reconstructed values and the input values, projected back onto the missing coordinates (a bit like back-propagation) and then multiplied by the inverse of some matrix, but I can't quite work out the exact form.
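To make the question concrete, here is my attempt at the kind of closed form I'm after, derived by setting the gradient of the objective above with respect to the missing entries to zero (my own derivation and naming, so treat it as a sketch rather than an established method):

```python
import numpy as np

def impute_closed_form(a, missing, W, lam=0.1):
    """Fill the missing entries of one sample `a` (1-D, length d) by
    minimizing ||a @ W - a||^2 + lam * ||a_missing||^2 in closed form.
    `missing` is a boolean mask; `W` is a d x d reconstruction matrix."""
    d = len(a)
    E = W - np.eye(d)        # residual operator: a @ E = a @ W - a
    obs = ~missing
    E_o = E[obs]             # rows of E for observed coordinates
    E_m = E[missing]         # rows of E for missing coordinates
    # Zero gradient w.r.t. the missing entries gives the linear system
    #   a_m (E_m E_m^T + lam I) = -a_o E_o E_m^T
    # which only involves a small (n_missing x n_missing) matrix.
    rhs = -(a[obs] @ E_o @ E_m.T)
    A = E_m @ E_m.T + lam * np.eye(int(missing.sum()))
    out = a.copy()
    out[missing] = np.linalg.solve(A, rhs)   # A is symmetric
    return out
```

If this is right, the per-sample cost is one solve in only the missing coordinates (1% to 5% of 1000 values here), with no iteration over the data set.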

If anyone could help or suggest a better method, I would greatly appreciate it.

Thank you!

submitted by diyweatherman
