The data I am using is fabricated from a few minutes of sound, but the fabrication process bloats it so much that it has no hope of fitting within my 4GB RAM limit. What I have been doing is walking down my sound data, fabricating small pieces of training data one at a time, and training on scrambled versions of those pieces. This is causing problems with learning, though, and I really need to scramble my entire training set. I can't scramble the sound first, because fabricating training data from sound takes into account local samples (components of a sound file) in a neighborhood of a few hundred samples, so those neighborhoods have to stay intact.
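For concreteness, here is a minimal sketch of that chunk-at-a-time approach, assuming each example is just a fixed window of samples; `make_example`, the window size, and the chunk size are placeholders for whatever the real fabrication step actually does:

```python
import numpy as np

WINDOW = 400        # "a neighborhood of a few hundred samples" per example
CHUNK = 10_000      # how many examples to fabricate and hold in RAM at once

def make_example(sound, start):
    # Placeholder: the real fabrication step would do its feature extraction
    # on this local window of samples.
    return sound[start:start + WINDOW]

def chunked_examples(sound, rng):
    """Walk down the sound, fabricate CHUNK examples at a time, and shuffle
    only within the chunk before handing it to the learner."""
    starts = np.arange(0, len(sound) - WINDOW)
    for lo in range(0, len(starts), CHUNK):
        batch = np.stack([make_example(sound, s) for s in starts[lo:lo + CHUNK]])
        rng.shuffle(batch)              # in-place shuffle of rows: local only
        yield batch

rng = np.random.default_rng(0)
sound = rng.standard_normal(100_000)    # stand-in for the real audio samples
for chunk in chunked_examples(sound, rng):
    pass                                 # train on `chunk` here
```

The `rng.shuffle(batch)` line is the problem: it only ever mixes examples that came from the same stretch of sound.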
One solution would be to fabricate all of the data, put it in one massive file on my hard drive, scramble it there, and then do constant disk reads during learning, but this seems like a pain and will probably slow learning down considerably; I have a feeling the I/O would take longer than the matrix multiplications that are the current bottleneck in learning.
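One way to do the "one big file" version without ever physically scrambling the file would be to store the fabricated examples as fixed-size rows and read them through a memory map in a freshly permuted order each epoch. This is only a sketch; the sizes, dtype, and path are made up:

```python
import numpy as np

N_EXAMPLES = 1_000_000      # made-up dataset size
EXAMPLE_DIM = 400           # made-up example length
BATCH = 256
PATH = "training_data.f32"  # made-up path

# One-time fabrication pass, written chunk by chunk so RAM stays bounded.
out = np.memmap(PATH, dtype=np.float32, mode="w+",
                shape=(N_EXAMPLES, EXAMPLE_DIM))
for lo in range(0, N_EXAMPLES, 10_000):
    hi = min(lo + 10_000, N_EXAMPLES)
    out[lo:hi] = np.random.standard_normal((hi - lo, EXAMPLE_DIM))  # fabricate here
out.flush()

# Training pass: pull batches off the disk in a random order, one batch in RAM
# at a time; a new permutation gives a fresh global shuffle every epoch.
data = np.memmap(PATH, dtype=np.float32, mode="r",
                 shape=(N_EXAMPLES, EXAMPLE_DIM))
perm = np.random.permutation(N_EXAMPLES)
for lo in range(0, N_EXAMPLES, BATCH):
    batch = np.asarray(data[perm[lo:lo + BATCH]])  # fancy indexing copies to RAM
    # train on `batch` here
```

The catch is exactly the I/O worry above: the permuted reads are random access, so on a spinning disk the seeks could easily dominate; reading larger shuffled blocks at a time (or using an SSD) would soften that.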
EDIT:
Scrambling it on my hard drive would take forever with the easy-to-implement, obvious algorithm, since each read and write would only touch one training example. I would probably use a USB drive for that.
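A standard alternative to the one-example-per-I/O algorithm is a two-pass external shuffle: deal examples into a handful of bucket files with big sequential reads and buffered writes, then shuffle each bucket in RAM and concatenate. Rough sketch, with made-up file names and sizes:

```python
import numpy as np

EXAMPLE_DIM = 400   # made-up example length
N_BUCKETS = 16      # each bucket then has to fit in RAM on its own

def external_shuffle(src_path, dst_path, n_examples, rng):
    rows = np.memmap(src_path, dtype=np.float32, mode="r",
                     shape=(n_examples, EXAMPLE_DIM))
    # Pass 1: deal every example into a uniformly random bucket, reading the
    # source sequentially in big chunks and doing buffered sequential writes.
    buckets = [open(f"bucket_{b}.f32", "wb") for b in range(N_BUCKETS)]
    assign = rng.integers(0, N_BUCKETS, size=n_examples)
    for lo in range(0, n_examples, 10_000):
        chunk = rows[lo:lo + 10_000]
        labels = assign[lo:lo + 10_000]
        for b in range(N_BUCKETS):
            buckets[b].write(np.ascontiguousarray(chunk[labels == b]).tobytes())
    for f in buckets:
        f.close()
    # Pass 2: each bucket is small enough to load, shuffle in RAM, and append.
    with open(dst_path, "wb") as out:
        for b in range(N_BUCKETS):
            data = np.fromfile(f"bucket_{b}.f32", dtype=np.float32)
            data = data.reshape(-1, EXAMPLE_DIM)
            rng.shuffle(data)           # in-place row shuffle
            out.write(data.tobytes())
```

Because each example lands in a uniformly random bucket and each bucket is shuffled in full, the concatenated output is a shuffle of the whole set, but all the disk traffic happens as large sequential reads and writes instead of one example at a time.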