Hi. I've primarily been a reader of this subreddit until now, and I'm looking for some advice on my graduate research.
I'm working on an M.S. in Engineering, specifically in Machine Learning. I've almost settled on the exact direction I want to take my research in, which involves developing an algorithm and evaluating it thoroughly on synthetic and real-world data.
My problem is to address and overcome the following hurdles:
Concept drift (closely related to covariate shift, domain adaptation, and non-stationary data) - In a batch learning scenario, adjacent time-series 'snapshots' of the data can be drawn from different data distributions if there is a hidden context in the data, causing the decision boundaries to change over time.
Inductive Learning - Since labeling real-world data can be expensive, the algorithm should be able to cope with receiving only unlabeled data for a series of consecutive time steps until labeled data eventually arrives.
Imbalanced data - Prior probabilities for a given class may be heavily lopsided at a given series of time steps.
The restrictions on the algorithm are strictly on data retention: training and test data cannot be stored and reused in future time steps. The idea here is to deal with the stability-plasticity dilemma in an environment that changes gradually (simply defined as not completely at random) over time.
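To make the setup concrete, here is a rough sketch of a synthetic stream that has all three properties at once. It is written in Python purely for illustration, and every name and default value in it (`make_drifting_batches`, `minority_prior`, `label_every`, the rotating Gaussian class means) is an assumption made for the example, not part of any existing method or benchmark. Each batch is generated, handed to the learner, and then discarded, which mirrors the no-retention restriction.

```python
import math
import random


def make_drifting_batches(n_steps=20, batch_size=200, minority_prior=0.1,
                          label_every=5, drift_per_step=math.pi / 40, seed=0):
    """Yield (t, X, y, labeled) for each time step.

    Hypothetical toy generator: two 2-D Gaussian classes whose means
    rotate slowly around the origin. All parameters are illustrative.
    """
    rng = random.Random(seed)
    for t in range(n_steps):
        # Gradual drift: the class means move a small angle each step,
        # so the ideal decision boundary rotates over time.
        angle = t * drift_per_step
        centers = {
            0: (2.0 * math.cos(angle), 2.0 * math.sin(angle)),
            1: (-2.0 * math.cos(angle), -2.0 * math.sin(angle)),
        }
        X, y = [], []
        for _ in range(batch_size):
            # Imbalanced priors: class 1 is drawn with probability minority_prior.
            label = 1 if rng.random() < minority_prior else 0
            cx, cy = centers[label]
            X.append((rng.gauss(cx, 1.0), rng.gauss(cy, 1.0)))
            y.append(label)
        # Delayed labels: only every label_every-th batch comes with labels.
        labeled = (t % label_every == 0)
        yield t, X, (y if labeled else None), labeled


# Usage: consume one batch at a time and keep nothing from earlier steps.
n_labeled = 0
for t, X, y, labeled in make_drifting_batches():
    if labeled:
        n_labeled += 1   # a learner would update on (X, y) here
    # otherwise a learner would have to adapt using X alone
print("labeled steps:", n_labeled)
```

A generator like this only covers the synthetic side of the evaluation; the drift speed, class overlap, and labeling schedule would all need to be varied to say anything about robustness.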
I was wondering if anyone had any advice or information as to what direction I can explore or what the current methods are that can be used for motivation or a basis for improvement. Another great piece of information would be where I can find real-world data to experiment with. The UCI Machine Learning repository doesn't seem to have anything that specifically relates to concept drift.
I've read a couple of books that summarize some of the methods found here (we use MATLAB for implementation at my school), so I am familiar with these concepts as well as with ensemble learning methods that use a variety of base learning models (SVM, MLP, Naive Bayes).
I've seen a lot of great knowledge shared on here and this seems like a great place to get advice for an aspiring ML student.
Thank you in advance!