Given ample training data of compiling and non-compiling source-code strings in a single language (treat the language as completely unknown, albeit consistent), construct a self-configuring solution that classifies new source-code strings from that same language as either compiling or non-compiling.
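To make the task concrete, here is a minimal baseline sketch, not a proposed solution: a character n-gram Naive Bayes classifier trained on labeled strings. The class name `NgramClassifier`, the helper `ngrams`, and the toy samples are all hypothetical and exist only for illustration. A model like this only sees surface statistics, so it cannot be expected to capture nested or long-range structure; it simply shows the input/output shape of the problem.

```python
import math
from collections import Counter


def ngrams(text, n=3):
    """Return the character n-grams of text, with start/end padding markers."""
    padded = "\x02" * (n - 1) + text + "\x03"
    return [padded[i:i + n] for i in range(len(padded) - n + 1)]


class NgramClassifier:
    """Hypothetical baseline: Naive Bayes over character n-grams."""

    def __init__(self, n=3):
        self.n = n
        # One n-gram count table and one sample count per label.
        self.counts = {True: Counter(), False: Counter()}
        self.docs = {True: 0, False: 0}

    def fit(self, samples):
        """samples: iterable of (source_string, compiles) with compiles a bool."""
        for text, compiles in samples:
            self.counts[compiles].update(ngrams(text, self.n))
            self.docs[compiles] += 1
        return self

    def _log_score(self, text, label):
        counts = self.counts[label]
        total = sum(counts.values())
        # Shared vocabulary size (+1 for unseen n-grams) for add-one smoothing.
        vocab = len(set(self.counts[True]) | set(self.counts[False])) + 1
        # Smoothed log prior of the label.
        score = math.log((self.docs[label] + 1) / (sum(self.docs.values()) + 2))
        for gram in ngrams(text, self.n):
            score += math.log((counts[gram] + 1) / (total + vocab))
        return score

    def predict(self, text):
        """Return True if the string looks more like the compiling samples."""
        return self._log_score(text, True) > self._log_score(text, False)


# Toy data for an imaginary language where statements must end in ';'.
toy_samples = [
    ("a=1;", True),
    ("b=2;", True),
    ("a=1", False),
    ("b=2", False),
]
clf = NgramClassifier(n=2).fit(toy_samples)
print(clf.predict("a=2;"), clf.predict("b=1"))
```

The interesting part of the problem is exactly what this baseline lacks: inferring the grammar-like structure that decides whether a string compiles, rather than memorizing local character patterns.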
A solution to this problem could, in theory, comprehend (or tractably index) the foundational structure of any language (programming, spoken, written, mathematical, scientific, genetic, viral, chemical, or otherwise).
If anyone else is actively working in this problem domain, I'm looking for you.