Channel: Machine Learning

Ask r/ML: Seeking advice on how ML techniques can be used to parse data from XML files

We have about 30,000 XML files that contain financial information. We're trying to extract 10 pieces of data from each file, but the XML tags that hold the data are not named consistently. Say we're looking for 'revenue': some files use 'revenues', some use 'TotalRevenue', and so on.

Are there any ML techniques that would allow us to extract this data that are better than the naive approach of building a dictionary of terms, scoring matches, and taking the one with the highest score?
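For concreteness, the naive approach could look roughly like the sketch below. The field names, the synonym lists in `FIELD_TERMS`, and the helper `best_tag_for_field` are all made up for illustration, and the scoring uses `difflib.SequenceMatcher` from the standard library as just one possible string-similarity measure.

```python
import difflib
import xml.etree.ElementTree as ET

# Hypothetical dictionary: target field -> known spellings (lowercase).
FIELD_TERMS = {
    "revenue": ["revenue", "revenues", "totalrevenue"],
    "net_income": ["netincome", "netprofit"],
}


def score(tag, term):
    """Similarity in [0, 1] between an XML tag name and a dictionary term."""
    return difflib.SequenceMatcher(None, tag.lower(), term).ratio()


def best_tag_for_field(xml_text, field):
    """Return (score, tag, text) for the element whose tag best matches `field`."""
    root = ET.fromstring(xml_text)
    best = (0.0, None, None)
    for elem in root.iter():
        tag = elem.tag.split("}")[-1]  # drop an XML namespace prefix, if any
        s = max(score(tag, term) for term in FIELD_TERMS[field])
        if s > best[0]:
            best = (s, tag, elem.text)
    return best


sample = "<report><TotalRevenue>1200</TotalRevenue><NetIncome>300</NetIncome></report>"
print(best_tag_for_field(sample, "revenue"))
```

In practice you would probably also want a minimum score threshold, so that a file with no plausible match returns nothing instead of the least-bad tag.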

I would imagine that some sort of supervised learning algorithm that uses n-grams (or some other Markov-like approach) might yield interesting results. I'm just getting into NLP (hence the fascination with n-grams), so please forgive any misuse of terms.
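To show what an n-gram representation of tag names might look like, here is a minimal sketch that compares tags by their character trigrams using Jaccard overlap. This is an unsupervised similarity measure, not the supervised model described above; the same n-gram sets could instead be fed to a classifier as features. The function names are hypothetical.

```python
def char_ngrams(text, n=3):
    """Set of character n-grams of `text`, lowercased, with boundary markers."""
    padded = f"^{text.lower()}$"
    return {padded[i:i + n] for i in range(len(padded) - n + 1)}


def ngram_similarity(a, b, n=3):
    """Jaccard overlap between the character n-gram sets of two strings."""
    grams_a, grams_b = char_ngrams(a, n), char_ngrams(b, n)
    if not grams_a or not grams_b:
        return 0.0
    return len(grams_a & grams_b) / len(grams_a | grams_b)


def rank_tags(target, tags, n=3):
    """Sort candidate tag names from most to least similar to `target`."""
    return sorted(tags, key=lambda t: ngram_similarity(target, t, n), reverse=True)


candidates = ["TotalRevenue", "revenues", "NetIncome"]
for tag in rank_tags("revenue", candidates):
    print(tag, round(ngram_similarity("revenue", tag), 3))
```

Character n-grams tolerate plurals and prefixes like 'Total', but they won't catch true synonyms (say, 'Sales' for 'revenue'), which is where labeled examples would have to come in.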

I'm sure this problem has been solved before ... is there a name for it?

Thank you so much for your time!

submitted by metaobject
