Channel: Machine Learning

Looking to scrape contact info from mixed raw HTML pages, for indexing into a central database. Anybody faced this problem (or similar) before and have a lead?

Title is a pretty good summary. I've got a large number of files from which I need to extract name, email, address, and phone number to build a database for work, and data scraping is a little out of my wheelhouse. The layouts of the pages are not all the same, but there are indicators to help locate the pertinent data: the name or an abbreviation of the name is part of the filename, and the phone numbers seem to universally be in the xxx-xxx-xxxx or (xxx) xxx-xxxx format. I wouldn't presume to barge in here and ask y'all to do my work for me; just a point in the right direction would be a blessing. Without prior knowledge, I'm using the shotgun research approach so far, and several hours of Googling have thus far produced... bupkis. So, any ideas?
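
For what it's worth, here is a rough sketch of one possible starting point, not a finished solution. It assumes the pages are plain HTML strings and uses only the Python standard library: strip the markup down to visible text, then run regular expressions over it. The phone pattern covers just the two formats mentioned above; the email pattern is a deliberately loose assumption, and the names `TextExtractor` and `extract_contacts` are made up for illustration.

```python
import re
from html.parser import HTMLParser

# Matches xxx-xxx-xxxx or (xxx) xxx-xxxx, the two formats described above.
PHONE_RE = re.compile(r"(?<!\d)(?:\(\d{3}\) ?|\d{3}-)\d{3}-\d{4}(?!\d)")
# Loose email pattern (an assumption, not a full RFC 5322 matcher).
EMAIL_RE = re.compile(r"[A-Za-z0-9._%+-]+@[A-Za-z0-9.-]+\.[A-Za-z]{2,}")


class TextExtractor(HTMLParser):
    """Collects visible text, skipping <script> and <style> contents."""

    def __init__(self):
        super().__init__()
        self.chunks = []
        self._skip_depth = 0

    def handle_starttag(self, tag, attrs):
        if tag in ("script", "style"):
            self._skip_depth += 1

    def handle_endtag(self, tag):
        if tag in ("script", "style") and self._skip_depth:
            self._skip_depth -= 1

    def handle_data(self, data):
        if not self._skip_depth:
            self.chunks.append(data)


def extract_contacts(html):
    """Return the phone numbers and emails found in the page's visible text."""
    parser = TextExtractor()
    parser.feed(html)
    parser.close()
    text = " ".join(parser.chunks)
    return {
        "phones": PHONE_RE.findall(text),
        "emails": EMAIL_RE.findall(text),
    }
```

Calling `extract_contacts(open(path).read())` on each file would give candidate phone numbers and emails to review by hand. Names and street addresses do not follow a fixed pattern, so they would need a different approach, for example matching the name hint in the filename against the page text.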

submitted by mskallisti
