Channel: Machine Learning

Resources for Extracting Main Text from a Webpage

What are some good resources for extracting the main text of a webpage? That is, given a web page in HTML format, I want to extract the main body of the text and leave out irrelevant content such as sidebars and ads.

I know that this is an active research topic, but I am curious if anyone has found a library that works well.
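
To make the task concrete, here is a rough baseline sketch using only Python's standard `html.parser`. It is not one of the libraries I am asking about, just a naive heuristic: keep `<p>` blocks that are reasonably long and are not mostly link text, and ignore anything inside tags that usually hold page chrome. The names `ParagraphCollector` and `extract_main_text`, the list of skipped tags, and both thresholds are my own assumptions and would need tuning on real pages.

```python
from html.parser import HTMLParser

# Tags whose content is rarely part of the main text (an assumption, not a rule).
SKIP_TAGS = {"script", "style", "nav", "aside", "header", "footer", "form"}


class ParagraphCollector(HTMLParser):
    """Collects the text of each <p> and the share of it that is link text."""

    def __init__(self):
        super().__init__()
        self.skip_depth = 0   # > 0 while inside a SKIP_TAGS element
        self.in_p = False
        self.in_a = False
        self.text = []        # text chunks of the current paragraph
        self.link_chars = 0   # characters of the current paragraph inside <a>
        self.paragraphs = []  # (text, link_density) pairs

    def handle_starttag(self, tag, attrs):
        if tag in SKIP_TAGS:
            self.skip_depth += 1
        elif tag == "p":
            self._flush()     # an unclosed <p> ends where the next one starts
            self.in_p = True
        elif tag == "a":
            self.in_a = True

    def handle_endtag(self, tag):
        if tag in SKIP_TAGS:
            self.skip_depth = max(0, self.skip_depth - 1)
        elif tag == "p":
            self._flush()
        elif tag == "a":
            self.in_a = False

    def handle_data(self, data):
        if self.in_p and self.skip_depth == 0:
            self.text.append(data)
            if self.in_a:
                self.link_chars += len(data)

    def _flush(self):
        raw = "".join(self.text)
        text = " ".join(raw.split())  # collapse runs of whitespace
        if text:
            self.paragraphs.append((text, self.link_chars / len(raw)))
        self.text, self.link_chars, self.in_p = [], 0, False


def extract_main_text(html, min_chars=80, max_link_density=0.3):
    """Return the paragraphs that look like body text, joined by blank lines."""
    parser = ParagraphCollector()
    parser.feed(html)
    parser.close()
    parser._flush()  # pick up a trailing paragraph that was never closed
    kept = [
        text
        for text, density in parser.paragraphs
        if len(text) >= min_chars and density <= max_link_density
    ]
    return "\n\n".join(kept)
```

Calling `extract_main_text(page_html)` on a page's HTML string returns the surviving paragraphs as plain text. A heuristic like this breaks on pages that do not wrap their content in `<p>` tags, which is part of why I am hoping a proper library exists.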

submitted by LADataJunkie
[link] [9 comments]
