Library to Strip Wiki Markup from Wikipedia?

I was wondering if there's a better resource for extracting the natural-language text from Wikipedia markup: something that does appropriate substitutions for things like {{convert|3|lbs|kg}} but ignores things like {{cite book|bla|bla|bla}}, and that replaces [[Cat|Cat]] style links with their text but removes [[fr:Chat]] style links. Preferably in Ruby, Python, or even C.
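To make the link part concrete, here is a rough Python sketch of the behaviour I'm after. The function name `strip_links` and both regexes are my own placeholders, and treating any two- or three-letter lowercase prefix as an interlanguage link is an assumption (the real list of language codes is longer and less regular). It does not handle nested links such as image captions.

```python
import re

# Matches [[target]] and [[target|label]]; nested brackets are not handled.
LINK_RE = re.compile(r"\[\[([^\[\]|]+)(?:\|([^\[\]]*))?\]\]")

# Assumption: interlanguage links start with a 2-3 letter lowercase code, e.g. "fr:".
LANG_PREFIX_RE = re.compile(r"^[a-z]{2,3}:")


def strip_links(text):
    """Replace wiki links with their visible text and drop interlanguage links."""
    def repl(match):
        target, label = match.group(1), match.group(2)
        if LANG_PREFIX_RE.match(target):
            return ""  # [[fr:Chat]] style link: remove entirely
        return label if label else target  # prefer the label when one is given
    return LINK_RE.sub(repl, text)


print(strip_links("A [[Cat|cat]] and a [[Dog]].[[fr:Chat]]"))
```

Namespaced links like categories and files would need their own rules; this only covers the two cases mentioned above.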

All I can find is the PHP that is part of MediaWiki itself and a number of sources that extract the marked-up text from the XML, which is trivial.

If not, would anyone be interested in collaborating on writing (a solid beginning to) such a parser over the weekend? I have been doing some experimenting, and the difficulty is really just recognizing which templates are syntactically relevant (convert, et al.) and which carry information that doesn't belong in the natural-language portion, or that I don't want there (see, cite, et al.).
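The approach I've been experimenting with looks roughly like the Python sketch below: an allowlist of templates that produce text, with everything else dropped. The names `HANDLERS`, `render_convert`, and `expand_templates` are made up for illustration, and the convert handler is a stand-in that only keeps the source value and unit rather than doing a real conversion. Templates are expanded innermost-first so that nesting works without a full parser.

```python
import re

# Matches a template that contains no other template inside it.
INNERMOST_TEMPLATE_RE = re.compile(r"\{\{([^{}]*)\}\}")


def render_convert(args):
    # Hypothetical handler: keep the source value and unit, do no conversion.
    return " ".join(args[:2])


# Assumption: an allowlist of templates that contribute natural-language text.
# Anything not listed here (cite, see, ...) is removed.
HANDLERS = {"convert": render_convert}


def expand_templates(text):
    """Expand allowlisted templates and drop all others, innermost first."""
    def repl(match):
        parts = [p.strip() for p in match.group(1).split("|")]
        handler = HANDLERS.get(parts[0].lower())
        return handler(parts[1:]) if handler else ""

    previous = None
    while previous != text:  # repeat until no template is left to expand
        previous = text
        text = INNERMOST_TEMPLATE_RE.sub(repl, text)
    return text


print(expand_templates("It weighs {{convert|3|lbs|kg}}.{{cite book|bla|bla|bla}}"))
```

The hard part is still deciding what goes in the allowlist and writing a handler for each entry; the loop itself is the easy bit.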

submitted by ltltltlt
