WP2TXT

WP2TXT extracts plain text from a Wikipedia dump file.

WP2TXT extracts plain text from a Wikipedia dump file (XML, compressed with bzip2), stripping all MediaWiki markup and other metadata.
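
To give a feel for what this kind of extraction involves, here is a minimal Python sketch. It is not WP2TXT's own code: the function names (`local_name`, `strip_markup`, `iter_pages`) are hypothetical, and the markup cleanup handles only internal links and bold/italic quotes, which is an assumption made to keep the example short. Real MediaWiki markup needs a far more thorough parser.

```python
import bz2
import re
import xml.etree.ElementTree as ET


def local_name(tag):
    """Drop the XML namespace prefix, e.g. '{ns}page' -> 'page'."""
    return tag.rsplit("}", 1)[-1]


def strip_markup(wikitext):
    """Very rough cleanup covering only links and bold/italic quotes."""
    # [[target|label]] -> label, [[target]] -> target
    text = re.sub(r"\[\[(?:[^\]|]*\|)?([^\]]*)\]\]", r"\1", wikitext)
    # ''italic'' and '''bold''' markers
    text = re.sub(r"'{2,}", "", text)
    return text


def iter_pages(path):
    """Yield (title, plain_text) pairs from a bzip2-compressed XML dump."""
    with bz2.open(path, "rb") as stream:
        title, text = None, ""
        # Stream the file so the whole dump never has to fit in memory.
        for _event, elem in ET.iterparse(stream, events=("end",)):
            name = local_name(elem.tag)
            if name == "title":
                title = elem.text
            elif name == "text":
                text = elem.text or ""
            elif name == "page":
                yield title, strip_markup(text)
                elem.clear()  # release the finished page's subtree
```

Calling `iter_pages` on the path of a downloaded dump (a hypothetical `dump.xml.bz2`, say) would yield one title and text pair per page.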

In addition, the app lets you specify which text elements to extract or convert (title, heading, paragraph, etc.). Character references are converted to their UTF-8 characters.
The app was originally intended for researchers looking for an easy way to obtain open-source multilingual corpora, but it may be handy for other purposes as well.
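
The element selection and character-reference conversion described above can be illustrated with a short Python sketch. This is an assumption-laden illustration rather than WP2TXT's implementation: `classify`, `select`, and the `wanted` option are hypothetical names, and only two element kinds (heading and paragraph) are distinguished.

```python
import html
import re

# A wikitext heading looks like "== Title ==" with 2 to 6 equals signs per side.
HEADING = re.compile(r"^(={2,6})\s*(.+?)\s*\1\s*$")


def classify(line):
    """Label a wikitext line as ('heading', text) or ('paragraph', text)."""
    match = HEADING.match(line)
    if match:
        # html.unescape turns references such as &eacute; into characters.
        return "heading", html.unescape(match.group(2))
    return "paragraph", html.unescape(line.strip())


def select(lines, wanted=("heading", "paragraph")):
    """Keep only non-empty elements whose kind is listed in `wanted`."""
    return [
        (kind, text)
        for kind, text in map(classify, lines)
        if text and kind in wanted
    ]
```

Passing `wanted=("paragraph",)` would drop headings and keep only body text, which mirrors the idea of choosing which elements end up in the output.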

Specifications
Developer: Yoichiro Hasebe
License type: Freeware