WP2TXT 0.8

WP2TXT is a free tool that extracts plain text from Wikipedia dump files.
Latest version: 0.8
Developer: Yoichiro Hasebe
Freeware

WP2TXT extracts plain text from Wikipedia dump files (XML, compressed with bzip2), stripping all MediaWiki markup and other metadata. In addition, the app lets you specify which text elements to extract or convert (titles, headings, paragraphs, etc.). Character references are converted to UTF-8 characters.
The app was originally intended for researchers looking for an easy way to obtain open-source multilingual corpora, but it may be handy for other purposes as well.
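
To make the processing described above concrete, here is a minimal Python sketch of the same general idea: stream a bzip2-compressed XML dump, pull out each page's title and text, strip a few common MediaWiki constructs, and decode character references. This is not WP2TXT's own code or interface. The function names `strip_markup` and `iter_pages` are hypothetical, and the regular expressions cover only simple, non-nested markup.

```python
import bz2
import html
import re
import xml.etree.ElementTree as ET


def strip_markup(wikitext):
    """Rough cleanup of simple MediaWiki markup (illustrative, not complete)."""
    text = re.sub(r"\{\{[^{}]*\}\}", "", wikitext)                  # drop non-nested {{templates}}
    text = re.sub(r"\[\[(?:[^|\]]*\|)?([^\]]*)\]\]", r"\1", text)  # [[target|label]] -> label
    text = re.sub(r"'{2,}", "", text)                               # remove bold/italic quote runs
    text = re.sub(r"^=+[ \t]*(.*?)[ \t]*=+[ \t]*$", r"\1", text, flags=re.M)  # == Heading == -> Heading
    return html.unescape(text)                                      # &eacute; -> é, &amp; -> &


def iter_pages(dump_path):
    """Yield (title, plain_text) for each page in a bzip2-compressed XML dump."""
    with bz2.open(dump_path, "rb") as stream:
        title = None
        for _, elem in ET.iterparse(stream, events=("end",)):
            tag = elem.tag.rsplit("}", 1)[-1]  # ignore the XML namespace prefix
            if tag == "title":
                title = elem.text or ""
            elif tag == "text":
                yield title, strip_markup(elem.text or "")
            elif tag == "page":
                elem.clear()                   # release memory held by the finished page
```

With a dump file at an assumed path such as `dump.xml.bz2`, the pages could be consumed with `for title, text in iter_pages("dump.xml.bz2"): ...`. Because the dump is read as a stream, it is never fully decompressed in memory.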
