WP2TXT 0.3

WP2TXT extracts plain text from Wikipedia dump files.
Latest version: 0.8
Developer: Yoichiro Hasebe
License: Freeware
Download size: 11 MB

WP2TXT extracts plain text from a Wikipedia dump file (an XML file compressed with bzip2), stripping all MediaWiki markup and other metadata. The app also lets you specify which text elements (title, heading, paragraph, etc.) to extract or convert. Character references are converted to UTF-8 characters.
The app was originally intended for researchers looking for an easy way to obtain open-source multilingual corpora, but it may be handy for other purposes as well.
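The general pipeline described above (decompress the bzip2 stream, parse the XML page by page, strip markup, convert character references) can be sketched in Python. This is a minimal illustration under stated assumptions, not WP2TXT's implementation: the function names `strip_markup`, `iter_pages` and `extract_dump` and the regular expressions are hypothetical, and only a small subset of MediaWiki syntax is handled. WP2TXT's own markup handling is far more thorough.

```python
import bz2
import html
import re
import xml.etree.ElementTree as ET

# Hypothetical, deliberately crude patterns for a few MediaWiki constructs.
# They only illustrate the idea; real wikitext needs much more care.
TEMPLATE = re.compile(r"\{\{[^{}]*\}\}")                  # {{name|args}} without inner braces
WIKILINK = re.compile(r"\[\[(?:[^|\]]*\|)?([^\]]*)\]\]")  # [[target|label]] -> label
EXTLINK = re.compile(r"\[https?://\S+\s*([^\]]*)\]")      # [url label] -> label
EMPHASIS = re.compile(r"'{2,}")                           # quotes of ''italic'' and '''bold'''
HTML_TAG = re.compile(r"<[^>]+>")                         # tags like <br />; inner text is kept
HEADING = re.compile(r"^=+[ \t]*(.*?)[ \t]*=+[ \t]*$", re.MULTILINE)  # == Title == -> Title


def strip_markup(wikitext: str) -> str:
    """Return an approximate plain-text version of a wikitext string."""
    text = wikitext
    # Delete innermost templates repeatedly so nested ones collapse.
    previous = None
    while previous != text:
        previous = text
        text = TEMPLATE.sub("", text)
    text = WIKILINK.sub(r"\1", text)
    text = EXTLINK.sub(r"\1", text)
    text = EMPHASIS.sub("", text)
    text = HTML_TAG.sub("", text)
    text = HEADING.sub(r"\1", text)
    # Turn character references such as &amp; or &#233; into characters.
    return html.unescape(text)


def _local_name(tag: str) -> str:
    # Drop the "{namespace}" prefix carried by MediaWiki export XML tags.
    return tag.rsplit("}", 1)[-1]


def iter_pages(stream):
    """Yield (title, plain_text) for each <page> in an XML byte stream."""
    for _event, elem in ET.iterparse(stream, events=("end",)):
        if _local_name(elem.tag) != "page":
            continue
        title, wikitext = "", ""
        for child in elem.iter():
            name = _local_name(child.tag)
            if name == "title":
                title = child.text or ""
            elif name == "text":
                wikitext = child.text or ""
        yield title, strip_markup(wikitext)
        elem.clear()  # release this page's subtree to bound memory use


def extract_dump(path):
    """Stream a bzip2-compressed dump given as a path or binary file object."""
    with bz2.open(path, "rb") as stream:
        yield from iter_pages(stream)
```

For example, `for title, text in extract_dump("enwiki-latest-pages-articles.xml.bz2"): ...` would walk a local dump one page at a time (the filename here is only an example). Parsing incrementally with `iterparse` and clearing each finished page keeps memory use roughly proportional to one page rather than the whole dump.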

Related apps

WebScraper

Scans websites to extract useful associated data from them.

Nifty File Lists

Creates file lists, extracts metadata and saves them in multiple formats.

Pdf Metadata Editor
Free

Allows modifying the metadata of one or more PDF documents.

Dump Truck

Securely store, sync and share all of your files.

Squeed

Straightforward, efficient MP3, FLAC, AIFF, and M4A metadata (tag) editor with online database support.