Python: NLTK download corpus
From OnnoCenterWiki
Onnowpurbo
Revision as of 22:17, 1 February 2017
NLTK corpora can be downloaded with a short script, for example download-corpus.py:
import nltk
nltk.download()
Run it:
python download-corpus.py
The downloader menu appears:
NLTK Downloader
---------------------------------------------------------------------------
d) Download l) List u) Update c) Config h) Help q) Quit
---------------------------------------------------------------------------
Choose d to download. At the download prompt, l lists the available packages and collections, for example:
Packages:
[ ] averaged_perceptron_tagger_ru Averaged Perceptron Tagger (Russian)
[ ] mwa_ppdb............ The monolingual word aligner (Sultan et al.
2015) subset of the Paraphrase Database.
[ ] nonbreaking_prefixes Non-Breaking Prefixes (Moses Decoder)
[-] panlex_lite......... PanLex Lite Corpus
[ ] pe08................ Cross-Framework and Cross-Domain Parser
Evaluation Shared Task
[-] perluniprops........ perluniprops: Index of Unicode Version 7.0.0
character properties in Perl
[ ] porter_test......... Porter Stemmer Test Files
[-] stopwords........... Stopwords Corpus
[ ] vader_lexicon....... VADER Sentiment Lexicon
[ ] wmt15_eval.......... Evaluation data from WMT15
Collections:
[-] all-corpora......... All the corpora
[-] all................. All packages
[-] book................ Everything used in the NLTK Book
([*] marks installed packages; [-] marks out-of-date or corrupt packages)
Download which package (l=list; x=cancel)?
Identifier>
At the Identifier> prompt, enter
all
to download every package and avoid picking them one by one. Note that this uses a lot of bandwidth.
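The same download can also be done without the interactive menu by passing an identifier straight to nltk.download(). The sketch below is only an illustration: download_packages and its downloader parameter are hypothetical names introduced here, not part of NLTK, and the identifiers are taken from the list above.

```python
def download_packages(identifiers, downloader=None):
    """Download each NLTK package or collection by identifier.

    Returns a dict mapping each identifier to True (success) or False.
    `downloader` is a hypothetical hook so the helper can be tried
    without network access; by default nltk.download is used.
    """
    if downloader is None:
        # Imported lazily so the helper can be defined without NLTK installed.
        import nltk
        downloader = nltk.download

    results = {}
    for name in identifiers:
        # nltk.download() reports success or failure as a boolean.
        results[name] = bool(downloader(name))
    return results
```

For example, download_packages(["stopwords"]) fetches only the Stopwords Corpus, while download_packages(["all"]) corresponds to entering all at the prompt and is just as heavy on bandwidth.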
The downloaded data is stored in
~/nltk_data/
The directory ends up quite large.
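To see how much disk space the download actually takes, the directory can be measured with a few lines of standard-library Python. This is a minimal sketch; directory_size_bytes is a hypothetical helper name, and ~/nltk_data is assumed to be the storage location mentioned above.

```python
import os
from pathlib import Path


def directory_size_bytes(root):
    """Return the total size in bytes of all regular files under root."""
    root = Path(root).expanduser()
    total = 0
    # os.walk yields nothing for a missing directory, so the result is 0.
    for dirpath, _dirnames, filenames in os.walk(root):
        for filename in filenames:
            path = Path(dirpath) / filename
            # Skip symlinks so linked files are not counted twice.
            if path.is_file() and not path.is_symlink():
                total += path.stat().st_size
    return total
```

Calling directory_size_bytes("~/nltk_data") and dividing by 1024 ** 2 gives the size in MiB.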