This is meant as a comment on sakthi's answer: you actually have to specify which POS (part of speech) you're looking for: noun, adjective, verb, etc.
Asked 9 years, 4 months ago. Active 7 years, 7 months ago. Viewed 4k times.
Vincent Labatut

To test that the data has been installed, try loading a corpus, for example by importing brown from nltk.corpus and calling brown.words(). This assumes you downloaded the Brown Corpus.

If your web connection uses a proxy server, you should specify the proxy address with nltk.set_proxy before downloading.
In the case of an authenticating proxy, specify a username and password as well.

By mapping inflected forms of a word to a common stem, stemming reduces the size of the index and can improve retrieval accuracy. In NLTK, all the stemmers covered next implement the StemmerI interface, which declares the stem method.

The Porter stemming algorithm is one of the most common stemming algorithms. It is designed to remove and replace well-known suffixes of English words.
NLTK's PorterStemmer class knows several regular word forms and suffixes, which it uses to transform the input word into a final stem. The resulting stem is often a shorter word with the same root meaning.

NLTK also has a LancasterStemmer class, which makes it easy to apply the Lancaster stemming algorithm to the word we want to stem.

The RegexpStemmer class takes a single regular expression and removes any prefix or suffix that matches the expression.
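The regular-expression approach is simple enough to sketch with the standard library alone. The snippet below is only an illustration of the idea, not NLTK's own code: the helper name regexp_stem, its min_length parameter, and the sample pattern are assumptions made for this example.

```python
import re


def regexp_stem(word, pattern, min_length=0):
    """Remove every match of `pattern` from `word`.

    Hypothetical helper for illustration, not part of NLTK.
    Words shorter than `min_length` are returned unchanged, which
    protects short words such as "sing" from being over-stemmed.
    """
    if len(word) < min_length:
        return word
    return re.sub(pattern, "", word)


# The trailing "$" anchors the pattern so only a suffix is stripped;
# a leading "^" would strip a prefix instead.
SUFFIX_PATTERN = r"(ing|ed)$"

for word in ["eating", "jumped", "sing"]:
    print(word, "->", regexp_stem(word, SUFFIX_PATTERN, min_length=5))
```

Because the pattern is applied blindly, the quality of the stems depends entirely on how carefully the expression is anchored and on the minimum length chosen; a dictionary-free approach like this will sometimes produce stems that are not real words.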