Features:
- Allow a custom tokenizer to be used for tokenizing words with ``.words`` (:pr:`555`). Thanks :user:`ReinerBRO` for the PR.
Support:
- Support Python 3.10-3.14.
Bug fixes:
- Fix ``textblob.download_corpora`` script (:issue:`474`). Thanks :user:`cagan-elden` for reporting.
Changes:
- Remove vendorized ``unicodecsv`` module, as it's no longer used.
- Support Python 3.9-3.13 and nltk>=3.9 (:pr:`486`). Thanks :user:`johnfraney` for the PR.
Bug fixes:
- Remove usage of deprecated cElementTree (:issue:`339`). Thanks :user:`tirkarthi` for reporting and for the PR.
- Address ``SyntaxWarning`` on Python 3.12 (:pr:`418`). Thanks :user:`smontanaro` for the PR.
Removals:
- ``TextBlob.translate()``, ``TextBlob.detect_language``, and ``textblob.translate`` are removed. Use the official Google Translate API instead (:issue:`215`).
- Remove ``textblob.compat``.
Support:
- Support Python 3.8-3.12. Older versions are no longer supported.
- Support nltk>=3.8.
Bug fixes:
- Fix translation and language detection (:issue:`395`). Thanks :user:`sudoguy` for the patch.
Features:
- Performance improvement: Use ``chain.from_iterable`` in ``_text.py`` to improve runtime and memory usage (:pr:`333`). Thanks :user:`cool-RR` for the PR.
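
As a rough illustration of the technique named above (not the actual code in ``_text.py``), ``itertools.chain.from_iterable`` flattens nested iterables lazily instead of building intermediate lists. The helper name ``flatten_sentences`` and the sample data are hypothetical.

```python
from itertools import chain


def flatten_sentences(sentences):
    """Flatten an iterable of token lists into a single list of tokens.

    chain.from_iterable walks the inner lists one after another and does
    not create a temporary list for each concatenation step.
    """
    return list(chain.from_iterable(sentences))


# Hypothetical sample input: one token list per sentence.
sentences = [["Simple", "is", "better"], ["than", "complex"]]
words = flatten_sentences(sentences)
```

If the caller only needs to iterate once, the ``list(...)`` call can be dropped so the tokens are produced on demand.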
Other changes:
- Remove usage of ctypes (:pr:`354`). Thanks :user:`casatir`.
Deprecations:
- ``TextBlob.translate()`` and ``TextBlob.detect_language`` are deprecated. Use the official Google Translate API instead (:issue:`215`).
Other changes:
- Backwards-incompatible: Drop support for Python 3.4.
- Test against Python 3.7 and Python 3.8.
- Pin NLTK to ``nltk<3.5`` on Python 2 (:issue:`315`).
Bug fixes:
- Fix bug when ``Word`` string type after pos_tags is not a ``str`` (:pr:`255`). Thanks :user:`roman-y-korolev` for the patch.
Bug fixes:
- Fix bug that raised a ``RuntimeError`` when executing methods that delegate to ``pattern.en`` (:issue:`230`). Thanks :user:`vvaezian` for the report and thanks :user:`danong` for the fix.
- Fix methods of ``WordList`` that modified the list in-place by removing the internal _collection variable (:pr:`235`). Thanks :user:`jammmo` for the PR.
Bug fixes:
- Convert POS tags from treebank to wordnet when calling ``lemmatize`` to prevent ``MissingCorpusError`` (:issue:`160`). Thanks :user:`jschnurr`.
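
The tag conversion described above can be sketched roughly as follows. This is an illustrative mapping, not TextBlob's own implementation: the function name ``treebank_to_wordnet`` and the lookup table are hypothetical, and the single-letter codes are the part-of-speech values WordNet uses for nouns, verbs, adjectives, and adverbs.

```python
# WordNet's single-letter part-of-speech codes.
WN_NOUN, WN_VERB, WN_ADJ, WN_ADV = "n", "v", "a", "r"

# Hypothetical lookup table: Penn Treebank tag -> WordNet POS code.
_TREEBANK_TO_WORDNET = {}
for tags, wn_pos in (
    (("NN", "NNS", "NNP", "NNPS"), WN_NOUN),
    (("VB", "VBD", "VBG", "VBN", "VBP", "VBZ"), WN_VERB),
    (("JJ", "JJR", "JJS"), WN_ADJ),
    (("RB", "RBR", "RBS"), WN_ADV),
):
    for tag in tags:
        _TREEBANK_TO_WORDNET[tag] = wn_pos


def treebank_to_wordnet(tag):
    """Return the WordNet POS code for a Treebank tag, or None if unmapped."""
    return _TREEBANK_TO_WORDNET.get(tag)
```

A caller can fall back to a default such as noun when the function returns ``None``, for example for determiners or punctuation tags.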
Features:
- Add TextBlob.sentiment_assessments property which exposes pattern's sentiment assessments (:issue:`170`). Thanks :user:`jeffakolb`.
Features:
- Use specified tokenizer when tagging (:issue:`167`). Thanks :user:`jschnurr` for the PR.
Bug fixes:
- Avoid AttributeError when using pattern's sentiment analyzer (:issue:`178`). Thanks :user:`tylerjharden` for the catch and patch.
- Correctly pass ``format`` argument to ``NLTKClassifier.accuracy`` (:issue:`177`). Thanks :user:`pavelmalai` for the catch and patch.
Features:
- Performance improvements to NaiveBayesClassifier (:issue:`63`, :issue:`77`, :issue:`123`). Thanks :user:`jcalbert` for the PR.
Features:
- Add Word.stem and WordList.stem methods (:issue:`145`). Thanks :user:`nitkul`.
Bug fixes:
- Fix translation and language detection (:issue:`137`). Thanks :user:`EpicJhon` for the fix.
Changes:
- Backwards-incompatible: Remove Python 2.6 and 3.3 support.
Bug fixes:
- Fix translation and language detection (:issue:`115`, :issue:`117`, :issue:`119`). Thanks :user:`AdrianLC` and :user:`jschnurr` for the fix. Thanks :user:`AdrianLC`, :user:`edgaralts`, and :user:`pouya-cognitiv` for reporting.
Changes:
- Compatible with nltk>=3.1. NLTK versions < 3.1 are no longer supported.
- Change default tagger to NLTKTagger (uses NLTK's averaged perceptron tagger).
- Tested on Python 3.5.
Bug fixes:
- Fix singularization of a number of words. Thanks :user:`jonmcoe`.
- Fix spelling correction when nltk>=3.1 is installed (:issue:`99`). Thanks :user:`shubham12101` for reporting.
Changes:
- Unchanged text is now considered a translation error. Raises ``NotTranslated`` (:issue:`76`). Thanks :user:`jschnurr`.
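
A minimal sketch of the behaviour described above, assuming a check that compares the translated text with the input. The ``NotTranslated`` class here is a local stand-in defined for the example rather than an import from the library, and ``check_translation`` is a hypothetical helper.

```python
class NotTranslated(Exception):
    """Stand-in exception: the translation came back identical to the input."""


def check_translation(source, result):
    """Return result, or raise NotTranslated if the text was left unchanged."""
    if result == source:
        raise NotTranslated("Translation returned the input string unchanged.")
    return result


# Usage: callers handle the error instead of silently accepting the input text.
try:
    check_translation("hello", "hello")
    unchanged_detected = False
except NotTranslated:
    unchanged_detected = True
```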
Bug fixes:
- ``Translator.translate`` will detect language of input text by default (:issue:`85`). Thanks again :user:`jschnurr`.
- Fix matching of tagged phrases with CFG in ``ConllExtractor``. Thanks :user:`lragnarsson`.
- Fix inflection of a few irregular English nouns. Thanks :user:`jonmcoe`.
Bug fixes:
- Fix ``DecisionTreeClassifier.pprint`` for compatibility with nltk>=3.0.2.
- Translation no longer adds erroneous whitespace around punctuation characters (:issue:`83`). Thanks :user:`AdrianLC` for reporting and thanks :user:`jschnurr` for the patch.
- TextBlob now depends on NLTK 3. The vendorized version of NLTK has been removed.
- Fix bug that raised a SyntaxError when translating text with non-ascii characters on Python 3.
- Fix bug that showed "double-escaped" unicode characters in translator output (issue #56). Thanks Evan Dempsey.
- Backwards-incompatible: Completely remove ``import text.blob``. You should ``import textblob`` instead.
- Backwards-incompatible: Completely remove ``PerceptronTagger``. Install ``textblob-aptagger`` instead.
- Backwards-incompatible: Rename ``TextBlobException`` to ``TextBlobError`` and ``MissingCorpusException`` to ``MissingCorpusError``.
- Backwards-incompatible: ``Format`` classes are passed a file object rather than a file path.
- Backwards-incompatible: If training a classifier with data from a file, you must pass a file object (rather than a file path).
- Updated English sentiment corpus.
- Add ``feature_extractor`` parameter to ``NaiveBayesAnalyzer``.
- Add ``textblob.formats.get_registry()`` and ``textblob.formats.register()`` which allows users to register custom data source formats.
- Change ``BaseClassifier.detect`` from a ``staticmethod`` to a ``classmethod``.
- Improved docs.
- Tested on Python 3.4.
- Fix display (``__repr__``) of WordList slices on Python 3.
- Add download_corpora module. Corpora must now be downloaded using ``python -m textblob.download_corpora``.
- Sentiment analyzers return namedtuples, e.g. ``Sentiment(polarity=0.12, subjectivity=0.34)``.
- Memory usage improvements to NaiveBayesAnalyzer and basic_extractor (default feature extractor for classifiers module).
- Add ``textblob.tokenizers.sent_tokenize`` and ``textblob.tokenizers.word_tokenize`` convenience functions.
- Add ``textblob.classifiers.MaxEntClassifier``.
- Improved NLTKTagger.
- Fix bug in spelling correction that stripped some punctuation (Issue #48).
- Various improvements to spelling correction: preserves whitespace characters (Issue #12); handle contractions and punctuation between words. Thanks @davidnk.
- Make ``TextBlob.words`` more memory-efficient.
- Translator now sends POST instead of GET requests. This allows for larger bodies of text to be translated (Issue #49).
- Update pattern tagger for better accuracy.
- Fix bug that caused ``ValueError`` upon sentence tokenization. This removes modifications made to the NLTK sentence tokenizer.
- Add ``Word.lemmatize()`` method that allows passing in a part-of-speech argument.
- ``Word.lemma`` returns correct part of speech for Word objects that have their ``pos`` attribute set. Thanks @RomanYankovsky.
- Backwards-incompatible: Renamed package to ``textblob``. This avoids clashes with other namespaces called text. TextBlob should now be imported with ``from textblob import TextBlob``.
- Update pattern resources for improved parser accuracy.
- Update NLTK.
- Allow Translator to connect to proxy server.
- PerceptronTagger completely deprecated. Install the ``textblob-aptagger`` extension instead.
- Bugfix updates.
- Fix bug in feature extraction for ``NaiveBayesClassifier``.
- ``basic_extractor`` is now case-sensitive, e.g. contains(I) != contains(i)
- Fix ``repr`` output when a TextBlob contains non-ascii characters.
- Fix part-of-speech tagging with ``PatternTagger`` on Windows.
- Suppress warning about not having scikit-learn installed.
- Wordnet integration. ``Word`` objects have ``synsets`` and ``definitions`` properties. The ``text.wordnet`` module allows you to create ``Synset`` and ``Lemma`` objects directly.
- Move all English-specific code to its own module, ``text.en``.
- Basic extensions framework in place. TextBlob has been refactored to make it easier to develop extensions.
- Add ``text.classifiers.PositiveNaiveBayesClassifier``.
- Update NLTK.
- ``NLTKTagger`` now working on Python 3.
- Fix ``__str__`` behavior. ``print(blob)`` should now print non-ascii text correctly in both Python 2 and 3.
- Backwards-incompatible: All abstract base classes have been moved to the ``text.base`` module.
- Backwards-incompatible: ``PerceptronTagger`` will now be maintained as an extension, ``textblob-aptagger``. Instantiating a ``text.taggers.PerceptronTagger()`` will raise a ``DeprecationWarning``.
- Word tokenization fix: Words that stem from a contraction will still have an apostrophe, e.g. ``"Let's" => ["Let", "'s"]``.
- Fix bug with comparing blobs to strings.
- Add ``text.taggers.PerceptronTagger``, a fast and accurate POS tagger. Thanks @syllog1sm.
- Note for Python 3 users: You may need to update your corpora, since NLTK master has reorganized its corpus system. Just run ``curl https://raw.github.com/sloria/TextBlob/master/download_corpora.py | python`` again.
- Add ``download_corpora_lite.py`` script for getting the minimum corpora requirements for TextBlob's basic features.
- Fix bug that resulted in a ``UnicodeEncodeError`` when tagging text with non-ascii characters.
- Add ``DecisionTreeClassifier``.
- Add ``labels()`` and ``train()`` methods to classifiers.
- Classifiers can be trained and tested on CSV, JSON, or TSV data.
- Add basic WordNet lemmatization via the ``Word.lemma`` property.
- ``WordList.pluralize()`` and ``WordList.singularize()`` methods return ``WordList`` objects.
- Add Naive Bayes classification. New ``text.classifiers`` module, ``TextBlob.classify()``, and ``Sentence.classify()`` methods.
- Add parsing functionality via the ``TextBlob.parse()`` method. The ``text.parsers`` module currently has one implementation (``PatternParser``).
- Add spelling correction. This includes the ``TextBlob.correct()`` and ``Word.spellcheck()`` methods.
- Update NLTK.
- Backwards incompatible: ``clean_html`` has been deprecated, just as it has in NLTK. Use Beautiful Soup's ``soup.get_text()`` method for HTML-cleaning instead.
- Slight API change to language translation: if ``from_lang`` isn't specified, attempts to detect the language.
- Add ``itokenize()`` method to tokenizers that returns a generator instead of a list of tokens.
- Unicode fixes: This fixes a bug that sometimes raised a ``UnicodeEncodeError`` upon accessing ``sentences`` for TextBlobs with non-ascii characters.
- Update NLTK
- Important patch update for NLTK users: Fix bug with importing TextBlob if local NLTK is installed.
- Fix bug with computing start and end indices of sentences.
- Fix bug that disallowed display of non-ascii characters in the Python REPL.
- Backwards incompatible: Restore ``blob.json`` property for backwards compatibility with textblob<=0.3.10. Add a ``to_json()`` method that takes the same arguments as ``json.dumps``.
- Add ``WordList.append`` and ``WordList.extend`` methods that append Word objects.
- Language translation and detection API!
- Add ``text.sentiments`` module. Contains the ``PatternAnalyzer`` (default implementation) as well as a ``NaiveBayesAnalyzer``.
- Part-of-speech tags can be accessed via ``TextBlob.tags`` or ``TextBlob.pos_tags``.
- Add ``polarity`` and ``subjectivity`` helper properties.
- New ``text.tokenizers`` module with ``WordTokenizer`` and ``SentenceTokenizer``. Tokenizer instances (from either textblob itself or NLTK) can be passed to TextBlob's constructor. Tokens are accessed through the new ``tokens`` property.
- New ``Blobber`` class for creating TextBlobs that share the same tagger, tokenizer, and np_extractor.
- Add ``ngrams`` method.
- Backwards-incompatible: ``TextBlob.json()`` is now a method, not a property. This allows you to pass arguments (the same that you would pass to ``json.dumps()``).
- New home for documentation: https://textblob.readthedocs.io/
- Add parameter for cleaning HTML markup from text.
- Minor improvement to word tokenization.
- Updated NLTK.
- Fix bug with adding blobs to bytestrings.
- Bundled NLTK no longer overrides local installation.
- Fix sentiment analysis of text with non-ascii characters.
- Updated nltk.
- ConllExtractor is now Python 3-compatible.
- Improved sentiment analysis.
- Blobs are equal (with ==) to their string counterparts.
- Added instructions to install textblob without nltk bundled.
- Dropping official 3.1 and 3.2 support.
- Importing TextBlob is now much faster. This is because the noun phrase parsers are trained only on the first call to ``noun_phrases`` (instead of training them every time you import TextBlob).
- Add text.taggers module which allows user to change which POS tagger implementation to use. Currently supports PatternTagger and NLTKTagger (NLTKTagger only works with Python 2).
- NPExtractor and Tagger objects can be passed to TextBlob's constructor.
- Fix bug with POS-tagger not tagging one-letter words.
- Rename text/np_extractor.py -> text/np_extractors.py
- Add run_tests.py script.
- Every word in a ``Blob`` or ``Sentence`` is a ``Word`` instance which has methods for inflection, e.g. ``word.pluralize()`` and ``word.singularize()``.
- Updated the ``np_extractor`` module. Now has a new implementation, ``ConllExtractor``, that uses the Conll2000 chunking corpus. Only works on Py2.