pycld3 is archived as of 2026 and will receive no further releases, bug fixes, or security updates. The last released version is 0.22, which does not build on Python 3.10 or newer without patches.
- Upstream cld3 is effectively unmaintained. Google's cld3 repository has not seen meaningful activity in years, so there is little value in keeping bindings current.
- Protobuf C++ ABI instability. pycld3 links against libprotobuf, and Google bumps that library's SONAME nearly every release. Any wheel is pinned to one protobuf version, and a user-side protobuf upgrade silently breaks import cld3 (see #30, #34). This is not fixable without vendoring protobuf.
- Python 3.10+ build break. The shipped Cython-generated cld3/pycld3.cpp references longintrepr.h, which CPython removed in 3.11 (see #35, #31). A regeneration would fix this, but the protobuf problem above would remain.
- Poor Windows story. From-source installs on Windows consistently fail on protobuf include paths (see #17, #29).
- gcld3: Google's own Python bindings to CLD3, using pybind11. The closest drop-in replacement for pycld3.
- lingua-language-detector: Pure-Python, more accurate than CLD3 on short text, actively maintained.
- fasttext + the lid.176 model: Fast, accurate, widely used in production.
- pycld2: Bindings to the older CLD2, no protobuf dependency.
Existing releases of pycld3 remain installable from PyPI for anyone in a compatible environment (Python ≤ 3.9, matching libprotobuf), but new projects should choose one of the alternatives above.
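Because a protobuf mismatch surfaces as an ImportError at import time, code that has to run in several environments can probe for a working detector module instead of importing cld3 unconditionally. The following is a minimal sketch and not part of pycld3: the helper name first_importable is hypothetical, and the candidate list simply uses the import names of packages mentioned above. These modules expose different APIs, so the caller still has to adapt to whichever one loads.

```python
import importlib


def first_importable(candidates=("cld3", "gcld3", "pycld2")):
    """Return (name, module) for the first candidate that imports cleanly.

    Hypothetical helper, not part of pycld3. A missing or mismatched
    libprotobuf makes ``import cld3`` raise ImportError, so that case is
    handled the same way as the package not being installed at all.
    """
    for name in candidates:
        try:
            return name, importlib.import_module(name)
        except ImportError:
            continue
    return None, None


name, module = first_importable()
if module is None:
    print("no language-detection module could be imported")
else:
    print(f"using {name}")
```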
Python bindings to the Compact Language Detector v3 (CLD3).
This package contains Python bindings (via Cython) to Google's CLD3 library.
>>> import cld3
>>> cld3.get_language("影響包含對氣候的變化以及自然資源的枯竭程度")
LanguagePrediction(language='zh', probability=0.999969482421875, is_reliable=True, proportion=1.0)
The library outputs BCP-47-style language codes. For some languages, output is differentiated by script. Language and script names come from Unicode CLDR. It supports over 100 languages/scripts. See the full list of supported languages/scripts in Google's CLD3 documentation.
This project supports CPython versions 3.6 through 3.9.
We publish wheels for the following matrix:
- macOS: CPython 3.6 through 3.9
- Linux: CPython 3.6 through 3.9 (manylinux1)
The wheels for both macOS and manylinux1 bundle the external protobuf library, copied into the wheel itself via auditwheel or delocate, so you won't need to install any extra non-PyPI dependencies.
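As a rough way to tell in advance whether pip will pick up a wheel or fall back to the source distribution, the matrix above can be expressed as a small check. This is only a sketch: wheel_likely_available is a hypothetical helper, and it looks solely at the OS name and the Python major/minor version. CPU architecture, libc flavor, and interpreter implementation also influence wheel selection and are deliberately ignored here.

```python
import platform
import sys


def wheel_likely_available(system, version):
    """Rough check against the published wheel matrix.

    Hypothetical helper: only the OS name (as returned by platform.system())
    and the (major, minor) Python version are considered.
    """
    return system in ("Darwin", "Linux") and (3, 6) <= tuple(version[:2]) <= (3, 9)


if wheel_likely_available(platform.system(), sys.version_info):
    print("a prebuilt wheel probably exists for this platform")
else:
    print("expect a source build: protoc, protobuf headers, and a compiler are needed")
```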
If you are installing on one of the variants listed above, you should not need to have protoc or libprotobuf installed:
python -m pip install -U pycld3
If you are not on a platform variant that is eligible to use a wheel, you may still be able to use pycld3 via its source distribution (tar.gz), but a bit more work is required to install.
Namely, you'll also need:
- the Protobuf compiler (the protoc executable)
- the Protobuf development headers and libprotoc library
- a compiler, preferably g++
Please consult the official protobuf repository for information on installing Protobuf. The project contains an Installation README that covers installation on Windows and Unix.
If for whatever reason you are on a Unix host but unable to use the wheels (for instance, if you have an i686 architecture), here is a quick-and-dirty guide to installing.
sudo apt-get update -y
sudo apt-get install -y --no-install-recommends \
g++ \
protobuf-compiler \
libprotobuf-dev
python -m pip install -U pycld3
Note: Alpine Linux does not support PyPI wheels as of April 2020. The steps below are mandatory on Alpine Linux because you will need to install from the source distribution. If the situation permits, using a Debian distro should be much easier (and faster).
apk --update add g++ protobuf protobuf-dev
python -m pip install -U pycld3
Install from source, as root/UID 0:
sudo su -
set -ex
pushd /opt
PROTOBUF_VERSION='3.11.4'
yum update -y
yum install -y autoconf automake gcc-c++ glibc-headers gzip libtool make python3-devel zlib-devel
curl -Lo /opt/protobuf.tar.gz \
"https://github.com/protocolbuffers/protobuf/releases/download/v${PROTOBUF_VERSION}/protobuf-cpp-${PROTOBUF_VERSION}.tar.gz"
tar -xzvf protobuf.tar.gz
rm -f protobuf.tar.gz
pushd "protobuf-${PROTOBUF_VERSION}"
./configure --with-zlib --disable-debug && make && make install && ldconfig --verbose
popd && rm -rf "protobuf-${PROTOBUF_VERSION}" && popd && set +ex
python -m pip install -U pycld3
Note: the steps above are for CentOS 8. For earlier versions, you may need to replace:
- gcc-c++ with g++
- python3-devel with python-devel
brew update
brew upgrade protobuf || brew install -v protobuf
python -m pip install -U pycld3
Please consult Protobuf's C++ Installation - Windows section for help with installing Protobuf on Windows.
If you would like to help contribute Windows wheels (preferably as a job within the project's CI/CD pipelines), please file an issue.
cld3 exports two module-level functions, get_language() and get_frequent_languages():
>>> import cld3
>>> cld3.get_language("影響包含對氣候的變化以及自然資源的枯竭程度")
LanguagePrediction(language='zh', probability=0.999969482421875, is_reliable=True, proportion=1.0)
>>> cld3.get_language("This is a test")
LanguagePrediction(language='en', probability=0.9999980926513672, is_reliable=True, proportion=1.0)
>>> for lang in cld3.get_frequent_languages(
... "This piece of text is in English. Този текст е на Български.",
... num_langs=3
... ):
... print(lang)
...
LanguagePrediction(language='bg', probability=0.9173890948295593, is_reliable=True, proportion=0.5853658318519592)
LanguagePrediction(language='en', probability=0.9999790191650391, is_reliable=True, proportion=0.4146341383457184)
A first resort is to preprocess (clean) your input text based on conditions specific to your program.
A salient example is to remove URLs and email addresses from the input. CLD3 (unlike CLD2) does almost none of this cleaning for you, in the spirit of not penalizing other users with overhead that they may not need.
Here's such an example using a simplified URL regex from Regular Expressions Cookbook, 2nd ed.:
>>> import re
>>> import cld3
>>> # cld3 does not ignore the URL components by default
>>> s = "Je veux que: https://site.english.com/this/is/a/url/path/component#fragment"
>>> cld3.get_language(s)
LanguagePrediction(language='en', probability=0.5319557189941406, is_reliable=False, proportion=1.0)
>>> url_re = r"\b(?:https?://|www\.)[a-z0-9-]+(\.[a-z0-9-]+)+(?:[/?].*)?"
>>> new_s = re.sub(url_re, "", s)
>>> new_s
'Je veux que: '
>>> cld3.get_language(new_s)
LanguagePrediction(language='fr', probability=0.9799421429634094, is_reliable=True, proportion=1.0)
Note: This URL regex aims for simplicity. It requires a domain name, and doesn't allow a username or password; it allows the scheme (http or https) to be omitted if it can be inferred from the subdomain (www). Source: Regular Expressions Cookbook, 2nd ed. - Goyvaerts & Levithan.
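The same idea can be packaged as a reusable cleaning step that runs before detection. The sketch below is one possible approach rather than anything cld3 provides: strip_urls_and_emails is a hypothetical helper, the email pattern is a loose assumption that will miss unusual addresses, and the URL pattern is adapted from the example above so that it stops at the next whitespace instead of consuming the rest of the string.

```python
import re

# Adapted from the simplified URL regex above: the trailing part matches up to
# the next whitespace (\S*) so that text following a URL is preserved.
URL_RE = re.compile(
    r"\b(?:https?://|www\.)[a-z0-9-]+(?:\.[a-z0-9-]+)+(?:[/?]\S*)?",
    re.IGNORECASE,
)

# Deliberately loose email pattern (an assumption, not an RFC-compliant one).
EMAIL_RE = re.compile(r"\b[\w.+-]+@[\w-]+(?:\.[\w-]+)+\b")


def strip_urls_and_emails(text):
    """Remove URLs and email addresses, then collapse runs of whitespace."""
    text = EMAIL_RE.sub(" ", text)
    text = URL_RE.sub(" ", text)
    return " ".join(text.split())


s = "Je veux que: https://site.english.com/this/is/a/url/path/component#fragment"
print(strip_urls_and_emails(s))  # prints: Je veux que:
```

The cleaned string can then be passed to cld3.get_language() in place of the raw input.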
In some other cases, you cannot fix the incorrect detection.
Language detection algorithms in general may perform poorly with very short inputs.
Rarely should you trust the output of something like detect("hi"). Keep this limitation in mind regardless
of what library you are using.
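One way to act on this is to refuse to use a prediction when the input is too short or the detector itself flags the result as unreliable. The sketch below assumes only the fields visible in the LanguagePrediction output shown earlier; accept_prediction is a hypothetical helper, the Prediction namedtuple is a stand-in so the example runs without cld3 installed, and the thresholds are illustrative values, not recommendations from CLD3.

```python
from collections import namedtuple

# Stand-in with the same fields as the LanguagePrediction results shown above.
Prediction = namedtuple("Prediction", "language probability is_reliable proportion")


def accept_prediction(text, prediction, min_chars=20, min_probability=0.9):
    """Return the language code, or None when the result should not be trusted.

    min_chars and min_probability are illustrative thresholds; tune them on
    your own data.
    """
    if prediction is None or len(text.strip()) < min_chars:
        return None
    if not prediction.is_reliable or prediction.probability < min_probability:
        return None
    return prediction.language


print(accept_prediction("hi", Prediction("en", 0.6, False, 1.0)))  # prints: None
```

With cld3 installed, the call would look like accept_prediction(text, cld3.get_language(text)).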
Please remember that, at the end of the day, this project is just a Python wrapper to the CLD3 C++ library that does the actual heavy-lifting.
First, please make sure that you have read the installation section and that you have installed Protobuf if necessary.
If that doesn't help, please file an issue in this repository. The build process for this project is somewhat complex because it involves both Cython and Protobuf, but I do my best to make it work everywhere possible.
If you've installed Protobuf, but are seeing an error such as:
ImportError: libprotobuf.so.22: cannot open shared object file: No such file or directory
This likely means that Python is not finding the libprotobuf shared object,
possibly because ldconfig didn't do what it was supposed to.
You may need to tell it where to look.
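A quick check from Python can show whether the system's library lookup sees a libprotobuf at all. This is a sketch using the standard library's ctypes.util.find_library; locate_protobuf is a hypothetical helper. The lookup may not search exactly the same locations as the dynamic loader, and a hit does not guarantee that the version found matches the one the extension was built against.

```python
import ctypes.util


def locate_protobuf():
    """Ask the platform's library lookup for libprotobuf.

    Returns a library name or path string, or None when nothing was found.
    """
    return ctypes.util.find_library("protobuf")


found = locate_protobuf()
if found is None:
    print("libprotobuf was not found by the system library lookup")
else:
    print(f"libprotobuf candidate: {found}")
```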
You can find where the library sits via:
$ find /usr -name 'libprotoc.so' \( -type l -o -type f \)
/usr/local/lib/libprotoc.so
Then, you can add the directory containing this file to LD_LIBRARY_PATH:
export LD_LIBRARY_PATH="$(dirname $(find /usr -name 'libprotoc.so' \( -type l -o -type f \))):$LD_LIBRARY_PATH"
You can quickly test that this worked:
$ python -c 'import cld3; print(cld3.get_language("影響包含對氣候的變化以及自然資源的枯竭程度"))'
LanguagePrediction(language='zh', probability=0.999969482421875, is_reliable=True, proportion=1.0)
This repository contains a fork of google/cld3 at commit 06f695f. The license for google/cld3 can be found at LICENSES/CLD3_LICENSE.
This repository is a combination of changes introduced by various forks of google/cld3 by the following people:
- Johannes Baiter (@jbaiter)
- Elizabeth Myers (@Elizafox)
- Witold Bołt (@houp)
- Alfredo Luque (@iamthebot)
- WISESIGHT (@wisesight)
- RNogales (@RNogales94)
- Brad Solomon (@bsolomon1124)