@@ -24,10 +24,10 @@ it is usually counterproductive to worry about the efficiency of the code.
2424 k-Means example
2525---------------
2626
27- In the following, I show examples of the `k-means algorithm
28- <https://en.wikipedia.org/wiki/K-means_clustering> `_ to form a previously known
29- number of groups from a set of objects. This can be achieved in the following
30- three steps:
27+ In the following, I will provide examples of the `k-means algorithm
28+ <https://en.wikipedia.org/wiki/K-means_clustering> `_, which is used to
29+ form a predefined number of clusters from a set of objects. This can be achieved
30+ using MacQueen’s algorithm in the following three steps:
3131
3232#. Choose the first :samp: `k ` elements as cluster centres
3333#. Assign each new element to the cluster with the least increase in variance.
@@ -40,6 +40,7 @@ A possible implementation with pure Python could look like this:
4040.. literalinclude :: py_kmeans.py
4141 :caption: py_kmeans.py
4242 :name: py_kmeans.py
43+ :lines: 6-
4344
4445We can create sample data with:
4546
@@ -62,18 +63,30 @@ Performance measurements
6263------------------------
6364
6465Once you have worked with your code, it can be useful to examine its efficiency
65- more closely. `cProfile
66- <https://docs.python.org/3.14/library/profile.html#module-cProfile> `_,
67- :doc: `ipython-profiler ` or :doc: `scalene ` can be used for this.
66+ more closely. :doc: `cProfile <tracing >`, :doc: `ipython-profiler `, :doc: `scalene `
67+ or :doc: `tprof ` can be used for this. I usually carry out the following
68+ steps:
69+
70+ #. I profile the entire programme with :doc: `cProfile <tracing >` or `py-spy
71+ <https://github.com/benfred/py-spy> `_ to find slow functions.
72+ #. If necessary, I use the `line_profiler
73+ <https://github.com/pyutils/line_profiler> `_ to identify the slow sections
74+ within the function.
75+ #. If the slow function is computationally intensive, I try one of the following
76+ optimisations; however, if the application is data-intensive (dictionaries,
77+ strings, I/O), I take a closer look at the architecture.
78+ #. Then I optimise the slow function.
79+ #. Finally, I create a new profile and filter it for the optimised function
80+ so that I can compare the results.
6881
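The first step of this workflow can be sketched with the standard library alone.
The functions ``slow_distance`` and ``assign`` are hypothetical stand-ins and not
part of the k-means examples; the sample data is made up for illustration:

```python
import cProfile
import io
import pstats


def slow_distance(p, q):
    # Deliberately naive: Euclidean distance written as a Python loop
    total = 0.0
    for a, b in zip(p, q):
        total += (a - b) ** 2
    return total ** 0.5


def assign(points, centres):
    # Index of the nearest centre for every point
    return [
        min(range(len(centres)), key=lambda i: slow_distance(p, centres[i]))
        for p in points
    ]


points = [(x * 0.1, x * 0.2) for x in range(2000)]
centres = [(0.0, 0.0), (50.0, 100.0), (150.0, 300.0)]

profiler = cProfile.Profile()
profiler.enable()
labels = assign(points, centres)
profiler.disable()

# Sort by cumulative time and keep only the five most expensive entries
stream = io.StringIO()
pstats.Stats(profiler, stream=stream).sort_stats("cumulative").print_stats(5)
report = stream.getvalue()
print(report)
```

After an optimisation, the same snippet can be run again and the two reports
compared entry by entry.
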
6982.. versionadded :: Python3.15
7083 :pep: `799 ` will provide a special profiling module that organises the
7184 profiling tools integrated in Python under a uniform namespace. This module
7285 contains:
7386
7487 :mod: `profiling.tracing `
75- deterministic function call tracing, which has been moved from ` cProfile
76- <https://docs.python.org/3.14/library/profile.html#module- cProfile> `_ .
88+ deterministic function call tracing, which has been moved from
89+ :doc: ` cProfile < tracing >` .
7790 :mod: `profiling.sampling `
7891 the new statistical sampling profiler :doc: `tachyon `.
7992
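Code that should run both before and after this change could import the profiler
defensively. This is only a sketch: it assumes that ``profiling.tracing`` exposes
the same ``Profile`` class as ``cProfile``, and ``work`` is a hypothetical
function used purely for illustration:

```python
try:
    # Python 3.15+: namespace introduced by PEP 799
    from profiling.tracing import Profile
except ImportError:
    # Older versions: the deterministic profiler lives in cProfile
    from cProfile import Profile


def work(n):
    # Small CPU-bound task to have something to measure
    return sum(i * i for i in range(n))


profiler = Profile()
result = profiler.runcall(work, 10_000)
profiler.print_stats(sort="cumulative")
```
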
@@ -91,12 +104,14 @@ more closely. `cProfile
91104 :titlesonly:
92105 :maxdepth: 0
93106
107+ tracing
94108 ipython-profiler.ipynb
95109 scalene.ipynb
110+ tprof
96111 tachyon
97112
98- Search for existing implementations
99- -----------------------------------
113+ 1. Search for existing implementations
114+ --------------------------------------
100115
101116You should not try to reinvent the wheel: If there are existing implementations,
102117you should use them. There are even two implementations for the k-means
@@ -128,8 +143,8 @@ create a considerable overhead in your project if you are not already using
128143<https://ml.dask.org> `_ elsewhere. In the following, I will therefore show you
129144further possibilities to optimise your own code.
130145
131- Find anti-patterns
132- ------------------
146+ 2. Find anti-patterns
147+ ---------------------
133148
134149Then you can use :doc: `perflint ` to search your code for the most common
135150performance anti-patterns in Python.
@@ -144,31 +159,39 @@ performance anti-patterns in Python.
144159.. seealso ::
145160 * `Effective Python <https://effectivepython.com >`_
146161
147- Vectorisations with NumPy
148- -------------------------
162+ 3. Vectorisations with NumPy
163+ ----------------------------
149164
150165:doc: `../workspace/numpy/index ` moves repetitive operations into a statically
151166typed compiled layer, combining the fast development time of Python with the
152- fast execution time of C. You may be able to use
153- :doc: `../workspace/numpy/ufunc `, :doc: `vectorisation
154- <../workspace/numpy/vectorisation>` and
155- :doc: `../workspace/numpy/indexing-slicing ` in all combinations to move
156- repetitive operations into compiled code to avoid slow loops.
157-
158- With NumPy we can do without some loops:
167+ fast execution time of C.
168+
169+ +---------------+---------------+----------+
170+ | Version | Spectral-norm | vs 3.14x |
171+ +===============+===============+==========+
172+ | CPython 3.14 | 14,046ms | |
173+ | – Basis | | |
174+ +---------------+---------------+----------+
175+ | NumPy | 27ms | 520x |
176+ +---------------+---------------+----------+
177+
178+ You may be able to use :doc: `../workspace/numpy/ufunc `, :doc: `vectorisation
179+ <../workspace/numpy/vectorisation>` and :doc: `../workspace/numpy/indexing-slicing `
180+ in various combinations to move repetitive operations into compiled code and
181+ thus avoid slow loops, for example:
159182
160183.. literalinclude :: np_kmeans.py
161184 :caption: np_kmeans.py
162185 :name: np_kmeans.py
163- :lines: 1-8
186+ :lines: 5-12
164187
165The advantage of NumPy is that the Python overhead only occurs per array and
166189not per array element. However, because NumPy uses a specific language for array
167190operations, it also requires a different mindset when writing code. Finally, the
168191batch operations can also lead to excessive memory consumption.
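
The trade-off can be illustrated with a nearest-centre search. The following is a
minimal sketch with made-up random data and hypothetical function names; it is
not taken from ``np_kmeans.py``:

```python
import numpy as np

rng = np.random.default_rng(0)
points = rng.random((1000, 2))  # 1000 points in 2 dimensions
centres = points[:3].copy()     # first k=3 points as cluster centres


def nearest_loop(points, centres):
    # Pure-Python loops: interpreter overhead for every single element
    labels = []
    for p in points:
        dists = [sum((a - b) * (a - b) for a, b in zip(p, c)) for c in centres]
        labels.append(dists.index(min(dists)))
    return np.array(labels)


def nearest_vectorised(points, centres):
    # Broadcasting builds a temporary array of shape (n_points, k, n_dims);
    # this is where the additional memory consumption comes from
    diff = points[:, np.newaxis, :] - centres[np.newaxis, :, :]
    return np.argmin((diff ** 2).sum(axis=2), axis=1)


labels = nearest_vectorised(points, centres)
```

Both functions return the same labels, but the vectorised variant pays the
Python overhead once per array operation and in return allocates the
intermediate ``diff`` array.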
169192
170- Special data structures
171- -----------------------
193+ 4. Special data structures
194+ --------------------------
172195
173196:doc: `../workspace/pandas/index `
174197 for SQL-like :doc: `../workspace/pandas/group-operations ` and
@@ -179,7 +202,7 @@ Special data structures
179202 .. literalinclude :: pd_kmeans.py
180203 :caption: pd_kmeans.py
181204 :name: pd_kmeans.py
182- :lines: 2-4, 11-15
205+ :lines: 5-8, 16-19
183206
184207`scipy.spatial <https://docs.scipy.org/doc/scipy/reference/spatial.html >`_
185208 for spatial queries like distances, nearest neighbours, k-Means :abbr: `etc
@@ -190,7 +213,7 @@ Special data structures
190213 .. literalinclude :: sp_kmeans.py
191214 :caption: sp_kmeans.py
192215 :name: sp_kmeans.py
193- :lines: 6-9
216+ :lines: 5-13
194217
195218`scipy.sparse <https://docs.scipy.org/doc/scipy/reference/sparse.html >`_
196219 `sparse matrices <https://en.wikipedia.org/wiki/Sparse_matrix >`_
@@ -209,8 +232,8 @@ Special data structures
209232
210233 parallelise-pandas
211234
212- Select compiler
213- ---------------
235+ 5. Select compiler
236+ ------------------
214237
215238Faster CPython
216239~~~~~~~~~~~~~~
@@ -226,8 +249,47 @@ particular is likely to benefit from the changes; code already written in C,
226249I/O-heavy processes and multithreaded code, on the other hand, are unlikely to
227250benefit.
228251
252+ And indeed, the CPython versions have become significantly more efficient since
253+ then:
254+
255+ +------------------+---------+
256+ | Version | |
257+ +==================+=========+
258+ | CPython 3.10.4 | 1.422x |
259+ +------------------+---------+
260+ | CPython 3.12.0 | 1.093x |
261+ +------------------+---------+
262+ | CPython 3.13.0 | 1.024x |
263+ +------------------+---------+
264+ | CPython 3.15.0a0 | |
265+ | – Basis | |
266+ +------------------+---------+
267+
229268.. seealso ::
230- * `Faster CPython <https://web.archive.org/web/20221007175548/https://faster-cpython.readthedocs.io/ >`__
269+ * `Faster CPython
270+ <https://web.archive.org/web/20221007175548/https://faster-cpython.readthedocs.io/> `__
271+ * `Faster CPython Benchmark Infrastructure
272+ <https://github.com/faster-cpython/benchmarking-public?tab=readme-ov-file> `_
273+
274+ Free-threaded Python was also included in another comparison:
275+
276+ +---------------+---------+---------+---------------+----------+
277+ | Version | N-body | vs 3.14 | Spectral-norm | vs 3.14x |
278+ +===============+=========+=========+===============+==========+
279+ | CPython 3.10 | 1,663ms | 0.75x | 16,826ms | 0.83x |
280+ +---------------+---------+---------+---------------+----------+
281+ | CPython 3.11 | 1,200ms | 1.04x | 13,430ms | 1.05x |
282+ +---------------+---------+---------+---------------+----------+
283+ | CPython 3.13 | 1,134ms | 1.10x | 13,637ms | 1.03x |
284+ +---------------+---------+---------+---------------+----------+
285+ | CPython 3.14 | 1,242ms | | 14,046ms | |
286+ | – Basis | | | | |
287+ +---------------+---------+---------+---------------+----------+
288+ | CPython 3.14t | 1,513ms | 0.82x | 14,551ms | 0.97x |
289+ +---------------+---------+---------+---------------+----------+
290+
291+ – Source: `The Optimization Ladder
292+ <https://cemrehancavdar.com/2026/03/10/optimization-ladder/> `_
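
The factors in the table appear to be the baseline time divided by the time of
the respective version, so values below 1 mean slower than CPython 3.14. Under
this assumption, the Spectral-norm column can be recomputed from the timings; the
variable names are only illustrative:

```python
# Spectral-norm timings in milliseconds, copied from the table
spectral_norm_ms = {
    "CPython 3.10": 16826,
    "CPython 3.11": 13430,
    "CPython 3.13": 13637,
    "CPython 3.14t": 14551,
}
baseline_ms = 14046  # CPython 3.14

# Factor = baseline time / version time, rounded to two decimals
speedups = {
    version: round(baseline_ms / ms, 2)
    for version, ms in spectral_norm_ms.items()
}
for version, factor in speedups.items():
    print(f"{version}: {factor:.2f}x")
```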
231293
232294If you don’t want to wait with your project until the release of Python 3.11 in
233295the final version probably on 24 October 2022, you can also have a look at the
@@ -247,12 +309,32 @@ Python JIT compiler
247309 <https://github.com/python/cpython/blob/main/Tools/jit/README.md> `_
248310 * :ref: `whatsnew315-jit `
249311
312+ +------------------+---------+
313+ | Version | |
314+ +==================+=========+
315+ | CPython 3.15.0a0 | 1.001x |
316+ | (JIT) | |
317+ +------------------+---------+
318+ | CPython 3.15.0a0 | |
319+ | – Basis | |
320+ +------------------+---------+
321+
250322Cython
251323~~~~~~
252324
253325For intensive numerical operations, Python can be very slow, even if you have
254326avoided all anti-patterns and used vectorisations with NumPy. In this case,
255327translating code into `Cython <https://cython.org >`_ can be helpful.
328+
329+ +---------------+---------+---------+---------------+----------+
330+ | Version | N-body | vs 3.14 | Spectral-norm | vs 3.14x |
331+ +===============+=========+=========+===============+==========+
332+ | CPython 3.14 | 1,242ms | | 14,046ms | |
333+ | – Basis | | | | |
334+ +---------------+---------+---------+---------------+----------+
335+ | Cython | 10ms | 124x | 142ms | 99x |
336+ +---------------+---------+---------+---------------+----------+
337+
256338Unfortunately, the code often has to be restructured and thus increases in
257339complexity. Explicit type annotations and the distribution of the code also
258340become more cumbersome.
@@ -262,7 +344,7 @@ Our example could then look like this:
262344.. literalinclude :: cy_kmeans.pyx
263345 :caption: cy_kmeans.pyx
264346 :name: cy_kmeans.pyx
265- :lines: 1-28
347+ :lines: 5-32
266348
267349.. seealso ::
268350 * `Cython Tutorials
@@ -277,13 +359,27 @@ scientific Python and NumPy code into fast machine code, for example:
277359.. literalinclude :: nb_kmeans.py
278360 :caption: nb_kmeans.py
279361 :name: nb_kmeans.py
280- :lines: 1-25
362+ :lines: 5-29
281363
282364However, Numba requires `LLVM <https://en.wikipedia.org/wiki/LLVM >`_ and some
283365Python constructs are not supported.
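
If Numba cannot be installed everywhere, one possible pattern is to fall back to
the plain Python function. This is a sketch and not part of ``nb_kmeans.py``; the
no-op ``njit`` replacement and ``sum_of_squares`` are hypothetical:

```python
try:
    from numba import njit
except ImportError:
    # Numba (or its LLVM backend) is unavailable: use a no-op decorator
    # that supports both @njit and @njit(...)
    def njit(*args, **kwargs):
        if len(args) == 1 and callable(args[0]) and not kwargs:
            return args[0]
        return lambda func: func


@njit(cache=False)
def sum_of_squares(n):
    # Simple numerical loop that Numba can compile to machine code
    total = 0.0
    for i in range(n):
        total += i * i
    return total


result = sum_of_squares(1000)
```

The code then runs with or without Numba, only at different speeds.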
284366
285- Task planner
286- ------------
367+ +---------------+---------+---------+---------------+----------+
368+ | Version | N-body | vs 3.14 | Spectral-norm | vs 3.14x |
369+ +===============+=========+=========+===============+==========+
370+ | CPython 3.14 | 1,242ms | | 14,046ms | |
371+ | – Basis | | | | |
372+ +---------------+---------+---------+---------------+----------+
373+ | Numba | 22ms | 56x | 104ms | 135x |
374+ +---------------+---------+---------+---------------+----------+
375+
376+ .. seealso ::
377+ * `Speeding up NumPy with parallelism
378+ <https://pythonspeed.com/articles/numpy-parallelism/> `_ by Itamar
379+ Turner-Trauring
380+
381+ 6. Task planner
382+ ---------------
287383
288384:doc: `jupyter-tutorial:hub/ipyparallel/index `, :doc: `dask ` and `Ray
289385<https://docs.ray.io/en/latest/> `_ can distribute tasks in a cluster. In doing
@@ -319,7 +415,7 @@ Our example could look like this with Dask:
319415.. literalinclude :: ds_kmeans.py
320416 :caption: ds_kmeans.py
321417 :name: ds_kmeans.py
322- :lines: 1 -
418+ :lines: 5 -
323419
324420.. toctree ::
325421 :hidden:
@@ -328,8 +424,8 @@ Our example could look like this with Dask:
328424
329425 dask.ipynb
330426
331- Multithreading, Multiprocessing and Async
332- -----------------------------------------
427+ 7. Multithreading, Multiprocessing and Async
428+ --------------------------------------------
333429
334430After a brief :doc: `overview <multiprocessing-threading-async >`, three examples
335431of :doc: `threading <threading-example >`, :doc: `multiprocessing