@@ -39,7 +39,8 @@ quickly and reliably. Omitting them (the compiler's default at ``-O1`` and
 above) prevents these tools from producing useful call stacks for Python
 processes, and undermines the perf trampoline support CPython shipped in 3.12.
 
-The measured overhead is under 2% geometric mean for typical workloads.
+The measured overhead is under 2% geometric mean for typical workloads
+(see `Backwards Compatibility`_ for per-platform numbers).
 Multiple major Linux distributions, language runtimes, and Python ecosystem
 tools have already adopted this change. No existing PEP covers this topic;
 CPython issue `#96174`_ has been open since August 2022 without resolution.
@@ -67,8 +68,10 @@ default experience for Python.
 
 The performance wins that profiling enables far outweigh the modest overhead of
 frame pointers. As Brendan Gregg notes: "I've seen frame pointers help find
-performance wins ranging from 5% to 500%" [#gregg2024]_. A 0.5-2% overhead that
-unlocks the ability to find 5-500% improvements is a favourable trade.
+performance wins ranging from 5% to 500%" [#gregg2024]_. These wins come from
+identifying hot paths in production systems; they are not about CPython's own
+overhead, but about what profiling enables across the full stack. A 0.5-2%
+overhead that unlocks such insights is a favourable trade.
 
 What Are Frame Pointers?
 ------------------------
@@ -79,7 +82,7 @@ arguments, and the address to return to when the function finishes. The **call
 stack** is the chain of all active stack frames: it records which function
 called which, all the way from ``main()`` to the function currently executing.
 
-A **frame pointer** is a CPU register (``%rbp`` on x86-64, ``x29`` on AArch64)
+A **frame pointer** is a CPU register (for example, ``%rbp`` on x86-64, ``x29`` on AArch64)
 that each function sets to point to the base of its own stack frame. Each
 frame also stores the *previous* frame pointer, creating a linked list through
 the entire call stack::
@@ -101,23 +104,24 @@ the entire call stack::
 Stack unwinding is the process of walking this chain to reconstruct the call
 stack. Profilers do it to find out where the program is spending time;
 debuggers do it to show backtraces; crash handlers do it to produce useful
-error reports. With frame pointers, unwinding is a simply following pointers: read
-``%rbp``, follow the link, repeat. It is very fast and requires no
-external data.
+error reports. With frame pointers, unwinding is simply following pointers: read
+``%rbp``, follow the link, repeat. It requires no external data.
 
 At optimisation levels ``-O1`` and above, GCC and Clang omit frame pointers by
 default [#gcc_fomit]_. This frees the ``%rbp`` register for general use,
 giving the optimiser one more register to work with. On x86-64 this is a gain
 of one register out of 16 (about 7%). The performance benefit is small
-(typically 0.5-2%) but it was considered worthwhile when the convention was
-established for 32-bit x86, where the gain was one register out of 6 (~20%).
+(typically a few percent) but it was considered worthwhile when the convention
+was established for 32-bit x86, where the gain was one register out of 6
+(~20%). See `Detailed Performance Analysis of CPython with Frame Pointers`_
+for a full breakdown by platform and workload.
 
 Without frame pointers, the linked list does not exist. Tools that need to
 walk the call stack must instead parse DWARF debug information (a complex,
-variable-length encoding of how each function laid out its stack frame). This
-is slower, more fragile, and impossible in some contexts (such as inside the
-Linux kernel). In the worst case, tools simply produce broken or incomplete
-results.
+variable-length encoding of how each function laid out its stack frame) or,
+on Windows, ``.pdata`` / ``.xdata`` unwind metadata. This is slower, more
+fragile, and impossible in some contexts (such as inside the Linux kernel).
+In the worst case, tools simply produce broken or incomplete results.
 
 Here is a concrete example. A ``perf`` profile of a Python process **without**
 frame pointers typically shows::
@@ -322,8 +326,7 @@ BPF programs for production monitoring), frame pointers are the only viable
 unwinding mechanism because they are the only mechanism the kernel's built-in
 helpers support.
 
-:ref:`CPython's own documentation <python:perf_profiling>`__ already states the
-recommended fix:
+CPython's own documentation already states the recommended fix:
 
     For best results, Python should be compiled with
     ``CFLAGS="-fno-omit-frame-pointer -mno-omit-leaf-frame-pointer"``
@@ -393,7 +396,7 @@ extension builds. If only the interpreter has frame pointers but extensions do
 not, the chain is still broken at every C extension boundary. By adding the
 flags to ``CFLAGS`` as reported by ``sysconfig``, extension builds that consume
 CPython's compiler flags (for example via ``pip install``, Setuptools, or
-``python setup.py build``) will inherit frame pointers by default. Extensions
+other build backends) will inherit frame pointers by default. Extensions
 and libraries with independent build systems still need to enable the same
 flags themselves for the frame-pointer chain to remain continuous.
 
@@ -476,7 +479,7 @@ Using ``CFLAGS`` ensures:
    the ``python`` binary, ``libpython``, and built-in extension modules under
    ``Modules/``.
 2. The flags **are** written into the ``sysconfig`` data, so that third-party C
-   extensions built against this Python (via ``pip``, ``setuptools``, or direct
+   extensions built against this Python (via ``pip``, Setuptools, or direct
    ``sysconfig`` queries) inherit frame pointers by default.
 
 This is an intentional design choice. For profiling data to be useful, the
@@ -514,15 +517,15 @@ Ecosystem Impact
 
 Because the flags are in ``CFLAGS``, they propagate automatically to consumers
 that build against CPython's reported compiler flags, such as C extensions
-built via pip, Setuptools, or direct ``sysconfig`` queries. Those
+built via ``pip``, Setuptools, or direct ``sysconfig`` queries. Those
 consumers need take no additional action to benefit from this change.
 
 Not all compiled code in the Python ecosystem inherits CPython's ``CFLAGS``.
 Rust extensions built with ``pyo3`` or ``maturin``, C++ libraries with their
 own build systems, and embedding applications that compile CPython from source
 each manage their own compiler flags. This PEP recommends that all such
 projects also enable ``-fno-omit-frame-pointer -mno-omit-leaf-frame-pointer``
-in their builds. A frame-pointer chain is only as complete as its weakest
+in their builds. A frame-pointer chain is only as strong as its weakest
 link: a single library in the call stack without frame pointers breaks the
 chain for the entire process, regardless of whether CPython and every other
 library has them. The goal is that every native component in a Python process
@@ -838,7 +841,7 @@ Machine Geometric mean effect
 ===================================== =======================
 Apple M2 Mac Mini (arm64)             0.1% faster
 macOS M3 Pro (arm64)                  0.1% slower
-Raspberry Pi (aarch64).               0.2% slower
+Raspberry Pi (aarch64)                0.2% slower
 Ampere Altra Max (aarch64)            0.9% faster
 AWS Graviton c7g.16xlarge (aarch64)   0.8% slower
 Intel i7 12700H (x86-64)              1.9% slower
@@ -1053,7 +1056,7 @@ Footnotes
    https://github.com/Fidget-Spinner/python-framepointer-bench
 
 .. [#missing_benchmarks] Some benchmarks are missing due to incompatibilities with
-   Python 3.15alpha.
+   Python 3.15 alpha.
 
 .. _#96174: https://github.com/python/cpython/issues/96174
 .. _python/cpython issue #96174: https://github.com/python/cpython/issues/96174
@@ -1065,7 +1068,7 @@ Appendix
 
 For all graphs below, the green dots are geometric means of the
 individual benchmark's median, while orange lines are the median of our data points.
-Hollow circles reperesent outliers.
+Hollow circles represent outliers.
 
 The first graph is the overall effect on pyperformance seen on each system.
 All system configurations have below 2% geometric mean and median slowdown: