Commit 3b8af0a

Apply suggestions
1 parent df300e1 commit 3b8af0a

1 file changed

Lines changed: 25 additions & 22 deletions

peps/pep-0831.rst

@@ -39,7 +39,8 @@ quickly and reliably. Omitting them (the compiler's default at ``-O1`` and
 above) prevents these tools from producing useful call stacks for Python
 processes, and undermines the perf trampoline support CPython shipped in 3.12.

-The measured overhead is under 2% geometric mean for typical workloads.
+The measured overhead is under 2% geometric mean for typical workloads
+(see `Backwards Compatibility`_ for per-platform numbers).
 Multiple major Linux distributions, language runtimes, and Python ecosystem
 tools have already adopted this change. No existing PEP covers this topic;
 CPython issue `#96174`_ has been open since August 2022 without resolution.
@@ -67,8 +68,10 @@ default experience for Python.

 The performance wins that profiling enables far outweigh the modest overhead of
 frame pointers. As Brendan Gregg notes: "I've seen frame pointers help find
-performance wins ranging from 5% to 500%" [#gregg2024]_. A 0.5-2% overhead that
-unlocks the ability to find 5-500% improvements is a favourable trade.
+performance wins ranging from 5% to 500%" [#gregg2024]_. These wins come from
+identifying hot paths in production systems; they are not about CPython's own
+overhead, but about what profiling enables across the full stack. A 0.5-2%
+overhead that unlocks such insights is a favourable trade.

 What Are Frame Pointers?
 ------------------------
@@ -79,7 +82,7 @@ arguments, and the address to return to when the function finishes. The **call
 stack** is the chain of all active stack frames: it records which function
 called which, all the way from ``main()`` to the function currently executing.

-A **frame pointer** is a CPU register (``%rbp`` on x86-64, ``x29`` on AArch64)
+A **frame pointer** is a CPU register (for example, ``%rbp`` on x86-64, ``x29`` on AArch64)
 that each function sets to point to the base of its own stack frame. Each
 frame also stores the *previous* frame pointer, creating a linked list through
 the entire call stack::
@@ -101,23 +104,24 @@ the entire call stack::
 Stack unwinding is the process of walking this chain to reconstruct the call
 stack. Profilers do it to find out where the program is spending time;
 debuggers do it to show backtraces; crash handlers do it to produce useful
-error reports. With frame pointers, unwinding is a simply following pointers: read
-``%rbp``, follow the link, repeat. It is very fast and requires no
-external data.
+error reports. With frame pointers, unwinding is simply following pointers: read
+``%rbp``, follow the link, repeat. It requires no external data.

 At optimisation levels ``-O1`` and above, GCC and Clang omit frame pointers by
 default [#gcc_fomit]_. This frees the ``%rbp`` register for general use,
 giving the optimiser one more register to work with. On x86-64 this is a gain
 of one register out of 16 (about 7%). The performance benefit is small
-(typically 0.5-2%) but it was considered worthwhile when the convention was
-established for 32-bit x86, where the gain was one register out of 6 (~20%).
+(typically a few percent) but it was considered worthwhile when the convention
+was established for 32-bit x86, where the gain was one register out of 6
+(~20%). See `Detailed Performance Analysis of CPython with Frame Pointers`_
+for a full breakdown by platform and workload.

 Without frame pointers, the linked list does not exist. Tools that need to
 walk the call stack must instead parse DWARF debug information (a complex,
-variable-length encoding of how each function laid out its stack frame). This
-is slower, more fragile, and impossible in some contexts (such as inside the
-Linux kernel). In the worst case, tools simply produce broken or incomplete
-results.
+variable-length encoding of how each function laid out its stack frame) or,
+on Windows, ``.pdata`` / ``.xdata`` unwind metadata. This is slower, more
+fragile, and impossible in some contexts (such as inside the Linux kernel).
+In the worst case, tools simply produce broken or incomplete results.

 Here is a concrete example. A ``perf`` profile of a Python process **without**
 frame pointers typically shows::
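The frame-pointer chain and the "read ``%rbp``, follow the link, repeat" loop described in the hunks above can be sketched as a toy simulation. This is an illustrative model only, not how ``perf`` or any real unwinder is implemented: it assumes an x86-64-style layout in which the frame pointer addresses a slot holding the caller's saved frame pointer, with the return address one 8-byte word above it, and it uses a Python ``dict`` in place of process memory. ``ToyStack`` and ``unwind`` are hypothetical names.

```python
from dataclasses import dataclass, field

WORD = 8  # assumed 8-byte words, as on x86-64


@dataclass
class ToyStack:
    """Toy model of a downward-growing stack with frame pointers (not real memory)."""

    memory: dict = field(default_factory=dict)  # address -> stored value
    fp: int = 0  # current frame pointer; 0 means "no caller" in this model
    sp: int = 0x7FFF_0000  # arbitrary starting stack pointer

    def push_frame(self, return_address: int) -> None:
        # What a call instruction does: push the return address.
        self.sp -= WORD
        self.memory[self.sp] = return_address
        # What a frame-pointer prologue does: push the old frame pointer,
        # then make the frame pointer point at that saved value.
        self.sp -= WORD
        self.memory[self.sp] = self.fp
        self.fp = self.sp


def unwind(memory: dict, fp: int) -> list:
    """Walk the frame-pointer chain, collecting return addresses (innermost first)."""
    return_addresses = []
    while fp != 0:
        return_addresses.append(memory[fp + WORD])  # return address sits one word above
        fp = memory[fp]  # saved caller frame pointer: the next link in the chain
    return return_addresses


if __name__ == "__main__":
    stack = ToyStack()
    for ret in (0x1000, 0x2000, 0x3000):  # three nested calls
        stack.push_frame(ret)
    print([hex(a) for a in unwind(stack.memory, stack.fp)])
    # prints ['0x3000', '0x2000', '0x1000']
```

If one function in the chain skipped the prologue that saves and updates the frame pointer, this walk would either skip that function's frame or follow an unrelated value as the next link, which is the broken chain the text describes.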
@@ -322,8 +326,7 @@ BPF programs for production monitoring), frame pointers are the only viable
 unwinding mechanism because they are the only mechanism the kernel's built-in
 helpers support.

-:ref:`CPython's own documentation <python:perf_profiling>`__ already states the
-recommended fix:
+CPython's own documentation already states the recommended fix:

 For best results, Python should be compiled with
 ``CFLAGS="-fno-omit-frame-pointer -mno-omit-leaf-frame-pointer"``
@@ -393,7 +396,7 @@ extension builds. If only the interpreter has frame pointers but extensions do
 not, the chain is still broken at every C extension boundary. By adding the
 flags to ``CFLAGS`` as reported by ``sysconfig``, extension builds that consume
 CPython's compiler flags (for example via ``pip install``, Setuptools, or
-``python setup.py build``) will inherit frame pointers by default. Extensions
+other build backends) will inherit frame pointers by default. Extensions
 and libraries with independent build systems still need to enable the same
 flags themselves for the frame-pointer chain to remain continuous.

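To check whether a given interpreter already reports the flags that extension builds would inherit, one can inspect ``sysconfig.get_config_var("CFLAGS")``. The sketch below is a minimal check under stated assumptions: it only looks at flags written explicitly in ``CFLAGS``, and it assumes the GCC/Clang rule that the last of a conflicting pair (``-fno-omit-frame-pointer`` versus ``-fomit-frame-pointer``) wins. The helper name ``missing_frame_pointer_flags`` is hypothetical, and on builds whose ``sysconfig`` data has no ``CFLAGS`` entry (for example, Windows) the lookup may return ``None``.

```python
import shlex
import sysconfig

# Each recommended flag, paired with the option that would undo it.
# Assumption: for GCC and Clang, the last occurrence on the command line wins.
FLAG_PAIRS = {
    "-fno-omit-frame-pointer": "-fomit-frame-pointer",
    "-mno-omit-leaf-frame-pointer": "-momit-leaf-frame-pointer",
}


def missing_frame_pointer_flags(cflags: str) -> list:
    """Return the recommended flags that are absent from, or overridden in, *cflags*."""
    tokens = shlex.split(cflags)
    missing = []
    for wanted, undo in FLAG_PAIRS.items():
        last_wanted = max((i for i, tok in enumerate(tokens) if tok == wanted), default=-1)
        last_undo = max((i for i, tok in enumerate(tokens) if tok == undo), default=-1)
        if last_wanted <= last_undo:  # never given, or undone by a later flag
            missing.append(wanted)
    return missing


if __name__ == "__main__":
    cflags = sysconfig.get_config_var("CFLAGS") or ""  # may be None on some platforms
    missing = missing_frame_pointer_flags(cflags)
    print("missing:", " ".join(missing) if missing else "none")
```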
@@ -476,7 +479,7 @@ Using ``CFLAGS`` ensures:
 the ``python`` binary, ``libpython``, and built-in extension modules under
 ``Modules/``.
 2. The flags **are** written into the ``sysconfig`` data, so that third-party C
-extensions built against this Python (via ``pip``, ``setuptools``, or direct
+extensions built against this Python (via ``pip``, Setuptools, or direct
 ``sysconfig`` queries) inherit frame pointers by default.

 This is an intentional design choice. For profiling data to be useful, the
@@ -514,15 +517,15 @@ Ecosystem Impact

 Because the flags are in ``CFLAGS``, they propagate automatically to consumers
 that build against CPython's reported compiler flags, such as C extensions
-built via pip, Setuptools, or direct ``sysconfig`` queries. Those
+built via ``pip``, Setuptools, or direct ``sysconfig`` queries. Those
 consumers need take no additional action to benefit from this change.

 Not all compiled code in the Python ecosystem inherits CPython's ``CFLAGS``.
 Rust extensions built with ``pyo3`` or ``maturin``, C++ libraries with their
 own build systems, and embedding applications that compile CPython from source
 each manage their own compiler flags. This PEP recommends that all such
 projects also enable ``-fno-omit-frame-pointer -mno-omit-leaf-frame-pointer``
-in their builds. A frame-pointer chain is only as complete as its weakest
+in their builds. A frame-pointer chain is only as strong as its weakest
 link: a single library in the call stack without frame pointers breaks the
 chain for the entire process, regardless of whether CPython and every other
 library has them. The goal is that every native component in a Python process
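For a project that manages its own compiler flags, one possible way to follow this recommendation is to add the flags to the environment before invoking the build. This sketch assumes a build system that reads ``CFLAGS`` and ``CXXFLAGS`` from the environment and splits them on whitespace; build systems that take flags another way need their own equivalent. ``env_with_frame_pointers`` is a hypothetical helper. It moves the flags to the end because GCC and Clang honour the last of a conflicting pair, so an earlier ``-fomit-frame-pointer`` is overridden.

```python
import os

FRAME_POINTER_FLAGS = ["-fno-omit-frame-pointer", "-mno-omit-leaf-frame-pointer"]


def env_with_frame_pointers(base=None, variables=("CFLAGS", "CXXFLAGS")):
    """Return a copy of *base* (default: os.environ) with frame-pointer flags last.

    Assumes flags are whitespace-separated and contain no quoted spaces.
    """
    env = dict(os.environ if base is None else base)
    for var in variables:
        # Drop existing copies so each flag appears once, then append so it
        # comes after (and overrides) any -fomit-frame-pointer already present.
        kept = [tok for tok in env.get(var, "").split() if tok not in FRAME_POINTER_FLAGS]
        env[var] = " ".join(kept + FRAME_POINTER_FLAGS)
    return env
```

A build could then be launched with, for example, ``subprocess.run(["make"], env=env_with_frame_pointers(), check=True)``, where ``make`` stands in for whatever command the project actually uses.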
@@ -838,7 +841,7 @@ Machine Geometric mean effect
 ===================================== =======================
 Apple M2 Mac Mini (arm64)             0.1% faster
 macOS M3 Pro (arm64)                  0.1% slower
-Raspberry Pi (aarch64).               0.2% slower
+Raspberry Pi (aarch64)                0.2% slower
 Ampere Altra Max (aarch64)            0.9% faster
 AWS Graviton c7g.16xlarge (aarch64)   0.8% slower
 Intel i7 12700H (x86-64)              1.9% slower
@@ -1053,7 +1056,7 @@ Footnotes
 https://github.com/Fidget-Spinner/python-framepointer-bench

 .. [#missing_benchmarks] Some benchmarks are missing due to incompatibilities with
-Python 3.15alpha.
+Python 3.15 alpha.

 .. _#96174: https://github.com/python/cpython/issues/96174
 .. _python/cpython issue #96174: https://github.com/python/cpython/issues/96174
@@ -1065,7 +1068,7 @@ Appendix

 For all graphs below, the green dots are geometric means of the
 individual benchmark's median, while orange lines are the median of our data points.
-Hollow circles reperesent outliers.
+Hollow circles represent outliers.

 The first graph is the overall effect on pyperformance seen on each system.
 All system configurations have below 2% geometric mean and median slowdown:
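The summary statistic described in the appendix text (a geometric mean, across benchmarks, of each benchmark's median) can be sketched as follows. This is one plausible reading, not the script used to produce the graphs: it assumes each benchmark is reduced to the median of its repeated timings for each build, the two builds are compared as a ratio, and the ratios are combined with ``statistics.geometric_mean`` (Python 3.8+). ``geometric_mean_effect`` and the timing data are hypothetical.

```python
import statistics


def geometric_mean_effect(baseline, candidate):
    """Geometric-mean change of *candidate* relative to *baseline*.

    Both arguments map benchmark name -> list of timings (lower is better).
    Returns a fraction: 0.02 means 2% slower, -0.01 means 1% faster.
    """
    ratios = []
    for name, base_times in baseline.items():
        # Reduce each benchmark to the median of its runs, then compare builds.
        ratios.append(statistics.median(candidate[name]) / statistics.median(base_times))
    return statistics.geometric_mean(ratios) - 1.0


if __name__ == "__main__":
    # Hypothetical timings in seconds, not measured data.
    without_fp = {"bench_a": [1.00, 1.10, 0.90], "bench_b": [2.00, 2.00, 2.05]}
    with_fp = {"bench_a": [1.02, 1.00, 1.04], "bench_b": [1.96, 2.01, 1.97]}
    print(f"{geometric_mean_effect(without_fp, with_fp):+.2%}")
```

A geometric mean of ratios treats a 2x slowdown and a 2x speedup symmetrically (they cancel to 1.0), so no single benchmark with a large ratio dominates the summary.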
