Don't panic in get_signal when a finished compile has neither code nor signal#2746
Closed
jetm wants to merge 1 commit into
Closed
Don't panic in get_signal when a finished compile has neither code nor signal#2746jetm wants to merge 1 commit into
jetm wants to merge 1 commit into
Conversation
get_signal did `status.signal().expect("must have signal")`, assuming the
Unix invariant that an ExitStatus with no exit code was terminated by a
signal. That does not always hold: an ExitStatus reconstructed for a
distributed compile (or an abnormal wait status such as WIFSTOPPED) can
report neither a code nor a signal. When that happened the expect() panicked
the compile task, which the server surfaced as a misleading "Failed to bind
socket" and, under load, repeatedly fell back to local compilation.
Return Option<i32> from get_signal and assign it straight into res.signal, so
a compile that reports neither code nor signal leaves res.signal unset
instead of crashing the in-flight task. The Windows arm returns None rather
than panicking; ExitStatus::code() is always Some there, so the signal branch
is never reached anyway.
Add a unit test covering a real terminating signal (SIGKILL) and the
neither-code-nor-signal case (WIFSTOPPED via from_raw), which previously
panicked.
Signed-off-by: Javier Tia <javier@peridio.com>
Author
|
Superseded by #2750, which combines this with the rest of the OpenEmbedded/Yocto distributed-compile fixes into a single series. Closing in favor of that PR. |
This file contains hidden or bidirectional Unicode text that may be interpreted or compiled differently than what appears below. To review, open the file in an editor that reveals hidden Unicode characters.
Learn more about bidirectional Unicode characters
Sign up for free
to join this conversation on GitHub.
Already have an account?
Sign in to comment
Add this suggestion to a batch that can be applied as a single commit.This suggestion is invalid because no changes were made to the code.Suggestions cannot be applied while the pull request is closed.Suggestions cannot be applied while viewing a subset of changes.Only one suggestion per line can be applied in a batch.Add this suggestion to a batch that can be applied as a single commit.Applying suggestions on deleted lines is not supported.You must change the existing code in this line in order to create a valid suggestion.Outdated suggestions cannot be applied.This suggestion has been applied or marked resolved.Suggestions cannot be applied from pending reviews.Suggestions cannot be applied on multi-line comments.Suggestions cannot be applied while the pull request is queued to merge.Suggestion cannot be applied right now. Please check back later.
Problem
get_signaldoesstatus.signal().expect("must have signal"), assuming the Unixinvariant that an
ExitStatuswith no exit code was terminated by a signal. Thatdoes not always hold. When the server reconstructs an
ExitStatusfor adistributed compile whose compiler died abnormally, the status can carry neither
an exit code nor a terminating signal, and the
expect()panics the compile task.I hit this with
sccache-dist: when the build server's packaged toolchain failsto start a compile (e.g.
gcc: error while loading shared libraries: libm.so.6),the synthesized status has neither a code nor a signal. The panic crashes the
in-flight task, which the server then surfaces as a misleading
Failed to bind socket: ... panicked with message "must have signal", and underload it recurs on every such compile.
Fix
Return
Option<i32>fromget_signaland assign it straight intores.signal,so a compile that reports neither a code nor a signal leaves
res.signalunsetinstead of panicking.
This matches the contract the client already implements:
handle_compile_finishedin
commands.rshandles the case where bothretcodeandsignalareNonebyprinting
Missing compiler exit status!and returning-3. The server simplynever reached that path because it panicked first.
The Windows arm returns
Nonerather thanpanic!;ExitStatus::code()is alwaysSomethere, so the signal branch is unreachable anyway.Test
Added a unit test covering a real terminating signal (SIGKILL) and the
neither-code-nor-signal case (
WIFSTOPPEDviaExitStatusExt::from_raw), whichpreviously panicked.
With the fix, the same
sccache-distscenario now reportsMissing compiler exit status!and falls back to local compilation instead of crashing the server.