Fail bazel.bat early if 8.3 short names are disabled in Windows#52500
💡 Codex Review
Here are some automated review suggestions for this pull request.
Reviewed commit: a5e518b28f
ℹ️ About Codex in GitHub
Codex has been enabled to automatically review pull requests in this repo. Reviews are triggered when you
- Open a pull request for review
- Mark a draft as ready
- Comment "@codex review".
If Codex has suggestions, it will comment; otherwise it will react with 👍.
When you sign up for Codex through ChatGPT, Codex can also answer questions or update the PR, like "@codex address that feedback".
Files inventory check summary
File checks results against ancestor 29a81a22:
Results for datadog-agent_7.82.0~devel.git.242.966bcd3.pipeline.120237285-1_amd64.deb: No change detected
Regression Detector
Regression Detector Results
Metrics dashboard
Baseline: 5b40697
Optimization Goals: ✅ No significant changes detected
| perf | experiment | goal | Δ mean % | Δ mean % CI | trials | links |
|---|---|---|---|---|---|---|
| ➖ | quality_gate_logs | % cpu utilization | +1.16 | [+0.10, +2.22] | 1 | Logs bounds checks dashboard |
| ➖ | quality_gate_idle | memory utilization | +0.66 | [+0.61, +0.71] | 1 | Logs bounds checks dashboard |
| ➖ | quality_gate_security_idle | memory utilization | +0.07 | [+0.01, +0.14] | 1 | Logs bounds checks dashboard |
| ➖ | quality_gate_idle_all_features | memory utilization | -0.10 | [-0.14, -0.06] | 1 | Logs bounds checks dashboard |
| ➖ | quality_gate_security_no_fs_load | memory utilization | -0.50 | [-0.60, -0.40] | 1 | Logs bounds checks dashboard |
| ➖ | quality_gate_security_mean_fs_load | memory utilization | -0.60 | [-0.64, -0.56] | 1 | Logs bounds checks dashboard |
| ➖ | quality_gate_metrics_logs | memory utilization | -1.69 | [-1.93, -1.44] | 1 | Logs bounds checks dashboard |
Bounds Checks: ✅ Passed
| perf | experiment | bounds_check_name | replicates_passed | observed_value | links |
|---|---|---|---|---|---|
| ✅ | quality_gate_idle | intake_connections | 10/10 | 3 ≤ 4 | bounds checks dashboard |
| ✅ | quality_gate_idle | memory_usage | 10/10 | 145.44MiB ≤ 154MiB | bounds checks dashboard |
| ✅ | quality_gate_idle | total_bytes_received | 10/10 | 574.94KiB ≤ 819.20KiB | bounds checks dashboard |
| ✅ | quality_gate_idle_all_features | intake_connections | 10/10 | 3 ≤ 4 | bounds checks dashboard |
| ✅ | quality_gate_idle_all_features | memory_usage | 10/10 | 482.83MiB ≤ 495MiB | bounds checks dashboard |
| ✅ | quality_gate_idle_all_features | total_bytes_received | 10/10 | 0.89MiB ≤ 1.25MiB | bounds checks dashboard |
| ✅ | quality_gate_logs | intake_connections | 10/10 | 4 ≤ 6 | bounds checks dashboard |
| ✅ | quality_gate_logs | memory_usage | 10/10 | 181.33MiB ≤ 195MiB | bounds checks dashboard |
| ✅ | quality_gate_logs | missed_bytes | 10/10 | 0B = 0B | bounds checks dashboard |
| ✅ | quality_gate_logs | total_bytes_received | 10/10 | 263.88MiB ≤ 292MiB | bounds checks dashboard |
| ✅ | quality_gate_metrics_logs | cpu_usage | 10/10 | 330.40 ≤ 2000 | bounds checks dashboard |
| ✅ | quality_gate_metrics_logs | intake_connections | 10/10 | 3 ≤ 6 | bounds checks dashboard |
| ✅ | quality_gate_metrics_logs | memory_usage | 10/10 | 394.40MiB ≤ 430MiB | bounds checks dashboard |
| ✅ | quality_gate_metrics_logs | missed_bytes | 10/10 | 0B = 0B | bounds checks dashboard |
| ✅ | quality_gate_metrics_logs | total_bytes_received | 10/10 | 0.86GiB ≤ 1.04GiB | bounds checks dashboard |
| ✅ | quality_gate_security_idle | cpu_usage | 10/10 | 30.36 ≤ 40 | bounds checks dashboard |
| ✅ | quality_gate_security_idle | memory_usage | 10/10 | 299.09MiB ≤ 330MiB | bounds checks dashboard |
| ✅ | quality_gate_security_mean_fs_load | cpu_usage | 10/10 | 61.91 ≤ 70 | bounds checks dashboard |
| ✅ | quality_gate_security_mean_fs_load | memory_usage | 10/10 | 274.68MiB ≤ 320MiB | bounds checks dashboard |
| ✅ | quality_gate_security_no_fs_load | cpu_usage | 10/10 | 24.23 ≤ 40 | bounds checks dashboard |
| ✅ | quality_gate_security_no_fs_load | memory_usage | 10/10 | 277.66MiB ≤ 320MiB | bounds checks dashboard |
Explanation
Confidence level: 90.00%
Effect size tolerance: |Δ mean %| ≥ 5.00%
Performance changes are noted in the perf column of each table:
- ✅ = significantly better comparison variant performance
- ❌ = significantly worse comparison variant performance
- ➖ = no significant change in performance
A regression test is an A/B test of target performance in a repeatable rig, where "performance" is measured as "comparison variant minus baseline variant" for an optimization goal (e.g., ingress throughput). Due to intrinsic variability in measuring that goal, we can only estimate its mean value for each experiment; we report uncertainty in that value as a 90.00% confidence interval denoted "Δ mean % CI".
For each experiment, we flag a change in performance as a "regression" -- a change worth investigating further -- if all of the following criteria are true:
- Its estimated |Δ mean %| ≥ 5.00%, indicating the change is big enough to merit a closer look.
- Its 90.00% confidence interval "Δ mean % CI" does not contain zero, indicating that if our statistical model is accurate, there is at least a 90.00% chance there is a difference in performance between baseline and comparison variants.
- Its configuration does not mark it "erratic".
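The three criteria above can be sketched as a small predicate. This is only an illustration in Python, not the Regression Detector's actual implementation: the `ExperimentResult` type, its field names, and `is_regression` are hypothetical, and the 5.00% tolerance is taken from the "Effect size tolerance" line of this report.

```python
from dataclasses import dataclass

# Effect size tolerance quoted in the report above (|Δ mean %| ≥ 5.00%).
EFFECT_SIZE_TOLERANCE_PCT = 5.00


@dataclass(frozen=True)
class ExperimentResult:
    """Hypothetical container for one row of the optimization-goals table."""
    name: str
    delta_mean_pct: float  # "Δ mean %"
    ci_low: float          # lower bound of "Δ mean % CI"
    ci_high: float         # upper bound of "Δ mean % CI"
    erratic: bool = False  # whether the experiment's configuration marks it "erratic"


def is_regression(result: ExperimentResult,
                  tolerance: float = EFFECT_SIZE_TOLERANCE_PCT) -> bool:
    """Return True only when all three criteria from the report hold."""
    big_enough = abs(result.delta_mean_pct) >= tolerance
    ci_excludes_zero = not (result.ci_low <= 0.0 <= result.ci_high)
    return big_enough and ci_excludes_zero and not result.erratic


# quality_gate_logs from the table above: the CI excludes zero,
# but the effect (+1.16%) is below the 5.00% tolerance.
logs = ExperimentResult("quality_gate_logs", 1.16, 0.10, 2.22)
print(is_regression(logs))
```

Under these assumptions, none of the rows in the table above would be flagged, which matches the "No significant changes detected" summary.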
Replicate Execution Details
We run multiple replicates for each experiment/variant. However, we allow replicates to be automatically retried if there are any failures, up to 8 times, at which point the replicate is marked dead and we are unable to run analysis for the entire experiment. We call each of these attempts at running replicates a replicate execution. This section lists all replicate executions that failed due to the target crashing or being oom killed.
Note: In the tables below, we bucket failures by experiment, variant, and failure type. For each bucket, we list the replicate indexes that failed, annotated with how many times that replicate failed with the given failure mode. In the example below, the baseline variant of the experiment named experiment_with_failures had two replicates that failed by OOM kills: replicate 0, which failed 8 executions, and replicate 1, which failed 6 executions, both with the same failure mode.
| Experiment | Variant | Replicates | Failure | Logs | Debug Dashboard |
|---|---|---|---|---|---|
| experiment_with_failures | baseline | 0 (x8) 1 (x6) | Oom killed | Debug Dashboard |
The debug dashboard links will take you to a debugging dashboard specifically designed to investigate replicate execution failures.
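The bucketing described in the note above can be sketched as follows. This is a hypothetical Python illustration of the idea only: the tuple layout and the `bucket_failures` and `format_replicates` helpers are assumptions, not the detector's real code.

```python
from collections import Counter, defaultdict


def bucket_failures(executions):
    """Group failed replicate executions by (experiment, variant, failure type).

    `executions` is an iterable of (experiment, variant, replicate_index, failure)
    tuples, one per failed execution (a hypothetical input shape).
    """
    buckets = defaultdict(Counter)
    for experiment, variant, replicate, failure in executions:
        buckets[(experiment, variant, failure)][replicate] += 1
    return buckets


def format_replicates(counts):
    """Render per-replicate failure counts the way the example table does."""
    return " ".join(f"{index} (x{n})" for index, n in sorted(counts.items()))


# Reproduce the example row: replicate 0 failed 8 executions, replicate 1 failed 6.
executions = (
    [("experiment_with_failures", "baseline", 0, "Oom killed")] * 8
    + [("experiment_with_failures", "baseline", 1, "Oom killed")] * 6
)
buckets = bucket_failures(executions)
row = buckets[("experiment_with_failures", "baseline", "Oom killed")]
print(format_replicates(row))  # 0 (x8) 1 (x6)
```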
❌ Retried Profiling Replicate Execution Failures (ddprof)
Note: Profiling replicas may still be executing. See the debug dashboard for up to date status.
| Experiment | Variant | Replicates | Failure | Debug Dashboard |
|---|---|---|---|---|
| quality_gate_idle_all_features | baseline | 10 | Oom killed | Debug Dashboard |
| quality_gate_idle_all_features | comparison | 10 | Oom killed | Debug Dashboard |
| quality_gate_metrics_logs | baseline | 10 | Oom killed | Debug Dashboard |
| quality_gate_metrics_logs | comparison | 10 | Oom killed | Debug Dashboard |
| quality_gate_security_idle | comparison | 10 | Oom killed | Debug Dashboard |
| quality_gate_security_no_fs_load | baseline | 10 | Oom killed | Debug Dashboard |
| quality_gate_security_no_fs_load | comparison | 10 | Oom killed | Debug Dashboard |
CI Pass/Fail Decision
✅ Passed. All Quality Gates passed.
- quality_gate_idle_all_features, bounds check memory_usage: 10/10 replicas passed. Gate passed.
- quality_gate_idle_all_features, bounds check intake_connections: 10/10 replicas passed. Gate passed.
- quality_gate_idle_all_features, bounds check total_bytes_received: 10/10 replicas passed. Gate passed.
- quality_gate_logs, bounds check total_bytes_received: 10/10 replicas passed. Gate passed.
- quality_gate_logs, bounds check memory_usage: 10/10 replicas passed. Gate passed.
- quality_gate_logs, bounds check missed_bytes: 10/10 replicas passed. Gate passed.
- quality_gate_logs, bounds check intake_connections: 10/10 replicas passed. Gate passed.
- quality_gate_idle, bounds check total_bytes_received: 10/10 replicas passed. Gate passed.
- quality_gate_idle, bounds check memory_usage: 10/10 replicas passed. Gate passed.
- quality_gate_idle, bounds check intake_connections: 10/10 replicas passed. Gate passed.
- quality_gate_security_idle, bounds check memory_usage: 10/10 replicas passed. Gate passed.
- quality_gate_security_idle, bounds check cpu_usage: 10/10 replicas passed. Gate passed.
- quality_gate_metrics_logs, bounds check cpu_usage: 10/10 replicas passed. Gate passed.
- quality_gate_metrics_logs, bounds check intake_connections: 10/10 replicas passed. Gate passed.
- quality_gate_metrics_logs, bounds check missed_bytes: 10/10 replicas passed. Gate passed.
- quality_gate_metrics_logs, bounds check memory_usage: 10/10 replicas passed. Gate passed.
- quality_gate_metrics_logs, bounds check total_bytes_received: 10/10 replicas passed. Gate passed.
- quality_gate_security_mean_fs_load, bounds check memory_usage: 10/10 replicas passed. Gate passed.
- quality_gate_security_mean_fs_load, bounds check cpu_usage: 10/10 replicas passed. Gate passed.
- quality_gate_security_no_fs_load, bounds check cpu_usage: 10/10 replicas passed. Gate passed.
- quality_gate_security_no_fs_load, bounds check memory_usage: 10/10 replicas passed. Gate passed.
a5e518b to
9bb4a2e
Compare
💡 Codex Review
Here are some automated review suggestions for this pull request.
Reviewed commit: 9bb4a2eabf
9bb4a2e to
fb33f68
Compare
Title changed from "Fail bazel hook early if 8.3 short names are disabled in Windows" to "Fail bazel.bat early if 8.3 short names are disabled in Windows"
💡 Codex Review
Here are some automated review suggestions for this pull request.
Reviewed commit: fb33f68193
fb33f68 to
c137e67
Compare
💡 Codex Review
Here are some automated review suggestions for this pull request.
Reviewed commit: c137e67e87
c137e67 to
2df9b24
Compare
2df9b24 to
7b6ed92
Compare
Static quality checks
✅ Please find below the results from static quality gates
33 successful checks with minimal change (< 2 KiB)
Gitlab CI Configuration Changes
Modified Jobs
variables (configuration)
variables:
AGENT_API_KEY_ORG2: agent-api-key-org-2
AGENT_APP_KEY_ORG2: agent-app-key-org-2
AGENT_BINARIES_DIR: bin/agent
AGENT_QA_E2E: agent-qa-e2e
API_KEY_ORG2: ci.datadog-agent.datadog_api_key_org2
ARTIFACT_DOWNLOAD_ATTEMPTS: 2
ATLASSIAN_WRITE: atlassian-write
BTFHUB_ARCHIVE_BRANCH: main
BUCKET_BRANCH: dev
CACHE_COMPRESSION_LEVEL: slowest
CHANGELOG_COMMIT_SHA: ci.datadog-agent.gitlab_changelog_commit_sha
CHOCOLATEY_API_KEY: ci.datadog-agent.chocolatey_api_key
CI_IMAGE_BTF_GEN: v118883148-8fa6a628
CI_IMAGE_BTF_GEN_SUFFIX: ''
CI_IMAGE_DOCKER_ARM64: v118883148-8fa6a628
CI_IMAGE_DOCKER_ARM64_SUFFIX: ''
CI_IMAGE_DOCKER_X64: v118883148-8fa6a628
CI_IMAGE_DOCKER_X64_SUFFIX: ''
CI_IMAGE_GITLAB_AGENT_DEPLOY: v118883148-8fa6a628
CI_IMAGE_GITLAB_AGENT_DEPLOY_SUFFIX: ''
CI_IMAGE_LINUX: v118883148-8fa6a628
CI_IMAGE_LINUX_SUFFIX: ''
CI_IMAGE_RPM_ARM64: v118883148-8fa6a628
CI_IMAGE_RPM_ARM64_SUFFIX: ''
CI_IMAGE_RPM_ARMHF: v118883148-8fa6a628
CI_IMAGE_RPM_ARMHF_SUFFIX: ''
CI_IMAGE_RPM_X64: v118883148-8fa6a628
CI_IMAGE_RPM_X64_SUFFIX: ''
- CI_IMAGE_WIN_LTSC2022_X64: v119961810-e18b0f68
+ CI_IMAGE_WIN_LTSC2022_X64: v120051755-ba1d8fc3
- CI_IMAGE_WIN_LTSC2022_X64_SUFFIX: ''
? ^^
+ CI_IMAGE_WIN_LTSC2022_X64_SUFFIX: _test_only
? ^^^^^^^^^^
- CI_IMAGE_WIN_LTSC2025_X64: v119961810-e18b0f68
+ CI_IMAGE_WIN_LTSC2025_X64: v120051755-ba1d8fc3
- CI_IMAGE_WIN_LTSC2025_X64_SUFFIX: ''
? ^^
+ CI_IMAGE_WIN_LTSC2025_X64_SUFFIX: _test_only
? ^^^^^^^^^^
CLANG_BUILD_VERSION: v60409452-ee70de70
CLANG_LLVM_VER: 12.0.1
CLUSTER_AGENT_BINARIES_DIR: bin/datadog-cluster-agent
CLUSTER_AGENT_CLOUDFOUNDRY_BINARIES_DIR: bin/datadog-cluster-agent-cloudfoundry
CODECOV: codecov
CODECOV_TOKEN: ci.datadog-agent.codecov_token
COMPARE_TO_BRANCH: main
CRC_PULL_SECRET: ci.datadog-agent.crc-pull-secret
CWS_INSTRUMENTATION_BINARIES_DIR: bin/cws-instrumentation
DATADOG_AGENT_EMBEDDED_PATH: /opt/datadog-agent/embedded
DDA_CLIENT_TOKEN: dda-feature-flags-client-token
DDA_FEATURE_FLAGS_CI_SSM_KEY_WINDOWS: ci.datadog-agent.dda-feature-flags-client-token
DDA_FEATURE_FLAGS_CI_VAULT_KEY: token
DDA_FEATURE_FLAGS_CI_VAULT_KEY_MACOS: token
DDA_FEATURE_FLAGS_CI_VAULT_PATH: k8s/gitlab-runner-datadog-agent/datadog-agent/$DDA_CLIENT_TOKEN
DDA_FEATURE_FLAGS_CI_VAULT_PATH_MACOS: aws/arn:aws:iam::486234852809:role/ci-datadog-agent/$DDA_CLIENT_TOKEN
DD_AGENT_TESTING_DIR: $CI_PROJECT_DIR/test/new-e2e/tests
DD_PKG_GITLAB_URL: https://artifact-gateway.us1.ddbuild.io/internal/artifact-gateway/api/v4
DEB_GPG_KEY_ID: c0962c7d
DEB_GPG_KEY_NAME: Datadog, Inc. APT key
DEB_RPM_TESTING_BUCKET_BRANCH: testing
DEB_S3_BUCKET: apt.datad0g.com
DEB_TESTING_S3_BUCKET: apttesting.datad0g.com
DOCKER_REGISTRY_RO: dockerhub-readonly
DOCKER_REGISTRY_URL: docker.io
DOGSTATSD_BINARIES_DIR: bin/dogstatsd
DYNAMIC_TESTS_BREAKGLASS: dynamic-tests-breakglass
E2E_AZURE: e2e-azure
E2E_COVERAGE_PIPELINE: false
E2E_GCP: e2e-gcp
EXECUTOR_JOB_SECTION_ATTEMPTS: 2
FF_CLEAN_UP_FAILED_CACHE_EXTRACT: true
FF_KUBERNETES_HONOR_ENTRYPOINT: true
FF_SCRIPT_SECTIONS: 1
FF_TIMESTAMPS: true
FF_USE_FASTZIP: true
FF_USE_WINDOWS_JOB_OBJECT: true
GENERAL_ARTIFACTS_CACHE_BUCKET_URL: https://dd-agent-omnibus.s3.amazonaws.com
GET_SOURCES_ATTEMPTS: 2
GIT_STRATEGY: s3
GO_TEST_SKIP_FLAKE: 'true'
GPG_TEST_KEY_ID: crypto/k8s/keys/k8s_gitlab-runner-datadog-agent_datadog-agent_testing_signing-key
INSTALLER_TESTING_S3_BUCKET: installtesting.datad0g.com
INSTALL_SCRIPT_API_KEY_ORG2: install-script-api-key-org-2
INTEGRATION_WHEELS_CACHE_BUCKET: dd-agent-omnibus
KERNEL_MATRIX_TESTING_ARM_AMI_ID: ami-0b5f838a19d37fc61
KERNEL_MATRIX_TESTING_X86_AMI_ID: ami-05b3973acf5422348
KITCHEN_INFRASTRUCTURE_FLAKES_RETRY: 2
MACOS_APPLE_APPLICATION_SIGNING: apple-application-signing
MACOS_APPLE_DEVELOPER_ACCOUNT: apple-developer-account
MACOS_APPLE_INSTALLER_SIGNING: apple-installer-signing
MACOS_KEYCHAIN_PWD: ci-keychain
MACOS_S3_BUCKET: dd-agent-macostesting
OMNIBUS_BASE_DIR: /omnibus
OMNIBUS_GIT_CACHE_DIR: /tmp/omnibus-git-cache
OMNIBUS_PACKAGE_DIR: $CI_PROJECT_DIR/omnibus/pkg/
OMNIBUS_PACKAGE_DIR_SUSE: $CI_PROJECT_DIR/omnibus/suse/pkg
PIPELINE_KEY_ALIAS: alias/ci_datadog-agent_pipeline-key
PROCESS_S3_BUCKET: datad0g-process-agent
PYTHONUNBUFFERED: 1
RESTORE_CACHE_ATTEMPTS: 2
RPM_GPG_KEY_ID: b01082d3
RPM_GPG_KEY_NAME: Datadog, Inc. RPM key
RPM_S3_BUCKET: yum.datad0g.com
RPM_TESTING_S3_BUCKET: yumtesting.datad0g.com
RUN_E2E_TESTS: auto
RUN_KMT_TESTS: auto
RUN_UNIT_TESTS: auto
S3_ARTIFACTS_URI: s3://dd-ci-artefacts-build-stable/$CI_PROJECT_NAME/$CI_PIPELINE_ID
S3_CP_CMD: aws s3 cp $S3_CP_OPTIONS
S3_CP_OPTIONS: --no-progress --region us-east-1 --sse AES256
S3_DD_AGENT_OMNIBUS_BTFS_URI: s3://dd-agent-omnibus/btfs
S3_DD_AGENT_OMNIBUS_JAVA_URI: s3://dd-agent-omnibus/openjdk
S3_DD_AGENT_OMNIBUS_LLVM_URI: s3://dd-agent-omnibus/llvm
S3_DSD6_URI: s3://dsd6-staging
S3_OMNIBUS_CACHE_BUCKET: dd-ci-datadog-agent-omnibus-cache-build-stable
S3_OMNIBUS_GIT_CACHE_BUCKET: dd-ci-datadog-agent-omnibus-git-cache-build-stable
S3_PERMANENT_ARTIFACTS_URI: s3://dd-ci-persistent-artefacts-build-stable/$CI_PROJECT_NAME
S3_PROJECT_ARTIFACTS_URI: s3://dd-ci-artefacts-build-stable/$CI_PROJECT_NAME
S3_RELEASE_ARTIFACTS_URI: s3://dd-release-artifacts/$CI_PROJECT_NAME/$CI_PIPELINE_ID
S3_RELEASE_INSTALLER_ARTIFACTS_URI: s3://dd-release-artifacts/datadog-installer/$CI_PIPELINE_ID
S3_SBOM_STORAGE_URI: s3://sbom-root-us1-ddbuild-io/$CI_PROJECT_NAME/$CI_PIPELINE_ID
SECRET_GENERIC_CONNECTOR_BINARIES_DIR: bin/secret-generic-connector
SKIP_WINDOWS: 'false'
SLACK_AGENT: slack-agent-ci
SMP_ACCOUNT: smp
STATIC_BINARIES_DIR: bin/static
SYSTEM_PROBE_BINARIES_DIR: bin/system-probe
TEST_KEYS_URL: apttesting.datad0g.com/test-keys
VCPKG_BLOB_SAS_URL: ci.datadog-agent-buildimages.vcpkg_blob_sas_url
VIRUS_TOTAL: virus-total
WINDOWS_BUILDS_S3_BUCKET: $WIN_S3_BUCKET/builds
WINDOWS_POWERSHELL_DIR: $CI_PROJECT_DIR/signed_scripts
WINDOWS_SYMBOLS_S3_BUCKET: pipelines/windows-symbols
WINDOWS_TESTING_S3_BUCKET: pipelines/A7/$CI_PIPELINE_ID
WINGET_PAT: ci.datadog-agent.winget_pat
WIN_S3_BUCKET: dd-agent-mstesting
Changes Summary
ℹ️ Diff available in the job log.
5b7298b to
b9e107d
Compare
### What does this PR do?
Enable NTFS 8.3 short-name creation on the container scratch volume at startup, via `fsutil 8dot3name set C: 0` in the entrypoint. The registry key from #1212 is dropped because it is now superseded and provably inert for the scratch volume.

### Motivation
#1212 set `NtfsDisable8dot3NameCreation` to 0 in the image, betting the SYSTEM-hive value would travel with the layers. It **does travel**, but a container's fresh scratch volume surprisingly still comes up disabled at runtime despite `fsutil` output:
```
The volume state is: 1 (8dot3 name creation is DISABLED)
The registry state is: 0 (8dot3 name creation is ENABLED on all volumes)
Based on the above settings, 8dot3 name creation is ENABLED on "C:"
```
`fsutil` **mistakenly** reports ENABLED after the registry override, yet freshly created files **still get no 8.3 alias**, so `GetShortPathNameW` stays a no-op and Bazel still cannot shorten long paths (bazelbuild/bazel#19710), root cause of [#incident-56436](https://dd.enterprise.slack.com/archives/C0BBPAV0J20).

#1212 also anticipated that a build-time `fsutil 8dot3name set` would not persist, which a test image confirmed: the build volume flips to state 0, but a container started from that image is back to state 1. Per microsoft/Windows-Containers#507 and Microsoft Q&A[^1], the per-volume flag only takes hold across an unmount/remount and is not carried by image layers, so it must be (re)applied at runtime.

=> The entrypoint is the earliest such point, ahead of any build.

### Possible Drawbacks / Trade-offs
`fsutil 8dot3name set` runs on every container start, but it is a cheap per-volume metadata write. As in #1212, it only affects files created afterwards, which covers the runtime-created output base.

### Additional Notes
Validated by DataDog/datadog-agent#52500 against this image: `bazel:test:windows-amd64` is green with the runtime override removed.

There's work in progress we'll monitor to remove the workaround:
- bazelbuild/bazel#29921.

[^1]: of which:
    - https://learn.microsoft.com/en-us/answers/questions/1406321/enable-8dot3-name-creation-for-volume-c-in-windows
    - https://learn.microsoft.com/en-us/answers/questions/2455648/8-3-filename-support
    - etc.
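To make the distinction between the volume state and the registry state concrete, here is a small Python sketch that parses the `fsutil` output quoted in the Motivation above. It is an illustration only: the `parse_8dot3_query` and `short_names_effectively_enabled` helpers are hypothetical, the sample text is the output quoted in this description (its line breaks are assumed), and the rule "trust the volume state, not the summary line" simply encodes the behaviour this description reports for container scratch volumes.

```python
import re

# Output quoted in the PR description above (line breaks are an assumption).
SAMPLE = """The volume state is: 1 (8dot3 name creation is DISABLED)
The registry state is: 0 (8dot3 name creation is ENABLED on all volumes)
Based on the above settings, 8dot3 name creation is ENABLED on "C:"
"""


def parse_8dot3_query(output):
    """Extract (volume_state, registry_state) as ints, or None when not found."""
    volume = re.search(r"volume state is:\s*(\d)", output)
    registry = re.search(r"registry state is:\s*(\d)", output)
    return (
        int(volume.group(1)) if volume else None,
        int(registry.group(1)) if registry else None,
    )


def short_names_effectively_enabled(output):
    """Follow the observation above: only a volume state of 0 yields 8.3 aliases,
    whatever the summary line claims."""
    volume_state, _registry_state = parse_8dot3_query(output)
    return volume_state == 0


print(parse_8dot3_query(SAMPLE))
print(short_names_effectively_enabled(SAMPLE))
```

With the sample above, the summary line says ENABLED while the helper reports that short names are not effectively enabled, which is the mismatch this change works around.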
### What does this PR do?
Add a preflight check: create a temp file whose name exceeds 8.3 limits and compare its long name (`%%~nxi`) against its short name (`%%~snxi`). When they are equal, `GetShortPathNameW` is a no-op, meaning 8.3 short names are disabled and the hook exits with actionable instructions.

### Motivation
`GetShortPathNameW` is a no-op when 8.3 short names are disabled, causing Bazel to fail with an internal path-length error even with the 260-character OS limit lifted (bazelbuild/bazel#19710). GitLab runners have the volume state enabled. Docker containers did not, root cause of #incident-56436.

On the one hand, DataDog/datadog-agent-buildimages#1215 re-enables them in the build images at container runtime. On the other hand, this fails fast with actionable instructions rather than a cryptic Bazel error.

### Describe how you validated your changes
Tested on a Windows VM: `%%~snxi` produces `123456~1.123` (differs from `%%~nxi`) when 8.3 short names are enabled.
b9e107d to
966bcd3
Compare
💡 Codex Review
Here are some automated review suggestions for this pull request.
Reviewed commit: 966bcd3e47
chouquette
left a comment
LGTM but what happens when the cache is on a different drive as the bazel output root? (Just kidding, don't hurt me)
@chouquette That's in fact the case in CI: the cache ( In real life, the problem manifests itself only in Windows containers because their volume metadata is not persisted as a container layer: More details in:
@rdesgroppes I should have known better than jokingly asking a question and not expect an answer from you 😁
0fffd5d
into
main
What does this PR do?
Add a preflight check that creates a temp file whose name exceeds 8.3 limits and compares its long name (`%%~nxi`) against its short name (`%%~snxi`).
When they are equal, `GetShortPathNameW` is a no-op, meaning 8.3 short names are disabled and the hook exits with actionable instructions.
Motivation
`GetShortPathNameW` is a no-op when 8.3 short names are disabled, causing Bazel to fail with an internal path-length error (bazelbuild/bazel#19710), even with the 260-character OS limit lifted.
GitLab runners had the volume state enabled, but Docker containers did not, which was the root cause of #incident-56436.
The latter is now fixed by the combo:
The present change makes it fail fast with actionable instructions rather than a cryptic Bazel error.
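For readers less familiar with batch `for` modifiers, the same preflight idea can be sketched in Python. This is not the code in `bazel.bat`, which relies on `%%~nxi` and `%%~snxi`; it is a hedged equivalent that calls the Win32 `GetShortPathNameW` function through `ctypes`. The helper names (`short_names_disabled`, `get_short_path`, `preflight_check`) and the temp-file naming are assumptions made for illustration.

```python
import ctypes
import os
import sys
import tempfile


def short_names_disabled(long_name: str, short_name: str) -> bool:
    """Equal names mean no 8.3 alias was generated for the file."""
    return long_name == short_name


def get_short_path(path: str) -> str:
    """Return the 8.3 form of `path` via GetShortPathNameW (Windows only)."""
    if sys.platform != "win32":
        raise OSError("GetShortPathNameW is only available on Windows")
    kernel32 = ctypes.WinDLL("kernel32", use_last_error=True)
    get_short = kernel32.GetShortPathNameW
    get_short.argtypes = [ctypes.c_wchar_p, ctypes.c_wchar_p, ctypes.c_uint32]
    get_short.restype = ctypes.c_uint32
    # With an empty buffer, the call returns the required size,
    # including the terminating null character.
    needed = get_short(path, None, 0)
    if needed == 0:
        raise ctypes.WinError(ctypes.get_last_error())
    buffer = ctypes.create_unicode_buffer(needed)
    if get_short(path, buffer, needed) == 0:
        raise ctypes.WinError(ctypes.get_last_error())
    return buffer.value


def preflight_check(directory=None) -> bool:
    """Return True when 8.3 short names appear to be enabled in `directory`."""
    # The prefix and the 4-character extension both exceed 8.3 limits.
    fd, long_path = tempfile.mkstemp(prefix="123456789", suffix=".1234", dir=directory)
    os.close(fd)
    try:
        short_path = get_short_path(long_path)
        return not short_names_disabled(
            os.path.basename(long_path), os.path.basename(short_path)
        )
    finally:
        os.remove(long_path)


# Names equal: no alias was created, so short names are disabled.
print(short_names_disabled("123456789.1234", "123456789.1234"))
# Names differ (alias such as 123456~1.123): short names are enabled.
print(short_names_disabled("123456789.1234", "123456~1.123"))
```

On Windows, `preflight_check()` could be run against the directory that will hold the Bazel output base, since the 8.3 setting is per volume.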
Describe how you validated your changes
Before #52561 was merged: https://gitlab.ddbuild.io/DataDog/datadog-agent/-/jobs/1788332180#L48
Additional Notes
To be reverted once the following upstream fix (or equivalent) is available: