
Commit 9b5e43e

feat: Expose used MemoryPool details in ResourcesExhausted error messages (#20387)
## Which issue does this PR close?

- Closes #20386.

## Rationale for this change

The `memory_limit` configuration (`RuntimeEnvBuilder::new().with_memory_limit()`) uses the `greedy` memory pool by default. If `memory_pool` (`RuntimeEnvBuilder::new().with_memory_pool()`) is set, however, it overrides that default with the configured pool, such as `fair`. If neither `memory_limit` nor `memory_pool` is set, the `unbounded` memory pool is used. Exposing the pool that was ultimately selected as part of the `ResourcesExhausted` error message makes the end user aware of it, since the user may need to switch to a different memory pool (`greedy`, `fair`, `unbounded`).

- Also, [this comparison table](lance-format/lance#3601 (comment)) is an example use case covering the runtime behavior of both the `greedy` and `fair` memory pools. Exposing the used memory pool in native logs can help when building this kind of comparison table.

The following example use cases use `datafusion-cli`:

**Case1**: datafusion-cli result when `memory-limit` and `top-memory-consumers > 0` are set:

```
eren.avsarogullari@AWGNPWVK961 debug % ./datafusion-cli --memory-limit 10M --command 'select * from generate_series(1,500000) as t1(v1) order by v1;' --top-memory-consumers 3
DataFusion CLI v53.0.0
Error: Not enough memory to continue external sort. Consider increasing the memory limit config: 'datafusion.runtime.memory_limit', or decreasing the config: 'datafusion.execution.sort_spill_reservation_bytes'.
caused by
Resources exhausted: Additional allocation failed for ExternalSorter[0] with top memory consumers (across reservations) as:
  ExternalSorterMerge[0]#2(can spill: false) consumed 10.0 MB, peak 10.0 MB,
  DataFusion-Cli#0(can spill: false) consumed 0.0 B, peak 0.0 B,
  ExternalSorter[0]#1(can spill: true) consumed 0.0 B, peak 0.0 B.
Error: Failed to allocate additional 128.0 KB for ExternalSorter[0] with 0.0 B already allocated for this reservation - 0.0 B remain available for the total memory pool: greedy(used: 10.0 MB, pool_size: 10.0 MB)
```

**Case2**: datafusion-cli result when `memory-limit` and `top-memory-consumers = 0` (disabling top memory consumers logging) are set:

```
eren.avsarogullari@AWGNPWVK961 debug % ./datafusion-cli --memory-limit 10M --command 'select * from generate_series(1,500000) as t1(v1) order by v1;' --top-memory-consumers 0
DataFusion CLI v53.0.0
Error: Not enough memory to continue external sort. Consider increasing the memory limit config: 'datafusion.runtime.memory_limit', or decreasing the config: 'datafusion.execution.sort_spill_reservation_bytes'.
caused by
Resources exhausted: Failed to allocate additional 128.0 KB for ExternalSorter[0] with 0.0 B already allocated for this reservation - 0.0 B remain available for the total memory pool: greedy(used: 10.0 MB, pool_size: 10.0 MB)
```

**Case3**: datafusion-cli result when `memory-limit`, `memory-pool`, and `top-memory-consumers > 0` are all set:

```
eren.avsarogullari@AWGNPWVK961 debug % ./datafusion-cli --memory-limit 10M --mem-pool-type fair --top-memory-consumers 3 --command 'select * from generate_series(1,500000) as t1(v1) order by v1;'
DataFusion CLI v53.0.0
Error: Not enough memory to continue external sort. Consider increasing the memory limit config: 'datafusion.runtime.memory_limit', or decreasing the config: 'datafusion.execution.sort_spill_reservation_bytes'.
caused by
Resources exhausted: Additional allocation failed for ExternalSorter[0] with top memory consumers (across reservations) as:
  ExternalSorterMerge[0]#2(can spill: false) consumed 10.0 MB, peak 10.0 MB,
  ExternalSorter[0]#1(can spill: true) consumed 0.0 B, peak 0.0 B,
  DataFusion-Cli#0(can spill: false) consumed 0.0 B, peak 0.0 B.
Error: Failed to allocate additional 128.0 KB for ExternalSorter[0] with 0.0 B already allocated for this reservation - 0.0 B remain available for the total memory pool: fair(pool_size: 10.0 MB)
```

## What changes are included in this PR?

- Add a `name` property to `MemoryPool` instances.
- Expose the used `MemoryPool` info in `ResourcesExhausted` error messages.

## Are these changes tested?

Yes, and existing test cases are updated.

## Are there any user-facing changes?

Yes, the `ResourcesExhausted` error messages are updated.
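The selection rule described in the rationale can be sketched as a small decision function. This is a standalone model using only the standard library, not DataFusion code: `PoolKind` and `select_pool` are hypothetical names introduced for illustration, and the real builder takes richer arguments than the two `Option`s used here.

```rust
/// Hypothetical stand-in for the three pool kinds named in the rationale.
#[derive(Debug, Clone, Copy, PartialEq, Eq)]
enum PoolKind {
    Greedy,
    Fair,
    Unbounded,
}

/// Hypothetical model of the rule from the rationale:
/// an explicitly configured pool wins, a bare memory limit falls back to
/// the greedy pool, and no configuration at all means an unbounded pool.
fn select_pool(memory_limit: Option<usize>, explicit_pool: Option<PoolKind>) -> PoolKind {
    match (explicit_pool, memory_limit) {
        // `memory_pool` is set: it overrides the default choice.
        (Some(kind), _) => kind,
        // Only `memory_limit` is set: greedy is the default.
        (None, Some(_)) => PoolKind::Greedy,
        // Neither is set: no limit is enforced.
        (None, None) => PoolKind::Unbounded,
    }
}
```

Under this model, `select_pool(Some(10 * 1024 * 1024), None)` corresponds to Case1 and Case2, and passing `Some(PoolKind::Fair)` as the second argument corresponds to Case3.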
1 parent e524f49 commit 9b5e43e

9 files changed

Lines changed: 243 additions & 86 deletions


datafusion-cli/tests/cli_integration.rs

Lines changed: 2 additions & 2 deletions
@@ -261,11 +261,11 @@ fn bind_to_settings(snapshot_name: &str) -> SettingsBindDropGuard {
         "Consumer(can spill: bool) consumed XB, peak XB",
     );
     settings.add_filter(
-        r"Error: Failed to allocate additional .*? for .*? with .*? already allocated for this reservation - .*? remain available for the total pool",
+        r"Error: Failed to allocate additional .*? for .*? with .*? already allocated for this reservation - .*? remain available for the total memory pool: '.*?'",
         "Error: Failed to allocate ",
     );
     settings.add_filter(
-        r"Resources exhausted: Failed to allocate additional .*? for .*? with .*? already allocated for this reservation - .*? remain available for the total pool",
+        r"Resources exhausted: Failed to allocate additional .*? for .*? with .*? already allocated for this reservation - .*? remain available for the total memory pool: '.*?'",
         "Resources exhausted: Failed to allocate",
     );

datafusion-cli/tests/snapshots/cli_top_memory_consumers@no_track.snap

Lines changed: 1 addition & 1 deletion
@@ -16,6 +16,6 @@ exit_code: 1
 [CLI_VERSION]
 Error: Not enough memory to continue external sort. Consider increasing the memory limit config: 'datafusion.runtime.memory_limit', or decreasing the config: 'datafusion.execution.sort_spill_reservation_bytes'.
 caused by
-Resources exhausted: Failed to allocate
+Resources exhausted: Failed to allocate additional 128.0 KB for ExternalSorter[0] with 0.0 B already allocated for this reservation - 0.0 B remain available for the total memory pool: greedy(used: 10.0 MB, pool_size: 10.0 MB)

 ----- stderr -----

datafusion-cli/tests/snapshots/cli_top_memory_consumers@top2.snap

Lines changed: 1 addition & 1 deletion
@@ -19,6 +19,6 @@ caused by
 Resources exhausted: Additional allocation failed for ExternalSorter[0] with top memory consumers (across reservations) as:
 Consumer(can spill: bool) consumed XB, peak XB,
 Consumer(can spill: bool) consumed XB, peak XB.
-Error: Failed to allocate
+Error: Failed to allocate additional 128.0 KB for ExternalSorter[0] with 0.0 B already allocated for this reservation - 0.0 B remain available for the total memory pool: greedy(used: 10.0 MB, pool_size: 10.0 MB)

 ----- stderr -----

datafusion-cli/tests/snapshots/cli_top_memory_consumers@top3_default.snap

Lines changed: 1 addition & 1 deletion
@@ -18,6 +18,6 @@ Resources exhausted: Additional allocation failed for ExternalSorter[0] with top
 Consumer(can spill: bool) consumed XB, peak XB,
 Consumer(can spill: bool) consumed XB, peak XB,
 Consumer(can spill: bool) consumed XB, peak XB.
-Error: Failed to allocate
+Error: Failed to allocate additional 128.0 KB for ExternalSorter[0] with 0.0 B already allocated for this reservation - 0.0 B remain available for the total memory pool: greedy(used: 10.0 MB, pool_size: 10.0 MB)

 ----- stderr -----

datafusion-cli/tests/snapshots/cli_top_memory_consumers_with_mem_pool_type@no_track.snap

Lines changed: 1 addition & 1 deletion
@@ -18,6 +18,6 @@ exit_code: 1
 [CLI_VERSION]
 Error: Not enough memory to continue external sort. Consider increasing the memory limit config: 'datafusion.runtime.memory_limit', or decreasing the config: 'datafusion.execution.sort_spill_reservation_bytes'.
 caused by
-Resources exhausted: Failed to allocate
+Resources exhausted: Failed to allocate additional 128.0 KB for ExternalSorter[0] with 0.0 B already allocated for this reservation - 0.0 B remain available for the total memory pool: fair(pool_size: 10.0 MB)

 ----- stderr -----

datafusion-cli/tests/snapshots/cli_top_memory_consumers_with_mem_pool_type@top2.snap

Lines changed: 1 addition & 1 deletion
@@ -21,6 +21,6 @@ caused by
 Resources exhausted: Additional allocation failed for ExternalSorter[0] with top memory consumers (across reservations) as:
 Consumer(can spill: bool) consumed XB, peak XB,
 Consumer(can spill: bool) consumed XB, peak XB.
-Error: Failed to allocate
+Error: Failed to allocate additional 128.0 KB for ExternalSorter[0] with 0.0 B already allocated for this reservation - 0.0 B remain available for the total memory pool: fair(pool_size: 10.0 MB)

 ----- stderr -----

datafusion-examples/examples/execution_monitoring/memory_pool_tracking.rs

Lines changed: 2 additions & 1 deletion
@@ -119,7 +119,8 @@ async fn automatic_usage_example() -> Result<()> {
 ExternalSorter[1]#93(can spill: true) consumed 69.0 KB, peak 69.0 KB,
 ExternalSorter[13]#155(can spill: true) consumed 67.6 KB, peak 67.6 KB,
 ExternalSorter[8]#140(can spill: true) consumed 67.2 KB, peak 67.2 KB.
-Error: Failed to allocate additional 10.0 MB for ExternalSorterMerge[0] with 0.0 B already allocated for this reservation - 7.1 MB remain available for the total pool
+Error: Failed to allocate additional 10.0 MB for ExternalSorterMerge[0] with 0.0 B already allocated
+for this reservation - 7.1 MB remain available for the total memory pool
 */
 }
 }

datafusion/execution/src/memory_pool/mod.rs

Lines changed: 6 additions & 2 deletions
@@ -19,6 +19,7 @@
 //! help with allocation accounting.

 use datafusion_common::{Result, internal_datafusion_err};
+use std::fmt::Display;
 use std::hash::{Hash, Hasher};
 use std::{cmp::Ordering, sync::Arc, sync::atomic};

@@ -181,7 +182,10 @@ pub use pool::*;
 ///
 /// * [`TrackConsumersPool`]: Wraps another [`MemoryPool`] and tracks consumers,
 /// providing better error messages on the largest memory users.
-pub trait MemoryPool: Send + Sync + std::fmt::Debug {
+pub trait MemoryPool: Send + Sync + std::fmt::Debug + Display {
+    /// Return pool name
+    fn name(&self) -> &str;
+
     /// Registers a new [`MemoryConsumer`]
     ///
     /// Note: Subsequent calls to [`Self::grow`] must be made to reserve memory
@@ -232,7 +236,7 @@ pub enum MemoryLimit {
 /// [`MemoryReservation`] in a [`MemoryPool`]. All allocations are registered to
 /// a particular `MemoryConsumer`;
 ///
-/// Each `MemoryConsumer` is identifiable by a process-unique id, and is therefor not cloneable,
+/// Each `MemoryConsumer` is identifiable by a process-unique id, and is therefore not cloneable,
 /// If you want a clone of a `MemoryConsumer`, you should look into [`MemoryConsumer::clone_with_new_id`],
 /// but note that this `MemoryConsumer` may be treated as a separate entity based on the used pool,
 /// and is only guaranteed to share the name and inner properties.
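The trait change above adds a `name` method and a `Display` bound, but the pool implementations themselves are not part of the diff shown here. As a rough illustration of how those two pieces could produce the `greedy(used: 10.0 MB, pool_size: 10.0 MB)` suffix seen in the snapshots, here is a standalone sketch using only the standard library. `NamedPool`, `SketchGreedyPool`, `human_size`, and `allocation_error` are hypothetical names, and the size formatting is a simplification rather than DataFusion's own helper.

```rust
use std::fmt;
use std::sync::atomic::{AtomicUsize, Ordering};

/// Simplified stand-in for the trait in the diff: only `name` and the
/// `Display` supertrait are modelled; the real trait has more methods.
trait NamedPool: fmt::Debug + fmt::Display + Send + Sync {
    fn name(&self) -> &str;
}

/// Hypothetical greedy-style pool that tracks a byte budget and usage.
#[derive(Debug)]
struct SketchGreedyPool {
    pool_size: usize,
    used: AtomicUsize,
}

impl SketchGreedyPool {
    fn with_usage(pool_size: usize, used: usize) -> Self {
        Self { pool_size, used: AtomicUsize::new(used) }
    }
}

impl NamedPool for SketchGreedyPool {
    fn name(&self) -> &str {
        "greedy"
    }
}

impl fmt::Display for SketchGreedyPool {
    fn fmt(&self, f: &mut fmt::Formatter<'_>) -> fmt::Result {
        let used = self.used.load(Ordering::Relaxed);
        write!(
            f,
            "{}(used: {}, pool_size: {})",
            self.name(),
            human_size(used),
            human_size(self.pool_size)
        )
    }
}

/// Simplified size formatter (assumption: 1024-based units, one decimal).
fn human_size(bytes: usize) -> String {
    const KB: f64 = 1024.0;
    const MB: f64 = KB * 1024.0;
    let b = bytes as f64;
    if b >= MB {
        format!("{:.1} MB", b / MB)
    } else if b >= KB {
        format!("{:.1} KB", b / KB)
    } else {
        format!("{:.1} B", b)
    }
}

/// Builds a message shaped like the one in the snapshots, appending the
/// pool's `Display` output so the reader can see which pool was in use.
fn allocation_error(
    requested: usize,
    consumer: &str,
    reserved: usize,
    available: usize,
    pool: &dyn NamedPool,
) -> String {
    format!(
        "Failed to allocate additional {} for {} with {} already allocated for this reservation - {} remain available for the total memory pool: {}",
        human_size(requested),
        consumer,
        human_size(reserved),
        human_size(available),
        pool
    )
}
```

Because `Display` is a supertrait, the error path only needs a `&dyn NamedPool` to render the pool details, which is one plausible reason to put the bound on the trait instead of formatting each pool type separately.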
