Assertion failure while running multi-thread concurrent inserts workload: trunk_split_leaf(): "(num_leaves + trunk_num_pivot_keys(spl, parent) <= spl->cfg.max_pivot_keys)" #467

@gapisback

Description

In the dev branch agurajada/467-large-inserts-trunk-assert-bug, off of /main, a new collection of multi-threaded, heavy-insert workloads is being developed. One of the test cases runs into this assertion:

$ build/debug/bin/unit/large_inserts_bugs_stress_test --num-inserts 5000000 --num-threads 6 test_seq_key_fully_packed_value_inserts_threaded_same_start_keyid

[...]
exec_worker_thread()::489:Thread 6  inserts 5000000 (5 million), sequential key, fully-packed constant value, KV-pairs starting from 0 (0) ...
OS-pid=1842079, Thread-ID=6, Insert fully-packed fixed value of length=256 bytes.
Assertion failed at src/trunk.c:5461:trunk_split_leaf(): "(num_leaves + trunk_num_pivot_keys(spl, parent) <= spl->cfg.max_pivot_keys)". num_leaves=6, trunk_num_pivot_keys()=9, cfg.max_pivot_keys=14
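
The numbers in the failure message show how far past the limit the split went. Below is a minimal sketch of that arithmetic. The helpers `split_fits()` and `split_overflow()` are hypothetical stand-alone functions written for illustration, not SplinterDB code; only the condition itself and the three values are taken from the assertion message.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/*
 * Hypothetical helper, not part of SplinterDB: mirrors the condition
 * printed by the failing assertion in trunk_split_leaf().
 */
static bool
split_fits(uint64_t num_leaves, uint64_t num_pivot_keys, uint64_t max_pivot_keys)
{
   return num_leaves + num_pivot_keys <= max_pivot_keys;
}

/* Number of pivot keys by which the parent would exceed the limit. */
static uint64_t
split_overflow(uint64_t num_leaves, uint64_t num_pivot_keys, uint64_t max_pivot_keys)
{
   uint64_t total = num_leaves + num_pivot_keys;
   assert(total >= num_leaves); /* guard against unsigned wrap-around */
   return total > max_pivot_keys ? total - max_pivot_keys : 0;
}
```

With the logged values (num_leaves=6, trunk_num_pivot_keys()=9, cfg.max_pivot_keys=14), `split_fits(6, 9, 14)` is false and `split_overflow(6, 9, 14)` is 1: the parent would end up with 15 pivot keys, one more than allowed.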

NOTE: Before you can repro this, you need to pull in the in-flight fix for issue #458; otherwise, you will run into that assertion first. The dev branch where this repro was constructed, agurajada/467-large-inserts-trunk-assert-bug, already pulls in that commit.

This test case was synthesized to reliably repro this problem, which was seen during manual testing and test development.

The key points needed for the repro are:

  • Need to insert a large number of rows using the --num-inserts arg. It repros with upwards of 1-2 million inserts per thread.
  • Need more than a few threads. It sometimes also repros with --num-threads 4.
  • Something about this test case is peculiar in the way it repros the issue. All threads start from the same key-ID of 0, so we are essentially doing duplicate-key inserts. TEST_KEY_SIZE = 30, but I am not sure whether this specific value makes a difference.
  • I needed to use a fully-packed constant value of length TEST_VALUE_SIZE = 256 in order to reliably repro this assertion.
  • The repro is a bit unreliable, so you will have to run this a few times to hit the assertion.
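
To make the shape of the workload concrete, here is a rough sketch of how each thread's key/value pairs could be generated. The key layout (zero-padded decimal key-ID) and the fill character are assumptions made for illustration and are not taken from the test source; `make_key()`, `make_value()` and `nth_key_id()` are hypothetical names. Only the two sizes and the shared starting key-ID of 0 come from the description above.

```c
#include <assert.h>
#include <stdint.h>
#include <stdio.h>
#include <string.h>

#define TEST_KEY_SIZE   30  /* from the issue text */
#define TEST_VALUE_SIZE 256 /* from the issue text */

/* Key-ID written by a thread on its i-th insert (sequential keys). */
static uint64_t
nth_key_id(uint64_t start_key_id, uint64_t i)
{
   return start_key_id + i;
}

/*
 * Assumed key layout (not from the test source): the key-ID as
 * zero-padded decimal text, 29 characters plus a terminating NUL.
 */
static void
make_key(char key[TEST_KEY_SIZE], uint64_t key_id)
{
   int len = snprintf(key, TEST_KEY_SIZE, "%0*llu",
                      TEST_KEY_SIZE - 1, (unsigned long long)key_id);
   assert(len == TEST_KEY_SIZE - 1);
   (void)len;
}

/* Fill every byte of the value with one constant character. */
static void
make_value(char value[TEST_VALUE_SIZE])
{
   memset(value, 'V', TEST_VALUE_SIZE);
}
```

Because every thread uses the same start_key_id of 0, `nth_key_id(0, i)` is identical across threads, so all threads write the same keys with the same value.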

See this internal slack thread where this issue was first aired out on a private dev branch, before repro'ing it off of /main.


The other thing about this test is this part of the configuration:

    data->cfg = (splinterdb_config){.filename   = TEST_DB_NAME,
                                    .cache_size = 256 * Mega,
                                    .disk_size  = 40 * Giga,

I tried a 64MiB cache, and that sometimes works. Often, though, we run into an "unable to find a free buffer" error from clockcache.c, so to avoid that noise I settled on a 256MiB cache, which should still be small enough to induce lots of IO to disk.
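
As a rough sanity check on the cache sizing, the sketch below compares the raw key + value payload with the cache size. It assumes Mega means 2^20 bytes (consistent with the "MiB" wording here) and ignores any per-tuple or on-disk metadata, so the real footprint will differ. The helper names are hypothetical.

```c
#include <assert.h>
#include <stdint.h>

#define MiB (1024ULL * 1024ULL) /* assumed to match "Mega" in the config */

/* Raw key + value bytes for num_inserts pairs; ignores storage overhead. */
static uint64_t
raw_payload_bytes(uint64_t num_inserts, uint64_t key_size, uint64_t value_size)
{
   return num_inserts * (key_size + value_size);
}

/* Payload size as a multiple of the cache size, rounded down. */
static uint64_t
payload_to_cache_ratio(uint64_t payload_bytes, uint64_t cache_bytes)
{
   assert(cache_bytes > 0);
   return payload_bytes / cache_bytes;
}
```

For --num-inserts 5000000 with 30-byte keys and 256-byte values, one thread's raw payload is 1,430,000,000 bytes, which is about 5x a 256MiB cache and about 21x a 64MiB cache. Since all threads write the same key range, the amount of distinct data does not grow with the thread count.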
