Commit 3e353de

Add updates for merge changes in Accumulo 4.0 (#452)
1 parent 7f5fdef commit 3e353de

1 file changed

Lines changed: 94 additions & 0 deletions

_docs-4/administration/merging.md

---
title: Merging
category: administration
order: 6
---

Accumulo 4.0 has improved tablet merging support, including:

* Merging no longer requires "chop" compactions.
* Merging is now managed by FATE.
* Accumulo now supports automatic merging of tablets.

## New Merge Design

Merging used to be a slow operation because tablets had to be compacted before they could be merged. This was necessary because RFiles may contain data outside the tablet's range, and that data had to be removed.
The updated merge algorithm instead "fences" each RFile in a tablet to the tablet's valid range. Fencing is a fast, metadata-only operation: the valid range of each file is now stored in the tablet's file column.
Scans only return data that falls within a file's fenced range, so compactions are no longer required before a merge. The normal system compaction process eventually removes the data outside the range.
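
The following is a conceptual sketch of fencing in plain Java, not Accumulo code. The `FencedFile` record and the `merge` helper are hypothetical stand-ins: a file is modeled as a sorted map of rows, and its fence as a tablet range `(prevEndRow, endRow]`. It illustrates that merging only combines file references, while scans hide the rows outside each file's fence.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.NavigableMap;
import java.util.TreeMap;

public class FencingSketch {

  // Hypothetical model of an RFile fenced to a tablet range (prevEndRow, endRow].
  // A null bound means the range is unbounded on that side.
  record FencedFile(NavigableMap<String, String> data, String prevEndRow, String endRow) {

    // Returns only the rows inside the fence. Rows outside it remain in the
    // file but are never returned by a scan.
    List<String> visibleRows() {
      NavigableMap<String, String> view = data;
      if (prevEndRow != null) {
        view = view.tailMap(prevEndRow, false); // exclusive start
      }
      if (endRow != null) {
        view = view.headMap(endRow, true); // inclusive end
      }
      return new ArrayList<>(view.keySet());
    }
  }

  // Merging adjacent tablets rewrites no data: each file keeps its contents
  // and stays fenced to the range of the tablet it came from.
  static List<FencedFile> merge(List<FencedFile> left, List<FencedFile> right) {
    List<FencedFile> merged = new ArrayList<>(left);
    merged.addAll(right);
    return merged;
  }

  public static void main(String[] args) {
    // One physical file with rows a..f, shared by two tablets (e.g. after a split).
    TreeMap<String, String> file = new TreeMap<>();
    for (String row : List.of("a", "b", "c", "d", "e", "f")) {
      file.put(row, "v-" + row);
    }
    FencedFile leftTablet = new FencedFile(file, null, "c"); // (-inf, c]
    FencedFile rightTablet = new FencedFile(file, "c", "e"); // (c, e]

    List<String> scanned = new ArrayList<>();
    for (FencedFile f : merge(List.of(leftTablet), List.of(rightTablet))) {
      scanned.addAll(f.visibleRows());
    }
    System.out.println(scanned); // [a, b, c, d, e]
  }
}
```

Row `f` is still physically present in the file; it is simply never returned, and a later system compaction would drop it.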

## Auto Merge

Accumulo can automatically merge tablets whose size is below a threshold, similar to the way it splits tablets whose size is above a threshold.
The Manager runs a task that periodically looks for ranges of tablets that can be merged. A range of tablets is eligible for merging only if all of the following are true:

1. Every tablet in the range is marked as eligible for merging by its per-tablet `TabletMergeability` setting (described below).
2. The combined number of files is less than `table.merge.file.max`.
3. The combined size of the RFiles is less than `table.mergeability.threshold`, which is expressed as a percentage of the split threshold (`table.split.threshold`).
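
As a rough illustration of the three conditions, the sketch below checks a single candidate range. `TabletSummary` and `canAutoMerge` are hypothetical names, and the thresholds are passed in directly rather than read from table configuration; the Manager's actual scanning logic may differ.

```java
import java.util.List;

public class AutoMergeEligibility {

  // Hypothetical per-tablet summary; in Accumulo this information comes from
  // the metadata table, not from a class like this.
  record TabletSummary(boolean markedMergeable, int fileCount, long fileBytes) {}

  // splitThreshold stands in for table.split.threshold (in bytes),
  // mergeabilityThreshold for table.mergeability.threshold (e.g. 0.25),
  // and maxFiles for table.merge.file.max.
  static boolean canAutoMerge(List<TabletSummary> range, long splitThreshold,
      double mergeabilityThreshold, int maxFiles) {
    long totalFiles = 0;
    long totalBytes = 0;
    for (TabletSummary t : range) {
      if (!t.markedMergeable()) {
        return false; // condition 1: every tablet must be marked mergeable
      }
      totalFiles += t.fileCount();
      totalBytes += t.fileBytes();
    }
    long maxBytes = (long) (splitThreshold * mergeabilityThreshold);
    // conditions 2 and 3: combined file count and combined size
    return totalFiles < maxFiles && totalBytes < maxBytes;
  }

  public static void main(String[] args) {
    long oneGiB = 1L << 30;
    List<TabletSummary> small = List.of(
        new TabletSummary(true, 3, 50L << 20),  // 50 MiB
        new TabletSummary(true, 2, 40L << 20)); // 40 MiB
    System.out.println(canAutoMerge(small, oneGiB, 0.25, 10000)); // true
  }
}
```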

## Configuration

The following properties configure merging:

* `manager.tablet.mergeability.interval` - Time to wait between scans of tables for ranges of tablets that can be auto-merged (default is `24h`).
* `table.mergeability.threshold` - A range of tablets remains eligible for automatic merging until the combined size of its RFiles reaches this percentage of the split threshold (default is `.25`).
* `table.merge.file.max` - The maximum number of files that a merge operation will process (default is `10000`). This limit also applies to merges requested through the API.
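
To make the size threshold concrete, the following sketch converts a split threshold such as `1G` to bytes and applies `table.mergeability.threshold` to it. The `toBytes` and `autoMergeLimitBytes` helpers are hypothetical and assume 1024-based `K`/`M`/`G` suffixes.

```java
public class MergeSizeLimit {

  // Hypothetical helper: converts a size such as "1G" or "512M" to bytes,
  // assuming 1024-based K/M/G suffixes. A plain number is taken as bytes.
  static long toBytes(String size) {
    char unit = Character.toUpperCase(size.charAt(size.length() - 1));
    long multiplier = switch (unit) {
      case 'K' -> 1L << 10;
      case 'M' -> 1L << 20;
      case 'G' -> 1L << 30;
      default -> 1L;
    };
    String digits = multiplier == 1L ? size : size.substring(0, size.length() - 1);
    return Long.parseLong(digits) * multiplier;
  }

  // Combined RFile size below which a range of tablets can still be auto-merged:
  // mergeability threshold multiplied by the split threshold.
  static long autoMergeLimitBytes(String splitThreshold, double mergeabilityThreshold) {
    return (long) (toBytes(splitThreshold) * mergeabilityThreshold);
  }

  public static void main(String[] args) {
    long limit = autoMergeLimitBytes("1G", 0.25);
    System.out.println(limit);       // 268435456
    System.out.println(limit >> 20); // 256 (MiB)
  }
}
```

With a `1G` split threshold and the default `.25`, a range of tablets must total less than 256 MiB of RFiles to be auto-merged.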

## Tablet Mergeability

Each tablet can be marked individually with a setting that indicates whether, and when, the system may merge it automatically.
The possible settings are:

* `NEVER` - Tablets are never eligible for automatic merging.
* `ALWAYS` - Tablets are always eligible for automatic merging.
* `DELAY` - Tablets become eligible for automatic merging once the configured delay has elapsed, measured against the Manager's time.
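
The three settings can be modeled roughly as follows. This is a hypothetical sketch, not the `TabletMergeability` class from the client API: Manager time is represented as a `Duration` since an arbitrary origin, and a `DELAY` setting records the Manager time at which it was applied.

```java
import java.time.Duration;
import java.util.Optional;

public class MergeabilitySketch {

  enum Kind { NEVER, ALWAYS, DELAY }

  // Hypothetical model of a per-tablet setting. For DELAY, setAt is the
  // Manager time at which the setting was applied.
  record Setting(Kind kind, Duration delay, Duration setAt) {

    static Setting never() { return new Setting(Kind.NEVER, null, null); }

    static Setting always() { return new Setting(Kind.ALWAYS, null, null); }

    static Setting after(Duration delay, Duration managerTimeNow) {
      return new Setting(Kind.DELAY, delay, managerTimeNow);
    }

    boolean isMergeable(Duration managerTimeNow) {
      return switch (kind) {
        case NEVER -> false;
        case ALWAYS -> true;
        // Eligible once the delay has fully elapsed on the Manager's clock
        case DELAY -> managerTimeNow.compareTo(setAt.plus(delay)) >= 0;
      };
    }

    // Time left before a DELAY setting becomes mergeable (zero once elapsed).
    Optional<Duration> remaining(Duration managerTimeNow) {
      if (kind != Kind.DELAY) {
        return Optional.empty();
      }
      Duration left = setAt.plus(delay).minus(managerTimeNow);
      return Optional.of(left.isNegative() ? Duration.ZERO : left);
    }
  }

  public static void main(String[] args) {
    Duration now = Duration.ofHours(100); // hypothetical Manager time
    Setting s = Setting.after(Duration.ofDays(1), now);
    System.out.println(s.isMergeable(now.plusHours(23))); // false
    System.out.println(s.isMergeable(now.plusHours(24))); // true
    System.out.println(s.remaining(now.plusHours(6)));    // Optional[PT18H]
  }
}
```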

### Tablet Mergeability Defaults

* System-generated splits - Default to `ALWAYS`. Any tablet created by the system is always eligible to be merged.
* User-added splits - Default to `NEVER` if no setting is specified.

### Upgrade

During an upgrade, all existing tablets are marked `NEVER` in the TabletMergeability column to preserve the previous behavior, in which tablets were never merged automatically.
Only new tablets generated by system splits are marked `ALWAYS`.
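
The defaults and the upgrade rule can be summarized in a small hypothetical helper. The `Origin` classification is an assumption made for illustration only; Accumulo does not expose such an enum.

```java
import java.util.Optional;

public class DefaultMergeability {

  enum Mergeability { NEVER, ALWAYS }

  // Hypothetical classification of how a tablet came to exist.
  enum Origin { SYSTEM_SPLIT, USER_SPLIT, PRE_UPGRADE }

  static Mergeability defaultFor(Origin origin, Optional<Mergeability> userSetting) {
    return switch (origin) {
      case SYSTEM_SPLIT -> Mergeability.ALWAYS;                  // system-created tablets
      case USER_SPLIT -> userSetting.orElse(Mergeability.NEVER); // NEVER unless specified
      case PRE_UPGRADE -> Mergeability.NEVER;                    // preserves previous behavior
    };
  }

  public static void main(String[] args) {
    System.out.println(defaultFor(Origin.SYSTEM_SPLIT, Optional.empty()));                // ALWAYS
    System.out.println(defaultFor(Origin.USER_SPLIT, Optional.empty()));                  // NEVER
    System.out.println(defaultFor(Origin.USER_SPLIT, Optional.of(Mergeability.ALWAYS)));  // ALWAYS
    System.out.println(defaultFor(Origin.PRE_UPGRADE, Optional.empty()));                 // NEVER
  }
}
```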

### Configuring Tablets with the API

#### Adding/updating splits

The new `putSplits()` method takes a sorted map of split points to mergeability settings. It creates any splits that do not exist yet and updates the mergeability setting of splits that already exist.

```java
import java.time.Duration;
import java.util.SortedMap;
import java.util.TreeMap;

import org.apache.hadoop.io.Text;

// Adding splits or updating existing splits.
// Assumes `client` is an open AccumuloClient and that TabletMergeability
// from the Accumulo 4.0 client API is imported.
String tableName = "table";
SortedMap<Text,TabletMergeability> splits = new TreeMap<>();
// Mark each split with its mergeability setting
splits.put(new Text(String.format("%09d", 333)), TabletMergeability.always());
splits.put(new Text(String.format("%09d", 444)), TabletMergeability.always());
splits.put(new Text(String.format("%09d", 666)), TabletMergeability.never());
// Eligible for automatic merging after a one-day delay
splits.put(new Text(String.format("%09d", 999)),
    TabletMergeability.after(Duration.ofDays(1)));
// Add new splits or update the settings of existing ones
client.tableOperations().putSplits(tableName, splits);
```

`TabletInformation` describes a tablet's current mergeability state through `TabletMergeabilityInfo`, which is returned by `getTabletMergeabilityInfo()`.

#### Listing TabletMergeabilityInfo

```java
import java.time.Duration;
import java.util.Optional;
import java.util.stream.Stream;

import org.apache.accumulo.core.data.Range;

// Assumes `client` is an open AccumuloClient and `tableName` names an existing table
try (Stream<TabletInformation> tabletInfo =
    client.tableOperations().getTabletInformation(tableName, new Range())) {
  tabletInfo.forEach(ti -> {
    TabletMergeabilityInfo tmi = ti.getTabletMergeabilityInfo();
    // Some examples of the API usage
    // The configured delay, if any
    Optional<Duration> delay = tmi.getDelay();
    // Whether the tablet is currently eligible for merging
    boolean mergeable = tmi.isMergeable();
    // Optional estimated elapsed time since the delay was set
    Optional<Duration> elapsed = tmi.getElapsed();
    // Optional estimated remaining time before the tablet is eligible for merging
    Optional<Duration> remaining = tmi.getRemaining();
  });
}
```
