Commit 62414ee

ryncsn authored and gregkh committed
mm/shmem, swap: improve cached mTHP handling and fix potential hang
commit 5c241ed upstream.

The current swap-in code assumes that, when a swap entry in a shmem mapping is order 0, its cached folios (if present) must be order 0 too, which turns out to be not always correct.

The problem is that shmem_split_large_entry is called before verifying that the folio will eventually be swapped in. One possible race is:

CPU1                                  CPU2
shmem_swapin_folio
 /* swap in of order > 0 swap entry S1 */
 folio = swap_cache_get_folio
 /* folio = NULL */
 order = xa_get_order
 /* order > 0 */
 folio = shmem_swap_alloc_folio
 /* mTHP alloc failure, folio = NULL */
 <... Interrupted ...>
                                      shmem_swapin_folio
                                      /* S1 is swapped in */
                                      shmem_writeout
                                      /* S1 is swapped out, folio cached */
 shmem_split_large_entry(..., S1)
 /* S1 is split, but the folio covering it has order > 0 now */

Now any following swapin of S1 will hang: xa_get_order returns 0, while folio lookup returns a folio with order > 0. The `xa_get_order(&mapping->i_pages, index) != folio_order(folio)` check therefore always fails, causing swap-in to return -EEXIST forever. This is also fragile.

Fix this by allowing a larger folio to be seen in the swap cache, and by checking that the whole shmem mapping range covered by the swapin holds the expected swap values upon inserting the folio; drop the redundant tree walks before the insertion.

This actually improves performance, as it avoids two redundant XArray tree walks in the hot path. The only side effect is that in the failure path, shmem may redundantly reallocate a few folios, causing temporary, slight memory pressure.

Worth noting, it may seem that the order and value check before inserting helps reduce lock contention, but that is not true. The swap cache layer ensures that a raced swapin will either see a swap cache folio or fail the swapin (the SWAP_HAS_CACHE bit is set even when the swap cache is bypassed), so holding the folio lock and checking the folio flag is already enough to avoid the contention.
The chance that a folio passes the swap entry value check while the shmem mapping slot has changed should be very low.

Link: https://lkml.kernel.org/r/20250728075306.12704-1-ryncsn@gmail.com
Link: https://lkml.kernel.org/r/20250728075306.12704-2-ryncsn@gmail.com
Fixes: 809bc86 ("mm: shmem: support large folio swap out")
Signed-off-by: Kairui Song <kasong@tencent.com>
Reviewed-by: Kemeng Shi <shikemeng@huaweicloud.com>
Reviewed-by: Baolin Wang <baolin.wang@linux.alibaba.com>
Tested-by: Baolin Wang <baolin.wang@linux.alibaba.com>
Cc: Baoquan He <bhe@redhat.com>
Cc: Barry Song <baohua@kernel.org>
Cc: Chris Li <chrisl@kernel.org>
Cc: Hugh Dickins <hughd@google.com>
Cc: Matthew Wilcox (Oracle) <willy@infradead.org>
Cc: Nhat Pham <nphamcs@gmail.com>
Cc: Dev Jain <dev.jain@arm.com>
Cc: <stable@vger.kernel.org>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
[ hughd: removed skip_swapcache dependencies ]
Signed-off-by: Hugh Dickins <hughd@google.com>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
1 parent 1be8e41 commit 62414ee

1 file changed

Lines changed: 30 additions & 9 deletions


mm/shmem.c

@@ -794,7 +794,9 @@ static int shmem_add_to_page_cache(struct folio *folio,
 				   pgoff_t index, void *expected, gfp_t gfp)
 {
 	XA_STATE_ORDER(xas, &mapping->i_pages, index, folio_order(folio));
-	long nr = folio_nr_pages(folio);
+	unsigned long nr = folio_nr_pages(folio);
+	swp_entry_t iter, swap;
+	void *entry;
 
 	VM_BUG_ON_FOLIO(index != round_down(index, nr), folio);
 	VM_BUG_ON_FOLIO(!folio_test_locked(folio), folio);
@@ -806,14 +808,25 @@ static int shmem_add_to_page_cache(struct folio *folio,
 
 	gfp &= GFP_RECLAIM_MASK;
 	folio_throttle_swaprate(folio, gfp);
+	swap = radix_to_swp_entry(expected);
 
 	do {
+		iter = swap;
 		xas_lock_irq(&xas);
-		if (expected != xas_find_conflict(&xas)) {
-			xas_set_err(&xas, -EEXIST);
-			goto unlock;
+		xas_for_each_conflict(&xas, entry) {
+			/*
+			 * The range must either be empty, or filled with
+			 * expected swap entries. Shmem swap entries are never
+			 * partially freed without split of both entry and
+			 * folio, so there shouldn't be any holes.
+			 */
+			if (!expected || entry != swp_to_radix_entry(iter)) {
+				xas_set_err(&xas, -EEXIST);
+				goto unlock;
+			}
+			iter.val += 1 << xas_get_order(&xas);
 		}
-		if (expected && xas_find_conflict(&xas)) {
+		if (expected && iter.val - nr != swap.val) {
 			xas_set_err(&xas, -EEXIST);
 			goto unlock;
 		}
@@ -2189,7 +2202,7 @@ static int shmem_swapin_folio(struct inode *inode, pgoff_t index,
 			error = -ENOMEM;
 			goto failed;
 		}
-	} else if (order != folio_order(folio)) {
+	} else if (order > folio_order(folio)) {
 		/*
 		 * Swap readahead may swap in order 0 folios into swapcache
 		 * asynchronously, while the shmem mapping can still stores
@@ -2214,14 +2227,22 @@ static int shmem_swapin_folio(struct inode *inode, pgoff_t index,
 
 			swap = swp_entry(swp_type(swap), swp_offset(swap) + offset);
 		}
+	} else if (order < folio_order(folio)) {
+		swap.val = round_down(swap.val, 1 << folio_order(folio));
+		index = round_down(index, 1 << folio_order(folio));
 	}
 
-	/* We have to do this with folio locked to prevent races */
+	/*
+	 * We have to do this with the folio locked to prevent races.
+	 * The shmem_confirm_swap below only checks if the first swap
+	 * entry matches the folio, that's enough to ensure the folio
+	 * is not used outside of shmem, as shmem swap entries
+	 * and swap cache folios are never partially freed.
+	 */
 	folio_lock(folio);
 	if (!folio_test_swapcache(folio) ||
-	    folio->swap.val != swap.val ||
 	    !shmem_confirm_swap(mapping, index, swap) ||
-	    xa_get_order(&mapping->i_pages, index) != folio_order(folio)) {
+	    folio->swap.val != swap.val) {
 		error = -EEXIST;
 		goto unlock;
 	}
