feat: cap directory listing at max_directory_count to avoid full scans#370
Conversation
Large directories caused slow responses because yield_file_infos_paginated read and sorted all entries regardless of pagination. Now at most max_directory_count+1 entries are read via itertools.islice; if truncated, entries are returned in filesystem order (sort skipped) and is_truncated is set in the API response. The frontend disables column sorting when is_truncated is true. max_directory_count defaults to 10000 and is configurable via settings. Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
…ount Without sorting in the truncated case, entries were returned in arbitrary filesystem order, breaking cursor-based pagination and causing each page request to re-scan all max_count entries from scratch to locate the cursor. Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
There was a problem hiding this comment.
Pull request overview
This PR introduces a cap on directory entry scanning for paginated listings to avoid full directory scans on very large folders, and surfaces truncation state to the API and UI.
Changes:
- Backend: cap
yield_file_infos_paginatedscanning/counting viamax_count(defaulting toSettings.max_directory_count) and return anis_truncatedflag. - API + frontend: propagate
is_truncatedto the client and disable table sorting when a listing is truncated. - Tests: update pagination tests for the new return shape and add a truncation test.
Reviewed changes
Copilot reviewed 6 out of 6 changed files in this pull request and generated 3 comments.
Show a summary per file
| File | Description |
|---|---|
fileglancer/filestore.py |
Limits scandir reads to max_count + 1, caps total_count, and returns is_truncated. |
fileglancer/settings.py |
Adds max_directory_count setting (default 10,000). |
fileglancer/server.py |
Adds is_truncated to the paginated directory listing API response. |
frontend/src/queries/fileQueries.ts |
Adds is_truncated to response typing and maps it to isTruncated in query data. |
frontend/src/components/ui/BrowsePage/FileTable.tsx |
Disables column sorting when the listing is truncated. |
tests/test_pagination.py |
Updates tuple unpacking for new return value and adds a truncation-focused test. |
💡 Add Copilot custom instructions for smarter, more guided reviews. Learn how to get started.
| directory has more entries than max_count. | ||
| """ | ||
| if max_count is None: | ||
| max_count = get_settings().max_directory_count |
| # Read max_count + 1 entries: the extra one detects truncation without | ||
| # reading the entire directory. | ||
| with os.scandir(full_path) as scanner: | ||
| entries = list(itertools.islice(scanner, max_count + 1)) | ||
| total_count = len(entries) | ||
| is_truncated = total_count > max_count | ||
| if is_truncated: | ||
| entries = entries[:max_count] | ||
| total_count = max_count | ||
| entries.sort(key=lambda e: (not e.is_dir(follow_symlinks=False), e.name)) |
There was a problem hiding this comment.
The sorting and cursor methods may need to be reevaluated in the case of truncation.
| def test_truncation(self, pagination_dir): | ||
| """is_truncated is True and entries are in filesystem order (not sorted) when truncated.""" | ||
| fsp = FileSharePath(zone="test", name="test", mount_path=pagination_dir) | ||
| store = Filestore(fsp) | ||
| # max_count=5 on a 13-entry directory | ||
| infos, has_more, next_cursor, total_count, is_truncated = ( | ||
| store.yield_file_infos_paginated(None, limit=10, max_count=5) | ||
| ) | ||
| assert total_count == 5 | ||
| assert is_truncated is True | ||
| assert len(infos) == 5 # limit=10 > max_count=5, so all 5 returned |
|
We may need to re-evaluated how sorting or cursors work when truncation is applied. The comment about is_dir is also intriguing regarding the performance impact.
|
|
@allison-truhlar LGTM |
| type: 'warning', | ||
| title: 'Folder contents truncated', | ||
| message: | ||
| 'Only showing the first 10,000 items in this folder. Reorganize contents into sub-folders to see more.' |
There was a problem hiding this comment.
Currently, I have the maximum number of items configurable in settings.py. Would there be a better place to define this so that the notification matches the setting?
There was a problem hiding this comment.
Could we just pass the value through the response along with is_truncated?
I would say the the following and avoid the recommendation regarding subfolder. The recommendation would likely corrupt the neuroglancer precomputed layout here.
Do we use the term "folder" consistently? I associate "folder" with the GUI appearance and directory as the technical term for the file system object. |
|
I think chromatic.yml needs to specify repository. It does not work since I submitted this from my fork. Copilot suggests the following. - uses: actions/checkout@v5
with:
repository: ${{ github.event.pull_request.head.repo.full_name || github.repository }}
ref: ${{ github.event.pull_request.head.sha || github.sha }}
fetch-depth: 0 |
At some point we did talk about using "folder" rather than "directory", with the assumption that this was how most people outside SciComp would refer to a directory on their computer. However, looking through the UI, there are several places we do use the word "directory" instead - some steps in the "tours" (although other steps use "folder"), tooltip for the "favorite" button in the toolbar, change permissions dialog, the permissions table in the properties panel, Napari tab in the "how to use your data link" dialog, and the job options listing. I counted 10 places where we use "folder", so it's about 3:5 directory:folder. I will note that we should standardize the language going forward, but I lean towards keeping folder now since that was our intended language. |
I think I will just disable this for external pull requests since it requires a secret to run - Claude suggested that no matter how it's configured, the secret won't be accessible to a pull_request from a fork. Chromatic suggests configuring the action to run on push, but I didn't want the action to run on every push. I think external pull requests are rare enough I think I will add a workflow_dispatch to manually trigger this with. For this PR, I will just merge it because I know we're not changing any of the new design system components. |
There was a problem hiding this comment.
Pull request overview
Copilot reviewed 6 out of 6 changed files in this pull request and generated 4 comments.
Comments suppressed due to low confidence (1)
frontend/src/components/ui/BrowsePage/FileTable.tsx:340
- Placing the truncation warning inside
FileTablemeans it only renders when the parent renders the table.FileBrowserrenders this component only when the filtereddisplayFilesarray is non-empty, so a truncated directory whose returned entries are all hidden by filters (for example dotfiles) will show the empty-state instead of warning that results were truncated; render the warning at the browser level or ensure the table is mounted for truncated results.
{isTruncated ? (
<div
className={`${getNotificationStyles('warning').container} p-4 rounded-md mb-3`}
>
<NotificationItem
notification={{
id: 0,
type: 'warning',
title: 'Folder contents truncated',
message: `This folder contains more than ${truncationLimit} items. Only ${truncationLimit} items are shown.`
}}
showDismissButton={false}
/>
</div>
) : null}
…ation The truncated subset contains whatever max_count+1 entries os.scandir returned first; scandir order is filesystem-dependent, so the test's assertion that dirs appear first was a coincidence that held locally but not on CI. Assert what yield_file_infos_paginated actually guarantees: the returned subset is sorted, and repeat calls are stable.
|
Merging this after addressing all the Copilot review comments. |


Summary
yield_file_infos_paginatedpreviously calledlist(os.scandir(...))which read every entry in a directory before paginating, causing slow responses for large directoriesitertools.islice(scanner, max_count + 1)to read at mostmax_directory_count + 1entries (default 10,000, configurable viaSettings.max_directory_count/FGC_MAX_DIRECTORY_COUNT)When truncated, entries are returned in filesystem order (sort skipped) and(There's a complication here. See below.)is_truncated: trueis included in the API responseisTruncatedis trueTest plan
pixi run -e test test-backendpixi run node-checkis_truncatedappears in the API response for large directories🤖 Generated with Claude Code