Refactor to use workers for all file access#353

Open
krokicki wants to merge 40 commits into main from refactor-workers

Conversation

@krokicki
Member

This is an experimental branch which refactors Fileglancer to use user-specific worker processes for all setuid-based access, including all file operations and cluster tasks. This abandons the use of seteuid in the Uvicorn workers, which has been problematic: any async/await code can break that model, and it's easy to introduce hard-to-debug issues by accident. This new model is more similar to JupyterHub (as we had in the beginning), except that the workers are managed automatically and don't require users to manually start a server.

I've gone through code reviews with both Claude and Codex (they caught different issues). I've also tested it with a personal dev deployment on fileglancer-dev, but the code is quite complex and low-level and needs more testing and validation.

Claude PR Summary

Refactor all user-scoped file access to run on per-user persistent worker subprocesses. The main Uvicorn process never calls seteuid/setegid/setgroups — instead, each user gets a long-lived worker subprocess that runs with their real UID/GID/groups, communicating over a Unix socketpair with length-prefixed JSON. File descriptors for opened files are passed back to the main process via SCM_RIGHTS so streaming responses work without copying data through the worker.

  • Worker pool (worker_pool.py): Manages per-user workers with LRU eviction, idle timeout, configurable pool size, and graceful shutdown. Workers are spawned on demand and cleaned up automatically.
  • Worker subprocess (user_worker.py): Handles all user-scoped operations (file I/O, cluster jobs, git ops, SSH keys, S3 proxy) in a synchronous request/response loop. Action handlers are registered in a dispatch table.
  • Server integration (server.py): All endpoints that previously used EffectiveUserContext now dispatch through _worker_exec(). In dev/test mode (no use_access_flags), actions run directly in-process with no subprocess overhead.
  • Streaming (filestore.py): New _stream_contents and _stream_range static methods that stream from an already-opened file handle, used by the server to stream file content from worker-passed file descriptors.
  • Tests (test_worker.py): IPC protocol tests, UserWorker integration tests, async execute tests (including concurrency serialization), action handler tests, and worker main loop tests.
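
To make the IPC concrete, here is a minimal, self-contained sketch of the scheme described above: length-prefixed JSON over a Unix socketpair, with open file descriptors handed back via SCM_RIGHTS ancillary data, and fds collected mid-receive closed on error so they can't leak. Function names and the 8-fd ancillary budget are illustrative, not the branch's actual identifiers.

```python
import array
import json
import os
import socket
import struct

def send_response(sock, payload, fds=()):
    """Send one JSON message prefixed with a 4-byte big-endian length,
    attaching any open file descriptors as SCM_RIGHTS ancillary data."""
    data = json.dumps(payload).encode()
    ancdata = []
    if fds:
        ancdata = [(socket.SOL_SOCKET, socket.SCM_RIGHTS, array.array("i", fds))]
    sock.sendmsg([struct.pack(">I", len(data)), data], ancdata)

def recv_response(sock):
    """Receive one length-prefixed JSON message plus any passed fds.
    On error, close any fds already collected so they cannot leak."""
    fds = []
    try:
        # Read the 4-byte length; ancillary fds ride along with these bytes.
        msg, ancdata, _flags, _addr = sock.recvmsg(4, socket.CMSG_LEN(4 * 8))
        for level, ctype, cdata in ancdata:
            if level == socket.SOL_SOCKET and ctype == socket.SCM_RIGHTS:
                a = array.array("i")
                a.frombytes(cdata[: len(cdata) - len(cdata) % a.itemsize])
                fds.extend(a)
        (length,) = struct.unpack(">I", msg)
        data = b""
        while len(data) < length:
            chunk = sock.recv(length - len(data))
            if not chunk:
                raise ConnectionError("worker closed the socket mid-message")
            data += chunk
        return json.loads(data), fds
    except Exception:
        for fd in fds:
            os.close(fd)
        raise

# Demo: the "worker" opens a file-like fd and hands it to the "parent".
parent, worker = socket.socketpair()
r, w = os.pipe()
os.write(w, b"hello from the worker-opened fd")
os.close(w)
send_response(worker, {"status": 200}, [r])
os.close(r)  # worker's copy; the parent receives a duplicate
reply, received_fds = recv_response(parent)
content = os.read(received_fds[0], 1024)
```

The parent can wrap the received fd (e.g. with `os.fdopen`) and stream it directly in an HTTP response without piping the bytes through the worker.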

Notable fixes included in this branch

  • Enforce max_workers cap: raise 503 when pool is full instead of unbounded subprocess growth
  • Use os.getgrouplist() for supplementary groups (fixes missing groups on LDAP/NSS systems)
  • File descriptor leak protection on error paths and for unexpected extra fds from SCM_RIGHTS
  • Socket timeout (120s) to prevent indefinite thread hangs if a worker is stuck
  • Proper error handling and status codes in all action handlers
  • Dev-mode parity: action handler exceptions are caught and logged consistently
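
The supplementary-groups fix above can be sketched as follows (a minimal POSIX-only illustration; the function name is an assumption, not the branch's actual code):

```python
import os
import pwd

def supplementary_groups(username):
    """Resolve a user's full supplementary group list via NSS.

    os.getgrouplist() asks the name-service switch for the user's groups
    directly, so it sees LDAP/NIS groups that iterating grp.getgrall()
    over local /etc/group can miss. The primary gid is always included.
    """
    primary_gid = pwd.getpwnam(username).pw_gid
    return os.getgrouplist(username, primary_gid)

# A worker spawn would then drop privileges in this order (groups first,
# then gid, then uid) before serving requests:
#   os.setgroups(groups); os.setgid(gid); os.setuid(uid)
```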

@JaneliaSciComp/fileglancer @StephanPreibisch

krokicki and others added 18 commits April 12, 2026 08:43
When the pool is at capacity and all workers are busy, _evict_lru()
returns without evicting. Previously, get_worker() would unconditionally
spawn a new worker anyway, silently exceeding the limit. Now it raises
a WorkerError(503) so the caller gets a proper "try again later" instead
of unbounded subprocess growth.

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
WorkerPool.__init__ was hardcoding max_workers=50 and idle_timeout=300
instead of reading from Settings.worker_pool_max_workers and
worker_pool_idle_timeout. The eviction loop sleep is now also capped
at the idle_timeout so shorter timeouts are honored promptly.

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
Two fd leak paths: (1) if an exception occurs mid-receive, any fds
already collected from SCM_RIGHTS ancillary data were never closed;
(2) if multiple fds arrived (unlikely but possible), only fds[0] was
wrapped and the rest leaked. Now all fds are closed on error, and
extra fds beyond the first are closed on success.

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
These handlers were missing try/except for FileNotFoundError and
PermissionError, unlike all neighboring file action handlers. Exceptions
would propagate to the worker main loop's generic handler and return
500 instead of proper 404/403. Also add status_code to _get_filestore
error responses for consistency with other handlers.

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
_spawn_worker() was using grp.getgrall() to build the child process's
supplementary groups. On LDAP/NSS-backed systems, getgrall() may not
enumerate all effective groups, causing workers to lose access to files
that the user can normally reach. os.getgrouplist() queries NSS directly
and matches the approach used by user_context.py.

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
Bare except: catches SystemExit and KeyboardInterrupt, preventing
clean shutdown. Use except Exception: instead.

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
The json.loads(model.model_dump_json()) pattern serializes to a JSON
string then immediately parses it back to a dict. model_dump(mode='json')
produces the same dict directly without the string round-trip, which
matters for directory listings with many files.

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
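
A quick illustration of the equivalence this commit relies on, assuming pydantic v2: `model_dump(mode="json")` produces the same JSON-compatible dict as serializing to a string and parsing it back, without the intermediate string.

```python
import json
from datetime import datetime
from pydantic import BaseModel

class FileInfo(BaseModel):
    name: str
    modified: datetime

info = FileInfo(name="a.txt", modified=datetime(2026, 5, 8, 12, 0))

slow = json.loads(info.model_dump_json())  # string round-trip
fast = info.model_dump(mode="json")        # direct JSON-ready dict
```

For a directory listing with thousands of entries, skipping the encode/decode of one large string per response is the whole win.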
In dev/test mode (no worker pool), exceptions from action handlers
would propagate unhandled to the generic exception handler. In production,
the worker subprocess catches these and returns error dicts. Now dev mode
also catches and logs action handler exceptions, matching the production
error behavior more closely.

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
_action_get_job_file_paths and _action_get_service_url were calling
get_settings() to get the db_url while every other handler uses
ctx.db_url. Use ctx.db_url for consistency with the rest of the
worker action handlers.

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
The worker socket had no timeout, so if a worker hung mid-response
the run_in_executor thread would block indefinitely. Add a 120s
timeout so the socket raises and the request fails rather than
silently tying up a thread forever.

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
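
The failure mode and the fix are easy to demonstrate with a socketpair (timeout shortened here so the demo fails fast; the 120 s value is the one this branch uses):

```python
import socket

# Without a timeout, a blocking recv() on a stuck worker would pin an
# executor thread forever; settimeout() makes it raise instead.
WORKER_SOCKET_TIMEOUT = 120.0  # seconds (value from this branch)

parent, worker = socket.socketpair()
parent.settimeout(0.05)  # shortened for the demo

try:
    parent.recv(4)  # the "worker" never responds
    timed_out = False
except socket.timeout:  # alias of TimeoutError since Python 3.10
    timed_out = True
```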
@neomorphic
Copy link
Copy Markdown
Member

I am concerned about memory usage with multiple uvicorn workers in production.

Each uvicorn worker process creates its own independent WorkerPool. Since uvicorn uses multiprocessing.get_context("spawn"), every worker process re-imports the module and creates a fresh pool — there is no shared state between them.

This means the number of user worker subprocesses scales as active_users × uvicorn_processes, not just active_users. Claude estimated that a user worker consumes ~69 MB RSS (~50 MB PSS) on my Linux dev server. With the production config (--workers 10), a quick test with two users browsing casually produced 6 user worker processes across 4 uvicorn workers within a few minutes.

At scale with 50 active users and 10 uvicorn workers, the theoretical maximum is 500 user worker processes consuming ~25 GB. This is mitigated by the 300-second idle timeout and connection-level load balancing, but the footprint will still be significantly larger than the 50 workers the max_workers configuration setting suggests.

This should either be documented, or the architecture should be adjusted — for example, by running the worker pool as a separate broker process shared across all uvicorn workers, or by reducing max_workers to (max_workers / uvicorn_workers) automatically.
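
The back-of-envelope behind the numbers above, using the measured PSS estimate:

```python
# Per-user workers are per uvicorn process, so the worst case multiplies.
active_users = 50
uvicorn_workers = 10
worker_pss_mb = 50  # ~50 MB PSS per user worker (estimate from above)

max_processes = active_users * uvicorn_workers         # 500 subprocesses
max_memory_gb = max_processes * worker_pss_mb / 1000   # ~25 GB
```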

@krokicki
Member Author

@neomorphic Great points, this is something I started to think about but probably needs more experimentation and discussion. A separate shared broker process would solve some issues while introducing others (single point of failure, additional multiprocessing code, etc.) so there are trade-offs.

@krokicki
Member Author

Notes:

  • Optimize workers so that they import as little as possible

Test plan:

  • Test all Fileglancer features
  • Kill worker and see what happens
  • Deploy on fileglancer-dev and test simultaneous multi-user

krokicki and others added 4 commits May 8, 2026 16:37
# Conflicts:
#	fileglancer/server.py
#	fileglancer/user_context.py
worker_pool and user_worker imported pwd at module top-level,
breaking import on Windows where pwd doesn't exist. Wrap in
try/except per the same pattern used for user_context.py on main.
Also skip the entire test_worker.py module on Windows since the
worker subsystem uses fork/setuid/SCM_RIGHTS.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
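
The guarded-import pattern this commit applies looks roughly like this (the helper function is illustrative, not the branch's actual code):

```python
# pwd/grp exist only on POSIX, so modules that must remain importable
# on Windows wrap the imports and degrade gracefully.
try:
    import pwd
    import grp
except ImportError:  # Windows has no pwd/grp
    pwd = None
    grp = None

def current_username():
    """Best-effort username lookup that works without pwd on Windows."""
    if pwd is not None:
        import os
        return pwd.getpwuid(os.getuid()).pw_name
    import getpass
    return getpass.getuser()
```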
- Reformat 3 frontend files with prettier 3.8.3 (the version pinned
  on main; my earlier run used cached 3.6.2 which had different rules)
- Move 'import grp as _grp' inside the existing try/except in
  _action_get_profile so the get_profile endpoint returns an empty
  groups list on Windows instead of 500-ing on missing grp module

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
@krokicki krokicki marked this pull request as ready for review May 8, 2026 22:07
krokicki and others added 17 commits May 8, 2026 18:24
_get_filestore did a real filesystem stat on the mount root for every
file action. Cache successful lookups for the worker's lifetime —
mount status doesn't change mid-session, and workers are evicted on
idle so a remount picks up next spawn. Only successes are cached;
'not found' / 'not mounted' still retry.

Test fixture clears the cache between tests since each test creates
a fresh fsp with the same name pointing at a different tempdir.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
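
A minimal sketch of the success-only caching policy described above; the names (`_filestore_cache`, `resolve_filestore`) are illustrative, not the branch's actual identifiers:

```python
# Cache successful filestore lookups for the worker's lifetime, but let
# failures ('not found' / 'not mounted') retry, so a late mount is
# picked up without restarting the worker.
_filestore_cache = {}

def get_filestore_cached(fsp_name, resolve_filestore):
    """resolve_filestore(name) -> filestore, or None if not mounted."""
    if fsp_name in _filestore_cache:
        return _filestore_cache[fsp_name]
    filestore = resolve_filestore(fsp_name)  # real filesystem stat
    if filestore is not None:                # cache only successes
        _filestore_cache[fsp_name] = filestore
    return filestore
```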
13 file-action handlers had identical fsp resolution boilerplate.
Extract into a decorator that resolves request['fsp_name'] and
passes the Filestore as the third arg, returning a 404/500 error
response if resolution fails.

Side benefit: handlers that previously returned {'error': ...}
without a status_code now consistently get 404 for missing fsps
instead of defaulting to 500.

Net -34 lines.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
Replace the 47-line manual _ACTIONS registry at the bottom with
@action(name) decorators co-located with each handler. New
handlers can't silently miss the registry, and the action name
is visible alongside the function definition.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
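
A sketch of the dispatch-table decorator described above: registering at definition time means a new handler cannot silently miss a manually maintained registry at the bottom of the file.

```python
_ACTIONS = {}

def action(name):
    """Register a worker action handler under the given name."""
    def register(handler):
        _ACTIONS[name] = handler
        return handler
    return register

@action("get_profile")
def _action_get_profile(ctx, request):
    return {"username": ctx.username}

def dispatch(ctx, request):
    handler = _ACTIONS.get(request.get("action"))
    if handler is None:
        return {"error": f"unknown action: {request.get('action')}",
                "status_code": 400}
    return handler(ctx, request)
```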
The 4 cluster action handlers (submit, cancel, poll, reconnect)
each repeated the same boilerplate: import create_executor, pop
extra_args from cluster_config, call create_executor(**config).
Pull it into one helper.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
Without an explicit status_code, worker.execute() raised WorkerError with
the default 500, bypassing the caller's intended 400 handling.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
Replaces the substring match in with_filestore with explicit status codes
returned alongside the error message. Unmounted file shares now return 503
instead of 500.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
Building a filtered copy avoids surprising callers that read cluster_config
again after the executor is built.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
Switches from reaching into the executor's private _jobs dict to the public
track() method added in py-cluster-api.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
Pulls in the public Executor.track() method used by the worker poll handler.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
getgrall() enumerates every group on the system, which is very slow on
LDAP/NIS hosts. getgrouplist() asks NSS what groups the user is in — on
this dev box it's ~12x faster (2.9 ms vs 33.7 ms) for the same 27 groups,
and the gap grows with the size of the directory.

Result is cached per-username so repeat get_profile calls are free for
the rest of the worker's lifetime.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
If the forwarder dies silently, the worker's stderr pipe eventually fills
and blocks the worker on its next write. logger.exception preserves the
traceback so the cause is recoverable.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
find_fsp_in_paths is a pure function over the FSP list, so callers don't
need a DB session for symlink target resolution. The Filestore methods
that previously took a session= for that purpose now take fsps=. Tests
that mocked database.find_fsp_from_absolute_path can now pass the real
list and let find_fsp_in_paths do the work.

Foundation for moving DB access out of the worker subprocess.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
Workers no longer receive FGC_DB_URL. Any DB access from a worker handler
now goes through ctx.db (a DbProxy). In a subprocess, that's RpcDbProxy,
which sends a db_request message back over the same socket; the parent's
UserWorker._handle_db_request runs the query with full credentials and
sends a db_response. In dev/test mode the same handlers use LocalDbProxy
to call the database directly.

The wire protocol picks up two new message kinds (db_request, db_response)
distinguished by a "_kind" field; existing action requests/responses keep
their old shape. The whitelist of allowed db methods lives in
user_worker.DB_METHODS — currently get_file_share_paths and get_job, both
read-only and narrowly scoped.

Filestore session= and validate_path_in_filestore(session) were already
refactored to fsps= in earlier commits; this commit threads ctx.db through
the worker handlers and removes the last db.get_db_session(ctx.db_url)
calls. read_job_file is split out of get_job_file_content so the worker
can read job files without re-doing the DB lookup.

Closes the threat where a user could read /proc/<worker-pid>/environ to
recover DB credentials and access other users' session tokens etc.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
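
The message-kind dispatch and method whitelist described above can be sketched like this. The `DB_METHODS` contents and `_kind` field come from the commit message; the function names and response shapes are illustrative.

```python
import json

# Whitelist of read-only DB methods a worker may request from the parent.
DB_METHODS = {"get_file_share_paths", "get_job"}

def handle_db_request(msg, db):
    """Parent-side handler: run a whitelisted query on the worker's behalf."""
    method = msg.get("method")
    if method not in DB_METHODS:
        return {"_kind": "db_response", "error": f"method not allowed: {method}"}
    result = getattr(db, method)(*msg.get("args", []))
    return {"_kind": "db_response", "result": result}

def route_message(raw, db):
    """Route one wire message by its '_kind' field."""
    msg = json.loads(raw)
    if msg.get("_kind") == "db_request":
        return handle_db_request(msg, db)
    # Everything else keeps the pre-existing action request/response shape.
    return {"_kind": "action_response", "echo": msg}
```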
Dispatches ensure_repo, discover_manifests, read_manifest, submit, cancel,
poll, and reconnect through _worker_exec instead of spawning a fresh
fileglancer.apps.worker subprocess per call. Deletes apps/worker.py and
the parent's now-dead cluster_api stdlib-logging interceptor.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>