vnc_worker: multi-client support, device dirty tracking, and docs#3233
vnc_worker: multi-client support, device dirty tracking, and docs#3233bitranox wants to merge 71 commits intomicrosoft:mainfrom
Conversation
Tracks which 16x16 pixel tiles need updating using a packed bit vector. Supports marking from arbitrary pixel rectangles (matching the synthetic video protocol), per-client accumulation via OR, and merging adjacent dirty tiles into larger rectangles for efficient encoding.
Forward guest-reported dirty rectangles from the synthetic video device to an optional channel. The DirtyRect type converts from the internal protocol format to a public mesh-serializable type. The resolver passes None for now; the channel will be connected when the VNC worker gains multi-client support with bitmap-driven updates.
Replace single-client state machine with concurrent multi-client architecture. Multiple VNC clients can now connect simultaneously to the same port without disconnecting each other. Key changes: - SharedView wraps framebuffer in Arc<Mutex> for concurrent read access - SharedInput clones the mesh::Sender so each client can send input - Per-client VNC server instances with independent zlib streams, pixel format negotiation, and update timers (RFB protocol requirement) - FuturesUnordered drives all client connections concurrently - Abort senders stored separately for clean shutdown on restart/stop
Shrink dirty detection tiles from 64x64 to 16x16 pixels for better precision with small updates (cursor blinks, text editing). At 1080p this is still only ~1KB of bitmap. Add tracing throughout the VNC server for operational visibility: - info: RFB version, pixel format, encodings, resolution changes - trace: per-update dirty tile count and encoding used Enable with RUST_LOG=vnc=debug or RUST_LOG=vnc=trace.
Move DirtyRect type to video_core (shared between video device and VNC worker). Add dirty_recv to VncParameters so the VNC worker can receive device-reported dirty rectangles. Currently passed as None from openvmm_entry -- connecting it to the video device resolver requires plumbing through the resource resolution system.
Add dirt_send to SynthVideoHandle following the established pattern (same as ShutdownIcHandle, GuestEmulationDeviceHandle). The channel is created in openvmm_entry when --gfx is enabled, sender goes to the video device, receiver goes to VncParameters. The full path is now connected: guest driver → synthetic video device → mesh channel → VNC worker
The coordinator in the worker's process() loop now polls the dirty rect channel from the video device and feeds rects into a shared DirtyBitmap. Each per-client VNC server reads from this shared bitmap during its update cycle. When device dirty rects are available, tiles are taken from the bitmap instead of doing a full-frame pixel comparison. When no device rects arrive (older hyperv_fb driver), the existing tile diff runs as fallback. Both paths feed the same per-client pending bitmap, which is merged into rects for encoding.
Replace shared DirtyBitmap with per-client mpsc channels. The coordinator broadcasts device rects to each client's channel; each client drains into its own pending bitmap. No shared state means no clearing ambiguity. When device dirty rects are available, only read the dirty scanlines from VRAM instead of the full 8MB framebuffer. An idle desktop with a blinking cursor now reads ~1KB instead of 8MB per update cycle. Add missing logging: - tracelimit::warn_ratelimited for full-screen retransmits - Dirty rect source (device/diff/full) in trace output Add TODO for FeatureChangeMessage (is_dirt_needed) in the video device handshake to explicitly request dirty rects from the guest.
Fix: in the device-dirty path, copy prev_fb into cur_fb before reading dirty lines so non-dirty regions stay valid for future tile-diff fallback cycles. Without this, switching from device-dirty to tile-diff mode would produce false positives from zeroed regions. Add debug logging when a client's dirty channel is full (slow consumer falling behind). Add per-update bytes_sent tracking at debug level. Add FeatureChangeMessage TODO in video device for explicitly requesting dirty rects from the guest during handshake.
- Replace 8MB copy_from_slice with O(1) pointer swap in device-dirty path. Non-dirty regions stay valid via the swap; only dirty scanlines are overwritten with fresh VRAM data. - Pre-allocate tile_buf capacity before pixel conversion loop to avoid reallocations during the per-scanline encoding inner loop. - Use Arc<Vec<DirtyRect>> for broadcasting dirty rects to clients. Only a ref-count bump per client instead of cloning the full Vec. - Add vertical merging to merge_dirty_rects(): a 3x3 dirty block now produces 1 rectangle instead of 3, reducing wire overhead and client processing. - Extract mask_trailing_bits() helper to deduplicate bitmap init logic.
- Make TILE_SIZE pub(crate) instead of pub -- callers shouldn't depend on the internal tile granularity. - Add DirtyRectReceiver type alias to reduce nested generic complexity. - Add debug_assert bounds check in the encoding loop to catch dirty rects that exceed framebuffer dimensions during development. - Document that new clients always get a full first frame, so missing prior device dirty rects is not a correctness issue. - Add tests for vertical merging and non-merging of different-width spans.
Break the monolithic 850-line run_internal() into four concerns: - Encoder: per-connection zlib state, pixel format conversion, and rect encoding. Owns tile_buf/zlib_buf scratch buffers. - UpdateState: framebuffer snapshots (cur_fb/prev_fb), dirty bitmap, and the three update modes (full/device-dirty/tile-diff). Owns the collect_dirty/tile_diff/commit lifecycle. - build_cursor(): extracted 80-line cursor rendering into a standalone function returning (pixels, mask). - handle_client_message(): extracted all client message parsing (pixel format, encodings, key/mouse, clipboard, qemu extensions) out of the main loop. The main event loop in run_internal() is now ~100 lines: handshake, select on update/socket, delegate to UpdateState and Encoder. No behavioral changes -- same protocol, same update logic, same encoding. Just structural separation for readability and testability.
- Replace loose (Vec, &str) return from collect_dirty with DirtyResult struct carrying both rects and source label. Eliminates the incorrect dirty_recv.is_some() heuristic for determining the source in the caller -- now the source is authoritative from UpdateState. - Add set_resolution() method instead of setting width/height fields directly. - Make HID_MOUSE_MAX_ABS_VALUE private (only used within vnc crate). - Fix doc comment on collect_dirty (said "or None" but returned Vec).
- Unify the two duplicated select! blocks in process() into one by wrapping the client_done future in a pending-when-empty helper. Eliminates the if/else branch with near-identical select bodies. - Clear dirty_senders in abort_all_clients() to avoid stale senders surviving a worker restart. - Fix outdated doc comment (said "shared bitmap", now per-client channels). - Add has_dirty_recv to Inspect output for observability.
Test the full UpdateState pipeline with a mock framebuffer: - First frame is always a full update with all tiles dirty - Unchanged framebuffer produces no dirty rects (tile diff works) - Single pixel change detected and localized to correct tile - Device dirty rects feed the pending bitmap and use partial VRAM reads - prev_fb remains valid after device-dirty cycle, so a subsequent tile-diff cycle produces no false positives Test pixel format conversion: - 32bpp identity (no-conversion fast path) - 16bpp RGB565 conversion with correct bit packing
Bug fix: convert_pixels used red_max for blue shift calculation instead of blue_max. This caused incorrect blue channel rendering for non-32bpp pixel formats. Added regression test. Replace &'static str source with DirtySource enum for type safety. Add convert_pixels doc explaining 0x00RRGGBB pixel layout. Fix duplicate doc comment fragment on merge_dirty_rects. New tests: - convert_pixels with empty input - convert_pixels blue channel regression - merge_dirty_rects on empty bitmap - mark_rect with zero-area rect is a no-op
…rror
Replace (u16, u16, u16, u16) tuples with named Rect { x, y, w, h }
struct throughout DirtyBitmap, UpdateState, and the encoding loop.
Self-documenting field access instead of positional indexing.
Extract ClientState struct from handle_client_message's 8 &mut params:
fmt, width, height, ready_for_update, force_full_update, send_cursor,
and scancode_state are now fields of a single struct passed by &mut.
Preserve flate2 CompressError in Error::ZlibCompression via #[from]
instead of discarding it with map_err(|_| ...). Diagnostic info is
no longer lost.
Bug fix: convert_pixels used red_max for blue shift calculation
(shift_b). Now correctly uses blue_max.
Move supports_desktop_resize/zlib/cursor from Server to ClientState -- these are per-connection encoding capabilities that belong with the other per-connection state, not split across two structs. Encoder::encode_rect now takes &Rect instead of 4 separate u16 params. Document fb_width param (framebuffer stride, not rect width). Updater::update takes &mut self and calls try_send directly instead of cloning the mpsc::Sender on every timer tick (was allocating per call). Add comprehensive module-level doc describing the architecture. Add field-level docs to ClientState. Fix stale doc comment on collect_dirty.
Pre-compute pixel format conversion parameters (shifts, depth, endianness) into PixelConversion struct, cached in ClientState. Eliminates per-pixel recomputation of count_ones() and shift arithmetic -- these were computed from the PixelFormat on every convert_pixels call (8160 calls per full 1080p update). Eliminate double merge_dirty_rects() call in device-dirty path: the merged rects were computed once for partial VRAM reads and then recomputed identically for the final output. Now computed once and reused for both. Use Vec::fill() for bitmap clear/mark_all instead of per-word loops (may vectorize better, though the bitmap is only ~1KB).
When a tile row in mark_rect spans multiple u64 words (e.g., a full-screen device dirty rect covering 120 tiles), set entire words at once instead of individual bit operations. Reduces a 120-tile row from 120 bit-sets to ~2 word-sets. For narrow rects (< 64 tiles wide), computes a single bitmask for the affected word. No behavioral change, same test coverage passes.
Allocate input_size + 128 bytes upfront instead of input_size + 64. The extra margin accounts for zlib Sync flush overhead and reduces the chance of hitting the doubling fallback. The Vec retains capacity across calls, so after the first large rect this is typically a no-op. Restructure the compression loop for clarity: compute in/out offsets at the top, check completion before buffer growth (the common path exits without ever hitting the resize branch).
Batch all FramebufferUpdate data (header, cursor, rects) into a single output_buf and send with one socket.write_all(). Reduces async write calls from O(dirty_rects * 3) to O(1) per update cycle, eliminating per-rect polling machinery and syscall overhead. Hoist the no_convert check out of the per-scanline encode loop. For the common 32bpp-native case, the inner loop is now a direct extend_from_slice without function call or branch per scanline. Add set_tile() to DirtyBitmap for direct bit-setting by tile coords, used by tile_diff to skip mark_rect's redundant clamping and division. Add merge_into() for reusable Vec allocation across update cycles. Encoder::encode_rect is now sync (no socket access), making the encode/send split cleaner.
Each client allocates ~8MB for framebuffer buffers plus zlib state. Reject connections beyond the limit by dropping the socket immediately, logging a warning with the rejected address.
When the guest video driver has sent dirty rects at least once, empty cycles now skip the 8MB full-framebuffer read and tile diff entirely. Previously, an idle 1080p desktop with device dirty support still read 8MB from VRAM 33 times/second per client just to find nothing changed. Set TCP_NODELAY on accepted VNC sockets to disable Nagle's algorithm. VNC is latency-sensitive (interactive mouse/keyboard) and we already batch framebuffer updates into a single write, so Nagle just adds 40ms of unnecessary delay on small handshake messages.
Fix recycle_rects: assign-then-clear instead of clear-then-assign so the Vec's heap allocation is actually preserved across cycles. Cap SetEncodings count at 256 and ClientCutText length at 1MB to prevent allocation attacks from malicious clients. The RFB spec defines ~20 encoding types, and clipboard text beyond 1MB is not useful.
…ue_max Replace count_ones() with leading_zeros() for deriving channel bit widths from pixel format max values. count_ones() counts set bits, not bit width, giving wrong results for non-conforming max values (e.g., max=5 has count_ones=2 but actual width is 3). Fix shift_b to use blue_max instead of red_max. The red_max workaround was hiding the real bug. With leading_zeros(), non-standard max values are handled correctly — we derive bit width from the most significant set bit, not the number of set bits. Guard against max=0 (buggy client) by defaulting to 8 bits per channel. Add validation in SetPixelFormat: log non-conforming max values (not of the form 2^N-1) at debug level. Permissive like QEMU — warn but don't reject. New tests: RGB332 (asymmetric 3/3/2 bit widths), zero max handling.
…h modifiers Extract duplicated paste logic (keysym path + QEMU extended key path) into a single paste_clipboard() method. Eliminates ~80 lines of duplicated NUMPAD_SC arrays and per-character paste loops. Track both left AND right Ctrl/Alt modifiers for the Ctrl-Alt-P paste intercept. Previously only left-side modifiers were tracked. Fix log levels: per-keystroke logging downgraded from info! to trace!, paste trigger to debug!. Info-level key logging was flooding production. Use stack buffer for Alt+Numpad digit conversion instead of Vec allocation per non-ASCII character. Remove dead emit_ascii_char from scancode.rs (no longer called after paste_clipboard uses emit() with keysyms directly). Fix mark_rect overflow: guard 1u64 << (hi+1) when hi==63 to avoid undefined shift behavior.
…size Fix: SetEncodings with count > 256 previously read only 256 entries, leaving the remaining bytes in the socket buffer and corrupting the RFB message stream. Now reads the full count (capped at 4096 for OOM protection) to keep the stream synchronized. Remove format!() allocation in paste_clipboard's Alt+Numpad path. Compute decimal digits arithmetically with stack variables instead of allocating a String per non-ASCII character. Batch the DesktopSize resize message into a single socket write (was two separate write_all calls).
Comprehensive guide covering: - Crate structure (vnc, vnc_worker, vnc_worker_defs, video_core) - Data flow diagram for the 30ms update cycle - Dirty region tracking: device rects, tile diff fallback, full refresh - DirtyBitmap internals (16x16 tiles, batch word-set, two-pass merge) - Idle desktop optimization (skip VRAM reads when device dirty seen) - Multi-client architecture: connection lifecycle, dirty broadcast, per-client isolation, worker restart - Pixel format conversion with PixelConversion cache and fast path - Encoding pipeline: batched output_buf, zlib stream management - Performance characteristics table
Lists all gaps vs TigerVNC/QEMU/libvnc with priority, effort, and plan: - Critical: VNC Auth, Tight/ZRLE encoding, WebSocket transport - High: CopyRect, Continuous Updates, IPv6 - Medium: Compression levels, Extended Clipboard - Low: LED State - Deferred: Pixel format fix (workaround in place) Includes phase ordering, dependency list, and explicit "not planned" section (H.264, audio, file transfer, etc.).
- Fix compile error: bind_addr -> addr in socket creation context - Fix stale client: when dirty broadcast is dropped because a client's channel is full, set an AtomicBool flag that the client checks during collect_dirty to force a full refresh, preventing permanently stale regions - Fix SetEncodings stream desync: drain remaining entries when client advertises more than the 4096 allocation cap - Fix help text: --vnc-listen advertised [::] but parser only accepts bare :: (IpAddr) or [::]:port (SocketAddr)
- Fix display freeze when upstream dirty channel closes: coordinator clears per-client senders, clients detect channel closure via try_next() returning Ok(None) and reset device_dirty_seen to re-enter tile diff mode - Fix stuck modifiers on paste: release both left and right Ctrl/Alt in keysym and QEMU scancode paths, not just left-side keys - Fix roadmap: [::] -> :: to match parser
High: - ClientCutText: drain oversized messages to keep RFB stream in sync (same pattern as SetEncodings fix) Medium: - PolledSocket::new failure on accept now logs and continues instead of killing the entire server - dirty_bitmap mark_rect: clamp i32 coords to framebuffer bounds before u16 cast to prevent wrapping (e.g. 70000 -> 4464) - socket.bind()/listen() now include VNC address context in errors Low: - Closed per-client dirty channel: set dirty_recv = None to stop polling a dead channel every 30ms - Error message: remove misleading [host]:port suggestion (DNS not supported)
- Rectangle count: cap at u16::MAX with warning log instead of silent truncation. Only encode capped number of rects. - Keysym matching: use u32 constants for modifier detection to prevent false matches from truncated Unicode keysyms (0x01000000+). - set_tile: add debug_assert for bounds + release-mode guard to prevent OOB panic on invalid tile coordinates. - Vertical merge: search all existing rects (not just last) to merge rows with multiple disjoint horizontal spans into fewer rectangles.
- When dirty rect count exceeds u16::MAX and is truncated, force a full refresh on the next cycle to cover the dropped regions instead of leaving them permanently stale. - Skip scancode emission for keysyms > 0xFFFF (Unicode high-plane) instead of truncating to u16 which could misinterpret the low bits.
…ation - Cursor-only FramebufferUpdates now consume ready_for_update, preventing the server from busy-looping on every 30ms timer tick until the next dirty rect arrives. - Reject pixel formats where shift + channel_bits > 32, which would cause shift overflow in convert_pixels (debug panic, garbage pixels in release).
Replace hardcoded MAX_CLIENTS=16 with a configurable --vnc-max-clients CLI option (default 16). The value is passed through VncParameters to the MultiClientServer and preserved across worker restarts.
Three new tests: - max_clients=1 allows single connection, rejects second - max_clients=3 accepts up to limit, rejects limit+1 - Slot freed after disconnect allows new client Document --vnc-max-clients in CLI reference, graphical console page, architecture doc, and comparison doc.
When the VNC client limit is reached and --vnc-evict-oldest is set, disconnect the oldest client to make room instead of rejecting the new connection. The oldest client is identified by its position in abort_senders (chronologically ordered). Three new tests verify eviction with limit=1, limit=2, and that eviction-off still rejects correctly.
…clients Use abort_senders.len() for the active client count check instead of clients.len(). Evicted clients are removed from abort_senders immediately but their futures linger in clients until reaped by the next poll. Under rapid connection churn, clients.len() would overcount and allow unbounded overshoot of max_clients.
Evicted client futures linger in self.clients until the next poll reaps them. Document this as intentional: abort_senders.len() is the hard cap on active protocol sessions, dying futures resolve within one poll cycle, and blocking the accept loop would add latency.
Fix pre-existing bug in keysym-to-scancode table: F2 through F10 were mapped one scancode too high (F2 sent F3, F3 sent F4, etc.). F1, F11, F12 were correct. Reject pixel formats where any channel exceeds 8 bits (e.g. blue_max = 511 giving 9 bits), which would cause underflow in the shift computation (8 - 9 = panic in debug, wrap in release). The internal pixel format is 0x00RRGGBB with 8 bits per channel.
The original version had 8 outright incorrect claims about QEMU, TigerVNC, and libvncserver. Rewritten with accurate comparisons: - Acknowledge that multi-client, rect merging, and output batching are standard practice, not unique advantages - Correctly describe QEMU's dirty tracking (page-level, not polling) - Correctly note QEMU clipboard support (added in 6.1) - Focus on genuine differentiators: per-client isolation depth, SYNTHVID dirty rects, agent-free clipboard paste, validation
Authentication and WebSocket transport are out of scope for this contribution. External solutions (SSH tunneling, VPN, websockify proxy) are sufficient. Renumber implementation phases accordingly.
This reverts commit 63c1ae6.
The roadmap contains internal planning details (feature priorities, implementation phases, LOC estimates) that should not be published. Moved to /vnc-roadmap.md and added to .gitignore. Removed roadmap link from the comparison doc.
Remove internal planning details (future CLI options, layout tables) from the published keyboard doc. Replace with a concise known limitations section. Copy full version to project root as private reference.
| /// When the client limit is reached, disconnect the oldest client | ||
| /// instead of rejecting the new connection | ||
| #[clap(long)] | ||
| pub vnc_evict_oldest: bool, |
There was a problem hiding this comment.
It might be worth wrapping all of these vnc specific CLI options into a VncCli struct, similar to how DiskCli works
There was a problem hiding this comment.
thanks for the suggestion.
done in 7a50013. Grouped all VNC options into a VncCli struct with #[clap(flatten)]:
There was a problem hiding this comment.
Kind of, but not quite what I had in mind. I was thinking that VncCli (potentially renamed to VncConfigCli?) could implement the FromStr trait. Then users can specify an options string passed with the --vnc options that can be parsed. IMO this would align well with other supported cli options.
This also opens up the opportunity to fail if someone specifies both a port option and a full socket address instead of silently overriding.
There was a problem hiding this comment.
Happy to implement this if it's what you prefer, but I have a couple of concerns I want to raise first:
-
Backwards compatibility: -
-vnc-portalready exists in the repo since the original open-source drop. Changing it to--vnc port=5900would break any scripts or tooling that invoke openvmm with--vnc-port. Is that acceptable, or would you want to keep the flat--vnc-portflag for compatibility and only move the new options (--vnc-listen, --vnc-max-clients, --vnc-evict-oldest) into an options string? (wierd) -
Option string length: With sensible defaults on all fields, the full form gets long:
--vnc port=5900,listen=::,max-clients=32,evict-oldest.In practice these flags will be used sparingly. most users will just need--gfxand occasionally--vnc-listenor--vnc-max-clients. The flat flags feel lighter for that common case, while the options-string form is denser when many options are combined.
That said, if you feel the options-string form is the right direction for consistency with other CLI types, I'm happy to implement it. Just let me know your preference on backwards compatibility: keep --vnc-port, or break it and require the new form everywhere?
- Flat flags dominate for day-to-day tools (most CLI tools, all major container/orchestration tools, most VNC servers)
- Options strings are used for configuring device instances where you might have multiple of them (qemu -device ..., qemu -netdev ..., docker --mount type=...) or for forwarding to nested config (sshd -o, systemd-run --property)
The distinction matters: qemu -vnc uses an options string because :0 is the display identifier and the options configure that specific display. But qemu also uses flat flags for singleton options like -m 2G, -cpu, -boot.
For openvmm's VNC, there's exactly one VNC server (it's a singleton), and the existing --vnc-port, --gfx are already flat flags. The precedent in this codebase (looking at --com1, --com2, --pcat-boot-order, --virtio-console) is mixed, some singletons use flat flags, some use options strings. DiskCli, NicConfigCli, VhostUserCli use options strings but those can have multiple instances.
Your argument is consistent with DiskCli etc., but those are per-device and the VNC server is a singleton. The closest analog in openvmm is --com1 which is flat.
There was a problem hiding this comment.
after re-thinking let me reword it : i would definitely not like to implement that ;-)
Extract process_channel() from VideoChannel::process() to make the synthvid protocol handler testable independently of the VMBus channel type. No behavioral changes.
Move --vnc, --vnc-port, --vnc-listen, --vnc-max-clients, and --vnc-evict-oldest into a VncCli struct with #[clap(flatten)]. Follows the DiskCli/NicConfigCli pattern for grouping related CLI options.
|
Thanks for your work on the VNC support very much! Could you please mention that MobaXterm works now? Before your changes, it wouldn't. Seems important as MobaXterm has got more chances to be trusted/allowed to use due to its easier to check provenance. |
You're welcome - it's just an unintended side effect of sticking to the standards. MobaXterm seems to have become the unofficial standard in the Windows sysadmin world and is widely adopted among sysadmins and IT folks working on Windows, the homelab community, corporate networks where Windows clients are the norm but Linux servers need managing, and universities / research institutions (where it's often the recommended tool for students). Given that, I think it's worth explicitly mentioning it as a compatible client. That said, the last word is of course Microsoft's |
Add MobaXterm to the graphical console client list and keyboard compatibility matrix. Keyboard behavior marked as TBA pending testing.
|
@jstarks - bump. Can we merge please, then we should discuss how to handle keyboard mappings for RealVNC (or not) and special character pasting, as well as iron out issues or optimisations You might find. That PR is big enough now, I dont want to add on top of it - but the requested feature (multiclient, dirty region tracking) needed a big, atomic PR in order to test and implement everything properly - anything You might not like, we can address after that in hopefully much smaller PR´s. |
Summary
This started as "a little VNC patch" and somehow turned into 2980 lines + tests. I blame
scope creep@jstarks and the fact that VNC is deceptively simple until you actually try to do it right. You requested Device-driven dirty region tracking and multi-client, and here we are.Core changes:
--vnc-max-clientsvnc-max-clients, new clients will be rejected, or the oldest evicted with option--vnc-evict-oldesthyperv_drmguests)socket.write_all())DirtyBitmapwith 16x16 tile tracking (~1KB at 1080p)PixelConversionwith zero-copy fast path for native 32bppEncoderabstraction separating pixel conversion from wire encodingDocumentation added:
The remaining gaps: VNC authentication, WebSocket transport, Tight/ZRLE encoding (for WAN performance), and continuous updates are not scope of this contribution.
Correct keyboard handling across all clients (RealVNC sends US scancodes regardless of client layout, TigerVNC intercepts
Ctrl+Alt) is a known limitation and out of scope for this contribution.
Test plan
Tested:
Remaining:
I'd be genuinely impressed if this gets merged. Not as a dare -- just acknowledging that reviewing 2980 lines of VNC protocol code is nobody's idea of a fun afternoon. I appreciate anyone who makes it to the bottom of the diff.