feat: add decoding as a multi-threaded CLARA engine#1229

Open
baltzell wants to merge 44 commits into development from decoder-engine

Collaborator

@baltzell baltzell commented Apr 27, 2026

In the past year, the decoder was sped up enough to be usable as the (single-threaded) CLARA I/O service DecoderReader. With current reconstruction speeds, throughput with that service scales linearly up to about 32 threads, beyond which the job is limited by the (single-threaded) decoding in the I/O service.

This PR adds the decoder as a (multi-threaded) CLARA engine DecoderEngine, based on a pool of CLASDecoder objects in lieu of a thread-safe decoder. Unlike other engines in COATJAVA, this implements CLARA's Engine class directly, rather than extending ReconstructionEngine.

For database access, new "share" constructors are added to CLASDecoder and DetectorEventDecoder. They inherit a previous instance's ConstantsManager objects rather than initializing new ones. Every decoder in the pool except the first uses these new constructors to share database access, akin to ReconstructionEngine.
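
As a rough illustration of the sharing idea, here is a minimal sketch of a decoder pool in which only some decoders own their constants and the rest inherit them through a "share" constructor. `Constants`, `Decoder`, `buildPool`, and the `constantsShared` group size are hypothetical stand-ins for this example, not the actual CLASDecoder or ConstantsManager API.

```java
import java.util.Collections;
import java.util.IdentityHashMap;
import java.util.Set;
import java.util.concurrent.ArrayBlockingQueue;
import java.util.concurrent.BlockingQueue;

final class SharedConstantsPoolSketch {

    /** Hypothetical stand-in for a ConstantsManager-like object. */
    static final class Constants { }

    /** Hypothetical stand-in for a decoder that holds constants. */
    static final class Decoder {
        final Constants constants;

        /** Regular constructor: initializes its own constants. */
        Decoder() { this.constants = new Constants(); }

        /** "Share" constructor: reuses another decoder's constants. */
        Decoder(Decoder other) { this.constants = other.constants; }
    }

    /** Builds a pool where each group of constantsShared decoders shares one Constants. */
    static BlockingQueue<Decoder> buildPool(int poolSize, int constantsShared) {
        if (poolSize < 1 || constantsShared < 1) {
            throw new IllegalArgumentException("sizes must be positive");
        }
        BlockingQueue<Decoder> pool = new ArrayBlockingQueue<>(poolSize);
        Decoder owner = null;
        for (int i = 0; i < poolSize; i++) {
            if (i % constantsShared == 0) {
                owner = new Decoder();          // first of a group owns the constants
                pool.add(owner);
            } else {
                pool.add(new Decoder(owner));   // the rest of the group shares them
            }
        }
        return pool;
    }

    /** Counts distinct Constants instances (by identity) held by the pool. */
    static int countDistinctConstants(BlockingQueue<Decoder> pool) {
        Set<Constants> seen = Collections.newSetFromMap(new IdentityHashMap<>());
        for (Decoder d : pool) seen.add(d.constants);
        return seen.size();
    }

    public static void main(String[] args) {
        BlockingQueue<Decoder> pool = buildPool(8, 4);
        System.out.println("decoders: " + pool.size());
        System.out.println("distinct constants holders: " + countDistinctConstants(pool));
    }
}
```

With a group size equal to the pool size, all decoders share one set of constants. A smaller group size trades memory and initialization time for fewer threads contending on the same object.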

Here's the rough performance for a 24-thread job on a farm25 node with etc/services/rgd-clarode.yml. The roughly 13 ms/event for the DECO engine suggests some thread contention, e.g., in synchronized ConstantsManager calls, since decoding takes only a few ms/event when run single-threaded.

READER          10000 events    total time =     0.52 s    average event time =    0.05 ms
DECO            10000 events    total time =   129.11 s    average event time =   12.91 ms
DCDN            10000 events    total time =   417.51 s    average event time =   41.75 ms
MAGFIELDS       10000 events    total time =     0.17 s    average event time =    0.02 ms
FTCAL           10000 events    total time =     0.24 s    average event time =    0.02 ms
FTHODO          10000 events    total time =     0.18 s    average event time =    0.02 ms
FTTRK           10000 events    total time =     1.25 s    average event time =    0.12 ms
FTEB            10000 events    total time =     0.19 s    average event time =    0.02 ms
RASTER          10000 events    total time =     0.26 s    average event time =    0.03 ms
DCCR            10000 events    total time =   102.01 s    average event time =   10.20 ms
MLTD            10000 events    total time =   171.65 s    average event time =   17.17 ms
DCHAI           10000 events    total time =   440.76 s    average event time =   44.08 ms
FTOFHB          10000 events    total time =    34.57 s    average event time =    3.46 ms
EC              10000 events    total time =    17.54 s    average event time =    1.75 ms
CVTFP           10000 events    total time =   938.05 s    average event time =   93.80 ms
CTOF            10000 events    total time =    18.39 s    average event time =    1.84 ms
CND             10000 events    total time =     5.45 s    average event time =    0.54 ms
BAND            10000 events    total time =     1.07 s    average event time =    0.11 ms
HTCC            10000 events    total time =     0.74 s    average event time =    0.07 ms
LTCC            10000 events    total time =     0.58 s    average event time =    0.06 ms
EBHB            10000 events    total time =    10.60 s    average event time =    1.06 ms
DCTB            10000 events    total time =  1439.13 s    average event time =  143.91 ms
FMT             10000 events    total time =    43.14 s    average event time =    4.31 ms
CVTSP           10000 events    total time =   346.16 s    average event time =   34.62 ms
FTOFTB          10000 events    total time =    25.53 s    average event time =    2.55 ms
EBTB            10000 events    total time =     9.04 s    average event time =    0.90 ms
RICH            10000 events    total time =    65.09 s    average event time =    6.51 ms
RTPC            10000 events    total time =     0.34 s    average event time =    0.03 ms
VTX             10000 events    total time =   187.63 s    average event time =   18.76 ms
CALIB           10000 events    total time =     7.27 s    average event time =    0.73 ms
WRITER          10000 events    total time =     2.84 s    average event time =    0.28 ms
TOTAL           10000 events    total time =  4417.00 s    average event time =  441.70 ms

@baltzell baltzell changed the title Decoder engine add decoder engine Apr 27, 2026
@baltzell baltzell marked this pull request as ready for review April 27, 2026 21:07
@baltzell baltzell changed the title add decoder engine feat: add decoder engine Apr 28, 2026
@baltzell baltzell added the speed label Apr 28, 2026
@baltzell baltzell requested a review from c-dilks as a code owner April 28, 2026 19:29
Collaborator Author

baltzell commented Apr 28, 2026

The decoder showed high thread contention with only one ConstantsManager. Increasing the number of ConstantsManager instances a bit gives much better performance ("DECO" is the decoder engine):

[Image: Screenshot 2026-04-28 at 18 34 20]

@baltzell baltzell enabled auto-merge (squash) May 10, 2026 21:50
@baltzell baltzell marked this pull request as draft May 11, 2026 22:46
auto-merge was automatically disabled May 11, 2026 22:46

Pull request was converted to draft

@baltzell baltzell changed the title feat: add decoder engine feat: add decoder CLARA engine and a decoderless Clas12Reader I/O service May 11, 2026
@baltzell baltzell marked this pull request as ready for review May 11, 2026 23:30
@baltzell baltzell enabled auto-merge (squash) May 11, 2026 23:31
@baltzell baltzell changed the title feat: add decoder CLARA engine and a decoderless Clas12Reader I/O service feat: add decoding CLARA engine May 12, 2026
@baltzell baltzell changed the title feat: add decoding CLARA engine feat: add decoding as a multi-threaded CLARA engine May 12, 2026

Copilot AI left a comment

Pull request overview

This PR introduces a multi-threaded CLARA decoder stage by adding a new DecoderEngine that decodes EVIO to HIPO using a pool of CLASDecoder instances, and updates the example CLARA service chain to use it.

Changes:

  • Add org.jlab.clas.reco.DecoderEngine (direct CLARA Engine) that decodes EVIO→HIPO using a decoder pool.
  • Add “sharing” constructors in CLASDecoder / DetectorEventDecoder to reuse ConstantsManager instances across pooled decoders.
  • Update etc/services/rgd-clarode.yml to use EvioToEvioReader and insert the new DECO engine.

Reviewed changes

Copilot reviewed 5 out of 5 changed files in this pull request and generated 2 comments.

| File | Description |
| --- | --- |
| etc/services/rgd-clarode.yml | Switches the reader and inserts the new decoder engine into the CLARA chain. |
| common-tools/clas-reco/src/main/java/org/jlab/clas/reco/DecoderEngine.java | New multi-threaded decoder engine implementation using a pooled CLASDecoder. |
| common-tools/clas-detector/src/main/java/org/jlab/detector/decode/DetectorEventDecoder.java | Adds constructors/initialization options to share constants managers for DB access. |
| common-tools/clas-detector/src/main/java/org/jlab/detector/decode/CLASDecoder.java | Adds a “share” constructor and a convenience getDecodedEvent(EvioDataEvent) overload. |
| common-tools/clara-io/src/main/java/org/jlab/io/clara/EvioToEvioReader.java | Alters reported byte order for EVIO input events. |

Comments suppressed due to low confidence (3)

common-tools/clas-reco/src/main/java/org/jlab/clas/reco/DecoderEngine.java:69

  • In configure, the timestamp option is applied via setVariation(...) instead of setTimestamp(...). This overwrites the variation with the timestamp string and the timestamp is never set on the decoder pool instances.
            if (i % constantsShared == 0) {
                d0 = new CLASDecoder();
                if (json.has("variation")) d0.setVariation(json.getString("variation"));
                if (json.has("timestamp")) d0.setVariation(json.getString("timestamp"));
                d = d0;
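
A minimal sketch of the corrected option handling, assuming a decoder that exposes both `setVariation` and `setTimestamp` as the comment above implies. The `Decoder` class, the `Map`-based options, and the example timestamp string are hypothetical stand-ins for this illustration, not the real CLASDecoder or JSON configuration.

```java
import java.util.Map;

final class TimestampConfigSketch {

    /** Hypothetical stand-in for a decoder with separate variation and timestamp settings. */
    static final class Decoder {
        String variation = "default";
        String timestamp = null;

        void setVariation(String v) { this.variation = v; }
        void setTimestamp(String t) { this.timestamp = t; }
    }

    /** Applies each option through its own setter, so neither overwrites the other. */
    static Decoder configure(Map<String, String> options) {
        Decoder d = new Decoder();
        if (options.containsKey("variation")) d.setVariation(options.get("variation"));
        if (options.containsKey("timestamp")) d.setTimestamp(options.get("timestamp"));
        return d;
    }

    public static void main(String[] args) {
        // Placeholder values; the real option formats are not specified here.
        Decoder d = configure(Map.of("variation", "rgd", "timestamp", "01/01/2026-00:00:00"));
        System.out.println("variation = " + d.variation);
        System.out.println("timestamp = " + d.timestamp);
    }
}
```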

common-tools/clas-reco/src/main/java/org/jlab/clas/reco/DecoderEngine.java:91

  • The EVIO byte order is forced to LITTLE_ENDIAN when constructing EvioDataEvent. This ignores the ByteBuffer's actual order (and differs from ReconstructionEngine, which uses bb.order()). If an input EVIO stream/file is big-endian, decoding will be incorrect. Use the buffer's order (or the reader-reported file byte order) instead of hard-coding LITTLE_ENDIAN.
        if (input.getMimeType().equals("binary/data-evio")) {
            EvioDataEvent evio;
            try {
                ByteBuffer bb = (ByteBuffer) input.getData();
                //evio = new EvioDataEvent(bb.array(), bb.order());
                evio = new EvioDataEvent(bb.array(), ByteOrder.LITTLE_ENDIAN);
            } catch (Exception e) {
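
To illustrate why the hard-coded order matters, the sketch below reads the same four bytes with the buffer's own order and with a forced `LITTLE_ENDIAN`. It uses only `java.nio`; the `firstWord` helper and the sample value are made up for this example and are unrelated to the EVIO format itself.

```java
import java.nio.ByteBuffer;
import java.nio.ByteOrder;

final class ByteOrderSketch {

    /** Interprets the first four bytes of raw as an int in the given byte order. */
    static int firstWord(byte[] raw, ByteOrder order) {
        return ByteBuffer.wrap(raw).order(order).getInt(0);
    }

    public static void main(String[] args) {
        // A buffer written in big-endian order, as a big-endian input file would be.
        ByteBuffer bb = ByteBuffer.allocate(4).order(ByteOrder.BIG_ENDIAN);
        bb.putInt(0, 0x01020304);

        // Using the buffer's own order recovers the original word.
        int honored = firstWord(bb.array(), bb.order());
        // Forcing LITTLE_ENDIAN reverses the bytes and corrupts the value.
        int forced = firstWord(bb.array(), ByteOrder.LITTLE_ENDIAN);

        System.out.println(String.format("0x%08x", honored)); // prints 0x01020304
        System.out.println(String.format("0x%08x", forced));  // prints 0x04030201
    }
}
```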

common-tools/clas-reco/src/main/java/org/jlab/clas/reco/DecoderEngine.java:103

  • A decoder taken from the pool is only returned via pool.put(d) on the success path. If decoding or event conversion throws after pool.take(), the decoder is leaked from the pool, and repeated failures can eventually drain the pool and stall processing. Return the decoder to the pool in a finally block (or use a try-with-resources style helper) so it is always released.
                CLASDecoder d = pool.take();
                hipo = new HipoDataEvent(d.getDecodedEvent(evio),schema);
                pool.put(d);
                output.setData("binary/data-hipo", hipo.getHipoEvent());
            } catch (Exception e) {
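
One way to guarantee the release is a small helper that wraps the work in try/finally. This is a generic sketch with hypothetical names (`withPooled`, `PoolReleaseSketch`), not code from the PR. It returns the object with `offer` rather than `put`, which cannot be interrupted and always succeeds as long as the queue capacity is at least the number of pooled objects.

```java
import java.util.concurrent.ArrayBlockingQueue;
import java.util.concurrent.BlockingQueue;
import java.util.function.BiFunction;

final class PoolReleaseSketch {

    /** Borrows an object from the pool, runs work on it, and always returns it. */
    static <D, I, O> O withPooled(BlockingQueue<D> pool, BiFunction<D, I, O> work, I input) {
        final D borrowed;
        try {
            borrowed = pool.take();            // blocks until an object is free
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
            throw new IllegalStateException("interrupted while waiting for a pooled object", e);
        }
        try {
            return work.apply(borrowed, input);
        } finally {
            pool.offer(borrowed);              // runs on success and on exception
        }
    }

    public static void main(String[] args) {
        BlockingQueue<StringBuilder> pool = new ArrayBlockingQueue<>(2);
        pool.add(new StringBuilder());
        pool.add(new StringBuilder());

        try {
            withPooled(pool, (sb, in) -> { throw new IllegalStateException("decode failed"); }, "event");
        } catch (IllegalStateException expected) {
            // The failure propagates, but the pooled object was still returned.
        }
        System.out.println("objects left in pool: " + pool.size()); // prints 2
    }
}
```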

Comment thread common-tools/clara-io/src/main/java/org/jlab/io/clara/EvioToEvioReader.java Outdated
Co-authored-by: Copilot Autofix powered by AI <175728472+Copilot@users.noreply.github.com>

3 participants