Refactor CNBC and InsiderTrading data formats for storage efficiency #6

Merged
AlexCatarino merged 2 commits into QuantConnect:master from AlexCatarino:enhance-quiver-data-formats
May 12, 2026
AlexCatarino commented May 12, 2026

Summary

  • Compact CSV format for CNBC and InsiderTrading: SEC single-letter codes, T/F booleans, integer order direction, and an empty-field shortcut for fileDate/adviceDate. Helpers live in QuiverQuantCsvExtensions. The enums (TransactionCode, OwnershipType, AcquiredDisposedCode) sit in the new QuantConnect.DataSource.QuiverQuant sub-namespace and carry [EnumMember] attributes, so Newtonsoft serializes the API JSON natively (no custom JsonConverters needed).
  • InsiderTrading now persists every field from live/insiders (transaction date, fileDate, transaction code, acquired/disposed, ownership, officer title, isDirector/isOfficer/isTenPercentOwner/isOther). Reader sets Time = uploadedDate.AddDays(-1) so that EndTime is the day the data became available. When fileDate/adviceDate is empty in the CSV, Reader reuses the upload date.
  • InsiderTrading downloader mirrors the CNBC orchestration: Run accumulates per-ticker, Flush writes per-ticker files with per-ticker resiliency, ProcessUniverse rebuilds universe files from the corpus. Invalid tickers (e.g. N/A) are filtered up front to avoid Windows path-separator failures.
  • Program.cs lifts processing-date + processing-date-lookback (default 0) so the same iteration loop backfills recent days for both datasets; dataset is selectable via args[0].
  • 33 new helper round-trip tests + 8 Reader/UniverseReader tests covering the new format (61 total Quiver tests, all green).
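
The compact encoding and the Reader time semantics described above can be illustrated with a small round-trip sketch. It is written in Python for brevity and is not the PR's C# code: the function names, the three-column row layout, the `yyyyMMdd` date format, and the rule that the writer leaves the date empty when it equals the upload date are all assumptions made for illustration. Only the T/F booleans, the single-letter A/D codes, the empty-date fallback, and `Time = uploadedDate - 1 day` come from the description.

```python
from dataclasses import dataclass
from datetime import date, datetime, timedelta
from enum import Enum


class AcquiredDisposedCode(Enum):
    # SEC Form 4 uses "A" for acquired and "D" for disposed.
    ACQUIRED = "A"
    DISPOSED = "D"


def bool_to_csv(value: bool) -> str:
    # Booleans are stored as a single character.
    return "T" if value else "F"


def bool_from_csv(text: str) -> bool:
    return text == "T"


def date_to_csv(value: date, uploaded: date) -> str:
    # Assumed writer rule: store nothing when the date equals the upload date.
    return "" if value == uploaded else value.strftime("%Y%m%d")


def date_from_csv(text: str, uploaded: date) -> date:
    # Empty field means "same as the upload date".
    return uploaded if text == "" else datetime.strptime(text, "%Y%m%d").date()


@dataclass
class Row:
    time: datetime
    end_time: datetime
    file_date: date
    code: AcquiredDisposedCode
    is_director: bool


def read_row(line: str, uploaded: date) -> Row:
    # Hypothetical layout: fileDate, acquired/disposed code, isDirector.
    file_text, code_text, director_text = line.split(",")
    # Time is one day before the upload date, so a one-day period
    # puts EndTime on the day the data became available.
    time = datetime.combine(uploaded, datetime.min.time()) - timedelta(days=1)
    return Row(
        time=time,
        end_time=time + timedelta(days=1),
        file_date=date_from_csv(file_text, uploaded),
        code=AcquiredDisposedCode(code_text),
        is_director=bool_from_csv(director_text),
    )


# A row with an empty fileDate falls back to the upload date.
row = read_row(",A,T", date(2026, 5, 12))
```

Here `row.file_date` is the upload date because the first field is empty, and `row.end_time` is midnight on the upload date.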

Test plan

  • dotnet test tests/Tests.csproj — 61 Quiver tests pass.
  • Bulk regenerate locally:
    • InsiderTrading from 2014-04-01 (lookback ≈ 4423)
    • CNBC from 2020-11-01 (lookback ≈ 2017)
  • Spot-check aapl.csv in both datasets shows compact format.
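
The lookback values in the test plan are day counts between the first date to regenerate and the processing date. A minimal Python sketch of that arithmetic follows; the function names are hypothetical, and the assumption that the loop is inclusive of the processing date is mine, not something the PR states.

```python
from datetime import date, timedelta


def lookback_for(start: date, processing_date: date) -> int:
    # Number of days to look back so the loop begins at `start`.
    return (processing_date - start).days


def processing_dates(processing_date: date, lookback: int) -> list[date]:
    # Assumed behavior: iterate from (processing_date - lookback) up to
    # and including processing_date, so lookback=0 yields a single day.
    first = processing_date - timedelta(days=lookback)
    return [first + timedelta(days=offset) for offset in range(lookback + 1)]


insider_lookback = lookback_for(date(2014, 4, 1), date(2026, 5, 12))
cnbc_lookback = lookback_for(date(2020, 11, 1), date(2026, 5, 12))
```

With an assumed processing date of 2026-05-12, this gives 4424 and 2018, each within a day of the approximate figures listed above; the exact processing date used for the bulk run is not stated.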

AlexCatarino and others added 2 commits May 12, 2026 02:55
- Extract TransactionCode, OwnershipType, AcquiredDisposedCode enums under
  the QuantConnect.DataSource.QuiverQuant namespace, each annotated with
  [EnumMember] for the SEC single-letter code so Newtonsoft serializes
  them directly without a custom converter.
- Persist enum values, booleans, and OrderDirection in CSV as single
  letters / T-F / underlying int (helpers in QuiverQuantCsvExtensions),
  shrinking on-disk size for both datasets.
- InsiderTrading now reads every field returned by live/insiders: Date,
  fileDate, TransactionCode, PricePerShare, Shares, SharesOwnedFollowing,
  AcquiredDisposedCode, DirectOrIndirectOwnership, OfficerTitle,
  IsDirector/IsOfficer/IsTenPercentOwner/IsOther.
- Reader semantics: Time = uploadedDate.AddDays(-1) so EndTime equals the
  upload day. fileDate / adviceDate empty-shortcut falls back to that
  upload date (CNBC and InsiderTrading aligned).
- InsiderTrading downloader mirrors the CNBC pattern: Run accumulates
  per-ticker in memory, Flush writes per-ticker files with per-ticker
  exception handling, ProcessUniverse rebuilds universe files from the
  per-ticker corpus by upload date.
- Reject invalid tickers (e.g. "N/A") in the InsiderTrading Run loop to
  avoid Windows path-separator failures.
- Program.cs lifts processing-date / processing-date-lookback so a single
  invocation can backfill recent days; CNBC and InsiderTrading share the
  same iteration loop. Dataset is now selectable via args[0].
- Add tests covering the Reader and universe Reader for the compact
  format plus every CSV helper mapping (33 helper cases + 8 Reader cases).

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
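
The Run/Flush split with per-ticker exception handling described in this commit can be sketched as follows. This is a Python illustration under stated assumptions, not the C# downloader: the class and method names are hypothetical, the invalid-character set is a guess at what "invalid ticker" covers, file names are assumed to be lower-case, and this sketch overwrites files where the real Flush may merge with existing data.

```python
import tempfile
from collections import defaultdict
from pathlib import Path

# Characters that cannot appear in a Windows file name (assumed filter).
INVALID_CHARS = set('/\\:*?"<>|')


def is_valid_ticker(ticker: str) -> bool:
    return bool(ticker) and not any(ch in INVALID_CHARS for ch in ticker)


class Downloader:
    def __init__(self, destination: Path):
        self.destination = destination
        self.rows_by_ticker = defaultdict(list)

    def run(self, records):
        # Accumulate rows per ticker in memory; skip tickers such as "N/A"
        # before they can reach the file system.
        for ticker, csv_line in records:
            if not is_valid_ticker(ticker):
                continue
            self.rows_by_ticker[ticker.lower()].append(csv_line)

    def flush(self):
        # Write one file per ticker; a failure on one ticker is recorded
        # and does not stop the remaining tickers from being written.
        failed = []
        for ticker, lines in self.rows_by_ticker.items():
            try:
                path = self.destination / f"{ticker}.csv"
                path.write_text("\n".join(lines) + "\n", encoding="utf-8")
            except OSError:
                failed.append(ticker)
        return failed


# Usage: "N/A" is dropped, both AAPL rows land in aapl.csv.
with tempfile.TemporaryDirectory() as folder:
    downloader = Downloader(Path(folder))
    downloader.run([("AAPL", "row1"), ("N/A", "row2"), ("AAPL", "row3")])
    failed = downloader.flush()
    written = sorted(p.name for p in Path(folder).iterdir())
    content = (Path(folder) / "aapl.csv").read_text(encoding="utf-8")
```

Filtering in `run` and not in `flush` keeps invalid names out of the in-memory map entirely, which matches the "filtered up front" wording of the summary.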
…Bets

- Program.cs now reads the dataset name from the `vendor-data-name`
  config key (default `cnbc`) instead of args, lifts `processingDate`
  and `processingStartDate` to the top so every case shares them, and
  prints the valid options when an unknown dataset is provided.
- QuiverCongressDataDownloader now combines `destinationFolder` with
  `VendorName` and `VendorDataName`, matching every other downloader.
  Previously it dropped `VendorName` from the path.
- QuiverWallStreetBetsDataDownloader stops re-prefixing `alternative` —
  Program.cs already passes `destinationDirectory` (which includes it),
  so the downloader now follows the same convention as the others.
- config.json gains `processing-date-lookback` and `vendor-data-name`
  defaults; auth token kept empty.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
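
The dataset selection and path convention from this commit can be sketched as below. This is a Python illustration only: the list of valid dataset names, the case-insensitive match, and the example folder names are assumptions, and the sketch raises an error where Program.cs is described as printing the valid options. The `vendor-data-name` key, its `cnbc` default, and the destination/vendor/dataset path order come from the commit message.

```python
from pathlib import PurePosixPath

# Hypothetical list; the real set of datasets is defined in Program.cs.
VALID_DATASETS = ("cnbc", "insidertrading", "congress", "wallstreetbets")


def select_dataset(config: dict) -> str:
    # Read the dataset name from config, defaulting to "cnbc".
    name = str(config.get("vendor-data-name", "cnbc")).lower()
    if name not in VALID_DATASETS:
        options = ", ".join(VALID_DATASETS)
        raise ValueError(f"Unknown dataset '{name}'. Valid options: {options}")
    return name


def destination_for(destination_folder: PurePosixPath, vendor_name: str,
                    vendor_data_name: str) -> PurePosixPath:
    # Every downloader combines the same three parts in the same order.
    # The caller's destination folder already includes "alternative",
    # so it is not prefixed again here.
    return destination_folder / vendor_name / vendor_data_name


dataset = select_dataset({})
target = destination_for(PurePosixPath("/data/alternative"), "quiver", dataset)
```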
AlexCatarino merged commit a50ad5f into QuantConnect:master May 12, 2026
1 check passed
AlexCatarino deleted the enhance-quiver-data-formats branch May 12, 2026 10:06