moderate 1.0 — Trust & Safety for Rails: report, block, filter, a moderation queue, and DSA / App Store / Play–aligned primitives #2

Merged
rameerez merged 15 commits into main from v1-trust-and-safety on Jun 3, 2026
@rameerez rameerez commented Jun 2, 2026

moderate 1.0 — a complete Trust & Safety engine for Rails

This PR is the ground-up 1.0 rewrite of moderate: it grows the gem from a single-purpose profanity validator (0.1.0) into a full, opinionated Trust & Safety layer for any Rails app with user-generated content — reporting, blocking, content filtering, a real moderation queue, appeals, and EU Digital Services Act–aligned + App Store / Google Play–aligned primitives, behind one coherent data model and one set of hooks.

Ships as 1.0.0.beta1. Backward compatible: the 0.x validates :field, moderate: true profanity validator still loads (see Upgrading from 0.x).


Why this exists

The Ruby ecosystem has a canonical gem for nearly every cross-cutting concern — auth (Devise), payments (Pay), admin (madmin/Avo), background jobs (Solid Queue) — but there is no canonical Trust & Safety gem. What exists (obscenity, profanity-filter) is single-purpose, Rails-3-era, and effectively unmaintained, and none of it touches reporting, blocking, a moderation queue, or the law. So every app that hosts user content keeps privately rebuilding the same plumbing.

Two forces make that plumbing non-optional now:

  1. The platforms. Apple App Review Guideline 1.2 and the Google Play UGC policy reject apps that let users post content without a way to filter objectionable material, report it, block other users, and act on reports.
  2. The law. The EU Digital Services Act (Reg. 2022/2065) has applied in full since February 2024, with the first mandatory transparency-reporting year in 2025. Hosting services owe notice-and-action (Art. 16) and a statement of reasons (Art. 17); online platforms additionally owe internal appeals (Art. 20) and transparency reporting (Art. 24).

These aren't three unrelated features — a report, an auto-filter flag, and a public DSA notice all flow into the same queue and the same resolve / dismiss / appeal lifecycle, with one taxonomy and one set of hooks. Splitting them forces the integrator to re-stitch that seam at the app level, which is exactly the value moderate removes. So: one gem, big surface, small front door.

It is not a compliance certificate. moderate gives you the mechanisms the stores and the law require; your policies, response times, published point of contact, jurisdiction-specific obligations, and day-to-day moderation are still yours. (Note also the DSA's tiered carve-outs: under Art. 19, micro and small enterprises are exempt from the online-platform tier that includes Art. 20 complaint handling and Art. 24 transparency reporting, so for many apps the gem does more than is strictly required; the core Art. 16/17 notice-and-action duty applies broadly.) All copy uses "DSA-aligned," never "compliant."


Design principles

  • Host-agnostic. Zero domain assumptions. The gem never names a content type; you declare what is reportable/filterable. No host coupling anywhere in lib/app.
  • UI-agnostic primitives, not chrome. The gem ships models, services, helpers, macros, and a thin mountable engine — but renders no opinionated badges/banners in your app. You build the surfaces from the primitives (flagged?, report_link, etc.).
  • Minimal dependencies. Runtime deps are only activerecord, activesupport, railties, globalid. Everything else (admin UI, email, alerts, push, image/LLM moderation, bot protection) is an optional integration auto-detected at runtime via defined?/respond_to? — never a hard dependency.
  • Bring-your-own classifiers behind one contract. Every filter adapter implements classify(value) → Moderate::Result. The only built-in is an offline wordlist; richer classifiers are reference adapters you copy in.
  • The legally-loaded paths are defensively correct (locks, in-transaction enforcement, out-of-transaction notification, signed single-purpose links) — and commented with the why and the regulation references.

What's implemented

1. Reporting & DSA notices — one model, two front doors

  • Moderate::Report backs both the in-app "Report" button and the public DSA Article 16 notice form (intake_kind: "community" | "dsa") — one queue, one decision workflow, one transparency source.
  • Moderate::Services::IntakeReport / IntakeNotice orchestrate persistence + receipt + an append-only audit, and stamp acknowledged_at as the durable Art. 16(4) proof of receipt.
  • An immutable evidence snapshot is captured at intake so a decision is always judged on what was reported, even if the content later changes.
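
The evidence-snapshot idea can be illustrated without the gem. This is a minimal sketch in plain Ruby, assuming a hypothetical `Listing` struct and a hypothetical `snapshot_evidence` helper; neither name is part of moderate's API, and the gem's own storage format is not shown in this PR.

```ruby
# Hypothetical stand-in for a reportable record; not the gem's model.
Listing = Struct.new(:title, :description)

# Copy the whitelisted fields at intake and freeze the copy, so later
# edits to the live record cannot change what the moderator reviews.
def snapshot_evidence(record, fields)
  fields.to_h { |field| [field, record.public_send(field).dup.freeze] }.freeze
end

listing  = Listing.new("Bike for sale", "Call 555-0100 now")
evidence = snapshot_evidence(listing, %i[title description])

listing.description = "Edited after the report"
puts evidence[:description] # prints the text as it was when reported
```

The point of freezing both the hash and its values is that a decision made days later still refers to the reported text, not to whatever the author has since replaced it with.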

2. Blocking — a single source of truth

  • Moderate::Block is directed in storage, bidirectional in effect. Moderate.blocked_ids_for(user) is the one query you compose into every feed/search/inbox so a block is never half-applied.
  • Actor API: block! / unblock! / blocks? / blocked_by? / blocked_with?.
  • Optional config.on_block(blocker:, blocked:, at:) runs domain teardown inside the block transaction (cancel a pending invite, leave a room, …) so a raising hook rolls the block back rather than leaving a half-applied state.
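
To make "directed in storage, bidirectional in effect" concrete, here is a small in-memory sketch. It assumes a `Set` of `[blocker_id, blocked_id]` pairs in place of the `moderate_blocks` table, and the `block!` and `blocked_ids_for` functions below are illustrative stand-ins that only borrow the names used above; the rollback is emulated by hand rather than by a database transaction.

```ruby
require "set"

# Hypothetical stand-in for the moderate_blocks table:
# each row is a directed [blocker_id, blocked_id] pair.
BLOCK_ROWS = Set.new

def block!(blocker_id, blocked_id, on_block: nil)
  raise ArgumentError, "cannot block yourself" if blocker_id == blocked_id

  row = [blocker_id, blocked_id]
  added = BLOCK_ROWS.add?(row) # a Set mirrors the unique index
  begin
    on_block&.call(blocker: blocker_id, blocked: blocked_id, at: Time.now)
  rescue StandardError
    BLOCK_ROWS.delete(row) if added # emulate the transaction rollback
    raise
  end
  true
end

# Everyone the user blocked, plus everyone who blocked the user.
def blocked_ids_for(user_id)
  BLOCK_ROWS.flat_map do |blocker, blocked|
    next [blocked] if blocker == user_id
    next [blocker] if blocked == user_id
    []
  end.uniq
end

block!(1, 2) # user 1 blocks user 2
block!(3, 1) # user 3 blocks user 1

feed    = [{ id: 10, author_id: 2 }, { id: 11, author_id: 3 }, { id: 12, author_id: 4 }]
hidden  = blocked_ids_for(1)
visible = feed.reject { |post| hidden.include?(post[:author_id]) }
```

User 1 sees neither the person they blocked nor the person who blocked them, which is the behavior a single composed query is meant to guarantee across feeds, search, and inboxes.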

3. Content filtering — three modes, pluggable adapters

  • The moderates :field, mode:, with: macro (and the equivalent config.filter) screens fields before publish:
    • :off — no check
    • :block — reject the save with a validation error (synchronous adapters only)
    • :flag — allow the write, then file a Moderate::Flag after commit for review (great for DMs/chat, where blocking mid-conversation is hostile). The :flag-after-commit design keeps validators side-effect-free, so a rolled-back record never leaves an orphan flag.
  • Adapters resolve by name (a symbol), so swapping a backend is a one-line change. Async adapters (a remote classifier) run only in :flag mode — validated at config time.
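
The three modes can be sketched as a plain-Ruby state machine. Everything here is an assumption made for illustration: `FilterResult`, `TinyWordlist`, and `Draft` are hypothetical names, `save` imitates the validate, commit, after-commit sequence, and the async check runs in the constructor rather than at config time as the gem does.

```ruby
FilterResult = Struct.new(:flagged, :labels)

# Hypothetical synchronous adapter standing in for a wordlist check.
class TinyWordlist
  WORDS = %w[badword].freeze

  def synchronous?
    true
  end

  def classify(value)
    hits = WORDS.select { |word| value.to_s.downcase.include?(word) }
    FilterResult.new(hits.any?, hits)
  end
end

# Hypothetical record; `save` imitates validate -> commit -> after-commit.
class Draft
  attr_reader :errors, :flags

  def initialize(body, mode:, adapter: TinyWordlist.new)
    if mode == :block && !adapter.synchronous?
      raise ArgumentError, "async adapters can only run in :flag mode"
    end
    @body, @mode, @adapter = body, mode, adapter
    @errors = []
    @flags = []
  end

  def save
    return true if @mode == :off

    result = @adapter.classify(@body)
    if @mode == :block && result.flagged
      @errors << "contains objectionable content"
      return false # nothing was written, so nothing is flagged
    end

    # ... the write commits here ...
    @flags << result.labels if @mode == :flag && result.flagged
    true
  end
end

Draft.new("a badword here", mode: :block).save # rejected, no flag filed
Draft.new("a badword here", mode: :flag).save  # saved, flag filed afterwards
```

Filing the flag only after the write has gone through is what keeps a rejected or rolled-back record from leaving an orphan entry in the queue.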

4. Moderation queue & decisions

  • Moderate::Flag is the queue of auto-/manually-flagged content. Moderate::Services::ResolveReport / ResolveFlag / ResolveAppeal apply decisions:
    • run inside with_lock + re-check open? after reload (two moderators can't double-act),
    • apply enforcement (remove reported field / ban) inside the transaction (content is never "actioned" while still live),
    • fire notifications outside the transaction (a broken mailer can't roll back a decision),
    • stamp decision_notified_at only on confirmed delivery.
  • Field-level takedown via remove_reported_field! + its companion query removable_reported_field?(field) (so an admin UI only offers "remove" when the field is actually removable).
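
The ordering of those four guarantees is easier to see in code. The sketch below is an assumption-laden stand-in: `QueuedReport` is a hypothetical class, a `Mutex` plays the role of the database row lock that `with_lock` takes, and the notifier is any callable that returns true on confirmed delivery.

```ruby
class QueuedReport
  attr_reader :status, :decision_notified_at

  def initialize
    @status = :open
    @lock = Mutex.new # stand-in for a database row lock
  end

  def open?
    @status == :open
  end

  def resolve!(notifier:)
    decided = @lock.synchronize do
      next false unless open? # re-check under the lock
      # Enforcement (removing a field, banning) would run here, inside
      # the same transaction as the status change.
      @status = :resolved
      true
    end
    return false unless decided

    begin
      # Outside the lock: a failing notifier cannot undo the decision.
      @decision_notified_at = Time.now if notifier.call(self)
    rescue StandardError => e
      warn "notification failed: #{e.message}"
    end
    true
  end
end

report = QueuedReport.new
report.resolve!(notifier: ->(_report) { true }) # first moderator wins
report.resolve!(notifier: ->(_report) { true }) # second call is a no-op
```

A notifier that raises leaves the report resolved with `decision_notified_at` still unset, so delivery can be retried without reopening the decision.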

5. Appeals (DSA Art. 20)

  • Moderate::Appeal + IntakeAppeal / ResolveAppeal. Free, electronic, open ≥ 6 months (appeal_deadline_at). Decision emails carry signed, single-purpose Global ID appeal links.
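
A signed, single-purpose link boils down to an HMAC over a payload that names its purpose and expiry. The globalid gem provides this through `to_sgid(for:, expires_in:)` and `GlobalID::Locator.locate_signed`; how moderate calls them is not shown in this PR, so the sketch below reimplements the idea with the standard library only. `SIGNING_KEY`, `sign_link_token`, and `verify_link_token` are hypothetical names.

```ruby
require "openssl"
require "json"

SIGNING_KEY = "demo-only-secret" # assumption: a real app derives this from its own secret

def sign_link_token(record_id, purpose:, expires_at:)
  payload = JSON.generate("id" => record_id, "purpose" => purpose.to_s, "exp" => expires_at.to_i)
  data = payload.unpack1("H*") # hex keeps the token URL-safe
  mac  = OpenSSL::HMAC.hexdigest("SHA256", SIGNING_KEY, data)
  "#{data}.#{mac}"
end

def verify_link_token(token, purpose:, now: Time.now)
  data, mac = token.to_s.split(".", 2)
  return nil if data.nil? || mac.nil?

  expected = OpenSSL::HMAC.hexdigest("SHA256", SIGNING_KEY, data)
  return nil unless OpenSSL.secure_compare(expected, mac) # constant-time check

  payload = JSON.parse([data].pack("H*"))
  return nil unless payload["purpose"] == purpose.to_s # single-purpose
  return nil if now.to_i > payload["exp"]               # expired

  payload["id"]
end

six_months = 183 * 86_400 # roughly six months, in seconds
token = sign_link_token(42, purpose: :appeal, expires_at: Time.now + six_months)
verify_link_token(token, purpose: :appeal)      # returns the record id
verify_link_token(token, purpose: :unsubscribe) # returns nil: wrong purpose
```

Checking the purpose after the MAC means a token minted for one flow cannot be replayed against another, even though both are signed with the same key.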

6. Transparency (DSA Art. 24)

  • Aggregation + a mountable transparency report (notices received, actions taken, appeal outcomes, …) you can publish.
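
As a rough picture of what such an aggregation computes, the sketch below tallies decisions inside a reporting period. The `ModerationRecord` struct, the `transparency_counts` name, and the shape of the returned hash are assumptions for illustration; the gem aggregates over its own tables and may report different counters.

```ruby
require "date"

# Hypothetical flattened row; the gem aggregates over its own tables.
ModerationRecord = Struct.new(:intake_kind, :outcome, :decided_on)

def transparency_counts(records, from:, to:)
  in_period = records.select { |record| (from..to).cover?(record.decided_on) }
  {
    notices_received: in_period.map(&:intake_kind).tally,
    actions_taken: in_period.map(&:outcome).tally
  }
end

records = [
  ModerationRecord.new("community", "removed",   Date.new(2025, 3, 1)),
  ModerationRecord.new("dsa",       "dismissed", Date.new(2025, 6, 9)),
  ModerationRecord.new("dsa",       "removed",   Date.new(2026, 1, 5)) # outside the period
]

report = transparency_counts(records, from: Date.new(2025, 1, 1), to: Date.new(2025, 12, 31))
```

Because community reports and DSA notices share one model, a single pass over one source yields both sets of counters.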

7. The mountable engine & public forms

  • Isolated Rails engine, mounted at a host-chosen path (never hardcoded).
  • Public DSA notice form: prefills from X-style query params (content_url/content_id/content_author) and the signed-in user, locks auto-filled identity fields while keeping reported-content fields editable, supports multi-URL subject_urls, and auto-detects rails_cloudflare_turnstile for bot protection — with a config.notice_guard fallback and host skip hooks (notice_/appeal_human_verification_skip_if) for native-app contexts.
  • Public appeal form + the transparency report. moderate:views ejects any of these for full customization.

Public API surface

# Actor model
class User < ApplicationRecord
  has_reporting_and_blocking        # report, block, be blocked, be banned
end

# Any reportable / filterable model
class Listing < ApplicationRecord
  has_reportable_content :title, :description   # whitelist reportable fields (+ self-registers in the registry)
  moderates  :description, mode: :block
  moderates  :photo, mode: :flag, with: :image
end

user.report!(listing, category: "spam", details: "…")
user.block!(other)
user.unblock!(other)
user.blocked_with?(other)
Moderate.blocked_ids_for(user)          # bidirectional SSOT

listing.reported?                        # any open report?
listing.flagged?                         # any pending flag (auto-filter OR manual)?
listing.flagged?(:description)           # field-scoped

Configuration & hooks (all no-ops by default; config/initializers/moderate.rb):

  • config.user_class, config.default_filter_mode, config.filter_adapter, config.report_categories (host-overridable taxonomy)
  • config.audit = ->(event) { … } — every important action, as a stable Moderate::Event envelope
  • config.notify = ->(event) { … } — fan out email/alerts/push; returns a delivered-boolean that gates the DSA "confirmation of receipt"
  • config.on_block = ->(blocker:, blocked:, at:) { … } — domain teardown
  • config.ban_handler = ->(user:, by:, reason:) { … } — what "banned" means in your app (the gem never bans for you)
  • config.register_adapter :name, MyAdapter.new — BYO classifier

Adapters

  • Built-in: :wordlist — a fast, offline, multilingual baseline (ships en/es, Unicode-normalized, common-substitution aware). Honestly framed: a baseline, not a contextual classifier.
  • Reference adapters (copy-in, not shipped/loaded, no forced deps) under examples/: OpenAIModerationAdapter (omni-moderation, text + image, via ruby_llm) and AwsRekognitionAdapter (image/NSFW). Both async (synchronous? == false) → :flag only.
  • One contract: classify(value) → Moderate::Result (with Moderate::Labels mapping every provider onto a single canonical taxonomy — OpenAI's harassment[/threatening], hate[/threatening], sexual[/minors], self-harm[/intent|/instructions], violence[/graphic], illicit[/violent] — so flags, the statement of reasons, and transparency counters all speak one vocabulary).
  • Async classification runs through Moderate::ClassifyJob (ActiveJob).
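
The adapter contract is small enough to sketch in full. The code below is illustrative only: `AdapterResult` stands in for `Moderate::Result`, `FakeImageAdapter` and the provider label names are made up, and the `register_adapter` function imitates `config.register_adapter` with a plain hash.

```ruby
AdapterResult = Struct.new(:flagged, :labels, keyword_init: true)

# Assumed provider label names mapped onto a shared vocabulary.
PROVIDER_TO_CANONICAL = {
  "Explicit Nudity" => "sexual",
  "Graphic Violence" => "violence/graphic"
}.freeze

# Hypothetical async adapter; a real one would call a remote API.
class FakeImageAdapter
  def synchronous?
    false
  end

  def classify(value)
    raw = value.fetch(:provider_labels, [])
    labels = raw.filter_map { |name| PROVIDER_TO_CANONICAL[name] }.uniq
    AdapterResult.new(flagged: labels.any?, labels: labels)
  end
end

ADAPTER_REGISTRY = {}

def register_adapter(name, adapter)
  unless adapter.respond_to?(:classify)
    raise ArgumentError, "adapter must respond to classify"
  end
  ADAPTER_REGISTRY[name.to_sym] = adapter
end

register_adapter :image, FakeImageAdapter.new
result = ADAPTER_REGISTRY.fetch(:image).classify({ provider_labels: ["Explicit Nudity", "Cat"] })
```

Unknown provider labels are dropped rather than passed through, so everything downstream of the adapter only ever sees the canonical vocabulary.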

Optional integrations (auto-detected, never required)

madmin (admin queue) · goodmail (transactional email) · telegrama (admin alerts) · noticed (in-app/push) · rails_cloudflare_turnstile (notice-form bot gate). Each is wired only if the gem is present in the host.
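
Runtime detection of an optional gem usually reduces to asking whether its top-level constant is loaded. The sketch below shows that pattern; the constant names in `OPTIONAL_INTEGRATIONS` are guesses made for illustration and should be checked against each gem, and `available_integrations` is a hypothetical helper rather than part of moderate.

```ruby
# Assumed top-level constant names; check each gem's docs for the real one.
OPTIONAL_INTEGRATIONS = {
  email: "Goodmail",
  admin_alerts: "Telegrama",
  bot_gate: "RailsCloudflareTurnstile"
}.freeze

# An integration counts as available only if its constant is already loaded.
def available_integrations
  OPTIONAL_INTEGRATIONS.select { |_key, const_name| Object.const_defined?(const_name) }.keys
end

available_integrations.each do |name|
  puts "wiring optional integration: #{name}"
end
```

Nothing is required or rescued here: when a gem is absent its constant is simply undefined, and the corresponding wiring never runs.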


Data model, migrations & taxonomy

  • Four tables: moderate_reports, moderate_blocks, moderate_flags, moderate_appeals.
  • Adaptive install migration: UUID or bigint primary keys, and jsonb/json/MySQL-JSON columns — portable across SQLite, PostgreSQL, and MySQL.
  • Only structural invariants live in the DB (NOT NULLs, the unique [blocker_id, blocked_id] index + the blocked_id index for the SSOT query, the self-block CHECK, FKs, a message-length guard). Taxonomies (categories, legal reasons, country codes) are frozen model constants + inclusion validations, not DB check constraints — so adding/customizing a category needs no migration, and report_categories is host-overridable.
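
Keeping the taxonomy in code rather than in check constraints can be sketched as follows. `ReportTaxonomy` is a hypothetical module, its default list is invented for the example, and `valid_category?` plays the role that an inclusion validation plays on the real model.

```ruby
module ReportTaxonomy
  # Assumed defaults for illustration; the gem's real list may differ.
  DEFAULT_CATEGORIES = %w[spam harassment illegal_content other].freeze

  class << self
    attr_writer :categories

    # Host override if present, frozen defaults otherwise.
    def categories
      @categories || DEFAULT_CATEGORIES
    end

    # Plays the role of an inclusion validation on the model.
    def valid_category?(value)
      categories.include?(value.to_s)
    end
  end
end

ReportTaxonomy.valid_category?("spam")          # accepted by the defaults
ReportTaxonomy.categories = %w[spam scam].freeze # host override, no migration
ReportTaxonomy.valid_category?("scam")          # accepted after the override
```

Since the list lives in Ruby, changing it is a deploy rather than a schema change, and the database keeps only the constraints that must hold for every host.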

Security & correctness invariants

  • Signed-GID locators resolve emailed links through a reportable-class allow-list (defense against object-substitution), with purpose + HMAC always enforced.
  • Decisions: row-locked, re-checked-open, enforce-in-transaction, notify-out-of-transaction.
  • Hooks are isolated: audit/notify exceptions are swallowed (logged) so an observational sink can never roll back the action it records.
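
Hook isolation is a small rescue wrapper around the host's callable. In this sketch `run_isolated_hook` and `record_decision` are hypothetical names, and an array stands in for the logger so the failure is observable in a test.

```ruby
HOOK_FAILURES = [] # stands in for a logger

# Run an observational hook; report failure instead of propagating it.
def run_isolated_hook(name, hook, event)
  hook&.call(event)
  true
rescue StandardError => e
  HOOK_FAILURES << "#{name}: #{e.class}: #{e.message}"
  false
end

def record_decision(audit:)
  decision = { status: :resolved }
  run_isolated_hook(:audit, audit, decision)
  decision # returned even when the audit sink raised
end

record_decision(audit: ->(event) { raise IOError, "audit store unreachable" })
```

The decision is returned unchanged and the failure is recorded, which is the property that lets an audit or notify sink be unreliable without making moderation itself unreliable.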

Install & configuration

bundle add moderate
bin/rails generate moderate:install   # writes the initializer + the adaptive migration
bin/rails db:migrate
# optional: bin/rails generate moderate:views   # eject the public notice/appeal/transparency views
mount Moderate::Engine => "/<your-choice>"      # in config/routes.rb; only needed for the public DSA forms

Full guides in docs/: configuration.md, compliance.md (the Art-by-Art mapping + the [gem] vs [you] checklist), dsa-notice-form.md, madmin.md, notifications.md.


Testing & CI

  • A full dummy Rails app + ~199 tests (models, concerns, macros, services, filters, value objects, the engine forms, and integration flows) with SimpleCov.
  • CI matrix via Appraisals: Ruby 3.3 / 3.4 / 4.0 × Rails 7.1 / 7.2 / 8.1 × SQLite / PostgreSQL / MySQL. The migration is exercised on every adapter (migrations, not a dumped schema.rb, are the source of truth — so SQLite-specific schema quirks can't silently break the server adapters).

Upgrading from 0.x

1.0 keeps the gem name but is an entirely new API. The 0.x profanity validator (validates :field, moderate: true) still loads via compatibility shims, so existing apps keep validating. To adopt 1.0: add has_reporting_and_blocking to your user model, has_reportable_content/moderates to your content models, run the install generator, and migrate. Pin ~> 0.1 if you only want the old behavior.


Intentionally out of scope (for now)

A bundled ML classifier (the wordlist is a baseline; bring an adapter), and an opinionated notification UI (the notify hook + the optional integrations cover fan-out). Extraction seams are preserved so the DSA layer could become a separate gem later if it ever warrants it.

🤖 Generated with Claude Code

rameerez and others added 11 commits June 2, 2026 03:47
Reframes moderate from a profanity validator (0.1) into a full Trust & Safety gem: report/block/filter/moderate + DSA / App Store / Play compliance. This is the docs-first pass — README, docs/, adaptive install migration, initializer, Gemfile/Appraisals/CI/SimpleCov coherence with the ecosystem. Gem code (models/services/concerns/filters) follows next.

Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
…, web, dummy app + tests (172 green)

Extracts the proven Trust & Safety logic into a clean, host-agnostic Moderate:: gem built to the README/docs spec: Report/Block/Flag/Appeal models, Actor/Reportable/ContentFilterable concerns + has_moderation/reportable/moderates macros, the Result/Label canonical taxonomy + wordlist adapter, intake/resolve services (report/appeal/flag) with DSA Art 16/17/20 flows, the mountable notice-form engine + report_link helper + BYOUI controller concern, a test/dummy host app, and a full Minitest suite (172 runs, 0 failures; line 86.6% / branch 65.2%). Inline comments cite DSA/OpenAI/Apple/Google/Rails sources. Zero host (CarHey/ride) references.

Post-build corrections still pending (see TODO): abstract external classifier adapters (no inline OpenAI HTTP/dep; reference impl via ruby_llm), move taxonomy lists from DB check-constraints to model constants+validations, auto-integrate rails_cloudflare_turnstile, and make the engine mount-path host-chosen + prefill/lock the notice form.

Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
…taxonomy, turnstile auto-integration, host-chosen mount + form prefill/lock

#1 External classifiers are no longer shipped/loaded or implied deps: deleted the inline OpenAI (Net::HTTP) + image adapters; :wordlist is the only built-in; OpenAI (via ruby_llm) and AWS Rekognition are copy-and-register REFERENCE adapters in examples/. Dropped the Moderate::Adapters alias namespace.

#2 Taxonomy moved from DB check-constraints to frozen model constants + ActiveModel inclusion validations (category host-overridable via config.report_categories); migrations keep only structural constraints (NOT NULL, unique block index, self-block, FKs, message length).

#3 The public DSA notice form auto-detects rails_cloudflare_turnstile (widget + server verify) when present, with a config.notice_guard no-op fallback — no ENV/config needed.

#4 Engine routes are relative; the HOST chooses the mount point (no hardcoded /legal). The notice form prefills from X-style query params (content_url/author/id) + Devise current_user, locks the auto-prefilled identity fields, and keeps reported-content fields editable.

Suite green: 176 runs, 692 assertions, 0 failures (line 89% / branch 69%). Zero host (CarHey/ride) references.

Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
Co-authored-by: Claude <noreply@anthropic.com>
@rameerez rameerez changed the title from "moderate v1: Trust & Safety rewrite (report / block / filter / DSA · store compliance)" to "moderate 1.0 — Trust & Safety for Rails: report, block, filter, a moderation queue, and DSA / App Store / Play–aligned primitives" on Jun 2, 2026
rameerez and others added 4 commits June 3, 2026 00:40
…(clean, no aliases) + Report#resolve!/dismiss! + transparency opt-in

- has_moderation_capabilities -> has_reporting_and_blocking; reportable -> has_reportable_content.
  No back-compat aliases — pre-1.0, clean rename. All call sites, tests, dummy app, docs,
  CHANGELOG, and comments updated; the non-macro reportable surface (Moderate::Reportable,
  reportable_fields, the polymorphic association, reportable: kwargs) is untouched.
- Add Report#resolve!(by:, **) and Report#dismiss!(by:, note:) delegators to
  Moderate::Services::ResolveReport so the documented one-liner API is real (+ test).
- Public Art. 24 transparency report is now opt-in: config.transparency_report_enabled
  (default false); the controller 404s when disabled.

Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
- README/docs updated to the renamed macros (has_reporting_and_blocking /
  has_reportable_content) and the transparency-report opt-in.
- Make the documented plain-English admin API real: Appeal#uphold!(by:, note:) /
  Appeal#reject!(by:, note:) delegate to Moderate::Services::ResolveAppeal (matching
  Report#resolve!/#dismiss!); add tests.
- Add Moderate.transparency(from:, to:) — the DSA Art. 24 aggregation the docs
  reference and a host needs to publish its own report now that the public page is
  opt-in; the controller renders it; compliance.md citations fixed; add a test.
- compliance.md: drop adjective backticks on 'reportable', repoint phantom test
  citations at the real transparency test.

Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
Make the two filtering examples distinct and true-to-API: the teaser now shows the
production case (OpenAI moderation, :flag, async→queue, with the one-line
register_adapter), and Quickstart shows the zero-setup case (built-in :wordlist,
:block). Different model/mode/adapter in each.

Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
Every [gem] evidence citation in docs/compliance.md now resolves to a real test
(the doc had ~20 aspirational/namespace-stale paths). Each was repointed to the
test that actually exercises that guarantee (verified): decision guarantees ->
resolve_report_test, appeal guarantees -> resolve_appeal_test, snapshot/user-
reportable -> report_test, blocked_ids -> blocking_test, filter modes ->
content_filtering_test, report-link/report! -> reporting_test, etc.

Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
@rameerez rameerez merged commit 8bbecc5 into main Jun 3, 2026
14 of 15 checks passed
@rameerez rameerez deleted the v1-trust-and-safety branch June 3, 2026 00:14