moderate 1.0 — Trust & Safety for Rails: report, block, filter, a moderation queue, and DSA / App Store / Play–aligned primitives#2
Merged
Reframes moderate from a profanity validator (0.1) into a full Trust & Safety gem: report/block/filter/moderate + DSA / App Store / Play compliance. This is the docs-first pass — README, docs/, adaptive install migration, initializer, Gemfile/Appraisals/CI/SimpleCov coherence with the ecosystem. Gem code (models/services/concerns/filters) follows next. Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
…, web, dummy app + tests (172 green) Extracts the proven Trust & Safety logic into a clean, host-agnostic Moderate:: gem built to the README/docs spec: Report/Block/Flag/Appeal models, Actor/Reportable/ContentFilterable concerns + has_moderation/reportable/moderates macros, the Result/Label canonical taxonomy + wordlist adapter, intake/resolve services (report/appeal/flag) with DSA Art 16/17/20 flows, the mountable notice-form engine + report_link helper + BYOUI controller concern, a test/dummy host app, and a full Minitest suite (172 runs, 0 failures; line 86.6% / branch 65.2%). Inline comments cite DSA/OpenAI/Apple/Google/Rails sources. Zero host (CarHey/ride) references. Post-build corrections still pending (see TODO): abstract external classifier adapters (no inline OpenAI HTTP/dep; reference impl via ruby_llm), move taxonomy lists from DB check-constraints to model constants+validations, auto-integrate rails_cloudflare_turnstile, and make the engine mount-path host-chosen + prefill/lock the notice form. Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
…taxonomy, turnstile auto-integration, host-chosen mount + form prefill/lock
#1 External classifiers are no longer shipped/loaded or implied deps: deleted the inline OpenAI (Net::HTTP) + image adapters; :wordlist is the only built-in; OpenAI (via ruby_llm) and AWS Rekognition are copy-and-register REFERENCE adapters in examples/. Dropped the Moderate::Adapters alias namespace.
#2 Taxonomy moved from DB check-constraints to frozen model constants + ActiveModel inclusion validations (category host-overridable via config.report_categories); migrations keep only structural constraints (NOT NULL, unique block index, self-block, FKs, message length).
#3 The public DSA notice form auto-detects rails_cloudflare_turnstile (widget + server verify) when present, with a config.notice_guard no-op fallback; no ENV/config needed.
#4 Engine routes are relative; the HOST chooses the mount point (no hardcoded /legal). The notice form prefills from X-style query params (content_url/author/id) + Devise current_user, locks the auto-prefilled identity fields, and keeps reported-content fields editable.
Suite green: 176 runs, 692 assertions, 0 failures (line 89% / branch 69%). Zero host (CarHey/ride) references.
Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
Co-authored-by: Claude <noreply@anthropic.com>
…(clean, no aliases) + Report#resolve!/dismiss! + transparency opt-in
- has_moderation_capabilities -> has_reporting_and_blocking; reportable -> has_reportable_content. No back-compat aliases: pre-1.0, clean rename. All call sites, tests, dummy app, docs, CHANGELOG, and comments updated; the non-macro reportable surface (Moderate::Reportable, reportable_fields, the polymorphic association, reportable: kwargs) is untouched.
- Add Report#resolve!(by:, **) and Report#dismiss!(by:, note:) delegators to Moderate::Services::ResolveReport so the documented one-liner API is real (+ test).
- Public Art. 24 transparency report is now opt-in: config.transparency_report_enabled (default false); the controller 404s when disabled.
Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
- README/docs updated to the renamed macros (has_reporting_and_blocking / has_reportable_content) and the transparency-report opt-in.
- Make the documented plain-English admin API real: Appeal#uphold!(by:, note:) / Appeal#reject!(by:, note:) delegate to Moderate::Services::ResolveAppeal (matching Report#resolve!/#dismiss!); add tests.
- Add Moderate.transparency(from:, to:), the DSA Art. 24 aggregation the docs reference and a host needs to publish its own report now that the public page is opt-in; the controller renders it; compliance.md citations fixed; add a test.
- compliance.md: drop adjective backticks on 'reportable', repoint phantom test citations at the real transparency test.
Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
Make the two filtering examples distinct and true-to-API: the teaser now shows the production case (OpenAI moderation, :flag, async→queue, with the one-line register_adapter), and Quickstart shows the zero-setup case (built-in :wordlist, :block). Different model/mode/adapter in each. Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
Every [gem] evidence citation in docs/compliance.md now resolves to a real test (the doc had ~20 aspirational/namespace-stale paths). Each was repointed to the test that actually exercises that guarantee (verified): decision guarantees -> resolve_report_test, appeal guarantees -> resolve_appeal_test, snapshot/user-reportable -> report_test, blocked_ids -> blocking_test, filter modes -> content_filtering_test, report-link/report! -> reporting_test, etc.
Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
moderate 1.0: a complete Trust & Safety engine for Rails

This PR is the ground-up 1.0 rewrite of `moderate`: it grows the gem from a single-purpose profanity validator (0.1.0) into a full, opinionated Trust & Safety layer for any Rails app with user-generated content: reporting, blocking, content filtering, a real moderation queue, appeals, and EU Digital Services Act–aligned + App Store / Google Play–aligned primitives, behind one coherent data model and one set of hooks.

Ships as `1.0.0.beta1`. Backward compatible: the 0.x `validates :field, moderate: true` profanity validator still loads (see Upgrading from 0.x).

Why this exists

The Ruby ecosystem has a canonical gem for nearly every cross-cutting concern: auth (Devise), payments (Pay), admin (madmin/Avo), background jobs (Solid Queue). But there is no canonical Trust & Safety gem. What exists (`obscenity`, `profanity-filter`) is single-purpose, Rails-3-era, and effectively unmaintained, and none of it touches reporting, blocking, a moderation queue, or the law. So every app that hosts user content keeps privately rebuilding the same plumbing.

Two forces make that plumbing non-optional now:

These aren't three unrelated features: a report, an auto-filter flag, and a public DSA notice all flow into the same queue and the same resolve / dismiss / appeal lifecycle, with one taxonomy and one set of hooks. Splitting them forces the integrator to re-stitch that seam at the app level, which is exactly the work `moderate` removes. So: one gem, big surface, small front door.

Design principles
- lib/app.flagged?, report_link, etc.).
- `activerecord`, `activesupport`, `railties`, `globalid`. Everything else (admin UI, email, alerts, push, image/LLM moderation, bot protection) is an optional integration auto-detected at runtime via `defined?` / `respond_to?`, never a hard dependency.
- `classify(value) → Moderate::Result`. The only built-in is an offline wordlist; richer classifiers are reference adapters you copy in.

What's implemented
1. Reporting & DSA notices — one model, two front doors
- `Moderate::Report` backs both the in-app "Report" button and the public DSA Article 16 notice form (`intake_kind: "community" | "dsa"`): one queue, one decision workflow, one transparency source.
- `Moderate::Services::IntakeReport` / `IntakeNotice` orchestrate persistence + receipt + an append-only audit, and stamp `acknowledged_at` as the durable Art. 16(4) proof of receipt.

2. Blocking: a single source of truth
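As a rough illustration of the "directed in storage, bidirectional in effect" idea in this section, here is a minimal in-memory sketch. `BlockLedger`, `Post`, and `visible_to` are hypothetical names invented for the example and are not the gem's classes; only the method names `block!`, `unblock!`, and `blocked_ids_for` echo the PR text.

```ruby
require "set"

# Hypothetical in-memory stand-in for a directed block table. Each entry is a
# [blocker_id, blocked_id] pair, stored in one direction only.
class BlockLedger
  def initialize
    @pairs = Set.new
  end

  # Record the block in a single direction; refuse self-blocks.
  def block!(blocker_id, blocked_id)
    raise ArgumentError, "cannot block yourself" if blocker_id == blocked_id

    @pairs.add([blocker_id, blocked_id])
    self
  end

  def unblock!(blocker_id, blocked_id)
    @pairs.delete([blocker_id, blocked_id])
    self
  end

  # Everyone this user must not see: people they blocked plus people who
  # blocked them. One lookup covers both directions.
  def blocked_ids_for(user_id)
    outgoing = @pairs.select { |blocker, _| blocker == user_id }.map(&:last)
    incoming = @pairs.select { |_, blocked| blocked == user_id }.map(&:first)
    (outgoing + incoming).uniq.sort
  end

  # Filter any feed-like list whose items respond to #author_id.
  def visible_to(user_id, posts)
    hidden = blocked_ids_for(user_id)
    posts.reject { |post| hidden.include?(post.author_id) }
  end
end

# Hypothetical feed item used only by the example.
Post = Struct.new(:id, :author_id)
```

Because every feed, search, and inbox goes through the same lookup, a block created by either party hides content for both, which is the "never half-applied" property the description aims for.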
- `Moderate::Block` is directed in storage, bidirectional in effect.
- `Moderate.blocked_ids_for(user)` is the one query you compose into every feed/search/inbox so a block is never half-applied.
- `block!` / `unblock!` / `blocks?` / `blocked_by?` / `blocked_with?`.
- `config.on_block(blocker:, blocked:, at:)` runs domain teardown inside the block transaction (cancel a pending invite, leave a room, …) so a raising hook rolls the block back rather than leaving a half-applied state.

3. Content filtering: three modes, pluggable adapters
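The three filter modes in this section differ mainly in when they act relative to the save. The sketch below imitates that ordering (validate, commit, then after-commit) in plain Ruby. `FilteredMessage`, its `DISALLOWED` list, and the `commit:` switch are assumptions made up for the demo, not the gem's implementation.

```ruby
# Hypothetical stand-in for a record with one filtered field. The save method
# mimics the order validate -> commit -> after-commit.
class FilteredMessage
  DISALLOWED = %w[spam scam].freeze # toy wordlist, an assumption for the demo

  attr_reader :body, :errors

  def initialize(body, mode:, flag_queue:)
    @body = body
    @mode = mode             # :off, :block, or :flag
    @flag_queue = flag_queue # any object that responds to <<
    @errors = []
  end

  # Pure check with no side effects, so it is safe to run during validation.
  def disallowed?
    (@body.downcase.scan(/[a-z]+/) & DISALLOWED).any?
  end

  # commit: false simulates a transaction that rolls back after validation.
  def save(commit: true)
    @errors = []
    hit = @mode != :off && disallowed?
    @errors << "body contains disallowed content" if @mode == :block && hit
    return false if @errors.any?
    return false unless commit

    # Only reached after a successful commit, so a rollback leaves no flag.
    @flag_queue << { field: :body, value: @body } if @mode == :flag && hit
    true
  end
end
```

Keeping the check side-effect-free and filing the flag only after the commit is what prevents an orphan flag when the surrounding transaction rolls back.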
- The `moderates :field, mode:, with:` macro (and the equivalent `config.filter`) screens fields before publish:
  - `:off`: no check
  - `:block`: reject the save with a validation error (synchronous adapters only)
  - `:flag`: allow the write, then file a `Moderate::Flag` after commit for review (great for DMs/chat, where blocking mid-conversation is hostile).
- The `:flag`-after-commit design keeps validators side-effect-free, so a rolled-back record never leaves an orphan flag.
- `:flag` mode, validated at config time.

4. Moderation queue & decisions
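The "lock, then re-check `open?`" guard described in this section can be sketched without a database. In a Rails app, Active Record's `with_lock` takes a row lock and reloads the record; here a `Mutex` plays that role. `QueueItem` and its fields are hypothetical names for the illustration.

```ruby
# Hypothetical queue item; a Mutex stands in for the database row lock.
class QueueItem
  attr_reader :status, :resolved_by

  def initialize
    @status = :open
    @lock = Mutex.new
  end

  def open?
    @status == :open
  end

  # Returns true for the moderator whose decision was applied and false for
  # anyone who arrives after the item has already been decided.
  def resolve!(by:, decision:)
    @lock.synchronize do
      # Re-check inside the lock: the state may have changed while waiting.
      return false unless open?

      @status = decision
      @resolved_by = by
      true
    end
  end
end
```

Checking `open?` before taking the lock would not be enough, since two moderators could both pass the check and then both write; the re-check inside the lock is what makes the second call a no-op.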
- `Moderate::Flag` is the queue of auto-/manually-flagged content.
- `Moderate::Services::ResolveReport` / `ResolveFlag` / `ResolveAppeal` apply decisions: `with_lock` + re-check `open?` after reload (two moderators can't double-act), `decision_notified_at` only on confirmed delivery.
- `remove_reported_field!` + its companion query `removable_reported_field?(field)` (so an admin UI only offers "remove" when the field is actually removable).

5. Appeals (DSA Art. 20)
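To make "signed, single-purpose" appeal links concrete, here is a small HMAC-based sketch of the general technique. It is not how the gem or GlobalID implements signing; in a Rails app GlobalID's signed IDs (`to_sgid(for: ..., expires_in: ...)` and `GlobalID::Locator.locate_signed(..., for: ...)`) would normally cover this. `AppealLinkSigner` and its token layout are assumptions for the example.

```ruby
require "openssl"

# Hypothetical signer for illustration only.
class AppealLinkSigner
  def initialize(secret)
    @secret = secret
  end

  # Pack id, purpose, and expiry (epoch seconds) into a hex payload, then sign it.
  def sign(decision_id, purpose:, expires_at:)
    payload = "#{decision_id}|#{purpose}|#{expires_at.to_i}".unpack1("H*")
    "#{payload}--#{signature_for(payload)}"
  end

  # Returns the decision id as a String, or nil when the token was tampered
  # with, has expired, or was issued for a different purpose.
  def verify(token, purpose:, now: Time.now)
    payload, signature = token.to_s.split("--", 2)
    return nil if payload.nil? || signature.nil?
    return nil unless constant_time_equal?(signature, signature_for(payload))

    decoded = [payload].pack("H*").force_encoding("UTF-8")
    id, token_purpose, expiry = decoded.split("|", 3)
    return nil unless token_purpose == purpose.to_s
    return nil if now.to_i > expiry.to_i

    id
  end

  private

  def signature_for(payload)
    OpenSSL::HMAC.hexdigest(OpenSSL::Digest.new("SHA256"), @secret, payload)
  end

  # Compare without short-circuiting on the first differing byte.
  def constant_time_equal?(a, b)
    return false unless a.bytesize == b.bytesize

    a.bytes.zip(b.bytes).reduce(0) { |acc, (x, y)| acc | (x ^ y) }.zero?
  end
end
```

Binding the purpose into the signed payload means a token minted for an appeal cannot be replayed against another endpoint, and the embedded expiry lets the link lapse when the appeal window closes.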
- `Moderate::Appeal` + `IntakeAppeal` / `ResolveAppeal`. Free, electronic, open ≥ 6 months (`appeal_deadline_at`). Decision emails carry signed, single-purpose Global ID appeal links.

6. Transparency (DSA Art. 24)
7. The mountable engine & public forms
- `content_url` / `content_id` / `content_author`) and the signed-in user, locks auto-filled identity fields while keeping reported-content fields editable, supports multi-URL `subject_urls`, and auto-detects `rails_cloudflare_turnstile` for bot protection, with a `config.notice_guard` fallback and host skip hooks (`notice_` / `appeal_human_verification_skip_if`) for native-app contexts.
- `moderate:views` ejects any of these for full customization.

Public API surface
Configuration & hooks (all no-ops by default; `config/initializers/moderate.rb`):
- `config.user_class`, `config.default_filter_mode`, `config.filter_adapter`, `config.report_categories` (host-overridable taxonomy)
- `config.audit = ->(event) { … }`: every important action, as a stable `Moderate::Event` envelope
- `config.notify = ->(event) { … }`: fan out email/alerts/push; returns a delivered-boolean that gates the DSA "confirmation of receipt"
- `config.on_block = ->(blocker:, blocked:, at:) { … }`: domain teardown
- `config.ban_handler = ->(user:, by:, reason:) { … }`: what "banned" means in your app (the gem never bans for you)
- `config.register_adapter :name, MyAdapter.new`: BYO classifier

Adapters
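The adapter contract this section describes (a `classify(value)` call that returns a result object, plus a `synchronous?` predicate that decides which modes an adapter may serve) can be sketched in plain Ruby. Everything under `AdapterSketch` is a hypothetical stand-in, including the `:profanity` label and the `Registry` class; the gem's real `Moderate::Result` and registration internals may look different.

```ruby
module AdapterSketch
  # Hypothetical result value: whether anything matched, and under which labels.
  Result = Struct.new(:flagged, :labels, keyword_init: true) do
    def flagged?
      flagged
    end
  end

  # A toy synchronous adapter, similar in spirit to an offline wordlist.
  class TinyWordlist
    def initialize(words)
      @words = words.map(&:downcase)
    end

    def synchronous?
      true
    end

    def classify(value)
      tokens = value.to_s.downcase.scan(/[[:alnum:]]+/)
      hits = tokens & @words
      Result.new(flagged: hits.any?, labels: hits.empty? ? [] : [:profanity])
    end
  end

  # Minimal registry that enforces the contract when an adapter is registered
  # and rejects async adapters for :block, which needs an answer before save.
  class Registry
    def initialize
      @adapters = {}
    end

    def register(name, adapter)
      unless adapter.respond_to?(:classify) && adapter.respond_to?(:synchronous?)
        raise ArgumentError, "#{name} must respond to classify and synchronous?"
      end

      @adapters[name] = adapter
    end

    def fetch_for(name, mode:)
      adapter = @adapters.fetch(name)
      if mode == :block && !adapter.synchronous?
        raise ArgumentError, "#{name} is async; use mode: :flag"
      end

      adapter
    end
  end
end
```

Failing fast on an async adapter paired with `:block` mirrors the config-time validation mentioned in the PR: a misconfiguration surfaces at boot rather than on the first save.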
- `:wordlist`: a fast, offline, multilingual baseline (ships `en`/`es`, Unicode-normalized, common-substitution aware). Honestly framed: a baseline, not a contextual classifier.
- `examples/`: `OpenAIModerationAdapter` (omni-moderation, text + image, via `ruby_llm`) and `AwsRekognitionAdapter` (image/NSFW). Both async (`synchronous? == false`) → `:flag` only.
- `classify(value) → Moderate::Result` (with `Moderate::Label`s mapping every provider onto a single canonical taxonomy, OpenAI's `harassment[/threatening]`, `hate[/threatening]`, `sexual[/minors]`, `self-harm[/intent|/instructions]`, `violence[/graphic]`, `illicit[/violent]`, so flags, the statement of reasons, and transparency counters all speak one vocabulary).
- `Moderate::ClassifyJob` (ActiveJob).

Optional integrations (auto-detected, never required)
`madmin` (admin queue) · `goodmail` (transactional email) · `telegrama` (admin alerts) · `noticed` (in-app/push) · `rails_cloudflare_turnstile` (notice-form bot gate). Each is wired only if the gem is present in the host.

Data model, migrations & taxonomy
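The choice this section describes, keeping taxonomies in frozen constants with inclusion checks instead of DB check constraints, can be illustrated with a framework-free sketch. The category names, `TaxonomySketch::Config`, and `TaxonomySketch::Report` below are invented for the example; in the gem the check is described as an ActiveModel inclusion validation.

```ruby
module TaxonomySketch
  # Hypothetical default list; the real categories live in the gem.
  DEFAULT_CATEGORIES = %w[spam harassment illegal_content other].freeze

  # Host-overridable setting that falls back to the frozen default.
  class Config
    attr_writer :report_categories

    def report_categories
      (@report_categories || DEFAULT_CATEGORIES).map(&:to_s).freeze
    end
  end

  # Stand-in for a model with an inclusion check evaluated at validation time.
  Report = Struct.new(:category, :config) do
    def errors
      return [] if config.report_categories.include?(category.to_s)

      ["category is not included in the list"]
    end

    def valid?
      errors.empty?
    end
  end
end
```

Because the list is read when the record is validated rather than baked into the schema, a host can add a category by changing configuration alone, with no migration.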
- `moderate_reports`, `moderate_blocks`, `moderate_flags`, `moderate_appeals`.
- `jsonb` / `json` / MySQL-JSON columns: portable across SQLite, PostgreSQL, and MySQL.
- `[blocker_id, blocked_id]` index + the `blocked_id` index for the SSOT query, the self-block CHECK, FKs, a message-length guard). Taxonomies (categories, legal reasons, country codes) are frozen model constants + inclusion validations, not DB check constraints, so adding/customizing a category needs no migration, and `report_categories` is host-overridable.

Security & correctness invariants
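One invariant in this section, that a failing `audit` or `notify` hook is logged and swallowed instead of undoing the action, pairs naturally with the earlier note that `decision_notified_at` is set only on confirmed delivery. The sketch below shows both under stated assumptions: `SinkSketch`, the Hash-based report, and the event names are hypothetical, and the logger is any object that responds to `error`.

```ruby
module SinkSketch
  # Run an observational hook; log and swallow anything it raises.
  def self.safe_call(hook, event, logger:)
    hook.call(event)
  rescue StandardError => e
    logger.error("hook failed: #{e.class}: #{e.message}")
    nil
  end

  # report is a plain Hash here; audit and notify are callables.
  def self.resolve(report, audit:, notify:, logger:, now: Time.now)
    report[:status] = :resolved
    safe_call(audit, { name: "report.resolved", id: report[:id] }, logger: logger)

    # Stamp delivery only when the notify hook explicitly reports success.
    delivered = safe_call(notify, { name: "decision.sent", id: report[:id] }, logger: logger)
    report[:decision_notified_at] = now if delivered == true
    report
  end
end
```

A raising sink therefore costs a log line, not the moderation decision, while a failed notification simply leaves the delivery timestamp unset so it can be retried.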
- `audit` / `notify` exceptions are swallowed (logged) so an observational sink can never roll back the action it records.

Install & configuration
Full guides in `docs/`: `configuration.md`, `compliance.md` (the Art-by-Art mapping + the `[gem]` vs `[you]` checklist), `dsa-notice-form.md`, `madmin.md`, `notifications.md`.

Testing & CI
- `schema.rb`, are the source of truth, so SQLite-specific schema quirks can't silently break the server adapters).

Upgrading from 0.x
1.0 keeps the gem name but is an entirely new API. The 0.x profanity validator (`validates :field, moderate: true`) still loads via compatibility shims, so existing apps keep validating. To adopt 1.0: add `has_reporting_and_blocking` to your user model, `has_reportable_content` / `moderates` to your content models, run the install generator, and migrate. Pin `~> 0.1` if you only want the old behavior.

Intentionally out of scope (for now)
A bundled ML classifier (the wordlist is a baseline; bring an adapter), and an opinionated notification UI (the `notify` hook + the optional integrations cover fan-out). Extraction seams are preserved so the DSA layer could become a separate gem later if it ever warrants it.

🤖 Generated with Claude Code