Skip to content

feat(s3-publishing): Add S3 static publishing support for vanity URLs#35808

Open
dsilvam wants to merge 13 commits into
mainfrom
s3-static-push-vanity-url-dsilvam
Open

feat(s3-publishing): Add S3 static publishing support for vanity URLs#35808
dsilvam wants to merge 13 commits into
mainfrom
s3-static-push-vanity-url-dsilvam

Conversation

@dsilvam
Copy link
Copy Markdown
Member

@dsilvam dsilvam commented May 22, 2026

Summary

This PR carries the same changes as #35643 (originally authored by @riccardoruocco), opened under a different account so CI checks can run.

Proposed Changes

  • Add a feature flag, STATIC_PUSH_S3_VANITY_ALIAS_ENABLED, to enable Vanity URL handling for AWS S3 static publishing. When the flag is false, dotCMS behaves exactly as it does today and no Vanity URL alias is generated.
  • When the flag is true, publishing a Vanity URL to a static S3 endpoint makes dotCMS resolve the target content, identify the live resource, render or copy it, and write the static clone to the S3 path represented by the Vanity URL.
  • This behavior is backed by the s3_vanity_alias table, which acts as an operational snapshot of the Vanity URL state on S3 and stores the aliases that have actually been materialized.
  • When a canonical content is republished, dotCMS checks the alias table and refreshes the existing Vanity URL clone if one is already tracked. If no Vanity URL alias exists yet, nothing changes.
  • When content is unpublished or removed, dotCMS uses the alias table to remove the corresponding static alias from S3 and keep the bucket aligned with the current state.
  • When a Vanity URL is unpublished or updated, dotCMS updates the tracked alias accordingly, so stale S3 keys are not left behind.

Additional Info

The goal of this change is to make Vanity URLs work on static S3 publishing in the same way they already work at runtime on live dotCMS.

The implementation is intentionally opt-in and does not affect existing installations unless STATIC_PUSH_S3_VANITY_ALIAS_ENABLED=true is set.

The s3_vanity_alias table is used as the source of truth for what was actually written to S3, so publish, republish, unpublish, and delete can all behave consistently without depending only on the current live content state.

Closes #35663

@claude
Copy link
Copy Markdown
Contributor

claude Bot commented May 22, 2026

Claude finished @dsilvam's task in 6m 41s —— View job


🔍 dotCMS Backend Review

Earlier review's 🔴 Critical issues (path traversal in S3VanityAliasSupport, Lucene injection in AWSS3Publisher, defense-in-depth in getCompleteFileKey, and @WrapInTransaction wrapping S3 I/O) are verified fixed in commit 2205b261. The findings below are new or remaining after that fix commit.


[🟠 High] dotCMS/src/enterprise/java/com/dotcms/enterprise/publishing/staticpublishing/AWSS3Publisher.java:676-677

publishLiveVanityContentlet mutates the shared PublisherConfig (CURRENT_HOST, CURRENT_LANGUAGE) but never restores the previous values. This method runs in a loop over bundle assets (via publishVanityAliasForQueueElement inside pushBundleToEndpoint). After the loop completes, config is left with the host/language of the last processed Vanity URL, which can poison any subsequent code that reads CURRENT_HOST/CURRENT_LANGUAGE from config. The existing code at lines 201-219 in this same file already uses an oldHost/oldLanguage save-and-restore pattern, confirming that is the expected idiom here.

config.put(CURRENT_HOST, host);
config.put(CURRENT_LANGUAGE, Long.toString(language.getId()));
final String bucketName = getBucketName(config);
// ...no restore on exit

💡 Capture oldHost = config.get(CURRENT_HOST) / oldLanguage = config.get(CURRENT_LANGUAGE) at entry, wrap the body in try { ... } finally { config.put(CURRENT_HOST, oldHost); config.put(CURRENT_LANGUAGE, oldLanguage); }, mirroring the existing pattern at L201-219.


[🟠 High] dotCMS/src/enterprise/java/com/dotcms/enterprise/publishing/staticpublishing/S3VanityAliasService.java:260-273 and S3VanityAliasService.java:365-383

Lost atomicity after @WrapInTransaction was removed. publishMaterializedAlias now runs in the order: repository.findByVanityUrlId (own tx) → S3 push → S3 deletes of obsolete clones → replaceMappingsByVanityUrlId (own tx). If the final repository call throws, the S3 mutations have already happened but the DB still references the old clones — the table this PR introduces precisely as "source of truth for what was written to S3" silently drifts. Same shape in unpublishAliasesByVanityUrl: each iteration does removeMaterializedAlias (S3 I/O) then repository.deleteAlias(alias) — a failure between them leaves S3 cleaned but the DB row intact, so the next publish believes the alias is still materialised. Note that publishAliases (L346-353) and unpublishAliases (L529-535) DO have a restoreAliases compensation path; these two do not.

publishAlias(context, alias, file);              // S3 PUT
deleteObsoleteAliases(context, persistedAliases, alias);  // S3 DELETE
repository.replaceMappingsByVanityUrlId(...);    // if this throws, DB is stale

💡 Persist the new mapping immediately after the successful S3 push (and before issuing obsolete-clone deletes), or add a compensation path mirroring restoreAliases. For the unpublish loop, delete the DB row first — a stale row pointing at a missing S3 object is harmless; an orphan S3 object with no row is the dangerous direction.


[🟡 Medium] dotCMS/src/enterprise/java/com/dotcms/enterprise/publishing/staticpublishing/S3VanityAliasRepository.java:45-46, 143-151

replaceMappingsByVanityUrlId deletes scoped on (endpoint_id, language_id, vanity_url_id) and inserts without ON CONFLICT. The table PK is (endpoint_id, host_id, language_id, canonical_path_hash, vanity_path_hash) — it does not include vanity_url_id. Two distinct vanity URLs that ever produce the same (host, language, canonical_path, vanity_path) tuple will collide on the bare INSERT with a PK violation, which surfaces as a DotPublishingException and aborts the entire bundle.

private static final String DELETE_BY_VANITY_URL_ID =
    "DELETE FROM static_s3_vanity_mapping WHERE endpoint_id = ? AND language_id = ? AND vanity_url_id = ?";

💡 Add ON CONFLICT (endpoint_id, host_id, language_id, canonical_path_hash, vanity_path_hash) DO UPDATE SET vanity_url_id = EXCLUDED.vanity_url_id, vanity_path = EXCLUDED.vanity_path, canonical_path = EXCLUDED.canonical_path, bucket_name = EXCLUDED.bucket_name, bucket_region = EXCLUDED.bucket_region, bucket_prefix = EXCLUDED.bucket_prefix, mod_date = CURRENT_TIMESTAMP to the INSERT.


[🟡 Medium] dotCMS/src/main/java/com/dotmarketing/startup/runonce/Task260408CreateS3VanityAliasTable.java:48-65

PK columns are declared as bare varchar even though their content is fixed-width: canonical_path_hash / vanity_path_hash are SHA-256 hex (always 64 chars), and endpoint_id / host_id / vanity_url_id are dotCMS UUIDs (36 chars). Unsized varchar (a) loses a useful integrity check, (b) makes \d static_s3_vanity_mapping less self-documenting for ops, and (c) is inconsistent with sibling tables in the schema (e.g. identifier, contentlet) that size their UUID columns.

canonical_path_hash varchar not null,
vanity_path_hash varchar not null,
primary key (endpoint_id, host_id, language_id, canonical_path_hash, vanity_path_hash)

💡 Use varchar(64) for the hashes and varchar(36) for the UUID columns. Leave canonical_path / vanity_path unsized since paths are unbounded.


[🟡 Medium] dotCMS/src/enterprise/java/com/dotcms/enterprise/publishing/staticpublishing/S3VanityAliasService.java:3

Dead WrapInTransaction import. Commit 2205b261 removed every @WrapInTransaction annotation in this file (the deliberate fix for the connection-pool finding) but left the import. A stale import on a class that just had its transactional behavior intentionally removed is a regression magnet for the next reader.

import com.dotcms.business.WrapInTransaction;

💡 Delete the import.


[🟡 Medium] dotCMS/src/enterprise/java/com/dotcms/enterprise/publishing/staticpublishing/S3VanityAliasService.java:680

storageLocation defensively wraps bucketName and bucketPrefix with nullSafe(...) but not alias.vanityPath. The return value is used as a Collectors.toMap key (L645) and as a map-key comparator in indexByStorageLocation — a null vanityPath would NPE deep inside a stream pipeline rather than fail fast. The asymmetry with the other two fields makes it look like an oversight rather than an intentional invariant.

private String storageLocation(final S3VanityAlias alias) {
    return String.join("|", nullSafe(alias.bucketName), nullSafe(alias.bucketPrefix), alias.vanityPath);
}

💡 Either wrap with nullSafe(alias.vanityPath) or add Objects.requireNonNull(vanityPath, "vanityPath") in the S3VanityAlias constructor and document the invariant.


[🟡 Medium] dotCMS/src/enterprise/java/com/dotcms/enterprise/publishing/staticpublishing/S3VanityAliasRepository.java:281-286

longValue falls back to Long.parseLong(String.valueOf(value)). The only column read through it is language_id (bigint not null), so JDBC always returns Long and the Number branch always wins. The fallback would only execute if the column nullability ever changes — and then String.valueOf(null) returns the string "null", which Long.parseLong will throw NumberFormatException on, not handle as null. Brittle, not broken today.

if (value instanceof Number) {
    return ((Number) value).longValue();
}
return Long.parseLong(String.valueOf(value));

💡 Drop the fallback and throw an explicit IllegalStateException("Unexpected non-numeric language_id: " + value), or remove the helper entirely and inline the cast.


Next steps

  • 🟠 Fix locally and push — these need your judgment
  • 🟡 You can ask me to handle mechanical fixes inline: @claude fix <issue description> in <File.java>
  • Every new push triggers a fresh review automatically

@semgrep-code-dotcms-test
Copy link
Copy Markdown
Contributor

Semgrep found 16 CUSTOM_INJECTION-2 findings:

The method identified is susceptible to injection. The input should be validated and properly
escaped.

If this is a critical or high severity finding, please also link this issue in the #security channel in Slack.

@claude
Copy link
Copy Markdown
Contributor

claude Bot commented May 22, 2026

🔍 dotCMS Backend Review

[🔴 Critical] dotCMS/src/enterprise/java/com/dotcms/enterprise/publishing/staticpublishing/S3VanityAliasSupport.java:79-90, 112-123

Path traversal: normalizeVanityPath / normalizeCanonicalPath reject only the regex metacharacters in UNSUPPORTED_CHARS = "*?[](){}|^$\\+". They do not reject .. / . segments, URL-encoded sequences (%2e%2e, %2f), null bytes, or other control characters. The normalized value flows (a) into S3 object keys via getCompleteFileKey (which only strips a leading /) and (b) into htmlPageAssetAPI.getPageByPath / urlMapAPI.processURLMap / IdentifierAPI.find(host, canonicalPath) in S3VanityTargetResolver. A Vanity URL author can therefore craft /foo/../../tenant-b/secret.html (or %2e%2e-encoded) to overwrite/delete keys outside the intended bucket prefix or coerce dotCMS into rendering content from a different host/scope.

private static final String UNSUPPORTED_CHARS = "*?[](){}|^$\\+";
...
if (!normalized.startsWith(StringPool.FORWARD_SLASH) || containsUnsupportedChars(normalized)) {
    return Optional.empty();
}
return Optional.of(normalized.replaceAll("/{2,}", "/"));

💡 URL-decode once, tokenize on /, reject any segment equal to .. or ., reject characters below 0x20 (and 0x7F), reject \\ and residual % (double-encoding). Apply identically to both normalizers.


[🔴 Critical] dotCMS/src/enterprise/java/com/dotcms/enterprise/publishing/staticpublishing/AWSS3Publisher.java:700

Lucene/ES query injection — asset.getAsset() is taken from the publishing queue and concatenated directly into a Lucene query, then executed as the system user. A crafted identifier with Lucene metacharacters (" -live:false +host:other, parens, range operators, OR) can alter the query semantics and surface Vanity URLs from a different host/tenant, which the publisher will then render and upload to the current bucket. The system-user context removes the permission fallback.

final List<Contentlet> contentlets = contentletAPI.search("+identifier:" + asset.getAsset() + " +live:true",
        0, 0, null, APILocator.getUserAPI().getSystemUser(), false);
return contentlets.stream().filter(Contentlet::isVanityUrl).findFirst();

💡 Validate the value as a UUID with UUIDUtil.isUUID(...) before use, or switch to contentletAPI.findContentletByIdentifierAnyLanguage(asset.getAsset()) and filter for isVanityUrl() + isLive() — avoids query parsing entirely.


[🟠 High] dotCMS/src/enterprise/java/com/dotcms/enterprise/publishing/staticpublishing/AWSS3EndPointPublisher.java:429-446

getCompleteFileKey is the last line of defense before a vanity path becomes an S3 object key, but it only strips a single leading / and concatenates bucketRootPrefix. There is no traversal check. If the upstream normalizer is bypassed (see Critical above) or a future caller passes an unsanitized value, a path like ../../other-tenant/... becomes a writable / deletable key. Combined with deleteFilesFromEndpoint, an attacker could delete arbitrary objects under any prefix served by the same bucket.

protected String getCompleteFileKey(final String bucketRootPrefix, final String filePath) {
    String completeFileKey = filePath;
    if (completeFileKey.startsWith(File.separator)) {
        completeFileKey = completeFileKey.substring(1);
    }
    // ...prefix concatenation, no traversal check...
}

💡 Defensively reject (throw DotPublishingException) any filePath whose /-separated segments contain .., ., empty segments, or characters below 0x20 — even if upstream already filters.


[🟠 High] dotCMS/src/enterprise/java/com/dotcms/enterprise/publishing/staticpublishing/S3VanityAliasService.java:106, 421 and S3VanityTargetResolver.java:108-122, 162-198

Cross-tenant content access via system user. All page lookup, URL-map resolution, file-asset rendering, and host lookup are performed as APILocator.getUserAPI().getSystemUser() with respectFrontendRoles=false. The Vanity URL's author may have no read permission on the page being materialized — yet the system user does. Combined with the path-traversal issue, an editor with rights only to create Vanity URLs can force the publisher to render and upload a page they cannot otherwise read, effectively bypassing dotCMS permissions and exfiltrating restricted content via S3.

final User systemUser = APILocator.getUserAPI().getSystemUser();
final Optional<String> canonicalPath = aliasSupport.normalizeCanonicalPath(
        aliasSupport.getForwardTo(vanityContentlet));
...
target = resolveTarget(context, canonicalPath.get(), systemUser);

💡 Resolve/render using the Vanity URL's owner or mod-user (vanityContentlet.getOwner() / getModUser()), or — at minimum — verify that user has READ on the resolved page/asset before rendering as system user. Apply the same to restoreLiveResource / resolveObscuredLiveTarget.


[🟠 High] dotCMS/src/enterprise/java/com/dotcms/enterprise/publishing/staticpublishing/S3VanityAliasService.java:93-143, 338-356, 367-386, 528-540

All four public service methods (publishAliasForVanityUrl, publishAliases, unpublishAliasesByVanityUrl, unpublishAliases) are annotated @WrapInTransaction but perform long-running S3 network I/O (page rendering, S3 PUT/DELETE) inside the wrapped DB transaction. Holding a Postgres connection + transaction open for the duration of S3 round-trips can starve the connection pool, hold row locks far longer than necessary, and convert a transient S3 latency spike into a DB availability incident.

@WrapInTransaction
public void publishAliases(final S3VanityAliasContext context) throws DotDataException {
    ...
    publishAliases(context, aliasesToRefresh, alias -> { });   // S3 PUT loop
    deleteAliases(context, aliasesToDelete, deletedNow::add);  // S3 DELETE loop
    repository.replaceMappings(context.lookup, aliasesToRefresh); // only DB work
}

💡 Drop @WrapInTransaction from these methods and wrap only the actual DB sequence with LocalTransaction.wrapReturn(() -> ...). The publish-then-persist ordering plus the restoreAliases compensation path already handle S3/DB split-failure cases.


[🟡 Medium] dotCMS/src/enterprise/java/com/dotcms/enterprise/publishing/staticpublishing/S3VanityAliasRepository.java:143-151, 217-230

replaceMappingsByVanityUrlId deletes scoped on (endpoint_id, language_id, vanity_url_id), then inserts. The table PK is (endpoint_id, host_id, language_id, canonical_path_hash, vanity_path_hash) and does not include vanity_url_id. If a different vanity URL has previously been published with the same (host_id, canonical_path, vanity_path) and is still in the table, the new INSERT violates the PK and throws — even though logically the new alias should replace it. There is no ON CONFLICT and no merge-by-PK pre-delete.

private static final String DELETE_BY_VANITY_URL_ID =
    "DELETE FROM static_s3_vanity_mapping WHERE endpoint_id = ? AND language_id = ? AND vanity_url_id = ?";
private static final String INSERT_ALIAS =
    "INSERT INTO static_s3_vanity_mapping (...) VALUES (...)";

💡 Switch the INSERT to INSERT ... ON CONFLICT (endpoint_id, host_id, language_id, canonical_path_hash, vanity_path_hash) DO UPDATE SET vanity_url_id = EXCLUDED.vanity_url_id, vanity_path = EXCLUDED.vanity_path, canonical_path = EXCLUDED.canonical_path, bucket_name = EXCLUDED.bucket_name, bucket_region = EXCLUDED.bucket_region, bucket_prefix = EXCLUDED.bucket_prefix, mod_date = CURRENT_TIMESTAMP.


[🟡 Medium] dotCMS/src/enterprise/java/com/dotcms/enterprise/publishing/staticpublishing/AWSS3Publisher.java:720

findContentletByIdentifier(asset.getAsset(), true, asset.getLanguageId(), systemUser, false) reads any contentlet by identifier as system user with no host/tenant verification. A publishing-queue entry could pull a Vanity URL from a different host and write S3 keys under the current host's bucket. Mostly defense-in-depth, but it widens the blast radius of the Lucene injection above.
💡 After loading, validate contentlet.getHost() matches the current host context and that the bundle author has READ on the contentlet.


[🟡 Medium] dotCMS/src/main/java/com/dotmarketing/startup/runonce/Task260408CreateS3VanityAliasTable.java:48-65

All string columns are declared as bare varchar (effectively text on Postgres), but the PK contains canonical_path_hash / vanity_path_hash (SHA-256 hex, always 64 chars) and endpoint_id / host_id (UUIDs, 36 chars). Typing them as bare varchar makes the PK index materially larger than necessary.

canonical_path_hash varchar not null,
vanity_path_hash varchar not null,
...
primary key (endpoint_id, host_id, language_id, canonical_path_hash, vanity_path_hash)

💡 Use varchar(64) (or char(64)) for the hashes and varchar(36) for the id columns to shrink the PK index footprint.


[🟡 Medium] dotCMS/src/enterprise/java/com/dotcms/enterprise/publishing/staticpublishing/S3VanityAliasRepository.java:281-286

longValue falls back to Long.parseLong(String.valueOf(value)). The Postgres JDBC driver always returns BIGINT as Long today, but if a future code path ever surfaces the column as BigDecimal (e.g. "42.0"), parseLong will throw NumberFormatException. Brittle, not strictly broken.

if (value instanceof Number) {
    return ((Number) value).longValue();
}
return Long.parseLong(String.valueOf(value));

💡 Either trust the JDBC contract and throw on unexpected types, or use new BigDecimal(String.valueOf(value)).longValueExact().


Next steps

  • 🔴 / 🟠 Fix locally and push — these need your judgment
  • 🟡 You can ask me to handle mechanical fixes inline: @claude fix <issue description> in <File.java>
  • Every new push triggers a fresh review automatically

final User systemUser = APILocator.getUserAPI().getSystemUser();
final Optional<String> canonicalPath = aliasSupport.normalizeCanonicalPath(
aliasSupport.getForwardTo(vanityContentlet));
final Optional<S3VanityResolvedTarget> target = resolveTarget(context, canonicalPath.get(), systemUser);
Copy link
Copy Markdown
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

canonicalPath.get() without isPresent might result in a NPE


private static final String CREATED_BUCKETS = "createdBuckets";
private static final Lazy<Boolean> STATIC_PUSH_S3_VANITY_ALIAS_ENABLED =
Lazy.of(() -> Config.getBooleanProperty("STATIC_PUSH_S3_VANITY_ALIAS_ENABLED", false));
Copy link
Copy Markdown
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This does not allow changes on the fly.
If you cache in a lazy wrapper, you will have to restart to see it take effect

Logger.info(this, "Sleeping before next push try, seconds: " + secondsToSleep);
Thread.sleep(secondsToSleep * 1000);
} catch (InterruptedException ie) {
Logger.error(this, "Can't Sleep before retry file: " + file.getAbsolutePath());
Copy link
Copy Markdown
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

we're missing a Thread.currentThread().interrupt(); here

* @param context publishing context
* @param aliases aliases to remove
*/
private void rollbackAliases(final S3VanityAliasContext context, final List<S3VanityAlias> aliases) {
Copy link
Copy Markdown
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This is never called.
its dead code

}

/**
* Indica se il file viene gestito dal publisher S3.
Copy link
Copy Markdown
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

italian - must be English

@syedATdot syedATdot force-pushed the s3-static-push-vanity-url-dsilvam branch from d3c5801 to 1f11202 Compare May 26, 2026 06:24
@mergify
Copy link
Copy Markdown

mergify Bot commented May 26, 2026

⚠️ The sha of the head commit of this PR conflicts with #35643. Mergify cannot evaluate rules on this PR. Once #35643 is merged or closed, Mergify will resume processing this PR. ⚠️

@mergify
Copy link
Copy Markdown

mergify Bot commented May 26, 2026

⚠️ The sha of the head commit of this PR conflicts with #35643. Mergify cannot evaluate rules on this PR. Once #35643 is merged or closed, Mergify will resume processing this PR. ⚠️

dsilvam and others added 2 commits May 26, 2026 13:48
- Path traversal: add containsTraversalOrUnsafePath() to S3VanityAliasSupport
  normalizers — URL-decodes once, then rejects .. / . segments, control chars,
  and residual % (double-encoded sequences)
- Lucene injection: validate asset identifier as UUID before concatenating into
  ES/Lucene query in AWSS3Publisher
- Defense-in-depth: add rejectPathTraversal() guard in getCompleteFileKey()
  before any path becomes an S3 object key
- Connection pool: remove @WrapInTransaction from all four public service methods
  in S3VanityAliasService — S3 I/O now runs outside DB transactions; each
  repository call manages its own short-lived transaction

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
… DDL

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

AI: Safe To Rollback Area : Backend PR changes Java/Maven backend code

Projects

Status: No status

Development

Successfully merging this pull request may close these issues.

Restore static S3 Vanity URL support for dotCMS 2 compatibility

3 participants