docs: update the docs for cloudberry-site#114

Draft
tuhaihe wants to merge 7 commits into apache:main from tuhaihe:docs-update-for-site

tuhaihe (Member) commented Jun 2, 2026

closes: #ISSUE_Number


Change logs

Describe your change clearly, including what problem is being solved or what document is being added or updated.

Contributor's checklist

Here are some reminders before you submit your pull request:

tuhaihe added 7 commits June 2, 2026 10:57
The PXF documentation is moving to the Apache Cloudberry website at
cloudberry.apache.org/docs (built by apache/cloudberry-site, Docusaurus).
This repository keeps the Markdown sources under docs/content/ but no longer
ships its own build pipeline. Drop the Bookbinder/Middleman scaffolding under
docs/book/ that would otherwise become unmaintained.

Subsequent commits convert the Markdown sources to a Docusaurus-friendly
layout, refresh docs/README.md, and add a docs lint workflow.

Convert the upstream Bookbinder-flavoured PXF documentation under
docs/content/ into a layout that the Apache Cloudberry website
(apache/cloudberry-site, Docusaurus) can consume directly.

Changes per file:

* Rename .html.md.erb -> .md. The .erb sources contained no ERB template
  code, so removing both suffixes is a no-op for content while letting
  Markdown tooling pick the files up.
* Rewrite intra-doc links from `(foo.html)` and `(foo.html#anchor)` to
  Docusaurus-style relative paths like `(./foo.md)` or
  `(../administering/cfg_server.md#about-the-pxf-fs-basepath-property)`.
  Same-page bare anchors `(#foo)` are also remapped, including a fix for the
  upstream typo `(#procedure.html)` in pxf_kerbhdfs.md.
* Replace heading anchor blocks of the form
  `## <a id="suppplat"></a> Supported Platforms` with plain headings, since
  Docusaurus auto-generates slug-based anchors. Cross-file references that
  used the old IDs are rewritten to the new slug. Stand-alone `<a id>` tags
  (table captions, mid-section deep links) are preserved as MDX-friendly
  invisible anchors.
* Add `description` and `sidebar_position` frontmatter to every page so the
  Docusaurus sidebar can be auto-generated and pages get sensible meta tags.
  Page titles that referenced "Greenplum® Platform Extension Framework" are
  rebranded to "Apache Cloudberry Platform Extension Framework".
* Reorganise the previously-flat `docs/content/` into category sub-directories
  matching the legacy subnav: `intro/`, `administering/`, `access-hadoop/`,
  `access-objectstores/`, `access-jdbc/`, `access-nfs/`, `troubleshooting/`,
  `upgrade/`, plus the existing `ref/`. Each carries a `_category_.json` for
  the Docusaurus sidebar.
* Rebrand prose: "Greenplum Database" -> "Apache Cloudberry" and bare
  "Greenplum" (where it refers to the deployment, not a specific Greenplum
  release/version) -> "Apache Cloudberry". Compatibility tables, transition
  notes, and other historical references that need to keep the original wording
  are preserved via a small set of guard patterns.
* Tweak raw HTML so MDX v3 can render it: rename `class="..."` to
  `className="..."`, and rewrite `<a href="foo.html">` targets the same way as
  Markdown links.
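
The mechanical rewrites listed above could be sketched roughly as follows. This is a minimal illustration, not the script used for this PR: the `PAGE_DIRS` mapping and all function names are hypothetical, anchors are passed through unchanged rather than remapped to new slugs, and the `class` rewrite is naive (it would also touch code samples).

```python
import re

# Hypothetical map from a legacy page name to its new category directory.
PAGE_DIRS = {"cfg_server": "administering", "intro_pxf": "intro"}

# Matches Markdown link targets such as (foo.html) or (foo.html#anchor).
LINK_RE = re.compile(r"\(([A-Za-z0-9_\-]+)\.html(#[^)\s]*)?\)")

# Matches a heading that starts with an explicit <a id="..."></a> block.
HEADING_ANCHOR_RE = re.compile(
    r'^(#{1,6})[ \t]*<a id="[^"]*"></a>[ \t]*', re.MULTILINE
)


def rewrite_links(text: str, current_dir: str) -> str:
    """Turn (foo.html#x) into a relative .md path, keeping the anchor as-is."""

    def repl(match: re.Match) -> str:
        page, anchor = match.group(1), match.group(2) or ""
        # Pages missing from the map are assumed to live next to the current page.
        target_dir = PAGE_DIRS.get(page, current_dir)
        prefix = "./" if target_dir == current_dir else f"../{target_dir}/"
        return f"({prefix}{page}.md{anchor})"

    return LINK_RE.sub(repl, text)


def strip_heading_anchors(text: str) -> str:
    """Drop the <a id> block so only the plain heading remains."""
    return HEADING_ANCHOR_RE.sub(r"\1 ", text)


def fix_class_attr(text: str) -> str:
    """Rename the HTML class attribute to the JSX-style className."""
    return re.sub(r'\bclass="', 'className="', text)
```

For example, `rewrite_links("[x](cfg_server.html#basepath)", "access-hadoop")` yields a `../administering/cfg_server.md#basepath` target under the hypothetical mapping above.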

A few warnings remain that reflect pre-existing dead links upstream
(e.g. `#s3_override_ext_ext`, `init_pxf.html`, `#topic1` in ref pages); these
are flagged for follow-up but kept as-is so the diff is mechanical.

The previous README described the now-deleted Bookbinder build under
docs/book/. Replace it with a brief authoring guide that points at
apache/cloudberry-site as the source of the rendered documentation and
documents the conventions used by the Markdown sources (frontmatter,
relative links, image placement, category metadata).

Add a lightweight docs-lint workflow that runs on pushes and pull requests
touching docs/. It runs:

* markdownlint-cli2 against `docs/content/**/*.md` with a relaxed config
  tuned for the imported PXF content (legacy inline HTML, long lines, etc.).
* lychee for link validity, both internal relative paths and external URLs.
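
To make the internal half of the link check concrete, here is a small stand-in written in plain Python. It is not lychee and does not reproduce its behaviour; the function name and regex are assumptions for illustration, and it only verifies that relative `.md` targets exist on disk.

```python
import re
from pathlib import Path

# Matches Markdown links whose target ends in .md, with an optional #anchor.
MD_LINK_RE = re.compile(r"\]\(([^)#\s]+\.md)(#[^)\s]*)?\)")


def broken_relative_links(root: Path) -> list[tuple[Path, str]]:
    """Return (page, target) pairs whose relative .md target does not exist."""
    broken = []
    for page in sorted(root.rglob("*.md")):
        text = page.read_text(encoding="utf-8")
        for match in MD_LINK_RE.finditer(text):
            target = match.group(1)
            if "://" in target:
                continue  # external URL, out of scope for this sketch
            if not (page.parent / target).resolve().exists():
                broken.append((page, target))
    return broken
```

Calling `broken_relative_links(Path("docs/content"))` would list every page that points at a missing sibling file; anchors are ignored here.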

The goal is to catch broken links early rather than waiting for the
cloudberry-site Docusaurus build to fail.

YAML treats an unquoted ': ' (a colon followed by a space) inside a scalar as a
key/value separator,
which makes Docusaurus' gray-matter parser bail out with
'incomplete explicit mapping pair'. Wrap the two affected descriptions
in double quotes so they parse as plain strings.
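
A sketch of the quoting fix, assuming each `description` sits on a single frontmatter line. The helper name is hypothetical, and `json.dumps` is used only as a convenient way to emit a double-quoted string with escapes that YAML also accepts.

```python
import json


def quote_description(line: str) -> str:
    """Double-quote a frontmatter description whose value contains ': '."""
    key, sep, value = line.partition(": ")
    if key != "description" or not sep:
        return line  # not a description line, leave untouched
    value = value.strip()
    if value[:1] in ('"', "'"):
        return line  # already quoted
    if ": " in value or value.endswith(":"):
        # json.dumps escapes embedded quotes and backslashes for us.
        return f"{key}: {json.dumps(value, ensure_ascii=False)}"
    return line
```

Lines that do not need quoting are returned unchanged, so the helper can be run over every frontmatter line without touching the other pages.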