
[ENHANCEMENT] Added provider, resource, and version parameters to sources#535

Open
Eric Godwin (ericgodwin) wants to merge 6 commits into main from ericg/530-update-sources-version



@ericgodwin Eric Godwin (ericgodwin) commented May 21, 2026

Background

The intent of this change is to update our source item field to include the information necessary for data provenance:

- provider: The name of the entity that produced the data: meta, esri, microsoft, osm, etc.
- resource: The subject or type of data given by the provider: division-names, buildings, planet, etc.
- version: A sortable identifier, such as a date or number: 2026-02-13, 5.3, A5692

Together with version_id, these values allow a user to uniquely identify the raw input data used to construct Overture data. Our current system, which provides only a dataset, not only lacks dataset version information but is also inconsistently constructed. All three new fields will start out optional and nullable, since this first step only makes it possible for the pipeline to populate them.
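As a rough illustration of how the three fields combine, here is a minimal sketch that treats a source entry as a plain dict. The names `SnapshotKey`, `snapshot_key`, and `latest_versions` are hypothetical (not part of the schema package), and version_id is left out. Because version is only described as sortable, the sketch assumes plain string comparison is valid within one provider/resource pair; that holds for ISO dates but not for values like 5.3 vs 5.10.

```python
from typing import Iterable, NamedTuple, Optional


class SnapshotKey(NamedTuple):
    """One raw input snapshot: who produced it, what it is, and which cut."""

    provider: str
    resource: str
    version: str


def snapshot_key(source: dict) -> Optional[SnapshotKey]:
    """Return the snapshot key, or None while any of the nullable fields is unset."""
    provider = source.get("provider")
    resource = source.get("resource")
    version = source.get("version")
    if provider is None or resource is None or version is None:
        return None
    return SnapshotKey(provider, resource, version)


def latest_versions(sources: Iterable[dict]) -> dict:
    """Map each (provider, resource) pair to the greatest version seen.

    Assumption: versions of one provider/resource sort correctly as plain
    strings (true for ISO dates like 2026-02-13, not for "5.3" vs "5.10").
    """
    latest: dict = {}
    for source in sources:
        key = snapshot_key(source)
        if key is None:
            continue  # incomplete provenance; the fields are still nullable
        pair = (key.provider, key.resource)
        if pair not in latest or key.version > latest[pair]:
            latest[pair] = key.version
    return latest
```

Entries without all three values are skipped rather than rejected, mirroring the optional-to-start rollout.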

Major change release plan

While this change in itself is not a breaking change, it is part of a larger plan with major impact. The rough timeline for these changes is:

  • Update the schema to add provider + resource + version details as optional fields (this PR - June / July)
  • Non-schema work for pipelines to populate the new fields
  • Update the schema to A) make provider + resource + version required fields and B) mark dataset as deprecated and make it an optional field. (BREAKING - September)
  • Update the schema and code to remove the dataset field. (BREAKING - March 2027 or later)
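For the September step, a hypothetical sketch of what the tightened rules might check is below. `check_planned_rules` is illustrative only (not pipeline or schema code), and reporting the dataset deprecation as a Python `DeprecationWarning` is an assumption of the sketch.

```python
import warnings

REQUIRED_FIELDS = ("provider", "resource", "version")


def check_planned_rules(source: dict) -> list:
    """Hypothetical check for the planned September schema (not real pipeline code).

    Returns a list of error messages; an empty list means the entry passes.
    """
    # Step A: provider, resource, and version become required.
    # Empty strings count as missing, consistent with the existing min-length rule.
    errors = [f"{name} is required" for name in REQUIRED_FIELDS if not source.get(name)]
    # Step B: dataset stays optional but is deprecated; surfacing that as a
    # DeprecationWarning is an assumption made for this sketch.
    if source.get("dataset") is not None:
        warnings.warn(
            "dataset is deprecated; use provider, resource, and version",
            DeprecationWarning,
            stacklevel=2,
        )
    return errors
```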

The messaging around this change is that the current method of providing provenance is not sufficient to ensure traceability. Besides documenting the deprecation of dataset, we will want to explain how provider, resource, and version work together to identify a data snapshot.

Closes #530

Testing

A new example and a new counterexample have been added: the counterexample checks that each of the new fields, when provided, is at least 1 character long, and the example shows what properly populated fields look like.
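The example/counterexample pair can be mirrored in a few lines of plain Python. `empty_field_errors` is a hypothetical helper that reproduces only the minimum-length rule, and the field values are taken from the PR description rather than from the actual YAML files.

```python
NEW_FIELDS = ("provider", "resource", "version")


def empty_field_errors(source: dict) -> list:
    """Hypothetical helper: flag new fields that are present but empty.

    None (or an absent key) is allowed because the fields are nullable for now.
    """
    return [
        f"{name} must be at least 1 character long"
        for name in NEW_FIELDS
        if source.get(name) is not None and len(source[name]) < 1
    ]


# Illustrative values from the PR description, not the real example files.
example = {"provider": "osm", "resource": "planet", "version": "2026-02-13"}
counterexample = {"provider": "", "resource": "buildings", "version": "2026-02-13"}
```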

The tests were then run with the following results:

  ## Test Results
  
  | Result | Count |
  |--------|-------|
  | ✅ Passed | 1993 |
  | ❌ Failed | 0 |
  | ⚠️  Errors | 0 |

  **Duration:** 5.21s

Documentation website

Docs preview for this PR.

…yItem

Signed-off-by: ericgodwin <eric@overturemaps.org>
@ericgodwin Eric Godwin (ericgodwin) added the change type - minor 🤏 Minor schema change. See https://lf-overturemaps.atlassian.net/wiki/x/GgDa label May 21, 2026

github-actions Bot commented May 21, 2026

🗺️ Schema reference docs preview is live!

🌍 Preview https://staging.overturemaps.org/schema/pr/535/schema/index.html
🕐 Updated May 28, 2026 19:36 UTC
📝 Commit 0fefe80
🔧 env SCHEMA_PREVIEW true

Note

♻️ This preview updates automatically with each push to this PR.

@ericgodwin Eric Godwin (ericgodwin) changed the title Added provider, resource, and version parameters to the sourcePropert… [Enhancement] Added provider, resource, and version parameters to sources May 21, 2026
@ericgodwin Eric Godwin (ericgodwin) changed the title [Enhancement] Added provider, resource, and version parameters to sources [ENHANCEMENT] Added provider, resource, and version parameters to sources May 21, 2026
Contributor

Copilot AI left a comment


Pull request overview

Adds additional provenance fields to sources schema items so that a source can be identified with finer granularity than the current dataset string (as groundwork for future deprecation of dataset). All changed files in the PR were reviewed.

Changes:

  • Extend sourcePropertyItem with optional provider, resource, and version string fields (with minLength constraints).
  • Update schema documentation text around sourcePropertyItem and sources to reflect the intended future direction.
  • Add one new example and one new counterexample illustrating populated vs. invalid empty values.

Reviewed changes

Copilot reviewed 3 out of 3 changed files in this pull request and generated 3 comments.

| File | Description |
|------|-------------|
| schema/defs.yaml | Adds provider/resource/version fields to the common sourcePropertyItem definition and updates related descriptions. |
| examples/buildings/sources-with-version.yaml | New example demonstrating populated provider/resource/version fields in sources. |
| counterexamples/buildings/bad-sources-empty-provider.yaml | New counterexample validating that empty strings for the new fields are rejected by the schema constraints. |


Comment thread schema/defs.yaml
Comment thread schema/defs.yaml
Comment thread schema/defs.yaml
Agent-Logs-Url: https://github.com/OvertureMaps/schema/sessions/2997187e-b460-4134-a7d3-bd9fab7b2c22

Co-authored-by: ericgodwin <1336911+ericgodwin@users.noreply.github.com>
Collaborator

@vcschapp Victor Schappert (vcschapp) left a comment


Eric Godwin (@ericgodwin) we've finally gone from "JSON Schema only", through "mixed JSON Schema/Pydantic (you have to update both)", to "pure Pydantic", so the place to update is packages/overtures-schema-common/src/overture/schema/common/sources.py.

Should just be a matter of inserting fields into SourceItem here-ish...

Here's an example for one field:

```python
provider: Annotated[
    StrippedString | None,
    Field(
        min_length=1,
        description=textwrap.dedent("""
            The provider label for the entity that contributed this data (e.g., osm, meta, esri).
        """).strip()
    ),
] = None
```
  • The "union with None" (| None) is doing the heavy lifting that allows these fields to be nullable for the time being.
  • IMO StrippedString is probably a reasonable base field type for all three fields because it's hard to imagine a good reason why sourcing fields would need to begin or end with whitespace...
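A self-contained sketch of the same pattern for all three fields, assuming Pydantic v2. It uses plain `str` in place of `StrippedString` because that type's import path isn't shown here, and the class is a minimal stand-in that omits the real `SourceItem`'s other fields. The resource and version descriptions are adapted from the PR description.

```python
import textwrap
from typing import Annotated, Optional

from pydantic import BaseModel, Field, ValidationError


class SourceItem(BaseModel):
    """Minimal stand-in for the real SourceItem: only the three new fields."""

    # Optional[...] is the "union with None" that keeps each field nullable;
    # min_length=1 still rejects empty strings when a value is supplied.
    provider: Annotated[
        Optional[str],
        Field(
            min_length=1,
            description=textwrap.dedent("""
                The provider label for the entity that contributed this data (e.g., osm, meta, esri).
            """).strip(),
        ),
    ] = None
    resource: Annotated[
        Optional[str],
        Field(
            min_length=1,
            description="The subject or type of data given by the provider (e.g., division-names, buildings, planet).",
        ),
    ] = None
    version: Annotated[
        Optional[str],
        Field(
            min_length=1,
            description="A sortable identifier such as a date or number (e.g., 2026-02-13, 5.3, A5692).",
        ),
    ] = None
```

Omitting all three fields validates, while `SourceItem(provider="")` raises `ValidationError`.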

Technically all of defs.yaml and the counterexamples/ and examples/ directories are "legacy" and shouldn't need to be updated.

The new analogue to defs.yaml is the Python source code, mostly in packages/overtures-schema-common, together with the reference examples and counterexamples under reference/.

We schema dwellers need to do a better job keeping the repo updated and communicating this!

Comment thread schema/defs.yaml Outdated
…mbedded white space.

Signed-off-by: ericgodwin <eric@overturemaps.org>
Signed-off-by: ericgodwin <eric@overturemaps.org>
@ericgodwin
Author

Victor Schappert (@vcschapp) I went ahead and updated the Pydantic model as you indicated with a couple of slight changes.

  • Updated sources.py as suggested
  • Did not use StrippedString, as I explicitly specified the format with a regex; I don't want embedded spaces either
  • Made provider and resource lowercase only
  • Added some clarity to the descriptions.
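The exact regexes aren't shown in this thread, so the following sketch uses hypothetical patterns that match the stated intent: provider and resource are lowercase only, and no field may contain whitespace. version is allowed mixed case so that values like A5692 pass. `format_errors` is an illustrative helper, not code from sources.py.

```python
import re

# Hypothetical patterns; the regexes actually used in sources.py are not shown here.
# provider/resource: lowercase letters, digits, '.', '_' and '-', with no whitespace.
LOWERCASE_TOKEN = re.compile(r"[a-z0-9][a-z0-9._-]*")
# version: one or more non-whitespace characters, any case (e.g. 2026-02-13, 5.3, A5692).
NON_SPACE_TOKEN = re.compile(r"\S+")

FIELD_PATTERNS = {
    "provider": LOWERCASE_TOKEN,
    "resource": LOWERCASE_TOKEN,
    "version": NON_SPACE_TOKEN,
}


def format_errors(source: dict) -> list:
    """Report fields whose value does not fully match its pattern."""
    errors = []
    for name, pattern in FIELD_PATTERNS.items():
        value = source.get(name)
        if value is None:
            continue  # still nullable at this stage
        # fullmatch anchors both ends, so leading, trailing, or embedded spaces fail,
        # and an empty string fails because each pattern needs at least one character.
        if pattern.fullmatch(value) is None:
            errors.append(f"{name}: {value!r} does not match {pattern.pattern}")
    return errors
```

A pattern that requires at least one character also subsumes the earlier min-length check.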

Update all 15 baseline JSON schema files to include the new provider,
resource, and version fields added to the sources model.

Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>
Signed-off-by: ericgodwin <eric@overturemaps.org>
@ericgodwin Eric Godwin (ericgodwin) force-pushed the ericg/530-update-sources-version branch from 702d215 to 2369d81 May 28, 2026 19:32
Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>
Signed-off-by: ericgodwin <eric@overturemaps.org>

Labels

change type - minor 🤏 Minor schema change. See https://lf-overturemaps.atlassian.net/wiki/x/GgDa

Projects

None yet

Development

Successfully merging this pull request may close these issues.

Update sources details to provide the version

4 participants