
Experimental dataset JSON GET API returns multi-valued fields as arrays #12488

Open
stevenwinship wants to merge 3 commits into develop from 9495-return-multi-valued-fields-as-arrays



@stevenwinship

Contributor

What this PR does / why we need it: For the API call GET /api/datasets/{id}/metadata, values of multi-valued fields are not returned as arrays when only one entry exists. This makes parsing the JSON more complex, because clients must handle both a single value and an array for the same field.

Which issue(s) this PR closes: #9495

Special notes for your reviewer: Since these fields are already returned as arrays when more than one entry exists, clients must already handle arrays, so there is no break in backward compatibility.

Suggestions on how to test this: Create one dataset with a single subject and another with more than one subject. Examine the JSON output and confirm that the subject value is always wrapped in [], e.g. "subject": ["Medicine, Health and Life Sciences"]. The same applies to every field whose DatasetFieldType.isAllowMultiples() returns true (for reference, "title" does not allow multiples, so it remains a single value).

Does this PR introduce a user interface change? If mockups are available, please link/include them here:

Is there a release notes update needed for this change?: Included

Additional documentation:

@stevenwinship stevenwinship self-assigned this Jun 25, 2026
@stevenwinship stevenwinship moved this to In Progress 💻 in IQSS Dataverse Project Jun 25, 2026
@stevenwinship stevenwinship added FY26 Sprint 26 FY26 Sprint 26 (2026-06-17 - 2026-07-01) Feature: API Original size: 20 Size: 10 A percentage of a sprint. 7 hours. Type: Bug a defect User Role: API User Makes use of APIs labels Jun 25, 2026
@github-actions github-actions Bot added the FY25 Sprint 26 FY25 Sprint 26 (2025-06-18 - 2025-07-02) label Jun 25, 2026
```diff
 // Add metadata value to aggregation, suppress array when multiples not allowed
 JsonArray valArray = vals.build();
-return (valArray.size() != 1) ? valArray : valArray.get(0);
+return (dfType.isAllowMultiples()) ? valArray : valArray.get(0);
```
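To make the behavioural difference concrete, here is a minimal, self-contained Java sketch of the old and new rules. It is a simplification under stated assumptions: a plain `List<String>` stands in for the `JsonArray`, a boolean `allowMultiples` stands in for `dfType.isAllowMultiples()`, and the class and method names are hypothetical, not Dataverse code.

```java
import java.util.List;

/** Hypothetical illustration; List<String> stands in for JsonArray. */
public class MultiValueRule {

    // Old rule: collapse any single-element array to its lone value,
    // regardless of whether the field type allows multiples.
    static Object oldRule(List<String> values) {
        return (values.size() != 1) ? values : values.get(0);
    }

    // New rule: keep the array whenever the field type allows multiples;
    // otherwise return the single value.
    static Object newRule(List<String> values, boolean allowMultiples) {
        return allowMultiples ? values : values.get(0);
    }

    public static void main(String[] args) {
        List<String> oneSubject = List.of("Medicine, Health and Life Sciences");
        List<String> title = List.of("My Dataset"); // hypothetical title value

        // "subject" allows multiples: the old rule returned a bare String
        // for a single entry, while the new rule keeps the list.
        System.out.println(oldRule(oneSubject));
        System.out.println(newRule(oneSubject, true));

        // "title" does not allow multiples: the new rule returns a bare value.
        System.out.println(newRule(title, false));
    }
}
```

The point of the change is that the output shape now depends on the field's type definition rather than on how many values a particular dataset happens to have.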

Member


If you make changes to the OREMap, you need to update the version, as noted at:

```java
//NOTE: Update this value whenever the output of this class is changed
private static final String DATAVERSE_ORE_FORMAT_VERSION = "Dataverse OREMap Format v1.0.3";
```

This could also break archiving and tools such as DVUploader that can read archival bags to restore datasets. Hopefully those are robust enough to handle this change, but they presumably have code to parse single values that will now be obsolete.

@stevenwinship Jun 25, 2026

Contributor Author


I will update the version (even though the output format doesn't really change) and check DVUploader. DVUploader and anyone else using the API should have no issue with the array, since an array is already returned when there is more than one entry.

Contributor Author


It looks like DVUploader expects an array. When it finds a String, it converts it to an array, so I believe its code is not affected by this change.
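The tolerant-reader approach described here can be sketched as follows. This is not DVUploader's actual code; it is a hypothetical Java illustration that assumes a parsed field value arrives either as a `String` or as a `List` (a simplification of a real JSON library's value types).

```java
import java.util.ArrayList;
import java.util.List;

/** Hypothetical tolerant reader; not taken from DVUploader. */
public class TolerantReader {

    // Normalize a field value that may be a bare String (the old output for
    // a single entry) or a List (any number of entries) into a List<String>.
    static List<String> asList(Object value) {
        if (value instanceof String) {
            return List.of((String) value);
        }
        if (value instanceof List) {
            List<String> out = new ArrayList<>();
            for (Object item : (List<?>) value) {
                out.add(String.valueOf(item));
            }
            return out;
        }
        throw new IllegalArgumentException("Unexpected value type: " + value);
    }

    public static void main(String[] args) {
        // The old single-value form and the new array form normalize to the
        // same list, so a reader written this way is unaffected by the change.
        System.out.println(asList("Medicine, Health and Life Sciences"));
        System.out.println(asList(List.of("Medicine, Health and Life Sciences")));
    }
}
```

A reader that always normalizes to a list accepts both the pre-change and post-change output, which is why this PR should not break such consumers.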

Member


Thanks for checking. Thinking about it more, in most of the places where we read the ORE map, we use JSON-LD tools to canonicalize the format before parsing, which probably already handled the conversion. (That would probably be a best practice for JSON-LD in general.)

@coveralls

coveralls commented Jun 25, 2026


Coverage Status

coverage: 25.021% (+6.2%) from 18.845% — 9495-return-multi-valued-fields-as-arrays into develop


@github-actions

github-actions Bot commented Jun 25, 2026


Test Results

403 tests ±0: 388 ✅ passed ±0, 15 💤 skipped ±0, 0 ❌ failed ±0
55 suites ±0, 55 files ±0
Duration: 37m 21s ⏱️ (-2s)

Results for commit 3e3fed7. ± Comparison against base commit 741c5da.

♻️ This comment has been updated with latest results.

@stevenwinship stevenwinship removed their assignment Jun 26, 2026
@stevenwinship stevenwinship moved this from In Progress 💻 to Ready for Review ⏩ in IQSS Dataverse Project Jun 26, 2026
@cmbz cmbz added the FY27 Sprint 1 FY27 Sprint 1 (2026-07-01 - 2026-07-15) label Jul 1, 2026
@stevenwinship stevenwinship force-pushed the 9495-return-multi-valued-fields-as-arrays branch from 190d915 to b8a67ac July 2, 2026 13:15

There are two affected endpoints, and the version-specific one returns either JSON-LD (affected by this PR) or JSON (not affected), depending on the MIME type the user requests.

@qqmyers left a comment

Member


This looks OK. I edited the release note to indicate that there are two API calls and that one of them is affected only if JSON-LD is requested.

In QA, I think someone should verify that the PUT will accept the new formatting (so that we maintain round-tripping). I think it will, because the PUT uses a JSON-LD library that standardizes the input before we try to parse it. (The 'same' JSON-LD can be written with many variants of the @context, so standardizing it, or fully parsing it as JSON-LD, is necessary for inputs.)

@github-project-automation github-project-automation Bot moved this from Ready for Review ⏩ to Ready for QA ⏩ in IQSS Dataverse Project Jul 2, 2026
@github-actions

github-actions Bot commented Jul 2, 2026


📦 Pushed preview images as

ghcr.io/gdcc/dataverse:9495-return-multi-valued-fields-as-arrays
ghcr.io/gdcc/configbaker:9495-return-multi-valued-fields-as-arrays

🚢 See on GHCR. Use them by referencing the full name as printed above, and mind the registry name.


Labels

Feature: API FY25 Sprint 26 FY25 Sprint 26 (2025-06-18 - 2025-07-02) FY26 Sprint 26 FY26 Sprint 26 (2026-06-17 - 2026-07-01) FY27 Sprint 1 FY27 Sprint 1 (2026-07-01 - 2026-07-15) Original size: 20 Size: 10 A percentage of a sprint. 7 hours. Type: Bug a defect User Role: API User Makes use of APIs

Projects

Status: Ready for QA ⏩

Development

Successfully merging this pull request may close these issues.

Experimental dataset JSON GET API doesn't return multi-valued fields as arrays

4 participants