
Clarify validation semantics for omitted dataset dtype (e.g., VectorData) across APIs #1441

@ehennestad

Description

Summary

For datasets whose schema omits dtype, it is not clear what validation behavior should be considered normative across APIs.

A concrete example is hdmf_common/VectorData, which omits dtype in the schema. In Python HDMF, omitted dtype still has practical runtime restrictions based on build-time inference and backend handling, but those rules do not appear to be documented as schema-level validation requirements.
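To make the inference point concrete, here is an illustrative, self-contained sketch of the kind of build-time dtype inference an implementation might perform when the schema omits `dtype`. This is not Python HDMF's actual code; the function name `infer_dtype` and the type-to-dtype mapping are assumptions for illustration only.

```python
# Illustrative sketch only: NOT Python HDMF's actual implementation.
# Shows the kind of build-time dtype inference an API might apply
# when the schema omits dtype.

def infer_dtype(data):
    """Guess a primitive dtype from the first scalar element of `data`."""
    # Drill down through nested lists/tuples to a scalar sample.
    sample = data
    while isinstance(sample, (list, tuple)):
        if not sample:
            return None  # empty data: no dtype can be inferred
        sample = sample[0]
    # Map Python scalar types to schema-style primitive dtypes.
    if isinstance(sample, bool):  # bool must be checked before int
        return "bool"
    if isinstance(sample, int):
        return "int64"
    if isinstance(sample, float):
        return "float64"
    if isinstance(sample, str):
        return "text"
    return None  # e.g. a container object: no primitive dtype applies

print(infer_dtype([1, 2, 3]))       # int64
print(infer_dtype([[1.0], [2.0]]))  # float64
print(infer_dtype(object()))        # None
```

The point of the sketch is that such inference happens at build time and per backend, which is exactly the behavior that is currently not documented as a schema-level validation requirement.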

For another implementation such as MatNWB, this creates ambiguity about what should be accepted or rejected during validation.

Concrete problem in MatNWB

At the moment, the following is possible in MatNWB:

```matlab
types.hdmf_common.VectorData('data', types.hdmf_common.VectorData('data', 1))
```

This seems like it should be invalid: the dataset's data payload is itself another HDMF container object rather than storable leaf data.

However, because MatNWB currently does not perform type validation when dtype is omitted, this is accepted.

Question

What is the intended normative validation behavior for datasets with omitted dtype, especially concrete types such as VectorData?

More specifically:

  • Should omitted dtype be interpreted as "no fixed primitive dtype is prescribed by the schema", while still requiring the dataset payload to be valid leaf/storable data?
  • Should implementations reject HDMF container objects as direct dataset payloads unless the schema explicitly declares a reference dtype?
  • Which parts of Python HDMF's current behavior for omitted dtype are intentional cross-API semantics, and which parts are just implementation details of that API?

Why this matters

As a maintainer of another HDMF-based API, I need to know whether validation for omitted dtype should be:

  • schema-level permissive: no explicit dtype constraint, but still reject obviously non-dataset objects
  • Python-HDMF-compatible: mirror current inference/restriction behavior
  • something else
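The first option above ("schema-level permissive") could be sketched as a minimal check. This is a hypothetical helper, not an existing API; `Container` stands in for the implementation's container base class (e.g. an HDMF Container), and the function name is invented for illustration.

```python
# Minimal sketch of "schema-level permissive" validation: no primitive
# dtype constraint, but reject payloads that are obviously not dataset
# data. `Container` is a placeholder for the API's real container base
# class; this is an assumption for illustration.

class Container:
    """Placeholder for the API's group/dataset container base class."""

def validate_untyped_payload(data):
    """Accept leaf-ish data; reject container objects as payloads."""
    if isinstance(data, Container):
        raise TypeError(
            "dataset payload is a container object; an omitted dtype "
            "does not make container payloads valid"
        )
    if isinstance(data, (list, tuple)):
        for element in data:
            validate_untyped_payload(element)  # recurse into nested data
    return data

validate_untyped_payload([1, 2, 3])        # accepted
try:
    validate_untyped_payload(Container())  # rejected
except TypeError as err:
    print(err)
```

Under this policy, the MatNWB example above would be rejected even though no dtype is declared, while any nesting of primitive scalars would still pass.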

Right now, the schema appears intentionally generic, but the practical validation semantics are underspecified for alternative implementations.

Request

Could the intended semantics for omitted dataset dtype be clarified, especially for VectorData and similar concrete types?

If there is already an intended rule, documenting it in the schema language docs and/or hdmf-common docs would help other implementations validate consistently.

This issue text was drafted with the help of Codex (GPT-5.4).
