Fix RunGroupBy on numpy 2.x with StringDtype meta columns #319

Open
benmsanderson wants to merge 2 commits into openscm:main from benmsanderson:fix/issubdtype-stringdtype-numpy2

Conversation

@benmsanderson

Summary

Fixes #318.

On Python 3.12 with pandas 3.x, string-valued meta columns of a DataFrame round-tripped through MultiIndex come back as pandas.StringDtype rather than object. RunGroupBy.__init__ called np.issubdtype(StringDtype, np.number) directly, which numpy 2.x rejects with:

TypeError: Cannot interpret '<StringDtype(storage='python', na_value=nan)>' as a data type

This blocks downstream callers of ScmRun.groupby (and therefore ScmRun.convert_unit and anything else that uses groupby internally) under the newer Python / numpy / pandas stack.

Fix

Routes the numeric-dtype check through a new module-level helper, _is_numeric_dtype, that wraps np.issubdtype in a try/except. Semantically, a dtype that numpy cannot classify is not a numpy numeric dtype, so False is the correct fallback.

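import numpy as np

def _is_numeric_dtype(dtype):
    # np.issubdtype raises TypeError for dtypes numpy cannot interpret
    # (e.g. pandas.StringDtype); treat those as non-numeric.
    try:
        return bool(np.issubdtype(dtype, np.number))
    except TypeError:
        return False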
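
As a rough illustration of the fallback, the sketch below re-declares the helper and calls it on a few dtypes. OpaqueDtype is a hypothetical stand-in for any dtype object that numpy cannot interpret (such as pandas.StringDtype); it is not part of scmdata, numpy, or pandas.

```python
import numpy as np


def _is_numeric_dtype(dtype):
    """Return True only for dtypes numpy recognises as numeric."""
    try:
        return bool(np.issubdtype(dtype, np.number))
    except TypeError:
        # numpy could not interpret the object as a dtype at all
        return False


class OpaqueDtype:
    """Hypothetical stand-in for an extension dtype numpy cannot interpret."""


results = {
    "float64": _is_numeric_dtype(np.dtype("float64")),
    "int32": _is_numeric_dtype(np.dtype("int32")),
    "object": _is_numeric_dtype(np.dtype("object")),
    "opaque": _is_numeric_dtype(OpaqueDtype()),
}
print(results)
```

Only the two numeric numpy dtypes should be classified as numeric; the object dtype and the uninterpretable stand-in both fall through to False, the latter via the except branch.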

No change in behaviour on Python 3.11 / numpy 1.x (where np.issubdtype would have returned False anyway for non-numeric dtypes).

Test

tests/unit/test_groupby.py::test_groupby_with_string_extension_dtype exercises ScmRun.groupby directly with a MultiIndex-built ScmRun whose meta columns end up as StringDtype on pandas 3.x. Pre-fix, the test raised TypeError; post-fix, it returns a single group as expected. The test is deliberately narrower than the original failing call path (convert_unit), because convert_unit hits a separate pandas 3.x compatibility issue (ValueError: assignment destination is read-only in apply_units) that is out of scope for this PR.
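
The dtype classification that the test depends on can be sketched without scmdata. This is not the actual regression test or the RunGroupBy implementation: the meta table and its column names are hypothetical, and the string columns are cast to StringDtype explicitly so the example does not rely on pandas 3.x inference.

```python
import numpy as np
import pandas as pd


def _is_numeric_dtype(dtype):
    """Same try/except wrapper as in the PR description."""
    try:
        return bool(np.issubdtype(dtype, np.number))
    except TypeError:
        return False


# Hypothetical meta table; the column names are illustrative only.
meta = pd.DataFrame(
    {
        "scenario": ["ssp119", "ssp126"],
        "region": ["World", "World"],
        "ensemble_member": [0, 1],
    }
)
# Force the extension dtype so the string columns are StringDtype
# regardless of the installed pandas version.
meta = meta.astype({"scenario": "string", "region": "string"})

# Split columns by whether numpy recognises their dtype as numeric.
numeric_cols = [c for c in meta.columns if _is_numeric_dtype(meta[c].dtype)]
string_cols = [c for c in meta.columns if c not in numeric_cols]
print(numeric_cols, string_cols)
```

With the wrapper in place, the StringDtype columns are simply treated as non-numeric instead of aborting the loop over columns.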

Environment that reproduces #318

  • Python 3.12.12
  • scmdata 0.18.0
  • numpy 2.4.6
  • pandas 3.0.3

benmsanderson and others added 2 commits May 22, 2026 10:12
On Python 3.12 with pandas 3.x, string-valued meta columns of a
DataFrame round-tripped through MultiIndex come back as
pandas.StringDtype rather than object. RunGroupBy.__init__ called
np.issubdtype(StringDtype, np.number) directly, which numpy 2.x
rejects with `TypeError: Cannot interpret <StringDtype(...)>`.

This blocks downstream callers of ScmRun.groupby (and therefore
ScmRun.convert_unit and anything else that goes through groupby
internally) under the newer Python / numpy / pandas stack.

Fix routes the numeric-dtype check through a small helper
_is_numeric_dtype that wraps np.issubdtype in a try/except.
Semantically a dtype numpy cannot classify is not a numpy numeric
dtype, so False is the correct fallback.

Regression test added in tests/unit/test_groupby.py exercising
ScmRun.groupby directly with a MultiIndex-built ScmRun whose meta
columns end up as StringDtype on pandas 3.x. Pre-fix the test raised
TypeError; post-fix it returns a single group as expected.

Closes openscm#318

Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>
Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>
Member

@mikapfl mikapfl left a comment

Thanks, looks good to me.

@mikapfl
Member

mikapfl commented May 22, 2026

@znicholls @lewisjared I don't have write access here, so I can neither approve workflows nor merge this.

@znicholls
Collaborator

See #318 (comment)

Happy to merge, not merge, whatever

@znicholls
Collaborator

As you can see, @benmsanderson, the repo is relatively broken. Happy for you to go full AI on it if you want; I can just make you a maintainer and then you do what you want.

@benmsanderson
Author

Cool - that would be great! Thanks Zeb

@znicholls
Collaborator

Added you with the permissions I think you'll need now. If something is missing, just ping me and I can either upgrade you to admin or give you what you need

Development

Successfully merging this pull request may close these issues.

ScmRun.groupby fails on Python 3.12 / numpy 2.x

3 participants