Fix RunGroupBy on numpy 2.x with StringDtype meta columns#319
Open
benmsanderson wants to merge 2 commits into
Open
Fix RunGroupBy on numpy 2.x with StringDtype meta columns#319benmsanderson wants to merge 2 commits into
benmsanderson wants to merge 2 commits into
Conversation
On Python 3.12 with pandas 3.x, string-valued meta columns of a DataFrame round-tripped through MultiIndex come back as pandas.StringDtype rather than object. RunGroupBy.__init__ called np.issubdtype(StringDtype, np.number) directly, which numpy 2.x rejects with `TypeError: Cannot interpret <StringDtype(...)>`. This blocks downstream callers of ScmRun.groupby (and therefore ScmRun.convert_unit and anything else that goes through groupby internally) under the newer Python / numpy / pandas stack. Fix routes the numeric-dtype check through a small helper _is_numeric_dtype that wraps np.issubdtype in a try/except. Semantically a dtype numpy cannot classify is not a numpy numeric dtype, so False is the correct fallback. Regression test added in tests/unit/test_groupby.py exercising ScmRun.groupby directly with a MultiIndex-built ScmRun whose meta columns end up as StringDtype on pandas 3.x. Pre-fix the test raised TypeError; post-fix it returns a single group as expected. Closes openscm#318 Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>
Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>
Member
|
@znicholls @lewisjared I don't have write access here, so I can neither approve workflows nor merge this. |
Collaborator
|
See #318 (comment) Happy to merge, not merge, whatever |
Collaborator
|
As you can see @benmsanderson the repo is relatively broken. Happy for you to go full AI on it if you want and I can just make you a maintainer then you do what you want |
Author
|
Cool - that would be great! Thanks Zeb |
Collaborator
|
Added you with the permissions I think you'll need now. If something is missing, just ping me and I can either upgrade you to admin or give you what you need |
This file contains hidden or bidirectional Unicode text that may be interpreted or compiled differently than what appears below. To review, open the file in an editor that reveals hidden Unicode characters.
Learn more about bidirectional Unicode characters
Sign up for free
to join this conversation on GitHub.
Already have an account?
Sign in to comment
Add this suggestion to a batch that can be applied as a single commit.This suggestion is invalid because no changes were made to the code.Suggestions cannot be applied while the pull request is closed.Suggestions cannot be applied while viewing a subset of changes.Only one suggestion per line can be applied in a batch.Add this suggestion to a batch that can be applied as a single commit.Applying suggestions on deleted lines is not supported.You must change the existing code in this line in order to create a valid suggestion.Outdated suggestions cannot be applied.This suggestion has been applied or marked resolved.Suggestions cannot be applied from pending reviews.Suggestions cannot be applied on multi-line comments.Suggestions cannot be applied while the pull request is queued to merge.Suggestion cannot be applied right now. Please check back later.
Summary
Fixes #318.
On Python 3.12 with pandas 3.x, string-valued meta columns of a
DataFrameround-tripped throughMultiIndexcome back aspandas.StringDtyperather thanobject.RunGroupBy.__init__callednp.issubdtype(StringDtype, np.number)directly, which numpy 2.x rejects with:This blocks downstream callers of
ScmRun.groupby(and thereforeScmRun.convert_unitand anything else that uses groupby internally) under the newer Python / numpy / pandas stack.Fix
Routes the numeric-dtype check through a new module-level helper
_is_numeric_dtypethat wrapsnp.issubdtypein atry/except. Semantically a dtype numpy cannot classify is not a numpy numeric dtype, soFalseis the correct fallback.No change in behaviour on Python 3.11 / numpy 1.x (where
np.issubdtypewould have returnedFalseanyway for non-numeric dtypes).Test
tests/unit/test_groupby.py::test_groupby_with_string_extension_dtypeexercisesScmRun.groupbydirectly with aMultiIndex-builtScmRunwhose meta columns end up asStringDtypeon pandas 3.x. Pre-fix the test raisedTypeError; post-fix it returns a single group as expected. The test is deliberately narrower than the original failing call path (convert_unit), becauseconvert_unithits a separate pandas 3.x compat issue (ValueError: assignment destination is read-onlyinapply_units) that is out of scope for this PR.Environment that reproduces #318