GH-15047: [Python]: switch from pytz to zoneinfo by default for string to tzinfo conversion #49694
jorisvandenbossche wants to merge 5 commits into apache:main
@github-actions crossbow submit test-pandas
Revision: dc184fe Submitted crossbow builds: ursacomputing/crossbow @ actions-cdf269df89
@github-actions crossbow submit -g python
Revision: c2c973a Submitted crossbow builds: ursacomputing/crossbow @ actions-02d567dab2
@github-actions crossbow submit -g wheel
Revision: c2c973a Submitted crossbow builds: ursacomputing/crossbow @ actions-4c3b50812b
As far as I can see, all failures in the above crossbow builds (after the last update) are unrelated (docker/git/download failures).
AlenkaF left a comment
LGTM, thanks! Added only one minor comment.
    if not dtype.timezone().empty():
        tzinfo = string_to_tzinfo(frombytes(dtype.timezone()))
        prefer_zoneinfo = True
        # only we this method would return a pandas.Timestamp, prefer
Suggested change:
    - # only we this method would return a pandas.Timestamp, prefer
    + # only if this method would return a pandas.Timestamp, prefer
Note, I needed some time to understand what is meant here. Maybe simpler is better?
# Adjust preference based on the pandas version
or a bit more:
# Adjust preference based on the pandas version -
# keep returning pytz for older pandas
Rationale for this change
zoneinfo is available starting with Python 3.9, so we can now assume that it is available and switch from returning pytz timezones by default to returning zoneinfo timezones (or datetime.timezone for fixed offsets).
pytz is kept only as a fallback for strings that are not supported by zoneinfo but were supported by pytz. Later, we should maybe deprecate that fallback.
Generally we should move away from using pytz, since the core functionality of having time zones is now available in the standard library (zoneinfo), and because the pytz package has several warts / incompatibilities with stdlib datetime (https://blog.ganssle.io/articles/2018/03/pytz-fastest-footgun.html).
What changes are included in this PR?
Whenever we create a Python timezone object, which is when converting to pandas or when converting to a datetime.datetime object, we now prefer:
- zoneinfo for datetime.datetime objects
- zoneinfo for pandas objects if pandas >= 3, to align with the change on the pandas side (API: Default to stdlib timezone objects instead of pytz pandas-dev/pandas#34916)
In either case, when preferring zoneinfo, we still fall back to pytz for named timezones if zoneinfo does not recognize the zone name (apparently pytz can have some common (older) aliases that might not always work with zoneinfo).
This fallback is something we could deprecate and remove later on (so we can eventually remove all usage of pytz).
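For readers who want to see the intended preference order in code, here is a minimal sketch. It is not the implementation in this PR: the names `string_to_tzinfo_sketch` and `prefer_zoneinfo_for` are hypothetical, the fixed-offset parsing only handles the `+HH:MM` / `-HH:MM` form, and using `pytz.FixedOffset` on the legacy path is an assumption about the old behavior.

```python
import datetime
import re
import zoneinfo

# Matches fixed-offset strings such as "+01:00" or "-05:30".
_FIXED_OFFSET = re.compile(r"^([+-])(\d{2}):(\d{2})$")


def prefer_zoneinfo_for(pandas_version):
    """Hypothetical helper: prefer zoneinfo only for pandas >= 3."""
    major = int(pandas_version.split(".")[0])
    return major >= 3


def string_to_tzinfo_sketch(name, prefer_zoneinfo=True):
    """Illustrative stand-in for the conversion, not pyarrow's code."""
    match = _FIXED_OFFSET.match(name)
    if match:
        sign, hours, minutes = match.groups()
        total_minutes = int(hours) * 60 + int(minutes)
        if sign == "-":
            total_minutes = -total_minutes
        if prefer_zoneinfo:
            # Fixed offsets map to the stdlib datetime.timezone.
            return datetime.timezone(datetime.timedelta(minutes=total_minutes))
        import pytz  # third-party; only needed on the legacy path
        return pytz.FixedOffset(total_minutes)

    if prefer_zoneinfo:
        try:
            return zoneinfo.ZoneInfo(name)
        except (zoneinfo.ZoneInfoNotFoundError, ValueError):
            pass  # name unknown to zoneinfo: fall through to pytz

    import pytz  # fallback for names that only pytz recognizes
    return pytz.timezone(name)
```

With a time zone database available, `string_to_tzinfo_sketch("Europe/Brussels")` would take the zoneinfo path, and pytz is imported only when the fallback or the legacy path is actually reached.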
Are these changes tested?
Yes
Are there any user-facing changes?
This PR includes breaking changes to public APIs.
It is a different object that we return (different class, i.e. zoneinfo.ZoneInfo instead of a pytz.tzinfo.BaseTzInfo; both are still subclasses of datetime.tzinfo), which has some differences in the API, so for people relying on that, this is a breaking change.
For the conversion to pandas, pandas itself has made this breaking change anyhow, so for those cases it aligns with that change of pandas.
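One concrete example of such an API difference is the attribute holding the zone name: zoneinfo.ZoneInfo exposes it as `key`, while pytz timezones expose it as `zone`. Downstream code that has to handle both kinds of objects during a transition could use something like the sketch below. The helper name `timezone_name` is hypothetical and not part of pyarrow.

```python
import datetime


def timezone_name(tz):
    """Return a zone name for zoneinfo, pytz, or stdlib fixed-offset objects."""
    # zoneinfo.ZoneInfo stores the name in .key, pytz timezones in .zone.
    for attr in ("key", "zone"):
        name = getattr(tz, attr, None)
        if name is not None:
            return name
    # Neither attribute is set (e.g. datetime.timezone): use tzname().
    return tz.tzname(None)
```

Code that called pytz-specific methods such as `localize` has no direct counterpart on zoneinfo objects; there the usual approach is to pass the zone as `tzinfo=` when constructing the datetime.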