Parse read_nsrdb_psm4 header with the csv module to keep quoted commas (fixes #2736) by gaoflow · Pull Request #2771 · pvlib/pvlib-python

gaoflow · 2026-06-02T03:04:08Z

Closes #2736 (Issue with importing NSRDB spectral-on-demand files with pvlib.iotools.read_nsrdb_psm4)
I am familiar with the contributing guidelines
Tests added
Updates entries in docs/sphinx/source/reference for API changes. (n/a — no API change)
Adds description and name entries in the appropriate "what's new" file in docs/sphinx/source/whatsnew for all changes.
New code is fully documented. (behavior unchanged for existing files; no public-API docstring change)
Pull request is nearly complete and ready for detailed review.

What this fixes

read_nsrdb_psm4 parsed its three header lines with a naive str.split(','):

metadata_fields = fbuf.readline().split(',')
metadata_values = fbuf.readline().split(',')
columns        = fbuf.readline().split(',')

The NSRDB spectral-on-demand CSVs reported in #2736 have quoted column
names that contain commas, e.g.

..., "GaAs (Bauhuis et al., 2009)","InGaP (Gray, 2008)", ...

These are valid CSV (the commas are inside quotes), and pandas.read_csv
parses the data rows correctly. However, str.split(',') splits each quoted name
into multiple fragments, inflating the column count. read_csv is then given
more names/usecols than the data rows have fields, and raises a ParserError.
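The mis-split can be seen in isolation with only the standard library. The snippet below is a minimal illustration, not code from the PR; the header string is taken from the reproduction further down, and the variable names are chosen here for the example.

```python
import csv

# Third header line of a spectral-on-demand style file (from the reproduction).
header = (
    'Year,Month,Day,Hour,Minute,GHI,"GaAs (Bauhuis et al., 2009)",'
    '"InGaP (Gray, 2008)"\n'
)

# Naive split: every comma is a separator, including those inside quotes.
naive = header.split(',')

# csv.reader honors the quoting, so each quoted name stays one field.
quoted_aware = next(csv.reader([header]))

print(len(naive))         # 10
print(len(quoted_aware))  # 8
print(quoted_aware[-2:])  # ['GaAs (Bauhuis et al., 2009)', 'InGaP (Gray, 2008)']
```

The two counts match the "expected 10 and found 8" in the error message: ten names from the naive split against eight fields in each data row.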

The change

Parse the three header lines with the csv module (which honors quoting)
instead of str.split(','). For ordinary (unquoted) files this is identical
to the previous behavior, so the existing readers are unaffected.
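A rough sketch of the approach, for readers who do not have the diff open. This is not the code in the PR: read_header_lines is a hypothetical helper named here for illustration, and the sample content is a shortened version of the reproduction.

```python
import csv
from io import StringIO


def read_header_lines(fbuf, n_lines=3):
    """Hypothetical helper: read n_lines header lines, honoring CSV quoting."""
    rows = []
    for _ in range(n_lines):
        line = fbuf.readline()
        # csv.reader accepts any iterable of strings; wrapping the single
        # line in a list parses just that line and drops the line ending.
        rows.append(next(csv.reader([line])))
    return rows


content = (
    "Source,Location ID,Latitude,Longitude\n"
    "NSRDB,1,40.0,-105.0\n"
    'Year,Month,GHI,"GaAs (Bauhuis et al., 2009)","InGaP (Gray, 2008)"\n'
    "2023,1,0,0.1,0.2\n"
)
fbuf = StringIO(content)
metadata_fields, metadata_values, columns = read_header_lines(fbuf)
metadata = dict(zip(metadata_fields, metadata_values))

print(columns[-2:])              # quoted names kept whole
print(metadata["Latitude"])      # 40.0
print(fbuf.readline().strip())   # buffer now sits at the first data row
```

Reading line by line with readline() and parsing each line separately leaves the buffer positioned at the data rows, so the remainder can still be handed to pandas.read_csv as before. For a line without quotes, csv.reader returns the same fields as stripping the newline and splitting on commas.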

This fixes the parsing crash for these files, which @kandersolar confirmed
should be supported. The further map_variables=True unit handling for spectral files
(W/m²/µm → W/m²/nm) mentioned in the issue is a separate enhancement and is
left out of scope here.

Reproduction (before this PR)

from io import StringIO
from pvlib.iotools import psm4

content = (
    "Source,Location ID,City,State,Country,Latitude,Longitude,Time Zone,"
    "Elevation,Local Time Zone,Version\n"
    "NSRDB,1,-,-,-,40.0,-105.0,-7,1600,-7,4.0.1\n"
    'Year,Month,Day,Hour,Minute,GHI,"GaAs (Bauhuis et al., 2009)",'
    '"InGaP (Gray, 2008)"\n'
    "2023,1,1,0,0,0,0.1,0.2\n"
    "2023,1,1,1,0,5,0.3,0.4\n"
)
psm4.read_nsrdb_psm4(StringIO(content), map_variables=False)
# ParserError: Too many columns specified: expected 10 and found 8

After the fix, the quoted column names survive intact
('GaAs (Bauhuis et al., 2009)', 'InGaP (Gray, 2008)').

A regression test (test_read_nsrdb_psm4_quoted_columns_with_commas) is added
that fails on main and passes with this change; the existing
read_nsrdb_psm4 tests continue to pass.

read_nsrdb_psm4 split the three header lines with a naive str.split(','), which broke spectral-on-demand files whose column names are quoted fields containing commas (e.g. '"GaAs (Bauhuis et al., 2009)"'). Such names were split into spurious columns, raising on read. Parse the header lines with the csv module so quoted fields are kept intact. Fixes pvlib#2736

gaoflow wants to merge 1 commit into pvlib:main from gaoflow:fix-2736-nsrdb-psm4-quoted-columns
