Problem
client.query.sql() sends a single GET /{entity_set}?sql=<query> and returns whatever the server responds with. The Dataverse Web API default page size is 5,000 rows. If the query matches more than 5,000 rows, results are silently truncated — no error, no warning.
This is inconsistent with the rest of the SDK:
client.records.get() — auto-paginates via @odata.nextLink, returns all results
client.dataframe.get() — auto-paginates, consolidates into a single DataFrame
client.query.sql() is the only read method that silently drops data.
Observed behavior
# Table has 136,590 rows
results = client.query.sql("SELECT srcid FROM prefix_ballbyball")
len(results) # → 5000 (silently truncated, no error)
The user gets 5,000 rows and has no way to know 131,590 rows are missing.
Expected behavior
client.query.sql() should follow @odata.nextLink in the response and return all matching rows, consistent with records.get() and dataframe.get().
Proposed fix
Check for @odata.nextLink in the response from _query_sql() and continue fetching pages until exhausted:
def _query_sql(self, sql: str) -> list[dict[str, Any]]:
    # ... existing validation ...
    url = f"{self.api}/{entity_set}"
    params = {"sql": sql}

    all_rows = []
    r = self._request("get", url, params=params)
    body = r.json() if r.text else {}
    all_rows.extend(body.get("value", []))

    # Follow server-driven paging until no @odata.nextLink remains
    next_link = body.get("@odata.nextLink")
    while next_link:
        r = self._request("get", next_link)
        body = r.json() if r.text else {}
        all_rows.extend(body.get("value", []))
        next_link = body.get("@odata.nextLink")

    return all_rows
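The pagination loop can be exercised without a live environment by swapping `self._request` for a stubbed fetch function. The sketch below is illustrative only (the helper name `collect_all_rows` and the page payloads are invented for the test); it implements the same follow-the-nextLink logic as the proposed fix:

```python
from typing import Any, Callable

def collect_all_rows(fetch: Callable[[str], dict[str, Any]],
                     first_url: str) -> list[dict[str, Any]]:
    """Accumulate 'value' arrays, following @odata.nextLink until exhausted."""
    all_rows: list[dict[str, Any]] = []
    body = fetch(first_url)
    all_rows.extend(body.get("value", []))
    next_link = body.get("@odata.nextLink")
    while next_link:
        body = fetch(next_link)
        all_rows.extend(body.get("value", []))
        next_link = body.get("@odata.nextLink")
    return all_rows

# Stub simulating Dataverse-style paged responses: three pages of two rows.
pages = {
    "page1": {"value": [{"srcid": 1}, {"srcid": 2}], "@odata.nextLink": "page2"},
    "page2": {"value": [{"srcid": 3}, {"srcid": 4}], "@odata.nextLink": "page3"},
    "page3": {"value": [{"srcid": 5}, {"srcid": 6}]},  # last page: no nextLink
}
rows = collect_all_rows(pages.__getitem__, "page1")
# → all six rows, not just the first page
```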
Alternatively, return a page iterator consistent with records.get() — but this would change the return type from list[Record] to Iterable[list[Record]].
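For comparison, a generator-based variant matching the `Iterable[list[Record]]` shape would yield one page at a time instead of accumulating. Names below (`iter_pages`, the stub payloads) are hypothetical, not part of the SDK:

```python
from typing import Any, Callable, Iterator, Optional

def iter_pages(fetch: Callable[[str], dict[str, Any]],
               first_url: str) -> Iterator[list[dict[str, Any]]]:
    """Yield each page's 'value' list, following @odata.nextLink lazily."""
    url: Optional[str] = first_url
    while url:
        body = fetch(url)
        yield body.get("value", [])
        url = body.get("@odata.nextLink")

# Stubbed two-page response sequence.
pages = {
    "p1": {"value": [{"srcid": 1}], "@odata.nextLink": "p2"},
    "p2": {"value": [{"srcid": 2}]},
}
chunks = list(iter_pages(pages.__getitem__, "p1"))
# → two chunks of one row each
```

The lazy form bounds memory for very large result sets, at the cost of the breaking return-type change noted above.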
Impact
Any user querying a table with >5,000 rows via client.query.sql() is getting silently truncated results today. The ?sql= parameter is commonly used for filtered reads and its utility would increase significantly if it could return full result sets.
Context
Discovered during end-to-end testing of a 21-table cricket dataset import (297K total records). After import, analytical queries via client.query.sql() returned only 5,000 of 136,590 rows from the largest table, producing incorrect aggregation results.