Hi @paleolimbot,
I'm exploring the possibility of building a dplyr-compatible interface on top of sedonadb for R, similar to what duckplyr provides for DuckDB.
Looking at the current R bindings, I noticed that sedonadb exposes a limited subset of DataFrame operations:
- `select_indices()` for column selection
- `limit()` for row limiting
- `collect()` / `to_view()` for materialization
In contrast, duckdb-r exposes a full relational algebra API that duckplyr uses:
- Expression builders: `expr_reference()`, `expr_constant()`, `expr_function()`, `expr_comparison()`
- Relation operations: `rel_filter()`, `rel_project()`, `rel_aggregate()`, `rel_order()`, `rel_join()`
This allows duckplyr to translate dplyr verbs directly into relational operations without going through SQL string generation.
Questions:
- Are there any plans to expose more of DataFusion's DataFrame API (such as `filter()`, `aggregate()`, `sort()`) through the R bindings?
- Would there be interest in accepting contributions that add an expression/relational API similar to duckdb-r's?
For now, I'm working on a SQL-based approach using `sd_sql()`, which works but requires translating R expressions into SQL. A native relational API would be more elegant and potentially more performant, since it would avoid the SQL parsing overhead.
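For context, the R-to-SQL translation step looks roughly like the sketch below. It is a minimal illustration in base R, assuming filter conditions arrive as plain quoted expressions (dplyr itself captures quosures via rlang). `translate_expr()`, `sql_binary_ops` and the table name `my_table` are hypothetical and not part of sedonadb or duckplyr. Only comparisons, boolean operators, arithmetic, `!` and `is.na()` are covered.

```r
# Hypothetical helper (not part of sedonadb): translate a quoted R filter
# expression into a SQL string. Every binary operation is parenthesized,
# so R's operator precedence is preserved in the generated SQL.
sql_binary_ops <- c(
  "==" = "=", "!=" = "<>", "<" = "<", "<=" = "<=", ">" = ">", ">=" = ">=",
  "&" = "AND", "|" = "OR", "+" = "+", "-" = "-", "*" = "*", "/" = "/"
)

translate_expr <- function(expr) {
  if (is.symbol(expr)) {
    # Column reference -> double-quoted SQL identifier
    name <- gsub('"', '""', as.character(expr), fixed = TRUE)
    return(paste0('"', name, '"'))
  }
  if (is.character(expr) && length(expr) == 1) {
    # String literal -> single-quoted, with embedded quotes doubled
    return(paste0("'", gsub("'", "''", expr, fixed = TRUE), "'"))
  }
  if (is.numeric(expr) && length(expr) == 1) {
    return(as.character(expr))
  }
  if (is.logical(expr) && length(expr) == 1 && !is.na(expr)) {
    return(if (expr) "TRUE" else "FALSE")
  }
  if (is.call(expr) && is.symbol(expr[[1]])) {
    fn <- as.character(expr[[1]])
    if (fn %in% names(sql_binary_ops) && length(expr) == 3) {
      return(paste0(
        "(", translate_expr(expr[[2]]), " ", sql_binary_ops[[fn]], " ",
        translate_expr(expr[[3]]), ")"
      ))
    }
    if (fn == "!" && length(expr) == 2) {
      return(paste0("(NOT ", translate_expr(expr[[2]]), ")"))
    }
    if (fn == "is.na" && length(expr) == 2) {
      return(paste0("(", translate_expr(expr[[2]]), " IS NULL)"))
    }
    if (fn == "(" && length(expr) == 2) {
      # Explicit parentheses are redundant: binary ops are already wrapped
      return(translate_expr(expr[[2]]))
    }
  }
  stop("Unsupported expression: ", deparse(expr))
}

# Usage: build a query string that could then be handed to sd_sql();
# `my_table` is a placeholder table name.
where <- translate_expr(quote(x > 1 & name == "a"))
sql <- paste0("SELECT * FROM my_table WHERE ", where)
cat(sql, "\n")
# SELECT * FROM my_table WHERE (("x" > 1) AND ("name" = 'a'))
```

Every new dplyr function or operator needs another branch here, plus quoting and escaping rules that must match the SQL dialect. A relational API that accepts expression objects directly would sidestep both issues.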