5 | 5 | :type :post |
6 | 6 | :date "2025-09-04" |
7 | 7 | :category :clojure |
8 | | - :tags [:metadata :civitas]}}} |
9 | | -(ns data-engineering.clojure-support-for-popular-data-tools) |
| 8 | + :tags [:metadata :civitas] |
| 9 | + :canonical-url "https://alza-bitz.github.io/clojure-support-for-popular-data-tools"}}} |
| 10 | +(ns data-engineering.support-for-popular-data-tools.snowflake) |
10 | 11 |
11 | 12 | ;; In this article I look at the extent of Clojure support for some popular on-cluster data processing tools that Clojure users might need for their data engineering or data science tasks. Then for [Snowflake](https://snowflake.com) in particular **I go further and present a new Clojure API.** |
12 | 13 |
|
32 | 33 |
33 | 34 | ;; Please note, I don't wish to make any critical judgments based on either the summary analysis above or the more detailed analysis below. The goal is to understand the situation with respect to Clojure support and highlight any gaps, although I suppose I am also inadvertently highlighting the difficulties of maintaining open source software! |
34 | 35 |
| 36 | +;; With that said, let's dive into the details for Spark, Kafka and Snowflake. |
| 37 | + |
35 | 38 | ;; ### Spark Interop with Geni |
36 | 39 |
37 | 40 | ;; [Geni](https://github.com/zero-one-group/geni) is the go-to library for Spark interop. Some months back, I was motivated to evaluate its coverage of Spark features. In particular, I wanted to understand what would be involved in supporting [Spark Connect](https://spark.apache.org/spark-connect/), as it would reduce the complexity of computing on-cluster directly from the Clojure REPL. |
|
82 | 85 | {:id 3 :name "Charlie" :age 35 :department "Engineering" :salary 80000}]) |
83 | 86 |
84 | 87 | ;; Create session and save data |
85 | | -(with-open [session (sp/create-session "src/data_engineering/snowflake.edn")] |
| 88 | +(with-open [session (sp/create-session "src/data_engineering/support_for_popular_data_tools/snowflake.edn")] |
86 | 89 | (-> (sp/create-dataframe session employee-data) |
87 | 90 | (sp/save-as-table "employees" :overwrite))) |
88 | 91 |
89 | 92 | ;; #### Feature 2. Compute over Snowflake table(s) on-cluster to produce a smaller result for local processing |
90 | 93 |
91 | | -(with-open [session (sp/create-session "src/data_engineering/snowflake.edn")] |
| 94 | +(with-open [session (sp/create-session "src/data_engineering/support_for_popular_data_tools/snowflake.edn")] |
92 | 95 | (let [table-df (sp/table session "employees")] |
93 | 96 | (-> table-df |
94 | 97 | (sp/filter (sp/gt (sp/col table-df :salary) (sp/lit 70000))) |
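This commit renames the namespace to `data-engineering.support-for-popular-data-tools.snowflake` and moves the config file to `src/data_engineering/support_for_popular_data_tools/snowflake.edn`. That new path is the directory Clojure's classpath convention assigns to the renamed namespace: hyphens become underscores and dots become directory separators. The sketch below reproduces that mapping. `ns->source-path` is a hypothetical helper (clojure.core performs the same conversion in a private function when loading a lib), and it assumes a `src` source root and a `.clj` file extension.

```clojure
(defn ns->source-path
  "Hypothetical helper: returns the path of the source file that Clojure's
  loader expects for the namespace symbol `ns-sym`, relative to `src-root`.
  Hyphens become underscores, then dots become directory separators.
  Assumes a .clj file; other characters that Clojure munges are not handled."
  [src-root ns-sym]
  (str src-root "/"
       (-> (name ns-sym)
           (.replace \- \_)   ; data-engineering -> data_engineering
           (.replace \. \/))  ; segment.segment  -> segment/segment
       ".clj"))

;; The renamed namespace from this commit:
(ns->source-path "src" 'data-engineering.support-for-popular-data-tools.snowflake)
;; => "src/data_engineering/support_for_popular_data_tools/snowflake.clj"
```

Applied to the old name, `data-engineering.clojure-support-for-popular-data-tools`, the same helper gives `src/data_engineering/clojure_support_for_popular_data_tools.clj`. Since `require` looks a namespace up at that path, a namespace rename like this one normally goes together with moving the file.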
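Both snippets in the diff bind the session with `with-open`, which calls `.close` on the binding when the body finishes, including when the body throws. Assuming `sp/create-session` returns an object whose `.close` ends the Snowflake session, each snippet therefore releases its session on exit. The sketch below shows that behaviour without a Snowflake connection. `make-fake-session` is a hypothetical stand-in for `sp/create-session`, and it relies only on what `with-open` itself requires: an object with a `.close` method.

```clojure
(defn make-fake-session
  "Hypothetical stand-in for sp/create-session: returns a java.io.Closeable
  that appends :closed to the `events` atom when it is closed."
  [events]
  (reify java.io.Closeable
    (close [_] (swap! events conj :closed))))

;; Normal exit: the body runs first, then with-open closes the session.
(def normal-events (atom []))
(with-open [session (make-fake-session normal-events)]
  (swap! normal-events conj :used))
;; @normal-events => [:used :closed]

;; Exceptional exit: with-open closes the session in a finally block,
;; then the exception propagates to the surrounding try.
(def failing-events (atom []))
(try
  (with-open [session (make-fake-session failing-events)]
    (throw (ex-info "query failed" {})))
  (catch clojure.lang.ExceptionInfo _ :caught))
;; @failing-events => [:closed]
```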