Commit 3288872
committed
feat: add native Parquet VARIANT column support for JSON and semi-structured fields
Write Kafka Connect JSON fields (Debezium CDC, Confluent Protobuf Struct,
custom messages, maps, arrays) as native Parquet VARIANT columns instead of
plain STRING, improving storage efficiency and query performance.
- Upgrade parquet-java to 1.17.0 for VARIANT logical type support - https://parquet.apache.org/blog/2026/02/27/variant-type-in-apache-parquet-for-semi-structured-data/
- Add config: parquet.variant.enabled, parquet.variant.connect.names,
parquet.variant.field.names
- Auto-detect recursive schemas (google.protobuf.Struct) as VARIANT
- Stream JSON-to-Variant conversion ported from Apache Spark (PR also raised in parquet-java, once approved, code will be neat here in this repo: apache/parquet-java#3415)
- Unwrap Protobuf Struct/Value/ListValue/map-as-array to clean JSON
- Graceful fallback for non-JSON values (e.g. __debezium_unavailable_value)
- Feature is fully opt-in (disabled by default), zero impact on existing connectors1 parent bba6b97 commit 3288872
13 files changed
Lines changed: 3444 additions & 14 deletions
File tree
- kafka-connect-s3/src
- main/java/io/confluent/connect/s3
- format/parquet
- variant
- test/java/io/confluent/connect/s3/format/parquet/variant
Lines changed: 102 additions & 0 deletions
| Original file line number | Diff line number | Diff line change | |
|---|---|---|---|
| |||
238 | 238 | | |
239 | 239 | | |
240 | 240 | | |
| 241 | + | |
| 242 | + | |
| 243 | + | |
| 244 | + | |
| 245 | + | |
| 246 | + | |
| 247 | + | |
| 248 | + | |
| 249 | + | |
| 250 | + | |
| 251 | + | |
| 252 | + | |
| 253 | + | |
| 254 | + | |
| 255 | + | |
| 256 | + | |
| 257 | + | |
| 258 | + | |
| 259 | + | |
| 260 | + | |
| 261 | + | |
| 262 | + | |
| 263 | + | |
| 264 | + | |
| 265 | + | |
| 266 | + | |
| 267 | + | |
| 268 | + | |
| 269 | + | |
| 270 | + | |
| 271 | + | |
| 272 | + | |
| 273 | + | |
| 274 | + | |
| 275 | + | |
| 276 | + | |
| 277 | + | |
| 278 | + | |
| 279 | + | |
241 | 280 | | |
242 | 281 | | |
243 | 282 | | |
| |||
820 | 859 | | |
821 | 860 | | |
822 | 861 | | |
| 862 | + | |
| 863 | + | |
| 864 | + | |
| 865 | + | |
| 866 | + | |
| 867 | + | |
| 868 | + | |
| 869 | + | |
| 870 | + | |
| 871 | + | |
| 872 | + | |
| 873 | + | |
| 874 | + | |
| 875 | + | |
| 876 | + | |
| 877 | + | |
| 878 | + | |
| 879 | + | |
| 880 | + | |
| 881 | + | |
| 882 | + | |
| 883 | + | |
| 884 | + | |
| 885 | + | |
| 886 | + | |
| 887 | + | |
| 888 | + | |
| 889 | + | |
| 890 | + | |
| 891 | + | |
| 892 | + | |
| 893 | + | |
| 894 | + | |
| 895 | + | |
| 896 | + | |
| 897 | + | |
| 898 | + | |
| 899 | + | |
| 900 | + | |
| 901 | + | |
| 902 | + | |
823 | 903 | | |
824 | 904 | | |
825 | 905 | | |
| |||
1418 | 1498 | | |
1419 | 1499 | | |
1420 | 1500 | | |
| 1501 | + | |
| 1502 | + | |
| 1503 | + | |
| 1504 | + | |
| 1505 | + | |
| 1506 | + | |
| 1507 | + | |
| 1508 | + | |
| 1509 | + | |
| 1510 | + | |
| 1511 | + | |
| 1512 | + | |
| 1513 | + | |
| 1514 | + | |
| 1515 | + | |
| 1516 | + | |
| 1517 | + | |
| 1518 | + | |
| 1519 | + | |
| 1520 | + | |
| 1521 | + | |
| 1522 | + | |
1421 | 1523 | | |
1422 | 1524 | | |
1423 | 1525 | | |
| |||
Lines changed: 82 additions & 14 deletions
| Original file line number | Diff line number | Diff line change | |
|---|---|---|---|
| |||
25 | 25 | | |
26 | 26 | | |
27 | 27 | | |
| 28 | + | |
| 29 | + | |
28 | 30 | | |
29 | 31 | | |
30 | 32 | | |
| |||
37 | 39 | | |
38 | 40 | | |
39 | 41 | | |
| 42 | + | |
40 | 43 | | |
41 | 44 | | |
42 | 45 | | |
43 | 46 | | |
44 | 47 | | |
45 | 48 | | |
| 49 | + | |
46 | 50 | | |
47 | 51 | | |
48 | 52 | | |
| |||
66 | 70 | | |
67 | 71 | | |
68 | 72 | | |
| 73 | + | |
| 74 | + | |
69 | 75 | | |
70 | 76 | | |
71 | 77 | | |
72 | 78 | | |
73 | 79 | | |
74 | 80 | | |
| 81 | + | |
75 | 82 | | |
76 | 83 | | |
77 | 84 | | |
| |||
80 | 87 | | |
81 | 88 | | |
82 | 89 | | |
83 | | - | |
84 | | - | |
85 | | - | |
86 | | - | |
87 | | - | |
88 | | - | |
89 | | - | |
90 | | - | |
91 | | - | |
92 | | - | |
93 | | - | |
94 | | - | |
| 90 | + | |
| 91 | + | |
| 92 | + | |
| 93 | + | |
| 94 | + | |
95 | 95 | | |
96 | 96 | | |
97 | 97 | | |
98 | 98 | | |
99 | 99 | | |
100 | | - | |
101 | 100 | | |
102 | | - | |
| 101 | + | |
| 102 | + | |
| 103 | + | |
| 104 | + | |
| 105 | + | |
| 106 | + | |
| 107 | + | |
| 108 | + | |
| 109 | + | |
| 110 | + | |
| 111 | + | |
| 112 | + | |
| 113 | + | |
| 114 | + | |
| 115 | + | |
| 116 | + | |
| 117 | + | |
| 118 | + | |
| 119 | + | |
| 120 | + | |
| 121 | + | |
| 122 | + | |
| 123 | + | |
| 124 | + | |
| 125 | + | |
| 126 | + | |
| 127 | + | |
| 128 | + | |
| 129 | + | |
| 130 | + | |
| 131 | + | |
| 132 | + | |
103 | 133 | | |
104 | 134 | | |
105 | 135 | | |
| |||
125 | 155 | | |
126 | 156 | | |
127 | 157 | | |
| 158 | + | |
| 159 | + | |
| 160 | + | |
| 161 | + | |
| 162 | + | |
| 163 | + | |
| 164 | + | |
| 165 | + | |
128 | 166 | | |
129 | 167 | | |
130 | 168 | | |
| |||
157 | 195 | | |
158 | 196 | | |
159 | 197 | | |
| 198 | + | |
| 199 | + | |
| 200 | + | |
| 201 | + | |
| 202 | + | |
| 203 | + | |
| 204 | + | |
| 205 | + | |
| 206 | + | |
| 207 | + | |
| 208 | + | |
| 209 | + | |
| 210 | + | |
| 211 | + | |
| 212 | + | |
| 213 | + | |
| 214 | + | |
| 215 | + | |
| 216 | + | |
| 217 | + | |
| 218 | + | |
| 219 | + | |
| 220 | + | |
| 221 | + | |
| 222 | + | |
| 223 | + | |
| 224 | + | |
| 225 | + | |
| 226 | + | |
| 227 | + | |
160 | 228 | | |
161 | 229 | | |
162 | 230 | | |
| |||
0 commit comments