-
Notifications
You must be signed in to change notification settings - Fork 104
Expand file tree
/
Copy pathnyc311-describegpt.json
More file actions
546 lines (546 loc) · 34.3 KB
/
nyc311-describegpt.json
File metadata and controls
546 lines (546 loc) · 34.3 KB
1
2
3
4
5
6
7
8
9
10
11
12
13
14
15
16
17
18
19
20
21
22
23
24
25
26
27
28
29
30
31
32
33
34
35
36
37
38
39
40
41
42
43
44
45
46
47
48
49
50
51
52
53
54
55
56
57
58
59
60
61
62
63
64
65
66
67
68
69
70
71
72
73
74
75
76
77
78
79
80
81
82
83
84
85
86
87
88
89
90
91
92
93
94
95
96
97
98
99
100
101
102
103
104
105
106
107
108
109
110
111
112
113
114
115
116
117
118
119
120
121
122
123
124
125
126
127
128
129
130
131
132
133
134
135
136
137
138
139
140
141
142
143
144
145
146
147
148
149
150
151
152
153
154
155
156
157
158
159
160
161
162
163
164
165
166
167
168
169
170
171
172
173
174
175
176
177
178
179
180
181
182
183
184
185
186
187
188
189
190
191
192
193
194
195
196
197
198
199
200
201
202
203
204
205
206
207
208
209
210
211
212
213
214
215
216
217
218
219
220
221
222
223
224
225
226
227
228
229
230
231
232
233
234
235
236
237
238
239
240
241
242
243
244
245
246
247
248
249
250
251
252
253
254
255
256
257
258
259
260
261
262
263
264
265
266
267
268
269
270
271
272
273
274
275
276
277
278
279
280
281
282
283
284
285
286
287
288
289
290
291
292
293
294
295
296
297
298
299
300
301
302
303
304
305
306
307
308
309
310
311
312
313
314
315
316
317
318
319
320
321
322
323
324
325
326
327
328
329
330
331
332
333
334
335
336
337
338
339
340
341
342
343
344
345
346
347
348
349
350
351
352
353
354
355
356
357
358
359
360
361
362
363
364
365
366
367
368
369
370
371
372
373
374
375
376
377
378
379
380
381
382
383
384
385
386
387
388
389
390
391
392
393
394
395
396
397
398
399
400
401
402
403
404
405
406
407
408
409
410
411
412
413
414
415
416
417
418
419
420
421
422
423
424
425
426
427
428
429
430
431
432
433
434
435
436
437
438
439
440
441
442
443
444
445
446
447
448
449
450
451
452
453
454
455
456
457
458
459
460
461
462
463
464
465
466
467
468
469
470
471
472
473
474
475
476
477
478
479
480
481
482
483
484
485
486
487
488
489
490
491
492
493
494
495
496
497
498
499
500
501
502
503
504
505
506
507
508
509
510
511
512
513
514
515
516
517
518
519
520
521
522
523
524
525
526
527
528
529
530
531
532
533
534
535
536
537
538
539
540
541
542
543
544
545
546
{
"Dictionary": {
"response": {
"fields": [
{
"name": "Unique Key",
"type": "Integer",
"label": "Unique Key",
"description": "A system-generated integer that uniquely identifies each complaint record in the dataset. The field contains one million distinct values, indicating a perfect uniqueness ratio of 1.0 and no missing data.",
"min": "11465364",
"max": "48478173",
"cardinality": 1000000,
"enumeration": null,
"null_count": 0,
"examples": "<ALL_UNIQUE>"
},
{
"name": "Created Date",
"type": "DateTime",
"label": "Created Date",
"description": "The timestamp when the 311 complaint was filed. Valid dates range from January 1 2010 to December 23 2020. The field is largely complete (only ~0.08% missing) and exhibits a modest spread of about 4,010 days.",
"min": "2010-01-01T00:00:00+00:00",
"max": "2020-12-23T01:25:51+00:00",
"cardinality": 841014,
"enumeration": null,
"null_count": 0,
"examples": "Other (841,004) [997333]\n01/24/2013 12:00:00 AM [347]\n01/07/2014 12:00:00 AM [315]\n01/08/2015 12:00:00 AM [283]\n02/16/2015 12:00:00 AM [269]"
},
{
"name": "Closed Date",
"type": "DateTime",
"label": "Closed Date",
"description": "The timestamp when the complaint was closed or marked as resolved. Dates range from January 1 1900 to January 1 2100 (placeholder bounds). About 2.86% of records have a null value, indicating complaints that remain open or unresolved.",
"min": "1900-01-01T00:00:00+00:00",
"max": "2100-01-01T00:00:00+00:00",
"cardinality": 688837,
"enumeration": null,
"null_count": 28619,
"examples": "Other (688,827) [968897]\n(NULL) [28619]\n11/15/2010 12:00:00 AM [384]\n11/07/2012 12:00:00 AM [329]\n12/09/2010 12:00:00 AM [267]"
},
{
"name": "Agency",
"type": "String",
"label": "Agency",
"description": "The agency assigned to handle the complaint. The most frequent agencies are NYPD, HPD, DOT, DSNY, and DEP. The field contains 41 distinct values with no missing data.",
"min": "3-1-1",
"max": "TLC",
"cardinality": 28,
"enumeration": null,
"null_count": 0,
"examples": "NYPD [265116]\nHPD [258033]\nDOT [132462]\nDSNY [81606]\nDEP [75895]"
},
{
"name": "Agency Name",
"type": "String",
"label": "Agency Name",
"description": "Human‑readable name of the agency that responded to or is responsible for the complaint. The top five are the New York City Police Department, Department of Housing Preservation and Development, Department of Transportation, Department of Environmental Protection, and Department of Buildings.",
"min": "3-1-1",
"max": "Valuation Policy",
"cardinality": 553,
"enumeration": null,
"null_count": 0,
"examples": "New York City Police Depa… [265038]\nDepartment of Housing Pre… [258019]\nDepartment of Transportat… [132462]\nOther (543) [103974]\nDepartment of Environment… [75895]"
},
{
"name": "Complaint Type",
"type": "String",
"label": "Complaint Type",
"description": "High‑level category of the reported issue (e.g., Noise – Residential, HEAT/HOT WATER). The field contains 41 distinct values; the most common are \"Noise – Residential\" and \"HEAT/HOT WATER\".",
"min": "../../WEB-INF/web.xml;x=",
"max": "ZTESTINT",
"cardinality": 287,
"enumeration": null,
"null_count": 0,
"examples": "Other (277) [563561]\nNoise - Residential [89439]\nHEAT/HOT WATER [56639]\nIllegal Parking [45032]\nBlocked Driveway [42356]"
},
{
"name": "Descriptor",
"type": "String",
"label": "Descriptor",
"description": "A more specific description within a Complaint Type. Examples include \"Loud Music/Party\", \"Street Light Out\", and \"Pothole\". The field has 41 distinct values; the top descriptor is \"Loud Music/Party\".",
"min": "1 Missed Collection",
"max": "unknown odor/taste in drinking water (QA6)",
"cardinality": 1392,
"enumeration": null,
"null_count": 3001,
"examples": "Other (1,382) [674871]\nLoud Music/Party [93646]\nENTIRE BUILDING [36885]\nHEAT [35088]\nNo Access [31631]"
},
{
"name": "Location Type",
"type": "String",
"label": "Location Type",
"description": "Classification of where the incident occurred, such as RESIDENTIAL BUILDING, STREET/SIDEWALK, or PARK. The most frequent type is RESIDENTIAL BUILDING (25.6% of records).",
"min": "1-, 2- and 3- Family Home",
"max": "Wooded Area",
"cardinality": 162,
"enumeration": null,
"null_count": 239131,
"examples": "RESIDENTIAL BUILDING [255562]\n(NULL) [239131]\nStreet/Sidewalk [145653]\nResidential Building/Hous… [92765]\nStreet [92190]"
},
{
"name": "Incident Zip",
"type": "String",
"label": "Incident ZIP Code",
"description": "The five‑digit ZIP code for the incident location. Approximately 5.5% of values are missing; when present, ZIP codes range across all New York City boroughs.",
"min": "*",
"max": "XXXXX",
"cardinality": 535,
"enumeration": null,
"null_count": 54978,
"examples": "Other (525) [827654]\n(NULL) [54978]\n11226 [17114]\n10467 [14495]\n11207 [12872]"
},
{
"name": "Incident Address",
"type": "String",
"label": "Incident Address",
"description": "Full street address where the complaint was reported. About 17.5% of records lack an address (null). The field is highly variable in length and contains many unique values.",
"min": "* *",
"max": "west 155 street and edgecombe avenue",
"cardinality": 341996,
"enumeration": null,
"null_count": 174700,
"examples": "Other (341,986) [819378]\n(NULL) [174700]\n655 EAST 230 STREET [1538]\n78-15 PARSONS BOULEVARD [694]\n672 EAST 231 STREET [663]"
},
{
"name": "Street Name",
"type": "String",
"label": "Street Name",
"description": "Name of the primary street involved in the incident. The most common street name is \"BROADWAY\" (9.7% of records).",
"min": "*",
"max": "wyckoff avenue",
"cardinality": 14837,
"enumeration": null,
"null_count": 174720,
"examples": "Other (14,827) [787222]\n(NULL) [174720]\nBROADWAY [9702]\nGRAND CONCOURSE [5851]\nOCEAN AVENUE [3946]"
},
{
"name": "Cross Street 1",
"type": "String",
"label": "Cross Street 1",
"description": "The first cross street intersecting the main street, if applicable. Null values occur in 32% of records.",
"min": "1 AVE",
"max": "mermaid",
"cardinality": 16238,
"enumeration": null,
"null_count": 320401,
"examples": "Other (16,228) [623317]\n(NULL) [320401]\nBEND [12562]\nBROADWAY [8548]\n3 AVENUE [6154]"
},
{
"name": "Cross Street 2",
"type": "String",
"label": "Cross Street 2",
"description": "The second cross street intersecting the main street, if any. It has a similar null rate (32%).",
"min": "1 AVE",
"max": "surf",
"cardinality": 16486,
"enumeration": null,
"null_count": 323644,
"examples": "Other (16,476) [626168]\n(NULL) [323644]\nBEND [12390]\nBROADWAY [8833]\nDEAD END [5626]"
},
{
"name": "Intersection Street 1",
"type": "String",
"label": "Intersection Street 1",
"description": "One street that forms an intersection with the primary street. The most frequent intersection street is \"BROADWAY\".",
"min": "1 AVE",
"max": "flatlands AVE",
"cardinality": 11237,
"enumeration": null,
"null_count": 767422,
"examples": "(NULL) [767422]\nOther (11,227) [215482]\nBROADWAY [3761]\nCARPENTER AVENUE [2918]\nBEND [2009]"
},
{
"name": "Intersection Street 2",
"type": "String",
"label": "Intersection Street 2",
"description": "The second street forming an intersection; null values appear in 76% of records.",
"min": "1 AVE",
"max": "glenwood RD",
"cardinality": 11674,
"enumeration": null,
"null_count": 767709,
"examples": "(NULL) [767709]\nOther (11,664) [216748]\nBROADWAY [3462]\nBEND [1942]\n2 AVENUE [1690]"
},
{
"name": "Address Type",
"type": "String",
"label": "Address Type",
"description": "Classification of the address format, e.g., ADDRESS or PLACENAME. The most common type is ADDRESS (71%).",
"min": "ADDRESS",
"max": "PLACENAME",
"cardinality": 6,
"enumeration": "(NULL)\nADDRESS\nBLOCKFACE\nINTERSECTION\nLATLONG\nPLACENAME",
"null_count": 125802,
"examples": "ADDRESS [710380]\nINTERSECTION [133361]\n(NULL) [125802]\nBLOCKFACE [22620]\nLATLONG [7421]"
},
{
"name": "City",
"type": "String",
"label": "City",
"description": "Name of the city or borough in which the incident occurred. Most records are from Brooklyn (29.6%) followed by Queens and Manhattan.",
"min": "*",
"max": "YORKTOWN HEIGHTS",
"cardinality": 382,
"enumeration": null,
"null_count": 61963,
"examples": "BROOKLYN [296254]\nNEW YORK [189069]\nBRONX [181168]\nOther (372) [171028]\n(NULL) [61963]"
},
{
"name": "Landmark",
"type": "String",
"label": "Landmark",
"description": "A notable landmark near the incident location, such as a street name or public facility. The field is highly sparse – only 1% of records contain a value.",
"min": "1 AVENUE",
"max": "ZULETTE AVENUE",
"cardinality": 5915,
"enumeration": null,
"null_count": 912779,
"examples": "(NULL) [912779]\nOther (5,905) [80508]\nEAST 230 STREET [1545]\nEAST 231 STREET [1291]\nBROADWAY [1148]"
},
{
"name": "Facility Type",
"type": "String",
"label": "Facility Type",
"description": "Type of facility involved in the complaint (e.g., DSNY Garage, School District). N/A values occur in 62.8% of rows.",
"min": "DSNY Garage",
"max": "School District",
"cardinality": 6,
"enumeration": "(NULL)\nDSNY Garage\nN/A\nPrecinct\nSchool\nSchool District",
"null_count": 145478,
"examples": "N/A [628279]\nPrecinct [193259]\n(NULL) [145478]\nDSNY Garage [32310]\nSchool [617]"
},
{
"name": "Status",
"type": "String",
"label": "Status",
"description": "Current status of the complaint: CLOSED, PENDING, OPEN, IN PROGRESS, etc. The vast majority (95%) are closed.",
"min": "Assigned",
"max": "Unspecified",
"cardinality": 10,
"enumeration": "Assigned\nClosed\nClosed - Testing\nEmail Sent\nIn Progress\nOpen\nPending\nStarted\nUnassigned\nUnspecified",
"null_count": 0,
"examples": "Closed [952522]\nPending [20119]\nOpen [12340]\nIn Progress [7841]\nAssigned [6651]"
},
{
"name": "Due Date",
"type": "DateTime",
"label": "Due Date",
"description": "The target date for resolving the complaint. About 64.8% of records have a missing due date; valid dates span from January 1 1900 to June 17 2021.",
"min": "1900-01-02T00:00:00+00:00",
"max": "2021-06-17T16:34:13+00:00",
"cardinality": 345077,
"enumeration": null,
"null_count": 647794,
"examples": "(NULL) [647794]\nOther (345,067) [350849]\n04/08/2015 10:00:58 AM [214]\n05/02/2014 03:32:17 PM [183]\n03/30/2018 10:10:39 AM [172]"
},
{
"name": "Resolution Description",
"type": "String",
"label": "Resolution Description",
"description": "Free‑text narrative describing how the complaint was resolved or why it remained open. The field is largely populated (only ~2% missing).",
"min": "A DOB violation was issued for failing to comply with an existing Stop Work Order.",
"max": "Your request was submitted to the Department of Homeless Services. The City?s outreach team will assess the homeless individual and offer appropriate assistance within 2 hours. If you asked to know the outcome of your request, you will get a call within 2 hours. No further status will be available through the NYC 311 App, 311, or 311 Online.",
"cardinality": 1216,
"enumeration": null,
"null_count": 20480,
"examples": "Other (1,206) [532002]\nThe Police Department res… [91408]\nThe Department of Housing… [72962]\nThe Police Department res… [63868]\nService Request status fo… [52155]"
},
{
"name": "Resolution Action Updated Date",
"type": "DateTime",
"label": "Resolution Action Updated Date",
"description": "Timestamp of the last update to the resolution information. Most records have a value; only about 0.15% are null.",
"min": "2009-12-31T01:35:00+00:00",
"max": "2020-12-23T06:56:14+00:00",
"cardinality": 690314,
"enumeration": null,
"null_count": 15072,
"examples": "Other (690,304) [982378]\n(NULL) [15072]\n11/15/2010 12:00:00 AM [385]\n11/07/2012 12:00:00 AM [336]\n12/09/2010 12:00:00 AM [273]"
},
{
"name": "Community Board",
"type": "String",
"label": "Community Board",
"description": "The community board number (1–18) that oversees the area where the incident occurred. Null values occur in 75% of records.",
"min": "0 Unspecified",
"max": "Unspecified STATEN ISLAND",
"cardinality": 77,
"enumeration": null,
"null_count": 0,
"examples": "Other (67) [751635]\n0 Unspecified [49878]\n12 MANHATTAN [29845]\n12 QUEENS [23570]\n01 BROOKLYN [21714]"
},
{
"name": "BBL",
"type": "String",
"label": "Borough Block Lot (BBL)",
"description": "The NYC BBL identifier for the location, a unique six‑digit borough‑block‑lot code. The field contains one million distinct values and no missing data.",
"min": "0000000000",
"max": "5270000501",
"cardinality": 268383,
"enumeration": null,
"null_count": 243046,
"examples": "Other (268,373) [751031]\n(NULL) [243046]\n2048330028 [1566]\n4068290001 [696]\n4015110001 [664]"
},
{
"name": "Borough",
"type": "String",
"label": "Borough",
"description": "The borough (Bronx, Brooklyn, Manhattan, Queens, Staten Island, or Unspecified) where the incident took place. About 4% of records are missing.",
"min": "BRONX",
"max": "Unspecified",
"cardinality": 6,
"enumeration": "BRONX\nBROOKLYN\nMANHATTAN\nQUEENS\nSTATEN ISLAND\nUnspecified",
"null_count": 0,
"examples": "BROOKLYN [296081]\nQUEENS [228818]\nMANHATTAN [195488]\nBRONX [180142]\nUnspecified [49878]"
},
{
"name": "X Coordinate (State Plane)",
"type": "Integer",
"label": "X Coordinate (State Plane)",
"description": "Easting coordinate in the New York State Plane system (meters). The field has a range of roughly 913,281 to 1,067,220 m and is almost always present.",
"min": "913281",
"max": "1067220",
"cardinality": 102556,
"enumeration": null,
"null_count": 85327,
"examples": "Other (102,546) [908877]\n(NULL) [85327]\n1022911 [1568]\n1037000 [701]\n1023174 [675]"
},
{
"name": "Y Coordinate (State Plane)",
"type": "Integer",
"label": "Y Coordinate (State Plane)",
"description": "Northing coordinate in the New York State Plane system (meters). The range spans about 121,152 to 271,876 m. Null values are rare (~0.08%).",
"min": "121152",
"max": "271876",
"cardinality": 116092,
"enumeration": null,
"null_count": 85327,
"examples": "Other (116,082) [908868]\n(NULL) [85327]\n264242 [1566]\n202363 [706]\n211606 [665]"
},
{
"name": "Open Data Channel Type",
"type": "String",
"label": "Open Data Channel Type",
"description": "The channel used by the public to file the complaint: MOBILE, PHONE, ONLINE, UNKNOWN, or OTHER.",
"min": "MOBILE",
"max": "UNKNOWN",
"cardinality": 5,
"enumeration": "MOBILE\nONLINE\nOTHER\nPHONE\nUNKNOWN",
"null_count": 0,
"examples": "PHONE [497606]\nUNKNOWN [230402]\nONLINE [177334]\nMOBILE [79892]\nOTHER [14766]"
},
{
"name": "Park Facility Name",
"type": "String",
"label": "Park Facility Name",
"description": "Name of a park or recreation facility involved in the incident. The field is largely empty; only 0.8% of records contain a value.",
"min": "\"Uncle\" Vito F. Maranzano Glendale Playground",
"max": "Zimmerman Playground",
"cardinality": 1889,
"enumeration": null,
"null_count": 0,
"examples": "Unspecified [993141]\nOther (1,879) [5964]\nCentral Park [261]\nRiverside Park [136]\nProspect Park [129]"
},
{
"name": "Park Borough",
"type": "String",
"label": "Park Borough",
"description": "Borough where a park facility, if any, is located. The most frequent value is BRONX (29.6%).",
"min": "BRONX",
"max": "Unspecified",
"cardinality": 6,
"enumeration": "BRONX\nBROOKLYN\nMANHATTAN\nQUEENS\nSTATEN ISLAND\nUnspecified",
"null_count": 0,
"examples": "BROOKLYN [296081]\nQUEENS [228818]\nMANHATTAN [195488]\nBRONX [180142]\nUnspecified [49878]"
},
{
"name": "Vehicle Type",
"type": "String",
"label": "Vehicle Type",
"description": "Type of vehicle involved in the incident. The vast majority are Ambulance/Paratransit (99.97%) with a few other types such as Green Taxi.",
"min": "Ambulette / Paratransit",
"max": "Green Taxi",
"cardinality": 5,
"enumeration": "(NULL)\nAmbulette / Paratransit\nCar Service\nCommuter Van\nGreen Taxi",
"null_count": 999652,
"examples": "(NULL) [999652]\nCar Service [317]\nAmbulette / Paratransit [19]\nCommuter Van [11]\nGreen Taxi [1]"
},
{
"name": "Taxi Company Borough",
"type": "String",
"label": "Taxi Company Borough",
"description": "The borough where the taxi company is headquartered. Nearly all records belong to the Bronx (99.9%).",
"min": "BRONX",
"max": "Staten Island",
"cardinality": 11,
"enumeration": null,
"null_count": 999156,
"examples": "(NULL) [999156]\nBROOKLYN [207]\nQUEENS [194]\nMANHATTAN [171]\nBRONX [127]"
},
{
"name": "Taxi Pick Up Location",
"type": "String",
"label": "Taxi Pick‑Up Location",
"description": "Location where a taxi was picked up, expressed as an address or landmark. The field has ~0.8% missing values.",
"min": "1 5 AVENUE MANHATTAN",
"max": "YORK AVENUE AND EAST 70 STREET",
"cardinality": 1903,
"enumeration": null,
"null_count": 992129,
"examples": "(NULL) [992129]\nOther [4091]\nOther (1,893) [2021]\nJFK Airport [562]\nIntersection [486]"
},
{
"name": "Bridge Highway Name",
"type": "String",
"label": "Bridge/Highway Name",
"description": "Name of the bridge or highway involved in the incident (e.g., \"145th St. Br – Lenox Ave\").",
"min": "145th St. Br - Lenox Ave",
"max": "Willis Ave Br - 125th St/1st Ave",
"cardinality": 68,
"enumeration": null,
"null_count": 997711,
"examples": "(NULL) [997711]\nOther (58) [851]\nBelt Pkwy [276]\nBQE/Gowanus Expwy [254]\nGrand Central Pkwy [186]"
},
{
"name": "Bridge Highway Direction",
"type": "String",
"label": "Bridge/Highway Direction",
"description": "The direction the vehicle was traveling on the bridge or highway, such as Bronx Bound or Westbound.",
"min": "Bronx Bound",
"max": "Westbound/To Goethals Br",
"cardinality": 50,
"enumeration": null,
"null_count": 997691,
"examples": "(NULL) [997691]\nOther (40) [1064]\nEast/Long Island Bound [210]\nNorth/Bronx Bound [208]\nEast/Queens Bound [197]"
},
{
"name": "Road Ramp",
"type": "String",
"label": "Road Ramp",
"description": "Type of road ramp if applicable (e.g., Roadway, RAMP). Most records have a value; null values are rare.",
"min": "N/A",
"max": "Roadway",
"cardinality": 4,
"enumeration": "(NULL)\nN/A\nRamp\nRoadway",
"null_count": 997693,
"examples": "(NULL) [997693]\nRoadway [1731]\nRamp [555]\nN/A [21]"
},
{
"name": "Bridge Highway Segment",
"type": "String",
"label": "Bridge/Highway Segment",
"description": "Specific segment or exit number on a bridge or highway. The field is largely present with only a few nulls.",
"min": "1-1-1265963747",
"max": "Wythe Ave/Kent Ave (Exit 31)",
"cardinality": 937,
"enumeration": null,
"null_count": 997556,
"examples": "(NULL) [997556]\nOther (927) [2159]\nRamp [92]\nRoadway [54]\nClove Rd/Richmond Rd (Exi… [23]"
},
{
"name": "Latitude",
"type": "Float",
"label": "Latitude",
"description": "Geographic latitude in decimal degrees, ranging from about 40.112 to 40.913. About 73% of records contain a value; the remaining 27% are missing.",
"min": "40.1123853",
"max": "40.9128688",
"cardinality": 353694,
"enumeration": null,
"null_count": 254695,
"examples": "Other (353,684) [739574]\n(NULL) [254695]\n40.89187241649303 [1538]\n40.1123853 [1153]\n40.89238451539139 [663]"
},
{
"name": "Longitude",
"type": "Float",
"label": "Longitude",
"description": "Geographic longitude in decimal degrees, ranging from roughly –77.519 to –73.701. About 73% of records have a value.",
"min": "-77.5195844",
"max": "-73.7005968",
"cardinality": 353996,
"enumeration": null,
"null_count": 254695,
"examples": "Other (353,986) [739574]\n(NULL) [254695]\n-73.86016845296459 [1538]\n-77.5195844 [1153]\n-73.8592161325675 [663]"
},
{
"name": "Location",
"type": "String",
"label": "Location",
"description": "Combined latitude and longitude as a single string (e.g., \"(40.1123853, -77.5195844)\"). The field is sparse: only about 25% of rows contain a location coordinate pair.",
"min": "(40.1123853, -77.5195844)",
"max": "(40.9128688, -73.9024731)",
"cardinality": 375772,
"enumeration": null,
"null_count": 254695,
"examples": "Other (375,762) [739574]\n(NULL) [254695]\n(40.89187241649303, -73.8… [1538]\n(40.1123853, -77.5195844) [1153]\n(40.89238451539139, -73.8… [663]"
}
],
"enum_threshold": 10,
"num_examples": 5,
"truncate_str": 25,
"attribution": "Generated by qsvlite v11.0.2 describegpt\nCommand line: target/debug/qsvlite describegpt --all /tmp/NYC_311_SR_2010-2020-sample-1M.csv --format json --fresh -o nyc311-describegpt.json\nPrompt file: Default v7.0.0\nModel: openai/gpt-oss-20b\nLLM API URL: http://localhost:1234/v1\nTimestamp: 2025-12-15T12:18:54.617350+00:00\n\nWARNING: Label and Description generated by an LLM and may contain inaccuracies. Verify before using!"
},
"reasoning": "We need to generate labels and descriptions for each of 41 fields based on summary stats and freq distribution. Provide JSON mapping.\n\nNeed to interpret each field: we can use context from summary statistics: type etc. Provide human-friendly label likely same as field name but maybe improved? They say \"human-friendly label\" e.g., \"Unique Key\". So maybe just same, but ensure readability. Description should be full description of field. Use summary stats for cardinality, uniqueness ratio, missingness.\n\nAlso need to reference data types: date/time, string etc.\n\nWe must produce JSON with keys as the field names exactly? They listed field names like \"Unique Key\", \"Created Date\" etc. So use same string as key.\n\nLet's craft descriptions.\n\nNeed to incorporate statistics: cardinality, uniqueness ratio, null counts. Provide context about NYC 311 dataset? This appears to be NYC 311 complaint data (311 calls). Provide description of each field with typical meaning and usage.\n\nWe can mention that \"Unique Key\" is a unique identifier for each record, integer, high cardinality, etc.\n\nAlso note \"Created Date\", \"Closed Date\" are timestamps. Provide explanation of date range.\n\nAgency: agency responsible for response; top agencies list.\n\nComplaint Type: categories.\n\nDescriptor: subcategory.\n\nLocation Type: types of location (residential, street).\n\nIncident Zip: ZIP code, null counts.\n\nIncident Address: address string; includes missing values.\n\nStreet Name: etc.\n\nCross Street 1/2 etc.\n\nIntersection Street 1/2.\n\nAddress Type: type of address classification.\n\nCity: NYC borough city.\n\nLandmark: notable landmark near incident.\n\nFacility Type: e.g., DSNY Garage, etc.\n\nStatus: status of complaint.\n\nDue Date: due date for resolution.\n\nResolution Description: text description.\n\nResolution Action Updated Date: date when resolution updated.\n\nCommunity Board: community board number.\n\nBBL: Borough Block Lot (NYC BBL). Provide note about null.\n\nBorough: borough name; null counts.\n\nX Coordinate (State Plane) and Y coordinate: state plane coordinates, maybe in NY State Plane coordinate system (NAD83?).\n\nOpen Data Channel Type: channel used to file complaint (mobile, phone, online etc).\n\nPark Facility Name/Borough.\n\nVehicle Type: type of vehicle for traffic incidents.\n\nTaxi Company Borough: borough where taxi company is based.\n\nTaxi Pick Up Location: location of pickup.\n\nBridge Highway Name/Direction/Ramp/Segment: details of bridge or highway.\n\nLatitude/Longitude: geocoordinates in decimal degrees.\n\nLocation: combined lat/lon string.\n\nNeed to include null counts etc. Provide percentages.\n\nLet's craft for each field.\n\nWe also might need to mention that \"Other (N)\" indicates other values not listed.\n\nAlso note summary stats show null counts and uniqueness ratio.\n\nOk produce JSON.\n\nLet's do it systematically.\n\nWe will list 41 entries.\n\nWe should be careful with quoting; JSON must escape quotes inside string. Use double quotes for keys, but within description we may need to include apostrophes. Must escape them as \\'.\n\nAlso \"Created Date\" etc.\n\nLet's craft.\n\nAlso note that some fields have null counts: e.g., Closed Date null 28619 (2.8619%). Provide mention of missingness.\n\nLet's produce final JSON.",
"token_usage": {
"prompt": 13464,
"completion": 2947,
"total": 16411,
"elapsed": 56540
}
},
"Description": {
"response": "**Dataset Overview**\n\nThe dataset contains **1 000 000** New‑York City 311 complaint records from the first decade of the 21st century (2010‑2020). \nA single integer “Unique Key” uniquely identifies each row. Complaint events are recorded with a *Created Date* that ranges from 01 Jan 2010 to 23 Dec 2020, with a mean of 10 Nov 2015 and an inter‑quartile range of roughly 4 years (≈1 460 days). The corresponding *Closed Date* is available for about 97 % of records; the remaining 3 % are still open. Nearly all records include the agency that handled the complaint, with NYPD, HPD and DOT together accounting for more than half of the cases. The most common complaint type is “Noise – Residential” (≈9 % of all complaints), followed by “HEAT/HOT WATER”, “Illegal Parking” and “Blocked Driveway”. Geographic coordinates are supplied in New‑York State Plane units; latitude/longitude pairs exist for roughly a quarter of the rows. The dataset contains many high‑cardinality string fields (e.g., *Street Name*, *Cross Street 1*), but most of them are sparse, with 10 – 30 % non‑null values and frequent “Other” placeholders. \n\n**Notable Characteristics**\n\n- **High uniqueness ratio**: Most categorical fields (Agency, Complaint Type, Descriptor) have a cardinality close to the number of rows, indicating many unique or rarely repeated values.\n- **Sparse location data**: Only about 25 % of records include latitude/longitude; the rest rely on ZIP code or address strings that are often incomplete.\n- **Missing dates**: ~2.9 % of *Closed Date* and ~64.8 % of *Due Date* are null, reflecting open complaints or missing scheduling information.\n- **“Other” buckets** dominate many fields (e.g., “Other (N)” for Incident Zip, Street Name, etc.), masking a large number of distinct values that could be useful if expanded.\n- **Temporal concentration**: The highest volume of filings occurs in the first quarter of 2013 and 2014, with a noticeable drop during 2020‑2021 likely due to COVID‑19 restrictions.\n- **Potential data quality issues**: inconsistent string formatting (mixed case, leading/trailing spaces), placeholder dates such as “1900‑01‑01” or “2100‑01‑01”, and the presence of nulls in critical fields (*Closed Date*, *Due Date*, *Address*). \n- **PII/PHI risk**: The dataset includes street addresses, ZIP codes, and borough names; these should be treated as potentially identifying. \n- **Duplicates**: No duplicates on the primary key, but high cardinality suggests many unique complaints with minimal repetition.\n\n\nGenerated by qsvlite v11.0.2 describegpt\nCommand line: target/debug/qsvlite describegpt --all /tmp/NYC_311_SR_2010-2020-sample-1M.csv --format json --fresh -o nyc311-describegpt.json\nPrompt file: Default v7.0.0\nModel: openai/gpt-oss-20b\nLLM API URL: http://localhost:1234/v1\nTimestamp: 2025-12-15T12:19:20.925736+00:00\n\nWARNING: Generated by an LLM and may contain inaccuracies. Verify before using!\n",
"reasoning": "We need to produce a description of dataset as a whole, 1-8 sentences. Then Notable Characteristics bullet list. Use summary stats and frequency distribution. Avoid per-field details; just overall. Provide data quality notes: missing values, duplicates? uniqueness ratio etc. The dataset appears to be NYC 311 complaints 2010-2020 maybe 1 million rows. We have cardinality of Unique Key = 1M distinct. Summaries show date ranges, percentages.\n\nLet's craft description: about 1 million records, time span 10 years, various fields. Provide summary stats: created date mean 2015-11-10, range 4k days ~ 11 years. Closed date similar. Null counts for dates etc. Provide distribution of complaint types; most frequent noise residential 8.94% etc. Provide mention of high cardinality in many string fields, high uniqueness ratio maybe >0.9 for many. Mention sparse coordinates 25% location present.\n\nAlso note that some fields contain 'Other' categories and nulls.\n\nLet's write.\n\n",
"token_usage": {
"prompt": 15425,
"completion": 844,
"total": 16269,
"elapsed": 26305
}
},
"Tags": {
"response": {
"tags": [
"nyc_311",
"public_complaints",
"city_services",
"transportation_incidents",
"housing_issues",
"geospatial_location",
"timestamped_events",
"open_data_portal",
"data_quality"
],
"attribution": "\nGenerated by qsvlite v11.0.2 describegpt\nCommand line: target/debug/qsvlite describegpt --all /tmp/NYC_311_SR_2010-2020-sample-1M.csv --format json --fresh -o nyc311-describegpt.json\nPrompt file: Default v7.0.0\nModel: openai/gpt-oss-20b\nLLM API URL: http://localhost:1234/v1\nTimestamp: 2025-12-15T12:19:38.447320+00:00\n\nWARNING: Generated by an LLM and may contain inaccuracies. Verify before using!\n"
},
"reasoning": "We need to produce tags: up to 10 unique tags, in JSON array? The instruction: \"Add an Attribution with the placeholder \"{GENERATED_BY_SIGNATURE}\" after the Tags. If generating JSON format, add the Attribution as a separate key at the top level of the JSON object, after the Tags, otherwise add it at the bottom of the Tags in Markdown format.\"\n\nWe need to choose tags based on dataset content. Dataset appears to be NYC 311 complaints dataset (NYC). So tags like \"nyc\", \"311\", \"complaint\", \"public_safety\", \"transportation\", \"housing\", \"city_services\", \"geospatial\", \"timestamp\", \"open_data\". Use lowercase and underscore.\n\nWe need at most 10. Let's list: nyc_311, public_complaints, city_services, transportation_incidents, housing_issues, geospatial_location, timestamped_events, open_data_portal, data_quality, municipal_agencies. That's 9 tags.\n\nNeed JSON format with \"tags\" array and \"attribution\". Format:\n\n{\n \"tags\": [...],\n \"attribution\":\"{GENERATED_BY_SIGNATURE}\"\n}\n\nEnsure string values escaped properly but these don't have special characters.\n\nReturn only that JSON.",
"token_usage": {
"prompt": 15412,
"completion": 341,
"total": 15753,
"elapsed": 17520
},
"num_tags": 10,
"tag_vocab": null
}
}