Skip to content
Closed
Show file tree
Hide file tree
Changes from all commits
Commits
File filter

Filter by extension

Filter by extension

Conversations
Failed to load comments.
Loading
Jump to
Jump to file
Failed to load files.
Loading
Diff view
Diff view
12 changes: 12 additions & 0 deletions include/executor/vectorized_operator.hpp
Original file line number Diff line number Diff line change
Expand Up @@ -720,6 +720,18 @@ class VectorizedGroupByOperator : public VectorizedOperator {
[this](size_t a, size_t b) {
return hash_group_keys_[a] < hash_group_keys_[b];
});
fprintf(stderr, "DEBUG: VectorizedGroupBy sorted %zu groups\n",
hash_group_keys_.size());
for (size_t i = 0; i < sorted_indices_.size() && i < 5; ++i) {
fprintf(stderr, "DEBUG: sorted_indices_[%zu]=%zu key=", i, sorted_indices_[i]);
const auto& key = hash_group_keys_[sorted_indices_[i]];
for (const auto& k : key) {
fprintf(stderr, "%s ", k.to_string().c_str());
}
fprintf(stderr, "\n");
}
} else {
fprintf(stderr, "DEBUG: VectorizedGroupBy hash_group_keys_ is empty\n");
Comment on lines +723 to +734
Copy link
Copy Markdown

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

⚠️ Potential issue | 🟠 Major | ⚡ Quick win

🧩 Analysis chain

🏁 Script executed:

#!/bin/bash
# Verify direct stdio API usage vs includes in this file
FILE="include/executor/vectorized_operator.hpp"

echo "== Includes in $FILE =="
rg -n '^\s*`#include`\s*<.*>' "$FILE"

echo
echo "== stdio symbol usage in $FILE =="
rg -n '\bfprintf\s*\(|\bstderr\b' "$FILE"

echo
echo "Expected:"
echo "1) <cstdio> should be explicitly included in this file."
echo "2) Prefer std::fprintf after including <cstdio>."

Repository: poyrazK/cloudSQL

Length of output: 890


Add the proper C stdio header for fprintf/stderr.

include/executor/vectorized_operator.hpp uses fprintf(stderr, ...) in the VectorizedGroupBy debug path (lines 723-734) but does not include <cstdio> (or <stdio.h>), so relying on transitive includes is non-portable—add #include <cstdio> and (optionally) use std::fprintf.

🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.

In `@include/executor/vectorized_operator.hpp` around lines 723 - 734, The debug
path in VectorizedGroupBy uses fprintf(stderr, ...) (see usage around
sorted_indices_ and hash_group_keys_) but the header <cstdio> is not included;
add `#include` <cstdio> to the top of the header (or source) that defines
VectorizedGroupBy so fprintf and stderr are declared (optionally switch calls to
std::fprintf), ensuring the debug prints compile portably.

}
Comment on lines +723 to 735
Copy link
Copy Markdown

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

⚠️ Potential issue | 🟠 Major | ⚡ Quick win

Guard and sanitize this debug logging before merge.

This path now emits raw group-key values to stderr unconditionally on every GROUP BY execution. That is a production reliability/privacy risk (log noise + potential sensitive-value leakage). Please gate this behind a debug flag and keep it disabled by default.

Suggested change
-                fprintf(stderr, "DEBUG: VectorizedGroupBy sorted %zu groups\n",
-                        hash_group_keys_.size());
-                for (size_t i = 0; i < sorted_indices_.size() && i < 5; ++i) {
-                    fprintf(stderr, "DEBUG: sorted_indices_[%zu]=%zu key=", i, sorted_indices_[i]);
-                    const auto& key = hash_group_keys_[sorted_indices_[i]];
-                    for (const auto& k : key) {
-                        fprintf(stderr, "%s ", k.to_string().c_str());
-                    }
-                    fprintf(stderr, "\n");
-                }
+#ifdef CLOUDSQL_DEBUG_GROUPBY
+                std::fprintf(stderr, "DEBUG: VectorizedGroupBy sorted %zu groups\n",
+                             hash_group_keys_.size());
+                for (size_t i = 0; i < sorted_indices_.size() && i < 5; ++i) {
+                    std::fprintf(stderr, "DEBUG: sorted_indices_[%zu]=%zu key=", i,
+                                 sorted_indices_[i]);
+                    const auto& key = hash_group_keys_[sorted_indices_[i]];
+                    for (const auto& k : key) {
+                        // Consider redacting or truncating values if they may contain sensitive data.
+                        std::fprintf(stderr, "%s ", k.to_string().c_str());
+                    }
+                    std::fprintf(stderr, "\n");
+                }
+#endif
📝 Committable suggestion

‼️ IMPORTANT
Carefully review the code before committing. Ensure that it accurately replaces the highlighted code, contains no missing lines, and has no issues with indentation. Thoroughly test & benchmark the code to ensure it meets the requirements.

Suggested change
fprintf(stderr, "DEBUG: VectorizedGroupBy sorted %zu groups\n",
hash_group_keys_.size());
for (size_t i = 0; i < sorted_indices_.size() && i < 5; ++i) {
fprintf(stderr, "DEBUG: sorted_indices_[%zu]=%zu key=", i, sorted_indices_[i]);
const auto& key = hash_group_keys_[sorted_indices_[i]];
for (const auto& k : key) {
fprintf(stderr, "%s ", k.to_string().c_str());
}
fprintf(stderr, "\n");
}
} else {
fprintf(stderr, "DEBUG: VectorizedGroupBy hash_group_keys_ is empty\n");
}
`#ifdef` CLOUDSQL_DEBUG_GROUPBY
std::fprintf(stderr, "DEBUG: VectorizedGroupBy sorted %zu groups\n",
hash_group_keys_.size());
for (size_t i = 0; i < sorted_indices_.size() && i < 5; ++i) {
std::fprintf(stderr, "DEBUG: sorted_indices_[%zu]=%zu key=", i,
sorted_indices_[i]);
const auto& key = hash_group_keys_[sorted_indices_[i]];
for (const auto& k : key) {
// Consider redacting or truncating values if they may contain sensitive data.
std::fprintf(stderr, "%s ", k.to_string().c_str());
}
std::fprintf(stderr, "\n");
}
`#endif`
} else {
fprintf(stderr, "DEBUG: VectorizedGroupBy hash_group_keys_ is empty\n");
}
🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.

In `@include/executor/vectorized_operator.hpp` around lines 723 - 735, The code in
VectorizedGroupBy unconditionally prints raw group-key values using
sorted_indices_ and hash_group_keys_ to stderr; wrap that debug block behind a
configurable debug flag (e.g., a static or runtime flag like
vectorized_groupby_debug or executor_debug) and default it to false, so the
printing only occurs when the flag is enabled; additionally sanitize or redact
sensitive values when printing (e.g., replace non-alphanumeric or long values
with placeholders, and keep the limit of first N groups), and ensure the check
uses the flag before accessing sorted_indices_ and hash_group_keys_ to avoid
accidental leakage in production.

}

Expand Down
27 changes: 26 additions & 1 deletion src/executor/operator.cpp
Original file line number Diff line number Diff line change
Expand Up @@ -646,8 +646,33 @@ bool AggregateOperator::open() {
}

groups_.clear();

// Sort group keys lexicographically for deterministic GROUP BY ordering
std::vector<std::string> sorted_keys;
sorted_keys.reserve(groups_map.size());
for (auto& pair : groups_map) {
auto& state = pair.second;
sorted_keys.push_back(pair.first);
}
std::sort(sorted_keys.begin(), sorted_keys.end(),
[&groups_map](const std::string& a, const std::string& b) {
const auto& a_vals = groups_map[a].group_values;
const auto& b_vals = groups_map[b].group_values;
if (a_vals.size() != b_vals.size()) {
return a_vals.size() < b_vals.size();
}
for (size_t i = 0; i < a_vals.size(); ++i) {
if (a_vals[i] < b_vals[i]) {
return true;
}
if (b_vals[i] < a_vals[i]) {
return false;
}
}
return false;
});

for (auto& key : sorted_keys) {
auto& state = groups_map[key];
std::vector<common::Value> row = std::move(state.group_values);
for (size_t i = 0; i < aggregates_.size(); ++i) {
switch (aggregates_[i].type) {
Expand Down
Loading