Skip to content

[http] sanitise URL options before use them#22186

Open
linev wants to merge 4 commits intoroot-project:masterfrom
linev:http_sanity_arg
Open

[http] sanitise URL options before use them#22186
linev wants to merge 4 commits intoroot-project:masterfrom
linev:http_sanity_arg

Conversation

@linev
Copy link
Copy Markdown
Member

@linev linev commented May 8, 2026

URL syntax allow to provide different special symbols using %12%20 symbols.
Also some situations un-escaped quotes can make a problems.
Therefore cleanup string from such cases before use it.

@linev linev requested review from dpiparo and jblomer May 8, 2026 09:16
@linev linev self-assigned this May 8, 2026
@github-actions
Copy link
Copy Markdown

github-actions Bot commented May 8, 2026

Test Results

    22 files      22 suites   3d 12h 42m 0s ⏱️
 3 850 tests  3 850 ✅ 0 💤 0 ❌
76 031 runs  76 031 ✅ 0 💤 0 ❌

Results for commit cc20b42.

♻️ This comment has been updated with latest results.

@dpiparo dpiparo changed the title [http] sanity URL options before use them [http] sanitise URL options before use them May 8, 2026
val = sval.Data();
}

Bool_t sanitize = kFALSE;
Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think it would be useful to have a comment here explaining when/why we do not sanitize.

Copy link
Copy Markdown
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

When processing POST data special value created directly in the code - including pointer casting.
Such values should not be touched.
Value also not sanitized when default value from method arguments list is extracted.
In other cases value will be checked

Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thanks! This as a code comment would be helpful.

Copy link
Copy Markdown
Contributor

@silverweed silverweed left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

In general I don't think we can really trust all this code to produce properly "sanitized" input. Sanitizers are notoriously hard to get right and I see many suspicious places in the Sniffer code.

So if these checks are in place as a measure against accidental errors, they may be useful, but I wouldn't consider them a measure against actual attacks.

Regardless, I put a couple of more specific comments.

Also, we definitely need tests for this.

if (!value || !*value)
return "";

TString res = value;
Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

How long can value be? Since we're trying to sanitize the input, shouldn't we also put a reasonable max length?

Copy link
Copy Markdown
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

There is no such limit and I cannot make here any assumption.
civetweb cuts URL length by 16K.

Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The backend is not guaranteed to be civetweb in principle, is it?
We should put an artificial limit here so we don't have to worry about huge allocations/reallocations/copies/iterations and we could in principle even preallocate the output string once and do a single pass over the string to validate it.

@@ -1303,13 +1303,40 @@ TString TRootSniffer::DecodeUrlOptionValue(const char *value, Bool_t remove_quot
res.ReplaceAll("%5D", "]");
res.ReplaceAll("%3D", "=");
Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This goes over the string 8 times, whereas we could do all of this in 1 pass...(this is in addition to the TString constructor, which also does a full scan of the string to determine the length).
Depending on the max length we impose for the string and whether denial of service is a concern, we might want to fix this.

Copy link
Copy Markdown
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Deny of access or deny of service is already problematic with pure civetweb.
It is not designed to protect from such attacks - it should be done in front of it.

Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

It's still a very suboptimal way to do this validation that scales superlinearly with the arguments' length. But if this is not a concern for this class, fine for me.

clean.Append('\\');
clean.Append(c);
} else if (c > 31)
// ignore all special symbols
Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think c == 127 is a "special" symbol (DEL)

Copy link
Copy Markdown
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Probably I just will use !std::iscntrl(c)

Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I'm not sure what this excludes exactly (does it exclude all ASCII non-printable characters? Does it depend on the locale? The man page makes it look like the case), so I was actually more comfortable with the explicit check.

Comment thread net/httpsniff/inc/TRootSnifferFull.h Outdated

void CreateMemFile();

TString SanitzeArgument(Bool_t isstr, const char *value, Bool_t onlyquotes = kFALSE);
Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

typo: Sanitize

Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Also this should probably be static


TString TRootSnifferFull::SanitzeArgument(Bool_t isstr, const char *value, Bool_t onlyquotes)
{
Int_t beg = 0, end = strlen(value);
Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Is value guaranteed to have a reasonable max length here? Otherwise strlen is potentially dangerous.

Copy link
Copy Markdown
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Value created from URL and will not be longer than 16K

Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Same comment as above: we should put our own limit as well

if (!isstr && !onlyquotes) {
// for numeric types make very strict check
if (std::isalnum(c) || std::strchr(".:+-", c))
sanitized.Append(c);
Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This will not necessarily produce a number, so I'm not sure what's the goal of this sanitization (what if the string is -4+3.2-245+++++67?)

Copy link
Copy Markdown
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Goal to avoid symbols sequence which can be interpreted as C/C++ code invocation like gSystem->DoSomething()

Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

You're checking for isalnum though: did you mean to check for isdigit? Otherwise I wouldn't rule out "interesting" code invocations going through custom arithmetic operators...

Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Even if every non-operator is a digit this will still cause N>=1 expressions to be evaluated, which I guess is safe, but it will still go through the interpreter to be executed before being passed to TMethodCall I think.

if (std::isalnum(c) || std::strchr(".:+-", c))
sanitized.Append(c);
} else if ((c == '\"') || (c == '\\')) {
// escape quotes inside string
Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This is quite bizarre if I understand correctly: if an argument is "ab"cdef" we will parse it as the string "ab\"cdef"?

Copy link
Copy Markdown
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

It will be delivered to C++ as

new TMethodCall(obj_cl, method_name, "ab\"cdef" );

Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Shouldn't we rather require all internal quotes to be already escaped in the argument? If the incoming argument is supposed to be a valid string literal it would not be valid if it contains unescaped quotes, which may indicate some other kind of error (on the user side or otherwise)

Copy link
Copy Markdown
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Shouldn't we rather require all internal quotes to be already escaped in the argument?

On each layer escape character can disappear.
Quotes are very special - because when they goes into C++ interpreter they can break argument list and start new command. By such transformation we want guarantee that in any case interpreter will take provided string as single argument. Actually exactly such exploit is shown in [THttpServer-4] issue.
So attempt here to produce string which not closed in between.

Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

On each layer escape character can disappear.

That sounds like a bug from the previous layers...
Regardless, it would be nice to have some tests on this and comments to make it clear why we're doing this and exactly what form of strings do we accept from the user side

Copy link
Copy Markdown
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

There are many differences how quotes escaped in URL and in C++.
Main problem - string is used again in C++ script. So one need to provide/create escape characters.

@linev linev force-pushed the http_sanity_arg branch from 3668ee2 to 83b7d75 Compare May 8, 2026 14:37
linev added 3 commits May 8, 2026 16:43
Remove any special symbols
Add escape characters for quote and escape itself

Try to avoid manipulation of arguments for method execution
Avoid special characters as draw arguments
Remove all special characters, check quotes, escape quotes inside
If expecting numeric value - just alphanumeric, comma, +, -, allowed
@linev linev force-pushed the http_sanity_arg branch from 83b7d75 to 2b46cab Compare May 8, 2026 14:44
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Projects

None yet

Development

Successfully merging this pull request may close these issues.

3 participants