Proc macros operate on tokens, including string/character/byte-string/byte literal tokens, which they can get from various sources.
- Source 1: Lexer.
This is the most reliable source: the token is passed to a macro exactly as it was written in the source code.
"C" will be passed as "C", but the same C in escaped form "\x43" will be passed as "\x43".
Proc macros can observe the difference because ToString (the only way to get the literal contents in proc macro API) also prints the literal precisely.
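A small sketch of that distinction, outside of any proc macro. It uses stringify! only as a stand-in for the token text a proc macro would get from ToString; that substitution is an assumption made for illustration, and the exact output of stringify! is not formally guaranteed by the language.

```rust
fn main() {
    // Both literals denote the same one-character string value...
    assert_eq!("C", "\x43");

    // ...but their spelling in source differs, and the token text keeps it.
    // `stringify!` stands in here for `Literal::to_string()` (an assumption).
    let plain = stringify!("C");
    let escaped = stringify!("\x43");
    println!("{} vs {}", plain, escaped);
}
```

The values compare equal while the two token texts do not, which is the difference a proc macro can observe.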
- Source 2: Proc macro API.
Literal::string(s: &str) will make you a string literal containing data s, approximately.
The precise token (returned by ToString) will contain:
- escape_debug(s) for string literals (Literal::string)
- escape_unicode(s) for character literals (Literal::character)
- escape_default(s) for byte string literals (Literal::byte_string)
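To get a feel for how these three schemes differ, the sketch below applies the standard library functions of the same names to some sample data. Whether the proc macro implementation calls exactly these std functions is an assumption, and the helper names (string_literal_text and so on) are made up for illustration.

```rust
use std::ascii;

// Hypothetical helpers: each builds the text a literal token might carry
// under the escaping scheme named in the list above.
fn string_literal_text(s: &str) -> String {
    format!("\"{}\"", s.escape_debug())
}

fn char_literal_text(c: char) -> String {
    format!("'{}'", c.escape_unicode())
}

fn byte_string_literal_text(bytes: &[u8]) -> String {
    // std::ascii::escape_default escapes one byte at a time.
    let escaped: String = bytes
        .iter()
        .flat_map(|&b| ascii::escape_default(b))
        .map(char::from)
        .collect();
    format!("b\"{}\"", escaped)
}

fn main() {
    println!("{}", string_literal_text("é\n"));          // "é\n"
    println!("{}", char_literal_text('C'));              // '\u{43}'
    println!("{}", byte_string_literal_text(b"C\xff\n")); // b"C\xff\n"
}
```

Note that the character 'C' comes back as '\u{43}' under escape_unicode, while the non-ASCII é is left untouched by escape_debug, so even simple inputs do not round-trip to their most natural spelling.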
- Source 3: Recovered from non-attribute AST.
The AST is pretty-printed first and the result is then re-tokenized.
The precise token (returned by ToString) will contain:
- precise s for raw AST strings
- escape_debug(s) for non-raw AST strings
- escape_default(s) for AST characters, bytes and byte strings (both raw and non-raw)
- Source 4: Recovered from attribute AST.
Just an ad-hoc recovery without pretty-printing.
The precise token (returned by ToString) will contain:
- precise s for raw AST strings
- escape_default(s) for non-raw AST strings, AST characters, bytes and byte strings (both raw and non-raw)
EDIT: Doc comments also go through escape_debug when they are converted to #[doc = "content"] tokens for proc macros.
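The practical consequence is that the same non-raw string can reach a macro with different token text depending on which path it took. A minimal sketch of that divergence, using the std methods str::escape_debug and str::escape_default as stand-ins for the compiler's escaping (an assumption); escapes_agree is a hypothetical helper:

```rust
// Hypothetical helper: true if escape_debug and escape_default produce the
// same text for `s`, i.e. both recovery paths would yield the same token.
fn escapes_agree(s: &str) -> bool {
    s.escape_debug().to_string() == s.escape_default().to_string()
}

fn main() {
    for s in ["plain ascii", "tab\there", "naïve"] {
        println!(
            "{:?}: debug = {}, default = {}, same = {}",
            s,
            s.escape_debug(),
            s.escape_default(),
            escapes_agree(s)
        );
    }
}
```

For these inputs the two schemes agree on the ASCII strings and diverge on "naïve", where escape_default rewrites ï as \u{ef} and escape_debug leaves it alone.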
It would be nice to:
- Figure out what escaping we actually want (perhaps none?) and document the motivation behind the escaping choices.
- Get rid of the escaping differences between token sources, so that at least literals of the same kind are escaped identically.