3rdManifesto 8thDayofChristmas identifiers starting numeric
Quote from AntC on January 1, 2025, 5:11 am
the eternal camelCase vs under_scores debate ... (along with a dozen or so of the usual TTM forum quarrels)
Then let's start the New Year as we usually go on ... (And continuing the Seasonal Silliness because it's unseasonably and uncooperatively unsuitable for a BBQ on the beach ...)
I've occasionally wished for identifiers starting with a numeric: 3rdManifesto, 8thDayofChristmas, 9inchNails, ...
Indeed I think I've dabbled in a language that allowed them -- ??POP-2. Is there some deep reason against?/what would go wrong?/why do most languages avoid them? (Yes there's plenty of StackOverflow/Quora q's asking that for specific languages, or even in general. The answers amount unhelpfully to 'because they don't'.)
- The lexer would go by a 'maximal munch rule': a token continues until meeting a non-alphanumeric char.
- So if you want two adjacent identifiers, or an identifier followed by a reserved id, put a space between. (That's a usual rule anyway.)
- A token consisting entirely of digits (possibly incl decimal point) is a number, otherwise an identifier.
- Special exception for mantissa format: 0.17E-13.
- Some languages make identifiers case-sensitive; then an identifier starting with a digit counts as lower-case: 3rdManifesto vs ThirdManifesto.
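The rules in the list above can be sketched as a small tokenizer. This is only an illustration, not any real language's lexer: the function names, the token kinds, and treating the underscore as alphanumeric are all assumptions made for the example.

```python
import re

# Special exception, tried first: exponent ("mantissa") form such as 0.17E-13.
# The negative lookahead rejects the match if more alphanumerics follow.
EXPONENT = re.compile(r"\d+(?:\.\d+)?[eE][+-]?\d+(?![A-Za-z0-9_])")
# Plain number: all digits, optionally with one decimal point.
NUMBER = re.compile(r"\d+(?:\.\d+)?(?![A-Za-z0-9_])")
# Maximal munch: the longest run of alphanumerics
# (underscore included here by assumption).
WORD = re.compile(r"[A-Za-z0-9_]+")

RULES = (("number", EXPONENT), ("number", NUMBER), ("identifier", WORD))


def tokenize(text):
    """Split text into (kind, lexeme) pairs using the rules above."""
    tokens = []
    pos = 0
    while pos < len(text):
        ch = text[pos]
        if ch.isspace():
            pos += 1
            continue
        for kind, pattern in RULES:
            m = pattern.match(text, pos)
            if m:
                tokens.append((kind, m.group()))
                pos = m.end()
                break
        else:
            # Anything else is a one-character symbol in this sketch.
            tokens.append(("symbol", ch))
            pos += 1
    return tokens


def counts_as_lowercase(identifier):
    """Case rule from the list: a leading digit counts as lower-case."""
    return not identifier[0].isupper()
```

For example, `tokenize("3rdManifesto + 12 * 0.17E-13")` yields an identifier, a symbol, a number, a symbol, and a number. Note that `1e5` comes out as a number while `1e5x` comes out as an identifier, so a single trailing character can change the kind of the whole token.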
Quote from Dave Voorhis on January 2, 2025, 11:54 am
Stretching my recollection a bit, I think the usual reason for avoiding identifiers starting with numeric characters is simply that it makes the lexer slower. The alpha-only first-character convention became entrenched at a time when any lexer slowdown, such as from unbounded lookahead, was potentially impactful: keep to one character of lookahead, or two at most (and ideally zero)!
Now we wouldn't care. Unbounded lookahead is barely noticeable except in pathological cases, and even then the majority of compiler/interpreter processing time is typically expended on type checking and other semantic processing.
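One way to picture the cost being described is to count how many characters a scanner has to read before it can commit to "number" or "identifier". The sketch below is a deliberate simplification (digits and letters only, no decimal point or exponent), and the function name `decide` is made up for the example.

```python
def decide(token):
    """Return (kind, chars_read) for a single token.

    chars_read is how many characters had to be examined before the
    kind was certain, under a rule that lets identifiers start with
    a digit.
    """
    for i, ch in enumerate(token, start=1):
        if not ch.isdigit():
            # The first non-digit settles it: this is an identifier.
            return ("identifier", i)
    # All digits: only certain once the whole token has been read.
    return ("number", len(token))
```

With a letter-first rule the decision is made on the first character. Here `decide("x1234567")` still commits after 1 character, but `decide("1234567x")` needs all 8, and the count grows with the length of the digit run.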
Quote from dandl on January 3, 2025, 5:07 am
Cobol allowed (allows?) identifiers starting with numbers and with embedded minus signs. The convention was 123-begin-customer-validation or similar. But Cobol had no expressions, just compute, so the minus signs caused no grief.
There are no issues for the lexer. Say what you want, everything is fast enough. Caveat: unless you need serious backtracking, such as the original Fortran with no reserved words and multiple lexical ambiguities.
The only practical issue is delineating tokens and avoiding ambiguities. In a language like Forth tokens are bounded by whitespace and (almost) anything goes, but most languages allow optional whitespace. You really don't want 111e.length or 0fff_qty or 0x123abcxyz to be valid identifiers.
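To make the ambiguity concrete, here is a rough sketch of how a "digits may start an identifier" rule would classify words like those. The C-style `0x` hex literal syntax and the name `classify` are assumptions for the example; the thread does not specify a literal syntax.

```python
import re

# Assumed C-style hex literal: 0x followed by hex digits only.
HEX = re.compile(r"0[xX][0-9a-fA-F]+")
# Decimal number with optional fraction and optional exponent.
DECIMAL = re.compile(r"\d+(?:\.\d+)?(?:[eE][+-]?\d+)?")
# Any run of alphanumerics or underscores, digit-leading allowed.
WORD = re.compile(r"[A-Za-z0-9_]+")


def classify(word):
    """Classify one whitespace-delimited word under the permissive rule."""
    if HEX.fullmatch(word):
        return "hex literal"
    if DECIMAL.fullmatch(word):
        return "number"
    if WORD.fullmatch(word):
        return "identifier"
    return "not a single token"
```

Under these assumptions `classify("0x123abc")` is a hex literal but `classify("0x123abcxyz")` is an identifier, and `classify("111e5")` is a number but `classify("111e")` is an identifier. A reader, or a typo, is one character away from a different kind of token, which is the ambiguity being warned about.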