The Forum for Discussion about The Third Manifesto and Related Matters

3rdManifesto 8thDayofChristmas identifiers starting numeric

the eternal camelCase vs under_scores debate ... (along with a dozen or so of the usual TTM forum quarrels)

Then let's start the New Year as we usually go on ... (And continuing the Seasonal Silliness because it's unseasonably and uncooperatively unsuitable for a BBQ on the beach ...)

I've occasionally wished for identifiers starting with a numeric:

  • 3rdManifesto, 8thDayofChristmas, 9inchNails, ...

Indeed I think I've dabbled in a language that allowed them -- ??POP-2. Is there some deep reason against? What would go wrong? Why do most languages avoid them? (Yes, there are plenty of StackOverflow/Quora questions asking that for specific languages, or even in general. The answers amount, unhelpfully, to 'because they don't'.)

  • The lexer would go by a 'maximal munch' rule: a token continues until it meets a non-alphanumeric character.
  • So if you want two adjacent identifiers, or an identifier followed by a reserved word, put a space between them. (That's the usual rule anyway.)
  • A token consisting entirely of digits (possibly including a decimal point) is a number; otherwise it is an identifier.
  • Special exception for mantissa-and-exponent format: 0.17E-13.
  • Some languages make identifiers case-sensitive; there, an identifier starting with a digit counts as lower-case: 3rdManifesto vs ThirdManifesto.
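The rules above leave some room for interpretation. Here is one minimal Python sketch of them, assuming "alphanumeric" means ASCII letters, digits and underscore, and assuming the decimal/exponent exception only applies when the whole number ends at a token boundary. The names tokenize and counts_as_lower_case are hypothetical, not taken from any existing language.

```python
import re
from typing import List, Tuple

# Assumption: "alphanumeric" means ASCII letters, ASCII digits and underscore.
WORD = re.compile(r"[A-Za-z0-9_]+")
# Decimal literal with optional fraction and exponent: 42, 3.14, 0.17E-13.
NUMBER = re.compile(r"[0-9]+(?:\.[0-9]+)?(?:[Ee][+-]?[0-9]+)?")


def tokenize(src: str) -> List[Tuple[str, str]]:
    """Split src into (kind, text) pairs; kind is NUMBER, IDENT or OP."""
    tokens: List[Tuple[str, str]] = []
    pos = 0
    while pos < len(src):
        if src[pos].isspace():
            pos += 1
            continue
        word = WORD.match(src, pos)
        if word is None:
            # Not alphanumeric: a single-character operator or punctuation token.
            tokens.append(("OP", src[pos]))
            pos += 1
            continue
        text = word.group()
        if not text.isdigit():
            # Maximal munch: any letter or underscore in the run makes the whole
            # run an identifier, even if it starts with a digit (3rdManifesto).
            tokens.append(("IDENT", text))
            pos = word.end()
            continue
        # An all-digit run is a number. Special exception: it may extend over a
        # fraction and exponent (0.17E-13), but only if that longer match ends
        # at a token boundary; otherwise keep just the digit run.
        end = NUMBER.match(src, pos).end()
        if end > word.end() and WORD.match(src, end):
            end = word.end()
        tokens.append(("NUMBER", src[pos:end]))
        pos = end
    return tokens


def counts_as_lower_case(identifier: str) -> bool:
    """A leading digit is not upper-case, so 3rdManifesto counts as lower-case."""
    return not identifier[0].isupper()


print(tokenize("3rdManifesto + 0.17E-13"))
# [('IDENT', '3rdManifesto'), ('OP', '+'), ('NUMBER', '0.17E-13')]
print(tokenize("3.5kg"))
# [('NUMBER', '3'), ('OP', '.'), ('IDENT', '5kg')]
```

Note the last example: because '.' ends a run, 3.5kg splits into a number, a dot and the identifier 5kg. Likewise 1E5 (no decimal point) lexes as an identifier under this sketch. Corner cases like these are what the rules would have to pin down.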

Quote from AntC on January 1, 2025, 5:11 am

Stretching my recollection a bit, I think the usual reason for avoiding identifiers that start with a digit is simply that it makes the lexer slower. The letter-first convention became entrenched at a time when any lexer slowdown was potentially significant, such as one caused by unbounded lookahead (keep to one character of lookahead, or two at most, and ideally zero!).

Now we wouldn't care. Unbounded lookahead is barely noticeable except in pathological cases, and even then the majority of compiler/interpreter processing time is typically expended on type checking and other semantic processing.
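One way to see the lookahead concern: with the letter-first convention, the first character of a token tells the lexer whether it is reading a number or an identifier, but once digit-initial identifiers are allowed the decision waits for the first non-digit. Here is a minimal Python sketch with a hypothetical helper, chars_to_classify, that simply ignores the decimal-point and exponent exception.

```python
def chars_to_classify(src: str, pos: int = 0) -> int:
    """Return how many characters from pos a lexer must read before it can tell
    a number from an identifier, if identifiers may start with a digit.

    Hypothetical helper; decimal points and exponents are ignored, and reaching
    the end of input also settles the question.
    """
    i = pos
    # Leading digits leave the decision open...
    while i < len(src) and src[i] in "0123456789":
        i += 1
    # ...until the first non-digit character (a letter means identifier,
    # anything else means number), which must itself be read.
    if i < len(src):
        return i - pos + 1
    return i - pos


print(chars_to_classify("ThirdManifesto"))  # 1
print(chars_to_classify("3rdManifesto"))    # 2
print(chars_to_classify("9" * 1000 + "x"))  # 1001
```

A letter-first lexer always needs just the one character. For plain runs like these, a lexer that consumes the whole run of word characters and classifies it afterwards never has to back up, which fits the point that the cost is negligible today.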

I'm the forum administrator and lead developer of Rel. Email me at dave@armchair.mb.ca with the Subject 'TTM Forum'. Download Rel from https://reldb.org

Cobol allowed (allows?) identifiers starting with digits and with embedded minus signs. The convention was 123-begin-customer-validation or similar. But Cobol had no expressions, just COMPUTE, so the minus signs caused no grief.

There are no issues for the lexer. Whatever you choose, it will be fast enough. Caveat: unless you need serious backtracking, as in the original Fortran, which had no reserved words and multiple lexical ambiguities.

The only practical issue is delimiting tokens and avoiding ambiguities. In a language like Forth, tokens are bounded by whitespace and (almost) anything goes, but most languages make whitespace between tokens optional. You really don't want 111e.length or 0fff_qty or 0x123abcxyz to be valid identifiers.
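To make the ambiguity concrete, here is a minimal Python sketch of the simple maximal-munch rule from the opening post, without its decimal/exponent exception (classify is a hypothetical helper). Under it, all three examples above lex silently as identifiers (111e.length becomes the identifier 111e followed by .length), and so would a conventional hex literal such as 0x1F. A real design would need extra rules or reserved prefixes to reject them.

```python
import re
from typing import List, Tuple

# Hypothetical classifier for the proposed maximal-munch rule, simplified:
# no decimal-point or exponent exception.
TOKEN = re.compile(r"[A-Za-z0-9_]+|\S")  # a run of word chars, or one other char
WORD = re.compile(r"[A-Za-z0-9_]+")


def classify(src: str) -> List[Tuple[str, str]]:
    """Return (kind, text) pairs; kind is NUMBER, IDENT or OP."""
    result: List[Tuple[str, str]] = []
    for text in TOKEN.findall(src):  # findall skips whitespace between matches
        if not WORD.fullmatch(text):
            kind = "OP"
        elif text.isdigit():
            kind = "NUMBER"  # a run made only of digits
        else:
            kind = "IDENT"   # any other run, whatever its first character
        result.append((kind, text))
    return result


print(classify("111e.length"))
# [('IDENT', '111e'), ('OP', '.'), ('IDENT', 'length')]
print(classify("x = 0x1F + 10"))
# [('IDENT', 'x'), ('OP', '='), ('IDENT', '0x1F'), ('OP', '+'), ('NUMBER', '10')]
```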

Andl - A New Database Language - andl.org