return 0;
you cannot drop the space between return and 0 because that would create
return0;
consisting of the single identifier return0. Use extra whitespace appropriately to make your code
human-readable, even though the parser ignores it. Note that the parser treats comments as whitespace.
The tokenizer is a "greedy" tokenizer. It grabs as many characters as it can to build up the next token, not
caring if this creates an invalid sequence of tokens. So because ++ is longer than +, the expression
j = i+++++i; // INVALID
is interpreted as the invalid expression
j = i++ ++ +i; // INVALID
instead of the valid
j = i++ + ++i;
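The greedy rule can be observed in a short program. The following is a minimal sketch; the class and method names (GreedyTokens, spaced, threePluses) are made up for illustration. Two variables are used in the second method so that the two possible readings of the three plus signs would give different results.

```java
class GreedyTokens {
    // Explicit spaces: postfix ++, binary +, prefix ++.
    static int spaced() {
        int i = 1;
        return i++ + ++i;   // 1 + 3
    }

    // Three plus signs and no spaces. The tokenizer takes "++" first,
    // so this parses as (a++) + b, not a + (++b).
    static int[] threePluses() {
        int a = 1, b = 10;
        int sum = a+++b;
        return new int[] { sum, a, b };
    }

    public static void main(String[] args) {
        System.out.println(spaced());                 // prints 4
        int[] r = threePluses();
        // prints 11 2 10: a was incremented, b was not
        System.out.println(r[0] + " " + r[1] + " " + r[2]);
    }
}
```

Had the tokenizer chosen the shortest token instead, the second method would have produced 12 and left a unchanged.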
7.1.4. Identifiers
Identifiers, used for names of declared entities such as variables, constants, and labels, must start with a letter,
followed by letters, digits, or both. The terms letter and digit are broad in Unicode: If something is considered
a letter or digit in a human language, you can probably use it in identifiers. "Letters" can come from
Armenian, Korean, Gurmukhi, Georgian, Devanagari, and almost any other script written in the world today.
Thus, not only is kitty a valid identifier, but so are its equivalents in other languages and scripts.[4]
Letters also include any currency symbol (such as $, ¥, and £) and connecting punctuation (such as _).
[4] These are the word "cat" or "kitty" in English, Serbo-Croatian, Russian, Persian, Tamil,
and Japanese, respectively.
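One way to explore these rules is to ask the platform which characters it accepts. The sketch below assumes the standard methods Character.isJavaIdentifierStart and Character.isJavaIdentifierPart; the class name IdentifierCheck and the sample strings are illustrative choices, not taken from the text. Non-ASCII characters are written as Unicode escapes so that the file compiles under any source encoding: \u732b is the character 猫, and \u00a5 is ¥. The check covers only the character rules and does not reject keywords.

```java
class IdentifierCheck {
    // True if every character of s is allowed at its position in an identifier.
    // Keywords such as "return" are not excluded by this check.
    static boolean isIdentifier(String s) {
        if (s.isEmpty()) {
            return false;
        }
        int first = s.codePointAt(0);
        if (!Character.isJavaIdentifierStart(first)) {
            return false;
        }
        for (int i = Character.charCount(first); i < s.length(); ) {
            int cp = s.codePointAt(i);
            if (!Character.isJavaIdentifierPart(cp)) {
                return false;
            }
            i += Character.charCount(cp);
        }
        return true;
    }

    public static void main(String[] args) {
        String[] samples = { "kitty", "\u732b", "$total", "\u00a5en", "_count", "2fast", "my-var" };
        for (String s : samples) {
            System.out.println(s + " -> " + isIdentifier(s));
        }
    }
}
```

The first five samples are accepted. "2fast" is rejected because a digit cannot start an identifier, and "my-var" is rejected because the hyphen is not connecting punctuation.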
Any difference in characters within an identifier makes that identifier unique. Case is significant: A, a, á, À,
Å, and so on are different identifiers. Characters that look the same, or nearly the same, can be confused. For
example, the Latin capital letter n "N" and the Greek capital ν "Ν" look alike but are different characters
(\u004e and \u039d, respectively). The only way to avoid confusion is to write each identifier in one
language (and thus in one known set of characters) so that programmers trying to type the identifier will know
whether you meant E or E.[5]
[5] One is a Cyrillic letter, the other is ASCII. Guess which is which and win a prize.
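A program can detect this kind of mix-up mechanically. The following sketch assumes the standard Character.UnicodeScript enumeration; the class ScriptCheck and its methods are hypothetical helpers, and treating "one language" as "one Unicode script" is a simplification made for the example. Characters that belong to no particular script, such as digits, $, and _, are ignored.

```java
import java.util.EnumSet;
import java.util.Set;

class ScriptCheck {
    // Collects the scripts used in s, ignoring script-neutral characters.
    static Set<Character.UnicodeScript> scriptsOf(String s) {
        Set<Character.UnicodeScript> scripts =
                EnumSet.noneOf(Character.UnicodeScript.class);
        s.codePoints().forEach(cp -> {
            Character.UnicodeScript sc = Character.UnicodeScript.of(cp);
            if (sc != Character.UnicodeScript.COMMON
                    && sc != Character.UnicodeScript.INHERITED) {
                scripts.add(sc);
            }
        });
        return scripts;
    }

    // True if all script-specific characters in s come from one script.
    static boolean usesSingleScript(String s) {
        return scriptsOf(s).size() <= 1;
    }

    public static void main(String[] args) {
        String latin = "NAME";        // Latin capital N, U+004E
        String mixed = "\u039dAME";   // Greek capital Nu, U+039D, then Latin
        System.out.println(latin.equals(mixed));      // prints false
        System.out.println(usesSingleScript(latin));  // prints true
        System.out.println(usesSingleScript(mixed));  // prints false
    }
}
```

A check like this catches identifiers that mix scripts, but it cannot tell which script the author intended, so the convention of writing each identifier in one known character set still matters.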