A C# program consists of one or more source files, known formally as compilation units (Compilation units). A source file is an ordered sequence of Unicode characters. Source files typically have a one-to-one correspondence with files in a file system, but this correspondence is not required. For maximal portability, it is recommended that files in a file system be encoded with the UTF-8 encoding.
Conceptually speaking, a program is compiled in three steps: transformation, which converts each file into a sequence of Unicode characters; lexical analysis, which translates the Unicode characters into a stream of tokens; and syntactic analysis, which translates the stream of tokens into executable code.
This specification presents the syntax of the C# programming language using two grammars. The lexical grammar (Lexical grammar) defines how Unicode characters are combined to form line terminators, white space, comments, tokens, and pre-processing directives. The syntactic grammar (Syntactic grammar) defines how the tokens resulting from the lexical grammar are combined to form C# programs. The lexical and syntactic grammars are presented in Backus-Naur form using the notation of the ANTLR grammar tool.
The lexical grammar of C# is presented in Lexical analysis, Tokens, and Pre-processing directives. The terminal symbols of the lexical grammar are the characters of the Unicode character set, and the lexical grammar specifies how characters are combined to form tokens (Tokens), white space (White space), comments (Comments), and pre-processing directives (Pre-processing directives). Every source file in a C# program must conform to the input production of the lexical grammar (Lexical analysis).
The syntactic grammar of C# is presented in the chapters and appendices that follow this chapter.
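As a rough illustration of the path from an encoded file to a stream of tokens, the following Python sketch decodes UTF-8 bytes into Unicode characters and then groups those characters into tokens, discarding white space and comments along the way. It is a toy under stated assumptions, not the C# lexical grammar: the token classes, the `KEYWORDS` subset, and the function names `decode_source` and `tokenize` are all hypothetical and chosen only for this example.

```python
import re

# Illustrative subset only; the real C# keyword list is much longer.
KEYWORDS = {"class", "int", "return"}

# Simplified token classes (an assumption of this sketch, not the C# grammar).
# Comments come first so that "//" is not read as two "/" punctuators.
TOKEN_SPEC = [
    ("comment",    r"//[^\n]*|/\*.*?\*/"),
    ("whitespace", r"\s+"),
    ("identifier", r"[A-Za-z_]\w*"),
    ("integer",    r"\d+"),
    ("punctuator", r"[{}()\[\];,.=+\-*/<>!]"),
]
MASTER = re.compile(
    "|".join(f"(?P<{name}>{pattern})" for name, pattern in TOKEN_SPEC),
    re.DOTALL,  # lets a /* ... */ comment span several lines
)


def decode_source(raw: bytes) -> str:
    """Turn an encoded file's bytes into a sequence of Unicode characters."""
    return raw.decode("utf-8")


def tokenize(text: str) -> list:
    """Group characters into (kind, lexeme) tokens, dropping white space and comments."""
    tokens = []
    pos = 0
    while pos < len(text):
        match = MASTER.match(text, pos)
        if match is None:
            raise ValueError(f"unexpected character {text[pos]!r} at offset {pos}")
        kind = match.lastgroup
        lexeme = match.group()
        if kind == "identifier" and lexeme in KEYWORDS:
            kind = "keyword"
        if kind not in ("whitespace", "comment"):
            tokens.append((kind, lexeme))
        pos = match.end()
    return tokens
```

Calling `tokenize(decode_source(b"class A { } // empty\n"))` returns `[("keyword", "class"), ("identifier", "A"), ("punctuator", "{"), ("punctuator", "}")]`. The comment and the white space are recognized by the lexer but never reach the token stream, which is the part a syntactic grammar would then consume.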