C语言词法及语法定义-ANTLR

<pre class="blockcode"><a data-token="748f3afdfe775b90fe1c5ddfc9c84bca" href="https://raw.githubusercontent.com/antlr/examples-v3/master/C/C/C.g" rel="nofollow">https://raw.githubusercontent.com/antlr/examples-v3/master/C/C/C.g</a></pre>
<pre class="blockcode"></pre>
<pre class="blockcode">/** ANSI C ANTLR v3 grammar

Adapted for C output target by Jim Idle - April 2007.

Translated from Jutta Degener's 1995 ANSI C yacc grammar by Terence Parr
July 2006.  The lexical rules were taken from the Java grammar.

Jutta says: "In 1985, Jeff Lee published his Yacc grammar (which
is accompanied by a matching Lex specification) for the April 30, 1985 draft
version of the ANSI C standard.  Tom Stockfisch reposted it to net.sources in
1987; that original, as mentioned in the answer to question 17.25 of the
comp.lang.c FAQ, can be ftp'ed from ftp.uu.net,
file usenet/net.sources/ansi.c.grammar.Z.
I intend to keep this version as close to the current C Standard grammar as
possible; please let me know if you discover discrepancies. Jutta Degener, 1995"

Generally speaking, you need symbol table info to parse C; typedefs
define types and then IDENTIFIERS are either types or plain IDs.  I'm doing
the min necessary here tracking only type names.  This is a good example
of the global scope (called Symbols).  Every rule that declares its usage
of Symbols pushes a new copy on the stack effectively creating a new
symbol scope.  Also note rule declaration declares a rule scope that
lets any invoked rule see isTypedef boolean.  It's much easier than
passing that info down as parameters.  Very clean.  Rule
direct_declarator can then easily determine whether the IDENTIFIER
should be declared as a type name.

I have only tested this on a single file, though it is 3500 lines.

This grammar requires ANTLR v3 (3.0b8 or higher)

Terence Parr
July 2006

*/
grammar C;

options
{
backtrack = true;
memoize = true;
k  = 2;
language = C;
}

scope Symbols
{
// Only track types in order to get parser working. The Java example
// used the java.util.Set to keep track of these. The ANTLR3 runtime
// has a number of useful 'objects' we can use that act very much like
// the Java hashtables, Lists and Vectors. You have finer control over these
// than the Java programmer, but they are sometimes a little more 'raw'.
// Here, for each scope level, we want a set of symbols, so we can use
// a ANTLR3 runtime provided hash table, and then later we will see if
// a symbols is stored in at any level by using the symbol as the
// key to the hashtable and seeing if the table contains that key.
//
pANTLR3_HASH_TABLE    types;

}

// While you can implement your own character streams and so on, they
// normally call things like LA() via function pointers. In general you will
// be using one of the pre-supplied input streams and you can instruct the
// generated code to access the input pointers directly.
//
// For  8 bit inputs          : #define ANTLR3_INLINE_INPUT_ASCII
// For 16 bit UTF16/UCS2 inputs : #define ANTLR3_INLINE_INPUT_UTF16
//
// If your compiled recognizer might be given inputs from either of the sources
// or you have written your own character input stream, then do not define
// either of these.
//
@lexer::header
{
#define ANTLR3_INLINE_INPUT_ASCII
}

@parser::includes
{
// Include our noddy C++ example class
//
#include <cpp_symbolpp.h>
}

// The @header specifier is valid in the C target, but in this case there
// is nothing to add over and above the generated code. Here you would
// add #defines perhaps that you have made your code reliant upon.
//
// Use @preincludes for things you want to appear in the output file
//    before #include <antlr3.h>
//    @includes to come after #include <antlr3.h>
//    @header for things that should follow on after all the includes.
//
// Hence, this java oriented @header is commented out.
//
// @header {
// import java.util.Set;
// import java.util.HashSet;
// }

// @members inserts functions in C output file (parser without other
//       qualification. @lexer::members inserts functions in the lexer.
//
// In general do not use this too much (put in the odd tiny function perhaps),
// but include the generated header files in your own header and use this in
// separate translation units that contain support functions.
//
@members
{

void addTypeDef(pANTLR3_HASH_TABLE *types, pANTLR3_COMMON_TOKEN typeDef)
{
// By the time we are traversing tokens here, it
// does not matter if we play with the input stream. Hence
// rather than use text or getText() on a token and have the
// huge overhead of creating pANTLR3_STRINGS, then we just
// null terminate the string that the token is pointing to
// and use it directly as a key.
//
*((pANTLR3_UINT8)(typeDef->stop) + 1) = '\0';

// We only create a symbol hash table if we find any
// symbols to record at this scope level
//
if (*types == NULL)
{
  *types = antlr3HashTableNew(11);
}
(*types)->put(*types, (void *)typeDef->start, (void *)(typeDef->start), NULL);

}

// This is a function that is small enough to be kept in the
// generated parser code (@lexer::members puts code in the lexer.
//
// Note a few useful MACROS in use here:
//
// SCOPE_SIZE    returns the number of levels on the stack (1 to n)
//             for the named scope.
// SCOPE_INSTANCE returns a pointer to Scope instance at the
//             specified level.
// SCOPE_TYPE    makes it easy to declare and cast the pointer to

C语言词法及语法定义-ANTLR

浏览过的版块