C Language Lexical and Syntax Definitions - ANTLR

Anonymous user   2019-10-20 18:13
<pre class="blockcode"><a data-token="748f3afdfe775b90fe1c5ddfc9c84bca" href="https://raw.githubusercontent.com/antlr/examples-v3/master/C/C/C.g" rel="nofollow">https://raw.githubusercontent.com/antlr/examples-v3/master/C/C/C.g</a></pre>
<pre class="blockcode">/** ANSI C ANTLR v3 grammar

Adapted for C output target by Jim Idle - April 2007.

Translated from Jutta Degener's 1995 ANSI C yacc grammar by Terence Parr
July 2006.  The lexical rules were taken from the Java grammar.

Jutta says: "In 1985, Jeff Lee published his Yacc grammar (which
is accompanied by a matching Lex specification) for the April 30, 1985 draft
version of the ANSI C standard.  Tom Stockfisch reposted it to net.sources in
1987; that original, as mentioned in the answer to question 17.25 of the
comp.lang.c FAQ, can be ftp'ed from ftp.uu.net,
   file usenet/net.sources/ansi.c.grammar.Z.
I intend to keep this version as close to the current C Standard grammar as
possible; please let me know if you discover discrepancies. Jutta Degener, 1995"

Generally speaking, you need symbol table info to parse C; typedefs
define types and then IDENTIFIERS are either types or plain IDs.  I'm doing
the min necessary here tracking only type names.  This is a good example
of the global scope (called Symbols).  Every rule that declares its usage
of Symbols pushes a new copy on the stack effectively creating a new
symbol scope.  Also note rule declaration declares a rule scope that
lets any invoked rule see isTypedef boolean.  It's much easier than
passing that info down as parameters.  Very clean.  Rule
direct_declarator can then easily determine whether the IDENTIFIER
should be declared as a type name.

I have only tested this on a single file, though it is 3500 lines.

This grammar requires ANTLR v3 (3.0b8 or higher)

Terence Parr
July 2006

*/
grammar C;

options
{
    backtrack = true;
    memoize = true;
    k  = 2;
    language = C;
}



scope Symbols
{
    // Only track types in order to get parser working. The Java example
    // used the java.util.Set to keep track of these. The ANTLR3 runtime
    // has a number of useful 'objects' we can use that act very much like
    // the Java hashtables, Lists and Vectors. You have finer control over these
    // than the Java programmer, but they are sometimes a little more 'raw'.
    // Here, for each scope level, we want a set of symbols, so we can use
    // an ANTLR3 runtime-provided hash table, and then later we will see if
    // a symbol is stored at any level by using the symbol as the
    // key to the hashtable and seeing if the table contains that key.
    //
    pANTLR3_HASH_TABLE     types;


}

// While you can implement your own character streams and so on, they
// normally call things like LA() via function pointers. In general you will
// be using one of the pre-supplied input streams and you can instruct the
// generated code to access the input pointers directly.
//
// For  8 bit inputs            : #define ANTLR3_INLINE_INPUT_ASCII
// For 16 bit UTF16/UCS2 inputs : #define ANTLR3_INLINE_INPUT_UTF16
//
// If your compiled recognizer might be given inputs from either of the sources
// or you have written your own character input stream, then do not define
// either of these.
//
@lexer::header
{
#define ANTLR3_INLINE_INPUT_ASCII
}

@parser::includes
{
// Include our noddy C++ example class
//
#include &lt;cpp_symbolpp.h&gt;
}

// The @header specifier is valid in the C target, but in this case there
// is nothing to add over and above the generated code. Here you would
// add #defines perhaps that you have made your code reliant upon.
//
// Use @preincludes for things you want to appear in the output file
//     before #include &lt;antlr3.h&gt;
//     @includes to come after #include &lt;antlr3.h&gt;
//     @header for things that should follow on after all the includes.
//
// Hence, this Java-oriented @header is commented out.
//
// @header {
// import java.util.Set;
// import java.util.HashSet;
// }

// @members inserts functions in the C output file (the parser, when used
//          without other qualification). @lexer::members inserts functions in the lexer.
//
// In general do not use this too much (put in the odd tiny function perhaps),
// but include the generated header files in your own header and use this in
// separate translation units that contain support functions.
//
@members
{

    void addTypeDef(pANTLR3_HASH_TABLE *types, pANTLR3_COMMON_TOKEN typeDef)
    {
        // By the time we are traversing tokens here, it
        // does not matter if we play with the input stream. Hence,
        // rather than use text or getText() on a token and incur the
        // huge overhead of creating pANTLR3_STRINGs, we just
        // null terminate the string that the token is pointing to
        // and use it directly as a key.
        //
        *((pANTLR3_UINT8)(typeDef-&gt;stop) + 1) = '\0';

        // We only create a symbol hash table if we find any
        // symbols to record at this scope level
        //
        if (*types == NULL)
        {
            *types = antlr3HashTableNew(11);
        }
        (*types)-&gt;put(*types, (void *)typeDef-&gt;start, (void *)(typeDef-&gt;start), NULL);
    }

    // This is a function that is small enough to be kept in the
    // generated parser code (@lexer::members puts code in the lexer).
    //
    // Note a few useful MACROS in use here:
    //
    // SCOPE_SIZE     returns the number of levels on the stack (1 to n)
    //                for the named scope.
    // SCOPE_INSTANCE returns a pointer to the scope instance at the
    //                specified level.
    // SCOPE_TYPE     makes it easy to declare and cast the pointer to
</pre>
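The grammar comments above describe a stack of symbol scopes, each holding a lazily created set of typedef names, with lookups checking every level. The following is a minimal sketch of that idea in plain C, written without the ANTLR3 runtime so it can be compiled on its own. All names here (`Scope`, `ScopeStack`, `push_scope`, `pop_scope`, `add_typedef`, `is_type_name`, and the two size limits) are hypothetical stand-ins and are not part of the grammar or the runtime; a flat array replaces the runtime hash table purely for brevity. Like the grammar, it stores pointers to the names rather than copies, so the strings must outlive the scope.

```c
#include <assert.h>
#include <stdlib.h>
#include <string.h>

#define MAX_DEPTH 32   /* assumed limit on nested scopes (illustrative) */
#define MAX_NAMES 64   /* assumed limit on type names per scope (illustrative) */

/* One scope level: a set of type names, allocated only when first needed. */
typedef struct {
    const char **names;   /* NULL until the first typedef is recorded */
    size_t count;
} Scope;

typedef struct {
    Scope levels[MAX_DEPTH];
    size_t depth;          /* number of scopes currently on the stack */
} ScopeStack;

/* Entering a rule that declares the scope pushes a fresh, empty level. */
void push_scope(ScopeStack *s)
{
    assert(s->depth < MAX_DEPTH);
    s->levels[s->depth].names = NULL;
    s->levels[s->depth].count = 0;
    s->depth++;
}

/* Leaving the rule discards the innermost level and its names. */
void pop_scope(ScopeStack *s)
{
    if (s->depth > 0) {
        s->depth--;
        free(s->levels[s->depth].names);
        s->levels[s->depth].names = NULL;
        s->levels[s->depth].count = 0;
    }
}

/* Record a type name in the innermost scope; returns 1 on success, 0 otherwise. */
int add_typedef(ScopeStack *s, const char *name)
{
    if (s->depth == 0)
        return 0;
    Scope *top = &s->levels[s->depth - 1];
    if (top->names == NULL) {
        /* Create the set only when there is something to store. */
        top->names = malloc(MAX_NAMES * sizeof *top->names);
        if (top->names == NULL)
            return 0;
    }
    if (top->count == MAX_NAMES)
        return 0;
    top->names[top->count++] = name;   /* pointer is stored, not a copy */
    return 1;
}

/* Search from the innermost scope outwards; returns 1 if the name is a type. */
int is_type_name(const ScopeStack *s, const char *name)
{
    for (size_t i = s->depth; i-- > 0; ) {
        const Scope *sc = &s->levels[i];
        for (size_t j = 0; j < sc->count; j++) {
            if (strcmp(sc->names[j], name) == 0)
                return 1;
        }
    }
    return 0;
}
```

With two scopes pushed, a name recorded in the outer scope and another in the inner scope are both reported as type names. Once the inner scope is popped, only the outer name is still found, which is the behaviour a parser needs to tell a typedef name from a plain identifier inside nested blocks.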