Notes about flex





What is Flex useful for?

Writing lexical parsers by hand is not difficult, but it is a long and boring task, and the resulting code is hard to maintain. Using Flex makes lexical parsing a breeze.

Flex accepts regular expressions, just like many UNIX utilities: sed, perl, the shell, ... The programmer associates an action with each regular expression, and that action is nothing more than a piece of C code. Flex then does all the boring and tricky work of turning these regular expressions into code that recognizes them.

Besides, scanners generated by Flex are really fast: even if you write your own code by hand, you will have great difficulty producing a faster scanner.

To sum up, there is little to gain from writing lexical scanners by hand. Furthermore, Flex is free software (distributed under a BSD-style license), and you can find it for all UNIX flavours and even for Windows.

The simple example below illustrates the use of Flex. I wrote this code while I was coding a load balancer: I needed to parse a configuration file that contains IP addresses and port numbers.

The file "loadb.lex" contains the Flex specification. The file "loadb.c" calls the lexical parser produced by Flex.

Here an IP address can be anything like "123.23.45.45", "12.@.14.5", "14.456.78.@", ... The character '@' means any number. In other words, "14.456.78.@" could be "14.456.78.1", "14.456.78.2", ... Note that the scanner does not check that each number is at most 255 ("456" is accepted).
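The '@' joker only matters once the configuration has been read, when the load balancer compares real addresses against the configured patterns. Here is a minimal sketch of that comparison, assuming four dot-separated fields on both sides; ip_matches() is a hypothetical helper, not part of the original program.

```c
#include <ctype.h>

/* Hypothetical helper, not part of the original load balancer.
   Returns 1 if the concrete address 'addr' (e.g. "14.456.78.2") matches
   the configured pattern 'pattern' (e.g. "14.456.78.@"), 0 otherwise.
   A pattern field equal to "@" matches any number; other fields are
   compared textually. Both strings must have four dot-separated fields. */
int ip_matches(const char *pattern, const char *addr)
{
    for (int field = 0; field < 4; field++) {
        if (pattern[0] == '@' && (pattern[1] == '.' || pattern[1] == '\0')) {
            /* Joker: the address field may be any non-empty run of digits. */
            pattern++;
            if (!isdigit((unsigned char)*addr))
                return 0;
            while (isdigit((unsigned char)*addr))
                addr++;
        } else {
            /* Literal field: both runs of digits must be identical. */
            if (!isdigit((unsigned char)*pattern))
                return 0;
            while (isdigit((unsigned char)*pattern) || isdigit((unsigned char)*addr)) {
                if (*pattern != *addr)
                    return 0;
                pattern++;
                addr++;
            }
        }
        /* Fields 0 to 2 must be followed by a dot in both strings. */
        if (field < 3) {
            if (*pattern != '.' || *addr != '.')
                return 0;
            pattern++;
            addr++;
        }
    }
    return *pattern == '\0' && *addr == '\0';
}
```

For example, ip_matches("12.@.14.5", "12.200.14.5") returns 1, while ip_matches("12.@.14.5", "12.200.14.6") returns 0.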

Simple use of flex: parsing from a file

loadb.lex

%{


/************************************************************************/
/* Code section */
/* */
/* This part will be included as is into the parser code. */
/* */
/************************************************************************/


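#include <stdio.h>
#include <string.h>

/* Remove the trailing '\n' (if any) from 'line'. */
char *chomp (char *line)
{
  size_t len = strlen(line);

  if ((len > 0) && (line[len-1] == '\n')) { line[len-1] = '\0'; }
  return line;
}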


%}


/************************************************************************/
/* Definition section */
/* */
/* Remark: the rule '^[ \t]+\n' is not the same as '^[ \t]+$'. In the */
/* first case the string 'yytext' will contain the character */
/* '\n'. On the other hand, the string 'yytext' that matches */
/* second rule won't include the trailing character '\n'. */
/* */
/* Remark: The rule 'DST_CONFIG' will be matched only if the start */
/* condition 'dst_rules' is active. All the other rules will be */
/* evaluated whether 'dst_rules' is active or not. */
/************************************************************************/


FIELD_SEPARATOR \t
BLANK_LINE ^[ \t]+\n
EMPTY_LINE ^\n

IP_JOKER @
IP_NUMBER [0-9]+|{IP_JOKER}
IP_ADRESS {IP_NUMBER}\.{IP_NUMBER}\.{IP_NUMBER}\.{IP_NUMBER}
PORT [0-9]+
SRC_CONFIG ^{IP_ADRESS}{FIELD_SEPARATOR}+{PORT}\n
DST_CONFIG ^{FIELD_SEPARATOR}+{IP_ADRESS}{FIELD_SEPARATOR}+{PORT}\n


%s dst_rules

/************************************************************************/
/* Rules section */
/************************************************************************/


%%
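  /* Indented code placed before the first rule is executed each time */
  /* yylex() is called: (re)start in the INITIAL start condition,     */
  /* where the DST_CONFIG rule is not active.                         */
  BEGIN(INITIAL);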

{SRC_CONFIG} {
                 fprintf (stdout, "SRC: [%s]\n", chomp(yytext));
                 BEGIN(dst_rules);
              }

<dst_rules>{DST_CONFIG} { fprintf (stdout, "DST: [%s]\n", chomp(yytext)); }

{BLANK_LINE} { fprintf (stdout, "Blank line\n"); }
{EMPTY_LINE} { fprintf (stdout, "Empty line\n"); }

.* { return 1; }

%%

/* No extra C code */
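To see how much work Flex saves, here is a rough hand-written C equivalent of the four line formats recognized above. It is a sketch for comparison only: classify_line() and its helpers are hypothetical, each call expects one line including its trailing '\n', and the start condition is ignored (a DST line is accepted even if no SRC line precedes it).

```c
#include <ctype.h>
#include <stddef.h>

enum line_kind { LINE_SRC, LINE_DST, LINE_BLANK, LINE_EMPTY, LINE_ERROR };

/* Skips one IP_NUMBER ([0-9]+ or '@'); returns NULL if there is none. */
static const char *skip_ip_number(const char *p)
{
    if (*p == '@') return p + 1;
    if (!isdigit((unsigned char)*p)) return NULL;
    while (isdigit((unsigned char)*p)) p++;
    return p;
}

/* Skips IP_ADRESS, one or more tabs, then PORT.
   Returns the position after PORT, or NULL on mismatch. */
static const char *skip_ip_and_port(const char *p)
{
    for (int i = 0; i < 4; i++) {
        if ((p = skip_ip_number(p)) == NULL) return NULL;
        if (i < 3 && *p++ != '.') return NULL;
    }
    if (*p != '\t') return NULL;
    while (*p == '\t') p++;
    if (!isdigit((unsigned char)*p)) return NULL;
    while (isdigit((unsigned char)*p)) p++;
    return p;
}

enum line_kind classify_line(const char *line)
{
    const char *p = line;

    if (*p == '\n') return LINE_EMPTY;
    if (*p == ' ' || *p == '\t') {
        while (*p == ' ' || *p == '\t') p++;
        if (*p == '\n') return LINE_BLANK;
        /* DST_CONFIG: leading tabs only, then address and port. */
        p = line;
        while (*p == '\t') p++;
        if (p == line) return LINE_ERROR;
        p = skip_ip_and_port(p);
        return (p != NULL && *p == '\n') ? LINE_DST : LINE_ERROR;
    }
    p = skip_ip_and_port(line);
    return (p != NULL && *p == '\n') ? LINE_SRC : LINE_ERROR;
}
```

Compare this with the handful of definitions in loadb.lex: every new line format means more of this code to write and debug.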


In the following code I show how to assign an input stream (yyin, a FILE *) to the parser. This way, the parser will scan the file associated with yyin. I also show how to reinitialize the parser with yyrestart() (the parser is NOT reentrant).



loadb.c

#include <stdio.h>

extern int yylex (void);
extern FILE *yyin;
extern void yyrestart (FILE *input_file);

int main (int argc, char *argv[])
{
  int parser_status, error;

  /********************************************************/
  /** Check command line arguments **/
  /********************************************************/

  if (argc != 2)
  {
    fprintf (stderr, "\nUsage: %s config_file\n", argv[0]);
    return 1;
  }

  /********************************************************/
  /** Open file for the flex parser **/
  /********************************************************/

  yyin = fopen (argv[1], "r");
  if (yyin == NULL)
  {
    fprintf (stderr, "\nCan not open file %s\n", argv[1]);
    return 1;
  }

  /********************************************************/
  /** Call the flex parser using 'yyin' as input */
  /********************************************************/

  parser_status = yylex();
  fclose (yyin);

  switch (parser_status)
  {
    case 0: {
              fprintf (stdout, "\n-- OK --\n");
              error = 0; }; break;
    case 1: {
              fprintf (stderr, "Syntax error in the configuration file\n");
              error = 1; }; break;
    default:{
              fprintf (stderr, "Internal error (unexpected return value from the parser)\n");
              error = 3; };
  }

  /********************************************************/
  /** Parse the same file again, after yyrestart()      **/
  /********************************************************/

  yyin = fopen (argv[1], "r");
  if (yyin == NULL)
  {
    fprintf (stderr, "\nCan not open file %s\n", argv[1]);
    return 1;
  }

  yyrestart(yyin);

  parser_status = yylex();
  fclose (yyin);

  switch (parser_status)
  {
    case 0: {
              fprintf (stdout, "\n-- OK --\n");
              error = 0; }; break;
    case 1: {
              fprintf (stderr, "Syntax error in the configuration file\n");
              error = 1; }; break;
    default:{
              fprintf (stderr, "Internal error (unexpected return value from the parser)\n");
              error = 3; };
  }

  return error;
}


Makefile

FLEX = flex
CC = gcc
CFLAGS =

all: loadb

lex.yy.c: loadb.lex
	${FLEX} loadb.lex

lex.yy.o: lex.yy.c
	${CC} ${CFLAGS} -c lex.yy.c

loadb: loadb.c lex.yy.o
	${CC} ${CFLAGS} -o loadb loadb.c lex.yy.o -lfl

clean:
	rm -f *.o lex.yy.c loadb

Parsing from a socket descriptor

This simple program is a network server. It opens a socket on a given port number and listens for connection requests. Once a connection is established, the program parses the data read from the socket, using a flex parser.

You can launch the server, connect to it using telnet, and then send data to the server. The server prints every tag of the form "{TOTO1}". For example, if you send the line "Test of ${TAG1} to flex", the server prints "tag: {TAG1}" (the '$' is ignored like any other character).
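The TAG definition used by the server, \{{CARACTER}+\}, is simple enough to express by hand, which makes it a good way to check what the scanner should report. The following function is hypothetical (not part of the server): it finds the leftmost substring that the TAG rule would match.

```c
#include <ctype.h>
#include <stddef.h>

/* Returns 1 if c belongs to the CARACTER class: letters, digits, '_' or '-'.
   (isalnum() matches exactly [A-Za-z0-9] only in the "C" locale.) */
static int is_tag_char(char c)
{
    return isalnum((unsigned char)c) || c == '_' || c == '-';
}

/* Hypothetical helper: finds the leftmost substring of 'text' matching
   \{[A-Za-z0-9_-]+\}. On success, stores its start and length and
   returns 1; returns 0 if the text contains no tag. */
int find_tag(const char *text, const char **start, size_t *len)
{
    for (const char *p = text; *p != '\0'; p++) {
        if (*p != '{')
            continue;
        const char *q = p + 1;
        while (is_tag_char(*q))
            q++;
        /* At least one tag character, then the closing brace. */
        if (q > p + 1 && *q == '}') {
            *start = p;
            *len = (size_t)(q - p) + 1;
            return 1;
        }
    }
    return 0;
}
```

On "Test of ${TAG1} to flex", find_tag() reports the 6 characters "{TAG1}", which is what the server prints after "tag: ".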

fd_scan.lex

%{

#include <stdio.h>
#include <sys/types.h>
#include <unistd.h>






int hashwrap() { return 1; }

/* ------------------------------------------------ */
/* This global variable represents the socket des- */
/* -criptor used by the parser as input. */
/* ------------------------------------------------ */

extern int sockin;

/* ------------------------------------------------ */
/* Instead of reading data from a file, we redefine */
/* YY_INPUT, so we read data from the socket des- */
/* -criptor 'sockin'. */
/* ------------------------------------------------ */

#define YY_INPUT(buf,result,max_size) \
{ \
  ssize_t nb_bytes; \
                                                                   \
  nb_bytes = read(sockin, (void*)buf, (size_t)max_size); \
                                                                   \
  if (nb_bytes <= 0) { result = YY_NULL; } /* 0: EOF, -1: error */ \
  else { result = nb_bytes; } \
}


%}

ALPHA ([A-Z]|[a-z])
NUMERIC [0-9]
ALPHANUM ({ALPHA}|{NUMERIC})
CARACTER ({ALPHANUM}|[_\-])
TAG \{{CARACTER}+\}

%%

{TAG} {
          fprintf (stdout, "\ntag: %s", yytext);
          fflush (stdout);
        }
\n { /* Do nothing - ignore input */ }
. { /* Do nothing - ignore input */ }

%%
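The YY_INPUT macro above can also delegate to an ordinary function, which is easier to read and to test. This is a sketch under my own assumptions: read_chunk() is a hypothetical helper that keeps the macro's contract (0, i.e. YY_NULL, on end of file or error) and additionally retries a read() interrupted by a signal (EINTR), which the original macro does not do.

```c
#include <errno.h>
#include <sys/types.h>
#include <unistd.h>

/* Hypothetical helper for YY_INPUT: reads at most max_size bytes from fd.
   Returns the number of bytes read, or 0 on end of file or error, since
   flex treats 0 (YY_NULL) as the end of the input. */
size_t read_chunk(int fd, char *buf, size_t max_size)
{
    ssize_t n;

    do {
        n = read(fd, buf, max_size);
    } while (n == -1 && errno == EINTR);   /* retry if interrupted by a signal */

    return n > 0 ? (size_t)n : 0;
}
```

The macro then reduces to: #define YY_INPUT(buf,result,max_size) { result = read_chunk(sockin, buf, max_size); }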


fltest.c

/* -------------------------------------------------------------- */
/* This simple program illustrates the use of flex to parse data */
/* from a socket descriptor. */
/* -------------------------------------------------------------- */

#include <stdio.h>
#include <stdlib.h>
#include <unistd.h>
#include <string.h>
#include <sys/types.h>
#include <sys/socket.h>
#include <netdb.h>
#include <netinet/in.h>
#include <sys/param.h>
#include <arpa/inet.h>
#include <errno.h>







extern int hashlex();
int sockin;



int main (int argc, char *argv[])
{
   int server_socket, i, port, res;
   socklen_t longueur; /* accept() expects a socklen_t, not an int */
   struct sockaddr_in server_address, client_address;


   /* ----------------------------------------------------------- */
   /* Check command line */
   /* ----------------------------------------------------------- */

   if (argc != 2)
   {
     fprintf (stderr, "\n\nUsage: fltest <port number>\n\n");
     return 1;
   }

   /* ----------------------------------------------------------- */
   /* configuration */
   /* ----------------------------------------------------------- */

   port = atoi(argv[1]);

   /* ----------------------------------------------------------- */
   /* Open server socket */
   /* ----------------------------------------------------------- */

   server_socket = socket (AF_INET, SOCK_STREAM, 0);
   if (server_socket == -1)
   {
     fprintf (stderr, "\n\nERROR: Can not create socket - %s\n\n", strerror(errno));
     return 1;
   }

   /* ----------------------------------------------------------- */
   /* Bind socket to internet address */
   /* ----------------------------------------------------------- */

   server_address.sin_family = AF_INET;
   server_address.sin_port = htons((unsigned short)port);
   server_address.sin_addr.s_addr = INADDR_ANY;

   for (i=0; i<8; i++) { server_address.sin_zero[i] = 0; }

   if (bind (server_socket, (struct sockaddr*)&server_address, sizeof(struct sockaddr_in)) == -1)
   {
     fprintf (stderr, "\n\nERROR: Can not bind socket to Internet address - %s\n\n", strerror(errno));
     return 1;
   }

   /* ----------------------------------------------------------- */
   /* Configure this socket, so it will listen for connection */
   /* requests. */
   /* ----------------------------------------------------------- */

   if (listen (server_socket, 50) == -1)
   {
     fprintf (stderr, "\n\nERROR: System call listen() failed - %s.\n\n", strerror(errno));
     return 1;
   }

   /* ----------------------------------------------------------- */
   /* Now wait for connections */
   /* ----------------------------------------------------------- */

   longueur = sizeof(client_address);

   sockin = accept (server_socket, (struct sockaddr*)&client_address, &longueur);

   if (sockin == -1)
   {
     fprintf (stderr, "\n\nERROR: System call accept() failed - %s\n\n", strerror(errno));
     return 1;
   }

   /* ----------------------------------------------------------- */
   /* Parse socket's stream */
   /* ----------------------------------------------------------- */

   res = hashlex();


   fprintf (stdout, "\n\nThe parser returned the value %d\n\n", res);
   fflush (stdout);

   return 0;
}


script used to compile the program

rm -f fd_scan.lex.c

flex -Phash -ofd_scan.lex.c fd_scan.lex
gcc -Wall -c fd_scan.lex.c -o fd_scan.lex.o
gcc -Wall -o fltest fltest.c fd_scan.lex.o -lfl


Parsing in-memory buffers

This simple test program illustrates the use of flex to:

  • Parse an in-memory buffer.
  • Mix two flex parsers in the same program.


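This program extracts tags that look like "{TOTO}". The two scanners can coexist in the same program because each one is generated with its own prefix (flex -Phash and -Phash1), which renames yylex(), yytext and the scanner's other global symbols (hashlex(), hash1lex(), ...).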


hasher.lex

%{

#include <stdio.h>

int hashwrap() { return 1; }

extern int hashlex();

int yylex_memory (char *src)
{
  int result;
  YY_BUFFER_STATE buf_state;

  buf_state = yy_scan_string(src);

  /* No rule returns a value: hashlex() returns 0 at the end of the buffer. */
  while ((result = hashlex()) != 0) { }

  yy_delete_buffer(buf_state);

  return result;
}



%}

ALPHA ([A-Z]|[a-z])
NUMERIC [0-9]
ALPHANUM ({ALPHA}|{NUMERIC})
CARACTER ({ALPHANUM}|[_\-])
TAG \{{CARACTER}+\}

%%

{TAG} { fprintf (stdout, "\ntag: %s", yytext); }
\n
.

%%


hasher1.lex

%{

#include <stdio.h>

int hash1wrap() { return 1; }

extern int hash1lex();

int yylex_memory1 (char *src)
{
  int result;
  YY_BUFFER_STATE buf_state;

  buf_state = yy_scan_string(src);

  /* No rule returns a value: hash1lex() returns 0 at the end of the buffer. */
  while ((result = hash1lex()) != 0) { }

  yy_delete_buffer(buf_state);

  return result;
}



%}

ALPHA ([A-Z]|[a-z])
NUMERIC [0-9]
ALPHANUM ({ALPHA}|{NUMERIC})
CARACTER ({ALPHANUM}|[_\-])
TAG \{{CARACTER}+\}

%%

{TAG} { fprintf (stdout, "\ntag: %s", yytext); }
\n
.

%%


test_hasher.c

#include <stdio.h>

extern int yylex_memory (char *src);
extern int yylex_memory1 (char *src);


static char buff1[]="A ${TAG1} B ${TAG2} C\nA ${TAG1} B ${TAG2} C\n";
static char buff2[]="A ${TAG3} B ${TAG4} C\nA ${TAG5} B ${TAG6} C\n";


int main()
{
  int res;

  res = yylex_memory(buff1);
  fprintf (stdout, "\n\nyylex returned value %d\n\n", res);

  res = yylex_memory1(buff1);
  fprintf (stdout, "\n\nyylex returned value %d\n\n", res);

  res = yylex_memory(buff2);
  fprintf (stdout, "\n\nyylex returned value %d\n\n", res);

  res = yylex_memory1(buff2);
  fprintf (stdout, "\n\nyylex returned value %d\n\n", res);

  return res;
}


script used to compile the test program

rm -f *.o hasher.lex.c hasher1.lex.c
flex -Phash -ohasher.lex.c hasher.lex
flex -Phash1 -ohasher1.lex.c hasher1.lex
gcc -Wall -c hasher.lex.c -o hasher.lex.o
gcc -Wall -c hasher1.lex.c -o hasher1.lex.o
gcc -Wall -o hasher.test test_hasher.c hasher.lex.o hasher1.lex.o -lfl


Some rules you should keep in mind

  • If a start condition is inclusive (declared with %s), then rules with no start conditions at all will also be active.
  • If a start condition is exclusive (declared with %x), then only rules qualified with the start condition will be active.
  • Start conditions are not active at startup: the scanner starts in the INITIAL condition, and a start condition becomes active only when you call BEGIN(condition).
  • An indented BEGIN(COMMAND) placed before the first rule says that we start the parser in the COMMAND start condition. But this BEGIN is executed whenever you call yylex(), not only the first time!
  • If the parser finds more than one match, it takes the one matching the most text.
  • If it finds two or more matches of the same length, the rule listed first in the flex input file is chosen.
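The last two rules can be modeled in a few lines of C. This only illustrates the selection policy, not how flex works internally (flex compiles all the rules into a single automaton); choose_rule() is a hypothetical function.

```c
#include <stddef.h>

/* Illustration only. match_len[i] is the length of the text that rule i
   (rules numbered in the order of the flex input file) could match at the
   current position, or -1 if it cannot match. Returns the index of the rule
   flex would choose, or -1 if no rule matches. */
int choose_rule(const int *match_len, size_t nrules)
{
    int best = -1;

    for (size_t i = 0; i < nrules; i++) {
        /* Strictly greater: a later rule never displaces an earlier
           rule that matches the same amount of text. */
        if (match_len[i] >= 0 && (best == -1 || match_len[i] > match_len[best]))
            best = (int)i;
    }
    return best;
}
```

With match lengths {4, 9, 9, 2}, the function returns 1: rules 1 and 2 both match the most text, and rule 1 is listed first.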

The following example shows how to use start conditions. It is a fake SMTP server that prints the DATA section of an e-mail. Please note that the header file "dstring.h" refers to a very simple API for dynamic string management.
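To make the example easier to follow, here is a hypothetical implementation of such an API. The real dstring.h is not shown in these notes: the signatures below are only inferred from how smtpd.lex and smtpd.c call it (functions return 1 on allocation failure, and dstring_get_data() returns a newly allocated copy that the caller frees).

```c
#include <stdlib.h>
#include <string.h>

/* Hypothetical dynamic string, inferred from its use in smtpd. */
typedef struct {
    char  *buf;       /* stored bytes (not NUL-terminated) */
    size_t len;       /* number of bytes used */
    size_t capacity;  /* number of bytes allocated */
} dstring;

/* Returns 0 on success, 1 if memory cannot be allocated. */
int dstring_init(dstring *s, size_t initial_size)
{
    if (initial_size == 0) initial_size = 1;
    s->buf = malloc(initial_size);
    s->len = 0;
    s->capacity = initial_size;
    return s->buf == NULL ? 1 : 0;
}

/* Appends n bytes, doubling the capacity as needed.
   Returns 0 on success, 1 if out of memory. */
int dstring_add(dstring *s, const char *data, size_t n)
{
    if (s->len + n > s->capacity) {
        size_t cap = s->capacity ? s->capacity : 1;
        while (cap < s->len + n) cap *= 2;
        char *p = realloc(s->buf, cap);
        if (p == NULL) return 1;
        s->buf = p;
        s->capacity = cap;
    }
    memcpy(s->buf + s->len, data, n);
    s->len += n;
    return 0;
}

/* Returns a newly allocated NUL-terminated copy (freed by the caller)
   and stores its length in *size; returns NULL if out of memory. */
char *dstring_get_data(const dstring *s, size_t *size)
{
    char *copy = malloc(s->len + 1);
    if (copy == NULL) return NULL;
    memcpy(copy, s->buf, s->len);
    copy[s->len] = '\0';
    *size = s->len;
    return copy;
}

/* Releases the memory owned by the dynamic string. */
void dstring_free(dstring *s)
{
    free(s->buf);
    s->buf = NULL;
    s->len = s->capacity = 0;
}
```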

smtpd.lex

%{

/* ------------------------------------------------ */
/* SMTP parser */
/* ------------------------------------------------ */

#include <stdio.h>
#include <sys/types.h>
#include <unistd.h>
#include <string.h>
#include "dstring.h"
#include "smtpd.h"



int hashwrap() { return 1; }

/* ------------------------------------------------ */
/* This global variable represents the socket des- */
/* -criptor used by the parser as input. */
/* ------------------------------------------------ */

extern int sockin;

/* ------------------------------------------------ */
/* Dynamic string used to store the data section of */
/* the e-mail. This variable MUST be initialized */
/* before you call the parser. */
/* ------------------------------------------------ */

extern dstring data;

/* ------------------------------------------------ */
/* Instead of reading data from a file, we redefine */
/* YY_INPUT, so we read data from the socket des- */
/* -criptor 'sockin'. */
/* ------------------------------------------------ */

#define YY_INPUT(buf,result,max_size) \
{ \
  ssize_t nb_bytes; \
                                                                   \
  nb_bytes = read(sockin, (void*)buf, (size_t)max_size); \
                                                                   \
  if (nb_bytes <= 0) { result = YY_NULL; } /* 0: EOF, -1: error */ \
  else { result = nb_bytes; } \
}

/* ------------------------------------------------ */
/* The following section defines the actions to be */
/* executed for each part of the SMTP stream. */
/* */
/* Notes about start conditions: */
/* */
/* o If the start condition is inclusive, then */
/* rules with no start conditions at all will */
/* also be active. */
/* o If it is exclusive, then only rules qualified */
/* with the start condition will be active. */
/* o Start conditions are not active at startup */
/* (the scanner starts in INITIAL). They become */
/* active when you call BEGIN(condition). */
/* o An indented BEGIN placed before the first rule */
/* is executed whenever you call yylex() !!! Not */
/* only the first time. */
/* */
/* Notes about rules matching: */
/* */
/* o If it finds more than one match, it takes the */
/* one matching the most text. */
/* o If it finds two or more matches of the same */
/* length, the rule listed first in the flex */
/* input file is chosen. */
/* ------------------------------------------------ */

%}

CRLF \r\n
END_OF_DATA ^\.{CRLF}

%x DATA_PART

%%

  
    /* You could add the following line here to tell the parser   */
    /* (yylex()) to start in a given start condition:              */
    /*                                                             */
    /*     BEGIN(start_condition);                                 */
    /*                                                             */
    /* Note that the line must start with spaces: otherwise flex   */
    /* takes it for a rule definition.                             */
  


^(HELO|EHLO).*{CRLF} {
                       fprintf (stdout, "\n>> HELO");
                       fflush (stdout);
                       return SMTP_HELO;
                     }

^QUIT{CRLF} {
              fprintf (stdout, "\n>> QUIT");
              fflush (stdout);
              return SMTP_QUIT;
            }

^DATA{CRLF} {
              fprintf (stdout, "\n>> DATA");
              fflush (stdout);
              BEGIN(DATA_PART);
              return SMTP_DATA;
            }

^.+{CRLF} {
            fprintf (stdout, "\n>> OTHER");
            fflush (stdout);
            return SMTP_OTHER;
          }

<DATA_PART>{END_OF_DATA} {
                           if (dstring_add (&data, yytext, strlen(yytext)) == 1) { return SMTP_OUT_OF_MEM; }
                           BEGIN(INITIAL);
                           return SMTP_END_OF_DATA;
                         }

<DATA_PART>^.*\n {
                   if (dstring_add (&data, yytext, strlen(yytext)) == 1) { return SMTP_OUT_OF_MEM; }
                 }


%%
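To make the rule ordering concrete, here is a hypothetical plain-C function that gives the same answer as the INITIAL-state rules above for one complete line. It is an illustration, not part of the server; the return codes are copied from smtpd.h.

```c
#include <string.h>

/* Return codes copied from smtpd.h. */
#define SMTP_HELO  1
#define SMTP_DATA  2
#define SMTP_QUIT  3
#define SMTP_OTHER 5

/* Hypothetical mirror of the INITIAL-state rules of smtpd.lex, applied to
   one complete line ending in "\r\n". Returns 0 if no rule matches. */
int classify_smtp_line(const char *line)
{
    size_t n = strlen(line);

    /* Every rule needs at least one character before the final CRLF. */
    if (n < 3 || line[n - 2] != '\r' || line[n - 1] != '\n')
        return 0;

    /* Each applicable rule matches the whole line, i.e. the same length,
       so the rule listed first in smtpd.lex wins. */
    if (strncmp(line, "HELO", 4) == 0 || strncmp(line, "EHLO", 4) == 0)
        return SMTP_HELO;
    if (strcmp(line, "QUIT\r\n") == 0)
        return SMTP_QUIT;
    if (strcmp(line, "DATA\r\n") == 0)
        return SMTP_DATA;
    return SMTP_OTHER;
}
```

For a line such as "QUIT\r\n", both ^QUIT{CRLF} and ^.+{CRLF} match 6 characters, so the rule listed first wins; this is why the catch-all rule must come after the specific commands.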


smtpd.c

/* -------------------------------------------------------------- */
/* This simple program illustrates the use of flex to parse data */
/* from a socket descriptor. */
/* -------------------------------------------------------------- */

#include <stdio.h>
#include <stdlib.h>
#include <unistd.h>
#include <string.h>
#include <sys/types.h>
#include <sys/socket.h>
#include <netdb.h>
#include <netinet/in.h>
#include <sys/param.h>
#include <arpa/inet.h>
#include <errno.h>

#include "dstring.h"
#include "smtpd.h"


#define DYNAMIC_STRING_INIT_SIZE 2048



#define CONNEXION_RESPONSE 0
#define HELO_RESPONSE 1
#define OK_RESPONSE 2
#define DATA_RESPONSE 3
#define QUIT_RESPONSE 4

char *responses[] = {
                      "220 newsletter.tiscali.fr Simple Mail Transfer Service Ready\r\n",
                      "250 newsletter.tiscali.fr\r\n",
                      "250 OK\r\n",
                      "354 Start mail input; end with <CRLF>.<CRLF>\r\n",
                      "221 newsletter.tiscali.fr Service closing transmission channel\r\n"
                    };




/* -------------------------------------------------------------- */
/* The following function is the flex parser itself */
/* -------------------------------------------------------------- */

extern int hashlex();

/* -------------------------------------------------------------- */
/* These 2 variables are used by the flex parser */
/* -------------------------------------------------------------- */

int sockin;
dstring data;

/* -------------------------------------------------------------- */
/* Main entry point for the SMTP daemon */
/* -------------------------------------------------------------- */

int main (int argc, char *argv[])
{
   int server_socket, i, port, res;
   socklen_t longueur; /* accept() expects a socklen_t, not an int */
   struct sockaddr_in server_address, client_address;
   char *pt;
   size_t size;


   /* ----------------------------------------------------------- */
   /* Check command line */
   /* ----------------------------------------------------------- */

   if (argc != 2)
   {
     fprintf (stderr, "\n\nUsage: %s <port number>\n\n", argv[0]);
     return 1;
   }

   /* ----------------------------------------------------------- */
   /* configuration */
   /* ----------------------------------------------------------- */

   port = atoi(argv[1]);

   /* ----------------------------------------------------------- */
   /* Open server socket */
   /* ----------------------------------------------------------- */

   server_socket = socket (AF_INET, SOCK_STREAM, 0);
   if (server_socket == -1)
   {
     fprintf (stderr, "\n\nERROR: Can not create socket - %s\n\n", strerror(errno));
     return 1;
   }

   /* ----------------------------------------------------------- */
   /* Bind socket to internet address */
   /* ----------------------------------------------------------- */

   server_address.sin_family = AF_INET;
   server_address.sin_port = htons((unsigned short)port);
   server_address.sin_addr.s_addr = INADDR_ANY;

   for (i=0; i<8; i++) { server_address.sin_zero[i] = 0; }

   if (bind (server_socket, (struct sockaddr*)&server_address, sizeof(struct sockaddr_in)) == -1)
   {
     fprintf (stderr, "\n\nERROR: Can not bind socket to Internet address - %s\n\n", strerror(errno));
     return 1;
   }

   /* ----------------------------------------------------------- */
   /* Configure this socket, so it will listen for connection */
   /* requests. */
   /* ----------------------------------------------------------- */

   if (listen (server_socket, 50) == -1)
   {
     fprintf (stderr, "\n\nERROR: System call listen() failed - %s.\n\n", strerror(errno));
     return 1;
   }

   /* ----------------------------------------------------------- */
   /* Now wait for connections */
   /* ----------------------------------------------------------- */

   longueur = sizeof(client_address);

   sockin = accept (server_socket, (struct sockaddr*)&client_address, &longueur);
   if (sockin == -1)
   {
     fprintf (stderr, "\n\nERROR: System call accept() failed - %s\n\n", strerror(errno));
     return 1;
   }

   /* ----------------------------------------------------------- */
   /* Send the initial welcome */
   /* ----------------------------------------------------------- */

   if (write (sockin, responses[CONNEXION_RESPONSE], strlen(responses[CONNEXION_RESPONSE])) == -1)
   {
     fprintf (stderr, "\n\nERROR: System call write() failed - %s\n\n", strerror(errno));
     return 1;
   }

   /* ----------------------------------------------------------- */
   /* Initialize the dynamic string */
   /* ----------------------------------------------------------- */

   if (dstring_init (&data, DYNAMIC_STRING_INIT_SIZE) == 1)
   {
     fprintf (stderr, "\n\nERROR: Can not allocate memory!\n\n");
     return 1;
   }

   /* ----------------------------------------------------------- */
   /* Parse socket's stream */
   /* ----------------------------------------------------------- */

   do {
        res = hashlex();

        switch (res)
        {
          case SMTP_HELO:
               {
                 if (write (sockin, responses[HELO_RESPONSE], strlen(responses[HELO_RESPONSE])) == -1)
                 {
                   fprintf (stderr, "\n\nERROR: System call write() failed - %s\n\n", strerror(errno));
                   return 1;
                 }
               }; break;

          case SMTP_QUIT:
               {
                 if (write (sockin, responses[QUIT_RESPONSE], strlen(responses[QUIT_RESPONSE])) == -1)
                 {
                   fprintf (stderr, "\n\nERROR: System call write() failed - %s\n\n", strerror(errno));
                   return 1;
                 }
                 close (sockin);
               }; break;

          case SMTP_DATA:
               {
                 if (write (sockin, responses[DATA_RESPONSE], strlen(responses[DATA_RESPONSE])) == -1)
                 {
                   fprintf (stderr, "\n\nERROR: System call write() failed - %s\n\n", strerror(errno));
                   return 1;
                 }
               }; break;

          case SMTP_END_OF_DATA:
               {
                 if (write (sockin, responses[OK_RESPONSE], strlen(responses[OK_RESPONSE])) == -1)
                 {
                   fprintf (stderr, "\n\nERROR: System call write() failed - %s\n\n", strerror(errno));
                   return 1;
                 }
               }; break;

          case SMTP_OTHER:
               {
                 if (write (sockin, responses[OK_RESPONSE], strlen(responses[OK_RESPONSE])) == -1)
                 {
                   fprintf (stderr, "\n\nERROR: System call write() failed - %s\n\n", strerror(errno));
                   return 1;
                 }
               }; break;

          case SMTP_OUT_OF_MEM:
               {
                 fprintf (stderr, "\n\nERROR: Can not allocate memory!\n\n");
                 return 1;
               }
        }
      }
      while (res != 0);


   fprintf (stdout, "\n\nThe parser returned the value %d\n\n", res);
   fflush (stdout);

   /* ----------------------------------------------------------- */
   /* Print out the data section */
   /* ----------------------------------------------------------- */

   pt = dstring_get_data (&data, &size);
   if (pt == NULL)
   {
     fprintf (stderr, "\n\nERROR: Can not allocate memory!\n\n");
     return 1;
   }

   fprintf (stdout, "\n\nDATA:\n\n%s\n\n", pt);

   /* ----------------------------------------------------------- */
   /* Free all allocated memory */
   /* ----------------------------------------------------------- */

   dstring_free (&data);
   free (pt);

   return 0;
}


smtpd.h

/* ------------------------------------------------ */
/* Header file for the SMTP daemon */
/* ------------------------------------------------ */

#ifndef SMTPD_HEADER
#define SMTPD_HEADER

  #define SMTP_HELO 1
  #define SMTP_DATA 2
  #define SMTP_QUIT 3
  #define SMTP_END_OF_DATA 4
  #define SMTP_OTHER 5
  #define SMTP_OUT_OF_MEM 6

#endif


Makefile



COMMON_C_LIBS_HEADERS = /home/users/dbeurive/common_c_libs/includes
COMMON_C_LIBS_LIBS = /home/users/dbeurive/common_c_libs/libs

CCFLAGS = -Wall -I${COMMON_C_LIBS_HEADERS}
LDFLAGS = -L${COMMON_C_LIBS_LIBS}
CC = gcc
FLEX = flex

all: smtpd

smtpd.lex.c: smtpd.lex
	${FLEX} -Phash -osmtpd.lex.c smtpd.lex

smtpd.lex.o: smtpd.lex.c
	${CC} ${CCFLAGS} -c smtpd.lex.c -o $@

smtpd: smtpd.c \
       smtpd.lex.o
	${CC} ${CCFLAGS} -o $@ smtpd.c smtpd.lex.o ${LDFLAGS} -lfl -lmy_dstring


clean:
	rm -f smtpd.lex.c
	rm -f smtpd.lex.o
	rm -f smtpd


Using Flex to generate C++ parser

The following two examples show how to create C++ parsers for:

  • File parsing.
  • In-memory parsing.


By default, C Flex parsers store their internal state in global variables, which persist between two calls to yylex(). Therefore you cannot use the same parser to parse two input streams at the same time.

With C++ Flex parsers, these state variables are members of the scanner class (yyFlexLexer). Therefore each instance of the parser gets its own copy of them.
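The difference is easy to illustrate without Flex. In the sketch below (hypothetical code, not generated by Flex), the scanner state lives in a struct owned by the caller, like the members of a yyFlexLexer object, so two inputs can be scanned in an interleaved fashion.

```c
#include <ctype.h>
#include <stddef.h>

/* All the scanner's state: one instance per input. */
struct word_scanner {
    const char *pos;   /* next character to examine */
};

void word_scanner_init(struct word_scanner *sc, const char *text)
{
    sc->pos = text;
}

/* Copies the next whitespace-separated word into 'out' (truncated to
   outsz - 1 characters) and returns its full length, or 0 at end of input. */
size_t next_word(struct word_scanner *sc, char *out, size_t outsz)
{
    while (isspace((unsigned char)*sc->pos))
        sc->pos++;
    const char *start = sc->pos;
    while (*sc->pos != '\0' && !isspace((unsigned char)*sc->pos))
        sc->pos++;
    size_t len = (size_t)(sc->pos - start);
    if (outsz > 0) {
        size_t n = len < outsz - 1 ? len : outsz - 1;
        for (size_t i = 0; i < n; i++)
            out[i] = start[i];
        out[n] = '\0';
    }
    return len;
}
```

For C scanners, Flex offers the same kind of solution with %option reentrant: the state is then kept in a yyscan_t object created by yylex_init() and passed to yylex().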

file.lex

%{

#include <iostream>

using namespace std;

%}

ALPHA ([A-Z]|[a-z])
NUMERIC [0-9]
ALPHANUM ({ALPHA}|{NUMERIC})
CARACTER ({ALPHANUM}|[_\-])
TAG \$\{{CARACTER}+\}

%%

{TAG} {
          cout << endl << "tag: " << YYText() << endl;
        }

\n
.

%%


file.cc

// -------------------------------------------------------------------------
// This program shows how to interface a C++ generated Flex parser in order
// to parse the content of a file.
//
// The key point is that the parser's constructor (yyFlexLexer(...)) takes
// a pointer to an 'istream' object as input for the parser.
//
// yyFlexLexer( istream* arg_yyin = 0, ostream* arg_yyout = 0 )
//
// You should first create an 'ifstream' object in order to associate an
// input stream to a given file (the file to parse). Then you create an
// 'istream' object from the 'streambuf' object associated with the
// previously created 'ifstream' object.
//
// Keep in mind that the class 'ifstream' inherits (indirectly) from the
// class 'ios', which contains a pointer to the 'streambuf' object used by
// the object of the class 'ifstream'.
//
// The ifstream constructor (like ifstream::open()) associates a stream
// with a given input file.
// ios::fail() tests the status of the previous operation.
// ios::rdbuf() returns a pointer to the 'streambuf' object used by
// the stream associated with the input file.
// -------------------------------------------------------------------------


#include <fstream>
#include <iostream>
#include "FlexLexer.h"

using namespace std;




int main(int argc, char *argv[])
{

  // -----------------------------------------------------------------------
  // Check command line
  // -----------------------------------------------------------------------

  if (argc != 2)
  {
    cerr << endl << "Usage: " << argv[0] << " <file name>" << endl << endl;
    return 1;
  }

  // -----------------------------------------------------------------------
  // Open input file
  // -----------------------------------------------------------------------

  ifstream input_file(argv[1], ios::in);

  if (input_file.fail())
  {
    cerr << endl << "ERROR: Can not open file " << argv[1] << endl;
    return 1;
  }

  // -----------------------------------------------------------------------
  // Create the input stream for the parser
  // -----------------------------------------------------------------------

  istream input_parser(input_file.rdbuf());

  // -----------------------------------------------------------------------
  // Create a parser
  // -----------------------------------------------------------------------

  yyFlexLexer parser(&input_parser);

  // -----------------------------------------------------------------------
  // Now parse the content of the input file
  // -----------------------------------------------------------------------

  parser.yylex();

  // -----------------------------------------------------------------------
  // Close input file
  // -----------------------------------------------------------------------

  input_file.close();

  return 0;
}


string.cc

// -------------------------------------------------------------------------
// This program shows how to interface a C++ generated Flex parser in order
// to parse an in-memory buffer.
//
// The key point is that the parser's constructor (yyFlexLexer(...)) takes
// a pointer to an 'istream' object as input for the parser.
//
// yyFlexLexer( istream* arg_yyin = 0, ostream* arg_yyout = 0 )
//
// You should first create an 'istringstream' object in order to associate
// an input stream with an in-memory buffer. Then you create an 'istream'
// object from the 'streambuf' object associated with the previously
// created 'istringstream' object.
//
// ios::rdbuf() returns a pointer to the 'streambuf' object used by the
// stream associated with the in-memory buffer.
// -------------------------------------------------------------------------

#include <sstream>
#include <iostream>
#include "FlexLexer.h"

using namespace std;


static char TO_PARSE[] = "This is a test ${TEST1}\n${TEST2}\n";



int main(int argc, char *argv[])
{

  // -----------------------------------------------------------------------
  // Create a string stream
  // -----------------------------------------------------------------------

  istringstream mem_buff(TO_PARSE);

  // -----------------------------------------------------------------------
  // Create the 'istream' object
  // -----------------------------------------------------------------------

  istream mem_input(mem_buff.rdbuf());

  // -----------------------------------------------------------------------
  // Create a parser
  // -----------------------------------------------------------------------

  yyFlexLexer parser(&mem_input);

  // -----------------------------------------------------------------------
  // Now parse the content of the in-memory buffer
  // -----------------------------------------------------------------------

  parser.yylex();

  return 0;
}


Makefile

##############################################################
# Tools configuration #
##############################################################

CC = g++
CCFLAGS =
FLEX = flex
FLEXFLAGS = -+

##############################################################
# Default target: build both programs #
##############################################################

all: pfile pstring

##############################################################
# Compile the flex parser #
##############################################################

file.lex.cc: file.lex
	${FLEX} ${FLEXFLAGS} -ofile.lex.cc file.lex

file.lex.o: file.lex.cc
	${CC} ${CCFLAGS} -c file.lex.cc

##############################################################
# Build the file parser #
##############################################################

pfile: file.cc \
        file.lex.o
	${CC} ${CCFLAGS} -o $@ file.cc file.lex.o -lfl

##############################################################
# Build the in-memory parser #
##############################################################

pstring: string.cc \
         file.lex.o
	${CC} ${CCFLAGS} -o $@ string.cc file.lex.o -lfl

##############################################################
# Clean up #
##############################################################

clean:
	rm -f file.lex.cc
	rm -f file.lex.o
	rm -f pfile
	rm -f pstring


Handling UNIX and DOS newlines

  • Under UNIX, the end of line is represented by the character "line feed" (ASCII code 10 in decimal).
  • Under MS-DOS (or Windows), the end of line is represented by the sequence of characters "carriage return" (ASCII code 13 in decimal) followed by "line feed" (ASCII code 10 in decimal).
With Flex, you can express this by:


LF \x0A
CR \x0D
NL {CR}?{LF}

NL handles both UNIX and DOS end of lines.
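The same idea applies when you strip line ends in C, for instance in an action helper such as chomp() in loadb.lex. The sketch below is hypothetical: strip_eol() removes either a UNIX or a DOS end of line, just as {CR}?{LF} matches either form.

```c
#include <string.h>

/* Removes a trailing end of line from 'line', in either UNIX form ("\n")
   or DOS form ("\r\n"). A lone "\r" is left untouched.
   Returns the new length of the string. */
size_t strip_eol(char *line)
{
    size_t n = strlen(line);

    if (n > 0 && line[n - 1] == '\n') {
        line[--n] = '\0';
        if (n > 0 && line[n - 1] == '\r')
            line[--n] = '\0';
    }
    return n;
}
```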