|
Notes about FORMS and CGI scripts
First you need a web server. Which one to choose? The answer is obvious: Apache.
Why Apache?
- Almost everybody uses Apache.
- The documentation about Apache is VERY easy to find.
- Apache is THE reference.
- Apache is FREE !
- You can find Apache on almost every OS (including MS Windows 95/98/NT!)
Then you need an OS and any modern programming language. Personally I use C and
Perl.
For the OS I use Linux.
Why Linux?
- Because Linux is VERY efficient.
- Because on Linux you get C and Perl for free, along with almost any other programming language.
- Because Apache runs very well on Linux.
- Because you can find Netscape navigator for Linux.
- Because Linux offers you great graphical interfaces.
- Linux is free.
But if you are addicted to Windows, you can find Apache for Windows (for free) and GCC for DOS
(for free and with the standard libraries!). But if you want Perl for Windows ... you can
buy it.
Note: if you click on "Send" you won't get any result. To get a result,
you must have:
- Apache installed on your system (localhost).
- A CGI script named "cgi" in the "cgi-bin" directory of Apache.
Text area
Radio widget
Checkbox
Text Field Widget
Select Widget
Password Widget
Generalities
Client-server connection
URL encoding
When you press the "Send" button (TYPE="SUBMIT"), the HTTP client sends data to
the CGI script. The data sent is a character string formatted as follows:
name1=value1&name2=value2&...&nameN=valueN
With the POST method, the environment variable CONTENT_LENGTH contains the number of characters in this string.
|
- Each (name,value) pair is separated from the next by an &.
- All spaces (in the name or in the value) are replaced by "+" characters.
- Some special characters are replaced by their ASCII code (in hexadecimal).
The ASCII code is written "%XX", where each X is one of 0,1,2,3,4,5,6,7,8,9,A,B,C,D,E,F.
Two simple functions to "URL encode" and "URL decode" a string:
URL encoding |
#include <stdio.h>   /* sprintf() */
#include <stdlib.h>  /* malloc(), realloc(), free() */
#include <string.h>  /* strlen() */

/*****************************************************/
/* url_escape():                                     */
/*                                                   */
/* Perform URL encoding on a string.                 */
/*  o string: string you want to encode.             */
/*                                                   */
/* return a pointer to the URL encoded string or     */
/* NULL if error. Note that the returned string has  */
/* been **allocated** by the function. Therefore     */
/* you must free the memory.                         */
/*****************************************************/
char* url_escape(char *string)
{
    int alloc;
    char *ns;
    char *tmp;
    unsigned char in;
    int newlen;
    int index;

    newlen = alloc = (int)strlen(string)+1;
    ns = (char*) malloc((size_t)alloc);
    if (ns == NULL) { return NULL; }
    index = 0;
    while((*string) != 0)
    {
        in = *string;
        if(' ' == in) { ns[index++] = '+'; }
        else if(!(in >= 'a' && in <= 'z') &&
                !(in >= 'A' && in <= 'Z') &&
                !(in >= '0' && in <= '9'))
        {
            /****************************************************/
            /* encode it:                                       */
            /* the size grows by two, since this'll become a    */
            /* %XX.                                             */
            /****************************************************/
            newlen += 2;
            if(newlen > alloc)
            {
                alloc *= 2;
                /* keep the old pointer so it can be freed on failure */
                tmp = realloc(ns, (size_t)alloc);
                if (tmp == NULL) { free(ns); return NULL; }
                ns = tmp;
            }
            sprintf(&ns[index], "%%%02X", in);
            index+=3;
        }
        else
        { ns[index++]=in; }
        string++;
    }
    ns[index]=0;
    return ns;
}
|
URL decoding |
#include <ctype.h>   /* isxdigit() */
#include <stdio.h>   /* sscanf() */
#include <stdlib.h>  /* malloc() */
#include <string.h>  /* strlen() */

/*****************************************************/
/* url_unescape():                                   */
/*                                                   */
/* Perform URL decoding on a string.                 */
/*  o string: string you want to decode.             */
/*  o length: number of characters to decode, or 0   */
/*            to decode the whole string.            */
/*                                                   */
/* Note: a '+' becomes a space only after a '?' has  */
/* been met (see "querypart" below).                 */
/*                                                   */
/* return a pointer to the URL decoded string or     */
/* NULL if error. Note that the returned string has  */
/* been **allocated** by the function. Therefore     */
/* you must free the memory.                         */
/*****************************************************/
char* url_unescape(char *string, int length)
{
    int alloc;
    char *ns;
    unsigned char in;
    int index;
    unsigned int hex; /* "%X" expects a pointer to an unsigned int */
    char querypart; /* everything to the right of a '?' letter is
                       the "query part" where '+' should become ' '.
                       RFC 2316, section 3.10 */

    alloc = (length?length:(int)strlen(string))+1;
    ns = malloc((size_t)alloc);
    if (ns == NULL) { return NULL; }
    index = 0;
    querypart = 0;
    while(--alloc > 0)
    {
        in = *string;
        if ((querypart == 1) && ('+' == in)) { in = ' '; }
        else if ((querypart == 0) && ('?' == in))
        {
            /* we have "walked in" to the query part */
            querypart = 1;
        }
        else if('%' == in)
        {
            /* encoded part: decode only if two hexa digits follow */
            if(isxdigit((unsigned char)string[1]) &&
               isxdigit((unsigned char)string[2]) &&
               (sscanf(string+1, "%2X", &hex) == 1))
            {
                in = (unsigned char)hex;
                string+=2;
                alloc-=2;
            }
        }
        ns[index++] = in;
        string++;
    }
    ns[index]=0; /* terminate it */
    return ns;
}
|
The GET method
If you use the GET method to send the value list ("name=value") to the web server, the
CGI script reads it from the environment variable QUERY_STRING. The CGI script sends
the HTML code back via the standard output.
Let's look at the TCP message. It looks something like:
TCP dump of a "GET message" |
GET /cgi-bin/cgi?radio1=12ON&radio2=22ON HTTP/1.0
Accept: image/gif, */*
Accept-Language: fr
User-Agent: Mozilla/4.0 (compatible; MSIE 5.0; Windows NT; DigExt)
Host: 213.36.83.212
Connection: Keep-Alive
|
The POST method
If you use the POST method, the data is passed "from the FORM" (not exactly, but almost :) to the CGI
script via the standard input. The CGI script sends "the answer" (to the web server, not to the FORM of course) via the standard output.
Let's look at the TCP message. It looks something like:
TCP dump of a "POST message" |
POST /cgi-bin/cgi HTTP/1.0
Accept: */*
Accept-Language: fr
Content-Type: application/x-www-form-urlencoded
User-Agent: Mozilla/4.0 (compatible; MSIE 5.0; Windows NT; DigExt)
Host: 213.36.83.212
Content-Length: 28
Connection: Keep-Alive

form1=azerty+uiop&form2=toto
|
CGI output
The CGI script must tell the web server what type of data it sends (the answer). To send HTML data, print
the following string to the standard output (in C syntax):
Content-type: text/html\n\n
|
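Note that the two "new line" characters (\n\n) are necessary: the first one ends the header line and the second one produces the empty line that separates the headers from the data. If you leave them out, it won't work.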
OK, let's look at the TCP message sent from Apache to the HTTP client. It looks something like:
TCP dump of the answer from Apache |
HTTP/1.1 200 OK\r\n
Date: Fri, 29 Dec 2000 09:37:45 GMT\r\n
Server: Apache/1.3.12 (Unix) mod_perl/1.24 PHP/3.0.16\r\n
Connection: close\r\n
Content-Type:text/html\r\n
\r\n
<HTML><BODY>This is my response!</BODY></HTML>
|
Note: "\r\n" represents the characters "carriage return" and "new line".
Environment variables
When using the POST method, the "content of the FORM" is
passed to the CGI script over its standard input.
But other very useful data are passed via environment variables.
- AUTH_TYPE If the server supports user authentication, and the script is protected, this is the protocol-specific authentication method used to validate the user.
- CONTENT_LENGTH The number of bytes in the message body of the HTTP request. This is used by POST requests.
- CONTENT_TYPE For queries which have attached information, such as HTTP POST and PUT, this is the content type of the data.
- GATEWAY_INTERFACE The revision of the CGI specification to which this server complies. Format: CGI/revision
- PATH_INFO The extra path information, as given by the client. Scripts can be accessed by their virtual pathname, followed by extra information at the end of this path. Decoded by the server.
- PATH_TRANSLATED The server provides a translated version of PATH_INFO, which takes the path and does any virtual-to-physical mapping to it.
- QUERY_STRING The URL-encoded data in a GET request
- REMOTE_ADDR The IP address of the remote host making the request.
- REMOTE_HOST The hostname making the request. If the server does not have this information, it should set REMOTE_ADDR and leave this unset.
- REMOTE_IDENT If the HTTP server supports RFC 931 identification, then this variable will be set to the remote user name retrieved from the server.
- REMOTE_USER If the server supports user authentication, and the script is protected, this is the username they have authenticated as.
- REQUEST_METHOD The method with which the request was made. For HTTP, this is "GET", "HEAD", "POST", etc.
- SCRIPT_NAME A virtual path to the script being executed, used for self-referencing URLs.
- SERVER_NAME The server's hostname, DNS alias, or IP address as it would appear in self-referencing URLs.
- SERVER_PORT Port number to which the request was sent.
- SERVER_PROTOCOL The name and revision of the information protocol this request came in with.
- SERVER_SOFTWARE The name and version of the information server software answering the request (and running the gateway). Format: name/version
C implementation
This is a list of useful C functions:
Function name | Header | Use |
getenv() | stdlib.h | Get the value of an environment variable. |
atoi() | stdlib.h | Convert a string into an integer. |
getchar() | stdio.h | Read one character from the standard input. |
printf() | stdio.h | Write to the standard output. |
sprintf() | stdio.h | "Print" into a string. |
strcmp() | string.h | Compare two strings. |
strlen() | string.h | Number of characters in a string. |
This simple CGI reads its input from the standard input (stdin).
File cgi.h |
/******************************************************************************/
/* cgi.h */
/******************************************************************************/
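#ifndef CGI_H
#define CGI_H
#define CGI_LENGTH_ERROR (-1)
#define CGI_READ_ERROR   1
#define CGI_STRING_END   (-1)
#define CGI_OK           0
int  Get_Size(void);
int  Read_Input(char *s);
int  Get_Val_Size(char *s, int *n, int *v, int last);
void Get_Couple(char *s, int pos, char *name, char *value, int ns, int vs);
#endif /* CGI_H */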
|
File cgi.c |
/******************************************************************************/
/* cgi.c */
/* */
/* Copyright Denis BEURIVE - All rights reserved */
/******************************************************************************/
#include <stdio.h> /* printf() */
/* getchar() */
#include <stdlib.h> /* getenv() */
/* atoi() */
#include "cgi.h"
/**************************************************************************/
/* hexa()                                                                 */
/* Convert one hexa digit (upper or lower case) into its value.           */
/* Returns 0 if c is not a hexa digit.                                    */
/**************************************************************************/
int hexa (char c)
{
    if ((c >= '0') && (c <= '9')) { return c - '0'; }
    if ((c >= 'A') && (c <= 'F')) { return c - 'A' + 10; }
    if ((c >= 'a') && (c <= 'f')) { return c - 'a' + 10; }
    return 0;
}

/**************************************************************************/
/* Hex_To_Char()                                                          */
/* Convert a two characters hexa number into a character.                 */
/*                                                                        */
/* (IN) s: pointer to a 3 characters size long string.                    */
/*         Note: s[2] = 0                                                 */
/*                                                                        */
/* (OUT) The character whose code is given by the string.                 */
/**************************************************************************/
char Hex_To_Char (char *s)
{
    char val;

    /* s[0] is the high-order digit: "41" -> 4*16 + 1 = 65 = 'A' */
    val = (char)(16*hexa(s[0]) + hexa(s[1]));
    return val;
}
/**************************************************************************/
/* Get_Size()                                                             */
/* Returns the size of the CGI message.                                   */
/*                                                                        */
/* (IN) Nothing.                                                          */
/* (OUT) size (in bytes) of the CGI message.                              */
/*       or CGI_LENGTH_ERROR if error.                                    */
/**************************************************************************/
int Get_Size()
{
    char* s;

    s = getenv("CONTENT_LENGTH");
    if (s == NULL) { return CGI_LENGTH_ERROR; }
    return atoi((const char*)s);
}
/**************************************************************************/
/* Read_Input()                                                           */
/* Copy the CGI message from the standard input (stdin).                  */
/*                                                                        */
/* (IN) s: pointer on the string that will receive the CGI message.       */
/*         This string MUST have been allocated BEFORE. So, before        */
/*         calling "Read_Input()", you should call "Get_Size()".          */
/*         => s=(char*)malloc(Get_Size()+1);                              */
/*         The "+1" comes from the added "0" final.                       */
/*                                                                        */
/* (OUT) CGI_OK: no error.                                                */
/*       CGI_READ_ERROR: read error.                                      */
/**************************************************************************/
int Read_Input(char *s)
{
    int size, i, c;

    size = Get_Size();
    if (size < 0) { return CGI_READ_ERROR; }
    for (i=0; i<size; i++)
    {
        c = getchar();
        /* the input ended before "size" characters were read */
        if (c == EOF) { s[i] = 0; return CGI_READ_ERROR; }
        s[i] = (char)c;
    }
    s[size] = 0;
    return CGI_OK;
}
/**************************************************************************/
/* Get_Val_Size()                                                         */
/* Returns the following data:                                            */
/* -> the size (in bytes) of the string that contains the name.           */
/* -> the size (in bytes) of the string that contains the value.          */
/* -> the position in the CGI message of the next couple                  */
/*    (name,value).                                                       */
/*                                                                        */
/* (IN) s: string that contains the CGI message.                          */
/*      n: pointer to the integer that will receive the size of           */
/*         the string containing the name.                                */
/*      v: pointer to the integer that will receive the size of           */
/*         the string containing the value.                               */
/*      last: position of the couple to read in the string s.             */
/*                                                                        */
/* (OUT) The position of the __NEXT__ couple (name,value) in the          */
/*       string s.                                                        */
/*       Or returns CGI_STRING_END if there is nothing more to read       */
/*       in the string s.                                                 */
/*                                                                        */
/* Note that a couple (name, value) is coded "name=value" and each couple */
/* is separated by a "&" character (except the last couple).              */
/**************************************************************************/
int Get_Val_Size (char *s, int *n, int *v, int last)
{
    char *p;

    p = s + last; /* go to the couple to read */
    *n = 0;
    *v = 0;
    while ((*p != '=') && (*p != 0)) { (*n)+=1; p++; last++; }
    /* no "=" before the end of the string: there is no value to read */
    if (*p == 0) { return CGI_STRING_END; }
    p++; /* we skip the "=" */
    while ((*p != '&') && (*p != 0)) { (*v)+= 1; p++; last++; }
    if (*p == 0) { return CGI_STRING_END; }
    return (last+2); /* we skip the "=" and the "&" */
}
/**************************************************************************/
/* Decode()                                                               */
/* Copy len characters from src to dst, replacing "+" by a space and      */
/* "%XX" by the character of code XX. dst receives at most len characters */
/* plus the final 0.                                                      */
/**************************************************************************/
static void Decode (char *src, int len, char *dst)
{
    int i, j;
    char c, hex[3];

    j = 0; /* write position: it moves slower than i when "%XX" is met */
    for (i=0; i<len; i++)
    {
        c = src[i];
        if (c == '+') { c = ' '; }
        else if ((c == '%') && (i+2 < len))
        {
            hex[0] = src[++i];
            hex[1] = src[++i];
            hex[2] = 0;
            c = Hex_To_Char(hex);
        }
        dst[j++] = c;
    }
    dst[j] = 0;
}

/**************************************************************************/
/* Get_Couple()                                                           */
/* Extract the next couple (name,value) in the CGI string.                */
/*                                                                        */
/* (IN) s: string that contains the CGI message.                          */
/*      pos: position of the couple (name,value) to read.                 */
/*      name: pointer to the string that will receive the couple          */
/*            name.                                                       */
/*      value: pointer to the string that will receive the couple         */
/*             value.                                                     */
/*      ns: number of characters of the couple name.                      */
/*      vs: number of characters of the couple value.                     */
/*                                                                        */
/* Note: the strings "name" and "value" must have been allocated before   */
/*       (ns+1 and vs+1 characters).                                      */
/*       The function Get_Val_Size() should be called before.             */
/**************************************************************************/
void Get_Couple (char *s, int pos, char* name, char* value, int ns, int vs)
{
    Decode (s+pos, ns, name);
    Decode (s+pos+1+ns, vs, value); /* the "+1" skips the "=" */
}
#ifdef DEBUG
int main (int argc, char **argv)
{
    int p, np, vp, next;
    char *name, *value;

    if (argc != 2)
    {
        printf ("\nCGI call error\n");
        return 1;
    }
    printf ("\n\ninput: %s\n", argv[1]);
    p = 0;
    while (p != CGI_STRING_END)
    {
        next = Get_Val_Size (argv[1], &np, &vp, p);
        name = (char*)malloc (np+1);
        value = (char*)malloc (vp+1);
        if ((name == NULL) || (value == NULL))
        {
            free (name);
            free (value);
            return 1;
        }
        Get_Couple (argv[1], p, name, value, np, vp);
        printf ("\nfrom %d (next one at %d)", p, next);
        printf ("\n name : [%s]", name);
        printf ("\n value : [%s]", value);
        free (name);
        free (value);
        p = next;
    }
    printf ("\n\n");
    return 0;
}
#endif
|
File sample.c |
#include <stdio.h>  /* printf() */
#include <stdlib.h> /* malloc(), free() */
#include "cgi.h"

int main ()
{
    int p, np, vp, next;
    char *name, *value, *cgi;

    printf ("Content-type: text/html\n\n");
    printf ("<HTML><BODY>");
    /* read CGI message */
    p = Get_Size();
    if (p < 0)
    {
        printf ("<BR><BR><B>CGI ERROR (Get_Size())!!!</B><BR><BR>");
        printf ("</BODY></HTML>");
        return 1;
    }
    cgi = (char*)malloc(p+1);
    if (cgi == NULL)
    {
        printf ("<BR><BR><B>CGI ERROR (malloc())!!!</B><BR><BR>");
        printf ("</BODY></HTML>");
        return 1;
    }
    if (Read_Input(cgi) == CGI_READ_ERROR)
    {
        printf ("<BR><BR><B>CGI ERROR (Read_Input())!!!</B><BR><BR>");
        printf ("</BODY></HTML>");
        free (cgi);
        return 1;
    }
    /* scanning the CGI message */
    p=0;
    while (p != CGI_STRING_END)
    {
        next = Get_Val_Size (cgi, &np, &vp, p);
        name = (char*)malloc (np+1);
        value = (char*)malloc (vp+1);
        if ((name == NULL) || (value == NULL))
        {
            printf ("<BR><BR><B>CGI ERROR (malloc())!!!</B><BR><BR>");
            free (name);
            free (value);
            break;
        }
        Get_Couple (cgi, p, name, value, np, vp);
        printf ("<BR>from %d (next one at %d)", p, next);
        printf ("<BR> name : [%s]", name);
        printf ("<BR> value : [%s]", value);
        free (name);
        free (value);
        p = next;
    }
    printf ("</BODY></HTML>");
    free (cgi);
    return 0;
}
|
Compiling the program (using gcc):
Under UNIX
|
gcc -c cgi.c
gcc -o cgi sample.c cgi.o
|
Under Windows
|
gcc -c cgi.c
gcc -o cgi.exe sample.c cgi.o
|
Where should you put the file cgi (or cgi.exe under Windows)? By default, all the CGI scripts go
in the "cgi-bin" directory of the Apache installation directory.
|