Intro to HyperText Transfer Protocol (HTTP)

 

 

Overview

HTTP defines a simple request-response language.

A web client communicates with a web server by using HTTP.

HTTP defines how to correctly phrase a request and what the response should look like.

HTTP does not define how the network connection is made or managed, nor how the information is actually transmitted; that is handled by lower-level protocols such as TCP/IP.

 

 

The Client Request

An HTTP request consists of the following:

  1. The method, which must be one of a set of legal actions
  2. The Uniform Resource Identifier (URI), which names the information requested
  3. The protocol version
  4. Optional supplemental information (header fields)

(The method is to be executed on the object named by the URI. In practice the web uses a subset of URIs, the Uniform Resource Locators (URLs).)

This table shows a sample of valid HTTP methods:

Method   Action
GET      Return the object; that is, retrieve the information.
HEAD     Return only information about the object, such as how old it is, but not the object itself.
POST     Send information to be stored on the server. Many servers do not allow information to be POSTed except as input to scripts.
PUT      Send a new copy of an existing object to the server. Many servers do not allow documents to be PUT.
DELETE   Permanently delete the object. Like PUT, this method is not allowed by most servers.
...      ...

 

For example, to request the document /sinn/index.htm from www.openloop.com, the web client will send the following request:

GET /sinn/index.htm HTTP/1.0
User-Agent: NCSA Mosaic for Windows 95/3.0
Accept: text/plain
Accept: text/html
Accept: application/postscript
Accept: image/gif
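As a rough illustration, the Python sketch below assembles a request like the one above from its four parts. The function name build_request is hypothetical and not taken from any browser. It assumes, as HTTP/1.0 specifies, that every line ends with a carriage return and line feed (CRLF) and that an empty line marks the end of the header fields. The headers are kept as a list of pairs because a field such as Accept may appear more than once.

```python
def build_request(method: str, uri: str, version: str,
                  headers: list[tuple[str, str]]) -> bytes:
    """Assemble a raw HTTP request from its parts (hypothetical helper)."""
    # Parts 1-3: the request line holds method, URI and version, separated by spaces.
    lines = [f"{method} {uri} {version}"]
    # Part 4: optional supplemental information, one "Name: value" line per field.
    # A list of pairs (not a dict) lets a field such as Accept repeat.
    lines += [f"{name}: {value}" for name, value in headers]
    # Every line ends in CRLF; one more CRLF (an empty line) ends the headers.
    return ("\r\n".join(lines) + "\r\n\r\n").encode("ascii")


request = build_request("GET", "/sinn/index.htm", "HTTP/1.0", [
    ("User-Agent", "NCSA Mosaic for Windows 95/3.0"),
    ("Accept", "text/plain"),
    ("Accept", "text/html"),
])
print(request.decode("ascii"))
```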

Header Field        Description
User-Agent          What kind of browser is making the request.
If-Modified-Since   Asks that the object be returned only if it is newer than the specified date. This saves the cost of retrieving a document that has already been acquired and has not changed.
Accept              The Multipurpose Internet Mail Extensions (MIME) types and formats of information that the browser is prepared to accept. This may save the cost of transferring documents that the client cannot or will not use. The client will then decode the data according to the rules of MIME. Note that the wildcard form "Accept: */*" can be used to accept any type.
Authorization       User password or other authentication, as required.
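The If-Modified-Since field asks the server to make a decision. A minimal server-side sketch in Python follows, assuming the server knows the document's last-modification time as a UTC datetime. The function name conditional_status is hypothetical; email.utils.parsedate_to_datetime is a real standard-library function that parses dates in this format. A result of 304 means the server answers "Not Modified" and sends no document.

```python
from datetime import datetime, timezone
from email.utils import parsedate_to_datetime
from typing import Optional


def conditional_status(last_modified: datetime, if_modified_since: Optional[str]) -> int:
    """Return 304 if the client's copy is still current, else 200 (sketch)."""
    if if_modified_since is None:
        return 200  # unconditional GET: always send the document
    try:
        since = parsedate_to_datetime(if_modified_since)
    except (TypeError, ValueError):
        return 200  # unparseable date: ignore the field and send the document
    if since.tzinfo is None:
        since = since.replace(tzinfo=timezone.utc)  # treat a zone-less date as GMT
    # Send the object only if it is newer than the date the client supplied.
    return 200 if last_modified > since else 304


mtime = datetime(1999, 6, 22, 12, 0, 0, tzinfo=timezone.utc)
print(conditional_status(mtime, "Tue, 22 Jun 1999 12:00:00 GMT"))  # 304
print(conditional_status(mtime, "Mon, 21 Jun 1999 12:00:00 GMT"))  # 200
```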

 

 

The Server Response

An HTTP response consists of the following:

  1. A status line, which indicates the success or failure of the request
  2. A description of the information in the response. This is the metadata or metainformation
  3. The actual information requested

(The server replies to the request with a description of what is being returned, followed by the information requested)

The status line has the form:

HTTP-version Status-code Reason

Field                        Description
HTTP-version                 The version of HTTP used in the response, such as HTTP/1.0
Status-code                  A three-digit number that indicates the result of the request
Reason                       A short phrase that explains what the number means
Metadata (Metainformation)   Indicates to the browser what it must know to interpret and display the information
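A client has to split the status line into these three fields before it can act on the result. The Python sketch below does that; the names StatusLine and parse_status_line are hypothetical. It splits on spaces at most twice, since the reason phrase may itself contain spaces, and it assumes the line may still carry its trailing CRLF.

```python
from typing import NamedTuple


class StatusLine(NamedTuple):
    version: str  # e.g. "HTTP/1.0"
    code: int     # three-digit result code, e.g. 200
    reason: str   # short phrase for humans, e.g. "Document follows"


def parse_status_line(line: str) -> StatusLine:
    """Split 'HTTP-version Status-code Reason' into its fields (sketch)."""
    # Split at most twice, because the reason phrase may contain spaces.
    parts = line.rstrip("\r\n").split(" ", 2)
    if len(parts) < 2 or not parts[0].startswith("HTTP/"):
        raise ValueError(f"not an HTTP status line: {line!r}")
    version, code = parts[0], parts[1]
    if len(code) != 3 or not (code.isascii() and code.isdigit()):
        raise ValueError(f"bad status code: {code!r}")
    reason = parts[2] if len(parts) == 3 else ""
    return StatusLine(version, int(code), reason)


print(parse_status_line("HTTP/1.0 200 Document follows\r\n"))
# StatusLine(version='HTTP/1.0', code=200, reason='Document follows')
```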

 

Selected HTTP status codes

Code  Reason              Description
200   Document follows    The request succeeded. The information requested follows.
301   Moved Permanently   The document has moved to a new URL.
302   Moved Temporarily   The document has moved temporarily to a new URL.
304   Not Modified        The document has not been modified since the date specified in a GET request with If-Modified-Since.
401   Unauthorized        The information is restricted; please retry with proper authentication.
402   Payment Required    The information requires paying a fee; please retry with proper payment (not used often).
403   Forbidden           Access is forbidden.
404   Not Found           The information could not be found or permission was denied. This error is returned if the requested URL does not exist or was misspelled.
500   Server Error        The server experienced an error.
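Clients are expected to act on the three-digit number, whose first digit gives the class of response; the reason phrase is meant for people. The sketch below groups codes by first digit with a hypothetical status_class function. It also prints the reason phrases from Python's standard http.HTTPStatus enumeration, which follows current HTTP specifications and can differ from the phrases in the table (for example "OK" for 200 and "Found" for 302).

```python
from http import HTTPStatus

# Response classes, keyed by the first digit of the status code.
CLASSES = {1: "Informational", 2: "Success", 3: "Redirection",
           4: "Client Error", 5: "Server Error"}


def status_class(code: int) -> str:
    """Return the class of a three-digit status code (hypothetical helper)."""
    if not 100 <= code <= 599:
        raise ValueError(f"status code out of range: {code}")
    return CLASSES[code // 100]


for code in (200, 301, 302, 304, 401, 402, 403, 404, 500):
    # HTTPStatus(code).phrase is the reason phrase in current specifications.
    print(code, status_class(code), HTTPStatus(code).phrase)
```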

 

For example, the response for the /sinn/index.htm request might be the following:

HTTP/1.0 200 Document follows
Server: NCSA/2.0
Date: Wed, 23 Jun 1999 18:08:08 GMT
Content-type: text/html
Content-length: 5800
Last-modified: Tue, 22 Jun 1999 12:00:00 GMT

<html>
<head>
<title>Richard P. Sinn</title>
</head>
<body>
<p><br>
<br>
</p>
<a href="subhtml/signature.html">
<p align="center"></a><a href="sinn.htm"><img src="images/homepage_title.gif" ALT WIDTH="554" HEIGHT="131"></a><br>
<a href="sinn.htm"><small><em><strong>Click the above Picture for Personal Information</strong>
... other content of /sinn/index.htm ...

From the server's point of view, any document is just a stream of bytes delivered over the Internet: a simple ASCII text document is no different from a complicated multimedia presentation. It is up to the web client (browser) to decode and interpret the document and present it to the user.

Field              Description
Server             The type of server software providing the response
Date               The date and time of the response
Content-Length     How many bytes of data (the body) will be sent to the client
Content-Type       The MIME type of the information being returned, such as HTML or an image
Content-Language   The language of the information, such as English or French
Content-Encoding   Additional encoding, such as data compression
Last-Modified      The date and time that the information was most recently modified
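On the client side, the empty line after the header fields separates the metadata from the document itself. A minimal Python sketch of that split follows; the function name parse_response and the miniature sample response are made up for illustration. It lowercases field names because they are case-insensitive, and it uses Content-Length when present to decide how many body bytes belong to the response.

```python
def parse_response(raw: bytes) -> tuple[str, dict[str, str], bytes]:
    """Split a raw HTTP/1.0 response into status line, headers, and body (sketch)."""
    head, sep, rest = raw.partition(b"\r\n\r\n")
    if not sep:
        raise ValueError("response has no empty line after the headers")
    status_line, *header_lines = head.decode("iso-8859-1").split("\r\n")
    headers = {}
    for line in header_lines:
        name, _, value = line.partition(":")
        # Field names are case-insensitive: Content-type and Content-Type match.
        headers[name.strip().lower()] = value.strip()
    if "content-length" in headers:
        # Content-Length says how many bytes of body belong to this response.
        body = rest[: int(headers["content-length"])]
    else:
        # Without it, an HTTP/1.0 body runs until the server closes the connection.
        body = rest
    return status_line, headers, body


# A made-up miniature response in the same shape as the example above.
raw = (b"HTTP/1.0 200 Document follows\r\n"
       b"Server: NCSA/2.0\r\n"
       b"Content-type: text/html\r\n"
       b"Content-length: 13\r\n"
       b"\r\n"
       b"<html></html>")
status, headers, body = parse_response(raw)
print(status, headers["content-type"], body)
```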

 

 

Putting it Together - Serving a Web Document from a Web Server

Step 1: Loop and Wait for a new request

The httpd (HTTP daemon, that is, the web server process) waits for a request to arrive from a web client on the Internet.

[Figure: http1.jpg]

Step 2: A Request arrives from a Web Client

When a user clicks on a hyperlink such as http://www.openloop.com/index.htm, the network software (TCP stack) on the client computer locates the server computer (using DNS or the hosts file) and sets up a bidirectional network (socket) connection from the client to the server www.openloop.com. A request such as

GET /index.htm HTTP/1.0

is sent.
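The client side of this step can be sketched in Python with the standard socket and urllib.parse modules. The function names request_for and fetch are hypothetical. The sketch assumes a plain http:// URL with the default port 80, sends an HTTP/1.0 request with no header fields, and reads until the server closes the connection. socket.create_connection looks the host name up through the system resolver (hosts file or DNS) before connecting.

```python
import socket
from urllib.parse import urlsplit


def request_for(url: str) -> tuple[str, int, bytes]:
    """Work out the host, port, and request bytes for an http:// URL (sketch)."""
    parts = urlsplit(url)
    if parts.scheme != "http" or not parts.hostname:
        raise ValueError(f"not an http:// URL: {url!r}")
    port = parts.port or 80          # 80 is the default HTTP port
    path = parts.path or "/"
    if parts.query:
        path += "?" + parts.query
    return parts.hostname, port, f"GET {path} HTTP/1.0\r\n\r\n".encode("ascii")


def fetch(url: str) -> bytes:
    """Resolve the host, connect over TCP, send the request, read the reply."""
    host, port, request = request_for(url)
    # create_connection resolves the name (hosts file or DNS) and connects.
    with socket.create_connection((host, port), timeout=10) as sock:
        sock.sendall(request)
        chunks = []
        # An HTTP/1.0 server closes the connection when the reply is complete.
        while chunk := sock.recv(4096):
            chunks.append(chunk)
    return b"".join(chunks)


print(request_for("http://www.openloop.com/index.htm"))
# ('www.openloop.com', 80, b'GET /index.htm HTTP/1.0\r\n\r\n')
```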

[Figure: http2.jpg]

 

Step 3: The Request is Parsed by the Web Server

The web server parses the request line and determines that the request is a GET for information.

[Figure: http3.jpg]
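Parsing the request line amounts to splitting it into method, URI, and version and checking the method against what the server is willing to do. In the Python sketch below, parse_request_line is a hypothetical name, and the set of allowed methods is an assumed policy for a read-only file server, not something HTTP prescribes.

```python
# Hypothetical policy for a read-only file server.
ALLOWED_METHODS = {"GET", "HEAD"}


def parse_request_line(line: str) -> tuple[str, str, str]:
    """Split 'Method URI HTTP-version' into its three parts (sketch)."""
    parts = line.rstrip("\r\n").split(" ")
    if len(parts) != 3 or not parts[2].startswith("HTTP/"):
        raise ValueError(f"malformed request line: {line!r}")
    method, uri, version = parts
    return method, uri, version


method, uri, version = parse_request_line("GET /index.htm HTTP/1.0\r\n")
print(method, uri, version, method in ALLOWED_METHODS)
# GET /index.htm HTTP/1.0 True
```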

Step 4: Parse the Rest of the Header

The web server now knows that the protocol version is 1.0 and that the client is, for example, a Netscape browser for NT. Since this is an ordinary request, no further action is needed. (Think about what you could do with the header and XML.)
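Reading the rest of the header means collecting "Name: value" lines until the empty line. A minimal Python sketch, with the hypothetical name parse_headers, stores each field name in lowercase (field names are case-insensitive) and keeps a list of values, since a field such as Accept may be repeated.

```python
def parse_headers(lines: list[str]) -> dict[str, list[str]]:
    """Collect 'Name: value' lines into a case-insensitive multi-map (sketch)."""
    headers: dict[str, list[str]] = {}
    for line in lines:
        if not line:
            break  # an empty line ends the header section
        name, sep, value = line.partition(":")
        if not sep:
            raise ValueError(f"malformed header line: {line!r}")
        # Field names are case-insensitive; a field may appear more than once.
        headers.setdefault(name.strip().lower(), []).append(value.strip())
    return headers


headers = parse_headers([
    "User-Agent: NCSA Mosaic for Windows 95/3.0",
    "Accept: text/plain",
    "Accept: text/html",
    "",
])
print(headers["accept"])      # ['text/plain', 'text/html']
print(headers["user-agent"])  # ['NCSA Mosaic for Windows 95/3.0']
```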

 

Step 5: Perform the method requested

At this stage the httpd fulfills the request or sends back an error message. In this example, the web server searches the file system for /index.htm.

Success: The document is sent

[Figure: http4.jpg]

Failure: An error such as the following will be sent:

[Figure: http5.jpg]
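Step 5 for a GET can be sketched in Python as mapping the URI onto a directory (the document root) and answering with either the file or an error. The names respond_to_get and error_response are hypothetical. The sketch assumes URIs carry no query string or %-escapes, guesses the Content-Type from the file extension with the standard mimetypes module, and refuses URIs that would climb out of the document root.

```python
import mimetypes
from pathlib import Path


def error_response(code: int, reason: str) -> bytes:
    """Build a small HTML error reply, e.g. 404 Not Found."""
    body = f"<html><body><h1>{code} {reason}</h1></body></html>".encode("ascii")
    head = (f"HTTP/1.0 {code} {reason}\r\n"
            "Content-Type: text/html\r\n"
            f"Content-Length: {len(body)}\r\n\r\n")
    return head.encode("ascii") + body


def respond_to_get(root: Path, uri: str) -> bytes:
    """Find the file named by uri under the document root and build a reply."""
    root = root.resolve()
    target = (root / uri.lstrip("/")).resolve()
    # Refuse URIs such as /../secret that would escape the document root.
    if target != root and root not in target.parents:
        return error_response(403, "Forbidden")
    if not target.is_file():
        return error_response(404, "Not Found")  # Failure: an error is sent
    body = target.read_bytes()                   # Success: the document is sent
    ctype = mimetypes.guess_type(target.name)[0] or "application/octet-stream"
    head = ("HTTP/1.0 200 Document follows\r\n"
            f"Content-Type: {ctype}\r\n"
            f"Content-Length: {len(body)}\r\n\r\n")
    return head.encode("ascii") + body
```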

 

 

Step 6: Finish up: close the file; close the network connection

The server closes the file and the connection (socket). The client then deals with the data received.

 

Step 7: Back to Step 1 and Loop and Wait

...
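Steps 1 through 7 form a loop, which the Python sketch below shows with the standard socket module. The handle argument is a hypothetical callable that takes the raw request and returns the raw response (it would combine the parsing and file lookup of Steps 3 to 5). Port 8080 is an arbitrary choice. The sketch serves one client at a time and does not read request bodies such as POST data.

```python
import socket
from typing import Callable

Handler = Callable[[bytes], bytes]  # raw request in, raw response out


def serve_one(server: socket.socket, handle: Handler) -> None:
    """Steps 1-6 for a single client."""
    conn, _addr = server.accept()          # Step 1: wait for a connection
    with conn:                             # Step 6: socket closed on exit
        data = b""
        while b"\r\n\r\n" not in data:     # Step 2: read up to the empty line
            chunk = conn.recv(4096)
            if not chunk:                  # client hung up early
                break
            data += chunk
        conn.sendall(handle(data))         # Steps 3-5: parse, act, reply


def serve_forever(handle: Handler, port: int = 8080) -> None:
    """Step 7: go back to Step 1, forever."""
    with socket.create_server(("", port)) as server:
        while True:
            serve_one(server, handle)
```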

 

Handling Multiple Requests at a Time
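One common way to handle several clients at once, offered here as an assumption rather than as the approach these notes intend, is to give each accepted connection its own thread, so a slow client does not hold up the others. Python's standard socketserver.ThreadingTCPServer does exactly that. The handler class OneRequestHandler and the function make_server are hypothetical; the handler simply echoes the request line back as a plain-text document.

```python
import socketserver


class OneRequestHandler(socketserver.StreamRequestHandler):
    """Answer one HTTP/1.0 request per connection (hypothetical handler)."""

    def handle(self) -> None:
        request_line = self.rfile.readline(65537)
        # Skip the header fields up to the empty line that ends them.
        while self.rfile.readline(65537) not in (b"\r\n", b"\n", b""):
            pass
        body = request_line.strip() + b"\n"  # echo the request line back
        self.wfile.write(b"HTTP/1.0 200 Document follows\r\n"
                         b"Content-Type: text/plain\r\n"
                         + f"Content-Length: {len(body)}\r\n\r\n".encode("ascii")
                         + body)


def make_server(port: int = 8080) -> socketserver.ThreadingTCPServer:
    """A server that starts a new thread for every accepted connection."""
    server = socketserver.ThreadingTCPServer(("", port), OneRequestHandler)
    server.daemon_threads = True  # do not wait for handler threads at exit
    return server


# make_server().serve_forever() would then run until interrupted.
```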

 

Virtual Servers

It is sometimes desirable to run multiple web servers on the same machine.

We could use different ports for different servers, for example:

http://www.isp.com:8080/companyX/index.htm AND

http://www.isp.com:8081/companyY/index.htm

But we would rather make the URLs look like:

http://www.companyX.com/index.htm AND

http://www.companyY.com/index.htm

We call these fictional servers virtual servers.

Using multiple server addresses on the same computer is done by an operating system extension called virtual host support. It requires a separate IP address for each virtual server, just as if they were real computers on the Internet, except that all the IP addresses are assigned to one computer. Connections to the different addresses are routed to the appropriate server software. This feature may not be supported by all operating systems, and implementations can differ greatly (jobs vs process vs kernel vs ...).
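Because each virtual server has its own IP address, the server software can tell which one a client wants from the local address the connection arrived on. The Python sketch below assumes a hypothetical table mapping addresses to document roots; the addresses come from 192.0.2.0/24, a block reserved for documentation, and the directory paths are made up. The function name root_for is also hypothetical.

```python
from pathlib import Path

# Hypothetical configuration: each virtual server owns one IP address on this
# machine (192.0.2.0/24 is an address block reserved for documentation).
VIRTUAL_ROOTS = {
    "192.0.2.10": Path("/srv/www/companyX"),  # www.companyX.com
    "192.0.2.11": Path("/srv/www/companyY"),  # www.companyY.com
}


def root_for(local_ip: str) -> Path:
    """Choose the document root by the address the client connected to."""
    try:
        return VIRTUAL_ROOTS[local_ip]
    except KeyError:
        raise LookupError(f"no virtual server configured for {local_ip}") from None


# In an accept loop, the local address of a connection comes from
# conn.getsockname()[0], so each virtual server serves its own files.
print(root_for("192.0.2.11"))
```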

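As the web server grows, it is quite easy to migrate from a virtual server to a physical server (a separate machine): because the virtual server already has its own IP address, that address can move to the new machine, so no URL has to change and no DNS entry has to be updated in the primary and secondary DNS servers.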

 

 

Points to ponder

How connections are set up (TCP, etc.)

How inline images are retrieved (by the client)

The interaction between the web server and the file system (through the OS interface)

How access control can be done (file system, OS level, user profile, etc.)

Why ftp://www.jfdkafjdk.something.com/somfile.h works as well (the interaction is with an FTP server, not the web server)

Channel-based connections (one for control information, one for the actual data)

Batching the information for a whole page (instead of separate socket connections for images, etc.)

 

 

Copyright 1996-2001 OpenLoop Computing. All rights reserved.