How The Web Page Grew Up

Chances are good that you are building, have built, or are someday going to build a web application.¹ So let’s take a high-level look at what constitutes a modern web application, and how we got here.

Remember the 1990s? A time when gas was cheap, grunge was in, and people were still smarter than their mobile phones? Of course, it was also the decade in which the Web was born. Seemed like everyone had a “web page”, some people even had whole “web sites”, connected by “hyperlinks”. Ahhh, good times. Let’s take a look back at how web servers worked in those glory years…

Note: While the details have gotten dramatically more interesting (read: complicated) over the last decade, these basic principles still fundamentally power any web application you use or build today, so pay attention!

Request/Response is the Heartbeat of the Web

The core of the web is powered by a series of requests and responses. A request is most often made by your browser; it contacts an HTTP server, which is just a program running on a machine² with a public IP address somewhere. The job of the HTTP server is to look at the URL you are requesting³ and figure out what to send back to you. It acts much like a waiter, who brings your food requests to the kitchen, drink requests to the bar, and maybe the occasional request to the maître d’ or the valet. The HTTP server will route requests for a web page to one place and requests for an image to another, and may even redirect your browser to an entirely different server to handle some requests.
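To make that exchange a little more concrete, here is a minimal Python sketch of the text that travels in each direction. The helper names (`build_request`, `parse_response`) and the canned response are made up for illustration, and a real browser sends many more headers than this.

```python
def build_request(host: str, path: str) -> str:
    """Return the text of a bare-bones HTTP/1.1 GET request."""
    return (
        f"GET {path} HTTP/1.1\r\n"
        f"Host: {host}\r\n"
        "Connection: close\r\n"
        "\r\n"
    )


def parse_response(raw: str) -> tuple[int, dict[str, str], str]:
    """Split a raw HTTP response into (status code, headers, body)."""
    head, _, body = raw.partition("\r\n\r\n")
    status_line, *header_lines = head.split("\r\n")
    # The status line looks like: HTTP/1.1 200 OK
    status_code = int(status_line.split(" ", 2)[1])
    headers = {}
    for line in header_lines:
        name, _, value = line.partition(":")
        headers[name.strip().lower()] = value.strip()
    return status_code, headers, body


# A canned response, standing in for what a server might send back.
EXAMPLE_RESPONSE = (
    "HTTP/1.1 200 OK\r\n"
    "Content-Type: text/html\r\n"
    "\r\n"
    "<html>Hello</html>"
)
```

Calling `parse_response(EXAMPLE_RESPONSE)` hands back the status code, a dictionary of headers, and the HTML body as three separate pieces, which is roughly the first thing a browser has to do with whatever the server sends.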

Note: It turns out that humans are pretty sloppy when it comes to writing HTML, and browsers over the years have grown extremely forgiving. The code shown here isn’t “proper” in a strict sense. The syntactically valid version (that a good developer ought to be writing) is more verbose, but follows the same basic principles.

Basic HTML

For those that forget the HTML they learned from a 700-page Teach yourself HTML in 21 Days book in 1996, here’s a quick reminder. This is basic stuff, but stick with me, and we’ll see how it evolves into the “real” web applications we build today.

Web pages are just boring text files.⁴ What makes them special is that they have formatting tags defined by HTML right inline with the text. Here’s an over-simplified HTML text file:

<html>
 <h3>Favorite Quotes</h3>
 <strong>Seymour Cray</strong> famously said:
 <blockquote>“The trouble with programmers is that you can never tell what a programmer is doing until it’s too late.”</blockquote>
 That is so true, don’t you think?
</html>

All those bracketed characters are the HTML markup.⁵ When you open this file in a web browser (go ahead, download it and open it up, you’ll see…), it looks like this:

The Web Server

Now we know what the browser is showing us, and we said that the job of the HTTP server is typically to return HTML in response to a browser request. So how does it do that, exactly?

In the simplest systems (welcome back to 1993), the HTTP server has the file in a directory tree on its local hard drive. For a corporate page with CEO Bob’s profile, the folders might look like this.

When your browser requests the URL http://www.example.com/about/team/bob.html, this simple web server looks in the corresponding directories, finds that path on disk, and responds to the browser with the contents of the bob.html file.
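Here is a rough Python sketch of that lookup, not how any particular HTTP server implements it. The function names and the idea of a single `doc_root` folder are assumptions for illustration; the check against `..` is there because a naive version would happily serve files from outside the website directory.

```python
from pathlib import Path
from typing import Optional
from urllib.parse import unquote, urlparse


def resolve_static_file(doc_root: Path, url: str) -> Optional[Path]:
    """Map a URL onto a file under doc_root, or None if there isn't one."""
    root = doc_root.resolve()
    url_path = unquote(urlparse(url).path)
    candidate = (root / url_path.lstrip("/")).resolve()
    # Refuse paths that escape the document root (e.g. via "..").
    if not candidate.is_relative_to(root):
        return None
    return candidate if candidate.is_file() else None


def serve(doc_root: Path, url: str) -> tuple[int, bytes]:
    """Return (status code, body) for a request, 1993 style."""
    found = resolve_static_file(doc_root, url)
    if found is None:
        return 404, b"Not Found"
    return 200, found.read_bytes()
```

With a `doc_root` that contains `about/team/bob.html`, calling `serve(doc_root, "http://www.example.com/about/team/bob.html")` returns status 200 and the bytes of that file; any URL with no matching file gets a 404.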

When the browser gets the HTML, in addition to rendering it, it notices that there’s an image at http://www.example.com/about/team/bob-headshot.png, referenced in an HTML tag⁶, so the browser issues a second request. The HTTP server again finds the requested file in the path corresponding to the URL, and returns the contents of the image file.
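The "notices that there's an image" step can be sketched in a few lines of Python using the standard library's HTML parser. This is only an illustration of the idea; the `ImageFinder` class and `find_image_urls` function are hypothetical names, and a real browser also chases stylesheets, scripts, fonts, and more.

```python
from html.parser import HTMLParser
from urllib.parse import urljoin


class ImageFinder(HTMLParser):
    """Collect the URL of every <img> tag seen while parsing a page."""

    def __init__(self, page_url: str) -> None:
        super().__init__()
        self.page_url = page_url
        self.image_urls: list[str] = []

    def handle_starttag(self, tag, attrs):
        if tag == "img":
            src = dict(attrs).get("src")
            if src:
                # Relative paths are resolved against the page's own URL.
                self.image_urls.append(urljoin(self.page_url, src))


def find_image_urls(page_url: str, html: str) -> list[str]:
    """Return the follow-up image requests a browser would need to make."""
    finder = ImageFinder(page_url)
    finder.feed(html)
    return finder.image_urls
```

Each URL in the returned list is one more request/response round trip before the page is fully drawn.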

For many modern web sites, your browser will issue hundreds of requests to gather the information to display a single page. That’s the request/response heartbeat of the web at work.

Server-Generated HTML

Web authors had barely mastered all that stuff above, when inevitably we wanted more control over the web pages. We wanted to do things like:

Show a page for every product in a store without having to write thousands of separate HTML files.

Display comments on an article in real-time without having to modify the article page HTML for each new comment.

Provide configurable messages for different target audiences to a site.

Show web-pages with real-time data (like the current weather).

and just about every other web behavior we now take for granted.

These are all accomplished through a great conceptual leap that transforms the web:

As long as the browser gets back valid HTML, it doesn’t care what the server did to get it. So the server doesn’t need to have HTML files on disk, it can create them, on the fly, using any programming tools it wants.

Read that one more time, because it’s easy to take for granted how transformative that premise is for the web as we know it today.
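As a small illustration of that leap, here is a Python sketch of the "page for every product" case: one function, no HTML files on disk. The catalog, product IDs, and function name are invented for the example; a real store would pull this data from a database.

```python
from html import escape

# Hypothetical catalog; a real store would read this from a database.
PRODUCTS = {
    "1001": {"name": "Flannel Shirt", "price": "24.99"},
    "1002": {"name": "Pager", "price": "59.00"},
}


def render_product_page(product_id: str) -> tuple[int, str]:
    """Build the HTML for one product on the fly; no file ever exists."""
    product = PRODUCTS.get(product_id)
    if product is None:
        return 404, "<html><h3>No such product</h3></html>"
    # Escape the values so stray "<" or "&" characters can't break the page.
    name = escape(product["name"])
    price = escape(product["price"])
    return 200, f"<html><h3>{name}</h3>Price: ${price}</html>"
```

Adding a thousand more products means adding a thousand more entries to the catalog, not writing a thousand more HTML files. The browser can't tell the difference.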

Adding a Timestamp to Our Page

Let’s see a simple example of how this might work. The PHP scripting language is a venerable⁷ web language specifically designed to co-mingle code and plain HTML.

Here, we’ve modified the HTML page with a special new tag, and inside that tag, we’ve written PHP code that prints out the current date and time.

<html>
 <h3>Favorite Quotes</h3>
 <strong>Seymour Cray</strong> famously said:
 <blockquote>“The trouble with programmers is that you can never tell what a programmer is doing until it’s too late.”</blockquote>
 I was reading this page on
 <?php
 echo date("Y-m-d H:i:s", time());
 ?>
</html>

Unfortunately, it’s not enough to just put this on the HTTP server and run it like before. The server needs to do more than simply return the contents of that file on disk. We tell the HTTP server⁸ that any file in our website directory that has a “.php” file extension should instead be passed to the PHP program. And the server responds to the browser with whatever that program prints out.
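A simplified Python sketch of that rule might look like the following. It assumes the configuration boils down to a table from file extension to a command to run, and that the program is launched once per request with the file's path as its argument. Real HTTP servers use more efficient mechanisms and their own configuration formats, so treat the names here as hypothetical.

```python
import subprocess
from pathlib import Path


def respond_with_file(path: Path, interpreters: dict[str, list[str]]) -> bytes:
    """Return the response body for a file the server found on disk.

    `interpreters` maps a file extension to the command that should
    process files of that type (an assumption made for this sketch).
    """
    command = interpreters.get(path.suffix)
    if command is None:
        # No special handling: send the file's contents as-is.
        return path.read_bytes()
    # Hand the file to the program and send back whatever it prints.
    result = subprocess.run(
        [*command, str(path)], capture_output=True, check=True
    )
    return result.stdout
```

On a machine with the PHP command-line program installed and on the path, passing `{".php": ["php"]}` as the table would send `.php` files through PHP and leave every other file untouched.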

In order for this to work, the machine running your HTTP server has to have the PHP application installed. When the above HTML is opened by the PHP program, it prints out the following contents.

<html>
 <h3>Favorite Quotes</h3>
 <strong>Seymour Cray</strong> famously said:
 <blockquote>“The trouble with programmers is that you can never tell what a programmer is doing until it’s too late.”</blockquote>
 I was reading this page on
 2013-10-01 12:42:47
</html>

which our HTTP server returns to the browser, and the browser happily displays it!

In this example, the program that’s getting run is actually that hybrid HTML/PHP file. The PHP program on the server knows how to identify the sections of that file written in code and it interprets those, replacing their contents with any text printed out by the code. It then returns the processed result to our HTTP server, which in turn returns it to the requesting browser.
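To get a feel for the "find the code sections and replace them with their output" idea, here is a toy processor in Python. It is emphatically not how PHP works inside: the `<?toy ... ?>` tag and the lookup table of named snippets are inventions for this sketch, whereas the real thing interprets arbitrary code.

```python
import re
from datetime import datetime

# Toy "code" the processor knows how to run. Real PHP interprets
# arbitrary code; this sketch only looks names up in a table.
SNIPPETS = {
    "now": lambda: datetime.now().strftime("%Y-%m-%d %H:%M:%S"),
    "year": lambda: str(datetime.now().year),
}

# Matches a made-up tag such as: <?toy now ?>
CODE_BLOCK = re.compile(r"<\?toy\s+(\w+)\s*\?>")


def process_template(source: str, snippets=SNIPPETS) -> str:
    """Replace each <?toy name ?> section with the text its snippet returns."""

    def run(match):
        return snippets[match.group(1)]()

    # Everything outside the special tags passes through untouched.
    return CODE_BLOCK.sub(run, source)
```

Feeding it `I was reading this page on <?toy now ?>` produces the same sentence with the tag swapped for the current date and time, while plain HTML with no special tags comes out exactly as it went in.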

Web Applications Today

While some web sites today are still delivered simply through stored HTML files, most are powered by lots of server-side code to generate HTML. Versions of our simple PHP example, but on steroids. And many (in fact, most of the ones you use on a daily basis) are built on huge platforms with hundreds of sophisticated moving parts. For example, when you do a simple Google search, you are getting back an HTML page that’s never existed before, and will never exist in that state again. It takes literally thousands of machines, working in concert behind the scenes, to generate that page for you in a fraction of a second. Nevertheless, the basic process is the same: you make a request for a URL and the HTTP server you talk to does all the heavy lifting and returns a simple plaintext blob of HTML.

This is the great power and elegance of the Web. The simple contract between client browser and HTTP server, which is embodied in a URL request and HTML response, yields astounding flexibility and capability. It’s a warrant to be exploited by great web applications.

¹ I really mean almost everyone…it’s looking like this web thing isn’t a fad after all
² A computer similar to your desktop at home, but running in a rack in a datacenter most likely
³ Plus some other data, like your browser settings, your language settings, information in your cookies, and more.
⁴ Just plain old text files, like the ones made by TextEdit on your Mac or Notepad.exe on the Windows machine you had once upon a time.
⁵ The files are “marked up”, like a copy editor marks up a first draft with proofreading symbols. Hence the “markup language” in HTML (Hyper Text Markup Language)
⁶ Looks like <img src="http://www.example.com/about/team/bob-headshot.png" height="320" width="240"/>
⁷ Yeah, that’s code for “old and crusty and not cool any more”, but it’s still alive and kicking, so don’t be a hater.
⁸ Each HTTP server has configuration files that it loads to determine behaviors like which programs to run to generate code for which URLs