Ten Things You Didn't Know Apache (2.2) Could Do

Apache 2.2 has been out for a while, and 2.2.13 was released just recently, featuring the usual slate of enhancements and bug fixes. Happily, the migration to 2.2 seems to be proceeding faster than the migration from 1.3 to 2.0 did, and most people seem, finally, to have jettisoned Apache 1.3.

However, it also seems that a lot of folks are completely unaware of some of the cool new things available in 2.2. Sites are so used to Apache just working that most don’t think about the new features that are going into the Web server all the time.

Here, let’s look at some of the more exciting innovations found in 2.2 and perhaps peek at one or two of the more esoteric ones. You may be surprised and amazed by what’s been lying under your nose all this time.

SNI

I realized long ago that leaving the best to last merely ensures that most people won’t make it that far. So, let’s start with the most compelling feature. If you merely read this first page, you’ll still be ahead of the other system administrators in your office.

Since the beginning of time (or the beginning of the Web, anyway), SSL has suffered from a fundamental shortcoming. Simply stated, you had to have one IP address for every SSL host that you wanted to run. (The exact origin of this limitation isn’t terribly important right here. You can find a number of articles on the subject elsewhere.) But now that we’ve arrived in the 21st century, you can finally run multiple SSL virtual hosts on the same IP address, using something called Server Name Indication (SNI).

The deal with SSL is that you don’t know what name is being requested until after the certificate — possibly the wrong one — has already been exchanged. With SNI, this is addressed by sending the server name as part of the initial negotiation, so that you get the certificate that goes with the right name.

Apache 2.2.12 contains SNI, and you can now serve multiple SSL hosts off of one IP address. More good news is that every modern browser supports this feature and has for some time, just waiting for more sites to implement it on the server side. The bad news is that the documentation is somewhat behind the implementation, but hopefully that will get resolved real soon now.

At the moment, however, the best documentation for this functionality is in the docs wiki, at http://wiki.apache.org/httpd/NameBasedSSLVHostsWithSNI. The docs wiki is sort of a staging ground for the Apache documentation, so that stuff eventually makes it into the official docs.
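
As a rough sketch of the kind of setup that wiki page describes, two SSL virtual hosts sharing one address might look like the following. The hostnames, document roots, and certificate paths are placeholders of my own, and this assumes Apache 2.2.12 or later with mod_ssl built against an OpenSSL that supports the TLS server name extension.

```apache
# Listen for HTTPS and enable name-based virtual hosts on port 443
# (skip the Listen line if your configuration already has one)
Listen 443
NameVirtualHost *:443

# Let clients that do not send a server name fall through to the
# first virtual host instead of being refused
SSLStrictSNIVHostCheck off

# First (and therefore default) SSL virtual host
<VirtualHost *:443>
    ServerName www.example.com
    DocumentRoot /var/www/example.com
    SSLEngine on
    SSLCertificateFile /etc/ssl/certs/www.example.com.crt
    SSLCertificateKeyFile /etc/ssl/private/www.example.com.key
</VirtualHost>

# Second SSL virtual host on the same IP address and port
<VirtualHost *:443>
    ServerName www.example.org
    DocumentRoot /var/www/example.org
    SSLEngine on
    SSLCertificateFile /etc/ssl/certs/www.example.org.crt
    SSLCertificateKeyFile /etc/ssl/private/www.example.org.key
</VirtualHost>
```

A client that doesn’t send a server name gets the first virtual host’s certificate, so list the host you consider the default first.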

mod_substitute

A frequently asked question on the various Apache support forums is how to modify the content within a page as it is being served out to the client. For example, if you’re proxying to a back-end server and that server has URLs embedded in the pages that point to that back-end server, the end-user on the Internet, being unable to reach that back-end server directly, simply experiences a bunch of broken links. So what’s to be done? In the past there wasn’t much that could be done, short of using a third-party module called mod_proxy_html, which was written specifically for this situation. You can read more about it, as well as more about the situation it attempts to resolve, at http://apache.webthing.com/mod_proxy_html/.

But there is a larger class of problems at hand. What if you just want to modify something in content that’s being served to the end users? Perhaps you’re running a third-party application and don’t have access to the source to customize it, but you want to make some modifications to the output that it produces.

Another module, also available at webthing.com, is mod_line_edit (http://apache.webthing.com/mod_line_edit/), which allows you to make arbitrary modifications, using sed-like syntax, to the outgoing HTTP response body.

Apache 2.2 introduced mod_substitute, which includes some of the functionality of both of those modules and allows you to modify the response that is being sent to the Web client, using regular expressions. While it doesn’t do anything that mod_line_edit or Basant Kukreja’s mod_sed can’t do, it has the advantage of being part of the Apache 2.2 distribution, so it’s one less thing to acquire.

To use mod_substitute, you must know enough about regular expressions to express your desired change. For example, if you are proxying a back-end server images.local and want to replace that hostname in URLs with its external hostname, you would do the following:

AddOutputFilterByType SUBSTITUTE text/html
Substitute s/images\.local/images.mysite.com/i

In this case, the i on the end indicates that the substitution should happen in a case-insensitive fashion. The AddOutputFilterByType directive specifies what kind of files the substitution should affect. You don’t want to do substitutions on images or PDF files, for example, as it will corrupt them and result in garbage.

Place these directives in the <Directory> or <Location> block where you want them to take effect, or in a .htaccess file if you don’t have access to the main server configuration file.
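
For instance, scoping the substitution to a single URL prefix might look something like this. The /gallery path is made up for illustration. The n flag, which as I read the mod_substitute documentation treats the pattern as a fixed string rather than a regular expression, is one way to keep the dot in the hostname from matching any character.

```apache
# Only rewrite HTML responses served under /gallery (hypothetical path)
<Location /gallery>
    AddOutputFilterByType SUBSTITUTE text/html
    # n = treat the pattern as a fixed string, i = ignore case
    Substitute s/images.local/images.mysite.com/ni
</Location>
```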

Graceful Stop

This may not seem like a big deal, but folks have been asking for it for a long time. Apache 2.2 adds the graceful-stop option, to stop the server … um … gracefully.

Usually, when you stop or restart Apache, it kills all the existing client connections as part of the process. This results in angry end-users, and your phone rings, and your boss yells at you. Yelling is generally to be avoided.

So, a long, long time ago, the graceful option was added, which allows you to restart the server without abruptly terminating in-process client connections.

$ httpd -k graceful

But there are times when you need to shut down a server entirely, and in that case, too, the clients are abruptly dropped. For example, you may want to take a server out of a load-balanced configuration, but you don’t want existing client sessions to be terminated. So what do you do?

Well, with Apache 2.2, a new option stops the server but allows ongoing connections — say, if someone is executing a long-running script or downloading a large file — to complete before the child processes are killed.

$ httpd -k graceful-stop

This has the direct result of your phone ringing less when you’re doing server maintenance. Highly recommended.
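
If you’d rather not wait forever on a stuck connection, the GracefulShutdownTimeout directive puts an upper bound on how long a graceful stop will wait. A minimal sketch, with the 30-second value picked arbitrarily:

```apache
# Wait at most 30 seconds for in-flight requests during a graceful stop.
# The default of 0 means wait indefinitely.
GracefulShutdownTimeout 30
```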

mod_proxy_balancer

A lot has been written about mod_proxy_balancer, yet every time I mention it, someone is surprised that this is an included feature of the Apache product. So, here again, mod_proxy_balancer.

Apache 2.2 comes with a front-end proxy that load balances between an arbitrary number of back-end servers. It also maintains sticky sessions; that is, once a client is routed to a particular server, you can force that client to always go back to that server, so that their sessions are not interrupted. It does traffic-based load balancing. It does hot spares: a server can be automatically rolled into the rotation if one of the other ones dies. It has a Web-based management console where you can remove servers from the rotation or modify a server’s priority in the rotation.

So, it’s really a full-featured load balancing proxy. And it’s free, and included in your Apache 2.2 server.

To get started with mod_proxy_balancer, define your pool, or “cluster,” of hosts to be balanced:

<Proxy balancer://mycluster>
BalancerMember http://192.168.1.50:80
BalancerMember http://192.168.1.51:80
BalancerMember http://192.168.1.52:80
</Proxy>

Then, tell your server to proxy requests through to those servers:

ProxyPass /test balancer://mycluster

If that seems deceptively easy … well, it actually is that easy, but you can also configure a raft of other options on top of that, including those mentioned above.

As with the other features I’ve mentioned, I’m not going to reproduce the documentation here. Instead, take a look at the examples at http://httpd.apache.org/docs/2.2/mod/mod_proxy_balancer.html
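
That said, here is a rough sketch of how the options mentioned above fit together. The route names, the load factor, the JSESSIONID cookie name, and the choice of the third host as the spare are all assumptions of mine, and sticky sessions only work if the back-end servers append their route (for example .node1) to the session ID.

```apache
<Proxy balancer://mycluster>
    # route must match the suffix the back end adds to the session ID
    BalancerMember http://192.168.1.50:80 route=node1 loadfactor=2
    BalancerMember http://192.168.1.51:80 route=node2
    # +H marks a hot standby, used only when the other members are down
    BalancerMember http://192.168.1.52:80 status=+H
    # Balance by bytes transferred and pin clients by session cookie
    ProxySet lbmethod=bytraffic stickysession=JSESSIONID
</Proxy>

ProxyPass /test balancer://mycluster

# Web-based management console, restricted to the local machine
<Location /balancer-manager>
    SetHandler balancer-manager
    Order deny,allow
    Deny from all
    Allow from 127.0.0.1
</Location>
```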


httpd -M

Apache loads modules in two different ways. You can compile them into the server binary when you first install Apache, or you can load them dynamically at startup time using the LoadModule directive. Almost every Apache installation has some of each kind. Until recently, if you wanted to know what modules you had loaded, you had to look in two different places. You’d run httpd -l to get a list of the compiled-in type:

$ httpd -l
Compiled in modules:
core.c
prefork.c
http_core.c
mod_so.c

Then you’d have to go look in your server configuration file and see what modules had LoadModule directives. This is actually harder than it sounds, because a lot of third-party distributions of Apache put each LoadModule directive in a separate file, with names like php.load and mod_perl.conf and so on.

In another minor change with a big impact, Apache 2.2 adds the -M flag, which allows you to list all of the modules that are loaded, both static and shared:

$ httpd -M
Loaded Modules:
core_module (static)
mpm_prefork_module (static)
http_module (static)
so_module (static)
authn_file_module (shared)
...
php5_module (shared)
pony_module (shared)

Each module indicates whether it is static or shared, and now you know for certain what modules were successfully loaded and which ones you forgot.

And, yes, that’s mod_pony. Seriously.

httxt2dbm

If you’re like the rest of us, you have, over the years, accumulated lengthy lists of RewriteRule and Redirect directives to map old URLs to the new ones. These stack up, and, over time, can cause a great deal of confusion about where your content actually lives, not to mention a big performance hit when all the rules have to be processed every time a request is made to your server.

One way to consolidate these redirects is with RewriteMap, a directive in mod_rewrite that allows you to define an external map of rewrite rules. This may be as simple as a text file that lists the mappings, or as complicated as an external script or program, or a database query, that determines the rules.

So, for example, if you have a bunch of old URLs that you want to redirect to new ones (a very typical case), or perhaps just friendly, easier-to-remember URLs that you want to redirect to the actual ugly back-end ones, then you might have a RewriteMap file like this, called dogs.txt:

collie /dogs.php?id=875
doberman /dogs.php?id=12
dachshund /dogs.php?id=99
siamese /cats.php?id=84

Then, you would use this file in a RewriteMap:

RewriteMap dogmap txt:/path/to/file/dogs.txt

And use the RewriteMap in a RewriteRule:

RewriteRule ^/dogs/(.*) ${dogmap:$1}

The trouble is that this is a plain text file, and, as such, unindexed and therefore slow. Every time you request a URI, mod_rewrite looks through this list, one item at a time, until it finds the one that it needs. And the more items you add to the list, the longer each lookup takes.

For years, the documentation suggested that you could convert the text file to a dbm, and offered a Perl script for doing so. Unfortunately, the script didn’t work particularly well, and, if you could get it to work, there was always the problem of picking the right type of dbm for your particular operating system.

With the 2.2 version, there’s a utility that comes with the server, and is installed alongside the other binaries, that not only converts your text file into a dbm, but correctly selects the same dbm library that your installation of Apache was built with, thus ensuring compatibility.

This utility, called httxt2dbm, prints the following usage message:

httxt2dbm -- Program to Create DBM Files for use by RewriteMap
Usage: httxt2dbm [-v] [-f format] -i SOURCE_TXT -o OUTPUT_DBM

Options:
-v More verbose output

-i Source Text File. If '-', use stdin.

-o Output DBM.

-f DBM Format. If not specified, will use the APR Default.
GDBM for GDBM files (unavailable)
SDBM for SDBM files (available)
DB for berkeley DB files (unavailable)
NDBM for NDBM files (unavailable)
default for the default DBM type

For most of us, the -f option is not particularly useful. Of course, we want it to use the APR default - that is, whatever Apache was built with. If you actually know what the differences are between the various dbm formats, perhaps you have reasons for using a different one, and can do that if you really want to.

$ httxt2dbm -i dogs.txt -o dogs.map

Now, you can modify your RewriteMap directive to use this new file:

RewriteMap dogmap dbm:/path/to/file/dogs.map

Lookups are now performed against the dbm, and so are much faster.
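
Putting the pieces together, the relevant part of the server configuration might look roughly like this. It’s a sketch under a couple of assumptions: the lookup keys in the map are exactly what (.*) captures (collie rather than /dogs/collie), and /dogs.php is a fallback I made up for names that aren’t in the map. RewriteMap has to live in the main server or virtual host configuration, not in a .htaccess file.

```apache
RewriteEngine On

# Point the map at the dbm produced by httxt2dbm
RewriteMap dogmap dbm:/path/to/file/dogs.map

# Look up the captured name; fall back to /dogs.php (hypothetical)
# when the key is missing from the map
RewriteRule ^/dogs/(.*) ${dogmap:$1|/dogs.php} [L]
```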

PCRE Zero-Width Assertions

I said I wasn’t going to leave the best to last, but this last one is very cool, and answers one of the most frequently asked questions, although often the folks asking the question wouldn’t think to ask for this particular solution.

The question that tends to get asked often goes something like, “How can I redirect everything except for a particular directory?” For example: I want to redirect requests for anything on this server over to that other server, except for requests for the images directory.

Now, Apache offers a RedirectMatch directive that allows you to use regular expressions to specify a class of URIs that you want to redirect. Unfortunately, it does not have a negation operator, so you can’t easily say “everything that doesn’t match images.”

At least until now.

One of the changes with the 2.2 version of the server is that RedirectMatch and all of the other *Match directives now use the Perl Compatible Regular Expression library (PCRE) and so have the full power of the regular expressions that you know and love from your favorite programming language.

One of the cooler of these features is zero-width assertions. Now, I’m not going to go into all the details of what these are. That’s covered very nicely in the tutorial at http://www.regular-expressions.info/lookaround.html. Instead, I’ll give you a specific way that they can be used in Apache to answer this frequently asked, seldom answered question.

RedirectMatch ^/(?!images/)(.*) http://dynamic.myhost.com/$1

This RedirectMatch redirects all URLs to http://dynamic.myhost.com/, unless the URL path starts with /images/. This regular expression syntax is called a negative lookahead, and allows you to assert that the text following a given position does not match a particular pattern.
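
The same trick extends to more than one excluded directory by putting an alternation inside the lookahead. A quick sketch, where /css/ is a second directory I’ve invented for the example:

```apache
# Redirect everything except paths under /images/ or /css/.
# (?:...) groups without capturing, so $1 still refers to (.*)
RedirectMatch ^/(?!(?:images|css)/)(.*) http://dynamic.myhost.com/$1
```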

It makes me happy when something that I’ve always answered with “you can’t do that” becomes possible, and even easy.

Summary

Apache 2.2 has some great hidden treasures in it that a lot of folks are simply unaware of. 2.4 has even more of them. I can hardly wait.
