Step Away From the SuperGlobals! An Introduction to Inspekt

by Ed Finkler (2008-02-18)
 

Inspekt is a library for PHP 4 and PHP 5 that aims to make safe input handling easier, and unsafe actions more difficult. Inspekt establishes a new development approach by wrapping input within “cage” objects, and requiring the developer to use validation and filtering methods to test and manipulate the input data. This article provides a brief introduction to Inspekt and its capabilities.

Introduction

Inspekt is a library for PHP 4 and PHP 5 that aims to make safe input handling easier, and unsafe actions more difficult. Inspekt establishes a new development approach by wrapping input within “cage” objects, and requiring the developer to use validation and filtering methods to test and manipulate the input data. In cases where raw, unfiltered input is required, the developer is forced to demonstrate clear intent. This makes secure development significantly easier, and insecure development significantly harder.

If we examine security issues in web applications, the vast majority have to do with how input is handled. The National Vulnerability Database data from the past two years shows us that mistakes with input validation and filtering in PHP-based web apps account for a near-majority of all reported vulnerabilities. Handling input safely, then, needs to be a core focus of the tools and libraries a developer chooses.

The standard way of retrieving input in PHP is to work with the Superglobals. $_GET, $_POST, $_COOKIE and the rest give you easy access to data passed into your web app, like so:

    // copy the value from $_GET to $id
    $id = $_GET['id'];
    echo "thanks for sending me $id, dude!";

    // just use the $_GET val directly
    echo 'it\'s even more awesome to grab '.$_GET['id'].' directly from input, right?';

Despite the comments above, this is not awesome. If you've read anything with the words “security” and “PHP” in the title within the past few years, you will have noticed that there's no input filtering or validation being done in the code above. It's ripe for all sorts of attacks.

The usual way this gets addressed is with what we might call piecemeal filtering. Basically, you pick the appropriate function and call it right before you do anything dangerous with your data.

    // escaped, so not a potential SQL injection threat.
    $id = mysql_real_escape_string($_GET['id']);
    echo "Now I know that $id is totally safe to insert into my database";

    // cast as int, so it can only ever be an integer
    $id = intval($_GET['id']);
    echo "$id is most certainly an integer now";

That works okay for short snippets of code, but some problems start showing up when we deal with anything of appreciable size:

  • Remembering the right filtering method for a given case can be a challenge.
  • The odds increase that input will be improperly filtered, or not filtered at all.
  • The default action when dealing with input is unsafe: handling input is “dangerous by default.” As the number of places where filtering must be applied grows, and as more developers work on the codebase, it becomes more and more likely that a mistake will be made.

Inspekt takes a different approach. When used in the recommended way, the unsafe choice is no longer the default. The developer is forced to make a decision about how to handle input at every step.

Inspekt does this by creating cage objects around input data. These cages only allow access to their data through their methods, all of which perform validation or filtering, save for the getRaw method. The creation of the cage also unsets the existing Superglobal, so the only way to get the data is through the cage object.

    // Example: creating a cage for $_POST
    require_once "Inspekt.php";

    // makePostCage creates a cage from the $_POST array
    $cage_POST = Inspekt::makePostCage();

    // get the integer value from the POST payload
    $userid = $cage_POST->getInt('userid');

    if ( !isset($_POST['userid']) ) {
        echo 'Cannot access input via $_POST -- use the cage object';
    }
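
To make the cage idea a bit more concrete, here is a minimal sketch of how such a wrapper could work. This is not Inspekt's actual code: the class name MiniCage, the fromArray method, and the $fakePost array (a stand-in for $_POST) are all made up for illustration, and the sketch empties the source array rather than unsetting a real Superglobal.

```php
<?php
// Hypothetical sketch: MiniCage is NOT Inspekt's real class.
class MiniCage
{
    private $data;

    private function __construct(array $data)
    {
        $this->data = $data;
    }

    // Copy the source array into the cage, then empty the original
    // so callers can no longer read raw values from it.
    public static function fromArray(array &$source)
    {
        $cage = new self($source);
        $source = array();
        return $cage;
    }

    // Filter: cast the value to an integer; FALSE if the key is missing.
    public function getInt($key)
    {
        if (!array_key_exists($key, $this->data)) {
            return false;
        }
        return (int) $this->data[$key];
    }

    // The one deliberate escape hatch for unfiltered input.
    public function getRaw($key)
    {
        return array_key_exists($key, $this->data) ? $this->data[$key] : false;
    }
}

$fakePost = array('userid' => '42abc');   // stand-in for $_POST
$cage = MiniCage::fromArray($fakePost);
```

The point of the design is that the data is private: once the source array is emptied, the only way to reach a value is through a method that filters it, or through getRaw, which makes the intent to use unfiltered data obvious in the code.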

The above example only wraps one Superglobal in an input cage. Inspekt makes it easy to automatically wrap all input with the makeSuperCage method:

    // require the inspekt library
    require "Inspekt.php";

    // create a "SuperCage" to wrap all possible user input
    // the SuperCage should be created before doing *anything* else
    $input = Inspekt::makeSuperCage();

    // we need to ensure that $_POST['userid'] is an integer
    if ($userid = $input->post->testInt('userid')) {
        /* do stuff with $userid */
        $db->insert("id={$userid}");
    } else {
        trigger_error('$userid input is invalid', E_USER_ERROR);
    }

With the SuperCage, a cage is made for each Superglobal and assigned as a property to the SuperCage. They are:

  • $sc->get: cage from $_GET
  • $sc->post: cage from $_POST
  • $sc->cookie: cage from $_COOKIE
  • $sc->server: cage from $_SERVER
  • $sc->files: cage from $_FILES
  • $sc->env: cage from $_ENV

If we create the SuperCage at the start of our web app's initialization process, it can create cages before input is handled. This enforces the Inspekt approach for handling input across the entire application, for all contributors.

Each cage has a variety of methods for filtering and validation. Filter methods remove data from the value of the given key and return what remains. If the key does not exist, they return FALSE. Some examples include:

  • getAlnum
  • getInt
  • noTags

Tester methods return the value of the given key on pass, and FALSE on fail or if the key does not exist. Some examples:

  • testAlpha
  • testEmail
  • testIp
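
The difference between the two styles is easiest to see side by side. The sketch below uses two hypothetical helper functions (sketch_get_alnum and sketch_test_email are not part of Inspekt) and leans on PHP's built-in filter_var for the email check, which may or may not resemble what Inspekt does internally.

```php
<?php
// Hypothetical helpers, named for illustration only.

// Filter style: strip everything that is not a letter or digit
// and return what remains; FALSE if the key does not exist.
function sketch_get_alnum(array $input, $key)
{
    if (!array_key_exists($key, $input)) {
        return false;
    }
    return preg_replace('/[^A-Za-z0-9]/', '', (string) $input[$key]);
}

// Tester style: return the value unchanged if it passes, else FALSE.
function sketch_test_email(array $input, $key)
{
    if (!array_key_exists($key, $input)) {
        return false;
    }
    $value = $input[$key];
    return filter_var($value, FILTER_VALIDATE_EMAIL) !== false ? $value : false;
}

$input = array(
    'name'  => 'ed_finkler!',
    'email' => 'foo1@bar.com',
    'bad'   => 'not an email',
);

$name  = sketch_get_alnum($input, 'name');   // 'edfinkler'
$email = sketch_test_email($input, 'email'); // 'foo1@bar.com'
$bad   = sketch_test_email($input, 'bad');   // false
```

A filter always hands back something usable (possibly an emptied-out string), while a tester is all-or-nothing, which is why testers work well directly inside an if condition.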

Scoping Issues

PHP's Superglobals are convenient in that they maintain a global scope without the need to declare them as globals in each function. PHP unfortunately does not allow user-defined superglobals, so we need to find a way to reduce the burden of switching scopes.

All of the Inspekt::make*Cage() methods implement a singleton pattern. This means that the developer does not have to pass the cage object to and from functions—or use the global keyword—to access it outside of the global scope. Just use the make*Cage() method to access the same object you created in a different scope.

    function foo() {
        $cage_foo = Inspekt::makeServerCage();
        return $cage_foo;
    }

    // All of these return the *same object*
    $cage1 = Inspekt::makeServerCage();
    $cage2 = foo();
    $cage3 = Inspekt::makeServerCage();

    // outputs bool(true)
    var_dump($cage1 === $cage2 && $cage2 === $cage3);
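
For readers who haven't run into the pattern before, a singleton-style factory can be sketched in a few lines. The class SketchCageFactory and the ArrayObject stand-in for a cage are assumptions made for this example; Inspekt's own internals may be organized quite differently.

```php
<?php
// Hypothetical sketch of a singleton-style factory; not Inspekt's code.
class SketchCageFactory
{
    // Holds the single shared instance once it has been built.
    private static $serverCage = null;

    public static function makeServerCage()
    {
        // Build the object only on the first call; later calls,
        // from any scope, get the same instance back.
        if (self::$serverCage === null) {
            self::$serverCage = new ArrayObject(array('stand-in' => 'for a real cage'));
        }
        return self::$serverCage;
    }
}

function bar()
{
    // No global keyword and no parameter needed here.
    return SketchCageFactory::makeServerCage();
}

$a = SketchCageFactory::makeServerCage();
$b = bar();
```

Because the instance lives in a static property of the class, it is reachable from any function without being passed around.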

Multidimensional Arrays and Array Path Queries

Inspekt uses a special kind of formatting to make it easier to grab an arbitrary key from a deep multidimensional array. Let's take a (rather contrived) form example like this:

    <form action="formtest.php" method="POST">
       <h3>Enter 7 email addresses</h3>
       <input type="text" name="email_addresses[group1][a]" value="foo1@bar.com" /><br />
       <input type="text" name="email_addresses[group1][b]" value="foo2@bar.com" /><br />
       <input type="text" name="email_addresses[group1][c]" value="foo3@bar.com" /><br />
       <input type="text" name="email_addresses[group2][a]" value="foo4@bar.com" /><br />
       <input type="text" name="email_addresses[group2][b]" value="foo5@bar.com" /><br />
       <input type="text" name="email_addresses[group3][a]" value="foo6@bar.com" /><br />
       <input type="text" name="email_addresses[group3][b]" value="foo7@bar.com" /><br />

       <input type="submit" name="submit" value="Go!" id="submit" />
    </form>

Submitting the form above would give us an array that looks like this:

    [email_addresses] => Array
        (
            [group1] => Array
                (
                    [a] => foo1@bar.com
                    [b] => foo2@bar.com
                    [c] => foo3@bar.com
                )

            [group2] => Array
                (
                    [a] => foo4@bar.com
                    [b] => foo5@bar.com
                )

            [group3] => Array
                (
                    [a] => foo6@bar.com
                    [b] => foo7@bar.com
                )

        )

    [submit] => Go!

Inspekt's “Array Path” queries use a UNIX path-style notation to reference items inside an array. Using it, we can test a particular entry with the following code:

    $input = Inspekt::makeSuperCage();

    if ($email = $input->post->testEmail('/email_addresses/group3/a')) {
        echo $email;
    } else {
        echo "invalid address";
    }

The notation is very simple, and has a few caveats:

  • The forward slash (/) is the separator, so you can't access keys that contain that character
  • Any numeric keys are converted to integers, so you can't access keys that are numeric strings
  • All queries must include the full path from the root of the array
  • Leading and trailing slashes are ignored. These are all equivalent:

    /x/woot/booyah/
    /x/woot/booyah
    x/woot/booyah/
    x/woot/booyah
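
The rules above are simple enough to sketch in a short function. The helper below, sketch_path_lookup, is hypothetical and only mirrors the behavior described in the list; it is not taken from Inspekt, and returning FALSE for a missing path is an assumption borrowed from how the cage methods treat missing keys.

```php
<?php
// Hypothetical helper; Inspekt's own implementation may differ.
function sketch_path_lookup(array $data, $path)
{
    // Leading and trailing slashes are ignored.
    $path = trim($path, '/');
    if ($path === '') {
        return false;
    }

    $current = $data;
    foreach (explode('/', $path) as $segment) {
        // Numeric segments are treated as integer keys.
        $key = ctype_digit($segment) ? (int) $segment : $segment;
        if (!is_array($current) || !array_key_exists($key, $current)) {
            return false; // the path does not exist
        }
        $current = $current[$key];
    }
    return $current;
}

$post = array(
    'email_addresses' => array(
        'group3' => array('a' => 'foo6@bar.com', 'b' => 'foo7@bar.com'),
    ),
);

$found   = sketch_path_lookup($post, '/email_addresses/group3/a');
$same    = sketch_path_lookup($post, 'email_addresses/group3/a/');
$missing = sketch_path_lookup($post, '/email_addresses/group9/a');
```

Here $found and $same both hold foo6@bar.com, and $missing is FALSE. Splitting on the slash is also what makes keys containing a slash unreachable, which is the first caveat in the list.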

Automatic Filtering

Inspekt supports automatic input filtering by using a config file. Multiple filters can be chained together and applied either to a single input key, or all input within a certain method (GET, POST, etc.).

    ; config.ini

    [_POST]
    *=noTags,getAlnum   ; * means apply to all values in this input array
    username=getAlpha   ; apply getAlpha automatically to username
    userid=getInt       ; apply getInt automatically to userid

    // for the sake of this example, plug in some values
    $_POST['userid'] = '--12<strong>34</strong>';
    $_POST['username'] = 'se77777enty_<em>fiv</em>e!';

    // create a supercage and pass it a config file path
    $sc = Inspekt::makeSuperCage('./config.ini');

    // displays "1234" -- the value has already been altered
    echo $sc->post->getRaw('userid');

As shown above, automatic filtering is applied immediately.
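
To see how chained filters reach that result, here is a rough sketch of the idea in plain PHP. It makes several assumptions that are not from Inspekt: the rules are written as a PHP array instead of being parsed from the INI file, the function sketch_apply_filters is hypothetical, the filters are simple stand-ins that only share names with Inspekt's methods, and the "*" rules are assumed to run before the key-specific ones.

```php
<?php
// Hypothetical sketch: rules are a PHP array here, not a parsed INI file,
// and the filters only mimic the names used in the article.
function sketch_apply_filters(array $input, array $rules, array $filters)
{
    foreach ($input as $key => $value) {
        // Assumed order: rules under "*" first, then rules for this key.
        $chain = array_merge(
            isset($rules['*']) ? $rules['*'] : array(),
            isset($rules[$key]) ? $rules[$key] : array()
        );
        foreach ($chain as $name) {
            $value = call_user_func($filters[$name], $value);
        }
        $input[$key] = $value;
    }
    return $input;
}

$filters = array(
    'noTags'   => 'strip_tags',
    'getAlnum' => function ($v) { return preg_replace('/[^A-Za-z0-9]/', '', $v); },
    'getAlpha' => function ($v) { return preg_replace('/[^A-Za-z]/', '', $v); },
    'getInt'   => 'intval',
);

$rules = array(
    '*'        => array('noTags', 'getAlnum'),
    'username' => array('getAlpha'),
    'userid'   => array('getInt'),
);

$filtered = sketch_apply_filters(
    array(
        'userid'   => '--12<strong>34</strong>',
        'username' => 'se77777enty_<em>fiv</em>e!',
    ),
    $rules,
    $filters
);
```

Under these assumptions, userid goes from '--12<strong>34</strong>' to '--1234' (tags stripped), then '1234' (alphanumerics only), then the integer 1234, while username ends up as 'seentyfive'.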

Using Inspekt with an Existing Application

Because the cage objects unset the input Superglobals, Inspekt may interfere with existing apps and frameworks. If this is the case, we might consider disabling the unset behavior like so:

    // make a supercage without unsetting the source superglobals
    $input = Inspekt::makeSuperCage(null, FALSE);

If we take this approach, coding standards should be more carefully enforced. Something like the PHP_CodeSniffer package (http://pear.php.net/package/PHP_CodeSniffer/) may be very useful in these situations. The codebase can then be migrated away from reliance on Superglobals, and the unset behavior can be re-enabled.

Summary

Inspekt isn't an all-in-one solution to a web app's security problems—nothing is. Inspekt should be used as part of a multilayered security approach, including hardware and software firewalls, application firewalls like mod_security, proper configuration of PHP and other parts of your web app stack, and sensible segregation of data on a “need to know” basis.

Inspekt does, however, address the biggest security problem with web apps: improper handling of input. By changing the way we interact with input in PHP, Inspekt makes it significantly easier for developers to apply input filtering correctly and consistently.

For more information, you can visit:

Ed Finkler is the Web and Security Archive Administrator for the Center for Education and Research on Information Assurance and Security (CERIAS) at Purdue University. He is also a member of the PHP Security Consortium, and project lead on the PhpSecInfo and Inspekt security tools. He is the creator of an AIR-based Twitter client that won “Best HTML Community App” in the Adobe AIR Developer Derby. His primary interests are web app security, development, graphic design, and electronic music.
 

Comments

Re: Step Away From the SuperGlobals! An Introduction to Inspekt by StR (2008-02-27 18:40:56 (America/Toronto))
I read the Poka Yoke article in php|arch, and it explained something like this. With that article, I created my own filter, a little class like this one. Then came Zend Framework, and it was not compatible. Zend_Controller_Router needed the $_GET superglobal, so we removed the unset from the class.

Then, we created a simple ORM (using Zend_Db_Table) and were able to do $table->update($arrayData). Our class wasn't able to return an array, so we wrote the getRaw method... so, at the end, no one was using the filter; everyone just used the getRaw method.

By this time, none of the poka yoke principles were applied to our class... and so it was removed. We then decided that filtering was a model's task (model as in MVC) and delegated that task to each model.

Re: Step Away From the SuperGlobals! An Introduction to Inspekt by marcot (2008-03-01 11:54:50 (America/Toronto))
Hey—

I wrote the poka-yoke article; if you remember, one of the conditions that I mentioned as necessary to the success of a poka-yoke system was the fact that it needs to be convenient for the developers to use—that is, the fail-safe mechanism has to be passive (i.e.: refuse to work when you're doing things the wrong way) rather than actively preventing you from doing your job.

The real problem with poka-yoke is that all the pieces of the puzzle must respect them in order for the entire system to work. As you found out, this is super-difficult when you need to integrate different software products that were not all built starting from the same basic infrastructure (incidentally, this is one of the very few good reasons why filtering should be built into core, and one that I've very rarely heard proponents of ext_filter make). The pieces don't fit well together, developers get annoyed and they start working around the limitations of the fail-safe mechanisms, thus defeating their purpose.

I found that a good way to deal with this kind of problem is to provide as much isolation as possible between the different systems. For example, many of the pieces of software in our infrastructure talk to each other using a very simple messaging system (a sort of super-lightweight SOA) so that the design specs of each of them can be as unique as they need to be. This enables us to integrate external applications without going crazy trying to fit a square peg into a round hole, and to take advantage of convenient features in other software (like phpBB, for example), without having to compromise our own infrastructure.


--Mt.