Advanced URL Routing and SEO Techniques with CakePHP

by Nate Abele (2008-03-31)
 

Using CakePHP's advanced routing system, you can easily create custom, SEO-friendly URLs with minimal configuration.

Last time, we looked at the proper approach to MVC design in CakePHP, which is the area you'll be living in 95% of the time when building your application.

Today, we're going to be stepping out of, or more appropriately, above that to look at the framework's dispatching cycle. When Cake receives a request from a browser or web service client, the URL and other request details are handed off to the Dispatcher. The Dispatcher then makes a call to the Router, which parses the URL using your application's routes configuration file. The job of the Router is to determine the appropriate controller and action to call based on the URL provided. It then passes this information back to the Dispatcher, which loads and executes the appropriate classes/methods.

Defining Custom Routes in Your Routes Configuration

When you first set up your CakePHP application, there's only one way your requests get routed: /controller/action/params. The first part of the URL maps to the controller you wish to load, the second part maps to the action within that controller you wish to call, and anything after that is passed to the action as parameters. While this may work fine for many, even most, applications, you'll often want to use more human-friendly (and SEO-friendly), customized URLs. To do this, Cake provides you with a routes configuration file, which you can find in a default installation at /app/config/routes.php.
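To make the default convention concrete, here is a minimal sketch in plain PHP of how a path breaks down into controller, action, and parameters. The function parseDefaultUrl() is a hypothetical helper written for this illustration; it is not part of CakePHP, and the real Router does considerably more.

```php
<?php
// Hypothetical helper, not part of CakePHP: splits a URL path the way the
// default /controller/action/params convention reads it.
function parseDefaultUrl($url) {
    // Drop leading/trailing slashes and discard empty segments.
    $segments = array_values(array_filter(explode('/', trim($url, '/')), 'strlen'));

    return array(
        'controller' => isset($segments[0]) ? $segments[0] : null,
        // Assumes "index" when the URL carries no action segment.
        'action'     => isset($segments[1]) ? $segments[1] : 'index',
        // Everything after the action is handed to it as parameters.
        'pass'       => array_slice($segments, 2),
    );
}

$parsed = parseDefaultUrl('/posts/view/42');
```

Here $parsed holds the controller "posts", the action "view", and a single passed parameter, "42".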

Recently, my company (OmniTI) launched a new website featuring a unique and creative URL hierarchy (http://shiflett.org/blog/2008/mar/urls-can-be-beautiful), where the patterns could almost be interpreted as plain English sentences, e.g. http://omniti.com/is/hiring and http://omniti.com/remembers/2008. Not only are URLs in this style much more attractive and easier to remember, but they're also much more SEO-friendly, since you can include contextually-relevant information (e.g. search terms).

Following Cake convention, if we were to use the default routes, we'd need to have controller/action pairs like IsController::hiring() or RemembersController::2008(), the second of which isn't even valid syntax. Fortunately, the routing system allows us to completely divorce the request URL from the code that gets executed.

We add custom routes using the Router::connect() method, which takes 3 parameters: the first parameter is a string that defines the structure of the URL, with any parameter keywords prefixed with “:”. The second parameter is an array containing the default values of the route, which gets merged with any parameters parsed out of the URL itself. For example, if your URL string doesn't include a :controller parameter, you must include it in your defaults array. Note that you may omit :action entirely, in which case it will default to “index”. The third parameter allows us to specify regular expressions to be paired with a URL parameter, which we'll look at in more detail later on.

So let's get the simple ones out of the way first: we need to set up our home page, and our “about” pages. We can do this like so:

  Router::connect('/', array('controller' => 'pages', 'action' => 'display', 'home'));

  Router::connect('/is/*', array('controller' => 'pages', 'action' => 'display', 'about'));

Both of these routes utilize the built-in PagesController object to serve static content (although you could easily create your own and have it load dynamic content as well). By default, PagesController loads view templates based on the route parameters it is given, so if we go to /is, then it will load views/pages/about.ctp. Likewise, if we go to /is/hiring, it will load views/pages/about/hiring.ctp. This allows us to create simple hierarchies for common sets of pages.
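The mapping from route parameters to template files can be sketched in a few lines. This is only an illustration of the idea described above: pageTemplatePath() is a hypothetical helper, not the code PagesController actually runs.

```php
<?php
// Hypothetical helper, not CakePHP's actual code: joins the parameters passed
// to PagesController::display() into a template path under views/pages/.
function pageTemplatePath(array $pass) {
    return 'views/pages/' . implode('/', $pass) . '.ctp';
}

// "/is" passes only the default 'about'; "/is/hiring" appends 'hiring'.
$aboutPath  = pageTemplatePath(array('about'));
$hiringPath = pageTemplatePath(array('about', 'hiring'));
```

The first call yields views/pages/about.ctp and the second views/pages/about/hiring.ctp, which is the simple hierarchy the /is/* route relies on.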

Customizing with Regular Expressions

However, we've already run into a problem: the /is/* route will also eat our employee profile pages (e.g. /is/nate-abele), and we want those pages to be served by our PeopleController instead. Even though people rarely leave OmniTI, we're hiring a lot right now (hint, hint) so we want to be able to serve the employee profile pages from a database. To do that, however, we need to somehow differentiate the “people” URLs from the rest of the “about” page URLs. If we take a look, we can see a simple pattern, which is that all the people URLs are in the form /is/firstname-lastname. By mapping a regular expression to a specific route element, we can capture the person URLs and point those at our PeopleController, like so:

  Router::connect(
    '/is/:person',
    array('controller' => 'people', 'action' => 'view'),
    array('person' => '\w+-\w+')
  );

As you can see, we're telling Cake that the :person element of the URL must be two sets of word characters, separated by a hyphen. We can then use the person parameter in our controller to retrieve the appropriate database record, per the following:

  <?php

  class PeopleController extends AppController {

    public function view() {
      $person = $this->Person->findByName($this->params['person']);
      $this->set(compact('person'));
    }
  }

  ?>
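If you want to check which URLs the :person expression will actually accept, you can try it outside the framework. The sketch below assumes the Router anchors the expression to the whole path element; matchPerson() is a hypothetical helper, and the pattern wrapped around \w+-\w+ is my own approximation, not the regex Cake compiles.

```php
<?php
// Hypothetical helper: tests a URL against the /is/:person pattern, assuming
// the expression must match the entire path element.
function matchPerson($url) {
    if (preg_match('#^/is/(\w+-\w+)/?$#', $url, $matches)) {
        return $matches[1]; // The captured firstname-lastname slug.
    }
    return null; // Not a person URL; a later route gets a chance.
}

$profile = matchPerson('/is/nate-abele');
$about   = matchPerson('/is/hiring');
$triple  = matchPerson('/is/mary-jane-smith');
```

Under that assumption, only the first URL is treated as a profile. Note that a three-part name such as mary-jane-smith does not match either, so a site with such slugs would need a looser expression.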

Order Matters

However, you'll notice that if we put our person route below our /is/* route, it won't get executed. This is because the Router relies on route precedence. The first route found that matches the request URL is the one that's utilized. The same applies to reverse routing (taking an array of parameters and converting it to a URL). Therefore, we always want to order our routes from most specific to most general. After we have moved our person route ahead of our about page route, we see that each now works exactly according to plan.
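The effect of precedence is easy to reproduce with a toy first-match router. This is a sketch of the principle only, not CakePHP's implementation: firstMatch() and the two route tables are hypothetical, and the regular expressions are hand-written stand-ins for the /is/* and /is/:person routes.

```php
<?php
// Toy first-match router, not CakePHP's implementation. Each entry maps an
// anchored regular expression to the controller that should handle it.
function firstMatch(array $routes, $url) {
    foreach ($routes as $pattern => $controller) {
        if (preg_match($pattern, $url)) {
            return $controller; // Stop at the first route that matches.
        }
    }
    return null;
}

// General route first: the catch-all shadows the person route.
$generalFirst = array(
    '#^/is(/.*)?$#'   => 'pages',
    '#^/is/\w+-\w+$#' => 'people',
);

// Specific route first: profiles reach the people controller.
$specificFirst = array(
    '#^/is/\w+-\w+$#' => 'people',
    '#^/is(/.*)?$#'   => 'pages',
);
```

With $generalFirst, /is/nate-abele is swallowed by the pages route; with $specificFirst, it reaches people while /is/hiring still falls through to pages.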

Next, we'll tackle the /does/* pages, which we'll handle in much the same way as the /is/* pages. As you probably guessed, these pages talk about what we actually do, of which there are 4 subsection URLs: design-and-development, scalability-and-performance, web-application-security, and architecture-and-infrastructure. We'll wire these up with the following route:

  Router::connect('/does/*', array('controller' => 'pages', 'action' => 'display', 'work'));

Using this route, we can browse to /does and see the template stored in views/pages/work.ctp, or browse to /does/design-and-development and see the template loaded from views/pages/work/design-and-development.ctp.

Since this section isn't going to serve anything other than the one main page and 4 sub-pages, we don't really need to worry about any pattern matching. If a user types in an invalid URL, Cake won't be able to find the template page, and will just show a 404 page. There are ways to override 404s and other errors in order to tell Cake to do something more interesting with them, but that's out of scope right now.

Finally, we have the OmniTI Planet, found at /thinks. The planet aggregates news from within the company, as well as relevant posts from the personal blogs of company employees. Setting up the main route for /thinks is simple enough, but the planet sub-pages look more like /remembers/2008/expressing-our-inner-geek, so we want to route these requests to our PlanetController, which we'll do like so:

  Router::connect(
    '/remembers/:year/:title',
    array('controller' => 'planet', 'action' => 'view', 'title' => null),
    array('year' => $Year, 'title' => '[a-zA-Z0-9\-]+')
  );

A few interesting things are going on here. First, you'll notice that we're giving 'title' a default value of null. This allows the route to match even if a title is omitted from the URL, which means we can handle URLs like /remembers/2008 without an extra route. Next, you'll notice that we're assigning a regex to 'year' with a variable ($Year) that hasn't been defined anywhere. When Cake loads your routes config file, it gives you access to a few “magic” variables to help you match common patterns, which are as follows:

- $Action - matches any CakePHP default action names, as well as other common ones
- $Year - matches any calendar year from 1000 to 2999
- $Month - matches any calendar month from 01 to 12
- $Day - matches any calendar day from 01 to 31
- $ID - matches any valid whole number
- $UUID - matches a valid UUID

This makes it very easy to construct calendar-style URLs and the like, by doing the following:

  Router::connect(
    '/:controller/:year/:month/:day',
    array('action' => 'index', 'day' => null),
    array('year' => $Year, 'month' => $Month, 'day' => $Day)
  );
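To get a feel for what the date and ID variables accept, you can write patterns that fit the descriptions above and test values against them. The expressions below are my own approximations, written to match those descriptions; they are assumptions for illustration and may differ from what the framework defines. matchesNamed() is likewise a hypothetical helper.

```php
<?php
// Approximate patterns written to match the descriptions of the "magic"
// variables. They are assumptions, not copied from the framework.
$patterns = array(
    'Year'  => '[12][0-9]{3}',            // 1000 to 2999
    'Month' => '0[1-9]|1[012]',           // 01 to 12
    'Day'   => '0[1-9]|[12][0-9]|3[01]',  // 01 to 31
    'ID'    => '[0-9]+',                  // any whole number
);

// Hypothetical helper: true when $value matches the named pattern in full.
function matchesNamed(array $patterns, $name, $value) {
    return preg_match('#^(?:' . $patterns[$name] . ')$#', $value) === 1;
}

$validYear    = matchesNamed($patterns, 'Year', '2008');
$invalidMonth = matchesNamed($patterns, 'Month', '13');
```

Because the month and day patterns require two digits, a URL like /posts/2008/3/5 would not match the calendar route under these assumptions; it would have to be written as /posts/2008/03/05.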

Getting back to our planet example, we now have two different URL patterns pointing to the same action, which we need to handle somehow. Fortunately, as with most things in life, Cake makes that pretty simple:

  <?php

  class PlanetController extends AppController {

    public function index() {
      $year = (!empty($this->params['year']) ? $this->params['year'] : date('Y'));
      $posts = $this->Planet->findAllByYear($year);
      $this->set(compact('posts'));
    }

    public function view() {
      if (empty($this->params['title'])) {
        return $this->setAction('index');
      }
      $post = $this->Planet->findByTitle($this->params['title']);
      $this->set(compact('post'));
    }
  }

  ?>

Since both types of requests get routed to the view() action, it just needs to check to see that the title parameter isn't empty. If it is, just send the request over to the index() action. The site contains a few other URL patterns and top-level groupings, but all the rest of those can be easily implemented by applying the principles above.

One other common SEO trick which is not utilized in the site is to customize the delimiter used to separate the path elements. For example, you can make Cake respond to URLs like /posts-view-my_post_slug, by simply adding the following to config/bootstrap.php:

  $_GET['url'] = str_replace('-', '/', $_GET['url']);
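It's worth seeing exactly what that replacement does before relying on it. The sketch below wraps the same str_replace() call in a hypothetical function, rewriteDelimiter(), so the result can be inspected without touching $_GET.

```php
<?php
// Hypothetical wrapper around the one-line rewrite, so the effect can be
// inspected without touching $_GET.
function rewriteDelimiter($url) {
    // Every hyphen becomes a path separator, with no exceptions.
    return str_replace('-', '/', $url);
}

$plain  = rewriteDelimiter('posts-view-my_post_slug');
$hyphen = rewriteDelimiter('posts-view-my-post-slug');
```

The first call gives posts/view/my_post_slug, as intended. The second gives posts/view/my/post/slug, because every hyphen is replaced; slugs therefore need to use underscores (or another character) when this trick is in place.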

With these tricks, you can provide more human-friendly, easier-to-remember URLs that will score you more points with Google, which will score you more money, which will score you more points with the ladies.

Nate Abele has been a core developer of the CakePHP framework for over two years. Widely regarded as the Johnny Cash of the PHP community, his hobbies are playing guitar, wearing black, and snowboarding. While not enjoying one of these or other hobbies, you can find him writing about himself in the third person. At the time of this writing, Nate resides in New York City near the office of his employer, OmniTI Inc., and is about to enjoy an extremely hot beverage.