Reflections on Designing an IRC Bot in PHP, Part 1

by Matthew Turland (2008-04-07)
 

If you've ever searched the internet for an article or reference on writing an IRC bot in PHP, you were likely disappointed. Most such documents are so outdated that their examples are based on PHP 4 and its socket functions. Few show more than the most trivial of bot implementations, and they barely scratch the surface of the IRC protocol. This article is intended to help remedy that situation.

Storytime

The PHP Community channel on the Freenode IRC network, #phpc, had a longstanding bot called “Ai”. Like many bots at the time of her creation, she was based on PHP 4. Her source was never released and the only way to have updates made to her was to contact the original developer and wait until his availability allowed him time to work on her.

With the coming end-of-life of PHP 4 and at the encouragement of channel users, I decided to start a project to develop a new bot based on PHP 5 that would fully utilize its new object model and offer users a chance to contribute to the bot they used in their channel.

To the Drawing Board

I could go into a long discussion of the earlier iterations of the project, but I'm not sure you would glean much from that. Suffice it to say that the bot that would later come to be known as Phergie grew organically and is still in an alpha state of development today. Instead, I'll go straight into a description of her current design, the reasons behind it, and some additions planned for near-term releases.

To start with, I needed to get intimately familiar with the IRC protocol in order to make informed design decisions. The IRC Help web site has an excellent section on the RFCs related to the IRC protocol and extensions to it, such as the Client-To-Client Protocol or CTCP. If you don't plan on writing your own bot from scratch, this will at the very least give you a deeper understanding of the design of any existing library you might use. If you do write a bot from the ground up, this knowledge is crucial to getting it to a functional point and being able to troubleshoot issues with it.

The Phergie project came about around the same time that I discussed an idea with a friend of mine, Ben Ramsey, to wrap the libircclient library in a PECL extension. We both dabbled in that project on and off before becoming involved in other tasks, so for the moment it's on the back burner. Assuming that it would one day see a stable release, though, I wanted my new project to be capable of utilizing it when that time came.

Lastly, I wanted to make it as easy as possible to get the bot up and running with minimal configuration as well as to create new plugins for the bot to extend and enhance its base functionality. With all these thoughts in mind, I started work on the core. See Figure 1 for a diagram of its constituent components and how they interact with each other.

The Bootstrap

The bootstrap file is responsible for basic setup tasks that need to take place before the bot can run. These include things like checking the PHP version, setting the include path, reading the configuration file, and instantiating the driver and plugin classes.

It eventually executes a driver method to initiate an event handling loop, which receives events and passes them on to plugins until the bot's connection is terminated. Once that method returns, the bootstrap checks its return value and, if needed, reinitializes itself in order to establish a new connection in place of a terminated one or to reload the configuration file. See below for the current revision of the bootstrap file source code.

    /**
     * Check to see if the version of PHP meets the minimum requirement
     */
    if (version_compare('5.1.2', PHP_VERSION, '>')) {
        trigger_error('Fatal error: PHP 5.1.2+ is required, current version: ' . PHP_VERSION, E_USER_ERROR);

    /**
     * Backwards compatibility check to see if the PHP version is lower than 5.2
     */
    } elseif (version_compare('5.2', PHP_VERSION, '>')) {
        trigger_error('Warning: PHP 5.2+ is recommended, current version: ' . PHP_VERSION, E_USER_WARNING);
    }

    /**
     * Code base version
     *
     * @const string
     */
    define('PHERGIE_VERSION', '1.0.3');

    /**
     * Path to the configuration file used by default when one is not specified or
     * register_argc_argv is disabled in php.ini
     *
     * @const string
     */
    define('PHERGIE_DEFAULT_INI', 'phergie.ini');

    /**
     * Path to the Phergie directory, including a trailing directory separator
     *
     * @const string
     */
    define('PHERGIE_DIR', dirname(__FILE__) . DIRECTORY_SEPARATOR);

    /**
     * Path to the directory containing the plugins
     *
     * @const string
     */
    define('PHERGIE_PLUGIN_DIR', PHERGIE_DIR . 'Plugin' . DIRECTORY_SEPARATOR);

    /**
     * Add the directory containing the Phergie directory to the include path
     */
    set_include_path(
        get_include_path()
        . PATH_SEPARATOR .
        dirname(PHERGIE_DIR)
    );

    /**
     * Check to make sure the CLI SAPI is being used
     */
    if (strtolower(php_sapi_name()) != 'cli') {
        trigger_error('Phergie requires the CLI SAPI in order to run', E_USER_ERROR);
    }

    /**
     * Allow the bot to run indefinitely
     */
    set_time_limit(0);

    /**
     * Determine what configuration file should be used
     */
    if (!ini_get('register_argc_argv')) {
        echo 'The register_argc_argv setting in php.ini is disabled, defaulting to ' . PHERGIE_DEFAULT_INI . PHP_EOL;
        $ini = PHERGIE_DEFAULT_INI;
    } else if ($argc == 1) {
        echo 'No configuration file specified, defaulting to ' . PHERGIE_DEFAULT_INI . PHP_EOL;
        $ini = PHERGIE_DEFAULT_INI;
    } else if (!empty($argv[1]) && file_exists($argv[1]) && is_readable($argv[1])) {
        echo 'Using specified configuration file ' . $argv[1] . PHP_EOL;
        $ini = $argv[1];
    } else {
        echo 'Invalid or no configuration file specified, defaulting to ' . PHERGIE_DEFAULT_INI . PHP_EOL;
        $ini = PHERGIE_DEFAULT_INI;
    }

    /**
     * Name of the configuration file currently in use
     *
     * @const string
     */
    define('PHERGIE_INI', basename($ini));

    /**
     * Path to the configuration file
     *
     * @const string
     */
    define('PHERGIE_INI_PATH', realpath($ini));

    /**
     * Loader to automate inclusion of classes based on directory structure and
     * class naming conventions.
     *
     * @param string $class Class name to check and attempt to load
     * @return void
     */
    function phergieAutoLoader($class) {
        $file = str_replace('_', DIRECTORY_SEPARATOR, $class) . '.php';
        require_once($file);
        if (class_exists($class)) {
            return;
        }
        trigger_error('Could not load class "' . $class . '" from file "' . $file . '"', E_USER_ERROR);
    }

    spl_autoload_register('phergieAutoLoader');

    /**
     * Start a runtime loop that will reload all settings from the configuration
     * file if the bot disconnects and reconnects, allowing for flushing of the
     * configuration without a full shutdown of the bot
     */
    while (true) {
        /**
         * Obtain and validate the contents of the configuration file
         */
        $required = array('server', 'username', 'nick');
        $config = parse_ini_file(PHERGIE_INI_PATH);

        // parse_ini_file() returns false on failure, so test with empty()
        // rather than count()
        if (empty($config)) {
            trigger_error('Configuration file inaccessible or empty: ' . $ini, E_USER_ERROR);
        }

        $missing = array();
        foreach ($required as $value) {
            if (empty($config[$value])) {
                $missing[] = $value;
            }
        }
        if (count($missing) > 0) {
            trigger_error('Fatal error: Required configuration settings missing: ' . implode(', ', $missing), E_USER_ERROR);
        }
        unset($required, $missing, $value);

        /**
         * Set error reporting to display errors if debug mode is enabled
         */
        if (!empty($config['debug'])) {
            error_reporting(E_ALL | E_STRICT);
            ini_set('display_errors', true);
            ini_set('ignore_repeated_errors', true);
        }

        /**
         * Configure the client
         */
        if (isset($config['driver'])) {
            $driver = ucfirst(strtolower($config['driver']));
        }
        // Driver classes live in the Driver subdirectory, so a separator is
        // needed between the directory name and the file name
        if (!isset($driver) || !file_exists(PHERGIE_DIR . 'Driver' . DIRECTORY_SEPARATOR . $driver . '.php')) {
            trigger_error('Driver not specified or not found, defaulting to Streams', E_USER_NOTICE);
            $driver = 'Streams';
        }
        $class = 'Phergie_Driver_' . $driver;
        $client = new $class();

        foreach ($config as $setting => $value) {
            $client->setIni($setting, $value);
        }

        unset ($setting, $value, $driver, $class);

        /**
         * Determine which plugins should be loaded
         */
        $all = true;
        $include = array();
        if (!empty($config['plugins'])
            && preg_match('/(all|none)(?:\s*except\s*(.+))?/ADi', $config['plugins'], $match)) {
            $all = trim(strtolower($match[1])) != 'none';
            if (!empty($match[2])) {
                $include = array_map('strtolower', preg_split('/[, ]+/', trim($match[2])));
            }
        }

        unset ($config, $match);

        /**
         * Set up plugins
         */
        $iterator = new DirectoryIterator(PHERGIE_PLUGIN_DIR);
        $plugins = array();
        foreach ($iterator as $entry) {
            if ($entry->isFile() && pathinfo($entry->getFilename(), PATHINFO_EXTENSION) == 'php') {
                $name = basename($entry->getFilename(), '.php');
                if ($all xor in_array(strtolower($name), $include)) {
                    $plugins[] = $name;
                }
            }
        }

        // Sort by plugin name; ksort() would only reorder the numeric keys
        sort($plugins);

        unset ($iterator, $entry, $name, $all, $include);

        foreach ($plugins as $plugin) {
            $class = 'Phergie_Plugin_' . $plugin;
            /**
             * @todo When PHP 5.3 is a stable release, change this to
             *       $class::checkDependencies($client, $plugins);
             */
            if (call_user_func(array($class, 'checkDependencies'), $client, $plugins)) {
                $instance = new $class($client);
                $client->addPlugin($instance);
                $client->debug('Loaded ' . $plugin);
            } else {
                $client->debug('Unable to load ' . $plugin);
            }
        }

        unset ($plugins, $plugin, $class, $instance);

        /**
         * Execute the event handling loop for the client
         */
        $state = $client->run();
        unset($client);

        switch ($state) {
            case Phergie_Driver_Abstract::RETURN_RECONNECT:
                sleep(1);
                break;
            case Phergie_Driver_Abstract::RETURN_KEEPALIVE:
                sleep(15);
                break;
            case Phergie_Driver_Abstract::RETURN_END:
                break 2;
        }
    }
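
The plugin selection step in the bootstrap is compact enough to be easy to misread, so here is a minimal standalone sketch of it. The function name selectPlugins and the sample plugin list are hypothetical and not part of Phergie; only the regular expression and the xor test are taken from the bootstrap above.

```php
<?php
/**
 * Hypothetical helper illustrating how the "plugins" setting is interpreted.
 *
 * @param string $setting   Value of the plugins setting, e.g. "all except Dns"
 * @param array  $available Short names of the plugins found on disk
 * @return array Short names of the plugins that should be loaded
 */
function selectPlugins($setting, array $available)
{
    $all = true;        // default when the setting is empty or malformed
    $listed = array();  // lowercased names that follow "except"

    if (preg_match('/(all|none)(?:\s*except\s*(.+))?/ADi', $setting, $match)) {
        $all = strtolower($match[1]) != 'none';
        if (!empty($match[2])) {
            $listed = array_map('strtolower', preg_split('/[, ]+/', trim($match[2])));
        }
    }

    $selected = array();
    foreach ($available as $name) {
        // "all" loads everything not listed; "none" loads only what is listed
        if ($all xor in_array(strtolower($name), $listed)) {
            $selected[] = $name;
        }
    }
    return $selected;
}

$available = array('Autojoin', 'Dns', 'Quit');
echo implode(', ', selectPlugins('all except dns', $available)), PHP_EOL;        // Autojoin, Quit
echo implode(', ', selectPlugins('none except Dns, Quit', $available)), PHP_EOL; // Dns, Quit
```

The xor is what lets a single list serve both forms: with "all" the listed names are exclusions, and with "none" they are inclusions.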

The Configuration File

The Phergie project follows the implementation of PHP itself by using a centralized INI configuration file that includes both core settings (the server hostname, the nick and username for the bot, etc.) and settings that are specific to individual plugins.

A feature desired early on in the project was the ability to modify configuration settings both in memory and in the file on disk from within plugins. This made it possible to persist setting modifications between executions of the bot without requiring direct access to the configuration file itself.

PHP's native parse_ini_file function doesn't provide very precise information on parse errors and tends to have parsing issues when setting values contain an equal sign. There is also no equivalent function to write changes back to an INI file. Preserving comments and formatting when writing changes was an important related concern.

As such, a class is currently in development to handle reading and writing of the configuration file and to add a number of features such as support for constants and variables in setting values, use of arithmetic and bitwise operators, arrays of values under a single setting name, and smart caching of setting values into memory. Other classes may be developed in the future to handle other configuration file formats including PHP, XML, and JSON.
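
The class described above was still in development when this was written, so what follows is not its implementation. It is only a minimal sketch of the underlying idea: change a single setting line by line so that comments and formatting survive. The function name setIniValue is hypothetical, and the sketch assumes a flat file without sections and values that contain no double quotes.

```php
<?php
/**
 * Hypothetical helper: returns the INI text with one setting changed,
 * leaving comments, blank lines, and all other settings untouched.
 *
 * @param string $contents Current contents of the INI file
 * @param string $name     Name of the setting to change
 * @param string $value    New value (assumed to contain no double quotes)
 * @return string Updated contents
 */
function setIniValue($contents, $name, $value)
{
    $lines = preg_split('/\r\n|\r|\n/', $contents);
    $pattern = '/^(\s*' . preg_quote($name, '/') . '\s*=).*$/';
    // Quoting the value keeps characters such as "=" from confusing the parser
    $quoted = '"' . $value . '"';
    $found = false;

    foreach ($lines as $i => $line) {
        // Comment lines start with ";" and therefore never match the pattern
        if (preg_match($pattern, $line, $match)) {
            $lines[$i] = $match[1] . ' ' . $quoted;
            $found = true;
        }
    }
    if (!$found) {
        $lines[] = $name . ' = ' . $quoted;
    }
    return implode(PHP_EOL, $lines);
}

$ini = "; nick :\n;    Nick for the bot.\nnick = Phergie\n";
echo setIniValue($ini, 'nick', 'Phergie2');
```

Writing the result back to disk would then be a matter of passing the returned string to file_put_contents.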

See below for the core settings section of the current revision of the stock configuration file.

    ;------------------------
    ; Core settings
    ;------------------------

    ; server :
    ;    Host name of the server to which the bot should connect.
    server =

    ; port :
    ;    Port on which the bot should connect, defaults to 6667 if none is
    ;    specified.
    port =

    ; username :
    ;    Username for the bot.
    username =

    ; nick :
    ;    Nick for the bot.
    nick =

    ; realname :
    ;    Real name for the bot.
    realname =

    ; password :
    ;    Optional server password to use when connecting. This is not the same
    ;    as a password used to authenticate with a NickServ agent, which is
    ;    stored in the nickserv.password setting.
    password =

    ; invisible :
    ;    Boolean flag indicating whether or not to automatically send an
    ;    invisible mode request to the server, which will prevent users from
    ;    retrieving a list of channels in which the bot is present on supported
    ;    servers.
    invisible = true

    ; keepalive :
    ;    Boolean flag indicating whether or not the bot should reconnect when it
    ;    gets disconnected from the server. Defaults to false if not set, but
    ;    setting it to true is recommended. If not set to true, a timeout value
    ;    of 6 or above should be set.
    keepalive = true

    ; timeout :
    ;    Amount of time in minutes to wait until a connection times out, or 0 to
    ;    disable timeout. If the bot disconnects too frequently, increase the
    ;    timeout value. It is recommended to set this to 6 or above if keepalive
    ;    is enabled.
    timeout = 8

    ; gender :
    ;    M or F to indicate the gender of the bot for instances when the bot must
    ;    refer to itself in the third person for actions.
    gender =

    ; curses :
    ;    Boolean flag indicating that swear words should be censored in posts
    ;    sent by the bot. Defaults to false.
    curses = false

    ; ignore :
    ;    Comma- or space-delimited list of hostmasks of users from which events
    ;    should be ignored (i.e. not processed by plugins). Should be surrounded
    ;    by double-quotes. May contain wildcards using *.
    ignore =

    ; plugins :
    ;    One of the following :
    ;      - all
    ;      - none
    ;      - all except LIST
    ;      - none except LIST
    ;    where LIST is a comma-delimited case-insensitive list of plugin short
    ;    names (i.e. plugin class names without the prepended Phergie_Plugin_
    ;    segment; for example, the short name for the Phergie_Plugin_Autojoin
    ;    class would be Autojoin).
    plugins = "all"

    ; debug :
    ;    Boolean flag indicating whether or not debugging mode should be enabled.
    debug = true

    ; log :
    ;    Path to a file to which debugging output will be written if debugging
    ;    mode is enabled.
    log =

    ; driver :
    ;    Short name for the driver that will be used to connect to and handle
    ;    requests to and from the IRC server. Currently only the Streams driver
    ;    is supported.
    driver = "Streams"

    ; command_prefix :
    ;    Prefix used for commands implemented in plugin classes extending the
    ;    Command abstract plugin.
    command_prefix = ""

The Driver

At the center of the core is the driver, which is responsible for handling communications between the bot and the server. This includes prioritizing, formatting, and sending commands issued via the API to the server as well as converting data from incoming events syndicated by the server into usable data objects.
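
To make "converting data from incoming events" a little more concrete, here is a rough sketch of how one raw line from the server can be split into its parts. It follows the message layout in RFC 1459 (an optional prefix, a command, and parameters, the last of which may be a "trailing" parameter containing spaces). The function name parseIrcLine and the sample nick and host are hypothetical, and this is not the parser used in the Streams driver.

```php
<?php
/**
 * Hypothetical parser for one raw line received from an IRC server.
 *
 * @param string $line Raw line, e.g. ":nick!user@host PRIVMSG #phpc :hello"
 * @return array Associative array with prefix, command, and params keys
 */
function parseIrcLine($line)
{
    $line = rtrim($line, "\r\n");
    $prefix = null;

    // An optional prefix starts with ":" and ends at the first space
    if (strpos($line, ':') === 0) {
        list($prefix, $line) = array_pad(explode(' ', substr($line, 1), 2), 2, '');
    }

    // A trailing parameter starts with " :" and may itself contain spaces
    $trailing = null;
    $pos = strpos($line, ' :');
    if ($pos !== false) {
        $trailing = substr($line, $pos + 2);
        $line = substr($line, 0, $pos);
    }

    // What remains is the command followed by space-separated parameters
    $params = preg_split('/ +/', $line, -1, PREG_SPLIT_NO_EMPTY);
    $command = strtoupper((string) array_shift($params));
    if ($trailing !== null) {
        $params[] = $trailing;
    }

    return array('prefix' => $prefix, 'command' => $command, 'params' => $params);
}

print_r(parseIrcLine(":Elazar!user@example.org PRIVMSG #phpc :hello there\r\n"));
```

A driver would typically wrap the returned array in an event object before handing it to plugins.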

The only driver currently in active development is based on streams and uses the Socket wrapper. Other drivers can easily be created to wrap any existing IRC PECL extensions or PHP libraries you may want to use, such as PEAR::Net_SmartIRC. Each driver extends a base abstract class that dictates the API it should implement in order for it to be usable by the rest of the core.
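
As a rough illustration of that arrangement, the sketch below shows a cut-down abstract base class and a toy driver that extends it. The method names setIni, addPlugin, run, and doPrivmsg and the RETURN_* constant names appear in the code in this article; the Sketch_ class names, the constant values, and the in-memory driver are hypothetical, and the real base class declares a larger API than this.

```php
<?php
/**
 * Hypothetical, cut-down base class for drivers.
 */
abstract class Sketch_Driver_Abstract
{
    // Return states checked by the bootstrap (the values are assumptions)
    const RETURN_RECONNECT = 1;
    const RETURN_KEEPALIVE = 2;
    const RETURN_END = 3;

    protected $config = array();
    protected $plugins = array();

    public function setIni($setting, $value)
    {
        $this->config[$setting] = $value;
    }

    public function addPlugin($plugin)
    {
        $this->plugins[] = $plugin;
    }

    // Each concrete driver decides how to talk to the server
    abstract public function run();
    abstract public function doPrivmsg($target, $text);
}

/**
 * Toy driver that records commands instead of opening a connection.
 */
class Sketch_Driver_Memory extends Sketch_Driver_Abstract
{
    public $sent = array();

    public function run()
    {
        // Nothing to listen to, so report a clean shutdown right away
        return self::RETURN_END;
    }

    public function doPrivmsg($target, $text)
    {
        $this->sent[] = 'PRIVMSG ' . $target . ' :' . $text;
    }
}

$driver = new Sketch_Driver_Memory();
$driver->setIni('nick', 'Phergie');
$driver->doPrivmsg('#phpc', 'hello');
echo $driver->sent[0], PHP_EOL; // PRIVMSG #phpc :hello
```

Because the rest of the core only depends on the abstract API, swapping in a driver like this one is also a convenient way to exercise plugins without a network connection.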

Events

There are two basic types of events in the IRC protocol: requests and responses. It's important to note the difference because each type has its own class and is handled differently within plugins.

Requests are initiated by users and include both actions initiated by the bot and those initiated by other users and syndicated to the bot by the server. Each request type has its own event handler in the driver. The only oddity I came across in creating the portion of the streams driver for parsing incoming request events was the PRIVMSG command. Its first parameter is the name of the channel when a message is sent to a channel, but it is the nick of the intended recipient (in this case the bot) when the command is used to send a message directly to a user. Since the bot's own nick is not helpful in identifying the source of the event, I created a new method to return either the channel name or the nick of the user who originated the message.
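
A minimal sketch of that idea follows. The method names getNick and getSource are the ones used by the plugin example later in this article; the class name Sketch_Event_Request, its constructor, and the way it stores arguments are hypothetical. The check for "#" and "&" reflects the channel name prefixes defined in RFC 1459.

```php
<?php
/**
 * Hypothetical request event holding the sender's nick and the
 * parameters of the command.
 */
class Sketch_Event_Request
{
    protected $nick;
    protected $arguments;

    public function __construct($nick, array $arguments)
    {
        $this->nick = $nick;
        $this->arguments = $arguments;
    }

    public function getNick()
    {
        return $this->nick;
    }

    /**
     * Returns where a reply should be sent: the channel for channel
     * messages, or the sender's nick for messages sent directly to the bot.
     */
    public function getSource()
    {
        $target = (string) $this->arguments[0];
        // Channel names start with "#" or "&" (RFC 1459)
        if ($target !== '' && strpos('#&', $target[0]) !== false) {
            return $target;
        }
        return $this->nick;
    }
}

$inChannel = new Sketch_Event_Request('Elazar', array('#phpc', 'hello'));
$direct = new Sketch_Event_Request('Elazar', array('Phergie', 'hello'));
echo $inChannel->getSource(), PHP_EOL; // #phpc
echo $direct->getSource(), PHP_EOL;    // Elazar
```

With this in place, a plugin can always reply to getSource() without caring whether the message arrived in a channel or in a private query.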

Responses are initiated by the server as a result of an action taken by the bot, where each potential response is specific to that particular action. Because there are so many potential responses, they have a single “blanket” event handler. The response type is stored in a property of the event instance that is automatically passed to the plugin by the driver. Since the formatting of the related section on the IRC Help web site was fairly consistent, I saved a local copy of the page and wrote a quick throw-away script using the DOM extension to extract the name, description, and numeric code of each response and format it for easy transplantation into the response class.
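
The sketch below illustrates the single-handler approach. The two numeric codes are real ones from RFC 1459 (332 for RPL_TOPIC and 433 for ERR_NICKNAMEINUSE); the class names, the getCode and getDescription accessors, and the onResponse handler name are hypothetical stand-ins for whatever the response class and plugins actually use.

```php
<?php
/**
 * Hypothetical response event carrying the numeric code sent by the server.
 */
class Sketch_Event_Response
{
    // Two of the numeric replies defined in RFC 1459
    const RPL_TOPIC = '332';
    const ERR_NICKNAMEINUSE = '433';

    protected $code;
    protected $description;

    public function __construct($code, $description)
    {
        $this->code = $code;
        $this->description = $description;
    }

    public function getCode()
    {
        return $this->code;
    }

    public function getDescription()
    {
        return $this->description;
    }
}

/**
 * Hypothetical plugin with one "blanket" handler for all responses.
 */
class Sketch_Plugin_Watcher
{
    public $log = array();

    public function onResponse(Sketch_Event_Response $event)
    {
        // Branch on the response type stored in the event
        switch ($event->getCode()) {
            case Sketch_Event_Response::ERR_NICKNAMEINUSE:
                $this->log[] = 'Nick is taken: ' . $event->getDescription();
                break;
            case Sketch_Event_Response::RPL_TOPIC:
                $this->log[] = 'Topic is: ' . $event->getDescription();
                break;
        }
    }
}

$plugin = new Sketch_Plugin_Watcher();
$plugin->onResponse(new Sketch_Event_Response('433', 'Nickname is already in use'));
echo $plugin->log[0], PHP_EOL; // Nick is taken: Nickname is already in use
```

Keeping the codes as class constants means plugins can compare against readable names rather than bare numbers.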

Plugins

On its own, the driver is not very useful. To make it so, code must be written to receive events from the server, act on them in some way, and in most cases dispatch commands to be sent back to the server in response. These units of code are referred to as plugins.

A base plugin class exists to provide functionality that is commonly needed in most plugins such as automatically determining the short name of a subclass for identification purposes, creating a local directory or database for storage, and handling configuration settings. All plugin classes either extend the base class or a subclass of it.

Some tasks that are required for all plugins are performed in the base plugin class constructor. Rather than requiring subclasses to explicitly call the parent constructor within their own, the base constructor calls another method, init, and contains a stub of this method so that subclasses can implement it only if needed to perform initialization tasks when the plugin is loaded.
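
Here is a minimal sketch of that pattern. The init method name comes from the description above; the Sketch_ class names and the getName accessor are hypothetical, and the real constructor also receives the driver instance, which is left out here to keep the example short.

```php
<?php
abstract class Sketch_Plugin_Abstract
{
    protected $name;

    public function __construct()
    {
        // Work every plugin needs: derive the short name from the class
        // name, e.g. "Greeter" from "Sketch_Plugin_Greeter"
        $this->name = substr(strrchr(get_class($this), '_'), 1);

        // Hand control to the subclass, if it has anything to set up
        $this->init();
    }

    public function getName()
    {
        return $this->name;
    }

    // Stub: does nothing unless a subclass overrides it
    public function init()
    {
    }
}

class Sketch_Plugin_Greeter extends Sketch_Plugin_Abstract
{
    public $greeting;

    // No constructor and no parent::__construct() call needed here
    public function init()
    {
        $this->greeting = 'Hello from ' . $this->getName();
    }
}

class Sketch_Plugin_Quiet extends Sketch_Plugin_Abstract
{
    // Needs no initialization, so it does not define init() at all
}

$plugin = new Sketch_Plugin_Greeter();
echo $plugin->greeting, PHP_EOL; // Hello from Greeter
```

The benefit is that a subclass author cannot forget to call the parent constructor, because there is no reason to write a constructor in the first place.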

Before the bootstrap instantiates a plugin, it calls a static method in that plugin class responsible for checking to ensure that the environment meets that plugin's needs. This can include the PHP version, loaded PHP extensions, and other Phergie plugins. If this method returns false, the bootstrap simply skips over instantiation of that plugin and continues.
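
The sketch below shows what such a check might look like. The checkDependencies name and its two parameters match the call made in the bootstrap; the Sketch_ class names and the particular requirements tested (the PCRE extension, PHP 5.2, and a plugin named Dns) are made up for illustration.

```php
<?php
abstract class Sketch_Plugin_Abstract
{
    /**
     * Default check: a plugin with no special needs can always be loaded.
     *
     * @param mixed $client  Driver instance
     * @param array $plugins Short names of all plugins selected for loading
     * @return bool
     */
    public static function checkDependencies($client, array $plugins)
    {
        return true;
    }
}

class Sketch_Plugin_Lookup extends Sketch_Plugin_Abstract
{
    public static function checkDependencies($client, array $plugins)
    {
        // Hypothetical requirements for this plugin
        return version_compare(PHP_VERSION, '5.2', '>=')
            && extension_loaded('pcre')
            && in_array('Dns', $plugins);
    }
}

// Roughly what the bootstrap does before instantiating each plugin
foreach (array(array('Dns', 'Lookup'), array('Lookup')) as $plugins) {
    $ok = call_user_func(array('Sketch_Plugin_Lookup', 'checkDependencies'), null, $plugins);
    echo implode(', ', $plugins), ': ', ($ok ? 'loaded' : 'skipped'), PHP_EOL;
}
```

The method has to be static because it runs before any instance of the plugin exists.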

When the bootstrap does instantiate a plugin, it stores a reference to the driver instance in that plugin to allow it to issue commands to the driver which are in turn sent to the server. The base plugin class makes use of the magic method __call to direct calls to nonexistent methods to the driver instance. This adds the convenience of being able to call a driver method like any other plugin method instead of referencing the driver instance property of the plugin every time a driver method is called.
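
The forwarding can be sketched in a few lines. The doPrivmsg method name is taken from the plugin example in this article; the Sketch_ class names, the client property name, and the exception thrown for unknown methods are assumptions made for this illustration.

```php
<?php
/**
 * Hypothetical stand-in for a driver; it records what it is asked to send.
 */
class Sketch_Driver
{
    public $sent = array();

    public function doPrivmsg($target, $text)
    {
        $this->sent[] = 'PRIVMSG ' . $target . ' :' . $text;
    }
}

abstract class Sketch_Plugin_Abstract
{
    protected $client;

    public function __construct($client)
    {
        $this->client = $client;
    }

    // PHP invokes __call only when the called method does not exist on the plugin
    public function __call($method, array $arguments)
    {
        if (!method_exists($this->client, $method)) {
            throw new BadMethodCallException('Unknown method: ' . $method);
        }
        return call_user_func_array(array($this->client, $method), $arguments);
    }
}

class Sketch_Plugin_Hello extends Sketch_Plugin_Abstract
{
    public function greet($channel)
    {
        // Reads like a plugin method, but is actually handled by the driver
        $this->doPrivmsg($channel, 'Hello!');
    }
}

$driver = new Sketch_Driver();
$plugin = new Sketch_Plugin_Hello($driver);
$plugin->greet('#phpc');
echo $driver->sent[0], PHP_EOL; // PRIVMSG #phpc :Hello!
```

One trade-off worth knowing about is that a typo in a method name is only detected at runtime, which is why the sketch checks method_exists before forwarding.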

The meat of a plugin is its event handler methods. The base plugin class contains stub declarations of methods intended to handle specific types of events, which subclasses override. When the driver intercepts an event, it calls a method in the base plugin class to store that event in a property of the base class. Within event handler methods, plugins can simply refer to that property if and when they require information contained in the event.
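
To illustrate the flow, here is a rough sketch of the stubs, the stored event, and the dispatch a driver might perform. The event property name matches the one used in the plugin example in this article; the setEvent, onPrivmsg, and onJoin method names, the Sketch_ classes, and the dispatch code at the bottom are hypothetical.

```php
<?php
/**
 * Hypothetical event with public properties, to keep the sketch short.
 */
class Sketch_Event
{
    public $type;
    public $nick;
    public $text;

    public function __construct($type, $nick, $text)
    {
        $this->type = $type;
        $this->nick = $nick;
        $this->text = $text;
    }
}

abstract class Sketch_Plugin_Abstract
{
    protected $event;

    // Called by the driver before it invokes a handler
    public function setEvent($event)
    {
        $this->event = $event;
    }

    // Stubs: plugins override only the handlers they care about
    public function onPrivmsg()
    {
    }

    public function onJoin()
    {
    }
}

class Sketch_Plugin_Echo extends Sketch_Plugin_Abstract
{
    public $replies = array();

    public function onPrivmsg()
    {
        // The handler takes no arguments; it reads the stored event instead
        $this->replies[] = $this->event->nick . ' said: ' . $this->event->text;
    }
}

// What a driver might do for each incoming event
$plugin = new Sketch_Plugin_Echo();
$event = new Sketch_Event('privmsg', 'Elazar', 'ping');
$plugin->setEvent($event);
$handler = 'on' . ucfirst($event->type);
$plugin->$handler();
echo $plugin->replies[0], PHP_EOL; // Elazar said: ping
```

Because every handler exists at least as a stub, the driver can call the handler for any event type on any plugin without first checking whether it is implemented.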

See below for an example of a plugin that responds to DNS and reverse DNS lookup requests.

    /**
     * Parses incoming messages for requests to perform a DNS or reverse DNS
     * lookup on a given host name or IP address, performs the lookup, and
     * responds with a message containing the lookup result.
     */
    class Phergie_Plugin_Dns extends Phergie_Plugin_Abstract_Command
    {
        /**
         * Processes a DNS or reverse DNS lookup request.
         *
         * @param string $arg Host or IP address to look up
         * @return void
         */
        protected function processRequest($arg)
        {
            $target = $this->event->getNick();
            if (preg_match('/^(?:[0-9]{1,3}\.){3}[0-9]{1,3}$/', $arg)
                && ($long = ip2long($arg)) !== false) {
                // Normalize the address before the reverse lookup
                $lookup = long2ip($long);
                $result = gethostbyaddr($lookup);
            } elseif (preg_match('/^(?:[a-z0-9-]+\.)+[a-z]{2,6}$/i', $arg)) {
                $lookup = $arg;
                $result = gethostbyname($lookup);
            }

            // Both lookup functions return their input unchanged on failure,
            // and gethostbyaddr() returns false on malformed input
            if (isset($result) && $result !== false && $result !== $lookup) {
                $resolved = $result;
            }

            $source = $this->event->getSource();
            if (!isset($resolved)) {
                $this->doPrivmsg($source, $target . ': ' . $arg . ' cannot be resolved.');
            } else {
                $this->doPrivmsg($source, $target . ': ' . $arg . ' resolved to ' . $resolved);
            }
        }

        /**
         * Forwards DNS lookup requests onto a central handler.
         *
         * @param string $host Host to look up
         * @return void
         */
        public function onDoDns($host)
        {
            $this->processRequest($host);
        }

        /**
         * Forwards reverse DNS lookup requests onto a central handler.
         *
         * @param string $ip IP address to look up
         * @return void
         */
        public function onDoRevdns($ip)
        {
            $this->processRequest($ip);
        }
    }

In an article to follow, I will delve more deeply into the subject of plugins and core features that support their development.

Matthew Turland lives in Duson, LA with his wife and three children and is currently employed as Lead Programmer for surgiSYS LLC. In his spare time he contributes to open source projects, frequents the #phpc channel on the Freenode IRC network under the nick Elazar, and shares his experiences on his blog at http://ishouldbecoding.com.