Reflections on Designing an IRC Bot in PHP, Part 2

by Matthew Turland (2008-04-15)
 

The precursor to this article introduced some background and an overview of the design for the Phergie project as an example of the concepts involved in a PHP IRC bot implementation. This article will go further into the topic of plugins including descriptions of those that are commonly needed to make a bot fully functional as well as the commonly needed core features to support plugin development.

Conventions

Plugin files are organized and classes named in the PEAR style where Phergie/Plugin/Class.php holds a class called Phergie_Plugin_Class. This is a fairly tried and true convention and makes creating a class autoloader within the bootstrap file fairly straightforward. Instantiable extensions are stored in the Phergie/Plugin directory. Those that aren't instantiable are stored in the Abstract subdirectory of that directory, such as the base plugin class Phergie_Plugin_Abstract_Base.
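To illustrate why this naming convention keeps autoloading simple, here is a minimal sketch of a PEAR-style autoloader. The function names sketch_class_to_path and sketch_autoload are hypothetical and are not taken from the Phergie bootstrap file; the sketch also assumes that the directory containing Phergie/ is on the include path.

```php
<?php

/**
 * Maps a PEAR-style class name to a relative file path.
 * Hypothetical helper for illustration; not part of Phergie.
 */
function sketch_class_to_path($class)
{
    return str_replace('_', '/', $class) . '.php';
}

/**
 * Autoload callback: includes the mapped file if it is found on the include path.
 */
function sketch_autoload($class)
{
    $resolved = stream_resolve_include_path(sketch_class_to_path($class));
    if ($resolved !== false) {
        require_once $resolved;
    }
}

spl_autoload_register('sketch_autoload');
```

With this in place, referencing Phergie_Plugin_Abstract_Base would cause PHP to look for Phergie/Plugin/Abstract/Base.php.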

Hello World!

As mentioned in the previous article, IRC is a protocol based around events. Plugins are intended to intercept these events and act on them in some way. The best reference for the events that a plugin can intercept is the base plugin class. See below for the relevant segments of code from that class. More often than not, the onPrivmsg event handler is the one that you'll be using. Most bot responses are triggered by something that appears in a message sent by another user, either to a channel in which the bot is present or as a direct message to the bot.

  <?php
  // Event handler stubs declared in the base plugin class (excerpt)
  public function onNick() { }
  public function onOper() { }
  public function onQuit() { }
  public function onJoin() { }
  public function onPart() { }
  public function onMode() { }
  public function onTopic() { }
  public function onPrivmsg() { }
  public function onAction() { }
  public function onNotice() { }
  public function onKick() { }
  public function onPing() { }
  public function onTime() { }
  public function onVersion() { }
  public function onCtcp() { }
  public function onPingReply() { }
  public function onTimeReply() { }
  public function onVersionReply() { }
  public function onCtcpReply() { }
  public function onRaw() { }
  public function onError() { }
  public function onKill() { }
  public function onResponse() { }
  public function onConnect() { }
  public function onTick() { }
  public function onInvite() { }

Let's start with a simple example. When the bot joins a new channel, we want it to send a message with the infamous “Hello World!” greeting.

  <?php

  class Phergie_Plugin_HelloWorld extends Phergie_Plugin_Abstract_Base
  {
      public function onJoin()
      {
          // Only greet when the user joining the channel is the bot itself
          if ($this->event->getNick() == $this->getIni('nick')) {
              $this->doPrivmsg($this->event->getChannel(), 'Hello World!');
          }
      }
  }

This plugin follows the conventions of prefixing the class name with Phergie_Plugin and extending Phergie_Plugin_Abstract_Base. Since we're interested in events where a user (in this case the bot) is joining a channel, we use the onJoin event handler method.

$this->event always contains data for the last intercepted event. getNick() provides the user who initiated the event. In the case of this plugin, we are only interested in cases where this user is the bot. As such, we use getIni() to retrieve the value of the nick configuration setting, which contains the bot's nick, and compare it to the return value of getNick(). If the two match, we proceed to send a message to the channel the bot has joined.

Under the hood, the request event class Phergie_Event_Request includes a mapping of event parameter order (detailed in Chapter 4 of the RFC for the IRC protocol) to more meaningful parameter names. For example, the join event takes a single parameter, the channel being joined. $this->event->getArgument(0) would return this value. Thanks to the parameter mapping, $this->event->getArgument('channel') would also return the same value. Finally, thanks to some method overloading magic via __call, $this->event->getChannel() provides an additional alias. See below for the array from the request event class that contains the mapping.

  <?php

  protected static $map = array
  (
      self::TYPE_QUIT => array(
          'message' => 0
      ),

      self::TYPE_JOIN => array(
          'channel' => 0
      ),

      self::TYPE_KICK => array(
          'channel' => 0,
          'user'    => 1,
          'comment' => 2
      ),

      self::TYPE_PART => array(
          'channel' => 0,
          'message' => 1
      ),

      self::TYPE_MODE => array(
          'target'  => 0,
          'mode'    => 1,
          'limit'   => 2,
          'user'    => 3,
          'banmask' => 4
      ),

      self::TYPE_TOPIC => array(
          'channel' => 0,
          'topic'   => 1
      ),

      self::TYPE_PRIVMSG => array(
          'receiver' => 0,
          'text'     => 1
      ),

      self::TYPE_NOTICE => array(
          'nickname' => 0,
          'text'     => 1
      ),

      self::TYPE_ACTION => array(
          'target' => 0,
          'action' => 1
      ),

      self::TYPE_RAW => array(
          'message' => 0
      )
  );
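To make the three access styles concrete, here is a minimal sketch of how a positional lookup, a named lookup, and a __call alias can share one mapping. SketchEvent is a hypothetical stand-in written for this article, not the real Phergie_Event_Request, and its map covers only two event types.

```php
<?php

/**
 * Simplified stand-in for a request event; not the real Phergie_Event_Request.
 */
class SketchEvent
{
    // Named parameters mapped to their positions, per event type
    protected static $map = array(
        'join'    => array('channel' => 0),
        'privmsg' => array('receiver' => 0, 'text' => 1),
    );

    protected $type;
    protected $arguments;

    public function __construct($type, array $arguments)
    {
        $this->type = $type;
        $this->arguments = $arguments;
    }

    // Accepts either a numeric position or a mapped parameter name
    public function getArgument($argument)
    {
        if (is_string($argument) && isset(self::$map[$this->type][$argument])) {
            $argument = self::$map[$this->type][$argument];
        }
        return isset($this->arguments[$argument]) ? $this->arguments[$argument] : null;
    }

    // Turns calls such as getChannel() into getArgument('channel')
    public function __call($name, array $arguments)
    {
        if (strpos($name, 'get') === 0 && strlen($name) > 3) {
            return $this->getArgument(strtolower(substr($name, 3)));
        }
        throw new BadMethodCallException('Unknown method: ' . $name);
    }
}

$event = new SketchEvent('join', array('#phergie'));
```

In this sketch, $event->getArgument(0), $event->getArgument('channel'), and $event->getChannel() all resolve to the same stored value, while a name that is not mapped for the event type yields null.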

To issue the message to the channel, we get the intended channel using the aforementioned getChannel() method and use $this->doPrivmsg() to send the message 'Hello World!' to that channel. doPrivmsg() is one of several commands provided by the underlying driver. The best reference for what commands are available is the base driver class Phergie_Driver_Abstract. See below for the related segments of code from that class.

  <?php

  // Commands declared in Phergie_Driver_Abstract (excerpt)
  public abstract function doQuit($reason = null, $reconnect = false);
  public abstract function doJoin($channel, $key = null);
  public abstract function doPart($channel);
  public abstract function doInvite($nick, $channel);
  public abstract function doNames($channels);
  public abstract function doList($channels = null);
  public abstract function doTopic($channel, $topic = null);
  public abstract function doMode($target, $mode = null);
  public abstract function doNick($nick);
  public abstract function doWhois($nick);
  public abstract function doPrivmsg($target, $text);
  public abstract function doNotice($target, $text);
  public abstract function doKick($nick, $channel, $reason = null);
  public abstract function doPong($daemon);
  public abstract function doAction($target, $text);

The private message event is actually somewhat of an oddity in the protocol because its first parameter can either be a channel name or the bot's name, the latter of which isn't very useful. As a convenience, the request event class has a getSource() method which returns either the channel name or the nick of the user who originated the message. This makes it easier to issue a return message to the channel or user in question, regardless of the origin of the message. Below is a plugin that shows an example of this by returning a greeting from another user.

  <?php

  class Phergie_Plugin_Greet extends Phergie_Plugin_Abstract_Base
  {
      public function onPrivmsg()
      {
          // Respond to "hello" in any letter case
          if (strtolower($this->event->getText()) == 'hello') {
              // getSource() is the channel, or the sender's nick for a direct message
              $this->doPrivmsg($this->event->getSource(), 'Hello ' . $this->event->getNick() . '!');
          }
      }
  }
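For readers curious about how a method like getSource() might decide where a reply should go, here is a minimal sketch. The function sketch_get_source is hypothetical and is not Phergie's implementation; it assumes that a receiver is a channel whenever it begins with one of the channel prefix characters defined in RFC 2812.

```php
<?php

/**
 * Returns where a reply to a PRIVMSG should be sent.
 * Hypothetical helper; assumes channel names start with #, &, + or !.
 */
function sketch_get_source($receiver, $senderNick)
{
    $first = substr($receiver, 0, 1);
    if (in_array($first, array('#', '&', '+', '!'), true)) {
        return $receiver; // the message was sent to a channel
    }
    return $senderNick; // the message was sent directly to the bot
}
```

A message sent to #phergie would be answered in #phergie, while a message sent to the bot's own nick would be answered to the user who sent it.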

A few other available methods include getIni(), setIni(), getPluginIni(), and setPluginIni(), which are used to access and manipulate values obtained from the configuration file that are either global or specific to a particular plugin. Another is debug(), which can be used while developing a plugin to send information to stdout for debugging purposes.

The best way to get started in developing plugins is to simply jump in and toy with the code. The existing plugins provide good working examples. The Developers Guide is also a good source of information.

The Bare Essentials

Once the bot is connected to the server, there are a number of things that it should be capable of doing in order to function properly. Most of these features were implemented as plugins since they needed to be triggered by the occurrence of certain events. I'll reference the particular short plugin name used by Phergie in each case.

First up is the Altnick plugin. When the bot specifies its nick, the server will return a particular response if another user is already using that nick. The bot must be capable of recognizing this and reissuing the NICK command with a different nick until the server no longer returns that response, implying that the bot's nick change was successful. Until a nick is successfully set, the bot can do nothing else to interact with the server.
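The retry logic can be sketched roughly as follows. The response in question is numeric reply 433 (ERR_NICKNAMEINUSE) in the IRC RFCs. The function names and the idea of a zero-indexed list of alternate nicks are assumptions made for illustration and do not reflect how the Altnick plugin is actually written.

```php
<?php

// Numeric reply sent when a requested nick is already taken (ERR_NICKNAMEINUSE)
define('SKETCH_ERR_NICKNAMEINUSE', 433);

/**
 * Returns the next nick to try after $rejected, or null when the list is exhausted.
 * Assumes $nicks is a zero-indexed list in order of preference.
 */
function sketch_next_nick(array $nicks, $rejected)
{
    $index = array_search($rejected, $nicks, true);
    if ($index === false) {
        return isset($nicks[0]) ? $nicks[0] : null;
    }
    return isset($nicks[$index + 1]) ? $nicks[$index + 1] : null;
}

/**
 * Builds the NICK command to send for a server reply code, or null if none is needed.
 */
function sketch_handle_reply($code, array $nicks, $rejected)
{
    if ((int) $code !== SKETCH_ERR_NICKNAMEINUSE) {
        return null;
    }
    $next = sketch_next_nick($nicks, $rejected);
    return $next === null ? null : 'NICK ' . $next;
}
```

A real plugin would also need a policy for what to do once every alternate has been rejected, such as appending a suffix or giving up.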

Next is the Nickserv plugin for authenticating the bot's identity with a NickServ agent if one is present on the network. These agents are used to restrict the use of registered nicks as well as the actions (such as sending private messages) of unauthenticated users. The presence of a NickServ agent is generally indicated by a NOTICE sent to the bot and requires a PRIVMSG command to be issued to the agent with a password included.
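As a rough sketch of that exchange, the function below decides whether a NOTICE looks like a NickServ prompt and, if so, builds the reply. Everything specific here is an assumption: the agent's nick, the wording of the prompt (matched on the word "registered"), and the IDENTIFY syntax all vary between networks, and sketch_nickserv_reply is a hypothetical name.

```php
<?php

/**
 * Returns the raw command identifying the bot to NickServ, or null if the
 * notice does not look like a NickServ prompt.
 * The agent nick, prompt wording, and IDENTIFY syntax are assumptions.
 */
function sketch_nickserv_reply($senderNick, $noticeText, $password)
{
    if (strtolower($senderNick) !== 'nickserv') {
        return null;
    }
    if (stripos($noticeText, 'registered') === false) {
        return null;
    }
    return 'PRIVMSG NickServ :IDENTIFY ' . $password;
}
```

Checking the sender before replying matters because any user can send a NOTICE, and the password should only ever go to the agent.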

Following that is the Pong plugin for intercepting PING events, which the server uses in order to verify that a connection is still active and has not dropped or terminated without informing the server via a QUIT command. The proper response is for the bot to issue a PONG command. If you've ever seen a ping/pong exchange between users in a channel, this is where it originated.
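At the level of raw protocol lines, the exchange can be sketched as below. The function sketch_pong_for is a hypothetical helper that works on a raw line rather than on Phergie's event objects, and the server name in the usage note is made up.

```php
<?php

/**
 * Returns the PONG reply for a raw PING line, or null for any other line.
 */
function sketch_pong_for($line)
{
    if (preg_match('/^PING\s+:?(\S+)/', trim($line), $match)) {
        return 'PONG :' . $match[1];
    }
    return null;
}
```

Given the line "PING :irc.example.net", the helper produces "PONG :irc.example.net", echoing back the token that the server sent.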

Unfortunately, there is no equivalent command initiated by the client to check its connection to the server, and the IRC RFC leaves it up to developers implementing the server to determine how often it will send a PING. As such, it's possible for the bot's connection to be fine, but for it to time out simply because no events are occurring for the server to relay to it. The only solution we've found is to set the timeout to a fairly high amount (e.g. between 10 and 15 minutes). A side effect of this is that, should the bot's connection actually drop, it could take up to that amount of time before the bot realizes it and attempts to reconnect.

Last is the Quit plugin for disconnecting from the server on demand without requiring that the user hosting the bot simply kill the process, which isn't a very clean way to disconnect. It can also sometimes result in issues stemming from residual ghost connections when attempting to reconnect.

With regard to the message displayed to clients when a user quits, some networks (Freenode in particular) replace the message specified by the user with a short stock message if the user has not been connected to the server for a minimum amount of time (five minutes for Freenode). This is intended to prevent spam. I mention it only so that, if you see the stock message, you know the bot is still formatting its QUIT commands correctly.

Some Niceties

Some plugins, while not required, do make life easier. Among these is the Autojoin plugin to automatically join a given list of channels. Some bots are specialized to do things like serve up files via DCC. Those that aren't are generally meant to respond to user interaction within a channel and it's often easiest if the bot can simply join a given list of channels automatically without having to be explicitly instructed to do so via PRIVMSG by a user.

After a bot instance is launched, you may want to have the bot join or part channels on demand without having to modify the configuration file and restart the bot. The JoinPart plugin enables you to do this by issuing the commands to the bot via PM or within a channel by addressing a message to the bot there.

When developing plugins or stress-testing the bot by running it for long periods of time, it's often convenient to be able to check memory usage or uptime, to easily indicate what PHP extensions are loaded in a currently running instance, to get the evaluated value of an expression, or to execute a given PHP statement (with proper security checks of course). The Debug and Eval plugins fill these needs.

Management of existing plugin instances is also a common need. The Toggle plugin can be used to silence output of plugins that are malfunctioning or seeing abuse from users. The ModuleList plugin produces an on-demand list of plugins including statuses to indicate if they're currently loaded or silenced.

The ability to change configuration settings stored on disk is a concern previously discussed in the section on the configuration file. The Set plugin implements interception of commands issued via PRIVMSG to handle this.

Last is tracking information on users such as which users are present in channels inhabited by the bot and which users have op or voice privileges. The Users plugin handles obtaining a copy of this information, maintaining it as users come and go or change their user mode, and exposing the information to other plugins via its methods.
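The kind of bookkeeping involved can be sketched with a small in-memory tracker. SketchUserTracker and its method names are hypothetical and are not the API of the Users plugin; the sketch handles only single mode changes such as +o or -v.

```php
<?php

/**
 * Minimal in-memory user tracker; a stand-in for illustration,
 * not Phergie's Users plugin.
 */
class SketchUserTracker
{
    // channel => nick => array('op' => bool, 'voice' => bool)
    protected $channels = array();

    public function join($channel, $nick)
    {
        $this->channels[$channel][$nick] = array('op' => false, 'voice' => false);
    }

    public function part($channel, $nick)
    {
        unset($this->channels[$channel][$nick]);
    }

    // Applies a single mode change such as +o or -v to a user in a channel
    public function mode($channel, $nick, $mode)
    {
        if (!isset($this->channels[$channel][$nick]) || strlen($mode) < 2) {
            return;
        }
        $flags = array('o' => 'op', 'v' => 'voice');
        if (isset($flags[$mode[1]])) {
            $this->channels[$channel][$nick][$flags[$mode[1]]] = ($mode[0] === '+');
        }
    }

    public function isIn($channel, $nick)
    {
        return isset($this->channels[$channel][$nick]);
    }

    public function isOp($channel, $nick)
    {
        return $this->isIn($channel, $nick) && $this->channels[$channel][$nick]['op'];
    }
}
```

A plugin's onJoin, onPart, and onMode handlers would call join(), part(), and mode() respectively, and other plugins would query isIn() or isOp(). A complete tracker would also have to handle quits, kicks, nick changes, and the initial list of names sent when the bot joins a channel.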

Data Storage

Some plugins require the ability to store complex data sets that will persist between bot runs and that can be quickly and easily queried. The file-based SQLite database proved an ideal solution for this. Use of it does not require that a server be installed on the host system, only one of several related PHP extensions. Its performance also provides fast query response time, which was essential to the needs of the project.

All plugins currently use PDO and its pdo_sqlite driver by convention, but are otherwise left to roll their own solutions when it comes to actual implementation. An intended development for the short-term is to implement a utility class that extends PDO and provides commonly needed features not implemented by the pdo_sqlite driver or not available in some SQLite versions. These include testing for the existence of a given table or column or adding or removing a column, which are useful in applying schema upgrades that may be required by later versions of a plugin.
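As a sketch of what such helpers might look like, the functions below test for a table and a column using SQLite's sqlite_master table and its table_info pragma, then apply a schema upgrade only when it is needed. The function names and the quotes table are hypothetical, and this is not the planned utility class itself.

```php
<?php

/**
 * Reports whether a table exists in an SQLite database.
 */
function sketch_table_exists(PDO $db, $table)
{
    $stmt = $db->prepare("SELECT COUNT(*) FROM sqlite_master WHERE type = 'table' AND name = ?");
    $stmt->execute(array($table));
    return (int) $stmt->fetchColumn() > 0;
}

/**
 * Reports whether a column exists in a table, using the table_info pragma.
 */
function sketch_column_exists(PDO $db, $table, $column)
{
    // PRAGMA arguments cannot be bound, so the identifier is quoted by hand
    $quoted = '"' . str_replace('"', '""', $table) . '"';
    foreach ($db->query('PRAGMA table_info(' . $quoted . ')') as $row) {
        if ($row['name'] === $column) {
            return true;
        }
    }
    return false;
}

// Example upgrade against an in-memory database (hypothetical schema)
$db = new PDO('sqlite::memory:');
$db->setAttribute(PDO::ATTR_ERRMODE, PDO::ERRMODE_EXCEPTION);
$db->exec('CREATE TABLE quotes (id INTEGER PRIMARY KEY, text TEXT)');

if (!sketch_column_exists($db, 'quotes', 'author')) {
    $db->exec('ALTER TABLE quotes ADD COLUMN author TEXT');
}
```

Because the upgrade is guarded by the column check, running it a second time against the same database does nothing.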

Access Control

Restricting plugins so that responses from them can only be triggered by a specific list of users or by users with a certain user mode, like ops, is a fairly common need. Users are identified by hostmasks, which consist of their nick, username, and hostname in the format nick!username@hostname. (See RFC 2812 for more information on this.) Phergie is designed so that users with access to restricted plugins, which we call administrators, can be specified globally or on a per-plugin basis within the configuration file.
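A hostmask check can be sketched as follows. The function names are hypothetical, and the use of * and ? wildcards in administrator entries is an assumption made for illustration rather than a description of how Phergie's configuration is interpreted.

```php
<?php

/**
 * Splits a hostmask of the form nick!username@hostname into its parts,
 * or returns null if it is malformed.
 */
function sketch_parse_hostmask($hostmask)
{
    if (!preg_match('/^([^!@]+)!([^!@]+)@([^!@]+)$/', $hostmask, $m)) {
        return null;
    }
    return array('nick' => $m[1], 'username' => $m[2], 'hostname' => $m[3]);
}

/**
 * Checks a hostmask against a pattern in which * matches any run of
 * characters and ? matches a single character (case-insensitive).
 */
function sketch_hostmask_matches($pattern, $hostmask)
{
    $regex = str_replace(array('\*', '\?'), array('.*', '.'), preg_quote($pattern, '/'));
    return preg_match('/^' . $regex . '$/i', $hostmask) === 1;
}
```

An access check would then loop over the configured administrator patterns and allow the command if any of them matches the hostmask of the user who triggered the event.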

Another configuration setting, a boolean flag, exists for ops and can be configured globally or per plugin as well. That setting uses the aforementioned Users plugin in order to determine whether or not a given user (identified by their hostmask) currently has op privileges within the channel from which an event originated (specified as a command parameter where applicable).

On Memory Usage

The PHP memory manager was not originally designed for use in long-running scripts. As such, you have to be particularly diligent about deallocating variables using the unset function as soon as their data is no longer needed. The reason for this is that the memory manager does not release memory that it has allocated back to the system until the PHP process has terminated.

To put this another way, memory use of the PHP process is equal to a certain amount of memory for the process itself plus the peak usage of the PHP script at any given time in the life of the process. Thus, in order to keep memory usage down, only retain data in memory for the minimum amount of time necessary so that the same memory can be used by other variables rather than having the memory manager allocate additional memory for them.
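The effect of unset on a script's own accounting can be observed with memory_get_usage() and memory_get_peak_usage(). The sketch below is only an illustration; the one-megabyte buffer is arbitrary, and the exact numbers printed will differ between PHP versions and platforms.

```php
<?php

$before = memory_get_usage();

// Allocate roughly one megabyte of data
$size = 1024 * 1024;
$buffer = str_repeat('x', $size);
$during = memory_get_usage();

// Release the data as soon as it is no longer needed
unset($buffer);
$after = memory_get_usage();

// The high-water mark is not lowered by the unset() call
$peak = memory_get_peak_usage();

printf("before: %d, during: %d, after: %d, peak: %d\n", $before, $during, $after, $peak);
```

Current usage falls once the buffer is unset, so the space can be reused by later variables, while the peak figure stays at or above the level reached while the buffer existed.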

Future Developments

In the process of developing the project to its current state, we've discovered some additional needs that are fairly common in using and developing plugins. The project doesn't currently sport any code for these, but should in the short term.

One of these is the ability to prevent incoming events from being processed, to prevent outgoing events from being sent, or to manipulate events in some way before either takes place. To address this, we'll be adding a class to represent an event handler. These handlers will be loaded like plugins and run at specific times based on configuration settings. It will be possible to apply them to incoming events, outgoing events, or both, either globally or on a per-plugin, per-channel basis. Applications will include limiting access to certain commands to specific users or user modes.

Another item is data caching. Plugins will occasionally cache data to memory to speed up subsequent access and currently have to roll their own caching logic (fetching and validation) in order to do so. A driver-based component will be developed at some point that provides centralized logic for this and the capability to use outside sources such as a memcached server. One thing this can potentially do is transfer some memory handling work from the memory manager in PHP to a source with a garbage collection system more suitable for long-running processes in order to keep memory usage down.

The last item has to do with deployment and is still a matter under consideration. We're looking for a method to distribute the core and plugins independently of each other and make them easy to install, possibly using a PEAR server. We've managed to get one up and running using Greg Beaver's guide and will be experimenting with the possibility of using it for deployment in the future.

Epilogue

I hope this article and its description of my experiences in this project have given you some perspective on designing and implementing an IRC bot in PHP. Feel free to check out the Phergie web site and take a look at the source code in SVN and the Developer's Guide. The existing plugins probably provide the best examples of the design I've described here, of plugin implementations, and of how interesting their potential applications can be. Also, please stop by the #phergie channel on the Freenode IRC network and join in the project, or start your own! I hope to see this article be the catalyst for the creation of great IRC bots in the times ahead.

Credits

Thanks to John White and Jordi Boggiano for their hard work and active involvement in the project, Ben Ramsey for time spent on the IRC extension for PECL and for the discussions that inspired the Phergie project, Sean Coates for keeping us humble by occasionally showing us how to break our code, Davey Shafik for explaining how PHP uses memory while we sat scratching our heads, and all leaders and members of the PHP Community for their support throughout the project's development.

Matthew Turland lives in Duson, LA with his wife and three children and is currently employed as Lead Programmer for surgiSYS LLC. In his spare time he contributes to open source projects, frequents the #phpc channel on the Freenode IRC network under the nick Elazar, and shares his experiences on his blog at http://ishouldbecoding.com.