If I hadn’t moved across the country recently, I probably never would have discovered a bug in how FeedWordPress handles the timestamps of syndicated posts. Fortunately, I did, and this bug is fixed in the recently-released version 0.99.
To explain what I mean, I’ll have to back up a bit, first.
There are a couple things that FeedWordPress uses date and time information for:
Dating new posts: When a new post comes in over a feed, FeedWordPress uses the date and time information reported by feeds to date posts appropriately in the WordPress database. This means parsing date and time information and putting the resulting timestamps into the database.
Checking for updates to existing posts: After a post has been imported into the WordPress database, FeedWordPress will keep checking to see whether the feed reports any updates to that post. This means comparing the last-updated timestamp on the feed to the last-updated timestamp in the database to see which version is newer.
The problem that I noticed came about because FeedWordPress was trying to do something sensible and easy to handle time zones when it was doing all this. Date/time handling in WordPress is fairly easy, once you understand what to do, but it is certainly anything but sensible, and therein lies the problem.
Part of the problem is, as usual, the stack of programs on top of which FeedWordPress has to sit. In order to get dates from the feed, FeedWordPress has to process two different human-readable date formats. RSS feeds use RFC 822 (or something like it), which FeedWordPress can convert to a Unix timestamp using the PHP
strtotime() function. Atom feeds use the W3C DateTime Format, which FeedWordPress can convert to a Unix timestamp using a custom function provided by MagpieRSS, called
parse_w3cdtf(). Unix timestamps supposedly do not vary by time zone: a Unix timestamp for a particular time is defined as the number of seconds between that time and 12:00 midnight on January 1, 1970 Greenwich Mean Time. So both
parse_w3cdtf() make use of the time zone data provided by the format that they handle, and use it to convert the date-time information to a timestamp based on GMT.
So far so good. Now, when converting a syndicated item into a post for the WordPress database, FeedWordPress has to generate four different timestamps:
post_date, which gives the date and time that the post was first published in the local time zone;
post_date_gmt, which gives the date and time that the post was first published in Greenwich Mean Time;
post_modified, which gives the local date and time that the post was last modified; and
post_modified_gmt, which gives the GMT date and time that the post was last modified. These fields are stored in the WordPress database as MySQL DATETIME objects. WordPress later converts them back into Unix timestamps whenever it is necessary for formatting purposes. Here is how I did this in version 0.981:
$post = date('Y-m-d H:i:s', (!is_null($post) ? $post : $post)); $post = date('Y-m-d H:i:s', $post); $post = gmdate('Y-m-d H:i:s', (!is_null($post) ? $post : $post)); $post = gmdate('Y-m-d H:i:s', $post);
$post array contains the Unix timestamps that FeedWordPress got from the feed. The PHP
date() function formats a Unix timestamp relative to the local time zone. The
gmdate() function formats it relative to Greenwich Mean Time. Given that I need local and Greenwich representations of the date and time that the post was first published, and of the date and time that the post was last updated, this seems like the sensible thing to do. But in spite of (or because of) its seeming so sensible, this way of generating post timestamps introduces a subtle bug. More on that later.
When determining whether or not a post has been updated, FeedWordPress uses two items of information: the last-updated date and time provided by the feed, and the last-updated date and time stored in the database. In order to get these two dates into a common format so that they can be compared, FeedWordPress uses either
parse_w3cdtf() to convert the feed’s last-updated date and time to a Unix timestamp, and it uses the MySQL function
UNIX_TIMESTAMP() to get a Unix timestamp from the last-updated date and time in the database. So here is how I did this in version 0.981:
$guid = $post; $result = $wpdb->get_row(" SELECT id, guid, UNIX_TIMESTAMP(post_modified) AS modified FROM $wpdb->posts WHERE guid='$guid' ");
if (!$result) : $freshness = 2; // New content elseif ($post > $result->modified) : $freshness = 1; // Updated content else : $freshness = 0; endif;
parse_w3cdtf() take account of time zone data;
UNIX_TIMESTAMP() presumes that the time is in the local time zone. So, I just need to feed it the DATETIME object that’s in the local time zone —
post_modified instead of
post_modified_gmt, right? Wrong. Again, in spite of (or because of) its seeming so sensible, this way of comparing timestamps introduces a subtle bug.
Here’s what was wrong with both of these sensible steps: in any given WordPress installation, there are three potentially distinct
local time zones that you have to consider:
The local time zone of the web server on which WordPress is running (usually set by the web server’s administrator);
The local time zone of the MySQL server that provides the WordPress database (usually set by the MySQL server’s administrator);
The local time zone set by the user in WordPress’s
General Optionspage (set by the blog owner);
If any of these differ from each other, then the mismatch could cause problems for the way that older versions of FeedWordPress handled dates.
When the PHP date and time functions convert back and forth between human-readable formats and Unix time stamps, they do so relative to the web server’s local time zone. When MySQL converts a DATETIME object to a Unix timestamp, it does so relative to the MySQL server’s local time zone. When WordPress prepares dates and times for storage in the database, or processes them for sorting and displaying posts, it does so relative to WordPress’s local time zone, as set under General Options.
So, when FeedWordPress used the PHP date and time functions to generate
post_modified, it used the wrong time zone — WordPress expects these to be local times in the time zone set under General Options, but the PHP functions use the web server’s local time zone. If the user has set a different time zone from the web server’s default time zone, then this date and time information will be incorrect. In order to get the correct time, we need to get the time zone information from the WordPress database, and then manually apply that offset, rather than leaning on PHP’s date and time functions. So here is how we do this task in FeedWordPress 0.99:
// Dealing with timestamps in WordPress is so fucking fucked. $offset = (int) get_option('gmt_offset') * 60 * 60; $this->post = gmdate('Y-m-d H:i:s', $this->published() + $offset); $this->post = gmdate('Y-m-d H:i:s', $this->updated() + $offset); $this->post = gmdate('Y-m-d H:i:s', $this->published()); $this->post = gmdate('Y-m-d H:i:s', $this->updated());
Note that we use
gmdate() in all cases, because we are doing the time zone offset manually rather than letting PHP do it for us.
On the other hand, when FeedWordPress used the MySQL date and time functions to convert MySQL DATETIME objects to Unix timestamps, it used the wrong time zone again — since it was using
post_modified, and in the older versions of FeedWordPress
post_modified was generated using PHP date and time functions, the time was a local time relative to the web server’s time zone. But
UNIX_TIMESTAMP() presumes that the time it is given is a local time relative to the MySQL server’s time zone. Usually this difference should cause no problem, since the web server and the MySQL server are the same machine, or if not the same machine, at least machines that are located in the same place as one another. But if that assumption ever fails,
UNIX_TIMESTAMP() will return the wrong time–a time that is either a few hours before, or a few hours after, the real last-updated time. Which will mean that posts either get updated when there’s nothing new to update, or don’t get updated even when there is something new to include.
So we need to convert the MySQL DATETIME value to a Unix timestamp in a context where we know, and can set, the right time zone for the conversion. In this case, the best thing to do is to get the GMT date and time of the last update (in order to avoid issues that might arise from changes in the local timezone setting in WordPress), and then manually convert that into a Unix timestamp using a function that works relative to GMT. So here’s how FeedWordPress 0.99 checks the update times against each other:
$guid = $wpdb->escape($this->guid());
$result = $wpdb->get_row(" SELECT id, guid, post_modified_gmt FROM $wpdb->posts WHERE guid='$guid' ");
preg_match('/(+)-(+)-(+) (+):(+):(+)/', $result->post_modified_gmt, $backref); $updated = gmmktime($backref, $backref, $backref, $backref, $backref, $backref); if (!$result) : $this->_freshness = 2; // New content elseif ($this->updated() > $updated) : $this->_freshness = 1; // Updated content $this->_wp_id = $result->id; else : $this->_freshness = 0; // Same old, same old $this->_wp_id = $result->id; endif;
post_modified_gmt always returns the string representation of a MySQL DATETIME object; the regular expression breaks that representation down into its component parts; and the PHP function
gmmktime() reassembles those parts into a Unix timestamp. (You might worry that using the PHP date and time functions might reintroduce the first problem, since it doesn’t account for the local time zone set in WordPress. But since everything is guaranteed to be in GMT, this problem doesn’t arise.)
Strictly speaking, it would probably be better to use MySQL functions, instead of a regular expression, to extract the parts of the MySQL DATETIME object, since ostensibly MySQL knows more about its internal formats than PHP does. In practice this is not likely to make a difference, but it’s likely that in future releases of FeedWordPress I’ll change the section to something more like this:
$guid = $wpdb->escape($this->guid());
$result = $wpdb->get_row(" SELECT id, guid, YEAR(post_modified_gmt) AS year, MONTH(post_modified_gmt) AS month, DAYOFMONTH(post_modified_gmt) AS day, HOUR(post_modified_gmt) AS hour, MINUTE(post_modified_gmt) AS minute, SECOND(post_modified_gmt) AS second FROM $wpdb->posts WHERE guid='$guid' ");
if (!$result) : $this->_freshness = 2; // New content else: $updated = gmmktime( $result->hour, $result->minute, $result->second, $result->month, $result->day, $result->year ); if ($this->updated() > $updated) : $this->_freshness = 1; // Updated content $this->_wp_id = $result->id; else : $this->_freshness = 0; // Same old, same old $this->_wp_id = $result->id; endif; endif;
Oh, and in case you were wondering, the reason that moving across the country helped me find this out is that I moved from Eastern Time to Pacific Time, and I changed the default time zone on one of my web servers before I changed the default time zone on my MySQL server. The mismatch exposed this bug while I was doing testing for the most recent release of FeedWordPress. Not particularly interesting, but it did expose a bug to fix, and I hope the guide to time zone issues that resulted may be of some interest.