The horrors of internet calendar formats

I have been working on a little side-project for a few weeks now. One of the things I thought I could add easy-peasy, slam-dunk, was a read-only view of a shared calendar. I have seen the iCal data format before, and it didn’t strike me as particularly difficult, just A BUNCH OF CAPITAL LETTERS and a touch of parsing.

BEGIN:VCALENDAR
PRODID:-//Google Inc//Google Calendar 70.9054//EN
VERSION:2.0
CALSCALE:GREGORIAN
METHOD:PUBLISH
X-WR-CALNAME:Some Calendar
X-WR-TIMEZONE:UTC
X-WR-CALDESC:Some Calendar
BEGIN:VEVENT
DTSTART:20100526T020000Z
DTEND:20100526T030000Z
DTSTAMP:20100602T023638Z
UID:q2eo6ejpn8scsdb1u2u0bvml6g@google.com
...and so on...

For the most part, it really is quite straightforward. The state machine for parsing is small: read a line at a time, events start with BEGIN:VEVENT and end with END:VEVENT. For my purposes there were exactly three fields I cared about: DTSTART (the start time of the event), SUMMARY (the title of the event), and LOCATION (the location, duh!).
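A minimal sketch of that state machine in Python, for illustration only. The names `unfold`, `parse_events`, and `parse_utc_stamp` are made up for this example. It assumes continuation lines start with a space or tab, and it ignores nested components such as alarms and quoted parameter values that contain a colon.

```python
from datetime import datetime, timezone

# The three fields this project cares about (hypothetical constant name).
WANTED = {"DTSTART", "SUMMARY", "LOCATION"}


def unfold(ical_text):
    """Join folded lines: a line starting with a space or tab continues the previous one."""
    lines = []
    for raw in ical_text.splitlines():
        if raw[:1] in (" ", "\t") and lines:
            lines[-1] += raw[1:]
        else:
            lines.append(raw)
    return lines


def parse_events(ical_text):
    """Return one dict per VEVENT holding only the wanted fields, as raw strings."""
    events = []
    current = None  # None means we are outside a VEVENT
    for line in unfold(ical_text):
        if line == "BEGIN:VEVENT":
            current = {}
        elif line == "END:VEVENT":
            if current is not None:
                events.append(current)
            current = None
        elif current is not None and ":" in line:
            name, value = line.split(":", 1)
            # Drop parameters such as ";TZID=..." from the property name.
            name = name.split(";", 1)[0].upper()
            if name in WANTED:
                current[name] = value
    return events


def parse_utc_stamp(value):
    """Parse a UTC timestamp of the form 20100526T020000Z (trailing Z only)."""
    parsed = datetime.strptime(value, "%Y%m%dT%H%M%SZ")
    return parsed.replace(tzinfo=timezone.utc)
```

Values come back as raw strings; `parse_utc_stamp` only handles the `Z`-suffixed UTC form shown in the sample above, so all-day events and timestamps with a `TZID` parameter would need extra handling.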

What gets tricky is repeating events. The DTSTART is only the very first occurrence in the sequence. You then get one or more RRULE records, which have you digging through RFCs (RFC 2445, to be exact) so that you can calculate the next occurrence. It’s not too bad for yearly events that occur on the same day/month, but gets a little wiggly with things like “the second Monday” or “the Tuesday after the first Monday in November.” And let’s not get into exceptions (EXDATE and EXRULE): if you have, say, a monthly meeting but skip a month or move it to a different day one month.
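To give a feel for the date arithmetic behind just those two examples, here is a rough Python sketch. The function names are hypothetical, and this is nowhere near an RRULE engine: it hard-codes the two patterns rather than reading them from a rule such as `FREQ=MONTHLY;BYDAY=2MO`.

```python
import calendar
from datetime import date, timedelta


def nth_weekday(year, month, weekday, n):
    """Return the n-th (1-based) given weekday of a month; weekday 0 is Monday."""
    first = date(year, month, 1)
    # Days from the 1st of the month to the first matching weekday.
    offset = (weekday - first.weekday()) % 7
    day = first + timedelta(days=offset + 7 * (n - 1))
    if day.month != month:
        raise ValueError("that month has no such occurrence")
    return day


def tuesday_after_first_monday(year, month=11):
    """The day after the first Monday of the month (November by default)."""
    return nth_weekday(year, month, calendar.MONDAY, 1) + timedelta(days=1)
```

Even this small piece needs a guard for months that have no fifth Monday, and it still says nothing about intervals, end dates, counts, time zones, or exceptions.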

What I thought would be a simple exercise in parsing GMT timestamps into an array suddenly balloons into a much larger and more complex problem of forecasting dates. At that point, the path of least resistance is to embed a Google calendar (the source of the iCal feeds) in an iframe. There are a number of open source implementations for parsing the file (as in reading the raw strings into an array), but not much out there for understanding the format: for interpreting it and doing something intelligible with recurring events. Singlehandedly writing a full iCal engine is beyond the scope of this project.

Oh, and that XML format that Google allows you to export? It is a couple of normalized, yet useless, fields in XML (create date, GUID, and whatnot) with the meat of the calendar in a nice, human-readable, nonstandard HTML blob, containing the first occurrence and the fact that things recur, but nothing allowing you to reconstruct the recurrences. Good times.


Published by

Brian Enigma

Brian Enigma is a Portlander, manipulator of atoms & bits, minor-league blogger, and all-around great guy. He typically writes about the interesting “maker” projects he's working on, but sometimes veers off into puzzles, software, games, local news, and current events.
