I read. I read a lot. I do not necessarily read as fast as the folks that can power through a book in a weekend, but I get a lot of enjoyment out of it nonetheless. Part of this reading occurs during my daily commute.
Recently, I managed to grab a copy of the WikiLeaks cablegate archive. It’s pretty easy if you know where to look and know how to operate BitTorrent. I wanted to read the cables in the mornings, but ran into a few snags. When downloading a mirror of the cables, you get a whole bunch of HTML files that are formatted for a desktop browser. I do most of my reading on the iPhone and only occasionally switch to the iPad (if I’m not juggling a travel mug, messenger bag, and transit pass). It was easy enough to load the zip into GoodReader, extract the raw HTML, and browse it within that app, but it really wasn’t a good reading experience on the small screen due to formatting.
What I really wanted was something I could load into iBooks. I wanted something I could bookmark, underline, and search that was nicely formatted and word-wrapped for the screen. I wanted WikiLeaks as an ePub file. That’s where this project comes in.
WikiPub is a Ruby script that will take a WikiLeaks archive, scan it, and convert it into an ePub that you can load into your book-reader of choice.
The ePub can be read on your iPhone or iPad, or any other device that can handle that format. Do keep in mind that the text is quite lengthy and can take a little while for iBooks to initially index and paginate.
The first big requirement is Ruby on a Unix-like system (Linux, OS X, etc.) and enough command-line experience to not be afraid of opening a terminal window, navigating to a folder, and running a few commands.
The other big requirement is that you have downloaded and extracted the WikiLeaks cablegate archive. WikiPub does not contain, download, or otherwise provide the actual cables. You’re on your own for that. WikiPub only converts the cables you’ve already obtained into a bookreader-friendly format.
Specific technical requirements are listed in the readme.txt file contained in the archive. You’ll need the “tidy” and “zip” apps, but your OS probably already has them installed.
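If you’d like to confirm that “tidy” and “zip” are installed before starting a long run, a quick check might look like the sketch below. It is not part of WikiPub; the helper names `find_executable` and `missing_tools` are made up for illustration, and it simply looks for each tool on your PATH.

```ruby
# Hypothetical pre-flight check; not part of WikiPub itself.
# Returns the full path of an executable found on the search path, or nil.
def find_executable(name, search_path = ENV.fetch("PATH", ""))
  search_path.split(File::PATH_SEPARATOR).each do |dir|
    next if dir.empty?
    candidate = File.join(dir, name)
    return candidate if File.file?(candidate) && File.executable?(candidate)
  end
  nil
end

# Returns the names from `tools` that could not be found on the search path.
def missing_tools(tools, search_path = ENV.fetch("PATH", ""))
  tools.reject { |tool| find_executable(tool, search_path) }
end

missing = missing_tools(%w[tidy zip])
if missing.empty?
  puts "tidy and zip are both available"
else
  puts "Missing: #{missing.join(', ')}"
end
```

If anything is reported missing, your system’s package manager is the usual place to get it.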
- Open a terminal window
- Change directories to the source folder
- Run: “./wikipub.rb /path/to/your/cablegate/files”, substituting the appropriate path to your extracted cablegate archive
- Wait. Depending on the speed of your computer, and whether you’ve previously run this script, you might be waiting an hour. You’ll see a progress indicator such as “Parsing 242 of 1095 …”
- Your output files will be in the current folder, named similarly to “wikileaks-2010.epub” (with differing years). You can copy these to your book reader, drag them into iTunes, or load them however you normally read ePub files.
- MOST IMPORTANTLY: Share the epub with your friends. Not everybody has Ruby command-line mojo. Share it with the people in your life that don’t know how to turn a cablegate archive into an epub with this script.
- You can add “--nomono” as a command line flag to convert the document from a monospaced font to a proportional one. This conversion is still considered beta. Due to formatting quirks in the original documents, newlines sometimes get misinterpreted as paragraph breaks, and paragraphs that sit too close together sometimes get merged into one continuous paragraph.
- You can add “--nosplit” if you want one humongous ePub file with all cables; otherwise you will get one file per year.
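To give a feel for why the proportional-font conversion can misjudge paragraphs, here is a minimal sketch of the kind of heuristic involved. It is an illustration only, not WikiPub’s actual code, and the `reflow` name is made up: it assumes blank lines mark paragraph breaks and joins the hard-wrapped lines in between.

```ruby
# Naive reflow (illustrative only): blank lines are treated as paragraph
# breaks, and the hard-wrapped lines in between are joined with spaces.
def reflow(text)
  text.split(/\n\s*\n/)
      .map { |block| block.split("\n").map(&:strip).reject(&:empty?).join(" ") }
      .reject(&:empty?)
end

sample = <<~TEXT
  First paragraph, wrapped
  across two lines.

  Second paragraph.
  Third paragraph, with no blank line before it.
TEXT

reflow(sample).each_with_index do |paragraph, index|
  puts "#{index + 1}: #{paragraph}"
end
```

With this sample, the second and third paragraphs come out merged because nothing but a single newline separates them, which is the same kind of misreading described in the list above. A stray blank line in the middle of a paragraph would cause the opposite problem.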
Kindles prefer mobi files instead of ePub files. If you want to read the WikiLeaks cables on your Kindle, you will need to convert them. This can be done for free with an app like calibre, or for a small charge by emailing them to your Kindle’s special Amazon email address.
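If you go the free route, calibre ships with a command-line tool called ebook-convert that takes an input file and an output file and works out the formats from the file extensions. The sketch below is not part of WikiPub; the `mobi_command` helper is made up, and the “wikileaks-*.epub” pattern assumes your output files are named like the example above. It only prints the commands it would run.

```ruby
# Builds the argument list for calibre's ebook-convert tool, swapping the
# input file's extension for .mobi to name the output file.
def mobi_command(epub_path)
  mobi_path = epub_path.sub(/\.[^.\/]*\z/, "") + ".mobi"
  ["ebook-convert", epub_path, mobi_path]
end

# Assumes the ePub files are in the current folder.
Dir.glob("wikileaks-*.epub").sort.each do |epub|
  cmd = mobi_command(epub)
  puts cmd.join(" ")
  # To actually convert (requires calibre to be installed), uncomment:
  # system(*cmd) or warn "Conversion failed for #{epub}"
end
```

Printing first lets you check the file names before committing to a conversion that may take a while on the larger files.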
- WikiPub-1.1.zip – defaults to split files (one per year) rather than one giant file, adds option for proportional font
- WikiPub-1.0.zip – initial release