Sudoku Scraper

Summary

The Sudoku Scraper is a small Ruby script that scrapes the New York Times and USA Today websites for their daily sudoku puzzles and generates a single PDF of those puzzles in the Hipster PDA 3x5 format, suitable for printing.

Background

For a while, I had a little PHP script that scraped the USA Today and New York Times websites for the daily sudoku puzzle.  And for a while, this script was the homepage on my iPhone.  Each day, it would grab the daily puzzles and give me a little plain-text dump of them.

In that form, it was fairly useless.  That prompted me to make the “Sudoku Blank” Hipster PDA cards.  I could then copy the daily puzzles onto the blanks and solve from there.  I was quite happy with this for a couple of years, but slowly grew tired of having to transcribe the puzzles every day.  It eventually got to the point where I stopped doing the daily puzzles.

Enter this new script.  The new Ruby script, much like the old PHP script, scrapes the two daily puzzles.  It then goes a step further.  Instead of forcing me to transcribe the puzzles (sometimes with errors) onto a blank template, it generates the PDF template itself, already filled out with the day’s puzzles.  Not only that, but if you set it up as a cron job, open the PDF in the iCab browser on the iPhone, then send it to Printopia, it can be on your printer before you even get out of bed.  I had to use iCab because it turns out that the iPhone’s Safari will not print PDFs.  You can use Safari to send the PDF to some other app that can print (GoodReader, iBooks), but it won’t print directly.  After that, it’s just a little bit of origami folding to stuff it in your Field Notes book.

The Code

The code itself is fairly straightforward.  It consists of two components.  The scraper will grab the current puzzles (caching locally in case you re-run or want to do lots of iterative work on the second piece).  The PDF generator, which requires the prawn gem, will format and output the final PDF.
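To give a rough idea of the caching step, here is a minimal sketch of a fetch-with-cache helper.  It is not the code from the repository.  The method name `fetch_with_cache`, the `cache` folder, the date-plus-hash file naming, and the optional `fetcher` argument are all assumptions made for illustration.

```ruby
require "digest/sha1"
require "fileutils"
require "net/http"
require "uri"

# Hypothetical helper (not taken from the actual script): return the page body
# for `url`, reusing a copy saved under `cache_dir` if one already exists for
# `today`. A new day produces a new cache file, so the next day's puzzle is
# fetched fresh.
#
# `fetcher` is an assumed hook that lets you swap in something other than
# Net::HTTP (for example, when experimenting offline).
def fetch_with_cache(url, cache_dir: "cache",
                     today: Time.now.strftime("%Y-%m-%d"), fetcher: nil)
  FileUtils.mkdir_p(cache_dir)

  # One file per URL per day; the hash keeps the file name filesystem-safe.
  key  = Digest::SHA1.hexdigest(url)
  path = File.join(cache_dir, "#{today}-#{key}.html")

  # Cache hit: skip the network entirely.
  return File.read(path) if File.exist?(path)

  # Cache miss: download once, then save for later runs.
  body = fetcher ? fetcher.call(url) : Net::HTTP.get(URI(url))
  File.write(path, body)
  body
end
```

With something like this in place, re-running the script while tinkering with the PDF layout only hits each website once per day.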

Downloading the Puzzles

I have the source code available.  I do not have the actual PDFs available for you to download yourself.  There is a certain gray area with regard to scraping websites.  I believe it falls under Fair Use for you to scrape it yourself for your own purposes.  It would be morally (and perhaps legally) wrong for me to scrape the puzzles and rebroadcast them to the world.

Downloading the Code

The source code is on GitHub at https://github.com/BrianEnigma/ruby_sudoku

You should be able to download the code on a Mac or Linux box (and maybe Windows?), install the aforementioned prawn gem (sudo gem install prawn), and run the script.  You’ll then have a folder named “results” with the final PDF outputs.  If you were so inclined, you could set up a cron job to run it nightly, then possibly publish it to an internet or intranet website for later consumption, for instance to grab it on the iPhone and send it to a printer.
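If you are curious what the prawn side of things involves before digging into the repository, here is a minimal sketch of drawing one puzzle on a 3x5 card.  It is my own illustration, not the script's actual layout.  The single-card page size, the margins, the 9x9 array of integers with 0 for a blank cell, and the names `cell_origin` and `render_card` are all assumptions.

```ruby
# All sizes are in PDF points (72 per inch).
CARD_WIDTH  = 3 * 72            # assumed page: one 3x5 inch card, portrait
CARD_HEIGHT = 5 * 72
MARGIN      = 18                # assumed quarter-inch margin
GRID        = CARD_WIDTH - 2 * MARGIN
CELL        = GRID / 9.0

# Hypothetical helper: top-left corner of the cell at (row, col), measured in
# prawn's coordinate system, where y grows upward and `top` is the top edge of
# the drawable area.
def cell_origin(row, col, top)
  [col * CELL, top - row * CELL]
end

# Hypothetical renderer: `puzzle` is assumed to be a 9x9 array of integers,
# with 0 marking a blank cell.
def render_card(puzzle, path)
  require "prawn" # loaded here so cell_origin works without the gem installed

  Prawn::Document.generate(path, page_size: [CARD_WIDTH, CARD_HEIGHT],
                                 margin: MARGIN) do |pdf|
    top = pdf.bounds.top

    # Ten horizontal and ten vertical rules; every third one is heavier so the
    # 3x3 boxes stand out.
    (0..9).each do |i|
      pdf.line_width = (i % 3).zero? ? 1.5 : 0.5
      offset = i * CELL
      pdf.stroke_line([0, top - offset], [GRID, top - offset])
      pdf.stroke_line([offset, top], [offset, top - GRID])
    end

    # Print the given digits centered in their cells; blanks stay empty.
    puzzle.each_with_index do |row, r|
      row.each_with_index do |digit, c|
        next if digit.zero?
        pdf.text_box(digit.to_s, at: cell_origin(r, c, top),
                                 width: CELL, height: CELL,
                                 align: :center, valign: :center, size: 12)
      end
    end
  end
end
```

Calling `render_card(puzzle, "results/card.pdf")` with a 9x9 array should write a one-page PDF, provided the “results” folder already exists.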
