Sudoku Scraper – Netninja.com

Summary

The Sudoku Scraper is a small Ruby script that scrapes the New York Times and USA Today websites for the daily sudoku puzzles and generates a single PDF of those puzzles in the Hipster PDA 3×5 format, suitable for printing.

Background

For a while, I have had a little PHP script that scraped the USA Today and New York Times websites for the daily sudoku puzzle. And for a while, this script was the homepage on my iPhone. Each day, it would grab the daily puzzles and give me a little text dump that looked something like this:

In that form it is fairly useless. That prompted me to make the “Sudoku Blank” Hipster PDA cards. I could then copy down the daily puzzles onto the blanks and solve from there. I was quite happy with this for a couple of years, but slowly grew tired of having to transcribe the puzzles every day. It eventually got to the point where I stopped doing the daily puzzles.

Enter this new script. The new Ruby script, much like the old PHP script, scrapes the two daily puzzles. It then goes a step further. Instead of forcing me to transcribe the puzzles (sometimes with errors) onto a blank template, it actually generates the PDF template, already filled out with the day’s puzzles. Not only that, but if you set it up on a cron job, look at it in the iCab browser on the iPhone, then send it to Printopia, it can be on your printer before you even get out of bed. I had to use iCab because it turns out that the iPhone’s Safari will not print PDFs. You can use Safari to send the PDF to some other app that can print (GoodReader, iBooks), but it won’t print directly. It’s then just a little bit of origami folding to stuff it in your Field Notes book.

The Code

The code itself is fairly straightforward. It consists of two components. The scraper will grab the current puzzles (caching locally in case you re-run or want to do lots of iterative work on the second piece). The PDF generator, which requires the prawn gem, will format and output the final PDF.
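
As a rough illustration of the "caching locally" idea, here is a minimal sketch of a download-once-per-day helper. It is not the code from the repository: the fetch_cached name, the cache folder layout, and the injectable fetcher are all assumptions made for the example, and it uses only the Ruby standard library.

```ruby
require "date"
require "fileutils"
require "net/http"
require "uri"

# Hypothetical helper (not from the actual script): return the page body for
# a named source, downloading it at most once per day.
# Assumed cache layout: <cache_dir>/<YYYY-MM-DD>-<name>.html
def fetch_cached(name, url, cache_dir: "cache", date: Date.today, fetcher: nil)
  # Default to a plain HTTP GET; a custom fetcher can be passed in for testing.
  fetcher ||= ->(u) { Net::HTTP.get(URI(u)) }

  FileUtils.mkdir_p(cache_dir)
  path = File.join(cache_dir, "#{date.iso8601}-#{name}.html")

  # Re-runs on the same day read the saved copy instead of hitting the site.
  return File.read(path) if File.exist?(path)

  body = fetcher.call(url)
  File.write(path, body)
  body
end
```

Calling fetch_cached("nyt", some_url) twice on the same day should only touch the network the first time, which makes it painless to iterate on the PDF layout without hammering the puzzle sites.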

Downloading the Puzzles

The source code is available, but the generated PDFs are not. There is a certain gray area with regard to scraping websites. I believe it falls under Fair Use for you to scrape the puzzles yourself for your own purposes. It would be morally (and perhaps legally) wrong for me to scrape the puzzles and rebroadcast them to the world.

Downloading the Code

The source code is on GitHub at https://github.com/BrianEnigma/ruby_sudoku

You should be able to download the code on a Mac or Linux box (and maybe Windows?), install the aforementioned prawn gem (sudo gem install prawn), and run the script. You’ll then have a folder named “results” containing the final PDF output. If you were so inclined, you could set up a cron job to run it nightly, then possibly publish the result to an internet or intranet website for later consumption, for instance to grab it on the iPhone and send it to a printer.

5 thoughts on “Sudoku Scraper”

Louise says:

June 18, 2017 at 6:35 am

Hi! Do you still scrape these nightly? I was working on The NY Times *Hard* Sudoku for today (06/18/2017) and it is identical to the hard puzzle I did last night on their website… unsure if they posted Sunday’s yesterday evening early or if there was a mistake. I was scouring the web for anyone who archived The NY Times Sudoku puzzles and came across this blog post from 6 years ago… thanks!

1. Brian Enigma says:
  
  June 18, 2017 at 7:25 am
  
  Unfortunately, I no longer scrape any sudoku puzzle sites. The computer I had set up to do that got retired and reformatted long ago, and I never thought to install the sudoku scraper on the new one.
  
RRRR says:

March 22, 2021 at 7:18 am

Anyone here still scraping? I am trying to track down the easy, medium, and hard from NYT on 11 March 2020. Anyone got it?

1. Brian Enigma says:
  
  March 22, 2021 at 8:29 am
  
  I don’t know that I have those available, but there was definitely a time when it was busted because something changed on their end and I didn’t notice until much later, so there were certainly some months of lost sudokus.