LJProxy

by Brian Enigma < brian at net­ninja . com >
v1.1, 2009-11-30
http://netninja.com/projects/ljproxy

LJProxy is a proxy between your Live­Jour­nal groups (includ­ing friends who post pro­tected entries) and Google Reader (or any other RSS news­reader you care to use).  It is designed to be sim­ple to install and main­tain with min­i­mum server requirements.

Down­load

The cur­rent ver­sion, v1.1, is avail­able at http://netninja.com/wp-content/uploads/2009/11/LJProxy-1.1.zip

Back­ground

I have been on Live­Jour­nal since 2001.  I have had a Per­ma­nent Account since 2005.  Many of my friends have been using it for sim­i­lar amounts of time.  In the past few years, I have migrated my post­ing and read­ing off of Live­Jour­nal.  I do most of my blog read­ing through Google Reader, but still have a num­ber of peo­ple I want to fol­low on Live­Jour­nal.  They often post friends-only entries that are not acces­si­ble from within Google Reader.  What I really want is a way to get each of my Live­Jour­nal friend groups as a feed that con­tains both pub­lic and pro­tected entries.  This is what LJProxy was designed to do.

Design Goals

  • Secure : Communications between the web application and LiveJournal use the LJ challenge/response protocol, when allowed by LiveJournal, so that your password is never sent as plaintext.  Communications between Google Reader and the web application use a user-generated pass key as part of the URL.  The pass key is unrelated to your LiveJournal credentials.  More detail about security is given in a later section.
  • Eas­ily main­tain­able : As you add/remove peo­ple from friend groups on Live­Jour­nal, they should auto­mat­i­cally get added/removed from the RSS feed.
  • Min­i­mal setup and sys­tem require­ments : Any­one run­ning a PHP web­site should be able to install this webapp on their site with min­i­mal setup.  No MySQL con­fig­u­ra­tion.  No spe­cial library depen­den­cies.  All you need is write per­mis­sion from PHP to the direc­tory (for hold­ing cached information).
  • Fast : Retrieving and merging the recent entries of 50–100 users from LiveJournal can take a minute or two.  This web app tries its best to minimize processing time for faster results.
  • Cached results : Part of keeping things fast, as well as reducing load on the LiveJournal servers, is caching results.  The LiveJournal servers will only get hit once per friend per hour.
  • Dis­trib­uted : I have no inter­est in set­ting up and main­tain­ing a cen­tral server for every­one to use.  Addi­tion­ally, run­ning such a server means a sin­gle IP address makes a large num­ber of requests to LiveJournal’s servers.  As the num­ber of LJProxy users using that sin­gle server increased, so would the strain on their servers.  Since all those requests would leave a big foot­print in their request logs, it would make it that much eas­ier for them to block the cen­tral server.  For these rea­sons, I am leav­ing the archi­tec­ture as a host-it-yourself one-person-per-installation setup.
  • Open : I have always been a big fan of Open Source projects, espe­cially when secu­rity is involved.  The code for LJProxy is released under a GPL license and is free to use and mod­ify.  It should with­stand the scrutiny of peer review — pass­words are han­dled cor­rectly and securely, and there’s no lit­tle tro­jan horse or back door.

Pre­req­ui­sites

  • First and fore­most, you need a web host for your instal­la­tion.  Basi­cally, you need a web­site in which you can install this appli­ca­tion in a subdirectory.
  • This web host needs to be on the pub­lic inter­net.  Google Reader’s servers need to be able to see this host.
  • The web host must sup­port PHP with the “xml” (also known as Sim­pleXML) exten­sion.  This is one of the default exten­sions, so if your host sup­ports PHP, it prob­a­bly sup­ports SimpleXML.
  • The web host must also sup­port the PHP “curl” exten­sion.  This is used to com­mu­ni­cate with Live­Jour­nal.  It is a fairly com­mon exten­sion, but is occa­sion­ally unavail­able at cer­tain hosts.
  • The web server needs to be able to write cache files to the instal­la­tion direc­tory.  For many hosts (includ­ing Dreamhost, assum­ing you don’t change the default set­tings), this is auto­matic.  For other hosts, you may have to “chmod” the direc­tory hold­ing LJProxy to “777”.
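The three technical prerequisites above can be checked from PHP itself.  What follows is a minimal sketch, not part of the LJProxy distribution: the function name ljproxy_check_prerequisites is made up for illustration, and it assumes PHP 7 or later for the type declarations.  Upload it next to LJProxy and load it in a browser to see what your host reports.

```php
<?php
// Hypothetical helper, not shipped with LJProxy: reports whether the
// prerequisites listed above appear to be met on this host.
function ljproxy_check_prerequisites(string $installDir): array
{
    return [
        'SimpleXML extension' => extension_loaded('SimpleXML'),
        'curl extension'      => extension_loaded('curl'),
        'directory writable'  => is_writable($installDir),
    ];
}

// Print one line per check for the directory this script lives in.
foreach (ljproxy_check_prerequisites(__DIR__) as $check => $ok) {
    echo $check . ': ' . ($ok ? 'OK' : 'MISSING') . "\n";
}
```

If "directory writable" reports MISSING, that is the case where the chmod step mentioned above may be needed.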

At the time of writ­ing, all of the above pre­req­ui­sites are met with Dreamhost host­ing.  If you use the link http://www.dreamhost.com/r.cgi?37325 to sign up, I will get a refer­ral reward.  If you sign up for host­ing on Dreamhost, the lit­tle bit of cash I get from the refer­ral is a great incen­tive (which costs noth­ing extra to you) for me to con­tinue development.

I would love to hear from you if you can con­firm that LJProxy does or does not work at a given host.  It would help me com­pile a list of com­pat­i­ble and incom­pat­i­ble web hosts.

A Note About Sharing

Google Reader gives you all sorts of ways to share feeds and posts/articles with your friends.  It has no concept of “friends locked” posts.  LJProxy tries to make things clear by including the post’s security in the title, so if you see it is friends-only, please do not share.  When reading LiveJournal directly in a web browser, it takes a bit more effort to be an idiot: you have to explicitly copy and paste the private text.  Google makes being an inconsiderate idiot a little easier by putting sharing functions a few clicks or keypresses away.  Don’t be an idiot.  Your LiveJournal friends trust you with their private posts.  Don’t betray that trust.

Instal­la­tion

  • First, you need a Live­Jour­nal account and need to be fol­low­ing peo­ple you care about.
  • Sec­ond, if you have not already set up friend groups on Live­Jour­nal, you should do so.  You should at least have one friend group defined (for exam­ple, “Default View”).  For best results, no friend group should have more than about 50 peo­ple in it.
  • Copy config-example.php to config.php and open it with a text editor.
  • Fill in your LiveJournal username and password in the appropriate places.
  • Generate a secret passkey for PASS_KEY.  This should be long and complex enough that it cannot be easily guessed.  Unless you are comfortable with URL-encoding, it should be entirely letters and numbers.  A sample Linux command is given to generate a random one for you, so it might be easiest to run the command and paste the results into the config file.
  • Upload the PHP files to a folder on your web host.
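If you would rather generate the passkey from PHP than from a shell, something like the following should do.  This is only a sketch under stated assumptions: it needs PHP 7 or later for random_bytes(), and generate_pass_key is a name invented here, not something LJProxy provides.  The result is hexadecimal, so it is entirely letters and numbers and needs no URL-encoding.

```php
<?php
// Hypothetical helper: returns a random passkey made only of the
// characters 0-9 and a-f.
function generate_pass_key(int $bytes = 16): string
{
    // random_bytes() draws from the operating system's secure random
    // source; bin2hex() turns each byte into two hex characters.
    return bin2hex(random_bytes($bytes));
}

// Print a 32-character passkey to paste into config.php as PASS_KEY.
echo generate_pass_key(), "\n";
```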

First Test

  • Point your browser to where you uploaded the files.
  • If you set up the con­fig file cor­rectly, your user­name should be displayed.
  • Enter your LiveJournal password.
  • You should see a list of your friend groups, each fol­lowed by a list of the users within the group.  Only active users and com­mu­ni­ties are dis­played.  Syn­di­cated (RSS) feeds, OpenID users, and deleted users are not dis­played.  For syn­di­cated feeds, you should sub­scribe directly to the feed’s RSS.  Any other account types besides active users and com­mu­ni­ties are unsupported.
  • Click on the URL next to a friend group (preferably one of the smallest groups).  It will take anywhere from a few seconds to a few minutes depending on your web host’s horsepower and bandwidth, but you should eventually get an RSS feed that contains aggregated friend posts for that group.

Usage

  • Right-click on one of the RSS URLs next to a LiveJournal group and select your browser’s option for copying the link location.
  • Go to Google Reader and add that link as a new feed.
  • Lather, rinse, repeat for each group.

The­ory of Operation

There are two modes of oper­a­tion within LJProxy.

The first is retriev­ing a list of your friends and friend groups.  This is what gets dis­played imme­di­ately after log­ging in.  It uses the client/server API (http://www.livejournal.com/doc/server/ljp.csp.protocol.html) to make secure requests.
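As an illustration of why challenge/response keeps the password off the wire: as the LiveJournal protocol documentation describes it, the server hands out a one-time challenge string and the client answers with an MD5 hex digest of that challenge concatenated with the MD5 hex digest of the password.  The sketch below shows only that computation.  The function name and the sample challenge are made up, and the formula should be checked against the documentation linked above rather than taken from here.

```php
<?php
// Sketch of the response computation; not LJProxy's actual code.
function lj_auth_response(string $challenge, string $password): string
{
    // Only this digest is sent back, never the password itself.
    return md5($challenge . md5($password));
}

// The challenge below is a made-up placeholder; a real one comes from
// the LiveJournal server and is valid for a short time only.
echo lj_auth_response('c0:1259600000:123:60:example', 'my password'), "\n";
```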

The sec­ond mode of oper­a­tion is retrieval of RSS.  This mode con­forms to the fol­low­ing pseudocode:

  • If a cache file exists and has not yet expired, use it.  Oth­er­wise, per­form the fol­low­ing operations:
    • Per­form all of the steps in the first mode to get a list of users and group memberships.
    • For each user within the requested group:
      • Request that user’s RSS using Digest Authen­ti­ca­tion, accord­ing to the meth­ods in http://www.livejournal.com/support/faqbrowse.bml?faqid=149
      • Parse that user’s RSS into a PHP Sim­pleXML object
      • Use XPath to dis­card every­thing but the … nodes
      • Rewrite the item’s title tag to include user­name and secu­rity (and use the item’s pub­li­ca­tion date if no title was given)
      • Load them into a mas­ter array (of all users) using the item’s pub­li­ca­tion date as key and the XML as value
    • Sort the mas­ter array by newest date to old­est date
    • Save as cache and display
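The parse, rewrite, and sort steps above might look roughly like this in PHP.  It is a simplified sketch and not the code LJProxy ships: merge_feeds is an invented name, the feeds are passed in as strings rather than fetched, the XPath expression //item is an assumption about which nodes are kept, and the security label is left out of the title because this document does not say which element carries it.  The username is appended to the array key so that two friends posting in the same second do not overwrite each other.

```php
<?php
// $feeds maps a username to the text of that user's RSS feed.
// Returns the items as XML strings, newest first.
function merge_feeds(array $feeds): array
{
    $master = [];
    foreach ($feeds as $user => $rss) {
        $xml = simplexml_load_string($rss);
        if ($xml === false) {
            continue; // skip feeds that fail to parse
        }
        foreach ($xml->xpath('//item') as $item) {
            $stamp = (int) strtotime((string) $item->pubDate);
            $title = trim((string) $item->title);
            if ($title === '') {
                // No title given: fall back to the publication date.
                $title = (string) $item->pubDate;
            }
            $item->title = '[' . $user . '] ' . $title;
            // Zero-padded timestamp plus username as the sort key.
            $master[sprintf('%010d-%s', $stamp, $user)] = $item->asXML();
        }
    }
    krsort($master, SORT_STRING); // newest date first
    return array_values($master);
}
```

The returned array can then be trimmed, wrapped in a single RSS channel, and written to the cache file.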

Gen­er­ated Files

log.txt

This is a log of the most recent set of LiveJournal requests.  If you have direct access to the server running LJProxy (and it’s Linux-based), you can run “tail -f log.txt” to see it process and aggregate the feeds.

{group_name}-{passkey}.xml

This is the cache file for each group.  The passkey is included in the file­name so that some­one can’t just directly access “Default_View.xml” (or any other group name, assum­ing they know it).
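A rough sketch of how such a cache file could be named and checked for freshness is below.  The function names are hypothetical, the replacement of spaces with underscores is inferred from the “Default_View.xml” example above, and the one-hour lifetime comes from the caching behavior described earlier; none of this is taken from LJProxy’s source.

```php
<?php
// Hypothetical helpers illustrating the cache file described above.
function cache_file_name(string $group, string $passKey): string
{
    // "Default View" becomes "Default_View-<passkey>.xml".
    return str_replace(' ', '_', $group) . '-' . $passKey . '.xml';
}

function cache_is_fresh(string $path, int $maxAgeSeconds = 3600): bool
{
    // Fresh means the file exists and was written less than an hour ago.
    return is_file($path) && (time() - filemtime($path)) < $maxAgeSeconds;
}

echo cache_file_name('Default View', 'abc123'), "\n";
```

Because the passkey is part of the name, guessing the group name alone is not enough to fetch the file directly.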

Secu­rity

There are three places in this project where security is important: the communication between your web host and LiveJournal, the communication between your web host and Google, and the filesystem on your web host.

For com­mu­ni­ca­tion between your web host and Live­Jour­nal, the com­mu­ni­ca­tion is as secure as we can make it.  Fea­tures avail­able through the offi­cial LJ client/server API use the challenge/response mech­a­nism.  This includes the ini­tial login and retriev­ing the list of your friends and friend groups.  Retriev­ing actual friend entries is not avail­able through the API and instead goes through RSS fetches with Digest Authen­ti­ca­tion.  Both meth­ods hash your pass­word so that it is never sent across the wire in plaintext.
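For the RSS side, PHP’s curl extension can perform the Digest Authentication handshake for you.  The sketch below is illustrative only.  Both function names are invented, and the URL pattern (a per-user subdomain with /data/rss?auth=digest) is an assumption that should be verified against the LiveJournal FAQ entry on authenticated feeds before you rely on it.

```php
<?php
// Assumed URL pattern for a user's authenticated RSS feed.
function lj_rss_url(string $user): string
{
    return 'http://' . $user . '.livejournal.com/data/rss?auth=digest';
}

// Fetches one friend's RSS using HTTP Digest Authentication.
// Returns the RSS text on success or false on failure.
function fetch_friend_rss(string $friend, string $login, string $password)
{
    $ch = curl_init(lj_rss_url($friend));
    curl_setopt($ch, CURLOPT_RETURNTRANSFER, true);
    // With CURLAUTH_DIGEST, curl answers the server's challenge with a
    // hash instead of sending the password in the clear.
    curl_setopt($ch, CURLOPT_HTTPAUTH, CURLAUTH_DIGEST);
    curl_setopt($ch, CURLOPT_USERPWD, $login . ':' . $password);
    return curl_exec($ch);
}
```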

For communications between Google and your web host, you generated a private passkey (the PASS_KEY setting).  This passkey, obviously, needs to be visible to Google’s systems but does not reveal, in any way, your LiveJournal credentials.
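One way a script could check the passkey that arrives in the URL is sketched here.  It is not LJProxy’s actual code: the query parameter name “key” and the function name are assumptions, and the PASS_KEY value is a placeholder standing in for the one in config.php.

```php
<?php
// Hypothetical check of a passkey supplied in the request URL.
function pass_key_matches(string $expected, $supplied): bool
{
    // hash_equals() compares in constant time, so response timing does
    // not reveal how many leading characters were correct.
    return is_string($supplied) && hash_equals($expected, $supplied);
}

define('PASS_KEY', 'replace-with-your-own-passkey');
$supplied = $_GET['key'] ?? null; // "key" is an assumed parameter name
if (!pass_key_matches(PASS_KEY, $supplied)) {
    http_response_code(403);
    // A real script would stop here instead of serving the feed.
}
```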

The filesys­tem on your web host is the last place where secu­rity comes into play.  Since your Live­Jour­nal pass­word is stored in plain­text in the con­fig­u­ra­tion file, you must be cer­tain that — if you are using shared host­ing — the con­fig­u­ra­tion file is not vis­i­ble to oth­ers.  Most hosts do this auto­mat­i­cally, but you may want to double-check this with your host.

Change Log

Ver­sion 1.1 : UI and per­for­mance enhancements:

  • Out­put lim­ited to 100 most recent entries instead of all aggre­gated entries.
  • Fixed low­er­cas­ing issue in post titles.
  • Added ⚠Excla­ma­tion⚠ icons around non-public posts in titles.
  • Added default user icon to post.  (The spe­cific icon the user posted as is unavail­able, unfortunately.)
  • Added small LJ post time­stamp as first line in text — for clients that do not (or do not reli­ably) use the RSS pub­Date field.
  • Added 1 hour “TTL” field in RSS preamble
  • Doc­u­men­ta­tion changes

Ver­sion 1.0 : Ini­tial release

Con­tact

I’m open to sug­ges­tions for improve­ment (espe­cially if those sug­ges­tions come with patch files).  I can answer some ques­tions, but may not be able to answer webhost-specific ques­tions since every web host seems to be a lit­tle dif­fer­ent.  I can be reached at the email address at the top of this document.

{ 3 comments }

1 stanleylieber February 19, 2010 at 5:29 pm

This is fan­tas­tic, thanks! Do you know how often Google queries the feed, or how long its default time­out is? I have a huge num­ber of LJs in my cho­sen group, and when I run the script from a browser it takes a very long time, then even­tu­ally gives an “Inter­nal Server Error.” How­ever, refresh­ing the browser dis­plays how­ever many posts (a _lot_) it man­aged to grab before puking.

2 Brian Enigma February 26, 2010 at 11:03 am

Regard­less of how often Google queries the feed, the feed itself caches for 60 min­utes.  It also tells Google that it only changes every 60 min­utes, so I’m assum­ing it hon­ors that value.

If you have a lot of friends, it can take a long time to get their entries.  Many web hosts do not allow you to change the response time and will kill your script if it takes too long to respond.  Many also kill your script if it eats up too much CPU power (because you’re usu­ally on a server that is shared with dozens of other peo­ple and they don’t want some­one hog­ging the resources).  Both of these things can kick back an “Inter­nal Server Error.”  Unfor­tu­nately, there’s not a lot you can do to pre­vent this unless you run your own server.

3 stanleylieber February 19, 2010 at 5:30 pm

Sorry, a cou­ple more details:

- Running at Dreamhost, and it does seem to be successfully scraping LJ for at least some entries
- log.txt reports success on every LJ in the list
