LJProxy

by Brian Enigma < brian at net­ninja . com >
v1.1, 2009-11-31
http://netninja.com/projects/ljproxy

LJProxy is a proxy between your LiveJournal groups (includ­ing friends who post pro­tected entries) and Google Reader (or any other RSS news­reader you care to use). It is designed to be sim­ple to install and main­tain with min­i­mum server require­ments.

Download

The cur­rent ver­sion, v1.1, is avail­able at http://netninja.com/wp-content/uploads/2009/11/LJProxy-1.1.zip

Background

I have been on LiveJournal since 2001. I have had a Permanent Account since 2005. Many of my friends have been using it for sim­i­lar amounts of time. In the past few years, I have migrated my post­ing and read­ing off of LiveJournal. I do most of my blog read­ing through Google Reader, but still have a num­ber of peo­ple I want to fol­low on LiveJournal. They often post friends-only entries that are not acces­si­ble from within Google Reader. What I really want is a way to get each of my LiveJournal friend groups as a feed that con­tains both pub­lic and pro­tected entries. This is what LJProxy was designed to do.

Design Goals

  • Secure : Communications between the web application and LiveJournal use the LJ challenge/response protocol, when allowed by LiveJournal, so that your password is never sent as plaintext. Communications between Google Reader and the web application use a user-generated pass key as part of the URL. The pass key is unrelated to your LiveJournal credentials. More detail about security is given in a later section.
  • Easily main­tain­able : As you add/remove peo­ple from friend groups on LiveJournal, they should auto­mat­i­cally get added/removed from the RSS feed.
  • Minimal setup and sys­tem require­ments : Anyone run­ning a PHP web­site should be able to install this webapp on their site with min­i­mal setup. No MySQL con­fig­u­ra­tion. No spe­cial library depen­den­cies. All you need is write per­mis­sion from PHP to the direc­tory (for hold­ing cached infor­ma­tion).
  • Fast : Retrieving and merging the recent entries of 50–100 users from LiveJournal can take a minute or two. This web app tries its best to minimize processing time for faster results.
  • Cached results : Part of keeping things fast, as well as reducing load on the LiveJournal servers, is caching results. The LiveJournal servers will only get hit once per friend per hour.
  • Distributed : I have no inter­est in set­ting up and main­tain­ing a cen­tral server for every­one to use.  Additionally, run­ning such a server means a sin­gle IP address makes a large num­ber of requests to LiveJournal’s servers.  As the num­ber of LJProxy users using that sin­gle server increased, so would the strain on their servers.  Since all those requests would leave a big foot­print in their request logs, it would make it that much eas­ier for them to block the cen­tral server.  For these rea­sons, I am leav­ing the archi­tec­ture as a host-it-yourself one-person-per-installation setup.
  • Open : I have always been a big fan of Open Source projects, espe­cially when secu­rity is involved.  The code for LJProxy is released under a GPL license and is free to use and mod­ify.  It should with­stand the scrutiny of peer review — pass­words are han­dled cor­rectly and securely, and there’s no lit­tle tro­jan horse or back door.

Prerequisites

  • First and fore­most, you need a web host for your instal­la­tion. Basically, you need a web­site in which you can install this appli­ca­tion in a sub­di­rec­tory.
  • This web host needs to be on the pub­lic inter­net. Google Reader’s servers need to be able to see this host.
  • The web host must sup­port PHP with the “xml” (also known as SimpleXML) exten­sion. This is one of the default exten­sions, so if your host sup­ports PHP, it prob­a­bly sup­ports SimpleXML.
  • The web host must also sup­port the PHP “curl” exten­sion.  This is used to com­mu­ni­cate with LiveJournal.  It is a fairly com­mon exten­sion, but is occa­sion­ally unavail­able at cer­tain hosts.
  • The web server needs to be able to write cache files to the instal­la­tion direc­tory. For many hosts (includ­ing Dreamhost, assum­ing you don’t change the default set­tings), this is auto­matic. For other hosts, you may have to “chmod” the direc­tory hold­ing LJProxy to “777”.
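The prerequisites above can be checked from PHP itself before installing. The following is only a sketch: the function name is made up for illustration and is not part of LJProxy, and it tests for the presence of one SimpleXML function and one curl function rather than inspecting the host's configuration.

```php
<?php
// Hypothetical helper (not part of LJProxy): returns a list of human-readable
// problems, or an empty array if the prerequisites appear to be met.
function ljproxy_missing_prerequisites(string $installDir): array
{
    $missing = [];

    // SimpleXML is needed to parse each friend's RSS.
    if (!function_exists('simplexml_load_string')) {
        $missing[] = 'the SimpleXML ("xml") extension is not available';
    }
    // curl is needed to talk to LiveJournal.
    if (!function_exists('curl_init')) {
        $missing[] = 'the curl extension is not available';
    }
    // Cache files and log.txt are written to the installation directory.
    if (!is_dir($installDir) || !is_writable($installDir)) {
        $missing[] = "directory \"$installDir\" is not writable by PHP";
    }

    return $missing;
}

$problems = ljproxy_missing_prerequisites(__DIR__);
echo $problems ? implode("\n", $problems) . "\n" : "All prerequisites look OK\n";
```

Uploading a file like this next to the LJProxy files and opening it in a browser should show whether the host is likely to work.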

At the time of writ­ing, all of the above pre­req­ui­sites are met with Dreamhost host­ing. If you use the link http://www.dreamhost.com/r.cgi?37325 to sign up, I will get a refer­ral reward. If you sign up for host­ing on Dreamhost, the lit­tle bit of cash I get from the refer­ral is a great incen­tive (which costs noth­ing extra to you) for me to con­tinue devel­op­ment.

I would love to hear from you if you can con­firm that LJProxy does or does not work at a given host.  It would help me com­pile a list of com­pat­i­ble and incom­pat­i­ble web hosts.

A Note About Sharing

Google Reader gives you all sorts of ways to share feeds and posts/articles with your friends. It has no concept of “friends locked” posts. LJProxy tries to make things clear by including the post’s security in the title, so if you see it is friends-only, please do not share. When reading LiveJournal directly in a web browser, it takes a bit more effort to be an idiot: you have to explicitly copy and paste the private text. Google makes being an inconsiderate idiot a little easier by putting sharing functions a few clicks or keypresses away. Don’t be an idiot. Your LiveJournal friends trust you with their private posts. Don’t betray that trust.

Installation

  • First, you need a LiveJournal account and need to be fol­low­ing peo­ple you care about.
  • Second, if you have not already set up friend groups on LiveJournal, you should do so. You should at least have one friend group defined (for exam­ple, “Default View”). For best results, no friend group should have more than about 50 peo­ple in it.
  • Copy config-example.php to config.php and open it with a text edi­tor.
  • Fill in your LiveJournal username and password in the appropriate places.
  • Generate a secret passkey for PASS_KEY. This should be some­thing long and com­plex enough as to not be eas­ily guess­able. Unless you are com­fort­able with URL-Encoding, it should be entirely let­ters and num­bers. A sam­ple Linux com­mand is given to gen­er­ate a ran­dom one for you, so it might be eas­i­est to run the com­mand and paste the results into the con­fig file.
  • Upload the PHP files to a folder on your web host.
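If you do not have a Linux shell handy for the pass key step, a few lines of PHP can produce a suitable value. This is a sketch, not the command shipped in the config file: the function name is hypothetical, and random_bytes() assumes PHP 7 or later. The result contains only the characters 0-9 and a-f, so it needs no URL-encoding.

```php
<?php
// Hypothetical helper: returns a random pass key made only of hex digits.
// 20 random bytes become 40 hex characters.
function generate_pass_key(int $bytes = 20): string
{
    return bin2hex(random_bytes($bytes));
}

// Print one key; paste it into config.php as the PASS_KEY value.
echo generate_pass_key(), "\n";
```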

First Test

  • Point your browser to where you uploaded the files.
  • If you set up the con­fig file cor­rectly, your user­name should be dis­played.
  • Enter your LiveJournal password.
  • You should see a list of your friend groups, each fol­lowed by a list of the users within the group. Only active users and com­mu­ni­ties are dis­played. Syndicated (RSS) feeds, OpenID users, and deleted users are not dis­played. For syn­di­cated feeds, you should sub­scribe directly to the feed’s RSS. Any other account types besides active users and com­mu­ni­ties are unsup­ported.
  • Click on the URL next to a friend group (preferably one of the smallest groups). It will take anywhere from a few seconds to a few minutes depending on your web host’s horsepower and bandwidth, but you should eventually get an RSS feed that contains aggregated friend posts for that group.

Usage

  • Right-click on one of the RSS URLs next to a LiveJournal group and select your browser’s option for copying the link location.
  • Go to Google Reader and add that link as a new feed.
  • Lather, rinse, repeat for each group.

Theory of Operation

There are two modes of oper­a­tion within LJProxy.

The first is retriev­ing a list of your friends and friend groups. This is what gets dis­played imme­di­ately after log­ging in. It uses the client/server API (http://www.livejournal.com/doc/server/ljp.csp.protocol.html) to make secure requests.
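For the curious, the challenge/response scheme mentioned in the design goals boils down to one hash. As I understand the LiveJournal protocol documentation, the server hands out a one-time challenge string and the client answers with the MD5 of that challenge concatenated with the MD5 of the password. The sketch below shows only that calculation; the function name and the sample challenge are made up, and this is not LJProxy's actual code.

```php
<?php
// Hypothetical helper: computes the response for a server-issued challenge.
// Both md5() calls return lowercase hex strings.
function lj_challenge_response(string $challenge, string $password): string
{
    return md5($challenge . md5($password));
}

// Made-up values for illustration; a real challenge comes from the server.
$challenge = 'c0:example-challenge-from-server';
$password  = 'not-my-real-password';

// Fields sent with the request in place of the plaintext password.
$authFields = [
    'auth_method'    => 'challenge',
    'auth_challenge' => $challenge,
    'auth_response'  => lj_challenge_response($challenge, $password),
];
print_r($authFields);
```

The password itself never appears in the request, and a captured response is of no use once its challenge has expired.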

The sec­ond mode of oper­a­tion is retrieval of RSS. This mode con­forms to the fol­low­ing pseudocode:

  • If a cache file exists and has not yet expired, use it. Otherwise, per­form the fol­low­ing oper­a­tions:
    • Perform all of the steps in the first mode to get a list of users and group mem­ber­ships.
    • For each user within the requested group:
      • Request that user’s RSS using Digest Authentication, accord­ing to the meth­ods in http://www.livejournal.com/support/faqbrowse.bml?faqid=149
      • Parse that user’s RSS into a PHP SimpleXML object
      • Use XPath to discard everything but the <item>...</item> nodes
      • Rewrite the item’s title tag to include user­name and secu­rity (and use the item’s pub­li­ca­tion date if no title was given)
      • Load them into a mas­ter array (of all users) using the item’s pub­li­ca­tion date as key and the XML as value
    • Sort the mas­ter array by newest date to old­est date
    • Save as cache and dis­play
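The merge-and-sort part of the pseudocode above can be sketched roughly as follows. This is an illustration under assumptions, not the code in ljrss.php: the function name, the title format, and reading the security level from an <lj:security> element are all my guesses. The 100-entry limit comes from the change log. A counter is added to each key so that two posts with the same timestamp do not overwrite each other.

```php
<?php
// Illustrative sketch only; names are hypothetical, not taken from LJProxy.
// $rssByUser maps a username to that user's raw RSS text.
function merge_friend_feeds(array $rssByUser, int $limit = 100): array
{
    $master = [];
    foreach ($rssByUser as $username => $rssXml) {
        $feed = simplexml_load_string($rssXml);
        if ($feed === false) {
            continue; // skip feeds that do not parse
        }
        // Keep only the <item> nodes.
        $items = $feed->xpath('/rss/channel/item') ?: [];
        foreach ($items as $item) {
            $pubDate   = (string) $item->pubDate;
            $timestamp = strtotime($pubDate);
            if ($timestamp === false) {
                continue; // no usable publication date
            }
            // Assumption: security lives in <lj:security>; absent means public.
            $security = (string) $item->children('lj', true)->security;
            if ($security === '') {
                $security = 'public';
            }
            // Fall back to the publication date when no title was given.
            $title = trim((string) $item->title);
            if ($title === '') {
                $title = $pubDate;
            }
            $item->title = "$username ($security): $title";

            // Zero-padded timestamp first, so string order equals date order.
            $key = sprintf('%012d-%s-%d', $timestamp, $username, count($master));
            $master[$key] = $item->asXML();
        }
    }
    krsort($master, SORT_STRING); // newest first
    return array_slice(array_values($master), 0, $limit);
}
```

The returned strings are complete <item> elements, ready to be wrapped in a single RSS channel and written to the cache file.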

Generated Files

log.txt

This is a log of the most recent set of LiveJournal requests. If you have direct access to the server running LJProxy (and it’s Linux-based), you can run “tail -f log.txt” to see it process and aggregate the feeds.

{group_name}-{passkey}.xml

This is the cache file for each group. The passkey is included in the file­name so that some­one can’t just directly access “Default_View.xml” (or any other group name, assum­ing they know it).
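As a rough sketch of how such a cache file might be named and checked, assuming the one-hour lifetime from the design goals: the function names and the exact sanitising rule below are hypothetical. Only the space-to-underscore behaviour is suggested by the “Default_View.xml” example above.

```php
<?php
// Hypothetical helpers; not taken from LJProxy's source.
const CACHE_TTL_SECONDS = 3600; // LiveJournal is hit at most once per hour

// Builds "{group_name}-{passkey}.xml" inside the given directory.
function cache_path(string $dir, string $groupName, string $passKey): string
{
    // Assumption: anything other than letters, digits and "_" becomes "_".
    $safeGroup = preg_replace('/[^A-Za-z0-9_]/', '_', $groupName);
    return "{$dir}/{$safeGroup}-{$passKey}.xml";
}

// True if the cache file exists and is younger than the TTL at time $now.
function cache_is_fresh(string $path, int $now): bool
{
    return is_file($path) && ($now - filemtime($path)) < CACHE_TTL_SECONDS;
}
```

A request handler would serve the file as-is when cache_is_fresh() returns true and rebuild it otherwise.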

Security

There are three places in this project where security is important: the communication between your web host and LiveJournal, the communication between your web host and Google, and the filesystem on your web host.

For com­mu­ni­ca­tion between your web host and LiveJournal, the com­mu­ni­ca­tion is as secure as we can make it. Features avail­able through the offi­cial LJ client/server API use the challenge/response mech­a­nism. This includes the ini­tial login and retriev­ing the list of your friends and friend groups. Retrieving actual friend entries is not avail­able through the API and instead goes through RSS fetches with Digest Authentication. Both meth­ods hash your pass­word so that it is never sent across the wire in plain­text.
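A Digest-authenticated RSS fetch with the curl extension might look something like the sketch below. The function names are made up, and the URL pattern (the /users/NAME/data/rss path with auth=digest) is my reading of the LiveJournal FAQ rather than something taken from LJProxy's source, so treat it as an assumption.

```php
<?php
// Hypothetical helper: builds the RSS URL for one friend.
// Assumption: LiveJournal serves protected entries when auth=digest is given.
function lj_rss_url(string $friend): string
{
    return 'http://www.livejournal.com/users/' . rawurlencode($friend)
        . '/data/rss?auth=digest';
}

// Hypothetical helper: fetches one friend's RSS using HTTP Digest auth.
// Returns the response body, or false on failure.
function fetch_rss_with_digest(string $friend, string $username, string $password)
{
    $handle = curl_init(lj_rss_url($friend));
    curl_setopt_array($handle, [
        CURLOPT_RETURNTRANSFER => true,            // return the body, don't print it
        CURLOPT_HTTPAUTH       => CURLAUTH_DIGEST, // hash the password, never send it raw
        CURLOPT_USERPWD        => $username . ':' . $password,
        CURLOPT_TIMEOUT        => 30,              // seconds; keeps slow feeds from stalling
    ]);
    return curl_exec($handle);
}
```

With Digest Authentication, curl answers the server's nonce with a hash derived from the password, which is why the password does not cross the wire in plaintext.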

For communications between Google and your web host, you generated a pass key (PASS_KEY in the config file). This pass key, obviously, needs to be visible to Google’s systems but does not reveal, in any way, your LiveJournal credentials.

The filesys­tem on your web host is the last place where secu­rity comes into play. Since your LiveJournal pass­word is stored in plain­text in the con­fig­u­ra­tion file, you must be cer­tain that — if you are using shared host­ing — the con­fig­u­ra­tion file is not vis­i­ble to oth­ers. Most hosts do this auto­mat­i­cally, but you may want to double-check this with your host.

Change Log

Version 1.1 : UI and per­for­mance enhance­ments:

  • Output lim­ited to 100 most recent entries instead of all aggre­gated entries.
  • Fixed low­er­cas­ing issue in post titles.
  • Added ⚠Exclamation⚠ icons around non-public posts in titles.
  • Added default user icon to post.  (The spe­cific icon the user posted as is unavail­able, unfor­tu­nately.)
  • Added small LJ post time­stamp as first line in text — for clients that do not (or do not reli­ably) use the RSS pub­Date field.
  • Added 1 hour “TTL” field in RSS pre­am­ble
  • Documentation changes

Version 1.0 : Initial release

Contact

I’m open to sug­ges­tions for improve­ment (espe­cially if those sug­ges­tions come with patch files). I can answer some ques­tions, but may not be able to answer webhost-specific ques­tions since every web host seems to be a lit­tle dif­fer­ent. I can be reached at the email address at the top of this doc­u­ment.

20 thoughts on “LJProxy”

  1. This is fan­tas­tic, thanks! Do you know how often Google queries the feed, or how long its default time­out is? I have a huge num­ber of LJs in my cho­sen group, and when I run the script from a browser it takes a very long time, then even­tu­ally gives an “Internal Server Error.” However, refresh­ing the browser dis­plays how­ever many posts (a _lot_) it man­aged to grab before puk­ing.

    1. Regardless of how often Google queries the feed, the feed itself caches for 60 min­utes. It also tells Google that it only changes every 60 min­utes, so I’m assum­ing it hon­ors that value.

      If you have a lot of friends, it can take a long time to get their entries. Many web hosts do not allow you to change the response time and will kill your script if it takes too long to respond. Many also kill your script if it eats up too much CPU power (because you’re usu­ally on a server that is shared with dozens of other peo­ple and they don’t want some­one hog­ging the resources). Both of these things can kick back an “Internal Server Error.” Unfortunately, there’s not a lot you can do to pre­vent this unless you run your own server.

  2. Sorry, a cou­ple more details:

    - Running at Dreamhost, and it does seem to be successfully scraping LJ for at least some entries
    - log.txt reports success on every LJ in the list

  3. It has to be said; LJProxy is pure awe­some. Thanks for this.

    Quick ques­tion, though. Last cou­ple days for no appar­ent rea­son the script appears to run, but the feeds it gen­er­ates are empty. I’m on a cou­ple active com­mu­ni­ties, so when the entries stopped com­ing in, I kind of sus­pected some­thing was up. I checked the actual feeds, and not even the older entries are present. I’m on Dreamhost, and as said this only started hap­pen­ing maybe a day or two ago. Any ideas?

    1. It’s been work­ing fine for me (on Dreamhost, but I have a vir­tual pri­vate server there, so have some CPU/memory resources above and beyond their stan­dard host­ing). You might take a peek at the log.txt file to see what LJProxy thinks it’s doing. That’ll tell you how many entries of each feed it has retrieved. That’ll help deter­mine if it’s some­thing on the LJ side of things (or in the retrieving/parsing of the LJ RSS) or if it’s some­thing with the feed aggre­ga­tion within LJProxy itself. Also, I know that Dreamhost some­times kills long-running or resource-intensive processes and I seem to remem­ber some­thing about a log of what it’s auto­mat­i­cally killed, but don’t remem­ber the details. If you can find such a log, that might let you know if that is the root cause.

  4. Hello! I ran the LJProxy, and every­thing worked fine until I attempted to put the feed link into Google Reader! Then, Google Reader just loads and gives me an error. Any advice? Thank you!

  5. Thanks so much for cre­at­ing this. It’s incred­i­bly help­ful.

    The only prob­lem I’m hav­ing is that when it shows posts from com­mu­ni­ties, it doesn’t show who the indi­vid­ual poster is. For some com­mu­ni­ties, this is essen­tial infor­ma­tion. Is there any way to dis­play the user­name of who­ever posted the entry?

    Thanks again. :)

    1. I believe this just aggre­gates all of the avail­able infor­ma­tion from the LJ RSS into a sin­gle con­sol­i­dated RSS. I’m not involved in any LJ com­mu­ni­ties, so do not have a lot of test data, but tak­ing a quick peek at a sam­ple community’s RSS, it looks like the orig­i­nal poster might be avail­able in a non-standard RSS ele­ment (lj:poster, specif­i­cally). I have a num­ber of other work/home projects going on right now, so it’s unlikely I’ll be able to add this for a few weeks or months — though the code is open-source and avail­able, so obvi­ously some­body with a lit­tle more free time could con­tribute a patch if they were so inclined.

      1. Cool, thanks. I got it work­ing by adding to ljrss.php on line 301:

        // Get the poster if it's a comm
        $communityposter = "";
        $posterNode = $node->xpath("lj:poster");
        if (sizeof($posterNode) > 0) {
            $communityposter = $posterNode[0]->asXml();
            // Strip the surrounding tags. The original pattern was lost when this
            // comment was posted, so the pattern below is an assumed reconstruction.
            $communityposter = preg_replace('/<[^>]+>/', '', $communityposter);
        }

        And then appending the following to the title after the username:

        . " " . $this->sanitizeXml($communityposter)

        I’m not really a coder so not 100% sure how this works but it is work­ing so yay. :) Thanks for the tip.

  6. Hey! Thank you so much for this, it’s exactly what I’ve been look­ing for. I’m hav­ing a prob­lem though and I’m not sure what’s wrong...? I log in on my web­site and every­thing gen­er­ates prop­erly as far as I can tell but when I add the feed to my Google Reader it comes up as “no posts”? I’ve waited until a new post was made on my LJ friends list but still noth­ing at all has appeared on my Reader. Could I have done some­thing wrong? I’ve checked all the require­ments and every­thing seems to be set up right (and my web­site is also on dreamhost so it should sup­port all the exten­sions.) Any ideas would be greatly appre­ci­ated, thanks again!

  7. A recent forced change to the new friends page for­mat broke my old method of a script that would log in to my friends page with a cus­tom RSS s2 style to make a feed out of it. The new (as of 2013, so not that new) friends page does not respect s2 styling, so I found this lit­tle gem instead. I know it hasn’t been updated in years, and that most peo­ple have left LJ by now. I’m hop­ing the rea­son it hasn’t been updated is because it is still work­ing. Can you con­firm that for me?

    1. Hi, Chad. I originally wrote ljproxy a number of years ago, but I stopped using it about a year ago when Google Reader shut down and I pared down my subscriptions when switching to NewsBlur. I have no idea whether it still functions as advertised. My suspicion is that it _probably_ does. It retrieves a list of your friends, which is an API I wouldn’t expect to change much, then it retrieves a separate RSS for each friend, another API I wouldn’t expect to change. It then collates those individual RSS feeds into a single one (entirely in-script, not dependent upon LJ) and serves that up.

      1. Thanks. Testing it out. Log file says it is suc­cess­fully scrap­ing, but the resul­tant xml file has no entries. Poking into the ljrss file to see if I can fol­low along and find out where the issue is.
