LJProxy

by Brian Enigma < brian at netninja . com >
v1.1, 2009-11-31
https://netninja.com/projects/ljproxy

LJProxy is a proxy between your LiveJournal groups (including friends who post protected entries) and Google Reader (or any other RSS newsreader you care to use). It is designed to be simple to install and maintain with minimum server requirements.

Download

The current version, v1.1, is available at https://netninja.com/wp-content/uploads/2009/11/LJProxy-1.1.zip

Background

I have been on LiveJournal since 2001. I have had a Permanent Account since 2005. Many of my friends have been using it for similar amounts of time. In the past few years, I have migrated my posting and reading off of LiveJournal. I do most of my blog reading through Google Reader, but still have a number of people I want to follow on LiveJournal. They often post friends-only entries that are not accessible from within Google Reader. What I really want is a way to get each of my LiveJournal friend groups as a feed that contains both public and protected entries. This is what LJProxy was designed to do.

Design Goals

  • Secure : Communications between the web application and LiveJournal use the LJ challenge/response protocol, when allowed by LiveJournal, so that your password is never sent as plaintext. Communications between Google Reader and the web application use a user-generated pass key as part of the URL. The pass key is unrelated to your LiveJournal credentials. More detail about security is given in a later section.
  • Easily maintainable : As you add/remove people from friend groups on LiveJournal, they should automatically get added/removed from the RSS feed.
  • Minimal setup and system requirements : Anyone running a PHP website should be able to install this webapp on their site with minimal setup. No MySQL configuration. No special library dependencies. All you need is write permission from PHP to the directory (for holding cached information).
  • Fast : Retrieving and merging the recent entries of 50-100 users from LiveJournal can take a minute or two. This web app tries its best to minimize processing time for faster results.
  • Cached results : Part of keeping things fast, as well as reducing load on the LiveJournal servers, is caching results. The LiveJournal servers will only get hit once per friend per hour.
  • Distributed : I have no interest in setting up and maintaining a central server for everyone to use.  Additionally, running such a server means a single IP address makes a large number of requests to LiveJournal’s servers.  As the number of LJProxy users using that single server increased, so would the strain on their servers.  Since all those requests would leave a big footprint in their request logs, it would make it that much easier for them to block the central server.  For these reasons, I am leaving the architecture as a host-it-yourself one-person-per-installation setup.
  • Open : I have always been a big fan of Open Source projects, especially when security is involved.  The code for LJProxy is released under a GPL license and is free to use and modify.  It should withstand the scrutiny of peer review — passwords are handled correctly and securely, and there’s no little trojan horse or back door.

Prerequisites

  • First and foremost, you need a web host for your installation. Basically, you need a website in which you can install this application in a subdirectory.
  • This web host needs to be on the public internet. Google Reader’s servers need to be able to see this host.
  • The web host must support PHP with the “xml” (also known as SimpleXML) extension. This is one of the default extensions, so if your host supports PHP, it probably supports SimpleXML.
  • The web host must also support the PHP “curl” extension.  This is used to communicate with LiveJournal.  It is a fairly common extension, but is occasionally unavailable at certain hosts.
  • The web server needs to be able to write cache files to the installation directory. For many hosts (including Dreamhost, assuming you don’t change the default settings), this is automatic. For other hosts, you may have to “chmod” the directory holding LJProxy to “777”.
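
The prerequisites above can be checked from PHP itself before installing. The following is a minimal sketch, not part of the LJProxy distribution; the function name `ljproxy_check_prerequisites` is hypothetical. It reports whether the "simplexml" and "curl" extensions are loaded and whether PHP can write to a given directory.

```php
<?php
// Hypothetical helper, not part of the LJProxy distribution.
// Returns a list of human-readable problems; an empty list means the
// prerequisites described above appear to be met.
function ljproxy_check_prerequisites(string $installDir): array
{
    $problems = [];
    foreach (['simplexml', 'curl'] as $extension) {
        if (!extension_loaded($extension)) {
            $problems[] = "PHP extension missing: $extension";
        }
    }
    if (!is_dir($installDir) || !is_writable($installDir)) {
        $problems[] = "Directory is not writable by PHP: $installDir";
    }
    return $problems;
}

// Check the directory this script was uploaded to.
foreach (ljproxy_check_prerequisites(__DIR__) as $problem) {
    echo $problem, PHP_EOL;
}
```

Uploading this as a standalone file to the intended installation directory and loading it in a browser should print nothing if everything is in place.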

At the time of writing, all of the above prerequisites are met with Dreamhost hosting. If you use the link http://www.dreamhost.com/r.cgi?37325 to sign up, I will get a referral reward. If you sign up for hosting on Dreamhost, the little bit of cash I get from the referral is a great incentive (which costs nothing extra to you) for me to continue development.

I would love to hear from you if you can confirm that LJProxy does or does not work at a given host.  It would help me compile a list of compatible and incompatible web hosts.

A Note About Sharing

Google Reader gives you all sorts of ways to share feeds and posts/articles with your friends. It has no concept of “friends locked” posts. LJProxy tries to make things clear by including the post’s security in the title, so if you see it is friends-only, please do not share. When reading LiveJournal directly in a web browser, it takes a bit more effort to be an idiot: you have to explicitly copy and paste the private text. Google makes being an inconsiderate idiot a little easier by putting sharing functions a few clicks or keypresses away. Don’t be an idiot. Your LiveJournal friends trust you with their private posts. Don’t betray that trust.

Installation

  • First, you need a LiveJournal account and need to be following people you care about.
  • Second, if you have not already set up friend groups on LiveJournal, you should do so. You should at least have one friend group defined (for example, “Default View”). For best results, no friend group should have more than about 50 people in it.
  • Copy config-example.php to config.php and open it with a text editor.
  • Fill in your LiveJournal username and password in the appropriate places.
  • Generate a secret passkey for PASS_KEY. This should be something long and complex enough as to not be easily guessable. Unless you are comfortable with URL-Encoding, it should be entirely letters and numbers. A sample Linux command is given to generate a random one for you, so it might be easiest to run the command and paste the results into the config file.
  • Upload the PHP files to a folder on your web host.
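
If running a shell command is not convenient, a pass key can also be generated from PHP. This is a minimal sketch assuming PHP 7 or later (for `random_bytes`); the function name is hypothetical. The output contains only the characters 0-9 and a-f, so it needs no URL-encoding.

```php
<?php
// Hypothetical helper: returns $bytes * 2 lowercase hexadecimal characters
// drawn from the operating system's secure random source (PHP 7 or later).
function ljproxy_generate_pass_key(int $bytes = 16): string
{
    return bin2hex(random_bytes($bytes));
}

// Print one key to paste into PASS_KEY in config.php.
echo ljproxy_generate_pass_key(), PHP_EOL;
```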

First Test

  • Point your browser to where you uploaded the files.
  • If you set up the config file correctly, your username should be displayed.
  • Enter your LiveJournal password.
  • You should see a list of your friend groups, each followed by a list of the users within the group. Only active users and communities are displayed. Syndicated (RSS) feeds, OpenID users, and deleted users are not displayed. For syndicated feeds, you should subscribe directly to the feed’s RSS. Any other account types besides active users and communities are unsupported.
  • Click on the URL next to a friend group (preferably one of the smallest groups). It will take anywhere from a few seconds to a few minutes depending on your web host’s horsepower and bandwidth, but you should eventually get an RSS feed that contains aggregated friend posts for that group.

Usage

  • Right-click on one of the RSS URLs next to a LiveJournal group and select your browser’s option for copying the link location.
  • Go to Google Reader and add that link as a new feed.
  • Lather, rinse, repeat for each group.

Theory of Operation

There are two modes of operation within LJProxy.

The first is retrieving a list of your friends and friend groups. This is what gets displayed immediately after logging in. It uses the client/server API (http://www.livejournal.com/doc/server/ljp.csp.protocol.html) to make secure requests.
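
For readers curious about what "secure requests" means here: in the LiveJournal challenge/response scheme, the client first asks the server for a challenge string, then sends back a hash derived from that challenge and the password instead of the password itself. The sketch below shows only the hash calculation as I understand it from the protocol documentation; the function name is hypothetical and this is not the code LJProxy ships.

```php
<?php
// Sketch of the response calculation described in the LiveJournal protocol
// documentation: the hex MD5 of the challenge string concatenated with the
// hex MD5 of the password. The function name is hypothetical.
function lj_challenge_response(string $challenge, string $password): string
{
    return md5($challenge . md5($password));
}

// The challenge below is a made-up placeholder, not a real server value.
$response = lj_challenge_response('example-challenge-string', 'secret');
echo strlen($response), PHP_EOL; // prints 32
```

Because the server issues a fresh challenge for each login, a captured response is of little use for a later request.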

The second mode of operation is retrieval of RSS. This mode conforms to the following pseudocode:

  • If a cache file exists and has not yet expired, use it. Otherwise, perform the following operations:
    • Perform all of the steps in the first mode to get a list of users and group memberships.
    • For each user within the requested group:
      • Request that user’s RSS using Digest Authentication, according to the methods in http://www.livejournal.com/support/faqbrowse.bml?faqid=149
      • Parse that user’s RSS into a PHP SimpleXML object
      • Use XPath to discard everything but the … nodes
      • Rewrite the item’s title tag to include username and security (and use the item’s publication date if no title was given)
      • Load them into a master array (of all users) using the item’s publication date as key and the XML as value
    • Sort the master array by newest date to oldest date
    • Save as cache and display
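
The parse, retitle, and sort steps above can be sketched roughly as follows. This is an illustration under stated assumptions, not the code in ljrss.php: the function name is hypothetical, the nodes kept are assumed to be the RSS `item` elements, and the security label in the title is left out. It also collects entries in a list sorted with `usort` rather than a date-keyed array, so that two posts with identical timestamps both survive.

```php
<?php
// Hypothetical sketch of the merge step; not the code in ljrss.php.
// $feeds maps a username to that user's RSS document as a string.
function ljproxy_merge_feeds(array $feeds, int $limit = 100): array
{
    $merged = [];
    foreach ($feeds as $username => $rssXml) {
        $rss = simplexml_load_string($rssXml);
        if ($rss === false) {
            continue; // Skip feeds that fail to parse.
        }
        // Assumption: the nodes worth keeping are the RSS <item> elements.
        foreach ($rss->xpath('//item') ?: [] as $item) {
            $published = strtotime((string) $item->pubDate);
            if ($published === false) {
                continue; // Skip items without a parseable date.
            }
            $title = trim((string) $item->title);
            if ($title === '') {
                $title = (string) $item->pubDate; // Fall back to the date.
            }
            $merged[] = [
                'time'  => $published,
                'title' => "[$username] $title",
                'xml'   => $item->asXML(),
            ];
        }
    }
    // Newest first, then keep only the most recent $limit entries.
    usort($merged, fn(array $a, array $b): int => $b['time'] <=> $a['time']);
    return array_slice($merged, 0, $limit);
}
```

The default limit of 100 mirrors the output cap mentioned in the change log for version 1.1.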

Generated Files

log.txt

This is a log of the most recent set of LiveJournal requests. If you have direct access to the server running LJProxy (and it’s Linux-based), you can run “tail -f log.txt” to see it process and aggregate the feeds.

{group_name}-{passkey}.xml

This is the cache file for each group. The passkey is included in the filename so that someone can’t just directly access “Default_View.xml” (or any other group name, assuming they know it).
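
As an illustration of the naming and one-hour expiry rules, here is a minimal sketch. The helper names and the constant are hypothetical, and the exact way LJProxy sanitizes group names is an assumption based on the “Default_View” example (anything other than letters, digits, and underscores becomes an underscore).

```php
<?php
// Hypothetical helpers illustrating the naming and expiry rules above.
const LJPROXY_CACHE_TTL = 3600; // One hour, in seconds.

function ljproxy_cache_path(string $dir, string $group, string $passKey): string
{
    // Assumption: spaces become underscores, as in "Default_View".
    $safeGroup = preg_replace('/[^A-Za-z0-9_]/', '_', $group);
    return rtrim($dir, '/') . "/{$safeGroup}-{$passKey}.xml";
}

function ljproxy_cache_is_fresh(string $path, int $now): bool
{
    // filemtime() is only consulted when the file exists.
    return is_file($path) && ($now - filemtime($path)) < LJPROXY_CACHE_TTL;
}

echo ljproxy_cache_path('/var/www/ljproxy', 'Default View', 'abc123'), PHP_EOL;
// prints /var/www/ljproxy/Default_View-abc123.xml
```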

Security

There are three places in this project where security is important: the communication between your web host and LiveJournal, the communication between your web host and Google, and the filesystem on your web host.

For communication between your web host and LiveJournal, the communication is as secure as we can make it. Features available through the official LJ client/server API use the challenge/response mechanism. This includes the initial login and retrieving the list of your friends and friend groups. Retrieving actual friend entries is not available through the API and instead goes through RSS fetches with Digest Authentication. Both methods hash your password so that it is never sent across the wire in plaintext.
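
For the RSS side, PHP’s curl extension can negotiate Digest Authentication on its own. The sketch below only builds the option array, so it can be inspected without touching the network. The function name is hypothetical, and the URL pattern is my reading of the LiveJournal FAQ on authenticated feeds, so treat it as an assumption.

```php
<?php
// Hypothetical helper: builds cURL options for fetching one user's RSS with
// HTTP Digest Authentication. The URL pattern is an assumption based on the
// LiveJournal FAQ on authenticated feeds.
function ljproxy_digest_curl_options(string $friend, string $user, string $password): array
{
    return [
        CURLOPT_URL            => "http://{$friend}.livejournal.com/data/rss?auth=digest",
        CURLOPT_HTTPAUTH       => CURLAUTH_DIGEST,
        CURLOPT_USERPWD        => "{$user}:{$password}",
        CURLOPT_RETURNTRANSFER => true,
        CURLOPT_TIMEOUT        => 30,
    ];
}

// Usage (performs a network request, so it is left commented out):
// $handle = curl_init();
// curl_setopt_array($handle, ljproxy_digest_curl_options('friend', 'me', 'secret'));
// $rssXml = curl_exec($handle);
```

With `CURLAUTH_DIGEST`, curl answers the server’s challenge with a hash, so the password in `CURLOPT_USERPWD` is not transmitted as plaintext.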

For communications between Google and your web host, you generate a pass key (PASS_KEY in the configuration file). Because it is part of the feed URL, this pass key is necessarily visible to Google’s systems, but it does not reveal your LiveJournal credentials in any way.

The filesystem on your web host is the last place where security comes into play. Since your LiveJournal password is stored in plaintext in the configuration file, you must be certain that — if you are using shared hosting — the configuration file is not visible to others. Most hosts do this automatically, but you may want to double-check this with your host.

Change Log

Version 1.1 : UI and performance enhancements:

  • Output limited to 100 most recent entries instead of all aggregated entries.
  • Fixed lowercasing issue in post titles.
  • Added ⚠Exclamation⚠ icons around non-public posts in titles.
  • Added default user icon to post.  (The specific icon the user posted as is unavailable, unfortunately.)
  • Added small LJ post timestamp as first line in text — for clients that do not (or do not reliably) use the RSS pubDate field.
  • Added 1 hour “TTL” field in RSS preamble
  • Documentation changes

Version 1.0 : Initial release

Contact

I’m open to suggestions for improvement (especially if those suggestions come with patch files). I can answer some questions, but may not be able to answer webhost-specific questions since every web host seems to be a little different. I can be reached at the email address at the top of this document.

20 thoughts on “LJProxy”

  1. This is fantastic, thanks! Do you know how often Google queries the feed, or how long its default timeout is? I have a huge number of LJs in my chosen group, and when I run the script from a browser it takes a very long time, then eventually gives an “Internal Server Error.” However, refreshing the browser displays however many posts (a _lot_) it managed to grab before puking.

    1. Regardless of how often Google queries the feed, the feed itself caches for 60 minutes. It also tells Google that it only changes every 60 minutes, so I’m assuming it honors that value.

      If you have a lot of friends, it can take a long time to get their entries. Many web hosts do not allow you to change the response time and will kill your script if it takes too long to respond. Many also kill your script if it eats up too much CPU power (because you’re usually on a server that is shared with dozens of other people and they don’t want someone hogging the resources). Both of these things can kick back an “Internal Server Error.” Unfortunately, there’s not a lot you can do to prevent this unless you run your own server.

  2. Sorry, a couple more details:

    – Running at Dreamhost, and it does seem to be successfully scraping LJ for at least some entries
    – log.txt reports success on every LJ in the list

  3. It has to be said; LJProxy is pure awesome. Thanks for this.

    Quick question, though. Last couple days for no apparent reason the script appears to run, but the feeds it generates are empty. I’m on a couple active communities, so when the entries stopped coming in, I kind of suspected something was up. I checked the actual feeds, and not even the older entries are present. I’m on Dreamhost, and as said this only started happening maybe a day or two ago. Any ideas?

    1. It’s been working fine for me (on Dreamhost, but I have a virtual private server there, so have some CPU/memory resources above and beyond their standard hosting). You might take a peek at the log.txt file to see what LJProxy thinks it’s doing. That’ll tell you how many entries of each feed it has retrieved. That’ll help determine if it’s something on the LJ side of things (or in the retrieving/parsing of the LJ RSS) or if it’s something with the feed aggregation within LJProxy itself. Also, I know that Dreamhost sometimes kills long-running or resource-intensive processes and I seem to remember something about a log of what it’s automatically killed, but don’t remember the details. If you can find such a log, that might let you know if that is the root cause.

  4. Hello! I ran the LJProxy, and everything worked fine until I attempted to put the feed link into Google Reader! Then, Google Reader just loads and gives me an error. Any advice? Thank you!

  5. Thanks so much for creating this. It’s incredibly helpful.

    The only problem I’m having is that when it shows posts from communities, it doesn’t show who the individual poster is. For some communities, this is essential information. Is there any way to display the username of whoever posted the entry?

    Thanks again. 🙂

    1. I believe this just aggregates all of the available information from the LJ RSS into a single consolidated RSS. I’m not involved in any LJ communities, so do not have a lot of test data, but taking a quick peek at a sample community’s RSS, it looks like the original poster might be available in a non-standard RSS element (lj:poster, specifically). I have a number of other work/home projects going on right now, so it’s unlikely I’ll be able to add this for a few weeks or months — though the code is open-source and available, so obviously somebody with a little more free time could contribute a patch if they were so inclined.

      1. Cool, thanks. I got it working by adding to ljrss.php on line 301:

        // Get the poster if it’s a comm
        $posterNode = $node->xpath("lj:poster");
        if (sizeof($posterNode) > 0)
            $communityposter = $posterNode[0]->asXml();
        $communityposter = preg_replace('//', '', $communityposter);

        And then appending the following to the title after the username:

        . " " . $this->sanitizeXml($communityposter)

        I’m not really a coder so not 100% sure how this works but it is working so yay. 🙂 Thanks for the tip.

  6. Hey! Thank you so much for this, it’s exactly what I’ve been looking for. I’m having a problem though and I’m not sure what’s wrong…? I log in on my website and everything generates properly as far as I can tell but when I add the feed to my Google Reader it comes up as “no posts”? I’ve waited until a new post was made on my LJ friends list but still nothing at all has appeared on my Reader. Could I have done something wrong? I’ve checked all the requirements and everything seems to be set up right (and my website is also on dreamhost so it should support all the extensions.) Any ideas would be greatly appreciated, thanks again!

  7. A recent forced change to the new friends page format broke my old method of a script that would log in to my friends page with a custom RSS s2 style to make a feed out of it. The new (as of 2013, so not that new) friends page does not respect s2 styling, so I found this little gem instead. I know it hasn’t been updated in years, and that most people have left LJ by now. I’m hoping the reason it hasn’t been updated is because it is still working. Can you confirm that for me?

    1. Hi, Chad. I originally wrote ljproxy a number of years ago, but I stopped using it about a year ago when Google Reader shut down and I pared down my subscriptions when switching to NewsBlur. I have no idea whether it still functions as advertised. My suspicion is that it _probably_ does. It retrieves a list of your friends, which is an API I wouldn’t expect to change much, then it retrieves a separate RSS for each friend, another API I wouldn’t expect to change. It then collates those individual RSS feeds into a single one (entirely in-script, not dependent upon LJ) and serves that up.

      1. Thanks. Testing it out. Log file says it is successfully scraping, but the resultant xml file has no entries. Poking into the ljrss file to see if I can follow along and find out where the issue is.
