IPFS: The InterPlanetary File System

The InterPlanetary File System (IPFS) is a new Open Source project whose main goal is to turn the web into a distributed system — one that is not reliant on any one server or path. I’ve played around with it a bit this past week and would like to share my findings. My goal is to make this minimally technical, so that a large variety of audiences can understand what’s going on.

What does it do?

When you go to a page in your web browser, a lot of things happen behind the scenes. To view this blog post, your machine must first figure out which other machine is netninja.com. It does this by asking a series of questions down a tree of responses. “Who is dot-com?” “That machine over there.” “Okay dot-com, who is netninja?” “That machine over there knows.” “Okay, netninja, where do I go to connect to you?” Your machine then opens up a connection, along a linear path that hops across several other machines, to the one running my website software. It then asks for and receives the web page.
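
If you want to see those two steps concretely, here is a minimal Python sketch using only the standard library, with netninja.com purely as an example hostname: first the name lookup, then a single client-to-server request along whatever path the network picks.

```python
import socket
import urllib.request

# Step 1: the name lookup -- ask the DNS tree "which machine is netninja.com?"
address = socket.gethostbyname("netninja.com")
print("netninja.com resolves to", address)

# Step 2: open a connection to that one machine and ask it for a page.
with urllib.request.urlopen("http://netninja.com/") as response:
    page = response.read()
print("fetched", len(page), "bytes over a single client-to-server path")
```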

The problem with this is that there are several single-points-of-failure. The FBI has discovered that the easiest way to take down a website is not to get at the site itself, but to alter the name lookup. You ask where netninja can be found and the lookup returns “that FBI-operated server over there,” which then prints up a “we’ve seized this domain” notice. The “undesirable” server is still running, but most people can’t get to it unless they possess some extra technical knowledge.


The internet has a mechanism baked into it that helps your connection automatically re-route around broken, damaged, or openly hostile servers. In fact, that’s why the internet was invented. But a number of choke points still exist. If you live in China, every connection passes through The Great Firewall of China. If you’re on Comcast, all your traffic must pass through Comcast’s servers. China can (and does) filter and block traffic. Internet service providers in the UK filter illegal content, but that filtering sometimes gets a little overzealous.

On the other hand, the peer-to-peer system used by IPFS has about a dozen central servers. There’s no reason why there couldn’t be multiple dozens or hundreds. They can live in many legal jurisdictions. Those servers help you find other machines running IPFS near you. Those nearby machines might belong to people using IPFS ephemerally — only running it for 10 minutes while they grab a file — or to someone running it full-time. No matter. Your IPFS software connects to dozens of others, which are connected to dozens of others, creating a dynamic mesh. As people join and leave, the mesh adapts. On top of this, each of those connections from your IPFS software to another peer is encrypted, so that any “bad actors” monitoring the network cannot inspect the traffic.
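
If you already have the ipfs command-line tool installed and its daemon running, you can peek at this mesh yourself. A rough sketch, assuming the ipfs binary is on your path and ipfs daemon is already running:

```python
import subprocess

# Bootstrap servers are the handful of well-known machines your node starts
# from; swarm peers are the machines it is currently meshed with.
# Assumes the ipfs daemon is already running on this machine.
bootstrap = subprocess.run(["ipfs", "bootstrap", "list"],
                           capture_output=True, text=True, check=True).stdout
peers = subprocess.run(["ipfs", "swarm", "peers"],
                       capture_output=True, text=True, check=True).stdout

print("bootstrap nodes known:", len(bootstrap.splitlines()))
print("peers currently connected:", len(peers.splitlines()))
```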

Audience

Who is the target audience? Let’s look at two scenarios.

Oppressive Government

One obvious target audience might be citizens of oppressive governments. Information wants to be free, as they say, and eliminating the single-points-of-failure helps prevent any one entity from blocking access to information. Caveat: I am no longer a security researcher and have done no comparison of the anonymity, security, and trustworthiness of IPFS to, say, Tor. It might work for this purpose, but without more rigorous research into its security there’s no guarantee it will keep you from getting caught.

Scrappy Content Producers & Artists with Music Videos

Podcasters produce audio files on a regular schedule. The audio files aren’t so large that they can’t just be hosted on a regular shared website somewhere. As more podcasters (and producers in general) move to video, the media files become more burdensome. If you want someone to watch a video in realtime, whether live or prerecorded, you end up generating not just one video file, but one for each of many different scenarios: smartphone on a slow cell network, smartphone on wifi, tablet, desktop, Xbox. Every movie you upload to YouTube, and every title available on Netflix, has been encoded into over a hundred such variants.

Many video podcasters and short-film producers are currently just fine with uploading to YouTube or Vimeo. The website tacks an ad on the front or slides up a lower-third ad banner, and the content producer doesn’t need to worry about anything. The ads generate revenue, and YouTube worries about the technical details and the cost of serving up large video files. But as Zoe Keating found out recently, Google is starting to throw its weight around and tie artists releasing music videos on YouTube to unreasonable deals with Google Music. When the big players in the video space go bad, where do you turn?

Larger video sites, such as Netflix or even YouTube itself, have to worry about the storage and bandwidth costs of serving up videos. They spend money on Content Delivery Networks (CDNs) and large amounts of storage. Amateur audio podcasters can throw an mp3 file on a website and call it good. Amateur video folks that want to take things into their own hands quickly run out of storage and bandwidth.

IPFS is poised to bridge that gap. If it’s able to gain a little more user-friendliness (for both content producers and viewers, although the web gateways are already a good step for viewers) and get some traction, it could easily become a “prosumer” video delivery service that sits between YouTube and hiring a CDN on your own.

Concepts

Programmers familiar with the git source control tool already know most of what they need to know. Everyone else, stick with me for a bit. Every unique file can be distilled down into a very large number, called a one-way hash (or hash, for short). These numbers are so large that we represent them as a combination of letters and numbers. This number is calculated based on the content of the file. If you alter the file, you end up with a different number.

As a simplified example, QmW2WQi7j6c7UgJTarActp7tDNikE4B2qXtFCfLPdsgaTQ is a picture of a cat. If you modify the picture (resize it or add text, for example), its hash value changes because the content of the file changed. This parallels how source control works. Every time you save a new version of a file, you get a new copy with a new hash. Source control extends this by tracking a log of history.
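
You can play with the same idea using nothing but a hashing function. IPFS actually uses a multihash format (which is where the Qm prefix comes from), but a plain SHA-256 in Python is enough to show the principle: identical content always produces the identical number, and any change produces a completely different one.

```python
import hashlib

original = b"pretend these bytes are the cat picture"
modified = b"pretend these bytes are the cat picture, now resized"

print(hashlib.sha256(original).hexdigest())  # the file's "name" in a content-addressed world
print(hashlib.sha256(original).hexdigest())  # same content, same hash, every time
print(hashlib.sha256(modified).hexdigest())  # changed content, completely different hash
```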

When you receive a link from someone and request it via IPFS, it’s just one of these hash values. Your client (or the web gateway) queries its swarm of peers to see who’s seen this hash before and who has a copy. It then grabs a copy and caches it locally. Popular content ends up duplicated across the network, leading to faster retrieval times. Content that diminishes in popularity will slowly get pruned from the swarm as individual systems need to make space to cache the popular stuff.
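
In practice, “requesting it via IPFS” can be as simple as handing the hash to a public web gateway and letting it do the swarm lookup for you. A sketch using the cat-picture hash from above; it assumes the ipfs.io gateway is reachable, that some peer still has the content, and the local filename is arbitrary.

```python
import urllib.request

cat_hash = "QmW2WQi7j6c7UgJTarActp7tDNikE4B2qXtFCfLPdsgaTQ"
url = "http://ipfs.io/ipfs/" + cat_hash

# The gateway asks its peers who has this hash, pulls a copy, and returns it
# over plain HTTP; we never need to know which machine actually served it.
with urllib.request.urlopen(url) as response:
    data = response.read()

with open("cat.jpg", "wb") as f:
    f.write(data)

print("retrieved", len(data), "bytes for", cat_hash)
```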

Testing

Since my day job revolves around encoding and serving up video, I have a lot of spare video on hand for IPFS experiments. I also have an affinity for folks who want to produce and distribute video on the cheap. I decided to create and upload a multi-bitrate video in Apple’s HTTP Live Streaming format (HLS) as a test. With this video format, your source video is cut into chunks. My 12-minute test video was chopped into about 70 pieces. Each of those pieces is encoded using six different encode settings, ranging in quality and size. The low end is designed for your phone over a slow cellphone network; it’s low resolution and grainy. The high end is your tablet on broadband: high-res and high quality. If you’re counting along, you’ll know that’s over 400 video files, representing chunks of the overall video at various qualities. All of the variants add up to about 1.6GB of total disk space.
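
The file count is just arithmetic. A quick back-of-the-envelope check (the ten-second segment length is a property of my encode settings, not of IPFS):

```python
# Rough numbers for the test encode described above.
duration_seconds = 12 * 60      # a 12-minute source video
segment_seconds = 10            # HLS chops it into roughly 10-second chunks
renditions = 6                  # six different quality/size variants

segments = duration_seconds // segment_seconds  # about 72 chunks
total_files = segments * renditions             # about 432 files on disk
print(segments, "segments x", renditions, "renditions =", total_files, "video files")
```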

Your player starts playing the first segment — the first 10 seconds of video — and when it reaches the end, it re-evaluates how good your network connection is before requesting the second segment. It may decide it’s getting the right quality. It may figure your network is better than it thought and switch up to a better rendition. It may decide that the network slowed and there’s no way it can keep up with downloading and playing in realtime, in which case it will drop to a lower-quality, smaller file.
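
Reduced to its essentials, that switching decision looks something like the sketch below. The bitrates are made-up placeholders and real players are far more sophisticated, but the core idea is simply “pick the best rendition the measured connection can keep up with.”

```python
# Hypothetical rendition bitrates, lowest to highest, in bits per second.
RENDITION_BITRATES = [250_000, 500_000, 1_000_000, 2_000_000, 4_000_000, 8_000_000]

def pick_rendition(measured_bps: float) -> int:
    """Return the index of the best rendition the connection can sustain,
    keeping a little headroom so playback doesn't stall."""
    usable = measured_bps * 0.8  # leave 20% headroom
    best = 0
    for i, bitrate in enumerate(RENDITION_BITRATES):
        if bitrate <= usable:
            best = i
    return best

# After each ~10-second segment the player re-measures and re-decides.
print(pick_rendition(600_000))    # slow cell network -> a low rendition
print(pick_rendition(6_000_000))  # broadband -> a high rendition
```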

I generated these streams from an Open Source short film, Tears of Steel, and then loaded the content into ipfs (literally as simple as typing ipfs add -r tears). From there, I attempted to load the video over one of the IPFS web gateways. In my first attempt, the results were dismal. When it actually played, I only ever got the crappy cellphone version. But generally it didn’t play and instead stalled.
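
For reference, the add step and the resulting gateway URL look roughly like the sketch below. It assumes the ipfs tool is installed, that the HLS files live in a directory named tears (as in my case), and that the add command’s -q (quiet) flag prints just the hashes, with the top-level directory’s hash last.

```python
import subprocess

# "ipfs add -r" walks the directory and adds every file to the local node.
# With -q it prints only hashes; the last line is the top-level directory.
output = subprocess.run(["ipfs", "add", "-r", "-q", "tears"],
                        capture_output=True, text=True, check=True).stdout

root_hash = output.strip().splitlines()[-1]
print("root directory hash:", root_hash)
print("playlist URL: http://ipfs.io/ipfs/%s/tears.m3u8" % root_hash)
```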

After a brief Twitter conversation with one of the developers, I learned one of the limits of the current version of IPFS. It’s an alpha release, after all. It doesn’t work well when it’s behind a router (NAT, for the nerds out there).

No problem. The IPFS service isn’t terribly complex, doesn’t need a ton of computing power, and virtual machines through Amazon Web Services are cheap. A “micro” instance, which is plenty to join the IPFS swarm, is 1.3¢ per hour. That’s $2.18 for a week, which isn’t bad for a dumb little experiment.

Once I had IPFS fired up on a public-facing server, performance was much better. I got broadband-quality video and absolutely no stuttering or stalling. I didn’t ever see the maximum quality version on my player. I’m not sure why, nor do I know how to best diagnose that. It could be the web gateway or it could be the swarm itself.

Do you want to try out the video stream yourself? Check it out and report your results in the comments. Each of the six renditions has a quality indicator burned into the top left of the video, letting you know which of the six streams your player has decided it can play in realtime based on available bandwidth. “Quality 1/6” is the lowest resolution and bitrate. “Quality 6/6” is the highest. The easiest way is to point your iPhone, iPad, or Android device (capable of playing Apple’s HLS format) to:

http://ipfs.io/ipfs/QmPqSiSXCYALxKBg2Hvd9NQ4FdKhRiNeN5tqrgQssm1CKF/tears.m3u8
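
If you’d rather poke at it from a script than from a device, the master playlist is plain text. Here is a small sketch that fetches it through the gateway and lists the variant streams it references, again assuming the gateway is reachable and the content is still available somewhere in the swarm.

```python
import urllib.request

url = ("http://ipfs.io/ipfs/"
       "QmPqSiSXCYALxKBg2Hvd9NQ4FdKhRiNeN5tqrgQssm1CKF/tears.m3u8")

with urllib.request.urlopen(url) as response:
    playlist = response.read().decode("utf-8")

# An HLS master playlist alternates #EXT-X-STREAM-INF lines (bandwidth,
# resolution) with the relative URLs of the per-rendition playlists.
for line in playlist.splitlines():
    if line.startswith("#EXT-X-STREAM-INF") or line.endswith(".m3u8"):
        print(line)
```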

Initial Conclusions

I am, by no means, an expert in IPFS or its limits. The initial walkthrough of its operation and my basic experiments show me that it works well for “brochure” sites: static content that doesn’t really change. This includes large media files, such as video. It’s unclear to me whether this would work well (if at all) for interactive sites or e-commerce. For instance, a message board needs one definitive touchpoint so that you know you’re seeing the latest messages and not a cache of the state it was in last week. How do you communicate “latest version” out to everyone, and keep it updated, when all you have is a collection of hashes for the various states it has been in? I also haven’t delved into the underlying details of the protocol and its storage/caching system, but it seems like end-to-end security may be a concern. Are peer nodes able to see your credit card number (i.e. a “man in the middle”) if you wanted to enter it into a remote site somewhere? And again, is this even a use case that the project has targeted?

You can probably construct your own mental list of static content that might be useful served through IPFS. Media files such as music and movies come to mind immediately. It saves on CDN bandwidth. Legit content? Sure. Indie content? Sure. Pirate content? That’s feasible. It’s unclear to me how this could be legally prevented, especially with the plausible deniability of caching content in a swarm — but I am not a lawyer. The oppressive-government “dark web” stuff seems like a slam dunk: manifestos, instructions, meetup locations of subversive-minded folks, homepage for your local improvised munitions club. There’s no way to prevent that.

Challenges

I foresee two major challenges with IPFS. The first is the installed userbase. If and when IPFS catches on, this will solve itself. The system is pretty alpha right now. Aside from being able to follow web-gateway links, everything about the system is only user-friendly to programmers. As the back-end code matures, we’ll see more user-friendly front-end code that helps people inject their content into the swarm. In the future, this whole system may just be integrated into a web browser extension.

The second is searchability. How do I find what I’m looking for? So far, I’ve found content by someone tweeting a link or posting it to a website. If I want to find a specific document, or search for a document containing a phrase like “Occupy Portland,” how do I go about doing that? Will there be a Google of IPFS? Or will search be built into the protocol itself?

Summary

This is definitely interesting technology. If you are tech-minded, I highly encourage you to play around with it. If you are not, maybe tell your friends and keep an eye on it until it matures?


