Why IPv4 Inhibits Decentralization
IPv4 makes it harder to write decentralized applications. Why? As you’ll see, it all comes down to the “address space” or the number of possible addresses. IPv4 addresses are four bytes long (e.g. 127.0.0.1), which means the IPv4 address space contains 2³² (~4.3 billion) unique addresses. That may sound like a lot, but there are currently over 8 billion Internet-connected devices on the planet, with millions more being added every day! Simply put, IPv4 doesn’t have enough addresses to meet our needs. Why didn’t its creators foresee this? Well, they probably did, but they never expected us to still be using IPv4 in 2015, nearly 35 years after it was designed. IPv4 was just an internal test of DARPA’s networking concepts, with the intent that a future iteration would be used for the public release. But this newfangled “Inter-network” idea escaped from the lab, and as a result our Internet infrastructure is built on unfinished tech. (The longer you study the Internet, the more amazed you’ll be that it works at all.)
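To put numbers on the address-space argument, here is a small Go sketch that computes the size of a 32-bit and a 128-bit address space and shows that an IPv4 address really is just four bytes. The helper name `addressSpace` is made up for this illustration.

```go
package main

import (
	"fmt"
	"math/big"
	"net/netip"
)

// addressSpace returns 2^bits, the number of distinct addresses that fit
// in an address of the given width. (Hypothetical helper for this sketch.)
func addressSpace(bits uint) *big.Int {
	return new(big.Int).Lsh(big.NewInt(1), bits)
}

func main() {
	// Dotted-decimal notation is just a friendly way to write four bytes.
	addr := netip.MustParseAddr("127.0.0.1")
	fmt.Println("raw bytes:", addr.As4())

	fmt.Println("IPv4 addresses:", addressSpace(32))
	fmt.Println("IPv6 addresses:", addressSpace(128))
}
```

The IPv4 figure comes out to 4,294,967,296, which is the "~4.3 billion" quoted above.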
As Internet adoption exploded in the early 90s, it became abundantly clear that the IPv4 address space would soon be exhausted. Internet engineers quickly completed IPv6, which features a vastly bigger address space, but found it hard to convince people to switch; IPv4 worked well enough, and the cost of switching outweighed the immediate benefit (zero). So instead, developers implemented a number of workarounds that slowed the depletion of IPv4 addresses, extending its lifespan by many years. But in doing so, they had to compromise a basic principle of the decentralized Internet.
NAT
The main workaround to prevent IPv4 address exhaustion is the use of private networks and Network Address Translation (NAT). Your computer is part of a private network that comprises all the devices connected to your home router, which in turn is connected to the Internet. The router has a unique IPv4 address, but the devices connected to it do not; their addresses always fall within a handful of standardized private ranges, such as 192.168.1.X or 10.0.0.X. Because millions of devices share these addresses, they are ambiguous, and therefore useless on the public Internet. This violates item 4.4 in the RFC linked above.
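If you want to check whether a given address is one of these ambiguous private addresses, Go's standard library can do it for you. The sketch below wraps `netip.Addr.IsPrivate`; the wrapper name `isPrivate` is just a convenience for this example.

```go
package main

import (
	"fmt"
	"net/netip"
)

// isPrivate reports whether s falls inside a reserved private range
// (for IPv4: 10.0.0.0/8, 172.16.0.0/12, or 192.168.0.0/16).
func isPrivate(s string) (bool, error) {
	addr, err := netip.ParseAddr(s)
	if err != nil {
		return false, err
	}
	return addr.IsPrivate(), nil
}

func main() {
	for _, s := range []string{"192.168.1.10", "10.0.0.5", "8.8.8.8"} {
		private, err := isPrivate(s)
		if err != nil {
			fmt.Println("bad address:", err)
			continue
		}
		fmt.Printf("%-13s private=%v\n", s, private)
	}
}
```

A peer-to-peer node could run a check like this before advertising an address, since a private address is meaningless to anyone outside the local network.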
Because your device has an ambiguous IP address, you can’t ask a server to send it a web page. Instead, you must ask your router to make the request on your behalf. This is where the translation comes in: the router rewrites the packets you send it, changing your local address to its own unique address, before sending it along. So the web server will see a request originating from your router’s IP address, and send its response back to the router. Meanwhile, the router has added a temporary rule: “forward packets from server X to device Y.” Now when the response arrives, it will be forwarded to your device. This all happens invisibly, as though your device were connected directly to the server.
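The translation step is easier to see in code. What follows is a toy model of a NAT table, not how any real router is implemented: the types `Endpoint`, `Packet`, and `NAT` are invented for this sketch, and the IP addresses are documentation placeholders. It rewrites the source of an outgoing packet, remembers a temporary rule, and uses that rule to deliver the reply.

```go
package main

import "fmt"

type Endpoint struct {
	IP   string
	Port uint16
}

type Packet struct {
	Src, Dst Endpoint
	Payload  string
}

// flow is the key for one temporary rule: packets from Remote arriving on
// PublicPort belong to exactly one device.
type flow struct {
	Remote     Endpoint
	PublicPort uint16
}

type NAT struct {
	publicIP string
	nextPort uint16
	rules    map[flow]Endpoint
}

func NewNAT(publicIP string) *NAT {
	return &NAT{publicIP: publicIP, nextPort: 40000, rules: make(map[flow]Endpoint)}
}

// Outbound rewrites the packet's source to the router's public address and
// records a rule so the reply can find its way back. For simplicity every
// call allocates a fresh port; a real NAT would reuse the rule for a flow.
func (n *NAT) Outbound(p Packet) Packet {
	port := n.nextPort
	n.nextPort++
	n.rules[flow{Remote: p.Dst, PublicPort: port}] = p.Src
	p.Src = Endpoint{IP: n.publicIP, Port: port}
	return p
}

// Inbound forwards a packet to the device named by a matching rule.
// It reports false when no rule exists, meaning the packet is dropped.
func (n *NAT) Inbound(p Packet) (Packet, bool) {
	device, ok := n.rules[flow{Remote: p.Src, PublicPort: p.Dst.Port}]
	if !ok {
		return Packet{}, false
	}
	p.Dst = device
	return p, true
}

func main() {
	nat := NewNAT("203.0.113.7")
	device := Endpoint{IP: "192.168.1.42", Port: 51000}
	server := Endpoint{IP: "198.51.100.20", Port: 80}

	out := nat.Outbound(Packet{Src: device, Dst: server, Payload: "GET /"})
	fmt.Println("server sees source:", out.Src)

	reply, ok := nat.Inbound(Packet{Src: server, Dst: out.Src, Payload: "200 OK"})
	fmt.Println("reply delivered to:", reply.Dst, ok)

	stranger := Endpoint{IP: "198.51.100.99", Port: 4000}
	_, ok = nat.Inbound(Packet{Src: stranger, Dst: out.Src, Payload: "hello?"})
	fmt.Println("unsolicited packet delivered:", ok)
}
```

The last call is the interesting one: a packet from a host the device never contacted matches no rule, so the router has nowhere to send it.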
That’s all fine and dandy when you’re connecting to the Internet. But what if the Internet wants to connect to you? This is the crux of the matter. You cannot give out your local IP address, because it does not uniquely identify your device. Nor can you give out your router’s IP address; how would the router know which device to forward the packets to? So now we have an unfortunate asymmetry: you can talk to other people, but they can’t talk to you. Even worse, if both peers are behind NAT, neither can talk to the other at all!
There are two primary ways we can address this issue, and Sia uses both. The first is to make connections bi-directional; that is, once a connection is formed, it is held open indefinitely, and either party can use it to send requests to the other. This is accomplished via stream multiplexing, which is sort of like having two conversations with your friend at the same time. The win here is that if either party is reachable, you can establish two-way communication. But if both peers are behind NAT, you’re still out of luck.
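To make stream multiplexing a little more concrete, here is a minimal sketch of the idea: every chunk of data is tagged with a stream ID, so two independent conversations can share one connection and be pulled apart again on the other side. The frame layout (one byte of stream ID, two bytes of length, then the payload) is invented for this example and is not Sia's wire format; a `bytes.Buffer` stands in for the long-lived connection.

```go
package main

import (
	"bytes"
	"encoding/binary"
	"fmt"
	"io"
)

// frame is one tagged chunk of data belonging to a logical stream.
type frame struct {
	Stream  uint8
	Payload string
}

// writeFrame encodes f as: stream ID (1 byte), length (2 bytes), payload.
func writeFrame(w io.Writer, f frame) error {
	if len(f.Payload) > 0xFFFF {
		return fmt.Errorf("payload too large: %d bytes", len(f.Payload))
	}
	header := []byte{f.Stream, 0, 0}
	binary.BigEndian.PutUint16(header[1:], uint16(len(f.Payload)))
	if _, err := w.Write(header); err != nil {
		return err
	}
	_, err := io.WriteString(w, f.Payload)
	return err
}

// readFrame decodes the next frame, returning io.EOF at a clean end of input.
func readFrame(r io.Reader) (frame, error) {
	header := make([]byte, 3)
	if _, err := io.ReadFull(r, header); err != nil {
		return frame{}, err
	}
	payload := make([]byte, binary.BigEndian.Uint16(header[1:]))
	if _, err := io.ReadFull(r, payload); err != nil {
		if err == io.EOF {
			err = io.ErrUnexpectedEOF // header promised more bytes
		}
		return frame{}, err
	}
	return frame{Stream: header[0], Payload: string(payload)}, nil
}

// roundTrip writes interleaved frames to one "connection" and then
// reassembles the bytes of each stream on the receiving side.
func roundTrip(frames []frame) (map[uint8]string, error) {
	var conn bytes.Buffer // stand-in for a single open connection
	for _, f := range frames {
		if err := writeFrame(&conn, f); err != nil {
			return nil, err
		}
	}
	streams := make(map[uint8]string)
	for {
		f, err := readFrame(&conn)
		if err == io.EOF {
			return streams, nil
		}
		if err != nil {
			return nil, err
		}
		streams[f.Stream] += f.Payload
	}
}

func main() {
	streams, err := roundTrip([]frame{
		{Stream: 1, Payload: "request from A, "},
		{Stream: 2, Payload: "request from B"},
		{Stream: 1, Payload: "continued"},
	})
	if err != nil {
		fmt.Println("error:", err)
		return
	}
	fmt.Printf("stream 1: %q\n", streams[1])
	fmt.Printf("stream 2: %q\n", streams[2])
}
```

Because each side can open its own stream IDs, the peer that accepted the connection can send requests just as easily as the peer that dialed it.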
The second, more effective fix is to add a permanent rule to your router that says “forward packets addressed to port X to device Y.” This is called port forwarding. For example, when you advertise yourself as a host on the Sia network, you include your router’s IP address and the port you forwarded. Then, anyone wishing to form file contracts can send unsolicited packets to that port, and the router will forward them to your device. In practice this works very well, but there are a few downsides. Namely, we don’t like asking users to monkey around with their router configuration. Another related problem is that if your local IP address changes, you need to update the forwarding rule. Fortunately, there is a technology that addresses both of these downsides.
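A forwarding rule is really just an entry in a lookup table keyed by port. The toy `Router` type below is hypothetical (as are the port number and addresses), but it shows both the benefit and the maintenance problem described above: unsolicited packets get through on the forwarded port, and the rule goes stale if the device's local address changes.

```go
package main

import "fmt"

// Router holds permanent forwarding rules: external port -> local device.
type Router struct {
	forwards map[uint16]string
}

func NewRouter() *Router {
	return &Router{forwards: make(map[uint16]string)}
}

// Forward installs (or replaces) the rule for an external port.
func (r *Router) Forward(port uint16, device string) {
	r.forwards[port] = device
}

// Deliver returns the device that should receive an unsolicited packet
// arriving on port, or false if no rule exists and the packet is dropped.
func (r *Router) Deliver(port uint16) (string, bool) {
	device, ok := r.forwards[port]
	return device, ok
}

func main() {
	r := NewRouter()
	r.Forward(4000, "192.168.1.42")

	if device, ok := r.Deliver(4000); ok {
		fmt.Println("port 4000 ->", device)
	}
	if _, ok := r.Deliver(4001); !ok {
		fmt.Println("port 4001 -> dropped, no rule")
	}

	// Suppose the device is later assigned 192.168.1.57. The old rule still
	// points at .42 until somebody updates it.
	r.Forward(4000, "192.168.1.57")
	device, _ := r.Deliver(4000)
	fmt.Println("after update, port 4000 ->", device)
}
```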
UPnP
UPnP (Universal Plug and Play) is a set of technologies that allow devices to advertise their services on a network. Other devices on the network can then access these services by issuing specially-crafted commands to the recipient device. In theory this has a wide range of applications, but in practice it is mostly used to talk to media servers and routers. One possible reason for this is that UPnP is a truly abysmal protocol that developers valuing their sanity should avoid like the plague. For the acronym-savvy, control is SOAP over HTTP, and discovery is an HTTP-like protocol over UDP (SSDP). Precisely why the UPnP Forum elected to layer an XML-based control protocol on top of the already-perfectly-capable HTTP, we may never know.
UPnP sucks for a lot of reasons: device discovery is slow, marshalling and unmarshalling XML is annoying, and the error messages are insultingly useless despite their verbosity. Here’s the error I got when an IP address parameter wasn’t formatted correctly:
<s:Envelope xmlns:s="http://schemas.xmlsoap.org/soap/envelope/" s:encodingStyle="http://schemas.xmlsoap.org/soap/encoding/">
  <s:Body>
    <s:Fault>
      <faultcode>s:Client</faultcode>
      <faultstring>UPnPError</faultstring>
      <detail>
        <UPnPError xmlns="urn:schemas-upnp-org:control-1-0">
          <errorCode>402</errorCode>
          <errorDescription>InvalidArgs</errorDescription>
        </UPnPError>
      </detail>
    </s:Fault>
  </s:Body>
</s:Envelope>
But as much as we hate it, UPnP works: your decentralized application can now instruct your router to forward the necessary ports without any user intervention. UPnP can also be used to learn the router’s unique IP address, which is essential when you want to advertise your address to potential peers. Sadly, not all routers support UPnP, and many disable it by default. But for now, it’s the best we’ve got.
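For a taste of the XML wrangling involved, here is a sketch that digs the two useful fields out of the fault shown earlier using Go's `encoding/xml`. The struct and function names are made up for this example; the only thing taken from the real world is the fault document itself.

```go
package main

import (
	"encoding/xml"
	"fmt"
)

// upnpFault picks out the interesting parts of a SOAP fault. Tags without
// a namespace match on the local element name, so "Body" matches <s:Body>.
type upnpFault struct {
	FaultString string `xml:"Body>Fault>faultstring"`
	Code        int    `xml:"Body>Fault>detail>UPnPError>errorCode"`
	Description string `xml:"Body>Fault>detail>UPnPError>errorDescription"`
}

// Error renders the fault as a single readable line.
func (f upnpFault) Error() string {
	return fmt.Sprintf("UPnP error %d: %s", f.Code, f.Description)
}

// parseFault decodes a SOAP fault response body.
func parseFault(body []byte) (upnpFault, error) {
	var f upnpFault
	err := xml.Unmarshal(body, &f)
	return f, err
}

const sampleFault = `<s:Envelope xmlns:s="http://schemas.xmlsoap.org/soap/envelope/" s:encodingStyle="http://schemas.xmlsoap.org/soap/encoding/">
  <s:Body>
    <s:Fault>
      <faultcode>s:Client</faultcode>
      <faultstring>UPnPError</faultstring>
      <detail>
        <UPnPError xmlns="urn:schemas-upnp-org:control-1-0">
          <errorCode>402</errorCode>
          <errorDescription>InvalidArgs</errorDescription>
        </UPnPError>
      </detail>
    </s:Fault>
  </s:Body>
</s:Envelope>`

func main() {
	f, err := parseFault([]byte(sampleFault))
	if err != nil {
		fmt.Println("could not parse fault:", err)
		return
	}
	fmt.Println(f.Error())
}
```

Fourteen lines of XML boil down to a number and a single word, neither of which says which argument was invalid.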
The nice thing about software is that you can often build clean abstractions on top of ugly protocols. If you are a developer writing peer-to-peer software, there are a number of UPnP libraries available to you. The most popular C implementation is MiniUPnP. If you use Go, check out our go-upnp package, which makes it dead-simple to forward your ports and discover your external IP address. (Actually, those are the only things it does, so the name is a bit misleading. But like I said, that’s basically all UPnP is used for anyway.)
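Here is roughly what that abstraction buys you. The sketch below is written against a small interface, `PortMapper`, which is hypothetical: its method names are assumptions chosen for illustration, not a copy of go-upnp's API, so check the package documentation before wiring in the real thing. A fake router stands in for the network so the example runs anywhere.

```go
package main

import (
	"errors"
	"fmt"
	"net"
	"strconv"
)

// PortMapper is a hypothetical interface covering the two operations a
// peer-to-peer node needs from its router.
type PortMapper interface {
	ExternalIP() (string, error)
	Forward(port uint16, description string) error
}

// publicAddress forwards port and returns the "ip:port" string that the
// node can advertise to potential peers.
func publicAddress(m PortMapper, port uint16, description string) (string, error) {
	if err := m.Forward(port, description); err != nil {
		return "", fmt.Errorf("forwarding port %d: %w", port, err)
	}
	ip, err := m.ExternalIP()
	if err != nil {
		return "", fmt.Errorf("discovering external IP: %w", err)
	}
	return net.JoinHostPort(ip, strconv.Itoa(int(port))), nil
}

// fakeRouter is an in-memory stand-in for a UPnP-capable router.
type fakeRouter struct {
	ip          string
	upnpEnabled bool
	forwarded   map[uint16]string
}

var errUPnPDisabled = errors.New("UPnP is disabled on this router")

func (r *fakeRouter) ExternalIP() (string, error) {
	if !r.upnpEnabled {
		return "", errUPnPDisabled
	}
	return r.ip, nil
}

func (r *fakeRouter) Forward(port uint16, description string) error {
	if !r.upnpEnabled {
		return errUPnPDisabled
	}
	r.forwarded[port] = description
	return nil
}

func main() {
	router := &fakeRouter{ip: "203.0.113.7", upnpEnabled: true, forwarded: map[uint16]string{}}
	addr, err := publicAddress(router, 4000, "example p2p node")
	if err != nil {
		fmt.Println("falling back to manual port forwarding:", err)
		return
	}
	fmt.Println("advertise this address to peers:", addr)
}
```

Keeping the router behind an interface also gives you a natural place to handle the case where UPnP is unsupported or disabled: the error surfaces once, and the application can ask the user to forward the port by hand.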
Conclusion
IPv4’s small address space has led to an abundance of ambiguous addresses, making peer-to-peer software harder to write. Eventually, IPv6 will allow every device to have a unique address, but until then we have to find ways of working around IPv4’s deficiencies. UPnP is one such workaround, albeit an ugly one. There are a number of others not covered here, including hole-punching and relay nodes. These are not used in Sia; hole-punching is unreliable over TCP, and relay nodes introduce a number of man-in-the-middle attack vectors.
Sia, by Nebulous Inc., is a blockchain-based decentralized cloud storage platform.