Addressing Thoughtful Concerns about Sia’s Security + Viability

The Sia Foundation
Published in The Sia Blog
13 min read · Feb 8, 2017

Following a recent article, a flurry of tweets appeared questioning Sia’s security and viability. A lot of these concerns are more focused, more technical, and aren’t brought up often. But the responses do give a lot of insight into the construction of the Sia network, so I thought I would answer them thoroughly.

1. How vulnerable is Sia to a 51% attack?

Sia is more vulnerable than Bitcoin to a 51% attack. We have significantly fewer mining pools and significantly less hashrate. That said, our hashrate suggests that there are between 10,000 and 50,000 GPUs actively hashing on our network at all times. A 51% attack is not something you could just perform by pointing your own hardware at it, or even by dumping a bunch of money into an AWS cluster. Your best strategy would be to either bribe the larger mining pools, or find some way to compromise them. Of course, the moment a mining pool starts acting up, all of the miners can flee to another pool.

Generally speaking, Sia is safe against 51% attacks. It’s not as strong as Bitcoin. But it’s also a much larger + more heavily hashed blockchain than any that has been successfully attacked for a sustained period of time.

2. The IP addresses of the Sia hosts + nodes are a hit list for hackers. What if someone targets all of the machines on the ecosystem?

Hosts in Sia are expected to be performing their own devops. A concern was expressed that this would not be sufficient, and perhaps that is a valid concern. To date, we have not seen widespread hacks of peer-to-peer ecosystems. When an ecosystem is getting attacked, generally it’s the protocol that’s being attacked or DDoS’d, and not the individual nodes themselves.

And there’s a good reason for that. Nodes run on a highly diverse set of platforms, running different code stacks and with different vulnerability profiles. Taking out a significant part of the Sia network either requires finding a significant vulnerability directly in the Sia software itself, or else requires using a wide range of exploits to target a large number of highly diverse machines.

Sia adds high redundancy to files to improve security. The default settings are 10-of-30, meaning out of 30 hosts, you only need 10 to survive an attack/disaster/etc. for the file to be retrievable. Hackers trying to target your specific hosts would need to take down at least 21 of the 30 (a 70% success rate) to actually do any damage. And if this attack didn’t all go off at the same time, you’d be able to quickly restore the redundancy.

Overall, I would say that the risk of wide-ranging attacks targeting our hosts is a more significant risk than a 51% attack. But it’s an attack that would require a lot of expertise, a lot of time, and a lot of ready-to-go exploits. Unfortunately, ready-to-go exploits are not actually uncommon today, which definitely amplifies the risk of this type of attack.

3. Do you really expect your host nodes to be able to perform competent SecOps?

That is a good question. Sia expects hosts to be performing their own SecOps in the standard case. Certainly today we don’t have any certifications or checks that we put hosts through to make sure they are protecting themselves from attack. It’s something we’re looking into but would like to avoid, as it clearly hurts decentralization (getting a certification requires having a central entity that can choose to not certify you).

As Sia continues to grow, we very much expect the vast majority of nodes on Sia to be dedicated machines. This happened with Bitcoin mining, and we tried to mirror that template. Bitcoin mining was so brutally effective because there was a very clear relationship between hashrate and profit. No business development needed, no marketing needed; you needed to be good at only one thing to build a successful Bitcoin mining operation (hashrate!). Sia has tried to replicate this with storage. You don’t need to do any marketing or business development, and you don’t need to have a recognized name. You just need a storage offering that is more competitive than the other guys on the network.

We expect most hosts on Sia to be dedicated hosts, which makes SecOps a lot easier. Run a secure operating system, open only exactly the ports you need, run only exactly the software you need. It’s not bulletproof but it’s also much easier than keeping your family computer safe. And, there’s a strong incentive to keep the hosts safe, because they have money on the line. Hosts are putting collateral into contracts that they will lose if their machine is compromised. Hosts need to keep money in their wallets (to provide the collateral in the first place) that they will lose if their machine is compromised.

I ultimately do expect our hosts to be able to have reasonable security. And further, I expect our hosts will be choosing a high diversity of platforms, which means you’ll need a large number of vulnerabilities to do any real damage.

4. I can buy harddrives at $25 / TB, what’s the value add for Sia?

The storage price on Sia today is $1/TB/Mo (that’s after redundancy). This can be compared to $23/TB/Mo on Amazon S3, which is the most comparable service to Sia. Sia is extremely inexpensive for the type of service it offers, and that gives Sia a massive advantage in the market. And in fact, accounting for redundancy, Sia is price competitive with hard drives as well. And there are a lot of elements that drive Sia’s competitiveness.

First, Sia hosts get to enjoy unique advantages that you don’t get to enjoy when you are buying drives for home. Some hosts already had drives lying around that weren’t in use. Some hosts were able to get strong bulk discounts on their storage. Some hosts have access to cheap/free physical space or electricity, among other advantages that we don’t all get to enjoy. These advantages drive the overall price of storage down.

Second, Sia uses high-scale redundancy which makes the platform substantially more reliable. Probabilistically, a 10-of-30 redundancy scheme is far safer than a 1-of-3 redundancy scheme, provided the reliability of each host is higher than 33% (Sia targets 95% reliability for hosts; drives will typically have 98%+ reliability). As the confidence in our software grows, we plan on reducing the redundancy to as little as a 30-of-50 scheme, which probabilistically is more reliable even than a 1-of-5 redundancy scheme for the same quality of hard drive, despite having only about 1.67x overhead instead of 5x overhead.
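
To make the comparison concrete, here is a minimal sketch of the underlying arithmetic. It assumes each host fails independently with the same probability, which is a simplification (real-world failures can be correlated), and the function name `loss_probability` is purely illustrative, not part of Sia.

```python
from math import comb


def loss_probability(k: int, n: int, host_reliability: float) -> float:
    """Probability that fewer than k of n hosts survive, i.e. the file is lost.

    Assumes every host survives independently with probability
    `host_reliability` (an idealized model, not a measured property of Sia).
    """
    p = host_reliability
    # Sum the binomial probabilities of exactly i survivors for i = 0 .. k-1.
    return sum(comb(n, i) * p**i * (1 - p) ** (n - i) for i in range(k))


# Compare the schemes mentioned in the text at the 95% host reliability target.
for k, n in [(1, 3), (10, 30), (1, 5), (30, 50)]:
    loss = loss_probability(k, n, 0.95)
    print(f"{k}-of-{n}: overhead {n / k:.2f}x, loss probability {loss:.3g}")
```

Under these assumptions the wide schemes come out many orders of magnitude safer than plain replication at the same or lower overhead, which is the point the paragraph above is making.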

Finally, Sia has all the benefits and more of a cloud. Sia is a very early platform, so not all of the features I’m about to mention are in place, but we expect they will all be there within the next 12 months. The architecture has been explicitly designed to allow for them, we just need more engineers and engineering time:

  • Data can be easily retrieved from any device.
  • Data is backed up across dozens of hosts on many different continents.
  • Data can be recovered using only your wallet seed. You do not need to store the metadata (note: this feature is still a few months away. In the meantime, keep your metadata!)
  • Sia is decentralized such that only you are in control of the data. No corporation can deny you access to your files, no storage provider can spy on your files, you are not subject to weird terms of service, not subject to vendor lock-in, not subject to arbitrary changes in price.

5. I can buy tape for $5 / TB, what’s the value add for Sia?

Tape requires a lot of hardware, some expertise, can be difficult to work with, and has slow seek / random access times. It’s a completely different class of storage from what Sia currently offers.

That said, tape offerings are actually something that could be done over the Sia network. And I’m guessing that if tape were offered over the Sia network, you would see substantially lower prices for those offerings. Uploads to Sia would not even need to be all tape: you could do the first 1.25x redundancy as fast storage, and the second 1.25x redundancy on tape as contingency storage. Or you could fully optimize for price and just put it all on tape. And if you are doing that, Sia has all the same advantages as listed in the previous question.

6. Other clouds such as S3 are far more secure, why would I take a risk with Sia?

I don’t believe that you can cleanly assert that clouds such as S3 are more secure. It is true that they have dedicated experts looking over the security of the system, which is something Sia hosts do not have. But it’s also true that they are much juicier targets. When all of the data is on a single system, it becomes much more worthwhile for a dedicated hacker to target that system in depth. The iCloud leaks are a good example of this. A massive treasure trove of illicit photos was distributed resulting from a single vulnerability.

With Sia, data is spread across a large number of nodes. At scale, we’ll be talking about thousands of nodes that are mostly dedicated exclusively to hosting data on Sia. Each host will certainly be more vulnerable than a massive datacenter like iCloud, however the payoff for hacking one will be a lot smaller. Is it easier to hack a single Apple cloud, or hundreds of diverse, widely distributed nodes which are each individually weaker?

It ultimately depends on how much weaker each individual node is. I am guessing that each host will in fact be pretty secure on average, because they will be earning revenue from the machine, will have collateral on the machine, and will be motivated to protect what they own.

7. I have a fire-proof vault at home, a sophisticated backup scheme, with expert-grade security on my system, and tapes in another country that I use to back up my data. It’s very cheap. Why should I switch to Sia?

If you are doing the above, you are probably a security expert and probably well above the competence of our general target market. If you are a business, you are probably paying a team of people to manage your backups and such, so don’t forget to include their salaries in the total-cost-of-ownership. Plus the overheads of needing to deal with things like headhunting + retirement.

The truth is that, with enough expertise, you absolutely can beat the Sia offering. But that requires you to either be an expert, or to hire an expert. And Sia proposes to be cheaper than hiring the expert. If you are a normal person or a normal business, you have other things that you should be focused on and worrying about. Sia hopes to offer you a complete solution that completely obliterates the TCO of other solutions in the space. If you run the numbers, I expect you’ll find that our price ($1/TB/Mo) is hard to beat.

8. I don’t understand why Sia is a cheaper option.

Sia’s fundamental advantage is the marketplace. It’s a simple marketplace where the only thing that matters is efficiency. The cloud storage market today requires a substantial amount of trust, marketing, and business overhead. If you want to start a cloud storage company that is selling data out of your garage, you won’t be able to. And it won’t matter if your garage has superlative security, price, speed, etc. etc., because nobody will trust you. Garages aren’t known for being competent data storage facilities. Enterprises want datacenters, and they don’t just want any datacenter, they want the reliable one, the known one. They want legal paperwork and SLAs and a guarantee that someone can be sued if they screw up. There’s a saying “nobody ever got fired for choosing IBM.” That’s because IBM has a name, a reputation, a massive business arm, and they will take care of you in many ways that extend beyond your data.

All of that is overhead. It’s expensive. Names take a long time to establish, and once they are established they can be cashed in through increased prices. In today’s world, running a cloud storage business requires massive overhead.

Sia changes all of that. The fundamental improvement is that we make it safe to trust some random garage to hold your data. And there are a number of things that we need to do to make this reasonable and secure where it was not previously reasonable.

  • Sia uses wide redundancy. A 10-of-30 scheme is probabilistically far superior to a 1-of-3 or even a 1-of-6 redundancy scheme. You get massive reliability benefits with little overhead. With this redundancy scheme, we no longer need datacenters with 99.99999999% reliability. Getting that much reliability in a traditional setting is very expensive. Sia, however, works extremely well if hosts can reach just 95% reliability. This is something you can measure on your own; you don’t need to perform security audits to see that, of your 30 hosts, fewer than 1 is disappearing every day.
  • Sia uses strong client-side encryption. Not only is the data being distributed to 30 different hosts, but each data piece is being encrypted using a separate encryption key, and the encryption all happens before the data ever leaves the machine. Having your nude photos exposed to the public because of a single database hack is a thing of the past. Nobody can see your data, nobody can deduplicate your data, and we distribute the data such that the metadata can’t be used to identify specific files. Your data is kept private.
  • Sia uses blockchain contracts which require hosts to supply collateral. When you pay a host to store your data, that payment is conditional on the host actually storing the data. But Sia goes one step further and requires the host to also make a payment that they lose if they lose your data. The host can’t just walk away, if they do they will incur penalties far beyond merely losing your business or your revenue. They have a monetary stake in protecting your data. The renter makes the payment up front, and the host puts in collateral up front, and then the host gets all of the money if and only if they perform the storage proof after the designated amount of time (default 12 weeks) has passed.
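
The collateral incentive in the last bullet can be sketched as a toy model. This is only an illustration of the rule "the host gets all of the money if and only if the storage proof is provided"; the `FileContract` class, its field names, and the amounts are hypothetical, and the sketch deliberately does not model where the funds go when the proof is missed.

```python
from dataclasses import dataclass


@dataclass
class FileContract:
    """Toy model of a file contract (hypothetical names and units)."""
    renter_payment: int   # paid up front by the renter
    host_collateral: int  # locked up front by the host


def host_payout(contract: FileContract, proof_submitted: bool) -> int:
    """Funds released to the host when the contract ends."""
    if proof_submitted:
        # Host recovers its collateral and earns the renter's payment.
        return contract.renter_payment + contract.host_collateral
    # No valid storage proof: the host receives nothing.
    return 0


def host_net(contract: FileContract, proof_submitted: bool) -> int:
    """Host's gain or loss relative to before the contract was formed."""
    return host_payout(contract, proof_submitted) - contract.host_collateral


contract = FileContract(renter_payment=100, host_collateral=200)
print(host_net(contract, proof_submitted=True))   # host earns the payment
print(host_net(contract, proof_submitted=False))  # host loses its collateral
```

The asymmetry is the point: a host that walks away does not merely miss out on revenue, it ends up with less than it started with.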

So we’ve now made it reasonable to trust a random nobody with your data. Because you aren’t merely trusting them, you are trusting that out of 30 nobodies, at least 10 will still be there when you need the file. And you know that if any of the hosts drop out, the ones that drop out will be paying huge financial penalties. And as they do drop off, it’s easy for you to form new contracts and restore the lost redundancy.

This means that hosts and datacenters can sell storage over the Sia network without having a brand name. They don’t need to be a trusted or an audited entity because the specific requirements over their reliability are a lot lower. You don’t need legal SLAs because the blockchain file contracts handle that for you. And you don’t need a marketing arm or business development arm because you are able to easily announce yourself as a host and then compete over an open marketplace. Computers, not humans, are deciding who the best hosts are, and that means that it’s much easier to consider a large number of highly diverse offerings.

Finally, this open marketplace means that there is absolutely brutal competition. If you aren’t bringing something special to the table, you won’t be able to compete because the competitive hosts are all bringing something special to the table. And as the overall cost of storage gets cheaper, the marketplace automatically, instantly, and mercilessly follows the decline in price, because the new hosts that are able to offer storage at the lower prices will swallow the old hosts that are unable to keep up.

9. Why do you need your own blockchain? Why can’t you use something like the 21.co marketplace?

There are multiple reasons why a separate blockchain makes sense. But the biggest is that Sia’s file contract is very efficient and simple, and it can’t be replicated on the Bitcoin blockchain. Sia’s file contracts lock up money (including collateral from the host) for long periods of time, and release the money contingent on the host providing a proof-of-storage. I do not know of another platform with a similar model.

Sia’s file contract has a host prove that they are still storing a file. After the storage proof is provided, the funds are released. The file contract itself contains a Merkle root of the data. The host proves that they are still storing the full data by providing a single 64-byte segment selected randomly from the file, together with a Merkle proof that this 64-byte segment is in fact part of the file.

The random segment is selected by the blockchain, and is derived from the hash of the block immediately preceding the first block where the host is allowed to submit the storage proof. Bitcoin currently has no way for the script to specify the block hash, and even if it did converting the block hash into a random segment indexed from a finitely-sized file is going to be a complex script. And then performing verification of the Merkle proof is another thing that’s not very easily done in Bitcoin script. Maybe we could have done something in Rootstock, except that Sia exists and is usable today, and Rootstock is not.
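
The storage-proof mechanism described above can be sketched in a few lines. This is a simplified illustration, not Sia's actual implementation: the choice of BLAKE2b, the leaf/node prefix bytes, the duplication of the last node on odd levels, and the way the block hash is reduced to a segment index are all assumptions made for the example.

```python
import hashlib

SEGMENT_SIZE = 64  # bytes per segment, as described in the text


def h(data: bytes) -> bytes:
    # Assumed hash function for this sketch.
    return hashlib.blake2b(data, digest_size=32).digest()


def leaf_hashes(data: bytes) -> list[bytes]:
    """Hash each 64-byte segment of a non-empty file into a leaf."""
    return [h(b"\x00" + data[i:i + SEGMENT_SIZE])
            for i in range(0, len(data), SEGMENT_SIZE)]


def parent_level(level: list[bytes]) -> list[bytes]:
    if len(level) % 2:  # assumed rule: duplicate the last node on odd levels
        level = level + [level[-1]]
    return [h(b"\x01" + level[i] + level[i + 1]) for i in range(0, len(level), 2)]


def merkle_root(leaves: list[bytes]) -> bytes:
    level = leaves
    while len(level) > 1:
        level = parent_level(level)
    return level[0]


def merkle_proof(leaves: list[bytes], index: int) -> list[bytes]:
    """Collect the sibling hashes on the path from leaf `index` to the root."""
    proof, level = [], leaves
    while len(level) > 1:
        if len(level) % 2:
            level = level + [level[-1]]
        proof.append(level[index ^ 1])  # sibling of the current node
        level = parent_level(level)
        index //= 2
    return proof


def verify(root: bytes, segment: bytes, index: int, proof: list[bytes]) -> bool:
    """What the blockchain would check: recompute the root from one segment."""
    node = h(b"\x00" + segment)
    for sibling in proof:
        pair = node + sibling if index % 2 == 0 else sibling + node
        node = h(b"\x01" + pair)
        index //= 2
    return node == root


def challenge_index(block_hash: bytes, num_segments: int) -> int:
    # Assumed derivation: hash the block hash and reduce modulo segment count.
    return int.from_bytes(h(block_hash), "big") % num_segments


# Host side: answer the challenge for a small example file.
data = bytes(range(256)) * 3
leaves = leaf_hashes(data)
root = merkle_root(leaves)  # this value would live in the file contract
idx = challenge_index(b"hypothetical block hash", len(leaves))
segment = data[idx * SEGMENT_SIZE:(idx + 1) * SEGMENT_SIZE]
print(verify(root, segment, idx, merkle_proof(leaves, idx)))
```

Note that the verifier needs only the root, one segment, and a number of hashes logarithmic in the file size, which is why the proof stays small even for large files.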

But there are other issues as well. If we put Sia on the Bitcoin blockchain, we have to compete with the whole Bitcoin blockchain for space. Bitcoin fees today are something like $0.20 per transaction, and between the formation of 30 file contracts and the submission of 30 storage proofs for those file contracts (storage proofs being closer to 1 kB each for reasonably sized files) we’re talking $10-$20 just to get set up. Sia’s traffic is exclusive to Sia, which means that we don’t have to deal with those levels of fees until we’ve already bootstrapped the network.
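
As a rough check on that estimate, the arithmetic can be sketched as follows. The flat $0.20 fee and the 30 hosts come from the text; the `proof_fee_multiplier` parameter is a hypothetical knob standing in for the fact that larger storage-proof transactions would pay more than a typical transaction.

```python
FEE_PER_TX_CENTS = 20  # roughly $0.20 per Bitcoin transaction, per the text
HOSTS = 30             # one file contract and one storage proof per host


def setup_cost_cents(hosts: int, fee_cents: int, proof_fee_multiplier: int = 1) -> int:
    """Total fees for forming contracts and submitting proofs, in cents.

    `proof_fee_multiplier` is an assumption of this sketch: it scales the fee
    for storage proofs, which are larger than ordinary transactions.
    """
    contract_fees = hosts * fee_cents
    proof_fees = hosts * fee_cents * proof_fee_multiplier
    return contract_fees + proof_fees


flat = setup_cost_cents(HOSTS, FEE_PER_TX_CENTS)
doubled = setup_cost_cents(HOSTS, FEE_PER_TX_CENTS, proof_fee_multiplier=2)
print(f"flat fee: ${flat / 100:.2f}, proofs at 2x fee: ${doubled / 100:.2f}")
```

Both figures land inside the $10-$20 range quoted above.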

In general by using our own blockchain we were able to add a lot of customizations that fit our needs specifically. We don’t particularly subscribe to the idea that there can only be one true blockchain, and we think that the scalability is going to make more sense in the long run if you have different blockchains serving different needs. This of course comes with security tradeoffs, but those are discussed in question 1.

Decentralized cloud storage. Store your data securely in the cloud without the need to trust any central service. Download at: https://sia.tech/