Since AWS Glacier launched there have been some interesting discussions and blog posts comparing costs between Glacier and local solutions such as OpenStack Swift. Glacier is hard to beat if you do TCO calculations that include everything: datacenter, power, cooling and staff. For many of us these costs vary a lot depending on things like being in a Fortune 500 company or a government-funded agency, or residing in a location with low power and cooling costs vs LA or NYC. Some of us even have the notion of sunk costs...
If we just look at the plain storage hardware, we know for example that we can get standard 36-drive Supermicro storage servers for less than $5k, and we have seen the latest and greatest 4TB Hitachi Deskstar for $239 on Google Shopping. The Hitachi Deskstar model seems to have an excellent reputation, and folks who know what they are doing recommend it as well (albeit the older 3TB version).
So we seem to be getting 144TB raw, which might roughly translate to 130TiB usable in this box, and it costs ($5000+36*$239)/130TiB = $105-$115/TiB depending on your sales tax... let's say $110/TiB. Swift needs 2-3 replicas, so with 3 replicas your actual costs would end up at $330/TiB, or $66/TiB/Y if we assume that the whole system will run for 5 years. That's not too bad compared to Glacier, which runs minimally at $120/TB/Y.
If Swift sounds compelling to you, you still have to operate and support it, but you can actually get tech support from a number of vendors such as www.swiftstack.com.
Amar here has another idea which I find intriguing. LTFS allows you to mount each tape (up to 3TB capacity each) as an individual folder on your Linux box. Just using LTFS is probably painful since you may have hundreds of small 3TB storage buckets... but if there was a way to use Swift with LTFS, this could possibly push storage costs down to under $20/TB/Y. I'd like to learn more about this.
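The arithmetic above is easy to replay with your own prices. This is only a rough sketch: the function names are made up for illustration, the $110/TiB figure is the rounded number from the paragraph above (it includes a guess at sales tax), and nothing here accounts for power, cooling, staff or Swift proxy nodes.

```python
# Back-of-the-envelope storage cost, using the prices quoted in the post.
# Function names are illustrative; adjust the inputs to your own quotes.

def cost_per_tib(chassis_usd, drives, drive_usd, usable_tib):
    """Hardware cost per usable TiB for one storage server (before tax)."""
    return (chassis_usd + drives * drive_usd) / usable_tib

def yearly_cost(per_tib_usd, replicas, years):
    """Cost per TiB per year once every object is stored `replicas` times."""
    return per_tib_usd * replicas / years

# 36 x 4TB drives = 144 TB raw, which is about 131 TiB before any overhead.
raw_tib = 36 * 4e12 / 2**40

before_tax = cost_per_tib(5000, 36, 239, 130)
print(f"raw capacity: {raw_tib:.0f} TiB")
print(f"hardware before tax: ${before_tax:.2f}/TiB")

# The post rounds to $110/TiB including tax and assumes a 5 year lifetime.
for replicas in (2, 3):
    print(f"{replicas} replicas: ${yearly_cost(110, replicas, 5):.0f}/TiB/Y")
```

With 3 replicas this reproduces the $66 per TiB per year figure; with 2 replicas it drops to $44.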
Wednesday, August 29, 2012
Sunday, May 13, 2012
OpenStack Swift vs Gluster
As I am trying to get my head around OpenStack Swift storage I need to compare this to something we already know. We have been using GlusterFS for years in our shop and are reasonably happy with it for data that does not require high performance disk and high uptime. Gluster sounds like a simple solution but its codebase has grown over the years and it has not been free of bugs. As of 2012 it is really quite stable.
Let's look at the 2 codebases:
git clone https://github.com/gluster/glusterfs.git
git clone https://github.com/openstack/swift.git
>du -h --summarize glusterfs/
44M glusterfs/
>du -h --summarize swift/
15M swift
Well, Gluster is 3 times the size. Let's take a more detailed look at the code:
>cloc --by-file-by-lang glusterfs/
---------------------------------------------------
Language                files    comment       code
---------------------------------------------------
C                         272      14179     256462
C/C++ Header              214       5289      23208
XML                        24          2       6544
Python                     25       1836       5114
m4                          3         85       1447
Bourne Shell               34        359       1419
Java                        7        168        988
make                      107         36        965
yacc                        1         15        468
Lisp                        2         59        124
vim script                  1         49         89
lex                         1         15         64
---------------------------------------------------
SUM:                      691      22092     296892
---------------------------------------------------
>cloc --by-file-by-lang swift/
---------------------------------------------------
Language                files    comment       code
---------------------------------------------------
Python                    101       6137      32575
CSS                         3         59        627
Bourne Shell                8        138        251
HTML                        2          0         82
Bourne Again Shell          3          0         23
---------------------------------------------------
SUM:                      117       6334      33558
---------------------------------------------------
Hm, Gluster has 8 times more lines of C code (SLOC) than Swift has Python code. I'm not in a position to compare Python with C (other than stating that as of 2012 they seem to be similarly popular), but if we simply assume that the number of errors per line of code is similar, Swift may at some point have a stability advantage over Gluster. Gluster has been developed for many years and it took a long time to mature. Swift has only been around for 2 years and some really big shops seem to be betting on it. Of course this is somewhat of an apples-to-oranges comparison, because Gluster is accessible as a POSIX file system and an object store and also has its own protocol stack (NFS/glusterfs), while Swift just uses HTTP. Also, performance considerations are not discussed here.
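To make the "same errors per line" assumption a bit more tangible, here is a tiny sketch using the cloc numbers from above. The defect density is a pure placeholder, not a measured value for either project, and the helper name is made up; only the ratio between the two results carries any meaning.

```python
# SLOC figures taken from the cloc output above.
GLUSTER_C_SLOC = 256462
SWIFT_PY_SLOC = 32575

# Placeholder only: assumed defects per 1000 lines, identical for both
# codebases. This is NOT a measured number for Gluster or Swift.
ASSUMED_DEFECTS_PER_KLOC = 1.0

def expected_defects(sloc, defects_per_kloc):
    """Naive defect estimate: proportional to the number of code lines."""
    return sloc / 1000 * defects_per_kloc

ratio = GLUSTER_C_SLOC / SWIFT_PY_SLOC
print(f"Gluster C vs Swift Python SLOC: {ratio:.1f}x")
print(f"Gluster: {expected_defects(GLUSTER_C_SLOC, ASSUMED_DEFECTS_PER_KLOC):.0f}")
print(f"Swift:   {expected_defects(SWIFT_PY_SLOC, ASSUMED_DEFECTS_PER_KLOC):.0f}")
```

Whatever density you plug in, the estimate for Gluster stays roughly 8 times the one for Swift, which is all the argument above relies on.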
As a comparison, the Linux kernel has roughly 15 million lines of code and a tool like GNU make has about 33000 lines of code. Make is not a very complex piece of software. Is OpenStack Swift?
Saturday, May 12, 2012
Starting to research OpenStack Swift
As we are always looking at lowering our storage costs while still trying to manage petabytes of storage, we have been hearing about "object storage" for a few years. This buzzword sounds a bit like a bad disease to a traditional Linux/Unix-heavy scientific computing shop. It sounds like something that could break in all sorts of ways and would have unbearable latency etc.
On the other hand we see almost every day that storage and other IT vendors are jumping on the object and cloud storage bandwagon. Is it all just cloud hype or is there something more to it? One platform that particularly stands out is OpenStack, after more than a dozen companies (AT&T, IBM, Red Hat, SUSE, Cisco, Dell, Canonical, etc.) have pledged to support the OpenStack Foundation. OpenStack was created by Rackspace and NASA (here is the story behind it) and the storage component Swift was originally developed at Rackspace. As we are most interested in storage, Swift is the thing we are looking at.
Now, is this really an OSS project with broad support and many contributors? Until today Rackspace appears to be doing most of the real work, but there is a fair number of other big names who are also contributing code.
We work quite a bit with Dell hardware and it is good to see that they have created a nice deployment solution called Crowbar that uses an OSS DevOps approach to push OpenStack to their servers. Their cloud dude seems to be a bit of an OpenStack enthusiast. But there are also a few startups that are betting on OpenStack Swift, such as SwiftStack.com, who sell you a customized Ubuntu image with a web management tool that lets you deploy a Swift storage cluster in a few minutes. The SwiftStack people are core contributors to the OpenStack Swift project, so they know the code base very well.
How about end user adoption in universities and other research places? The San Diego Supercomputer Center brought their OpenStack storage cloud online last year and is offering pretty reasonable pricing (about 1/3 of the price of S3).
Why are all these large companies joining OpenStack? Well, of course they all are way behind Amazon EC2/S3 and joining forces can either be seen as a good strategy or as a desperate attempt to catch up.
From a storage technology perspective there are maybe 3 reasons for this push that come to my mind. First, it takes a very long time to develop a storage platform. For BlueArc, 3PAR, Compellent, Isilon, etc. it took almost 10 years to convince many IT managers that those were viable options. HP and Dell needed to snap up one of those manufacturers to get the know-how. Second, customers are increasingly wary of vendor lock-in and lack of scalability, because big data capacity and especially performance needs are very unpredictable. And third, traditional storage techniques such as RAID will not be viable in the future, and alternatives (examples are GPFS and Panasas, but also 3PAR with its chunklet stuff) take a very long time to develop (again, see the first point).
But why does OpenStack seem to have more followers than CloudStack, Eucalyptus or others? It is extremely scalable but I could not (yet) find any strong hints that it is more scalable than other stacks.
From a developer and system integrator view the OpenStack trump card seems to be modularity which is important for keeping up development speed and for allowing a large community of developers to participate.
What strikes me from a systems management perspective is the simplicity of the underlying toolset. Every Unix admin is familiar with Python, SQLite, rsync and Linux/XFS. At first you might think: What, that's what they are using? After all, rsync is more than 15 years old and this is the tool that is supposed to help conquer the storage world in the 21st century?
Then you think: Oh, if our sysadmins ever have to do a root cause analysis on performance issues they already know rsync, and if they ever have to throttle the replication engine they already know what --bwlimit is. That does not sound too bad... but we will have to take a deeper look at this... to be continued.
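For readers who have not used it: --bwlimit is a plain rsync option that caps the transfer rate (by default in units of 1024 bytes per second). The sketch below only builds such a command line in Python. The helper name, paths and host are hypothetical, and this is not meant to show how Swift's replicator actually calls rsync or where it takes its settings from.

```python
import shlex

def rsync_command(src, dest, bwlimit_kbps=None):
    """Build an rsync argument list, optionally throttled with --bwlimit.

    Hypothetical helper for illustration only; it does not run anything.
    """
    args = ["rsync", "-a"]                 # -a: archive mode
    if bwlimit_kbps:
        args.append(f"--bwlimit={bwlimit_kbps}")
    args += [src, dest]
    return args

# Hypothetical paths and host, capped at roughly 5 MB/s.
cmd = rsync_command("/srv/node/sdb1/objects/", "storage2:/srv/node/sdb1/objects/", 5000)
print(shlex.join(cmd))
```

Passing the resulting list to subprocess.run would start the actual transfer; printing it first makes it easy to check what would be executed.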
Random Links & Blogs:
http://programmerthoughts.com/openstack/swift-tech-overview/
http://searchstorage.techtarget.com/news/2240105808/Caringo-CAStor-integrates-object-storage-with-OpenStack-Swift
http://www.slideshare.net/HuiCheng2/integrating-open-stack
http://www.buildcloudstorage.com/
http://www.cloudconnectevent.com/santaclara/2012/presentations/free/99-john-dickinson.pdf
http://www.buildcloudstorage.com/2012/01/can-openstack-swift-hit-amazon-s3-like.html
Consultants:
http://www.talkincloud.com/it-consultants-build-openstack-cloud-business-practices/
http://www.griddynamics.com/ or http://openstackgd.wordpress.com/