Sunday, May 13, 2012

OpenStack Swift vs Gluster

As I am trying to get my head around OpenStack Swift storage I need to compare this to something we already know. We have been using GlusterFS for years in our shop and are reasonably happy with it for data that does not require high performance disk and high uptime. Gluster sounds like a simple solution but its codebase has grown over the years and it has not been free of bugs. As of 2012 it is really quite stable.

Let's look at the 2 codebases:

git clone https://github.com/gluster/glusterfs.git
git clone https://github.com/openstack/swift.git

>du -h --summarize glusterfs/
44M     glusterfs/
>du -h --summarize swift/
15M     swift

Well, gluster is 3 times the size, let's take a more detailed look at the code:

>cloc --by-file-by-lang glusterfs/


---------------------------------------------------
Language         files       comment           code
---------------------------------------------------
C                  272         14179         256462
C/C++ Header       214          5289          23208
XML                 24             2           6544
Python              25          1836           5114
m4                   3            85           1447
Bourne Shell        34           359           1419
Java                 7           168            988
make               107            36            965
yacc                 1            15            468
Lisp                 2            59            124
vim script           1            49             89
lex                  1            15             64
---------------------------------------------------
SUM:               691         22092         296892
---------------------------------------------------


>cloc --by-file-by-lang swift/


---------------------------------------------------
Language         files       comment           code
---------------------------------------------------
Python             101          6137          32575
CSS                  3            59            627
Bourne Shell         8           138            251
HTML                 2             0             82
Bourne Again Shell   3             0             23
---------------------------------------------------
SUM:               117          6334          33558
---------------------------------------------------

Hm, gluster has 8 times more lines of C code (SLOC) than swift has python code. I'm not in the position to compare python with C (other than stating that as of 2012 they seem to be similarly popular) but if we simply assumed that the numbers of errors per lines of code is similar swift may at some point have a stability advantage over gluster. Gluster has been developed for many years and it took a long time to come along. Swift is only been around for 2 years and some really big shops seem to be betting on it. Of course this is somewhat an apples to oranges comparison because Gluster is accessible as posix file system and object store and also has it's own protocol stack (NFS/glusterfs) while Swift just uses HTTP. Also performance considerations are not discussed here.
As a comparison, the Linux kernel has roughly 25 million lines of code and a tool like GNU make has about 33000 lines of code. Make is not a very complex piece of software. Is OpenStack swift?





Saturday, May 12, 2012

Starting to research OpenStack Swift

As we are always looking at lowering our storage costs while still trying to manage petabytes of storage we heard about "object storage" for a few years. This Buzzword sounds a bit like a bad disease to a traditional Linux/Unix heavy Scientific Computing shop. It sounds like something that could break in all sorts of ways and would have unbearable latency etc.


On the other hand we see almost every day that storage and other IT vendors are jumping on the object and cloud storage bandwagon. Is it all just cloud hype or is there something more to it? One platform that sticks out particularly is OpenStack after more than a dozen companies (AT&T, IBM, 
Red Hat, SUSE, Cisco, Dell, Canonical, etc) have pledged to support the OpenStack foundation. OpenStack was created by Rackspace and NASA (here is the story behind it) and the storage component Swift was originally developed at Rackspace. As we are most interested in storage, Swift is the thing we are looking at. 
Now, is this really a OSS project with broad support and many contributors? Until today Rackspace appears to be doing most of the real work, but there is a fair number of other big names who are also contributing code.


We work quite a bit with Dell hardware and it is nice to see that they have created a nice deployment solution called Crowbar that uses an OSS DevOps approach to push openstack to their servers. Their cloud dude seems to be a bit of an OpenStack enthusiast. But there are also a few startups that are betting on OpenStack Swift, such as SwiftStack.com who sells you a customized Ubuntu Image with a web management tool that lets you deploy a Swift storage cluster in a few minutes. The SwiftStack people are core contributors to the OpenStack swift project so they know the code base very well.
How about end user adoption in Universities and other research places? The San Diego Super Computing Center has brought their OpenStack storage cloud online last year and is offering pretty reasonable pricing (about 1/3 of the price of S3).
Why are all these large companies joining OpenStack? Well, of course they all are way behind Amazon EC2/S3 and joining forces can either be seen as a good strategy or as a desperate attempt to catch up. 

From a storage technology perspective there are may be 3 reasons for this push that come to my mind. First, it takes a very long time to develop a storage platform. For BlueArc, 3PAR, Compellent, Isilon, etc it took almost 10 years to convince many IT managers that those were viable options. HP and Dell needed to suck up one of those manufacturers to get the know how. Second, customers are increasing vary of vendor lock in and lack of scalability because big data capacity and especially performance needs are very  unpredictable. And third, traditional storage techniques such as RAID will not be viable
in the future and alternatives (examples are gpfs, panassas but also 3PAR with it's chunklet stuff) take a very long time to develop (again, see first point).

But why does OpenStack seem to have more followers than CloudStack, Eucalyptus or others? It is extremely scalable but I could not (yet) find any strong hints that it is more scalable than other stacks.
From a developer and system integrator view the OpenStack trump card seems to be modularity which is important for keeping up development speed and for allowing a large community of developers to participate. 

What strikes me from a systems management perspective is the simplicity of the underlying toolset. Every Unix admin is familiar with Python, Sqlite, Rsync and Linux/XFS. At first you might think: What, that's what they are using? After all, rsync is more than 15 years old and this is the tool that is supposed to help conquering the storage world in the 21st century?
Then you think: Oh if our sysadmins ever have to do a root cause analysis on performance issues they already know rsync and if they ever have to throttle the replication engine they already know what --bwlimit is. That does not sound too bad....but we will have to take a deeper look at this ..... to be continued.




Random Links & Blogs:
http://programmerthoughts.com/openstack/swift-tech-overview/
http://searchstorage.techtarget.com/news/2240105808/Caringo-CAStor-integrates-object-storage-with-OpenStack-Swift
http://www.slideshare.net/HuiCheng2/integrating-open-stack
http://www.buildcloudstorage.com/
http://www.cloudconnectevent.com/santaclara/2012/presentations/free/99-john-dickinson.pdf
http://www.buildcloudstorage.com/2012/01/can-openstack-swift-hit-amazon-s3-like.html

Consultants:
http://www.talkincloud.com/it-consultants-build-openstack-cloud-business-practices/
http://www.griddynamics.com/ or http://openstackgd.wordpress.com/