Sunday, May 13, 2012

OpenStack Swift vs Gluster

As I am trying to get my head around OpenStack Swift storage, I need to compare it to something we already know. We have been using GlusterFS for years in our shop and are reasonably happy with it for data that does not require high-performance disk or high uptime. Gluster sounds like a simple solution, but its codebase has grown over the years and it has not been free of bugs. As of 2012 it is really quite stable.

Let's look at the 2 codebases:

git clone https://github.com/gluster/glusterfs.git
git clone https://github.com/openstack/swift.git

>du -h --summarize glusterfs/
44M     glusterfs/
>du -h --summarize swift/
15M     swift

Well, Gluster is about 3 times the size. Let's take a more detailed look at the code:

>cloc --by-file-by-lang glusterfs/


---------------------------------------------------
Language         files       comment           code
---------------------------------------------------
C                  272         14179         256462
C/C++ Header       214          5289          23208
XML                 24             2           6544
Python              25          1836           5114
m4                   3            85           1447
Bourne Shell        34           359           1419
Java                 7           168            988
make               107            36            965
yacc                 1            15            468
Lisp                 2            59            124
vim script           1            49             89
lex                  1            15             64
---------------------------------------------------
SUM:               691         22092         296892
---------------------------------------------------


>cloc --by-file-by-lang swift/


------------------------------------------------------
Language            files       comment           code
------------------------------------------------------
Python                101          6137          32575
CSS                     3            59            627
Bourne Shell            8           138            251
HTML                    2             0             82
Bourne Again Shell      3             0             23
------------------------------------------------------
SUM:                  117          6334          33558
------------------------------------------------------
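
The two tables can be compared with a few lines of Python. This is only a quick sanity check; the numbers are copied by hand from the cloc output above, and the variable names are my own.

```python
# Code line counts copied from the cloc tables above.
gluster_c_code = 256462      # glusterfs, C only
gluster_total_code = 296892  # glusterfs, all languages (SUM row)
swift_python_code = 32575    # swift, Python only
swift_total_code = 33558     # swift, all languages (SUM row)

# How many times larger is Gluster than Swift?
c_vs_python = gluster_c_code / swift_python_code
total_vs_total = gluster_total_code / swift_total_code

print(f"Gluster C vs Swift Python: {c_vs_python:.1f}x")
print(f"Gluster total vs Swift total: {total_vs_total:.1f}x")
```

Whichever pair of rows is used, the ratio comes out at roughly 8 to 9.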

Hm, Gluster has about 8 times more lines of C code (SLOC) than Swift has Python code. I'm not in a position to compare Python with C (other than stating that as of 2012 they seem to be similarly popular), but if we simply assume that the number of errors per line of code is similar, Swift may at some point have a stability advantage over Gluster. Gluster has been developed for many years and it took a long time to mature. Swift has only been around for 2 years and some really big shops seem to be betting on it. Of course this is somewhat of an apples-to-oranges comparison, because Gluster is accessible as a POSIX file system and as an object store and also has its own protocol stack (NFS/glusterfs), while Swift just uses HTTP. Also, performance considerations are not discussed here.
As a comparison, the Linux kernel has roughly 15 million lines of code and a tool like GNU make has about 33,000 lines of code. Make is not a very complex piece of software. Is OpenStack Swift?
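
To make the "same number of errors per line" assumption concrete, here is a minimal sketch. The defect density is a made-up placeholder, not a measured value for either project, and `expected_defects` is a helper name invented for this illustration.

```python
def expected_defects(sloc, defects_per_kloc):
    """Estimate a defect count from code size and an assumed defect density."""
    return sloc / 1000 * defects_per_kloc

# ASSUMPTION: hypothetical density, identical for C and Python. Not measured.
ASSUMED_DEFECTS_PER_KLOC = 1.0

gluster_defects = expected_defects(256462, ASSUMED_DEFECTS_PER_KLOC)
swift_defects = expected_defects(32575, ASSUMED_DEFECTS_PER_KLOC)

# With equal densities the defect ratio is just the size ratio.
# It is also the break-even point: Swift's density would have to be this
# many times higher than Gluster's before the estimates become equal.
break_even = gluster_defects / swift_defects

print(f"Gluster: ~{gluster_defects:.0f} defects, Swift: ~{swift_defects:.0f} defects")
print(f"Break-even density factor: {break_even:.2f}")
```

The absolute numbers mean nothing because the density is invented. The only point is that the conclusion depends entirely on how close the two real densities are.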





2 comments:

Dave Neary said...

I found the post when looking for "Gluster Swift" - I noticed a logical fallacy in your argument - "since Swift has less code than Gluster, then if they have the same number of defects per line..."

The issue is that you're assuming the C code in the Python interpreter has zero defects per line (so that the only defects in Python code come from the Python programmer). That *may* be true, but I don't think so.

You might ask how many lines of C code are required in CPython to interpret a single line of Python - in which case I think the Python program would actually be longer than the C program.

That assumes that the defect rate in the Python interpreter (which is well reviewed and tested, and widely used) is the same as in Gluster (which has probably had less peer review, and is certainly a younger project) - that would not seem to be a fair comparison either.

Suffice it to say that I do not believe that "defects per line of code" is a metric which can be easily applied to the comparison of C and Python code.

Dave.

pefu said...

Dave wrote:
The issue is that you're assuming the C code in the Python interpreter has zero defects per line (so that the only defects in Python code come from the Python programmer). That *may* be true, but I don't think so.

Well: Python 2.7 has been in feature freeze for some time now (new development happens in Python 3). So the number of remaining bugs in Python 2.7 is probably very low (compared to the remaining bugs in the C compiler, which in turn could also affect Python of course...)