Friday, January 4, 2013

Partial partitioning and sharding

I came across this: http://stackoverflow.com/questions/14136633/difference-between-partial-replication-and-sharding
I was wondering if sharding is an alternate name for partial replication or not. What I have figured out that --
  • Partial Repl. – each data item has only copies at some but not all of the nodes (‘Sharding’?)
  • Pure Partial Repl. – has only copies of a subset of the data item but no node contains a full copy of the database
  • Hybrid Partial Repl. – a set of nodes are full replicas and another set of nodes are partial replicas

I thought it was a good topic, I write a really nice answer, but there was a problem when I pressed "post this answer", probably some error or mistake on their side. Anyway - this is what I have to say about partial partitioning and sharding:

Partial replication is an interesting way, in which you distribute the data with replication from a master to slaves, each contains a portion of the data. Eventually you get an array of smaller DBs, read only, each contains a portion of the data. Reads can very well be distributed and parallelized. 

But what about the writes? 

Those are still clogged, in 1 big fat lazy master database, tasks as buffer management, locking, thread locks/semaphores, and recovery tasks - are the real bottleneck of the OLTP, they make writes impossible to scale... As I wrote in many previous posts, for example here: http://database-scalability.blogspot.com/2012/08/scale-up-partitioning-scale-out.html.

Sharding is where every piece of data lives only in one place, within an array of DBs. Each database is the complete owner of the data: data is read from there, data is written to there. This way, reads and writes are distributed and parallelized, real scale-out can be achieved. 

The idea behind sharding is great, the ultimate scaling solution (ask Facebook, Google, Twitter and all the other big guys) but it's a mess to handle, to maintain. It's hard as hell if done by yourself, ScaleBase enables an automatic transparent scale-out machine - that does all that, so you won't have too...



1 comment:

  1. Excellent article, Doron. This is certainly not one of the easiest tasks. Our company recently integrated DCIM software to help with our partioning and data center infrastructure management. I'd highly recommend taking a look at this great solution!

    - Jackie O.

    ReplyDelete