By paulicka on Jan 1, 2012 12:05 AM.
Sharding...
You mention in another post:
4) Partitioning/Sharding is setup by creating a new database through the interface. A database shard has its own consensus function therefore it is one way to scale out if the consensus function is a bottleneck. What's nice about MckoiDDB's shards is that you can freely share as much data as you want between shards with no consequences. For example, making a shard that becomes an independent branch of another working dataset is no problem and can be set up instantly regardless of the dataset size. Or if, for example, you need to move a very large index from one shard to another you only need to link to the data in your new shard which is a simple meta-data operation rather than copying the data byte for byte (uses exactly the same function to copy information I described in answer 2).

Could you elaborate a bit?
Do you think of a path as a shard, then?

I tend to think of writing a row to a table as appending to one of the blocks defining the table.
Sharding, then, is when multiple blocks may be associated with a table, so a row write may go to one of many blocks.

Are you suggesting writing a consensus function for a path that actually, for example, writes round-robin to a set of other paths?

"For example, making a shard that becomes an independent branch of another working dataset is no problem and can be set up instantly regardless of the dataset size."
Wow. I just reread that, and realized how powerful that could be.

Could you elaborate, perhaps even give an example? That seems extremely powerful if you could in effect expose the transaction layering for use by the end user.


By Tobias Downer (toby) on Jan 2, 2012 5:41 PM.
Just to clear up the terminology - in MckoiDDB a Path is a data partition and a data model (the consensus function). Sharding is the process of splitting data across multiple partitions, sacrificing consistency for better concurrency.

Sharding works a bit differently in MckoiDDB than in other databases. For example, let's say I have an application that runs on a single MySQL database server. The application becomes really popular and the MySQL server can't cope, so the administrator installs 3 more MySQL database servers and, during a maintenance and application upgrade window, divides her data into 4 parts and transitions the application to the new topology. This process is very complex and involves provisioning and testing new hardware, and if there's a lot of data it can take a long time just to physically copy it all over to the new servers.

In MckoiDDB, you are using a system that is distributed by default. However, partitions in the data are still an important consideration because consistency is still desirable. In MckoiDDB, data partitions are logical - you do not have to install a new MckoiDDB instance to make a new partition, you simply go into the console and create a new Path. The data and partition systems are separate, which means 'copying' data from one partition to another is a logical operation that changes meta data rather than physically copying data between servers. That makes the process very fast and provides a number of other advantages.
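
To make the "logical copy" idea concrete, here is a minimal sketch in Python. It is not MckoiDDB's API: `BlockStore`, `Partition`, `link_from` and the rest are hypothetical names invented for illustration, and the model assumes stored blocks are immutable once written. It only shows why linking can be a metadata operation when the data store and the partition metadata are kept separate.

```python
# Hypothetical model, NOT MckoiDDB's API: a shared store of immutable blocks
# plus per-partition metadata that only records which blocks belong to what.

class BlockStore:
    """Shared storage. Blocks are written once and never modified."""

    def __init__(self):
        self._blocks = {}
        self._next_id = 0

    def put(self, data: bytes) -> int:
        block_id = self._next_id
        self._blocks[block_id] = data
        self._next_id += 1
        return block_id

    def get(self, block_id: int) -> bytes:
        return self._blocks[block_id]

    def block_count(self) -> int:
        return len(self._blocks)


class Partition:
    """A logical partition: a name -> block-id list map over the shared store."""

    def __init__(self, store: BlockStore):
        self.store = store
        self.index = {}  # metadata only; no data bytes live here

    def write(self, name: str, data: bytes) -> None:
        # Appending stores one new block and records its id in the metadata.
        self.index.setdefault(name, []).append(self.store.put(data))

    def link_from(self, other: "Partition", name: str) -> None:
        # The "copy": duplicate the list of block ids, not the blocks themselves.
        self.index[name] = list(other.index[name])

    def read(self, name: str) -> bytes:
        return b"".join(self.store.get(b) for b in self.index[name])
```

In this model `link_from` copies one id per block rather than the block contents, so its cost depends on the amount of metadata and not on the number of bytes stored. After linking, each partition appends to its own id list, so later writes to one are invisible to the other.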

In MckoiDDB, when you need to partition your data you create the number of Paths you need, then update each with the data you want in it. If all you want to do is branch the data and create exact copies of the original set across the new partitions, you can perform such an operation in seconds regardless of the size of the data set because you only need to link each Path with the root node. Each Path from that point forward can then be changed independently of the others. Another really nice advantage of logical partitions is that it makes it practical to test your one-to-many partition migration without disrupting your online version, or you may be able to run the versions side-by-side and migrate gradually. It also makes it practical to implement automatic dynamic partitioning if that's what you want to do with your data model/application.
The text on this page is licensed under the Creative Commons Attribution 3.0 License. Java is a registered trademark of Oracle and/or its affiliates.
Mckoi is Copyright © 2000 - 2017 Diehl and Associates, Inc. All rights reserved.