Tuesday, June 5, 2012

Dynarr256 for DBACC -or- The death of MAX_ROWS

Back in 2006 we became aware of problems storing large numbers of rows in a single table in cluster. Johan Andersson and Yves Trudeau have each blogged about the problem and the common workaround here and here.  We've since then done some cleanup to provide a more proper "Table is full" error message when running into this problem.

As explained in the referenced blog posts, the problem is the result of a limitation on the size of the hash index of each partition. The hash index for each partition would allow at most ~49 million records. By default an ndbd or ndbmtd node have only 1 local query handler (LQH) block and thus 1 partition per node.  The ndbmtd nodes having MaxNoOfExecutionThreads = 4 or 8 cluster will have 2 or 4 LQH per node respectively. So, the point at which you may encounter the "Table is full" error will vary depending upon the size of your configuration.  The recommended workaround was to instruct the NDBCLUSTER engine to create more partitions than the default by adding a MAX_ROWS clause to your ALTER or CREATE TABLE statements. However, the maximum accepted value for MAX_ROWS is limited to 4294967295 by mysqld. This limit causes the maximum number of partitions allowed by this hint to be at or around 88.

mysql> create table t1 (a bigint unsigned primary key ) engine=NDBCLUSTER MAX_ROWS=900000000000;
Query OK, 0 rows affected, 1 warning (1.06 sec)


mysql> show warnings\G
*************************** 1. row ***************************
Level: Warning
Code: 1105
Message: Ndb might have problems storing the max amount of rows specified
1 row in set (0.00 sec)

mysql> show create table t1\G
*************************** 1. row ***************************
Table: t1
Create Table: CREATE TABLE `t1` (
`a` bigint(20) unsigned NOT NULL,
PRIMARY KEY (`a`)
) ENGINE=ndbcluster DEFAULT CHARSET=latin1 MAX_ROWS=4294967295
1 row in set (0.00 sec)


Now, it is a little know fact that there are a total of 8 partitions allowed in each LQH. These 8 slots are divided by NoOfReplicas to give the number of "primary" partitions that can be defined. The maximum number of partitions that could be defined for a given table is calculated as:

#Nodes * #LQH per node * 8 / NoOfReplicas

In a two node cluster, with NoOfReplicas=2 and MaxNoOfExecutionThreads=4 (2 LQH per node), this MAX_ROWS hint would results in a total of 16 partitions allowing 784 million rows to be inserted.  But in a larger six node cluster, for example, with NoOfReplicas=2 and MaxNoOfExecutionThreads=8 (4 LQH per node) cluster should allow a maximum of 6*4*8/2 = 96 partitions, or ~4.7Bn rows total.  But with the MAX_ROWS hint cluster would only create 88 partitions, allowing ~4.3Bn rows per table.

In those rare case where the number of partitions created by the MAX_ROWS hint is not sufficient for the number of  rows you will need to add, you can calculate the maximum number of partitions capable for your configuration and explicitly specify that number using:

ALTER|CREATE TABLE... PARTITION BY KEY () PARTITIONS N;

The empty "BY KEY ()" definition defaults to using the primary key as the partitioning key. The trade off to this method is that when you scale up the cluster you could no longer use ALTER... REORGANIZE PARTITION to redistribute rows onto those newly added nodes in an online fashion.  The MAX_ROWS method adds no such limitation to REORGANIZE PARTITION.

*** CORRECTION ***

"The reports of my death are greatly exaggerated" -- Maximus R. Ows

Now, in version 7.2.6 there is no longer any limitation on the number of rows that may be created in a given partition and for most use cases it should no longer be necessary to use the MAX_ROWS hint or perform explicit key based partitioning to avoid the "Table is full" errors.  Due to a bug discovered at the last minute the fix to remove the hash index size limitation which was intended for 7.2.6 was not actually pushed.  :'-(  http://forums.mysql.com/read.php?25,544882,544882#msg-544882
 
There is still one limitation left which may still require these MAX_ROWS / PARTITION BY KEY workarounds. The maximum size of the fixed-length portion of any partition is maxed at 16 GB.  This limitation is being addressed and will be removed in a future version.

Also at version 7.2.5 the upper limit of MaxNoOfExecutionThreads was raised from 8 to 44, and the number of allowed LQH was raised from 4 to 16 per node.  See also:  http://dev.mysql.com/doc/refman/5.5/en/mysql-cluster-ndbd-definition.html#ndbparam-ndbmtd-threadconfig for how to configure greater than 4 LQH.  The documentation for this change is still a bit incomplete at the moment but I am working to get that cleared up.

No comments: