Friday, May 28, 2010

NDB ENGINE LIMITATIONS (rebuttal)

Anirudh Tamsekar made a post yesterday that laid out a few of the limitations of MySQL Cluster that seem to cause him the most pain. However his assessment of the situation is quite misleading. A few of his statements are inaccurate but more than half of the limitations he cites to are out right false. Since comments on Anirudh's blog are being moderated, I give my rebuttal here, and cite sources.

· Database names, table names, and attribute names cannot be as long in NDB tables as with other table handlers. In NDB, attribute names are truncated to 31 characters, and if they are not unique after truncation, errors occur. Database names and table names can total a maximum of 122 characters

False: "Identifiers. Formerly (in MySQL 5.0 and earlier), database names, table names and attribute names could not be as long for NDB tables as tables using other storage engines, because attribute names were truncated internally. In MySQL 5.1 and later, names of MySQL Cluster databases, tables, and table columns follow the same rules regarding length as they do for any other storage engine."
-- http://dev.mysql.com/doc/refman/5.1/en/mysql-cluster-limitations-resolved.html

· NDB does not support prefix indexes; only entire fields can be indexed.

True

· A big limitation is that in MySQL 4.1 and 5.0, all cluster table rows are of fixed length. This means, for example, that if a table has one or more VARCHAR fields containing only relatively small values, more memory and disk space will be required when using the NDB storage engine than would be for the same table and data using the MyISAM engine. This issue is on the “to-fix” list for MySQL Cluster 5.1.

False: As of 5.1 (which has been GA for over a year and a half) this limitation applies only to on-disk columns.

"* Variable-length column support. The NDBCLUSTER storage engine now supports variable-length column types for in-memory tables.
Previously, for example, any Cluster table having one or more VARCHAR fields which contained only relatively small values, much more memory and disk space were required when using the NDBCLUSTER storage engine than would have been the case for the same table and data using the MyISAM engine. In other words, in the case of a VARCHAR column, such a column required the same amount of storage as a CHAR column of the same size. In MySQL 5.1, this is no longer the case for in-memory tables, where storage requirements for variable-length column types such as VARCHAR and BINARY are comparable to those for these column types when used in MyISAM tables (see Section 10.5, “Data Type Storage Requirements”). "
-- http://dev.mysql.com/doc/refman/5.1/en/mysql-cluster-limitations-resolved.html

· In NDB, the maximum number of metadata objects is limited to 20,000, including database tables, system tables, indexes, and BLOBs (binary large objects). This is a hard-coded limit that you cannot override with a configuration option.

True: However the actual number is 20320.

· The maximum permitted size of any one row in NDB is 8KB, not including data stored in BLOB columns (which are actually stored in a separate table internally).

True: However, with version 7.2 we have increased the variable length part of a row to support up to 14KB, with the fixed portion still limited to 8KB

There is actually a constant you can modify at compile time to increase the max row length (as number of 4 byte words).

Current values are:
./storage/ndb/include/kernel/ndb_limits.h
#define MAX_TUPLE_SIZE_IN_WORDS 3500
./storage/ndb/include/ndbapi/ndbapi_limits.h
#define NDB_MAX_TUPLE_SIZE_IN_WORDS 3500
and
./src/ndbjtie/com/mysql/ndbjtie/ndbapi/NDBAPIConst.java
NDB_MAX_TUPLE_SIZE_IN_WORDS = 3500,

Changing these values is untested and unsupported so your mileage may vary.

· The maximum number of attributes per key in NDB is 32.

True:

· Autodiscovery of databases is not supported in NDB for multiple MySQL servers accessing the same cluster in MySQL Cluster. (You have to add each database manually on each SQL node.)

False: "Autodiscovery of databases is now supported for multiple MySQL servers accessing the same MySQL Cluster. Formerly, autodiscovery in MySQL Cluster 5.1 and MySQL Cluster NDB 6.x releases required that a given mysqld was already running and connected to the cluster at the time that the database was created on a different mysqld—in other words, when a mysqld process connected to the cluster after a database named db_name was created, it was necessary to issue a CREATE DATABASE db_name or CREATE SCHEMA db_name statement on the “new” MySQL server when it first accesseed that MySQL Cluster. Beginning with MySQL Cluster NDB 6.2.16 and MySQL Cluster NDB 6.3.18, such a CREATE statement is no longer required. (Bug#39612)

This also means that online schema changes in NDB tables are now possible. That is, the result of operations such as ALTER TABLE and CREATE INDEX performed on one SQL node in the cluster are now visible to the cluster's other SQL nodes without any additional action being taken."
-- http://dev.mysql.com/doc/refman/5.1/en/mysql-cluster-limitations-resolved.html

· MySQL replication does not work correctly in NDB if updates are done on multiple MySQL servers; replication between clusters is on the feature list for MySQL 5.1.

False: Again, 5.1 is GA over a year and a half. No need to treat this is a forward looking statement.

"* Replication with MySQL Cluster. It is now possible to use MySQL replication with Cluster databases. For details, see Section 17.6, “MySQL Cluster Replication”.

Circular Replication. Circular replication is also supported with MySQL Cluster, beginning with MySQL 5.1.18. See Section 17.6.10, “MySQL Cluster Replication: Multi-Master and Circular Replication”.

* auto_increment_increment and auto_increment_offset. The auto_increment_increment and auto_increment_offset server system variables are supported for Cluster replication beginning with MySQL 5.1.20, MySQL Cluster NDB 6.2.5, and MySQL Cluster 6.3.2."
-- http://dev.mysql.com/doc/refman/5.1/en/mysql-cluster-limitations-resolved.html

However, since the SQL node cannot know of the statements executed on another SQL node only row based replication is supported:
"Replication. Statement-based replication is not supported. Use --binlog-format=ROW (or --binlog-format=MIXED) when setting up cluster replication. See Section 17.6, “MySQL Cluster Replication”, for more information. "

· ALTER TABLE is not fully locking in NDB when you’re running multiple MySQL servers.

False: With the changes to support discovery of CREATE DATABASE in MySQL Cluster NDB 6.2.16 and MySQL Cluster NDB 6.3.18 a global schema lock was introduced which consistiently locks tables across all SQL nodes in the cluster during DDL operations

· All storage and management nodes within a cluster in NDB must have the same architecture. This restriction does not apply to machines simply running SQL nodes or any other clients that may be accessing the cluster.

Misleading: All nodes must be the same endian nature. However machines running "mysql" clients have no such restriction. SQL nodes themselves do have limitation of being the same endian type as the data nodes. i.e. A Linux x86 machine cannot be a mysqld (SQL node) front end to ndbd (DATA nodes) running Solaris Sparc.
-- http://dev.mysql.com/doc/refman/5.1/en/mysql-cluster-limitations-exclusive-to-cluster.html

· It is not possible to make online schema changes in NDB, such as those accomplished using ALTER TABLE or CREATE INDEX. (However, you can import or create a table that uses a different storage engine and then convert it to NDB by using ALTER TABLE tbl_name ENGINE=NDBCLUSTER;.) ALTER TABLE works on occasions, but all it does is create a new table with the new structure and then import the data. This generally causes an error as NDB hits a limit somewhere. It is strongly recommended that you not use ALTER TABLE to make online schema changes.

False: Again with the schema changes... See above.

· Adding or removing nodes online is not possible in NDB. (The cluster must be restarted in such cases.)

False: "In MySQL Cluster NDB 7.0 (beginning with MySQL Cluster NDB 6.4.0) and later MySQL Cluster release series, it is possible to add new data nodes to a running MySQL Cluster by performing a rolling restart, so that the cluster and the data stored in it remain available to applications."
-- http://dev.mysql.com/doc/refman/5.1/en/mysql-cluster-limitations-resolved.html

Removal of data nodes online is possible for nodes that were added online. Nodes that were in the cluster at initial system startup cannot be removed without a total shutdown and restore. However online removal is not thouroghly tested so it is not officially supported.

· The maximum number of storage nodes within an NDB cluster is 48.

True:

· The total maximum number of nodes in a cluster in MySQL Cluster is 63. This number includes all MySQL servers (that is, SQL nodes), storage nodes, and management servers.

False: "Starting with MySQL Cluster NDB 6.1.1, the total maximum number of nodes in a MySQL Cluster is 255, including all SQL nodes (MySQL Servers), API nodes (applications accessing the cluster other than MySQL servers), data nodes, and management servers. The total number of data nodes and management nodes beginning with this version is 63, of which up to 48 can be data nodes.

Note: The limitation that a data node cannot have a node ID greater than 49 continues to apply."
-- http://dev.mysql.com/doc/refman/5.1/en/mysql-cluster-limitations-resolved.html

7 comments:

hingo said...

Good rebuttal. But why do you refer to 5.1? For NDB, 5.1 was never a GA release, the numbering in GA releases jumped from 5.0.x to 6.2.5 in Sep 2007, so much more than one and a half year ago. (This release contains what you here reference as being in 5.1.) The MySQL 5.1 release date was rather irrelevant for MySQL Cluster.

K said...
This comment has been removed by the author.
K said...

Good to see your rebuttal here as Anirudh's post contains some seriously misleading limitation.

LinuxJedi said...

It should also probably be noted that Anirudh's text was plagiarised from the book "MySQL Clustering" by Alex Davies and Harrison Fisk. At the time they published this (2006) the text was true. A lot changes in 4 years.

I wonder if any copyright lawyers are reading.

Matthew Montgomery said...

Good point hingo, I just never felt comfortable suggesting using "GA" versions of cluster that were based upon non-GA 5.1 versions. The documentation does show some of these limitations as being resolved in 5.1 and not later 6.x/7.x releases.

Matthew Montgomery said...
This comment has been removed by the author.
Matthew Montgomery said...

** CORRECTION **

· ALTER TABLE is not fully locking in NDB when you’re running multiple MySQL servers.

True:

DML operations are permitted from other SQL nodes while ALTER TABLE is in progress. This is true weather an ONLINE or OFFLINE alter is used. Any issuance of LOCK TABLES is local only to the SQL node running that statement. However, With the changes to support discovery of CREATE DATABASE in MySQL Cluster NDB 6.2.16 and MySQL Cluster NDB 6.3.18 a global schema lock was introduced which locks tables across all SQL nodes against other DDL operations while any DDL are in progress in the cluster.