SAND CDBMS SQL Reference Guide
CREATE PARTITION TABLE

 

The CREATE PARTITION TABLE command defines a partitioned table in the database. This special type of table, created on the head node, does not itself contain data, but rather points to related partitions of data on other nodes. It is essentially an updatable view of corresponding tables in multiple databases across a network.


Required Privileges

The user authorization must own the schema where the new partitioned table will be created, possess OWNER privileges on the schema, or possess DBA privileges. 

For each linked table referenced in the partition list, the user authorization must also have at least one of the following:

If a domain is referenced in the CREATE PARTITION TABLE statement, the user authorization must own the domain, own the schema that contains the domain, possess USAGE privileges on the domain, or possess DBA privileges. 


Syntax


partition table name

The partition table name argument is an identifier that names the table uniquely within the schema: it cannot match any other table or view name within the current schema. Table names can be up to 128 characters long. It is strongly recommended that none of the SAND CDBMS SQL keywords be used as a partition table name, as this may cause problems when referencing the table in certain SQL statements.

The table may be qualified by the schema to which it will belong: prefix the partitioned table name with the name of the appropriate schema followed by a period ( . ), that is, schema-name.partition-table-name.

column name 1
Each column name 1 argument in this list is an SQL identifier (up to 128 characters long) specifying a column in the partitioned table. The column names for partitioned tables must satisfy the same criteria as base table column names.

The number of column names must correspond to the number of columns in the linked tables referenced in the partition list. If all of the linked tables have the same corresponding column names, then the column name 1 list is optional, as the linked table column names will be used for the partitioned table if the column name 1 list is omitted. On the other hand, if any of the referenced linked tables have corresponding columns with different names, the column name 1 list is mandatory.

PARTITION BY
The PARTITION BY clause, which specifies the method used to partition the data (hash, round robin, or range), is optional. If the PARTITION BY clause is omitted from the CREATE PARTITION TABLE statement, round robin is the default partitioning method.

<hash/round robin options>
The hash and round robin options (see below) specify that the data is partitioned according to hashing or round robin algorithms, respectively.

<range option >
The range option (see below) specifies that the data is partitioned according to value ranges.

table name
The table name argument is a linked table in the local database that points to a remote table that will act as a partition for the partitioned table being created.

The linked table may be qualified by the schema to which it belongs, by prefixing the linked table name with the name of the appropriate schema followed by a period ( . ), that is, schema-name.linked-table-name.

column name 2
Each column name 2 argument in this list is an identifier that names the column uniquely within the partitioned table. Column names can be up to 128 characters long. It is strongly recommended that none of the SAND CDBMS SQL keywords be used as a column name, as this may cause problems when referencing the column in certain SQL statements.

data type
The data type argument identifies a valid SAND CDBMS data type for the column. SAND CDBMS data types are described in the Data Types section.

domain name
The domain name argument identifies the domain in which the column values will be stored. The domain must already have been created with the CREATE DOMAIN statement. If the domain is not in the current schema, prefix the domain name with its schema name, separated by a period (schema-name.domain-name).


Hash/Round Robin Options


ROUND ROBIN

The round robin partitioning option is the default, if the PARTITION BY clause is not included in the CREATE PARTITION TABLE statement.

key field
Each key field argument in the list specifies a non-repeated column in the partitioned table. Taken together, the key field list represents the keys on which the hashing is performed.


Range Option


key field
The key field argument specifies the column in the partitioned table whose values are used to partition the data by range.

table name
The table name argument is a linked table in the local database that points to a remote table that will act as a partition for the partitioned table being created.

The linked table may be qualified by the schema to which it belongs, by prefixing the linked table name with the name of the appropriate schema followed by a period ( . ), that is, schema-name.table-name.

VALUES LESS THAN 'value'
This argument, which comes at the end of each PARTITION definition except the last, specifies the upper boundary (value) for values that go into the partition.
Each specified value must be enclosed by single quotation marks.

The last partition in this list must not include a VALUES LESS THAN 'value' specification. This partition is for all values above and including the 'value' boundary of the previously defined partition in the list (or else, if there is a single partition in the list, this partition contains all values).

 


Description of CREATE PARTITION TABLE Actions


The CREATE PARTITION TABLE command is used to create a partitioned table on the local (head) node. This type of table does not contain data itself; instead it acts as an updatable view for compatible tables in other databases (remote nodes) across the network. This allows a data set to be partitioned among multiple nodes to take advantage of Massively Parallel Processing (MPP) for better performance. A query on the partitioned table transparently redirects the query to each partition node for parallel execution, after which the results are collated and returned to the client. From the end user's perspective, a partitioned table looks and acts like any other table in the database.

There are three main sections in the CREATE PARTITION TABLE command syntax:

  1. the primary command and object name specification: CREATE PARTITION TABLE partition-table-name
  2. the partitioning method specification, optionally preceded by the partitioned table column names: [ (column name list) ] PARTITION BY <partitioning method>
  3. the partitioned table structure:
    1. the partition tables list: ( {PARTITION table-name}... ) or ( PARTITION table-name {VALUES LESS THAN 'value' PARTITION table-name}...)
      - or -
    2. the partitioned table column definitions: ( column-name {datatype|domain} {, column-name {datatype|domain} }... )

The first section of the command is self-explanatory.

The second section specifies the partitioning method used for this partitioned table. There are three partitioning options (hash, round robin, range). The default partitioning option is round robin, so this section can be omitted if round robin is the desired option. If the remote partitions will be pre-loaded, the partitioning method used must correspond to the one selected in the CREATE PARTITION TABLE statement. After the partitioned table is created, subsequent load operations on this table using the SAND Data Loader will automatically distribute the data among the remote partitions according to the partitioning method specified during table creation.

The comma-separated column name list that precedes the partitioning method is required only if the remote partition tables specified in the section that follows have different corresponding column names. If all the partition tables are identical, the column name list can still be used to customize the column names in the partitioned table. Otherwise, if the column name list is omitted, the partitioned table will inherit the column names from the remote partition tables.

The last section of the CREATE PARTITION TABLE command defines the partitioned table structure. There are two different ways to define the partitioned table here:

With the partition list method, the remote table partitions must already exist, as they are associated with the new partitioned table via a list of linked tables pointing to the remote tables.

With the column definition method, the remote table partitions are created automatically on the fly and associated with the new partitioned table. All of the remote tables will share the same name as the partitioned table, and a linked table named "<partitioned table>_<connection object>" will be generated on the head node for each of these remote tables. The remote tables are initially empty, but are intended to be populated using an INSERT...SELECT statement that targets the new partitioned table for insertion and selects from an existing partitioned table and dimension tables. The INSERT...SELECT statement will actually be applied separately on each node where a remote table resides, so it will be loaded with selected data specific to that node, joined with dimension data common to all nodes.

Note that the CREATE TABLE...SELECT command in MPP mode can also be used to create a new partitioned table from an existing partitioned table and dimensions, in effect combining in one command the CREATE PARTITION TABLE and INSERT...SELECT statements described above. Refer to the CREATE TABLE section of the SQL documentation for more information.


Examples

Assume that schema s1 in database db1 (the head node) contains linked tables tpart2, tpart3, and tpart4 that all point to remote tables with the same following structure:

Column Name Data Type
c1 INTEGER
c2 NUMERIC(8,3)
c3 CHAR(12)
c4 CHAR(6)

Linked table tpart2 points to a table in database db2, tpart3 points to a table in database db3, and tpart4 points to a table in database db4.

The following command creates a partitioned table called part1 in schema s1 of the head node:

CREATE PARTITION TABLE s1.part1
  PARTITION BY HASH (C1, C2)
  (PARTITION s1.tpart2 PARTITION s1.tpart3 PARTITION s1.tpart4);

Data is partitioned among the remote tables according to a hashing algorithm that uses fields c1 and c2 as input.. If an import operation is performed on partitioned table part1, the loader will automatically hash partition the data and load it into the remote tables pointed at by tpart2, tpart3, and tpart4.


Alternatively, if the round robin partitioning scheme is desired, it can be set implicitly by omitting the PARTITION BY clause altogether, since round robin is the default. For example:

CREATE PARTITION TABLE s1.part1
  (PARTITION s1.tpart2 PARTITION s1.tpart3 PARTITION s1.tpart4);


If the partitioning method is changed to range, acting on the first column's values, the CREATE PARTITION TABLE command could be the following:

CREATE PARTITION TABLE s1.part1
  PARTITION BY RANGE (C1)
  (PARTITION s1.tpart2 VALUES LESS THAN '1000'
   PARTITION s1.tpart3 VALUES LESS THAN '10000'
   PARTITION s1.tpart4);

The command above indicates that partitioning of the records are done according to the value of column c1:

Note that the final PARTITION specification in the list does not include the "VALUES LESS THAN" syntax. This is because the last range partition is the catch-all partition for the remaining values. There cannot be an upper boundary for this partition.


If linked table tpart4 had pointed at a remote table with different column names from the table structure shown above, explicit column names would have to be specified for the partitioned table. For example, say that the remote table referenced by tpart4 had this structure:

Column Name Data Type
col1 INTEGER
col2 NUMERIC(8,3)
col3 CHAR(12)
col4 CHAR(6)

The original CREATE PARTITION TABLE command could be modified as follows to include explicit column names for the partitioned table:

CREATE PARTITION TABLE s1.part1 (alpha, beta, gamma, delta)
  PARTITION BY HASH (alpha, beta)
  (PARTITION s1.tpart2 PARTITION s1.tpart3 PARTITION s1.tpart4);

In this case, the partitioned table would have column names alpha, beta, gamma, and delta, regardless of the remote table column names. Note that any key(s) specified in the PARTITION BY clause must use the explicit column names, if present, rather than the remote table column names.


The following example uses the alternative syntax for creating a partitioned table, where column definitions are specified instead of a list of linked tables. In addition to partitioned table s1.part1 created in an example above, assume there are two dimension tables, d1 and d2. The user authorization u1 has partitioned table part1 set as default, so the user will always start a session in MPP mode in the context of this table. In MPP mode on the head node, user u1 issues the following command:

CREATE PARTITION TABLE s1.part2
  PARTITION BY ROUND ROBIN 
  (sn CHAR(12), sname CHAR(6), pname CHAR(20), qty INT);

This command will create a table s1.part2 with the structure defined above on every node associated with partitioned table part1. Once those remote tables are created, corresponding linked tables will be generated on the head node, and partitioned table part2 will be created with those linked tables (remote tables) as its partitions. Since the remote tables are empty at this point, the next step is to populate those tables with data, in this case from a join of partitioned table part1 and dimension tables d1 and d2, using the following INSERT...SELECT command on the head node:

INSERT INTO s1.part2 (sn, sname, pname, qty)
  SELECT part1.c3, d2.c4, d1.c4, d2.c5
  FROM s1.part1, s1.d1, s1.d2
  WHERE part1.c3 = d1.c3 AND d2.c4 = part1.c4;

The above command will actually be executed independently on each of part2's partition nodes. While the dimension data of d1 and d2 is the same on all the nodes, part1's partitioned data should vary as per the partitioning strategy. Since the data inserted into the remote tables of part2 will reflect the partitioning of part1 at each node, it is recommended to use either the same partitioning strategy as the data source partitioned table (part1 in this case) or else round robin when creating a new partitioned table with the column definition syntax.