SAND CDBMS Administration Guide
Massively Parallel Processing (MPP)

 

Previous Topic:
Introduction
Chapter Index
Next Topic:
MPP Mode

 

Overview


The following are the main components in an MPP environment:

Partitioned Table
A special table created on the head node that does not contain data, but rather points via local linked tables to the remote tables (partitions) that store the partitioned data. From the perspective of querying, a partitioned table looks and behaves like any other table to the end user. Under the covers, a query on a partitioned table is redirected to each remote node for parallel execution, after which the results are collated and returned to the client transparently.

Partition
A remote table that stores partitioned data. Typically, multiple such tables are associated with a partitioned table. Each of these remote tables should contain a segment of the full data set, which has been partitioned among the remote tables according to a specific partitioning strategy (hash, range, or round robin).

Dimension Table
A special table created on the head node that is automatically replicated and maintained across the different nodes of the system. In data warehousing terms, when a star or snowflake schema is used, this table serves as a "dimension" table for a partitioned "fact" table.

Distributed Domain
A special domain created on the head node that is automatically replicated across the different nodes of the system. This type of domain is used in the field definitions of dimension tables.

Head Node
The main database/computer in the MPP network, where the partitioned table(s) and associated dimension table(s) are defined.

Remote/Partition Node
The remote computers/databases in the MPP network, where the data partitions (tables) associated with the head node's partitioned table(s) reside. If dimension tables are used, they should exist identically across all remote nodes associated with a partitioned table.

SAND Data Loader (ndlm)
The loader can be used to distribute data among the different partitions associated with a partitioned table. When the partitioned table is the target of the load operation, the loader will determine the partitioning strategy defined for the table, and then automatically distribute the data among the remote partitions according to this strategy. The partitioned table itself does not store any of the loaded data.
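
Every node holds a complete copy of each dimension table. In principle, then, a query that joins a partitioned fact table to a dimension table can be resolved on each node without fetching rows from other nodes. The head node then only merges the partial results. The Python sketch below models that idea for a grouped SUM. It illustrates the concept under stated assumptions and does not describe SAND's query planner. PRODUCT_DIM, FACT_PARTITIONS, local_star_join, and head_node_merge are hypothetical names, and in-memory lists stand in for the remote tables.

```python
from collections import Counter
from typing import Dict, List, Tuple

# Hypothetical dimension table (product_id -> category). Because dimension
# tables are replicated, every node is assumed to hold this complete copy.
PRODUCT_DIM: Dict[int, str] = {1: "hardware", 2: "software", 3: "hardware"}

# Hypothetical fact partitions: (product_id, sales_amount) rows, one list per node.
FACT_PARTITIONS: List[List[Tuple[int, int]]] = [
    [(1, 100), (2, 50)],
    [(3, 70), (2, 30)],
    [(1, 25)],
]


def local_star_join(fact_rows: List[Tuple[int, int]], dimension: Dict[int, str]) -> Counter:
    """Runs on one node: joins the local fact partition to the local dimension
    copy and computes a partial SUM(sales) grouped by category."""
    partial: Counter = Counter()
    for product_id, amount in fact_rows:
        partial[dimension[product_id]] += amount
    return partial


def head_node_merge(partials: List[Counter]) -> Dict[str, int]:
    """Runs on the head node: adds the per-node partial sums together."""
    total: Counter = Counter()
    for partial in partials:
        total.update(partial)  # Counter.update adds counts rather than replacing them
    return dict(total)


if __name__ == "__main__":
    partials = [local_star_join(p, PRODUCT_DIM) for p in FACT_PARTITIONS]
    print(head_node_merge(partials))  # {'hardware': 195, 'software': 80}
```

Merging partial sums works because SUM is decomposable. An aggregate such as a median could not be combined correctly from per-node results this way.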


The relationships between the head node and the remote (partition) nodes are illustrated below (Figure 5):

Figure 5: Partitioned Table and Partitions
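
A query on a partitioned table is redirected to every partition and the results are collated, which is a scatter-gather pattern. The Python sketch below shows that pattern under stated assumptions: in-memory lists stand in for the remote partitions, and threads stand in for the remote nodes. PARTITIONS, run_on_partition, and query_partitioned_table are hypothetical names. This guide does not describe how SAND actually dispatches query fragments or collates results.

```python
from concurrent.futures import ThreadPoolExecutor
from typing import Any, Callable, Dict, List

Row = Dict[str, Any]
Predicate = Callable[[Row], bool]

# Hypothetical stand-ins for three remote partitions of one partitioned table.
PARTITIONS: List[List[Row]] = [
    [{"id": 1, "amount": 10}, {"id": 4, "amount": 40}],
    [{"id": 2, "amount": 20}, {"id": 5, "amount": 50}],
    [{"id": 3, "amount": 30}],
]


def run_on_partition(partition: List[Row], predicate: Predicate) -> List[Row]:
    """The query fragment each remote node would execute against its own data."""
    return [row for row in partition if predicate(row)]


def query_partitioned_table(partitions: List[List[Row]], predicate: Predicate) -> List[Row]:
    """Scatter the fragment to every partition in parallel, then gather and collate."""
    with ThreadPoolExecutor(max_workers=max(1, len(partitions))) as pool:
        partial_results = list(pool.map(lambda p: run_on_partition(p, predicate), partitions))
    # Collate: merge the partial results into a single result set for the client.
    collated = [row for part in partial_results for row in part]
    return sorted(collated, key=lambda row: row["id"])


if __name__ == "__main__":
    result = query_partitioned_table(PARTITIONS, lambda row: row["amount"] > 15)
    print([row["id"] for row in result])  # [2, 3, 4, 5]
```

Sorting during collation is one design choice. Without an explicit ordering, a parallel system may return rows in whatever order the partitions finish.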


The way that data is partitioned and loaded into the remote tables is summarized in the example below (Figure 6):

Figure 6: Loading into a Partitioned Table
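
The Python sketch below models the three partitioning strategies named for partitions (hash, range, and round robin). It also models the routing step the loader performs when a partitioned table is the load target. This is a conceptual illustration, not the internals or options of ndlm. The crc32 hash, the range-boundary convention, and the names hash_strategy, range_strategy, round_robin_strategy, and load are all hypothetical. The returned lists stand in for the remote partition tables.

```python
import bisect
import itertools
import zlib
from typing import Callable, Dict, Iterable, List, Sequence

Row = Dict[str, object]
Strategy = Callable[[Row], int]  # maps a row to a partition index


def hash_strategy(column: str, num_partitions: int) -> Strategy:
    """Hash partitioning: rows with equal keys always land in the same partition."""
    def assign(row: Row) -> int:
        # crc32 is stable across processes, unlike Python's salted built-in hash().
        return zlib.crc32(str(row[column]).encode("utf-8")) % num_partitions
    return assign


def range_strategy(column: str, upper_bounds: Sequence[int]) -> Strategy:
    """Range partitioning: partition 0 holds keys < upper_bounds[0], partition i
    holds keys in [upper_bounds[i-1], upper_bounds[i]), and keys >= the last
    bound go to a final overflow partition (len(upper_bounds) + 1 in total)."""
    bounds = sorted(upper_bounds)
    def assign(row: Row) -> int:
        return bisect.bisect_right(bounds, row[column])
    return assign


def round_robin_strategy(num_partitions: int) -> Strategy:
    """Round-robin partitioning: ignores the data and cycles through partitions."""
    counter = itertools.cycle(range(num_partitions))
    def assign(_row: Row) -> int:
        return next(counter)
    return assign


def load(rows: Iterable[Row], strategy: Strategy, num_partitions: int) -> List[List[Row]]:
    """Routes every input row to exactly one partition. The head-node table
    itself keeps none of the rows; only the partition lists receive data."""
    partitions: List[List[Row]] = [[] for _ in range(num_partitions)]
    for row in rows:
        partitions[strategy(row)].append(row)
    return partitions


if __name__ == "__main__":
    rows = [{"order_id": i, "amount": 75 * i} for i in range(6)]  # amounts 0..375
    by_range = load(rows, range_strategy("amount", [100, 200]), 3)
    print([[r["order_id"] for r in p] for p in by_range])  # [[0, 1], [2], [3, 4, 5]]
    by_rr = load(rows, round_robin_strategy(3), 3)
    print([[r["order_id"] for r in p] for p in by_rr])  # [[0, 3], [1, 4], [2, 5]]
```

Hash and range partitioning keep rows with related keys together, which suits queries that filter or join on the partitioning column. Round robin ignores the data values and spreads rows evenly across the partitions.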

 
