SAND CDBMS MPP Columnar Database Architecture Background
The patented data architecture underlying SAND CDBMS realizes the initial vision of Ted Glaser, one of the leading technology thinkers of his generation. Arthur Ritchie, co-founder of SAND Technology, was responsible for developing Glaser’s concepts and bringing the resulting product to market. The future direction of SAND CDBMS is now in the hands of Richard Grondin, who has been responsible for developing the massively scalable features of SAND CDBMS.
MPP Data Scalability
To achieve near-linear data scalability to exabytes and beyond, SAND’s MPP architecture uses decoupled shared storage in conjunction with full distributed processing, with dynamic allocation of resources to prevent “hot spots”. SAND’s architecture is deployed using an elastic cloud which expands and contracts according to real-time requirements.
SAND’s Massively Parallel Processing is architected into every aspect of the SAND CDBMS product. SAND CDBMS data is stored in a shared location using the best available media, which can include tiered storage. SAND CDBMS is designed to support concurrent data load and query capability in parallel, with parallel load streams enabling near-linear scalability.
MPP User Scalability
SAND CDBMS delivers the most efficient Massively Parallel Processing for large communities of users. SAND’s Workload Manager utilizes all resources available to process user requests, and features a virtual execution mode delivering non-locking, non-blocking file operations.
MPP Analytic Scalability
SAND CDBMS uses a column-oriented data architecture to deliver easy and powerful analysis. Query processing involves scanning individual data fields instead of the entire record, eliminating the requirement to move unneeded fields in and out of memory. Each column is effectively fully indexed. SAND’s approach greatly enhances performance for OLAP queries and pure ad hoc analytics.
SAND tokenizes data for both storage and processing efficiency. Each row is augmented with a unique identifier, called a Tuple Identifier (TID), and in each column an Entity Identifier (EID) is assigned to each unique value. This approach dramatically improves performance and maintainability.
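The TID/EID idea can be illustrated with a small sketch. This is not SAND's implementation: the class name `TokenizedColumn` and its methods are hypothetical, and the layout is ordinary dictionary encoding with an inverted index. It shows why a tokenized column behaves as if it were fully indexed: each unique value is stored once, and a lookup goes from value to EID to the list of matching TIDs without scanning rows.

```python
from collections import defaultdict


class TokenizedColumn:
    """Toy column: each unique value gets an EID; each row is addressed by its TID."""

    def __init__(self):
        self.eid_of = {}                       # value -> EID
        self.values = []                       # EID -> value (the set of unique values)
        self.eids_by_tid = []                  # TID -> EID (list position is the TID)
        self.tids_by_eid = defaultdict(list)   # EID -> TIDs (inverted index)

    def append(self, value):
        """Store one value and return the TID of the new row."""
        eid = self.eid_of.get(value)
        if eid is None:                        # first time this value is seen
            eid = len(self.values)
            self.eid_of[value] = eid
            self.values.append(value)
        tid = len(self.eids_by_tid)
        self.eids_by_tid.append(eid)
        self.tids_by_eid[eid].append(tid)
        return tid

    def find(self, value):
        """Return the TIDs of rows equal to `value` without scanning the rows."""
        eid = self.eid_of.get(value)
        return [] if eid is None else list(self.tids_by_eid[eid])

    def get(self, tid):
        """Decode the value stored in row `tid`."""
        return self.values[self.eids_by_tid[tid]]


city = TokenizedColumn()
for name in ["Montreal", "Boston", "Montreal", "London", "Boston", "Montreal"]:
    city.append(name)

montreal_tids = city.find("Montreal")
```

Six rows are stored here, but only three distinct strings are kept; the rows themselves hold small integers.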
SAND Memory Manager
SAND’s patented memory manager is based on Knuth’s buddy block system, which allocates memory with minimal disk accesses. Designed for optimized metadata management using non-sequential accesses to find free memory, this architecture results in blistering performance. SAND CDBMS is organized as a virtual address space, working with frames designed to minimize memory usage and provide instantaneous access.
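For readers unfamiliar with the buddy system, the following is a minimal sketch of the textbook binary buddy allocator, not SAND's patented variant. The class name `BuddyAllocator`, the unit-based pool and the use of Python sets as free lists are all assumptions made for illustration. It shows the two core moves: splitting a larger free block into halves until the request fits, and merging a freed block with its "buddy" when both are free.

```python
class BuddyAllocator:
    """Toy binary buddy allocator over a pool of 2**max_order units."""

    def __init__(self, max_order):
        self.max_order = max_order
        # free_lists[k] holds the start offsets of free blocks of size 2**k
        self.free_lists = [set() for _ in range(max_order + 1)]
        self.free_lists[max_order].add(0)
        self.allocated = {}                    # offset -> order of the block

    def alloc(self, size):
        """Return the offset of a block of at least `size` units, or None."""
        order = max(size - 1, 0).bit_length()  # smallest k with 2**k >= size
        k = order
        while k <= self.max_order and not self.free_lists[k]:
            k += 1                             # look for a larger free block
        if k > self.max_order:
            return None                        # nothing large enough is free
        offset = self.free_lists[k].pop()
        while k > order:                       # split, freeing the upper half
            k -= 1
            self.free_lists[k].add(offset + (1 << k))
        self.allocated[offset] = order
        return offset

    def free(self, offset):
        """Release a block and merge it with its buddy while possible."""
        order = self.allocated.pop(offset)
        while order < self.max_order:
            buddy = offset ^ (1 << order)      # buddy differs in exactly one bit
            if buddy not in self.free_lists[order]:
                break
            self.free_lists[order].remove(buddy)
            offset = min(offset, buddy)
            order += 1
        self.free_lists[order].add(offset)


pool = BuddyAllocator(4)      # 16 units
a = pool.alloc(3)             # rounded up to a 4-unit block
b = pool.alloc(4)
pool.free(a)
pool.free(b)                  # buddies merge back into one 16-unit block
```

Because a block's buddy is found with a single XOR on its offset, finding and merging free space needs no sequential search through memory.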
Generation Based Concurrency Control
In SAND, concurrent transactions are performed as if each transaction were managing its own specific version of the database. The available public version represents any committed work, while the user’s private version represents active transactions. Because uncommitted changes stay in the private version, this model supports full ACID compliance and leaves no room for “dirty reads”. Generation Based Concurrency Control (GBCC) is both lockless and optimistic, providing the optimal model for supporting thousands of users.
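The public/private version model can be sketched as follows. This is a single-threaded toy under stated assumptions, not SAND's algorithm: the names `GenerationStore` and `Transaction` are hypothetical, each commit copies the whole snapshot, and a conflict is detected simply by comparing committed values. It shows the behavior described above: readers see only committed generations, writers work in a private buffer, and a commit is validated optimistically instead of taking locks.

```python
class GenerationStore:
    """Toy versioned key-value store; every commit creates a new generation."""

    def __init__(self):
        self.generations = [{}]      # generations[g] is the committed snapshot g

    @property
    def current(self):
        return len(self.generations) - 1

    def begin(self):
        return Transaction(self, self.current)

    def read(self, key, generation=None):
        """Read committed data, from the latest or from an older generation."""
        g = self.current if generation is None else generation
        return self.generations[g].get(key)


class Transaction:
    def __init__(self, store, base):
        self.store = store
        self.base = base             # generation this transaction started from
        self.writes = {}             # private, uncommitted changes

    def get(self, key):
        if key in self.writes:       # a transaction sees its own writes
            return self.writes[key]
        return self.store.read(key, self.base)

    def put(self, key, value):
        self.writes[key] = value

    def commit(self):
        """Publish the writes as a new generation.

        Fails if the committed value of any written key changed since `base`.
        """
        base_snap = self.store.generations[self.base]
        head_snap = self.store.generations[self.store.current]
        for key in self.writes:
            if base_snap.get(key) != head_snap.get(key):
                return False         # conflict: caller may retry on a fresh base
        new_snap = dict(head_snap)
        new_snap.update(self.writes)
        self.store.generations.append(new_snap)
        return True


store = GenerationStore()
setup = store.begin()
setup.put("balance", 100)
setup.commit()                               # generation 1

t2, t3 = store.begin(), store.begin()
t2.put("balance", 150)
seen_by_others = store.read("balance")       # still the committed value
t2_ok = t2.commit()                          # generation 2
t3.put("balance", 90)
t3_ok = t3.commit()                          # rejected: "balance" changed after t3 began
old_value = store.read("balance", generation=1)
```

Keeping earlier generations readable is also what makes it possible, in a design like this, to query the data as it stood at an earlier point.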
Domains (Tokenization)
SAND CDBMS is unique in its use of a column/domain concept of data organization. While the concept of “dictionaries” is used in other Columnar Database Management Systems, SAND uses page-level granularity rather than the column-level granularity found in other systems. In addition, SAND’s approach delivers self-encoding integer data for better performance and superior compression ratios.
Encoded Bit Vectors
SAND CDBMS uses an encoded bit vector concept (as distinguished from bit-strings) to achieve speed and space advantages. With SAND, internal CDBMS operations are executed directly on encoded bit vectors, with no decoding required. SAND’s unique inverted-index column architecture handles both database operations and text search operations in a uniform way. Together, these features ensure that SAND makes the most efficient use of resources to deliver optimal performance and scalability.
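To show what operating directly on an encoded bit vector can look like, here is a minimal sketch that uses run-length encoding as a stand-in. SAND's actual encoding is not described in this document, so the representation (a list of `[bit, run_length]` pairs) and the function names are assumptions. The point is that `rle_and` combines two vectors run by run and never expands them back into individual bits.

```python
def rle_encode(bits):
    """Encode a list of 0/1 values as [bit, run_length] pairs."""
    runs = []
    for b in bits:
        if runs and runs[-1][0] == b:
            runs[-1][1] += 1
        else:
            runs.append([b, 1])
    return runs


def rle_and(a, b):
    """AND two run-length-encoded vectors of equal length, one run at a time."""
    out = []
    i = j = 0
    left_a = a[0][1] if a else 0     # bits remaining in the current run of a
    left_b = b[0][1] if b else 0     # bits remaining in the current run of b
    while i < len(a) and j < len(b):
        step = min(left_a, left_b)   # overlap of the two current runs
        bit = a[i][0] & b[j][0]
        if out and out[-1][0] == bit:
            out[-1][1] += step       # extend the previous output run
        else:
            out.append([bit, step])
        left_a -= step
        left_b -= step
        if left_a == 0:
            i += 1
            left_a = a[i][1] if i < len(a) else 0
        if left_b == 0:
            j += 1
            left_b = b[j][1] if j < len(b) else 0
    return out


def rle_count(runs):
    """Count the set bits without decoding."""
    return sum(n for bit, n in runs if bit)


# One bit per row: which rows match each predicate (hypothetical data).
is_red = rle_encode([1, 1, 0, 0, 0, 1, 1, 1])
is_large = rle_encode([0, 1, 1, 0, 0, 0, 1, 1])
both = rle_and(is_red, is_large)
matches = rle_count(both)
```

The work done is proportional to the number of runs, not the number of rows, which is where the speed and space advantage of an encoded form comes from when the data has long runs.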
Centimal Math
SAND uses a centimal math package to ensure correct arithmetic operations on decimal data.
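The reason a dedicated decimal package matters can be seen with a short example. SAND's centimal math package is not described here, so Python's standard `decimal` module is used purely as a stand-in, and the price, quantity and tax rate are made-up values. Binary floating point cannot represent most decimal fractions exactly, whereas decimal arithmetic keeps them exact and rounds only where asked.

```python
from decimal import Decimal, ROUND_HALF_UP

# Binary floating point: 0.1 and 0.2 have no exact representation.
float_total = 0.1 + 0.2
float_is_exact = (float_total == 0.3)

# Decimal arithmetic: the same sum is exact.
decimal_total = Decimal("0.10") + Decimal("0.20")
decimal_is_exact = (decimal_total == Decimal("0.30"))

# A typical money calculation, rounded once, explicitly, to the cent.
price = Decimal("19.99")
quantity = 3
tax_rate = Decimal("0.0825")

subtotal = price * quantity
tax = (subtotal * tax_rate).quantize(Decimal("0.01"), rounding=ROUND_HALF_UP)
total = subtotal + tax
```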
TimeTravel
Users can connect to a version of the SAND CDBMS database as it existed at some point in the past, without restoring any files, and while other users continue to use the current version of the database. Users can ‘freeze’ their environments, in order to carry out long-running analyses unaffected by others’ updates, enabling both historical analysis and real-time updates.
Persistent Virtual Memory
SAND’s Persistent Virtual Memory architecture means SAND CDBMS operates with the speed of an in-memory database, without requiring the entire database to be resident in memory.
Log Management
SAND CDBMS can collect data feeds (for example, web logs, email logs, system logs, network logs, application logs, and call details) and store this data in a small footprint, with up to 98% disk space reduction, reducing storage costs and management time. SAND produces a self-defining file with no need to maintain an external structure. From this compressed file, the original file can be reproduced to feed any ETL tool, database or audit process.
SAND Data Transform ELT
SAND Data Transform ELT adds analytic data to historical data in real time, as the data is captured. Archival data can be enriched with demographic data: for example, IP demographic information may be added to web log data. SAND supports slowly changing dimensions, for example, the reassignment of IP addresses over time.
Slowly Changing Dimensions
SAND CDBMS supports all Slowly Changing Dimension (SCD) types, from SCD Type 1 through the hybrid SCD Type 6.
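As an illustration of one of these types, the sketch below models a Type 2 slowly changing dimension, using the reassignment of an IP address over time as the example. It is a generic in-memory sketch, not SAND syntax: the `IpOwnerRow` record, the helper functions, the owner names and the dates are all hypothetical. In Type 2, a change never overwrites history; the current row is closed and a new row is added.

```python
from dataclasses import dataclass
from datetime import date
from typing import Optional


@dataclass
class IpOwnerRow:
    ip: str
    owner: str
    valid_from: date
    valid_to: Optional[date] = None    # None marks the current row


def reassign(history, ip, new_owner, when):
    """Type 2 change: close the current row for `ip`, then append a new one."""
    for row in history:
        if row.ip == ip and row.valid_to is None:
            row.valid_to = when
    history.append(IpOwnerRow(ip, new_owner, when))


def owner_on(history, ip, day):
    """Return the owner of `ip` on `day`, or None if no row covers that day."""
    for row in history:
        if (row.ip == ip and row.valid_from <= day
                and (row.valid_to is None or day < row.valid_to)):
            return row.owner
    return None


history = []
reassign(history, "203.0.113.7", "Acme ISP", date(2010, 1, 1))
reassign(history, "203.0.113.7", "Example Hosting", date(2012, 6, 1))

owner_2011 = owner_on(history, "203.0.113.7", date(2011, 3, 15))
owner_2013 = owner_on(history, "203.0.113.7", date(2013, 1, 1))
```

A web log entry from 2011 is therefore matched with the owner that held the address at the time, not with today's owner.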
