SAND CDBMS SQL Reference Guide
Value Expressions

 

Previous Topic:
Functions
Chapter Index
Next Topic:
OLAP Functions

 

Functions


Aggregate Functions

An aggregate function (also known as a set or column function) is an operation applied to the values contained in a particular column (or in a derived column created by an expression), producing a single value that summarizes the (derived) column. The value expression argument cannot itself contain an aggregate function. As well, aggregate functions cannot contain subqueries.

SAND CDBMS supports the following aggregate functions:

AVG ( value-expression )
AVG ( ALL value-expression )
Return the average of the non-null values in the specified or derived column (numeric data only).

AVG ( DISTINCT value-expression )
Returns the average of the non-null values in the specified or derived column, when duplicates are omitted (numeric data only).

CENTMOM2 ( value-expression )
CENTMOM3 ( value-expression )
CENTMOM4 ( value-expression )

Return the rth moment about the mean, or central moment. That is, CENTMOM2 returns the 2nd moment, CENTMOM3 returns the 3rd moment, and CENTMOM4 returns the 4th moment.

The sample central moment is estimated by the following:

where:

Note that CENTMOM3 results are accurate for input values with up to 17 digits to the left of the decimal place; and CENTMOM4 results are accurate for input values with up to 13 digits to the left of the decimal place.

COUNT ( * )
Returns the number of rows in the table or virtual table.

COUNT ( value-expression )
COUNT ( ALL value-expression )

Return the number of non-null values in the specified or derived column.

COUNT ( DISTINCT column )
Returns the number of non-null values in the specified or derived column, when duplicates are omitted.

KURTOSIS ( value-expression )
Returns the degree of peakedness in the distribution given by a specified or derived column (numeric data only). Peakedness is defined here as a normalized form of the fourth central moment of the distribution. Kurtosis is estimated using the following equation:

where the ks are k-statistics.

Note that KURTOSIS results are accurate for input values with up to 13 digits to the left of the decimal place.

MAX ( value-expression )
Returns the maximum value in the specified column or derived column.

MEANDEV ( value-expression )
MEANDEV ( ALL value-expression )
Return the mean deviation, or the average of the absolute values of the differences between individual values in the specified or derived column and their mean (numeric data only).

MEDIAN ( value-expression )
MEDIAN ( ALL value-expression )
Return the middle value when all the values in the specified or derived column are rank ordered from lowest to highest (numeric data only).

MIN ( value-expression )
Returns the minimum value in the specified or derived column.

ONEOF ( value-expression )
Returns one random value per group from the specified, non-grouped column.

SKEWNESS ( value-expression )
Returns the degree of asymmetry in the distribution given by a specified or derived column (numeric data only). Skewness is estimated using the following equation:

where the ks are k-statistics.

Note that SKEWNESS results are accurate for input values with up to 17 digits to the left of the decimal place.

STDDEV ( value-expression )
STDDEV ( ALL value-expression )
Return the standard deviation, or the square root of the sum of the squared deviations from their average divided by one less than the number of values in the specified or derived column (numeric data only).

SUM ( value-expression )
SUM ( ALL value-expression )
Return the sum of the non-null values in the specified or derived column (numeric data only).

SUM ( DISTINCT value-expression )
Returns the sum of the non-null values in the specified or derived column, when duplicates are omitted (numeric data only).

SUMCUBE ( value-expression )
SUMCUBE ( ALL value-expression )

Return the sum of the cubes ( Σ n3 ) of the values in a specified or derived column (numeric data only).

SUMQUAD ( value-expression )
SUMQUAD ( ALL value-expression )

Return the sum of the values to the fourth power ( Σ n4 ) in a specified or derived column (numeric data only).

SUMSQ ( value-expression )
SUMSQ ( ALL value-expression )
Return the sum of the squares of the values in a specified or derived column (numeric data only).

VARIANCE ( value-expression )
VARIANCE ( ALL value-expression )
Return the sample variance, or the sum of the squared deviations from their average divided by one less than the number of values in the specified or derived column (numeric data only).


Aggregate functions may only be used in expressions appearing in the projection list of a SELECT statement or in HAVING clause predicates. Aggregate functions cannot be used in WHERE clause expressions, in the value list clause of an INSERT statement, or in the SET clause of an UPDATE.

Any direct column references (or expressions that contain column references) that appear in a query projection list along with aggregate functions must also be represented in a GROUP BY clause, either as a direct column reference, an alias, a position number from the projection list, or a full value expression. When a SELECT statement is structured in this way, the aggregate function is applied and produces a separate value for each distinct combination of values from the GROUP BY columns. If the query expression projection list includes one or more aggregate functions but no grouped columns, the whole table is treated as a group.

Consider the following employee table and a SELECT statement applied to it:

lname
fname
dept
payrate
Harris
Joe
1050
36100.00
Jacobi
Ron
1190
25000.00
Mann
Susan
1050
39800.00
Prince
Peter
1050
31045.00
Desai
Anita
1050
46000.00
Gilbert
Gilles
1190
21500.00

 

SELECT dept, COUNT(*), SUM(payrate), AVG(payrate), MAX(payrate), MIN(payrate)
FROM employee
GROUP BY dept;


This SELECT statement returns the distinct values in the dept (department) column in the employee table, along with the number of rows (employees) belonging to each department, and some aggregate payrate information for each department (the sum total of pay rates, the average pay rate, the highest pay rate, and the lowest pay rate):

dept
COUNT(*)
SUM(payrate)
AVG(payrate)
MAX(payrate)
MIN(payrate)
1050
4
152945.00
38236.2500000
46000.00
31045.00
1190
2
46500.00
23250.0000000
25000.00
21500.00

 

Aggregate functions can be part of an arithmetic operation. For example:

MAX(population) - MIN(population)


The next example illustrates the use of the ONEOF() function, which returns one random value per group from a non-grouped column. Given this table:

c1
c2
1
a
1
b
2
c
2
d
3
e
3
f

The following query

SELECT ONEOF(c2) FROM t1 GROUP BY c1;

will return three values, one from each group (1, 2, 3). From group 1, either "a" or "b" will be returned; from group 2, either "c" or "d" will be returned; and from group 3, either "e" or "f" will be returned. Since the value returned from each group is random, the possible result sets for the above query are:

{ a, c, e }
{ b, c, e }
{ a, d, e }
{ b, d, e }
{ a, c, f }
{ b, c, f }
{ a, d, f }
{ b, d, f }


Errors

The following examples demonstrate the usages of aggregate functions that return errors. The first fails because the aggregate function is used in a WHERE clause.

SELECT * FROM State WHERE SUM (Population) = 0;

The following statement fails because nested aggregate functions are not permitted.

SELECT SUM (SUM(Population)) FROM State;

The following statement fails because the column fname appears in the projection list with the COUNT aggregate function, yet is not included in a GROUP BY clause.

SELECT fname, COUNT(*) FROM white_house;

 

Previous Topic:
Functions
Chapter Index
Next Topic:
OLAP Functions