SAND CDBMS Tools Reference Guide
Data Loader (ndlm)

 

Previous Topic:
Import Examples
Chapter Index
Next Topic:
Importing Data from an SCT File

 

Verifying and Filtering Data Using the PIC Clause


While much of its functionality has been superseded by the NDL++ scripting language, the special PIC clause is available for inclusion in the RECORD section of the ndlm specification file to specify verification and filtering operations on the contents of a field. The PIC clause is included immediately after the field-mapping declaration, and has the following syntax:

PIC [ ~ ] "< formatting-string >"


The formatting string contains special characters (described below) that are used to check for, delete, or add characters to the input field value. It must be delimited by double quotation marks as illustrated above (single quotes are used to denote a pattern within the formatting string). The formatting string has a maximum length of 256 characters.

The formatting-string may optionally be preceded by a tilde ( ~ ), which indicates that the input field length should be truncated to the number of characters in the formatting string. Truncation is applied to the right-hand side of a field. Thus, for example, if a field that is five characters in length is truncated to four characters, it will retain the four leftmost characters of the input value.

An alternate PIC specification can also be defined, to specify filtering operations to be undertaken in the event that the first has no effect. The alternate PIC formatting string is separated from the primary specification by a semicolon ( ; ), and both are enclosed within a single set of double quotation marks, as follows:

PIC [ ~ ] " <formatting-string >; <formatting-string > "


In this case, if the first filtering specification is inapplicable to the field for which it is declared, the alternate specification will be applied.


The Formatting String

The formatting string operates in terms of a character's position in a field, reading from left to right. That is, the first character in the formatting string is associated with the leftmost character in the field, and all subsequent characters follow going from left to right. Any spaces present within the formatting string are ignored: this facilitates the writing of more legible scripts by allowing "white space" to be included between elements. Comments cannot be included in the formatting string.

Three types of symbols are used in the formatting string: control symbols, data-type symbols, and standard ASCII characters. Control symbols define the filtering operations to be performed, while data-type symbols and ASCII characters specify the input data to be affected by these operations. Note that these symbols are specific to the PIC clause and are not used in the new style of NDL++ statements described above.

Control Symbols

A control symbol affects the symbol or character that immediately follows it in the formatting string. Any number of control symbol/character pairs may appear consecutively in the formatting string; each pair is associated with one character position in the input field.

However, if a single quote appears after a control character, then ndlm will check for a pattern-string of characters or data-type symbols that follows it. If two consecutive single quotes appear, ndlm will check for the existence of a single quote character in the field at the assigned character position (this is because the doubling of the symbol indicates an "escaped" character).

Control symbols specify actions that should be taken on a given field position, on the whole field, or on a script section. They can be included before any of the data-type symbols listed below, and can also be used with any ASCII character (which will then be interpreted as itself). If a pattern of characters enclosed in single-quotes follows the control symbol, ndlm will use that pattern literally in checking, deleting, or inserting characters; if a pattern of data-type symbols enclosed in single-quotes follows the control symbol, ndlm will use that pattern to check the data types in the field.
 

\x
(back-slash)

Check for the presence of x anywhere in the field, and remove it. If x appears anywhere in the field, it is removed and the record is imported.

Note:
So that it will not interfere with other specified operations, this symbol must appear first in the formatting string.

/x
(slash)

Check for the presence of x at the specified character position. If x is not found at the specified position in the field, the record is not imported.

-x
(hyphen)

Check for the presence of x at the specified character position, and remove it. If x is found, it is removed and the record is imported.

+x
(plus)

Insert the character x. The character may not be single or double quotes. The character will not replace any of the input data, but will be inserted at the specified position, displacing data to the right. If the addition of a character to a field causes the field to exceed the database table column size, the character insertion operation will not be performed.

Note:
When all the fields in the input data are declared as fixed length, and adding one or more characters with a PIC clause causes the new field to exceed the width defined in the record specification, this field will be truncated on the right. This will not occur, however, if at least one field is declared as variable length. If necessary, the last field in a record can be declared as a variable-length field that has the line feed character (\n) as a trailing delimiter (since the line feed character always appears at the end of a record).

Records that do not satisfy the check operation are not imported, but are saved to a special file named input-file-name.pct (where input-file-name is the name of the source data file for the import).


Filtering Control Characters from Input Values

Control symbols identify the operations involved in filtering data. Because of the special properties of control and data-type symbols, a special notation must be used when checking for their presence within fields. This involves doubling the symbol in question after the control character. For example:

/## tests for the existence of # at the character position
-## removes the # from the character position
\## removes all occurrences of # from a field

This special style of notation works with the following symbols:

$ ? ^ % / \ - ' "


Additionally, single and double quotes can be added to a field with the following constructs:

+"" adds a double quote character to a field value
+'' adds a single quote character to a field value


Data-Type Symbols

Datatype symbols indicate the classes of ASCII data that can be manipulated using control symbols. Therefore, they must be preceded by a control symbol in a formatting string, unless they appear in a pattern statement enclosed by single quotes. If a pattern of datatype symbols appears between single quotes after a control character, ndlm will look for data conforming to this datatype pattern (for example, ' $#$#$# ' might be used to check fields containing Canadian postal codes).

The following datatype symbols can be used in the formatting string:

# numeric 
$ alphabetic 
% alphanumeric (containing both alphabetic and numeric characters)
(underscore) spaces
? alphabetic and spaces together
^ other symbols

 
PIC Clause Examples

These examples illustrate the various functions of the PIC clause when it is included after field-mapping declarations.


Checking for Values and Patterns

In this example, the script verifies that the contents of field1 begins with the digit 0, and that the contents of field2 is 15 characters long. Note that if any of the characters within field2 is a space value, the row will be ignored, since the space character is not considered to be an alphabetic value. If field1 contains anything other than a 0 in the first character position, the row will be ignored.

IMPORT @sample.dat
{

RECORD
{
field1 (1:15) PIC "/0"
field2 (16:15)\r\n PIC " /'$$$$$$$$$$$$$$$' "
}
schema1.table1
{
column1 field1
column2 field2
}

}


Removing Characters

In this example, the script removes all alphabetic characters and special symbols from the contents of field1, and all numeric characters and special characters from field2. None of the rows containing errors will be ignored, but messages containing information about the errors encountered will be issued.

IMPORT @sample.dat
{

RECORD
{
field1 (1:15) PIC " \ $ \ ^ "
field2 (16:15)\r\n PIC " \ # \ ^ "
}
schema1.table1
{
column1 field1
column2 field2
}

}


Removing a Character or Pattern from a Position

In this example, the script removes a 0 (if it exists) from the first character position in field1; also, the pattern ‘ace’ is removed from the beginning of field2 if it is found to be present.

IMPORT @sample.dat
{

RECORD
{
field1 (1:15) PIC "-0"
field2 (16:15)\r\n PIC "-‘ace’"
}
schema1.table1
{
column1 field1
column2 field2
}

}


Adding a Character

In this example, the script adds the digit 0 at the first position of field1, and the letter ‘a’ to the beginning of field2 in each row.

IMPORT @sample.dat
{

RECORD
{
field1 (1:15) PIC "+0"
field2 (16:15)\r\n PIC "+a"
}
schema1.table1
{
column1 field1
column2 field2
}

}

Note:
If the addition of a character to a field causes the field to be longer than the width of the table column, then the field will be truncated on the right-hand side. If the added characters are placed on the right-hand side and the declared size is exceeded, then the added characters will be truncated.

 

Previous Topic:
Import Examples
Chapter Index
Next Topic:
Importing Data from an SCT File