SAND CDBMS Tools Reference Guide
Data Loader (ndlm)

 

Previous Topic:
Creating the Import Specification Script
Chapter Index
Next Topic:
The NULLIF and MISSINGIF Statements

 

Import Record Specification


The
record specification (or RECORD section) must describe the arrangement and composition of all the fields making up the records in the ASCII data input file including those to be ignored during import. Since only one record specification can be defined in the import specification file, all records in the input file must conform to this record specification. The most basic form of a record specification is as follows:

field-label
.
.
.

field-label
field-format
.
.
.

field-format

There must be exactly one field-label/field-format pair for each field in an input record. Each label/format pair must be separated by some form of "white space" (that is, at least one space, tab, or line feed). No field may be omitted from the record specification, since it is intended to provide ndlm with a complete description of the records in the input file. Furthermore, the field labels must be listed in the same order as the corresponding fields appear in the input record, with the left-most field defined first.

Note:
ndlm automatically removes space characters appearing at the end of an input field. To strip off leading spaces as well, include the -l flag in the ndlm invocation.

 

Elements of the Record Specification

field-label

Assigns a label to a field in the input file. The label can be any string between 1 and 256 characters in length, and must begin with an alphabetical character. The field label can be the same as the table column name to which it is assigned in the map specification.


field-format


Defines the length of the field, along with any delimiters that are used to separate it from the fields that precede or follow it. The field-format entry has this basic syntax:

[ <delimiter> ] field-length [ <delimiter> ]

field-length
The field-length parameter is required. It may be an integer ( n ), indicating a fixed-length field with a length of n characters, or an asterisk (*), indicating a variable-length field. The end of a variable length field must be indicated by at least one delimiter character. Fixed-length fields do not require delimiters.

delimiters
Delimiter entries are included when describing variable length fields. They indicate the characters that separate a field from the fields appearing before and after it in the input record. A variable length field must have at least one trailing delimiter. The delimiter may be a string of any length, up to 256 characters; comma and tab characters are examples of typical delimiters.

Any ASCII character that does not appear within the data fields can be used as a delimiter, with the exception of the following: alphabetic and numeric symbols, parentheses ( ( ) ), and the asterisk ( * ). The following non-printing characters may be used as delimiters (each must be preceded by the backslash escape character, as shown):

ASCII nul \0
space \s
tab \t
line feed \n
carriage return \r


Hexadecimal representations of ASCII characters are also accepted in delimiter specifications: to specify a delimiter in this way, use " \x " followed by the two-digit hexadecimal number corresponding to the appropriate ASCII character: for example, ASCII character 127 would be specified as follows:

\x7f

If a period ( . ) is used to delimit fields, make sure that the period is separated by a space from the field-length entry definition in the RECORD section of the load specification file (for example, field1 6 .). Otherwise, ndlm will not parse the record.


End-of-Record

When input records contain variable length fields, the end of each record must be specially marked by an ASCII line feed character (\n) if the data file was created in UNIX, or the ASCII carriage return and line feed characters together (\r\n) if the file was created in Windows.


Example

The following simple example shows a record specification that describes an input record consisting of three variable-length fields. The fields are separated by commas, and the end of the record is indicated by a carriage-return/line feed combination (that is, the Windows format).

RECORD
{
field1 *,
field2 *,
field3 *\r\n
}



Alternative Syntax for Specifying Input Field Format in the RECORD Section

The format of fixed-length character fields can be specified in an alternative way, allowing for greater control over the data input process by mapping fields in terms of their position within the record and their length.

The basic method described above remains valid, and in fact must be used when describing variable-length or numeric fields.

The alternative field specification syntax has the following form:

[ <delimiter> ] ( starting-position : field-length ) [ <delimiter> ]

where starting-position and field-length are integers.

Note that the presence of parentheses around the field-format definition, and a colon within it, differentiates the alternative syntax from the basic form described in the previous section.

In this syntax, the starting-position parameter marks the beginning of the field, measured in number of characters from the beginning (or leftmost position) of the record. The field-length parameter must be separated from the starting-position entry by a colon ( : ), and the two parameters must be enclosed in parentheses.


Example

The following format specifications define a record composed of two consecutive fixed-length fields, each 15 characters long. The fields are not separated by a delimiter, but the end of the record is signaled by the carriage return and line feed characters (\r\n). Note that this example uses the Windows end-of-record format.

field1 (1:15)
field2 (16:15)\r\n


Remapping Fields

The field-format entry can also define the remapping of a fixed-length, character-type input field onto one or more subsequent fields. That is, one field, or a part of it, can be used as the source for another field (thus creating "sub-fields"). Field remapping is defined using the following syntax:

( reference-field-name : position : length )

In this syntax, reference-field-name is the field-label of the field to be parsed, position specifies the starting position of the sub-field within that reference field, and length defines the length of the sub-field (in characters). These elements must be separated by colons ( : ), and the entire entry must be enclosed in parentheses, which signal a mapping of the field in terms of starting position and length.

Because the presence of field delimiters would interfere with the interpretation of the remapping syntax, reference fields cannot contain delimiter symbols.

For example, the following format definition will load the first eight characters of field2 into field3:

field3 (field2:1:8)


The PIC Clause

Basic data filtering and manipulation operations can also be specified in RECORD section field declarations, through inclusion of a PIC clause. The syntax of the PIC clause is described in the Verifying/Manipulating Data Using the PIC Clause section below. However, much of the same functionality is also provided by NDL++ statements.

 

Previous Topic:
Creating the Import Specification Script
Chapter Index
Next Topic:
The NULLIF and MISSINGIF Statements