SAND CDBMS Tools Reference Guide
Data Loader (ndlm)

 

Previous Topic:
SAND Compacted Table (SCT) Functionality
Chapter Index
Next Topic:
Importing Data with ndlm

 

Running ndlm


Prerequisites

When ndlm is being used to import data into a database, queries cannot be executed on that database instance (that is, using the nserv process to which ndlm is connected). Conversely, it is not possible to import data into a database instance while queries are being executed. To preclude this situation, all non-ndlm clients must be disconnected from the database before an ndlm process can connect to the database, although once the load transaction has commenced, those clients can reconnect. Concurrent ndlm processes can operate on the same database without any problems: load operations on different tables are performed in parallel, while load operations on the same table are serialized.

Before starting ndlm for an import operation, other, non-ndlm users should be instructed to disconnect, or the -u flag should be included in the invocation to disconnect other users automatically.

On the other hand, database exclusivity is not required to start an ndlm export operation, since the database is not changed when data is exported. Furthermore, non-ndlm users can execute queries while ndlm is exporting, which are disallowed while ndlm is importing.

Invoking ndlm

The ndlm invocation is entered from the operating system command prompt, and has the following form:

ndlm [ <option flags> ] connection-name instance-name user-name[/user-password] spec-file-name


ndlm Invocation Arguments:

connection-name

The name of the connection specified when instance-name was started with the nserv program. If ndlm is being used with a database instance running on a remote server, the client-side nucleus.ini file must contain a [CONNECTION connection-name] section that specifies the port number and host name used to connect to the database instance, and these must match the values specified for the same connection in the server-side nucleus.ini file. The connection name is not case-sensitive.


instance-name

The name of the database instance to which to connect. Instance-name must have already been started using nserv with the specified connection name. The instance name is not case-sensitive.


user-name


The user-name argument must be a valid user authorization with certain privileges on the database table accessed during the loader operation. Specifically, to run ndlm, one of the following must be true:

The user name is not case-sensitive.


/user-password

The user-password argument is required only if the specified user-name authorization has an associated password.
The user-password argument must be separated from the user-name argument only by a slash character ( / ); no "white space" may appear between the two arguments. The password must precisely match the password associated with the specified user authorization.

Note:
User passwords are case-sensitive.

spec-file-name

The name of the import or export specification file. The spec-file-name argument must include a path specification if the file is not located in the current directory. On the Windows platform, if the path contains any spaces, either the full path and specification file name must be enclosed by quotation marks, or else short file/folder names must be used.


option flags

One or more of the following option flags may be specified before the connection-name argument in the ndlm invocation. If multiple flags are specified, they must be separated by at least one blank space. In the case of flags that require arguments, a space character must separate the flag from the argument. Flags specified in the ndlm invocation take precedence over option flags included in the load specification file, if there is any conflict between them.

-b
(Import only)
Suppresses ndlm prompt display. This flag must be used if you intend to run ndlm unattended (that is, as a batch process). If this flag is not included, ndlm shows the correspondence between columns and their related values for the first three records in the import file, and the user is given the opportunity to interrupt processing if the data is not correctly formatted. This flag must be included if ndlm is running in Parallel Loader mode.

-c
(Import only)
Instructs ndlm to commit all loaded records automatically in the event of an error. If this flag is not included, a ROLLBACK is performed in the event of an error.

-d
Instructs ndlm to overwrite the error log file (log.ndl, or the file specified with the -e flag) rather than append to it. If this flag is not included, ndlm appends messages to the existing error log file.

-e file-name
Specifies a particular error log file (the default log file is log.ndl). If you have a batch file set up with multiple ndlm invocations, this flag enables you to specify a different log file for each ndlm invocation.

-f x [ y ]
Allocates x kilobytes of memory to the ndlm input buffer and, optionally in standard mode, y kilobytes to the accumulation buffer (where input buffer data is amassed before being sent to the server).

In standard mode, the default value for both x and y is 65536 KB (64 MB), while the maximum size for both is 256000 KB (250 MB). The value for x should always be less than or equal to that of y, but if the value specified for x is greater, it will be set automatically to the value of y.

In Parallel Loader mode, only x can be specified. By default, the input buffer is set to 1024 KB (1 MB), while the maximum buffer size is 256000 (250 MB).

Normally, you should let ndlm manage memory allocation automatically for standard loads. However, on machines with limited resources, reducing the size of the accumulation buffer can reduce memory utilization and allow data to be loaded. When operating the Parallel Loader, it is best to set the input buffer as high as possible. In most cases, the buffer size should only be set below the maximum if system resources are limited and a certain amount must be reserved for other critical processes.

Also note that a small amount of overhead memory is required per data record. Therefore, if a large amount of buffer memory is allocated explicitly and the record size is very small, the total overhead memory can become substantial as the buffer fills up, possibly exhausting system resources. To avoid such memory allocation problems, it is best to let ndlm set the buffer automatically when the input record size is very small.

The flag and flag values must be separated by space characters.

-g
Instructs ndlm to include special markers ( | ) in the "reject file" (input-file-name.BAD), which contains any data records that are rejected due to error conditions returned during the load process. These markers are placed above the position within the record at which the problematic data is located. Only data rejected due to ndlm errors will be marked in this way.

Note that specifying this flag will make the input-file-name.BAD file unsuitable for loading: the markers will have to be stripped out of the file before attempting to reload the data.

-h
Displays ndlm invocation argument and flag information. If this flag is included, all other ndlm invocation arguments are ignored.

-i n
(Import only)
Causes ndlm to ignore the first n bytes (where n is an integer) in the input data file. For example, to process a file with a 256-byte header, enter -i 256 to instruct ndlm to ignore the header. If this flag is not included, processing begins with the first byte. A space character must separate the flag from its argument.

-j n
(Parallel Loader only)
Sets the number of threads used for reading data. The -j value (n) must be an integer from 0 to 16 (with 0 reserved for pipe operations). At least one blank space must separate -j from its argument.

If the -j flag is omitted from the Parallel Loader invocation, the number of reading threads defaults to 1.

If the -j value exceeds 16, the number of reading threads will be set to 16.

When using the -j option, ensure that all data files contain records that are either delimited by end of line characters (\n or \r\n) or are fixed length, otherwise an error will be returned.

The higher the -j value, the more file descriptors will be used by the loader. In UNIX, if the number of descriptors exceeds the maximum number of open files permitted per process by the operating system, both the loader and the nserv program will crash. See the Troubleshooting section below for information about increasing the maximum number of file descriptors for the load operation.

When this option is used with pipes, -j 0 should be specified, as this will set the number of reading threads to one (1) per pipe, which is the optimal amount. Do not use -j 0 when the load operation involves regular files, since this will negatively affect performance.

Note:
A rule of thumb is to set -j to the load input file size divided by the size of the input buffer (-f). For example, if the input files add up to 6144 KB (6 MB) and the input buffer is set to 1024 KB (1 MB), the number of reading threads should be 6 (the result of 6144 / 1024).


-k
n
(Parallel Loader only)
Sets the number of processing threads for a Parallel Loader operation. The -k value (n) must be an integer from 0 to 32. Specifying a -k value of zero (that is, -k 0) tells the loader to use the maximum number of threads available. Generally, n should not exceed the number of processors in the system. There must be at least one blank space between -k and the flag value. If -k is used, then the -b flag must be included on the command line as well.

Note:
The presence of the -k flag activates Parallel Loader functionality. If this flag is absent, the regular ndlm functionality is used for the operation.


-l
Instructs ndlm to remove leading spaces from the input fields. Trailing spaces are always stripped from input fields.

-m n
(Import only)
Instructs ndlm to allow n errors (where n is an integer) to occur before stopping the import. A prompt appears after n errors are encountered, asking whether or not to continue the load. This flag may not be used in conjunction with the -b flag. A space character must separate the flag from n.

-n string
Establishes string as the null indicator. During an import, ndlm assigns a null to the appropriate database row and column whenever it encounters a value of string in the input file. During an export, ndlm writes string to the output file for each null encountered in the database. Do not enclose the string argument in single quotes ().

-o { 0 | 1 }
Specifying -o 0 instructs ndlm to convert input data to lowercase characters. Specifying -o 1 instructs ndlm to convert input data to uppercase characters. A space must separate the flag from the 0 or 1 argument.

-p n
Instructs ndlm to process (that is, import or export) only n records, where n is an integer. Normally, all records (excluding any that are skipped in accordance with the -s option) are processed. For example, to load only the first 101 records of an input file, enter -p 101. A space must separate the flag from n.

Note:
When the -j option is included along with -p, and the specified number of reading threads is greater than one (1), there is no guarantee that -p n will process the first n records of a file, since each thread will read records from a different section of the input file. In this case, -p can set the maximum number of records to load, but cannot dictate which records will be loaded.

-q
Turns on case sensitivity: ndlm will process any field, table, and column names contained in the specification file in a case-sensitive manner.

-s n
Instructs ndlm to skip n records (where n is an integer) of the input file (import) or table (export). If -s is not specified, processing begins with the first record. A space must separate the flag from n.

Note:
When the input file contains variable-length fields, the skipped records will be checked for errors; if any are found, the load will not be executed. When the input fields are fixed-length, ndlm will simply skip the specified number of records and then process the load in normal fashion.


-t timing-file-name
Records the elapsed processing time for each chunk of data in the specified timing file. The timing-file-name argument must specify a legal file name, optionally including a full path specification. In the timing file, the first column shows the number of records loaded, and the second shows elapsed time in seconds. The start and end times (in elapsed seconds) of data inversion (or "indexing") are also recorded.

-u
Automatically kills (disconnects) all non-ndlm client sessions connected to the database instance before starting a load transaction. Current transactions are rolled back.

-v n
(Import only)
When specified in conjunction with the -m flag, causes ndlm to pause and ask whether to continue after every n errors (where n is an integer, with a default value of 1). This flag has no effect until the number of errors specified by the -m flag is reached. This flag may not be used in conjunction with the -b flag. A space character must separate the flag from n.

-w
Instructs ndlm to stop the issuing of warning messages during the data load. Note that if the -w option is set, ndlm will return exit code 0 for all warnings.

-z { 0 | 1 | 2 }
Specifying -z 0 instructs ndlm to reject any records containing unprintable characters (ASCII decimal codes 0 to 31, or hexadecimal 0x00 to 0x1F). Specifying -z 1 instructs ndlm to replace unprintable characters with ASCII space characters. Specifying -z 2 (the default option) instructs ndlm to accept all unprintable characters (except for the ASCII NUL character: 0x00), which is the option that should be used when loading BLOB data. The flag must be separated from the numeric argument by a space.

Note that string values beginning with the DEL character (ASCII code 127 or hexadecimal 7F) cannot be loaded by ndlm, even if the -z 2 option is specified. The purpose of this restriction is to protect data integrity, since a string literal that begins with the DEL character is interpreted as an empty string by the SAND server, regardless of the string length. However, specifying -z 1 will cause the leading DEL character to be replaced with a space character, and the string will be loaded as expected.


Stopping ndlm

ndlm can only be stopped using Ctrl+C or Ctrl+<break>.


COMMIT WORK

ndlm executes a COMMIT WORK command at the end of an import or EXECSQL operation. To determine whether all information was transferred properly during the ndlm session, examine the log.ndl file (located in the directory from which ndlm is invoked). The results of ndlm imports and exports are recorded in the log.ndl file.

 
Previous Topic:
SAND Compacted Table (SCT) Functionality
Chapter Index
Next Topic:
Importing Data with ndlm