teem / nrrd

  File Format

This document defines the NRRD file format. It gives requirements and recommendations of readers and writers of this format. Because the nrrd library also supports PPM, PGM, and plain text files, see also NRRD-Compatible File Formats to see how comments embedded in these files can be used to losslessly encode the information normally found in a NRRD header. Since this document aims to be a self-contained reference for the NRRD file format, some of the material here repeats ideas found elsewhere in the nrrd documentation. Besides defining the format, this document also seeks to supply some rationale. I have no prior experience in writing file format definitions; please feel free to email me questions and/or suggestions: gk@cs.utah.edu. This page has been written with a view towards keeping it useful upon printing.

0. Description of format

The NRRD format was primarily designed to be easy to write, rather than easy to read. The NRRD header is simple ASCII text, one field per line. The fields in the header do not have a strict ordering, and most of them are optional. Most strings are case insensitive, and alternate forms of many of the identifiers and descriptors are allowed. Writing NRRD headers by hand, from scratch, is entirely feasible (although the Utah Nrrd Utilities program "unu make -h" is probably a better solution). When writing non-ASCII data, the byte ordering is recorded, but not altered to match one particular endianness. The format flexibility greatly increases the complexity and responsibilities of a NRRD reader, but the compelling benefit is having a simple portable format that is general with respect to dimension and type.

The NRRD file format was also conceived as being somewhat analogous to the PPM format for color images: straightforward, friendly to programmers, and descriptive of a sufficiently large class of data to be useful in research. Time and experience with the NRRD format have gradually increased its complexity (such as with the introduction of node- versus cell-centered samples), but the feature set has very nearly converged. As a general representation of raster data, NRRD is intended to occupy the very large but sparsely populated niche between:

  1. Raw, headerless data, hopefully with some nearby README file explaining the type and dimensions.
  2. Very sophisticated, powerful (complicated) formats such as HDF (http://hdf.ncsa.uiuc.edu/).

The VTK (http://public.kitware.com/VTK/pdf/file-formats.pdf) file format is the closest to NRRD, but there are many differences. In NRRD:

Various aspects of the NRRD format borrow heavily from the PCGV volume dataset format developed by James Durkin at the Cornell Program of Computer Graphics. The nrrdRead() and nrrdWrite() functions of the nrrd library are intended to completely support the format as described here, but there is currently no test suite or validation program.

1. General information

When saved, the filenames for NRRD files should end in ".nrrd". Detached headers, discussed in Section 3, should end in ".nhdr". Standard suffixes for data files associated with detached headers are listed in the encoding part of Section 4.

Because the NRRD format uses a multi-line text header, some mention should be made of what exactly a line is. When Windows (and sometimes Cygwin) create a text file, each line is terminated by a pair of characters, "\r\n". When everyone else creates a text file, each line is terminated by just "\n". NRRD readers must be able to handle both types of line terminations; NRRD writers can use one or the other or both. While C's ability to open files in "ASCII" mode ideally handles this, this approach may be complicated by the fact that the header can be followed by binary data, which must be read and written byte-for-byte untransformed. The line termination is ignored once the line has been read from file, so when the description below says "the line contains ...", it is referring to everything prior to the character(s) comprising the line termination. In the current definition of the format, there is no limit to how long a line can be. If a reader can't handle a given line in a file, it should die with an error message to that effect. The content, labels, and comment fields (described below) allow for arbitrarily long strings. ASCII encoding is assumed.
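
A reader that opens the file in binary mode can normalize both kinds of line termination itself. The following is a minimal sketch of that idea, not code from the nrrd library; the function name strip_line_termination is hypothetical, and it assumes the line has already been read into a NUL-terminated buffer.

```c
#include <assert.h>
#include <string.h>

/* Hypothetical helper: removes a trailing "\n" or "\r\n" from a line,
   in place, and returns the length of what remains. A "\r" is only
   removed when it directly precedes the "\n". */
size_t strip_line_termination(char *line)
{
    size_t len;

    assert(line != NULL);
    len = strlen(line);
    if (len > 0 && line[len - 1] == '\n') {
        line[--len] = '\0';
        if (len > 0 && line[len - 1] == '\r') {
            line[--len] = '\0';
        }
    }
    return len;
}
```

After this call, "type: uchar\r\n" and "type: uchar\n" are both reduced to the same string, which is what the rest of the parser sees as "the line".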

NRRD files come in two forms: with an attached header (in which the header and the data are in the same file), and with a detached header (header and data are in two separate files). The attached-header form is described first, and the minor variation that enables detached headers is described in Section 3. NRRD readers must be able to support attached and detached headers.

The general format of a NRRD file (with attached header) is:

NRRD0001
<field>: <desc>
<field>: <desc>
# <comment>
...
<field>: <desc>
<field>: <desc>
# <comment>

<data><data><data><data><data><data>...

The very first line contains nothing but the NRRD "magic". The magic is what identifies the file as a NRRD file. For NRRD files, the first four characters are always "NRRD". The idea is that the next four numbers give the version number of the file format, but this may or may not be developed in the future. The only current magic for NRRD files is "NRRD0001". For the time being, this is the magic to use when writing all NRRD files. NRRD readers should obviously recognize "NRRD0001", as well as a magic used in early implementations, "NRRD00.01". Readers encountering a magic starting with "NRRD" but ending with something besides "0001" or "00.01" need not make any effort to continue parsing the file.
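
The magic check described above can be sketched as follows. The function name classify_magic and its return codes are hypothetical, and the sketch assumes the magic is compared case-sensitively and that the line termination has already been removed.

```c
#include <assert.h>
#include <string.h>

/* Illustrative return codes, not taken from the nrrd library. */
enum { MAGIC_NOT_NRRD = 0, MAGIC_UNKNOWN_VERSION = 1, MAGIC_OK = 2 };

/* Classifies the first line of a file. */
int classify_magic(const char *line)
{
    assert(line != NULL);
    if (strncmp(line, "NRRD", 4) != 0) {
        return MAGIC_NOT_NRRD;
    }
    /* Safe to look at line + 4: the first four characters matched. */
    if (strcmp(line + 4, "0001") == 0 || strcmp(line + 4, "00.01") == 0) {
        return MAGIC_OK;
    }
    return MAGIC_UNKNOWN_VERSION;
}
```

A reader could stop parsing on MAGIC_UNKNOWN_VERSION, since it need not make any effort to continue in that case.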

Each of the "<field>: <desc>" lines specifies information about one of the fields in the nrrd. Each of these lines is called a "field specification", or more loosely, a "field". Each field specification is contained in one line.

Comment lines start with a pound, "#", with no preceding whitespace. The comment string itself starts with the first character which is not a pound or a space (" "). Comment lines with a zero-length comment string should be ignored. Comment lines may be interspersed with field specifications in any order. This allows field specifications to be commented out and commented upon easily. Since comments are effectively a catch-all for peripheral information which doesn't otherwise fit in nrrd, gracious NRRD readers should store all the comments seen in the header, but this is not a requirement. Comments are intended to be case sensitive.
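
As a sketch of the comment rule, the hypothetical helper below (comment_text is not a nrrd library function) returns the start of the comment string, or NULL when the line is not a comment. It assumes the line termination has already been removed.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical helper: if the line is a comment, returns a pointer to the
   first character that is neither '#' nor ' '; otherwise returns NULL.
   A returned empty string is a zero-length comment, to be ignored. */
const char *comment_text(const char *line)
{
    assert(line != NULL);
    if (line[0] != '#') {
        return NULL;            /* includes lines with leading whitespace */
    }
    while (*line == '#' || *line == ' ') {
        line++;
    }
    return line;
}
```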

The magic, field specifications, and comments comprise the NRRD header. After the header, there is a single blank line containing zero characters. This separates the header from the data, which follows. Unlike the header, the data segment is not structured in ASCII lines. The encoding of the data (raw, ASCII, compressed, or other) is specified by the encoding field in the header. The header, the blank line, and the data comprise the NRRD file. A single NRRD file can store the information and data for a single array. There is currently no facility for storing multiple arrays in a single NRRD file.

All of the field specifications have the same structure: a string "<field>" identifying the field (called the field identifier), then a colon followed by a single space ": ", and then the information describing the field "<desc>" (called the field descriptor). All field identifiers are case insensitive. The only field descriptors which are not case insensitive are the ones which contain strings (the content and labels fields). Whitespace (which does not constitute the previous line's termination) is not allowed before a field identifier. Extra whitespace after the field descriptor and before the line termination should be ignored. The NRRD format is complicated by the fact that some fields are always necessary, while some fields are always optional, and some fields are necessary only some of the time. Whether or not a field is necessary depends on previous information in the header. This is why there is no standard template for all NRRD headers, or a context-free grammar for NRRD headers.
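
The structure of a field specification suggests a simple in-place split at the first ": ". The sketch below is one possible way to do it; split_field and ident_equal are hypothetical names, and the code assumes ASCII text with the line termination already removed.

```c
#include <assert.h>
#include <ctype.h>
#include <string.h>

/* Hypothetical helper: splits "<field>: <desc>" in place. On success the
   identifier is NUL-terminated at the colon, *desc points at the
   descriptor, trailing whitespace is trimmed, and 1 is returned.
   Returns 0 for leading whitespace or a missing ": " separator. */
int split_field(char *line, char **desc)
{
    char *sep;
    size_t len;

    assert(line != NULL && desc != NULL);
    if (line[0] == '\0' || isspace((unsigned char)line[0])) {
        return 0;
    }
    sep = strstr(line, ": ");
    if (sep == NULL || sep == line) {
        return 0;
    }
    *sep = '\0';
    *desc = sep + 2;
    len = strlen(*desc);
    while (len > 0 && isspace((unsigned char)(*desc)[len - 1])) {
        (*desc)[--len] = '\0';
    }
    return 1;
}

/* Case-insensitive comparison for field identifiers (ASCII only). */
int ident_equal(const char *a, const char *b)
{
    while (*a != '\0' && *b != '\0') {
        if (tolower((unsigned char)*a) != tolower((unsigned char)*b)) {
            return 0;
        }
        a++;
        b++;
    }
    return *a == *b;
}
```

Splitting at the first ": " keeps any later colons inside the descriptor, which matters for free-form strings such as the content field.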

Some field specifications have field descriptors which give a piece of information about each and every axis in a nrrd; these are called "per-axis specifications". Identifying how many samples there are along each dimension of the array is an example of this. Per-axis specifications must have as many components as there are dimensions in the array, so that they can identify information for every axis. There is no partial or "implied" per-axis information. Other field specifications are general with respect to array dimension, such as the type of the array; these are called "non-per-axis specifications".

An important, and always necessary, non-per-axis field specification is the one giving the dimension of the nrrd. The format is:

dimension: <int>
<int> can be any integer greater than 0. NRRD readers may not have the ability to represent absolutely any dimension, but they must be able to handle nrrds with dimension 16 or less, which is what the current nrrd implementation can do.

Per-axis specifications can only appear after the dimension field specification. This simplifies the task of parsing per-axis specifications, since we know how many pieces of information need to be parsed (the same as the dimension). This also avoids any attempts at cleverness in the form of guessing the nrrd dimension based on the first per-axis specification. Non-per-axis specifications can appear both before and after the dimension field. Other than the one constraint about per-axis specification location, field specifications may appear in any order within the header (after the magic).

The number of samples along each axis is the only always-necessary per-axis specification. The format is:

sizes: <size[0]> <size[1]> ... <size[dim-1]>
<size[i]> is the number of samples along axis i, with axis ordering going from fastest to slowest. The number of integers must obviously equal the dimension specified in the dimension field. As with all the other per-axis field identifiers, sizes ends with "s" to emphasize the plurality of the field it specifies. The field identifiers do not change, however, for one-dimensional nrrds.

The issue of axis ordering is fundamental. In memory and on disk, there is a strict linear order of all the values in an array. Logically, however, each sample has one or more coordinates which identify its position, as many coordinates as there are dimensions in the array. The "fastest" axis is the one associated with the coordinate which increments fastest as the samples are traversed in memory order. Typical raster ordering of interleaved RGB data is logically a three-dimensional array. The fastest axis is the three-samples-long color axis, followed by the horizontal axis, with the vertical axis being the slowest. All the per-axis field specifications identify information for each axis, and the axis ordering is always (reading left to right) fastest to slowest. NRRD makes no attempt to "name" the axes, such as "X", "Y", and "Z": they are identified solely by their location in the ordering from fastest to slowest.
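
To make the fastest-to-slowest convention concrete, the sketch below computes the position of a sample in the linear memory order from its coordinates. The function name linear_index is hypothetical; sizes and coords are both listed fastest axis first, as in the sizes field.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical helper: maps per-axis coordinates to the linear offset
   (counted in samples) of that sample in memory or on disk. */
size_t linear_index(unsigned int dim, const size_t *sizes,
                    const size_t *coords)
{
    size_t index = 0;
    unsigned int axis;

    assert(dim > 0 && sizes != NULL && coords != NULL);
    /* Accumulate from the slowest axis down to the fastest one. */
    for (axis = dim; axis-- > 0; ) {
        assert(coords[axis] < sizes[axis]);
        index = index * sizes[axis] + coords[axis];
    }
    return index;
}
```

For interleaved RGB data with sizes 3 640 480, the sample with color channel c at column x and row y lands at offset (y*640 + x)*3 + c, so the green value of the third pixel in the first row is at offset 7.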

Besides dimension, there are two other always-necessary non-per-axis specifications: the type and the encoding specifications. Their format is:

type: <type>
encoding: <encoding>
The possible values for type include the C identifiers you would probably use to identify a type: "int" means a 32-bit signed integer, "float" means 32-bit floating point, and so on. Useful variants like "uchar" (same as "unsigned char") are allowed. There is also the block type, which is used to represent some chunk of opaque memory, of user-specified size; see the type specification in Section 4 for all the details.

The encoding tells how the data (following the blank line after the header) is written out; "ascii" and "raw" are common values, but "hex" allows extraction of images from some PostScript files, and compression is also supported; see the encoding specification (Section 4). NRRD readers must be able to support raw and ASCII encoding; everything else is optional. However, the only optional encodings which may be added to NRRD in the future will be ones for which there exist freely available command-line tools to convert the encoded data (in isolation) to raw data. If you have a NRRD file volume.nrrd, with an attached header, using a data encoding not supported by the available nrrd implementation, you can always use the unix/linux/cygwin command "tail +N volume.nrrd" (where N is two plus the number of lines in the header) to get at all the data, so as to pass it on to a stand-alone converter. Or, the Utah Nrrd Utilities command "unu data" is a much easier way of doing the exact same thing. Data in a separate file, detached from the NRRD header, is obviously trivial to pass to a converter.

The field specifications described so far provide the means of writing a minimal NRRD header:

NRRD0001
# my first nrrd
type: uchar
dimension: 3
sizes: 3 640 480
encoding: raw
This is identical in meaning to the PPM header:
P6
# my first nrrd
640 480
255
If "encoding: ascii" had appeared instead in the NRRD header, the PPM magic would be "P3" instead of "P6".
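
A writer for such a minimal header needs little more than a few formatted prints. The sketch below formats the always-necessary fields into a caller-supplied buffer, ending with the blank line that separates header from data. The function name format_minimal_header is hypothetical, and the sketch deliberately omits every optional field.

```c
#include <assert.h>
#include <stdio.h>

/* Hypothetical helper: writes magic, type, dimension, sizes, encoding and
   the terminating blank line into buf. Returns the number of characters
   written (excluding the NUL), or -1 if the buffer is too small. */
int format_minimal_header(char *buf, size_t bufsize, const char *type,
                          unsigned int dim, const size_t *sizes,
                          const char *encoding)
{
    size_t used;
    unsigned int axis;
    int n;

    assert(buf != NULL && type != NULL && sizes != NULL && encoding != NULL);
    assert(dim > 0);

    n = snprintf(buf, bufsize, "NRRD0001\ntype: %s\ndimension: %u\nsizes:",
                 type, dim);
    if (n < 0 || (size_t)n >= bufsize) {
        return -1;
    }
    used = (size_t)n;
    for (axis = 0; axis < dim; axis++) {
        /* One integer per axis, fastest axis first. */
        n = snprintf(buf + used, bufsize - used, " %lu",
                     (unsigned long)sizes[axis]);
        if (n < 0 || (size_t)n >= bufsize - used) {
            return -1;
        }
        used += (size_t)n;
    }
    n = snprintf(buf + used, bufsize - used, "\nencoding: %s\n\n", encoding);
    if (n < 0 || (size_t)n >= bufsize - used) {
        return -1;
    }
    used += (size_t)n;
    return (int)used;
}
```

Called with type "uchar", dimension 3, sizes 3 640 480 and encoding "raw", this produces the header shown above without the comment line; the raw data would be written immediately after the returned text.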

The field specifiers described so far, and illustrated above, are the only ones which are always necessary. However, other field specifications become necessary as a function of other fields: if the type was "float", and the encoding is something other than "ascii", then the endian of the data would have to be recorded. The details of which fields require which other fields are spelled out in Section 4 and Section 5.

One of the things that makes reading a NRRD header complicated is the idea that the absence of an optional piece of information is just as important to record as the information when it was explicitly given. Whatever data structure is created or updated as the result of reading a NRRD header must enable the writer to write the exact same NRRD header, up to field ordering. If you generate a nrrd at run-time, and you don't have specific values for an optional field, or, if values for an optional field were never specified by an input NRRD header, then specific values for this field must not be invented in an output NRRD header, and the field should not be saved at all. Keeping headers as concise as possible makes them easier to understand when read by humans, and eliminates the risk that misleading information is invented solely for the sake of conforming to a file format.

In order to implement this in NRRD, each of the optional fields must have a way of representing the idea of "don't know", a state distinct from knowing a specific default value, or knowing the value specified in the header. All of the optional fields can be initialized to "don't know", and only after "known" values are specified in an input header does the field become worthy of being saved in an output NRRD header. For optional fields with string values (content, labels, and units), the empty string ("") is the obvious choice for "don't know". For centers, the value "???" means "don't know". In contrast, the optional fields with integer values (line skip and byte skip) actually have a sensible "known" default value, namely zero.

But how does one represent "don't know" with optional floating point data? NRRD uses NaN, or Not-a-Number. NaN is a value that can be represented in the ubiquitous IEEE 754 floating point standard, as the result of doing undefined arithmetic operations, such as zero divided by zero. While it may seem overly cute or clever to use NaN as a flag for "don't know", this is in fact exactly in keeping with the purpose of NaN as described in the original 754 standard. Furthermore, as described in the documentation for the air library, it is possible to generate a NaN at compile time (so that it doesn't have to be produced as a result of doing an undefined arithmetic operation), and it is possible to quickly test if a given number is NaN. Even if operations involving NaN are not implemented in the floating point hardware, but in software emulation supplied by the operating system, they will never be the bottleneck in reading and writing a NRRD file. Because of NaN's important role as signifier of "don't know", the NRRD reader must be able to interpret the case-insensitive string "nan" as a NaN, even if this is not already the behavior of sscanf() on a given platform. Writing a small wrapper function around sscanf() is a very small price to pay for the representational convenience of NaN. Section 2 gives the details for how NRRD readers and writers should handle the IEEE 754 special values.

In Section 4 and Section 5, when a field specification is described as harmless, it means that the field specification probably shouldn't be in the header, because its information is either irrelevant or meaningless in that context. However, for the sake of implementation simplicity, its presence shouldn't count as an error. The information in harmless field specifications must be ignored by NRRD readers, but it is okay to complain if the field specification is malformed and unparsable.

In Section 4 and Section 5, numeric field specification descriptions include a "Type", which identifies the minimum precision with which the information must be represented by the NRRD reader. In this context, "int" means a 32-bit signed integer, and "double" means a 64-bit floating point number. Field specifications with alternate equivalent forms are listed together (for example, "block size" is the same as "blocksize"). Equivalent field descriptors are listed together in the table enumerating the meanings of the various descriptors (for example, "uchar" is the same as "unsigned char"). Quotes are used to delimit the field descriptors in the explanation of their meaning; quotes are not part of the descriptor itself (except for the labels specification, in which the descriptors (strings) are delimited by quotes).

No other fields are allowed in a "NRRD0001" magic NRRD file other than those defined in Section 4 and Section 5. NRRD readers are not expected to make any efforts to deal with non-conformant files (in contrast to PNM recommendations), but specific and intelligent human-readable error messages are of course encouraged.

2. ASCII encoding of floating point

Floating point field descriptors in the NRRD header, and floating point data written with ASCII encoding, have some special rules regarding their interpretation. Unfortunately, these are not always consistent with how the sscanf() C library call works on all platforms. The rules described here are a minimalist way of making sure that the basic IEEE 754 special values can be reliably read and written in NRRD files, across a variety of platforms. The rules described here apply to both 32-bit and 64-bit floating point values, or rather, the strings which are used to represent these values:
  1. If the string contains the substring "nan" (case-insensitive), the value must be parsed as NaN, either signaling or quiet. There is no restriction on the specific bit pattern used in the fraction field of the value.
  2. Otherwise, if the string contains "-inf" (case-insensitive), the value must be parsed as negative infinity.
  3. Otherwise, if the string contains "inf" (case-insensitive), the value must be parsed as positive infinity.
  4. Otherwise, the string can be parsed with the standard sscanf() C library function, and if that fails, then the string is malformed.
The use of the special floating point values NaN, +inf, and -inf is justified by the extremely widespread adoption of the IEEE 754 standard for 32-bit and 64-bit floating point representation. If it weren't for IEEE 754, there would be no portable way of storing raw floating point data. NRRD is not portable to platforms not supporting this standard.
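
The four rules above translate almost directly into a small wrapper around sscanf(). The following is a sketch under the assumption of a C99 compiler (for the NAN and INFINITY macros); parse_nrrd_double and contains_ci are hypothetical names, not functions of the nrrd or air libraries.

```c
#include <assert.h>
#include <ctype.h>
#include <math.h>
#include <stdio.h>
#include <string.h>

/* Case-insensitive substring test (ASCII); needle must be lower case. */
static int contains_ci(const char *s, const char *needle)
{
    size_t n = strlen(needle);
    size_t i;

    for (; *s != '\0'; s++) {
        for (i = 0; i < n; i++) {
            /* A NUL in s never matches, so this cannot run past the end. */
            if (tolower((unsigned char)s[i]) != needle[i]) {
                break;
            }
        }
        if (i == n) {
            return 1;
        }
    }
    return 0;
}

/* Applies rules 1 to 4 in order. Returns 1 and stores the value on
   success, 0 if the string is malformed. */
int parse_nrrd_double(const char *str, double *value)
{
    assert(str != NULL && value != NULL);
    if (contains_ci(str, "nan")) {
        *value = NAN;                 /* rule 1: any NaN will do */
    } else if (contains_ci(str, "-inf")) {
        *value = -INFINITY;           /* rule 2 */
    } else if (contains_ci(str, "inf")) {
        *value = INFINITY;            /* rule 3 */
    } else if (sscanf(str, "%lf", value) != 1) {
        return 0;                     /* rule 4 failed: malformed */
    }
    return 1;
}
```

The order of the tests matters: "-inf" has to be checked before "inf", because every string containing the former also contains the latter.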

The difference between a quiet and signaling NaN is a detail of IEEE 754 which was left implementation-specific, so different platforms have different ways of distinguishing between quiet and signaling NaN, and some don't distinguish between them at all. The intent was that quiet NaNs represent an indeterminate value, as in 0/0, or inf/inf, meaning simply that arithmetic doesn't define a single value for the result. On the other hand, signaling NaNs represent an invalid value, to signal that a non-existent or uninitialized floating point value was accessed, or that the input parameters to a function were so botched that no valid output can be generated; the signaling NaN is supposed to signal "someone goofed". Based on the fact that different portions of 754 can be implemented in software, or hardware, or a combination of the two, there may be performance considerations between the two kinds of NaNs. But in any case, it's basically all moot, since unfortunately, there is no cross-platform standard API for the floating point exception handlers which can interact with signaling NaNs.

Given this, in the NRRD file format (and in the nrrd library), a NaN is a NaN is a NaN, with no difference between signaling and quiet, and no recognition of the integer value in the fraction field of the NaN. If the signaling/quiet distinction mattered, then when writing raw floating point data, not only would endianness have to be recorded, but also the convention for representing quiet NaN, and if the data came from a platform that knows the difference between the two NaNs. Readers would have to possibly traverse the whole array after input to detect and switch NaN representations. Doing this checking is not practical or efficient, and the consequences of not doing it are either moot or non-existent. Thankfully, there are unique and fully specified bit patterns for positive and negative infinity.

NRRD writers should verify that their printf() function behaves in accordance with these rules.
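
One way for a writer to avoid depending on platform-specific printf() output for the special values is to emit them explicitly. The sketch below does this; format_nrrd_double is a hypothetical name, and the use of "%.17g" for ordinary values is an assumption made here so that 64-bit values survive a round trip, not something the format prescribes.

```c
#include <assert.h>
#include <float.h>
#include <stdio.h>

/* Hypothetical helper: writes value as text that a reader following the
   rules above will interpret correctly. Returns what snprintf() returns. */
int format_nrrd_double(char *buf, size_t bufsize, double value)
{
    assert(buf != NULL);
    if (value != value) {             /* only NaN is unequal to itself */
        return snprintf(buf, bufsize, "nan");
    }
    if (value > DBL_MAX) {            /* positive infinity */
        return snprintf(buf, bufsize, "inf");
    }
    if (value < -DBL_MAX) {           /* negative infinity */
        return snprintf(buf, bufsize, "-inf");
    }
    return snprintf(buf, bufsize, "%.17g", value);
}
```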

3. Detached headers

The ability to have detached headers is one of the most useful features of the NRRD format. Detached headers allow data stored in another file format to be accessed by nrrd functions, while leaving the original file intact. The line skip and byte skip fields are especially useful for these cases. Detached headers are also very useful in situations where very large amounts of data are to be read or written with direct IO, a very fast method of IO in which the device driver transfers data directly between blocks on disk and user-space memory (nrrd currently supports direct IO on SGIs, via the air library). Direct IO requires special alignment between the data segment beginning and block boundaries on disk, which makes using attached headers nearly impossible. Detached headers are also the simplest way to deal with NRRD readers which do not support the optional encodings, since a stand-alone program (such as "gzip/gunzip" or "bzip2/bunzip2") will be able to process the separate data file.

There is one new field specification which is required in detached headers. The format is:

data file: <filename>
"datafile:" is also valid. "filename" identifies the file that contains the data. The addition of this field is the only difference between attached headers and detached headers. The magic at the beginning is the same, so there is currently no way to immediately detect if the header being parsed is attached or detached. The rest of the field specifications are the same, and because the data file field specification is a non-per-axis specification, it may appear anywhere in the header. Detached headers may end with the last field specification, or with a single blank line following the last field specification (in which case anything following the blank line is ignored).

Breaking the dataset into two files raises new concerns, namely that the header file can't know if the data file has been erased, renamed, or moved. NRRD provides no means to overcome these problems once they've been created. On the other hand, moving the header and data files together to a new place is a common operation, and is supported by the special semantics associated with the data filename descriptor:

By using "./", a reader invoked in a directory different than the detached header can know how to find the data file. Without it, the detached header would either have to know where the reader is invoked from, or the header would have to specify the full path to the datafile, at which point it becomes annoying to move the header and data together to a different directory.

4. Non-Per-Axis Field specifications


dimension: <int>
This gives the dimension of the array stored in the nrrd (1 for univariate histograms, 2 for grayscale images, 3 for scalar volumes and color images, 4 for time-varying scalar volumes, etc.) The dimension must be greater than 0, but can in principle be arbitrarily large. On the other hand, dynamic allocation of all the per-axis information is pretty annoying, so currently, NRRD readers are only required to handle dimensions of 16 or less.
type: <type>
This identifies the type of the data within the array.
<type> | Meaning | C type
"signed char", "int8", "int8_t" | signed 1-byte integer | signed char
"uchar", "unsigned char", "uint8", "uint8_t" | unsigned 1-byte integer | unsigned char
"short", "short int", "signed short", "signed short int", "int16", "int16_t" | signed 2-byte integer | short
"ushort", "unsigned short", "unsigned short int", "uint16", "uint16_t" | unsigned 2-byte integer | unsigned short
"int", "signed int", "int32", "int32_t" | signed 4-byte integer | int
"uint", "unsigned int", "uint32", "uint32_t" | unsigned 4-byte integer | unsigned int
"longlong", "long long", "long long int", "signed long long", "signed long long int", "int64", "int64_t" | signed 8-byte integer | long long int
"ulonglong", "unsigned long long", "unsigned long long int", "uint64", "uint64_t" | unsigned 8-byte integer | unsigned long long int
"float" | 4-byte floating point | float
"double" | 8-byte floating point | double
"block" | An opaque chunk of memory with user-defined size (via the "block size:" specifier) | (none)

The type descriptors used are valid type declarations in C, C99, Matlab, Microsoft-land, or some other program. Notice that "char" is not a NRRD type descriptor, to avoid potential confusion associated with the inherent signed/unsigned ambiguity of the "char" C type. If the platform has different C type names for the types described (for example, if "int" is 8 bytes, or an "unsigned char" is 4), there will have to be a disconnect between the type implied by the type descriptor, and the actual types used. In other words, the NRRD format requires a binding between the first two columns in the chart above. The third column is just what the current nrrd implementation uses on most supported platforms; this has proven surprisingly portable.
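
The binding between the first two columns can be captured in a lookup table that maps each type descriptor to its size in bytes, independent of what the local C types happen to be. This is a sketch only; type_size and the table layout are hypothetical, and the comparison is case-insensitive because type descriptors are not string-valued fields.

```c
#include <assert.h>
#include <ctype.h>
#include <stddef.h>

struct type_entry { const char *name; size_t bytes; };

/* Descriptors and sizes copied from the table above. */
static const struct type_entry type_table[] = {
    {"signed char", 1}, {"int8", 1}, {"int8_t", 1},
    {"uchar", 1}, {"unsigned char", 1}, {"uint8", 1}, {"uint8_t", 1},
    {"short", 2}, {"short int", 2}, {"signed short", 2},
    {"signed short int", 2}, {"int16", 2}, {"int16_t", 2},
    {"ushort", 2}, {"unsigned short", 2}, {"unsigned short int", 2},
    {"uint16", 2}, {"uint16_t", 2},
    {"int", 4}, {"signed int", 4}, {"int32", 4}, {"int32_t", 4},
    {"uint", 4}, {"unsigned int", 4}, {"uint32", 4}, {"uint32_t", 4},
    {"longlong", 8}, {"long long", 8}, {"long long int", 8},
    {"signed long long", 8}, {"signed long long int", 8},
    {"int64", 8}, {"int64_t", 8},
    {"ulonglong", 8}, {"unsigned long long", 8},
    {"unsigned long long int", 8}, {"uint64", 8}, {"uint64_t", 8},
    {"float", 4}, {"double", 8}
};

/* Case-insensitive string equality (ASCII only). */
static int equal_ci(const char *a, const char *b)
{
    while (*a != '\0' && *b != '\0') {
        if (tolower((unsigned char)*a) != tolower((unsigned char)*b)) {
            return 0;
        }
        a++;
        b++;
    }
    return *a == *b;
}

/* Returns the sample size in bytes, or 0 for an unknown descriptor and
   for "block", whose size comes from the block size field instead. */
size_t type_size(const char *type)
{
    size_t i;

    assert(type != NULL);
    for (i = 0; i < sizeof type_table / sizeof type_table[0]; i++) {
        if (equal_ci(type, type_table[i].name)) {
            return type_table[i].bytes;
        }
    }
    return 0;
}
```

Note that type_size("char") is 0, reflecting that "char" is not a NRRD type descriptor.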

As currently defined, NRRD is simply not portable to platforms on which all the types described above (second column) are not available via some C type declaration or another. For example, in Windows, there is no "long long", so "__int64" is used instead. We will eventually have many computers in which the minimum addressable unit is larger than 8 bits, in which case NRRD will either have to be expanded to allow types with unaddressable values (in which case a bit type might as well be added), or, some rules will have to be defined for converting a smaller type into an addressable type during data read. Having addressable samples vastly simplifies the task of implementing array operations.

The block type is unlike the others. It is included for completeness in representation of the types available in the nrrd library, which uses this type to represent C structs or C++ objects: opaque chunks of memory that can be copied and permuted, but not interpreted as (or generated from) scalar values. The size of that chunk is given in the block size field specification. But block is not safe as a cross-platform general purpose type. Here are the special considerations:

  1. There is no way to fix the endianness of the block type, so it is not at all portable between machines with differing byte orders.
  2. In addition, if the block is representing a C struct or a C++ object, there are no guarantees that another machine would use the same amount of space (block size) to represent the same struct or object.
  3. ascii encoding is not possible with block type, but raw, and any alternate representation of raw (hex, and the optional compressions gzip and bzip2) are all valid.
One may be tempted to pack descriptive information about the block into the content field, and this is certainly possible, but if it is really important to represent general structures in a portable way, you shouldn't be using NRRD files. Use XDR (http://www.faqs.org/rfcs/rfc1014.html) instead.
block size: <int>
blocksize: <int>
Blocks are opaque chunks of memory of user-specified size. The block size specification gives the size of that chunk; it must be greater than zero. Specifically, it identifies the number of bytes between the beginning of one block and the beginning of the next. Perhaps confusingly, it says nothing about the block size of the file system in use. Knowing the block size allows the NRRD reader to know how many bytes of raw data should be read from file. The block size is often the value returned by the sizeof operator applied to a C struct or a C++ object, but as usual, this may not be the same as the sum of the sizes of the constituent members. See the points regarding the block type (above) for the restrictions and warnings on encoding of blocks.
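
The number of bytes of raw data follows from the sizes field and the size of one sample, which is the block size for the block type. A minimal sketch, with the hypothetical name raw_data_bytes and an overflow check added as a precaution:

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical helper: returns elem_bytes times the product of all axis
   sizes, or 0 if any factor is zero or the product would overflow. */
size_t raw_data_bytes(unsigned int dim, const size_t *sizes,
                      size_t elem_bytes)
{
    size_t total = elem_bytes;
    unsigned int axis;

    assert(dim > 0 && sizes != NULL);
    for (axis = 0; axis < dim; axis++) {
        if (sizes[axis] == 0 || total > SIZE_MAX / sizes[axis]) {
            return 0;
        }
        total *= sizes[axis];
    }
    return total;
}
```

For a one-dimensional nrrd of 10 blocks with "block size: 24", a reader would expect 240 bytes of raw data.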
encoding: <encoding>
The encoding field descriptor describes how the data (following the blank line, following the header) is formatted. Possible values for the encoding field descriptor, with associated meaning are:
<encoding> | Meaning | Standard detached suffix
"raw" | The data appears on disk exactly the same as in memory, in terms of byte values and byte ordering. Produced by write() and fwrite(), suitable for read() or fread(). | ".raw"
"txt", "text", "ascii" | Integral values are written/read as with printf()/sscanf(), and floating point values are used in a way consistent with Section 2. The individual values are separated by one or more whitespace characters (from the C string " \t\n\r\v\f"). No line terminations are required anywhere. Their presence is no different than any other kind of whitespace. | ".txt"
"hex" | The data is raw, but written with two (case-insensitive) hexadecimal characters per byte. White space characters (as defined above) are ignored on reading. Writers should put a line termination after every 70 characters, and after the last line of numbers. | ".hex"
"gz", "gzip" | The data is raw, but compressed with the gzip program. Implementation and specification is available from http://www.gzip.org/, but the nrrd library actually uses the zlib library available from http://www.gzip.org/zlib/. However, the compressed data must start with the gzip binary header, the same as is produced/read by the gzip/gunzip command-line tools. Compressed data starting with only the zlib binary header (from the underlying library) is not allowed. | ".raw.gz"
"bz2", "bzip2" | The data is raw, but compressed with the bzip2 program. Analogous to the gzip encoding, the compressed data must start with the same binary header as produced by the command-line bzip2 program, to ensure inter-operability with it. Implementation and information is available from http://sources.redhat.com/bzip2/. | ".raw.bz2"

The formatting for hex is mostly the same as the ASCIIHexDecode and the ASCIIHexEncode filters of PostScript, but they are not identical: PostScript allows multiple filters (data can be run-length encoded as well as hex-encoded), null ('\0') characters count as whitespace, and the end of the data is explicitly indicated by a ">". However, in combination with the line skip specifier, it is usually possible to extract 8-bit image data from PostScript files, assuming you understand enough PostScript to determine the image dimensions.
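
Reading the hex encoding amounts to pairing up hexadecimal digits while skipping whitespace. The sketch below shows one way to do this for text already held in memory; hex_decode and hex_value are hypothetical names, and the handling of the PostScript ">" terminator is deliberately left out.

```c
#include <assert.h>
#include <ctype.h>
#include <stddef.h>

/* Value of one hexadecimal digit (either case), or -1 if not a digit. */
static int hex_value(int c)
{
    if (c >= '0' && c <= '9') {
        return c - '0';
    }
    c = tolower(c);
    if (c >= 'a' && c <= 'f') {
        return c - 'a' + 10;
    }
    return -1;
}

/* Hypothetical helper: decodes at most outmax bytes from text, ignoring
   whitespace. Returns the number of bytes decoded, or -1 on an invalid
   character or a dangling half byte. Input left over after outmax bytes
   have been decoded is ignored. */
long hex_decode(const char *text, unsigned char *out, size_t outmax)
{
    size_t count = 0;
    int high = -1;                    /* pending high nibble, if any */
    int v;

    assert(text != NULL && out != NULL);
    for (; *text != '\0' && count < outmax; text++) {
        if (isspace((unsigned char)*text)) {
            continue;
        }
        v = hex_value((unsigned char)*text);
        if (v < 0) {
            return -1;
        }
        if (high < 0) {
            high = v;
        } else {
            out[count++] = (unsigned char)(high * 16 + v);
            high = -1;
        }
    }
    return high < 0 ? (long)count : -1;
}
```

In the default C locale, isspace() accepts exactly the whitespace characters listed for the ascii encoding.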

The "standard detached suffix" is the filename suffix that should be used by NRRD writers producing a separate data file in conjunction with a detached header. This is most important for the compressed encodings, because the stand-alone programs expect certain suffixes, and the filename of their decompressed output should end in ".raw". gzip and bzip2 are currently the only compression methods supported. NRRD readers, however, should not care about the filename suffix of a detached data file.
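
The gzip encoding requires that the compressed data begin with the gzip binary header rather than a bare zlib header. The Python sketch below shows one way a reader might check for this, using the two gzip magic bytes (0x1f 0x8b) defined by the gzip format. The helper name has_gzip_header is hypothetical.

```python
import gzip
import zlib

GZIP_MAGIC = b"\x1f\x8b"  # first two bytes of every gzip stream


def has_gzip_header(data: bytes) -> bool:
    """Return True if the compressed data begins with the gzip magic bytes."""
    return data[:2] == GZIP_MAGIC


raw = bytes(range(16))
gzip_stream = gzip.compress(raw)  # gzip header, as the gzip tool would write
zlib_stream = zlib.compress(raw)  # zlib header only: not allowed in a NRRD file

print(has_gzip_header(gzip_stream))  # True
print(has_gzip_header(zlib_stream))  # False
```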

Data file contents remaining after all data has been read should be ignored. This sanctions the strategy of using a detached nrrd header to refer to some smaller chunk of data in a separate larger data file. Data before the region of interest can be passed over with line skip and/or byte skip.

See the byte skip specification for information about how compression encoding changes its meaning.

There is complete orthogonality between the encoding of the data and whether the header is attached or detached. The header is never compressed; it is necessarily straight ASCII text.


endian: <endian>
<endian> Meaning Who
"little" Most significant bytes are at higher addresses ("little end first") Intel and compatible
"big" Most significant bytes are at lower addresses ("big end first") Everyone else

The convention with NRRD files is that non-ASCII data should reflect the byte ordering of the platform that wrote it. There is no preference for one endianness or the other in NRRD files, and NRRD writers should never have to worry about fixing endianness, only recording it when necessary. Fixing endianness is the responsibility of the NRRD reader. This way, NRRD readers and writers used within one platform never pay the overhead of fixing endianness. That overhead should only be incurred when going between platforms with different endiannesses.
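
A minimal Python sketch of this reader-side responsibility follows. It assumes 16-bit signed data and a hypothetical function name (read_int16_values); the byte swap happens only when the endianness recorded in the header differs from that of the reading platform.

```python
import array
import sys


def read_int16_values(raw: bytes, file_endian: str) -> list:
    """Decode raw 16-bit signed integers, swapping bytes only if the
    file's recorded endianness differs from this platform's."""
    values = array.array("h")  # "h" is a signed short
    if values.itemsize != 2:
        raise RuntimeError("this sketch assumes a 2-byte short")
    values.frombytes(raw)
    if file_endian != sys.byteorder:  # sys.byteorder is "little" or "big"
        values.byteswap()
    return values.tolist()


big_raw = (1).to_bytes(2, "big") + (258).to_bytes(2, "big")
little_raw = (1).to_bytes(2, "little") + (258).to_bytes(2, "little")
```

Both read_int16_values(big_raw, "big") and read_int16_values(little_raw, "little") recover the values 1 and 258, whatever the platform.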


content: <string>
The <string> field starts after the ": " colon/space pair separating the field identifier and descriptor, and continues until the line termination, with no explicit delimiting. There is no fixed limit on how long the line containing the content field can be.

This field is intended as the place to store a very concise textual description of the information in the array, similar to the title line (the second line) of a VTK file format header. The nrrd library, for instance, uses content to store a textual summary of the operations applied to a nrrd. If nrrdSlice() slices a nrrd with content "engine" along axis 0 at position 50, then the content of the result will be "slice(engine,0,50)".


min: <min>
This can be used to record the minimum value in the array, to save the effort of finding the extremal values after reading the data in. Any value, infinite or not, NaN or not, is valid. "nan" is the best way to say, "don't know", but in that case, the field shouldn't be written in the header anyway. Of course, the NRRD header has no way of ensuring that the information here is correct.
max: <max>
Same as the min field, but represents the maximum value in the array.
old min: <min>
oldmin: <min>
For integral data values which were produced as a result of linear quantization, this records the lowest input value that was mapped to the lowest output integral value. If a floating point nrrd with values ranging from 0.0 to 1.0 is quantized to 8 bits, old min will be 0.0. This is not the middle of the range of values that were all mapped to the lowest output integer, but the lowest of those values.

Infinite values are not valid, "nan" means "don't know".


old max: <max>
oldmax: <max>
Same as old min, but represents the highest of the input values that were mapped to the highest output integral value.
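
To make the meaning of old min and old max concrete, here is a small Python sketch of one possible linear quantization. The exact quantization formula is an assumption for illustration (equal-width bins over the input range) and the function names are hypothetical. It shows that old min is the lower edge of the range mapped to the lowest integer, and old max is the upper edge of the range mapped to the highest integer.

```python
def quantize(value, old_min, old_max, bits=8):
    """Map a value in [old_min, old_max] to an integer in [0, 2**bits - 1],
    assuming equal-width bins (an illustrative choice)."""
    levels = 1 << bits
    index = int((value - old_min) / (old_max - old_min) * levels)
    return min(max(index, 0), levels - 1)  # clamp, so old_max lands in the top bin


def bin_range(index, old_min, old_max, bits=8):
    """Return the (lowest, highest) input values mapped to a given integer."""
    levels = 1 << bits
    width = (old_max - old_min) / levels
    return (old_min + index * width, old_min + (index + 1) * width)


lowest_bin = bin_range(0, 0.0, 1.0)     # starts at old min = 0.0
highest_bin = bin_range(255, 0.0, 1.0)  # ends at old max = 1.0
```
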
data file: <file>
datafile: <file>
This is always optional, but it is the only means of distinguishing between an attached and a detached NRRD header. When it is present, it is interpreted according to Section 3, and the header is considered finished at the EOF, or at the blank line following the last field, and any data after the blank line (if present) is ignored. If this field is not present, the data is assumed to be in the same file as the header, following the blank line marking the end of the header.
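
The following Python sketch shows how a reader might decide between the attached and detached cases. It is deliberately simplified: it assumes the first line is the magic, treats lines starting with "#" as comments, recognizes only "field: descriptor" lines, and uses hypothetical function names (parse_header, is_detached).

```python
def parse_header(text: str) -> dict:
    """Collect "field: descriptor" lines up to the first blank line or EOF."""
    fields = {}
    lines = text.split("\n")
    for line in lines[1:]:           # lines[0] is the magic, e.g. "NRRD0001"
        line = line.rstrip("\r")
        if line == "":               # blank line: the header is finished
            break
        if line.startswith("#"):     # comment line (assumed syntax)
            continue
        if ": " in line:
            name, descriptor = line.split(": ", 1)
            fields[name.strip().lower()] = descriptor
    return fields


def is_detached(fields: dict) -> bool:
    """A header is detached exactly when the data file field is present."""
    return "data file" in fields or "datafile" in fields


detached = "NRRD0001\ntype: float\nsizes: 3 4\ndata file: vol.raw\n\nignored"
attached = "NRRD0001\ntype: float\nsizes: 3 4\n\n<data follows here>"
```
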
line skip: <skip>
lineskip: <skip>
This is most useful in a detached header. It tells the NRRD reader to skip some number of lines (greater than or equal to zero) in the data file in order to get to where the data actually begins. This enables detached NRRD headers to access data in VTK files, for instance. If this is used in an attached NRRD header, then the lines are skipped after the blank line at the end of the NRRD header. The definition of a "line" in this context is the same as given at the beginning of Section 1. When this field does not appear, skip is taken to be zero. Negative values are not valid.

When used in combination with byte skip, the line skipping is done before the byte skipping. The meaning of line skip is not affected by the encoding field.


byte skip: <skip>
byteskip: <skip>
Like line skip, this is most useful in a detached header. It tells how many bytes (greater than or equal to zero) to skip in a data file in order to get to the beginning of the data. By definition, the bytes are skipped according to the action of fgetc(). When used in combination with line skip, the byte skipping is done after the line skipping. When this field does not appear, skip is taken to be zero. Negative values are not valid.

The interpretation of byte skip depends on whether the encoding used is a form of compression. The only compressions supported in NRRD0001 are gzip and bzip2. In uncompressed encodings, the byte skipping is done just like the line skipping: within the data file, so as to locate the beginning of the data, and prior to the decoding of any data. In compressed encodings, however, the line skipping is done first, and then the decompression begins. The byte skipping is done within the stream of decompressed data.
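
The ordering of line skip, decompression, and byte skip can be sketched in Python as follows. The function name locate_data is hypothetical, and for brevity a "line" is taken to end at "\n", which may be narrower than the definition in Section 1.

```python
import bz2
import gzip
import io

DECOMPRESSORS = {
    "gz": gzip.decompress, "gzip": gzip.decompress,
    "bz2": bz2.decompress, "bzip2": bz2.decompress,
}


def locate_data(file_bytes: bytes, line_skip: int = 0, byte_skip: int = 0,
                encoding: str = "raw") -> bytes:
    """Return the bytes starting where the data begins."""
    if line_skip < 0 or byte_skip < 0:
        raise ValueError("negative skip values are not valid")
    stream = io.BytesIO(file_bytes)
    for _ in range(line_skip):       # line skipping always comes first,
        stream.readline()            # in the file as stored
    if encoding in DECOMPRESSORS:
        # Compressed: bytes are skipped in the decompressed stream.
        return DECOMPRESSORS[encoding](stream.read())[byte_skip:]
    # Uncompressed: bytes are skipped in the file itself.
    stream.seek(byte_skip, io.SEEK_CUR)
    return stream.read()


payload = b"HDR!" + bytes(range(8))  # 4-byte binary header, then data
raw_file = b"comment line\n" + payload
gz_file = b"comment line\n" + gzip.compress(payload)
```

With line skip 1 and byte skip 4, both files yield the same eight data bytes. Any bytes past the end of the expected data would be ignored by the caller.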

The reason for skipping bytes but not lines in the decompressed stream is the conceptual difference between ASCII and binary headers. One reason to write headers in ASCII is to make them human readable, so they probably shouldn't be compressed to begin with. Also, ASCII headers (such as in PNM images) often allow multiple lines of optional comments, so the number of lines to skip has to be determined on a per-file basis by looking at the (uncompressed) file, at which point the data might as well be written out as a NRRD file. In contrast, binary headers (such as in many non-DICOM images saved from medical scanners) are very often fixed length, and not human readable, which means that when the header and data are compressed together, the beginning of the data can be easily found via a byte skip offset. This also applies to large datasets written by FORTRAN programs, for which even "raw" data can be preceded by a four-byte representation of the data length.


number: <string>
In the early days of NRRD (when the magic was NRRD00.01), the number of elements in the whole array had to be explicitly given in the header, even though this is entirely redundant with the information implied by the sizes field. The number of bytes in the (perhaps uncompressed) data can always be determined by the product of the sizes, multiplied by the byte length of one element, which is determined from the type, and possibly blocksize fields. In keeping with the principle of making NRRD headers as concise as possible, the number field should never be written, and always ignored on reading, without even an attempt to parse the field as an integer.
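
The redundancy of the number field can be seen from a short Python sketch that computes the data length from sizes and type alone. The type table is abbreviated, its byte lengths are the usual ones for these type names, and the function name data_byte_count is hypothetical.

```python
from math import prod

# Abbreviated table of byte lengths per element (illustrative subset).
TYPE_BYTES = {"uchar": 1, "short": 2, "int": 4, "float": 4, "double": 8}


def data_byte_count(sizes, type_name, block_size=None):
    """Bytes in the (uncompressed) data: product of sizes times element length."""
    if type_name == "block":
        if block_size is None:
            raise ValueError("type block requires a block size")
        element = block_size
    else:
        element = TYPE_BYTES[type_name]
    return prod(sizes) * element


volume_bytes = data_byte_count([256, 256, 109], "short")
```
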

5. Per-Axis Specifications

The individual field descriptors in these specifications are delimited by one or more spaces (" ") or tabs ("\t"), or some combination of the two, but no other kinds of white space delimiters are valid.
sizes: <size[0]> <size[1]> ... <size[dim-1]>
All the <size[i]> are integers greater than 0. <size[i]> is the number of samples along axis i in the array.
spacings: <space[0]> <space[1]> ... <space[dim-1]>
This field describes how sample spacing along each axis can vary among the axes, common in medical datasets where the slice spacing is different than the within-slice pixel spacing. Spacing values of positive and negative infinity are not allowed, nor is zero. Positive and negative finite values are allowed, as is NaN.

Because there must be one spacing for each axis, spacings must be given for axes which don't logically have a spatial component, such as the RGB axis of color image data, which is usually axis 0. Rather than invent a value (such as 1.0) for sample spacing where no value is sensible, a spacing value of "nan" should be used instead. In addition, "nan" can represent the fact that spacing information would be sensible here, but simply isn't known. Of course, if spacings are NaN for every axis, the field probably shouldn't be in the header.

The meaning and interpretation of the spacings field is independent of the centers, axis mins and axis maxs fields, even though mutually incompatible settings are possible.
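
A Python sketch of parsing and validating a spacings descriptor under the rules above follows. It assumes Python's float() is an acceptable stand-in for the floating point parsing of Section 2, and the function name parse_spacings is hypothetical.

```python
import math
import re


def parse_spacings(descriptor: str, dim: int) -> list:
    """Split on spaces and tabs only, then check each spacing value."""
    tokens = [t for t in re.split(r"[ \t]+", descriptor) if t]
    if len(tokens) != dim:
        raise ValueError("need exactly one spacing per axis")
    spacings = [float(t) for t in tokens]
    for s in spacings:
        # NaN and finite non-zero values pass; infinities and zero do not.
        if math.isinf(s) or s == 0.0:
            raise ValueError("spacing may not be infinite or zero")
    return spacings


# RGB axis first (no sensible spacing), then three spatial axes.
rgb_volume = parse_spacings("nan 1.0\t1.0 2.5", 4)
```
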


axis mins: <min[0]> <min[1]> ... <min[dim-1]>
axismins: <min[0]> <min[1]> ... <min[dim-1]>
In those cases where the samples along an axis are logically located along a certain range in some assumed world space, the axis mins information gives the lower bound of that range. These cases are probably a superset of those cases where spacings information is meaningful. In a computer graphics context, this allows representation of the lower bounds in the (U,V) space of the image plane that was sampled during the rendering process. Also, in order to be meaningful, univariate histograms and multi-dimensional scatterplots require the use of the axis mins field.

Infinite values are not valid as axis mins. Any non-infinite values, including zero, are valid. As with spacings information, the use of "nan" as an axis min value is probably preferable to inventing one where no value is meaningful or known.

Presence of the axis mins field does not require presence of the axis maxs field, although it is often useful for these to appear together. However, using the axis mins field alone can emulate the ORIGIN field of the VTK file format header.


axis maxs: <max[0]> <max[1]> ... <max[dim-1]>
axismaxs: <max[0]> <max[1]> ... <max[dim-1]>
This field is useful in the same contexts as axis mins; it specifies the upper bound of the axes in some assumed world space. Which values are valid for this field are identical to those of axis mins, and the utility of "nan" is also the same.

The settings of axis mins and axis maxs would seem to imply a value for spacings, but this also depends on the values of centers. Mutually incompatible settings of these fields are possible to save in a NRRD header, but it is not the job of the NRRD reader to ensure their consistency, only to check that the individual values in isolation are sensible (for instance, an axis max can't be infinite).


centers: <cent[0]> <cent[1]> ... <cent[dim-1]>
This field indicates if the information along each axis is cell or node centered, or if neither is known.
<cent[i]> Meaning Examples
"cell" The location of the sample is centered in the interior of the grid element. Histograms, scatterplots, images for mip-maps, images in contexts in which a pixel can be correctly thought of as "a little square", volumes as a grid of cuberilles.
"node" The location of the sample is at the boundary between grid elements. Volumes as a grid of "voxels", and pretty much any multi-dimensional signal processing context in which something other than nearest neighbor filtering is being applied.
"???" Centering information for this axis is either meaningless or unknown. Any non-spatial axis, such as a short axis for vector or tensor components, preceding all the spatial axes.
As one example of the distinction between cell and node centering, suppose that some axis has axis min 0.0, axis max 1.0, with five samples. In node-centered sampling, the samples would be "located" at positions 0.00, 0.25, 0.50, 0.75, and 1.00, for a spacing of 0.25. In cell-centered sampling, the samples would be "located" at positions 0.10, 0.30, 0.50, 0.70, and 0.90, for a spacing of 0.20.
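
The arithmetic in this example can be reproduced with a short Python sketch. The function name sample_positions is hypothetical, and the node-centered case assumes at least two samples along the axis.

```python
def sample_positions(axis_min, axis_max, size, center):
    """Return (spacing, positions) implied by axis min, axis max and centering."""
    if center == "node":
        # Samples sit on the boundaries, so the range spans size - 1 intervals.
        spacing = (axis_max - axis_min) / (size - 1)
        return spacing, [axis_min + i * spacing for i in range(size)]
    if center == "cell":
        # Samples sit in the middle of each of the size cells.
        spacing = (axis_max - axis_min) / size
        return spacing, [axis_min + (i + 0.5) * spacing for i in range(size)]
    raise ValueError("centering must be 'node' or 'cell'")


node_spacing, node_pos = sample_positions(0.0, 1.0, 5, "node")
cell_spacing, cell_pos = sample_positions(0.0, 1.0, 5, "cell")
```

Up to floating point rounding, node_pos and cell_pos match the positions listed above, with spacings 0.25 and 0.20 respectively.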


labels: "<label[0]>" "<label[1]>" ... "<label[dim-1]>"
This allows the axes to be "named", which is helpful and descriptive in functions where the axes of an output nrrd are a subset of the axes of an input nrrd, such as with slicing or projecting. In scatterplots and lookup tables, the axis labels can name the quantity associated with each axis. The label strings aren't otherwise parsed or interpreted by functions that operate on nrrd arrays, other than to simply associate a label with an axis through all operations in which an axis is logically preserved. For example, slicing a nrrd with labels "X", "Y", and "Z" along axis 2 should result in a nrrd with labels "X" and "Y".

As shown above, each label is delimited by double quotes. Within each label, double quotes may be included by escaping them (\"), but no other form of escaping is supported. For axes with no labels, use a quoted empty string ("").
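
One way to parse such a descriptor is sketched below in Python. The function name parse_labels is hypothetical. The only escape recognized is a backslash followed by a double quote; any other backslash is kept literally.

```python
def parse_labels(descriptor: str) -> list:
    """Parse quote-delimited labels separated by spaces and/or tabs."""
    labels = []
    i, n = 0, len(descriptor)
    while i < n:
        if descriptor[i] in " \t":       # delimiter between labels
            i += 1
            continue
        if descriptor[i] != '"':
            raise ValueError("each label must start with a double quote")
        i += 1
        chars = []
        while True:
            if i >= n:
                raise ValueError("unterminated label")
            ch = descriptor[i]
            if ch == "\\" and i + 1 < n and descriptor[i + 1] == '"':
                chars.append('"')        # escaped double quote
                i += 2
            elif ch == '"':              # closing quote
                i += 1
                break
            else:
                chars.append(ch)
                i += 1
        labels.append("".join(chars))
    return labels


axes = parse_labels('"X" "Y" ""')                 # third axis has no label
quoted = parse_labels(r'"say \"hi\""' + '\t"Z"')  # escaped quotes inside a label
```
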

There is no fixed limit on how long the line containing the labels field can be.


units: "<unit[0]>" "<unit[1]>" ... "<unit[dim-1]>"
For all practical purposes, this is just like the labels field, in that the field gives a quote-delimited string for each axis. As with labels, these strings are not otherwise parsed or interpreted by functions, but should remain associated with an axis whenever sensible. The intended role of this field is to allow saving the units of the world space associated with the spacings, axis mins, and axis maxs fields. For scalar arrays in which every axis is spatial, and logically lives in the same space, the units information will unfortunately be repeated for all axes. Having separate units for all axes is more compelling in the case of multi-dimensional scatterplots and spectral image data.

6. Future Extensions

I have some ideas on how the NRRD file format may be extended in the future, but these are not likely to happen within the next year. If they are developed, it will be with the introduction of a slightly different magic, to be distinct from NRRD0001.