netCDF  4.3.0
Limitations of NetCDF

The netCDF classic data model is widely applicable to data that can be organized into a collection of named array variables with named attributes, but there are some important limitations to the model and its implementation in software.

Some of these limitations have been removed or relaxed in netCDF-4 files, but they still apply to netCDF classic and netCDF 64-bit offset files.

Currently, netCDF classic and 64-bit offset formats offer a limited number of external numeric data types: 8-, 16-, 32-bit integers, or 32- or 64-bit floating-point numbers. (The netCDF-4 format adds 64-bit integer types and unsigned integer types.)
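As a rough illustration of what the limited set of classic integer types means in practice, the following Python sketch picks the narrowest classic integer type that can hold a given value. The NC_* names follow the netCDF C API; the table, the helper function, and the treatment of NC_BYTE as signed are assumptions made for this example and are not part of any netCDF library.

```python
# Sizes in bytes of the classic-model numeric external types.
# The NC_* names come from the netCDF C API; this table is illustrative only.
CLASSIC_NUMERIC_TYPES = {
    "NC_BYTE": 1,    # 8-bit integer (treated as signed here)
    "NC_SHORT": 2,   # 16-bit signed integer
    "NC_INT": 4,     # 32-bit signed integer
    "NC_FLOAT": 4,   # 32-bit floating point
    "NC_DOUBLE": 8,  # 64-bit floating point
}


def smallest_classic_int_type(value):
    """Hypothetical helper: narrowest classic integer type that holds value."""
    for name in ("NC_BYTE", "NC_SHORT", "NC_INT"):
        bits = 8 * CLASSIC_NUMERIC_TYPES[name]
        if -(2 ** (bits - 1)) <= value <= 2 ** (bits - 1) - 1:
            return name
    # No classic integer type is wide enough; a netCDF-4 type such as
    # a 64-bit or unsigned integer would be needed instead.
    return None
```

A value such as 2**31, which fits an unsigned 32-bit integer, has no integer home in the classic formats and makes the helper return None.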

With the netCDF-4/HDF5 format, new unsigned integers (of various sizes), 64-bit integers, and the string type allow improved expression of meaning in scientific data. The new VLEN (variable length) and COMPOUND types allow users to organize data in new ways.

With the classic netCDF file format, there are constraints on how a dataset must be structured in order to store more than 2 GiBytes of data in a single netCDF dataset (see Classic Limitations). A GiByte is 2^30 or 1,073,741,824 bytes, as compared to a Gbyte, which is 1,000,000,000 bytes. This limitation is a result of the 32-bit offsets used for storing relative offsets within a classic netCDF format file. Since one of the goals of netCDF is portable data, and some file systems still can't deal with files larger than 2 GiB, it is best to keep files that must be portable below this limit. Nevertheless, it is possible to create and access netCDF files larger than 2 GiB on platforms that provide support for such files (see Large File Support).
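The arithmetic behind the 2 GiB figure can be checked with a short Python sketch. It assumes the 32-bit offsets are interpreted as signed values, which is what yields a limit of 2^31 bytes; the constant and function names are hypothetical.

```python
import struct

GIB = 2 ** 30   # 1,073,741,824 bytes
GB = 10 ** 9    # 1,000,000,000 bytes

# Largest value of a signed 32-bit offset (an assumption of this sketch).
MAX_OFFSET_32 = 2 ** 31 - 1


def fits_in_32bit_offset(offset):
    """Return True if a non-negative byte offset fits in a signed 32-bit field."""
    if offset < 0:
        return False
    try:
        # ">i" is a big-endian signed 32-bit integer; packing fails on overflow.
        struct.pack(">i", offset)
        return True
    except struct.error:
        return False
```

An offset of 2 * GIB - 1 still fits, while 2 * GIB does not. Note also that 2 GiB is 147,483,648 bytes larger than 2 Gbytes.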

The new 64-bit offset format allows large files, and makes it easy to create fixed variables of about 4 GiB, and record variables of about 4 GiB per record (see 64 bit Offset Limitations). However, old netCDF applications will not be able to read 64-bit offset files until they are upgraded to at least version 3.6.0 of netCDF (i.e., the version in which the 64-bit offset format was introduced).

With the netCDF-4/HDF5 format, size limitations are further relaxed, and files can be as large as the underlying file system supports. NetCDF-4/HDF5 files cannot be read by versions of the netCDF library before 4.0.
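Because each format requires a minimum library version, it can help to know which format a file uses before trying to open it. The sketch below inspects the leading bytes of a file: classic files begin with "CDF" followed by a version byte of 1, 64-bit offset files use a version byte of 2, and HDF5 files begin with an 8-byte HDF5 signature. The function name is hypothetical, and the sketch assumes the HDF5 signature sits at offset 0 (no user block). An HDF5 signature alone does not prove the file was written by netCDF-4.

```python
HDF5_SIGNATURE = b"\x89HDF\r\n\x1a\n"


def detect_netcdf_format(header):
    """Hypothetical helper: classify a file from its first 8 bytes."""
    if header[:8] == HDF5_SIGNATURE:
        return "HDF5 (possibly netCDF-4)"
    if header[:3] == b"CDF":
        version = header[3:4]
        if version == b"\x01":
            return "classic"
        if version == b"\x02":
            return "64-bit offset"
    return "unknown"
```

In use, the header would typically be obtained by opening the file in binary mode and reading its first 8 bytes.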

Another limitation of the classic (and 64-bit offset) model is that only one unlimited (changeable) dimension is permitted for each netCDF dataset. Multiple variables can share an unlimited dimension, but then they must all grow together. Hence the classic netCDF model does not permit variables with several unlimited dimensions or the use of multiple unlimited dimensions in different variables within the same dataset. Variables that have non-rectangular shapes (for example, ragged arrays) cannot be represented conveniently.
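One way to see why variables sharing the unlimited dimension must grow together is to look at how the classic format interleaves them on disk: each record holds one slice of every record variable, so adding a record extends all of them at once. The following Python sketch is a simplified model of that layout. It assumes each variable's per-record size is padded to a multiple of 4 bytes and ignores special cases such as a dataset with a single record variable; the function names and the example sizes are hypothetical.

```python
def pad4(nbytes):
    """Round a byte count up to the next multiple of 4."""
    return (nbytes + 3) // 4 * 4


def record_layout(record_vars, data_start):
    """Compute where each record variable begins and the size of one record.

    record_vars is a list of (name, bytes_per_record) pairs in definition order.
    """
    begins = {}
    offset = data_start
    for name, nbytes in record_vars:
        begins[name] = offset
        offset += pad4(nbytes)
    record_size = offset - data_start
    return begins, record_size


def record_offset(begins, record_size, name, n):
    """Byte offset of record n of the named variable."""
    return begins[name] + n * record_size
```

With two record variables of 40 and 6 bytes per record starting at byte 1000, every record occupies 48 bytes, and record 2 of the second variable starts at byte 1136.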

In netCDF-4/HDF5 files, multiple unlimited dimensions are fully supported. Any variable can be defined with any combination of limited and unlimited dimensions.

The extent to which data can be completely self-describing is limited: there is always some assumed context without which sharing and archiving data would be impractical. NetCDF permits storing meaningful names for variables, dimensions, and attributes; units of measure in a form that can be used in computations; text strings for attribute values that apply to an entire dataset; and simple kinds of coordinate system information. But for more complex kinds of metadata (for example, the information necessary to provide accurate georeferencing of data on unusual grids or from satellite images), it is often necessary to develop conventions.

Specific additions to the netCDF data model might make some of these conventions unnecessary or allow some forms of metadata to be represented in a uniform and compact way. For example, adding explicit georeferencing to the netCDF data model would simplify elaborate georeferencing conventions at the cost of complicating the model. The problem is finding an appropriate trade-off between the richness of the model and its generality (i.e., its ability to encompass many kinds of data). A data model tailored to capture the shared context among researchers within one discipline may not be appropriate for sharing or combining data from multiple disciplines.

The classic netCDF data model (which is used for classic-format and 64-bit offset format data) does not support nested data structures such as trees, nested arrays, or other recursive structures. Through use of indirection and conventions it is possible to represent some kinds of nested structures, but the result may fall short of the netCDF goal of self-describing data.
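As a minimal sketch of what indirection and conventions can look like, a ragged array can be stored in the classic model as two ordinary one-dimensional variables: one holding all values end to end, and one holding the length of each row. The function names below are hypothetical, and the layout is just one possible convention.

```python
def flatten_ragged(rows):
    """Split a list of variable-length rows into flat values plus row lengths."""
    values = [v for row in rows for v in row]
    counts = [len(row) for row in rows]
    return values, counts


def rebuild_ragged(values, counts):
    """Reassemble the rows from the flat values and the row lengths."""
    rows = []
    start = 0
    for n in counts:
        rows.append(values[start:start + n])
        start += n
    return rows
```

A reader that does not know the convention sees only two unrelated one-dimensional variables, which is why such a representation may fall short of being self-describing.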

In netCDF-4/HDF5 format files, the introduction of the compound type allows the creation of complex data types, involving any combination of types. The VLEN type allows efficient storage of ragged arrays, and the introduction of hierarchical groups allows users new ways to organize data.

Finally, using the netCDF-3 programming interfaces, concurrent access to a netCDF dataset is limited. One writer and multiple readers may access data in a single dataset simultaneously, but there is no support for multiple concurrent writers.

NetCDF-4 supports parallel read/write access to netCDF-4/HDF5 files using the underlying HDF5 library, and parallel read/write access to classic and 64-bit offset files using the parallel-netcdf library.

For more information about HDF5, see the HDF5 web site: http://hdfgroup.org/HDF5/.

For more information about parallel-netcdf, see their web site: http://www.mcs.anl.gov/parallel-netcdf.


Generated on Tue Jul 9 2013 19:17:28 for netCDF. NetCDF is a Unidata library.