Appendix D. NetCDF-4 Filter Support {#filters}
==================================

> See @ref nc_filters_quickstart for tips to get started quickly with NetCDF-4 Filter Support.

## Filters Overview {#filters_overview}

NetCDF-C filters have some features of which the user should be aware.

* ***Auto Install of filters***<br>
An option is now provided to automatically install
HDF5 filters into a default location or, optionally,
into a user-specified location.
This is described in the *pluginpath.md* document.

* ***NCZarr Filter Support***<br>
[NCZarr filters](#filters_nczarr) are now supported.
This means that it is possible to specify
Zarr Codecs (the Zarr equivalent of filters) in Zarr files
and have them processed using HDF5-style wrapper shared libraries.
Zarr filters can be used even if HDF5 support is disabled
in the netCDF-C library.

## Introduction to Filters {#filters_introduction}

The netCDF library supports a general filter mechanism that transforms
the data of a variable as it is written and transforms it back as it is read.
The most common kind of filter is a compression-decompression
filter, and that is the focus of this document.
But non-compression filters, such as fletcher32 (a checksum filter), also exist.

This document describes the support for HDF5 filters and also
the newly added support for NCZarr filters.

The netCDF enhanced (aka netCDF-4) library inherits this
capability since it depends on the HDF5 library. The HDF5
library (1.8.11 and later) supports filters, and netCDF is based
closely on that underlying HDF5 mechanism.

Filters assume that a variable has chunking defined: each
chunk is filtered before writing and "unfiltered" after reading,
before the data is passed to the user. If
multiple filters are defined on a variable, they are applied in
first-defined order when writing and in the reverse order when
reading.

There is an important caveat with respect to filters
and their application to variables.
If the type of a variable is variable-sized, then filters
cannot be applied to that variable.
In this case, the call to *nc\_def\_var\_filter* will succeed,
but the filter will be suppressed and a warning will be logged.
Similarly, if an existing file is opened and it contains a
variable of variable-sized type that has a filter, then that variable will be
suppressed and will be inaccessible through the netcdf-c API.

The concept of a variable-sized type is defined as follows:

1. The type NC_STRING is variable-sized.
2. Any user-defined type of the class NC_VLEN is variable-sized.
3. If a compound type has any field that is (transitively) variable-sized,
then that compound type is variable-sized.
4. All other types are fixed-size.
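
The four rules above amount to a small recursive check. The sketch below is illustrative only: it models types with a hypothetical `sketch_type` structure instead of calling the library. With libnetcdf, the class of a user-defined type would come from *nc\_inq\_user\_type* and the field types of a compound from *nc\_inq\_compound\_field*.

```c
#include <stddef.h>

/* Hypothetical model of a netCDF type, for illustration only. */
enum sketch_class { SK_ATOMIC, SK_STRING, SK_VLEN, SK_OPAQUE, SK_ENUM, SK_COMPOUND };

struct sketch_type {
    enum sketch_class cls;
    size_t nfields;                          /* used only by SK_COMPOUND */
    const struct sketch_type *const *fields; /* field types of a compound */
};

/* Returns 1 if the type is variable-sized under rules 1-4, else 0. */
static int is_variable_sized(const struct sketch_type *t)
{
    switch (t->cls) {
    case SK_STRING:   /* rule 1: NC_STRING */
    case SK_VLEN:     /* rule 2: any NC_VLEN user-defined type */
        return 1;
    case SK_COMPOUND: /* rule 3: any (transitively) variable-sized field */
        for (size_t i = 0; i < t->nfields; i++)
            if (is_variable_sized(t->fields[i]))
                return 1;
        return 0;
    default:          /* rule 4: other atomic types, opaque, enum */
        return 0;
    }
}
```

Because the compound case recurses into each field, a string nested two compounds deep still makes the outer type variable-sized, which is what "transitively" in rule 3 requires.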

## A Warning on Backward Compatibility {#filters_compatibility}

The API defined in this document should accurately reflect the
current state of filters in the netCDF-c library. Be aware that
there was a short period in which the filter code was undergoing
some revision and extension. Those extensions have largely been
reverted. Unfortunately, some users may experience
compilation problems with previously working code because of
these reversions. In that case, please revise your code to
adhere to this document. Apologies are extended for any
inconvenience.

A user may encounter an incompatibility if any of the following appears in user code.

* The function *nc\_inq\_var\_filter* was returning the error value NC\_ENOFILTER if a variable had no associated filters.
It has been reverted to the previous behavior, where it returns NC\_NOERR and sets the returned filter id to zero if the variable has no filters.
* The function *nc\_inq\_var\_filterids* was renamed to *nc\_inq\_var\_filter\_ids*.
* Some auxiliary functions for parsing textual filter specifications have been moved to the file *netcdf\_aux.h*. See [Appendix A](#filters_appendixa).
* All of the "filterx" functions have been removed. This is unlikely to cause problems because they had limited visibility.

For additional information, see [Appendix B](#filters_appendixb).

## Enabling An HDF5 Compression Filter {#filters_enable}

HDF5 supports dynamic loading of compression filters using the
following process for reading compressed data.

1. Assume that we have a dataset with one or more variables that were compressed using some algorithm.
How the dataset was compressed will be discussed subsequently.
2. Shared libraries or DLLs exist that implement the compress/decompress algorithm.
These libraries have a specific API so that the HDF5 library can locate, load, and utilize the compressor.
3. These libraries are expected to be installed in a specific directory.

In order to compress a variable with an HDF5-compliant filter,
the netcdf-c library must be given three pieces of information:

1. some unique identifier for the filter to be used,
2. a vector of parameters for controlling the action of the compression filter, and
3. access to a shared library implementation of the filter.

The meaning of the parameters is, of course, completely filter
dependent, and the filter description [3] needs to be consulted.
For bzip2, for example, a single parameter is provided
representing the compression level. It is legal to provide a
zero-length set of parameters. Defaults are not provided, so
this assumes that the filter can operate with zero parameters.

Filter ids are assigned by the HDF Group. See [4] for a current
list of assigned filter ids. Note that ids above 32767 can be
used for testing without registration.

The first two pieces of information can be provided in one of
three ways: (1) using *ncgen*, (2) via an API call, or (3) via
command line parameters to *nccopy*. In any case, remember that
filtering also requires chunking, so the variable must
also be marked with chunking information. If compression is set
for a non-chunked variable, the variable will forcibly be
converted to chunked using a default chunking algorithm.

## Using The API {#filters_API}

The necessary API methods are included in *netcdf\_filter.h* by default.
These functions implicitly use the HDF5 mechanisms and may produce an error if applied to a file format that is not compatible with the HDF5 mechanism.

### nc\_def\_var\_filter
Add a filter to the set of filters to be used when writing a variable. This must be invoked after the variable has been created and before *nc\_enddef* is invoked.

```
int nc_def_var_filter(int ncid, int varid, unsigned int id,
                      size_t nparams, const unsigned int* params);
```

Arguments:

* ncid: File and group ID.
* varid: Variable ID.
* id: Filter ID.
* nparams: Number of filter parameters.
* params: Filter parameters (a vector of unsigned integers).

Return codes:

* NC\_NOERR: No error.
* NC\_ENOTNC4: Not a netCDF-4 file.
* NC\_EBADID: Bad ncid or bad filter id.
* NC\_ENOTVAR: Invalid variable ID.
* NC\_EINDEFINE: Called when not in define mode.
* NC\_ELATEDEF: Called after the variable's definition was completed (for example, after *nc\_enddef*).
* NC\_EINVAL: Scalar variable; or parallel access is enabled and parallel filters are not supported; or *nparams* or *params* is invalid.
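
Because filters are applied in first-defined order when writing, a chain such as bzip2 followed by szip is defined by calling *nc\_def\_var\_filter* once per filter, in order. The following sketch is a hypothetical helper, not part of the netCDF API: `define_filter_chain` takes the definition function as a pointer with the same signature as *nc\_def\_var\_filter*, and `recording_def` is a stand-in that only records the ids it receives, so the code runs without libnetcdf.

```c
#include <stddef.h>

/* Same signature as nc_def_var_filter(). */
typedef int (*def_filter_fn)(int ncid, int varid, unsigned int id,
                             size_t nparams, const unsigned int *params);

/* One element of a filter chain: a filter id and its parameter vector. */
struct filter_spec {
    unsigned int id;
    size_t nparams;
    const unsigned int *params;
};

/* Defines the filters in chain order, so chain[0] is applied first when
   writing. Returns 0 (NC_NOERR) or the first nonzero status encountered. */
static int define_filter_chain(def_filter_fn def, int ncid, int varid,
                               const struct filter_spec *chain, size_t n)
{
    for (size_t i = 0; i < n; i++) {
        int stat = def(ncid, varid, chain[i].id, chain[i].nparams, chain[i].params);
        if (stat != 0)
            return stat;
    }
    return 0;
}

/* Hypothetical stand-in for nc_def_var_filter that only records the ids it
   is given, so the sketch can be exercised without libnetcdf. */
static unsigned int recorded_ids[8];
static size_t nrecorded = 0;

static int recording_def(int ncid, int varid, unsigned int id,
                         size_t nparams, const unsigned int *params)
{
    (void)ncid; (void)varid; (void)nparams; (void)params;
    if (nrecorded == sizeof recorded_ids / sizeof recorded_ids[0])
        return -1;                     /* arbitrary error code for the stand-in */
    recorded_ids[nrecorded++] = id;
    return 0;
}
```

In real code, while the file is in define mode, one would pass `nc_def_var_filter` itself as the first argument. A chain of `{307, {9}}` followed by `{4, {32, 32}}` corresponds to the text specification `307,9|4,32,32` used elsewhere in this document.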

### nc\_inq\_var\_filter\_ids
Query a variable to obtain a list of the ids of all filters associated with that variable.

```
int nc_inq_var_filter_ids(int ncid, int varid, size_t* nfiltersp, unsigned int* filterids);
```

Arguments:

* ncid: File and group ID.
* varid: Variable ID.
* nfiltersp: Stores the number of filters found; may be zero.
* filterids: Stores the set of filter ids.

Return codes:

* NC\_NOERR: No error.
* NC\_ENOTNC4: Not a netCDF-4 file.
* NC\_EBADID: Bad ncid.
* NC\_ENOTVAR: Invalid variable ID.

The number of filters associated with the variable is stored in *nfiltersp* (it may be zero).
The set of filter ids will be returned in *filterids*.
As is usual with the netcdf API, one is expected to call this function twice:
the first call obtains *nfiltersp*, and the second retrieves the filter ids into client-allocated memory.
Either output argument (*nfiltersp* or *filterids*) can be NULL, in which case no value is returned for it.

### nc\_inq\_var\_filter\_info
Query a variable to obtain information about a specific filter associated with the variable.

```
int nc_inq_var_filter_info(int ncid, int varid, unsigned int id, size_t* nparamsp, unsigned int* params);
```

Arguments:

* ncid: File and group ID.
* varid: Variable ID.
* id: The filter id of interest.
* nparamsp: Stores the number of parameters.
* params: Stores the set of filter parameters.

Return codes:

* NC\_NOERR: No error.
* NC\_ENOTNC4: Not a netCDF-4 file.
* NC\_EBADID: Bad ncid.
* NC\_ENOTVAR: Invalid variable ID.
* NC\_ENOFILTER: Filter not defined for the variable.

The *id* indicates the filter of interest.
The actual parameters are stored in *params*.
The number of parameters is returned in *nparamsp*.
As is usual with the netcdf API, one is expected to call this function twice:
the first call obtains *nparamsp*, and the second retrieves the parameters into client-allocated memory.
Either output argument (*nparamsp* or *params*) can be NULL, in which case no value is returned for it.
If the specified id is not attached to the variable, then NC\_ENOFILTER is returned.
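
The two-call convention of *nc\_inq\_var\_filter\_ids* and *nc\_inq\_var\_filter\_info* can be combined to rebuild a text specification such as `307,9|4,32,32` for a variable. In this sketch the inquiry functions are passed as pointers with the signatures shown above. `filter_chain_spec`, `fake_ids`, and `fake_info` are hypothetical names; the two `fake_` functions are stand-ins so that the code runs without libnetcdf, and a return value of 0 corresponds to NC\_NOERR.

```c
#include <stdio.h>
#include <stdlib.h>
#include <string.h>

/* Same signatures as nc_inq_var_filter_ids() and nc_inq_var_filter_info(). */
typedef int (*inq_ids_fn)(int ncid, int varid, size_t *nfiltersp, unsigned int *filterids);
typedef int (*inq_info_fn)(int ncid, int varid, unsigned int id,
                           size_t *nparamsp, unsigned int *params);

/* Appends sep followed by v to buf; returns -1 if the result would not fit. */
static int append_uint(char *buf, size_t buflen, size_t *used, const char *sep, unsigned int v)
{
    int w = snprintf(buf + *used, buflen - *used, "%s%u", sep, v);
    if (w < 0 || (size_t)w >= buflen - *used)
        return -1;
    *used += (size_t)w;
    return 0;
}

/* Writes a spec such as "307,9|4,32,32" for the variable into buf.
   Returns 0 on success, the first nonzero inquiry status, or -1 if memory
   runs out or buf is too small (buf may then hold a partial spec). */
static int filter_chain_spec(inq_ids_fn inq_ids, inq_info_fn inq_info,
                             int ncid, int varid, char *buf, size_t buflen)
{
    size_t nfilters = 0, used = 0;
    unsigned int *ids;
    int stat;

    if (buflen == 0)
        return -1;
    buf[0] = '\0';
    stat = inq_ids(ncid, varid, &nfilters, NULL);          /* call 1: count only */
    if (stat != 0 || nfilters == 0)
        return stat;
    if ((ids = malloc(nfilters * sizeof *ids)) == NULL)
        return -1;
    stat = inq_ids(ncid, varid, NULL, ids);                 /* call 2: the ids */

    for (size_t f = 0; stat == 0 && f < nfilters; f++) {
        size_t nparams = 0;
        unsigned int *params = NULL;
        stat = inq_info(ncid, varid, ids[f], &nparams, NULL);      /* count only */
        if (stat == 0 && nparams > 0) {
            params = malloc(nparams * sizeof *params);
            stat = params ? inq_info(ncid, varid, ids[f], NULL, params) : -1;
        }
        if (stat == 0)
            stat = append_uint(buf, buflen, &used, f > 0 ? "|" : "", ids[f]);
        for (size_t p = 0; stat == 0 && p < nparams; p++)
            stat = append_uint(buf, buflen, &used, ",", params[p]);
        free(params);
    }
    free(ids);
    return stat;
}

/* Hypothetical stand-ins describing bzip2 (307, level 9) followed by
   szip (4, parameters 32,32); pass the real nc_inq_* functions instead. */
static int fake_ids(int ncid, int varid, size_t *nfiltersp, unsigned int *filterids)
{
    (void)ncid; (void)varid;
    if (nfiltersp) *nfiltersp = 2;
    if (filterids) { filterids[0] = 307; filterids[1] = 4; }
    return 0;
}

static int fake_info(int ncid, int varid, unsigned int id, size_t *nparamsp, unsigned int *params)
{
    (void)ncid; (void)varid;
    if (id == 307) { if (nparamsp) *nparamsp = 1; if (params) params[0] = 9; return 0; }
    if (id == 4) {
        if (nparamsp) *nparamsp = 2;
        if (params) { params[0] = 32; params[1] = 32; }
        return 0;
    }
    return -1;                                   /* stands in for NC_ENOFILTER */
}
```

With libnetcdf, the call would be `filter_chain_spec(nc_inq_var_filter_ids, nc_inq_var_filter_info, ncid, varid, buf, sizeof buf)`.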

### nc\_inq\_var\_filter
Query a variable to obtain information about the first filter associated with the variable.
When netcdf-c was modified to support multiple filters per variable, this function became largely redundant, since it returns information only about the first defined filter for the variable.
Internally, it is implemented using the functions *nc\_inq\_var\_filter\_ids* and *nc\_inq\_var\_filter\_info*.

```
int nc_inq_var_filter(int ncid, int varid, unsigned int* idp, size_t* nparamsp, unsigned int* params);
```

Arguments:

* ncid: File and group ID.
* varid: Variable ID.
* idp: Stores the id of the first filter found; set to zero if the variable has no filters.
* nparamsp: Stores the number of parameters.
* params: Stores the set of filter parameters.

Return codes:

* NC\_NOERR: No error.
* NC\_ENOTNC4: Not a netCDF-4 file.
* NC\_EBADID: Bad ncid.
* NC\_ENOTVAR: Invalid variable ID.

The filter id will be returned in the *idp* argument.
If there are no filters, then zero is stored in this argument.
Otherwise, the number of parameters is stored in *nparamsp* and the actual parameters in *params*.
As is usual with the netcdf API, one is expected to call this function twice:
the first call obtains *nparamsp*, and the second retrieves the parameters into client-allocated memory.
Any of the output arguments (*idp*, *nparamsp*, *params*) can be NULL, in which case no value is returned for it.

## Using ncgen {#filters_NCGEN}

In a CDL file, compression of a variable can be specified by annotating it with the following attribute:

* *\_Filter*: a string containing a comma-separated list of constants specifying (1) the filter id to apply, and (2) a vector of constants representing the parameters for controlling the operation of the specified filter.
See the section on the <a href="#filters_syntax">parameter encoding syntax</a> for the details on the allowable kinds of constants.

This is a "special" attribute, which means that it will normally be invisible when using *ncdump* unless the -s flag is specified.

For backward compatibility, it is probably better to use the *\_DeflateLevel* attribute instead of *\_Filter* for deflation, but using *\_Filter* to specify deflation will also work.

Multiple filters can be specified for a given variable by using the "|" separator.
Alternatively, this attribute may be repeated to specify multiple filters.

Note that the lexical order of declaration is important when more than one filter is specified for a variable, because it determines the order in which the filters are applied.

### Example CDL File (Data elided)

```
netcdf example { // the dataset name is illustrative
dimensions:
  dim0 = 4 ; dim1 = 4 ; dim2 = 4 ; dim3 = 4 ;
variables:
  float var(dim0, dim1, dim2, dim3) ;
    var:_Filter = "307,9|4,32,32" ; // bzip2 then szip
    var:_Storage = "chunked" ;
    var:_ChunkSizes = 4, 4, 4, 4 ;
data:
  // data elided
}
```

Note that the assigned filter id for bzip2 is 307 and for szip it is 4.

## Using nccopy {#filters_NCCOPY}

When copying a netcdf file using *nccopy*, it is possible to specify filter information for any output variable by using the "-F" option on the command line; for example:

```
nccopy -F "var,307,9" unfiltered.nc filtered.nc
```

Assume that *unfiltered.nc* has a chunked but not bzip2-compressed variable named "var".
This command will copy that variable to the *filtered.nc* output file, but using the filter with id 307 (i.e. bzip2) and with parameter 9 indicating the compression level.
See the section on the <a href="#filters_syntax">parameter encoding syntax</a> for the details on the allowable kinds of constants.

The "-F" option can be used repeatedly, as long as a different variable is specified for each occurrence.

It can be convenient to specify that the same compression is to be applied to more than one variable. To support this, two additional *-F* cases are defined.

1. *-F \*,...* means apply the filter to all variables in the dataset.
2. *-F v1&v2&..,...* means apply the filter to multiple variables.

Multiple filters can be specified using the pipeline notation '|'.
For example, *-F v1&v2,307,9|4,32,32* means apply filter 307 (bzip2) and then filter 4 (szip) to the variables v1 and v2.

Note that the characters '\*', '\&', and '\|' are reserved characters in most shells, so you will probably need to escape or quote the filter spec in that environment.

As a rule, any input filter on an input variable will be applied to the equivalent output variable, assuming the output file type is netcdf-4.
It is, however, sometimes convenient to suppress output compression either totally or on a per-variable basis.
Total suppression of output filters can be accomplished by specifying a special case of "-F", namely:

```
nccopy -F none input.nc output.nc
```

The expression *-F \*,none* is equivalent to *-F none*.

Suppression of output filtering for a specific set of variables can be accomplished using these formats:

```
nccopy -F "var,none" input.nc output.nc
nccopy -F "v1&v2&...,none" input.nc output.nc
```

where "var" and the "vi" are the fully qualified names of variables.

The rules for all possible cases of the "-F none" flag are defined by this table.

<table>
<tr><th>-F none<th>-F var,...<th>Input Filter<th>Applied Output Filter
<tr><td>true<td>undefined<td>NA<td>unfiltered
<tr><td>true<td>none<td>NA<td>unfiltered
<tr><td>true<td>defined<td>NA<td>use output filter(s)
<tr><td>false<td>undefined<td>defined<td>use input filter(s)
<tr><td>false<td>none<td>NA<td>unfiltered
<tr><td>false<td>defined<td>undefined<td>use output filter(s)
<tr><td>false<td>undefined<td>undefined<td>unfiltered
<tr><td>false<td>defined<td>defined<td>use output filter(s)
</table>
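
The table can be read as a precedence rule: an explicit per-variable filter wins, then a per-variable `none`, then the global `-F none`, and only then is the input filter carried over. The sketch below restates the table in C. The enum and function names are hypothetical and are not taken from the nccopy source.

```c
/* Hypothetical restatement of the "-F none" table; not nccopy's own code. */
enum var_spec { SPEC_UNDEFINED, SPEC_NONE, SPEC_DEFINED };  /* "-F var,..." for this variable */
enum out_action { OUT_UNFILTERED, OUT_USE_INPUT, OUT_USE_OUTPUT };

/* global_none: was "-F none" given?  input_filtered: does the input variable have filters? */
static enum out_action output_filter_action(int global_none, enum var_spec spec,
                                            int input_filtered)
{
    if (spec == SPEC_DEFINED)
        return OUT_USE_OUTPUT;          /* an explicit output filter always wins */
    if (spec == SPEC_NONE)
        return OUT_UNFILTERED;          /* "-F var,none" */
    if (global_none)
        return OUT_UNFILTERED;          /* "-F none" and nothing specific for this variable */
    return input_filtered ? OUT_USE_INPUT : OUT_UNFILTERED;
}
```

Each row of the table corresponds to one call; for example, the row "false, undefined, defined" is `output_filter_action(0, SPEC_UNDEFINED, 1)`, which yields `OUT_USE_INPUT`.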

## Filter Specification Syntax {#filters_syntax}

The utilities <a href="#filters_NCGEN">ncgen</a> and <a href="#filters_NCCOPY">nccopy</a>, and also the output of *ncdump*, support the specification of filter ids, formats, and parameters in text format.
The BNF specification is defined in [Appendix C](#filters_appendixc).
Basically, these specifications consist of a filter id, a comma, and then a sequence of
comma-separated constants representing the parameters.
The constants are converted within the utility to a proper set of unsigned int constants (see the <a href="#filters_paramcoding">parameter encoding section</a>).

To simplify things, various kinds of constants can be specified rather than just simple unsigned integers.
The *ncgen* and *nccopy* programs will encode them properly using the rules specified in the section on <a href="#filters_paramcoding">parameter encode/decode</a>.
Since the original types are lost after encoding, *ncdump* will always show a simple list of unsigned integer constants.
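
As an illustration of the basic form only, the following hypothetical parser accepts chains of untagged unsigned decimal constants such as `307,9|4,32,32`. It is not the parser used by *ncgen* or *nccopy* (the library's own parsing helpers live in *netcdf\_aux.h*; see [Appendix A](#filters_appendixa)), and it rejects the tagged and signed constants listed in the table of supported constants.

```c
#include <errno.h>
#include <limits.h>
#include <stddef.h>
#include <stdlib.h>

#define MAX_PARAMS 16

struct parsed_filter {
    unsigned int id;
    size_t nparams;
    unsigned int params[MAX_PARAMS];
};

/* Reads one untagged decimal constant at *pp into *v and advances *pp.
   Returns -1 if no digit is present or the value does not fit in 32 bits. */
static int read_uint(const char **pp, unsigned int *v)
{
    char *end;
    unsigned long x;
    if (**pp < '0' || **pp > '9')
        return -1;                   /* strtoul alone would accept signs and spaces */
    errno = 0;
    x = strtoul(*pp, &end, 10);
    if (errno == ERANGE || x > UINT_MAX)
        return -1;
    *v = (unsigned int)x;
    *pp = end;
    return 0;
}

/* Parses "id,p1,...|id,p1,..." into out[0..maxout-1].
   Returns the number of filters, or -1 on any syntax error or overflow. */
static int parse_filter_chain(const char *spec, struct parsed_filter *out, int maxout)
{
    const char *p = spec;
    int n = 0;
    for (;;) {
        struct parsed_filter *f;
        if (n == maxout)
            return -1;
        f = &out[n++];
        f->nparams = 0;
        if (read_uint(&p, &f->id) != 0)            /* the filter id */
            return -1;
        while (*p == ',') {                        /* its parameters */
            p++;
            if (f->nparams == MAX_PARAMS || read_uint(&p, &f->params[f->nparams]) != 0)
                return -1;
            f->nparams++;
        }
        if (*p == '\0')
            return n;                              /* end of the chain */
        if (*p != '|')
            return -1;                             /* unexpected character */
        p++;                                       /* start of the next filter */
    }
}
```

Parsing `307,9|4,32,32` yields two filters: id 307 with the single parameter 9, then id 4 with parameters 32 and 32, in the order in which they would be applied when writing.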

The currently supported constants are as follows.

<table>
<tr halign="center"><th>Example<th>Type<th>Format Tag<th>Notes
<tr><td>-17b<td>signed 8-bit byte<td>b|B<td>Truncated to 8 bits and sign-extended to 32 bits
<tr><td>23ub<td>unsigned 8-bit byte<td>u|U b|B<td>Truncated to 8 bits and zero-extended to 32 bits
<tr><td>-25S<td>signed 16-bit short<td>s|S<td>Truncated to 16 bits and sign-extended to 32 bits
<tr><td>27US<td>unsigned 16-bit short<td>u|U s|S<td>Truncated to 16 bits and zero-extended to 32 bits
<tr><td>-77<td>implicit signed 32-bit integer<td>Leading minus sign and no tag<td>
<tr><td>77<td>implicit unsigned 32-bit integer<td>No tag<td>
<tr><td>93U<td>explicit unsigned 32-bit integer<td>u|U<td>
<tr><td>789f<td>32-bit float<td>f|F<td>
<tr><td>12345678.12345678d<td>64-bit double<td>d|D<td>LE encoding
<tr><td>-9223372036854775807L<td>64-bit signed long long<td>l|L<td>LE encoding
<tr><td>18446744073709551615UL<td>64-bit unsigned long long<td>u|U l|L<td>LE encoding
</table>

Notes:

1. In all cases except untagged integers (with or without a leading minus sign), the format tag is required and determines how the constant is converted to one or two unsigned int values.
2. For an untagged positive integer, the constant is treated as being of the smallest type into which it fits (8, 16, 32, or 64 bits).
3. For signed byte and short, the value is sign-extended to 32 bits and then treated as an unsigned int value, maintaining the bit pattern.
4. Doubles and signed or unsigned long longs are converted as specified in the section on <a href="#filters_paramcoding">parameter encode/decode</a>.
5. In order to support multiple filters, the argument to *\_Filter* may be a pipeline that uses '|' to separate a list of filter specs.
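
The conversions for several rows of the table can be sketched as follows. The helper names are hypothetical. The word order used for 64-bit values (low-order 32 bits first, which is what copying the 8 bytes on a little-endian host produces) is an assumption; the section on <a href="#filters_paramcoding">parameter encode/decode</a> is authoritative.

```c
#include <stdint.h>
#include <string.h>

/* Hypothetical helpers mirroring rows of the constants table. The signed
   narrowing casts are implementation-defined for out-of-range inputs, so
   these sketches assume the value already fits the tagged type. */
static unsigned int enc_byte(int v)        { return (unsigned int)(int32_t)(int8_t)v; }  /* "b": sign-extend */
static unsigned int enc_ubyte(unsigned v)  { return (unsigned int)(uint8_t)v; }          /* "ub": low 8 bits */
static unsigned int enc_short(int v)       { return (unsigned int)(int32_t)(int16_t)v; } /* "s": sign-extend */
static unsigned int enc_ushort(unsigned v) { return (unsigned int)(uint16_t)v; }         /* "us": low 16 bits */

/* "f": reuse the IEEE 754 bit pattern of the float as the parameter value. */
static unsigned int enc_float(float f)
{
    uint32_t bits;
    memcpy(&bits, &f, sizeof bits);
    return bits;
}

/* 64-bit values fill two parameters. Assumption: low-order 32 bits first,
   matching a byte copy on a little-endian host. */
static void enc_u64(uint64_t v, unsigned int out[2])
{
    out[0] = (unsigned int)(v & 0xFFFFFFFFu);
    out[1] = (unsigned int)(v >> 32);
}

static void enc_i64(int64_t v, unsigned int out[2]) { enc_u64((uint64_t)v, out); }  /* "l" */

static void enc_double(double d, unsigned int out[2])                               /* "d" */
{
    uint64_t bits;
    memcpy(&bits, &d, sizeof bits);
    enc_u64(bits, out);
}
```

For example, `-17b` becomes `enc_byte(-17)`, the 32-bit pattern 0xFFFFFFEF, which *ncdump* would then display as the unsigned integer 4294967279.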

## Dynamic Loading Process {#filters_Process}

Each filter is assumed to be compiled into a separate dynamically loaded library.
For HDF5-conformant filters, these filter libraries are assumed to be in some specific location.
The details for writing such a filter are defined in the HDF5 documentation [1,2].

### Plugin directory {#filters_plugindir}

The HDF5 loader searches for plugins in a number of directories.
The netcdf-c process for installing and locating plugins is described
in detail in the *pluginpath.md* document.

### Plugin Library Naming {#filters_Pluginlib}

Given a plugin directory, HDF5 examines every file in that directory
that conforms to a specified name pattern as determined by the
platform on which the library is being executed.

<table>
<tr halign="center"><th>Platform<th>Basename<th>Extension
<tr halign="left"><td>Linux<td>lib*<td>.so*
<tr halign="left"><td>OSX<td>lib*<td>.dylib*
<tr halign="left"><td>Cygwin<td>cyg*<td>.dll*
<tr halign="left"><td>Windows<td>*<td>.dll
</table>
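
A rough sketch of the name test implied by the table follows. The helper names and the example file names are hypothetical, and each `*` is interpreted as "any characters"; the exact matching rules applied by HDF5 (for example, case handling on Windows) may differ.

```c
#include <string.h>

static int has_prefix(const char *s, const char *prefix)
{
    return strncmp(s, prefix, strlen(prefix)) == 0;
}

static int has_suffix(const char *s, const char *suffix)
{
    size_t ls = strlen(s), lx = strlen(suffix);
    return ls >= lx && strcmp(s + ls - lx, suffix) == 0;
}

/* platform is one of "Linux", "OSX", "Cygwin", "Windows".
   For the first three, the extension may be followed by anything
   (e.g. a version suffix), so it is searched for after the prefix. */
static int looks_like_plugin(const char *platform, const char *name)
{
    if (strcmp(platform, "Linux") == 0)
        return has_prefix(name, "lib") && strstr(name + 3, ".so") != NULL;
    if (strcmp(platform, "OSX") == 0)
        return has_prefix(name, "lib") && strstr(name + 3, ".dylib") != NULL;
    if (strcmp(platform, "Cygwin") == 0)
        return has_prefix(name, "cyg") && strstr(name + 3, ".dll") != NULL;
    if (strcmp(platform, "Windows") == 0)
        return has_suffix(name, ".dll");         /* "*.dll": extension must end the name */
    return 0;
}
```

Under this reading, a versioned Linux name such as `libh5bzip2.so.0` qualifies, while a file without the `lib` prefix is never examined.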

### Plugin Verification {#filters_Pluginverify}

For each dynamic library located using the previous patterns,
HDF5 attempts to load the library and attempts to obtain
information from it. Specifically, it looks for two functions
with the following signatures.

1. `H5PL_type_t H5PLget_plugin_type(void)`: This function is expected to return the constant value *H5PL\_TYPE\_FILTER* to indicate that this is a filter library.
2. `const void* H5PLget_plugin_info(void)`: This function returns a pointer to a table of type *H5Z\_class2\_t*.
This table contains the information needed to utilize the filter both for reading and for writing.
In particular, it specifies the filter id implemented by the library, which must match the id specified for the variable in *nc\_def\_var\_filter* in order for the filter to be used.

If plugin verification fails, then that plugin is ignored and the search continues for another, matching plugin.

## NCZarr Filter Support {#filters_nczarr}

The inclusion of Zarr support in the netcdf-c library creates the need to provide a new representation consistent with the way that Zarr files store filter information.
For Zarr, filters are represented using JSON notation.
Each filter is defined by a JSON dictionary, and each such filter dictionary
is guaranteed to have a key named "id" whose value is a unique string defining the filter algorithm: "lz4" or "bzip2", for example.

The parameters of the filter are defined by additional, algorithm-specific keys in the filter dictionary.
One commonly used filter is "blosc", whose JSON dictionary has this form:
`{"id": "blosc", "cname": "lz4", "clevel": 5, "shuffle": 1}`.

So it has three parameters:

1. "cname": the sub-algorithm used by the blosc compressor, LZ4 in this case.
2. "clevel": the compression level, 5 in this case.
3. "shuffle": whether the input is shuffled before compression; yes (1) in this case.

NCZarr has four constraints that must be met.

1. It must store its filter information in its metadata in the above JSON dictionary format.
2. It is required to re-use the HDF5 filter implementations.
This is to avoid having to rewrite the filter implementations.
This means that some mechanism is needed to translate between the HDF5 id+parameter model and the Zarr JSON dictionary model.
3. It must be possible to modify the set of visible parameters in response to environment information such as the type of the associated variable; this is required to mimic the corresponding HDF5 capability.
4. It must be possible to use filters even if HDF5 support is disabled.

Note that the term "visible parameters" is used here to refer to the parameters provided by `nc_def_var_filter` or those stored in the dataset's metadata as provided by the JSON codec. The term "working parameters" refers to the parameters given to the compressor itself and derived from the visible parameters.

The standard authority for defining Zarr filters is the list supported by the NumCodecs project [7].
Comparing the set of standard filters (aka codecs) defined by NumCodecs to the set of standard filters defined by HDF5 [3], it can be seen that the two sets overlap, but each has filters not defined by the other.

Note also that it is undesirable that a specific set of filters/codecs be built into the NCZarr implementation.
Rather, it is preferable for there to be some extensible way to associate the JSON with the code implementing the codec. This mirrors the plugin model used by HDF5.

The mechanism provided to address these issues is similar to that taken by HDF5.
A shared library must exist that has certain well-defined entry points that allow the NCZarr code to determine information about a Codec.
The shared library exports a well-known function name to access Codec information and relate it to a corresponding HDF5 implementation.
Note that the shared library may optionally be the same library containing the HDF5
filter implementation.

### Processing Overview

The NCZarr filter API is invoked in the following steps.

1. The nc\_def\_var\_filter function is invoked on a variable, or
(1a) the metadata for a variable is read when opening an existing variable that has associated Codecs.
2. The visible parameters are converted to a set of working parameters.
3. The filter is invoked with the working parameters.
4. The dataset is closed using the final set of visible parameters.

#### Step 1: Invoking nc\_def\_var\_filter

In this case, the filter plugin is located and the set of visible parameters (from nc\_def\_var\_filter) is provided.

#### Step 1a: Reading metadata

In this case, the codec is read from the metadata and must be converted to a visible set of HDF5-style parameters.
It is possible that this set of visible parameters differs from the set that was provided by nc\_def\_var\_filter.
If this is important, then the filter implementation is responsible for marking this difference using, for example, a different number of parameters or some differing value.

#### Step 2: Convert visible parameters to working parameters

Given environmental information such as the associated variable's base type, the visible parameters
are converted to a potentially larger set of working parameters; this step also provides an opportunity
to modify the visible parameters.
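
As an illustration of Step 2, consider a shuffle-like filter that needs the byte size of the variable's element type, which is not among the visible parameters. (HDF5's own shuffle filter obtains the datatype size in a similar way, through its set-local callback.) The helper below is hypothetical: it appends the element size to a copy of the visible parameters. The actual hook used by NCZarr codec libraries is described in [Appendix E](#filters_appendixe) and is not reproduced here.

```c
#include <stddef.h>
#include <string.h>

/* Hypothetical Step 2 hook: working parameters = visible parameters followed
   by the byte size of the variable's element type.
   Returns the number of working parameters, or -1 if `working` is too small. */
static int make_working_params(const unsigned int *visible, size_t nvisible,
                               size_t type_size, unsigned int *working, size_t maxworking)
{
    if (nvisible >= maxworking)
        return -1;                                    /* no room for the extra entry */
    if (nvisible > 0)
        memcpy(working, visible, nvisible * sizeof *visible);
    working[nvisible] = (unsigned int)type_size;      /* environment-derived value */
    return (int)(nvisible + 1);
}
```

Keeping the visible parameters unchanged at the front means they can still be written back to the metadata in Step 4, while the filter itself receives the longer working vector.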

#### Step 3: Invoking the filter

As chunks are read or written, the filter is repeatedly invoked using the working parameters.

#### Step 4: Closing the dataset

The visible parameters from step 2 are stored in the dataset's metadata.
It is desirable to determine whether the set of visible parameters has changed;
if no change is detected, then re-writing the compressor metadata may be avoided.

Currently, there is no way to specify use of a filter via Codec through
the netcdf-c API. Rather, one must know the HDF5 id and parameters of
the filter of interest and use the functions *nc\_def\_var\_filter* and *nc\_inq\_var\_filter*.
Internally, the NCZarr code will use information about known Codecs to convert the HDF5 filter reference to the corresponding Codec.
This restriction also holds for the specification of filters in *ncgen* and *nccopy*.
This limitation may be lifted in the future.

### Special Codecs Attribute

A new special attribute called *\_Codecs* is defined, in parallel to the existing *\_Filter* special attribute. Its value is a string containing the JSON representation of the Codecs associated with a given variable.
This can be especially useful when a file is unreadable because it uses a filter not available to the netcdf-c library,
that is, no implementation was found in, for example, the *HDF5\_PLUGIN\_PATH* directory.
In this case, *ncdump -hs* will display the raw Codec information so that it may be possible to see what filter is missing.

### Pre-Processing Filter Libraries

The process for using filters for NCZarr is defined to operate in several steps.
First, as with HDF5, all shared libraries in a specified directory
(e.g. *HDF5\_PLUGIN\_PATH*) are scanned.
They are interrogated to see what kind of library they implement, if any.
This interrogation operates by checking whether certain well-known (function) names are defined in the library.

There will be two library types:

1. HDF5: exports the API `H5PLget_plugin_type` and `H5PLget_plugin_info`.
2. Codec: exports the API `NCZ_get_codec_info`.

Note that a given library can export either or both of these APIs.
This means that we can have three types of libraries:

1. HDF5-only: exports only the HDF5 API.
2. Codec-only: exports only the Codec API.
3. HDF5 + Codec: exports both APIs.

Suppose that our *HDF5\_PLUGIN\_PATH* location has an HDF5-only library.
Then by adding a corresponding, separate, Codec-only library to that same location, it is possible to make an HDF5 library usable by NCZarr.
It is possible to do this without having to modify the HDF5-only library.
Over time, it is possible to merge an HDF5-only library with a Codec-only library to produce a single, combined library.

### Using Plugin Libraries

The netcdf-c library processes all of the shared libraries by interrogating each one for the well-known APIs and recording the result.
Any library that exports neither of the well-known APIs is ignored.

Internally, the netcdf-c library pairs up each HDF5 library API with a corresponding Codec API by invoking the relevant well-known functions
(see [Appendix E](#filters_appendixe)).
This results in the following table for associated codec and HDF5 libraries.

<table>
<tr><th>HDF5 API<th>Codec API<th>Action
<tr><td>Not defined<td>Not defined<td>Ignore
<tr><td>Defined<td>Not defined<td>Ignore
<tr><td>Defined<td>Defined<td>NCZarr usable
</table>

### Filter Defaults Library

As a special case, a shared library may be created to hold
defaults for a common set of filters.
Basically, there is a specially defined function that returns
a vector of codec APIs. These defaults are used only if
no other library provides codec information for a filter.
Currently, the defaults library provides codec defaults
for Shuffle, Fletcher32, Deflate (zlib), and SZIP.

### Using the Codec API

Given a set of filters for which the HDF5 API and the Codec API
are defined, it is then possible to use the APIs to invoke the
filters and to process the metadata in Codec JSON format.

#### Writing an NCZarr Container

When writing, the user program will invoke the NetCDF API function *nc\_def\_var\_filter*.
This function is currently defined to operate using HDF5-style id and parameters (unsigned ints).
The netcdf-c library examines its list of known filters to find one matching the HDF5 id provided by *nc\_def\_var\_filter*.
The set of parameters provided is stored internally.
Then, during writing of data, the corresponding HDF5 filter is invoked to encode the data.

When it comes time to write out the metadata, the stored HDF5-style parameters are passed to a specific Codec function to obtain the corresponding JSON representation. Again, see [Appendix E](#filters_appendixe).
This resulting JSON is then written in the NCZarr metadata.

#### Reading an NCZarr Container

When reading, the netcdf-c library will read the metadata for a given variable and will see that some set of filters is applied to this variable.
The metadata is encoded as Codec-style JSON.

Given a JSON Codec, it is parsed to provide a JSON dictionary containing the string "id" and the set of parameters as various keys.
The netcdf-c library examines its list of known filters to find one matching the Codec "id" string.
The JSON is passed to a Codec function to obtain the corresponding HDF5-style *unsigned int* parameter vector.
These parameters are stored for later use.

### Supporting Filter Chains

HDF5 supports *filter chains*: a filter chain is a sequence of filters in which the output of one filter is provided as input to the next filter in the sequence.
When encoding, the filters are executed in the "forward" direction,
while when decoding, the filters are executed in the "reverse" direction.

In the Zarr metadata, a filter chain is divided into two parts:
the "compressor" and the "filters". The former is a single JSON codec
as described above. The latter is an ordered JSON array of codecs.
So if the compressor is something like
`"compressor": {"id": "c"...}`
and the filters array is like this:
`"filters": [ {"id": "f1"...}, {"id": "f2"...}...{"id": "fn"...}]`
then the filter chain is (f1,f2,...fn,c), with f1 being applied first and c being applied last when encoding. On decode, the filter chain is executed in the order (c,fn...f2,f1).

So, an HDF5 filter chain is divided into two parts, where the last filter in the chain is assigned as the "compressor" and the remaining
filters are assigned as the "filters".
But independent of this, each codec, whether a compressor or a filter,
is stored in the JSON dictionary form described earlier.
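
The split of an HDF5 chain into the Zarr "filters" array and "compressor", and the reversed order used when decoding, can be sketched as follows. The struct and function names are hypothetical, and the filter ids in the usage are arbitrary examples.

```c
#include <stddef.h>

/* Hypothetical view of an HDF5 chain (ids in definition order) in Zarr terms. */
struct zarr_chain {
    const unsigned int *filters;  /* the first nfilters ids of the input chain */
    size_t nfilters;
    int has_compressor;
    unsigned int compressor;      /* the last id of the chain */
};

/* The last filter becomes the "compressor"; the rest, in order, the "filters". */
static struct zarr_chain split_chain(const unsigned int *chain, size_t n)
{
    struct zarr_chain z = {chain, 0, 0, 0};
    if (n > 0) {
        z.nfilters = n - 1;
        z.has_compressor = 1;
        z.compressor = chain[n - 1];
    }
    return z;
}

/* Decoding runs the chain backwards: compressor first, then fn ... f1. */
static size_t decode_order(const unsigned int *chain, size_t n, unsigned int *out)
{
    for (size_t i = 0; i < n; i++)
        out[i] = chain[n - 1 - i];
    return n;
}
```

For a chain `{2, 307, 32001}`, the compressor is 32001, the filters array is `{2, 307}`, and decoding visits 32001, 307, 2.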

The Codec style, using JSON, has the ability to provide very complex parameters that may be hard to encode as a vector of unsigned integers.
It might be desirable to consider exporting a JSON-based API out of the netcdf-c API to support user access to this complexity.
This would mean providing some alternate version of `nc_def_var_filter` that takes a string-valued argument instead of a vector of unsigned ints.
This extension is unlikely to be implemented until a compelling use-case is encountered.

One drawback of such an extension is that there could then be two classes of plugins:
one class usable by both HDF5 and NCZarr, and a second class usable only with NCZarr.

### Using The NetCDF-C Plugins

As part of its testing, the NetCDF build process creates a number of shared libraries in the *netcdf-c/plugins* (or sometimes *netcdf-c/plugins/.libs*) directory.
If you need a filter from that set, you may be able to set *HDF5\_PLUGIN\_PATH*
to point to that directory, or you may be able to copy the shared libraries out of that directory to your own location.
611 # Lossy One-Way Filters
613 As of NetCDF version 4.8.2, the netcdf-c library supports
614 bit-grooming filters.
616 Bit Grooming is a lossy compression algorithm that removes the
617 bloat due to false precision: those bits and bytes beyond the
618 meaningful precision of the data. Bit Grooming is statistically
619 unbiased, applies to all floating-point numbers, and is easy to
620 use. Bit Grooming reduces data storage requirements by
621 25-80%. Unlike its best-known competitor Linear Packing, Bit
622 Grooming imposes no software overhead on users, and guarantees
623 its precision throughout the whole floating point range
624 [https://doi.org/10.5194/gmd-9-3199-2016].
626 The generic term "quantize" is used to refer collectively to the various
627 precision-trimming algorithms. The key thing to note about quantization is that
628 it is applied only when data is written. Since its output is
629 legal data, it does not need to be "de-quantized" when the data is read.
630 Because of this, quantization is not part of the standard filter
631 mechanism and has a separate API.
633 The API for quantization is currently as follows.
636 int nc_def_var_quantize(int ncid, int varid, int quantize_mode, int nsd);
637 int nc_inq_var_quantize(int ncid, int varid, int *quantize_modep, int *nsdp);
639 The *quantize_mode* argument specifies the particular algorithm.
640 Currently, three are supported: NC\_QUANTIZE\_BITGROOM, NC\_QUANTIZE\_GRANULARBR,
641 and NC\_QUANTIZE\_BITROUND. In addition quantization can be disabled using
642 the value NC\_NOQUANTIZE.
644 The ncgen input and the ncdump output support special attributes
645 to indicate if quantization was applied to a given variable.
646 These attributes have the following form.
649 _QuantizeBitGroomNumberOfSignificantDigits = <NSD>
651 _QuantizeGranularBitRoundNumberOfSignificantDigits = <NSD>
653 _QuantizeBitRoundNumberOfSignificantBits = <NSB>
655 The value NSD is the number of significant (decimal) digits to keep.
656 The value NSB is the number of bits to keep in the fraction part of an
657 IEEE754 floating-point number. Note that NSB of QuantizeBitRound is the same as
658 "number of explicit mantissa bits" (https://doi.org/10.5194/gmd-9-3199-2016) and same as
659 the number of "keep-bits" (https://doi.org/10.5194/gmd-14-377-2021), but is
660 one less than the number of significant binary figures:
661 `_QuantizeBitRoundNumberOfSignificantBits = 0` means one significant binary figure,
662 `_QuantizeBitRoundNumberOfSignificantBits = 1` means two significant binary figures etc.
664 ## Distortions introduced by lossy filters
666 Any lossy filter introduces distortions to data.
667 The lossy filters implemented in netcdf-c introduce a distortion
668 that can be quantified in terms of a _relative_ error. The magnitude of
669 distortion introduced to every single value V is guaranteed to be within
670 a certain fraction of |V|, expressed as 0.5 * |V| * 2^(-NSB):
671 i.e. it is 0.5V for NSB=0, 0.25V for NSB=1, 0.125V for NSB=2 etc.
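The bound can be checked against a simplified BitRound-style quantizer. This is a sketch, not the netcdf-c implementation: it assumes IEEE 754 binary32 `float`, rounds half away from zero on the bit pattern, and does not handle NaN, infinity, subnormals, or values that would round past `FLT_MAX`. The names `bitround_sketch` and `within_bound` are hypothetical.

```c
#include <stdint.h>
#include <string.h>

/* Hypothetical BitRound-style quantizer: keep nsb (0..23) explicit mantissa
   bits of an IEEE 754 binary32 value, rounding half away from zero.
   NaN, infinity and values that would round past FLT_MAX are not handled. */
float bitround_sketch(float v, int nsb)
{
    if (nsb < 0) nsb = 0;                    /* clamp: one significant bit */
    if (nsb >= 23) return v;                 /* all 23 fraction bits kept */
    uint32_t bits;
    memcpy(&bits, &v, sizeof bits);          /* type-pun without aliasing issues */
    uint32_t drop = 23u - (uint32_t)nsb;     /* fraction bits to discard */
    uint32_t half = 1u << (drop - 1u);       /* half of the last kept unit */
    uint32_t mask = ~((1u << drop) - 1u);    /* clears the discarded bits */
    bits = (bits + half) & mask;             /* a carry into the exponent is correct rounding */
    memcpy(&v, &bits, sizeof v);
    return v;
}

/* Check |V - Q| <= 0.5 * |V| * 2^-nsb for 0 <= nsb <= 23 (normal values). */
int within_bound(float v, float q, int nsb)
{
    double dv = v, dq = q;
    double err = dv > dq ? dv - dq : dq - dv;
    double mag = dv < 0.0 ? -dv : dv;
    return err <= 0.5 * mag / (double)(1u << nsb);
}
```

For example, with NSB=0 the value 1.75 rounds to 2.0 (one significant binary figure), and 3.14159 with NSB=2 rounds to 3.0; in both cases the error stays within 0.5 * |V| * 2^(-NSB).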
674 Two other methods use different definitions of _decimal precision_, though both
675 are guaranteed to reproduce NSD decimals when printed.
676 The margins of relative error introduced by these methods are summarised in the table
682 Error Margin 3.1e-2 3.9e-3 4.9e-4 3.1e-5 3.8e-6 4.7e-7 -
685 Error Margin 1.4e-1 1.9e-2 2.2e-3 1.4e-4 1.8e-5 2.2e-6 -
690 If one defines decimal precision as in BitGroom, i.e. the introduced relative
691 error must not exceed half of the unit at the decimal place NSD in the
692 worst-case scenario, the following values of NSB should be used for BitRound:
696 NSB 3 6 9 13 16 19 23
699 The resulting application of BitRound is as fast as BitGroom, and is free from
700 artifacts in multipoint statistics introduced by BitGroom
701 (see https://doi.org/10.5194/gmd-14-377-2021).
704 # Debugging {#filters_debug}
707 Depending on the debugger one uses, debugging plugins can be very difficult.
708 It may be necessary to use the old printf approach for debugging the filter itself.
710 One case worth mentioning is when there is a dataset that is using an unknown filter.
711 For this situation, you need to identify what filter(s) are used in the dataset.
712 This can be accomplished using the following command.
714 ncdump -s -h <dataset filename>
716 Since ncdump is not being asked to access the data (the -h flag), it can obtain the filter information without failures.
717 Then it can print out the filter id and the parameters as well as the Codecs (via the -s flag).
719 ## Test Cases {#filters_TestCase}
721 Within the netcdf-c source tree, two directories contain test cases for testing dynamic filter operation.
723 * *netcdf-c/nc\_test4* provides tests for testing HDF5 filters.
724 * *netcdf-c/nczarr\_test* provides tests for testing NCZarr filters.
726 These tests are disabled if any of *--disable-shared*, *--disable-filter-tests*,
727 or *--disable-plugins* is specified.
729 ### HDF5 Example {#filters_Example}
731 A slightly simplified version of one of the HDF5 filter test cases is also available as an example within the netcdf-c source tree directory *netcdf-c/examples/C*.
732 The test is called *filter\_example.c* and it is executed as part of the *run\_examples4.sh* shell script.
733 The test case demonstrates dynamic filter writing and reading.
735 The files *example/C/hdf5plugins/Makefile.am* and *example/C/hdf5plugins/CMakeLists.txt* demonstrate how to build the HDF5 plugin for bzip2.
739 ### Order of Invocation for Multiple Filters
741 When multiple filters are defined on a variable, the order of application when writing data to the file is the same as the order in which *nc\_def\_var\_filter* is called.
742 When reading a file, the order of application is of necessity the reverse.
744 There are some special cases.
746 1. The fletcher32 filter is always applied first, if enabled.
747 2. If *nc\_def\_var\_filter* or *nc\_def\_var\_deflate* or *nc\_def\_var\_szip* is called multiple times with the same filter id, but possibly with different sets of parameters, then the position of that filter in the sequence of applications does not change.
748 However the last set of parameters specified is used when actually writing the dataset.
749 3. Deflate and shuffle — these two are inextricably linked in the current API, but have quite different semantics.
750 If you call *nc\_def\_var\_deflate* multiple times, then the previous rule applies with respect to deflate.
751 However, the shuffle filter, if enabled, is *always* applied before applying any other filters, except fletcher32.
752 4. Once a filter is defined for a variable, it cannot be removed nor can its position in the filter order be changed.
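These rules can be modelled with a small data structure. The sketch below is a hypothetical model of the rules, not netcdf-c's internal code: each filter carries a single parameter for brevity, and the ids are HDF5's standard ids for deflate (1), shuffle (2) and fletcher32 (3), plus 307 for bzip2 from the table in Appendix F.

```c
#include <stddef.h>

#define MAX_FILTERS 16
/* HDF5's standard ids for the filters named in the rules above. */
#define ID_DEFLATE    1u
#define ID_SHUFFLE    2u
#define ID_FLETCHER32 3u

/* Hypothetical model: each filter carries a single parameter for brevity. */
typedef struct { unsigned id; unsigned param; } FilterDef;
typedef struct { FilterDef defs[MAX_FILTERS]; size_t n; } FilterChain;

/* Rule 2: redefining an existing id replaces its parameter but keeps its
   position. Rule 4: there is deliberately no way to remove an entry. */
int chain_define(FilterChain* c, unsigned id, unsigned param)
{
    for (size_t i = 0; i < c->n; i++) {
        if (c->defs[i].id == id) { c->defs[i].param = param; return 0; }
    }
    if (c->n == MAX_FILTERS) return -1;
    c->defs[c->n].id = id;
    c->defs[c->n].param = param;
    c->n++;
    return 0;
}

/* Rules 1 and 3: fletcher32 first, then shuffle, then the rest in the order
   they were first defined. Writes ids to out (room for c->n); returns count.
   Reading applies the same list in reverse. */
size_t chain_write_order(const FilterChain* c, unsigned* out)
{
    static const unsigned leading[2] = { ID_FLETCHER32, ID_SHUFFLE };
    size_t k = 0;
    for (size_t j = 0; j < 2; j++)
        for (size_t i = 0; i < c->n; i++)
            if (c->defs[i].id == leading[j]) out[k++] = leading[j];
    for (size_t i = 0; i < c->n; i++)
        if (c->defs[i].id != ID_FLETCHER32 && c->defs[i].id != ID_SHUFFLE)
            out[k++] = c->defs[i].id;
    return k;
}
```

Defining deflate, shuffle, bzip2 and fletcher32 in that order, then deflate again, gives the write order fletcher32, shuffle, deflate, bzip2, with deflate using the parameter from its second definition.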
754 ### Memory Allocation Issues
756 Starting with HDF5 version 1.10.*, the plugin code MUST be careful when using the standard *malloc()*, *realloc()*, and *free()* functions.
758 In the event that the code is allocating, reallocating, or
759 freeing memory that either came from or will be exported to the
760 calling HDF5 library, then one MUST use the corresponding HDF5
761 functions *H5allocate\_memory()*, *H5resize\_memory()*,
762 *H5free\_memory()* [5] to avoid memory failures.
764 Additionally, if your filter code leaks memory, then the HDF5 library generates a failure similar to this:
766 H5MM.c:232: H5MM_final_sanity_check: Assertion `0 == H5MM_curr_alloc_bytes_s' failed.
768 One can look at the code in plugins/H5Zbzip2.c and H5Zmisc.c as illustrations.
772 The current szip plugin code in the HDF5 library has some behaviors that can catch the unwary.
773 These are handled internally to (mostly) hide them so that they should not affect users.
774 Specifically, this filter may do two things.
776 1. Add extra parameters to the filter parameters: going from the two parameters provided by the user to four parameters for internal use.
777 It turns out that the two parameters provided when calling nc\_def\_var\_filter correspond to the first two parameters of the four parameters returned by nc\_inq\_var\_filter.
778 2. Change the values of some parameters: additional flag bits are known to be added to the *options\_mask* argument, and the *pixels\_per\_block* parameter may be modified.
780 The reason for these changes has to do with the fact that the szip API provided by the underlying H5Pset\_szip function is actually a subset of the capabilities of the real szip implementation.
781 Presumably this is for historical reasons.
783 In any case, if the caller uses the *nc\_inq\_var\_szip* or the *nc\_inq\_var\_filter* functions, then the parameter values returned may differ from those originally specified.
785 It should also be noted that the HDF5 szip filter wrapper that
786 is invoked depends on the configuration of the netcdf-c library.
787 If the HDF5 installation supports szip, then NCZarr szip support
788 will use the HDF5 wrapper. If HDF5 does not support szip, or HDF5
789 is not enabled, then the plugins directory will contain a local
790 HDF5 szip wrapper to be used by NCZarr. This can be confusing,
791 but is generally transparent to the user since the plugins
792 HDF5 szip wrapper was taken from the HDF5 code base.
794 ### Supported Systems
796 The current matrix of build systems and operating systems known to work is as follows.
798 <tr><th>Build System<th>Supported OS
799 <tr><td>Automake<td>Linux, Cygwin, OSX
800 <tr><td>Cmake<td>Linux, Cygwin, OSX, Visual Studio
803 ### Generic Plugin Build
804 If you do not want to use Automake or Cmake, the following has been known to work.
806 gcc -g -O0 -fPIC -shared -o libbzip2.so <plugin source files> -L${HDF5LIBDIR} -lhdf5_hl -lhdf5 -L${ZLIBDIR} -lz
808 ## References {#filters_References}
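810 1. [https://support.hdfgroup.org/HDF5/doc/Advanced/DynamicallyLoadedFilters/HDF5DynamicallyLoadedFilters.pdf](https://support.hdfgroup.org/HDF5/doc/Advanced/DynamicallyLoadedFilters/HDF5DynamicallyLoadedFilters.pdf)
811 2. [https://support.hdfgroup.org/HDF5/doc/TechNotes/TechNote-HDF5-CompressionTroubleshooting.pdf](https://support.hdfgroup.org/HDF5/doc/TechNotes/TechNote-HDF5-CompressionTroubleshooting.pdf)
812 3. [https://portal.hdfgroup.org/display/support/Registered+Filter+Plugins](https://portal.hdfgroup.org/display/support/Registered+Filter+Plugins)
813 4. [https://support.hdfgroup.org/services/contributions.html#filters](https://support.hdfgroup.org/services/contributions.html#filters)
814 5. [https://support.hdfgroup.org/HDF5/doc/RM/RM\_H5.html](https://support.hdfgroup.org/HDF5/doc/RM/RM_H5.html)
815 6. [https://confluence.hdfgroup.org/display/HDF5/Filters](https://confluence.hdfgroup.org/display/HDF5/Filters)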
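817 7. [https://numcodecs.readthedocs.io/en/stable/](https://numcodecs.readthedocs.io/en/stable/)
818 8. [https://github.com/ccr/ccr](https://github.com/ccr/ccr)
819 9. [https://escholarship.org/uc/item/7xd1739k](https://escholarship.org/uc/item/7xd1739k)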
821 ## Appendix A. HDF5 Parameter Encode/Decode {#filters_appendixa}
823 The filter id for an HDF5 format filter is an unsigned integer.
824 Further, the parameters passed to an HDF5 format filter are encoded internally as a vector of 32-bit unsigned integers.
825 It may be that the parameters required by a filter can naturally be encoded as unsigned integers.
826 The bzip2 compression filter, for example, expects a single integer value from zero through nine.
827 This encodes naturally as a single unsigned integer.
829 Note that signed integers and single-precision (32-bit) float values also can easily be represented as 32 bit unsigned integers by proper casting to an unsigned integer so that the bit pattern is preserved.
830 Simple signed integer values of type short or char can also be mapped to an unsigned integer by truncating to 16 or 8 bits respectively and then sign extending. Similarly, unsigned 8 and 16 bit
831 values can be used with zero extensions.
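The mappings just described can be written as small helper functions. This is a sketch, assuming that `unsigned int` is 32 bits wide (as HDF5 filter parameters are); all function names are hypothetical. `memcpy` is used so that the bit pattern is copied rather than the numeric value converted.

```c
#include <stdint.h>
#include <string.h>

/* Signed 32-bit int: reuse the same (two's complement) bit pattern. */
unsigned int param_from_int(int32_t v) { return (unsigned int)(uint32_t)v; }
int32_t int_from_param(unsigned int p)
{
    uint32_t u = p;
    int32_t v;
    memcpy(&v, &u, sizeof v);   /* bit copy avoids implementation-defined conversion */
    return v;
}

/* 32-bit float: copy the bits, never convert the value. */
unsigned int param_from_float(float f)
{
    uint32_t u;
    memcpy(&u, &f, sizeof u);
    return u;
}
float float_from_param(unsigned int p)
{
    uint32_t u = p;
    float f;
    memcpy(&f, &u, sizeof f);
    return f;
}

/* short / signed char: the value already fits in 16 / 8 bits; widening to
   int32_t sign-extends it before it is reinterpreted as unsigned. */
unsigned int param_from_short(short s) { return (unsigned int)(uint32_t)(int32_t)s; }
unsigned int param_from_schar(signed char c) { return (unsigned int)(uint32_t)(int32_t)c; }

/* Unsigned 8- and 16-bit values: zero-extend. */
unsigned int param_from_u8(uint8_t u) { return (unsigned int)u; }
unsigned int param_from_u16(uint16_t u) { return (unsigned int)u; }
```

For instance, `param_from_float(1.0f)` yields `0x3F800000`, and `param_from_short(-2)` yields `0xFFFFFFFE`.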
833 Machine byte order (aka endian-ness) is an issue for passing some kinds of parameters.
834 You might define the parameters when compressing on a little endian machine, but later do the decompression on a big endian machine.
836 When using HDF5 format filters, byte order is not an issue for 32-bit values because HDF5 takes care of converting them between the local machine byte order and network byte order.
838 Parameters whose size is larger than 32-bits present a byte order problem.
839 This specifically includes double precision floats and (signed or unsigned) 64-bit integers.
840 For these cases, the machine byte order issue must be handled, in part, by the compression code.
841 This is because HDF5 will treat, for example, an unsigned long long as two 32-bit unsigned integers and will convert each to network order separately.
842 This means that on a machine whose byte order is different than the machine in which the parameters were initially created, the two integers will be separately
844 But this will be incorrect for 64-bit values.
846 So, we have this situation (for HDF5 only):
848 1. the 8 bytes start as native machine order for the machine doing the call to *nc\_def\_var\_filter*.
849 2. The caller divides the 8 bytes into 2 four byte pieces and passes them to *nc\_def\_var\_filter*.
850 3. HDF5 takes each four byte piece and ensures that each piece is in network (big) endian order.
851 4. When the filter is called, the two pieces are returned in the same order but with the bytes in each piece consistent with the native machine order for the machine executing the filter.
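The steps above can be illustrated on a single machine. This sketch assumes a 32-bit `unsigned int` and a 64-bit IEEE 754 `double`; `split8`, `join8` and `simulate_other_endian` are hypothetical names. The last function mimics what step 4 means on a machine of the opposite byte order: each piece keeps its integer value, so each 4-byte group ends up reversed in memory, and naively joining the pieces no longer reproduces the original value.

```c
#include <string.h>

/* Step 2: view the 8 native-order bytes of a double as two 32-bit pieces,
   in memory order, without changing any byte. */
void split8(double d, unsigned int pieces[2])
{
    memcpy(pieces, &d, 8);
}

/* Naive inverse of split8: only correct when the pieces were produced on a
   machine with the same byte order. */
double join8(const unsigned int pieces[2])
{
    double d;
    memcpy(&d, pieces, 8);
    return d;
}

/* Simulate step 4 on a machine of the other byte order: the integer value of
   each piece is preserved, so its 4 bytes appear reversed in memory. */
void simulate_other_endian(unsigned int pieces[2])
{
    unsigned char* b = (unsigned char*)pieces;
    for (int k = 0; k < 2; k++) {
        unsigned char* q = b + 4 * k;
        unsigned char t0 = q[0], t1 = q[1];
        q[0] = q[3];
        q[1] = q[2];
        q[2] = t1;
        q[3] = t0;
    }
}
```

Splitting and joining 3.5 on the same machine round-trips exactly, but after `simulate_other_endian` the joined value is no longer 3.5, which is the problem the encoding rules below address.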
853 ### Encoding Algorithms for HDF5
855 In order to properly extract the correct 8-byte value, we need to ensure that the values stored in the HDF5 file have a known format independent of the native format of the creating machine.
857 The idea is to do sufficient manipulation so that HDF5 will store the 8-byte value as a little endian value divided into two 4-byte integers.
858 Note that little-endian is used as the standard because it is the most common machine format.
859 When read, the filter code needs to be aware of this convention and do the appropriate conversions.
861 This leads to the following set of rules.
865 1. Encode on a little endian (LE) machine: no special action is required.
866 The 8-byte value is passed to HDF5 as two 4-byte integers.
867 HDF5 byte swaps each integer and stores it in the file.
868 2. Encode on a big endian (BE) machine: several steps are required:
870 1. Do an 8-byte byte swap to convert the original value to little-endian format.
871 2. Since the encoding machine is BE, HDF5 will just store the value.
872 So it is necessary to simulate little endian encoding by byte-swapping each 4-byte integer separately.
873 3. This doubly swapped pair of integers is then passed to HDF5 and is stored unchanged.
877 1. Decode on an LE machine: no special action is required.
878 HDF5 will get the two 4-byte values from the file and byte-swap each separately.
879 The concatenation of those two integers will be the expected LE value.
880 2. Decode on a big endian (BE) machine: the inverse of the encode case must be implemented.
882 1. HDF5 sends the two 4-byte values to the filter.
883 2. The filter must then byte-swap each 4-byte value independently.
884 3. The filter then must concatenate the two 4-byte values into a single 8-byte value.
885 Because of the encoding rules, this 8-byte value will be in LE format.
886 4. The filter must finally do an 8-byte byte-swap on that 8-byte value to convert it to desired BE format.
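The encode and decode rules can be sketched in C as follows. netcdf-c exports *ncaux\_h5filterspec\_fix8* for this purpose (see Appendix B); the code below is only an illustration of the rules, and the names `fix8_be_steps` and `fix8_sketch` are hypothetical.

```c
#include <stdint.h>
#include <string.h>

static void swap4(unsigned char* p)
{
    unsigned char t;
    t = p[0]; p[0] = p[3]; p[3] = t;
    t = p[1]; p[1] = p[2]; p[2] = t;
}

static void swap8(unsigned char* p)
{
    for (int i = 0; i < 4; i++) {
        unsigned char t = p[i];
        p[i] = p[7 - i];
        p[7 - i] = t;
    }
}

static int host_is_big_endian(void)
{
    uint32_t one = 1;
    unsigned char first;
    memcpy(&first, &one, 1);   /* first byte is 0 only on a big endian host */
    return first == 0;
}

/* The big endian steps of the rules, applied unconditionally. */
void fix8_be_steps(unsigned char* mem8, int decode)
{
    if (decode) {          /* decode: swap each 4-byte piece, then all 8 bytes */
        swap4(mem8);
        swap4(mem8 + 4);
        swap8(mem8);
    } else {               /* encode: swap all 8 bytes to LE, then each piece */
        swap8(mem8);
        swap4(mem8);
        swap4(mem8 + 4);
    }
}

/* Full rule set: little endian hosts need no action. */
void fix8_sketch(unsigned char* mem8, int decode)
{
    if (host_is_big_endian())
        fix8_be_steps(mem8, decode);
}
```

Composing the big endian steps shows that their net effect is to exchange the two 4-byte halves: bytes `0..7` become `4 5 6 7 0 1 2 3` on encode, and decoding restores the original order.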
888 To support these rules, some utility programs exist and are discussed in [Appendix B](#filters_appendixb).
890 ## Appendix B. Support Utilities {#filters_appendixb}
892 Several functions are exported from the netcdf-c library for use by client programs and by filter implementations.
893 They are defined in the header file *netcdf\_aux.h*.
894 The h5 tag indicates that they assume that the result of the parse is a set of unsigned integers — the format used by HDF5.
896 1. *int ncaux\_h5filterspec\_parse(const char\* txt, unsigned int\* idp, size\_t\* nparamsp, unsigned int\*\* paramsp);*
897 * txt contains the text of a sequence of comma separated constants
898 * idp will contain the first constant — the filter id
899 * nparamsp will contain the number of params
900 * paramsp will contain a vector of params — the caller must free
901 This function can parse single filter spec strings as defined in the section on [Filter Specification Syntax](#filters_syntax).
902 2. *int ncaux\_h5filterspec\_parselist(const char\* txt, int\* formatp, size\_t\* nspecsp, struct NC\_H5\_Filterspec\*\*\* vectorp);*
903 * txt contains the text of a sequence of '|'-separated filter specs.
904 * formatp currently always returns 0.
905 * nspecsp will return the number of filter specifications.
906 * vectorp will return a pointer to a vector of pointers to filter specification instances — the caller must free.
907 This function parses a sequence of filter specifications each separated by a '|' character.
908 The text between '|' separators must be parsable by *ncaux\_h5filterspec\_parse*.
909 3. *void ncaux\_h5filterspec\_free(struct NC\_H5\_Filterspec\* f);*
910 * f is a pointer to an instance of *struct NC\_H5\_Filterspec*
911 Typically this was returned as an element of the vector returned
912 by *ncaux\_h5filterspec\_parselist*.
913 This reclaims the parameters of the filter spec object as well as the object itself.
914 4. *int ncaux\_h5filterspec\_fix8(unsigned char\* mem8, int decode);*
915 * mem8 is a pointer to the 8-byte value to fix.
916 * decode is 1 if the function should apply the 8-byte decoding algorithm,
917 or 0 if it should apply the encoding algorithm.
918 This function implements the 8-byte conversion algorithms for HDF5.
919 Before calling *nc\_def\_var\_filter* (unless *NC\_parsefilterspec* was used), the client must call this function with the decode argument set to 0.
920 Inside the filter code, this function should be called with the decode argument set to 1.
922 Examples of the use of these functions can be seen in the test program *nc\_test4/tst\_filterparser.c*.
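To show the shape of the id and parameter vector that a single filter spec produces, here is a deliberately simplified, hypothetical stand-in for *ncaux\_h5filterspec\_parse*. It accepts only comma-separated unsigned decimal integers that fit in 32 bits (the plain `unsigned32` case of the grammar in Appendix D); the real parser may accept other forms, and `parse_spec_sketch` is not a netcdf-c function.

```c
#include <errno.h>
#include <stdlib.h>

/* Hypothetical, simplified parser for "id[,p1,p2,...]" where every field is
   an unsigned decimal integer that fits in 32 bits. On success returns 0;
   *paramsp is a malloc'ed vector (NULL when there are no parameters) that
   the caller must free. Returns -1 on any syntax error. */
int parse_spec_sketch(const char* txt, unsigned int* idp, size_t* nparamsp,
                      unsigned int** paramsp)
{
    size_t fields = 1;
    for (const char* s = txt; *s != '\0'; s++)
        if (*s == ',') fields++;

    unsigned int* params = NULL;
    if (fields > 1) {
        params = malloc((fields - 1) * sizeof *params);
        if (params == NULL) return -1;
    }

    const char* p = txt;
    for (size_t i = 0; i < fields; i++) {
        char* end = NULL;
        if (*p < '0' || *p > '9') { free(params); return -1; } /* no sign, space or empty field */
        errno = 0;
        unsigned long v = strtoul(p, &end, 10);
        if (errno != 0 || v > 0xFFFFFFFFul || (*end != ',' && *end != '\0')) {
            free(params);
            return -1;
        }
        if (i == 0) *idp = (unsigned int)v;   /* first constant is the filter id */
        else params[i - 1] = (unsigned int)v;
        p = end + 1;   /* skip the comma; never dereferenced after the last field */
    }
    *nparamsp = fields - 1;
    *paramsp = params;
    return 0;
}
```

Parsing `"307,9"` gives filter id 307 with one parameter, 9; parsing `"32015"` gives id 32015 with no parameters.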
924 Some of the above functions use a C struct defined in *netcdf\_filter.h*.
925 The definition of that struct is as follows.
928 typedef struct NC_H5_Filterspec {
929 unsigned int filterid; /* ID for arbitrary filter. */
930 size_t nparams; /* nparams for arbitrary filter. */
931 unsigned int* params; /* Params for arbitrary filter. */
934 This struct in effect encapsulates all of the information about an HDF5-formatted filter: the id, the number of parameters, and the parameters themselves.
936 ## Appendix C. Build Flags for Detecting the Filter Mechanism {#filters_appendixc}
938 The include file *netcdf\_meta.h* contains the following definition.
940 #define NC_HAS_MULTIFILTERS 1
942 This, in conjunction with the error code *NC\_ENOFILTER* in *netcdf.h*, can be used to see what filter mechanism is in place, as described in the section on [incompatibilities](#filters_compatibility).
944 1. !defined(NC\_ENOFILTER) && !defined(NC\_HAS\_MULTIFILTERS) — indicates that the old pre-4.7.4 mechanism is in place.
945 It does not support multiple filters.
946 2. defined(NC\_ENOFILTER) && !defined(NC\_HAS\_MULTIFILTERS) — indicates that the 4.7.4 mechanism is in place.
947 It does support multiple filters, but the error return codes for *nc\_inq\_var\_filter* are different and the filter spec parser functions are in a different location with different names.
948 3. defined(NC\_ENOFILTER) && defined(NC\_HAS\_MULTIFILTERS) — indicates that multiple filters are supported, and that *nc\_inq\_var\_filter* returns a filterid of zero to indicate that a variable has no filters.
949 Also, the filter spec parsers have the names and signatures described in this document and are defined in *netcdf\_aux.h*.
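The three cases can be tested at compile time with the preprocessor. In a real program one would simply include *netcdf.h* and *netcdf\_meta.h*; the `__has_include` guard below (a GCC/Clang and C23 feature) is only an assumption made so that the sketch also compiles where the headers are absent, in which case neither macro is defined and it reports case 1. The function name `filter_mechanism` is hypothetical.

```c
/* Include the netCDF headers when they can be found; see the lead-in. */
#if defined(__has_include)
#  if __has_include(<netcdf.h>)
#    include <netcdf.h>
#  endif
#  if __has_include(<netcdf_meta.h>)
#    include <netcdf_meta.h>
#  endif
#endif

/* Returns 1, 2 or 3 following the numbered cases above, or 0 for the
   combination that the list does not cover. */
int filter_mechanism(void)
{
#if !defined(NC_ENOFILTER) && !defined(NC_HAS_MULTIFILTERS)
    return 1;   /* pre-4.7.4 mechanism: no multiple filters */
#elif defined(NC_ENOFILTER) && !defined(NC_HAS_MULTIFILTERS)
    return 2;   /* 4.7.4 mechanism: different error codes and parser names */
#elif defined(NC_ENOFILTER) && defined(NC_HAS_MULTIFILTERS)
    return 3;   /* current mechanism: filterid 0 means "no filters" */
#else
    return 0;
#endif
}
```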
951 ## Appendix D. BNF for Specifying Filters in Utilities {#filters_appendixd}
958 | filterid ',' parameterlist
962 parameterlist: parameter
963 | parameterlist ',' parameter
965 parameter: unsigned32
968 unsigned32: <32 bit unsigned integer>
971 ## Appendix E. Codec API {#filters_appendixe}
973 The Codec API mirrors the HDF5 API closely. It has one well-known function that can be invoked to obtain information about the Codec as well as pointers to special functions to perform conversions.
975 ### The Codec Plugin API
977 #### NCZ\_get\_codec\_info
979 This function returns a pointer to a C struct that provides detailed information about the codec plugin.
983 void* NCZ_get_codec_info(void);
985 The value returned is actually of type *struct NCZ\_codec\_t*,
986 but is of type *void\** to allow for extensions.
990 typedef struct NCZ_codec_t {
991 int version; /* Version number of the struct */
992 int sort; /* Format of remainder of the struct;
993 Currently always NCZ_CODEC_HDF5 */
994 const char* codecid; /* The name/id of the codec */
995 unsigned int hdf5id; /* corresponding hdf5 id */
996 void (*NCZ_codec_initialize)(void);
997 void (*NCZ_codec_finalize)(void);
998 int (*NCZ_codec_to_hdf5)(const char* codec, int* nparamsp, unsigned** paramsp);
999 int (*NCZ_hdf5_to_codec)(size_t nparams, const unsigned* params, char** codecp);
1000 int (*NCZ_modify_parameters)(int ncid, int varid, size_t* vnparamsp, unsigned** vparamsp, size_t* nparamsp, unsigned** paramsp);
1004 The semantics of the non-function fields are as follows:
1006 1. *version* — Version number of the struct.
1007 2. *sort* — Format of remainder of the struct; currently always NCZ\_CODEC\_HDF5.
1008 3. *codecid* — The name/id of the codec.
1009 4. *hdf5id* — The corresponding hdf5 id.
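A minimal codec plugin might fill in this struct as sketched below. The struct here is a local, hypothetical mirror of the listing above (a real plugin should use the definition shipped with netcdf-c); the values for *version* and *sort* are placeholders; and the codec id `"bz2"` with a `"level"` key is assumed to follow the NumCodecs BZ2 codec. The JSON handling is deliberately naive and is not how netcdf-c parses codecs.

```c
#include <stdio.h>
#include <stdlib.h>
#include <string.h>

/* Hypothetical local mirror of the NCZ_codec_t layout printed above. */
typedef struct sketch_codec_t {
    int version;
    int sort;
    const char* codecid;
    unsigned int hdf5id;
    void (*NCZ_codec_initialize)(void);
    void (*NCZ_codec_finalize)(void);
    int (*NCZ_codec_to_hdf5)(const char* codec, int* nparamsp, unsigned** paramsp);
    int (*NCZ_hdf5_to_codec)(size_t nparams, const unsigned* params, char** codecp);
    int (*NCZ_modify_parameters)(int ncid, int varid, size_t* vnparamsp, unsigned** vparamsp,
                                 size_t* nparamsp, unsigned** paramsp);
} sketch_codec_t;

#define SKETCH_NOERR 0    /* stand-in for NC_NOERR */
#define SKETCH_ERR  (-1)  /* hypothetical stand-in for a netcdf-c error code */

/* JSON -> HDF5: naive lookup of "level" in {"id": "bz2", "level": N}. */
static int bz2_codec_to_hdf5(const char* codec, int* nparamsp, unsigned** paramsp)
{
    const char* p = strstr(codec, "\"level\"");
    if (p == NULL || (p = strchr(p, ':')) == NULL) return SKETCH_ERR;
    char* end = NULL;
    unsigned long level = strtoul(p + 1, &end, 10);
    if (end == p + 1 || level > 9) return SKETCH_ERR;
    unsigned* params = malloc(sizeof *params);
    if (params == NULL) return SKETCH_ERR;
    params[0] = (unsigned)level;
    *nparamsp = 1;
    *paramsp = params;          /* caller frees */
    return SKETCH_NOERR;
}

/* HDF5 -> JSON: build the codec text from a one-element parameter vector. */
static int bz2_hdf5_to_codec(size_t nparams, const unsigned* params, char** codecp)
{
    if (nparams != 1 || params[0] > 9) return SKETCH_ERR;
    char* json = malloc(64);
    if (json == NULL) return SKETCH_ERR;
    snprintf(json, 64, "{\"id\": \"bz2\", \"level\": %u}", params[0]);
    *codecp = json;             /* caller frees */
    return SKETCH_NOERR;
}

static sketch_codec_t bz2_codec = {
    .version = 1,               /* placeholder */
    .sort = 1,                  /* placeholder for NCZ_CODEC_HDF5 */
    .codecid = "bz2",
    .hdf5id = 307,              /* bzip2 id from the table in Appendix F */
    .NCZ_codec_initialize = NULL,
    .NCZ_codec_finalize = NULL,
    .NCZ_codec_to_hdf5 = bz2_codec_to_hdf5,
    .NCZ_hdf5_to_codec = bz2_hdf5_to_codec,
    .NCZ_modify_parameters = NULL,
};

/* The well-known entry point the plugin exports. */
void* NCZ_get_codec_info(void)
{
    return &bz2_codec;
}
```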
1011 #### NCZ\_codec\_to\_hdf5
1013 Given a JSON Codec representation, it will return a corresponding vector of unsigned integers representing the
1018 int NCZ_codec_to_hdf5(const char* codec, int* nparamsp, unsigned** paramsp);
1021 1. codec — (in) ptr to JSON string representing the codec.
1022 2. nparamsp — (out) store the length of the converted HDF5 unsigned vector
1023 3. paramsp — (out) store a pointer to the converted HDF5 unsigned vector; caller must free the returned vector. Note the double indirection.
1025 Return Value: a netcdf-c error code.
1027 #### NCZ\_hdf5\_to\_codec
1029 Given an HDF5 visible parameters vector of unsigned integers and its length,
1030 return a corresponding JSON codec representation of those visible parameters.
1034 int NCZ_hdf5_to_codec(int ncid, int varid, size_t nparams, const unsigned* params, char** codecp);
1038 1. ncid — (in) the id of the group containing the variable
1039 2. varid — (in) the id of the variable
1040 3. nparams — (in) the length of the HDF5 visible parameters vector
1041 4. params — (in) pointer to the HDF5 visible parameters vector.
1042 5. codecp — (out) store the string representation of the codec; caller must free.
1044 Return Value: a netcdf-c error code.
1046 #### NCZ\_modify\_parameters
1048 Extract environment information from the (ncid,varid) and use it to convert a set of visible parameters
1049 to a set of working parameters; also provide option to modify visible parameters.
1053 int NCZ_modify_parameters(int ncid, int varid, size_t* vnparamsp, unsigned** vparamsp, size_t* wnparamsp, unsigned** wparamsp);
1057 1. ncid — (in) group id containing the variable.
1058 2. varid — (in) the id of the variable to which this filter is being attached.
1059 3. vnparamsp — (in/out) the count of visible parameters
1060 4. vparamsp — (in/out) the set of visible parameters
1061 5. wnparamsp — (out) the count of working parameters
1062 6. wparamsp — (out) the set of working parameters
1064 Return Value: a netcdf-c error code.
1066 #### NCZ\_codec\_initialize
1068 Some compressors may require library initialization.
1069 This function is called as soon as a shared library is loaded and matched with an HDF5 filter.
1073 int NCZ_codec_initialize(void);
1075 Return Value: a netcdf-c error code.
1077 #### NCZ\_codec\_finalize
1079 Some compressors (like blosc) require invoking a finalize function in order to avoid memory leaks.
1080 This function is called during a call to *nc\_finalize* to do any finalization.
1081 If the client code does not invoke *nc\_finalize* then memory checkers may complain about lost memory.
1085 int NCZ_codec_finalize(void);
1087 Return Value: a netcdf-c error code.
1091 As an aid to clients, it is convenient if a single shared library can provide multiple *NCZ\_codec\_t* instances at one time.
1092 This API is not intended to be used by plugin developers.
1093 A shared library must only export this function.
1095 #### NCZ\_codec\_info\_defaults
1097 Return a NULL terminated vector of pointers to instances of *NCZ\_codec\_t*.
1101 void* NCZ_codec_info_defaults(void);
1103 The value returned is actually of type *NCZ\_codec\_t\*\**,
1104 but is of type *void\** to allow for extensions.
1105 The list of returned items is used to try to provide defaults
1106 for any HDF5 filters that have no corresponding Codec.
1107 This is for internal use only.
1109 ## Appendix F. Standard Filters {#filters_appendixf}
1111 Support for a select set of standard filters is built into the NetCDF API.
1112 Generally, they are accessed using the following generic API, where XXXX is
1113 the filter name. As a rule, the names are those used in the HDF5 filter ID naming authority [4] or the NumCodecs naming authority [7].
1115 int nc_def_var_XXXX(int ncid, int varid, unsigned filterid, size_t nparams, unsigned* params);
1116 int nc_inq_var_XXXX(int ncid, int varid, int* hasfilter, size_t* nparamsp, unsigned* params);
1118 The first function inserts the specified filter into the filter chain for a given variable.
1119 The second function queries the given variable to see if the specified filter
1120 is in the filter chain for that variable. The *hasfilter* argument is set
1121 to one if the filter is in the chain and zero otherwise.
1122 As is usual with the netcdf API, one is expected to call this function twice:
1123 the first time to obtain *nparamsp*, and the second time to get the parameters in the client-allocated memory argument *params*.
1124 Any of these arguments can be NULL, in which case no value is returned.
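The two-call pattern looks like this in practice. To keep the sketch self-contained, `inq_var_mock` is a hypothetical stand-in for an *nc\_inq\_var\_XXXX* function with the generic signature above; it always reports a filter with three parameters and honours NULL arguments. `fetch_params` is likewise hypothetical.

```c
#include <stdlib.h>

/* Hypothetical stand-in for nc_inq_var_XXXX: reports a filter with three
   parameters; any output argument may be NULL. */
static int inq_var_mock(int ncid, int varid, int* hasfilter, size_t* nparamsp, unsigned* params)
{
    static const unsigned stored[3] = {4u, 8u, 15u};
    (void)ncid;
    (void)varid;
    if (hasfilter) *hasfilter = 1;
    if (nparamsp) *nparamsp = 3;
    if (params)
        for (size_t i = 0; i < 3; i++) params[i] = stored[i];
    return 0;
}

/* Ask for the count, allocate, then ask again for the values.
   On success *out is malloc'ed (or NULL if there are no parameters). */
int fetch_params(int ncid, int varid, size_t* np, unsigned** out)
{
    int has = 0;
    size_t n = 0;
    *np = 0;
    *out = NULL;
    int stat = inq_var_mock(ncid, varid, &has, &n, NULL);   /* first call: count only */
    if (stat != 0) return stat;
    if (!has || n == 0) return 0;
    unsigned* params = malloc(n * sizeof *params);
    if (params == NULL) return -1;
    stat = inq_var_mock(ncid, varid, NULL, NULL, params);   /* second call: values */
    if (stat != 0) { free(params); return stat; }
    *np = n;
    *out = params;                                           /* caller frees */
    return 0;
}
```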
1126 Note that NetCDF inherits four filters from HDF5, namely shuffle, fletcher32, deflate (zlib), and szip. The APIs for these do not conform to the above API.
1127 So aside from those four, the current set of standard filters is as follows.
1129 <tr><th>Filter Name<th>Filter ID<th>Reference
1130 <tr><td>zstandard<td>32015<td>https://facebook.github.io/zstd/
1131 <tr><td>bzip2<td>307<td>https://sourceware.org/bzip2/
1134 It is important to note that in order to use each standard filter, several additional libraries must be installed.
1135 Consider the zstandard compressor, which is one of the supported standard filters.
1136 When installing the netcdf library, the following other libraries must be installed.
1138 1. *libzstd.so* | *zstd.dll* | *libzstd.dylib* -- The actual zstandard compressor library; typically installed by using your platform specific package manager.
1139 2. The HDF5 wrapper for *libzstd.so* -- There are several options for obtaining this (see [Appendix G](#filters_appendixg).)
1140 3. (Optional) The Zarr wrapper for *libzstd.so* -- you need this if you intend to read/write Zarr datasets that were compressed using zstandard; again see [Appendix G](#filters_appendixg).
1142 ## Appendix G. Finding Filter Implementations {#filters_appendixg}
1144 A major problem for filter users is finding an implementation of an HDF5 filter wrapper and (optionally)
1145 its corresponding NCZarr wrapper. There are several ways to do this.
1147 * **--with-plugin-dir** — An option to *./configure* that will install the necessary wrappers.
1148 See [Appendix H](#filters_appendixh).
1150 * **HDF5 Assigned Filter Identifiers Repository [3]** —
1151 HDF5 maintains a page of standard filter identifiers along with
1152 additional contact information. This often includes a pointer
1153 to source code. This will provide only HDF5 wrappers and not NCZarr wrappers.
1155 * **Community Codec Repository** —
1156 The Community Codec Repository (CCR) project [8] provides
1157 a number of filters, including their HDF5 wrappers.
1158 You can install this library to get access to these supported filters.
1159 It does not as yet provide Zarr wrappers or the required
1160 NCZarr Codec API, so these filters are usable only with netcdf-4.
1161 This will change in the future.
1163 ## Appendix H. Auto-Install of Filter Wrappers {#filters_appendixh}
1165 As part of the overall build process, a number of filter wrappers are built as shared libraries in the "plugins" directory.
1166 These wrappers can be installed as part of the overall netcdf-c installation process.
1167 WARNING: the installer still needs to make sure that the actual filter/compression libraries are installed: e.g. libzstd and/or libblosc.
1168 See the document *pluginpaths.md* for details on the installation process.
1169 If NCZarr is enabled, then in addition to wrappers for the standard filters,
1170 additional libraries will be installed to support NCZarr access to filters.
1171 Currently, this list includes the following:
1173 * shuffle — shuffle filter
1174 * fletcher32 — fletcher32 checksum
1175 * deflate — deflate compression
1176 * (optional) szip — szip compression, if libsz is available
1177 * bzip2 — an HDF5 filter for bzip2 compression
1178 * lib__nczh5filters.so — provide NCZarr support for shuffle, fletcher32, deflate, and (optionally) szip.
1179 * lib__nczstdfilters.so — provide NCZarr support for bzip2, (optionally) zstandard, and (optionally) blosc.
1181 The shuffle, fletcher32, and deflate filters in this case will
1182 be ignored by HDF5 and only used by the NCZarr code. But in
1183 order to use them, NCZarr needs additional Codec capabilities
1184 provided by the *lib__nczh5filters.so* shared library. Note also that
1185 if you disable HDF5 support, but leave NCZarr support enabled,
1186 then all of the above filters should continue to work.
1188 ## Appendix I. A Warning on Backward Compatibility {#filters_appendixi}
1190 The API defined in this document should accurately reflect the
1191 current state of filters in the netCDF-c library. Be aware that
1192 there was a short period in which the filter code was undergoing
1193 some revision and extension. Those extensions have largely been
1194 reverted. Unfortunately, some users may experience some
1195 compilation problems for previously working code because of
1196 these reversions. In that case, please revise your code to
1197 adhere to this document. Apologies are extended for any
1200 A user may encounter an incompatibility if any of the following appears in user code.
1202 * The function *nc\_inq\_var\_filter* was returning the error value NC\_ENOFILTER if a variable had no associated filters.
1203 It has been reverted to the previous behavior, where it returns NC\_NOERR and sets the returned filter id to zero if the variable has no filters.
1204 * The function *nc\_inq\_var\_filterids* was renamed to *nc\_inq\_var\_filter\_ids*.
1205 * Some auxiliary functions for parsing textual filter specifications have been moved to the file *netcdf\_aux.h*. See [Appendix B](#filters_appendixb).
1206 * All of the "filterx" functions have been removed. This is unlikely to cause problems because they had limited visibility.
1208 For additional information, see [Appendix C](#filters_appendixc).
1210 ## History {#filters_history}
1212 *Author*: Dennis Heimbigner<br>
1213 *Email*: dennis.heimbigner@gmail.com<br>
1214 *Initial Version*: 1/10/2018<br>
1215 *Last Revised*: 5/18/2022