NetCDF  4.9.3
filters.md
1 Appendix D. NetCDF-4 Filter Support {#filters}
2 ==================================
3 
4 [TOC]
5 
6 > See @ref nc_filters_quickstart for tips to get started quickly with NetCDF-4 Filter Support.
7 
8 ## Filters Overview {#filters_overview}
9 
10 NetCDF-C filters have some features of which the user
11 should be aware.
12 
13 * ***Auto Install of filters***<br>
14 An option is now provided to automatically install
15 HDF5 filters into a default location, or optionally
16 into a user-specified location.
17 This is described in the *pluginpath.md* document.
18 
19 * ***NCZarr Filter Support***<br>
20 [NCZarr filters](#filters_nczarr) are now supported.
21 This essentially means that it is possible to specify
22 Zarr Codecs (Zarr equivalent of filters) in Zarr files
23 and have them processed using HDF5-style wrapper shared libraries.
24 Zarr filters can be used even if HDF5 support is disabled
25 in the netCDF-C library.
26 
27 ## Introduction to Filters {#filters_introduction}
28 
29 The netCDF library supports a general filter mechanism to apply
30 various kinds of filters to datasets before reading or writing.
31 The most common kind of filter is a compression-decompression
32 filter, and that is the focus of this document.
33 But non-compression filters &ndash; fletcher32, for example &ndash; also exist.
34 
35 This document describes the support for HDF5 filters and also
36 the newly added support for NCZarr filters.
37 
38 The netCDF enhanced (aka netCDF-4) library inherits this
39 capability since it depends on the HDF5 library. The HDF5
40 library (1.8.11 and later) supports filters, and netCDF is based
41 closely on that underlying HDF5 mechanism.
42 
43 Filters assume that a variable has chunking defined and each
44 chunk is filtered before writing and "unfiltered" after reading
45 and before passing the data to the user. In the event that
46 multiple filters are defined on a variable, they are applied in
47 first-defined order on writing and in reverse order when
48 reading.
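
The ordering rule above can be illustrated with a small, self-contained sketch. The `toy_filter` type and the two arithmetic "filters" are hypothetical stand-ins chosen so the effect is easy to follow; they are not netCDF or HDF5 types, and real filters operate on chunk buffers rather than single integers.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical stand-in for a filter: one function per direction. */
typedef struct {
    int (*encode)(int);   /* applied when writing a chunk */
    int (*decode)(int);   /* applied when reading a chunk */
} toy_filter;

static int add_one(int x)   { return x + 1; }
static int sub_one(int x)   { return x - 1; }
static int times_two(int x) { return x * 2; }
static int halve(int x)     { return x / 2; }

/* Writing: apply the filters in first-defined order. */
int chain_write(const toy_filter *chain, size_t n, int value)
{
    assert(chain != NULL || n == 0);
    for (size_t i = 0; i < n; i++)
        value = chain[i].encode(value);
    return value;
}

/* Reading: undo the filters in the reverse order. */
int chain_read(const toy_filter *chain, size_t n, int value)
{
    assert(chain != NULL || n == 0);
    for (size_t i = n; i > 0; i--)
        value = chain[i - 1].decode(value);
    return value;
}

/* Example chain: "add one" was defined first, "times two" second. */
const toy_filter example_chain[2] = {
    { add_one,   sub_one },
    { times_two, halve   },
};
```

Calling `chain_write(example_chain, 2, 5)` applies "add one" and then "times two"; `chain_read` on the result applies `halve` and then `sub_one`, which recovers the original value.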
49 
50 There is an important "caveat" with respect to filters
51 and their application to variables.
52 If the type of the variable is variable-sized, then a filter
53 cannot be applied to that variable.
54 In this case, the call to *nc\_def\_var\_filter* will still succeed,
55 but the filter will be suppressed and a warning will be logged.
56 Similarly, if an existing file is opened, and there is a
57 variable-sized variable with a filter, then that variable will be
58 suppressed and will be inaccessible through the netcdf-c API.
59 
60 The concept of a variable-sized type is defined as follows:
61 1. The type NC_STRING is variable-sized.
62 2. Any user-defined type of the class NC_VLEN is variable-sized.
63 3. If a compound type has any field that is (transitively) variable-sized,
64  then that compound type is variable-sized.
65 4. All other types are fixed-size.
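
The four rules above amount to a short recursive check. The following is only a sketch: `toy_class` and `toy_type` are hypothetical descriptors invented for this illustration and do not correspond to any netCDF-C data structure.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical type classes; they only mirror the rules in the text. */
typedef enum { T_FIXED, T_STRING, T_VLEN, T_COMPOUND } toy_class;

typedef struct toy_type {
    toy_class cls;
    size_t nfields;                         /* used only for T_COMPOUND */
    const struct toy_type *const *fields;   /* used only for T_COMPOUND */
} toy_type;

/* Return true if the type is variable-sized under rules 1 to 4. */
bool is_variable_sized(const toy_type *t)
{
    assert(t != NULL);
    switch (t->cls) {
    case T_STRING:                          /* rule 1 */
    case T_VLEN:                            /* rule 2 */
        return true;
    case T_COMPOUND:                        /* rule 3: check fields transitively */
        for (size_t i = 0; i < t->nfields; i++)
            if (is_variable_sized(t->fields[i]))
                return true;
        return false;
    default:                                /* rule 4 */
        return false;
    }
}
```

A compound type that contains another compound type with a string field is therefore reported as variable-sized, while a compound of fixed-size fields is not.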
66 
67 ## A Warning on Backward Compatibility {#filters_compatibility}
68 
69 The API defined in this document should accurately reflect the
70 current state of filters in the netCDF-c library. Be aware that
71 there was a short period in which the filter code was undergoing
72 some revision and extension. Those extensions have largely been
73 reverted. Unfortunately, some users may experience some
74 compilation problems for previously working code because of
75 these reversions. In that case, please revise your code to
76 adhere to this document. Apologies are extended for any
77 inconvenience.
78 
79 A user may encounter an incompatibility if any of the following appears in user code.
80 
81 * The function *nc\_inq\_var\_filter* was returning the error value NC\_ENOFILTER if a variable had no associated filters.
82  It has been reverted to the previous behavior: it returns NC\_NOERR and sets the returned filter id to zero if the variable has no filters.
83 * The function *nc\_inq\_var\_filterids* was renamed to *nc\_inq\_var\_filter\_ids*.
84 * Some auxiliary functions for parsing textual filter specifications have been moved to the file *netcdf\_aux.h*. See [Appendix A](#filters_appendixa).
85 * All of the "filterx" functions have been removed. This is unlikely to cause problems because they had limited visibility.
86 
87 For additional information, see [Appendix B](#filters_appendixb).
88 
89 ## Enabling an HDF5 Compression Filter {#filters_enable}
90 
91 HDF5 supports dynamic loading of compression filters using the
92 following process for reading of compressed data.
93 
94 1. Assume that we have a dataset with one or more variables that were compressed using some algorithm.
95  How the dataset was compressed will be discussed subsequently.
96 2. Shared libraries or DLLs exist that implement the compress/decompress algorithm.
97  These libraries have a specific API so that the HDF5 library can locate, load, and utilize the compressor.
98 3. These libraries are expected to be installed in a specific directory.
99 
100 In order to compress a variable with an HDF5 compliant filter,
101 the netcdf-c library must be given three pieces of information:
102 
103 1. some unique identifier for the filter to be used,
104 2. a vector of parameters for controlling the action of the compression filter, and
105 3. access to a shared library implementation of the filter.
106 
107 The meaning of the parameters is, of course, completely filter
108 dependent and the filter description [3] needs to be consulted.
109 For bzip2, for example, a single parameter is provided
110 representing the compression level. It is legal to provide a
111 zero-length set of parameters. Defaults are not provided, so
112 this assumes that the filter can operate with zero parameters.
113 
114 Filter ids are assigned by The HDF Group. See [4] for a current
115 list of assigned filter ids. Note that ids above 32767 can be
116 used for testing without registration.
117 
118 The first two pieces of information can be provided in one of
119 three ways: (1) using *ncgen*, (2) via an API call, or (3) via
120 command line parameters to *nccopy*. In any case, remember that
121 filtering also requires setting chunking, so the variable must
122 also be marked with chunking information. If compression is set
123 for a non-chunked variable, the variable will forcibly be
124 converted to chunked using a default chunking algorithm.
125 
126 ## Using The API {#filters_API}
127 The necessary API methods are included in *netcdf\_filter.h* by default.
128 These functions implicitly use the HDF5 mechanisms and may produce an error if applied to a file format that is not compatible with the HDF5 mechanism.
129 
130 ### nc\_def\_var\_filter
131 Add a filter to the set of filters to be used when writing a variable. This must be invoked after the variable has been created and before *nc\_enddef* is invoked.
132 ````
133  int nc_def_var_filter(int ncid, int varid, unsigned int id,
134  size_t nparams, const unsigned int* params);
135 ````
136 Arguments:
137 
138 * ncid &mdash; File and group ID.
139 * varid &mdash; Variable ID.
140 * id &mdash; Filter ID.
141 * nparams &mdash; Number of filter parameters.
142 * params &mdash; Filter parameters (a vector of unsigned integers)
143 
144 Return codes:
145 
146 * NC\_NOERR &mdash; No error.
147 * NC\_ENOTNC4 &mdash; Not a netCDF-4 file.
148 * NC\_EBADID &mdash; Bad ncid or bad filter id
149 * NC\_ENOTVAR &mdash; Invalid variable ID.
150 * NC\_EINDEFINE &mdash; called when not in define mode
151 * NC\_ELATEDEF &mdash; called after variable was created
152 * NC\_EINVAL &mdash; Scalar variable; parallel I/O enabled but parallel filters not supported; or *nparams* or *params* invalid.
153 
154 ### nc\_inq\_var\_filter\_ids
155 Query a variable to obtain a list of the ids of all filters associated with that variable.
156 ````
157 int nc_inq_var_filter_ids(int ncid, int varid, size_t* nfiltersp, unsigned int* filterids);
158 ````
159 Arguments:
160 
161 * ncid &mdash; File and group ID.
162 * varid &mdash; Variable ID.
163 * nfiltersp &mdash; Stores number of filters found; may be zero.
164 * filterids &mdash; Stores set of filter ids.
165 
166 Return codes:
167 
168 * NC\_NOERR &mdash; No error.
169 * NC\_ENOTNC4 &mdash; Not a netCDF-4 file.
170 * NC\_EBADID &mdash; Bad ncid
171 * NC\_ENOTVAR &mdash; Invalid variable ID.
172 
173 The number of filters associated with the variable is stored in *nfiltersp* (it may be zero).
174 The set of filter ids will be returned in *filterids*.
175 As is usual with the netcdf API, one is expected to call this function twice:
176 the first time to set *nfiltersp* and the second to get the filter ids in client-allocated memory.
177 Any of these arguments can be NULL, in which case no value is returned.
178 
179 ### nc\_inq\_var\_filter\_info
180 Query a variable to obtain information about a specific filter associated with the variable.
181 ````
182 int nc_inq_var_filter_info(int ncid, int varid, unsigned int id, size_t* nparamsp, unsigned int* params);
183 ````
184 Arguments:
185 
186 * ncid &mdash; File and group ID.
187 * varid &mdash; Variable ID.
188 * id &mdash; The filter id of interest.
189 * nparamsp &mdash; Stores number of parameters.
190 * params &mdash; Stores set of filter parameters.
191 
192 Return codes:
193 
194 * NC\_NOERR &mdash; No error.
195 * NC\_ENOTNC4 &mdash; Not a netCDF-4 file.
196 * NC\_EBADID &mdash; Bad ncid
197 * NC\_ENOTVAR &mdash; Invalid variable ID.
198 * NC\_ENOFILTER &mdash; Filter not defined for the variable.
199 
200 The *id* indicates the filter of interest.
201 The actual parameters are stored in *params*.
202 The number of parameters is returned in *nparamsp*.
203 As is usual with the netcdf API, one is expected to call this function twice:
204 the first time to set *nparamsp* and the second to get the parameters in client-allocated memory.
205 Any of these arguments can be NULL, in which case no value is returned.
206 If the specified id is not attached to the variable, then NC\_ENOFILTER is returned.
207 
208 ### nc\_inq\_var\_filter
209 Query a variable to obtain information about the first filter associated with the variable.
210 When netcdf-c was modified to support multiple filters per variable, this function became largely redundant since it returns info only about the first defined filter for the variable.
211 Internally, it is implemented using the functions *nc\_inq\_var\_filter\_ids* and *nc\_inq\_var\_filter\_info*.
212 
213 ````
214 int nc_inq_var_filter(int ncid, int varid, unsigned int* idp, size_t* nparamsp, unsigned int* params);
215 ````
216 
217 Arguments:
218 
219 * ncid &mdash; File and group ID.
220 * varid &mdash; Variable ID.
221 * idp &mdash; Stores the id of the first found filter, set to zero if variable has no filters.
222 * nparamsp &mdash; Stores number of parameters.
223 * params &mdash; Stores set of filter parameters.
224 
225 Return codes:
226 
227 * NC\_NOERR &mdash; No error.
228 * NC\_ENOTNC4 &mdash; Not a netCDF-4 file.
229 * NC\_EBADID &mdash; Bad ncid
230 * NC\_ENOTVAR &mdash; Invalid variable ID.
231 
232 The filter id will be returned in the *idp* argument.
233 If there are no filters, then zero is stored in this argument.
234 Otherwise, the number of parameters is stored in *nparamsp* and the actual parameters in *params*.
235 As is usual with the netcdf API, one is expected to call this function twice:
236 the first time to get *nparamsp* and the second to get the parameters in client-allocated memory.
237 Any of these arguments can be NULL, in which case no value is returned.
238 
239 ## Using ncgen {#filters_NCGEN}
240 
241 In a CDL file, compression of a variable can be specified by annotating it with the following attribute:
242 
243 * *\_Filter* &mdash; a string containing a comma separated list of constants specifying (1) the filter id to apply, and (2) a vector of constants representing the parameters for controlling the operation of the specified filter.
244 See the section on the <a href="#filters_syntax">parameter encoding syntax</a> for the details on the allowable kinds of constants.
245 
246 This is a "special" attribute, which means that it will normally be invisible when using *ncdump* unless the -s flag is specified.
247 
248 For backward compatibility it is probably better to use the *\_Deflate* attribute instead of *\_Filter*. But using *\_Filter* to specify deflation will work.
249 
250 Multiple filters can be specified for a given variable by using the "|" separator.
251 Alternatively, this attribute may be repeated to specify multiple filters.
252 
253 Note that the lexical order of declaration is important when more than one filter is specified for a variable because it determines the order in which the filters are applied.
254 
255 ### Example CDL File (Data elided)
256 
257 ````
258 netcdf bzip2szip {
259 dimensions:
260  dim0 = 4 ; dim1 = 4 ; dim2 = 4 ; dim3 = 4 ;
261 variables:
262  float var(dim0, dim1, dim2, dim3) ;
263  var:_Filter = "307,9|4,32,32" ; // bzip2 then szip
264  var:_Storage = "chunked" ;
265  var:_ChunkSizes = 4, 4, 4, 4 ;
266 data:
267 ...
268 }
269 ````
270 
271 Note that the assigned filter id for bzip2 is 307 and for szip it is 4.
272 
273 ## Using nccopy {#filters_NCCOPY}
274 
275 When copying a netcdf file using *nccopy* it is possible to specify filter information for any output variable by using the "-F" option on the command line; for example:
276 
277  nccopy -F "var,307,9" unfiltered.nc filtered.nc
278 
279 Assume that *unfiltered.nc* has a chunked but not bzip2 compressed variable named "var".
280 This command will copy that variable to the *filtered.nc* output file, but using the filter with id 307 (i.e. bzip2) and with the single parameter 9 indicating the compression level.
281 See the section on the <a href="#filters_syntax">parameter encoding syntax</a> for the details on the allowable kinds of constants.
282 
283 The "-F" option can be used repeatedly, as long as a different variable is specified for each occurrence.
284 
285 It can be convenient to specify that the same compression is to be applied to more than one variable. To support this, two additional *-F* cases are defined.
286 
287 1. *-F \*,...* means apply the filter to all variables in the dataset.
288 2. *-F v1&v2&..,...* means apply the filter to multiple variables.
289 
290 Multiple filters can be specified using the pipeline notation '|'.
291 For example
292 
293 1. *-F v1&v2,307,9|4,32,32* means apply filter 307 (bzip2) then filter 4 (szip) to the multiple variables.
294 
295 Note that the characters '\*', '\&', and '\|' are shell reserved characters, so you will probably need to escape or quote the filter spec in that environment.
296 
297 As a rule, any input filter on an input variable will be applied to the equivalent output variable &mdash; assuming the output file type is netcdf-4.
298 It is, however, sometimes convenient to suppress output compression either totally or on a per-variable basis.
299 Total suppression of output filters can be accomplished by specifying a special case of "-F", namely this.
300 
301  nccopy -F none input.nc output.nc
302 
303 The expression *-F \*,none* is equivalent to *-F none*.
304 
305 Suppression of output filtering for a specific set of variables can be accomplished using these formats.
306 
307  nccopy -F "var,none" input.nc output.nc
308  nccopy -F "v1&v2&...,none" input.nc output.nc
309 
310 where "var" and the "vi" are the fully qualified names of variables.
311 
312 The rules for all possible cases of the "-F none" flag are defined by this table.
313 <table>
314 <tr><th>-F none<th>-Fvar,...<th>Input Filter<th>Applied Output Filter
315 <tr><td>true<td>undefined<td>NA<td>unfiltered
316 <tr><td>true<td>none<td>NA<td>unfiltered
317 <tr><td>true<td>defined<td>NA<td>use output filter(s)
318 <tr><td>false<td>undefined<td>defined<td>use input filter(s)
319 <tr><td>false<td>none<td>NA<td>unfiltered
320 <tr><td>false<td>defined<td>undefined<td>use output filter(s)
321 <tr><td>false<td>undefined<td>undefined<td>unfiltered
322 <tr><td>false<td>defined<td>defined<td>use output filter(s)
323 </table>
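
The table can also be read as a small decision function. The sketch below is derived only from the eight rows above; the enum and function names are hypothetical and are not taken from the *nccopy* source.

```c
#include <assert.h>
#include <stdbool.h>

/* Per-variable "-F var,..." state: absent, "var,none", or a filter spec. */
typedef enum { SPEC_UNDEFINED, SPEC_NONE, SPEC_DEFINED } var_spec;

/* What ends up applied to the output variable. */
typedef enum { OUT_UNFILTERED, OUT_USE_INPUT, OUT_USE_OUTPUT } out_action;

out_action resolve_output_filter(bool global_none, var_spec spec,
                                 bool input_has_filter)
{
    /* An explicit per-variable spec always wins, even with "-F none". */
    if (spec == SPEC_DEFINED)
        return OUT_USE_OUTPUT;
    /* "-F var,none" suppresses filtering for that variable. */
    if (spec == SPEC_NONE)
        return OUT_UNFILTERED;
    assert(spec == SPEC_UNDEFINED);
    /* No per-variable spec: "-F none" suppresses everything. */
    if (global_none)
        return OUT_UNFILTERED;
    /* Otherwise the input filter, if any, is carried over. */
    return input_has_filter ? OUT_USE_INPUT : OUT_UNFILTERED;
}
```

Rows marked NA in the table correspond to cases where the `input_has_filter` argument does not affect the result.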
324 
325 ## Filter Specification Syntax {#filters_syntax}
326 
327 The utilities <a href="#filters_NCGEN">ncgen</a> and <a href="#filters_NCCOPY">nccopy</a>, and also the output of *ncdump*, support the specification of filter ids, formats, and parameters in text format.
328 The BNF specification is defined in [Appendix C](#filters_appendixc).
329 Basically, these specifications consist of a filter id, a comma, and then a sequence of
330 comma separated constants representing the parameters.
331 The constants are converted within the utility to a proper set of unsigned int constants (see the <a href="#filters_paramcoding">parameter encoding section</a>).
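
To make the textual layout concrete, here is a minimal parsing sketch for specifications such as `307,9|4,32,32`. It is deliberately simplified: it accepts only untagged decimal unsigned integers, and `filter_spec`, `MAX_PARAMS`, and `parse_filter_specs` are hypothetical names for this illustration. It is not the parser used by *ncgen* or *nccopy*, nor the one declared in *netcdf\_aux.h*.

```c
#include <assert.h>
#include <ctype.h>
#include <stdlib.h>

#define MAX_PARAMS 16   /* arbitrary limit for this sketch */

typedef struct {
    unsigned int id;
    size_t nparams;
    unsigned int params[MAX_PARAMS];
} filter_spec;

/* Parse "id,p1,p2|id,p1,..." into specs[0..max_specs-1].
   Returns the number of specs parsed, or -1 on a syntax error. */
int parse_filter_specs(const char *text, filter_spec *specs, size_t max_specs)
{
    assert(text != NULL && specs != NULL);
    size_t nspecs = 0;
    const char *p = text;
    for (;;) {
        if (nspecs == max_specs)
            return -1;
        filter_spec *fs = &specs[nspecs];
        fs->nparams = 0;
        int first = 1;
        for (;;) {
            char *end;
            if (!isdigit((unsigned char)*p))
                return -1;                    /* expected a number here */
            unsigned long v = strtoul(p, &end, 10);
            if (first) {
                fs->id = (unsigned int)v;     /* first constant is the id */
                first = 0;
            } else {
                if (fs->nparams == MAX_PARAMS)
                    return -1;
                fs->params[fs->nparams++] = (unsigned int)v;
            }
            p = end;
            if (*p != ',')
                break;
            p++;                              /* skip the comma */
        }
        nspecs++;
        if (*p == '\0')
            return (int)nspecs;
        if (*p != '|')
            return -1;
        p++;                                  /* next filter in the pipeline */
    }
}
```

For the string `307,9|4,32,32` this yields two entries: id 307 with one parameter, and id 4 with two parameters.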
332 
333 To simplify things, various kinds of constants can be specified rather than just simple unsigned integers.
334 The *ncgen* and *nccopy* programs will encode them properly using the rules specified in the section on <a href="#filters_paramcoding">parameter encode/decode</a>.
335 Since the original types are lost after encoding, *ncdump* will always show a simple list of unsigned integer constants.
336 
337 The currently supported constants are as follows.
338 <table>
339 <tr halign="center"><th>Example<th>Type<th>Format Tag<th>Notes
340 <tr><td>-17b<td>signed 8-bit byte<td>b|B<td>Truncated to 8 bits and sign extended to 32 bits
341 <tr><td>23ub<td>unsigned 8-bit byte<td>u|U b|B<td>Truncated to 8 bits and zero extended to 32 bits
342 <tr><td>-25S<td>signed 16-bit short<td>s|S<td>Truncated to 16 bits and sign extended to 32 bits
343 <tr><td>27US<td>unsigned 16-bit short<td>u|U s|S<td>Truncated to 16 bits and zero extended to 32 bits
344 <tr><td>-77<td>implicit signed 32-bit integer<td>Leading minus sign and no tag<td>
345 <tr><td>77<td>implicit unsigned 32-bit integer<td>No tag<td>
346 <tr><td>93U<td>explicit unsigned 32-bit integer<td>u|U<td>
347 <tr><td>789f<td>32-bit float<td>f|F<td>
348 <tr><td>12345678.12345678d<td>64-bit double<td>d|D<td>LE encoding
349 <tr><td>-9223372036854775807L<td>64-bit signed long long<td>l|L<td>LE encoding
350 <tr><td>18446744073709551615UL<td>64-bit unsigned long long<td>u|U l|L<td>LE encoding
351 </table>
352 Some things to note.
353 
354 1. In all cases, except for an untagged positive integer, the format tag is required and determines how the constant is converted to one or two unsigned int values.
355 2. For an untagged positive integer, the constant is treated as being of the smallest type into which it fits (i.e. 8, 16, 32, or 64 bits).
356 3. For signed byte and short, the value is sign extended to 32 bits and then treated as an unsigned int value, but maintaining the bit-pattern.
357 4. For double, and signed|unsigned long long, they are converted as specified in the section on <a href="#filters_paramcoding">parameter encode/decode</a>.
358 5. In order to support multiple filters, the argument to *\_Filter* may be a pipeline, separated using '|', that specifies a list of filter specs.
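
The byte and short rows of the table, together with note 3, can be sketched as follows. The function names are hypothetical, and the sketch covers only the 8-bit and 16-bit cases described here; the 64-bit and floating-point encodings follow the separate parameter encode/decode rules and are not shown.

```c
#include <assert.h>
#include <limits.h>
#include <stdint.h>

/* The encodings below assume unsigned int holds at least 32 bits. */
static_assert(sizeof(unsigned int) * CHAR_BIT >= 32,
              "unsigned int must be at least 32 bits");

/* Tag b|B: truncate to 8 bits, then sign-extend to 32 bits. */
unsigned int encode_byte(long value)
{
    uint32_t v = (uint32_t)value & 0xFFu;
    if (v & 0x80u)
        v |= 0xFFFFFF00u;   /* copy the sign bit into the upper 24 bits */
    return v;
}

/* Tags u|U b|B: truncate to 8 bits, then zero-extend to 32 bits. */
unsigned int encode_ubyte(long value)
{
    return (uint32_t)value & 0xFFu;
}

/* Tag s|S: truncate to 16 bits, then sign-extend to 32 bits. */
unsigned int encode_short(long value)
{
    uint32_t v = (uint32_t)value & 0xFFFFu;
    if (v & 0x8000u)
        v |= 0xFFFF0000u;   /* copy the sign bit into the upper 16 bits */
    return v;
}

/* Tags u|U s|S: truncate to 16 bits, then zero-extend to 32 bits. */
unsigned int encode_ushort(long value)
{
    return (uint32_t)value & 0xFFFFu;
}
```

Under these rules the constant `-17b` from the table becomes the unsigned value `0xFFFFFFEF`, and `-25S` becomes `0xFFFFFFE7`; the bit pattern of the sign-extended value is kept, as note 3 describes.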
359 
360 ## Dynamic Loading Process {#filters_Process}
361 
362 Each filter is assumed to be compiled into a separate dynamically loaded library.
363 For HDF5 conformant filters, these filter libraries are assumed to be in some specific location.
364 The details for writing such a filter are defined in the HDF5 documentation[1,2].
365 
366 ### Plugin directory {#filters_plugindir}
367 
368 The HDF5 loader searches for plugins in a number of directories.
369 The netcdf-c process for installing and locating plugins is described
370 in detail in the *pluginpath.md* document.
371 
372 ### Plugin Library Naming {#filters_Pluginlib}
373 
374 Given a plugin directory, HDF5 examines every file in that directory
375 that conforms to a specified name pattern as determined by the
376 platform on which the library is being executed.
377 
378 <table>
379 <tr halign="center"><th>Platform<th>Basename<th>Extension
380 <tr halign="left"><td>Linux<td>lib*<td>.so*
381 <tr halign="left"><td>OSX<td>lib*<td>.dylib*
382 <tr halign="left"><td>Cygwin<td>cyg*<td>.dll*
383 <tr halign="left"><td>Windows<td>*<td>.dll
384 </table>
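
The naming table can be approximated by a simple predicate. This is only a sketch of the patterns listed above, with hypothetical names; it treats a trailing `*` in the extension column as "the extension appears somewhere after the prefix", which is an assumption, and it is not how the HDF5 loader itself is implemented.

```c
#include <assert.h>
#include <stdbool.h>
#include <string.h>

typedef enum { PLAT_LINUX, PLAT_OSX, PLAT_CYGWIN, PLAT_WINDOWS } platform;

static bool starts_with(const char *s, const char *prefix)
{
    return strncmp(s, prefix, strlen(prefix)) == 0;
}

static bool ends_with(const char *s, const char *suffix)
{
    size_t ls = strlen(s), lx = strlen(suffix);
    return ls >= lx && strcmp(s + ls - lx, suffix) == 0;
}

/* Return true if filename matches the basename/extension pattern
   listed in the table for the given platform. */
bool is_plugin_candidate(platform p, const char *filename)
{
    assert(filename != NULL);
    switch (p) {
    case PLAT_LINUX:    /* lib* with .so* */
        return starts_with(filename, "lib") && strstr(filename + 3, ".so") != NULL;
    case PLAT_OSX:      /* lib* with .dylib* */
        return starts_with(filename, "lib") && strstr(filename + 3, ".dylib") != NULL;
    case PLAT_CYGWIN:   /* cyg* with .dll* */
        return starts_with(filename, "cyg") && strstr(filename + 3, ".dll") != NULL;
    case PLAT_WINDOWS:  /* any basename ending in .dll */
        return ends_with(filename, ".dll");
    }
    return false;
}
```

For example, a file named `libh5bzip2.so.0` would be considered on Linux, while `h5bzip2.so` would not because it lacks the `lib` prefix.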
385 
386 ### Plugin Verification {#filters_Pluginverify}
387 
388 For each dynamic library located using the previous patterns,
389 HDF5 attempts to load the library and attempts to obtain
390 information from it. Specifically, it looks for two functions
391 with the following signatures.
392 
393 1. *H5PL\_type\_t H5PLget\_plugin\_type(void)* &mdash; This function is expected to return the constant value *H5PL\_TYPE\_FILTER* to indicate that this is a filter library.
394 2. *const void\* H5PLget\_plugin\_info(void)* &mdash; This function returns a pointer to a table of type *H5Z\_class2\_t*.
395  This table contains the necessary information needed to utilize the filter both for reading and for writing.
396  In particular, it specifies the filter id implemented by the library and it must match that id specified for the variable in *nc\_def\_var\_filter* in order to be used.
397 
398 If plugin verification fails, then that plugin is ignored and the search continues for another, matching plugin.
399 
400 ## NCZarr Filter Support {#filters_nczarr}
401 
402 The inclusion of Zarr support in the netcdf-c library creates the need to provide a new representation consistent with the way that Zarr files store filter information.
403 For Zarr, filters are represented using the JSON notation.
404 Each filter is defined by a JSON dictionary, and each such filter dictionary
405 is guaranteed to have a key named "id" whose value is a unique string defining the filter algorithm: "lz4" or "bzip2", for example.
406 
407 The parameters of the filter are defined by additional &mdash; algorithm specific &mdash; keys in the filter dictionary.
408 One commonly used filter is "blosc", which has a JSON dictionary of this form.
409 ````
410  {
411  "id": "blosc",
412  "cname": "lz4",
413  "clevel": 5,
414  "shuffle": 1
415  }
416 ````
417 So it has three parameters:
418 
419 1. "cname" &mdash; the sub-algorithm used by the blosc compressor, LZ4 in this case.
420 2. "clevel" &mdash; the compression level, 5 in this case.
421 3. "shuffle" &mdash; is the input shuffled before compression, yes (1) in this case.
422 
423 NCZarr has four constraints that must be met.
424 
425 1. It must store its filter information in its metadata in the above JSON dictionary format.
426 2. It is required to re-use the HDF5 filter implementations.
427 This is to avoid having to rewrite the filter implementations.
428 This means that some mechanism is needed to translate between the HDF5 id+parameter model and the Zarr JSON dictionary model.
429 3. It must be possible to modify the set of visible parameters in response to environment information such as the type of the associated variable; this is required to mimic the corresponding HDF5 capability.
430 4. It must be possible to use filters even if HDF5 support is disabled.
431 
432 Note that the term "visible parameters" is used here to refer to the parameters provided by `nc_def_var_filter` or those stored in the dataset's metadata as provided by the JSON codec. The term "working parameters" refers to the parameters given to the compressor itself and derived from the visible parameters.
433 
434 The standard authority for defining Zarr filters is the list supported by the NumCodecs project [7].
435 Comparing the set of standard filters (aka codecs) defined by NumCodecs to the set of standard filters defined by HDF5 [3], it can be seen that the two sets overlap, but each has filters not defined by the other.
436 
437 Note also that it is undesirable that a specific set of filters/codecs be built into the NCZarr implementation.
438 Rather, it is preferable for there to be some extensible way to associate the JSON with the code implementing the codec. This mirrors the plugin model used by HDF5.
439 
440 The mechanism provided to address these issues is similar to that taken by HDF5.
441 A shared library must exist that has certain well-defined entry points that allow the NCZarr code to determine information about a Codec.
442 The shared library exports a well-known function name to access Codec information and relate it to a corresponding HDF5 implementation.
443 Note that the shared library may optionally be the same library containing the HDF5
444 filter processor.
445 
446 ### Processing Overview
447 
448 There are several paths by which the NCZarr filter API is invoked.
449 
450 1. The nc\_def\_var\_filter function is invoked on a variable or
451 (1a) the metadata for a variable is read when opening an existing variable that has associated Codecs.
452 2. The visible parameters are converted to a set of working parameters.
453 3. The filter is invoked with the working parameters.
454 4. The dataset is closed using the final set of visible parameters.
455 
456 #### Step 1: Invoking nc\_def\_var\_filter
457 
458 In this case, the filter plugin is located and the set of visible parameters (from nc\_def\_var\_filter) are provided.
459 
460 #### Step 1a: Reading metadata
461 
462 In this case, the codec is read from the metadata and must be converted to a visible set of HDF5 style parameters.
463 It is possible that this set of visible parameters differs from the set that was provided by nc\_def\_var\_filter.
464 If this is important, then the filter implementation is responsible for marking this difference using, for example, a different number of parameters or some differing value.
465 
466 #### Step 2: Convert visible parameters to working parameters
467 
468 Given environmental information such as the associated variable's base type, the visible parameters
469 are converted to a potentially larger set of working parameters. This step also provides the opportunity
470 to modify the visible parameters.
471 
472 #### Step 3: Invoking the filter
473 
474 As chunks are read or written, the filter is repeatedly invoked using the working parameters.
475 
476 #### Step 4: Closing the dataset
477 
478 The visible parameters from step 2 are stored in the dataset's metadata.
479 It is desirable to determine if the set of visible parameters changes.
480 If no change is detected, then re-writing the compressor metadata may be avoided.
481 
482 ### Client API
483 
484 Currently, there is no way to specify use of a filter via Codec through
485 the netcdf-c API. Rather, one must know the HDF5 id and parameters of
486 the filter of interest and use the functions *nc\_def\_var\_filter* and *nc\_inq\_var\_filter*.
487 Internally, the NCZarr code will use information about known Codecs to convert the HDF5 filter reference to the corresponding Codec.
488 This restriction also holds for the specification of filters in *ncgen* and *nccopy*.
489 This limitation may be lifted in the future.
490 
491 ### Special Codecs Attribute
492 
493 A new special attribute is defined called *\_Codecs* in parallel to the current *\_Filter* special attribute. Its value is a string containing the JSON representation of the Codecs associated with a given variable.
494 This can be especially useful when a file is unreadable because it uses a filter not available to the netcdf-c library.
495 That is, no implementation was found in, for example, the *HDF5\_PLUGIN\_PATH* directory.
496 In this case *ncdump -hs* will display the raw Codec information so that it may be possible to see what filter is missing.
497 
498 ### Pre-Processing Filter Libraries
499 
500 The process for using filters for NCZarr is defined to operate in several steps.
501 First, as with HDF5, all shared libraries in a specified directory
502 (e.g. *HDF5\_PLUGIN\_PATH*) are scanned.
503 They are interrogated to see what kind of library they implement, if any.
504 This interrogation operates by seeing if certain well-known (function) names are defined in this library.
505 
506 There will be two library types:
507 
508 1. HDF5 &mdash; exports a specific API: `H5PLget_plugin_type` and `H5PLget_plugin_info`.
509 2. Codec &mdash; exports a specific API: `NCZ_get_codec_info`
510 
511 Note that a given library can export either or both of these APIs.
512 This means that we can have three types of libraries:
513 
514 1. HDF5 only
515 2. Codec only
516 3. HDF5 + Codec
517 
518 Suppose that our *HDF5\_PLUGIN\_PATH* location has an HDF5-only library.
519 Then by adding a corresponding, separate, Codec-only library to that same location, it is possible to make an HDF5 library usable by NCZarr.
520 It is possible to do this without having to modify the HDF5-only library.
521 Over time, it is possible to merge an HDF5-only library with a Codec-only library to produce a single, combined library.
522 
523 ### Using Plugin Libraries
524 
525 The netcdf-c library processes all of the shared libraries by interrogating each one for the well-known APIs and recording the result.
526 Any library that exports neither of the well-known APIs is ignored.
527 
528 Internally, the netcdf-c library pairs up each HDF5 library API with a corresponding Codec API by invoking the relevant well-known functions
529 (see [Appendix E](#filters_appendixe)).
530 This results in this table for associated codec and hdf5 libraries.
531 <table>
532 <tr><th>HDF5 API<th>Codec API<th>Action
533 <tr><td>Not defined<td>Not defined<td>Ignore
534 <tr><td>Defined<td>Not defined<td>Ignore
535 <tr><td>Defined<td>Defined<td>NCZarr usable
536 </table>
537 
538 ### Filter Defaults Library
539 
540 As a special case, a shared library may be created to hold
541 defaults for a common set of filters.
542 Basically, there is a specially defined function that returns
543 a vector of codec APIs. These defaults are used only if
544 no other library provides codec information for a filter.
545 Currently, the defaults library provides codec defaults
546 for Shuffle, Fletcher32, Deflate (zlib), and SZIP.
547 
548 ### Using the Codec API
549 
550 Given a set of filters for which the HDF5 API and the Codec API
551 are defined, it is then possible to use the APIs to invoke the
552 filters and to process the meta-data in Codec JSON format.
553 
554 #### Writing an NCZarr Container
555 
556 When writing, the user program will invoke the NetCDF API function *nc\_def\_var\_filter*.
557 This function is currently defined to operate using HDF5-style id and parameters (unsigned ints).
558 The netcdf-c library examines its list of known filters to find one matching the HDF5 id provided by *nc\_def\_var\_filter*.
559 The set of parameters provided is stored internally.
560 Then during writing of data, the corresponding HDF5 filter is invoked to encode the data.
561 
562 When it comes time to write out the meta-data, the stored HDF5-style parameters are passed to a specific Codec function to obtain the corresponding JSON representation. Again see [Appendix E](#filters_appendixe).
563 This resulting JSON is then written in the NCZarr metadata.
564 
565 #### Reading an NCZarr Container
566 
567 When reading, the netcdf-c library will read the metadata for a given variable and will see that some set of filters is applied to this variable.
568 The metadata is encoded as Codec-style JSON.
569 
570 Given a JSON Codec, it is parsed to provide a JSON dictionary containing the string "id" and the set of parameters as various keys.
571 The netcdf-c library examines its list of known filters to find one matching the Codec "id" string.
572 The JSON is passed to a Codec function to obtain the corresponding HDF5-style *unsigned int* parameter vector.
573 These parameters are stored for later use.
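
The two conversions described for writing and reading can be sketched for an invented codec. Everything here is hypothetical: the codec id `"toy"`, its single `"level"` key, and both function names are made up for illustration and do not describe any real codec or the actual Codec API entry points.

```c
#include <assert.h>
#include <stddef.h>
#include <stdio.h>

/* Writing: HDF5-style parameter vector -> Codec JSON.
   Returns 0 on success, -1 on bad parameters or a too-small buffer. */
int toy_params_to_json(size_t nparams, const unsigned int *params,
                       char *buf, size_t buflen)
{
    assert(buf != NULL);
    if (nparams != 1 || params == NULL)
        return -1;                       /* this toy codec takes one parameter */
    int n = snprintf(buf, buflen, "{\"id\": \"toy\", \"level\": %u}", params[0]);
    return (n < 0 || (size_t)n >= buflen) ? -1 : 0;
}

/* Reading: Codec JSON -> HDF5-style parameter vector.
   Returns 0 on success, -1 if the JSON is not a "toy" codec. */
int toy_json_to_params(const char *json, size_t *nparamsp, unsigned int *params)
{
    assert(json != NULL && nparamsp != NULL && params != NULL);
    unsigned int level;
    /* Whitespace in the format matches any amount of whitespace. */
    if (sscanf(json, " { \"id\" : \"toy\" , \"level\" : %u", &level) != 1)
        return -1;
    params[0] = level;
    *nparamsp = 1;
    return 0;
}
```

A real codec would use a proper JSON parser rather than `sscanf`; the point of the sketch is only the round trip between the unsigned int vector and the JSON dictionary.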
574 
575 ### Supporting Filter Chains
576 
577 HDF5 supports *filter chains*. A filter chain is a sequence of filters where the output of one filter is provided as input to the next filter in the sequence.
578 When encoding, the filters are executed in the "forward" direction,
579 while when decoding the filters are executed in the "reverse" direction.
580 
581 In the Zarr meta-data, a filter chain is divided into two parts:
582 the "compressor" and the "filters". The former is a single JSON codec
583 as described above. The latter is an ordered JSON array of codecs.
584 So if compressor is something like
585  "compressor": {"id": "c"...}
586 and the filters array is like this:
587  "filters": [ {"id": "f1"...}, {"id": "f2"...}...{"id": "fn"...}]
588 then the filter chain is (f1,f2,...fn,c) with f1 being applied first and c being applied last when encoding. On decode, the filter chain is executed in the order (c,fn...f2,f1).
589 
590 So, an HDF5 filter chain is divided into two parts, where the last filter in the chain is assigned as the "compressor" and the remaining
591 filters are assigned as the "filters".
592 But independent of this, each codec, whether a compressor or a filter,
593 is stored in the JSON dictionary form described earlier.
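
The split and the two execution orders can be made concrete with a small sketch. The function names below are hypothetical and the codecs are represented only by their id strings; this models the rule described above and is not the library's implementation.

```c
#include <assert.h>
#include <stddef.h>

/* Split an encode-order chain of codec ids into the Zarr "filters" array
 * and the trailing "compressor". Returns the number of entries written to
 * filters[]; *compressor is set to NULL for an empty chain. */
size_t split_chain(const char *const chain[], size_t n,
                   const char *filters[], const char **compressor)
{
    size_t i;

    assert(compressor != NULL);
    if (n == 0) {
        *compressor = NULL;
        return 0;
    }
    for (i = 0; i + 1 < n; i++)
        filters[i] = chain[i];       /* everything but the last entry */
    *compressor = chain[n - 1];      /* the last filter in the chain */
    return n - 1;
}

/* Write the decode order into order[]: the compressor first, then the
 * filters in reverse. Returns the number of entries written. */
size_t decode_order(const char *const filters[], size_t nfilters,
                    const char *compressor, const char *order[])
{
    size_t k = 0, i;

    if (compressor != NULL)
        order[k++] = compressor;
    for (i = nfilters; i > 0; i--)
        order[k++] = filters[i - 1];
    return k;
}
```

For the chain (f1,f2,c), `split_chain` gives filters (f1,f2) and compressor c, and `decode_order` then produces (c,f2,f1).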
594 
595 ### Extensions
596 
597 The Codec style, using JSON, has the ability to provide very complex parameters that may be hard to encode as a vector of unsigned integers.
598 It might be desirable to consider exporting a JSON-based API out of the netcdf-c API to support user access to this complexity.
599 This would mean providing some alternate version of `nc_def_var_filter` that takes a string-valued argument instead of a vector of unsigned ints.
600 This extension is unlikely to be implemented until a compelling use-case is encountered.
601 
602 One bad side-effect of this is that we then may have two classes of plugins.
603 One class can be used by both HDF5 and NCZarr, while a second class is usable only with NCZarr.
604 
605 ### Using The NetCDF-C Plugins
606 
607 As part of its testing, the NetCDF build process creates a number of shared libraries in the *netcdf-c/plugins* (or sometimes *netcdf-c/plugins/.libs*) directory.
608 If you need a filter from that set, you may be able to set *HDF5\_PLUGIN\_PATH*
609 to point to that directory or you may be able to copy the shared libraries out of that directory to your own location.
610 
611 # Lossy One-Way Filters
612 
613 As of NetCDF version 4.8.2, the netcdf-c library supports
614 bit-grooming filters.
615 
616  Bit-grooming is a lossy compression algorithm that removes the
617  bloat due to false-precision, those bits and bytes beyond the
618  meaningful precision of the data. Bit Grooming is statistically
619  unbiased, applies to all floating point numbers, and is easy to
620  use. Bit-Grooming reduces data storage requirements by
621  25-80%. Unlike its best-known competitor Linear Packing, Bit
622  Grooming imposes no software overhead on users, and guarantees
623  its precision throughout the whole floating point range
624  [https://doi.org/10.5194/gmd-9-3199-2016].
625 
626 The generic term "quantize" is used to refer collectively to the various
627 precision-trimming algorithms. The key thing to note about quantization is that
628 it occurs at the point of writing of data only. Since its output is
629 legal data, it does not need to be "de-quantized" when the data is read.
630 Because of this, quantization is not part of the standard filter
631 mechanism and has a separate API.
632 
633 The API for bit-groom is currently as follows.
634 
635 ```
636 int nc_def_var_quantize(int ncid, int varid, int quantize_mode, int nsd);
637 int nc_inq_var_quantize(int ncid, int varid, int *quantize_modep, int *nsdp);
638 ```
639 The *quantize_mode* argument specifies the particular algorithm.
640 Currently, three are supported: NC\_QUANTIZE\_BITGROOM, NC\_QUANTIZE\_GRANULARBR,
641 and NC\_QUANTIZE\_BITROUND. In addition quantization can be disabled using
642 the value NC\_NOQUANTIZE.
643 
644 The input to ncgen or the output from ncdump supports special attributes
645 to indicate if quantization was applied to a given variable.
646 These attributes have the following form.
647 
648 ````
649 _QuantizeBitGroomNumberOfSignificantDigits = <NSD>
650 or
651 _QuantizeGranularBitRoundNumberOfSignificantDigits = <NSD>
652 or
653 _QuantizeBitRoundNumberOfSignificantBits = <NSB>
654 ````
655 The value NSD is the number of significant (decimal) digits to keep.
656 The value NSB is the number of bits to keep in the fraction part of an
657 IEEE754 floating-point number. Note that NSB of QuantizeBitRound is the same as
658 "number of explicit mantissa bits" (https://doi.org/10.5194/gmd-9-3199-2016) and same as
659 the number of "keep-bits" (https://doi.org/10.5194/gmd-14-377-2021), but is
660 one less than the number of significant binary figures:
661 `_QuantizeBitRoundNumberOfSignificantBits = 0` means one significant binary figure,
662 `_QuantizeBitRoundNumberOfSignificantBits = 1` means two significant binary figures etc.
663 
664 ## Distortions introduced by lossy filters
665 
666  Any lossy filter introduces distortions to data.
667  The lossy filters implemented in netcdf-c introduce a distortion
668  that can be quantified in terms of a _relative_ error. The magnitude of
669  distortion introduced to every single value V is guaranteed to be within
670  a certain fraction of V, expressed as 0.5 * V * 2**{-NSB}:
671  i.e. it is 0.5V for NSB=0, 0.25V for NSB=1, 0.125V for NSB=2 etc.
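
The bound above can be checked with a small experiment. The sketch below keeps NSB explicit mantissa bits of a 32-bit float and rounds to nearest by adding half of the last kept unit and masking off the rest. It is an illustration under the assumption of IEEE 754 binary32 floats, with a made-up function name, and is not a copy of the netcdf-c implementation.

```c
#include <assert.h>
#include <stdint.h>
#include <string.h>

/* Keep `nsb` explicit mantissa bits of a 32-bit float, rounding to nearest.
 * Assumes `float` is IEEE 754 binary32 (23 explicit mantissa bits). */
float bit_round_sketch(float value, int nsb)
{
    uint32_t bits, half, mask;
    int drop;

    assert(nsb >= 0 && nsb <= 23);
    drop = 23 - nsb;                       /* low mantissa bits to discard */
    memcpy(&bits, &value, sizeof bits);

    /* Leave NaN/Inf (all exponent bits set) and full precision untouched. */
    if (drop == 0 || (bits & 0x7F800000u) == 0x7F800000u)
        return value;

    half = (uint32_t)1 << (drop - 1);      /* half of the last kept unit */
    mask = ~(((uint32_t)1 << drop) - 1u);  /* clears the discarded bits */
    bits = (bits + half) & mask;           /* a carry may bump the exponent */
    memcpy(&value, &bits, sizeof value);
    return value;
}
```

With NSB=3, for instance, the value 3.14159 becomes 3.25; the difference of about 0.108 is below the bound 0.5 * 3.14159 * 2**{-3}, which is about 0.196.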
672 
673 
674  Two other methods use different definitions of _decimal precision_, though both
675  are guaranteed to reproduce NSD decimals when printed.
676  The margins for the relative error introduced by these methods are summarised in the table below.
677 
678  ```
679  NSD                1       2       3       4       5       6       7
680 
681  BitGroom
682  Error Margin       3.1e-2  3.9e-3  4.9e-4  3.1e-5  3.8e-6  4.7e-7  -
683 
684  GranularBitRound
685  Error Margin       1.4e-1  1.9e-2  2.2e-3  1.4e-4  1.8e-5  2.2e-6  -
686 
687  ```
688 
689 
690  If one defines decimal precision as in BitGroom, i.e. the introduced relative
691  error must not exceed half of the unit at the decimal place NSD in the
692  worst-case scenario, the following values of NSB should be used for BitRound:
693 
694  ```
695  NSD   1   2   3   4   5   6   7
696  NSB   3   6   9  13  16  19  23
697  ```
698 
699  The resulting application of BitRound is as fast as BitGroom, and is free from
700  artifacts in multipoint statistics introduced by BitGroom
701  (see https://doi.org/10.5194/gmd-14-377-2021).
702 
703 
704 # Debugging {#filters_debug}
705 
706 
707 Depending on the debugger one uses, debugging plugins can be very difficult.
708 It may be necessary to use the old printf approach for debugging the filter itself.
709 
710 One case worth mentioning is when there is a dataset that is using an unknown filter.
711 For this situation, you need to identify what filter(s) are used in the dataset.
712 This can be accomplished using this command.
713 
714  ncdump -s -h <dataset filename>
715 
716 Since ncdump is not being asked to access the data (the -h flag), it can obtain the filter information without failures.
717 Then it can print out the filter id and the parameters as well as the Codecs (via the -s flag).
718 
719 ## Test Cases {#filters_TestCase}
720 
721 Within the netcdf-c source tree, two directories contain test cases for testing dynamic filter operation.
722 
723 * *netcdf-c/nc\_test4* provides tests for testing HDF5 filters.
724 * *netcdf-c/nczarr\_test* provides tests for testing NCZarr filters.
725 
726 These tests are disabled if *--disable-shared* or if *--disable-filter-tests* is specified
727 or if *--disable-plugins* is specified.
728 
729 ### HDF5 Example {#filters_Example}
730 
731 A slightly simplified version of one of the HDF5 filter test cases is also available as an example within the netcdf-c source tree directory *netcdf-c/examples/C*.
732 The test is called *filter\_example.c* and it is executed as part of the *run\_examples4.sh* shell script.
733 The test case demonstrates dynamic filter writing and reading.
734 
735 The files *examples/C/hdf5plugins/Makefile.am* and *examples/C/hdf5plugins/CMakeLists.txt* demonstrate how to build the HDF5 plugin for bzip2.
736 
737 ## Notes
738 
739 ### Order of Invocation for Multiple Filters
740 
741 When multiple filters are defined on a variable, the order of application, when writing data to the file, is the same as the order in which *nc\_def\_var\_filter* is called.
742 When reading a file the order of application is of necessity the reverse.
743 
744 There are some special cases.
745 
746 1. The fletcher32 filter is always applied first, if enabled.
747 2. If *nc\_def\_var\_filter* or *nc\_def\_var\_deflate* or *nc\_def\_var\_szip* is called multiple times with the same filter id, but possibly with different sets of parameters, then the position of that filter in the sequence of applications does not change.
748  However the last set of parameters specified is used when actually writing the dataset.
749 3. Deflate and shuffle &mdash; these two are inextricably linked in the current API, but have quite different semantics.
750  If you call *nc\_def\_var\_deflate* multiple times, then the previous rule applies with respect to deflate.
751  However, the shuffle filter, if enabled, is *always* applied before applying any other filters, except fletcher32.
752 4. Once a filter is defined for a variable, it cannot be removed nor can its position in the filter order be changed.
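
The rules above can be modelled in a few lines of C. The sketch below is only a model of the rules as stated in this section, not the library's code; the type and function names are invented, and each filter is reduced to an id plus a single parameter. The ids 1, 2, and 3 are the HDF5 ids of deflate, shuffle, and fletcher32.

```c
#include <assert.h>
#include <stddef.h>

#define MAX_FILTERS   16
#define ID_DEFLATE    1u
#define ID_SHUFFLE    2u
#define ID_FLETCHER32 3u

typedef struct { unsigned id; unsigned param; } FilterDef;
typedef struct { FilterDef defs[MAX_FILTERS]; size_t n; } Chain;

/* Rule 2: redefining an id keeps its position but replaces its parameter.
 * Rule 4: there is deliberately no way to remove or reorder an entry. */
void chain_define(Chain *c, unsigned id, unsigned param)
{
    size_t i;

    for (i = 0; i < c->n; i++) {
        if (c->defs[i].id == id) {
            c->defs[i].param = param;
            return;
        }
    }
    assert(c->n < MAX_FILTERS);
    c->defs[c->n].id = id;
    c->defs[c->n].param = param;
    c->n++;
}

/* Rules 1 and 3: fletcher32 first, then shuffle, then all remaining
 * filters in definition order. Returns the number of entries in out[]. */
size_t chain_write_order(const Chain *c, FilterDef out[])
{
    size_t i, k = 0;

    for (i = 0; i < c->n; i++)
        if (c->defs[i].id == ID_FLETCHER32) out[k++] = c->defs[i];
    for (i = 0; i < c->n; i++)
        if (c->defs[i].id == ID_SHUFFLE) out[k++] = c->defs[i];
    for (i = 0; i < c->n; i++)
        if (c->defs[i].id != ID_FLETCHER32 && c->defs[i].id != ID_SHUFFLE)
            out[k++] = c->defs[i];
    return k;
}
```

Defining bzip2 (id 307) with level 9, then shuffle, deflate, and fletcher32, and finally bzip2 again with level 5 gives the write order fletcher32, shuffle, bzip2 (level 5), deflate. The read order is the reverse.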
753 
754 ### Memory Allocation Issues
755 
756 Starting with HDF5 version 1.10.*, the plugin code MUST be careful when using the standard *malloc()*, *realloc()*, and *free()* functions.
757 
758 In the event that the code is allocating, reallocating, or
759 freeing memory that either came from or will be exported to the
760 calling HDF5 library, then one MUST use the corresponding HDF5
761 functions *H5allocate\_memory()*, *H5resize\_memory()*,
762 *H5free\_memory()* [5] to avoid memory failures.
763 
764 Additionally, if your filter code leaks memory, then the HDF5 library generates a failure something like this.
765 
766  H5MM.c:232: H5MM_final_sanity_check: Assertion `0 == H5MM_curr_alloc_bytes_s' failed.
767 
768 One can look at the code in plugins/H5Zbzip2.c and H5Zmisc.c as illustrations.
769 
770 ### SZIP Issues
771 
772 The current szip plugin code in the HDF5 library has some behaviors that can catch the unwary.
773 These are handled internally to (mostly) hide them so that they should not affect users.
774 Specifically, this filter may do two things.
775 
776 1. Add extra parameters to the filter parameters: going from the two parameters provided by the user to four parameters for internal use.
777  It turns out that the two parameters provided when calling nc\_def\_var\_filter correspond to the first two parameters of the four parameters returned by nc\_inq\_var\_filter.
778 2. Change the values of some parameters: the value of the *options\_mask* argument is known to add additional flag bits, and the *pixels\_per\_block* parameter may be modified.
779 
780 The reason for these changes has to do with the fact that the szip API provided by the underlying H5Pset\_szip function is actually a subset of the capabilities of the real szip implementation.
781 Presumably this is for historical reasons.
782 
783 In any case, if the caller uses the *nc\_inq\_var\_szip* or the *nc\_inq\_var\_filter* functions, then the parameter values returned may differ from those originally specified.
784 
785 It should also be noted that the HDF5 szip filter wrapper that
786 is invoked depends on the configuration of the netcdf-c library.
787 If the HDF5 installation supports szip, then the NCZarr szip
788 will use the HDF5 wrapper. If HDF5 does not support szip, or HDF5
789 is not enabled, then the plugins directory will contain a local
790 HDF5 szip wrapper to be used by NCZarr. This can be confusing,
791 but is generally transparent to the user since the plugins
792 HDF5 szip wrapper was taken from the HDF5 code base.
793 
794 ### Supported Systems
795 
796 The current matrix of operating systems and build systems known to work is as follows.
797 <table>
798 <tr><th>Build System<th>Supported OS
799 <tr><td>Automake<td>Linux, Cygwin, OSX
800 <tr><td>CMake<td>Linux, Cygwin, OSX, Visual Studio
801 </table>
802 
803 ### Generic Plugin Build
804 If you do not want to use Automake or CMake, the following has been known to work.
805 
806  gcc -g -O0 -shared -o libbzip2.so <plugin source files> -L${HDF5LIBDIR} -lhdf5_hl -lhdf5 -L${ZLIBDIR} -lz
807 
808 ## References {#filters_References}
809 
810 1. [https://support.hdfgroup.org/HDF5/doc/Advanced/DynamicallyLoadedFilters/HDF5DynamicallyLoadedFilters.pdf]()
811 2. [https://support.hdfgroup.org/HDF5/doc/TechNotes/TechNote-HDF5-CompressionTroubleshooting.pdf]()
812 3. [https://portal.hdfgroup.org/display/support/Registered+Filter+Plugins]()
813 4. [https://support.hdfgroup.org/services/contributions.html#filters]()
814 5. [https://support.hdfgroup.org/HDF5/doc/RM/RM\_H5.html]()
815 6. [https://confluence.hdfgroup.org/display/HDF5/Filters]()
817 7. [https://numcodecs.readthedocs.io/en/stable/]()
818 8. [https://github.com/ccr/ccr]()
819 9. [https://escholarship.org/uc/item/7xd1739k]()
820 
821 ## Appendix A. HDF5 Parameter Encode/Decode {#filters_appendixa}
822 
823 The filter id for an HDF5 format filter is an unsigned integer.
824 Further, the parameters passed to an HDF5 format filter are encoded internally as a vector of 32-bit unsigned integers.
825 It may be that the parameters required by a filter can naturally be encoded as unsigned integers.
826 The bzip2 compression filter, for example, expects a single integer value from zero through nine.
827 This encodes naturally as a single unsigned integer.
828 
829 Note that signed integers and single-precision (32-bit) float values also can easily be represented as 32 bit unsigned integers by proper casting to an unsigned integer so that the bit pattern is preserved.
830 Simple signed integer values of type short or char can also be mapped to an unsigned integer by truncating to 16 or 8 bits respectively and then sign extending. Similarly, unsigned 8 and 16 bit
831 values can be used with zero extensions.
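
The casts described above might look as follows. This is a minimal sketch with invented helper names, assuming that both `float` and `unsigned int` are 32 bits wide, which holds on the common platforms but is not guaranteed by the C standard.

```c
#include <assert.h>
#include <string.h>

/* Reinterpret a 32-bit float as an unsigned int, preserving the bit
 * pattern. memcpy avoids the aliasing problems of pointer casts. */
unsigned int param_from_float(float f)
{
    unsigned int u;

    assert(sizeof(u) == sizeof(f));  /* assumes 32-bit float and unsigned */
    memcpy(&u, &f, sizeof(u));
    return u;
}

/* Inverse conversion, as a filter would apply it to a received parameter. */
float float_from_param(unsigned int u)
{
    float f;

    assert(sizeof(u) == sizeof(f));
    memcpy(&f, &u, sizeof(f));
    return f;
}

/* Signed 16-bit value: sign-extend to 32 bits. */
unsigned int param_from_short(short s)
{
    return (unsigned int)(int)s;
}

/* Unsigned 8-bit value: zero-extend to 32 bits. */
unsigned int param_from_uchar(unsigned char c)
{
    return (unsigned int)c;
}
```

For example, `param_from_float(1.0f)` yields 0x3F800000 and `param_from_short(-1)` yields 0xFFFFFFFF.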
832 
833 Machine byte order (aka endian-ness) is an issue for passing some kinds of parameters.
834 You might define the parameters when compressing on a little endian machine, but later do the decompression on a big endian machine.
835 
836 When using HDF5 format filters, byte order is not an issue for 32-bit values because HDF5 takes care of converting them between the local machine byte order and network byte order.
837 
838 Parameters whose size is larger than 32-bits present a byte order problem.
839 This specifically includes double precision floats and (signed or unsigned) 64-bit integers.
840 For these cases, the machine byte order issue must be handled, in part, by the compression code.
841 This is because HDF5 will treat, for example, an unsigned long long as two 32-bit unsigned integers and will convert each to network order separately.
842 This means that on a machine whose byte order is different than the machine in which the parameters were initially created, the two integers will be separately
843 endian converted.
844 But this will be incorrect for 64-bit values.
845 
846 So, we have this situation (for HDF5 only):
847 
848 1. The 8 bytes start as native machine order for the machine doing the call to *nc\_def\_var\_filter*.
849 2. The caller divides the 8 bytes into 2 four byte pieces and passes them to *nc\_def\_var\_filter*.
850 3. HDF5 takes each four byte piece and ensures that each piece is in network (big) endian order.
851 4. When the filter is called, the two pieces are returned in the same order but with the bytes in each piece consistent with the native machine order for the machine executing the filter.
852 
853 ### Encoding Algorithms for HDF5
854 
855 In order to properly extract the correct 8-byte value, we need to ensure that the values stored in the HDF5 file have a known format independent of the native format of the creating machine.
856 
857 The idea is to do sufficient manipulation so that HDF5 will store the 8-byte value as a little endian value divided into two 4-byte integers.
858 Note that little-endian is used as the standard because it is the most common machine format.
859 When read, the filter code needs to be aware of this convention and do the appropriate conversions.
860 
861 This leads to the following set of rules.
862 
863 #### Encoding
864 
865 1. Encode on little endian (LE) machine: no special action is required.
866  The 8-byte value is passed to HDF5 as two 4-byte integers.
867  HDF5 byte swaps each integer and stores it in the file.
868 2. Encode on a big endian (BE) machine: several steps are required:
869 
870  1. Do an 8-byte byte swap to convert the original value to little-endian format.
871  2. Since the encoding machine is BE, HDF5 will just store the value.
872  So it is necessary to simulate little endian encoding by byte-swapping each 4-byte integer separately.
873  3. This doubly swapped pair of integers is then passed to HDF5 and is stored unchanged.
874 
875 #### Decoding
876 
877 1. Decode on LE machine: no special action is required.
878  HDF5 will get the two 4-bytes values from the file and byte-swap each separately.
879  The concatenation of those two integers will be the expected LE value.
880 2. Decode on a big endian (BE) machine: the inverse of the encode case must be implemented.
881 
882  1. HDF5 sends the two 4-byte values to the filter.
883  2. The filter must then byte-swap each 4-byte value independently.
884  3. The filter then must concatenate the two 4-byte values into a single 8-byte value.
885  Because of the encoding rules, this 8-byte value will be in LE format.
886  4. The filter must finally do an 8-byte byte-swap on that 8-byte value to convert it to desired BE format.
887 
888 To support these rules, some utility programs exist and are discussed in [Appendix B](#filters_appendixb).
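
The encoding and decoding rules can be sketched in plain C as shown here. The function names are invented for this illustration and the code is a reading of the rules above, not the library's own utility. The big-endian branch is kept in a separate function so that it can be exercised on any host.

```c
#include <assert.h>
#include <string.h>

/* Reverse the order of n bytes in place. */
static void swap_bytes(unsigned char *p, size_t n)
{
    size_t i;

    for (i = 0; i < n / 2; i++) {
        unsigned char t = p[i];
        p[i] = p[n - 1 - i];
        p[n - 1 - i] = t;
    }
}

/* Returns 1 on a big-endian host and 0 on a little-endian host. */
static int host_is_big_endian(void)
{
    const unsigned int one = 1;
    unsigned char first;

    memcpy(&first, &one, 1);
    return first == 0;
}

/* The big-endian rules, without the host check. */
void fix8_big_endian(unsigned char *mem8, int decode)
{
    assert(mem8 != NULL);
    if (decode) {
        swap_bytes(mem8, 4);         /* byte-swap each 4-byte piece */
        swap_bytes(mem8 + 4, 4);
        swap_bytes(mem8, 8);         /* LE value -> BE value */
    } else {
        swap_bytes(mem8, 8);         /* BE value -> LE value */
        swap_bytes(mem8, 4);         /* pre-swap each 4-byte piece */
        swap_bytes(mem8 + 4, 4);
    }
}

/* Apply the rules for the current host; little-endian needs no action. */
void fix8_sketch(unsigned char *mem8, int decode)
{
    if (host_is_big_endian())
        fix8_big_endian(mem8, decode);
}
```

Note that on a big-endian host the net effect of encoding is to exchange the two 4-byte halves: the bytes 0,1,2,3,4,5,6,7 become 4,5,6,7,0,1,2,3, and decoding restores the original order.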
889 
890 ## Appendix B. Support Utilities {#filters_appendixb}
891 
892 Several functions are exported from the netcdf-c library for use by client programs and by filter implementations.
893 They are defined in the header file *netcdf\_aux.h*.
894 The h5 tag indicates that they assume that the result of the parse is a set of unsigned integers &mdash; the format used by HDF5.
895 
896 1. *int ncaux\_h5filterspec\_parse(const char* txt, unsigned int* idp, size\_t* nparamsp, unsigned int** paramsp);*
897  * txt contains the text of a sequence of comma separated constants
898  * idp will contain the first constant &mdash; the filter id
899  * nparamsp will contain the number of params
900  * paramsp will contain a vector of params &mdash; the caller must free
901 This function can parse single filter spec strings as defined in the section on [Filter Specification Syntax](#filters_syntax).
902 2. *int ncaux\_h5filterspec\_parselist(const char* txt, int* formatp, size\_t* nspecsp, struct NC\_H5\_Filterspec*** vectorp);*
903  * txt contains the text of a sequence '|' separated filter specs.
904  * formatp currently always returns 0.
905  * nspecsp will return the number of filter specifications.
906  * vectorp will return a pointer to a vector of pointers to filter specification instances &mdash; the caller must free.
907 This function parses a sequence of filter specifications each separated by a '|' character.
908 The text between '|' separators must be parsable by *ncaux\_h5filterspec\_parse*.
909 3. *void ncaux\_h5filterspec\_free(struct NC\_H5\_Filterspec* f);*
910  * f is a pointer to an instance of *struct NC\_H5\_Filterspec*.
911  Typically this was returned as an element of the vector returned
912  by *ncaux\_h5filterspec\_parselist*.
913 This reclaims the parameters of the filter spec object as well as the object itself.
914 4. *int ncaux\_h5filterspec\_fix8(unsigned char* mem8, int decode);*
915  * mem8 is a pointer to the 8-byte value to fix.
916  * decode is 1 if the function should apply the 8-byte decoding algorithm
917  else apply the encoding algorithm.
918 This function implements the 8-byte conversion algorithms for HDF5.
919 Before calling *nc\_def\_var\_filter* (unless *NC\_parsefilterspec* was used), the client must call this function with the decode argument set to 0.
920 Inside the filter code, this function should be called with the decode argument set to 1.
921 
922 Examples of the use of these functions can be seen in the test program *nc\_test4/tst\_filterparser.c*.
923 
924 Some of the above functions use a C struct defined in *netcdf\_filter.h*.
925 The definition of that struct is as follows.
926 
927 ````
928 typedef struct NC_H5_Filterspec {
929  unsigned int filterid; /* ID for arbitrary filter. */
930  size_t nparams; /* nparams for arbitrary filter. */
931  unsigned int* params; /* Params for arbitrary filter. */
932 } NC_H5_Filterspec;
933 ````
934 This struct in effect encapsulates all of the information about an HDF5 formatted filter &mdash; the id, the number of parameters, and the parameters themselves.
935 
936 ## Appendix C. Build Flags for Detecting the Filter Mechanism {#filters_appendixc}
937 
938 The include file *netcdf\_meta.h* contains the following definition.
939 ````
940  #define NC_HAS_MULTIFILTERS 1
941 ````
942 This, in conjunction with the error code *NC\_ENOFILTER* in *netcdf.h*, can be used to see what filter mechanism is in place as described in the section on [incompatibilities](#filters_compatibility).
943 
944 1. !defined(NC\_ENOFILTER) && !defined(NC\_HAS\_MULTIFILTERS) &mdash; indicates that the old pre-4.7.4 mechanism is in place.
945  It does not support multiple filters.
946 2. defined(NC\_ENOFILTER) && !defined(NC\_HAS\_MULTIFILTERS) &mdash; indicates that the 4.7.4 mechanism is in place.
947  It does support multiple filters, but the error return codes for *nc\_inq\_var\_filter* are different and the filter spec parser functions are in a different location with different names.
948 3. defined(NC\_ENOFILTER) && defined(NC\_HAS\_MULTIFILTERS) &mdash; indicates that the multiple filters are supported, and that *nc\_inq\_var\_filter* returns a filterid of zero to indicate that a variable has no filters.
949  Also, the filter spec parsers have the names and signatures described in this document and are defined in *netcdf\_aux.h*.
950 
951 ## Appendix D. BNF for Specifying Filters in Utilities {#filters_appendixd}
952 
953 ````
954 speclist: spec
955  | speclist '|' spec
956  ;
957 spec: filterid
958  | filterid ',' parameterlist
959  ;
960 filterid: unsigned32
961  ;
962 parameterlist: parameter
963  | parameterlist ',' parameter
964  ;
965 parameter: unsigned32
966 
967 where
968 unsigned32: <32 bit unsigned integer>
969 ````
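
To make the grammar concrete, here is a small recursive-descent sketch that accepts exactly the productions listed above: decimal unsigned 32-bit constants separated by ',' within a spec and by '|' between specs. The names `DemoSpec` and `parse_speclist` are invented for this example; it is not the parser used by the netCDF utilities.

```c
#include <assert.h>
#include <errno.h>
#include <stdlib.h>

#define MAX_PARAMS 16

typedef struct {
    unsigned int filterid;
    size_t nparams;
    unsigned int params[MAX_PARAMS];
} DemoSpec;

/* unsigned32: decimal digits only. Advances *pp past the number. */
static int parse_u32(const char **pp, unsigned int *out)
{
    char *end;
    unsigned long v;

    if (**pp < '0' || **pp > '9')
        return -1;                   /* no sign, no leading blanks */
    errno = 0;
    v = strtoul(*pp, &end, 10);
    if (errno == ERANGE || v > 0xFFFFFFFFUL)
        return -1;
    *out = (unsigned int)v;
    *pp = end;
    return 0;
}

/* spec: filterid | filterid ',' parameterlist */
static int parse_spec(const char **pp, DemoSpec *spec)
{
    spec->nparams = 0;
    if (parse_u32(pp, &spec->filterid) != 0)
        return -1;
    while (**pp == ',') {
        (*pp)++;
        if (spec->nparams >= MAX_PARAMS)
            return -1;
        if (parse_u32(pp, &spec->params[spec->nparams]) != 0)
            return -1;
        spec->nparams++;
    }
    return 0;
}

/* speclist: spec | speclist '|' spec
 * Returns the number of specs parsed, or -1 on a syntax error. */
int parse_speclist(const char *txt, DemoSpec specs[], size_t maxspecs)
{
    size_t n = 0;
    const char *p = txt;

    assert(txt != NULL && specs != NULL);
    for (;;) {
        if (n >= maxspecs)
            return -1;
        if (parse_spec(&p, &specs[n]) != 0)
            return -1;
        n++;
        if (*p == '\0')
            return (int)n;
        if (*p != '|')
            return -1;
        p++;
    }
}
```

Given the text `307,9|32015,3`, the sketch returns two specs: filter 307 with the parameter 9 and filter 32015 with the parameter 3.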
970 
971 ## Appendix E. Codec API {#filters_appendixe}
972 
973 The Codec API mirrors the HDF5 API closely. It has one well-known function that can be invoked to obtain information about the Codec as well as pointers to special functions to perform conversions.
974 
975 ### The Codec Plugin API
976 
977 #### NCZ\_get\_codec\_info
978 
979 This function returns a pointer to a C struct that provides detailed information about the codec plugin.
980 
981 ##### Signature
982 ````
983  void* NCZ_get_codec_info(void);
984 ````
985 The value returned is actually of type *struct NCZ\_codec\_t*,
986 but is of type *void\** to allow for extensions.
987 
988 #### NCZ\_codec\_t
989 ````
990 typedef struct NCZ_codec_t {
991  int version; /* Version number of the struct */
992  int sort; /* Format of remainder of the struct;
993  Currently always NCZ_CODEC_HDF5 */
994  const char* codecid; /* The name/id of the codec */
995  unsigned int hdf5id; /* corresponding hdf5 id */
996  void (*NCZ_codec_initialize)(void);
997  void (*NCZ_codec_finalize)(void);
998  int (*NCZ_codec_to_hdf5)(const char* codec, int* nparamsp, unsigned** paramsp);
999  int (*NCZ_hdf5_to_codec)(size_t nparams, const unsigned* params, char** codecp);
1000  int (*NCZ_modify_parameters)(int ncid, int varid, size_t* vnparamsp, unsigned** vparamsp, size_t* nparamsp, unsigned** paramsp);
1001 } NCZ_codec_t;
1002 ````
1003 
1004 The semantics of the non-function fields is as follows:
1005 
1006 1. *version* &mdash; Version number of the struct.
1007 2. *sort* &mdash; Format of remainder of the struct; currently always NCZ\_CODEC\_HDF5.
1008 3. *codecid* &mdash; The name/id of the codec.
1009 4. *hdf5id* &mdash; The corresponding hdf5 id.
1010 
1011 #### NCZ\_codec\_to\_hdf5
1012 
1013 Given a JSON Codec representation, it will return a corresponding vector of unsigned integers representing the
1014 visible parameters.
1015 
1016 ##### Signature
1017 ````
1018  int NCZ_codec_to_hdf5(const char* codec, int* nparamsp, unsigned** paramsp);
1019 ````
1020 ##### Arguments
1021 1. codec &mdash; (in) ptr to JSON string representing the codec.
1022 2. nparamsp &mdash; (out) store the length of the converted HDF5 unsigned vector
1023 3. paramsp &mdash; (out) store a pointer to the converted HDF5 unsigned vector; caller must free the returned vector. Note the double indirection.
1024 
1025 Return Value: a netcdf-c error code.
1026 
1027 #### NCZ\_hdf5\_to\_codec
1028 
1029 Given an HDF5 visible parameters vector of unsigned integers and its length,
1030 return a corresponding JSON codec representation of those visible parameters.
1031 
1032 ##### Signature
1033 ````
1034  int NCZ_hdf5_to_codec(int ncid, int varid, size_t nparams, const unsigned* params, char** codecp);
1035 ````
1036 ##### Arguments
1037 
1038 1. ncid &mdash; (in) the id of the group containing the variable.
1039 2. varid &mdash; (in) the id of the variable to which the filter is attached.
1040 3. nparams &mdash; (in) the length of the HDF5 visible parameters vector
1041 4. params &mdash; (in) pointer to the HDF5 visible parameters vector.
1042 5. codecp &mdash; (out) store the string representation of the codec; caller must free.
1043 
1044 Return Value: a netcdf-c error code.
1045 
1046 #### NCZ\_modify\_parameters
1047 
1048 Extract environment information from the (ncid,varid) and use it to convert a set of visible parameters
1049 to a set of working parameters; also provide option to modify visible parameters.
1050 
1051 ##### Signature
1052 ````
1053  int NCZ_modify_parameters(int ncid, int varid, size_t* vnparamsp, unsigned** vparamsp, size_t* wnparamsp, unsigned** wparamsp);
1054 ````
1055 ##### Arguments
1056 
1057 1. ncid &mdash; (in) group id containing the variable.
1058 2. varid &mdash; (in) the id of the variable to which this filter is being attached.
1059 3. vnparamsp &mdash; (in/out) the count of visible parameters
1060 4. vparamsp &mdash; (in/out) the set of visible parameters
1061 5. wnparamsp &mdash; (out) the count of working parameters
1062 6. wparamsp &mdash; (out) the set of working parameters
1063 
1064 Return Value: a netcdf-c error code.
1065 
1066 #### NCZ\_codec\_initialize
1067 
1068 Some compressors may require library initialization.
1069 This function is called as soon as a shared library is loaded and matched with an HDF5 filter.
1070 
1071 ##### Signature
1072 ````
1073  int NCZ_codec_initialize(void);
1074 ````
1075 Return Value: a netcdf-c error code.
1076 
1077 #### NCZ\_codec\_finalize
1078 
1079 Some compressors (like blosc) require invoking a finalize function in order to avoid memory loss.
1080 This function is called during a call to *nc\_finalize* to do any finalization.
1081 If the client code does not invoke *nc\_finalize* then memory checkers may complain about lost memory.
1082 
1083 ##### Signature
1084 ````
1085  int NCZ_codec_finalize(void);
1086 ````
1087 Return Value: a netcdf-c error code.
1088 
1089 ### Multi-Codec API
1090 
1091 As an aid to clients, it is convenient if a single shared library can provide multiple *NCZ\_codec\_t* instances at one time.
1092 This API is not intended to be used by plugin developers.
1093 A shared library must only export this function.
1094 
1095 #### NCZ\_codec\_info\_defaults
1096 
1097 Return a NULL terminated vector of pointers to instances of *NCZ\_codec\_t*.
1098 
1099 ##### Signature
1100 ````
1101  void* NCZ_codec_info_defaults(void);
1102 ````
1103 The value returned is actually of type *NCZ\_codec\_t\*\**,
1104 but is of type *void\** to allow for extensions.
1105 The list of returned items is used to try to provide defaults
1106 for any HDF5 filters that have no corresponding Codec.
1107 This is for internal use only.
1108 
1109 ## Appendix F. Standard Filters {#filters_appendixf}
1110 
1111 Support for a select set of standard filters is built into the NetCDF API.
1112 Generally, they are accessed using the following generic API, where XXXX is
1113 the filter name. As a rule, the names are those used in the HDF5 filter ID naming authority [4] or the NumCodecs naming authority [7].
1114 ````
1115 int nc_def_var_XXXX(int ncid, int varid, unsigned filterid, size_t nparams, unsigned* params);
1116 int nc_inq_var_XXXX(int ncid, int varid, int* hasfilter, size_t* nparamsp, unsigned* params);
1117 ````
1118 The first function inserts the specified filter into the filter chain for a given variable.
1119 The second function queries the given variable to see if the specified filter
1120 is in the filter chain for that variable. The *hasfilter* argument is set
1121 to one if the filter is in the chain and zero otherwise.
1122 As is usual with the netcdf API, one is expected to call this function twice:
1123 the first time to obtain the parameter count via *nparamsp*, and the second time to get the parameters in the client-allocated memory argument *params*.
1124 Any of these arguments can be NULL, in which case no value is returned.
1125 
1126 Note that NetCDF inherits four filters from HDF5, namely shuffle, fletcher32, deflate (zlib), and szip. The APIs for these do not conform to the above API.
1127 So aside from those four, the current set of standard filters is as follows.
1128 <table>
1129 <tr><th>Filter Name<th>Filter ID<th>Reference
1130 <tr><td>zstandard<td>32015<td>https://facebook.github.io/zstd/
1131 <tr><td>bzip2<td>307<td>https://sourceware.org/bzip2/
1132 </table>
1133 
1134 It is important to note that in order to use each standard filter, several additional libraries must be installed.
1135 Consider the zstandard compressor, which is one of the supported standard filters.
1136 When installing the netcdf library, the following other libraries must be installed.
1137 
1138 1. *libzstd.so* | *zstd.dll* | *libzstd.dylib* -- The actual zstandard compressor library; typically installed by using your platform specific package manager.
1139 2. The HDF5 wrapper for *libzstd.so* -- There are several options for obtaining this (see [Appendix G](#filters_appendixg).)
1140 3. (Optional) The Zarr wrapper for *libzstd.so* -- you need this if you intend to read/write Zarr datasets that were compressed using zstandard; again see [Appendix G](#filters_appendixg).
1141 
1142 ## Appendix G. Finding Filter Implementations {#filters_appendixg}
1143 
1144 A major problem for filter users is finding an implementation of an HDF5 filter wrapper and (optionally)
1145 its corresponding NCZarr wrapper. There are several ways to do this.
1146 
1147 * **--with-plugin-dir** &mdash; An option to *./configure* that will install the necessary wrappers.
1148  See [Appendix H](#filters_appendixh).
1149 
1150 * **HDF5 Assigned Filter Identifiers Repository [3]** &mdash;
1151 HDF5 maintains a page of standard filter identifiers along with
1152 additional contact information. This often includes a pointer
1153 to source code. This will provide only HDF5 wrappers and not NCZarr wrappers.
1154 
1155 * **Community Codec Repository** &mdash;
1156 The Community Codec Repository (CCR) project [8] provides
1157 implementations, including HDF5 wrappers, for a number of filters.
1159 You can install this library to get access to these supported filters.
1160 It does not currently include the required NCZarr Codec API,
1161 so its filters are only usable with netcdf-4. This will change in the future.
1162 
1163 ## Appendix H. Auto-Install of Filter Wrappers {#filters_appendixh}
1164 
1165 As part of the overall build process, a number of filter wrappers are built as shared libraries in the "plugins" directory.
1166 These wrappers can be installed as part of the overall netcdf-c installation process.
1167 WARNING: the installer still needs to make sure that the actual filter/compression libraries are installed, for example libzstd and/or libblosc.
1168 See the document *pluginpath.md* for details on the installation process.
1169 If NCZarr is enabled, then in addition to wrappers for the standard filters,
1170 additional libraries will be installed to support NCZarr access to filters.
1171 Currently, this list includes the following:
1172 
1173 * shuffle &mdash; shuffle filter
1174 * fletcher32 &mdash; fletcher32 checksum
1175 * deflate &mdash; deflate compression
1176 * (optional) szip &mdash; szip compression, if libsz is available
1177 * bzip2 &mdash; an HDF5 filter for bzip2 compression
1178 * lib__nczh5filters.so &mdash; provides NCZarr support for shuffle, fletcher32, deflate, and (optionally) szip.
1179 * lib__nczstdfilters.so &mdash; provides NCZarr support for bzip2, (optionally) zstandard, and (optionally) blosc.
1180 
1181 The shuffle, fletcher32, and deflate filters in this case will
1182 be ignored by HDF5 and only used by the NCZarr code. But in
1183 order to use them, the NCZarr code needs the additional Codec capabilities
1184 provided by the *lib__nczh5filters.so* shared library. Note also that
1185 if you disable HDF5 support, but leave NCZarr support enabled,
1186 then all of the above filters should continue to work.
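
The relationship between filters and the two NCZarr codec libraries listed above can be summarized as a small lookup table. This is only an illustrative sketch: the function `nczarr_codec_library` and the filter-name keys are hypothetical and are not identifiers used by netCDF-C; only the library names and their groupings come from the list above.

```c
#include <stddef.h>
#include <string.h>

/* One row per filter: which NCZarr codec library covers it. */
struct codec_entry { const char *filter; const char *library; };

static const struct codec_entry codec_table[] = {
    {"shuffle",    "lib__nczh5filters.so"},
    {"fletcher32", "lib__nczh5filters.so"},
    {"deflate",    "lib__nczh5filters.so"},
    {"szip",       "lib__nczh5filters.so"},   /* optional: needs libsz */
    {"bzip2",      "lib__nczstdfilters.so"},
    {"zstandard",  "lib__nczstdfilters.so"},  /* optional */
    {"blosc",      "lib__nczstdfilters.so"},  /* optional */
};

/* Return the codec library covering `filter`, or NULL if it is not listed. */
const char *nczarr_codec_library(const char *filter)
{
    if (filter == NULL) return NULL;
    for (size_t i = 0; i < sizeof codec_table / sizeof codec_table[0]; i++)
        if (strcmp(codec_table[i].filter, filter) == 0)
            return codec_table[i].library;
    return NULL;
}
```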
1187 
1188 ## Appendix I. A Warning on Backward Compatibility {#filters_appendixi}
1189 
1190 The API defined in this document should accurately reflect the
1191 current state of filters in the netCDF-c library. Be aware that
1192 there was a short period in which the filter code was undergoing
1193 some revision and extension. Those extensions have largely been
1194 reverted. Unfortunately, some users may experience some
1195 compilation problems for previously working code because of
1196 these reversions. In that case, please revise your code to
1197 adhere to this document. Apologies are extended for any
1198 inconvenience.
1199 
1200 A user may encounter an incompatibility if any of the following appears in user code.
1201 
1202 * For a short period, the function *nc\_inq\_var\_filter* returned the error value NC\_ENOFILTER if a variable had no associated filters.
1203  It has been reverted to the previous behavior: it returns NC\_NOERR and sets the returned filter id to zero if the variable has no filters.
1204 * The function *nc\_inq\_var\_filterids* was renamed to *nc\_inq\_var\_filter\_ids*.
1205 * Some auxiliary functions for parsing textual filter specifications have been moved to the file *netcdf\_aux.h*. See [Appendix A](#filters_appendixa).
1206 * All of the "filterx" functions have been removed. This is unlikely to cause problems because they had limited visibility.
1207 
1208 For additional information, see [Appendix B](#filters_appendixb).
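
Code that must run against library versions from both sides of the *nc\_inq\_var\_filter* change can treat the two "no filter" conventions alike. The helper below is a sketch under stated assumptions: `var_has_filter` is a hypothetical name, and the two fallback macro definitions are stand-ins (mirroring the values in *netcdf.h*) so that the fragment compiles on its own. Real code should include *netcdf.h* and *netcdf\_filter.h* instead.

```c
/* Stand-ins so this sketch compiles without the netCDF headers;
 * real code should get these from <netcdf.h>. */
#ifndef NC_NOERR
#define NC_NOERR 0
#endif
#ifndef NC_ENOFILTER
#define NC_ENOFILTER (-136)
#endif

/* Classify the outcome of an nc_inq_var_filter call.
 * `status` is its return value, `filterid` the id it stored.
 * Returns 1 if a filter is defined, 0 if none, -1 on any other error. */
int var_has_filter(int status, unsigned int filterid)
{
    if (status == NC_ENOFILTER) return 0;   /* interim behavior */
    if (status != NC_NOERR) return -1;      /* genuine error */
    return filterid != 0 ? 1 : 0;           /* current behavior: id 0 means none */
}
```

To use it, store the result of `nc_inq_var_filter(ncid, varid, &id, &nparams, NULL)` in a status variable first, then pass that status together with `id` to `var_has_filter`. Doing this in two steps avoids reading `id` before the query has written it.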
1209 
1210 ## History {#filters_history}
1211 
1212 *Author*: Dennis Heimbigner<br>
1213 *Email*: dennis.heimbigner@gmail.com<br>
1214 *Initial Version*: 1/10/2018<br>
1215 *Last Revised*: 5/18/2022