Notes On the Internals of the NetCDF-C Library
============================
<!-- double header is needed to workaround doxygen bug -->

# Notes On the Internals of the NetCDF-C Library {#intern_head}

This document attempts to record important information about
the internal architecture and operation of the netcdf-c library.
It covers the following issues.

* [Including C++ Code in the netcdf-c Library](#intern_cpp)
* [Managing instances of variable-length data types](#intern_vlens)
* [Inferring File Types](#intern_infer)
* [Adding a Standard Filter](#intern_filters)
* [Test Interference](#intern_isolation)
# 1. Including C++ Code in the netcdf-c Library {#intern_cpp}

The state of C compiler technology has reached the point where
it is possible to include C++ code in the netcdf-c library
code base. Two examples are:

1. The AWS S3 SDK wrapper *libdispatch/ncs3sdk.cpp* file.
2. The TinyXML wrapper *ncxml\_tinyxml2.cpp* file.

However, there are some consequences that must be handled for this to work.
Specifically, the compiler must be told that the C++ runtime is needed,
in the following ways.
## Modifications to *lib\_flags.am*

Suppose we have a flag *ENABLE\_XXX*, where the XXX
feature entails using C++ code. Then the following must be added
to *lib\_flags.am* so that the C++ runtime is linked in.
````
if ENABLE_XXX
AM_LDFLAGS += -lstdc++
endif
````
## Modifications to *libxxx/Makefile.am*

The Makefile in which the C++ code is included and compiled
(assumed here to be the *libxxx* directory) must have this set.
````
AM_CXXFLAGS = -std=c++11
````
It is possible that other values (e.g. *-std=c++14*) may also work.
# 2. Managing instances of variable-length data types {#intern_vlens}

For a long time, there have been known problems with the
management of complex types containing VLENs. This also
involves the string type because it is stored as a VLEN of chars.

The term "variable-length" refers to any
type that directly or recursively references a VLEN type: an
array of VLENs, a compound with a VLEN field, and so on.

In order to properly handle instances of these variable-length types, it
is necessary to have functions that can recursively walk
instances of such types to perform various actions on them. The
term "deep" is also used to mean recursive.

Two primary deep walking operations are provided by the netcdf-c library
to aid in managing instances of variable-length types:
- free'ing an instance of the type
- copying an instance of the type.

Note that the term "vector" is used in the following text to
mean a contiguous (in memory) sequence of instances of some
type. Given an array with, say, dimensions 2 X 3 X 4, the array will
be stored in memory as a vector of length 2\*3\*4=24 instances.
## Special case reclamation functions

Previously, the netcdf-c library provided functions to reclaim
instances of only a subset of the possible variable-length types. It
provided no specialized support for copying instances of
variable-length types.

These specialized functions are still available and can be used
when their pre-conditions are met. They have the advantage of
being faster than the general purpose functions described
later in this document.

These functions, and their pre- and post-conditions, are as follows.
* *int nc_free_string(size_t len, char\* data[])*
<u>Action</u>: reclaim a vector of variable-length strings.
<u>Pre-condition(s)</u>: the *data* argument is a vector containing *len* pointers to variable-length, nul-terminated strings.
<u>Post-condition(s)</u>: only the strings in the vector are reclaimed -- the vector itself is not reclaimed.

* *int nc_free_vlens(size_t len, nc_vlen_t vlens[])*
<u>Action</u>: reclaim a vector of VLEN instances.
<u>Pre-condition(s)</u>: the *vlens* argument is a vector containing *len* instances of *nc_vlen_t*.
<u>Post-condition(s)</u>:
    * only the data pointed to by the VLEN instances (i.e. *nc_vlen_t.p*) in the vector is reclaimed -- the vector itself is not reclaimed.
    * the base type of the VLEN must be a fixed-size type -- this means atomic types in the range NC_CHAR thru NC_UINT64, and compound types where the base type of each field is itself fixed-size.

* *int nc_free_vlen(nc_vlen_t \*vl)*
<u>Action</u>: this is equivalent to calling *nc_free_vlens(1,vl)*.
If the pre- and post-conditions are not met, then using these
functions can lead to a host of memory leaks and failures,
because the deep variable-length data is in effect
shared between the netcdf-c library internals and the user's data.
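
As an illustration, here is a minimal sketch (the variable and its
length are hypothetical) of reading a vector of strings and then
reclaiming it with *nc_free_string*:
````
#include <netcdf.h>

/* Sketch: read an NC_STRING variable of LEN elements and reclaim
   the returned strings.  Error checking is abbreviated. */
#define LEN 100
void read_and_free_strings(int ncid, int varid)
{
    char* data[LEN];                  /* caller-managed top-level vector */
    if(nc_get_var_string(ncid, varid, data) == NC_NOERR) {
        /* ... use the strings ... */
        /* Reclaim the strings, but not the vector itself,
           which is on the stack here. */
        (void)nc_free_string(LEN, data);
    }
}
````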
Suppose one is reading a vector of instances using nc\_get\_vars
(or nc\_get\_vara or nc\_get\_var, etc.). These functions will
return the vector in the top-level memory provided. All
interior blocks (from nested VLENs or strings) will have been
dynamically allocated. Note that computing the size of the vector
may be tricky because the strides must be taken into account.

After using this vector of instances, it is necessary to free
(aka reclaim) the dynamically allocated memory, otherwise a
memory leak occurs. So, the recursive reclaim function is used
to walk the returned instance vector and do a deep reclaim of
each instance.
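
Using the recursive reclaim function described later in this document,
a minimal sketch (with a hypothetical VLEN variable) looks like this:
````
#include <netcdf.h>

/* Sketch: read COUNT instances of a VLEN-typed variable and then
   deep-reclaim the interior allocations.  The variable and its type
   id are hypothetical; error checking is abbreviated. */
#define COUNT 24
void read_and_reclaim(int ncid, int varid, nc_type vlen_typeid)
{
    nc_vlen_t data[COUNT];            /* caller-managed top-level vector */
    if(nc_get_var(ncid, varid, data) == NC_NOERR) {
        /* ... use the data ... */
        /* Deep-reclaim interior memory; the stack-allocated
           vector itself is left alone. */
        (void)nc_reclaim_data(ncid, vlen_typeid, data, COUNT);
    }
}
````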
Suppose one is writing a vector of instances using nc\_put\_vars
(or nc\_put\_vara or nc\_put\_var, etc.). These functions will
write the contents of the vector to the specified variable.
Note that, internally, the data passed to the nc\_put\_xxx function is
immediately written, so there is no need to copy it internally. But the
caller may need to reclaim the vector of data that was created and passed
in to the nc\_put\_xxx function.

After writing this vector of instances, and assuming it was dynamically
created, at some point it will be necessary to reclaim that data.
So again, the recursive reclaim function can be used
to walk the instance vector and do a deep reclaim of
each instance.

WARNING: If the data passed into these functions contains statically
allocated data, then using any of the reclaim functions will fail.
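
Here is a sketch of the write case, under the assumption that both the
top-level vector and the interior data are dynamically allocated
(all names are hypothetical):
````
#include <stdlib.h>
#include <netcdf.h>

/* Sketch: build one dynamically allocated VLEN instance, write it,
   then deep-reclaim everything, including the top-level vector,
   with nc_reclaim_data_all().  Error checking is abbreviated. */
void write_and_reclaim(int ncid, int varid, nc_type vlen_typeid)
{
    nc_vlen_t* data = malloc(sizeof(nc_vlen_t)); /* dynamic top level */
    float* row = malloc(3 * sizeof(float));      /* dynamic interior block */
    row[0] = 1.0f; row[1] = 2.0f; row[2] = 3.0f;
    data[0].len = 3;
    data[0].p = row;
    (void)nc_put_var(ncid, varid, data);         /* written immediately */
    /* Everything was malloc'd, so the _all variant may reclaim
       both the interior blocks and the vector itself. */
    (void)nc_reclaim_data_all(ncid, vlen_typeid, data, 1);
}
````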
Suppose one is writing a vector of instances as the data of an attribute
using, say, nc\_put\_att.

Internally, the incoming attribute data must be copied and stored
so that changes to, or reclamation of, the input data will not affect
the attribute. Note that this copying behavior is different from
writing to a variable, where the data is written immediately.

After defining the attribute, it may be necessary for the user
to free the data that was provided as input to nc\_put\_att(), as with the
nc\_put\_xxx functions (previously described).
Suppose one is reading a vector of instances as the data of an attribute
using, say, nc\_get\_att.

Internally, the existing attribute data must be copied and returned
to the caller, and the caller is responsible for reclaiming
the returned copy.
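
A corresponding sketch for the attribute read case (the attribute name
is hypothetical):
````
#include <netcdf.h>

/* Sketch: read a single-instance VLEN attribute and deep-reclaim the
   copy that the library returns.  Error checking is abbreviated. */
void read_att_and_reclaim(int ncid, int varid, nc_type vlen_typeid)
{
    nc_vlen_t att;                    /* caller-managed top level */
    if(nc_get_att(ncid, varid, "hypothetical_att", &att) == NC_NOERR) {
        /* ... use the attribute value ... */
        (void)nc_reclaim_data(ncid, vlen_typeid, &att, 1);
    }
}
````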
## New Instance Walking API

Proper recursive functions were added to the netcdf-c library to
provide reclaim and copy operations, and they are used internally as needed.
These functions are defined in libdispatch/dinstance.c and their
signatures are defined in include/netcdf.h. For backward
compatibility, corresponding "ncaux\_XXX" functions are defined
in include/netcdf\_aux.h.
````
int nc_reclaim_data(int ncid, nc_type xtypeid, void* memory, size_t count);
int nc_reclaim_data_all(int ncid, nc_type xtypeid, void* memory, size_t count);
int nc_copy_data(int ncid, nc_type xtypeid, const void* memory, size_t count, void* copy);
int nc_copy_data_all(int ncid, nc_type xtypeid, const void* memory, size_t count, void** copyp);
````
There are two variants. The first two functions, nc\_reclaim\_data() and
nc\_copy\_data(), assume the top-level vector is managed by the
caller. For reclaim, this is so the user can use, for example, a
statically allocated vector. For copy, it assumes the user
provides the space into which the copy is stored.

The other two, nc\_reclaim\_data\_all() and
nc\_copy\_data\_all(), allow the called functions to manage the
top-level vector. So for nc\_reclaim\_data\_all, the top level is
assumed to be dynamically allocated and will be free'd by
nc\_reclaim\_data\_all(). The nc\_copy\_data\_all() function
will allocate the top level and return a pointer to it to the
user. The user can later pass that pointer to
nc\_reclaim\_data\_all() to reclaim the instance(s).
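
To make the division of responsibility concrete, here is a sketch
(hypothetical type and count) using both variants:
````
#include <netcdf.h>

/* Sketch: copy a vector of COUNT instances both ways, then reclaim
   each copy appropriately.  Error checking is abbreviated. */
#define COUNT 10
void copy_both_ways(int ncid, nc_type xtype, const nc_vlen_t* src)
{
    nc_vlen_t copy1[COUNT];   /* caller-provided top level */
    void* copy2 = NULL;       /* library-allocated top level */

    (void)nc_copy_data(ncid, xtype, src, COUNT, copy1);
    (void)nc_copy_data_all(ncid, xtype, src, COUNT, &copy2);

    /* Interior data only for the caller-managed copy... */
    (void)nc_reclaim_data(ncid, xtype, copy1, COUNT);
    /* ...and everything, including the top level, for the other. */
    (void)nc_reclaim_data_all(ncid, xtype, copy2, COUNT);
}
````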
## Internal Changes

The netcdf-c library internals were changed to use the proper reclaim
and copy functions. This also allows some simplification of the code,
since the stdata and vldata fields of NC\_ATT\_INFO are no longer needed.
## Optimizations

In order to make these functions as efficient as possible, it is
desirable to classify all types as to whether or not they contain
variable-size data. If a type is fixed-size (i.e. does not contain
variable-size data), then it can be freed or copied as a single chunk.
This significantly increases the performance for such types.
For variable-size types, it is necessary to walk each instance of the type
and recursively reclaim or copy it. As another optimization,
if the type is a vector of strings, then the per-instance walk can be
sped up by doing the reclaim or copy inline.
The rules for classifying types as fixed- or variable-size are as follows.

1. All atomic types, except string, are fixed-size.
2. All enum types and opaque types are fixed-size.
3. All string types and VLEN types are variable-size.
4. A compound type is fixed-size if all of the types of its
fields are fixed-size. Otherwise, it is variable-size.

The classification of a type can be made at the time the type is defined
or when it is read in from an existing file. The reclaim and copy functions
use this information to speed up the handling of fixed-size types.
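
The library's internal classification code is not reproduced here, but
the rules above can be sketched against the public type-inquiry API
roughly as follows (an illustration, not the actual implementation):
````
#include <netcdf.h>

/* Sketch of the fixed-vs-variable classification rules.
   Returns 1 if the type is fixed-size, 0 otherwise. */
static int
type_is_fixed_size(int ncid, nc_type xtype)
{
    int klass;
    size_t nfields;

    if(xtype == NC_STRING) return 0;            /* rule 3: string */
    if(xtype < NC_STRING) return 1;             /* rule 1: other atomic types */
    /* User-defined type: ask for its class */
    if(nc_inq_user_type(ncid, xtype, NULL, NULL, NULL, &nfields, &klass))
        return 0;                               /* be conservative on error */
    switch (klass) {
    case NC_ENUM: case NC_OPAQUE: return 1;     /* rule 2 */
    case NC_VLEN: return 0;                     /* rule 3 */
    case NC_COMPOUND: {                         /* rule 4: check every field */
        int i;
        for(i = 0; i < (int)nfields; i++) {
            nc_type ftype;
            if(nc_inq_compound_fieldtype(ncid, xtype, i, &ftype)) return 0;
            if(!type_is_fixed_size(ncid, ftype)) return 0;
        }
        return 1;
        }
    default: return 0;
    }
}
````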
## Warnings

1. The new API functions require that the type information be
accessible. This means that you cannot use these functions
after the file has been closed. After the file is closed, you
are on your own to reclaim any remaining variable-length instances.

2. There is still one known failure that has not been solved; it is
possibly an HDF5 memory leak. All the failures revolve around
some variant of this .cdl file. The proximate cause of failure is
the use of a VLEN FillValue.
````
netcdf x {
types:
  float(*) row_of_floats ;
dimensions:
  m = 5 ;
variables:
  row_of_floats ragged_array(m) ;
  row_of_floats ragged_array:_FillValue = {-999} ;
data:
  ragged_array = {10, 11, 12, 13, 14}, {20, 21, 22, 23}, {30, 31, 32},
                 {40, 41}, _ ;
}
````
# 3. Inferring File Types {#intern_infer}

As described in the companion document -- docs/dispatch.md --
when nc\_create() or nc\_open() is called, it must figure out what
kind of file is being created or opened. Once it has figured out
the file kind, the appropriate "dispatch table" can be used
to process that file.

Figuring out the kind of file is referred to as model inference,
and it is, unfortunately, a complicated process. The complication
is mostly a result of allowing a path argument to be a URL.
Inferring the file kind from a URL requires deep processing of
the URL structure: the protocol, the host, the path, and the fragment
parts in particular. The query part is currently not used because
it usually contains information to be processed by the server.
266 The "fragment" part of the URL may be unfamiliar.
267 The last part of a URL may optionally contain a fragment, which
268 is syntactically of this form in this pseudo URL specification.
270 <protocol>://<host>/<path>?<query>#<fragment>
272 The form of the fragment is similar to a query and takes this general form.
274 '#'<key>=<value>&<key>=<value>&...
276 The key is a simple name, the value is any sequence of characters,
277 although URL special characters such as '&' must be URL encoded in
278 the '%XX' form where each X is a hexadecimal digit.
279 An example might look like this non-sensical example:
281 https://host.com/path#mode=nczarr,s3&bytes
It is important to note that the fragment part is not intended to be
passed to the server, but rather is processed by the client program.
It is this property that allows the netcdf-c library to use it to
pass information deep into the dispatch table code that is processing the
URL.
## Model Inference Inputs

The inference algorithm is given the following information
from which it must determine the kind of file being accessed.

### Mode

The mode is a set of flags that are passed as the second
argument to nc\_create and nc\_open. The set of flags is defined in
the netcdf.h header file. It generally specifies the overall
format of the file: netcdf-3 (classic) or netcdf-4 (enhanced).
Variants of these can also be specified, e.g. 64-bit netcdf-3 or
netcdf-4 using the classic model.
In the case where the path argument is a simple file path,
using a mode flag is the most common mechanism for specifying
the model.

### Path

The file path, the first argument to nc\_create and nc\_open,
can be either a simple file path or a URL.
If it is a URL, then it will be deeply inspected to determine
the model.

### File Contents

When the contents of a real file are available,
the contents of the file can be used to determine the dispatch table.
As a rule, this is likely to be useful only for *nc\_open*.
It also requires access to functions that can open and read at least
the initial part of the file.
As a rule, a small initial prefix of the file is read
and examined to see if it matches any of the so-called
"magic numbers" that indicate the kind of file being read.
### Open/Create

Is the file being opened or is it being created?

### Parallelism

Is parallel IO available?
## Model Inference Outputs

The inference algorithm outputs two pieces of information.

1. model -- this is used by nc\_open and nc\_create to choose the dispatch table.
2. newpath -- in some cases, usually URLs, the path may be rewritten to include extra information for use by the dispatch functions.

The model output is actually a struct containing two fields:

1. implementation -- this is a value from the NC\_FORMATX\_xxx
values in netcdf.h. It generally determines the dispatch
table to use.
2. format -- this is an NC\_FORMAT\_xxx value defining, in effect,
the netcdf format to which the underlying format is to be
translated. Thus it can tell the netcdf-3 dispatcher that it
should actually implement CDF5 rather than standard netcdf classic.
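
A sketch of that struct (the field names here are approximate; see
libdispatch for the authoritative definition):
````
/* Approximate sketch of the inferred-model struct. */
typedef struct NCmodel {
    int impl;   /* NC_FORMATX_xxx value: selects the dispatch table */
    int format; /* NC_FORMAT_xxx value: the netcdf format to emulate */
} NCmodel;
````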
## The Inference Algorithm

The construction of the model is primarily carried out by the function
*NC\_infermodel()* (in *libdispatch/dinfermodel.c*).
It is given the following parameters:

1. path -- (IN) absolute file path or URL
2. modep -- (IN/OUT) the set of mode flags given to *NC\_open* or *NC\_create*.
3. iscreate -- (IN) distinguish open from create.
4. useparallel -- (IN) indicate if parallel IO can be used.
5. params -- (IN/OUT) arbitrary data dependent on the mode and path.
6. model -- (IN/OUT) place to store the inferred model.
7. newpathp -- (OUT) the canonical rewrite of the path argument.
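
Assembled from this parameter list, the signature is approximately the
following (the exact declaration is in the source):
````
/* Approximate signature, reconstructed from the parameter list above;
   see libdispatch/dinfermodel.c for the exact declaration. */
int NC_infermodel(const char* path, int* modep, int iscreate,
                  int useparallel, void* params,
                  NCmodel* model, char** newpathp);
````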
As a rule, these values are used in this order of preference.

1. file contents -- highest precedence
2. url (if it is one) -- using the "mode=" key in the fragment (see below).
3. mode flags
4. default format -- lowest precedence

The sequence of steps is as follows.
### URL processing -- processuri()

If the path appears to be a URL, then it is parsed
and processed by the processuri function as follows.

The protocol is extracted and tested against the list of
legal protocols. If it is not found, then it is an error.
If found, then it is replaced by a substitute -- if one is specified.
So, for example, the protocol "dods" is replaced by the protocol "http"
(note that at some point "http" will be replaced with "https").
Additionally, one or more "key=value" strings are appended
to the existing fragment of the url. So, again for "dods",
the fragment is extended by the string "mode=dap2".
Thus replacing "dods" does not lose information, but rather transfers
it to the fragment for later use.
After the protocol is processed, the initial fragment processing occurs
by converting it to a list data structure of the form
````
{<key>,<value>,<key>,<value>,<key>,<value>....}
````
### Macro Processing -- processmacros()

If the fragment list produced by processuri() is non-empty, then
it is processed for "macros". Notice that if the original path
was not a URL, then the fragment list is empty and this
processing will be bypassed. In any case, it is convenient to
allow some singleton fragment keys to be expanded into larger
fragment components. In effect, those singletons act as
macros. They can help to simplify the user's URL. The term
singleton means a fragment key with no associated value:
"#bytes", for example.

The list of fragments is searched looking for keys whose
value part is NULL or the empty string. The table
of macros is then searched for that key and, if it is found,
the corresponding key and value are appended to the fragment list and the
singleton is removed.
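
As an illustration, assuming the singleton "bytes" is in the macro
table, a URL like the first line below would be rewritten to the second:
````
https://host.com/path#bytes
https://host.com/path#mode=bytes
````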
### Mode Inference -- processinferences()

This function processes the list of values associated
with the "mode" key. It is similar to macro processing in that
certain mode values are added or removed based on tables
of "inferences" and "negations".
Again, the purpose is to allow users to provide simplified URL fragments.

The list of mode values is repeatedly searched, and whenever a value
is found that is in the "modeinferences" table, the associated inference value
is appended to the list of mode values. This process stops when no changes
occur. This form of inference allows the user to specify "mode=zarr"
and have it converted to "mode=nczarr,zarr". This avoids the need for the
dispatch table code to do the same inference.

After the inferences are made, the list of mode values is again
repeatedly searched, and whenever a value
is found that is in the "modenegations" table, the associated negation value
is removed from the list of mode values, assuming it is there. This process stops when no changes
occur. This form of inference allows the user to make sure that "mode=bytes,nczarr"
has the bytes mode take precedence by removing the "nczarr" value. Such illegal
combinations can occur because of previous processing steps.
### Fragment List Normalization

As the fragment list is processed, duplicate keys can appear.
A function -- cleanfragments() -- is applied to clean up the fragment list
by coalescing the values of duplicate keys and removing duplicate key values.
### S3 Rebuild

If the URL is determined to be a reference to a resource on the Amazon S3 cloud,
then the URL needs to be converted to what is called "path format".
There are four S3 URL formats:

1. Virtual -- ````https://<bucket>.s3.<region>.amazonaws.com/<path>````
2. Path -- ````https://s3.<region>.amazonaws.com/<bucket>/<path>````
3. S3 -- ````s3://<bucket>/<path>````
4. Other -- ````https://<host>/<bucket>/<path>````

The S3 processing converts all of these to the Path format. In the "S3" format case,
it is necessary to find or default the region by examining the ".aws" directory files.
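
For example, a Virtual-format URL like the first line below (bucket and
region invented for illustration) would be rewritten to the Path format
on the second:
````
https://examplebucket.s3.us-east-1.amazonaws.com/data/file.nc
https://s3.us-east-1.amazonaws.com/examplebucket/data/file.nc
````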
### File Rebuild

If the URL protocol is "file" and its path is a relative file path,
then it is made absolute by prepending the path of the current working directory.

In any case, after an S3 or File rebuild, the URL is completely
rebuilt using any modified protocol, host, path, and
fragments. The query is left unchanged in the current algorithm.
The resulting rebuilt URL is passed back to the caller.
### Mode Key Processing

The set of values of the fragment's "mode" key are processed one by one
to see if it is possible to determine the model.
There is a table of format interpretations that maps a mode value
to the model's implementation and format. So, for example,
if the mode value "dap2" is encountered, then the model
implementation is set to NC\_FORMATX\_DAP2 and the format
is set to NC\_FORMAT\_CLASSIC.
### Non-Mode Key Processing

If processing the mode does not tell us the implementation, then
all other fragment keys are processed to see if the implementation
(and format) can be deduced. Currently this does nothing.

### Defaults

If the model is still not determined and the path is a URL, then
the implementation is defaulted to DAP2. This is for backward
compatibility, from when all URLs implied DAP2.

In the event that the path is not a URL, then it is necessary
to use the mode flags and the useparallel argument to choose a model.
This is a straightforward flag-checking exercise.
### Content Inference -- check\_file\_type()

If the path is being opened (as opposed to created), then
it may be possible to actually read the first few bytes of the
resource specified by the path and use that to determine the
model. If this succeeds, then it takes precedence over
all other model inferences.
### Flag Consistency

Once the model is known, the set of mode flags
is modified to be consistent with that information.
So, for example, if DAP2 is the model, then all netcdf-4 mode flags
and some netcdf-3 flags are removed from the set of mode flags,
because DAP2 provides only a standard netcdf-classic format.
# 4. Adding a Standard Filter {#intern_filters}

The standard filter system extends the netcdf-c library API to
support a fixed set of "standard" filters. This is similar to the
way that deflate and szip are currently supported.
For background, the file filter.md should be consulted.

In general, the API for a standard filter has the following prototypes.
The case of zstandard (libzstd) is used as an example.
````
int nc_def_var_zstandard(int ncid, int varid, int level);
int nc_inq_var_zstandard(int ncid, int varid, int* has_filterp, int* levelp);
````
So generally the API has the ncid and the varid as fixed parameters, followed by
a list of parameters specific to the filter -- level in this case.
For the inquiry function, there is an additional argument -- has\_filterp --
that is set to 1 if the filter is defined for the given variable,
and 0 otherwise.
The remainder of the inquiry parameters are pointers to memory
into which the parameters are stored -- levelp in this case.
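
A minimal usage sketch (assuming a netcdf-c build in which the
zstandard standard filter is available, and with the usual caveat that
the filter must be defined while the file is in define mode):
````
#include <stdio.h>
#include <netcdf.h>
#include <netcdf_filter.h> /* standard filter declarations, in builds with zstd */

/* Sketch: compress a variable with zstandard at level 9, then
   query the filter back.  Error handling is abbreviated. */
int demo_zstandard(int ncid, int varid)
{
    int stat, has_zstd = 0, level = 0;

    if((stat = nc_def_var_zstandard(ncid, varid, 9))) return stat;
    if((stat = nc_inq_var_zstandard(ncid, varid, &has_zstd, &level))) return stat;
    printf("zstandard: defined=%d level=%d\n", has_zstd, level);
    return NC_NOERR;
}
````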
It is important to note that including a standard filter still
requires three supporting objects:

1. The implementing library for the filter. For example,
libzstd must be installed in order to use the zstandard
API.
2. An HDF5 wrapper for the filter must be installed in the
directory pointed to by the HDF5\_PLUGIN\_PATH environment
variable.
3. (Optional) An NCZarr Codec implementation must be installed
in the HDF5\_PLUGIN\_PATH directory.
## Adding a New Standard Filter

The implementation of a standard filter must be loaded from one
of several locations.

1. It can be part of libnetcdf.so (preferred),
2. it can be loaded as part of the client code,
3. or it can be loaded as part of an external library such as libccr.

However, the three objects listed above need to be
stored in the HDF5\_PLUGIN\_PATH directory, so adding a standard
filter still requires modification to the netcdf build system.
This limitation may be lifted in the future.
### Build Changes

In order to detect a standard library, the following changes
must be made for Automake (configure.ac/Makefile.am)
and CMake (CMakeLists.txt).
Configure.ac must have a block similar to this that locates
the implementing library.
````
# See if we have libzstd
AC_CHECK_LIB([zstd],[ZSTD_compress],[have_zstd=yes],[have_zstd=no])
if test "x$have_zstd" = "xyes" ; then
   AC_SEARCH_LIBS([ZSTD_compress],[zstd zstd.dll cygzstd.dll], [], [])
   AC_DEFINE([HAVE_ZSTD], [1], [if true, zstd library is available])
fi
AC_MSG_CHECKING([whether libzstd library is available])
AC_MSG_RESULT([${have_zstd}])
````
Note that the entry point (*ZSTD\_compress*) is library-dependent
and is used to see if the library is available.
It is assumed you have an HDF5 wrapper for zstd. If you want it
to be built as part of the netcdf-c library, then you need to
add the following to *netcdf-c/plugins/Makefile.am*.
````
noinst_LTLIBRARIES += libh5zstd.la
libh5zstd_la_SOURCES = H5Zzstd.c H5Zzstd.h
````
As a model, here is the equivalent entry for szip:
````
# Need our version of szip if libsz available and we are not using HDF5
noinst_LTLIBRARIES += libh5szip.la
libh5szip_la_SOURCES = H5Zszip.c H5Zszip.h
````
In an analog to *configure.ac*, a block like
this needs to be in *netcdf-c/CMakeLists.txt*.
````
FIND_PACKAGE(Zstd)
set_std_filter(Zstd)
````
The FIND\_PACKAGE call requires a CMake module for the filter
in the cmake/modules directory.
The *set\_std\_filter* function is a macro.

An entry in the file config.h.cmake.in will also be needed.
````
/* Define to 1 if zstd library available. */
#cmakedefine HAVE_ZSTD 1
````
### Implementation Template

As a template, here is the implementation for zstandard.
It can be used as the template for adding other standard filters.
It is currently located in *netcdf-c/libdispatch/dfilter.c*, but
could be anywhere, as indicated above.
````
int
nc_def_var_zstandard(int ncid, int varid, int level)
{
    int stat = NC_NOERR;
    unsigned ulevel;

    if((stat = nc_inq_filter_avail(ncid,H5Z_FILTER_ZSTD))) goto done;
    /* Filter is available */
    /* Level must be between -131072 and 22 on Zstandard v. 1.4.5 (~202009)
       Earlier versions have fewer levels (especially fewer negative levels) */
    if(level < -131072 || level > 22)
        {stat = NC_EINVAL; goto done;}
    ulevel = (unsigned)level; /* Keep bit pattern */
    if((stat = nc_def_var_filter(ncid,varid,H5Z_FILTER_ZSTD,1,&ulevel))) goto done;
done:
    return stat;
}

int
nc_inq_var_zstandard(int ncid, int varid, int* hasfilterp, int* levelp)
{
    int stat = NC_NOERR;
    size_t nparams;
    unsigned params = 0;
    int hasfilter = 0;

    if((stat = nc_inq_filter_avail(ncid,H5Z_FILTER_ZSTD))) goto done;
    /* Filter is available */
    /* Get filter info */
    stat = nc_inq_var_filter_info(ncid,varid,H5Z_FILTER_ZSTD,&nparams,NULL);
    if(stat == NC_ENOFILTER) {stat = NC_NOERR; hasfilter = 0; goto done;}
    if(stat != NC_NOERR) goto done;
    hasfilter = 1;
    if(nparams != 1) {stat = NC_EFILTER; goto done;}
    if((stat = nc_inq_var_filter_info(ncid,varid,H5Z_FILTER_ZSTD,&nparams,&params))) goto done;
done:
    if(levelp) *levelp = (int)params;
    if(hasfilterp) *hasfilterp = hasfilter;
    return stat;
}
````
# 5. Test Interference {#intern_isolation}

At some point, Unidata switched from running tests serially to running tests in parallel.
It soon became apparent that there were resources shared between tests and that parallel
execution sometimes caused interference between them.

In order to fix the inter-test interference, several approaches were used.

1. Renaming resources (primarily files) so that tests would create differently named test files.
2. Telling the test system that there were explicit dependencies between certain tests so that they would not be run in parallel.
3. Isolating test resources by creating independent directories for each test.
## Test Isolation

The isolation mechanism is currently used mostly in nczarr_test.
It requires that tests are all executed inside a shell script.
When the script starts, it invokes a shell function called "isolate".
This function looks in the current directory for a directory called "testset_<uid>".
If "testset_<uid>" is not found, then it creates it.
This directory is then used to isolate all test output.

After calling "isolate", the script enters the "testset_<uid>"
directory. Then each actual test creates a directory in which to
store any file resources that it creates during execution.
Suppose, for example, that the shell script is called "run_XXXX.sh".
The isolate function creates a directory with the general name "testset_<uid>".
Then the run_XXXX.sh script creates a directory "testset_<uid>/testdir_XXXX",
enters it, and runs the test.
During cleanup, specifically "make clean", all the testset_<uid> directories are deleted.

The "<uid>" is a unique identifier created using the "date +%s" command. It returns an integer
representing the number of seconds since the start of the so-called "epoch",
namely "00:00:00 UTC, 1 January 1970". Using a date makes it easier to detect and reclaim
obsolete testset directories.
## Cloud Test Isolation

When testing against the cloud (currently Amazon S3), the interference
problem is intensified.
This is because the current cloud testing uses a single S3 bucket,
which means that not only is there inter-test interference, but there
is also potential interference across builds.
This means, for example, that testing by github actions could
interfere with local testing by individual users.
This problem is difficult to solve, but a mostly complete solution has been
implemented: it is possible with cmake, but not (as yet) with automake.
In any case, there is a shell function called s3isolate in nczarr_test/test_nczarr.sh that operates on cloud resources in a way that is similar to the isolate function.
The s3isolate function does several things:

1. It invokes isolate to ensure local isolation.
2. It creates a path prefix relative to the Unidata S3 bucket that has the name "testset_<uid>", where this name
is the same as the one created by the isolate function.
3. It appends the uid to a file called s3cleanup_<pid>.uids. This file may accumulate several uids indicating
the keys that need to be cleaned up. The pid is a separate small unique id used to avoid s3cleanup interference.

The test script then ensures that any cloud resources are created as extensions of the path prefix.
Cleanup of S3 resources is complex.
In configure.ac or the top-level CMakeLists.txt file, the path "netcdf-c/testset_<uid>"
is created and, via configuration commands, propagated to various Makefile.am
and specific script files.

The actual cleanup requires different approaches for cmake and for automake.
In cmake, the CTestCustom.cmake mechanism is used; it contains the following command:
````
IF(NETCDF_ENABLE_S3_TESTING)
    # Assume run in top-level CMAKE_BINARY_DIR
    set(CTEST_CUSTOM_POST_TEST "bash -x ${CMAKE_BINARY_DIR}/s3cleanup.sh")
ENDIF()
````
In automake, the "check-local" extension mechanism is used,
because it is invoked after all tests are run in the nczarr_test
directory. So nczarr_test/Makefile.am contains the following code.
````
if NETCDF_ENABLE_S3_TESTALL
check-local:
	bash -x ${top_srcdir}/s3cleanup.sh
endif
````
### s3cleanup.sh

This script is created by configuring the base file s3cleanup.in.
It is unfortunately complex, but roughly it does the following.

1. It computes a list of all the keys for all objects in the Unidata bucket and stores them in a file
named "s3cleanup_<pid>.keys".
2. It gets the list of date-based uids created as described above.
3. It iterates over the keys and the uids to collect the set of keys matching any one of the uids.
4. It divides the keys into sets of 500. This is because the delete-objects command will not accept more than 1000 keys at a time.
5. It converts each set of 500 keys from step 4 into a properly formatted JSON file suitable for use by the "aws delete-objects" command.
This file is called "s3cleanup_<pid>.json".
6. It uses the "aws delete-objects" command to delete the keys.
7. It repeats steps 5 and 6 for each set of 500 keys.

The pid is a small random number used to avoid local interference.
It is important to note that this script assumes that the
AWS command line package is installed.
This can be installed, for example, using this command:
````apt install awscli````.
### s3gc.sh

This script is created by configuring the base file s3gc.in.
It is intended as a way for users to manually clean up the Unidata S3 bucket.
It takes a single argument, delta, that is the number of days before the present day,
and computes a stop date corresponding to "present_day - delta".
All keys for all uids dated on or before the stop date are deleted.

It operates as follows:

1. It gets the list of date-based uids created as described above.
2. It iterates over the keys and collects all that are dated on or before the stop date.
3. It divides the keys into sets of 500, in recognition of the 1000-key limit mentioned previously.
4. It converts each set of 500 keys from step 3 into a properly formatted JSON file suitable for use by the "aws delete-objects" command.
This file is called "s3cleanup_<pid>.json".
5. It uses the "aws delete-objects" command to delete the keys.
6. It repeats steps 4 and 5 for each set of 500 keys.
# Point of Contact {#intern_poc}

*Author*: Dennis Heimbigner<br>
*Email*: dmh at ucar dot edu<br>
*Initial Version*: 12/22/2021<br>
*Last Revised*: 9/16/2023