NetCDF  4.9.3

Internal Dispatch Table Architecture
============================
<!-- double header is needed to workaround doxygen bug -->

# Internal Dispatch Table Architecture

[TOC]

# Introduction {#dispatch_intro}

The netcdf-c library uses an internal dispatch mechanism
as the means for wrapping the netcdf-c API around a wide variety
of underlying storage and stream data formats.
As of last check, the following formats are supported and each
has its own dispatch table.

Warning: some of the listed function signatures may be out of date
and the specific code should be consulted to see the actual parameters.

<table>
<tr><th>Format<th>Directory<th>NC_FORMATX Name
<tr><td>NetCDF-classic<td>libsrc<td>NC_FORMATX_NC3
<tr><td>NetCDF-enhanced<td>libhdf5<td>NC_FORMATX_NC_HDF5
<tr><td>HDF4<td>libhdf4<td>NC_FORMATX_NC_HDF4
<tr><td>PNetCDF<td>libsrcp<td>NC_FORMATX_PNETCDF
<tr><td>DAP2<td>libdap2<td>NC_FORMATX_DAP2
<tr><td>DAP4<td>libdap4<td>NC_FORMATX_DAP4
<tr><td>UDF0<td>N.A.<td>NC_FORMATX_UDF0
<tr><td>UDF1<td>N.A.<td>NC_FORMATX_UDF1
<tr><td>NCZarr<td>libnczarr<td>NC_FORMATX_NCZARR
</table>

Note that UDF0 and UDF1 allow for user-defined dispatch tables to
be implemented.

The idea is that when a user opens or creates a netcdf file, a
specific dispatch table is chosen. A dispatch table is a struct
containing an entry for (almost) every function in the netcdf-c API.
During execution, netcdf API calls are channeled through that
dispatch table to the appropriate function for implementing that
API call. The functions in the dispatch table are not quite the
same as those defined in *netcdf.h*. For simplicity and
compactness, some netcdf.h API calls are mapped to the same
dispatch table function. In addition to the functions, the first
entry in the table defines the model that this dispatch table
implements. It will be one of the NC_FORMATX_XXX values.
The second entry in the table is the version of the dispatch table.
The rule is that previous entries may not be removed, but new entries
may be added, and adding new entries increases the version number.

The dispatch table represents a distillation of the netcdf API down to
a minimal set of internal operations. The format of the dispatch table
is defined in the file *include/netcdf_dispatch.h.in*. Every new dispatch
table must define this minimal set of operations.
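
As a rough illustration of the mechanism, the following self-contained sketch models a dispatch table as a struct holding a model, a version, and a few function pointers. All names here (`Toy_Dispatch`, `Toy_File`, `toy_inq_format`, the `TOY_` constants) are hypothetical and far smaller than the real *NC_Dispatch*; the sketch only shows how an API-level call is forwarded through the table attached to an open file.

```c
#include <stddef.h>

/* Hypothetical stand-ins for two NC_FORMATX_XXX values. */
#define TOY_FORMATX_A 1
#define TOY_FORMATX_B 2
#define TOY_DISPATCH_VERSION 1

/* Toy dispatch table: model first, version second, then the operations. */
typedef struct Toy_Dispatch {
    int model;
    int dispatch_version;
    int (*inq_format)(int ncid, int *formatp);
    int (*close)(int ncid);
} Toy_Dispatch;

/* Toy open-file record remembering the table chosen at open/create time. */
typedef struct Toy_File {
    int ncid;
    const Toy_Dispatch *dispatch;
} Toy_File;

static int A_inq_format(int ncid, int *formatp)
{ (void)ncid; *formatp = TOY_FORMATX_A; return 0; }

static int B_inq_format(int ncid, int *formatp)
{ (void)ncid; *formatp = TOY_FORMATX_B; return 0; }

/* One implementation may be shared by several tables. */
static int toy_close(int ncid) { (void)ncid; return 0; }

static const Toy_Dispatch A_dispatcher =
    { TOY_FORMATX_A, TOY_DISPATCH_VERSION, A_inq_format, toy_close };
static const Toy_Dispatch B_dispatcher =
    { TOY_FORMATX_B, TOY_DISPATCH_VERSION, B_inq_format, toy_close };

/* API-level entry point: it only forwards to the table entry. */
int toy_inq_format(const Toy_File *file, int *formatp)
{
    if(file == NULL || file->dispatch == NULL) return -1;
    return file->dispatch->inq_format(file->ncid, formatp);
}
```

The caller never names a format-specific function; the choice was made once, when the table was attached to the file record.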

# Adding a New Dispatch Table
In order to make this process concrete, let us assume we plan to add
an in-memory implementation of netcdf-3.

## Defining configure.ac flags

Define a *--enable* flag option for *configure.ac*. For our
example, we assume the option "--enable-ncm" and the
corresponding internal flag "enable_ncm". If you examine the existing
*configure.ac* and see how, for example, *--enable_dap2* is
defined, then it should be clear how to do it for your code.

## Defining a "name space"

Choose some prefix of characters to identify the new dispatch
system. In effect we are defining a name-space. For our in-memory
system, we will choose "NCM" and "ncm". NCM is used for non-static
procedures to be entered into the dispatch table and ncm for all other
non-static procedures. Note that the chosen prefix should probably start
with "nc" or "NC" in order to avoid name conflicts outside the netcdf-c library.

## Extend include/netcdf.h

Modify the file *include/netcdf.h* to add an NC_FORMATX_XXX flag
for this dispatch format at the appropriate places.
````
#define NC_FORMATX_NCM 7
````

Add any new format-specific error codes.
````
#define NC_ENCM (?)
````

## Extend include/ncdispatch.h

Modify the file *include/ncdispatch.h* to
add format-specific data and initialization functions;
note the use of our NCM namespace.
````
#ifdef ENABLE_NCM
extern NC_Dispatch* NCM_dispatch_table;
extern int NCM_initialize(void);
#endif
````

## Define the dispatch table functions

Define the functions necessary to fill in the dispatch table. As a
rule, we assume that a new directory is defined, *libsrcm*, say. Within
this directory, we need to define *Makefile.am* and *CMakeLists.txt*.
We also need to define the source files
containing the dispatch table and the functions to be placed in the
dispatch table; call them *ncmdispatch.c* and *ncmdispatch.h*. Look at
*libsrc/nc3dispatch.[ch]* or *libnczarr/zdispatch.[ch]* for examples.

Similarly, it is best to take existing *Makefile.am* and *CMakeLists.txt*
files (from *libsrcp* for example) and modify them.

## Adding the dispatch code to libnetcdf

Provide for the inclusion of this library in the final libnetcdf
library. This is accomplished by modifying *liblib/Makefile.am* by
adding something like the following.
````
if ENABLE_NCM
   libnetcdf_la_LIBADD += $(top_builddir)/libsrcm/libnetcdfm.la
endif
````

## Extend library initialization

Modify the *NC_initialize* function in *liblib/nc_initialize.c* by adding
appropriate references to the NCM dispatch function.
````
#ifdef ENABLE_NCM
extern int NCM_initialize(void);
#endif
...
int NC_initialize(void)
{
    ...
#ifdef ENABLE_NCM
    if((stat = NCM_initialize())) return stat;
#endif
    ...
}
````

Finalization is handled in an analogous fashion.
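
To make the "analogous fashion" concrete, here is a minimal, self-contained sketch of the paired pattern. It is not the code in *liblib/nc_initialize.c*: `demo_initialize` and `demo_finalize` stand in for the library's initialize and finalize routines, `NCM_finalize` is a hypothetical counterpart to `NCM_initialize`, and `ENABLE_NCM` is defined by hand here, whereas the build system would normally set it.

```c
#define ENABLE_NCM 1 /* assumption: normally provided by the build system */

/* Toy state owned by the hypothetical NCM layer. */
static int ncm_ready = 0;

#ifdef ENABLE_NCM
/* Hypothetical NCM entry points; real ones would live in libsrcm. */
static int NCM_initialize(void) { ncm_ready = 1; return 0; }
static int NCM_finalize(void)   { ncm_ready = 0; return 0; }
#endif

/* Stand-in for the library-wide initialization routine. */
int demo_initialize(void)
{
    int stat = 0;
#ifdef ENABLE_NCM
    if((stat = NCM_initialize())) return stat;
#endif
    return stat;
}

/* Stand-in for the library-wide finalization routine. */
int demo_finalize(void)
{
    int stat = 0;
#ifdef ENABLE_NCM
    if((stat = NCM_finalize())) return stat;
#endif
    return stat;
}
```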

## Testing the new dispatch table

Typically, tests for a new dispatcher are kept in a separate directory
with a related name. For our running example, it might be *ncm_test*.
The file *ncm_test/Makefile.am*
will look something like this.
````
# These files are created by the tests.
CLEANFILES = ...
# These are the tests which are always run.
TESTPROGRAMS = test1 test2 ...
test1_SOURCES = test1.c ...
...
# Set up the tests.
check_PROGRAMS = $(TESTPROGRAMS)
TESTS = $(TESTPROGRAMS)
# Any extra files required by the tests
EXTRA_DIST = ...
````

# Top-Level build of the dispatch code

Provide for *libnetcdfm* to be constructed by adding the following to
the top-level *Makefile.am*.

````
if ENABLE_NCM
NCM=libsrcm
NCMTESTDIR=ncm_test
endif
...
SUBDIRS = ... $(DISPATCHDIR) $(NCM) ... $(NCMTESTDIR)
````

# Choosing a Dispatch Table

The dispatch table is ultimately chosen by the function
NC_infermodel() in libdispatch/dinfermodel.c. This function is
invoked by the NC_create and the NC_open procedures. This can
be, unfortunately, a complex process. The detailed operation of
NC_infermodel() is defined in the companion document in docs/dinternal.md.

In any case, the choice of dispatch table is currently based on the following
pieces of information.

1. The mode argument - this can be used to detect, for example, what kind
of file to create: netcdf-3, netcdf-4, 64-bit netcdf-3, etc.
Using a mode flag is the most common mechanism, in which case
*netcdf.h* needs to be modified to define the relevant mode flag.

2. The file path - this can be used to detect, for example, a DAP url
versus a normal file system file. If the path looks like a URL, then
the fragment part of the URL is examined to determine the specific
dispatch function.

3. The file contents - when the contents of a real file are available,
the contents of the file can be used to determine the dispatch table.
As a rule, this is likely to be useful only for *nc_open*.

4. Whether the file is being opened or created.

5. Whether parallel IO is available.

The *NC_infermodel* function returns two values.

1. model - this is used by nc_open and nc_create to choose the dispatch table.
2. newpath - in some cases, usually URLs, the path may be rewritten to include extra information for use by the dispatch functions.

# Special Dispatch Table Signatures

The entries in the dispatch table do not necessarily correspond
to the external API. In many cases, multiple related API functions
are merged into a single dispatch table entry.

## Create/Open

The create table entry and the open table entry in the dispatch table
have the following signatures respectively.
````
int (*create)(const char *path, int cmode,
              size_t initialsz, int basepe, size_t *chunksizehintp,
              int useparallel, void* parameters,
              struct NC_Dispatch* table, NC* ncp);

int (*open)(const char *path, int mode,
            int basepe, size_t *chunksizehintp,
            int use_parallel, void* parameters,
            struct NC_Dispatch* table, NC* ncp);
````

The key difference is that these are the union of all the possible
create/open signatures from the include/netcdfXXX.h files. Note especially the last
three parameters. The parameters argument is a pointer to arbitrary data
to provide extra info to the dispatcher.
The table argument is included in case the create
function (e.g. *NCM_create*) needs to invoke other dispatch
functions. The very last argument, ncp, is a pointer to an NC
instance. The raw NC instance will have been created by *libdispatch/dfile.c*
and is passed to e.g. open with the expectation that it will be filled in
by the dispatch open function.
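
The "filled in by the dispatch open function" step can be pictured with the following self-contained sketch for the running NCM example. It is only an illustration: `Toy_NC` is a two-field stand-in for the real NC structure, `NCM_FileInfo` is a hypothetical per-file state record, and the `TOY_` error values are local copies that mirror NC_EINVAL and NC_ENOMEM so the block compiles without netcdf.h.

```c
#include <stdlib.h>

#define TOY_NOERR  0
#define TOY_EINVAL (-36) /* mirrors NC_EINVAL */
#define TOY_ENOMEM (-61) /* mirrors NC_ENOMEM */

struct Toy_Dispatch; /* opaque here; only passed through */

/* Toy stand-in for the NC instance created by the caller. */
typedef struct Toy_NC {
    int mode;
    void *dispatchdata; /* format-specific state, owned by the dispatch layer */
} Toy_NC;

/* Hypothetical per-file state for the in-memory (NCM) example. */
typedef struct NCM_FileInfo { int nvars; } NCM_FileInfo;

/* Same parameter shape as the open entry shown above. */
int NCM_open(const char *path, int mode, int basepe, size_t *chunksizehintp,
             int use_parallel, void *parameters,
             const struct Toy_Dispatch *table, Toy_NC *ncp)
{
    NCM_FileInfo *info;

    /* Unused in this sketch. */
    (void)basepe; (void)chunksizehintp; (void)use_parallel;
    (void)parameters; (void)table;

    if(path == NULL || ncp == NULL) return TOY_EINVAL;
    if((info = calloc(1, sizeof(*info))) == NULL) return TOY_ENOMEM;

    /* Fill in the instance handed to us by the caller. */
    ncp->mode = mode;
    ncp->dispatchdata = info;
    return TOY_NOERR;
}
```

The matching close function would be responsible for releasing whatever the open function attached to the instance.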

## Accessing Data with put_vara() and get_vara()

````
int (*put_vara)(int ncid, int varid, const size_t *start, const size_t *count,
                const void *value, nc_type memtype);
````

````
int (*get_vara)(int ncid, int varid, const size_t *start, const size_t *count,
                void *value, nc_type memtype);
````

Most of the parameters are similar to the netcdf API parameters. The
last parameter, however, is the type of the data in
memory. Additionally, for data held in a C long, instead of using an
"int islong" parameter, the memtype will be either ::NC_INT or ::NC_INT64,
depending on the value of sizeof(long). This means that even netcdf-3 code
must be prepared to encounter the ::NC_INT64 type.
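
The sizeof(long) rule can be sketched as follows. This is a self-contained illustration rather than the library's code: the `TOY_` constants are local copies mirroring NC_INT and NC_INT64, `toy_get_vara` is a stub that merely records the memtype it receives, and `toy_get_vara_long` is a hypothetical wrapper playing the role of a long-typed API call.

```c
#include <stddef.h>

/* Local copies mirroring NC_INT and NC_INT64 from netcdf.h. */
#define TOY_NC_INT   4
#define TOY_NC_INT64 10

typedef int toy_nc_type;

/* Pick the memtype matching the in-memory size of a C long. */
toy_nc_type toy_long_memtype(void)
{
    return (sizeof(long) == sizeof(int)) ? TOY_NC_INT : TOY_NC_INT64;
}

/* Records the memtype seen by the stub below. */
static toy_nc_type last_memtype = 0;

/* Stub with the same parameter shape as the get_vara entry. */
static int toy_get_vara(int ncid, int varid, const size_t *start,
                        const size_t *count, void *value, toy_nc_type memtype)
{
    (void)ncid; (void)varid; (void)start; (void)count; (void)value;
    last_memtype = memtype;
    return 0;
}

/* A long-typed wrapper passes a memtype instead of an "islong" flag. */
int toy_get_vara_long(int ncid, int varid, const size_t *start,
                      const size_t *count, long *value)
{
    return toy_get_vara(ncid, varid, start, count, value, toy_long_memtype());
}
```

On a platform where long is 64 bits wide, the stub sees the 64-bit memtype even if the file itself is a classic one, which is why a dispatch layer has to convert rather than reject that type.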

## Accessing Attributes with put_att() and get_att()

````
int (*get_att)(int ncid, int varid, const char *name,
               void *value, nc_type memtype);
````

````
int (*put_att)(int ncid, int varid, const char *name, nc_type datatype, size_t len,
               const void *value, nc_type memtype);
````

Again, the key difference is the memtype parameter. As with
put/get_vara, it uses ::NC_INT64 to encode the long case.

## Pre-defined Dispatch Functions

It is sometimes not necessary to implement all the functions in the
dispatch table. Some pre-defined functions are available which may be
used in many cases.

## Inquiry Functions

Many of the netCDF inquiry functions operate from an in-memory model of
metadata. Once a file is opened, or a file is created, this
in-memory metadata model is kept up to date. Consequently the inquiry
functions do not depend on the dispatch layer code. These functions
can be used by all dispatch layers which use the internal netCDF
enhanced data model.

- NC4_inq
- NC4_inq_type
- NC4_inq_dimid
- NC4_inq_dim
- NC4_inq_unlimdim
- NC4_inq_att
- NC4_inq_attid
- NC4_inq_attname
- NC4_get_att
- NC4_inq_varid
- NC4_inq_var_all
- NC4_show_metadata
- NC4_inq_unlimdims
- NC4_inq_ncid
- NC4_inq_grps
- NC4_inq_grpname
- NC4_inq_grpname_full
- NC4_inq_grp_parent
- NC4_inq_grp_full_ncid
- NC4_inq_varids
- NC4_inq_dimids
- NC4_inq_typeids
- NC4_inq_type_equal
- NC4_inq_user_type
- NC4_inq_typeid

## NCDEFAULT get/put Functions

The mapped (varm) get/put functions have been
implemented in terms of the array (vara) functions. So dispatch layers
need only implement the vara functions, and can use the following
functions to get the varm functions:

- NCDEFAULT_get_varm
- NCDEFAULT_put_varm

For the netcdf-3 format, the strided functions (nc_get/put_vars)
are similarly implemented in terms of the vara functions. So the following
convenience functions are available.

- NCDEFAULT_get_vars
- NCDEFAULT_put_vars

For the netcdf-4 format, the vars functions actually exist, so
the default vars functions are not used.
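
The idea of building a strided read out of vara calls can be shown with a deliberately naive, one-dimensional sketch. It is not how NCDEFAULT_get_vars is written; `toy_get_vara` and `toy_get_vars` are hypothetical, the data live in a fixed array, and the `TOY_` error values are local copies mirroring NC_EEDGE and NC_ESTRIDE.

```c
#include <stddef.h>

#define TOY_NOERR   0
#define TOY_EEDGE   (-57) /* mirrors NC_EEDGE */
#define TOY_ESTRIDE (-58) /* mirrors NC_ESTRIDE */

#define TOY_LEN 10
static const int toy_data[TOY_LEN] = {0, 10, 20, 30, 40, 50, 60, 70, 80, 90};

/* Toy 1-D get_vara: copy count[0] contiguous ints starting at start[0]. */
static int toy_get_vara(const size_t *start, const size_t *count, int *value)
{
    size_t i;
    if(start[0] + count[0] > TOY_LEN) return TOY_EEDGE;
    for(i = 0; i < count[0]; i++) value[i] = toy_data[start[0] + i];
    return TOY_NOERR;
}

/* Strided read expressed as a sequence of single-element vara reads. */
int toy_get_vars(const size_t *start, const size_t *count,
                 const ptrdiff_t *stride, int *value)
{
    const size_t one[1] = {1};
    size_t i;

    if(stride[0] <= 0) return TOY_ESTRIDE;
    for(i = 0; i < count[0]; i++) {
        size_t pos[1];
        int stat;
        pos[0] = start[0] + i * (size_t)stride[0];
        if((stat = toy_get_vara(pos, one, &value[i]))) return stat;
    }
    return TOY_NOERR;
}
```

A dispatch layer that supplies only the vara operation gets the strided behaviour from a generic routine of this kind, at the cost of many small reads.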

## Read-Only Functions

Some dispatch layers are read-only (e.g., HDF4). Any function which
writes to a file, including nc_create(), needs to return error code
::NC_EPERM. The following read-only functions are available so that
these don't have to be re-implemented in each read-only dispatch layer:

- NC_RO_create
- NC_RO_redef
- NC_RO__enddef
- NC_RO_sync
- NC_RO_set_fill
- NC_RO_def_dim
- NC_RO_rename_dim
- NC_RO_rename_att
- NC_RO_del_att
- NC_RO_put_att
- NC_RO_def_var
- NC_RO_rename_var
- NC_RO_put_vara
- NC_RO_def_var_fill
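
The pattern behind these functions is simple enough to sketch. The block below is a self-contained toy, not the library source: the `TOY_RO_` stubs imitate the spirit of the NC_RO_XXX functions, `Toy_Dispatch` is a hypothetical three-entry table, and `TOY_EPERM` is a local copy mirroring NC_EPERM.

```c
#include <stddef.h>

#define TOY_NOERR 0
#define TOY_EPERM (-37) /* mirrors NC_EPERM in netcdf.h */

/* Shared stubs: every write-style operation just reports "read only". */
int TOY_RO_redef(int ncid)
{
    (void)ncid;
    return TOY_EPERM;
}

int TOY_RO_def_dim(int ncid, const char *name, size_t len, int *idp)
{
    (void)ncid; (void)name; (void)len; (void)idp;
    return TOY_EPERM;
}

/* The one operation this toy read-only layer really implements. */
static int TOY_inq_ndims(int ncid, int *ndimsp)
{
    (void)ncid;
    *ndimsp = 0;
    return TOY_NOERR;
}

/* Hypothetical miniature dispatch table. */
typedef struct Toy_Dispatch {
    int (*redef)(int ncid);
    int (*def_dim)(int ncid, const char *name, size_t len, int *idp);
    int (*inq_ndims)(int ncid, int *ndimsp);
} Toy_Dispatch;

/* A read-only layer fills its write slots with the shared stubs. */
const Toy_Dispatch toy_ro_table = { TOY_RO_redef, TOY_RO_def_dim, TOY_inq_ndims };
```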

## Classic NetCDF Only Functions

There are two functions that are only used in the classic code. All
other dispatch layers (except PnetCDF) return error ::NC_ENOTNC3 for
these functions. The following functions are provided for this
purpose:

- NC_NOTNC3_inq_base_pe
- NC_NOTNC3_set_base_pe

# Appendix A. HDF4 Dispatch Layer as a Simple Example

The HDF4 dispatch layer is about the simplest possible dispatch
layer. It is read-only and uses the classic model. It will serve as a nice,
simple example of a dispatch layer.

Note that the HDF4 layer is optional in the netCDF build. Not all
users will have HDF4 installed, and those users will not build with
the HDF4 dispatch layer enabled. For this reason HDF4 code is guarded
as follows.
````
#ifdef USE_HDF4
...
#endif /*USE_HDF4*/
````
Code in libhdf4 is only compiled if HDF4 is
turned on in the build.

## Header File Changes

Adding the HDF4 dispatch table will first require changes to
a number of header files.

### The netcdf.h File

In the main netcdf.h file, we add the following
to the list of NC_FORMATX_XXX definitions:
````
#define NC_FORMATX_NC_HDF4 (3)
````

### The ncdispatch.h File

In ncdispatch.h we add the following:

````
#ifdef USE_HDF4
extern NC_Dispatch* HDF4_dispatch_table;
extern int HDF4_initialize(void);
extern int HDF4_finalize(void);
#endif
````

### The netcdf_meta.h File

The netcdf_meta.h file allows for easy determination of what features
are in use. For HDF4, the following is added -- as set by *./configure*:
````
#define NC_HAS_HDF4 0 /*!< HDF4 support. */
````

### The hdf4dispatch.h File

The file *hdf4dispatch.h* contains prototypes and
macro definitions used within the HDF4 code in libhdf4. This include
file should not be used anywhere except in libhdf4. It can be kept
in either the *include* directory or (preferably) the *libhdf4* directory.

### Initialization Code Changes in liblib Directory

The file *nc_initialize.c* is modified to include the following:
````
#ifdef USE_HDF4
extern int HDF4_initialize(void);
extern int HDF4_finalize(void);
#endif
````

### Changes to libdispatch/dfile.c

In order for a dispatch layer to be used, it must be correctly
determined in functions *NC_open()* or *NC_create()* in *libdispatch/dfile.c*.
HDF4 has a magic number that is detected in
*NC_interpret_magic_number()*, which allows *NC_open* to automatically
detect an HDF4 file.

Once HDF4 is detected, the *model* variable is set to *NC_FORMATX_NC_HDF4*,
and later this is used in a case statement:
````
case NC_FORMATX_NC_HDF4:
    dispatcher = HDF4_dispatch_table;
    break;
````

This sets the dispatcher to the HDF4 dispatcher, which is defined in
the libhdf4 directory.
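
For context, the surrounding selection logic can be sketched as a small function that maps a model value to a table. This is a self-contained illustration, not the code in *libdispatch/dfile.c*: `Toy_Dispatch`, `toy_choose_dispatcher`, and `TOY_USE_HDF4` are hypothetical, and the `TOY_FORMATX_` constants are local copies of the corresponding NC_FORMATX values.

```c
#include <stddef.h>

/* Local copies mirroring NC_FORMATX_NC3 and NC_FORMATX_NC_HDF4. */
#define TOY_FORMATX_NC3     1
#define TOY_FORMATX_NC_HDF4 3

#define TOY_USE_HDF4 1 /* assumption: stands in for the USE_HDF4 build flag */

/* Reduced to the model field; real tables also hold the operations. */
typedef struct Toy_Dispatch { int model; } Toy_Dispatch;

static const Toy_Dispatch toy_nc3_table = { TOY_FORMATX_NC3 };
#ifdef TOY_USE_HDF4
static const Toy_Dispatch toy_hdf4_table = { TOY_FORMATX_NC_HDF4 };
#endif

/* Map an inferred model to its table; NULL means "not available". */
const Toy_Dispatch *toy_choose_dispatcher(int model)
{
    const Toy_Dispatch *dispatcher = NULL;

    switch(model) {
    case TOY_FORMATX_NC3:
        dispatcher = &toy_nc3_table;
        break;
#ifdef TOY_USE_HDF4
    case TOY_FORMATX_NC_HDF4:
        dispatcher = &toy_hdf4_table;
        break;
#endif
    default:
        break; /* unknown model, or support compiled out */
    }
    return dispatcher;
}
```

Guarding the case with the same flag that guards the table keeps a build without HDF4 from referencing a symbol that does not exist.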

### Dispatch Table in libhdf4/hdf4dispatch.c

The file *hdf4dispatch.c* contains the definition of the HDF4 dispatch
table. It looks like this:
````
/* This is the dispatch object that holds pointers to all the
 * functions that make up the HDF4 dispatch interface. */
static NC_Dispatch HDF4_dispatcher = {
    NC_FORMATX_NC_HDF4, /* The model identifier */
    NC_DISPATCH_VERSION, /* The version of this dispatch table */
    NC_RO_create,
    NC_HDF4_open,
    NC_RO_redef,
    NC_RO__enddef,
    NC_RO_sync,
    ...
    NC_NOTNC4_set_var_chunk_cache,
    NC_NOTNC4_get_var_chunk_cache,
    ...
};
````
Note that most entries use the predefined dispatch
functions. Functions that start with NC_RO are read-only stubs that return
::NC_EPERM. Functions that start with NC_NOTNC4 return ::NC_ENOTNC4.

Only the functions that start with NC_HDF4 need to be implemented for
the HDF4 dispatch layer. There are 6 such functions:

- NC_HDF4_open
- NC_HDF4_abort
- NC_HDF4_close
- NC_HDF4_inq_format
- NC_HDF4_inq_format_extended
- NC_HDF4_get_vara

### HDF4 Reading Code

The code in *hdf4file.c* opens the HDF4 SD dataset, and reads the
metadata. This metadata is stored in the netCDF internal metadata
model, allowing the inq functions to work.

The code in *hdf4var.c* implements *nc_get_vara()* for the HDF4 SD
dataset. This is all that is needed for all the nc_get_* functions to
work.

# Appendix B. Changing NC_DISPATCH_VERSION

When new entries are added to the *struct NC_Dispatch* type (located in *include/netcdf_dispatch.h.in*), it is necessary to do two things.

1. Bump the NC_DISPATCH_VERSION number.
2. Modify the existing dispatch tables to include the new entries.
It is often the case that the new entries do not mean anything for
a given dispatch table. In that case, the new entries may be set to
some variant of *NC_RO_XXX*, *NC_NOTNC4_XXX*, or *NC_NOTNC3_XXX*.

Modifying the dispatch version requires two steps:
1. Modify the version number in *netcdf-c/configure.ac*, and
2. Modify the version number in *netcdf-c/CMakeLists.txt*.

The two should agree in value.

## NC_DISPATCH_VERSION Incompatibility

When dynamically adding a dispatch table
-- in nc_def_user_format (see libdispatch/dfile.c) --
the version of the new table is compared with that of the built-in
NC_DISPATCH_VERSION; if they differ, then an error is returned from
that function.

# Appendix C. Inferring the Dispatch Table

As mentioned above, the dispatch table is inferred using the following
information:
1. The mode argument
2. The file path/URL
3. The file contents (when available)

The primary function for doing this inference is in the file
*libdispatch/dinfermodel.c* via the API in *include/ncmodel.h*.
The term *model* is used here to include (at least) the following
information (see the structure type *NCmodel* in *include/ncmodel.h*).

1. impl -- this is an NC_FORMATX_XXX value defining, in effect, the
   dispatch table to use.
2. format -- this is an NC_FORMAT_XXX value defining the API to support: netcdf classic or netcdf enhanced.

The construction of the model is primarily carried out by the function
*NC_infermodel()* (in *libdispatch/dinfermodel.c*).
It is given the following parameters:
1. path -- (IN) absolute file path or URL
2. modep -- (IN/OUT) the set of mode flags given to *NC_open* or *NC_create*.
3. iscreate -- (IN) distinguish open from create.
4. useparallel -- (IN) indicate if parallel IO can be used.
5. params -- (IN/OUT) arbitrary data dependent on the mode and path.
6. model -- (IN/OUT) place to store inferred model.
7. newpathp -- (OUT) the canonical rewrite of the path argument.

As a rule, these values are used in this order to infer the model.
1. file contents -- highest precedence
2. url (if it is one) -- using the "mode=" key in the fragment (see below).
3. mode flags
4. default format -- lowest precedence
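
The precedence order above can be pictured with a very small sketch. It assumes, purely for illustration, that each source of evidence has already been reduced to a candidate impl value, with 0 meaning "no opinion"; `Toy_Hints` and `toy_infer_impl` are hypothetical names, and the real *NC_infermodel()* is considerably more involved.

```c
#include <stddef.h>

#define TOY_UNDEFINED 0 /* assumption: 0 means "this source gave no answer" */

/* Candidate impl values, one per source of evidence. */
typedef struct Toy_Hints {
    int from_contents; /* e.g. derived from a magic number */
    int from_url;      /* e.g. derived from "mode=" tags in the fragment */
    int from_mode;     /* e.g. derived from the open/create mode flags */
} Toy_Hints;

/* Return the first available candidate, highest precedence first. */
int toy_infer_impl(const Toy_Hints *h, int default_impl)
{
    if(h == NULL) return default_impl;
    if(h->from_contents != TOY_UNDEFINED) return h->from_contents;
    if(h->from_url != TOY_UNDEFINED) return h->from_url;
    if(h->from_mode != TOY_UNDEFINED) return h->from_mode;
    return default_impl;
}
```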

If the path appears to be a URL, then it is parsed.
Information is extracted from the URL, and specifically,
the fragment key "mode=" is the critical element.
The URL will be rewritten to a canonical form with the following
changes.
1. The fragment part ("#..." at the end) is parsed and the "mode=" key
   is extracted and its value is converted to a list of tags.
2. If the leading protocol is not http/https, then the protocol is added
   to the mode list. That protocol is then replaced with either http or https.
3. Certain singleton values in the fragment are extracted and removed
   and added to the mode list. Consider, for example, "http://....#dap4".
   The "dap4" singleton is removed and added to the mode list.
4. For backward compatibility, the values of "proto=" and "protocol="
   are removed from the fragment and their value is added to the mode list.
5. The final mode list is converted to a comma separated string
   and re-inserted into the fragment.
6. The final mode list is modified to remove duplicates.

The final result is the canonical form of the URL and is returned in the
newpathp argument described above.

The mode list is then used as part of the inference process to choose
a dispatch table.
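
The rewriting steps can be approximated by the following self-contained sketch. It is a simplification under stated assumptions, not the code in *libdispatch/dinfermodel.c*: `toy_canonical_url` is a hypothetical function, every singleton in the fragment is treated as a mode tag (the real code accepts only certain ones), a non-http protocol is always replaced with https, duplicates are dropped while the list is being built, and the tag order in the output is simply the order of discovery.

```c
#include <stdio.h>
#include <string.h>

#define MAXTAGS 16
#define TAGLEN  32

typedef struct TagList { char tags[MAXTAGS][TAGLEN]; int n; } TagList;

/* Append tag[0..len) unless it is empty, too long, or already present. */
static void add_tag(TagList *l, const char *tag, size_t len)
{
    int i;
    if(len == 0 || len >= TAGLEN || l->n >= MAXTAGS) return;
    for(i = 0; i < l->n; i++)
        if(strlen(l->tags[i]) == len && strncmp(l->tags[i], tag, len) == 0) return;
    memcpy(l->tags[l->n], tag, len);
    l->tags[l->n][len] = '\0';
    l->n++;
}

/* Add every comma-separated element of s[0..len). */
static void add_list(TagList *l, const char *s, size_t len)
{
    size_t i, begin = 0;
    for(i = 0; i <= len; i++)
        if(i == len || s[i] == ',') { add_tag(l, s + begin, i - begin); begin = i + 1; }
}

/* Write a canonical form of url into out; return 0 on success, -1 otherwise. */
int toy_canonical_url(const char *url, char *out, size_t outlen)
{
    TagList modes;
    char rest[256] = ""; /* other key=value pairs, each prefixed with '&' */
    const char *sep = strstr(url, "://");
    const char *hash, *p;
    size_t protolen, pos;
    int i, n, keep;

    modes.n = 0;
    if(sep == NULL) return -1;                 /* not a URL */
    hash = strchr(sep, '#');
    if(hash == NULL) hash = url + strlen(url); /* no fragment */
    p = (*hash == '#') ? hash + 1 : hash;

    while(*p != '\0') {                        /* walk the '&'-separated fragment */
        size_t len = strcspn(p, "&");
        if(strncmp(p, "mode=", 5) == 0) add_list(&modes, p + 5, len - 5);
        else if(strncmp(p, "proto=", 6) == 0) add_list(&modes, p + 6, len - 6);
        else if(strncmp(p, "protocol=", 9) == 0) add_list(&modes, p + 9, len - 9);
        else if(memchr(p, '=', len) == NULL) add_tag(&modes, p, len); /* singleton */
        else {                                 /* unrelated pair: pass it through */
            size_t used = strlen(rest);
            if(used + len + 2 > sizeof(rest)) return -1;
            rest[used] = '&';
            memcpy(rest + used + 1, p, len);
            rest[used + 1 + len] = '\0';
        }
        p += len;
        if(*p == '&') p++;
    }

    protolen = (size_t)(sep - url);
    keep = (protolen == 4 && strncmp(url, "http", 4) == 0)
        || (protolen == 5 && strncmp(url, "https", 5) == 0);
    if(!keep) add_tag(&modes, url, protolen);  /* remember the original protocol */

    if(keep) n = snprintf(out, outlen, "%.*s", (int)(hash - url), url);
    else     n = snprintf(out, outlen, "https%.*s", (int)(hash - sep), sep);
    if(n < 0 || (size_t)n >= outlen) return -1;
    pos = (size_t)n;

    for(i = 0; i < modes.n; i++) {             /* rebuild "#mode=a,b,c" */
        n = snprintf(out + pos, outlen - pos, "%s%s", i == 0 ? "#mode=" : ",", modes.tags[i]);
        if(n < 0 || (size_t)n >= outlen - pos) return -1;
        pos += (size_t)n;
    }
    if(modes.n == 0 && rest[0] != '\0') rest[0] = '#';
    n = snprintf(out + pos, outlen - pos, "%s", rest);
    return (n < 0 || (size_t)n >= outlen - pos) ? -1 : 0;
}
```

Under these assumptions, "dap4://host/path#mode=nczarr,s3&dap4" is rewritten to "https://host/path#mode=nczarr,s3,dap4": the singleton and the protocol both contribute the tag dap4, which is kept only once.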

# Point of Contact {#dispatch_poc}

*Author*: Dennis Heimbigner<br>
*Email*: dmh at ucar dot edu<br>
*Initial Version*: 12/22/2021<br>
*Last Revised*: 11/15/2022