Internal Dispatch Table Architecture
============================
<!-- double header is needed to work around doxygen bug -->

# Internal Dispatch Table Architecture

# Introduction {#dispatch_intro}

The netcdf-c library uses an internal dispatch mechanism
as the means for wrapping the netcdf-c API around a wide variety
of underlying storage and stream data formats.
As of last check, the following formats are supported and each
has its own dispatch table.

Warning: some of the listed function signatures may be out of date
and the specific code should be consulted to see the actual parameters.

<table>
<tr><th>Format<th>Directory<th>NC_FORMATX Name
<tr><td>NetCDF-classic<td>libsrc<td>NC_FORMATX_NC3
<tr><td>NetCDF-enhanced<td>libhdf5<td>NC_FORMATX_NC_HDF5
<tr><td>HDF4<td>libhdf4<td>NC_FORMATX_NC_HDF4
<tr><td>PNetCDF<td>libsrcp<td>NC_FORMATX_PNETCDF
<tr><td>DAP2<td>libdap2<td>NC_FORMATX_DAP2
<tr><td>DAP4<td>libdap4<td>NC_FORMATX_DAP4
<tr><td>UDF0<td>N.A.<td>NC_FORMATX_UDF0
<tr><td>UDF1<td>N.A.<td>NC_FORMATX_UDF1
<tr><td>NCZarr<td>libnczarr<td>NC_FORMATX_NCZARR
</table>

Note that UDF0 and UDF1 allow for user-defined dispatch tables to
be implemented.

The idea is that when a user opens or creates a netcdf file, a
specific dispatch table is chosen. A dispatch table is a struct
containing an entry for (almost) every function in the netcdf-c API.
During execution, netcdf API calls are channeled through that
dispatch table to the appropriate function for implementing that
API call. The functions in the dispatch table are not quite the
same as those defined in *netcdf.h*. For simplicity and
compactness, some netcdf.h API calls are mapped to the same
dispatch table function. In addition to the functions, the first
entry in the table defines the model that this dispatch table
implements. It will be one of the NC_FORMATX_XXX values.
The second entry in the table is the version of the dispatch table.
The rule is that previous entries may not be removed, but new entries
may be added, and adding new entries increases the version number.

The dispatch table represents a distillation of the netcdf API down to
a minimal set of internal operations. The format of the dispatch table
(*struct NC_Dispatch*) is defined in the file *include/netcdf_dispatch.h.in*,
from which *include/netcdf_dispatch.h* is generated. Every new dispatch
table must define this minimal set of operations.
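
To make the layout concrete, here is a minimal, self-contained sketch of the pattern. It is not the real *NC_Dispatch*: the struct `Sketch_Dispatch`, its two operations, the `NCM_*` functions, and all constant values are hypothetical placeholders. It mirrors the shape described above (model first, version second, then function pointers) and shows an API-level wrapper forwarding a call through whichever table was recorded for the file.

```c
#include <stddef.h>

/* Placeholder values; real code uses NC_FORMATX_XXX and NC_DISPATCH_VERSION. */
#define SKETCH_FORMATX_NCM      7
#define SKETCH_DISPATCH_VERSION 1
#define SKETCH_NOERR            0
#define SKETCH_EBADID           (-1)

/* A drastically reduced, hypothetical dispatch table. */
typedef struct Sketch_Dispatch {
    int model;    /* first entry: the model this table implements */
    int version;  /* second entry: the dispatch table version */
    int (*inq_ndims)(int ncid, int *ndimsp);
    int (*close)(int ncid);
} Sketch_Dispatch;

/* Hypothetical in-memory ("NCM") implementations of the two operations. */
static int NCM_inq_ndims(int ncid, int *ndimsp) {
    (void)ncid;
    *ndimsp = 2; /* pretend every in-memory dataset has two dimensions */
    return SKETCH_NOERR;
}

static int NCM_close(int ncid) {
    (void)ncid;
    return SKETCH_NOERR;
}

static const Sketch_Dispatch NCM_dispatcher = {
    SKETCH_FORMATX_NCM,
    SKETCH_DISPATCH_VERSION,
    NCM_inq_ndims,
    NCM_close,
};

/* Per-file record: remembers which table was chosen at open/create time. */
typedef struct Sketch_File {
    int ncid;
    const Sketch_Dispatch *dispatch;
} Sketch_File;

/* API-level wrappers know nothing about the format; they only forward. */
static int sketch_inq_ndims(const Sketch_File *f, int *ndimsp) {
    if (f == NULL || f->dispatch == NULL) return SKETCH_EBADID;
    return f->dispatch->inq_ndims(f->ncid, ndimsp);
}

static int sketch_close(const Sketch_File *f) {
    if (f == NULL || f->dispatch == NULL) return SKETCH_EBADID;
    return f->dispatch->close(f->ncid);
}
```

Because the wrappers only dereference the table, adding a new format means writing a new table, not touching the wrappers.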

# Adding a New Dispatch Table
In order to make this process concrete, let us assume we plan to add
an in-memory implementation of netcdf-3.

## Defining configure.ac flags

Define an *--enable* flag option for *configure.ac*. For our
example, we assume the option "--enable-ncm" and the
corresponding internal flag "enable_ncm". If you examine the existing
*configure.ac* and see how, for example, *--enable-dap* is
defined, then it should be clear how to do it for your code.

## Defining a "name space"

Choose some prefix of characters to identify the new dispatch
system. In effect we are defining a name-space. For our in-memory
system, we will choose "NCM" and "ncm". NCM is used for non-static
procedures to be entered into the dispatch table and ncm for all other
non-static procedures. Note that the chosen prefix should probably start
with "nc" or "NC" in order to avoid name conflicts outside the netcdf-c library.

## Extend include/netcdf.h

Modify the file *include/netcdf.h* to add an NC_FORMATX_XXX flag
for this dispatch format at the appropriate place.

    #define NC_FORMATX_NCM 7

Add any format specific new error codes.

## Extend include/ncdispatch.h

Modify the file *include/ncdispatch.h* to
add format specific data and initialization functions;
note the use of our NCM namespace.

    extern NC_Dispatch* NCM_dispatch_table;
    extern int NCM_initialize(void);

## Define the dispatch table functions

Define the functions necessary to fill in the dispatch table. As a
rule, we assume that a new directory is defined, *libsrcm*, say. Within
this directory, we need to define *Makefile.am* and *CMakeLists.txt*.
We also need to define the source files
containing the dispatch table and the functions to be placed in the
dispatch table (call them *ncmdispatch.c* and *ncmdispatch.h*). Look at
*libsrc/nc3dispatch.[ch]* or *libnczarr/zdispatch.[ch]* for examples.

Similarly, it is best to take existing *Makefile.am* and *CMakeLists.txt*
files (from *libsrcp* for example) and modify them.

## Adding the dispatch code to libnetcdf

Provide for the inclusion of this library in the final libnetcdf
library. This is accomplished by modifying *liblib/Makefile.am* by
adding something like the following.

    libnetcdf_la_LIBADD += $(top_builddir)/libsrcm/libnetcdfm.la

## Extend library initialization

Modify the *NC_initialize* function in *liblib/nc_initialize.c* by adding
appropriate references to the NCM initialization function.

    extern int NCM_initialize(void);

    int NC_initialize(void)
    {
        ...
        if((stat = NCM_initialize())) return stat;
        ...
    }

Finalization is handled in an analogous fashion.
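
As a hedged illustration of "analogous", the sketch below drives a list of per-layer initialize/finalize pairs. The `Sketch_Layer` table, the `*_sketch` functions, and the choice to finalize in reverse order are assumptions made for this example; *nc_initialize.c* calls each layer's functions directly rather than through a table.

```c
#include <stddef.h>
#include <string.h>

static char call_log[64]; /* records the order of init/finalize calls */

static int log_call(const char *tag) {
    strncat(call_log, tag, sizeof(call_log) - strlen(call_log) - 1);
    return 0; /* 0 plays the role of NC_NOERR */
}

/* Hypothetical per-layer hooks; real code calls e.g. NCM_initialize() directly. */
static int NC3_init_sketch(void)  { return log_call("I3 "); }
static int NC3_final_sketch(void) { return log_call("F3 "); }
static int NCM_init_sketch(void)  { return log_call("IM "); }
static int NCM_final_sketch(void) { return log_call("FM "); }

typedef struct {
    int (*initialize)(void);
    int (*finalize)(void);
} Sketch_Layer;

static const Sketch_Layer layers[] = {
    { NC3_init_sketch, NC3_final_sketch },
    { NCM_init_sketch, NCM_final_sketch },  /* the new NCM layer */
};
#define NLAYERS (sizeof(layers) / sizeof(layers[0]))

/* Initialize each layer in order and stop at the first failure,
 * matching the "if((stat = NCM_initialize())) return stat;" pattern. */
static int sketch_initialize(void) {
    for (size_t i = 0; i < NLAYERS; i++) {
        int stat = layers[i].initialize();
        if (stat) return stat;
    }
    return 0;
}

/* Finalize in reverse order (an assumption of this sketch), keeping the
 * first error but still giving every layer a chance to clean up. */
static int sketch_finalize(void) {
    int first_err = 0;
    for (size_t i = NLAYERS; i-- > 0;) {
        int stat = layers[i].finalize();
        if (stat && !first_err) first_err = stat;
    }
    return first_err;
}
```

Tearing layers down in reverse is a common convention when a later layer may depend on an earlier one; whether netcdf-c needs it is not stated here.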

## Testing the new dispatch table

Typically, tests for a new dispatcher are kept in a separate directory
with a related name. For our running example, it might be *ncm_test*.
The file *ncm_test/Makefile.am*
will look something like this.

    # These files are created by the tests.

    # These are the tests which are always run.
    TESTPROGRAMS = test1 test2 ...
    test1_SOURCES = test1.c ...

    check_PROGRAMS = $(TESTPROGRAMS)
    TESTS = $(TESTPROGRAMS)
    # Any extra files required by the tests

## Top-Level build of the dispatch code

Provide for *libnetcdfm* to be constructed by adding the following to
the top-level *Makefile.am*.

    SUBDIRS = ... $(DISPATCHDIR) $(NCM) ... $(NCMTESTDIR)

# Choosing a Dispatch Table

The dispatch table is ultimately chosen by the function
NC_infermodel() in libdispatch/dinfermodel.c. This function is
invoked by the NC_create and the NC_open procedures. Unfortunately,
this can be a complex process. The detailed operation of
NC_infermodel() is defined in the companion document in docs/dinternal.md.

In any case, the choice of dispatch table is currently based on the following
pieces of information.

1. The mode argument -- this can be used to detect, for example, what kind
   of file to create: netcdf-3, netcdf-4, 64-bit netcdf-3, etc.
   Using a mode flag is the most common mechanism, in which case
   *netcdf.h* needs to be modified to define the relevant mode flag.

2. The file path -- this can be used to detect, for example, a DAP url
   versus a normal file system file. If the path looks like a URL, then
   the fragment part of the URL is examined to determine the specific
   dispatch table.

3. The file contents -- when the contents of a real file are available,
   the contents of the file can be used to determine the dispatch table.
   As a rule, this is likely to be useful only for *nc_open*.

4. Whether the file is being opened or created.

5. Whether parallel IO is available.

The *NC_infermodel* function returns two values.

1. model -- this is used by nc_open and nc_create to choose the dispatch table.
2. newpath -- in some cases, usually URLs, the path may be rewritten to include extra information for use by the dispatch functions.
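
The following greatly simplified sketch shows how these pieces of information can be combined. The magic numbers are the real leading signatures of netCDF classic files ("CDF" followed by byte 1, 2, or 5), HDF5, and HDF4, and `SK_MODE_NETCDF4` uses the value of `NC_NETCDF4`. Everything else is hypothetical: the `Sk_Model` enum, the function names, the URL handling (which only looks for "dap4" in the fragment), and the fixed order used here (contents, then URL, then mode flags, then a classic default).

```c
#include <stddef.h>
#include <string.h>

/* Hypothetical model ids standing in for NC_FORMATX_XXX values. */
typedef enum {
    SK_MODEL_UNDEFINED = 0,
    SK_MODEL_NC3,
    SK_MODEL_HDF5,
    SK_MODEL_HDF4,
    SK_MODEL_DAP2,
    SK_MODEL_DAP4
} Sk_Model;

#define SK_MODE_NETCDF4 0x1000  /* same value as NC_NETCDF4 in netcdf.h */

/* Classify a file from its first bytes. Only offset 0 is checked for the
 * HDF5 signature, although HDF5 also allows it at 512, 1024, 2048, ... */
static Sk_Model model_from_magic(const unsigned char *buf, size_t n) {
    static const unsigned char hdf5[8] = {0x89, 'H', 'D', 'F', '\r', '\n', 0x1a, '\n'};
    static const unsigned char hdf4[4] = {0x0e, 0x03, 0x13, 0x01};
    if (n >= 8 && memcmp(buf, hdf5, 8) == 0) return SK_MODEL_HDF5;
    if (n >= 4 && memcmp(buf, hdf4, 4) == 0) return SK_MODEL_HDF4;
    if (n >= 4 && memcmp(buf, "CDF", 3) == 0 &&
        (buf[3] == 1 || buf[3] == 2 || buf[3] == 5))
        return SK_MODEL_NC3;  /* classic, 64-bit offset, or CDF-5 */
    return SK_MODEL_UNDEFINED;
}

static Sk_Model infer_model_sketch(const char *path, int mode, int iscreate,
                                   const unsigned char *header, size_t nheader) {
    /* 1. File contents: only meaningful when opening an existing file. */
    if (!iscreate && header != NULL) {
        Sk_Model m = model_from_magic(header, nheader);
        if (m != SK_MODEL_UNDEFINED) return m;
    }
    /* 2. Path: a URL selects a remote protocol (greatly simplified). */
    if (strstr(path, "://") != NULL) {
        const char *frag = strchr(path, '#');
        return (frag != NULL && strstr(frag, "dap4") != NULL) ? SK_MODEL_DAP4
                                                              : SK_MODEL_DAP2;
    }
    /* 3. Mode flags. */
    if (mode & SK_MODE_NETCDF4) return SK_MODEL_HDF5;
    /* 4. Default. */
    return SK_MODEL_NC3;
}
```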

# Special Dispatch Table Signatures

The entries in the dispatch table do not necessarily correspond
to the external API. In many cases, multiple related API functions
are merged into a single dispatch table entry.

The create table entry and the open table entry in the dispatch table
have the following signatures respectively.

    int (*create)(const char *path, int cmode,
                  size_t initialsz, int basepe, size_t *chunksizehintp,
                  int useparallel, void* parameters,
                  struct NC_Dispatch* table, NC* ncp);

    int (*open)(const char *path, int mode,
                int basepe, size_t *chunksizehintp,
                int use_parallel, void* parameters,
                struct NC_Dispatch* table, NC* ncp);

The key difference from the external *nc_create*/*nc_open* API is that
these are the union of all the possible create/open signatures from the
include/netcdfXXX.h files. Note especially the last
three parameters. The parameters argument is a pointer to arbitrary data
to provide extra info to the dispatcher.
The table argument is included in case the create
function (e.g. *NCM_create*) needs to invoke other dispatch
functions. The very last argument, ncp, is a pointer to an NC
instance. The raw NC instance will have been created by *libdispatch/dfile.c*
and is passed to e.g. open with the expectation that it will be filled in
by the dispatch open function.
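
A minimal sketch of that hand-off, assuming heavily simplified types: `Sk_NC`, `Sk_Dispatch`, `NCM_FileInfo`, `NCM_open`, and `sk_open` are hypothetical, and the open entry is trimmed (no basepe, chunksizehintp, or use_parallel). The generic layer zeroes the raw instance and sets the external id; the format's open function then attaches its own state and records the table.

```c
#include <stdlib.h>
#include <string.h>

typedef struct Sk_Dispatch Sk_Dispatch;

/* Hypothetical stand-in for the NC instance created by libdispatch/dfile.c. */
typedef struct Sk_NC {
    int ext_ncid;                /* id handed back to the user */
    const Sk_Dispatch *dispatch; /* table chosen for this file */
    void *dispatchdata;          /* format-specific state owned by the layer */
} Sk_NC;

/* Trimmed open entry: basepe, chunksizehintp and use_parallel are omitted. */
struct Sk_Dispatch {
    int model;
    int (*open)(const char *path, int mode, void *parameters,
                const Sk_Dispatch *table, Sk_NC *ncp);
};

/* Hypothetical per-file state for the in-memory NCM layer. */
typedef struct {
    int mode;
    size_t nvars;
} NCM_FileInfo;

static int NCM_open(const char *path, int mode, void *parameters,
                    const Sk_Dispatch *table, Sk_NC *ncp) {
    NCM_FileInfo *info;
    (void)path;
    (void)parameters;
    info = calloc(1, sizeof *info);
    if (info == NULL) return -1;   /* placeholder for an out-of-memory code */
    info->mode = mode;
    ncp->dispatch = table;         /* remember which table serves this file */
    ncp->dispatchdata = info;      /* fill in the raw instance */
    return 0;
}

static const Sk_Dispatch NCM_table = { 7, NCM_open };

/* Generic layer: build the raw instance, then let the chosen table fill it. */
static int sk_open(const char *path, int mode, int ext_ncid, Sk_NC *ncp) {
    memset(ncp, 0, sizeof *ncp);
    ncp->ext_ncid = ext_ncid;
    return NCM_table.open(path, mode, NULL, &NCM_table, ncp);
}
```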

## Accessing Data with put_vara() and get_vara()

    int (*put_vara)(int ncid, int varid, const size_t *start, const size_t *count,
                    const void *value, nc_type memtype);

    int (*get_vara)(int ncid, int varid, const size_t *start, const size_t *count,
                    void *value, nc_type memtype);

Most of the parameters are similar to the netcdf API parameters. The
last parameter, however, is the type of the data in
memory. Additionally, instead of using an "int islong" parameter, the
memtype will be either ::NC_INT or ::NC_INT64, depending on the value
of sizeof(long). This means that even netcdf-3 code must be prepared
to encounter the ::NC_INT64 type.
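
The sketch below shows what "prepared to encounter ::NC_INT64" can mean for a layer that stores 32-bit integers. The constants copy the values of NC_INT, NC_INT64, NC_NOERR, NC_ERANGE, and NC_EBADTYPE from *netcdf.h* so the block compiles on its own; `sk_long_memtype` and `sk_put_ints` are hypothetical, and skipping out-of-range values is a choice of this sketch, not a statement about netcdf-c's conversion rules.

```c
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* Local copies of netcdf.h values, so the sketch compiles on its own. */
#define SK_NC_INT    4      /* NC_INT */
#define SK_NC_INT64  10     /* NC_INT64 */
#define SK_NOERR     0      /* NC_NOERR */
#define SK_ERANGE    (-60)  /* NC_ERANGE */
#define SK_EBADTYPE  (-45)  /* NC_EBADTYPE */

/* How a long-based API call would describe its buffer to the dispatch layer. */
static int sk_long_memtype(void) {
    return sizeof(long) == 8 ? SK_NC_INT64 : SK_NC_INT;
}

/* Hypothetical netcdf-3 style put that stores 32-bit ints but must accept
 * either memtype. memcpy avoids aliasing a long buffer as int32/int64.
 * Out-of-range values are skipped (left unchanged) and reported. */
static int sk_put_ints(int32_t *dst, const void *value, size_t n, int memtype) {
    const unsigned char *src = value;
    int stat = SK_NOERR;
    for (size_t i = 0; i < n; i++) {
        int64_t v;
        if (memtype == SK_NC_INT64) {
            memcpy(&v, src + i * sizeof(int64_t), sizeof v);
        } else if (memtype == SK_NC_INT) {
            int32_t w;
            memcpy(&w, src + i * sizeof(int32_t), sizeof w);
            v = w;
        } else {
            return SK_EBADTYPE;  /* other memtypes are outside this sketch */
        }
        if (v < INT32_MIN || v > INT32_MAX) {
            stat = SK_ERANGE;
            continue;
        }
        dst[i] = (int32_t)v;
    }
    return stat;
}
```

A caller holding a `long` array passes `sk_long_memtype()` as the memtype, so the same layer code works on platforms where `long` is 4 or 8 bytes.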

## Accessing Attributes with put_att() and get_att()

    int (*get_att)(int ncid, int varid, const char *name,
                   void *value, nc_type memtype);

    int (*put_att)(int ncid, int varid, const char *name, nc_type datatype, size_t len,
                   const void *value, nc_type memtype);

Again, the key difference is the memtype parameter. As with
put/get_vara, it uses ::NC_INT64 to encode the long case.

## Pre-defined Dispatch Functions

It is sometimes not necessary to implement all the functions in the
dispatch table. Some pre-defined functions are available which may be
used instead.

Many of the netCDF inquiry functions operate from an in-memory model of
metadata. Once a file is opened, or a file is created, this
in-memory metadata model is kept up to date. Consequently the inquiry
functions do not depend on the dispatch layer code. These functions
can be used by all dispatch layers which use the internal netCDF

- NC4_inq_grpname_full
- NC4_inq_grp_full_ncid

## NCDEFAULT get/put Functions

The mapped (varm) get/put functions have been
implemented in terms of the array (vara) functions. So dispatch layers
need only implement the vara functions, and can use the predefined
NCDEFAULT functions to get the varm functions.

For the netcdf-3 format, the strided functions (nc_get/put_vars)
are similarly implemented in terms of the vara functions, so
NCDEFAULT convenience functions for vars are also available.

For the netcdf-4 format, the vars functions actually exist, so
the default vars functions are not used.
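
To show the idea of building strided access on top of vara, here is a one-dimensional sketch. It is not netcdf-c's NCDEFAULT code: `Sk_Var1D`, `sk_get_vara`, `sk_default_get_vars`, and the error values are hypothetical, and a general implementation must step an index across every dimension rather than just one.

```c
#include <stddef.h>
#include <string.h>

#define SK_NOERR    0
#define SK_EEDGE    (-1)  /* hypothetical: start+count exceeds the bound */
#define SK_ESTRIDE  (-2)  /* hypothetical: illegal stride */

/* Hypothetical 1-D in-memory variable. */
typedef struct {
    size_t len;
    const double *data;
} Sk_Var1D;

/* The primitive each layer implements: a contiguous (vara-style) read. */
static int sk_get_vara(const Sk_Var1D *v, size_t start, size_t count, double *out) {
    if (start > v->len || count > v->len - start) return SK_EEDGE;
    memcpy(out, v->data + start, count * sizeof(double));
    return SK_NOERR;
}

/* A default strided (vars-style) read built only on vara: element i is
 * read from index start + i*stride with a count of 1. */
static int sk_default_get_vars(const Sk_Var1D *v, size_t start, size_t count,
                               ptrdiff_t stride, double *out) {
    if (stride < 1) return SK_ESTRIDE;
    for (size_t i = 0; i < count; i++) {
        int stat = sk_get_vara(v, start + i * (size_t)stride, 1, &out[i]);
        if (stat != SK_NOERR) return stat;
    }
    return SK_NOERR;
}
```

One vara call per element keeps the default simple and format-agnostic; a layer that can do strided I/O natively, like netcdf-4, would supply its own vars entry instead.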

## Read-Only Functions

Some dispatch layers are read-only (e.g., HDF4). Any function which
writes to a file, including nc_create(), needs to return error code
::NC_EPERM. Predefined read-only functions (the *NC_RO_XXX* family) are
available so that these do not have to be re-implemented in each read-only
dispatch layer.

## Classic NetCDF Only Functions

There are two functions that are only used in the classic code. All
other dispatch layers (except PnetCDF) return error ::NC_ENOTNC3 for
these functions. Predefined *NC_NOTNC3_XXX* functions are provided for this
purpose.

# Appendix A. HDF4 Dispatch Layer as a Simple Example

The HDF4 dispatch layer is about the simplest possible dispatch
layer. It is read-only and uses the classic model, so it serves as a
nice, simple example of a dispatch layer.

Note that the HDF4 layer is optional in the netCDF build. Not all
users will have HDF4 installed, and those users will not build with
the HDF4 dispatch layer enabled. For this reason, HDF4 code is guarded,
and code in libhdf4 is only compiled if HDF4 is
turned on in the build.

## Header File Changes

Adding the HDF4 dispatch table will first require changes to
a number of header files.

### The netcdf.h File

In the main netcdf.h file, we add the following
to the list of NC_FORMATX_XXX definitions:

    #define NC_FORMATX_NC_HDF4 (3)

### The ncdispatch.h File

In ncdispatch.h we add the following:

    extern NC_Dispatch* HDF4_dispatch_table;
    extern int HDF4_initialize(void);
    extern int HDF4_finalize(void);

### The netcdf_meta.h File

The netcdf_meta.h file allows for easy determination of what features
are in use. For HDF4, the following is added, with the value set by *./configure*:

    #define NC_HAS_HDF4 0 /*!< HDF4 support. */

### The hdf4dispatch.h File

The file *hdf4dispatch.h* contains prototypes and
macro definitions used within the HDF4 code in libhdf4. This include
file should not be used anywhere except in libhdf4. It can be kept
in either the *include* directory or (preferably) the *libhdf4* directory.

### Initialization Code Changes in liblib Directory

The file *nc_initialize.c* is modified to include the following:

    extern int HDF4_initialize(void);
    extern int HDF4_finalize(void);

### Changes to libdispatch/dfile.c

In order for a dispatch layer to be used, it must be correctly
determined in functions *NC_open()* or *NC_create()* in *libdispatch/dfile.c*.
HDF4 has a magic number that is detected in
*NC_interpret_magic_number()*, which allows *NC_open* to automatically
detect an HDF4 file.

Once HDF4 is detected, the *model* variable is set to *NC_FORMATX_NC_HDF4*,
and later this is used in a case statement:

    case NC_FORMATX_NC_HDF4:
        dispatcher = HDF4_dispatch_table;

This sets the dispatcher to the HDF4 dispatcher, which is defined in
the libhdf4 directory.

### Dispatch Table in libhdf4/hdf4dispatch.c

The file *hdf4dispatch.c* contains the definition of the HDF4 dispatch
table. It looks like this:

    /* This is the dispatch object that holds pointers to all the
     * functions that make up the HDF4 dispatch interface. */
    static NC_Dispatch HDF4_dispatcher = {
        NC_FORMATX_NC_HDF4,   /* The model identifier */
        NC_DISPATCH_VERSION,  /* The version of this dispatch table */
        ...
        NC_NOTNC4_set_var_chunk_cache,
        NC_NOTNC4_get_var_chunk_cache,
        ...
    };

Note that most entries use predefined dispatch
functions. Functions that start with NC_RO* are read-only stubs that
return ::NC_EPERM. Functions that start with NC_NOTNC4* return ::NC_ENOTNC4.

Only the functions that start with NC_HDF4* need to be implemented for
the HDF4 dispatch layer. There are 6 such functions:

- NC_HDF4_inq_format_extended
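
The sketch below illustrates how such a table mixes refusal stubs with the few functions a layer really implements. `SK_EPERM` and `SK_ENOTNC4` copy the values of ::NC_EPERM and ::NC_ENOTNC4 from *netcdf.h*; the four-entry `Sk_Dispatch` and every `SK_*`/`SKH4_*` function are hypothetical simplifications, not the real NC_RO_* or NC_NOTNC4_* signatures.

```c
#include <stddef.h>

#define SK_NOERR    0
#define SK_EPERM    (-37)   /* value of NC_EPERM in netcdf.h */
#define SK_ENOTNC4  (-111)  /* value of NC_ENOTNC4 in netcdf.h */

/* Hypothetical four-entry table. */
typedef struct {
    int (*redef)(int ncid);
    int (*put_att_text)(int ncid, int varid, const char *name, size_t len, const char *value);
    int (*set_var_chunk_cache)(int ncid, int varid, size_t size, size_t nelems, float preemption);
    int (*inq_format)(int ncid, int *formatp);
} Sk_Dispatch;

/* In the spirit of NC_RO_*: every write is refused. */
static int SK_RO_redef(int ncid) { (void)ncid; return SK_EPERM; }

static int SK_RO_put_att_text(int ncid, int varid, const char *name,
                              size_t len, const char *value) {
    (void)ncid; (void)varid; (void)name; (void)len; (void)value;
    return SK_EPERM;
}

/* In the spirit of NC_NOTNC4_*: a netCDF-4-only feature this layer lacks. */
static int SK_NOTNC4_set_var_chunk_cache(int ncid, int varid, size_t size,
                                         size_t nelems, float preemption) {
    (void)ncid; (void)varid; (void)size; (void)nelems; (void)preemption;
    return SK_ENOTNC4;
}

/* The only entry this toy layer implements itself (placeholder result). */
static int SKH4_inq_format(int ncid, int *formatp) {
    (void)ncid;
    *formatp = 1;
    return SK_NOERR;
}

static const Sk_Dispatch sk_readonly_dispatcher = {
    SK_RO_redef,
    SK_RO_put_att_text,
    SK_NOTNC4_set_var_chunk_cache,
    SKH4_inq_format,
};
```

Sharing one set of stubs means a new read-only layer only writes the entries that actually read its format.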

### HDF4 Reading Code

The code in *hdf4file.c* opens the HDF4 SD dataset and reads the
metadata. This metadata is stored in the netCDF internal metadata
model, allowing the inq functions to work.

The code in *hdf4var.c* does an *nc_get_vara()* on the HDF4 SD
dataset. This is all that is needed for all the nc_get_* functions to
work.

# Appendix B. Changing NC_DISPATCH_VERSION

When new entries are added to the *struct NC_Dispatch* type (located in *include/netcdf_dispatch.h.in*), it is necessary to do two things.

1. Bump the NC_DISPATCH_VERSION number.
2. Modify the existing dispatch tables to include the new entries.
   It is often the case that the new entries do not mean anything for
   a given dispatch table. In that case, the new entries may be set to
   some variant of *NC_RO_XXX*, *NC_NOTNC4_XXX*, or *NC_NOTNC3_XXX*.

Modifying the dispatch version requires two steps:
1. Modify the version number in *netcdf-c/configure.ac*, and
2. Modify the version number in *netcdf-c/CMakeLists.txt*.

The two should agree in value.

## NC_DISPATCH_VERSION Incompatibility

When dynamically adding a dispatch table (in nc_def_user_format;
see libdispatch/dfile.c), the version of the new table is compared
with that of the built-in NC_DISPATCH_VERSION; if they differ,
then *nc_def_user_format* returns an error.
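
A minimal sketch of such a check, assuming a two-slot registry standing in for UDF0 and UDF1. The function `sk_def_user_format`, the slot index, the version value, and both error codes are hypothetical; the real nc_def_user_format takes different arguments and its exact error code is not stated here.

```c
#include <stddef.h>

#define SK_DISPATCH_VERSION 5    /* placeholder for the built-in NC_DISPATCH_VERSION */
#define SK_NOERR     0
#define SK_EBADARG   (-1)        /* hypothetical error codes */
#define SK_EVERSION  (-2)
#define SK_NUDF      2           /* two user-defined slots, like UDF0 and UDF1 */

/* Hypothetical table header: only the fields the check needs. */
typedef struct {
    int model;
    int dispatch_version;
    /* ... function pointers omitted ... */
} Sk_Dispatch;

static const Sk_Dispatch *sk_udf_tables[SK_NUDF];

/* Register a user-defined table, refusing any whose version differs. */
static int sk_def_user_format(int slot, const Sk_Dispatch *table) {
    if (slot < 0 || slot >= SK_NUDF || table == NULL) return SK_EBADARG;
    if (table->dispatch_version != SK_DISPATCH_VERSION) return SK_EVERSION;
    sk_udf_tables[slot] = table;
    return SK_NOERR;
}
```

The comparison is exact equality rather than "at least": a table built against an older or newer version may have a different set of entries, so calling through it could reach the wrong function.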

# Appendix C. Inferring the Dispatch Table

As mentioned above, the dispatch table is inferred using the following
information:
1. The mode argument
2. The file path/URL
3. The file contents (when available)

The primary function for doing this inference is in the file
*libdispatch/dinfermodel.c* via the API in *include/ncmodel.h*.
The term *model* is used here to include (at least) the following
information (see the structure type *NCmodel* in *include/ncmodel.h*).

1. impl -- this is an NC_FORMATX_XXX value defining, in effect, the
   dispatch table to use.
2. format -- this is an NC_FORMAT_XXX value defining the API to support: netcdf classic or netcdf enhanced.

The construction of the model is primarily carried out by the function
*NC_infermodel()* (in *libdispatch/dinfermodel.c*).
It is given the following parameters:
1. path -- (IN) absolute file path or URL
2. modep -- (IN/OUT) the set of mode flags given to *NC_open* or *NC_create*.
3. iscreate -- (IN) distinguish open from create.
4. useparallel -- (IN) indicate if parallel IO can be used.
5. params -- (IN/OUT) arbitrary data dependent on the mode and path.
6. model -- (IN/OUT) place to store inferred model.
7. newpathp -- (OUT) the canonical rewrite of the path argument.

As a rule, these values are used in the following order to infer the model.
1. file contents -- highest precedence
2. url (if it is one) -- using the "mode=" key in the fragment (see below).
4. default format -- lowest precedence

If the path appears to be a URL, then it is parsed.
Information is extracted from the URL, and specifically,
the fragment key "mode=" is the critical element.
The URL will be rewritten to a canonical form with the following
changes:
1. The fragment part ("#..." at the end) is parsed and the "mode=" key
   is extracted and its value is converted to a list of tags.
2. If the leading protocol is not http/https, then the protocol is added
   to the mode list. That protocol is then replaced with either http or https.
3. Certain singleton values in the fragment are extracted and removed
   and added to the mode list. Consider, for example, "http://....#dap4".
   The "dap4" singleton is removed and added to the mode list.
4. For backward compatibility, the values of "proto=" and "protocol="
   are removed from the fragment and their value is added to the mode list.
5. The final mode list is converted to a comma separated string
   and re-inserted into the fragment.
6. The final mode list is modified to remove duplicates.

The final result is the canonical form of the URL and is returned in the
newpathp argument described above.

The mode list then is used as part of the inference process to choose
the dispatch table.
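
The mode-list part of this rewrite can be sketched as follows. This is a hypothetical illustration, not the code in *libdispatch/dinfermodel.c*: `sk_canonical_modes` and its helpers are invented names, every bare fragment key is treated as a singleton tag (the real code only extracts certain ones), duplicates are dropped as tags are added rather than in a final pass, and the rest of the URL (including replacing the protocol with http/https) is not rebuilt.

```c
#include <stddef.h>
#include <stdio.h>
#include <string.h>

#define SK_MAXTAGS 16
#define SK_TAGLEN  32

/* Ordered, duplicate-free list of mode tags. */
typedef struct {
    int n;
    char tags[SK_MAXTAGS][SK_TAGLEN];
} Sk_ModeList;

/* Append the tag s[0..len) unless it is empty, too long, or already present. */
static void sk_add_tag(Sk_ModeList *ml, const char *s, size_t len) {
    if (len == 0 || len >= SK_TAGLEN || ml->n >= SK_MAXTAGS) return;
    for (int i = 0; i < ml->n; i++)
        if (strlen(ml->tags[i]) == len && strncmp(ml->tags[i], s, len) == 0) return;
    memcpy(ml->tags[ml->n], s, len);
    ml->tags[ml->n][len] = '\0';
    ml->n++;
}

/* Split a comma-separated value such as "nczarr,file" into tags. */
static void sk_add_list(Sk_ModeList *ml, const char *s, size_t len) {
    const char *end = s + len;
    while (s < end) {
        const char *comma = memchr(s, ',', (size_t)(end - s));
        const char *stop = comma ? comma : end;
        sk_add_tag(ml, s, (size_t)(stop - s));
        s = comma ? comma + 1 : end;
    }
}

enum { SK_PASS_MODE, SK_PASS_SINGLETON, SK_PASS_PROTO };

/* Walk the '&'-separated fragment entries; 'pass' picks which kind to collect. */
static void sk_scan_fragment(Sk_ModeList *ml, const char *frag, int pass) {
    const char *s = frag;
    while (*s) {
        const char *amp = strchr(s, '&');
        size_t len = amp ? (size_t)(amp - s) : strlen(s);
        const char *eq = memchr(s, '=', len);
        if (eq == NULL) {
            if (pass == SK_PASS_SINGLETON) sk_add_tag(ml, s, len); /* e.g. "dap4" */
        } else {
            size_t klen = (size_t)(eq - s);
            size_t vlen = len - klen - 1;
            int is_mode = klen == 4 && strncmp(s, "mode", 4) == 0;
            int is_proto = (klen == 5 && strncmp(s, "proto", 5) == 0) ||
                           (klen == 8 && strncmp(s, "protocol", 8) == 0);
            if ((pass == SK_PASS_MODE && is_mode) || (pass == SK_PASS_PROTO && is_proto))
                sk_add_list(ml, eq + 1, vlen);
        }
        s = amp ? amp + 1 : s + len;
    }
}

/* Compute the canonical "mode=..." fragment for url; returns 0 or -1. */
static int sk_canonical_modes(const char *url, char *out, size_t outlen) {
    Sk_ModeList ml = { 0 };
    const char *sep = strstr(url, "://");
    const char *hash = strchr(url, '#');
    const char *frag = hash ? hash + 1 : "";
    if (sep == NULL || outlen == 0) return -1;   /* not a URL */
    size_t slen = (size_t)(sep - url);
    int is_http = (slen == 4 && strncmp(url, "http", 4) == 0) ||
                  (slen == 5 && strncmp(url, "https", 5) == 0);

    sk_scan_fragment(&ml, frag, SK_PASS_MODE);      /* step 1: mode=... */
    if (!is_http) sk_add_tag(&ml, url, slen);       /* step 2: e.g. "file", "s3" */
    sk_scan_fragment(&ml, frag, SK_PASS_SINGLETON); /* step 3: bare keys */
    sk_scan_fragment(&ml, frag, SK_PASS_PROTO);     /* step 4: proto=, protocol= */

    /* Step 5: join as "mode=a,b,c"; step 6 already happened in sk_add_tag. */
    int w = snprintf(out, outlen, "mode=");
    if (w < 0 || (size_t)w >= outlen) return -1;
    size_t used = (size_t)w;
    for (int i = 0; i < ml.n; i++) {
        w = snprintf(out + used, outlen - used, "%s%s", i ? "," : "", ml.tags[i]);
        if (w < 0 || (size_t)w >= outlen - used) return -1;
        used += (size_t)w;
    }
    return 0;
}
```

For example, "s3://bucket/key#mode=zarr&dap4&protocol=dap4" yields "mode=zarr,s3,dap4": the repeated "dap4" from "protocol=" is dropped.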

# Point of Contact {#dispatch_poc}

*Author*: Dennis Heimbigner<br>
*Email*: dmh at ucar dot edu<br>
*Initial Version*: 12/22/2021<br>
*Last Revised*: 11/15/2022