NetCDF Byterange Support {#netcdf_byterange}
================================

[TOC]
<!-- Note that this file has the .dox extension, but is mostly markdown -->
<!-- Begin MarkDown -->

# Introduction {#byterange_intro}

Suppose that you have the URL to a remote dataset
which is a normal netcdf-3 or netcdf-4 file.

The netcdf-c library now supports read-only access to such
datasets using the HTTP byte-range capability, assuming that
the remote server supports byte-range access.

Two examples:

1. A Thredds server supporting the Thredds "fileserver" protocol, and containing a netcdf classic file.
    - location: "https://remotetest.unidata.ucar.edu/thredds/fileserver/testdata/2004050300_eta_211.nc#mode=bytes"
2. An Amazon S3 dataset containing a netcdf enhanced file.
    - location: "http://noaa-goes16.s3.amazonaws.com/ABI-L1b-RadC/2017/059/03/OR_ABI-L1b-RadC-M3C13_G16_s20170590337505_e20170590340289_c20170590340316.nc#mode=bytes"

Other remote servers may also provide byte-range access in a similar form.

Note that this is not intended as a true
production capability, because this kind of access
is believed to be quite slow. In addition, the byte-range IO drivers do not
currently perform any optimization or caching.

# Configuration {#byterange_config}

This capability is enabled using the *--enable-byterange* option
to the *./configure* command for Automake. For CMake, the option flag is
*-DNETCDF_ENABLE_BYTERANGE=true*.

This capability requires access to *libcurl*, and an error will occur
if byterange is enabled but *libcurl* cannot be located.
In this, it is similar to the DAP2 and DAP4 capabilities.

# Run-time Usage {#byterange_url}

In order to use this capability at run-time, with *ncdump* for
example, it is necessary to provide a URL pointing to the basic
dataset to be accessed. The URL must be annotated to tell the
netcdf-c library that byte-range access should be used. This is
indicated by appending the fragment `#mode=bytes`
to the end of the URL.
The two examples above show how this looks.
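
As an illustration, the annotation step can be sketched in C. The helper name `annotate_byterange_url` is hypothetical and is not part of netcdf-c; the sketch also assumes that the URL does not already carry some other fragment.

```c
#include <stdio.h>
#include <string.h>

/* Hypothetical helper (not part of netcdf-c): copy `url` into `out`
 * and append "#mode=bytes" unless that annotation is already present.
 * Assumes the URL has no other fragment.
 * Returns 0 on success, -1 if `out` is too small. */
int annotate_byterange_url(const char *url, char *out, size_t outlen)
{
    const char *tag = "#mode=bytes";
    int n;

    if (strstr(url, tag) != NULL)
        n = snprintf(out, outlen, "%s", url);
    else
        n = snprintf(out, outlen, "%s%s", url, tag);

    /* snprintf reports the length it wanted; reject truncated output */
    return (n < 0 || (size_t)n >= outlen) ? -1 : 0;
}
```

The resulting string could then be passed to the standard `nc_open(path, NC_NOWRITE, &ncid)` call in place of a local file name.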

In order to determine the kind of file being accessed, the
netcdf-c library reads what is called the "magic number"
from the beginning of the remote dataset. This magic number
is a specific sequence of bytes that indicates the kind of file:
classic, enhanced, cdf5, etc.
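
The check can be sketched as follows. This is a simplified illustration, not the library's actual detection code: the enum and function names are hypothetical, and the sketch only looks at offset 0, whereas an HDF5 signature may legitimately appear at other offsets. The byte values themselves are the published signatures: `CDF` followed by a version byte (1, 2, or 5) for the netcdf-3 family, and the 8-byte HDF5 signature for netcdf-4 (enhanced) files.

```c
#include <stddef.h>
#include <string.h>

/* Hypothetical names, for illustration only */
enum file_kind {
    KIND_UNKNOWN,
    KIND_CLASSIC,       /* "CDF\001" */
    KIND_64BIT_OFFSET,  /* "CDF\002" */
    KIND_CDF5,          /* "CDF\005" */
    KIND_ENHANCED       /* HDF5 signature, i.e. netcdf-4 */
};

/* Classify a file from the first `len` bytes read from offset 0. */
enum file_kind classify_magic(const unsigned char *buf, size_t len)
{
    static const unsigned char hdf5_sig[8] =
        {0x89, 'H', 'D', 'F', '\r', '\n', 0x1a, '\n'};

    if (len >= 8 && memcmp(buf, hdf5_sig, 8) == 0)
        return KIND_ENHANCED;

    if (len >= 4 && memcmp(buf, "CDF", 3) == 0) {
        switch (buf[3]) {          /* fourth byte is the format version */
        case 1: return KIND_CLASSIC;
        case 2: return KIND_64BIT_OFFSET;
        case 5: return KIND_CDF5;
        default: break;
        }
    }
    return KIND_UNKNOWN;
}
```

With byte-range access, only these first few bytes need to be fetched before the library can pick a dispatcher.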

# Architecture {#byterange_arch}

Internally, this capability is implemented with the following drivers:

1. libdispatch/dhttp.c -- wrap libcurl operations.
2. libsrc/httpio.c -- provide byte-range reading to the netcdf-3 dispatcher.
3. libhdf5/H5FDhttp.c -- provide byte-range reading to the netcdf-4 dispatcher for non-cloud storage.
4. H5FDros3.c -- provide byte-range reading to the netcdf-4 dispatcher for cloud storage (Amazon S3 currently).

Both *httpio.c* and *H5FDhttp.c* are adapters that use *dhttp.c*
to do the work. Testing for the magic number is also carried out
by using the *dhttp.c* code.
*H5FDros3* is also an adapter, but specialized for cloud storage access.

## NetCDF Classic Access

The netcdf-3 code in the directory *libsrc* is built using
a secondary dispatch mechanism called *ncio*. This allows the
netcdf-3 code to be independent of the lowest level IO access mechanisms.
This is how in-memory and mmap based access is implemented.
The file *httpio.c* is the dispatcher used to provide byte-range
IO for the netcdf-3 code.
Note that *httpio.c* is mostly just an
adapter between the *ncio* API and the *dhttp.c* code.
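
The general idea of such a dispatch layer can be sketched with a table of function pointers. Everything in this sketch is hypothetical (`toy_io`, `mem_state`, `mem_read`, `read_magic`); the real *ncio* structure has a different and richer layout. An in-memory back end stands in for the HTTP one so that the example stays self-contained.

```c
#include <stddef.h>
#include <string.h>

/* Hypothetical, simplified stand-in for an IO dispatch table */
typedef struct toy_io {
    int (*read)(struct toy_io *io, long long start, size_t count, void *dst);
    void *state;   /* back-end private data */
} toy_io;

/* One possible back end: serve reads from a memory buffer */
typedef struct {
    const unsigned char *data;
    size_t size;
} mem_state;

int mem_read(toy_io *io, long long start, size_t count, void *dst)
{
    const mem_state *m = (const mem_state *)io->state;

    /* reject reads that fall outside the buffer */
    if (start < 0 || (size_t)start > m->size || count > m->size - (size_t)start)
        return -1;
    memcpy(dst, m->data + start, count);
    return 0;
}

/* Upper-layer code sees only the table, never the back end */
int read_magic(toy_io *io, unsigned char magic[4])
{
    return io->read(io, 0, 4, magic);
}
```

A byte-range back end would supply a different `read` function, in the same slot, that fetches the requested bytes over HTTP; the upper-layer code would not change.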

## NetCDF Enhanced Access

### Non-Cloud Access

Similar to the netcdf-3 code, the HDF5 library
provides a secondary dispatch mechanism *H5FD*. This allows the
HDF5 code to be independent of the lowest level IO access mechanisms.
The netcdf-4 code in libhdf5 is built on the HDF5 library, so
it indirectly inherits the H5FD mechanism.

The file *H5FDhttp.c* implements the H5FD dispatcher API
and provides byte-range IO for the netcdf-4 code
(and for the HDF5 library as a side effect).
It only works for non-cloud servers such as the Unidata Thredds server.

Note that *H5FDhttp.c* is mostly just an
adapter between the *H5FD* API and the *dhttp.c* code.

#### The dhttp.c Code {#byterange_dhttp}

The core of all this is *dhttp.c* (and its header
*include/nchttp.h*). It is a wrapper over *libcurl*
and so exposes the libcurl handles -- albeit as `void*`.

The API for *dhttp.c* consists of the following procedures and one type:

- `int nc_http_open(const char* objecturl, void** curlp, fileoffset_t* filelenp);`
- `int nc_http_read(void* curl, const char* url, fileoffset_t start, fileoffset_t count, NCbytes* buf);`
- `int nc_http_close(void* curl);`
- `typedef long long fileoffset_t;`

The type *fileoffset_t* is used to avoid *off_t* and *off64_t*,
which are too volatile (their definitions vary across platforms).
It is intended to represent file lengths and offsets.

##### nc_http_open

The *nc_http_open* procedure creates a *Curl* handle and returns it
in the *curlp* argument. It also obtains the response headers and
searches them for two headers:

1. "Accept-Ranges: bytes" -- to verify that byte-range access is supported.
2. "Content-Length: ..." -- to obtain the size of the remote dataset.

The dataset length is returned in the *filelenp* argument.
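
The header search can be illustrated with a small stand-alone sketch. It operates on a raw header block held in a string rather than on a live libcurl transfer, and the function names `has_prefix_nocase` and `scan_headers` are hypothetical; the actual code in *dhttp.c* may be organized quite differently.

```c
#include <ctype.h>
#include <stdlib.h>
#include <string.h>

/* Return 1 if `s` starts with `prefix`, ignoring ASCII case. */
static int has_prefix_nocase(const char *s, const char *prefix)
{
    while (*prefix) {
        if (tolower((unsigned char)*s) != tolower((unsigned char)*prefix))
            return 0;
        s++;
        prefix++;
    }
    return 1;
}

/* Hypothetical sketch: scan a CRLF- or LF-separated header block.
 * Sets *ranges_ok to 1 if "Accept-Ranges: bytes" is present and
 * *length to the Content-Length value (-1 if absent).
 * Returns 0 only if both headers were found. */
int scan_headers(const char *headers, int *ranges_ok, long long *length)
{
    const char *line = headers;

    *ranges_ok = 0;
    *length = -1;
    while (*line) {
        if (has_prefix_nocase(line, "Accept-Ranges:")) {
            const char *value = line + strlen("Accept-Ranges:");
            while (*value == ' ' || *value == '\t')
                value++;
            if (has_prefix_nocase(value, "bytes"))
                *ranges_ok = 1;
        } else if (has_prefix_nocase(line, "Content-Length:")) {
            /* strtoll skips the leading whitespace itself */
            *length = strtoll(line + strlen("Content-Length:"), NULL, 10);
        }
        line = strchr(line, '\n');   /* advance to the next header line */
        if (line == NULL)
            break;
        line++;
    }
    return (*ranges_ok && *length >= 0) ? 0 : -1;
}
```

Header names are matched case-insensitively because HTTP field names are not case-sensitive.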

##### nc_http_read

The *nc_http_read* procedure reads a set of contiguous bytes
as specified by the *start* and *count* arguments. It takes the *Curl*
handle produced by *nc_http_open* to indicate the server from which to read.

The *buf* argument is a pointer to an instance of type *NCbytes*, which
is a dynamically expandable byte vector (see the file *include/ncbytes.h*).

This procedure reads *count* bytes from the remote dataset starting at
offset *start*. The bytes are stored in *buf*.
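
Since HTTP byte ranges are expressed with an inclusive first and last byte, a *start*/*count* pair has to be converted before it can be sent. The following sketch shows that arithmetic; the function name `format_range` is hypothetical, and this document does not state how *dhttp.c* itself hands the range to libcurl.

```c
#include <stdio.h>

typedef long long fileoffset_t;   /* as declared in the dhttp.c API */

/* Hypothetical helper: write "first-last" for `count` bytes starting
 * at `start`. Both ends are inclusive, so last = start + count - 1.
 * Returns 0 on success, -1 on bad arguments or a too-small buffer. */
int format_range(fileoffset_t start, fileoffset_t count, char *out, size_t outlen)
{
    int n;

    if (start < 0 || count <= 0)
        return -1;
    n = snprintf(out, outlen, "%lld-%lld", start, start + count - 1);
    return (n < 0 || (size_t)n >= outlen) ? -1 : 0;
}
```

For example, reading 8 bytes from offset 0 yields `0-7`, which corresponds to the request header `Range: bytes=0-7`. A string in this form is also what libcurl's `CURLOPT_RANGE` option accepts.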

##### nc_http_close

The *nc_http_close* function closes the *Curl* handle and does any
necessary cleanup.

### Cloud Access

The HDF5 library code-base also provides a Virtual File Driver (VFD)
capable of providing byte-range access to cloud storage
(Amazon S3 specifically).

This VFD is called *H5FDros3*. In order for the netcdf library
to make use of it, the HDF5 library must be built using the
*--enable-ros3-vfd* option.
Netcdf can discover that this capability was enabled and can
then make use of it to provide byte-range access to the cloud.

# Point of Contact {#byterange_poc}

__Author__: Dennis Heimbigner<br>
__Email__: dmh at ucar dot edu<br>
__Initial Version__: 12/30/2018<br>
__Last Revised__: 3/11/2023

<!-- End MarkDown -->