DataArray#

DataArray is the base class to load and manipulate big datasets to in xdas. It is mainly composed of two attributes:

data: any N-dimensional array-like object. Compared to xarray xdas.DataArray are more permissive to the kinds of array-like objects that can be used. In particular, virtual arrays can be used.
coords: a dict-like container of coordinates. As opposed to xarray, which uses dense arrays to label each point, xdas also implements interpolated coordinates that provides an efficient representation of evenly spaced data (gracefully handling gaps and small sampling variations).

Other important attributes are:

dims: a tuple that assign to each axis position a dimension name that is defined in the coords attribute. Note that having a coordinate per dimension is not mandatory and that the order of the coords does not necessary follow the order of the dims attribute.
name: the name of the array to specify the quantity stored (e.g., "velocity").
attribute: a dictionary containing metadata. Note that xdas does not use those metadata. It tries to keep as much as possible the information stored there as the DataArray is manipulated but it is up to the user to update information there if needed.

In the following examples, we use only one DataArray, if you have several DataArrays, you will just have to adapt the paths argument.

Creating a DataArray#

The user can wrap together an n-dimensional array and some related coordinates. See the related description of how to create coordinates here. For example:

import numpy as np
import xdas as xd

data = np.zeros((6000, 1000))
starttime = np.datetime64("2023-01-01T00:00:00")
endtime = starttime + np.timedelta64(10, "ms") * (data.shape[0] - 1)
distance = 5.0 * np.arange(data.shape[1])

da = xd.DataArray(
    data=data,
    coords={
        "time": {
            "tie_indices": [0, data.shape[0] - 1],
            "tie_values": [starttime, endtime],
        },
        "distance": distance,
    },
)
da

<xdas.DataArray (time: 6000, distance: 1000)>
[[0. 0. 0. ... 0. 0. 0.]
 [0. 0. 0. ... 0. 0. 0.]
 [0. 0. 0. ... 0. 0. 0.]
 ...
 [0. 0. 0. ... 0. 0. 0.]
 [0. 0. 0. ... 0. 0. 0.]
 [0. 0. 0. ... 0. 0. 0.]]
Coordinates:
  * time (time): 2023-01-01T00:00:00.000 to 2023-01-01T00:00:59.990
  * distance (distance): [   0. ... 4995.]

Writing a DataArray to disk#

xdas uses the CF conventions to write xdas.DataArray to disk as netCDF4 files. If the DataArray was generated from a netCDF4/HDF5 file and only slicing was performed, the DataArray can be written as a pointer to the original data using the virtual argument. See the part on Virtual Datasets.

da.to_netcdf("dataarray.nc", virtual=None)  # try to write virtual, here it's impossible

Reading a DataArray from disk#

Xdas can read several DAS file format with open() along with its own format. Xdas uses the netCDF4 format with CF conventions. By default Xdas assumes that files are Xdas NetCDF format. If not the case the engine argument must be passed.

To learn how to read your custom DAS data format with xdas, please see the chapter on Data Formats.

da = xd.open("dataarray.nc", engine=None)  # by default Xdas NetCDF
da

<xdas.DataArray (time: 6000, distance: 1000)>
VirtualSource: 45.8MB (float64)
Coordinates:
  * distance (distance): [   0. ... 4995.]
  * time (time): 2023-01-01T00:00:00.000 to 2023-01-01T00:00:59.990

By default any file is read in virtual mode, meaning that at this point only the metadata have been read.

Using Compression#

Compression that are included in HDF5 and NETCDF4 can be used along with additional one that are provided by the hdf5plugin library.

In this example, we use the Zfp compression which is a lossy compression that is particularly suited for floating point numbers. The recommended compression scheme is the fixed accuracy mode which ensure that your data is not altered by the compression above that threshold in absolute value. Be careful to choose a value which is much lower than your instrumental noise. Compression ratio of around 3-4 can usually be achieved in such a way. For big files, compressing by chunks can be useful to enhance slicing through the data (otherwise the entire data must be decompressed each time some part must be accessed).

import hdf5plugin
encoding = {"chunks": (10, 10), **hdf5plugin.Zfp(accuracy=1e-6)}

da.to_netcdf("chunked_and_compressed.nc", virtual=False, encoding=encoding)

Reading compressed data is completely transparent, you do not need to specify anything.

xd.open("chunked_and_compressed.nc")

<xdas.DataArray (time: 6000, distance: 1000)>
VirtualSource: 45.8MB (float64)
Coordinates:
  * distance (distance): [   0. ... 4995.]
  * time (time): 2023-01-01T00:00:00.000 to 2023-01-01T00:00:59.990

Note that the indicated data size is the uncompressed data size.

Assign new coordinates to your DataArray#

You can either replace the existing coordinates by new ones or assign new coordinates to a xdas.DataArray and link it them an existing dimension.

Replace existing coordinates#

In the example below, we replace the “distance” coordinate with new ones.

new_distances = np.linspace(30.8, 40.9, da.shape[1])
assigned = da.assign_coords(distance=new_distances)
assigned

<xdas.DataArray (time: 6000, distance: 1000)>
VirtualSource: 45.8MB (float64)
Coordinates:
  * distance (distance): [30.8 ... 40.9]
  * time (time): 2023-01-01T00:00:00.000 to 2023-01-01T00:00:59.990

Add new coordinates and link them to an existing dimension#

In the example below, we will add the new coordinate “latitude” linked with the “distance” dimension.

latitudes = np.linspace(-33.90, -35.90, da.shape[1])
assigned = da.assign_coords(latitude=("distance", latitudes))
assigned

<xdas.DataArray (time: 6000, distance: 1000)>
VirtualSource: 45.8MB (float64)
Coordinates:
  * distance (distance): [   0. ... 4995.]
  * time (time): 2023-01-01T00:00:00.000 to 2023-01-01T00:00:59.990
    latitude (distance): [-33.9 ... -35.9]

You can also swap a dimension to one of the new coordinates.

swapped = assigned.swap_dims({"distance": "latitude"})
swapped

<xdas.DataArray (time: 6000, latitude: 1000)>
VirtualSource: 45.8MB (float64)
Coordinates:
    distance (latitude): [   0. ... 4995.]
  * time (time): 2023-01-01T00:00:00.000 to 2023-01-01T00:00:59.990
  * latitude (latitude): [-33.9 ... -35.9]

Plot your DataArray#

xdas.DataArray includes the function xdas.DataArray.plot(). It uses the xarray way of plotting data depending on the number of dimensions your data array has. You’ll have to adapt the arguments and keyword arguments in xdas.DataArray.plot() depending on the dimensionality of your data:

If your xdas.DataArray has one dimension, please refer to the arguments and kwargs from the ‘xarray.plot.line’ function.
For 2 dimensions or more, please refer to the ‘xarray.plot.imshow’ function.
For other, please refer to ‘xarray.plot.hist’ function.