---
file_format: mystnb
kernelspec:
  name: python3
---

```{code-cell}
:tags: [remove-cell]

import os
os.chdir("../../_data")
```

# DataArray

{py:class}`~xdas.DataArray` is the base class used to load and manipulate big datasets in *xdas*. It is mainly composed of two attributes:

- `data`: any N-dimensional array-like object. Compared to *xarray*, `xdas.DataArray` is more permissive about the kinds of array-like objects that can be used. In particular, [virtual arrays](../io/virtual-datasets) can be used.
- `coords`: a dict-like container of coordinates. As opposed to *xarray*, which uses dense arrays to label each point, *xdas* also implements [interpolated coordinates](../coordinates/interpolated-coordinates) that provide an efficient representation of evenly spaced data (gracefully handling gaps and small sampling variations).

![](/_static/dataarray.svg)

Other important attributes are:

- `dims`: a tuple that assigns to each axis position a dimension name that is defined in the `coords` attribute. Note that having a coordinate per dimension is not mandatory and that the order of the `coords` does not necessarily follow the order of the `dims` attribute.
- `name`: the name of the array, specifying the quantity stored (e.g., `"velocity"`).
- `attrs`: a dictionary containing metadata. Note that *xdas* does not use this metadata. It tries to preserve as much of the information stored there as possible while the `DataArray` is manipulated, but it is up to the user to update it if needed.

The following examples use a single `DataArray`. If you have several `DataArray`s, you only need to adapt the paths argument.

## Creating a DataArray

The user can wrap together an N-dimensional array and some related coordinates. See the related description of how to create coordinates [here](../coordinates/interpolated-coordinates.md).
For example:

```{code-cell}
import numpy as np
import xdas as xd

# 6000 time samples (10 ms sampling) by 1000 channels (5 m spacing)
data = np.zeros((6000, 1000))
starttime = np.datetime64("2023-01-01T00:00:00")
endtime = starttime + np.timedelta64(10, "ms") * (data.shape[0] - 1)
distance = 5.0 * np.arange(data.shape[1])

da = xd.DataArray(
    data=data,
    coords={
        # interpolated coordinate defined by its first and last tie points
        "time": {
            "tie_indices": [0, data.shape[0] - 1],
            "tie_values": [starttime, endtime],
        },
        # dense coordinate
        "distance": distance,
    },
)
da
```

## Writing a DataArray to disk

*xdas* uses the CF conventions to write {py:class}`xdas.DataArray` to disk as netCDF4 files. If the DataArray was generated from a netCDF4/HDF5 file and only slicing was performed, the DataArray can be written as a pointer to the original data using the `virtual` argument. See the part on [](../io/virtual-datasets).

```{code-cell}
da.to_netcdf("dataarray.nc", virtual=None)  # try to write virtually; here it is impossible
```

## Reading a DataArray from disk

*xdas* can read several DAS file formats with {py:func}`~xdas.open`, along with its own format, which is netCDF4 with CF conventions. By default, *xdas* assumes that files are in the *xdas* NetCDF format. If this is not the case, the `engine` argument must be passed. To learn how to read your custom DAS data format with *xdas*, please see the chapter on [](../io/data-formats.md).

```{code-cell}
da = xd.open("dataarray.nc", engine=None)  # by default, the xdas NetCDF format
da
```

By default, any file is read in virtual mode, meaning that at this point only the metadata have been read.

## Using Compression

The compression filters included in HDF5 and netCDF4 can be used, along with the additional ones provided by the [hdf5plugin](https://hdf5plugin.readthedocs.io) library. In this example, we use ZFP compression, a lossy scheme that is particularly suited for floating point numbers. The recommended mode is the *fixed accuracy mode*, which ensures that the compression does not alter your data by more than a given threshold in absolute value.
Be careful to choose a value that is much lower than your instrumental noise. Compression ratios of around 3 to 4 can usually be achieved this way. For big files, compressing by chunks can be useful to speed up slicing through the data (otherwise the entire dataset must be decompressed each time some part of it is accessed).

```{code-cell}
import hdf5plugin

encoding = {"chunks": (10, 10), **hdf5plugin.Zfp(accuracy=1e-6)}
da.to_netcdf("chunked_and_compressed.nc", virtual=False, encoding=encoding)
```

Reading compressed data is completely transparent: you do not need to specify anything.

```{code-cell}
xd.open("chunked_and_compressed.nc")
```

Note that the indicated data size is the uncompressed data size.

## Assign new coordinates to your DataArray

You can either replace the existing coordinates with new ones or assign new coordinates to a {py:class}`xdas.DataArray` and link them to an existing dimension.

### Replace existing coordinates

In the example below, we replace the "distance" coordinate with a new one.

```{code-cell}
new_distances = np.linspace(30.8, 40.9, da.shape[1])
assigned = da.assign_coords(distance=new_distances)
assigned
```

### Add new coordinates and link them to an existing dimension

In the example below, we add the new coordinate "latitude" and link it to the "distance" dimension.

```{code-cell}
latitudes = np.linspace(-33.90, -35.90, da.shape[1])
assigned = da.assign_coords(latitude=("distance", latitudes))
assigned
```

You can also swap a dimension for one of the new coordinates.

```{code-cell}
swapped = assigned.swap_dims({"distance": "latitude"})
swapped
```

## Plot your DataArray

{py:class}`xdas.DataArray` includes the function {py:func}`xdas.DataArray.plot`. It follows the *xarray* way of plotting data, which depends on the number of dimensions of your data array.
You will have to adapt the arguments and keyword arguments passed to {py:func}`xdas.DataArray.plot` depending on the dimensionality of your data:

- If your {py:class}`xdas.DataArray` has one dimension, please refer to the arguments and keyword arguments of the `xarray.plot.line` function.
- For two dimensions or more, please refer to the `xarray.plot.imshow` function.
- In any other case, please refer to the `xarray.plot.hist` function.