---
file_format: mystnb
kernelspec:
  name: python3
---

```{code-cell}
:tags: [remove-cell]

import os
os.chdir("../../_data")
```

# DataArray

{py:class}`~xdas.DataArray` is the base class used to load and manipulate big datasets in *xdas*. It is mainly composed of two attributes:

- `data`: any N-dimensional array-like object. Compared to *xarray*, `xdas.DataArray` is more permissive about the kinds of array-like objects that can be used. In particular, [virtual arrays](../io/virtual-datasets) can be used.
- `coords`: a dict-like container of coordinates. As opposed to *xarray*, which uses dense arrays to label each point, *xdas* also implements [interpolated coordinates](../coordinates/interpolated-coordinates) that provide an efficient representation of evenly spaced data (gracefully handling gaps and small sampling variations).

![](/_static/dataarray.svg)

Other important attributes are:

- `dims`: a tuple that assigns to each axis position a dimension name that is defined in the `coords` attribute. Note that having a coordinate per dimension is not mandatory and that the order of the `coords` does not necessarily follow the order of the `dims` attribute.
- `name`: the name of the array, used to specify the quantity stored (e.g., `"velocity"`).
- `attrs`: a dictionary containing metadata. Note that *xdas* does not use this metadata. It tries to preserve the information stored there as much as possible while the `DataArray` is manipulated, but it is up to the user to update it if needed.
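
As a rough illustration of how `dims` ties names to axis positions, the sketch below uses plain NumPy and a Python dict rather than *xdas* itself. The variable names are hypothetical and only mimic the attributes listed above.

```python
import numpy as np

# Plain-NumPy analogy of the attributes described above (not the xdas API).
data = np.zeros((6000, 1000))
dims = ("time", "distance")  # axis 0 is "time", axis 1 is "distance"
coords = {"distance": 5.0 * np.arange(data.shape[1])}  # "time" has no coordinate

# Look up an axis position from its dimension name.
axis = dims.index("distance")
stacked = data.sum(axis=axis)  # reduce along "distance", keep "time"

print(axis)           # 1
print(stacked.shape)  # (6000,)
```

Here the `"time"` dimension deliberately has no entry in `coords`, which mirrors the remark that a coordinate per dimension is not mandatory.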

The following examples use a single `DataArray`. If you have several `DataArray`s, you only need to adapt the paths argument.

## Creating a DataArray

The user can wrap together an n-dimensional array and some related coordinates. See the related description of how to create coordinates [here](../coordinates/interpolated-coordinates.md). For example:


```{code-cell}
import numpy as np
import xdas as xd

data = np.zeros((6000, 1000))
starttime = np.datetime64("2023-01-01T00:00:00")
endtime = starttime + np.timedelta64(10, "ms") * (data.shape[0] - 1)
distance = 5.0 * np.arange(data.shape[1])

da = xd.DataArray(
    data=data,
    coords={
        "time": {
            "tie_indices": [0, data.shape[0] - 1],
            "tie_values": [starttime, endtime],
        },
        "distance": distance,
    },
)
da
```
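
The `tie_indices` and `tie_values` pairs above can be understood as anchor points between which the coordinate is interpolated. The sketch below reproduces that idea with plain NumPy, assuming simple linear interpolation between two tie points. It is only meant to show what the dense time axis would look like, not how *xdas* computes it internally.

```python
import numpy as np

n = 6000
starttime = np.datetime64("2023-01-01T00:00:00.000")
endtime = starttime + np.timedelta64(10, "ms") * (n - 1)

tie_indices = np.array([0, n - 1])
tie_values = np.array([starttime, endtime], dtype="datetime64[ms]")

# Interpolate on integer millisecond offsets from the first tie value,
# since np.interp works on numbers rather than datetimes.
offsets_ms = (tie_values - tie_values[0]).astype("int64")
dense_ms = np.interp(np.arange(n), tie_indices, offsets_ms)
time = tie_values[0] + np.round(dense_ms).astype("int64").astype("timedelta64[ms]")

print(time[:3])
print(time[-1] == endtime)
```

Two tie points are enough to describe 6000 evenly spaced samples, which is why this representation is compact compared with storing one value per sample.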

## Writing a DataArray to disk

*xdas* uses the CF conventions to write {py:class}`xdas.DataArray` to disk as netCDF4 files. If the DataArray was generated from a netCDF4/HDF5 file and only slicing was performed, the DataArray can be written as a pointer to the original data using the `virtual` argument. See the part on [](../io/virtual-datasets).

```{code-cell}
da.to_netcdf("dataarray.nc", virtual=None)  # try to write virtual, here it's impossible
```

## Reading a DataArray from disk

Xdas can read several DAS file formats with {py:func}`~xdas.open`, along with its own format, which is netCDF4 with CF conventions. By default, Xdas assumes that files are in the Xdas NetCDF format. If that is not the case, the `engine` argument must be passed.

To learn how to read your custom DAS data format with *xdas*, please see the chapter on [](../io/data-formats.md).

```{code-cell}
da = xd.open("dataarray.nc", engine=None)  # by default Xdas NetCDF
da
```

By default any file is read in virtual mode, meaning that at this point only the metadata have been read.

## Using Compression

The compression filters included in HDF5 and netCDF4 can be used, along with the additional ones provided by the [hdf5plugin](https://hdf5plugin.readthedocs.io) library.

In this example, we use Zfp compression, a lossy compression that is particularly suited to floating-point numbers. The recommended compression scheme is the *fixed accuracy mode*, which ensures that the compression does not alter your data by more than a given threshold in absolute value. Be careful to choose a value that is much lower than your instrumental noise. Compression ratios of around 3-4 can usually be achieved this way. For big files, compressing by chunks can be useful to speed up slicing through the data (otherwise the entire dataset must be decompressed each time some part of it is accessed).

```{code-cell}
import hdf5plugin
encoding = {"chunks": (10, 10), **hdf5plugin.Zfp(accuracy=1e-6)}

da.to_netcdf("chunked_and_compressed.nc", virtual=False, encoding=encoding)
```
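
The benefit of chunking for slicing can be estimated with simple arithmetic. The sketch below assumes that a chunk must be fully decompressed to read any value it contains, and counts how many values are decompressed for a hypothetical selection. The helper `chunks_touched` and the selection are illustrative only and are not part of *xdas*.

```python
import math

def chunks_touched(selection, chunks):
    """Count the chunks overlapping a selection given as (start, stop) pairs."""
    count = 1
    for (start, stop), size in zip(selection, chunks):
        first = start // size       # index of the first chunk along this axis
        last = (stop - 1) // size   # index of the last chunk along this axis
        count *= last - first + 1
    return count

shape = (6000, 1000)
selection = ((0, 100), (0, 50))  # hypothetical read: 100 samples x 50 channels

# One chunk spanning the whole array: everything is decompressed.
whole = chunks_touched(selection, shape) * math.prod(shape)

# Chunks of shape (10, 10): only the overlapping chunks are decompressed.
small = chunks_touched(selection, (10, 10)) * math.prod((10, 10))

print(whole)  # 6000000
print(small)  # 5000
```

Very small chunks reduce the amount of data decompressed per read, but they also increase the number of chunks to manage, so the chunk shape is a trade-off that depends on the typical access pattern.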

Reading compressed data is completely transparent: you do not need to specify anything.

```{code-cell}
xd.open("chunked_and_compressed.nc")
```

Note that the indicated data size is the uncompressed data size.
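
To get a feel for what an absolute error bound means relative to the instrumental noise, the sketch below uses uniform rounding as a stand-in for a lossy codec. This is not Zfp and says nothing about its compression ratio. The noise level and the data are hypothetical, and only the relationship between the tolerance and the noise is illustrated.

```python
import numpy as np

rng = np.random.default_rng(0)
noise_std = 1e-3  # hypothetical instrumental noise level
signal = rng.normal(scale=noise_std, size=10_000)

accuracy = 1e-6  # tolerance, chosen far below noise_std

# Stand-in for a lossy codec: round to a grid of step 2 * accuracy.
# This only mimics an absolute error bound; it is NOT the Zfp algorithm.
step = 2 * accuracy
decoded = np.round(signal / step) * step

max_error = np.max(np.abs(decoded - signal))
print(max_error <= accuracy * (1 + 1e-6))  # error stays within the tolerance
print(max_error / noise_std)               # error as a fraction of the noise
```

With a real codec, the same comparison can be made by reloading the compressed file and comparing it with the original data.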

## Assign new coordinates to your DataArray

You can either replace the existing coordinates with new ones or assign new coordinates to a {py:class}`xdas.DataArray` and link them to an existing dimension.

### Replace existing coordinates

In the example below, we replace the "distance" coordinate with new values.

```{code-cell}
new_distances = np.linspace(30.8, 40.9, da.shape[1])
assigned = da.assign_coords(distance=new_distances)
assigned
```

### Add new coordinates and link them to an existing dimension

In the example below, we will add the new coordinate "latitude" linked with the "distance" dimension.

```{code-cell}
latitudes = np.linspace(-33.90, -35.90, da.shape[1])
assigned = da.assign_coords(latitude=("distance", latitudes))
assigned
```
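
Because a linked coordinate shares its axis with the dimension it is attached to, an index found on one applies to the other. The plain-NumPy sketch below illustrates this with arrays equivalent to the ones used above. The target latitude is hypothetical, and the nearest-value lookup is only one possible way to find the index.

```python
import numpy as np

n_channels = 1000
distance = 5.0 * np.arange(n_channels)
latitudes = np.linspace(-33.90, -35.90, n_channels)

# Both arrays label the same axis, so they must have the same length.
assert distance.shape == latitudes.shape

target = -34.0  # hypothetical latitude of interest
idx = int(np.argmin(np.abs(latitudes - target)))  # nearest channel index

print(idx)            # 50
print(distance[idx])  # 250.0
```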

You can also swap a dimension to one of the new coordinates.

```{code-cell}
swapped = assigned.swap_dims({"distance": "latitude"})
swapped
```

## Plot your DataArray

{py:class}`xdas.DataArray` includes the method {py:func}`xdas.DataArray.plot`. It follows the *xarray* way of plotting data, which depends on the number of dimensions of your data array. You will have to adapt the arguments and keyword arguments passed to {py:func}`xdas.DataArray.plot` depending on the dimensionality of your data:

- If your {py:class}`xdas.DataArray` has one dimension, please refer to the arguments and keyword arguments of the `xarray.plot.line` function.
- For 2 dimensions or more, please refer to the `xarray.plot.imshow` function.
- For other cases, please refer to the `xarray.plot.hist` function.