Data Formats

xdas implements some of the more commonly used DAS data formats, but it can be extended to work with other specific formats. In this part we will cover:

  • How to use xdas with an already implemented file format.

  • How to use xdas with your specific data format.

Implemented file formats

The formats currently implemented are listed below. All HDF5-based formats support native virtualization, while the other formats support Dask virtualization (see the Virtual Datasets section). xdas should detect the file format automatically, but you can still specify it explicitly with the engine argument of xdas.open().
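How xdas performs this detection is not described here. As a hedged illustration of a first step such a detector might take, the sketch below checks whether a file is an HDF5 container at all by looking for the 8-byte HDF5 format signature, which the HDF5 specification places at offset 0 or after a user block of 512, 1024, 2048, ... bytes. The function name looks_like_hdf5 is hypothetical. Telling the HDF5-based formats apart (AP Sensing, ASN, FEBUS, ...) would additionally require inspecting groups and attributes, which this sketch does not do.

```python
import os
import tempfile

HDF5_SIGNATURE = b"\x89HDF\r\n\x1a\n"  # 8-byte HDF5 format signature


def looks_like_hdf5(path):
    """Return True if the HDF5 signature is found at offset 0, 512, 1024, 2048, ..."""
    size = os.path.getsize(path)
    with open(path, "rb") as f:
        offset = 0
        while offset + len(HDF5_SIGNATURE) <= size:
            f.seek(offset)
            if f.read(len(HDF5_SIGNATURE)) == HDF5_SIGNATURE:
                return True
            # The signature may follow a user block of 512 bytes or a power of two above it
            offset = 512 if offset == 0 else offset * 2
    return False


# Usage on two throw-away files: one starting with the signature, one without it
with tempfile.TemporaryDirectory() as folder:
    fake_h5 = os.path.join(folder, "fake.h5")
    other = os.path.join(folder, "data.mseed")
    with open(fake_h5, "wb") as f:
        f.write(HDF5_SIGNATURE + bytes(64))
    with open(other, "wb") as f:
        f.write(bytes(64))
    results = (looks_like_hdf5(fake_h5), looks_like_hdf5(other))

print(results)  # (True, False)
```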

xdas supports the following DAS formats:

Manufacturer   Instrument     engine argument   Virtualization
AP Sensing     DAS N5*        "apsensing"       HDF5
ASN            OptoDAS        "asn"             HDF5
FEBUS          A1             "febus"           HDF5
OptaSense      OLA, ODH*, …   "optasense"       HDF5
Silixa         iDAS           "silixa"          Dask
SINTELA        ONYX           "sintela"         HDF5
Terra15        Treble         "terra15"         HDF5

xdas also implements its own format and supports ProdML and miniSEED:

Format     engine argument   Virtualization
Xdas       None              HDF5
ProdML     "prodml"          HDF5
miniSEED   "miniseed"        Dask

Warning

Because the various versions of the Febus format are poorly documented, it is recommended to provide the trimming and the position of the timestamp within each block manually. For example, to trim 100 samples on both sides of each block and to place the timestamp at the center of a block of 2000 samples:

xdas.open("path.h5", engine="febus", overlaps=(100, 100), offset=1000)
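The effect of overlaps and offset can be illustrated with plain arithmetic. The helper trim_block below is hypothetical (it is not part of xdas) and relies on two assumptions made for the illustration: the timestamp stored with a block refers to sample number offset within that block, and consecutive blocks advance by the number of samples left after trimming, so that the trimmed blocks line up end to end.

```python
from datetime import datetime, timedelta


def trim_block(block_timestamp, block_len, dt, overlaps, offset):
    """Hypothetical helper (not part of xdas) illustrating the Febus
    ``overlaps`` and ``offset`` parameters.

    Assumes ``block_timestamp`` is the time of sample ``offset`` within the
    block. Returns the time of the first kept sample and the number of kept
    samples once ``overlaps=(left, right)`` samples are dropped.
    """
    left, right = overlaps
    n_kept = block_len - left - right
    t_first_sample = block_timestamp - offset * dt  # time of sample 0 of the block
    t_first_kept = t_first_sample + left * dt       # skip the `left` trimmed samples
    return t_first_kept, n_kept


dt = timedelta(milliseconds=1)  # assumed 1 kHz sampling
t_block0 = datetime(2024, 1, 1, 0, 0, 1)
start0, n0 = trim_block(t_block0, 2000, dt, (100, 100), 1000)
# Assumed: the next block's timestamp is n0 samples later
start1, n1 = trim_block(t_block0 + n0 * dt, 2000, dt, (100, 100), 1000)
print(start0, n0)                  # 2024-01-01 00:00:00.100000 1800
print(start1 - start0 == n0 * dt)  # True: the trimmed blocks are contiguous
```

If the timestamps of your files refer to another sample (for instance the first one), offset would change accordingly, which is why the warning recommends setting it explicitly.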

Extending xdas with your file format

xdas is designed to be extensible and puts the power in the hands of its users. Extending xdas usually amounts to writing short functions of a few lines. The process deals with the two main components of an xdas.DataArray: unpacking the data and coordinate objects, optionally processing them, and packing them back into a DataArray object.

Function-based solution

To add a new file format, you can write a function that reads one file and returns an xdas.DataArray, then pass this function as the engine keyword argument of xdas.open(). The reading function must fetch the data and parse the coordinate information.

Adding support for a new file format generally consists of providing the path to the data array within the file and parsing the start time and the spatial and temporal sampling intervals, as in the example below.

import h5py
import numpy as np
import xdas as xd
from xdas import DataArray
from xdas.virtual import VirtualSource

def open_dataarray(fname):
    with h5py.File(fname, "r") as file:
        # Start time, temporal spacing (stored in seconds, rounded to whole
        # milliseconds) and spatial spacing, read from the dataset attributes
        t0 = np.datetime64(file["dataset"].attrs["t0"]).astype("datetime64[ms]")
        dt = np.timedelta64(int(round(file["dataset"].attrs["dt"] * 1e3)), "ms")
        dx = file["dataset"].attrs["dx"][()]
        # Wrap the dataset as a virtual source so the data is not loaded into memory
        data = VirtualSource(file["dataset"])
    nt, nx = data.shape
    # Describe each coordinate by its first and last tie points
    t = {"tie_indices": [0, nt - 1], "tie_values": [t0, t0 + (nt - 1) * dt]}
    x = {"tie_indices": [0, nx - 1], "tie_values": [0.0, (nx - 1) * dx]}
    return DataArray(data, {"time": t, "distance": x})

# Replace "other_format.hdf5" by the path of your file
da = xd.open("other_format.hdf5", engine=open_dataarray)
da
<xdas.DataArray (time: 20, distance: 10)>
VirtualSource: 800.0B (float32)
Coordinates:
  * time (time): 2024-01-01T14:00:00.000 to 2024-01-01T14:00:00.190
  * distance (distance): 0.000 to 90.000
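The coordinate dictionaries in this example describe each axis by tie points. Assuming values are linearly interpolated between consecutive tie points, the hypothetical helper interp_coordinate below recovers the coordinate value of any sample. It is a sketch of the idea, not the xdas implementation, and uses the same layout as the output above (20 samples every 10 ms, 10 channels every 10 m). Two tie points are enough for a regularly sampled axis; a gap in the acquisition can be described by two tie points at consecutive indices, as in the last case.

```python
from datetime import datetime, timedelta


def interp_coordinate(index, tie_indices, tie_values):
    """Return the coordinate value at `index` by linear interpolation between
    the surrounding tie points (illustrative sketch, not xdas code)."""
    segments = zip(tie_indices, tie_indices[1:], tie_values, tie_values[1:])
    for i0, i1, v0, v1 in segments:
        if i0 <= index <= i1:
            return v0 + (v1 - v0) * (index - i0) / (i1 - i0)
    raise IndexError(f"index {index} is outside the coordinate")


# Same layout as the example above: 20 samples every 10 ms, 10 channels every 10 m
t0, dt, nt = datetime(2024, 1, 1, 14), timedelta(milliseconds=10), 20
dx, nx = 10.0, 10
t = {"tie_indices": [0, nt - 1], "tie_values": [t0, t0 + (nt - 1) * dt]}
x = {"tie_indices": [0, nx - 1], "tie_values": [0.0, (nx - 1) * dx]}
print(interp_coordinate(5, **t))  # 2024-01-01 14:00:00.050000
print(interp_coordinate(3, **x))  # 30.0

# Hypothetical 1 s gap after sample 9: two tie points at consecutive indices
gap = timedelta(seconds=1)
t_gap = {
    "tie_indices": [0, 9, 10, 19],
    "tie_values": [t0, t0 + 9 * dt, t0 + 10 * dt + gap, t0 + 19 * dt + gap],
}
print(interp_coordinate(15, **t_gap))  # 2024-01-01 14:00:01.150000
```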

This example opens a single file. For multi-file datasets, either give a path with a "*" wildcard in place of the file name (for example "folder/*.hdf5") if all your files are in the same folder, or pass a list of paths.
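For the multi-file case, the sketch below uses Python's standard glob module to show which files a pattern such as "folder/*.hdf5" selects, and how to build the equivalent list of paths yourself, for instance to sort or filter files before opening them. The folder and file names are made up for the illustration, and xdas may resolve patterns with its own logic.

```python
import glob
import os
import tempfile

with tempfile.TemporaryDirectory() as folder:
    # Create a few empty files standing in for an acquisition folder
    for name in ["b.hdf5", "a.hdf5", "notes.txt"]:
        open(os.path.join(folder, name), "w").close()
    # glob returns matches in arbitrary order, so sort them
    # (chronological only if the file names encode the acquisition time)
    paths = sorted(glob.glob(os.path.join(folder, "*.hdf5")))
    names = [os.path.basename(p) for p in paths]

print(names)  # ['a.hdf5', 'b.hdf5']
```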

Class-based solution

To add support in a more complete way, you can also create your own engine by inheriting from the xdas.io.Engine abstract class. When the class is defined, the name keyword argument registers the new engine, and the aliases keyword argument registers additional names for it, which is useful when several instruments share the same data format. Registered engines are stored in Engine._registry and can be retrieved with Engine[name].

The _supported_vtypes and _supported_ctypes class attributes declare which virtualization backends and coordinate types can be used with the file format. When opening a file, you can additionally pass the vtype and ctype keyword arguments to choose the backends. The __init__ method defined by Engine checks these keyword arguments and stores the chosen backends in self.vtype and self.ctype.

import h5py
import numpy as np
from xdas import DataArray
from xdas.coordinates import Coordinate
from xdas.io import Engine
from xdas.virtual import VirtualSource

class MyEngine(Engine, name="my_engine", aliases=["other_engine"]):
    # Backends supported by this format, checked against the vtype/ctype kwargs
    _supported_vtypes = ["hdf5"]
    _supported_ctypes = {
        "distance": ["interpolated", "sampled", "dense"],
        "time": ["interpolated", "sampled", "dense"],
    }

    def open_dataarray(self, fname):
        with h5py.File(fname, "r") as file:
            t0 = np.datetime64(file["dataset"].attrs["t0"]).astype("datetime64[ms]")
            dt = np.timedelta64(int(round(file["dataset"].attrs["dt"] * 1e3)), "ms")
            x0 = file["dataset"].attrs["x0"][()]
            dx = file["dataset"].attrs["dx"][()]
            data = VirtualSource(file["dataset"])
        nt, nx = data.shape
        # Build each coordinate with the type chosen through the ctype kwarg
        t = Coordinate[self.ctype["time"]].from_block(t0, nt, dt, dim="time")
        x = Coordinate[self.ctype["distance"]].from_block(x0, nx, dx, dim="distance")
        return DataArray(data, {"time": t, "distance": x})

Once the class is defined, the engine is registered and can be used by its name:

# Replace "other_format.hdf5" by the path of your file
da = xd.open("other_format.hdf5", engine="my_engine", ctype="sampled")
da
<xdas.DataArray (time: 20, distance: 10)>
VirtualSource: 800.0B (float32)
Coordinates:
  * time (time): 2024-01-01T14:00:00.000 to 2024-01-01T14:00:00.200
  * distance (distance): 0.000 to 100.000
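To picture what happens when MyEngine is defined and when ctype="sampled" is passed, the stand-alone sketch below mimics the registration and backend checks described above with plain Python. EngineSketch is hypothetical and is not how xdas.io.Engine is actually written. In particular, the rules that a single ctype string applies to every dimension and that the first listed backend is the default are assumptions made for the illustration.

```python
class EngineSketch:
    """Hypothetical stand-in for xdas.io.Engine (not the real implementation)."""

    _registry = {}
    _supported_vtypes = []
    _supported_ctypes = {}

    def __init_subclass__(cls, name=None, aliases=(), **kwargs):
        # Runs when a subclass is *defined*: register it under each of its names
        super().__init_subclass__(**kwargs)
        if name is not None:
            for key in (name, *aliases):
                if key in EngineSketch._registry:
                    raise ValueError(f"engine {key!r} is already registered")
                EngineSketch._registry[key] = cls

    def __class_getitem__(cls, name):
        # Enables EngineSketch["my_engine"] lookups
        return EngineSketch._registry[name]

    def __init__(self, vtype=None, ctype=None):
        # Assumed default: the first supported virtualization backend
        vtype = self._supported_vtypes[0] if vtype is None else vtype
        if vtype not in self._supported_vtypes:
            raise ValueError(f"unsupported vtype {vtype!r}")
        # Assumed: None picks the first type per dimension, a string applies to all
        if ctype is None:
            ctype = {dim: types[0] for dim, types in self._supported_ctypes.items()}
        elif isinstance(ctype, str):
            ctype = {dim: ctype for dim in self._supported_ctypes}
        for dim, kind in ctype.items():
            if kind not in self._supported_ctypes.get(dim, []):
                raise ValueError(f"unsupported ctype {kind!r} for {dim!r}")
        self.vtype = vtype
        self.ctype = ctype


class MyEngine(EngineSketch, name="my_engine", aliases=["other_engine"]):
    _supported_vtypes = ["hdf5"]
    _supported_ctypes = {
        "distance": ["interpolated", "sampled", "dense"],
        "time": ["interpolated", "sampled", "dense"],
    }


# Look the engine up by its alias and instantiate it with a coordinate type
engine = EngineSketch["other_engine"](ctype="sampled")
print(type(engine).__name__, engine.vtype, engine.ctype)
# MyEngine hdf5 {'distance': 'sampled', 'time': 'sampled'}
```

Registering at class definition time through __init_subclass__ is what makes the engine available by name without any explicit registration call.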