Interpolated Coordinates

Coordinate

Because DAS data are generally sampled at a constant rate in time and a constant resolution in distance, storing the coordinate value of every index as a dense array is inefficient. xdas instead stores coordinates following the CF conventions, through the xdas.Coordinate object: only a few tie points are kept, and intermediate values are retrieved by linear interpolation. Discontinuities are marked by two tie points at consecutive indices, as illustrated below:

The resulting coordinate vector is sparse but contains all the information necessary to exactly recover the original, dense coordinate vector.
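The round trip between a dense vector and its tie points can be sketched in plain NumPy. This is not the xdas API: encode() and decode() are hypothetical helpers, and the dense vector is an assumed example with a jump between positions 19 and 20. encode() keeps a tie point wherever the sampling step changes, so the jump produces two tie points at consecutive indices, and decode() recovers every sample by linear interpolation.

```python
import numpy as np


def encode(dense):
    """Hypothetical encoder: keep a tie point wherever the sampling step changes."""
    dense = np.asarray(dense, dtype=float)
    steps = np.diff(dense)
    # A step that differs from the previous one starts a new linear segment
    breaks = np.nonzero(steps[1:] != steps[:-1])[0] + 1
    tie_indices = np.concatenate(([0], breaks, [len(dense) - 1]))
    return tie_indices, dense[tie_indices]


def decode(tie_indices, tie_values, size):
    """Hypothetical decoder: linear interpolation between tie points."""
    return np.interp(np.arange(size), tie_indices, tie_values)


# Assumed example: 0, 10, ..., 190 then a jump to 400, 410, ..., 490
dense = np.concatenate((np.arange(0, 200, 10), np.arange(400, 500, 10))).astype(float)
tie_indices, tie_values = encode(dense)
print(tie_indices.tolist())  # [0, 19, 20, 29]
print(tie_values.tolist())  # [0.0, 190.0, 400.0, 490.0]
print(np.array_equal(decode(tie_indices, tie_values, dense.size), dense))  # True
```

The exact step comparison works here because the samples are integer-valued; real floating-point coordinates would need a tolerance (for example np.isclose).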

Creating a Coordinate

The xdas.Coordinate constructor takes tie_indices and tie_values as inputs. The code below corresponds to the example illustrated in the figure above:

import xdas as xd

coord = xd.Coordinate(
    {
        "tie_indices": [0, 9, 19, 20, 29],
        "tie_values": [0.0, 90.0, 190.0, 400.0, 490.0]
    }
)
coord 
0.000 to 490.000

The resulting object behaves like a numpy.ndarray: indexing and slicing work out of the box. Note that when slicing with a step greater than 1, the tie points may be slightly displaced.

coord = coord[1:-3:2]
coord
10.000 to 450.000
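To see which samples this slice keeps, one can expand the tie points of the coordinate created above into a dense vector with np.interp and slice that instead. This is only a consistency check in plain NumPy; xdas works on the tie points directly and may place the new tie points differently.

```python
import numpy as np

# Tie points of the coordinate created above (before slicing)
tie_indices = [0, 9, 19, 20, 29]
tie_values = [0.0, 90.0, 190.0, 400.0, 490.0]

# Dense equivalent of the 30-sample coordinate, then the same slice as coord[1:-3:2]
dense = np.interp(np.arange(30), tie_indices, tie_values)
sliced = dense[1:-3:2]
print(sliced[0], sliced[-1], len(sliced))  # 10.0 450.0 13
print(sliced[9], sliced[10])  # 190.0 410.0
```

The jump now falls between positions 9 and 10 of the sliced vector, while the original tie point at index 20 is not part of the slice, so the tie points of the sliced coordinate have to be recomputed rather than copied.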

A major advantage of xdas.Coordinate is that it enables label-based selection. For instance, the to_index() method retrieves the index corresponding to a given value:

coord.to_index(430.0)
np.int64(11)
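Label-based lookup amounts to inverting the piecewise-linear mapping from index to value. A minimal sketch, assuming the tie points of the coordinate as first created (before slicing) and a hypothetical value_to_index() helper built on np.interp; it is not how xdas implements to_index().

```python
import numpy as np


def value_to_index(tie_indices, tie_values, value):
    """Hypothetical helper: invert the index -> value mapping by interpolation."""
    # np.interp expects increasing x-coordinates, hence strictly increasing tie_values
    position = np.interp(value, tie_values, tie_indices)
    return int(np.rint(position))


idx = value_to_index([0, 9, 19, 20, 29], [0.0, 90.0, 190.0, 400.0, 490.0], 430.0)
print(idx)  # 23 in the unsliced coordinate
print((idx - 1) // 2)  # 11, the same sample in coord[1:-3:2]
```

Index 23 of the unsliced coordinate becomes position (23 - 1) // 2 = 11 after slicing, consistent with the output above. Note that np.interp clamps values outside the covered range and returns fractional positions for values falling inside a gap; a real implementation has to handle both cases.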

Warning

Label-based selection requires tie_values to be strictly increasing; in other words, there must not be any overlap. One way to deal with small overlaps is to simplify the coordinate with a tolerance large enough for the overlapping tie points to disappear.
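Why the requirement matters: with an overlap, some values occur at two different indices, so mapping a value back to an index is ambiguous. A hypothetical check on the tie values, in plain NumPy; the overlapping example is assumed for illustration.

```python
import numpy as np


def is_strictly_increasing(tie_values):
    """Hypothetical check: each tie value must exceed the previous one."""
    return bool(np.all(np.diff(np.asarray(tie_values, dtype=float)) > 0))


tie_indices = [0, 9, 19, 20, 29]
print(is_strictly_increasing([0.0, 90.0, 190.0, 400.0, 490.0]))  # True: a gap is fine
# Assumed overlap: after index 19 (value 190.0) the labels restart at 185.0
print(is_strictly_increasing([0.0, 90.0, 190.0, 185.0, 275.0]))  # False
```

In the second case, a value such as 187.0 lies both on the segment ending at index 19 and on the segment starting at index 20.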

Gaps and Overlaps

Gaps and overlaps can easily be identified from the tie point positions and extracted with:

coord.get_discontinuities()
   start_index  end_index  start_value  end_value  delta  type
0           10         11        410.0      430.0   20.0   gap

While gaps represent missing data and are not problematic, overlaps usually arise from labeling errors and should be dealt with.
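The tie-point logic behind such a table can be sketched as follows: a discontinuity is any pair of tie points at consecutive indices. The helper find_discontinuities() is hypothetical, and classifying a positive delta (end_value - start_value) as a gap and a negative one as an overlap is an assumption that matches the columns shown above, not a description of the xdas source.

```python
import numpy as np


def find_discontinuities(tie_indices, tie_values):
    """Hypothetical helper: report tie-point pairs at consecutive indices."""
    tie_indices = np.asarray(tie_indices)
    tie_values = np.asarray(tie_values, dtype=float)
    records = []
    for k in np.nonzero(np.diff(tie_indices) == 1)[0]:
        start_value, end_value = tie_values[k], tie_values[k + 1]
        delta = end_value - start_value
        records.append({
            "start_index": int(tie_indices[k]),
            "end_index": int(tie_indices[k + 1]),
            "start_value": float(start_value),
            "end_value": float(end_value),
            "delta": float(delta),
            # Assumption: the sign of the jump decides the type
            "type": "gap" if delta > 0 else "overlap",
        })
    return records


records = find_discontinuities([0, 9, 19, 20, 29], [0.0, 90.0, 190.0, 400.0, 490.0])
overlaps = find_discontinuities([0, 19, 20, 29], [0.0, 190.0, 185.0, 275.0])
print(records)  # one gap between indices 19 and 20, delta 210.0
print(overlaps)  # one overlap between indices 19 and 20, delta -5.0
```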

The simplify() method reduces the number of tie points with controlled accuracy using the Ramer–Douglas–Peucker algorithm. In this example, the second tie point carries no additional information and is safely discarded.

coord = coord.simplify(tolerance=0.0)
coord
10.000 to 450.000
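For illustration, here is a compact Ramer-Douglas-Peucker pass over tie points. It is a variant, not the xdas implementation: the distance to the chord is measured vertically (the interpolation error at a tie index) rather than perpendicularly, and rdp_simplify() is a hypothetical name. Applied with a zero tolerance to the tie points of the coordinate as first created, it drops the redundant tie point at index 9 and keeps both sides of the jump.

```python
import numpy as np


def rdp_simplify(tie_indices, tie_values, tolerance):
    """Hypothetical Ramer-Douglas-Peucker variant using vertical interpolation error."""
    x = np.asarray(tie_indices, dtype=float)
    y = np.asarray(tie_values, dtype=float)
    keep = np.zeros(len(x), dtype=bool)
    keep[0] = keep[-1] = True  # endpoints are always kept
    stack = [(0, len(x) - 1)]
    while stack:
        start, end = stack.pop()
        if end - start < 2:
            continue  # no interior tie point to test
        inner = np.arange(start + 1, end)
        # Values predicted by the chord between the two endpoints
        predicted = y[start] + (y[end] - y[start]) * (x[inner] - x[start]) / (x[end] - x[start])
        errors = np.abs(y[inner] - predicted)
        k = int(np.argmax(errors))
        if errors[k] > tolerance:
            # Keep the worst-fitting tie point and refine both halves
            split = int(inner[k])
            keep[split] = True
            stack.append((start, split))
            stack.append((split, end))
    return np.asarray(tie_indices)[keep], np.asarray(tie_values)[keep]


idx, vals = rdp_simplify([0, 9, 19, 20, 29], [0.0, 90.0, 190.0, 400.0, 490.0], tolerance=0.0)
print(idx.tolist())  # [0, 19, 20, 29]
print(vals.tolist())  # [0.0, 190.0, 400.0, 490.0]
```

A chord cannot reproduce a jump between consecutive indices, so a discontinuity survives as long as the error it causes exceeds the tolerance.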

Temporal Coordinates

The main use of coordinates in xdas is to handle long time series. By default, xdas uses the "datetime64[us]" dtype. Microsecond precision is chosen because, to perform interpolation, xdas converts datetime64 values to floating-point POSIX timestamps, which cannot safely represent finer precision.
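The precision argument can be checked with plain NumPy, assuming the POSIX float is a 64-bit float holding seconds since 1970-01-01. Around 2023, adjacent float64 values are about 0.24 microseconds apart, so microsecond ticks survive a round trip through float seconds while nanosecond ticks do not. survives_float_roundtrip() is a hypothetical helper.

```python
import numpy as np

# POSIX time of 2023-01-01T00:00:00 in seconds, as a 64-bit float
t0 = float(np.datetime64("2023-01-01T00:00:00", "s").astype("int64"))
print(np.spacing(t0))  # 2.384185791015625e-07 s between adjacent floats


def survives_float_roundtrip(t, ticks_per_second):
    """Convert integer ticks to float POSIX seconds and back; True if nothing is lost."""
    ticks = int(t.astype("int64"))
    seconds = ticks / ticks_per_second
    return int(round(seconds * ticks_per_second)) == ticks


us_ok = survives_float_roundtrip(np.datetime64("2023-01-01T00:00:00.000001", "us"), 10**6)
ns_ok = survives_float_roundtrip(np.datetime64("2023-01-01T00:00:00.000000001", "ns"), 10**9)
print(us_ok, ns_ok)  # True False
```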

import numpy as np

coord = xd.Coordinate(
    {
        "tie_indices": [0, 3600 * 100],
        "tie_values": [
            np.datetime64("2023-01-01T00:00:00"), 
            np.datetime64("2023-01-01T01:00:00"),
        ],
    }
)
coord.to_index(slice("2023-01-01T00:10:00", "2023-01-01T00:20:00"))
slice(np.int64(60000), np.int64(120001), None)
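The returned indices follow from the sampling rate: the two tie points span 3600 × 100 samples in one hour, i.e. 100 samples per second, so 00:10:00 maps to 600 × 100 = 60000 and 00:20:00 to 120000. The stop of 120001 suggests that the end label is included, whereas Python slices exclude their stop. A sketch with a hypothetical time_to_index() helper; unlike the POSIX float conversion described above, it interpolates on seconds relative to the first tie value, which keeps the floats small.

```python
import numpy as np

# Tie points of the example: 100 Hz over one hour
tie_indices = np.array([0, 3600 * 100])
tie_values = np.array(["2023-01-01T00:00:00", "2023-01-01T01:00:00"], dtype="datetime64[us]")


def time_to_index(t):
    """Hypothetical helper: map a timestamp to the nearest sample index."""
    seconds = (tie_values - tie_values[0]) / np.timedelta64(1, "s")
    s = (np.datetime64(t, "us") - tie_values[0]) / np.timedelta64(1, "s")
    return int(np.rint(np.interp(s, seconds, tie_indices)))


start = time_to_index("2023-01-01T00:10:00")
stop = time_to_index("2023-01-01T00:20:00") + 1  # include the end label
print(slice(start, stop))  # slice(60000, 120001, None)
```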