---
file_format: mystnb
kernelspec:
  name: python3
---

```{code-cell}
:tags: [remove-cell]

import os
import xdas as xd
os.chdir("../../_data")
```

# Composing a processing sequence

The xdas library provides various routines from NumPy, SciPy, and ObsPy that have been optimized for DAS DataArray objects, and which can be incorporated in a processing pipeline. See [](processing) for an explanation of the xdas processing workflows, e.g. for bigger-than-RAM datasets. Higher-level operations (FK-filters, STA/LTA detector, etc.) can be constructed from a sequence of the elementary operations implemented in xdas. To facilitate this and other user-defined operations, xdas offers a convenient framework to create and execute a (nested) sequences of atomic operations. By using sequences, built-in and user-defined processing tasks mesh seamlessly with the optimization and IO-infrastructure that xdas offers, improving the robustness and reproducibility of complex processing pipelines.

## Chaining elementary operations (atoms)

There are three "flavours" declaring the atoms that can be used to compose a sequence, illustrated by the following example:

```{code-cell} 
import numpy as np
import xdas
import xdas.signal as xs
from xdas.atoms import Partial, Sequential, IIRFilter

sequence = Sequential(
    [
      xs.taper(..., dim="time"),
      Partial(np.square),
      IIRFilter(order=4, cutoff=1.5, btype="highpass", dim="time"),
    ]
)
sequence
```

In the snippet above, we define our `sequence` as an instance of the `Sequential` class, which contains three operations. The first operation applies a Tukey taper along the time dimension, encoded by the xdas implementation of the SciPy library routines (`xdas.signal`). Since this functions takes a data array as the first argument, we use `...` as a placeholder. 

The second operation in this sequence is defined by the `square` operation built into NumPy. Since this function is not imported directly from xdas, using `...` as a placeholder won't work. This is where `Partial` comes in: wrapping `Partial` around `np.square` would be equivalent to `np.square(...)`, effectively converting an arbitrary routine into an xdas routine and inserting a placeholder as the first argument (to be substituted with a data array later).

The last operation, `IIRFilter`, instantiates a specific class dedicated to chunked execution. It inherits from the `Atom` class, which handles the logic of initialising and passing around state objects (like the filter state). This allows us to process our data one chunk at a time, without explicitly having to handle state updates and transfer.

## Executing a sequence

Once the processing sequence has been defined, it can operate on data in memory by simply calling the sequence with the data array as the argument:

```{code-cell} 
from xdas.synthetics import wavelet_wavefronts

da = wavelet_wavefronts()
result = sequence(da)
result.plot(yincrease=False)
```

The same sequence can be re-used, so it only needs to be defined once.

For executing a sequence on chunked data (e.g., larger-than-memory data sets), see the next section: [](processing.md).

## Defining custom atoms

The `Partial` method is a convenient wrapper for simple functions that take an xdas DataArray as the first argument, which covers a lot of cases. However, more complex routines, particularly those that rely on a state, will require a more explicit treatment. Such operations can be subclassed from the `Atom` base class, and adhere to the following structure:

```{code-cell} 
from xdas.atoms import Atom, State

class MyStatefulRoutine(Atom):

  def __init__(self, a, b, c=10):
    super().__init__()
    # Set class-specific parameters
    self.a = a
    self.b = b
    self.c = c
    # Define the state variable (if needed)
    self.state = State(...)

  def initialize(self, da, **kwargs):
    # Initialize state based on DataArray ``da``
    ...
  
  def call(self, da, **kwargs):
    # Apply routine to DataArray ``da``
    ...
```