Parallel I/O using MPI
The HDF5 storage backend supports parallel I/O using the Message Passing Interface (MPI).
Using this feature requires that hdf5 and h5py are built against an MPI driver and that
mpi4py is installed. The basic installation of pynwb will not work. Setup can be tricky and
is outside the scope of this tutorial (for now); the following assumes that you have
HDF5 installed in an MPI configuration.
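Before running the example, it can help to confirm that the installed stack actually has MPI support. The following is a minimal sketch, not part of the original tutorial: `mpi_io_support` is a hypothetical helper name, and it relies on `h5py.get_config().mpi`, which reports whether h5py was built against a parallel HDF5 library.

```python
import importlib.util


def mpi_io_support():
    """Report which parts of the parallel I/O stack appear usable (hypothetical helper)."""
    report = {
        "mpi4py": importlib.util.find_spec("mpi4py") is not None,
        "h5py": importlib.util.find_spec("h5py") is not None,
        "h5py_mpi": False,
    }
    if report["h5py"]:
        try:
            import h5py

            # True only when h5py was built against an MPI-enabled HDF5
            report["h5py_mpi"] = bool(h5py.get_config().mpi)
        except ImportError:
            # h5py is present but cannot be loaded (e.g. missing shared libraries)
            report["h5py"] = False
    return report


print(mpi_io_support())
```

If `h5py_mpi` is reported as `False`, the parallel steps in this tutorial are not expected to work with that installation.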
Here we:
1. Instantiate a dataset for parallel write: We create a TimeSeries with 4 time steps that we will write in parallel
2. Write to that file in parallel using MPI: Here we assume 4 MPI ranks, where each rank writes the data for a different time step
3. Read from the file in parallel using MPI: Here each of the 4 MPI ranks reads one time step from the file
from mpi4py import MPI
import numpy as np
from dateutil import tz
from pynwb import NWBHDF5IO, NWBFile, TimeSeries
from datetime import datetime
from hdmf.backends.hdf5.h5_utils import H5DataIO
start_time = datetime(2018, 4, 25, 2, 30, 3, tzinfo=tz.gettz("US/Pacific"))
fname = "test_parallel_pynwb.nwb"
rank = MPI.COMM_WORLD.rank # The process ID (integer 0-3 for 4-process run)
# Create file on one rank. Here we only instantiate the dataset we want to
# write in parallel but we do not write any data
if rank == 0:
    nwbfile = NWBFile("aa", "aa", start_time)
    data = H5DataIO(shape=(4,), maxshape=(4,), dtype=np.dtype("int"))
    nwbfile.add_acquisition(
        TimeSeries(name="ts_name", description="desc", data=data, rate=100.0, unit="m")
    )
    with NWBHDF5IO(fname, "w") as io:
        io.write(nwbfile)
# write to dataset in parallel
with NWBHDF5IO(fname, "a", comm=MPI.COMM_WORLD) as io:
    nwbfile = io.read()
    print(rank)
    nwbfile.acquisition["ts_name"].data[rank] = rank
# read from dataset in parallel
with NWBHDF5IO(fname, "r", comm=MPI.COMM_WORLD) as io:
    print(io.read().acquisition["ts_name"].data[rank])
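The example above assumes exactly one element per rank. When the number of elements differs from the number of ranks, each rank needs its own index range. The following is a sketch of one common way to split indices into contiguous blocks; `rank_slice` is a hypothetical helper and not part of pynwb or mpi4py.

```python
def rank_slice(n_items, n_ranks, rank):
    """Return the contiguous slice of range(n_items) assigned to `rank`."""
    if not 0 <= rank < n_ranks:
        raise ValueError("rank must satisfy 0 <= rank < n_ranks")
    base, extra = divmod(n_items, n_ranks)
    # The first `extra` ranks each take one additional item.
    start = rank * base + min(rank, extra)
    stop = start + base + (1 if rank < extra else 0)
    return slice(start, stop)


# Serial demonstration: 10 items divided among 4 ranks.
for r in range(4):
    print(r, rank_slice(10, 4, r))
```

Inside the parallel write block, a rank could then assign to `data[rank_slice(len(data), MPI.COMM_WORLD.size, rank)]` instead of `data[rank]`. The script itself is typically started through an MPI launcher such as `mpirun` or `mpiexec` with the desired number of processes; the exact command depends on the MPI installation.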
Note
Using hdmf.backends.hdf5.h5_utils.H5DataIO we can also specify further
details about the data layout, e.g., via the chunking and compression parameters.
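As an illustration of such layout settings, the sketch below builds H5DataIO keyword arguments so that each rank writes to its own chunk. The helper names `per_rank_layout` and `make_dataio` are hypothetical, and the one-chunk-per-rank layout is an assumption about what suits a given workload, not a recommendation from the pynwb authors. Whether a compressed dataset can be written in parallel depends on the HDF5 version and on the I/O mode used, so compression is left off by default here and should be verified for your setup.

```python
def per_rank_layout(n_ranks, samples_per_rank, compression=None):
    """Build H5DataIO keyword arguments with one chunk per rank (hypothetical helper)."""
    total = n_ranks * samples_per_rank
    kwargs = {
        "shape": (total,),
        "maxshape": (total,),
        "dtype": "int64",
        # Chunk boundaries line up with the block each rank writes
        "chunks": (samples_per_rank,),
    }
    if compression is not None:
        kwargs["compression"] = compression
    return kwargs


def make_dataio(**kwargs):
    """Create the H5DataIO placeholder; requires hdmf and numpy to be installed."""
    # Imported lazily so per_rank_layout can be inspected without hdmf
    import numpy as np
    from hdmf.backends.hdf5.h5_utils import H5DataIO

    kwargs["dtype"] = np.dtype(kwargs["dtype"])
    return H5DataIO(**kwargs)


print(per_rank_layout(4, 1000))
```

The result of `make_dataio(**per_rank_layout(4, 1000))` could be passed as `data` to `TimeSeries` in place of the `H5DataIO(...)` call used in the example.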