Modular Data Storage using External Files¶
PyNWB supports linking between files using external links.
Example Use Case: Integrating data from multiple files¶
NWBContainer classes (e.g., TimeSeries) support the integration of data stored in external
HDF5 files with NWB data files via external links. To make things more concrete, let’s look at the following use
case. We want to simultaneously record multiple data streams during data acquisition. Using external
links allows us to save each data stream to a separate external HDF5 file during data acquisition and to
afterwards link the data into a single NWB:N file. In this case, each recording is represented by a
separate file-system object that can be set as read-only once the experiment is done. In the following
we use TimeSeries as an example, but the same approach works for other NWBContainers as well.
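Setting a finished recording to read-only can be done with the standard library alone. The following is a minimal sketch, not part of PyNWB: the helper name `make_read_only` and the demo file are hypothetical, and on Windows `os.chmod` only toggles the read-only flag.

```python
import os
import stat
import tempfile


def make_read_only(path):
    """Remove the write permission bits for user, group, and others."""
    mode = os.stat(path).st_mode
    os.chmod(path, mode & ~(stat.S_IWUSR | stat.S_IWGRP | stat.S_IWOTH))


# Hypothetical stand-in for a recording that has just been written
workdir = tempfile.mkdtemp()
demo_path = os.path.join(workdir, "recording_stream1.h5")
with open(demo_path, "wb") as f:
    f.write(b"placeholder")

make_read_only(demo_path)
is_user_writable = bool(os.stat(demo_path).st_mode & stat.S_IWUSR)
```

Note that a privileged user can still modify such a file, so this guards against accidental edits rather than enforcing immutability.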
The same strategies we use here for creating External Links also apply to Soft Links. The main difference between soft and external links is that soft links point to other objects within the same file while external links point to objects in external files.
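To illustrate the difference at the HDF5 level, here is a small sketch using h5py directly rather than PyNWB. The file names and dataset names are made up for the example. A soft link stores a path inside the same file, while an external link stores a file name plus a path inside that other file.

```python
import os
import tempfile

import h5py
import numpy as np

workdir = tempfile.mkdtemp()
raw_path = os.path.join(workdir, "raw_stream.h5")  # hypothetical external file
main_path = os.path.join(workdir, "main.h5")       # hypothetical linking file

# The external file holds the actual data
with h5py.File(raw_path, "w") as raw:
    raw.create_dataset("data", data=np.arange(10))

with h5py.File(main_path, "w") as main:
    main.create_dataset("local_data", data=np.arange(5))
    # Soft link: points to another object within the same file
    main["soft_alias"] = h5py.SoftLink("/local_data")
    # External link: points to an object in a different file
    # (a relative file name is resolved relative to the linking file)
    main["external_alias"] = h5py.ExternalLink("raw_stream.h5", "/data")

with h5py.File(main_path, "r") as main:
    soft_values = main["soft_alias"][:]
    external_values = main["external_alias"][:]
    soft_kind = type(main.get("soft_alias", getlink=True)).__name__
    external_kind = type(main.get("external_alias", getlink=True)).__name__
```

Reading through either link looks the same to the caller; only `get(..., getlink=True)` reveals which kind of link is stored.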
In the case of TimeSeries, the uncorrected timestamps generated by the acquisition
system can be stored (or linked) in the sync group. In the NWB:N format, hardware-recorded time data
must then be corrected to a common time base (e.g., timestamps from all hardware sources aligned) before
it can be included in the timestamps of the TimeSeries. This means that, in the case of
TimeSeries, we need to be careful not to include data with incompatible
timestamps in the same file when using external links.
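As a rough illustration of what correcting to a common time base can mean, the sketch below handles only the simplest case: it assumes each device clock differs from the session clock by a known constant offset. The function name, the offset value, and the sample timestamps are all hypothetical, and real setups may also need drift correction.

```python
import numpy as np


def to_common_time_base(raw_timestamps, clock_offset):
    """Shift raw device timestamps (in seconds) onto the common session clock.

    Assumes a constant offset between the device clock and the session clock.
    """
    return np.asarray(raw_timestamps, dtype=float) - clock_offset


# Hypothetical camera clock that read 10.0 s at session start
camera_raw = np.array([10.0, 10.5, 11.0])
camera_offset = 10.0
aligned = to_common_time_base(camera_raw, camera_offset)
```

Only timestamps that have been aligned in some such way would go into the timestamps of a TimeSeries; the raw values belong in the sync group.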
External links can become stale or break. Because external links point to data in other files, they may become invalid whenever those files are modified on the file system, e.g., renamed, moved, or given different access permissions.
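One way to detect such stale links before sharing or opening a file is to walk the HDF5 hierarchy and check that each external link's target file still exists. This is a sketch using h5py, not a PyNWB feature. The helper `find_stale_external_links` is hypothetical, it assumes relative file names are resolved relative to the linking file's directory, and it only checks that the target file exists, not that the linked object or permissions are intact.

```python
import os
import tempfile

import h5py
import numpy as np


def find_stale_external_links(path):
    """Return (link_path, target_filename) pairs whose target file is missing."""
    stale = []
    base_dir = os.path.dirname(os.path.abspath(path))

    def walk(group, prefix):
        for name in group:
            link = group.get(name, getlink=True)
            full_name = f"{prefix}/{name}"
            if isinstance(link, h5py.ExternalLink):
                target = os.path.join(base_dir, link.filename)
                if not os.path.exists(target):
                    stale.append((full_name, link.filename))
            elif isinstance(link, h5py.HardLink):
                obj = group[name]
                if isinstance(obj, h5py.Group):
                    walk(obj, full_name)  # descend only into local groups

    with h5py.File(path, "r") as f:
        walk(f, "")
    return stale


# Demo with made-up file names
workdir = tempfile.mkdtemp()
raw_path = os.path.join(workdir, "raw_stream.h5")
main_path = os.path.join(workdir, "main.h5")
with h5py.File(raw_path, "w") as raw:
    raw.create_dataset("data", data=np.arange(10))
with h5py.File(main_path, "w") as main:
    main["stream"] = h5py.ExternalLink("raw_stream.h5", "/data")

stale_before = find_stale_external_links(main_path)
os.rename(raw_path, os.path.join(workdir, "renamed_stream.h5"))  # break the link
stale_after = find_stale_external_links(main_path)
```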
Creating test data¶
In the following we create two TimeSeries, each written to a separate file.
We then show how we can integrate these files into a single NWBFile.
```python
from datetime import datetime
from uuid import uuid4

import numpy as np
from dateutil.tz import tzlocal

from pynwb import NWBHDF5IO, NWBFile, TimeSeries

# Create the base data
start_time = datetime(2017, 4, 3, 11, tzinfo=tzlocal())
data = np.arange(1000).reshape((100, 10))
timestamps = np.arange(100)
filename1 = "external1_example.nwb"
filename2 = "external2_example.nwb"
filename3 = "external_linkcontainer_example.nwb"
filename4 = "external_linkdataset_example.nwb"

# Create the first file
nwbfile1 = NWBFile(
    session_description="demonstrate external files",
    identifier=str(uuid4()),
    session_start_time=start_time,
)
# Create the first TimeSeries and add it to the first file
test_ts1 = TimeSeries(
    name="test_timeseries1", data=data, unit="SIunit", timestamps=timestamps
)
nwbfile1.add_acquisition(test_ts1)

# Write the first file
with NWBHDF5IO(filename1, "w") as io:
    io.write(nwbfile1)

# Create the second file
nwbfile2 = NWBFile(
    session_description="demonstrate external files",
    identifier=str(uuid4()),
    session_start_time=start_time,
)
# Create the second TimeSeries and add it to the second file
test_ts2 = TimeSeries(
    name="test_timeseries2",
    data=data,
    unit="SIunit",
    timestamps=timestamps,
)
nwbfile2.add_acquisition(test_ts2)

# Write the second file
with NWBHDF5IO(filename2, "w") as io:
    io.write(nwbfile2)
```
Creating a single file for sharing¶
External links are convenient, but to share data we may want to hand a single file with all the
data to our collaborator rather than having to collect all relevant files. To do this,
HDF5IO (and in turn
provide the convenience function
which copies an HDF5 file and resolves all external links.
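The underlying idea of copying a file while resolving its external links can be sketched at the HDF5 level with h5py's `Group.copy` and its `expand_external` option. This is an illustration of the concept with made-up file names, not the PyNWB convenience function itself, and it copies only top-level objects, not attributes on the root group.

```python
import os
import tempfile

import h5py
import numpy as np

workdir = tempfile.mkdtemp()
raw_path = os.path.join(workdir, "raw_stream.h5")    # hypothetical data file
linked_path = os.path.join(workdir, "linked.h5")     # file with external link
merged_path = os.path.join(workdir, "merged.h5")     # single file for sharing

with h5py.File(raw_path, "w") as raw:
    raw.create_dataset("data", data=np.arange(10))
with h5py.File(linked_path, "w") as linked:
    linked["stream"] = h5py.ExternalLink("raw_stream.h5", "/data")

# Copy every top-level object, replacing external links by the linked data
with h5py.File(linked_path, "r") as src, h5py.File(merged_path, "w") as dst:
    for name in src:
        src.copy(name, dst, name=name, expand_external=True)

# The merged file no longer depends on the original data file
os.remove(raw_path)
with h5py.File(merged_path, "r") as merged:
    merged_kind = type(merged.get("stream", getlink=True)).__name__
    merged_values = merged["stream"][:]
```

The resulting file is larger than the linking file because it now contains the data itself, which is the trade-off for being shareable on its own.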