IO HDF5 module

The io_hdf5 submodule offers the following functions to read, write and manipulate data in the Hierarchical Data Format HDF5:

Function	Description
`read_hdf5()`	Reads a single group dataset from an HDF5 file.
`read_collection_hdf5()`	Reads multiple group datasets from an HDF5 file.
`write_hdf5()`	Writes a single group dataset in an HDF5 file.
`write_collection_hdf5()`	Writes a collection in an HDF5 file.
`rename_dataset_hdf5()`	Renames a group dataset in an HDF5 file.
`delete_dataset_hdf5()`	Deletes a group dataset in an HDF5 file.
`summary_hdf5()`	Returns a summary of datasets in an HDF5 file.

araucaria.io.io_hdf5.read_hdf5(fpath, name)

Reads a single group dataset from an HDF5 file.

Parameters

fpath (Path) – Path to HDF5 file.
name (str) – Dataset name to retrieve from the HDF5 file.

Return type

Group

Returns

Group containing the requested dataset.

Raises

IOError – If the HDF5 file does not exist in the specified path.
ValueError – If name does not exist in the HDF5 file.

Example

>>> from araucaria import Group
>>> from araucaria.testdata import get_testpath
>>> from araucaria.utils import check_objattrs
>>> from araucaria.io import read_hdf5
>>> fpath = get_testpath('Fe_database.h5')
>>> # extracting goethite scan
>>> group_mu = read_hdf5(fpath, name='Goethite_20K')
>>> check_objattrs(group_mu, Group, attrlist=['mu', 'mu_ref'])
[True, True]

araucaria.io.io_hdf5.read_collection_hdf5(fpath, names=['all'])

Reads multiple group datasets from an HDF5 file.

Parameters

fpath (Path) – Path to HDF5 file.
names (list) – List with group datasets to read.

Return type

Collection

Returns

Collection containing the requested datasets.

Raises

IOError – If the HDF5 file does not exist in the specified path.
ValueError – If the requested names do not exist in the HDF5 file.

Warning

The HDF5 file does not store the tags attribute of a Collection. Therefore the returned collection will automatically assign tag='scan' to each group dataset.

Example

>>> from araucaria import Collection
>>> from araucaria.testdata import get_testpath
>>> from araucaria.utils import check_objattrs
>>> from araucaria.io import read_collection_hdf5
>>> fpath = get_testpath('Fe_database.h5')
>>> # reading database
>>> collection = read_collection_hdf5(fpath)
>>> check_objattrs(collection, Collection)
True
>>> collection.get_names()
['FeIISO4_20K', 'Fe_Foil', 'Ferrihydrite_20K', 'Goethite_20K']

>>> # read selected group datasets
>>> collection = read_collection_hdf5(fpath, names=['Fe_Foil'])
>>> collection.get_names()
['Fe_Foil']
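
The handling of the names argument can be sketched without an HDF5 file. The helper below, select_names(), is hypothetical and not part of araucaria; it assumes that the default ['all'] acts as a sentinel selecting every dataset, and that any unknown name raises a ValueError as documented.

```python
from typing import Iterable, List


def select_names(available: Iterable[str], names: List[str]) -> List[str]:
    """Resolve requested dataset names against those available in a file.

    Hypothetical helper: it mimics the documented behaviour of the
    ``names`` argument and is not part of araucaria.
    """
    available = list(available)
    # assumption: the default ['all'] selects every dataset in the file
    if names == ['all']:
        return available
    # unknown names are reported with a ValueError, as documented
    missing = [name for name in names if name not in available]
    if missing:
        raise ValueError(f"datasets not found in file: {missing}")
    return list(names)


stored = ['FeIISO4_20K', 'Fe_Foil', 'Ferrihydrite_20K', 'Goethite_20K']
print(select_names(stored, ['all']))
print(select_names(stored, ['Fe_Foil']))
```

Checking every requested name before reading anything means a typo in one name fails early instead of returning a partial collection.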

araucaria.io.io_hdf5.convert_bytes_hdf5(record)

Utility function to convert a bytes record from an HDF5 file.

Returned value will be either a dict, list, or str.

Parameters

record (Dataset) – HDF5 dataset record.

Return type

Union[dict, list, str]

Returns

Converted record.

Raises

TypeError – If the value stored inside record is not of type bytes.

Notes

araucaria stores dict or list records as bytes in the HDF5 file. Such records need to be converted back to their original types during reading.
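
A minimal sketch of such a conversion follows, using only the standard library. The function decode_record() is a hypothetical stand-in that takes raw bytes rather than an HDF5 dataset record, and the use of ast.literal_eval is an assumption; araucaria's own implementation may differ. The file names and parameter values are made up for illustration.

```python
import ast
from typing import Union


def decode_record(value: bytes) -> Union[dict, list, str]:
    """Convert a bytes record back to a dict, list or str.

    Hypothetical stand-in for illustration; convert_bytes_hdf5()
    receives an HDF5 dataset record instead of raw bytes.
    """
    if not isinstance(value, bytes):
        raise TypeError("record value is not of type bytes")
    text = value.decode('utf-8')
    try:
        # literal_eval safely parses Python literals such as dicts and lists
        parsed = ast.literal_eval(text)
    except (ValueError, SyntaxError):
        # not a Python literal: keep the decoded text
        return text
    # only dict and list are restored; anything else stays a str
    return parsed if isinstance(parsed, (dict, list)) else text


# made-up records, encoded the way a dict or list could be stored
merged = str(['scan_001.xmu', 'scan_002.xmu']).encode('utf-8')
params = str({'e0': 7112.0, 'nnorm': 2}).encode('utf-8')
print(decode_record(merged))
print(decode_record(params))
print(decode_record(b'transmission'))
```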

araucaria.io.io_hdf5.write_hdf5(fpath, group, name=None, replace=False)

Writes a group dataset in an HDF5 file.

Parameters

fpath (Path) – Path to HDF5 file.
group (Group) – Group to write in the HDF5 file.
name (Optional[str]) – Name for the dataset in the HDF5 file. The default is None, which preserves the original group name.
replace (bool) – Replace previous dataset. The default is False.

Return type

None

Returns

Raises

IOError – If dataset cannot be written to the HDF5 file.
TypeError – If group is not a valid Group instance.
ValueError – If name dataset already exists in the HDF5 file and replace=False.

Notes

If the file specified by fpath does not exist, it will be created automatically. If the file already exists, the dataset will be appended to it.

By default the write operation will be canceled if name already exists in the HDF5 file. The previous dataset can be overwritten with the option replace=True.
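
The append and replace rules can be illustrated with a plain dictionary standing in for the HDF5 file. The function store_dataset() is hypothetical and only mirrors the documented behaviour; it does not touch any file and is not araucaria's implementation.

```python
from typing import Any, Dict


def store_dataset(store: Dict[str, Any], name: str, data: Any,
                  replace: bool = False) -> None:
    """Add data under name, refusing to overwrite unless replace=True.

    The dict is a hypothetical stand-in for the HDF5 file.
    """
    if name in store and not replace:
        raise ValueError(f"{name} already exists; use replace=True to overwrite")
    store[name] = data


database: Dict[str, Any] = {}
# a new name is simply appended
store_dataset(database, 'xmu_testfile', [1.0, 2.0])
# an existing name is overwritten only on request
store_dataset(database, 'xmu_testfile', [3.0], replace=True)
try:
    # default behaviour: the write is cancelled
    store_dataset(database, 'xmu_testfile', [4.0])
except ValueError as error:
    print(error)
```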

Example

>>> from araucaria.testdata import get_testpath
>>> from araucaria.io import read_xmu, write_hdf5
>>> fpath = get_testpath('xmu_testfile.xmu')
>>> # extracting mu and mu_ref scans
>>> group_mu = read_xmu(fpath, scan='mu')
>>> # saving a new hdf5 file
>>> write_hdf5('database.h5', group_mu, name='xmu_testfile', replace=True)
xmu_testfile written to database.h5.

araucaria.io.io_hdf5.write_collection_hdf5(fpath, collection, names=['all'], replace=False)

Writes a collection in an HDF5 file.

Parameters

fpath (Path) – Path to HDF5 file.
collection (Collection) – Collection to write in the HDF5 file.
names (list) – List with group dataset names to write in the HDF5 file.
replace (bool) – Replace previous dataset. The default is False.

Return type

None

Returns

Raises

IOError – If dataset cannot be written to the HDF5 file.
ValueError – If names dataset does not exist in the collection.
ValueError – If names dataset already exists in the HDF5 file and replace=False.

Notes

If the file specified by fpath does not exist, it will be created automatically. If the file already exists, the datasets in the collection will be appended to it.

By default the write operation will be canceled if any names dataset in collection already exists in the HDF5 file. Previous datasets can be overwritten with the option replace=True.

Warning

The tags attribute of the collection will not be stored in the HDF5 file.

Example

>>> from araucaria.testdata import get_testpath
>>> from araucaria.io import read_collection_hdf5, write_collection_hdf5
>>> fpath = get_testpath('Fe_database.h5')
>>> # reading database
>>> collection = read_collection_hdf5(fpath)
>>> # saving collection in a new hdf5 file
>>> write_collection_hdf5('database.h5', collection, replace=True)
FeIISO4_20K written to database.h5.
Fe_Foil written to database.h5.
Ferrihydrite_20K written to database.h5.
Goethite_20K written to database.h5.

>>> # write selected group dataset
>>> write_collection_hdf5('database.h5', collection, names=['Fe_Foil'], replace=True)
Fe_Foil written to database.h5.

araucaria.io.io_hdf5.write_recursive_hdf5(dataset, group)

Utility function to write a Group recursively in an HDF5 file.

Parameters

dataset (Dataset) – Dataset in the HDF5 file.
group (Group) – Group to write in the HDF5 file.

Return type

None

Returns

Warning

Only str, float, int and ndarray types are currently supported for recursive writing in an HDF5 Dataset.

dict and list types will be converted to str, which is in turn saved as bytes in the HDF5 database. If read with read_hdf5(), such records will be automatically converted to their original type in the group.
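
The type handling described in this warning can be sketched as a small dispatch function. encode_value() is a hypothetical helper, not araucaria's code: ndarray is left out to keep the example free of third-party imports, and raising a TypeError for any other type is an assumption of this sketch rather than documented behaviour.

```python
import ast
from typing import Any, Union


def encode_value(value: Any) -> Union[str, float, int, bytes]:
    """Prepare a value for storage following the documented type rules.

    Hypothetical helper for illustration only.
    """
    # str, float and int are stored as they are
    if isinstance(value, (str, float, int)):
        return value
    # dict and list are converted to str and then saved as bytes
    if isinstance(value, (dict, list)):
        return str(value).encode('utf-8')
    # assumption of this sketch: other types are rejected
    raise TypeError(f"unsupported type: {type(value).__name__}")


record = encode_value({'name': 'Fe_Foil', 'scans': [1, 2, 3]})
print(type(record).__name__)
# the bytes record can be parsed back into the original dict
print(ast.literal_eval(record.decode('utf-8')))
```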

araucaria.io.io_hdf5.rename_dataset_hdf5(fpath, name, newname)

Renames a dataset in an HDF5 file.

Parameters

fpath (Path) – Path to HDF5 file.
name (str) – Name of Group dataset.
newname (str) – New name for Group dataset.

Return type

None

Returns

Raises

IOError – If the HDF5 file does not exist in the specified path.
ValueError – If name dataset does not exist in the HDF5 file.
ValueError – If newname dataset already exists in the HDF5 file.

Example

>>> from araucaria.testdata import get_testpath
>>> from araucaria.io import read_xmu, write_hdf5, rename_dataset_hdf5
>>> fpath = get_testpath('xmu_testfile.xmu')
>>> # extracting mu and mu_ref scans
>>> group_mu = read_xmu(fpath, scan='mu')
>>> # saving a new hdf5 file
>>> write_hdf5('database.h5', group_mu, name='xmu_testfile', replace=True)
xmu_testfile written to database.h5.
>>> # renaming dataset
>>> rename_dataset_hdf5('database.h5', 'xmu_testfile', 'xmu_renamed')
xmu_testfile renamed to xmu_renamed in database.h5.

araucaria.io.io_hdf5.delete_dataset_hdf5(fpath, name)

Deletes a dataset from an HDF5 file.

Parameters

fpath (Path) – Path to HDF5 file.
name (str) – Name of dataset to delete.

Return type

None

Returns

Raises

IOError – If the HDF5 file does not exist in the specified path.
ValueError – If name dataset does not exist in the HDF5 file.

Example

>>> from araucaria.testdata import get_testpath
>>> from araucaria.io import read_xmu, write_hdf5, delete_dataset_hdf5
>>> fpath = get_testpath('xmu_testfile.xmu')
>>> # extracting mu and mu_ref scans
>>> group_mu = read_xmu(fpath, scan='mu')
>>> # saving a new hdf5 file
>>> write_hdf5('database.h5', group_mu, name='xmu_testfile', replace=True)
xmu_testfile written to database.h5.
>>> # deleting dataset
>>> delete_dataset_hdf5('database.h5', 'xmu_testfile')
xmu_testfile deleted from database.h5.

araucaria.io.io_hdf5.summary_hdf5(fpath, regex=None, optional=None, **pre_edge_kws)

Returns a summary report of datasets in an HDF5 file.

Parameters

fpath (Path) – Path to HDF5 file.
regex (Optional[str]) – Search string to filter results by dataset name. See Notes for details. The default is None.
optional (Optional[list]) – List with optional parameters. See Notes for details. The default is None.
pre_edge_kws (dict) – Dictionary with arguments for pre_edge().

Return type

Report

Returns

Report for datasets in the HDF5 file.

Raises

IOError – If the HDF5 file does not exist in the specified path.

Notes

Summary data includes the following:

Dataset index.
Dataset name.
Measurement mode.
Number of scans.
Absorption edge step \(\Delta\mu(E_0)\), if optional=['edge_step'].
Absorption threshold energy \(E_0\), if optional=['e0'].
Merged scans, if optional=['merged_scans'].
Optional parameters if they exist as attributes in the dataset.

A regex value can be used to filter dataset names based on a regular expression (regex). For valid regex syntax, please check the documentation of the module re.
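
As a rough illustration of regex filtering, the sketch below applies a pattern to the dataset names of the Fe_database.h5 test file. filter_names() is a hypothetical helper, and the use of re.search (a match anywhere in the name) is an assumption; summary_hdf5() may apply the pattern differently.

```python
import re
from typing import List, Optional


def filter_names(names: List[str], regex: Optional[str] = None) -> List[str]:
    """Keep the dataset names that match a regular expression.

    Hypothetical helper; with regex=None every name is kept.
    """
    if regex is None:
        return list(names)
    pattern = re.compile(regex)
    # assumption: a match anywhere in the name is enough
    return [name for name in names if pattern.search(name)]


names = ['FeIISO4_20K', 'Fe_Foil', 'Ferrihydrite_20K', 'Goethite_20K']
# names ending in '_20K'
print(filter_names(names, regex='_20K$'))
# names starting with 'Fe'
print(filter_names(names, regex='^Fe'))
```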

The number of scans and names of merged files are retrieved from the merged_scans attribute of the HDF5 dataset.

The absorption threshold and the edge step are retrieved by calling the function pre_edge().

Optional parameters will be retrieved from the dataset as attributes. Currently only str, float or int values will be retrieved. Otherwise an empty character will be printed in the report.