IO HDF5 module

The io_hdf5 submodule offers the following functions to read, write and manipulate data in the Hierarchical Data Format HDF5:

=========================================================================
Function                 Description
=========================================================================
read_hdf5()              Reads a single group dataset from an HDF5 file.
read_collection_hdf5()   Reads multiple group datasets from an HDF5 file.
write_hdf5()             Writes a single group dataset in an HDF5 file.
write_collection_hdf5()  Writes a collection in an HDF5 file.
rename_dataset_hdf5()    Renames a group dataset in an HDF5 file.
delete_dataset_hdf5()    Deletes a group dataset in an HDF5 file.
summary_hdf5()           Returns a summary of datasets in an HDF5 file.
=========================================================================

araucaria.io.io_hdf5.read_hdf5(fpath, name)[source]

Reads a single group dataset from an HDF5 file.

Parameters
  • fpath (Path) – Path to HDF5 file.

  • name (str) – Dataset name to retrieve from the HDF5 file.

Return type

Group

Returns

Group containing the requested dataset.

Raises
  • IOError – If the HDF5 file does not exist in the specified path.

  • ValueError – If name does not exist in the HDF5 file.

Example

>>> from araucaria import Group
>>> from araucaria.testdata import get_testpath
>>> from araucaria.utils import check_objattrs
>>> from araucaria.io import read_hdf5
>>> fpath = get_testpath('Fe_database.h5')
>>> # extracting goethite scan
>>> group_mu = read_hdf5(fpath, name='Goethite_20K')
>>> check_objattrs(group_mu, Group, attrlist=['mu', 'mu_ref'])
[True, True]
araucaria.io.io_hdf5.read_collection_hdf5(fpath, names=['all'])[source]

Reads multiple group datasets from an HDF5 file.

Parameters
  • fpath (Path) – Path to HDF5 file.

  • names (list) – List with group datasets to read.

Return type

Collection

Returns

Collection containing the requested datasets.

Raises
  • IOError – If the HDF5 file does not exist in the specified path.

  • ValueError – If the requested names do not exist in the HDF5 file.

Warning

The HDF5 file does not store the tags attribute of a Collection. Therefore the returned collection will automatically assign tag='scan' to each group dataset.

Example

>>> from araucaria import Collection
>>> from araucaria.testdata import get_testpath
>>> from araucaria.utils import check_objattrs
>>> from araucaria.io import read_collection_hdf5
>>> fpath = get_testpath('Fe_database.h5')
>>> # reading database
>>> collection = read_collection_hdf5(fpath)
>>> check_objattrs(collection, Collection)
True
>>> collection.get_names()
['FeIISO4_20K', 'Fe_Foil', 'Ferrihydrite_20K', 'Goethite_20K']
>>> # read selected group datasets
>>> collection = read_collection_hdf5(fpath, names=['Fe_Foil'])
>>> collection.get_names()
['Fe_Foil']
araucaria.io.io_hdf5.convert_bytes_hdf5(record)[source]

Utility function to convert a bytes record from an HDF5 file.

Returned value will be either a dict, list, or str.

Parameters

record (Dataset) – HDF5 dataset record.

Return type

Union[dict, list, str]

Returns

Converted record.

Raises

TypeError – If the value stored inside record is not of type bytes.

Notes

araucaria stores dict or list records as bytes in the HDF5 file. Such records need to be converted back to their original types during reading.
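
The conversion described above can be sketched as follows. This is a minimal illustration, not araucaria's implementation: it assumes the record holds the UTF-8 bytes of a Python literal, and the helper name `from_bytes_record` is hypothetical.

```python
import ast
from typing import Union


def from_bytes_record(record: bytes) -> Union[dict, list, str]:
    """Hypothetical helper: recover a dict, list, or str from a bytes record."""
    if not isinstance(record, bytes):
        raise TypeError('value stored inside record is not of type bytes.')
    text = record.decode('utf-8')
    try:
        value = ast.literal_eval(text)
    except (ValueError, SyntaxError):
        # the text is not a Python literal, so it is kept as a plain string
        return text
    # only dict and list literals are converted; anything else stays a str
    return value if isinstance(value, (dict, list)) else text


as_dict = from_bytes_record(b"{'symbol': 'Zn'}")
as_list = from_bytes_record(b"['scan_1.xdi', 'scan_2.xdi']")
as_str = from_bytes_record(b'Goethite_20K')
```

`ast.literal_eval` is used here instead of `eval` because it only accepts literal syntax, so a stored record cannot execute arbitrary code.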

araucaria.io.io_hdf5.write_hdf5(fpath, group, name=None, replace=False)[source]

Writes a group dataset in an HDF5 file.

Parameters
  • fpath (Path) – Path to HDF5 file.

  • group (Group) – Group to write in the HDF5 file.

  • name (Optional[str]) – Name for the dataset in the HDF5 file. The default is None, which preserves the original group name.

  • replace (bool) – Replace previous dataset. The default is False.

Return type

None

Returns

Raises
  • IOError – If dataset cannot be written to the HDF5 file.

  • TypeError – If group is not a valid Group instance.

  • ValueError – If name dataset already exists in the HDF5 file and replace=False.

Notes

If the file specified by fpath does not exist, it will be automatically created. If the file already exists, the dataset will be appended.

By default the write operation will be canceled if name already exists in the HDF5 file. The previous dataset can be overwritten with the option replace=True.
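
The overwrite rule above can be illustrated with a small stand-in. This is a sketch only: a plain dict plays the role of the HDF5 file, and the function `write_dataset` is hypothetical rather than part of araucaria.

```python
def write_dataset(store: dict, name: str, data, replace: bool = False) -> None:
    """Hypothetical stand-in for a write: store is a dict keyed by dataset name."""
    if name in store and not replace:
        # default behavior: refuse to overwrite an existing dataset
        raise ValueError(f'{name} already exists; use replace=True to overwrite.')
    store[name] = data


store = {}
# a new name is simply appended to the store
write_dataset(store, 'xmu_testfile', [1.0, 2.0])

# writing the same name again is canceled by default
try:
    write_dataset(store, 'xmu_testfile', [3.0])
except ValueError as err:
    message = str(err)

# replace=True overwrites the previous dataset
write_dataset(store, 'xmu_testfile', [3.0], replace=True)
```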

Example

>>> from araucaria.testdata import get_testpath
>>> from araucaria.io import read_xmu, write_hdf5
>>> fpath = get_testpath('xmu_testfile.xmu')
>>> # extracting mu scan
>>> group_mu = read_xmu(fpath, scan='mu')
>>> # saving a new hdf5 file
>>> write_hdf5('database.h5', group_mu, name='xmu_testfile', replace=True)
xmu_testfile written to database.h5.
araucaria.io.io_hdf5.write_collection_hdf5(fpath, collection, names=['all'], replace=False)[source]

Writes a collection in an HDF5 file.

Parameters
  • fpath (Path) – Path to HDF5 file.

  • collection (Collection) – Collection to write in the HDF5 file.

  • names (list) – List with group dataset names to write in the HDF5 file.

  • replace (bool) – Replace previous dataset. The default is False.

Return type

None

Returns

Raises
  • IOError – If dataset cannot be written to the HDF5 file.

  • ValueError – If names dataset does not exist in the collection.

  • ValueError – If names dataset already exists in the HDF5 file and replace=False.

Notes

If the file specified by fpath does not exist, it will be automatically created. If the file already exists, the datasets in the collection will be appended.

By default the write operation will be canceled if any dataset listed in names already exists in the HDF5 file. Previous datasets can be overwritten with the option replace=True.

Warning

The tags attribute of the collection will not be stored in the HDF5 file.

Example

>>> from araucaria.testdata import get_testpath
>>> from araucaria.io import read_collection_hdf5, write_collection_hdf5
>>> fpath = get_testpath('Fe_database.h5')
>>> # reading database
>>> collection = read_collection_hdf5(fpath)
>>> # saving collection in a new hdf5 file
>>> write_collection_hdf5('database.h5', collection, replace=True)
FeIISO4_20K written to database.h5.
Fe_Foil written to database.h5.
Ferrihydrite_20K written to database.h5.
Goethite_20K written to database.h5.
>>> # write selected group dataset
>>> write_collection_hdf5('database.h5', collection, names=['Fe_Foil'], replace=True)
Fe_Foil written to database.h5.
araucaria.io.io_hdf5.write_recursive_hdf5(dataset, group)[source]

Utility function to write a Group recursively in an HDF5 file.

Parameters
  • dataset (Dataset) – Dataset in the HDF5 file.

  • group (Group) – Group to write in the HDF5 file.

Return type

None

Returns

Warning

Only str, float, int and ndarray types are currently supported for recursive writing in an HDF5 Dataset.

dict and list types will be converted to str, which is in turn saved as bytes in the HDF5 database. If read with read_hdf5(), such records will be automatically converted to their original type in the group.
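
The type handling described in this warning can be sketched as a simple dispatch. The helper `encode_attribute` is hypothetical, ndarray is left out to keep the sketch free of third-party imports, and raising TypeError for unsupported types is a choice made for this illustration, not documented araucaria behavior.

```python
from typing import Any


def encode_attribute(value: Any) -> Any:
    """Hypothetical helper: return a value in the form it would be stored."""
    if isinstance(value, (dict, list)):
        # containers are converted to str and then saved as bytes
        return str(value).encode('utf-8')
    if isinstance(value, (str, float, int)):
        # supported scalar types are stored unchanged
        return value
    # choice made for this sketch only: reject any other type
    raise TypeError(f'unsupported type: {type(value).__name__}')


# 'files' is an illustrative attribute name, not one defined by araucaria
attrs = {'symbol': 'Zn', 'temp': 25.0, 'files': ['scan_1.xdi', 'scan_2.xdi']}
encoded = {key: encode_attribute(value) for key, value in attrs.items()}
```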

araucaria.io.io_hdf5.rename_dataset_hdf5(fpath, name, newname)[source]

Renames a dataset in an HDF5 file.

Parameters
  • fpath (Path) – Path to HDF5 file.

  • name (str) – Name of Group dataset.

  • newname (str) – New name for Group dataset.

Return type

None

Returns

Raises
  • IOError – If the HDF5 file does not exist in the specified path.

  • ValueError – If name dataset does not exist in the HDF5 file.

  • ValueError – If newname dataset already exists in the HDF5 file.

Example

>>> from araucaria.testdata import get_testpath
>>> from araucaria.io import read_xmu, write_hdf5, rename_dataset_hdf5
>>> fpath = get_testpath('xmu_testfile.xmu')
>>> # extracting mu scan
>>> group_mu = read_xmu(fpath, scan='mu')
>>> # saving a new hdf5 file
>>> write_hdf5('database.h5', group_mu, name='xmu_testfile', replace=True)
xmu_testfile written to database.h5.
>>> # renaming dataset
>>> rename_dataset_hdf5('database.h5', 'xmu_testfile', 'xmu_renamed')
xmu_testfile renamed to xmu_renamed in database.h5.
araucaria.io.io_hdf5.delete_dataset_hdf5(fpath, name)[source]

Deletes a dataset from an HDF5 file.

Parameters
  • fpath (Path) – Path to HDF5 file.

  • name (str) – Name of dataset to delete.

Return type

None

Returns

Raises
  • IOError – If the HDF5 file does not exist in the specified path.

  • ValueError – If name dataset does not exist in the HDF5 file.

Example

>>> from araucaria.testdata import get_testpath
>>> from araucaria.io import read_xmu, write_hdf5, delete_dataset_hdf5
>>> fpath = get_testpath('xmu_testfile.xmu')
>>> # extracting mu scan
>>> group_mu = read_xmu(fpath, scan='mu')
>>> # saving a new hdf5 file
>>> write_hdf5('database.h5', group_mu, name='xmu_testfile', replace=True)
xmu_testfile written to database.h5.
>>> # deleting dataset
>>> delete_dataset_hdf5('database.h5', 'xmu_testfile')
xmu_testfile deleted from database.h5.
araucaria.io.io_hdf5.summary_hdf5(fpath, regex=None, optional=None, **pre_edge_kws)[source]

Returns a summary report of datasets in an HDF5 file.

Parameters
  • fpath (Path) – Path to HDF5 file.

  • regex (Optional[str]) – Search string to filter results by dataset name. See Notes for details. The default is None.

  • optional (Optional[list]) – List with optional parameters. See Notes for details. The default is None.

  • pre_edge_kws (dict) – Dictionary with arguments for pre_edge().

Return type

Report

Returns

Report for datasets in the HDF5 file.

Raises

IOError – If the HDF5 file does not exist in the specified path.

Notes

Summary data includes the following:

  1. Dataset index.

  2. Dataset name.

  3. Measurement mode.

  4. Number of scans.

  5. Absorption edge step \(\Delta\mu(E_0)\), if optional=['edge_step'].

  6. Absorption threshold energy \(E_0\), if optional=['e0'].

  7. Merged scans, if optional=['merged_scans'].

  8. Optional parameters if they exist as attributes in the dataset.

A regex value can be used to filter dataset names based on a regular expression (regex). For valid regex syntax, please check the documentation of the module re.
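
As a rough illustration of name filtering with the re module, the sketch below keeps the names in which the pattern matches anywhere. Whether summary_hdf5() uses `re.search` or another matching function is an assumption, and the helper `filter_names` is hypothetical.

```python
import re
from typing import List, Optional


def filter_names(names: List[str], regex: Optional[str] = None) -> List[str]:
    """Hypothetical helper: keep names where the pattern matches anywhere."""
    if regex is None:
        # no filter requested: return every name
        return list(names)
    pattern = re.compile(regex)
    return [name for name in names if pattern.search(name)]


names = ['FeIISO4_20K', 'Fe_Foil', 'Ferrihydrite_20K', 'Goethite_20K']

goethite = filter_names(names, 'Goe')        # plain substring
cold_scans = filter_names(names, '_20K$')    # anchored at the end of the name
iron_prefix = filter_names(names, '^Fe')     # anchored at the start of the name
```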

The number of scans and names of merged files are retrieved from the merged_scans attribute of the HDF5 dataset.

The absorption threshold and the edge step are retrieved by calling the function pre_edge().

Optional parameters will be retrieved from the dataset as attributes. Currently only str, float or int will be retrieved. Otherwise an empty character will be printed in the report.
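
The rule for optional parameters can be sketched as follows. The helper `report_cell` is hypothetical, a plain dict stands in for the dataset attributes, and treating a missing attribute the same as an unsupported type is an assumption of this sketch.

```python
from typing import Any


def report_cell(attrs: dict, key: str) -> Any:
    """Hypothetical helper: value shown in the report for an optional parameter."""
    value = attrs.get(key)
    if isinstance(value, (str, float, int)):
        # supported types are passed on to the report
        return value
    # unsupported type or missing attribute: empty character
    return ''


# 'files' is an illustrative attribute name holding an unsupported type
attrs = {'symbol': 'Zn', 'temp': 25.0, 'files': ['scan_1.xdi']}
row = [report_cell(attrs, key) for key in ['symbol', 'temp', 'files', 'missing']]
```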

See also

read_hdf5(), Report

Examples

>>> from araucaria.testdata import get_testpath
>>> from araucaria.io import summary_hdf5
>>> fpath = get_testpath('Fe_database.h5')
>>> # printing default summary
>>> report = summary_hdf5(fpath)
>>> report.show()
=================================
id  dataset           mode    n
=================================
1   FeIISO4_20K       mu      5
2   Fe_Foil           mu_ref  5
3   Ferrihydrite_20K  mu      5
4   Goethite_20K      mu      5
=================================
>>> # printing summary with merged scans of Goethite groups
>>> report = summary_hdf5(fpath, regex='Goe', optional=['merged_scans'])
>>> report.show()
=======================================================
id  dataset       mode  n  merged_scans
=======================================================
1   Goethite_20K  mu    5  20K_GOE_Fe_K_240.00000.xdi
                           20K_GOE_Fe_K_240.00001.xdi
                           20K_GOE_Fe_K_240.00002.xdi
                           20K_GOE_Fe_K_240.00003.xdi
                           20K_GOE_Fe_K_240.00004.xdi
=======================================================
>>> # printing custom parameters
>>> from araucaria.testdata import get_testpath
>>> from araucaria.io import read_xmu, write_hdf5
>>> fpath = get_testpath('xmu_testfile.xmu')
>>> # extracting mu scan
>>> group_mu = read_xmu(fpath, scan='mu')
>>> # adding additional attributes
>>> group_mu.symbol = 'Zn'
>>> group_mu.temp   = 25.0
>>> # saving a new hdf5 file
>>> write_hdf5('database2.h5', group_mu, name='xmu_testfile', replace=True)
xmu_testfile written to database2.h5.
>>> report = summary_hdf5('database2.h5', optional=['symbol','temp'])
>>> report.show()
=========================================
id  dataset       mode  n  symbol  temp
=========================================
1   xmu_testfile  mu    1  Zn      25
=========================================