IO HDF5 module
The io_hdf5 submodule offers the following functions to read, write and
manipulate data in the Hierarchical Data Format HDF5:
| Function | Description |
|---|---|
| read_hdf5 | Reads a single group dataset from an HDF5 file. |
| read_collection_hdf5 | Reads multiple group datasets from an HDF5 file. |
| write_hdf5 | Writes a single group dataset in an HDF5 file. |
| write_collection_hdf5 | Writes a collection in an HDF5 file. |
| rename_dataset_hdf5 | Renames a group dataset in an HDF5 file. |
| delete_dataset_hdf5 | Deletes a group dataset in an HDF5 file. |
| summary_hdf5 | Returns a summary of datasets in an HDF5 file. |
- araucaria.io.io_hdf5.read_hdf5(fpath, name)
Reads a single group dataset from an HDF5 file.
- Parameters
- Return type
- Returns
Group containing the requested dataset.
- Raises
IOError – If the HDF5 file does not exist in the specified path.
ValueError – If `name` does not exist in the HDF5 file.
Example
>>> from araucaria import Group
>>> from araucaria.testdata import get_testpath
>>> from araucaria.utils import check_objattrs
>>> from araucaria.io import read_hdf5
>>> fpath = get_testpath('Fe_database.h5')
>>> # extracting goethite scan
>>> group_mu = read_hdf5(fpath, name='Goethite_20K')
>>> check_objattrs(group_mu, Group, attrlist=['mu', 'mu_ref'])
[True, True]
- araucaria.io.io_hdf5.read_collection_hdf5(fpath, names=['all'])
Reads multiple group datasets from an HDF5 file.
- Parameters
- Return type
- Returns
Collection containing the requested datasets.
- Raises
IOError – If the HDF5 file does not exist in the specified path.
ValueError – If the requested `names` do not exist in the HDF5 file.
Warning
The HDF5 file does not store the `tags` attribute of a Collection. Therefore, the returned collection will automatically assign `tag='scan'` to each group dataset.
Example
>>> from araucaria import Collection
>>> from araucaria.testdata import get_testpath
>>> from araucaria.utils import check_objattrs
>>> from araucaria.io import read_collection_hdf5
>>> fpath = get_testpath('Fe_database.h5')
>>> # reading database
>>> collection = read_collection_hdf5(fpath)
>>> check_objattrs(collection, Collection)
True
>>> collection.get_names()
['FeIISO4_20K', 'Fe_Foil', 'Ferrihydrite_20K', 'Goethite_20K']

>>> # read selected group datasets
>>> collection = read_collection_hdf5(fpath, names=['Fe_Foil'])
>>> collection.get_names()
['Fe_Foil']
- araucaria.io.io_hdf5.convert_bytes_hdf5(record)
Utility function to convert a `bytes` record from an HDF5 file.
Returned value will be either a `dict`, `list`, or `str`.
- Parameters
record (`Dataset`) – HDF5 dataset record.
- Return type
- Returns
Converted record.
- Raises
TypeError – If the value stored inside `record` is not of type `bytes`.
Notes
araucaria stores `dict` or `list` records as `bytes` in the HDF5 file. Such records need to be converted back to their original types during reading.
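The conversion described in these notes can be illustrated with a minimal sketch that uses only the standard library. It is not the araucaria implementation: the helper name `convert_bytes` is hypothetical, and the sketch assumes that a `dict` or `list` was stored as the UTF-8 bytes of its string representation. It operates on a plain `bytes` value rather than on an HDF5 `Dataset`.

```python
import ast


def convert_bytes(raw):
    """Decode a bytes value into a dict, list, or str (hypothetical helper)."""
    if not isinstance(raw, bytes):
        raise TypeError("record value is not of type bytes")
    text = raw.decode("utf-8")
    try:
        # literal_eval parses Python literals only; it never executes code.
        value = ast.literal_eval(text)
    except (ValueError, SyntaxError):
        # The text is not a Python literal, so keep it as a plain string.
        return text
    # Only dict and list are restored; any other literal stays as text.
    return value if isinstance(value, (dict, list)) else text


# Assumed storage form: the string representation encoded as UTF-8.
stored = str({"edge": "K", "scans": [1, 2, 3]}).encode("utf-8")
restored = convert_bytes(stored)
print(restored["scans"])  # [1, 2, 3]
```

Using `ast.literal_eval` instead of `eval` keeps the sketch safe for untrusted files, since only literals are parsed.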
- araucaria.io.io_hdf5.write_hdf5(fpath, group, name=None, replace=False)
Writes a group dataset in an HDF5 file.
- Parameters
- Return type
- Returns
- Raises
IOError – If dataset cannot be written to the HDF5 file.
TypeError – If `group` is not a valid Group instance.
ValueError – If `name` dataset already exists in the HDF5 file and `replace=False`.
Notes
If the file specified by `fpath` does not exist, it will be created automatically. If the file already exists, the dataset will be appended.
By default the write operation will be canceled if `name` already exists in the HDF5 file. The previous dataset can be overwritten with the option `replace=True`.
Example
>>> from araucaria.testdata import get_testpath
>>> from araucaria.io import read_xmu, write_hdf5
>>> fpath = get_testpath('xmu_testfile.xmu')
>>> # extracting mu and mu_ref scans
>>> group_mu = read_xmu(fpath, scan='mu')
>>> # saving a new hdf5 file
>>> write_hdf5('database.h5', group_mu, name='xmu_testfile', replace=True)
xmu_testfile written to database.h5.
- araucaria.io.io_hdf5.write_collection_hdf5(fpath, collection, names=['all'], replace=False)
Writes a collection in an HDF5 file.
- Parameters
fpath (`Path`) – Path to HDF5 file.
collection (`Collection`) – Collection to write in the HDF5 file.
names (`list`) – List with group dataset names to write in the HDF5 file.
replace (`bool`) – Replace previous dataset. The default is False.
- Return type
- Returns
- Raises
IOError – If dataset cannot be written to the HDF5 file.
ValueError – If `names` dataset does not exist in the collection.
ValueError – If `names` dataset already exists in the HDF5 file and `replace=False`.
Notes
If the file specified by `fpath` does not exist, it will be created automatically. If the file already exists, the datasets in the collection will be appended.
By default the write operation will be canceled if any `names` dataset in `collection` already exists in the HDF5 file. Previous datasets can be overwritten with the option `replace=True`.
Warning
The `tags` attribute of the `collection` will not be stored in the HDF5 file.
Example
>>> from araucaria.testdata import get_testpath
>>> from araucaria.io import read_collection_hdf5, write_collection_hdf5
>>> fpath = get_testpath('Fe_database.h5')
>>> # reading database
>>> collection = read_collection_hdf5(fpath)
>>> # saving collection in a new hdf5 file
>>> write_collection_hdf5('database.h5', collection, replace=True)
FeIISO4_20K written to database.h5.
Fe_Foil written to database.h5.
Ferrihydrite_20K written to database.h5.
Goethite_20K written to database.h5.

>>> # write selected group dataset
>>> write_collection_hdf5('database.h5', collection, names=['Fe_Foil'], replace=True)
Fe_Foil written to database.h5.
- araucaria.io.io_hdf5.write_recursive_hdf5(dataset, group)
Utility function to write a Group recursively in an HDF5 file.
- Parameters
dataset (`Dataset`) – Dataset in the HDF5 file.
group (`Group`) – Group to write in the HDF5 file.
- Return type
- Returns
Warning
Only `str`, `float`, `int` and `ndarray` types are currently supported for recursive writing in an HDF5 `Dataset`. `dict` and `list` types will be converted to `str`, which is in turn saved as `bytes` in the HDF5 database. If read with `read_hdf5()`, such records will be automatically converted to their original type in the group.
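The type handling described in this warning can be sketched as a small dispatch function. This is an illustration under stated assumptions, not the araucaria implementation: the name `encode_attribute` is hypothetical, `ndarray` is left out so that the sketch needs only the standard library, and raising `TypeError` for unsupported types is a choice made for this sketch.

```python
def encode_attribute(value):
    """Return a value in the form this sketch treats as storable."""
    # Scalars and strings are kept unchanged.
    if isinstance(value, (str, float, int)):
        return value
    # dict and list are converted to str, then saved as bytes.
    if isinstance(value, (dict, list)):
        return str(value).encode("utf-8")
    # Assumption of this sketch: anything else is rejected.
    raise TypeError(f"unsupported type: {type(value).__name__}")


# Hypothetical group attributes, used only for illustration.
attrs = {
    "name": "Goethite_20K",
    "temp": 20.0,
    "merged_scans": ["scan_0.xdi", "scan_1.xdi"],
}
encoded = {key: encode_attribute(val) for key, val in attrs.items()}
print(type(encoded["merged_scans"]).__name__)  # bytes
```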
- araucaria.io.io_hdf5.rename_dataset_hdf5(fpath, name, newname)
Renames a dataset in an HDF5 file.
- Parameters
- Return type
- Returns
- Raises
IOError – If the HDF5 file does not exist in the specified path.
ValueError – If `name` dataset does not exist in the HDF5 file.
ValueError – If `newname` dataset already exists in the HDF5 file.
Example
>>> from araucaria.testdata import get_testpath
>>> from araucaria.io import read_xmu, write_hdf5, rename_dataset_hdf5
>>> fpath = get_testpath('xmu_testfile.xmu')
>>> # extracting mu and mu_ref scans
>>> group_mu = read_xmu(fpath, scan='mu')
>>> # saving a new hdf5 file
>>> write_hdf5('database.h5', group_mu, name='xmu_testfile', replace=True)
xmu_testfile written to database.h5.
>>> # renaming dataset
>>> rename_dataset_hdf5('database.h5', 'xmu_testfile', 'xmu_renamed')
xmu_testfile renamed to xmu_renamed in database.h5.
- araucaria.io.io_hdf5.delete_dataset_hdf5(fpath, name)
Deletes a dataset from an HDF5 file.
- Parameters
- Return type
- Returns
- Raises
IOError – If the HDF5 file does not exist in the specified path.
ValueError – If `name` dataset does not exist in the HDF5 file.
Example
>>> from araucaria.testdata import get_testpath
>>> from araucaria.io import read_xmu, write_hdf5, delete_dataset_hdf5
>>> fpath = get_testpath('xmu_testfile.xmu')
>>> # extracting mu and mu_ref scans
>>> group_mu = read_xmu(fpath, scan='mu')
>>> # saving a new hdf5 file
>>> write_hdf5('database.h5', group_mu, name='xmu_testfile', replace=True)
xmu_testfile written to database.h5.
>>> # deleting dataset
>>> delete_dataset_hdf5('database.h5', 'xmu_testfile')
xmu_testfile deleted from database.h5.
- araucaria.io.io_hdf5.summary_hdf5(fpath, regex=None, optional=None, **pre_edge_kws)
Returns a summary report of datasets in an HDF5 file.
- Parameters
fpath (`Path`) – Path to HDF5 file.
regex (`Optional[str]`) – Search string to filter results by dataset name. See Notes for details. The default is None.
optional (`Optional[list]`) – List with optional parameters. See Notes for details. The default is None.
pre_edge_kws (`dict`) – Dictionary with arguments for `pre_edge()`.
- Return type
- Returns
Report for datasets in the HDF5 file.
- Raises
IOError – If the HDF5 file does not exist in the specified path.
Notes
Summary data includes the following:
Dataset index.
Dataset name.
Measurement mode.
Number of scans.
Absorption edge step \(\Delta\mu(E_0)\), if `optional=['edge_step']`.
Absorption threshold energy \(E_0\), if `optional=['e0']`.
Merged scans, if `optional=['merged_scans']`.
Optional parameters if they exist as attributes in the dataset.
A `regex` value can be used to filter dataset names based on a regular expression (regex). For valid regex syntax, please check the documentation of the module `re`.
The number of scans and names of merged files are retrieved from the `merged_scans` attribute of the HDF5 dataset.
The absorption threshold and the edge step are retrieved by calling the function `pre_edge()`.
Optional parameters will be retrieved from the dataset as attributes. Currently only `str`, `float` or `int` values will be retrieved. Otherwise an empty character will be printed in the report.
See also
Examples
>>> from araucaria.testdata import get_testpath
>>> from araucaria.io import summary_hdf5
>>> fpath = get_testpath('Fe_database.h5')
>>> # printing default summary
>>> report = summary_hdf5(fpath)
>>> report.show()
=================================
id  dataset           mode    n
=================================
1   FeIISO4_20K       mu      5
2   Fe_Foil           mu_ref  5
3   Ferrihydrite_20K  mu      5
4   Goethite_20K      mu      5
=================================

>>> # printing summary with merged scans of Goethite groups
>>> report = summary_hdf5(fpath, regex='Goe', optional=['merged_scans'])
>>> report.show()
=======================================================
id  dataset       mode  n  merged_scans
=======================================================
1   Goethite_20K  mu    5  20K_GOE_Fe_K_240.00000.xdi
                           20K_GOE_Fe_K_240.00001.xdi
                           20K_GOE_Fe_K_240.00002.xdi
                           20K_GOE_Fe_K_240.00003.xdi
                           20K_GOE_Fe_K_240.00004.xdi
=======================================================

>>> # printing custom parameters
>>> from araucaria.testdata import get_testpath
>>> from araucaria.io import read_xmu, write_hdf5
>>> fpath = get_testpath('xmu_testfile.xmu')
>>> # extracting mu and mu_ref scans
>>> group_mu = read_xmu(fpath, scan='mu')
>>> # adding additional attributes
>>> group_mu.symbol = 'Zn'
>>> group_mu.temp = 25.0
>>> # saving a new hdf5 file
>>> write_hdf5('database2.h5', group_mu, name='xmu_testfile', replace=True)
xmu_testfile written to database2.h5.
>>> report = summary_hdf5('database2.h5', optional=['symbol','temp'])
>>> report.show()
=========================================
id  dataset       mode  n  symbol  temp
=========================================
1   xmu_testfile  mu    1  Zn      25
=========================================
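
The name filtering mentioned in the Notes can be approximated with the standard `re` module. The sketch below is an assumption about the behavior, not the araucaria source: the helper `filter_names` is hypothetical, and it assumes a name is kept when the pattern matches anywhere in it (`re.search` semantics).

```python
import re


def filter_names(names, regex=None):
    """Keep the names that match a regular expression (hypothetical helper)."""
    if regex is None:
        # No filter requested: return every name.
        return list(names)
    pattern = re.compile(regex)
    # search() matches anywhere in the name, not only at the start.
    return [name for name in names if pattern.search(name)]


names = ['FeIISO4_20K', 'Fe_Foil', 'Ferrihydrite_20K', 'Goethite_20K']
print(filter_names(names, 'Goe'))      # ['Goethite_20K']
print(filter_names(names, r'_20K$'))   # every name ending in _20K
```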