IO HDF5 module
The io_hdf5 submodule offers the following functions to read, write and manipulate data in the Hierarchical Data Format HDF5:
Function | Description |
---|---|
read_hdf5 | Reads a single group dataset from an HDF5 file. |
read_collection_hdf5 | Reads multiple group datasets from an HDF5 file. |
write_hdf5 | Writes a single group dataset in an HDF5 file. |
write_collection_hdf5 | Writes a collection in an HDF5 file. |
rename_dataset_hdf5 | Renames a group dataset in an HDF5 file. |
delete_dataset_hdf5 | Deletes a group dataset in an HDF5 file. |
summary_hdf5 | Returns a summary of datasets in an HDF5 file. |
- araucaria.io.io_hdf5.read_hdf5(fpath, name)
Reads a single group dataset from an HDF5 file.
- Parameters
- Return type
- Returns
Group containing the requested dataset.
- Raises
IOError – If the HDF5 file does not exist in the specified path.
ValueError – If name does not exist in the HDF5 file.
Example
>>> from araucaria import Group
>>> from araucaria.testdata import get_testpath
>>> from araucaria.utils import check_objattrs
>>> from araucaria.io import read_hdf5
>>> fpath = get_testpath('Fe_database.h5')
>>> # extracting goethite scan
>>> group_mu = read_hdf5(fpath, name='Goethite_20K')
>>> check_objattrs(group_mu, Group, attrlist=['mu', 'mu_ref'])
[True, True]
- araucaria.io.io_hdf5.read_collection_hdf5(fpath, names=['all'])
Reads multiple group datasets from an HDF5 file.
- Parameters
- Return type
- Returns
Collection containing the requested datasets.
- Raises
IOError – If the HDF5 file does not exist in the specified path.
ValueError – If the requested names do not exist in the HDF5 file.
Warning
The HDF5 file does not store the tags attribute of a Collection. Therefore the returned collection will automatically assign tag='scan' to each group dataset.
Example
>>> from araucaria import Collection
>>> from araucaria.testdata import get_testpath
>>> from araucaria.utils import check_objattrs
>>> from araucaria.io import read_collection_hdf5
>>> fpath = get_testpath('Fe_database.h5')
>>> # reading database
>>> collection = read_collection_hdf5(fpath)
>>> check_objattrs(collection, Collection)
True
>>> collection.get_names()
['FeIISO4_20K', 'Fe_Foil', 'Ferrihydrite_20K', 'Goethite_20K']

>>> # read selected group datasets
>>> collection = read_collection_hdf5(fpath, names=['Fe_Foil'])
>>> collection.get_names()
['Fe_Foil']
- araucaria.io.io_hdf5.convert_bytes_hdf5(record)
Utility function to convert a bytes record from an HDF5 file.
The returned value will be either a dict, list, or str.
- Parameters
record (Dataset) – HDF5 dataset record.
- Return type
- Returns
Converted record.
- Raises
TypeError – If the value stored inside record is not of type bytes.
Notes
araucaria stores dict or list records as bytes in the HDF5 file. Such records need to be converted back to their original types during reading.
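The conversion described in the Notes can be illustrated with the standard library alone. The following is a minimal sketch, not the araucaria implementation: decode_record is a hypothetical helper that receives the raw bytes value directly (rather than an HDF5 Dataset) and assumes the record was stored as the UTF-8 encoded string form of a Python literal.

```python
import ast

def decode_record(value):
    """Hypothetical helper: convert a bytes value back to dict, list, or str."""
    if not isinstance(value, bytes):
        raise TypeError("record value is not of type bytes")
    text = value.decode("utf-8")
    try:
        # safely parse Python literals such as "{'a': 1}" or "[1, 2]"
        parsed = ast.literal_eval(text)
    except (ValueError, SyntaxError):
        # not a Python literal: keep it as a plain string
        return text
    # only dict and list are restored; anything else stays a string
    return parsed if isinstance(parsed, (dict, list)) else text

print(decode_record(b"{'e0': 7112.0, 'mode': 'mu'}"))  # {'e0': 7112.0, 'mode': 'mu'}
print(decode_record(b"['scan_1.xdi', 'scan_2.xdi']"))  # ['scan_1.xdi', 'scan_2.xdi']
print(decode_record(b"Goethite_20K"))                  # Goethite_20K
```

ast.literal_eval is used here instead of eval so that only literals are parsed and no arbitrary code stored in a file is executed.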
- araucaria.io.io_hdf5.write_hdf5(fpath, group, name=None, replace=False)
Writes a group dataset in an HDF5 file.
- Parameters
- Return type
- Returns
- Raises
IOError – If dataset cannot be written to the HDF5 file.
TypeError – If group is not a valid Group instance.
ValueError – If name dataset already exists in the HDF5 file and replace=False.
Notes
If the file specified by fpath does not exist, it will be automatically created. If the file already exists then the dataset will be appended.
By default the write operation will be canceled if name already exists in the HDF5 file. The previous dataset can be overwritten with the option replace=True.
Example
>>> from araucaria.testdata import get_testpath
>>> from araucaria.io import read_xmu, write_hdf5
>>> fpath = get_testpath('xmu_testfile.xmu')
>>> # extracting mu scan
>>> group_mu = read_xmu(fpath, scan='mu')
>>> # saving a new hdf5 file
>>> write_hdf5('database.h5', group_mu, name='xmu_testfile', replace=True)
xmu_testfile written to database.h5.
- araucaria.io.io_hdf5.write_collection_hdf5(fpath, collection, names=['all'], replace=False)
Writes a collection in an HDF5 file.
- Parameters
fpath (Path) – Path to HDF5 file.
collection (Collection) – Collection to write in the HDF5 file.
names (list) – List with group dataset names to write in the HDF5 file.
replace (bool) – Replace previous dataset. The default is False.
- Return type
- Returns
- Raises
IOError – If dataset cannot be written to the HDF5 file.
ValueError – If names dataset does not exist in the collection.
ValueError – If names dataset already exists in the HDF5 file and replace=False.
Notes
If the file specified by fpath does not exist, it will be automatically created. If the file already exists then the datasets in the collection will be appended.
By default the write operation will be canceled if any names dataset in collection already exists in the HDF5 file. Previous datasets can be overwritten with the option replace=True.
Warning
The tags attribute of the collection will not be stored in the HDF5 file.
Example
>>> from araucaria.testdata import get_testpath
>>> from araucaria.io import read_collection_hdf5, write_collection_hdf5
>>> fpath = get_testpath('Fe_database.h5')
>>> # reading database
>>> collection = read_collection_hdf5(fpath)
>>> # saving collection in a new hdf5 file
>>> write_collection_hdf5('database.h5', collection, replace=True)
FeIISO4_20K written to database.h5.
Fe_Foil written to database.h5.
Ferrihydrite_20K written to database.h5.
Goethite_20K written to database.h5.

>>> # write selected group dataset
>>> write_collection_hdf5('database.h5', collection, names=['Fe_Foil'], replace=True)
Fe_Foil written to database.h5.
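The name selection and overwrite rules for write_collection_hdf5 can be sketched without touching a file. This is an illustration only, under stated assumptions: resolve_names is a hypothetical helper, plain lists stand in for the collection and the HDF5 file, and all checks are assumed to run before anything is written.

```python
def resolve_names(available, existing, names=("all",), replace=False):
    """Hypothetical helper: return the dataset names that would be written.

    available: names of the groups in the collection
    existing:  names already stored in the file
    """
    # names=['all'] selects every group in the collection
    selected = list(available) if list(names) == ["all"] else list(names)

    # every requested name must exist in the collection
    missing = [n for n in selected if n not in available]
    if missing:
        raise ValueError(f"not in the collection: {missing}")

    # existing datasets are only overwritten with replace=True
    clashes = [n for n in selected if n in existing]
    if clashes and not replace:
        raise ValueError(f"already in the file: {clashes}")
    return selected

collection_names = ['FeIISO4_20K', 'Fe_Foil', 'Ferrihydrite_20K', 'Goethite_20K']
file_names = ['Fe_Foil']

print(resolve_names(collection_names, file_names, names=['Goethite_20K']))
# ['Goethite_20K']

try:
    resolve_names(collection_names, file_names)  # 'Fe_Foil' is already stored
except ValueError as err:
    print(err)
```

A tuple is used as the default for names to avoid a mutable default argument; the documented signature uses the list ['all'].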
- araucaria.io.io_hdf5.write_recursive_hdf5(dataset, group)
Utility function to write a Group recursively in an HDF5 file.
- Parameters
dataset (Dataset) – Dataset in the HDF5 file.
group (Group) – Group to write in the HDF5 file.
- Return type
- Returns
Warning
Only str, float, int and ndarray types are currently supported for recursive writing in an HDF5 Dataset.
dict and list types will be converted to str, which is in turn saved as bytes in the HDF5 database. If read with read_hdf5(), such records will be automatically converted to their original type in the group.
- araucaria.io.io_hdf5.rename_dataset_hdf5(fpath, name, newname)
Renames a dataset in an HDF5 file.
- Parameters
- Return type
- Returns
- Raises
IOError – If the HDF5 file does not exist in the specified path.
ValueError – If name dataset does not exist in the HDF5 file.
ValueError – If newname dataset already exists in the HDF5 file.
Example
>>> from araucaria.testdata import get_testpath
>>> from araucaria.io import read_xmu, write_hdf5, rename_dataset_hdf5
>>> fpath = get_testpath('xmu_testfile.xmu')
>>> # extracting mu scan
>>> group_mu = read_xmu(fpath, scan='mu')
>>> # saving a new hdf5 file
>>> write_hdf5('database.h5', group_mu, name='xmu_testfile', replace=True)
xmu_testfile written to database.h5.
>>> # renaming dataset
>>> rename_dataset_hdf5('database.h5', 'xmu_testfile', 'xmu_renamed')
xmu_testfile renamed to xmu_renamed in database.h5.
- araucaria.io.io_hdf5.delete_dataset_hdf5(fpath, name)
Deletes a dataset from an HDF5 file.
- Parameters
- Return type
- Returns
- Raises
IOError – If the HDF5 file does not exist in the specified path.
ValueError – If name dataset does not exist in the HDF5 file.
Example
>>> from araucaria.testdata import get_testpath
>>> from araucaria.io import read_xmu, write_hdf5, delete_dataset_hdf5
>>> fpath = get_testpath('xmu_testfile.xmu')
>>> # extracting mu scan
>>> group_mu = read_xmu(fpath, scan='mu')
>>> # saving a new hdf5 file
>>> write_hdf5('database.h5', group_mu, name='xmu_testfile', replace=True)
xmu_testfile written to database.h5.
>>> # deleting dataset
>>> delete_dataset_hdf5('database.h5', 'xmu_testfile')
xmu_testfile deleted from database.h5.
- araucaria.io.io_hdf5.summary_hdf5(fpath, regex=None, optional=None, **pre_edge_kws)
Returns a summary report of datasets in an HDF5 file.
- Parameters
fpath (Path) – Path to HDF5 file.
regex (Optional[str]) – Search string to filter results by dataset name. See Notes for details. The default is None.
optional (Optional[list]) – List with optional parameters. See Notes for details. The default is None.
pre_edge_kws (dict) – Dictionary with arguments for pre_edge().
- Return type
- Returns
Report for datasets in the HDF5 file.
- Raises
IOError – If the HDF5 file does not exist in the specified path.
Notes
Summary data includes the following:
Dataset index.
Dataset name.
Measurement mode.
Number of scans.
Absorption edge step \(\Delta\mu(E_0)\), if optional=['edge_step'].
Absorption threshold energy \(E_0\), if optional=['e0'].
Merged scans, if optional=['merged_scans'].
Optional parameters if they exist as attributes in the dataset.
A regex value can be used to filter dataset names based on a regular expression (regex). For valid regex syntax, please check the documentation of the module re.
The number of scans and names of merged files are retrieved from the merged_scans attribute of the HDF5 dataset.
The absorption threshold and the edge step are retrieved by calling the function pre_edge().
Optional parameters will be retrieved from the dataset as attributes. Currently only str, float or int will be retrieved. Otherwise an empty character will be printed in the report.
See also
Examples
>>> from araucaria.testdata import get_testpath
>>> from araucaria.io import summary_hdf5
>>> fpath = get_testpath('Fe_database.h5')
>>> # printing default summary
>>> report = summary_hdf5(fpath)
>>> report.show()
=================================
id  dataset           mode    n
=================================
1   FeIISO4_20K       mu      5
2   Fe_Foil           mu_ref  5
3   Ferrihydrite_20K  mu      5
4   Goethite_20K      mu      5
=================================

>>> # printing summary with merged scans of Goethite groups
>>> report = summary_hdf5(fpath, regex='Goe', optional=['merged_scans'])
>>> report.show()
=======================================================
id  dataset       mode  n  merged_scans
=======================================================
1   Goethite_20K  mu    5  20K_GOE_Fe_K_240.00000.xdi
                           20K_GOE_Fe_K_240.00001.xdi
                           20K_GOE_Fe_K_240.00002.xdi
                           20K_GOE_Fe_K_240.00003.xdi
                           20K_GOE_Fe_K_240.00004.xdi
=======================================================

>>> # printing custom parameters
>>> from araucaria.testdata import get_testpath
>>> from araucaria.io import read_xmu, write_hdf5
>>> fpath = get_testpath('xmu_testfile.xmu')
>>> # extracting mu scan
>>> group_mu = read_xmu(fpath, scan='mu')
>>> # adding additional attributes
>>> group_mu.symbol = 'Zn'
>>> group_mu.temp = 25.0
>>> # saving a new hdf5 file
>>> write_hdf5('database2.h5', group_mu, name='xmu_testfile', replace=True)
xmu_testfile written to database2.h5.
>>> report = summary_hdf5('database2.h5', optional=['symbol','temp'])
>>> report.show()
=========================================
id  dataset       mode  n  symbol  temp
=========================================
1   xmu_testfile  mu    1  Zn      25
=========================================
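The effect of the regex argument can be previewed on a plain list of dataset names with the re module. This is a sketch under an assumption: filter_names is a hypothetical helper, and it matches the pattern anywhere in the name (re.search semantics), which may differ from how summary_hdf5 applies the pattern internally.

```python
import re

def filter_names(names, regex=None):
    """Hypothetical helper: keep names in which the pattern is found."""
    if regex is None:
        return list(names)
    pattern = re.compile(regex)
    # search() looks for the pattern anywhere in the name
    return [name for name in names if pattern.search(name)]

names = ['FeIISO4_20K', 'Fe_Foil', 'Ferrihydrite_20K', 'Goethite_20K']

print(filter_names(names, 'Goe'))    # ['Goethite_20K']
print(filter_names(names, '^Fe'))    # ['FeIISO4_20K', 'Fe_Foil', 'Ferrihydrite_20K']
print(filter_names(names, '_20K$'))  # ['FeIISO4_20K', 'Ferrihydrite_20K', 'Goethite_20K']
```

Anchors such as ^ and $ make the intent explicit regardless of whether the pattern is searched or matched from the start of the name.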