Collection class

class araucaria.main.collection.Collection(name=None)

Collection storage class.

This class stores a collection of Group objects.

Parameters: name (str) – Name for the collection. The default is None.

tags¶

Dictionary mapping each tag key to the list of group names stored under that tag.

Type: dict

Notes

Each group is stored as an attribute of the collection, named after the group. The tags attribute classifies group names by tag key, which is useful for manipulating several groups jointly.
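The storage scheme described above can be sketched in plain Python. This is a minimal, hypothetical stand-in (SketchCollection, with types.SimpleNamespace playing the role of Group), not araucaria's implementation. It only mirrors the documented behaviour of add_group(): each group becomes an attribute named after group.name, its name is appended to the list for its tag, and a duplicate name raises ValueError.

```python
from types import SimpleNamespace


class SketchCollection:
    """Hypothetical stand-in for Collection: groups as attributes, names indexed by tag."""

    def __init__(self):
        self.tags = {}

    def add_group(self, group, tag='scan'):
        if hasattr(self, group.name):
            raise ValueError(f'{group.name} is already in the collection.')
        setattr(self, group.name, group)                  # reachable as self.<name>
        self.tags.setdefault(tag, []).append(group.name)  # indexed under its tag key


coll = SketchCollection()
for name, tag in [('group1', 'scan'), ('group2', 'scan'), ('group3', 'ref')]:
    coll.add_group(SimpleNamespace(name=name), tag=tag)

print(coll.tags)         # {'scan': ['group1', 'group2'], 'ref': ['group3']}
print(coll.group3.name)  # group3
```

Keeping names (rather than the groups themselves) in tags means a group object lives in exactly one place, the attribute.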

The following methods are currently implemented:

Method	Description
`add_group()`	Adds a group to the collection.
`apply()`	Applies a function to groups in the collection.
`copy()`	Returns a copy of the collection.
`del_group()`	Deletes a group from the collection.
`get_group()`	Returns a group in the collection.
`get_mcer()`	Returns the minimum common energy range for the collection.
`get_names()`	Returns group names in the collection.
`get_tag()`	Returns tag of a group in the collection.
`rename_group()`	Renames a group in the collection.
`retag()`	Modifies tag of a group in the collection.
`summary()`	Returns a summary report of the collection.

Warning

Each group can only have a single tag key.
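Because a name may appear under only one tag key, the tags dictionary behaves like a partition of the group names. A hypothetical helper (not part of araucaria) that checks this invariant on a plain dict could look like this:

```python
def tags_are_exclusive(tags):
    """Return True if every group name is listed exactly once across all tag keys."""
    seen = set()
    for names in tags.values():
        for name in names:
            if name in seen:
                return False  # name already listed, under this or another key
            seen.add(name)
    return True


print(tags_are_exclusive({'scan': ['group1', 'group2'], 'ref': ['group3']}))  # True
print(tags_are_exclusive({'scan': ['group1'], 'ref': ['group1']}))            # False
```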

Example

>>> from araucaria import Collection
>>> collection = Collection()
>>> type(collection)
<class 'araucaria.main.collection.Collection'>

add_group(group, tag='scan')

Adds a group dataset to the collection.

Parameters

group (Group) – The data group to add to the collection.
tag (str) – Key for the tags attribute of the collection. The default is ‘scan’.

Return type

None

Raises

TypeError – If group is not a valid Group instance.
ValueError – If group.name is already in the collection.

Example

>>> from araucaria import Collection, Group
>>> from araucaria.utils import check_objattrs
>>> collection = Collection()
>>> g1 = Group(**{'name': 'group1'})
>>> g2 = Group(**{'name': 'group2'})
>>> for group in (g1, g2):
...     collection.add_group(group)
>>> check_objattrs(collection, Collection, attrlist=['group1','group2'])
[True, True]

>>> # using tags
>>> g3 = Group(**{'name': 'group3'})
>>> collection.add_group(g3, tag='ref')
>>> for key, value in collection.tags.items():
...     print(key, value, type(value))
scan ['group1', 'group2'] <class 'list'>
ref ['group3'] <class 'list'>

apply(func, taglist=['all'], **kwargs)

Applies a function to groups in a collection.

Parameters

func – Function to apply to the collection. Must accept update=True as an argument.
taglist (List[str]) – List with keys to filter groups in the collection based on the tags attribute. The default is [‘all’].
**kwargs – Additional keyword arguments to pass to func.

Return type

None

Raises

ValueError – If any item in taglist is not a key of the tags attribute.
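The requirement that func accept update=True suggests that apply() calls func on each selected group with update=True and forwards **kwargs. The sketch below assumes exactly that calling convention; scale_mu and apply_sketch are hypothetical names, and SimpleNamespace stands in for Group.

```python
from types import SimpleNamespace


def scale_mu(group, factor=1.0, update=False):
    """Hypothetical processing function following the update=True convention."""
    scaled = [factor * value for value in group.mu]
    if update:
        group.mu = scaled   # write the result back into the group
        return None
    return {'mu': scaled}   # otherwise leave the group untouched


def apply_sketch(groups, func, **kwargs):
    """Call func on every group in place, forwarding extra keyword arguments."""
    for group in groups:
        func(group, update=True, **kwargs)


groups = [SimpleNamespace(name='g1', mu=[1.0, 2.0]),
          SimpleNamespace(name='g2', mu=[3.0])]
apply_sketch(groups, scale_mu, factor=2.0)
print([group.mu for group in groups])  # [[2.0, 4.0], [6.0]]
```

Because apply() returns None, a function that only returned its result without updating the group would have no visible effect on the collection.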

Example

>>> from araucaria.testdata import get_testpath
>>> from araucaria.io import read_collection_hdf5
>>> from araucaria.xas import pre_edge
>>> fpath      = get_testpath('Fe_database.h5')
>>> collection = read_collection_hdf5(fpath)
>>> collection.apply(pre_edge)
>>> report     = collection.summary(optional=['e0'])
>>> report.show()
===============================================
id  dataset           tag   mode    n  e0
===============================================
1   FeIISO4_20K       scan  mu      5  7124.7
2   Fe_Foil           scan  mu_ref  5  7112
3   Ferrihydrite_20K  scan  mu      5  7127.4
4   Goethite_20K      scan  mu      5  7127.3
===============================================

copy()

Returns a deep copy of the collection.

Parameters: None
Return type: Collection
Returns: Copy of the collection.

Example

>>> from numpy import allclose
>>> from araucaria import Group, Collection
>>> collection1 = Collection()
>>> content     = {'name': 'group', 'energy': [1,2,3,4,5,6]}
>>> group       = Group(**content)
>>> collection1.add_group(group)
>>> collection2 = collection1.copy()
>>> energy1     = collection1.get_group('group').energy
>>> energy2     = collection2.get_group('group').energy
>>> allclose(energy1, energy2)
True
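The word "deep" matters here: the copy does not share group objects with the original. The difference between a shallow and a deep copy can be shown with the standard copy module on a SimpleNamespace container, used here as an illustrative stand-in rather than a real Collection.

```python
import copy
from types import SimpleNamespace

original = SimpleNamespace(group=SimpleNamespace(name='group', energy=[1, 2, 3]))
shallow = copy.copy(original)    # new container, but the same nested group object
deep = copy.deepcopy(original)   # new container and a new, independent group

original.group.energy[0] = 100   # modify the original's group
print(shallow.group.energy)      # [100, 2, 3]  shared with the original
print(deep.group.energy)         # [1, 2, 3]    unaffected
```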

del_group(name)

Removes a group dataset from the collection.

Parameters: name – Name of group to remove.
Return type: None
Raises: TypeError – If name is not a group in the collection.

Example

>>> from araucaria import Collection, Group
>>> from araucaria.utils import check_objattrs
>>> collection = Collection()
>>> g1 = Group(**{'name': 'group1'})
>>> g2 = Group(**{'name': 'group2'})
>>> for group in (g1, g2):
...     collection.add_group(group)
>>> check_objattrs(collection, Collection, attrlist=['group1','group2'])
[True, True]
>>> collection.del_group('group2')
>>> check_objattrs(collection, Collection, attrlist=['group1','group2'])
[True, False]
>>> # verifying that the deleted group has no tag
>>> for key, value in collection.tags.items():
...     print(key, value)
scan ['group1']
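The bookkeeping behind del_group() can be sketched as deleting the attribute and then removing the name from its tag list, which is the effect the example above shows. This hypothetical helper works on any object with a tags dict; it raises TypeError only to mirror the exception documented above.

```python
from types import SimpleNamespace


def del_group_sketch(collection, name):
    """Hypothetical del_group(): drop the attribute and the name's tag entry."""
    if name not in [n for names in collection.tags.values() for n in names]:
        raise TypeError(f'{name} is not a group in the collection.')
    delattr(collection, name)
    for names in collection.tags.values():
        if name in names:
            names.remove(name)  # the deleted group no longer has a tag


coll = SimpleNamespace(group1=SimpleNamespace(name='group1'),
                       group2=SimpleNamespace(name='group2'),
                       tags={'scan': ['group1', 'group2']})
del_group_sketch(coll, 'group2')
print(hasattr(coll, 'group2'))  # False
print(coll.tags)                # {'scan': ['group1']}
```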

get_group(name)

Returns a group dataset from the collection.

Parameters: name – Name of group to retrieve.
Return type: Group
Returns: Requested group.
Raises: TypeError – If name is not a group in the collection.

Important

get_group() returns the group stored in the collection, not a copy, so changes made to the returned group are propagated to the collection. If you need an independent copy of the group, use the copy() method.
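The reference semantics described in the note above can be illustrated with getattr on a plain container. This assumes, as the note implies, that get_group() hands back the stored object itself; SimpleNamespace stands in for both Collection and Group.

```python
from types import SimpleNamespace

collection = SimpleNamespace(group1=SimpleNamespace(name='group1', e0=7112.0))

group = getattr(collection, 'group1')   # the stored object, not a copy
print(group is collection.group1)       # True
group.e0 = 7110.0                       # edit through the retrieved reference
print(collection.group1.e0)             # 7110.0, so the change reached the container
```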

Example

>>> from araucaria import Collection, Group
>>> from araucaria.utils import check_objattrs
>>> collection = Collection()
>>> g1    = Group(**{'name': 'group1'})
>>> collection.add_group(g1)
>>> group = collection.get_group('group1')
>>> check_objattrs(group, Group)
True
>>> print(group.name)
group1

get_mcer(num=None, taglist=['all'])

Returns the minimum common energy range for the collection.

Parameters

num (Optional[int]) – Number of equally-spaced points for the energy array.
taglist (List[str]) – List with keys to filter groups in the collection based on the tags attribute. The default is [‘all’].

Return type

ndarray

Returns

Array containing the minimum common energy range.

Raises

AttributeError – If energy is not an attribute of the requested groups.
ValueError – If any item in taglist is not a key of the tags attribute.

Notes

By default the returned array contains the lowest number of points available in the minimum common energy range of the groups.

Providing a value for num will return the desired number of equally-spaced points for the minimum common energy range.
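The rules in these notes can be expressed as a short NumPy sketch. It assumes that the minimum common energy range runs from the highest starting energy to the lowest ending energy among the groups, and that the default output is the group with the fewest points inside that range. mcer_sketch is a hypothetical helper working on bare energy arrays, but it reproduces the values in the examples that follow.

```python
import numpy as np


def mcer_sketch(energies, num=None):
    """Hypothetical minimum common energy range for a list of 1-D energy arrays."""
    emin = max(e.min() for e in energies)  # highest of the starting energies
    emax = min(e.max() for e in energies)  # lowest of the ending energies
    if num is not None:
        return np.linspace(emin, emax, num)  # num equally-spaced points
    # keep each array's points inside [emin, emax] and return the shortest one
    clipped = [e[(e >= emin) & (e <= emax)] for e in energies]
    return min(clipped, key=len)


g1 = np.linspace(1000, 2000, 6)
g2 = np.linspace(1500, 2500, 11)
print(mcer_sketch([g1]))              # [1000. 1200. 1400. 1600. 1800. 2000.]
print(mcer_sketch([g1, g2]))          # [1600. 1800. 2000.]
print(mcer_sketch([g1, g2], num=11))  # 11 points from 1500 to 2000 in steps of 50
```

Returning the sparsest group's own points avoids creating energy values that are finer than the coarsest measurement in the selection.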

Examples

>>> from numpy import linspace
>>> from araucaria import Collection, Group
>>> collection = Collection()
>>> g1   = Group(**{'name': 'group1', 'energy': linspace(1000, 2000, 6)})
>>> g2   = Group(**{'name': 'group2', 'energy': linspace(1500, 2500, 11)})
>>> tags = ('scan', 'ref')
>>> for i, group in enumerate([g1, g2]):
...     collection.add_group(group, tag=tags[i])
>>> # mcer for tag 'scan'
>>> print(collection.get_mcer(taglist=['scan']))
[1000. 1200. 1400. 1600. 1800. 2000.]
>>> # mcer for tag 'ref'
>>> print(collection.get_mcer(taglist=['ref']))
[1500. 1600. 1700. 1800. 1900. 2000. 2100. 2200. 2300. 2400. 2500.]

>>> # mcer for 'all' groups
>>> print(collection.get_mcer())
[1600. 1800. 2000.]
>>> # mcer for 'all' groups explicitly
>>> print(collection.get_mcer(taglist=['scan', 'ref']))
[1600. 1800. 2000.]

>>> # mcer with given number of points
>>> print(collection.get_mcer(num=11))
[1500. 1550. 1600. 1650. 1700. 1750. 1800. 1850. 1900. 1950. 2000.]

get_names(taglist=['all'])

Returns group names in the collection.

Parameters: taglist (List[str]) – List with keys to filter groups in the collection based on the tags attribute. The default is [‘all’].
Return type: List[str]
Returns: List with group names in the collection.
Raises: ValueError – If any item in taglist is not a key of the tags attribute.

Example

>>> from araucaria import Collection, Group
>>> collection = Collection()
>>> g1   = Group(**{'name': 'group1'})
>>> g2   = Group(**{'name': 'group2'})
>>> g3   = Group(**{'name': 'group3'})
>>> g4   = Group(**{'name': 'group4'})
>>> tags = ('scan', 'ref', 'ref', 'scan')
>>> for i, group in enumerate([g1, g2, g3, g4]):
...     collection.add_group(group, tag=tags[i])
>>> collection.get_names()
['group1', 'group2', 'group3', 'group4']
>>> collection.get_names(taglist=['scan'])
['group1', 'group4']
>>> collection.get_names(taglist=['ref'])
['group2', 'group3']
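The taglist filter can be sketched on a plain tags dictionary. This hypothetical get_names_sketch assumes that names are returned in sorted order, which matches the outputs shown above, and that an unknown tag key raises ValueError as documented. A tuple default is used to avoid a shared mutable default argument.

```python
def get_names_sketch(tags, taglist=('all',)):
    """Hypothetical get_names(): sorted group names for the requested tag keys."""
    if 'all' in taglist:
        keys = list(tags)
    else:
        for key in taglist:
            if key not in tags:
                raise ValueError(f'{key} is not a valid tag key.')
        keys = list(taglist)
    return sorted(name for key in keys for name in tags[key])


tags = {'scan': ['group1', 'group4'], 'ref': ['group2', 'group3']}
print(get_names_sketch(tags))                    # ['group1', 'group2', 'group3', 'group4']
print(get_names_sketch(tags, taglist=['scan']))  # ['group1', 'group4']
print(get_names_sketch(tags, taglist=['ref']))   # ['group2', 'group3']
```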

get_tag(name)

Returns tag of a group in the collection.

Parameters: name – Name of group to retrieve tag.
Return type: str
Returns: Tag of the group.
Raises: AttributeError – If name is not a group in the collection.

Example

>>> from araucaria import Collection, Group
>>> collection = Collection()
>>> g1   = Group(**{'name': 'group1'})
>>> g2   = Group(**{'name': 'group2'})
>>> tags = ('scan', 'ref')
>>> for i, group in enumerate([g1, g2]):
...     collection.add_group(group, tag=tags[i])
>>> print(collection.get_tag('group1'))
scan
>>> print(collection.get_tag('group2'))
ref
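Since each name lives under exactly one tag key, finding a group's tag is a reverse lookup in the tags dictionary. A minimal sketch with a hypothetical helper, raising AttributeError to mirror the exception documented above:

```python
def get_tag_sketch(tags, name):
    """Hypothetical get_tag(): reverse lookup of a name in the tags dictionary."""
    for key, names in tags.items():
        if name in names:
            return key  # single-tag rule: the first match is the only match
    raise AttributeError(f'{name} is not a group in the collection.')


tags = {'scan': ['group1'], 'ref': ['group2']}
print(get_tag_sketch(tags, 'group1'))  # scan
print(get_tag_sketch(tags, 'group2'))  # ref
```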

rename_group(name, newname)

Renames a group in the collection.

Parameters

name (str) – Name of group to modify.
newname (str) – New name for the group.

Return type

None

Raises

AttributeError – If name is not a group in the collection.
TypeError – If newname is not a string.

Example

>>> from araucaria import Collection, Group
>>> collection = Collection()
>>> g1   = Group(**{'name': 'group1'})
>>> g2   = Group(**{'name': 'group2'})
>>> for group in (g1, g2):
...     collection.add_group(group)
>>> collection.rename_group('group1', 'group3')
>>> print(collection.get_names())
['group2', 'group3']
>>> print(collection.group3.name)
group3
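Renaming touches three places that must stay consistent: the attribute on the collection, the group's own name, and its entry in tags. The hypothetical rename_group_sketch below updates all three on a SimpleNamespace stand-in, raising the exceptions listed above; it is an illustration, not araucaria's code.

```python
from types import SimpleNamespace


def rename_group_sketch(collection, name, newname):
    """Hypothetical rename_group(): move the attribute and update name and tags."""
    if not hasattr(collection, name):
        raise AttributeError(f'{name} is not a group in the collection.')
    if not isinstance(newname, str):
        raise TypeError('newname must be a string.')
    group = getattr(collection, name)
    delattr(collection, name)
    group.name = newname                         # keep the group's own name in sync
    setattr(collection, newname, group)
    for names in collection.tags.values():
        if name in names:
            names[names.index(name)] = newname   # same tag, new name


coll = SimpleNamespace(group1=SimpleNamespace(name='group1'),
                       group2=SimpleNamespace(name='group2'),
                       tags={'scan': ['group1', 'group2']})
rename_group_sketch(coll, 'group1', 'group3')
print(coll.group3.name)  # group3
print(coll.tags)         # {'scan': ['group3', 'group2']}
```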

retag(name, tag)

Modifies tag of a group in the collection.

Parameters

name (str) – Name of group to modify.
tag (str) – New tag for the group.

Return type

None

Raises

AttributeError – If name is not a group in the collection.

Example

>>> from araucaria import Collection, Group
>>> collection = Collection()
>>> g1   = Group(**{'name': 'group1'})
>>> g2   = Group(**{'name': 'group2'})
>>> tags = ('scan', 'ref')
>>> for i, group in enumerate([g1, g2]):
...     collection.add_group(group, tag=tags[i])
>>> collection.retag('group1', 'ref')
>>> for key, value in collection.tags.items():
...     print(key, value)
ref ['group1', 'group2']
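In the example above, the 'scan' key disappears once its last group is retagged, and the 'ref' list comes out sorted. The hypothetical retag_sketch below assumes both behaviours (dropping emptied keys, keeping names sorted) because they match that output; it operates on a plain tags dictionary.

```python
def retag_sketch(tags, name, tag):
    """Hypothetical retag(): move name to a new tag key, dropping emptied keys."""
    for key in list(tags):                 # list() because a key may be deleted
        if name in tags[key]:
            tags[key].remove(name)
            if not tags[key]:
                del tags[key]              # no empty tag keys left behind
            break
    else:
        raise AttributeError(f'{name} is not a group in the collection.')
    tags[tag] = sorted(tags.get(tag, []) + [name])


tags = {'scan': ['group1'], 'ref': ['group2']}
retag_sketch(tags, 'group1', 'ref')
print(tags)  # {'ref': ['group1', 'group2']}
```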

summary(taglist=['all'], regex=None, optional=None)

Returns a summary report of groups in a collection.

Parameters

taglist (List[str]) – List with keys to filter groups in the collection based on the tags attribute. The default is [‘all’].
regex (Optional[str]) – Search string to filter results by group name. See Notes for details. The default is None.
optional (Optional[list]) – List with optional parameters. See Notes for details. The default is None.

Return type

Report

Returns

Report for groups in the collection.

Raises

ValueError – If any item in taglist is not a key of the tags attribute.

Notes

Summary data includes the following:

Group index.
Group name.
Group tag.
Measurement mode.
Number of scans.
Merged scans, if optional=['merged_scans'].
Optional parameters if they exist as attributes in the group.

A regex value can be used to filter group names based on a regular expression. For valid regex syntax, see the documentation of the re module.
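Name filtering with a regular expression can be sketched with the standard re module. Whether summary() uses re.search (match anywhere) or re.match (match at the start) is not stated here; this hypothetical filter_names assumes re.search. The names are taken from the apply() example.

```python
import re


def filter_names(names, regex=None):
    """Hypothetical name filter: keep names where the pattern matches anywhere."""
    if regex is None:
        return list(names)                 # no filtering requested
    pattern = re.compile(regex)
    return [name for name in names if pattern.search(name)]


names = ['FeIISO4_20K', 'Fe_Foil', 'Ferrihydrite_20K', 'Goethite_20K']
print(filter_names(names, '20K$'))  # ['FeIISO4_20K', 'Ferrihydrite_20K', 'Goethite_20K']
print(filter_names(names, '^Fe_'))  # ['Fe_Foil']
```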

The number of scans and the names of merged files are retrieved from the merged_scans attribute of each group.

Optional parameters are retrieved from the groups as attributes. Currently only str, float or int values are retrieved; otherwise, an empty string is printed in the report.
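The retrieval rule for optional parameters can be sketched as a type check on each attribute. optional_value is a hypothetical helper and SimpleNamespace stands in for Group; a missing attribute is treated like an unsupported type, which is an assumption.

```python
from types import SimpleNamespace


def optional_value(group, attr):
    """Hypothetical rule: report str, float or int attributes, else an empty string."""
    value = getattr(group, attr, None)
    if isinstance(value, (str, float, int)):
        return value
    return ''   # missing attribute or unsupported type (e.g. a list or array)


group = SimpleNamespace(name='Fe_Foil', e0=7112.0, energy=[7100.0, 7110.0])
print([optional_value(group, attr) for attr in ('e0', 'energy', 'edge_step')])
# [7112.0, '', '']
```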