Collection class

class araucaria.main.collection.Collection(name=None)

Collection storage class.

This class stores a collection of Group objects.

Parameters

name (str) – Name for the collection. The default is None.

tags

Dictionary with available groups in the collection based on tag keys.

Type

dict

Notes

Each group will be stored as an attribute of the collection. The tags attribute classifies group names based on a tag key, which is useful for joint manipulation of groups.
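
The storage scheme described above can be pictured with a short sketch. The MiniGroup and MiniCollection classes below are hypothetical stand-ins written for illustration, not araucaria's implementation. They assume only what the notes state: each group becomes an attribute of the collection, and the tags dictionary maps a tag key to a list of group names.

```python
class MiniGroup:
    """Hypothetical stand-in for a data group; it only carries a name."""
    def __init__(self, name):
        self.name = name


class MiniCollection:
    """Hypothetical container mirroring the storage scheme in the notes."""
    def __init__(self):
        self.tags = {}  # tag key -> list of group names

    def add_group(self, group, tag='scan'):
        # the group is stored as an attribute named after the group
        setattr(self, group.name, group)
        # the group name is filed under exactly one tag key
        self.tags.setdefault(tag, []).append(group.name)


collection = MiniCollection()
for name, tag in (('group1', 'scan'), ('group2', 'scan'), ('group3', 'ref')):
    collection.add_group(MiniGroup(name), tag=tag)

print(collection.group1.name)  # attribute access by group name
print(collection.tags)         # names classified by tag key
```

Keeping the names in a separate dictionary is what allows several groups to be selected at once by tag.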

The following methods are currently implemented:

Method – Description

  • add_group() – Adds a group to the collection.

  • apply() – Applies a function to groups in the collection.

  • copy() – Returns a copy of the collection.

  • del_group() – Deletes a group from the collection.

  • get_group() – Returns a group in the collection.

  • get_mcer() – Returns the minimum common energy range for the collection.

  • get_names() – Returns group names in the collection.

  • get_tag() – Returns tag of a group in the collection.

  • rename_group() – Renames a group in the collection.

  • retag() – Modifies tag of a group in the collection.

  • summary() – Returns a summary report of the collection.

Warning

Each group can only have a single tag key.

Example

>>> from araucaria import Collection
>>> collection = Collection()
>>> type(collection)
<class 'araucaria.main.collection.Collection'>

add_group(group, tag='scan')

Adds a group dataset to the collection.

Parameters
  • group (Group) – The data group to add to the collection.

  • tag (str) – Key for the tags attribute of the collection. The default is 'scan'.

Return type

None

Returns

Raises
  • TypeError – If group is not a valid Group instance.

  • ValueError – If group.name is already in the collection.
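
The two error conditions listed above can be illustrated with a small sketch. MiniGroup, checked_add and the plain dictionary used as a store are hypothetical names introduced for this illustration; the sketch mirrors the documented checks and does not reproduce araucaria's code.

```python
class MiniGroup:
    """Hypothetical stand-in for a data group; it only carries a name."""
    def __init__(self, name):
        self.name = name


def checked_add(store, group):
    """Add group to store (a dict keyed by group name) after two checks."""
    if not isinstance(group, MiniGroup):
        raise TypeError('%r is not a valid group instance.' % (group,))
    if group.name in store:
        raise ValueError('%s is already in the collection.' % group.name)
    store[group.name] = group


store = {}
checked_add(store, MiniGroup('group1'))

errors = []
for candidate in ('not a group', MiniGroup('group1')):
    try:
        checked_add(store, candidate)
    except (TypeError, ValueError) as err:
        errors.append(type(err).__name__)
print(errors)
```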

Example

>>> from araucaria import Collection, Group
>>> from araucaria.utils import check_objattrs
>>> collection = Collection()
>>> g1 = Group(**{'name': 'group1'})
>>> g2 = Group(**{'name': 'group2'})
>>> for group in (g1, g2):
...     collection.add_group(group)
>>> check_objattrs(collection, Collection, attrlist=['group1','group2'])
[True, True]
>>> # using tags
>>> g3 = Group(**{'name': 'group3'})
>>> collection.add_group(g3, tag='ref')
>>> for key, value in collection.tags.items():
...     print(key, value, type(value))
scan ['group1', 'group2'] <class 'list'>
ref ['group3'] <class 'list'>

apply(func, taglist=['all'], **kwargs)

Applies a function to groups in a collection.

Parameters
  • func – Function to apply to the collection. Must accept update=True as an argument.

  • taglist (List[str]) – List with keys to filter groups in the collection based on the tags attribute. The default is ['all'].

  • **kwargs – Additional keyword arguments to pass to func.

Return type

None

Returns

Raises

ValueError – If any item in taglist is not a key of the tags attribute.
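
As an illustration of the requirement that func accepts update=True, the sketch below defines a hypothetical processing function and a minimal loop that calls it. The names shift_energy and apply_all are assumptions made for this example, as is the reading that update=True asks the function to store its result in the group rather than return it.

```python
from types import SimpleNamespace


def shift_energy(group, offset=0.0, update=False):
    """Hypothetical processing function that follows the update convention."""
    shifted = [value + offset for value in group.energy]
    if update:
        # write the result back into the group instead of returning it
        group.energy = shifted
        return None
    return shifted


def apply_all(groups, func, **kwargs):
    """Call func on every group, always requesting an in-place update."""
    for group in groups:
        func(group, update=True, **kwargs)


g1 = SimpleNamespace(name='group1', energy=[1.0, 2.0])
g2 = SimpleNamespace(name='group2', energy=[3.0, 4.0])
apply_all([g1, g2], shift_energy, offset=0.5)
print(g1.energy, g2.energy)
```

Extra keyword arguments such as offset are simply forwarded, which corresponds to the role of **kwargs in the signature above.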

Example

>>> from araucaria.testdata import get_testpath
>>> from araucaria.io import read_collection_hdf5
>>> from araucaria.xas import pre_edge
>>> fpath      = get_testpath('Fe_database.h5')
>>> collection = read_collection_hdf5(fpath)
>>> collection.apply(pre_edge)
>>> report     = collection.summary(optional=['e0'])
>>> report.show()
===============================================
id  dataset           tag   mode    n  e0
===============================================
1   FeIISO4_20K       scan  mu      5  7124.7
2   Fe_Foil           scan  mu_ref  5  7112
3   Ferrihydrite_20K  scan  mu      5  7127.4
4   Goethite_20K      scan  mu      5  7127.3
===============================================

copy()

Returns a deep copy of the collection.

Parameters

None

Return type

Collection

Returns

Copy of the collection.

Example

>>> from numpy import allclose
>>> from araucaria import Group, Collection
>>> collection1 = Collection()
>>> content     = {'name': 'group', 'energy': [1,2,3,4,5,6]}
>>> group       = Group(**content)
>>> collection1.add_group(group)
>>> collection2 = collection1.copy()
>>> energy1     = collection1.get_group('group').energy
>>> energy2     = collection2.get_group('group').energy
>>> allclose(energy1, energy2)
True

del_group(name)

Removes a group dataset from the collection.

Parameters

name – Name of group to remove.

Return type

None

Returns

Raises

TypeError – If name is not a group in the collection.

Example

>>> from araucaria import Collection, Group
>>> from araucaria.utils import check_objattrs
>>> collection = Collection()
>>> g1 = Group(**{'name': 'group1'})
>>> g2 = Group(**{'name': 'group2'})
>>> for group in (g1, g2):
...     collection.add_group(group)
>>> check_objattrs(collection, Collection, attrlist=['group1','group2'])
[True, True]
>>> collection.del_group('group2')
>>> check_objattrs(collection, Collection, attrlist=['group1','group2'])
[True, False]
>>> # verifying that the deleted group has no tag
>>> for key, value in collection.tags.items():
...     print(key, value)
scan ['group1']

get_group(name)

Returns a group dataset from the collection.

Parameters

name – Name of group to retrieve.

Return type

Group

Returns

Requested group.

Raises

TypeError – If name is not a group in the collection.

Important

Changes made to the group will be propagated to the collection. If you need a copy of the group, use the copy() method.
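
The difference between a reference and a copy can be seen in a short sketch. It uses SimpleNamespace objects as hypothetical stand-ins for a collection and a group, and the standard library deepcopy in place of the copy() method mentioned above.

```python
from copy import deepcopy
from types import SimpleNamespace

# hypothetical holder with one group stored as an attribute
holder = SimpleNamespace(
    group1=SimpleNamespace(name='group1', energy=[1, 2, 3]))

# retrieving the attribute returns the stored object itself
alias = getattr(holder, 'group1')
alias.energy.append(4)             # also changes the stored group

# a deep copy is independent of the stored group
snapshot = deepcopy(holder.group1)
snapshot.energy.append(5)          # stored group is unaffected

print(holder.group1.energy)
print(snapshot.energy)
```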

Example

>>> from araucaria import Collection, Group
>>> from araucaria.utils import check_objattrs
>>> collection = Collection()
>>> g1    = Group(**{'name': 'group1'})
>>> collection.add_group(g1)
>>> gcopy = collection.get_group('group1')
>>> check_objattrs(gcopy, Group)
True
>>> print(gcopy.name)
group1

get_mcer(num=None, taglist=['all'])

Returns the minimum common energy range for the collection.

Parameters
  • num (Optional[int]) – Number of equally-spaced points for the energy array.

  • taglist (List[str]) – List with keys to filter groups in the collection based on the tags attribute. The default is ['all'].

Return type

ndarray

Returns

Array containing the minimum common energy range.

Raises
  • AttributeError – If energy is not an attribute of the requested groups.

  • ValueError – If any item in taglist is not a key of the tags attribute.

Notes

By default the returned array contains the lowest number of points available in the minimum common energy range of the groups.

Providing a value for num will return the desired number of equally-spaced points for the minimum common energy range.
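
One way to read these notes is sketched below with plain Python lists. The helper names linspace and min_common_range are hypothetical, and the selection rule (take the overlap window of all arrays, then keep the array with the fewest points inside it) is an assumption that is consistent with the notes, not a copy of araucaria's implementation.

```python
from typing import List, Optional, Sequence


def linspace(start: float, stop: float, num: int) -> List[float]:
    """Return num equally spaced points from start to stop, inclusive."""
    if num == 1:
        return [float(start)]
    step = (stop - start) / (num - 1)
    return [start + i * step for i in range(num)]


def min_common_range(energies: Sequence[Sequence[float]],
                     num: Optional[int] = None) -> List[float]:
    """Sketch of a minimum common energy range for several arrays."""
    # overlap window shared by every energy array
    emin = max(min(e) for e in energies)
    emax = min(max(e) for e in energies)
    if emin > emax:
        raise ValueError('energy arrays do not overlap.')
    if num is not None:
        # user-requested number of equally spaced points
        return linspace(emin, emax, num)
    # keep only the points of each array that fall inside the window
    clipped = [[x for x in e if emin <= x <= emax] for e in energies]
    # return the coarsest grid, i.e. the one with the fewest points
    return min(clipped, key=len)


g1 = linspace(1000, 2000, 6)    # 200 eV steps
g2 = linspace(1500, 2500, 11)   # 100 eV steps
mcer = min_common_range([g1, g2])
mcer_11 = min_common_range([g1, g2], num=11)
print(mcer)
print(mcer_11)
```

With these two arrays the overlap window is 1500 to 2000, and the first array contributes the fewest points inside it.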

Examples

>>> from numpy import linspace
>>> from araucaria import Collection, Group
>>> collection = Collection()
>>> g1   = Group(**{'name': 'group1', 'energy': linspace(1000, 2000, 6)})
>>> g2   = Group(**{'name': 'group2', 'energy': linspace(1500, 2500, 11)})
>>> tags = ('scan', 'ref')
>>> for i, group in enumerate([g1, g2]):
...     collection.add_group(group, tag=tags[i])
>>> # mcer for tag 'scan'
>>> print(collection.get_mcer(taglist=['scan']))
[1000. 1200. 1400. 1600. 1800. 2000.]
>>> # mcer for tag 'ref'
>>> print(collection.get_mcer(taglist=['ref']))
[1500. 1600. 1700. 1800. 1900. 2000. 2100. 2200. 2300. 2400. 2500.]
>>> # mcer for 'all' groups
>>> print(collection.get_mcer())
[1600. 1800. 2000.]
>>> # mcer for 'all' groups explicitly
>>> print(collection.get_mcer(taglist=['scan', 'ref']))
[1600. 1800. 2000.]
>>> # mcer with given number of points
>>> print(collection.get_mcer(num=11))
[1500. 1550. 1600. 1650. 1700. 1750. 1800. 1850. 1900. 1950. 2000.]

get_names(taglist=['all'])

Returns group names in the collection.

Parameters

taglist (List[str]) – List with keys to filter groups in the collection based on the tags attribute. The default is ['all'].

Return type

List[str]

Returns

List with group names in the collection.

Raises

ValueError – If any item in taglist is not a key of the tags attribute.
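
The taglist filter can be sketched as a function over a tags dictionary. The name names_for_tags is hypothetical, and two details are assumptions: that ['all'] selects every tag key, and that names are returned in sorted order.

```python
def names_for_tags(tags, taglist=('all',)):
    """Return sorted group names filed under the requested tag keys."""
    if list(taglist) == ['all']:
        selected = list(tags)          # assumption: 'all' means every key
    else:
        unknown = [tag for tag in taglist if tag not in tags]
        if unknown:
            raise ValueError('unknown tag keys: %s' % unknown)
        selected = list(taglist)
    names = [name for tag in selected for name in tags[tag]]
    return sorted(names)


tags = {'scan': ['group1', 'group4'], 'ref': ['group2', 'group3']}
all_names = names_for_tags(tags)
ref_names = names_for_tags(tags, taglist=['ref'])
print(all_names)
print(ref_names)
```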

Example

>>> from araucaria import Collection, Group
>>> collection = Collection()
>>> g1   = Group(**{'name': 'group1'})
>>> g2   = Group(**{'name': 'group2'})
>>> g3   = Group(**{'name': 'group3'})
>>> g4   = Group(**{'name': 'group4'})
>>> tags = ('scan', 'ref', 'ref', 'scan')
>>> for i, group in enumerate([g1, g2, g3, g4]):
...     collection.add_group(group, tag=tags[i])
>>> collection.get_names()
['group1', 'group2', 'group3', 'group4']
>>> collection.get_names(taglist=['scan'])
['group1', 'group4']
>>> collection.get_names(taglist=['ref'])
['group2', 'group3']

get_tag(name)

Returns tag of a group in the collection.

Parameters

name – Name of the group whose tag is retrieved.

Return type

str

Returns

Tag of the group.

Raises

AttributeError – If name is not a group in the collection.

Example

>>> from araucaria import Collection, Group
>>> collection = Collection()
>>> g1   = Group(**{'name': 'group1'})
>>> g2   = Group(**{'name': 'group2'})
>>> tags = ('scan', 'ref')
>>> for i, group in enumerate([g1, g2]):
...     collection.add_group(group, tag=tags[i])
>>> print(collection.get_tag('group1'))
scan
>>> print(collection.get_tag('group2'))
ref

rename_group(name, newname)

Renames a group in the collection.

Parameters
  • name (str) – Name of group to modify.

  • newname (str) – New name for the group.

Return type

None

Returns

Raises

Example

>>> from araucaria import Collection, Group
>>> collection = Collection()
>>> g1   = Group(**{'name': 'group1'})
>>> g2   = Group(**{'name': 'group2'})
>>> for i, group in enumerate([g1, g2]):
...     collection.add_group(group)
>>> collection.rename_group('group1', 'group3')
>>> print(collection.get_names())
['group2', 'group3']
>>> print(collection.group3.name)
group3

retag(name, tag)

Modifies tag of a group in the collection.

Parameters
  • name (str) – Name of group to modify.

  • tag (str) – New tag for the group.

Return type

None

Returns

Raises

AttributeError – If name is not a group in the collection.
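
Because each group has a single tag key, retagging amounts to moving a name from one list of the tags dictionary to another. The sketch below uses a hypothetical move_tag function on a plain dictionary. Dropping a tag key once its list is empty, and keeping each list sorted, are assumptions made for the illustration.

```python
def move_tag(tags, name, tag):
    """Move name from its current tag key to the key given by tag."""
    # find the tag key that currently holds the name
    current = next((key for key, names in tags.items() if name in names),
                   None)
    if current is None:
        raise AttributeError('%s is not a group in the collection.' % name)
    tags[current].remove(name)
    if not tags[current]:
        del tags[current]              # assumption: empty keys are dropped
    tags.setdefault(tag, []).append(name)
    tags[tag].sort()


tags = {'scan': ['group1'], 'ref': ['group2']}
move_tag(tags, 'group1', 'ref')
print(tags)
```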

Example

>>> from araucaria import Collection, Group
>>> collection = Collection()
>>> g1   = Group(**{'name': 'group1'})
>>> g2   = Group(**{'name': 'group2'})
>>> tags = ('scan', 'ref')
>>> for i, group in enumerate([g1, g2]):
...     collection.add_group(group, tag=tags[i])
>>> collection.retag('group1', 'ref')
>>> for key, value in collection.tags.items():
...     print(key, value)
ref ['group1', 'group2']

summary(taglist=['all'], regex=None, optional=None)

Returns a summary report of groups in a collection.

Parameters
  • taglist (List[str]) – List with keys to filter groups in the collection based on the tags attribute. The default is ['all'].

  • regex (Optional[str]) – Search string to filter results by group name. See Notes for details. The default is None.

  • optional (Optional[list]) – List with optional parameters. See Notes for details. The default is None.

Return type

Report

Returns

Report for groups in the collection.

Raises

ValueError – If any item in taglist is not a key of the tags attribute.

Notes

Summary data includes the following:

  1. Group index.

  2. Group name.

  3. Group tag.

  4. Measurement mode.

  5. Number of scans.

  6. Merged scans, if optional=['merged_scans'].

  7. Optional parameters if they exist as attributes in the group.

A regex value can be used to filter group names based on a regular expression (regex). For valid regex syntax, please check the documentation of the module re.
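
A name filter of this kind can be sketched with the standard re module. The function filter_names is hypothetical, and the use of search semantics (a match anywhere in the name) is an assumption rather than a documented detail.

```python
import re


def filter_names(names, regex=None):
    """Keep the names that match regex anywhere; keep all if regex is None."""
    if regex is None:
        return list(names)
    pattern = re.compile(regex)
    return [name for name in names if pattern.search(name)]


names = ['FeIISO4_20K', 'Fe_Foil', 'Ferrihydrite_20K', 'Goethite_20K']
goethite = filter_names(names, 'Goe')
cold_fe = filter_names(names, r'^Fe.*_20K$')
print(goethite)
print(cold_fe)
```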

The number of scans and names of merged files are retrieved from the merged_scans attribute of each group in the collection.

Optional parameters will be retrieved from the groups as attributes. Currently only str, float or int will be retrieved. Otherwise an empty character will be printed in the report.
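
The rule for optional parameters can be sketched as follows. The function optional_fields and the SimpleNamespace group are hypothetical, and the sketch returns raw values without reproducing the number formatting of the report.

```python
from types import SimpleNamespace


def optional_fields(group, optional):
    """Return one report field per requested attribute name."""
    fields = []
    for key in optional:
        value = getattr(group, key, None)
        if isinstance(value, (str, float, int)):
            fields.append(value)
        else:
            # missing attributes and unsupported types give an empty field
            fields.append('')
    return fields


group = SimpleNamespace(symbol='Zn', temp=25.0, energy=[1, 2, 3])
fields = optional_fields(group, ['symbol', 'temp', 'energy', 'missing'])
print(fields)
```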

See also

Report

Examples

>>> from araucaria.testdata import get_testpath
>>> from araucaria.io import read_collection_hdf5
>>> fpath      = get_testpath('Fe_database.h5')
>>> collection = read_collection_hdf5(fpath)
>>> # printing default summary
>>> report = collection.summary()
>>> report.show()
=======================================
id  dataset           tag   mode    n
=======================================
1   FeIISO4_20K       scan  mu      5
2   Fe_Foil           scan  mu_ref  5
3   Ferrihydrite_20K  scan  mu      5
4   Goethite_20K      scan  mu      5
=======================================
>>> # printing summary with merged scans
>>> report = collection.summary(regex='Goe', optional=['merged_scans'])
>>> report.show()
=============================================================
id  dataset       tag   mode  n  merged_scans
=============================================================
1   Goethite_20K  scan  mu    5  20K_GOE_Fe_K_240.00000.xdi
                                 20K_GOE_Fe_K_240.00001.xdi
                                 20K_GOE_Fe_K_240.00002.xdi
                                 20K_GOE_Fe_K_240.00003.xdi
                                 20K_GOE_Fe_K_240.00004.xdi
=============================================================
>>> # printing custom summary
>>> from araucaria.testdata import get_testpath
>>> from araucaria import Collection
>>> from araucaria.io import read_xmu
>>> fpath = get_testpath('xmu_testfile.xmu')
>>> # extracting mu scan
>>> group_mu = read_xmu(fpath, scan='mu')
>>> # adding additional attributes
>>> group_mu.symbol = 'Zn'
>>> group_mu.temp   = 25.0
>>> # saving in a collection
>>> collection = Collection()
>>> collection.add_group(group_mu)
>>> report = collection.summary(optional=['symbol','temp'])
>>> report.show()
===================================================
id  dataset           tag   mode  n  symbol  temp
===================================================
1   xmu_testfile.xmu  scan  mu    1  Zn      25
===================================================