Cluster module

The cluster module offers the following functions to perform clustering:

Function     Description

cluster()    Performs hierarchical clustering on a collection.

araucaria.stats.cluster.cluster(collection, taglist=['all'], cluster_region='xanes', cluster_range=[-inf, inf], method='single', metric='euclidean', kweight=2)

Performs hierarchical clustering on a collection.

Parameters
  • collection (Collection) – Collection with the groups for clustering.

  • taglist (List[str]) – List of keys used to filter groups by their tags attribute in the Collection. The default is [‘all’].

  • cluster_region (str) – XAFS region to perform clustering. Accepted values are ‘dxanes’, ‘xanes’, or ‘exafs’. The default is ‘xanes’.

  • cluster_range (list) – Domain range in absolute values. Energy units are expected for ‘dxanes’ or ‘xanes’, while wavenumber (k) units are expected for ‘exafs’. The default is [-inf, inf].

  • method (str) – Linkage method to compute the distance between clusters. See the linkage() function of scipy for a list of valid method names. The default is ‘single’.

  • metric (str) – The distance metric. See the pdist() function of scipy for a list of valid distance metrics. The default is ‘euclidean’.

  • kweight (int) – Exponent for weighting chi(k) by k^kweight. Only valid for cluster_region='exafs'. The default is 2.
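
The parameters above can be illustrated with a minimal sketch built directly on the scipy functions they refer to. This is not the araucaria implementation: the synthetic chi(k) signals and the variable names are assumptions made for illustration, and only pdist() and linkage() are real scipy calls.

```python
import numpy as np
from scipy.cluster.hierarchy import linkage
from scipy.spatial.distance import pdist

# Synthetic chi(k) signals standing in for EXAFS spectra (illustrative only).
k = np.linspace(0.0, 12.0, 121)             # wavenumber grid
freqs = [2.0, 2.01, 3.0, 3.01]              # two pairs of similar signals
chi = np.vstack([np.sin(2 * f * k) * np.exp(-0.1 * k) for f in freqs])

kweight = 2
cluster_range = [2.0, 10.0]

# Weight chi(k) by k**kweight, then keep only points inside cluster_range.
weighted = chi * k**kweight
mask = (k >= cluster_range[0]) & (k <= cluster_range[1])
matrix = weighted[:, mask]                  # one row per group

# Pairwise distances between rows, then agglomerative linkage.
dists = pdist(matrix, metric='euclidean')
Z = linkage(dists, method='single')         # shape: (n_groups - 1, 4)
```

With cluster_range=[-inf, inf] the mask keeps every point, so the whole domain enters the distance computation.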

Return type

Dataset

Returns

Dataset with the following attributes:

  • Z : hierarchical clustering encoded as a linkage matrix.

  • groupnames : list with names of clustered groups.

  • energy : array with energy values. Returned only if cluster_region='xanes' or cluster_region='dxanes'.

  • k : array with wavenumber values. Returned only if cluster_region='exafs'.

  • matrix : array with observed values for groups in cluster_range.

  • cluster_pars : dictionary with cluster parameters.
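
A linkage matrix such as Z can be post-processed with scipy itself. The sketch below builds a small linkage matrix from toy data (the matrix values and the group names are hypothetical) and shows how it might be cut into flat clusters or converted into a dendrogram layout. It assumes Z follows the standard scipy linkage format.

```python
import numpy as np
from scipy.cluster.hierarchy import dendrogram, fcluster, linkage

# Toy observations: two tight pairs that lie far apart from each other.
matrix = np.array([[0.0, 0.0],
                   [0.1, 0.0],
                   [5.0, 5.0],
                   [5.1, 5.0]])
groupnames = ['a', 'b', 'c', 'd']           # hypothetical group names

Z = linkage(matrix, method='single', metric='euclidean')

# Each row of Z records one merge: [index_1, index_2, distance, n_members].
labels = fcluster(Z, t=2, criterion='maxclust')    # cut the tree into 2 clusters

# no_plot=True returns the dendrogram layout without requiring matplotlib.
tree = dendrogram(Z, labels=groupnames, no_plot=True)
leaf_order = tree['ivl']                    # leaf labels in plotting order
```

Here labels assigns one cluster id per row of matrix, in the same order as groupnames.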

See also

fig_cluster()

Plots the dendrogram of a hierarchical clustering.

Examples

>>> from araucaria.testdata import get_testpath
>>> from araucaria import Dataset
>>> from araucaria.xas import pre_edge, autobk
>>> from araucaria.stats import cluster
>>> from araucaria.io import read_collection_hdf5
>>> from araucaria.utils import check_objattrs
>>> fpath      = get_testpath('Fe_database.h5')
>>> collection = read_collection_hdf5(fpath)
>>> collection.apply(pre_edge)
>>> out        = cluster(collection, cluster_region='xanes')
>>> attrs      = ['groupnames', 'energy', 'matrix', 'Z', 'cluster_pars']
>>> check_objattrs(out, Dataset, attrs)
[True, True, True, True, True]
>>> # exafs clustering
>>> collection.apply(autobk)
>>> out   = cluster(collection, cluster_region='exafs', cluster_range=[0,10])
>>> attrs = ['groupnames', 'k', 'matrix', 'Z', 'cluster_pars']
>>> check_objattrs(out, Dataset, attrs)
[True, True, True, True, True]