Cluster module
The cluster module offers the following
functions to perform clustering:
| Function | Description |
|---|---|
| cluster | Performs hierarchical clustering on a collection. |
- araucaria.stats.cluster.cluster(collection, taglist=['all'], cluster_region='xanes', cluster_range=[-inf, inf], method='single', metric='euclidean', kweight=2)
Performs hierarchical clustering on a collection.
- Parameters
  - collection (Collection) – Collection with the groups for clustering.
  - taglist (List[str]) – List with keys to filter groups based on their tags attributes in the Collection. The default is ['all'].
  - cluster_region (str) – XAFS region to perform clustering. Accepted values are 'dxanes', 'xanes', or 'exafs'. The default is 'xanes'.
  - cluster_range (list) – Domain range in absolute values. Energy units are expected for 'dxanes' or 'xanes', while wavenumber (k) units are expected for 'exafs'. The default is [-inf, inf].
  - method (str) – Linkage method to compute the distance between clusters. See the linkage() function of scipy for a list of valid method names. The default is 'single'.
  - metric (str) – The distance metric. See the pdist() function of scipy for a list of valid distance metrics. The default is 'euclidean'.
  - kweight (int) – Exponent for weighting chi(k) by k^kweight. Only valid for cluster_region='exafs'. The default is 2.
- Return type
  Dataset
- Returns
  Dataset with the following attributes:
  - Z: hierarchical clustering encoded as a linkage matrix.
  - groupnames: list with names of clustered groups.
  - energy: array with energy values. Returned only if cluster_region='xanes' or cluster_region='dxanes'.
  - k: array with wavenumber values. Returned only if cluster_region='exafs'.
  - matrix: array with observed values for groups in cluster_range.
  - cluster_pars: dictionary with cluster parameters.
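The parameters and return values above can be related to SciPy directly. The following is a minimal sketch, not araucaria's implementation: it assumes that clustering amounts to trimming each spectrum to cluster_range, weighting chi(k) by k^kweight, stacking one row per group, and passing the result to linkage(). The spectra and group names are synthetic and purely illustrative.

```python
import numpy as np
from scipy.cluster.hierarchy import linkage

# Synthetic stand-ins for chi(k) spectra (hypothetical data, not araucaria output).
k = np.linspace(0, 12, 121)
spectra = {
    'a': np.sin(2.0 * k) * np.exp(-0.1 * k),
    'b': 1.05 * np.sin(2.0 * k) * np.exp(-0.1 * k),  # nearly identical to 'a'
    'c': np.sin(3.5 * k) * np.exp(-0.2 * k),         # clearly different
}

cluster_range = [0, 10]  # wavenumber units, as for cluster_region='exafs'
kweight = 2

# Keep only the points that fall inside the requested range.
mask = (k >= cluster_range[0]) & (k <= cluster_range[1])

# One row per group; chi(k) is weighted by k**kweight (assumed processing order).
groupnames = list(spectra)
matrix = np.array([spectra[name][mask] * k[mask] ** kweight
                   for name in groupnames])

# Linkage matrix with n - 1 rows and 4 columns:
# [cluster index 1, cluster index 2, merge distance, size of new cluster].
Z = linkage(matrix, method='single', metric='euclidean')
print(Z.shape)
```

Each row of Z records one merge, so a collection of n groups yields n - 1 rows. In this sketch the two near-identical spectra are merged first.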
See also
fig_cluster(): Plots the dendrogram of a hierarchical clustering.
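The signature of fig_cluster() is not documented in this section, so the sketch below uses SciPy's own dendrogram() instead to show how a linkage matrix and a list of group names translate into a leaf ordering. The matrix and group names are synthetic assumptions; with real data they would come from the Z and groupnames entries of the returned Dataset.

```python
import numpy as np
from scipy.cluster.hierarchy import linkage, dendrogram

# Hypothetical matrix: four groups forming two well-separated pairs.
rng = np.random.default_rng(0)
base = rng.normal(size=50)
matrix = np.array([base, base + 0.01, base + 5.0, base + 5.01])
groupnames = ['g1', 'g2', 'g3', 'g4']

Z = linkage(matrix, method='single', metric='euclidean')

# no_plot=True returns the dendrogram layout without needing matplotlib.
tree = dendrogram(Z, labels=groupnames, no_plot=True)

# 'ivl' holds the leaf labels in the order they would appear on the axis.
leaf_order = tree['ivl']
print(leaf_order)
```

Dropping no_plot=True draws the tree on the current matplotlib axes, provided matplotlib is installed.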
Examples
    >>> from araucaria.testdata import get_testpath
    >>> from araucaria import Dataset
    >>> from araucaria.xas import pre_edge, autobk
    >>> from araucaria.stats import cluster
    >>> from araucaria.io import read_collection_hdf5
    >>> from araucaria.utils import check_objattrs
    >>> fpath = get_testpath('Fe_database.h5')
    >>> collection = read_collection_hdf5(fpath)
    >>> collection.apply(pre_edge)
    >>> out = cluster(collection, cluster_region='xanes')
    >>> attrs = ['groupnames', 'energy', 'matrix', 'Z', 'cluster_pars']
    >>> check_objattrs(out, Dataset, attrs)
    [True, True, True, True, True]

    >>> # exafs clustering
    >>> collection.apply(autobk)
    >>> out = cluster(collection, cluster_region='exafs', cluster_range=[0,10])
    >>> attrs = ['groupnames', 'k', 'matrix', 'Z', 'cluster_pars']
    >>> check_objattrs(out, Dataset, attrs)
    [True, True, True, True, True]
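A linkage matrix such as the Z entry returned above can also be cut into flat clusters. This is a hedged sketch that uses SciPy's fcluster() on synthetic data; it is not part of araucaria, and the choice of two clusters is an assumption made only for illustration.

```python
import numpy as np
from scipy.cluster.hierarchy import linkage, fcluster

# Hypothetical matrix: four groups forming two well-separated pairs.
rng = np.random.default_rng(1)
base = rng.normal(size=40)
matrix = np.array([base, base + 0.02, base + 3.0, base + 3.02])
groupnames = ['s1', 's2', 's3', 's4']

Z = linkage(matrix, method='single', metric='euclidean')

# Cut the tree so that at most two flat clusters remain.
labels = fcluster(Z, t=2, criterion='maxclust')

# Map each group name to its flat cluster label.
assignment = {name: int(label) for name, label in zip(groupnames, labels)}
print(assignment)
```

The labels follow the row order of the clustered matrix, so they can be paired with groupnames as shown.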