Cluster module
The cluster module offers the following functions to perform clustering:
Function | Description |
---|---|
cluster | Performs hierarchical clustering on a collection. |
- araucaria.stats.cluster.cluster(collection, taglist=['all'], cluster_region='xanes', cluster_range=[-inf, inf], method='single', metric='euclidean', kweight=2)
Performs hierarchical clustering on a collection.
- Parameters
  - collection (Collection) – Collection with the groups for clustering.
  - taglist (List[str]) – List of keys used to filter groups by their tags attributes in the Collection. The default is ['all'].
  - cluster_region (str) – XAFS region on which to perform clustering. Accepted values are 'dxanes', 'xanes', or 'exafs'. The default is 'xanes'.
  - cluster_range (list) – Domain range in absolute values. Energy units are expected for 'dxanes' or 'xanes', while wavenumber (k) units are expected for 'exafs'. The default is [-inf, inf].
  - method (str) – Linkage method used to compute the distance between clusters. See the linkage() function of scipy for a list of valid method names. The default is 'single'.
  - metric (str) – The distance metric. See the pdist() function of scipy for a list of valid distance metrics. The default is 'euclidean'.
  - kweight (int) – Exponent for weighting chi(k) by k^kweight. Only valid for cluster_region='exafs'. The default is 2.
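As a rough illustration of how cluster_range, kweight, method, and metric might fit together for cluster_region='exafs', the sketch below uses only NumPy and SciPy on synthetic data. It is not araucaria's implementation: the arrays k and chi are hypothetical stand-ins for spectra stored in a Collection, and the masking and weighting steps are assumptions based on the parameter descriptions above.

```python
import numpy as np
from scipy.cluster.hierarchy import linkage

# Hypothetical chi(k) spectra for 4 groups on a shared k grid (synthetic data).
k = np.linspace(0.0, 12.0, 121)
chi = np.vstack([
    np.sin(2.0 * k) * np.exp(-0.1 * k),
    np.sin(2.0 * k + 0.05) * np.exp(-0.1 * k),
    np.sin(3.0 * k) * np.exp(-0.1 * k),
    np.sin(3.0 * k + 0.05) * np.exp(-0.1 * k),
])

cluster_range = [2.0, 10.0]  # domain range in k units
kweight = 2                  # exponent for the k^kweight weighting

# Assumed preprocessing: keep points inside cluster_range, then weight chi(k).
mask = (k >= cluster_range[0]) & (k <= cluster_range[1])
matrix = chi[:, mask] * k[mask] ** kweight

# scipy's linkage() accepts the observation matrix plus method and metric names.
Z = linkage(matrix, method='single', metric='euclidean')
print(Z.shape)  # one row per merge: (number of groups - 1, 4)
```

Each row of Z records one merge: the two cluster indices, their distance, and the size of the new cluster.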
- Return type
  Dataset
- Returns
  Dataset with the following attributes:
  - Z: hierarchical clustering encoded as a linkage matrix.
  - groupnames: list with names of clustered groups.
  - energy: array with energy values. Returned only if cluster_region='xanes' or cluster_region='dxanes'.
  - k: array with wavenumber values. Returned only if cluster_region='exafs'.
  - matrix: array with observed values for groups in cluster_range.
  - cluster_pars: dictionary with cluster parameters.
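Because Z is described as a linkage matrix, it can presumably be passed to SciPy's hierarchy utilities. The sketch below shows one possible use, cutting a linkage matrix into flat clusters with fcluster(). The groupnames and matrix values are hypothetical stand-ins for the attributes of the returned Dataset, not real araucaria output.

```python
import numpy as np
from scipy.cluster.hierarchy import linkage, fcluster

# Hypothetical stand-ins for the `groupnames` and `matrix` attributes.
groupnames = ['a', 'b', 'c', 'd']
matrix = np.array([[0.0, 0.0],
                   [0.1, 0.0],
                   [5.0, 5.0],
                   [5.1, 5.0]])

# Build a linkage matrix with the same defaults as cluster().
Z = linkage(matrix, method='single', metric='euclidean')

# Cut the tree so that at most two flat clusters remain.
labels = fcluster(Z, t=2, criterion='maxclust')

# Map each group name to its flat cluster label.
assignment = dict(zip(groupnames, labels.tolist()))
print(assignment)
```

The numeric labels themselves are arbitrary; only the grouping they induce is meaningful.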
See also
fig_cluster()
Plots the dendrogram of a hierarchical clustering.
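For readers who only need the dendrogram layout rather than a figure, SciPy's dendrogram() can compute it without plotting. This is a minimal sketch on synthetic data and does not use fig_cluster(); the names and distances are hypothetical.

```python
import numpy as np
from scipy.cluster.hierarchy import linkage, dendrogram

# Hypothetical group names and one-dimensional observations.
groupnames = ['ref_1', 'ref_2', 'ref_3', 'ref_4']
matrix = np.array([[0.0], [0.2], [3.0], [3.3]])

Z = linkage(matrix, method='single', metric='euclidean')

# no_plot=True returns the layout dictionary without requiring matplotlib.
layout = dendrogram(Z, labels=groupnames, no_plot=True)

# 'ivl' holds the leaf labels in the order they would appear on the axis.
print(layout['ivl'])
```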
Examples
>>> from araucaria.testdata import get_testpath
>>> from araucaria import Dataset
>>> from araucaria.xas import pre_edge, autobk
>>> from araucaria.stats import cluster
>>> from araucaria.io import read_collection_hdf5
>>> from araucaria.utils import check_objattrs
>>> fpath = get_testpath('Fe_database.h5')
>>> collection = read_collection_hdf5(fpath)
>>> collection.apply(pre_edge)
>>> out = cluster(collection, cluster_region='xanes')
>>> attrs = ['groupnames', 'energy', 'matrix', 'Z', 'cluster_pars']
>>> check_objattrs(out, Dataset, attrs)
[True, True, True, True, True]

>>> # exafs clustering
>>> collection.apply(autobk)
>>> out = cluster(collection, cluster_region='exafs', cluster_range=[0,10])
>>> attrs = ['groupnames', 'k', 'matrix', 'Z', 'cluster_pars']
>>> check_objattrs(out, Dataset, attrs)
[True, True, True, True, True]