tethne.analyze.features module

Methods for analyzing featuresets.

cosine_distance      Calculate cosine distance for sparse feature vectors.
cosine_similarity    Calculate cosine similarity for sparse feature vectors.
distance             Calculate the distance between two sparse feature vectors using a method from scipy.spatial.distance.
kl_divergence        Calculate Kullback-Leibler Distance for sparse feature vectors.

tethne.analyze.features.cosine_distance(sa, sb)

Calculate cosine distance for sparse feature vectors.

Uses the cosine method in scipy.spatial.distance.

Parameters:

sa : list

First sparse feature vector.

sb : list

Second sparse feature vector.

Returns:

distance : float

Cosine distance.
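As an illustration only, the sketch below computes cosine distance for two sparse vectors in plain Python. It assumes that a sparse feature vector is a list of (index, value) tuples, which this page does not confirm. The helper names to_dense and cosine_distance_sketch are hypothetical and are not part of Tethne. The quantity is the one that scipy.spatial.distance.cosine defines: 1 minus the cosine of the angle between the two vectors.

```python
import math


def to_dense(sparse, size):
    """Expand a list of (index, value) pairs into a dense list of length `size`."""
    dense = [0.0] * size
    for index, value in sparse:
        dense[index] = float(value)
    return dense


def cosine_distance_sketch(sa, sb):
    """Cosine distance, 1 - (a . b) / (|a| |b|), for two sparse vectors."""
    # Assumed format: each vector is a list of (index, value) tuples.
    size = max(index for index, _ in sa + sb) + 1
    a, b = to_dense(sa, size), to_dense(sb, size)
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return 1.0 - dot / (norm_a * norm_b)


sa = [(0, 1.0), (2, 1.0)]   # dense form: [1, 0, 1]
sb = [(0, 1.0), (1, 1.0)]   # dense form: [1, 1, 0]
d = cosine_distance_sketch(sa, sb)   # approximately 0.5
```

The two example vectors share one of their two nonzero features, so the cosine of the angle between them is 0.5 and the distance is also approximately 0.5.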

tethne.analyze.features.cosine_similarity(sa, sb)

Calculate cosine similarity for sparse feature vectors.

Uses the cosine method in scipy.spatial.distance, which returns cosine distance; cosine similarity is 1 minus that distance.

Parameters:

sa : list

First sparse feature vector.

sb : list

Second sparse feature vector.

Returns:

similarity : float

Cosine similarity.
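The following sketch shows one way to compute cosine similarity directly on sparse input, without expanding the vectors. It again assumes that each vector is a list of (index, value) tuples, and cosine_similarity_sketch is a hypothetical name rather than Tethne's function. Only indices present in both vectors contribute to the dot product, which is why the sparse form is convenient here.

```python
import math


def cosine_similarity_sketch(sa, sb):
    """Cosine similarity (a . b) / (|a| |b|), computed without densifying."""
    # Assumed format: each vector is a list of (index, value) tuples.
    a, b = dict(sa), dict(sb)
    # Only indices present in both vectors contribute to the dot product.
    dot = sum(a[i] * b[i] for i in a.keys() & b.keys())
    norm_a = math.sqrt(sum(v * v for v in a.values()))
    norm_b = math.sqrt(sum(v * v for v in b.values()))
    return dot / (norm_a * norm_b)


sa = [(0, 3.0), (4, 4.0)]
sb = [(0, 3.0), (7, 4.0)]
similarity = cosine_similarity_sketch(sa, sb)   # 9 / 25, approximately 0.36
cosine_dist = 1.0 - similarity                  # the corresponding cosine distance
```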

tethne.analyze.features.distance(sa, sb, method, normalize=True, smooth=False)

Calculate the distance between two sparse feature vectors using a method from scipy.spatial.distance.

Supported distance methods:

Method            Documentation
braycurtis        scipy.org
canberra          scipy.org
chebyshev         scipy.org
cityblock         scipy.org
correlation       scipy.org
cosine            scipy.org
dice              scipy.org
euclidean         scipy.org
hamming           scipy.org
jaccard           scipy.org
kulsinski         scipy.org
matching          scipy.org
rogerstanimoto    scipy.org
russellrao        scipy.org
sokalmichener     scipy.org
sokalsneath       scipy.org
sqeuclidean       scipy.org
yule              scipy.org

Parameters:

sa : list

First sparse feature vector.

sb : list

Second sparse feature vector.

method : str

Name of a method in scipy.spatial.distance (see above).

normalize : bool

(default: True) If True, sa and sb are normalized so that they each sum to 1.0.

smooth : bool

(default: False) If True, uses the smoothing method described in Bigi 2003.

Returns:

distance : float

Distance value from method.
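The normalize option and the lookup of a metric by name can be sketched as follows. This is a minimal illustration, not Tethne's implementation: it assumes (index, value) tuples, and it substitutes three hand-written metrics for SciPy so that it runs without dependencies. With SciPy installed, the metric could instead be looked up by name, for example with getattr(scipy.spatial.distance, method). The smooth option is not covered, and the names METRICS and distance_sketch are hypothetical.

```python
import math

# Hand-written stand-ins for three of the SciPy metrics listed above.
METRICS = {
    "cityblock": lambda a, b: sum(abs(x - y) for x, y in zip(a, b)),
    "euclidean": lambda a, b: math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b))),
    "chebyshev": lambda a, b: max(abs(x - y) for x, y in zip(a, b)),
}


def distance_sketch(sa, sb, method, normalize=True):
    """Distance between two sparse vectors using the metric named `method`."""
    # Assumed format: each vector is a list of (index, value) tuples.
    size = max(index for index, _ in sa + sb) + 1
    a, b = [0.0] * size, [0.0] * size
    for index, value in sa:
        a[index] = float(value)
    for index, value in sb:
        b[index] = float(value)
    if normalize:
        # Rescale each vector so that its values sum to 1.0.
        total_a, total_b = sum(a), sum(b)
        a = [x / total_a for x in a]
        b = [y / total_b for y in b]
    return METRICS[method](a, b)


sa = [(0, 2.0), (1, 2.0)]   # dense form: [2, 2, 0]
sb = [(0, 1.0), (2, 3.0)]   # dense form: [1, 0, 3]
raw = distance_sketch(sa, sb, "cityblock", normalize=False)
scaled = distance_sketch(sa, sb, "cityblock")
```

Here raw is 6.0 and scaled is 1.5. Normalizing first means the result reflects how the values are distributed across features rather than the overall magnitude of each vector.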

tethne.analyze.features.kl_divergence(sa, sb)

Calculate Kullback-Leibler Distance for sparse feature vectors.

Uses the smoothing method described in Bigi 2003 to facilitate better comparisons between vectors describing wordcounts.

Parameters:

sa : list

First sparse feature vector.

sb : list

Second sparse feature vector.

Returns:

divergence : float

KL divergence.
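For orientation, the sketch below computes a KL divergence between two word-count vectors after a simple back-off smoothing step: a term missing from a vector receives a small probability epsilon, and the observed probabilities are scaled down so that the total remains 1. This is a simplified scheme chosen for illustration. It is not confirmed to match the method in Bigi 2003 or Tethne's implementation, and the function names, the (index, count) input format, and the epsilon value are all assumptions.

```python
import math


def smoothed_distribution(sparse, vocabulary, epsilon):
    """Probabilities over `vocabulary`, with epsilon for unseen terms."""
    counts = dict(sparse)
    total = float(sum(counts.values()))
    missing = len(vocabulary) - len(counts)
    # Scale observed probabilities so the whole distribution sums to 1.
    scale = 1.0 - missing * epsilon
    return [scale * counts[i] / total if i in counts else epsilon
            for i in vocabulary]


def kl_divergence_sketch(sa, sb, epsilon=1e-6):
    """KL divergence sum(p * log(p / q)) between two smoothed distributions."""
    # Assumed format: each vector is a list of (index, count) tuples
    # with strictly positive counts.
    vocabulary = sorted({i for i, _ in sa} | {i for i, _ in sb})
    p = smoothed_distribution(sa, vocabulary, epsilon)
    q = smoothed_distribution(sb, vocabulary, epsilon)
    return sum(x * math.log(x / y) for x, y in zip(p, q))


sa = [(0, 5), (1, 5)]
sb = [(0, 5), (2, 5)]
same = kl_divergence_sketch(sa, sa)        # identical vectors give 0.0
different = kl_divergence_sketch(sa, sb)   # positive, since the vectors differ
```

Without smoothing, the second comparison would involve a division by zero, because each vector contains a term that the other lacks.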