tethne.analyze.features module
Methods for analyzing featuresets.
cosine_distance | Calculate cosine distance for sparse feature vectors. |
cosine_similarity | Calculate cosine similarity for sparse feature vectors. |
distance | Calculate the distance between two sparse feature vectors using a method from scipy.spatial.distance. |
kl_divergence | Calculate the Kullback-Leibler divergence for sparse feature vectors. |
- tethne.analyze.features.cosine_distance(sa, sb)
Calculate cosine distance for sparse feature vectors.
Uses the cosine method in scipy.spatial.distance.
Parameters: sa : list
sb : list
Returns: distance : float
Cosine distance.
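A minimal sketch of how cosine_distance might work, assuming each sparse vector is a list of (index, value) pairs (the exact representation tethne expects is not stated in this documentation). The helper to_dense and the name cosine_distance_sketch are hypothetical. The sketch aligns both vectors on the union of their indices, fills missing entries with 0.0, and passes the dense arrays to scipy.spatial.distance.cosine.

```python
import numpy as np
from scipy.spatial import distance as ssd


def to_dense(sa, sb):
    """Hypothetical helper: align two sparse vectors on their shared index set.

    Assumes each sparse vector is a list of (index, value) pairs; an index
    missing from one vector is treated as 0.0 in that vector.
    """
    da, db = dict(sa), dict(sb)
    keys = sorted(set(da) | set(db))
    va = np.array([da.get(k, 0.0) for k in keys], dtype=float)
    vb = np.array([db.get(k, 0.0) for k in keys], dtype=float)
    return va, vb


def cosine_distance_sketch(sa, sb):
    """Cosine distance, 1 - u.v / (|u| |v|), computed by SciPy."""
    va, vb = to_dense(sa, sb)
    return float(ssd.cosine(va, vb))


# Dense equivalents: [1, 0, 3] and [1, 4, 0]
d = cosine_distance_sketch([(0, 1.0), (2, 3.0)], [(0, 1.0), (1, 4.0)])
print(d)  # 1 - 1/sqrt(170), roughly 0.923
```

Densifying over the union of indices keeps the two vectors the same length, which SciPy requires; an index present in only one vector contributes nothing to the dot product.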
- tethne.analyze.features.cosine_similarity(sa, sb)
Calculate cosine similarity for sparse feature vectors.
Uses the cosine method in scipy.spatial.distance.
Parameters: sa : list
sb : list
Returns: similarity : float
Cosine similarity.
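scipy.spatial.distance.cosine returns the cosine distance, 1 - u.v / (|u| |v|), so a similarity built on it is presumably 1 minus that value. The sketch below assumes that relationship and a hypothetical sparse format of (index, value) pairs; to_dense and cosine_similarity_sketch are illustrative names, not tethne functions.

```python
import numpy as np
from scipy.spatial import distance as ssd


def to_dense(sa, sb):
    """Hypothetical helper: align (index, value) pairs on the union of indices.

    An index missing from one vector is treated as 0.0 in that vector.
    """
    da, db = dict(sa), dict(sb)
    keys = sorted(set(da) | set(db))
    return (np.array([da.get(k, 0.0) for k in keys], dtype=float),
            np.array([db.get(k, 0.0) for k in keys], dtype=float))


def cosine_similarity_sketch(sa, sb):
    """Cosine similarity u.v / (|u| |v|) for two sparse vectors.

    ssd.cosine returns the cosine *distance* 1 - u.v / (|u| |v|), so the
    similarity is recovered by subtracting it from 1. Zero vectors are not
    handled: the ratio is undefined for them.
    """
    va, vb = to_dense(sa, sb)
    return 1.0 - float(ssd.cosine(va, vb))


sa = [(0, 1.0), (2, 3.0)]
sb = [(0, 1.0), (1, 4.0)]
print(cosine_similarity_sketch(sa, sb))  # 1/sqrt(170), roughly 0.077
```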
- tethne.analyze.features.distance(sa, sb, method, normalize=True, smooth=False)
Calculate the distance between two sparse feature vectors using a method from scipy.spatial.distance.
Supported distance methods (each documented under scipy.spatial.distance on scipy.org): braycurtis, canberra, chebyshev, cityblock, correlation, cosine, dice, euclidean, hamming, jaccard, kulsinski, matching, rogerstanimoto, russellrao, sokalmichener, sokalsneath, sqeuclidean, yule.
Parameters: sa : list
sb : list
method : str
Name of a method in scipy.spatial.distance (see above).
normalize : bool
(default: True) If True, sa and sb are normalized so that they each sum to 1.0.
smooth : bool
(default: False) If True, uses the smoothing method described in Bigi 2003.
Returns: distance : float
Distance value from method.
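A minimal sketch of the dispatch that distance describes: look the method name up in scipy.spatial.distance, optionally rescale each vector so it sums to 1.0, and apply the function. The (index, value) sparse format, the SUPPORTED tuple, and the helper names are assumptions for illustration, and the smooth option is left out. Some names in the list above (kulsinski, for example) are absent from recent SciPy releases, so the sketch checks that the function exists before calling it.

```python
import numpy as np
from scipy.spatial import distance as ssd

# Method names from the documentation above. Some may not exist in the
# installed SciPy; distance_sketch checks before calling.
SUPPORTED = (
    "braycurtis", "canberra", "chebyshev", "cityblock", "correlation",
    "cosine", "dice", "euclidean", "hamming", "jaccard", "kulsinski",
    "matching", "rogerstanimoto", "russellrao", "sokalmichener",
    "sokalsneath", "sqeuclidean", "yule",
)


def to_dense(sa, sb):
    """Hypothetical helper: align (index, value) pairs on the union of indices."""
    da, db = dict(sa), dict(sb)
    keys = sorted(set(da) | set(db))
    return (np.array([da.get(k, 0.0) for k in keys], dtype=float),
            np.array([db.get(k, 0.0) for k in keys], dtype=float))


def distance_sketch(sa, sb, method, normalize=True):
    """Look up `method` in scipy.spatial.distance and apply it.

    The `smooth` option is deliberately left out of this sketch.
    """
    if method not in SUPPORTED:
        raise ValueError(f"unsupported method: {method!r}")
    metric = getattr(ssd, method, None)
    if metric is None:
        raise ValueError(f"{method!r} is not available in this SciPy version")
    va, vb = to_dense(sa, sb)
    if normalize:
        # Rescale each vector so its entries sum to 1.0 (all-zero vectors are left as-is).
        va = va / va.sum() if va.sum() else va
        vb = vb / vb.sum() if vb.sum() else vb
    return float(metric(va, vb))


sa = [(0, 1.0), (1, 1.0)]
sb = [(0, 2.0), (1, 2.0)]
print(distance_sketch(sa, sb, "euclidean"))                   # 0.0 after normalizing
print(distance_sketch(sa, sb, "euclidean", normalize=False))  # sqrt(2)
```

Normalizing turns raw counts into proportions, so two vectors with the same relative frequencies compare as identical regardless of their totals.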
- tethne.analyze.features.kl_divergence(sa, sb)
Calculate the Kullback-Leibler divergence for sparse feature vectors.
Uses the smoothing method described in Bigi 2003 to avoid zero probabilities, for which the divergence is undefined, and so allow better comparisons between vectors of word counts.
Parameters: sa : list
sb : list
Returns: divergence : float
KL divergence.
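A minimal sketch, assuming (index, value) sparse vectors and a back-off smoothing in the spirit of Bigi 2003: features present in a vector keep their relative frequency scaled by a factor beta, absent features receive a small constant epsilon, and beta is chosen so each distribution sums to 1.0. The epsilon of 1e-6, the helper names, and the choice of the standard asymmetric form D(P || Q) are assumptions; this documentation does not state whether tethne uses this form or a symmetrized variant.

```python
import math


def backoff_smooth(sparse, vocab, epsilon=1e-6):
    """Hypothetical back-off smoothing over `vocab`.

    Terms present in `sparse` keep their relative frequency scaled by beta;
    terms in `vocab` that are absent get the small constant `epsilon`.
    beta = 1 - epsilon * (number of absent terms), so the result sums to 1.0.
    """
    counts = {k: float(v) for k, v in sparse if v > 0}
    total = sum(counts.values())
    if total == 0:
        raise ValueError("vector has no positive counts")
    absent = [t for t in vocab if t not in counts]
    beta = 1.0 - epsilon * len(absent)
    if beta <= 0:
        raise ValueError("epsilon is too large for this vocabulary")
    return {t: (beta * counts[t] / total) if t in counts else epsilon for t in vocab}


def kl_divergence_sketch(sa, sb, epsilon=1e-6):
    """Asymmetric KL divergence D(P || Q) = sum_t p(t) * log(p(t) / q(t))."""
    vocab = sorted({k for k, _ in sa} | {k for k, _ in sb})
    p = backoff_smooth(sa, vocab, epsilon)
    q = backoff_smooth(sb, vocab, epsilon)
    # With epsilon > 0, smoothing makes every q(t) > 0, so the log is defined.
    return sum(p[t] * math.log(p[t] / q[t]) for t in vocab)


sa = [(0, 3.0), (1, 1.0)]
sb = [(0, 1.0), (1, 3.0)]
print(kl_divergence_sketch(sa, sb))  # 0.5 * ln(3), roughly 0.549
```

Without the smoothing step, a feature counted in sa but missing from sb would give q(t) = 0 and an infinite divergence; with it, even vectors that share no features yield a finite value.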