
tethne package

Subpackages

Submodules

tethne.utilities module

Helper functions.

class tethne.utilities.Dictionary

A two-way index for integer/string pairs.
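As a rough illustration of a two-way integer/string index, here is a minimal sketch. The class name `TwoWayIndex` and its `add` method are hypothetical and are not tethne's `Dictionary` API; the choice to assign IDs in insertion order is also an assumption.

```python
class TwoWayIndex:
    """Hypothetical two-way index mapping strings to integer IDs and back."""

    def __init__(self):
        self.by_string = {}  # str -> int
        self.by_int = {}     # int -> str

    def add(self, s):
        """Return the integer ID for s, assigning the next free ID if s is new."""
        if s not in self.by_string:
            i = len(self.by_string)
            self.by_string[s] = i
            self.by_int[i] = s
        return self.by_string[s]

    def __getitem__(self, key):
        # Look up in whichever direction matches the key's type.
        if isinstance(key, int):
            return self.by_int[key]
        return self.by_string[key]

    def __len__(self):
        return len(self.by_string)
```

Keeping two plain dicts makes lookups in both directions constant-time at the cost of storing each pair twice.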

class tethne.utilities.MLStripper

Bases: HTMLParser.HTMLParser

feed(data)

Feeds data to the parser, with an added check for data that arrives as an integer rather than a string.

get_data()
handle_data(d)
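The MLStripper pattern (collect the text passed to `handle_data`, then join it in `get_data`) can be sketched with Python 3's `html.parser.HTMLParser`. `TagStripper` and `strip_html_tags` are hypothetical names, and coercing non-string input with `str()` is an assumption about how the integer check in `feed` behaves.

```python
from html.parser import HTMLParser


class TagStripper(HTMLParser):
    """Collects only the text content of an HTML string (a sketch, not tethne's class)."""

    def __init__(self):
        super().__init__(convert_charrefs=True)
        self.fed = []

    def feed(self, data):
        # Assumed behaviour: coerce non-string input (e.g. an int) to str.
        if not isinstance(data, str):
            data = str(data)
        super().feed(data)

    def handle_data(self, d):
        # Called for each run of text between tags.
        self.fed.append(d)

    def get_data(self):
        return "".join(self.fed)


def strip_html_tags(html):
    """Hypothetical helper: return html with all markup removed."""
    stripper = TagStripper()
    stripper.feed(html)
    stripper.close()  # flush any buffered text
    return stripper.get_data()
```

For example, `strip_html_tags("<p>Hello <b>world</b></p>")` returns `"Hello world"`.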
tethne.utilities.argmax(iterable)
tethne.utilities.argmin(iterable)
tethne.utilities.argsort(seq)
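Assuming these helpers follow the usual NumPy-style meanings (index of the largest element, index of the smallest element, and the indices that would sort the sequence), a plain-Python sketch looks like this. Tie-breaking in tethne is not documented here; this sketch returns the first matching index.

```python
def argmax(iterable):
    """Index of the largest element (first occurrence on ties)."""
    return max(enumerate(iterable), key=lambda pair: pair[1])[0]


def argmin(iterable):
    """Index of the smallest element (first occurrence on ties)."""
    return min(enumerate(iterable), key=lambda pair: pair[1])[0]


def argsort(seq):
    """Indices that would sort seq in ascending order."""
    return sorted(range(len(seq)), key=seq.__getitem__)
```

`argmax` and `argmin` work on any iterable because `enumerate` pairs each value with its position; `argsort` needs indexable input.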
tethne.utilities.attribs_to_string(attrib_dict, keys)

A more specific version of the subdict utility for node and edge attribute dictionaries. It makes those attributes writable in NetworkX file formats such as GEXF, which do not allow attributes to have a list type.
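A minimal sketch of the idea, assuming that list values are flattened into a comma-separated string and that keys missing from `attrib_dict` are skipped. `flatten_attribs` is a hypothetical name and this is not tethne's implementation.

```python
def flatten_attribs(attrib_dict, keys):
    """Hypothetical: keep only `keys`, flattening list values into strings."""
    out = {}
    for key in keys:
        if key not in attrib_dict:
            continue  # assumption: missing keys are skipped
        value = attrib_dict[key]
        if isinstance(value, (list, tuple)):
            # GEXF cannot store list-typed attributes, so join them into one string.
            value = ", ".join(str(v) for v in value)
        out[key] = value
    return out
```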

tethne.utilities.concat_list(listA, listB, delim=' ')

Concatenates the elements of listA and listB pair-wise, separated by delim, and returns the concatenated list. Raises IndexError if the lists are not parallel.
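A minimal sketch of pair-wise concatenation, assuming string elements. The explicit length check is an assumption about what "not parallel" means; tethne may only fail when listB is the shorter list.

```python
def concat_list(listA, listB, delim=" "):
    """Sketch: join listA[i] and listB[i] with delim for every index i."""
    if len(listA) != len(listB):
        raise IndexError("lists are not parallel")
    return [a + delim + b for a, b in zip(listA, listB)]
```

For example, `concat_list(["a", "b"], ["x", "y"])` returns `["a x", "b y"]`.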

tethne.utilities.contains(l, f)

Searches list l for a pattern specified in a lambda function f.
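A minimal sketch, assuming `contains` returns `True` when at least one element of `l` satisfies the predicate `f`. The return type is not stated above, so treat this as an illustration of passing a lambda, not tethne's exact behaviour.

```python
def contains(l, f):
    """Sketch: True if any element of l satisfies the predicate f."""
    return any(f(x) for x in l)
```

For example, `contains(["network", "node"], lambda s: s.startswith("net"))` is `True`.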

tethne.utilities.dict_from_node(node, recursive=False)

Converts ElementTree node to a dictionary.

Parameters:

node : ElementTree node

recursive : boolean

If recursive=False, the value of any field with children will be the number of children.

Returns:

dict : nested dictionary.

Tags are used as keys and element values as values. Sub-elements that occur multiple times within an element are collected in a list.
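The conversion described above can be sketched with the standard library's `xml.etree.ElementTree`. `dict_from_element` is a hypothetical name, and using each leaf element's `.text` as its value is an assumption.

```python
import xml.etree.ElementTree as ET


def dict_from_element(node, recursive=False):
    """Hypothetical re-implementation of the behaviour documented above."""
    result = {}
    for child in node:
        if len(child) > 0:
            # Field with children: recurse, or just record how many children it has.
            value = dict_from_element(child, recursive=True) if recursive else len(child)
        else:
            value = child.text
        if child.tag in result:
            # Repeated tag: collect every occurrence in a list.
            if not isinstance(result[child.tag], list):
                result[child.tag] = [result[child.tag]]
            result[child.tag].append(value)
        else:
            result[child.tag] = value
    return result


root = ET.fromstring(
    "<article><title>Networks</title><author>A</author><author>B</author>"
    "<refs><ref>1</ref><ref>2</ref></refs></article>"
)
flat = dict_from_element(root)                   # "refs" maps to its child count
nested = dict_from_element(root, recursive=True)  # "refs" maps to a nested dict
```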

tethne.utilities.is_number(value)
tethne.utilities.mean(iterable)
tethne.utilities.nonzero(iterable)
tethne.utilities.normalize(s)

Normalize a token:

  • Convert to lower-case.
  • Remove all punctuation.
  • Remove all numbers.
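The three normalization steps above can be sketched with `str.translate`, assuming "punctuation" means the ASCII characters in `string.punctuation` and "numbers" means the ASCII digits. tethne's own definition may differ.

```python
import string

# Translation table that deletes ASCII punctuation and digits.
_DROP = str.maketrans("", "", string.punctuation + string.digits)


def normalize(s):
    """Sketch: lower-case s and remove punctuation and digit characters."""
    return s.lower().translate(_DROP)
```

For example, `normalize("Tethne's")` returns `"tethnes"`.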
tethne.utilities.number(value)
tethne.utilities.overlap(listA, listB)

Return a list of the objects shared by listA and listB.
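A minimal sketch of an order-preserving overlap, assuming hashable elements and that each shared object should appear once, in listA's order. Whether tethne keeps duplicates is not stated above.

```python
def overlap(listA, listB):
    """Sketch: objects found in both lists, in listA's order, without repeats."""
    in_b = set(listB)
    seen = set()
    shared = []
    for x in listA:
        if x in in_b and x not in seen:
            seen.add(x)
            shared.append(x)
    return shared
```

Building a set from listB keeps each membership test constant-time; for unhashable elements, `[x for x in listA if x in listB]` works but is quadratic.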

tethne.utilities.strip_non_ascii(s)

Returns the string without non-ASCII characters.

Parameters:

s : string

A string that may contain non-ASCII characters.

Returns:

clean_string : string

A string that does not contain non-ASCII characters.
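A minimal sketch that keeps only characters in the 7-bit ASCII range (code points 0 to 127):

```python
def strip_non_ascii(s):
    """Sketch: drop every character whose code point is 128 or above."""
    return "".join(c for c in s if ord(c) < 128)
```

An equivalent one-liner is `s.encode("ascii", "ignore").decode("ascii")`. Either way, `"naïve café"` becomes `"nave caf"`.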

tethne.utilities.strip_punctuation(s)
tethne.utilities.strip_tags(html)
tethne.utilities.subdict(super_dict, keys)

Returns a subset of the super_dict with the specified keys.
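A minimal sketch using a dict comprehension. Silently skipping keys that are absent from `super_dict` is an assumption; tethne may raise `KeyError` instead.

```python
def subdict(super_dict, keys):
    """Sketch: new dict containing only the requested keys present in super_dict."""
    return {k: super_dict[k] for k in keys if k in super_dict}
```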

tethne.utilities.swap(u, v)

Exchange the values of u and v.

tethne.utilities.tokenize(passage)

Convert a string into a list of normalized words.
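A minimal sketch, assuming words are split on whitespace and then passed through the same lower-case, strip-punctuation, strip-digits normalization as `normalize`. Tokens that become empty (such as bare numbers) are dropped. tethne's actual tokenizer may split differently.

```python
import string

# Delete ASCII punctuation and digits (assumed normalization rules).
_DROP = str.maketrans("", "", string.punctuation + string.digits)


def tokenize(passage):
    """Sketch: split on whitespace, normalize each word, drop empty tokens."""
    tokens = (word.lower().translate(_DROP) for word in passage.split())
    return [t for t in tokens if t]
```

For example, `tokenize("The 3 networks, of 1999!")` returns `["the", "networks", "of"]`.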

Module contents

Tethne is a Python package that draws together tools and techniques from bibliometrics, computational linguistics, and social influence modeling into a single easy-to-use corpus analysis framework. Scholars can use Tethne to parse and organize data from the ISI Web of Science and JSTOR Data-for-Research databases, and generate time-variant citation-based network models, topic models, and social influence models.

  • analyze: Methods for analyzing Corpus, GraphCollection, and networkx.Graph objects.
  • classes: The classes package provides the fundamental classes for working with bibliographic data in Tethne.
  • model
  • networks: Methods for building networks from bibliographic data.
  • readers: Methods for parsing bibliographic datasets.
  • writers: Export networks to structured and unstructured formats for visualization.