tethne.writers.corpora module

tethne.writers.corpora.to_documents(target, ngrams, metadata=None, vocab=None)
Parameters:

target : str

Target path for documents; e.g. ‘./mycorpus’ will result in ‘./mycorpus_docs.txt’ and ‘./mycorpus_meta.csv’.

ngrams : dict

Keys are paper identifiers; values are lists of (ngram, frequency) tuples. If vocab is provided, each ngram is assumed to be an index into vocab.

metadata : tuple

A (keys, dict) pair: keys is a list of metadata keys, and dict maps each paper identifier to a dict of metadata values for that paper, i.e. ( [ str ], { str(p) : dict } ).

Raises:

IOError
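
The input shapes described above can be sketched in plain Python, without calling into tethne. The paper identifiers, terms, and metadata values below are made up for illustration. The form of vocab (a dict keyed by integer index) is an assumption, since the source only says that ngram is an index into it. `expected_paths` is a hypothetical helper that mirrors the file naming given for target.

```python
# Assumed form of vocab: integer index -> term (not confirmed by the docs).
vocab = {0: "network", 1: "citation", 2: "corpus"}

# ngrams: paper identifier -> list of (ngram, frequency) tuples.
# Because vocab is supplied, each ngram here is an index into vocab.
ngrams = {
    "10.1000/example.1": [(0, 3), (2, 1)],
    "10.1000/example.2": [(1, 5)],
}

# metadata: (keys, dict), where dict maps str(paper id) -> dict of values.
metadata = (
    ["date", "atitle"],
    {
        "10.1000/example.1": {"date": 1999, "atitle": "First example"},
        "10.1000/example.2": {"date": 2001, "atitle": "Second example"},
    },
)


def expected_paths(target):
    """Hypothetical helper: output paths implied by the target description."""
    return [target + "_docs.txt", target + "_meta.csv"]


print(expected_paths("./mycorpus"))
```

Keeping the metadata dict keyed by the same identifiers as ngrams makes it straightforward to line up each document with its metadata row.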

tethne.writers.corpora.to_dtm_input(target, D, feature='unigrams', fields=['date', 'atitle'])
Parameters:

target : str

Target path for documents; e.g. ‘./mycorpus’ will result in ‘./mycorpus-mult.dat’, ‘./mycorpus-seq.dat’, ‘./mycorpus-vocab.dat’, and ‘./mycorpus-meta.dat’.

D : Corpus

Contains Paper objects generated from the same DfR dataset as t_ngrams, indexed by doi and sliced by date.

feature : str

(default: ‘unigrams’) Features in Corpus to use for modeling.

fields : list

(optional) Fields in Paper to include in the metadata file.

Returns:

None : If all goes well.

Raises:

IOError
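
As a rough illustration of the file naming described for target, the sketch below derives the four output paths from a target prefix. `dtm_output_paths` is a hypothetical helper, not part of tethne, and it assumes that every output file shares the same target prefix.

```python
def dtm_output_paths(target):
    """Hypothetical helper: paths implied by the target description."""
    suffixes = ["-mult.dat", "-seq.dat", "-vocab.dat", "-meta.dat"]
    return [target + suffix for suffix in suffixes]


for path in dtm_output_paths("./mycorpus"):
    print(path)
```

A check like this can be useful before a run, for example to confirm that none of the target files already exist.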