tethne.writers.corpora module

tethne.writers.corpora.to_documents(target, ngrams, metadata=None, vocab=None)

Parameters:

target : str

Target path prefix for the output files; e.g. ‘./mycorpus’ produces ‘./mycorpus_docs.txt’ and ‘./mycorpus_meta.csv’.

ngrams : dict

Keys are paper identifiers; values are lists of (ngram, frequency) tuples. If vocab is provided, each ngram is assumed to be an index into vocab.

metadata : tuple

(optional) A (keys, values) pair: keys is a list of metadata field names, and values is a dict mapping each paper identifier to a dict of that paper's metadata values, i.e. ( [ str ], { str(p) : dict } ).

Raises:

IOError
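The sketch below illustrates the input shapes described above and the output filenames derived from target. It does not call tethne, so it says nothing about the actual contents of the written files. The helpers `expected_outputs` and `resolve` and the sample paper identifiers are hypothetical, and vocab is assumed here to map integer indices to ngram strings, which the documentation does not specify.

```python
# Hypothetical sketch: shapes of the inputs documented for to_documents().
# expected_outputs() and resolve() are illustrative helpers, not tethne API.


def expected_outputs(target):
    """Paths documented for to_documents(target, ...)."""
    return target + "_docs.txt", target + "_meta.csv"


def resolve(indexed, vocab):
    """Turn (index, frequency) tuples back into (ngram, frequency) tuples."""
    return {
        pid: [(vocab[i], freq) for i, freq in pairs]
        for pid, pairs in indexed.items()
    }


# ngrams: paper identifier -> list of (ngram, frequency) tuples.
ngrams = {
    "paper-1": [("evolution", 4), ("selection", 2)],
    "paper-2": [("drift", 3)],
}

# Assumed vocab shape: integer index -> ngram string.
vocab = {0: "evolution", 1: "selection", 2: "drift"}

# Same counts, with each ngram given as an index into vocab.
ngrams_indexed = {
    "paper-1": [(0, 4), (1, 2)],
    "paper-2": [(2, 3)],
}

# metadata: (keys, {str(paper id): {key: value}}).
keys = ["date", "atitle"]
metadata = (
    keys,
    {
        "paper-1": {"date": 1998, "atitle": "On selection"},
        "paper-2": {"date": 2001, "atitle": "On drift"},
    },
)

print(expected_outputs("./mycorpus"))  # → ('./mycorpus_docs.txt', './mycorpus_meta.csv')
print(resolve(ngrams_indexed, vocab) == ngrams)  # → True
```

With tethne installed, these structures would be passed as to_documents('./mycorpus', ngrams, metadata=metadata), or as to_documents('./mycorpus', ngrams_indexed, metadata=metadata, vocab=vocab); that call is not exercised in the sketch.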

tethne.writers.corpora.to_dtm_input(target, D, feature='unigrams', fields=['date', 'atitle'])

Parameters:

target : str

Target path prefix for the output files; e.g. ‘./mycorpus’ produces ‘./mycorpus-mult.dat’, ‘./mycorpus-seq.dat’, ‘./mycorpus-vocab.dat’, and ‘./mycorpus-meta.dat’.

D : Corpus

A Corpus containing Paper objects generated from the same DfR dataset as t_ngrams, indexed by DOI and sliced by date.

feature : str

(default: ‘unigrams’) Name of the feature set in the Corpus to use for modeling.

fields : list

(optional, default: ['date', 'atitle']) Paper fields to include in the metadata file.

Returns:

None : On success.

Raises:

IOError
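The ‘-mult.dat’ and ‘-seq.dat’ names match the input files read by Blei's dtm (Dynamic Topic Model) program. Assuming to_dtm_input targets that format, the sketch below shows what those two files contain: each ‘-mult.dat’ line is one document written as `unique_count index:count ...`, with documents ordered by time slice, and ‘-seq.dat’ holds the number of slices followed by the document count for each slice. The helpers `dtm_paths`, `mult_line`, and `format_dtm` are hypothetical, not tethne functions, and the word indices are made up.

```python
# Hypothetical sketch (not tethne's code): builds the LDA-C style
# "-mult.dat" lines and the "-seq.dat" slice counts used by Blei's dtm.


def dtm_paths(target):
    """File paths documented for to_dtm_input(target, ...)."""
    suffixes = ("-mult.dat", "-seq.dat", "-vocab.dat", "-meta.dat")
    return tuple(target + s for s in suffixes)


def mult_line(counts):
    """One document as 'unique_count index:count index:count ...'."""
    return " ".join([str(len(counts))] + [f"{i}:{c}" for i, c in counts])


def format_dtm(slices):
    """slices: time slices in chronological order; each slice is a list of
    documents, each document a list of (word_index, count) tuples."""
    mult = [mult_line(doc) for docs in slices for doc in docs]
    # First line: number of slices; then one document count per slice.
    seq = [str(len(slices))] + [str(len(docs)) for docs in slices]
    return mult, seq


slices = [
    [[(0, 2), (3, 1)], [(1, 5)]],  # first time slice: two documents
    [[(0, 1), (2, 4), (3, 2)]],    # second time slice: one document
]
mult, seq = format_dtm(slices)
# mult == ["2 0:2 3:1", "1 1:5", "3 0:1 2:4 3:2"]
# seq  == ["2", "2", "1"]
print("\n".join(mult))
print("\n".join(seq))
```

Because ‘-seq.dat’ only stores counts, the rows of ‘-mult.dat’ must already be sorted chronologically; this is presumably why D is described as sliced by date.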
