tethne.readers.wos module

Reader for Web of Science field-tagged bibliographic data.

Tethne parses Web of Science field-tagged data into a list of Paper objects. This is a two-step process: the data are first parsed into a list of dictionaries with field tags as keys, and then each dictionary is converted to a Paper. The function readers.wos.read() performs both steps in sequence.

One-step Parsing

The function readers.wos.read() calls readers.wos.parse() and then readers.wos.convert(). This is the simplest approach and is preferred in most cases.

>>> import tethne.readers as rd
>>> papers = rd.wos.read("/Path/to/savedrecs.txt")
>>> papers[0]
<tethne.data.Paper instance at 0x101b575a8>

Alternatively, if you have many data files saved in the same directory, you can use readers.wos.from_dir():

>>> papers = rd.wos.from_dir("/Path/to")

Two-step Parsing

Use the two-step approach if you need to access fields not included in Paper, or if you wish to perform some intermediate manipulation on the raw parsed data.

First import the readers package, which provides the wos module (the examples refer to it as rd):

>>> import tethne.readers as rd

Then parse the WoS data to a list of field-tagged dictionaries using readers.wos.parse():

>>> wos_list = rd.wos.parse("/Path/to/savedrecs.txt")
>>> wos_list[0].keys()
['EM', '', 'CL', 'AB', 'WC', 'GA', 'DI', 'IS', 'DE', 'VL', 'CY', 'AU', 'JI', 
 'AF', 'CR', 'DT', 'TC', 'EP', 'CT', 'PG', 'PU', 'PI', 'RP', 'J9', 'PT', 
 'LA', 'UT', 'PY', 'ID', 'SI', 'PA', 'SO', 'Z9', 'PD', 'TI', 'SC', 'BP', 
 'C1', 'NR', 'RI', 'ER', 'SN']

Convert those field-tagged dictionaries to Paper objects using readers.wos.convert():

>>> papers = rd.wos.convert(wos_list)
>>> papers[0]
<tethne.data.Paper instance at 0x101b575a8>
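
The intermediate manipulation mentioned above can be as simple as filtering the raw dictionaries before conversion. The following is a minimal sketch, not part of Tethne: filter_by_year is a hypothetical helper, the sample records are made up, and it assumes only that each dictionary may carry a 'PY' (publication year) key whose value is a string or an integer.

```python
def filter_by_year(wos_list, start, end):
    """Keep records whose 'PY' (publication year) falls in [start, end]."""
    kept = []
    for record in wos_list:
        try:
            year = int(record["PY"])
        except (KeyError, TypeError, ValueError):
            continue  # skip records with a missing or malformed year
        if start <= year <= end:
            kept.append(record)
    return kept


# Stand-in records shaped like the output of rd.wos.parse() (made-up data).
sample = [
    {"TI": "First title", "PY": "1998"},
    {"TI": "Second title", "PY": 2005},
    {"TI": "No year"},
]
recent = filter_by_year(sample, 2000, 2010)
print([r["TI"] for r in recent])
```

With real data, the filtered list would then be passed on in the usual way, for example rd.wos.convert(filter_by_year(wos_list, 2000, 2010)).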

Methods

convert(wos_data)    Convert parsed field-tagged data to Paper instances.
from_dir(path)       Convenience function for generating a list of Paper from a directory of Web of Science field-tagged data files.
parse(filepath)      Parse Web of Science field-tagged data.
read(datapath)       Returns a list of Paper instances from a Web of Science data file.

exception tethne.readers.wos.DataError

Bases: exceptions.Exception

tethne.readers.wos.convert(wos_data)

Convert parsed field-tagged data to Paper instances.

Convert a dictionary, or a list of dictionaries, keyed by Web of Science field tags into a Paper instance or a list of Paper instances. Paper is the standard representation of a bibliographic record in Tethne.

Each Paper is tagged with an accession id for this conversion.

Parameters:

wos_data : list

A list of dictionaries with keys from the WoS field tags.

Returns:

papers : list

A list of Paper instances.

Notes

Author name anomalies (case, blank spaces, etc.) still need to be handled. They may make the same author appear to be two different authors in NetworkX, which matters for any graph with authors as nodes.
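
Until such handling exists, names can be normalized by hand between parsing and conversion. The sketch below is one possible approach and not Tethne's own method: normalize_author is a hypothetical helper that only addresses case, whitespace, and periods, so variants such as "Smith, J A" and "Smith, JA" would still be treated as different authors.

```python
import re


def normalize_author(name):
    """Return a canonical form: upper case, single spaces, no periods."""
    name = name.replace(".", " ")             # "J." -> "J "
    name = re.sub(r"\s*,\s*", ", ", name)     # exactly one space after a comma
    name = re.sub(r"\s+", " ", name).strip()  # collapse runs of whitespace
    return name.upper()


# Three spellings of the same (made-up) author collapse to one node label.
variants = ["  Smith,  J. ", "smith,j", "SMITH, J."]
print(sorted(set(normalize_author(v) for v in variants)))
```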

Examples

>>> import tethne.readers as rd
>>> wos_list = rd.wos.parse("/Path/to/data.txt")
>>> papers = rd.wos.convert(wos_list)

tethne.readers.wos.corpus_from_dir(path)

Parameters:

path : string

Path to directory of field-tagged data files.

Returns:

papers : list

A list of Paper objects.

tethne.readers.wos.from_dir(path)

Convenience function for generating a list of Paper from a directory of Web of Science field-tagged data files.

Parameters:

path : string

Path to directory of field-tagged data files.

Returns:

papers : list

A list of Paper objects.

Raises:

IOError

Invalid path.
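
To make the behavior described above concrete, here is a rough sketch of a directory-level helper. It is not the Tethne implementation: read_dir and its read_file argument are hypothetical names, and the choice to consider only files ending in ".txt" is an assumption made for illustration.

```python
import os


def read_dir(path, read_file):
    """Apply read_file to every .txt file in path and concatenate the results."""
    if not os.path.isdir(path):
        raise IOError("Invalid path: {0}".format(path))
    results = []
    for name in sorted(os.listdir(path)):
        if name.lower().endswith(".txt"):  # assumed naming convention
            results.extend(read_file(os.path.join(path, name)))
    return results
```

Passing a function such as rd.wos.read as read_file would give one combined list for the whole directory; taking the reader as an argument keeps the sketch independent of Tethne.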

Examples

>>> import tethne.readers as rd
>>> papers = rd.wos.from_dir("/Path/to/datadir")

tethne.readers.wos.parse(filepath)

Parse Web of Science field-tagged data.

Parameters:

filepath : string

Filepath to the Web of Science plain text file.

Returns:

wos_list : list

A list of dictionaries, one per paper in the Web of Science file. The keys are the field tags from docs/fieldtags.txt, as encountered in the file. Most values are strings; the exceptions are the keys named in the list_keys and int_keys variables.

Raises:

KeyError : A key whose value needs to be converted to an int is not present.

AttributeError :

IOError : File at filepath not found, not readable, or empty.

Notes

Unknown keys: RI, OI, Z9
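
For orientation, the field-tagged layout can be parsed along the following lines. This is a simplified sketch rather than Tethne's parser: parse_tagged is a hypothetical function, the sample text is made up, and it assumes the usual plain-text layout in which a two-character tag starts a field, indented lines continue it, 'ER' ends a record, and 'FN', 'VR', and 'EF' mark the file header and footer. Every value is kept as a list of strings, with none of the list_keys or int_keys handling described above.

```python
def parse_tagged(lines):
    """Group field-tagged lines into one dictionary per record."""
    records, current, tag = [], {}, None
    for raw in lines:
        line = raw.rstrip("\r\n")
        if not line.strip():
            continue                      # blank separator line
        head, value = line[:2], line[3:].strip()
        if head == "ER":                  # end of record
            records.append(current)
            current, tag = {}, None
        elif head.strip() == "":          # indented continuation line
            if tag is not None:
                current[tag].append(line.strip())
        elif head in ("FN", "VR", "EF"):  # file header and footer
            tag = None
        else:                             # a new field starts here
            tag = head
            current.setdefault(tag, []).append(value)
    return records


sample = """FN Example Export
VR 1.0
PT J
AU Smith, J
   Doe, A
TI An example title
PY 2001
ER

PT J
AU Roe, B
PY 2003
ER

EF
""".splitlines()

records = parse_tagged(sample)
print(records[0]["AU"])
```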

Examples

>>> import tethne.readers as rd
>>> wos_list = rd.wos.parse("/Path/to/data.txt")

tethne.readers.wos.read(datapath)

Returns a list of Paper instances from a Web of Science data file.

Parameters:

datapath : string

Filepath to the Web of Science field-tagged data file.

Returns:

papers : list

A list of Paper instances.

Examples

>>> import tethne.readers as rd
>>> papers = rd.wos.read("/Path/to/data.txt")

tethne.readers.wos.read_corpus(path)