tethne.networks.papers module

Methods for generating networks in which papers are vertices.

author_coupling Vertices are papers and edges indicate shared authorship.
bibliographic_coupling Generate a bibliographic coupling network.
cocitation Generate a cocitation network.
direct_citation Create a traditional directed citation network.
topic_coupling Two papers are coupled if they both contain a shared topic above threshold.
tethne.networks.papers.author_coupling(papers, threshold=1, node_attribs=['date'], node_id='ayjid', **kwargs)[source]

Vertices are papers and edges indicate shared authorship.

Element Description
Node Papers, represented by node_id.
Edge (a,b) in E(G) if a and b share x authors and x >= threshold
Edge Attributes overlap: the value of x (above).
Parameters:

papers : list

A list of Paper instances.

threshold : int

Minimum number of shared authors required to draw an edge between two papers.

node_id : string

Field in Paper used to identify nodes.

node_attribs : list

List of fields in Paper to include as node attributes in graph.

Returns:

acoupling : networkx.Graph

An author-coupling network.
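
The edge rule above can be illustrated with a minimal sketch in plain Python. This is not Tethne's implementation: papers are stood in for by dictionaries, and the `authors` key and the `author_overlap_edges` helper are hypothetical names chosen for illustration.

```python
from itertools import combinations

def author_overlap_edges(papers, threshold=1, node_id='ayjid'):
    """Return {(a, b): overlap} for paper pairs sharing >= threshold authors."""
    edges = {}
    for p, q in combinations(papers, 2):
        # x in the table above: the number of authors the two papers share.
        overlap = len(set(p['authors']) & set(q['authors']))
        if overlap >= threshold:
            edges[(p[node_id], q[node_id])] = overlap
    return edges

# Hypothetical records standing in for Paper objects.
papers = [
    {'ayjid': 'A', 'authors': ['SMITH J', 'DOE A']},
    {'ayjid': 'B', 'authors': ['SMITH J', 'DOE A', 'LEE K']},
    {'ayjid': 'C', 'authors': ['LEE K']},
]

edges_1 = author_overlap_edges(papers, threshold=1)  # A-B (2 shared), B-C (1 shared)
edges_2 = author_overlap_edges(papers, threshold=2)  # only A-B remains
```

Raising `threshold` from 1 to 2 drops the B-C edge, because those two papers share only one author.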

tethne.networks.papers.bibliographic_coupling(papers, citation_id='ayjid', threshold=1, node_id='ayjid', node_attribs=['date'], weighted=False, **kwargs)[source]

Generate a bibliographic coupling network.

Two papers are bibliographically coupled when they both cite the same third paper. You can generate a bibliographic coupling network using the networks.papers.bibliographic_coupling() method.

>>> BC = nt.papers.bibliographic_coupling(papers)
>>> BC
<networkx.classes.graph.Graph object at 0x102eec710>

Especially when working with large datasets, or disciplinarily narrow literatures, it is usually helpful to set a minimum number of shared citations required for two papers to be coupled. You can do this by setting the `threshold` parameter.

>>> BC = nt.papers.bibliographic_coupling(papers, threshold=1)
>>> len(BC.edges())
1216
>>> BC = nt.papers.bibliographic_coupling(papers, threshold=2)
>>> len(BC.edges())
542
Element Description
Node Papers represented by node_id.
Node Attributes node_attribs in Paper
Edge (a,b) in E(G) if a and b share x citations where x >= threshold.
Edge Attributes overlap: the number of citations shared
Parameters:

papers : list

A list of Paper instances.

citation_id: string

A key from Paper to identify the citation overlaps. Default is ‘ayjid’.

threshold : int

Minimum number of shared citations to consider two papers “coupled”.

node_id : string

Field in Paper used to identify the nodes. Default is ‘ayjid’.

node_attribs : list

List of fields in Paper to include as node attributes in graph.

weighted : bool

If True, the edge attribute overlap is a float in the range [0, 1], calculated as $\frac{N_{ij}}{\sqrt{N_{i} N_{j}}}$, where $N_{i}$ and $N_{j}$ are the number of references in papers i and j, respectively, and $N_{ij}$ is the number of references shared by papers i and j.

Returns:

bcoupling : networkx.Graph

A bibliographic coupling network.

Raises:

KeyError : Raised when citation_id is not a field of the Paper objects in papers.

Notes

List-valued attributes appear to cause errors when writing to both GEXF and GraphML. Node identifiers also cannot be None.
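
The overlap described under the `weighted` parameter can be worked through with a short sketch. It assumes each paper's references are available as a plain list of identifiers. `coupling_overlap` is a hypothetical helper for illustration, not part of Tethne.

```python
from math import sqrt

def coupling_overlap(refs_i, refs_j, weighted=False):
    """Shared-reference count, or N_ij / sqrt(N_i * N_j) when weighted."""
    refs_i, refs_j = set(refs_i), set(refs_j)
    shared = len(refs_i & refs_j)          # N_ij
    if not weighted:
        return shared
    if not refs_i or not refs_j:
        return 0.0                         # avoid dividing by zero
    return shared / sqrt(len(refs_i) * len(refs_j))

# Two hypothetical reference lists with two references in common.
refs_a = ['R1', 'R2', 'R3', 'R4']
refs_b = ['R2', 'R3', 'R5', 'R6']

raw = coupling_overlap(refs_a, refs_b)                        # 2
normalized = coupling_overlap(refs_a, refs_b, weighted=True)  # 2 / sqrt(4 * 4) = 0.5
```

The normalized value reaches 1 only when both papers cite exactly the same set of references, which makes it easier to compare pairs of papers with very different reference-list lengths.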

tethne.networks.papers.cocitation(papers, threshold=1, node_id='ayjid', topn=None, verbose=False, node_attribs=['date'], **kwargs)[source]

Generate a cocitation network.

A cocitation network is a network in which vertices are papers, and edges indicate that two papers were cited by the same third paper. CiteSpace is a popular desktop application for co-citation analysis, and you can read about the theory behind it here. Co-citation analysis is generally performed with a temporal component, so building a GraphCollection from a Corpus sliced by date is recommended.

You can generate a co-citation network using the networks.papers.cocitation() method:

>>> CC = nt.papers.cocitation(papers)
>>> CC
<networkx.classes.graph.Graph object at 0x102eec790>

For large datasets, you may wish to set a minimum number of co-citations required for an edge between two papers. Keep in mind that every pair of references in a single paper is co-cited once by that paper, so a threshold of at least 2 is prudent. Note the dramatic decrease in the number of edges when the threshold is changed from 2 to 3.

>>> CC = nt.papers.cocitation(papers, threshold=2)
>>> len(CC.edges())
8889
>>> CC = nt.papers.cocitation(papers, threshold=3)
>>> len(CC.edges())
1493
Element Description
Node Cited papers represented by Paper ayjid.
Edge (a, b) if a and b are cited by the same paper.
Edge Attributes weight: number of times the two papers are co-cited.
Parameters:

papers : list

A list of Paper objects.

threshold : int

Minimum number of co-citations required to create an edge.

topn : int or float, or None

If provided, only the topn (int) or topn percent (float) most cited papers will be included in the cocitation network. If None (default), network will include all cited papers (NOTE: this can cause severe memory consumption for even moderately-sized datasets).

verbose : bool

If True, prints status messages.

Returns:

cocitation : networkx.Graph

A cocitation network.
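
To see why a threshold of 1 yields so many edges, the counting step can be sketched in plain Python. This sketch assumes each citing paper is reduced to a list of cited-reference identifiers. `cocitation_counts` is a hypothetical helper, not Tethne's implementation.

```python
from collections import Counter
from itertools import combinations

def cocitation_counts(reference_lists, threshold=1):
    """Return {(a, b): n} for reference pairs co-cited at least threshold times."""
    counts = Counter()
    for refs in reference_lists:
        # Every unordered pair of references in one paper is co-cited once.
        for a, b in combinations(sorted(set(refs)), 2):
            counts[(a, b)] += 1
    return {pair: n for pair, n in counts.items() if n >= threshold}

# Hypothetical reference lists of three citing papers.
reference_lists = [
    ['R1', 'R2', 'R3'],
    ['R1', 'R2'],
    ['R2', 'R3', 'R4'],
]

cc_1 = cocitation_counts(reference_lists, threshold=1)  # 5 pairs
cc_2 = cocitation_counts(reference_lists, threshold=2)  # 2 pairs
```

A paper with k references contributes k(k-1)/2 pairs on its own, so the number of candidate edges grows quadratically with reference-list length. Restricting the network with `threshold` or `topn` keeps it manageable.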

tethne.networks.papers.direct_citation(papers, node_id='ayjid', node_attribs=['date'], **kwargs)[source]

Create a traditional directed citation network.

Direct-citation graphs are directed acyclic graphs in which vertices are papers, and each (directed) edge represents a citation of the target paper by the source paper. The networks.papers.direct_citation() method generates both a global citation graph, which includes all cited and citing papers, and an internal citation graph that describes only citations among papers in the original dataset.

To generate direct-citation graphs, use the networks.papers.direct_citation() method. Note the size difference between the global and internal citation graphs.

>>> gDC, iDC = nt.papers.direct_citation(papers)
>>> len(gDC)
5998
>>> len(iDC)
163
Element Description
Node Papers, represented by node_id.
Edge From a paper to a cited reference.
Edge Attribute Publication date of the citing paper.
Parameters:

papers : list

A list of Paper instances.

node_id : string

A key from Paper to identify the nodes. Default is ‘ayjid’.

node_attribs : list

List of fields in Paper to include as node attributes in graph.

Returns:

citation_network : networkx.DiGraph

Global citation network (all citations).

citation_network_internal : networkx.DiGraph

Internal citation network where only the papers in the list are nodes in the network.

Raises:

KeyError : If node_id is not a field of the Paper objects in papers.
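
The difference between the global and internal graphs can be sketched with plain edge lists. This is an illustration under assumptions, not Tethne's code: papers are dictionaries, and the `citations` key and the `direct_citation_edges` helper are hypothetical.

```python
def direct_citation_edges(papers, node_id='ayjid'):
    """Return (global_edges, internal_edges) as lists of (citing, cited) pairs."""
    in_dataset = set(p[node_id] for p in papers)
    global_edges, internal_edges = [], []
    for p in papers:
        for cited in p['citations']:
            edge = (p[node_id], cited)     # directed: citing paper -> cited paper
            global_edges.append(edge)
            if cited in in_dataset:        # keep only citations among the input papers
                internal_edges.append(edge)
    return global_edges, internal_edges

# Hypothetical records: X and Y are cited but are not in the dataset.
papers = [
    {'ayjid': 'A', 'citations': ['B', 'X', 'Y']},
    {'ayjid': 'B', 'citations': ['C', 'X']},
    {'ayjid': 'C', 'citations': []},
]

g_edges, i_edges = direct_citation_edges(papers)
g_nodes = set(n for e in g_edges for n in e)  # A, B, C, X, Y
i_nodes = set(n for e in i_edges for n in e)  # A, B, C
```

As in the example output above, the global graph is larger because most cited references are not themselves part of the dataset.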

tethne.networks.papers.topic_coupling(papers, threshold=0.7, node_id='ayjid', **kwargs)[source]

Two papers are coupled if they both contain a shared topic above threshold.

Element Description
Node Papers, represented by node_id.
Edge (a,b) in E(G) if a and b share >= 1 topics with proportion >= threshold in both a and b.
Edge Attributes weight: combined mean proportion of each shared topic. topics: list of shared topics.
Parameters:

papers : list

A list of Paper instances.

threshold : float

Minimum representation of a topic in each paper.

node_id : string

Field in Paper used to identify nodes.

Returns:

tc : networkx.Graph

A topic-coupling network.
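
The edge rule for topic coupling can be sketched as follows. The sketch assumes each paper's topic model output is available as a mapping from topic identifier to proportion. The `paper_topics` structure and both helper functions are hypothetical, and the edge weight is left out of the illustration.

```python
from itertools import combinations

def shared_topics(topics_a, topics_b, threshold=0.7):
    """Topics whose proportion is >= threshold in both papers."""
    strong_a = set(t for t, p in topics_a.items() if p >= threshold)
    strong_b = set(t for t, p in topics_b.items() if p >= threshold)
    return sorted(strong_a & strong_b)

def topic_coupling_edges(paper_topics, threshold=0.7):
    """Return {(a, b): [shared topics]} for pairs with at least one shared topic."""
    edges = {}
    for a, b in combinations(sorted(paper_topics), 2):
        shared = shared_topics(paper_topics[a], paper_topics[b], threshold)
        if shared:
            edges[(a, b)] = shared
    return edges

# Hypothetical topic proportions for three papers.
paper_topics = {
    'A': {0: 0.8, 1: 0.2},
    'B': {0: 0.75, 2: 0.25},
    'C': {1: 0.9, 0: 0.1},
}

strict = topic_coupling_edges(paper_topics, threshold=0.7)  # only A-B, via topic 0
loose = topic_coupling_edges(paper_topics, threshold=0.1)   # all three pairs
```

With the default threshold of 0.7, a paper can have at most one topic above threshold when its proportions sum to 1, so each edge then reflects a single dominant topic shared by both papers.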