tethne.networks.papers module
Methods for generating networks in which papers are vertices.
author_coupling | Vertices are papers and edges indicate shared authorship. |
bibliographic_coupling | Generate a bibliographic coupling network. |
cocitation | Generate a cocitation network. |
direct_citation | Create a traditional directed citation network. |
topic_coupling | Two papers are coupled if they both contain a shared topic above threshold. |
Vertices are papers and edges indicate shared authorship.
Element | Description |
Node | Papers, represented by node_id. |
Edge | (a,b) in E(G) if a and b share x authors and x >= threshold. |
Edge Attributes | overlap: the value of x (above). |
Parameters: papers : list
A list of Paper instances.
threshold : int
Minimum number of shared authors required to draw an edge between two papers.
node_id : string
Field in Paper used to identify nodes.
node_attribs : list
List of fields in Paper to include as node attributes in graph.
Returns: acoupling : networkx.Graph
An author-coupling network.
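The edge rule above (an edge between a and b when they share x authors and x >= threshold) can be illustrated with a minimal standard-library sketch. This is not tethne's implementation: the function name `author_coupling_edges`, the plain-dict papers, and the `authors` key are assumptions made only for illustration.

```python
from itertools import combinations


def author_coupling_edges(papers, threshold=1, node_id="ayjid"):
    """Return {(id_a, id_b): overlap} for papers sharing >= threshold authors.

    Hypothetical helper: papers are plain dicts with an assumed "authors" key.
    """
    edges = {}
    for a, b in combinations(papers, 2):
        # overlap is x, the number of authors the two papers have in common.
        overlap = len(set(a["authors"]) & set(b["authors"]))
        if overlap >= threshold:
            edges[(a[node_id], b[node_id])] = overlap
    return edges


papers = [
    {"ayjid": "P1", "authors": ["SMITH J", "DOE A"]},
    {"ayjid": "P2", "authors": ["DOE A", "SMITH J", "LEE K"]},
    {"ayjid": "P3", "authors": ["LEE K"]},
]

print(author_coupling_edges(papers, threshold=1))
print(author_coupling_edges(papers, threshold=2))
```

Raising the threshold from 1 to 2 drops the P2/P3 edge, because those two papers share only one author.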
- tethne.networks.papers.bibliographic_coupling(papers, citation_id='ayjid', threshold=1, node_id='ayjid', node_attribs=['date'], weighted=False, **kwargs)
Generate a bibliographic coupling network.
Two papers are bibliographically coupled when both cite the same third paper. You can generate a bibliographic coupling network using the networks.papers.bibliographic_coupling() function.
>>> import tethne.networks as nt
>>> BC = nt.papers.bibliographic_coupling(papers)
>>> BC
<networkx.classes.graph.Graph object at 0x102eec710>
Especially when working with large datasets, or disciplinarily narrow literatures, it is usually helpful to set a minimum number of shared citations required for two papers to be coupled. You can do this by setting the `threshold` parameter.
>>> BC = nt.papers.bibliographic_coupling(papers, threshold=1)
>>> len(BC.edges())
1216
>>> BC = nt.papers.bibliographic_coupling(papers, threshold=2)
>>> len(BC.edges())
542
Element | Description |
Node | Papers represented by node_id. |
Node Attributes | node_attribs in Paper |
Edge | (a,b) in E(G) if a and b share x citations where x >= threshold. |
Edge Attributes | overlap: the number of citations shared |
Parameters: papers : list
A list of Paper instances.
citation_id : string
A key from Paper to identify the citation overlaps. Default is ‘ayjid’.
threshold : int
Minimum number of shared citations to consider two papers “coupled”.
node_id : string
Field in Paper used to identify the nodes. Default is ‘ayjid’.
node_attribs : list
List of fields in Paper to include as node attributes in graph.
weighted : bool
If True, edge attribute overlap is a float in the range [0, 1] calculated as \cfrac{N_{ij}}{\sqrt{N_{i}N_{j}}}, where N_{i} and N_{j} are the number of references in Paper i and j, respectively, and N_{ij} is the number of references shared by papers i and j.
Returns: bcoupling : networkx.Graph
A bibliographic coupling network.
Raises: KeyError : Raised when citation_id is not present in the meta_list.
Notes
List-valued attributes appear to cause errors when writing to both GEXF and GraphML. Nodes also cannot be None.
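To make the `threshold` and `weighted` options concrete, here is a minimal standard-library sketch of the coupling rule and the normalized overlap N_{ij} / sqrt(N_{i} N_{j}). It is an illustration under stated assumptions, not tethne's code: the helper name `bibliographic_coupling_edges`, the plain-dict papers, and the `citations` key are hypothetical, and applying the threshold to the raw shared count before weighting is also an assumption.

```python
from itertools import combinations
from math import sqrt


def bibliographic_coupling_edges(papers, threshold=1, weighted=False,
                                 node_id="ayjid"):
    """Return {(id_a, id_b): overlap} for papers sharing >= threshold references.

    Hypothetical helper: papers are plain dicts with an assumed "citations" key.
    """
    edges = {}
    for a, b in combinations(papers, 2):
        refs_a, refs_b = set(a["citations"]), set(b["citations"])
        shared = len(refs_a & refs_b)  # N_ij
        # Skip uncoupled pairs; this also avoids dividing by zero below.
        if shared == 0 or shared < threshold:
            continue
        if weighted:
            # N_ij / sqrt(N_i * N_j), which lies in (0, 1] for coupled pairs.
            overlap = shared / sqrt(len(refs_a) * len(refs_b))
        else:
            overlap = shared
        edges[(a[node_id], b[node_id])] = overlap
    return edges


papers = [
    {"ayjid": "P1", "citations": ["R1", "R2", "R3", "R4"]},
    {"ayjid": "P2", "citations": ["R1", "R2", "R5", "R6"]},
    {"ayjid": "P3", "citations": ["R1"]},
]

print(bibliographic_coupling_edges(papers, threshold=1))
print(bibliographic_coupling_edges(papers, threshold=2, weighted=True))
```

In this toy data P1 and P2 share two of their four references each, so their weighted overlap is 2 / sqrt(4 * 4) = 0.5.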
- tethne.networks.papers.cocitation(papers, threshold=1, node_id='ayjid', topn=None, verbose=False, node_attribs=['date'], **kwargs)
Generate a cocitation network.
A cocitation network is a network in which vertices are papers, and edges indicate that two papers were cited by the same third paper. CiteSpace is a popular desktop application for co-citation analysis, and you can read about the theory behind it here. Co-citation analysis is generally performed with a temporal component, so building a GraphCollection from a Corpus sliced by date is recommended.
You can generate a co-citation network using the networks.papers.cocitation() method:
>>> CC = nt.papers.cocitation(papers)
>>> CC
<networkx.classes.graph.Graph object at 0x102eec790>
For large datasets, you may wish to set a minimum number of co-citations required for an edge between two papers. Keep in mind that every pair of references in a single paper is co-cited once, so a threshold of at least 2 is prudent. Note the dramatic decrease in the number of edges when the threshold is changed from 2 to 3.
>>> CC = nt.papers.cocitation(papers, threshold=2)
>>> len(CC.edges())
8889
>>> CC = nt.papers.cocitation(papers, threshold=3)
>>> len(CC.edges())
1493
Element | Description |
Node | Cited papers represented by Paper ayjid. |
Edge | (a, b) if a and b are cited by the same paper. |
Edge Attributes | weight: number of times two papers are co-cited. |
Parameters: papers : list
A list of Paper objects.
threshold : int
Minimum number of co-citations required to create an edge.
topn : int or float, or None
If provided, only the topn (int) or topn percent (float) most cited papers will be included in the cocitation network. If None (default), network will include all cited papers (NOTE: this can cause severe memory consumption for even moderately-sized datasets).
verbose : bool
If True, prints status messages.
Returns: cocitation : networkx.Graph
A cocitation network.
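The counting behind the `threshold` parameter can be sketched with the standard library alone. The sketch below is a simplified illustration, not tethne's implementation: the name `cocitation_edges`, the plain-dict papers, and the `citations` key are assumptions, and the `topn` filter is left out.

```python
from collections import Counter
from itertools import combinations


def cocitation_edges(papers, threshold=1):
    """Return {(ref_a, ref_b): weight} for references co-cited >= threshold times.

    Hypothetical helper: papers are plain dicts with an assumed "citations" key.
    """
    counts = Counter()
    for paper in papers:
        # Every pair of references in one citing paper is co-cited once.
        refs = sorted(set(paper["citations"]))
        counts.update(combinations(refs, 2))
    return {pair: n for pair, n in counts.items() if n >= threshold}


papers = [
    {"ayjid": "P1", "citations": ["A", "B", "C"]},
    {"ayjid": "P2", "citations": ["A", "B"]},
    {"ayjid": "P3", "citations": ["B", "C"]},
]

print(cocitation_edges(papers, threshold=1))
print(cocitation_edges(papers, threshold=2))
```

With a threshold of 1 the pair (A, C) gets an edge purely because P1 cites both; a threshold of 2 keeps only pairs that at least two citing papers agree on.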
- tethne.networks.papers.direct_citation(papers, node_id='ayjid', node_attribs=['date'], **kwargs)
Create a traditional directed citation network.
Direct-citation graphs are directed acyclic graphs in which vertices are papers, and each (directed) edge represents a citation of the target paper by the source paper. The networks.papers.direct_citation() method generates both a global citation graph, which includes all cited and citing papers, and an internal citation graph that describes only citations among papers in the original dataset.
To generate direct-citation graphs, use the networks.papers.direct_citation() method. Note the size difference between the global and internal citation graphs.
>>> gDC, iDC = nt.papers.direct_citation(papers)
>>> len(gDC)
5998
>>> len(iDC)
163
Element | Description |
Node | Papers, represented by node_id. |
Edge | From a paper to a cited reference. |
Edge Attribute | Publication date of the citing paper. |
Parameters: papers : list
A list of Paper instances.
node_id : string
A key from Paper to identify the nodes. Default is ‘ayjid’.
node_attribs : list
List of fields in Paper to include as node attributes in graph.
Returns: citation_network : networkx.DiGraph
Global citation network (all citations).
citation_network_internal : networkx.DiGraph
Internal citation network where only the papers in the list are nodes in the network.
Raises: KeyError : If node_id is not present in the meta_list.
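The difference between the global and internal citation graphs can be shown with a small standard-library sketch. This is an illustration under assumptions rather than tethne's code: the name `direct_citation_edges`, the plain-dict papers, and the `citations` and `date` keys are hypothetical, and edges are returned as simple tuples instead of a networkx.DiGraph.

```python
def direct_citation_edges(papers, node_id="ayjid"):
    """Return (global_edges, internal_edges) as lists of (citing, cited, attrs).

    Hypothetical helper: papers are plain dicts with assumed "citations" and
    "date" keys. A missing node_id field raises KeyError, as in the docs above.
    """
    in_dataset = {p[node_id] for p in papers}
    global_edges, internal_edges = [], []
    for p in papers:
        for cited in p["citations"]:
            # Each edge points from the citing paper to the cited reference
            # and carries the publication date of the citing paper.
            edge = (p[node_id], cited, {"date": p["date"]})
            global_edges.append(edge)
            if cited in in_dataset:
                # Internal graph: both endpoints are papers in the dataset.
                internal_edges.append(edge)
    return global_edges, internal_edges


papers = [
    {"ayjid": "P1", "date": 2001, "citations": ["P2", "X"]},
    {"ayjid": "P2", "date": 1999, "citations": ["Y"]},
]

global_edges, internal_edges = direct_citation_edges(papers)
print(len(global_edges), len(internal_edges))
```

Only the P1 to P2 citation survives in the internal list, since X and Y are cited but are not themselves part of the dataset.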
- tethne.networks.papers.topic_coupling(papers, threshold=0.7, node_id='ayjid', **kwargs)
Two papers are coupled if they both contain a shared topic above threshold.
Element | Description |
Node | Papers, represented by node_id. |
Edge | (a,b) in E(G) if a and b share >= 1 topics with proportion >= threshold in both a and b. |
Edge Attributes | weight: combined mean proportion of each shared topic. topics: list of shared topics. |
Parameters: papers : list
A list of Paper instances.
threshold : float
Minimum representation of a topic in each paper.
node_id : string
Field in Paper used to identify nodes.
Returns: tc : networkx.Graph
A topic-coupling network.
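The coupling rule (at least one topic with proportion >= threshold in both papers) can be sketched as follows. This is a hedged illustration, not tethne's implementation: the name `topic_coupling_edges`, the plain-dict papers, and the `topics` mapping of topic id to proportion are assumptions. The phrase "combined mean proportion" is interpreted here as the average, over shared topics, of the two papers' mean proportion, which may differ from what the library computes.

```python
from itertools import combinations


def topic_coupling_edges(papers, threshold=0.7, node_id="ayjid"):
    """Return {(id_a, id_b): {"weight": ..., "topics": [...]}}.

    Hypothetical helper: each paper is a plain dict with an assumed "topics"
    mapping of topic id -> proportion of that topic in the paper.
    """
    edges = {}
    for a, b in combinations(papers, 2):
        # A topic is shared only if it meets the threshold in BOTH papers.
        shared = sorted(
            t for t, prop in a["topics"].items()
            if prop >= threshold and b["topics"].get(t, 0.0) >= threshold
        )
        if not shared:
            continue
        # Assumed weight: mean of the per-topic average proportions.
        pair_means = [(a["topics"][t] + b["topics"][t]) / 2 for t in shared]
        edges[(a[node_id], b[node_id])] = {
            "weight": sum(pair_means) / len(pair_means),
            "topics": shared,
        }
    return edges


papers = [
    {"ayjid": "P1", "topics": {0: 0.75, 1: 0.25}},
    {"ayjid": "P2", "topics": {0: 0.875, 2: 0.125}},
    {"ayjid": "P3", "topics": {1: 0.875, 0: 0.125}},
]

print(topic_coupling_edges(papers, threshold=0.7))
```

P3 contains topic 0 only weakly (0.125), so it is not coupled to P1 or P2 even though all three papers mention that topic.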