paperscraper.citations
¶
citations
¶
get_citations_by_doi(doi: str) -> int
¶
Get the number of citations of a paper according to Semantic Scholar.
Parameters:
| Name | Type | Description | Default |
|---|---|---|---|
| doi | str | The DOI of the paper. | required |
Returns:
| Type | Description |
|---|---|
| int | The number of citations. |
Source code in paperscraper/citations/citations.py
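A minimal usage sketch, assuming the function is importable as `paperscraper.citations.get_citations_by_doi`. The wrapper `citation_count_or_none`, its `fetch` parameter, and the stubbed count are hypothetical and only serve to show how a failed lookup might be handled without touching the network.

```python
from typing import Callable, Optional


def citation_count_or_none(
    doi: str, fetch: Optional[Callable[[str], int]] = None
) -> Optional[int]:
    """Return the citation count for `doi`, or None if the lookup fails."""
    if fetch is None:
        # Imported lazily so the helper can be exercised offline with a stub.
        # Assumption: this import path matches the installed package.
        from paperscraper.citations import get_citations_by_doi

        fetch = get_citations_by_doi
    try:
        return int(fetch(doi))
    except Exception:
        # In this sketch, any error (network, unknown DOI) collapses to None.
        return None


# Offline demonstration: a dict lookup stands in for the real API call.
# The count 42 is made up for the stub.
stub = {"10.18653/v1/N18-3011": 42}.__getitem__
print(citation_count_or_none("10.18653/v1/N18-3011", fetch=stub))
print(citation_count_or_none("10.0000/unknown", fetch=stub))
```

Omitting `fetch` would route the call to the library function, which requires network access.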
get_citations_from_title(title: str) -> int
¶
Parameters:
| Name | Type | Description | Default |
|---|---|---|---|
| title | str | Title of paper to be searched on Scholar. | required |
Raises:
| Type | Description |
|---|---|
| TypeError | If something other than a str is passed. |
Returns:
| Name | Type | Description |
|---|---|---|
| int | int | Number of citations of paper. |
Source code in paperscraper/citations/citations.py
entity
¶
core
¶
Entity
¶
An abstract entity class with a set of utilities shared by the objects that perform self-linking analyses, such as Paper and Researcher.
Source code in paperscraper/citations/entity/core.py
paper
¶
Paper
¶
Bases: Entity
Source code in paperscraper/citations/entity/paper.py
__init__(input: str, mode: ModeType = 'infer')
¶
Set up a Paper object for analysis.
Parameters:
| Name | Type | Description | Default |
|---|---|---|---|
| input | str | Paper identifier. This can be the title, DOI or Semantic Scholar ID of the paper. | required |
| mode | ModeType | The format in which the ID was provided. Defaults to "infer". | 'infer' |
Raises:
| Type | Description |
|---|---|
| ValueError | If unknown mode is given. |
Source code in paperscraper/citations/entity/paper.py
self_references()
¶
Extracts the self references of a paper, for each author.
self_citations()
¶
Extracts the self citations of a paper, for each author.
get_result() -> Optional[PaperResult]
¶
Provides the result of the analysis.
Returns: PaperResult if available.
Source code in paperscraper/citations/entity/paper.py
researcher
¶
Researcher
¶
Bases: Entity
Source code in paperscraper/citations/entity/researcher.py
__init__(input: str, mode: ModeType = 'infer')
¶
Construct researcher object for self citation/reference analysis.
Parameters:
| Name | Type | Description | Default |
|---|---|---|---|
| input | str | A researcher to search for, identified by name, ORCID iD, or Semantic Scholar Author ID. | required |
| mode | ModeType | This can be a | 'infer' |
Raises:
| Type | Description |
|---|---|
| ValueError | If an unknown mode is given. |
Source code in paperscraper/citations/entity/researcher.py
self_references(verbose: bool = False) -> ResearcherResult
¶
Sifts through all papers of a researcher and extracts the self references.
Parameters:
| Name | Type | Description | Default |
|---|---|---|---|
| verbose | bool | If True, logs detailed information for each paper. | False |
Returns:
| Type | Description |
|---|---|
| ResearcherResult | A ResearcherResult containing aggregated self-reference data. |
Source code in paperscraper/citations/entity/researcher.py
self_citations(verbose: bool = False) -> ResearcherResult
¶
Sifts through all papers of a researcher and finds how often they are self-cited.
Source code in paperscraper/citations/entity/researcher.py
get_result() -> ResearcherResult
¶
Provides the result of the analysis.
Source code in paperscraper/citations/entity/researcher.py
orcid
¶
orcid_to_author_name(orcid_id: str) -> Optional[str]
¶
Given an ORCID ID (as a string, e.g. '0000-0002-1825-0097'), returns the full name of the author from the ORCID public API.
Source code in paperscraper/citations/orcid.py
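Before querying the ORCID public API it can be useful to reject malformed identifiers locally. The sketch below is not part of the documented module; `is_valid_orcid` is a hypothetical helper that checks the `XXXX-XXXX-XXXX-XXXX` layout and the ISO 7064 MOD 11-2 check character used by ORCID iDs.

```python
import re

# Four groups of four characters; the last character may be the check value "X".
ORCID_PATTERN = re.compile(r"^[0-9]{4}-[0-9]{4}-[0-9]{4}-[0-9]{3}[0-9X]$")


def is_valid_orcid(orcid_id: str) -> bool:
    """Return True if `orcid_id` is well formed and its check character matches."""
    if not ORCID_PATTERN.match(orcid_id):
        return False
    digits = orcid_id.replace("-", "")
    total = 0
    # ISO 7064 MOD 11-2 over the first 15 digits.
    for ch in digits[:-1]:
        total = (total + int(ch)) * 2
    check = (12 - total % 11) % 11
    expected = "X" if check == 10 else str(check)
    return digits[-1] == expected


print(is_valid_orcid("0000-0002-1825-0097"))
```

A check like this only confirms that the identifier is syntactically valid; whether a record exists for it still has to be answered by the API.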
self_citations
¶
self_citations_paper(inputs: Union[str, List[str]], verbose: bool = False) -> Union[CitationResult, List[CitationResult]]
async
¶
Analyze self-citations for one or more papers by DOI or Semantic Scholar ID.
Parameters:
| Name | Type | Description | Default |
|---|---|---|---|
| inputs | Union[str, List[str]] | A single DOI/SSID string or a list of them. | required |
| verbose | bool | If True, logs detailed information for each paper. | False |
Returns:
| Type | Description |
|---|---|
| Union[CitationResult, List[CitationResult]] | A single CitationResult if a string was passed, else a list of CitationResults. |
Source code in paperscraper/citations/self_citations.py
self_references
¶
self_references_paper(inputs: Union[str, List[str]], verbose: bool = False) -> Union[ReferenceResult, List[ReferenceResult]]
async
¶
Analyze self-references for one or more papers by DOI or Semantic Scholar ID.
Parameters:
| Name | Type | Description | Default |
|---|---|---|---|
| inputs | Union[str, List[str]] | A single DOI/SSID string or a list of them. | required |
| verbose | bool | If True, logs detailed information for each paper. | False |
Returns:
| Type | Description |
|---|---|
| Union[ReferenceResult, List[ReferenceResult]] | A single ReferenceResult if a string was passed, else a list of ReferenceResults. |
Raises:
| Type | Description |
|---|---|
| ValueError | If no references are found for a given identifier. |
Source code in paperscraper/citations/self_references.py
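Both `self_citations_paper` and `self_references_paper` are coroutines that accept either one identifier or a list and return a result of the matching shape. The sketch below illustrates that dispatch pattern only; `analyze_one` and `analyze` are hypothetical stand-ins that perform no network I/O and are not the library's implementation.

```python
import asyncio
from typing import Dict, List, Union


async def analyze_one(identifier: str) -> Dict[str, str]:
    """Hypothetical per-paper lookup; a real one would query an API."""
    await asyncio.sleep(0)  # yield control, as a real request would
    return {"id": identifier}


async def analyze(
    inputs: Union[str, List[str]],
) -> Union[Dict[str, str], List[Dict[str, str]]]:
    """Return one result for a string, or a list of results for a list."""
    single = isinstance(inputs, str)
    identifiers = [inputs] if single else list(inputs)
    # gather() runs the lookups concurrently and preserves input order.
    results = await asyncio.gather(*(analyze_one(i) for i in identifiers))
    return results[0] if single else list(results)


one = asyncio.run(analyze("10.18653/v1/N18-3011"))
many = asyncio.run(analyze(["a", "b"]))
print(one, many)
```

With the library installed, the documented coroutine would presumably be driven the same way from synchronous code, for example `asyncio.run(self_references_paper(doi))`.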
tests
¶
test_self_citations
¶
TestSelfCitations
¶
Source code in paperscraper/citations/tests/test_self_citations.py
test_researcher()
¶
Tests calculation of self-references for all papers of an author.
Source code in paperscraper/citations/tests/test_self_citations.py
test_researcher_from_orcid()
¶
Tests calculation of self-references for all papers of an author.
Source code in paperscraper/citations/tests/test_self_citations.py
test_self_references
¶
TestSelfReferences
¶
Source code in paperscraper/citations/tests/test_self_references.py
test_compare_async_and_sync_performance(dois)
¶
Compares the execution time of asynchronous and synchronous self_references for a list of DOIs.
Source code in paperscraper/citations/tests/test_self_references.py
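The kind of comparison this test describes can be sketched offline. In the example below, `fake_lookup` is a hypothetical stand-in that sleeps instead of calling an API, and awaiting the lookups one after another stands in for the synchronous path; the delay value is made up.

```python
import asyncio
import time
from typing import List

DELAY = 0.05  # simulated per-request latency in seconds (made up)


async def fake_lookup(doi: str) -> str:
    """Hypothetical stand-in for one self-reference lookup."""
    await asyncio.sleep(DELAY)
    return doi


async def run_sequential(dois: List[str]) -> List[str]:
    # Await each lookup before starting the next one.
    return [await fake_lookup(d) for d in dois]


async def run_concurrent(dois: List[str]) -> List[str]:
    # Start all lookups at once and wait for them together.
    return list(await asyncio.gather(*(fake_lookup(d) for d in dois)))


def timed(coro) -> float:
    """Run a coroutine to completion and return the elapsed wall time."""
    start = time.perf_counter()
    asyncio.run(coro)
    return time.perf_counter() - start


dois = ["doi-1", "doi-2", "doi-3", "doi-4"]
sequential_s = timed(run_sequential(dois))
concurrent_s = timed(run_concurrent(dois))
print(f"sequential: {sequential_s:.3f}s, concurrent: {concurrent_s:.3f}s")
```

Because the simulated lookups are purely I/O-bound, the concurrent run should take roughly one delay rather than one delay per DOI.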
test_researcher()
¶
Tests calculation of self-references for all papers of an author.
Source code in paperscraper/citations/tests/test_self_references.py
test_researcher_from_orcid()
¶
Tests calculation of self-references for all papers of an author.
Source code in paperscraper/citations/tests/test_self_references.py
utils
¶
get_doi_from_title(title: str) -> Optional[str]
¶
Searches for the DOI of a paper based on its title.
Parameters:
| Name | Type | Description | Default |
|---|---|---|---|
| title | str | Paper title | required |
Returns:
| Type | Description |
|---|---|
| Optional[str] | DOI according to the Semantic Scholar API. |
Source code in paperscraper/citations/utils.py
get_doi_from_ssid(ssid: str, max_retries: int = 10) -> Optional[str]
async
¶
Given a Semantic Scholar paper ID, returns the corresponding DOI if available.
Parameters:
| Name | Type | Description | Default |
|---|---|---|---|
| ssid | str | The paper ID on Semantic Scholar. | required |
Returns:
| Type | Description |
|---|---|
| Optional[str] | str or None: The DOI of the paper, or None if not found or in case of an error. |
Source code in paperscraper/citations/utils.py
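The `max_retries` parameter suggests the lookup is retried on transient failures before giving up with None. A minimal sketch of such a retry loop follows; `fetch_with_retries`, its `base_delay` parameter, and the linear backoff are assumptions for illustration, not the library's actual strategy.

```python
import asyncio
from typing import Awaitable, Callable, Optional


async def fetch_with_retries(
    fetch: Callable[[], Awaitable[Optional[str]]],
    max_retries: int = 10,
    base_delay: float = 0.0,
) -> Optional[str]:
    """Call `fetch` up to `max_retries` times; return None if every attempt fails."""
    for attempt in range(max_retries):
        try:
            return await fetch()
        except Exception:
            # Wait a little longer after each failure (linear backoff).
            await asyncio.sleep(base_delay * (attempt + 1))
    return None


calls = {"n": 0}


async def flaky() -> Optional[str]:
    """Fails twice with a simulated error, then succeeds."""
    calls["n"] += 1
    if calls["n"] < 3:
        raise ConnectionError("simulated transient failure")
    return "10.18653/v1/N18-3011"


doi = asyncio.run(fetch_with_retries(flaky))
print(doi, "after", calls["n"], "attempts")
```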
get_title_and_id_from_doi(doi: str) -> Dict[str, str] | None
async
¶
Given a DOI, retrieves the paper's title and Semantic Scholar paper ID.
Parameters:
| Name | Type | Description | Default |
|---|---|---|---|
| doi | str | The DOI of the paper (e.g., "10.18653/v1/N18-3011"). | required |
Returns:
| Type | Description |
|---|---|
| Dict[str, str] \| None | dict or None: A dictionary with keys 'title' and 'ssid'. |
Source code in paperscraper/citations/utils.py
author_name_to_ssaid(author_name: str) -> Tuple[str, str]
async
¶
Given an author name, returns the Semantic Scholar author ID.
Parameters:
| Name | Type | Description | Default |
|---|---|---|---|
| author_name | str | The full name of the author. | required |
Returns:
| Type | Description |
|---|---|
| Tuple[str, str] | Tuple[str, str] or None: The SS author ID alongside the SS name (may differ slightly from input name) or None if no author is found. |
Source code in paperscraper/citations/utils.py
determine_paper_input_type(input: str) -> Literal['ssid', 'doi', 'title']
¶
Determines the input type the user intended when it is not given explicitly (mode "infer").
Parameters:
| Name | Type | Description | Default |
|---|---|---|---|
| input | str | A DOI, a Semantic Scholar paper ID, or a paper title. | required |
Returns:
| Type | Description |
|---|---|
| Literal['ssid', 'doi', 'title'] | The input type |
Source code in paperscraper/citations/utils.py
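One plausible way to infer the input type is pattern matching with a fallback to title. The sketch below is an assumption about how such inference could work, not the library's implementation: `guess_input_type` is a hypothetical name, the DOI pattern is a common heuristic, and Semantic Scholar paper IDs are assumed to be 40-character hexadecimal strings.

```python
import re
from typing import Literal

# DOIs start with "10.", a registrant code, a slash, and a suffix.
DOI_PATTERN = re.compile(r"^10\.\d{4,9}/\S+$")
# Assumption: a Semantic Scholar paper ID is a 40-character hex string.
SSID_PATTERN = re.compile(r"^[0-9a-fA-F]{40}$")


def guess_input_type(value: str) -> Literal["ssid", "doi", "title"]:
    """Classify `value` as a DOI, a paper ID, or (by default) a title."""
    text = value.strip()
    if DOI_PATTERN.match(text):
        return "doi"
    if SSID_PATTERN.match(text):
        return "ssid"
    # Anything that is neither a DOI nor an ID is treated as a title.
    return "title"


for example in ("10.18653/v1/N18-3011", "a" * 40, "Attention is all you need"):
    print(example[:24], "->", guess_input_type(example))
```

Falling back to "title" keeps the function total: every string maps to one of the three documented return values.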
get_papers_for_author(ss_author_id: str) -> List[str]
async
¶
Given a Semantic Scholar author ID, returns a list of all Semantic Scholar paper IDs for that author.
Parameters:
| Name | Type | Description | Default |
|---|---|---|---|
| ss_author_id | str | The Semantic Scholar author ID (e.g., "1741101"). | required |
Returns:
| Type | Description |
|---|---|
| List[str] | A list of paper IDs (as strings) authored by the given author. |
Source code in paperscraper/citations/utils.py
find_matching(first: List[Dict[str, str]], second: List[Dict[str, str]]) -> List[str]
¶
Ingests two sets of authors and returns a list of those that match (either based on name or on author ID).
Parameters:
| Name | Type | Description | Default |
|---|---|---|---|
| first | List[Dict[str, str]] | First set of authors given as list of dict with two keys ( | required |
| second | List[Dict[str, str]] | Second set of authors given as list of dict with two same keys. | required |
Returns:
| Type | Description |
|---|---|
| List[str] | List of names of authors in first list where a match was found. |
Source code in paperscraper/citations/utils.py
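Matching on either author ID or name can be sketched as below. The key names `authorId` and `name` are assumptions (the key names are not preserved in the parameter description above), `matching_authors` is a hypothetical name, and names are compared by plain case-insensitive equality rather than any fuzzier heuristic.

```python
from typing import Dict, List


def matching_authors(
    first: List[Dict[str, str]], second: List[Dict[str, str]]
) -> List[str]:
    """Names from `first` that also occur in `second`, by ID or by name."""
    # Assumed keys: "authorId" and "name".
    ids = {a["authorId"] for a in second if a.get("authorId")}
    names = {a["name"].casefold() for a in second if a.get("name")}
    matches = []
    for author in first:
        same_id = bool(author.get("authorId")) and author["authorId"] in ids
        same_name = author.get("name", "").casefold() in names
        if same_id or same_name:
            matches.append(author.get("name", ""))
    return matches


first = [
    {"name": "Ada Lovelace", "authorId": "1"},
    {"name": "Alan Turing", "authorId": "2"},
]
second = [
    {"name": "A. Lovelace", "authorId": "1"},   # matches by ID
    {"name": "alan turing", "authorId": "9"},   # matches by name
    {"name": "Grace Hopper", "authorId": "3"},  # no match
]
print(matching_authors(first, second))
```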
check_overlap(n1: str, n2: str) -> bool
¶
Check whether two author names are identical.
Heuristics:
- Case insensitive
- If name sets are identical, a match is assumed (e.g. "John Walter" vs "Walter John").
- Assume the last token is the surname and require:
  - same surname
  - both have at least one given name
  - first given names are compatible (same, or initial vs full)
Parameters:
| Name | Type | Description | Default |
|---|---|---|---|
| n1 | str | First name to compare (e.g., "John A. Smith") | required |
| n2 | str | Second name to compare (e.g., "J. Smith") | required |
Returns:
| Name | Type | Description |
|---|---|---|
| bool | bool | Whether names are identical. |
Source code in paperscraper/citations/utils.py
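The heuristics listed for `check_overlap` can be sketched in a few lines. `names_overlap` is a hypothetical re-implementation written from that description alone; details such as how periods are handled are assumptions, and the library's actual function may differ.

```python
def names_overlap(n1: str, n2: str) -> bool:
    """Apply the documented heuristics to two author names."""
    # Case-insensitive comparison; periods after initials are ignored.
    t1 = n1.casefold().replace(".", " ").split()
    t2 = n2.casefold().replace(".", " ").split()
    if not t1 or not t2:
        return False
    # Identical token sets match regardless of order.
    if set(t1) == set(t2):
        return True
    # Treat the last token as the surname; surnames must agree.
    if t1[-1] != t2[-1]:
        return False
    given1, given2 = t1[:-1], t2[:-1]
    # Both names need at least one given name.
    if not given1 or not given2:
        return False
    g1, g2 = given1[0], given2[0]
    if g1 == g2:
        return True
    # An initial is compatible with a full name starting with that letter.
    return (len(g1) == 1 or len(g2) == 1) and g1[0] == g2[0]


print(names_overlap("John A. Smith", "J. Smith"))
print(names_overlap("John Smith", "Jane Smith"))
```

Comparing only the first given name keeps the check tolerant of omitted middle names, at the cost of treating, say, "J. Smith" as compatible with every Smith whose first name starts with J.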
clean_name(s: str) -> str
¶
Clean up a str by removing special characters.
Parameters:
| Name | Type | Description | Default |
|---|---|---|---|
| s | str | Input possibly containing special symbols | required |
Returns:
| Type | Description |
|---|---|
| str | Homogenized string. |
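One common way to homogenize names is to strip diacritics and punctuation. The sketch below shows that approach under stated assumptions: `strip_special_characters` is a hypothetical name, and the exact set of characters the library removes is not documented here, so the rules below (drop accents, turn punctuation into spaces, collapse whitespace) are illustrative only.

```python
import re
import unicodedata


def strip_special_characters(s: str) -> str:
    """Remove accents and punctuation, then collapse whitespace."""
    # Decompose accented letters (e.g. "ü" becomes "u" plus a combining mark).
    decomposed = unicodedata.normalize("NFKD", s)
    # Drop the combining marks left over from decomposition.
    without_marks = "".join(
        c for c in decomposed if not unicodedata.combining(c)
    )
    # Replace anything that is not a letter, digit, or whitespace with a space.
    letters_only = re.sub(r"[^\w\s]|_", " ", without_marks)
    # Collapse runs of whitespace into single spaces.
    return " ".join(letters_only.split())


print(strip_special_characters("Jürgen Schmidhuber!"))
print(strip_special_characters("Ann-Marie  Müller"))
```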