API Reference

This section documents the public API of paperscraper.

Below you’ll find links to the documentation for each module:

paperscraper: Main package entry point
paperscraper.arxiv: arXiv scraping and keyword search
paperscraper.citations: Get (self-)citations and (self-)references of papers and authors
paperscraper.get_dumps: Utilities to download bioRxiv, medRxiv and chemRxiv metadata
paperscraper.pdf: Download publications as PDFs
paperscraper.pubmed: PubMed keyword search
paperscraper.scholar: Google Scholar endpoints
paperscraper.xrxiv: Shared utilities for {bio,med,chem}Rxiv

Citation

If you use paperscraper, please cite the paper that motivated the development of this tool:

@article{born2021trends,
  title={Trends in Deep Learning for Property-driven Drug Design},
  author={Born, Jannis and Manica, Matteo},
  journal={Current Medicinal Chemistry},
  volume={28},
  number={38},
  pages={7862--7886},
  year={2021},
  publisher={Bentham Science Publishers}
}

Top-level API

`paperscraper`

Initialize the module.

`dump_queries(keywords: List[List[Union[str, List[str]]]], dump_root: str) -> None`

Performs a keyword search on all available servers and dumps the results.

Parameters:

Name	Type	Description	Default
`keywords`	`List[List[Union[str, List[str]]]]`	List of lists of keywords. Each second-level list is treated as a separate query. Within a query, the items (whether str or List[str]) are combined with AND. If an item is itself a list, its strings are treated as synonyms and combined with OR.	required
`dump_root`	`str`	Path to root for dumping.	required
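The nesting of `keywords` can be hard to read from the type alone. The following is a minimal sketch of the AND/OR semantics described in the table. The helper `describe_query` and the example keywords are hypothetical and are not part of paperscraper; the helper only renders each query as a boolean expression.

```python
from typing import List, Union

# One query: a list whose items are either a single term or a list of synonyms.
Query = List[Union[str, List[str]]]


def describe_query(query: Query) -> str:
    """Render one query as a boolean expression (illustration only)."""
    terms = []
    for item in query:
        if isinstance(item, str):
            # A plain string is a single required term.
            terms.append(item)
        else:
            # A nested list holds synonyms, combined with OR.
            terms.append("(" + " OR ".join(item) + ")")
    # Top-level items of a query are combined with AND.
    return " AND ".join(terms)


# Two separate queries (example keywords chosen for illustration).
keywords = [
    ["COVID-19", ["machine learning", "deep learning"]],
    [["graph neural network", "GNN"], "drug discovery"],
]

descriptions = [describe_query(q) for q in keywords]
for line in descriptions:
    print(line)

# With paperscraper installed, the same list would be passed on as:
#   from paperscraper import dump_queries
#   dump_queries(keywords, "dumps")  # "dumps" is an arbitrary example path
```

Each second-level list produces its own set of result files, so the example above corresponds to two independent searches per server.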

Source code in paperscraper/__init__.py

def dump_queries(keywords: List[List[Union[str, List[str]]]], dump_root: str) -> None:
    """Performs a keyword search on all available servers and dumps the results.

    Args:
        keywords (List[List[Union[str, List[str]]]]): List of lists of keywords.
            Each second-level list is treated as a separate query. Within a
            query, the items (whether str or List[str]) are combined with AND.
            If an item is itself a list, its strings are treated as synonyms
            and combined with OR.
        dump_root (str): Path to root for dumping.
    """

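    # Run every query against every registered server.
    for idx, keyword in enumerate(keywords):
        for db, f in QUERY_FN_DICT.items():
            logger.info(f" Keyword {idx + 1}/{len(keywords)}, DB: {db}")
            # The output filename is derived from the query itself.
            filename = get_filename_from_query(keyword)
            # Results are written to <dump_root>/<db>/<filename>.
            os.makedirs(os.path.join(dump_root, db), exist_ok=True)
            f(keyword, output_filepath=os.path.join(dump_root, db, filename))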
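The loop above writes one file per query and per server under `dump_root`. The sketch below reproduces only that directory layout, without network access. Everything in it is a stand-in: `fake_filename` is a hypothetical replacement for `get_filename_from_query` (the real naming scheme may differ), and `STUB_QUERY_FNS` with its two server names replaces `QUERY_FN_DICT` with functions that just write a placeholder line.

```python
import os
import tempfile
from typing import Callable, Dict, List, Union

Query = List[Union[str, List[str]]]


def fake_filename(query: Query) -> str:
    # Hypothetical stand-in for get_filename_from_query: uses only the
    # first term of the query. The real function may name files differently.
    first = query[0] if isinstance(query[0], str) else query[0][0]
    return first.lower().replace(" ", "_") + ".jsonl"


def make_stub(db: str) -> Callable[..., None]:
    # Build a stub query function that writes a placeholder record
    # instead of contacting a server.
    def stub(query: Query, output_filepath: str) -> None:
        with open(output_filepath, "w") as fh:
            fh.write(f'{{"db": "{db}"}}\n')

    return stub


# Assumed server names, used purely for illustration.
STUB_QUERY_FNS: Dict[str, Callable[..., None]] = {
    db: make_stub(db) for db in ("arxiv", "pubmed")
}


def dump_queries_sketch(keywords: List[Query], dump_root: str) -> None:
    # Same loop structure as dump_queries: every query against every server.
    for query in keywords:
        for db, fn in STUB_QUERY_FNS.items():
            os.makedirs(os.path.join(dump_root, db), exist_ok=True)
            fn(query, output_filepath=os.path.join(dump_root, db, fake_filename(query)))


with tempfile.TemporaryDirectory() as root:
    dump_queries_sketch([["COVID-19", ["AI", "ML"]], [["GNN"], "drugs"]], root)
    # Collect the written files as paths relative to the dump root.
    written = sorted(
        os.path.relpath(os.path.join(d, f), root).replace(os.sep, "/")
        for d, _, files in os.walk(root)
        for f in files
    )

print(written)
```

Because `os.makedirs` is called with `exist_ok=True`, rerunning the dump into an existing `dump_root` does not fail on the per-server subdirectories.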
