Generating the Big Mechanisms Evaluation Corpus

Preliminary work for the program: downloading the full text of the documents on the PMC lists.

  1. BigMech Wiki Instructions + Redundancy

    This link on the Big Mechanisms wiki provides a list of 1,741 PMC ID values. Because some articles appear in more than one query, only 840 of these are unique. Here is a list of the unique PMC ID values.
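
    The redundancy described above (the same article returned by more than one query) can be removed with a simple order-preserving filter. The following is a minimal sketch, not the procedure actually used to build the list; the function name `unique_ids` and the ID values are made up for illustration.

```python
def unique_ids(ids):
    """Return the IDs in first-seen order with duplicates removed."""
    seen = set()
    unique = []
    for raw in ids:
        pmc_id = raw.strip()  # tolerate stray whitespace around an ID
        if pmc_id and pmc_id not in seen:
            seen.add(pmc_id)
            unique.append(pmc_id)
    return unique


# Hypothetical IDs standing in for the combined query results.
combined = ["PMC0000001", "PMC0000002", "PMC0000001", " PMC0000003", "PMC0000002"]
deduplicated = unique_ids(combined)
print(len(combined), "listed,", len(deduplicated), "unique")
```

    Keeping first-seen order makes the unique list easy to compare against the original wiki list by eye.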

  2. Preliminary Corpora

    Working from this list, we have attempted to download these documents. Pending additional bug checking, we now provide them as a shared resource for the community.

Given the latest lists of the open access XML from PMC, we were able to download 812 of these 840 documents.

  3. Corpus Organization

    • The corpus directories are organized as Journal/Year/Volume.
    • Space characters are replaced with _ in Journal and Volume names.
    • Each article’s files are named according to its PubMed ID (pmid):
    • [pmid].pdf - The PDF file
    • [pmid]_pmc.xml - The XML full text
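
The layout rules above can be sketched as a small path-building helper. This is an illustration under stated assumptions, not the code used to build the corpus: the function name `article_paths` and the sample journal, volume, and pmid values are hypothetical, and only space characters are replaced, as the rules describe.

```python
from pathlib import PurePosixPath


def article_paths(journal, year, volume, pmid):
    """Build the relative PDF and XML paths for one article.

    Follows the Journal/Year/Volume layout, replacing spaces with
    underscores in the Journal and Volume names only.
    """
    directory = PurePosixPath(
        journal.replace(" ", "_"),
        str(year),
        str(volume).replace(" ", "_"),
    )
    return directory / f"{pmid}.pdf", directory / f"{pmid}_pmc.xml"


# Hypothetical article metadata, for illustration only.
pdf_path, xml_path = article_paths("Example Journal", 2012, "7 Suppl 1", "12345678")
print(pdf_path)
print(xml_path)
```
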

We host these files on Amazon; they can be downloaded only through links on this site.