The 71 “open access” PMIDs yield 1716 datums, of which 24 have Hras, Braf, Raf1 or Rac1 as subject.
These come from 8 papers: [16492808, 11448999, 11777939, 12515821, 19050761, 20929976, 16520382, 12876277]
Of these, 5 papers contain at least one coprecipitation study; we use these as the initial basis of this small-scale study.
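The selection steps above (filter datums by subject, group them by paper, then keep papers with a coprecipitation assay) can be sketched as follows. This is a minimal illustration only, not the script used in the repo: the record fields `pmid`, `subject` and `assay`, the function names, and the toy records are all assumptions and do not reproduce real PL data.

```python
from collections import defaultdict

# Subjects of interest, as named in the text above.
TARGET_SUBJECTS = {"Hras", "Braf", "Raf1", "Rac1"}


def group_target_datums(datums, subjects=TARGET_SUBJECTS):
    """Keep datums whose subject is in `subjects`, grouped by PMID."""
    by_pmid = defaultdict(list)
    for datum in datums:
        if datum["subject"] in subjects:
            by_pmid[datum["pmid"]].append(datum)
    return dict(by_pmid)


def papers_with_assay(by_pmid, assay_prefix):
    """Return the sorted PMIDs that have at least one datum whose assay starts with `assay_prefix`."""
    return sorted(
        pmid
        for pmid, datums in by_pmid.items()
        if any(d["assay"].startswith(assay_prefix) for d in datums)
    )


# Toy records (hypothetical, for illustration only).
toy_datums = [
    {"pmid": "PMID_A", "subject": "Hras", "assay": "copptby[WB]"},
    {"pmid": "PMID_A", "subject": "Egfr", "assay": "copptby[WB]"},
    {"pmid": "PMID_B", "subject": "Rac1", "assay": "GTP-assoc[BDPD]"},
    {"pmid": "PMID_C", "subject": "Akt1", "assay": "copptby[WB]"},
]

grouped = group_target_datums(toy_datums)
print(sorted(grouped))                        # papers with a target subject
print(papers_with_assay(grouped, "copptby"))  # of those, papers with a coprecipitation datum
```

Grouping by PMID first keeps the two counts in the text (datums with a target subject, and papers they come from) available from the same structure.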
Click for ZIP file of PDF, PMC XML + TXT files
We used simple scripts to query the PL Datum databases about the papers implicated in this short annotation study. We now use this repo as a lab notebook. See
src/main/java/_02_rasSpecificPapers/S02_ReadPmidsFiguresAssays.java within the master branch.
Querying the PL database for assays revealed this file: pmids_figs_assays.txt
Querying the PL database for datum objects revealed this file: pmids_figs_datums.txt
Note that these assay types are documented in the Pathway Logic database here: http://pl.csl.sri.com/CurationNotebook/pages/_Assays.html
Initially, we attempted to model each type of assay as described within the Pathway Logic assay types, as shown below for these two assay types:
KEfED Model [JSON]
pmid:11777939-Fig-4b has the assay set to GTP-assoc[BDPD]. This is probably a data typo. What does ‘BDPD’ stand for?
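A possible data typo like this one could be caught automatically by splitting each assay label into its name and the bracketed code, then checking the code against a list of expected values. The sketch below assumes labels have the shape `name[code]`, as in `copptby[WB]`. The set of known codes is a placeholder containing only the one code seen in this notebook, and the function names are hypothetical.

```python
import re

# Assumed label shape: an assay name followed by one bracketed code.
ASSAY_PATTERN = re.compile(r"^(?P<assay>[^\[\]]+)\[(?P<code>[^\[\]]*)\]$")

# Placeholder list: only the code that appears in this notebook.
# A real check would load the full list from the PL assay documentation.
KNOWN_CODES = {"WB"}


def parse_assay(label):
    """Return (assay_name, code), or None if the label is malformed."""
    match = ASSAY_PATTERN.match(label.strip())
    if match is None:
        return None
    return match.group("assay"), match.group("code")


def check_assay(label, known_codes=KNOWN_CODES):
    """Classify a label as 'ok', 'unknown-code' or 'malformed'."""
    parsed = parse_assay(label)
    if parsed is None:
        return "malformed"
    return "ok" if parsed[1] in known_codes else "unknown-code"


for label in ["copptby[WB]", "GTP-assoc[BDPD]"]:
    print(label, "->", check_assay(label))
```

A label flagged as `unknown-code` is not necessarily wrong; it only marks the entry for a manual check against the assay documentation.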
KEfED Model [JSON]
However, when we looked in detail at the papers' experiments, we found that we were trying to build KEfED versions of the basic PL types, whereas studies that detect coprecipitation use subtly different technical motifs at the level of KEfED models. We therefore started examining specific experiments in depth for a single paper, Innocenti et al. 2002 (11777939), and attempted to elaborate its experimental motifs in greater detail.
We need to link two kinds of text: (1) the text from the figure legend and Methods section, which maps to the KEfED template / PL assay type, and (2) the text from the Results section that actually describes the main findings, which maps to the PL Datum objects and the KEfED experiment.
This paper has a total of 18 experiments. Interestingly, there is not a one-to-one correspondence between the assays described in the Pathway Logic database, the KEfED models we’ve curated and the precise delineation of fragments in the Results section. The authors occasionally describe more than one experiment in a single sentence. Similarly, a single experiment may provide more than one datum from more than one assay type, or may yield no PL datum objects at all. This reflects some of the differences between the KEfED modeling methodology and the PL curation approach.
|Expt||Pathway Logic Assay||KEfED Model Name||Fragment|
|8B||copptby[WB] + phos[||TimedIncubation_IP_WB_MobilityShift||8, 8AB|
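The many-to-many relationship described above can be made explicit by reading each table row into a record with a list of assays and a list of fragments, then indexing experiments by fragment. This is a sketch under assumptions: it treats `||` as the column separator, ` + ` as the assay separator and `, ` as the fragment separator, based only on the single row shown here. The second assay label is truncated in the source and is kept verbatim.

```python
def parse_row(line):
    """Split one '||'-delimited table row into a record with list-valued fields."""
    expt, assays, model, fragments = line.strip().strip("|").split("||")
    return {
        "expt": expt,
        "assays": [a.strip() for a in assays.split(" + ")],
        "kefed_model": model,
        "fragments": [f.strip() for f in fragments.split(",")],
    }


def index_by_fragment(records):
    """Map each fragment code to the experiments it describes."""
    index = {}
    for record in records:
        for fragment in record["fragments"]:
            index.setdefault(fragment, []).append(record["expt"])
    return index


# The row from the table above, copied as-is (including the truncated 'phos[').
row = "|8B||copptby[WB] + phos[||TimedIncubation_IP_WB_MobilityShift||8, 8AB|"
record = parse_row(row)
print(record["assays"])
print(index_by_fragment([record]))
```

Keeping assays and fragments as lists, rather than single values, lets one experiment carry several assay types and several text fragments without forcing a one-to-one mapping.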
All ORCA-encoded fragments and KEfED models are included in this zipfile
ORCA-encoded fragments are provided in the brat format, and the KEfED models and data are provided as JSON files (conforming to the model for the original KEfED editor). They can be viewed in that system, but really need to be converted to our latest schema.
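For readers who want to process the fragment annotations programmatically, the sketch below reads text-bound annotations from a brat standoff `.ann` file, where each such line has the form `ID<TAB>LABEL START END<TAB>TEXT`. It is a minimal illustration: the label `ExptFragment` and the sample text are hypothetical, and the actual ORCA labels used in the zipfile may differ.

```python
def parse_text_bound(ann_text):
    """Extract text-bound ('T') annotations from brat standoff content."""
    spans = []
    for line in ann_text.splitlines():
        if not line.startswith("T"):
            continue  # skip relations, events, notes and other annotation kinds
        parts = line.split("\t", 2)
        if len(parts) != 3:
            continue  # skip malformed lines rather than guessing
        ann_id, label_and_offsets, text = parts
        label, offsets = label_and_offsets.split(" ", 1)
        # Discontinuous spans are written as 'start end;start end'.
        fragments = [tuple(int(n) for n in part.split()) for part in offsets.split(";")]
        spans.append({"id": ann_id, "label": label, "offsets": fragments, "text": text})
    return spans


# Hypothetical annotation content (not taken from the zipfile).
sample = "T1\tExptFragment 0 12\tRas was IPed\n#1\tAnnotatorNotes T1\texpt 8B\n"
for span in parse_text_bound(sample):
    print(span["id"], span["label"], span["offsets"], span["text"])
```

Each parsed span keeps its character offsets, so a fragment can be traced back to the exact sentence in the paper's TXT file.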
This shows a working pipeline for (A) delineating text using ORCA codes and (B) generating preliminary KEfED models and data tables for those experiments manually.
Here, we examine coprecipitation studies from the five papers in terms of their KEfED models.
I curated models for each of these experiments:
|12515821||S1b (not included)||not included|
That makes 23 separate coprecipitation studies from 5 papers, using 13 different KEfED experiment types (note that these experimental types should be tightened up ontologically).
How should we continue here?
The focus of this work is now to push on the KEfED model in order to demonstrate the technical process for performing extraction from text. To make this more concrete, we will build on the above work by focusing only on simple coprecipitation studies, refining the definition of the types involved, and then extending the corpus to provide enough training examples for Pradeep to be able to deliver a reading solution.
This includes the following experiments (to be extended as we proceed):