Here we describe an annotation study of epistemics, PL Datums and KEfED models for five papers pertaining to the Ras pathway.
From the 71 “open access” PMIDs, there are 1716 datums, of which 24 have Hras, Braf, Raf1 or Rac1 as subject.
These come from 8 papers: [16492808, 11448999, 11777939, 12515821, 19050761, 20929976, 16520382, 12876277]
Of these, 5 papers contain at least one coprecipitation study; we use these as the initial basis of this small-scale study.
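The selection described above amounts to two filters over the datum table. The following is a minimal sketch, assuming each datum is a record with hypothetical `pmid`, `subject` and `assay` fields (the actual PL Datum schema is not shown here) and assuming a paper qualifies if it has any target-subject datum and any coprecipitation datum. The toy records are made up for illustration and are not the study data.

```python
from collections import defaultdict

# Proteins of interest, as named in the text.
TARGET_SUBJECTS = {"Hras", "Braf", "Raf1", "Rac1"}


def select_papers(datums, assay="coprecipitation"):
    """Return sorted PMIDs that have a target-subject datum and at least
    one datum of the given assay type.

    `datums` is an iterable of dicts with the hypothetical keys
    'pmid', 'subject' and 'assay'.
    """
    assays_by_pmid = defaultdict(set)
    has_target = set()
    for d in datums:
        assays_by_pmid[d["pmid"]].add(d["assay"])
        if d["subject"] in TARGET_SUBJECTS:
            has_target.add(d["pmid"])
    return sorted(p for p in has_target if assay in assays_by_pmid[p])


# Made-up toy records (PMIDs 1 to 3 are placeholders, not real papers).
toy = [
    {"pmid": 1, "subject": "Hras", "assay": "coprecipitation"},
    {"pmid": 2, "subject": "Braf", "assay": "kinase assay"},
    {"pmid": 3, "subject": "Egfr", "assay": "coprecipitation"},
]
selected = select_papers(toy)
print(selected)
```

Only the first toy paper passes both filters: the second has no coprecipitation datum and the third has no target subject.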
Click for ZIP file of PDF, PMC XML + TXT files
We used the ORCA codes that are currently in the BioScholar system (listed in this Excel spreadsheet). These consist of Anita’s original encoding and one additional code to denote a description by authors of ‘what they did’ in the execution of the experiment. The additional code is intended as a placeholder to be substituted out later.
Within the results sections of papers, we expect to find the following high level argument structure:
v2_bD_cN
v1_bR_cA
methods
v3_bD_cA
v2_bD_cN
I worked through the results sections of all files and marked up, with ORCA codes, every element of the experimental narrative that directly pertains to experimental results from within the paper. These may be viewed (and edited) in the CMU installation of the BioScholar system: lagos.lti.cs.cmu.edu:8080/bioscholar/digLib.jsp
Use the search button marked “?” at the bottom of the article list and then select orca-ex from the Fragment pane to see the annotations in the tool.
We dumped these to brat to show them as isolated text annotations: brat_orcaFiles.tar.gz. These are also available for viewing on the CMU system:
http://lagos.lti.cs.cmu.edu/~gully/brat/#
After feedback from Anita, we changed the codes from orca codes to ‘epistemic segment types’, which more accurately describe the requirements. We switched these over in the database very easily using MySQL’s REPLACE() string function:
-- Rename the fragment type in both tables.
UPDATE FTDFragment SET frgType = REPLACE(frgType, 'orca-ex', 'epistSeg');
UPDATE ViewTable SET indexTuple = REPLACE(indexTuple, 'orca-ex', 'epistSeg');
-- Map each old code in FTDFragmentBlock to its epistemic segment type.
UPDATE FTDFragmentBlock SET code = REPLACE(code, 'implication: v2_bD_sA', 'implication');
UPDATE FTDFragmentBlock SET code = REPLACE(code, 'v1_bD_sN', 'goal');
UPDATE FTDFragmentBlock SET code = REPLACE(code, 'cited-result: v3_bD_sN', 'other-result');
UPDATE FTDFragmentBlock SET code = REPLACE(code, 'hypothesis: v1_b0_sA', 'hypothesis');
UPDATE FTDFragmentBlock SET code = REPLACE(code, 'result: v3_bD_sA', 'result');
UPDATE FTDFragmentBlock SET code = REPLACE(code, 'v1_bR_sA', 'goal');
UPDATE FTDFragmentBlock SET code = REPLACE(code, 'v1_bD_sA', 'goal');
UPDATE FTDFragmentBlock SET code = REPLACE(code, 'fact: v3_b0_s0', 'fact');
UPDATE FTDFragmentBlock SET code = REPLACE(code, 'cited-implication: v2_bD_', 'other-implication');
UPDATE FTDFragmentBlock SET code = REPLACE(code, 'v2_bR_sA', 'fact');
UPDATE FTDFragmentBlock SET code = REPLACE(code, 'cited-hypothesis\': v1_b0_', 'other-hypothesis');
UPDATE FTDFragmentBlock SET code = REPLACE(code, 'problem: v0_b0_s0', 'problem');
-- Apply the same mapping to the index tuples in ViewTable.
UPDATE ViewTable SET indexTuple = REPLACE(indexTuple, 'implication: v2_bD_sA', 'implication');
UPDATE ViewTable SET indexTuple = REPLACE(indexTuple, 'v1_bD_sN', 'goal');
UPDATE ViewTable SET indexTuple = REPLACE(indexTuple, 'cited-result: v3_bD_sN', 'other-result');
UPDATE ViewTable SET indexTuple = REPLACE(indexTuple, 'hypothesis: v1_b0_sA', 'hypothesis');
UPDATE ViewTable SET indexTuple = REPLACE(indexTuple, 'result: v3_bD_sA', 'result');
UPDATE ViewTable SET indexTuple = REPLACE(indexTuple, 'v1_bR_sA', 'goal');
UPDATE ViewTable SET indexTuple = REPLACE(indexTuple, 'v1_bD_sA', 'goal');
UPDATE ViewTable SET indexTuple = REPLACE(indexTuple, 'fact: v3_b0_s0', 'fact');
UPDATE ViewTable SET indexTuple = REPLACE(indexTuple, 'cited-implication: v2_bD_', 'other-implication');
UPDATE ViewTable SET indexTuple = REPLACE(indexTuple, 'v2_bR_sA', 'fact');
UPDATE ViewTable SET indexTuple = REPLACE(indexTuple, 'cited-hypothesis\': v1_b0_', 'other-hypothesis');
UPDATE ViewTable SET indexTuple = REPLACE(indexTuple, 'problem: v0_b0_s0', 'problem');
This maps all previous codes to their new values.
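The same remapping can be driven from a single ordered list of (old, new) pairs rather than one hand-written statement per code and table. The sketch below is an illustration only: it uses Python's built-in `sqlite3` module with an in-memory database as a stand-in for the MySQL database (SQLite also provides a `REPLACE(X, Y, Z)` string function), the helper name `remap_codes` is hypothetical, and only a subset of the codes is shown.

```python
import sqlite3

# Ordered (old, new) pairs; a subset of the mapping listed in the text.
CODE_MAP = [
    ("implication: v2_bD_sA", "implication"),
    ("result: v3_bD_sA", "result"),
    ("cited-result: v3_bD_sN", "other-result"),
    ("v1_bD_sA", "goal"),
]


def remap_codes(conn, table, column, code_map):
    """Run UPDATE ... SET column = REPLACE(column, old, new) for each pair, in order."""
    for old, new in code_map:
        # Table and column names cannot be bound as parameters, so they are
        # interpolated; they must come from trusted code, never from user input.
        conn.execute(
            f"UPDATE {table} SET {column} = REPLACE({column}, ?, ?)",
            (old, new),
        )
    conn.commit()


# In-memory stand-in for the real table, holding three old codes.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE FTDFragmentBlock (code TEXT)")
conn.executemany(
    "INSERT INTO FTDFragmentBlock VALUES (?)",
    [("result: v3_bD_sA",), ("cited-result: v3_bD_sN",), ("v1_bD_sA",)],
)
remap_codes(conn, "FTDFragmentBlock", "code", CODE_MAP)
codes = [
    row[0]
    for row in conn.execute("SELECT code FROM FTDFragmentBlock ORDER BY rowid")
]
print(codes)
```

Because REPLACE matches substrings, the order of the pairs can matter whenever one old code is contained in another, so keeping the mapping in one ordered list makes that order explicit.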
Outcome: an automatic system that identifies experimental passages in the text, i.e., given a collection of open access articles, we want a system that generates a table with the following columns:
Milestones + Tasks: (broken down as monthly staging points working backwards from the final outcome). This is structured “We need X … To accomplish this, we will do Y …”.
6 month milestone (Jul 2015): We need to have the full table as shown above for all available XML documents (column 8 is optional depending on how we work with image analyzers).
5 month milestone (Jun 2015): We need to have the experiments complete and performing at F-Score > 0.9 and the experiment identification algorithm finished.
4 month milestone (May 2015): We need to be on track with an effective experimental process for extraction that is working and in use, based on the gold-standard epistemic statement type annotations.
3 month milestone (Apr 2015): We need to have a well-defined gold-standard training set with annotations for (A) epistemic statement types and (B) experiment labels, with inter-annotator agreement data to confirm its validity.
2 month milestone (Mar 2015): We need to have an effective working process for creating annotations for epistemic statement types and experiment codes with automated computation of inter-annotator agreement.
1 month milestone (Feb 2015): We need a practical annotation tool with inter-annotator functionality + prototype experimental methods for IE and detecting experimental labels.
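Two of the quantities named in these milestones, inter-annotator agreement and F-Score, can be computed directly from per-segment labels. The following is a minimal sketch under stated assumptions: the plan does not say which agreement measure is intended, so Cohen's kappa for two annotators is used here as one common choice, and "F-Score" is taken to mean the balanced F1 for a single label. The function names and the toy labels are hypothetical.

```python
from collections import Counter


def cohen_kappa(labels_a, labels_b):
    """Cohen's kappa for two annotators labelling the same segments."""
    if len(labels_a) != len(labels_b) or not labels_a:
        raise ValueError("need two equal-length, non-empty label lists")
    n = len(labels_a)
    # Observed agreement: fraction of segments given the same label.
    observed = sum(a == b for a, b in zip(labels_a, labels_b)) / n
    # Chance agreement from each annotator's label frequencies.
    freq_a, freq_b = Counter(labels_a), Counter(labels_b)
    expected = sum(freq_a[k] * freq_b[k] for k in freq_a) / (n * n)
    if expected == 1.0:
        return 1.0  # both annotators used one identical label throughout
    return (observed - expected) / (1 - expected)


def f_score(gold, predicted, label):
    """F1 for one label: harmonic mean of precision and recall."""
    pairs = list(zip(gold, predicted))
    tp = sum(g == label and p == label for g, p in pairs)
    fp = sum(g != label and p == label for g, p in pairs)
    fn = sum(g == label and p != label for g, p in pairs)
    if tp == 0:
        return 0.0
    precision = tp / (tp + fp)
    recall = tp / (tp + fn)
    return 2 * precision * recall / (precision + recall)


# Toy labels for four segments (made up for illustration).
annotator_a = ["result", "result", "goal", "fact"]
annotator_b = ["result", "goal", "goal", "fact"]
kappa = cohen_kappa(annotator_a, annotator_b)
f1_result = f_score(annotator_a, annotator_b, "result")
print(round(kappa, 3), round(f1_result, 3))
```

In this sketch the first annotator is treated as the gold standard when computing F1. A system evaluated against the 0.9 threshold would be compared with the agreed gold-standard set in the same way.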