The SciKnowMine Triage Application

12. Web Application - The Base Digital Library

Once the documents have been selected for inclusion in a particular corpus, then more detailed information extraction and annotation can be performed. This work is still at an early stage but is intended as a core functionality for the SciKnowMine system. In particular, we anticipate using Information Extraction to locate and extract passages of interest within the system.

The screenshot shows a specific target corpus selected for the MGI-driven use-case: 'Allele Phenotype'. The user may annotate passages in the text by clicking, dragging and releasing the mouse over the document. These two selected passages shown are the main claims of the paper shown. In particular, we anticipate the use of the BioC annotation standard to support sharing and structuring these fragment elements.

By focusing on PDF files as a common de-facto publication standard that most scientists are comfortable using, we hope to develop knowledge engineering and text mining applications that scientists find intuitively useful in the execution of their day-to-day work. We also intend that our tools are portable and modular so that they may be incorporated into other developers' systems.