Every scientist must grapple with the biomedical literature, yet relatively few practical tools support the construction of knowledge bases from published papers. BioScholar is designed to be a practical framework that a graduate student could use to track information in the literature and build a database of known facts for their subject. It is the first system to implement the 'KEfED' model.
Even though text mining is an active research field in biomedicine ('BioNLP'), relatively few practical systems are available that bioinformatics developers and biocurators can easily install and use. This initial implementation extends our basic digital library application to provide a 'triage' function (identifying documents of interest to a specific database) using supervised machine learning techniques.
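To make the idea of triage concrete, the following is a minimal sketch of a supervised relevance classifier, here a multinomial Naive Bayes model with add-one smoothing. It is an illustration only and is not taken from the system itself: the class name `TriageSketch`, its methods, the tokenizer, and the example sentences are all hypothetical, and Naive Bayes is assumed purely for simplicity, since the description above does not say which learning technique is used.

```java
import java.util.HashMap;
import java.util.Map;

// Hypothetical illustration of document triage; not the project's actual classifier.
class TriageSketch {
    // Per-word counts for each class: index 0 = not relevant, index 1 = relevant.
    private final Map<String, int[]> wordCounts = new HashMap<>();
    private final int[] docCounts = new int[2];
    private final int[] totalWords = new int[2];

    // Lowercase the text and split on anything that is not a letter or digit.
    private static String[] tokenize(String text) {
        return text.toLowerCase().split("[^a-z0-9]+");
    }

    // Add one labeled document (e.g. a title or abstract) to the training counts.
    public void train(String text, boolean relevant) {
        int c = relevant ? 1 : 0;
        docCounts[c]++;
        for (String w : tokenize(text)) {
            if (w.isEmpty()) continue;
            wordCounts.computeIfAbsent(w, k -> new int[2])[c]++;
            totalWords[c]++;
        }
    }

    // Log of (smoothed class prior * smoothed word likelihoods) for class c.
    private double logScore(String text, int c) {
        int totalDocs = docCounts[0] + docCounts[1];
        double score = Math.log((docCounts[c] + 1.0) / (totalDocs + 2.0));
        // The extra 1 reserves a vocabulary slot for words never seen in training.
        int vocab = wordCounts.size() + 1;
        for (String w : tokenize(text)) {
            if (w.isEmpty()) continue;
            int[] counts = wordCounts.get(w);
            int n = (counts == null) ? 0 : counts[c];
            score += Math.log((n + 1.0) / (totalWords[c] + vocab));
        }
        return score;
    }

    // A document passes triage if the "relevant" class scores higher.
    public boolean isRelevant(String text) {
        return logScore(text, 1) > logScore(text, 0);
    }

    public static void main(String[] args) {
        TriageSketch triage = new TriageSketch();
        // Made-up training examples for a hypothetical neuroanatomy database.
        triage.train("neuron tract tracing injection in rat brain", true);
        triage.train("anterograde tracer injection labels axon projections", true);
        triage.train("protein crystal structure of kinase", false);
        triage.train("gene expression in yeast cell cycle", false);
        System.out.println(triage.isRelevant("tracer injection in rat cortex"));
        System.out.println(triage.isRelevant("yeast gene expression kinase"));
    }
}
```

In practice a curator would supply the labeled examples by marking papers as in or out of scope for the target database, and a real system would need far more training data and a proper evaluation than this toy example suggests.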
This open source project provides a set of basic yet extensible functionality (document management, accurate PDF text extraction, annotation, triage, document clustering, etc.) that is intended to support CS text mining techniques in other groups, removing the need to reimplement these basic capabilities.
A low-level Java library for accurately extracting text from PDF files, together with additional web-based tools to improve the performance of the text extraction.
A general-purpose, ontology-aware knowledge engineering methodology for biomedical data that permits the construction of biomedical databases derived only from a description of the underlying experimental protocol used to gather the data.
A scaffolding set of Java libraries that permits the agile development of web applications. This is the foundation for all of our software tools and permits us to develop specialized database applications for scientists rapidly and efficiently.