The SciKnowMine Triage Application

We present here a user manual for running and maintaining a web-based system for peforming document triage given a corpus of PDF files. We will describe processes for installation, execution and maintenance of the system.

  1. Installation Manual
  2. System Organization
  3. Command Line - Set up
  4. Command Line - Working with Data
  5. Command Line - Reporting Functions
  6. Command Line - Deleting Data
  7. Command Line - Machine Learning
  8. Command Line Tools - Running Experiments
  9. Web Application - Running the System
  10. Web Application - Extracting text using LAPDF-Text
  11. Web Application - Performing the triage task
  12. Web Application - The Base Digital Library

3. Command Line Tools - Set Up

Once installed, the system permits two modes of use: (A) using command-line tools to administer the server, and upload/process scientific articles into corpora and (B) use the web interface to assign articles to specific corpora.

Setting up the swfTools directory

The system uses the pdf2swf command to convert PDF files to the SWF file for displaying in FlexPaper (http://flexpaper.devaldi.com/). We therefore have to identify to the system where to find the pdf2swf executable with the following command.

setSwfToolsBinDirectory </path/to/pdf2swf/executable>

Building a triage database

This step has to be executed only once.

The triage system stores all of its data in a MySQL database. This includes articles content, classification codes, and how they are organized into collections (corpora). Before using the triage system you need to create a trage database using one of the triage commands pre-installed in your system.

In order to execute this command you must select a name and use a suitable login name and password for existing user with suitable permissions.

buildTriageDatabase -db <name-of-database> -l <login> -p <password>        
 -db DBNAME : Database name
 -l LOGIN   : Database login
 -p PASSWD  : Database password

Destroying a triage database

Removing a database from the system involves a similar command to the one creating it.

destroyTriageDatabase -db <name-of-database> -l <login> -p <password>        
 -db DBNAME : Database name
 -l LOGIN   : Database login
 -p PASSWD  : Database password