The SciKnowMine Triage Application

We present here a user manual for running and maintaining a web-based system for performing document triage on a corpus of PDF files. It describes how to install, run and maintain the system.

  1. Installation Manual
  2. System Organization
  3. Command Line - Set up
  4. Command Line - Working with Data
  5. Command Line - Reporting Functions
  6. Command Line - Deleting Data
  7. Command Line - Machine Learning
  8. Command Line Tools - Running Experiments
  9. Web Application - Running the System
  10. Web Application - Extracting text using LAPDF-Text
  11. Web Application - Performing the triage task
  12. Web Application - The Base Digital Library

5. Command Line - Reporting Functions

The system provides three command-line reporting functions that let an administrator query the state of the system.

The reportCorpusCounts command returns a formatted count of the contents of each target and triage corpus.

reportCorpusCounts -db DBNAME -l LOGIN -p PASSWD 

 -db DBNAME            : Database name
 -l LOGIN              : Database login
 -p PASSWD             : Database password
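
As an illustration, a concrete invocation might look like the sketch below. The database name, the login and the SKM_PASSWD environment variable are hypothetical placeholders, and the script assumes that reportCorpusCounts can be found on the PATH, which this manual does not state. It prints a masked preview of the command before running it.

```shell
#!/bin/sh
# Hypothetical placeholder values; substitute your own database settings.
DBNAME="triage_db"
LOGIN="triage_admin"
# Assumption: the password is supplied through an environment variable
# (SKM_PASSWD is an invented name), with a dummy fallback for illustration.
PASSWD="${SKM_PASSWD:-changeme}"

# Build a preview of the command with the password masked, so it is safe to log.
COUNTS_PREVIEW="reportCorpusCounts -db $DBNAME -l $LOGIN -p ********"
echo "$COUNTS_PREVIEW"

# Run the report only if the command can be found (assumed to be on PATH).
if command -v reportCorpusCounts >/dev/null 2>&1; then
    reportCorpusCounts -db "$DBNAME" -l "$LOGIN" -p "$PASSWD"
else
    echo "reportCorpusCounts not found on PATH" >&2
fi
```

Reading the password from the environment keeps it out of the script file itself; how the command is actually launched in your installation may differ.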

The reportTargetCorpusContents command returns a formatted list of all the documents in a given target corpus.

reportTargetCorpusContents  -db DBNAME -l LOGIN -p PASSWD -targetCorpus CNAME

 -db DBNAME          : Database name
 -l LOGIN            : Database login
 -p PASSWD           : Database password
 -targetCorpus CNAME : Target Corpus Name

The reportTriageCorpusContents command returns a formatted list of all the documents in a given triage corpus. The triage corpus is identified together with the target corpus it relates to.

reportTriageCorpusContents -db DBNAME -l LOGIN -p PASSWD -targetCorpus CNAME -triageCorpus CNAME

 -db DBNAME          : Database name
 -l LOGIN            : Database login
 -p PASSWD           : Database password
 -targetCorpus CNAME : Target Corpus Name
 -triageCorpus CNAME : Triage Corpus Name
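
The three reporting commands share the same connection options and differ only in the corpus options they add. The sketch below makes that pattern explicit with a small helper that assembles each command line and prints it as a dry run. The helper name report_cmd and all values (database, login, password, corpus names) are hypothetical and do not come from the system itself.

```shell
#!/bin/sh
# Hypothetical placeholder values; none of these names come from the manual.
DBNAME="triage_db"
LOGIN="triage_admin"
PASSWD="changeme"
TARGET="myTargetCorpus"
TRIAGE="myTriageCorpus"

# Assemble the full command line for one reporting command and print it.
# Usage: report_cmd COMMAND [corpus options...]
report_cmd() {
    cmd="$1"
    shift
    # Shared connection options first, then any command-specific options.
    set -- "$cmd" -db "$DBNAME" -l "$LOGIN" -p "$PASSWD" "$@"
    echo "$*"
}

# Dry run: print the three commands in the order they are documented.
report_cmd reportCorpusCounts
report_cmd reportTargetCorpusContents -targetCorpus "$TARGET"
report_cmd reportTriageCorpusContents -targetCorpus "$TARGET" -triageCorpus "$TRIAGE"
```

To execute the commands instead of previewing them, the echo line in report_cmd could be replaced with "$@", assuming the reporting commands are available on the PATH.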