BioNLP and Text Mining Lab

RLIMS-P: rule-based literature mining system for protein phosphorylation

RLIMS-P is a rule-based text-mining system designed to extract protein phosphorylation information (protein kinases, substrates, and phosphorylation sites) from biomedical literature. Its modules include (i) natural language processing (NLP) modules that annotate the input text, and (ii) an information extraction (IE) engine that applies a collection of phrasal patterns to extract protein phosphorylation information from the annotated text. To extend the system to other post-translational modification (PTM) types, and thus to accommodate new sets of phrasal patterns, the IE engine has been enhanced to ease the development and maintenance of pattern collections.
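To give a flavor of what a phrasal pattern does, here is a minimal Python sketch. It is not RLIMS-P's actual pattern language or engine: the single pattern, the `PATTERN` name, and the `extract_phosphorylation` function are hypothetical, and the sketch matches raw text with a regular expression, whereas RLIMS-P applies its patterns to NLP-annotated text.

```python
import re

# Hypothetical phrasal pattern (illustration only):
#   "<KINASE> phosphorylates <SUBSTRATE> [at|on <SITE>]"
# Protein names are approximated as alphanumeric tokens, and sites as
# Ser/Thr/Tyr followed by a residue number. Both are simplifying assumptions.
PATTERN = re.compile(
    r"(?P<kinase>[A-Za-z0-9-]+) phosphorylates (?P<substrate>[A-Za-z0-9-]+)"
    r"(?: (?:at|on) (?P<site>(?:Ser|Thr|Tyr)-?\d+))?"
)


def extract_phosphorylation(sentence):
    """Return one dict per match with kinase, substrate, and site.

    The site value is None when the sentence does not mention one.
    """
    return [match.groupdict() for match in PATTERN.finditer(sentence)]


if __name__ == "__main__":
    print(extract_phosphorylation("PKA phosphorylates CREB at Ser133."))
    print(extract_phosphorylation("Akt phosphorylates GSK3."))
```

A real pattern collection would need many such patterns (passive voice, nominalizations such as "phosphorylation of X by Y", and so on), which is why ease of pattern development and maintenance matters.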

eFIP: extracting functional impact of phosphorylation

eFIP (Extracting Functional Impact of Phosphorylation) is a text mining system that extracts from literature the possible impact of phosphorylation on a phosphoprotein's interactions with other proteins [Tudor et al., 2012; Arigh et al., 2012]. The online system mines the protein-protein interactions (PPIs) of a phosphoprotein, allowing biologists to query the database by a set of PMIDs or protein names. eFIP is now being generalized to additionally identify the impact of phosphorylation on a phosphoprotein's subcellular localizations from the literature.

eGIFT: extracting gene information from text

eGIFT (Extracting Gene Information From Text) identifies terms and documents that are relevant to a gene and its products. Additional functionalities of eGIFT include:

finding terms in documents for a group of genes
finding genes sharing a specific term
finding related terms and related genes

iSimp: a sentence simplification system for biomedical text

A challenge in designing and applying NLP systems to biomedical text is the complexity of sentences. One way to alleviate this problem is to simplify the sentences. We developed iSimp, which reduces the syntactic complexity of sentences and thereby improves the performance of NLP systems (e.g., relation extraction systems) [Peng et al., 2012]. To make iSimp readily usable in NLP and text mining tools, we participated in the BioCreative IV BioC track and adopted the BioC format, a simple XML format for sharing text documents and annotations [Comeau et al., 2013]. The Java API developed as part of the iSimp project is now part of the public release of the BioC package.

iXtractR

Text mining is increasingly used in the biomedical domain because of its ability to automatically gather information from large numbers of scientific articles. One important task in biomedical text mining is relation extraction, which aims to identify designated relations among biological entities reported in the literature. Here, we report a novel framework to facilitate the development of pattern-based biomedical relation extraction systems. iXtractR ("I eXtract Relations") is an implementation of this framework. It is a web service designed to detect the following types of relations/events: Gene_expression, Transcription, Localization, Phosphorylation, Protein_catabolism, and Binding.

eMiRIT

eMiRIT (Extracting MicroRNA Information from Text) identifies terms and documents that are relevant to a microRNA (miR). The miRs in eMiRIT are species-independent. The literature for many species-specific miRs is sparse, and because eMiRIT uses a frequency-based approach, the results will be misleading if too few documents are used. The core properties of a miR are likely to be common to many species, and these will be captured as top-ranking terms in a species-independent approach.
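The reasoning above can be illustrated with a small Python sketch. It is not eMiRIT's actual scoring method: the document-frequency score, the `min_docs` threshold, and the function names are assumptions chosen for illustration. The sketch ranks terms by how much more often they appear in documents about a miR than in background documents, and refuses to rank when too few documents are available, which is the situation that pooling literature across species is meant to avoid.

```python
from collections import Counter


def doc_frequency(docs):
    """Fraction of documents in which each term occurs at least once."""
    counts = Counter()
    for doc in docs:
        # Count each term once per document (whitespace tokenization is a
        # simplifying assumption).
        counts.update(set(doc.lower().split()))
    return {term: n / len(docs) for term, n in counts.items()}


def rank_terms(mir_docs, background_docs, min_docs=3):
    """Rank terms by foreground minus background document frequency.

    mir_docs is assumed to pool documents about the same miR from all
    species. min_docs is a hypothetical threshold below which frequency
    estimates are treated as unreliable.
    """
    if len(mir_docs) < min_docs:
        raise ValueError("too few documents for a reliable frequency estimate")
    foreground = doc_frequency(mir_docs)
    background = doc_frequency(background_docs)
    scores = {t: f - background.get(t, 0.0) for t, f in foreground.items()}
    # Highest score first; ties are broken alphabetically for stable output.
    return sorted(scores, key=lambda t: (-scores[t], t))


if __name__ == "__main__":
    pooled = [
        "mir-21 promotes apoptosis resistance",   # e.g. a human study
        "mir-21 regulates apoptosis",             # e.g. a mouse study
        "mir-21 apoptosis tumor",
    ]
    background = ["gene expression tumor", "protein binding assay", "cell expression"]
    print(rank_terms(pooled, background))
```

With only one or two documents per species, every term in those documents would score similarly and the ranking would say little; pooling gives terms shared across species a chance to rise to the top.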