RLIMS-P is a rule-based text-mining
system specifically designed to extract protein
phosphorylation information on protein kinases, substrates,
and phosphorylation sites from biomedical literature. It
consists of several modules, including (i) natural
language processing (NLP) modules that annotate input text,
and (ii) an information extraction (IE) engine that applies a
collection of phrasal patterns to extract protein
phosphorylation information from the input text. To extend the
system to other post-translational modification (PTM) types, and thus to accommodate
new sets of phrasal patterns, the IE engine
has been enhanced to ease the development and maintenance
of pattern collections.
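To illustrate what a phrasal pattern does, here is a minimal sketch that maps two sentence shapes onto kinase, substrate, and site slots. It assumes plain regular expressions over raw text with single-token protein names. RLIMS-P's own patterns operate on NLP-annotated text, so the `PhosphoEvent` record, the patterns, and the `extract` function are all hypothetical.

```python
import re
from dataclasses import dataclass
from typing import List, Optional


@dataclass
class PhosphoEvent:
    kinase: Optional[str]   # None when no kinase is mentioned
    substrate: str
    site: Optional[str]     # e.g. "Ser9"; None when no site is mentioned


# Hypothetical phrasal patterns over raw text (not RLIMS-P's real patterns).
# Residues are assumed to look like "Ser9" or "Tyr-705".
SITE = r"(?P<site>(?:Ser|Thr|Tyr)-?\d+)"
PATTERNS = [
    # "<KINASE> phosphorylates <SUBSTRATE> [at|on <SITE>]"
    re.compile(r"(?P<kinase>\w+) phosphorylates (?P<substrate>\w+)"
               r"(?: (?:at|on) " + SITE + r")?"),
    # "phosphorylation of <SUBSTRATE> [at|on <SITE>] [by <KINASE>]"
    re.compile(r"phosphorylation of (?P<substrate>\w+)"
               r"(?: (?:at|on) " + SITE + r")?(?: by (?P<kinase>\w+))?"),
]


def extract(sentence: str) -> List[PhosphoEvent]:
    """Apply every pattern and collect one event per match."""
    events = []
    for pattern in PATTERNS:
        for m in pattern.finditer(sentence):
            events.append(PhosphoEvent(m.group("kinase"),
                                       m.group("substrate"),
                                       m.group("site")))
    return events
```

For example, `extract("Akt phosphorylates GSK3 at Ser9.")` fills all three slots. With a passive-style sentence such as "phosphorylation of p53 on Ser15 by ATM", the second pattern recovers the kinase from the trailing `by` phrase. Keeping each pattern as a separate, named entry is the kind of organization that makes a pattern collection easier to extend when a new PTM type is added.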
eFIP (Extracting Functional Impact
of Phosphorylation) is a text mining system that extracts
from literature the possible impact of phosphorylation on a
phosphoprotein's interactions with other
proteins [Tudor et al., 2012; Arighi et al., 2012]. The
online system mines the protein-protein interactions (PPIs)
of a phosphoprotein, allowing biologists to query the
database by a set of PMIDs or protein names. eFIP is now
being generalized to additionally identify the impact of
phosphorylation on a phosphoprotein's
subcellular localizations from the literature.
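The idea of linking a phosphorylation mention to its effect on an interaction can be sketched with a simple sentence-level rule. A sentence qualifies only if it mentions phosphorylation, mentions an interaction, and contains a word signalling a direction of change. This is a hypothetical keyword approach for illustration. The word lists and the `impact_of_phosphorylation` function are assumptions and do not reproduce eFIP's rules.

```python
import re
from typing import Optional

# Hypothetical keyword lists; not eFIP's actual rules.
PHOSPHO = re.compile(r"\bphosphorylat\w*", re.IGNORECASE)
INTERACTION = re.compile(r"\b(binding|interaction|association|complex)\b",
                         re.IGNORECASE)
IMPACT = {
    "increase": re.compile(r"\b(enhances?|increases?|promotes?|induces?)\b",
                           re.IGNORECASE),
    "decrease": re.compile(r"\b(inhibits?|reduces?|decreases?|abolishes?|disrupts?)\b",
                           re.IGNORECASE),
}


def impact_of_phosphorylation(sentence: str) -> Optional[str]:
    """Return 'increase' or 'decrease', or None when the sentence lacks a
    phosphorylation mention, an interaction mention, or an impact word."""
    if not (PHOSPHO.search(sentence) and INTERACTION.search(sentence)):
        return None
    # The first impact label whose keywords appear wins.
    for label, pattern in IMPACT.items():
        if pattern.search(sentence):
            return label
    return None
```

A sentence such as "Phosphorylation of BAD at Ser136 promotes its binding to 14-3-3." would be labelled `increase`, and a sentence with no phosphorylation mention returns `None`. Subcellular localization could be handled in the same way by swapping the interaction keywords for location terms.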
eGIFT
(Extracting Gene Information From Text) identifies terms
and documents that are relevant to a gene and its products.
Additional functionalities of eGIFT include:
- finding terms in documents for a group of genes
- finding genes sharing a specific term
- finding related terms and related genes
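The notion of a term being "relevant to a gene" can be sketched by comparing how often the term appears in that gene's documents with how often it appears in a background set. The sketch below uses a simple smoothed ratio of document fractions and also shows the reverse lookup of genes that share a term. The scoring formula, the `smoothing` parameter, and all function names are assumptions for illustration, not eGIFT's published score.

```python
from typing import Dict, List, Set


def doc_fraction(term: str, docs: List[Set[str]]) -> float:
    """Fraction of documents (each given as a set of terms) containing `term`."""
    return sum(term in d for d in docs) / len(docs) if docs else 0.0


def rank_terms(gene_docs: List[Set[str]], background_docs: List[Set[str]],
               smoothing: float = 0.01) -> List[str]:
    """Rank terms by how much more often they occur in the gene's documents
    than in the background (a smoothed ratio, not eGIFT's actual score)."""
    terms = set().union(*gene_docs)
    scores = {t: doc_fraction(t, gene_docs)
              / (doc_fraction(t, background_docs) + smoothing)
              for t in terms}
    # Highest score first; ties broken alphabetically for a stable order.
    return sorted(scores, key=lambda t: (-scores[t], t))


def genes_sharing_term(term: str, gene_to_terms: Dict[str, Set[str]]) -> List[str]:
    """Genes whose term sets contain `term`, sorted by name."""
    return sorted(g for g, ts in gene_to_terms.items() if term in ts)
```

A term that appears in every document for a gene but rarely in the background rises to the top. A term that is common everywhere, such as "cell", sinks even if it is frequent for the gene.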
A challenge in designing and applying NLP
systems to biomedical text is the complexity of its sentences.
One possible way to alleviate this problem is to
simplify the sentences. We developed iSimp, which
reduces the syntactic complexity of sentences and thereby improves
the performance of NLP systems (e.g., relation extraction
systems) [Peng et al., 2012]. To make iSimp readily usable
in NLP and text mining tools, we participated in the BioCreative IV
BioC track and adopted the BioC format, a simple XML format
for sharing text documents and annotations [Comeau et al.,
2013]. The Java API developed as part of the iSimp project
has become part of the public release of the BioC package.
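BioC represents a collection of documents. Each document is split into passages that carry text and stand-off annotations, which point back into the text by character offset and length. The sketch below builds such a file with Python's standard `xml.etree.ElementTree`, not with the BioC Java API. The helper `make_bioc`, the placeholder source/date/key values, and the `protein` annotation type are illustrative assumptions. The element names follow the BioC collection/document/passage/annotation layout, but the output has not been validated against the BioC DTD.

```python
import xml.etree.ElementTree as ET
from typing import List, Tuple


def make_bioc(doc_id: str, text: str,
              annotations: List[Tuple[str, str, int, int]]) -> str:
    """Build a one-document, one-passage BioC-style collection.

    `annotations` holds (annotation id, type, offset, length) tuples, with
    offsets counted in characters from the start of `text`.
    """
    collection = ET.Element("collection")
    # Placeholder collection metadata.
    ET.SubElement(collection, "source").text = "example"
    ET.SubElement(collection, "date").text = "20130101"
    ET.SubElement(collection, "key").text = "example.key"
    document = ET.SubElement(collection, "document")
    ET.SubElement(document, "id").text = doc_id
    passage = ET.SubElement(document, "passage")
    ET.SubElement(passage, "offset").text = "0"
    ET.SubElement(passage, "text").text = text
    for ann_id, ann_type, offset, length in annotations:
        ann = ET.SubElement(passage, "annotation", id=ann_id)
        ET.SubElement(ann, "infon", key="type").text = ann_type
        ET.SubElement(ann, "location", offset=str(offset), length=str(length))
        # Copy the covered span so the annotation is readable on its own.
        ET.SubElement(ann, "text").text = text[offset:offset + length]
    return ET.tostring(collection, encoding="unicode")
```

For instance, `make_bioc("PMID-1", "Akt phosphorylates GSK3.", [("T1", "protein", 0, 3), ("T2", "protein", 19, 4)])` yields two annotations whose text spans are "Akt" and "GSK3". Because annotations are stand-off, a tool such as a sentence simplifier can add its own annotation layer without altering the original passage text.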
Text mining is increasingly used in the biomedical domain
because of its ability to automatically gather information
from large numbers of scientific articles. One important task
in biomedical text mining is relation extraction, which aims
to identify designated relations among biological entities
reported in literature. Here, we report a novel framework to
facilitate the development of a pattern-based biomedical relation
extraction system. iXtractR ("I eXtract Relations") is an
implementation of this framework. It is a web service designed
to detect several types of relations/events: Gene_expression,
Transcription, Localization, Phosphorylation, Protein_catabolism,
and Binding.
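A common first step in pattern-based event extraction is to spot trigger words that signal each event type. The sketch below does this with a small lexicon for the six event types listed above. The trigger words, `TRIGGERS`, and `detect_triggers` are hypothetical. iXtractR's actual patterns also connect triggers to their protein arguments, which this sketch omits.

```python
import re
from typing import Dict, List, Pattern, Tuple

# Hypothetical trigger lexicon for the event types iXtractR reports.
TRIGGERS: Dict[str, List[str]] = {
    "Gene_expression": ["expression", "expressed", "overexpression"],
    "Transcription": ["transcription", "transcribed"],
    "Localization": ["translocation", "localization", "secretion"],
    "Phosphorylation": ["phosphorylation", "phosphorylates", "phosphorylated"],
    "Protein_catabolism": ["degradation", "degraded", "proteolysis"],
    "Binding": ["binds", "binding", "interacts", "complex"],
}

# One whole-word, case-insensitive regex per event type.
TRIGGER_RE: Dict[str, Pattern[str]] = {
    event: re.compile(r"\b(" + "|".join(words) + r")\b", re.IGNORECASE)
    for event, words in TRIGGERS.items()
}


def detect_triggers(sentence: str) -> List[Tuple[str, str, int]]:
    """Return (event type, trigger text, character offset), ordered by offset."""
    hits = []
    for event_type, pattern in TRIGGER_RE.items():
        for m in pattern.finditer(sentence):
            hits.append((event_type, m.group(0), m.start()))
    return sorted(hits, key=lambda h: h[2])
```

For "TNF induces phosphorylation and degradation of IkB.", the sketch reports a Phosphorylation trigger followed by a Protein_catabolism trigger. A fuller pattern would then attach IkB as the theme of both events.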
eMiRIT (Extracting MicroRNA Information from Text) identifies
terms and documents that are relevant to a microRNA (miR). The
miRs in eMiRIT are treated as species-independent. The literature for many
species-specific miRs is sparse, and because eMiRIT uses a
frequency-based approach, its results would be misleading if too
few documents were used. The core properties of a miR are likely
to be common to many species, and these will be captured as
top-ranking terms in a species-independent approach.
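One way to realize the species-independent view is to strip the species code from each miR mention and pool the documents of all species under a single name. The sketch below does this under an assumption about naming: it expects miRBase-style names in which a three-letter species prefix (e.g. `hsa-`, `mmu-`) precedes the mature name. The regular expression, `mirs_in`, and `pool_documents` are hypothetical and are not eMiRIT's actual normalization rules.

```python
import re
from collections import defaultdict
from typing import Dict, Iterable, Set, Tuple

# Assumed naming: optional 3-letter species prefix, then "miR-<number>",
# an optional letter suffix, and an optional arm suffix ("-5p" / "-3p").
MIR_RE = re.compile(r"\b(?:[a-z]{3}-)?(miR-\d+[a-z]?(?:-[35]p)?)\b")


def mirs_in(text: str) -> Set[str]:
    """Species-independent miR names mentioned in `text`."""
    return {m.group(1) for m in MIR_RE.finditer(text)}


def pool_documents(docs: Iterable[Tuple[str, str]]) -> Dict[str, Set[str]]:
    """Map each species-independent miR name to the IDs of documents
    mentioning it in any species."""
    index: Dict[str, Set[str]] = defaultdict(set)
    for doc_id, text in docs:
        for mir in mirs_in(text):
            index[mir].add(doc_id)
    return dict(index)
```

With this pooling, a human paper on `hsa-miR-21` and a mouse paper on `mmu-miR-21` both count toward `miR-21`. The pooled document set is larger than either species-specific set alone, which is what a frequency-based ranking needs.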