About: MOTIVATION: To facilitate accurate estimation of statistical significance of sequence similarity in profile–profile searches, queries should ideally correspond to protein domains. For multidomain proteins, using domains as queries depends on delineation of domain borders, which may be unknown. Thus, full-length proteins are commonly used as queries, which complicates establishing homology for similarities close to cutoff levels of statistical significance. RESULTS: In this article, we describe an iterative approach, called LAMPA, LArge Multidomain Protein Annotator, that resolves the above conundrum by gradually expanding hit coverage of multidomain proteins through re-evaluating the statistical significance of hit similarity using ever smaller queries defined at each iteration. LAMPA employs TMHMM and HHsearch for recognition of transmembrane regions and homology, respectively. We used the Pfam database to annotate 2985 multidomain proteins (polyproteins) composed of >1000 amino acid residues, which dominate proteomes of RNA viruses. Under strict cutoffs, LAMPA outperformed HHsearch-mediated runs using intact polyproteins as queries by three measures: the number of identified homologous regions, the coverage by those regions, and the number of hit Pfam profiles. Compared to HHsearch, LAMPA identified 507 extra homologous regions in 14.4% of polyproteins. This Pfam-based annotation of RNA virus polyproteins by LAMPA was also superior to RefSeq expert annotation by two measures, region number and annotated length, for 69.3% of RNA virus polyprotein entries. We rationalized these results in computational experiments that examined how the statistical significance HHsearch assigns to a local alignment similarity score depends on the lengths and diversities of query-target pairs. AVAILABILITY AND IMPLEMENTATION: The LAMPA 1.0.0 R package is available on GitHub (https://github.com/Gorbalenya-Lab/LAMPA). SUPPLEMENTARY INFORMATION: Supplementary data are available at Bioinformatics online.
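
The iterative idea in the abstract (search, then re-query the still-unannotated stretches as smaller queries) can be illustrated with a minimal Python sketch. This is not the LAMPA implementation, which is an R package built on HHsearch and TMHMM; the names `annotate`, `uncovered`, `toy_search` and the `min_len` parameter are hypothetical, the homology search is replaced by a toy stand-in, and the transmembrane step is omitted. The toy search only assumes that a weak domain becomes detectable once the query is short enough.

```python
from typing import Callable, List, Tuple

Region = Tuple[int, int]      # 1-based, inclusive (start, end)
Hit = Tuple[int, int, str]    # (start, end, profile name)


def uncovered(region: Region, hits: List[Hit], min_len: int) -> List[Region]:
    """Parts of `region` not covered by any hit and at least `min_len` long."""
    start, end = region
    gaps: List[Region] = []
    cursor = start
    for h_start, h_end, _ in sorted(hits):
        if h_start - cursor >= min_len:
            gaps.append((cursor, h_start - 1))
        cursor = max(cursor, h_end + 1)
    if end - cursor + 1 >= min_len:
        gaps.append((cursor, end))
    return gaps


def annotate(length: int,
             search: Callable[[Region], List[Hit]],
             min_len: int = 50) -> List[Hit]:
    """Search the whole protein, then re-search each uncovered stretch."""
    queue: List[Region] = [(1, length)]
    found: List[Hit] = []
    while queue:
        region = queue.pop(0)
        # Keep only hits that lie inside the current query region, so every
        # new query is strictly smaller than its parent and the loop ends.
        hits = [h for h in search(region)
                if region[0] <= h[0] <= h[1] <= region[1]]
        if not hits:
            continue
        found.extend(hits)
        queue.extend(uncovered(region, hits, min_len))
    return sorted(found)


def toy_search(region: Region) -> List[Hit]:
    """Hypothetical stand-in for a profile search (assumption, not HHsearch).

    Domain A (100-300) is always detected when the query contains it.
    Domain B (600-700) is detected only in queries of 700 residues or fewer.
    """
    start, end = region
    size = end - start + 1
    hits: List[Hit] = []
    if start <= 100 and end >= 300:
        hits.append((100, 300, "A"))
    if start <= 600 and end >= 700 and size <= 700:
        hits.append((600, 700, "B"))
    return hits


if __name__ == "__main__":
    for hit in annotate(1000, toy_search):
        print(hit)
```

In this toy setting the full-length query finds only domain A; domain B is recovered in the second round, when the uncovered stretch 301-1000 is searched on its own.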


An Entity of Type : fabio:Abstract, within Data Space : wasabi.inria.fr associated with source document(s)

Attributes	Values
type	abstract
Subject	Phylogenetics; Wellcome Trust; Protein structure; Evolutionary biology; Protein families; Protein domains; Protein superfamilies; Computational science; Statistical hypothesis testing; Bioinformatics software
part of	LAMPA, LArge Multidomain Protein Annotator, and its application to RNA virus polyproteins
is abstract of	LAMPA, LArge Multidomain Protein Annotator, and its application to RNA virus polyproteins
is hasSource of	covid:ann/target/70dcc133cacb07cc4239643d4517d165834b57bd covid:ann/target/b08de0b69ec0973b8f9265806eb67debd811abd5 covid:ann/target/081f4a816a144740eb0cb34feaa41c6611816cfe covid:ann/target/86c5da4738c06d48ac75fd7de1d101b7ba36d443 covid:ann/target/049032d43cebaf228408e4e231eb0dcb8607fa22 covid:ann/target/8819c1d5acddb673bfdda94b399ca0090c9ec271 covid:ann/target/835bab6d912e2b935d35b354c99c9fbac0d9d0b4 covid:ann/target/4720f767e0edfef82f80d11b8ded1992520a34b9 covid:ann/target/c3c26fe0c30ce5a68905eb053965742e6acfb7af covid:ann/target/cbc178cb3c0d7dfa10cf97af9ebf86a4424e3958 covid:ann/target/78cfd84300e946d758515690f303089a90323768 covid:ann/target/89fed9e4d6b7ee7a724a48d2962439a4da7ba4dc covid:ann/target/1a4621cc7bc96e72ef1f82e5b3d4d43780ff70b4 covid:ann/target/bf24809d3093e6b76363af1b602cd9ace902e5a3 covid:ann/target/d559ea7028c845baf7b39b83fc7038dadb2eb90d covid:ann/target/73fd14629ca36adba43cb46025933ab8819e9545 covid:ann/target/b88ed16f8fd4c6f23120f26faaf5b0b699457e52 covid:ann/target/8d0a289cb9379ffc9f1458020b82f8ae3cfa3530 covid:ann/target/dad49800bd0a14039aa61e10a545e8d4c4343c6d covid:ann/target/fbe6b1fa181a6675bb2ea52654aea7e2bb3affba covid:ann/target/bb94d43399a8e2ca6acffd61bbd6642838499429 covid:ann/target/409c6efb9e3fa19de5f01ea755c15977595015e9 covid:ann/target/416215f25be1ec56423e0831f5e54e22b2187b55 covid:ann/target/703b961fb5bc75243e435239841e3457401677f2 covid:ann/target/68b89f282e1787ffc1259df091834a1c0df3865c covid:ann/target/a2f44d52753f0d9db0368ee2c4f44e4dc1eff937 covid:ann/target/a0c8bb75768867a4ac4ba405e27e26344de170df covid:ann/target/7adbccf7562b7b2afc766e32dae17ba80d281899 covid:ann/target/10e637ae15aacfcb5943c34ddcb5ecb9b64cde66 covid:ann/target/3ef07b39e36c86ed4ce6ada7629ac4b62ca01087 covid:ann/target/a8e93d6a7482e9a31dd04c457e5e5efbaa47ae79 covid:ann/target/05f3995607d661308c161b329f8d8b028a6a41de covid:ann/target/5b6597f821490d445b7db91e83f88912a4f3e6fc covid:ann/target/be3d521d1b96c782babfa63b166f9eee9e5e0d73 
covid:ann/target/a9a2556ea7af66f555b1b1ee1b87a26d021a7d1d covid:ann/target/c3aec049b53ad9a0fb313d55c5b23bd0f0ac58c3 covid:ann/target/2b058d19f35eb4d499c1cbc6ec58f5b45918c1c4 covid:ann/target/c532290c2a5d2b1d775a8ec7cdde56339421ecf1 covid:ann/target/fff2064190ab0bebc5e4b628f76358f31c19b6af covid:ann/target/ce3788e9dc6646ada51c901947a7f5153ea5dd70 covid:ann/target/3d3d3566a9155c3bb2514f73c66908b9380af9c0 covid:ann/target/87fcdee16ae1776310d7e8e4276870cbe7de6f28 covid:ann/target/2f361e2fbd16f534b5688d6817072249663f37a2 covid:ann/target/5015a82d9da2e449a94863a1da5f7a769e3afcac covid:ann/target/3edefa1e5b1ed1f3e9a5198a0005206d0b75457b covid:ann/target/5467dfe06ff1c27c9c6735a66cb764b024313715 covid:ann/target/e519febc5b4c701c5ebc11d322dd48c380719660 covid:ann/target/9f493e8d95816b4b7bff7a2463ebea97874af330 covid:ann/target/a67a72ca068c58b7ccc2e5f0b6c01993a3992c0d covid:ann/target/628437095d576ba11f934268b9cb6af600060d4d covid:ann/target/94c58baf5c0033e7ea092330e4e86dd2d1e2f72c covid:ann/target/f2b40f4c2c878fbfe76b50612276770b38087817 covid:ann/target/804e9ef089692f4cf159da9b463c4e63b8c39e7a covid:ann/target/4ca166da6393660425c3edd62d5799eb562ec0bc covid:ann/target/2ca3ab2916cac39cefcf6f38dc24eb0bba663cfa covid:ann/target/268d21ecf22fefb4686da89af53e25a90977db02 covid:ann/target/19070f4ec16fa049b01ee54701ab68a8f1a6cb2e covid:ann/target/eb5057cb45193307192582792a07c0bc6e22e1f3 covid:ann/target/64d2349573f606cfdfc306d6123ed054132232a1 covid:ann/target/e63271ad75cc0e514fd0bcf52baa0118c7020bf9 covid:ann/target/5d093a2c689e73dbbbf8f4f76496ca383fa7f4a2 covid:ann/target/b1f8223cdc84faf091427541ad67241126725bed covid:ann/target/af6d99194068c0d924b2834667ae4bbcbd8cc0ac covid:ann/target/ebedf69174ebce1fdb062c98bbe3824f2544497c covid:ann/target/ee1e2234f63fc1f935e9e5ffb8d1164190675daf covid:ann/target/6dd89f1d52d3ceddac654f73843a55a88f4f07ab covid:ann/target/7492de46e1a4e121b7328e807d6e4e0b7aaf0250 covid:ann/target/1ad9d317c56bcb7696d8152e00bd921c8bc4acf9 
covid:ann/target/31377783dc4968d8c61f2908d0240c6cf64e7229 covid:ann/target/7043263be8d9283a74309e6ed75ad2e68184ebcc covid:ann/target/68dd9976b8d894374506a93c0d9b7e98ca0369c7 covid:ann/target/18b68fa5b6cf4447a4242b539b2e41151b219f4b covid:ann/target/770edd38604f0240391df22800c2dd6dc2337d7e covid:ann/target/67b2a4aee35876d099e6b622eda26b7a00fdae4a covid:ann/target/10c99717799137b6da924259d6eab1334b224ecd covid:ann/target/3364ac919bc5031aba37265123e73e446c6a8bad covid:ann/target/2435bcae835021b00819c0cc67df3c3b8928439d covid:ann/target/472ba667f2c1353e1f2546df1872d0d867aac37c covid:ann/target/64d30c07ccecbc4a389c26c18f41726a3e9ef61a covid:ann/target/ea5d0801b8c33bf31b049944a35ef860206cc29e covid:ann/target/912e35833840ee7c40b0d427be320c0925716db8 covid:ann/target/c6902652db3206e50dcea7573c51fe823c14c60b covid:ann/target/1726ea4a9dd0d523c5836cb43e5f4ea642857e94 covid:ann/target/47730af09c8ba7e9cba194eb5167115e5efc5a22 covid:ann/target/3723e52bb64443768b871af4039227e1f427e982 covid:ann/target/7ed36a8c7da5f7a5b9c92d09db1c6408ea4902d7
