About: AIM AND OBJECTIVE: The rapid growth in available protein sequence data creates an urgent need for novel computational algorithms to analyze and compare these sequences. This study was undertaken to develop an efficient computational approach for encoding protein sequences quickly and extracting the information hidden in them.

METHODS: Based on two physicochemical properties of amino acids, a protein primary sequence was converted into a three-letter sequence. From this sequence, a graph without loops or multiple edges was built and its geometric line adjacency matrix was obtained. A generalized PseAAC (pseudo amino acid composition) model was then constructed to characterize a protein sequence numerically.

RESULTS: Using the proposed mathematical descriptor of a protein sequence, similarity comparisons were made among the β-globin proteins of 17 species and among 72 coronavirus spike proteins. The resulting clusters agreed well with the established taxonomic groups. In addition, a generalized PseAAC-based SVM (support vector machine) model was developed to identify DNA-binding proteins. Experimental results showed that the method performed better than DNAbinder, DNA-Prot, iDNA-Prot and enDNA-Prot by 3.29-10.44% in terms of ACC, 0.056-0.206 in terms of MCC, and 1.45-15.76% in terms of F1M. When the benchmark dataset was expanded with negative samples, the presented approach outperformed the four previous methods by 2.49-19.12% in terms of ACC, 0.05-0.32 in terms of MCC, and 3.82-33.85% in terms of F1M.

CONCLUSION: These results suggest that the generalized PseAAC model is very efficient for comparing and analyzing protein sequences, and very competitive in identifying DNA-binding proteins.
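The first two METHODS steps (reduce the sequence to three letters, then build a simple graph) can be illustrated with a minimal Python sketch. The abstract does not name the two physicochemical properties, the three classes, or the edge rule, so everything specific here is an assumption made for illustration: the grouping in `GROUPS`, the letters `A`/`B`/`C`, the function names, and the rule that joins each position to the next position carrying the same letter. The result is an ordinary 0/1 adjacency matrix, not the paper's geometric line adjacency matrix, whose definition the abstract does not give.

```python
# Hypothetical three-class grouping of the 20 standard amino acids.
# The abstract does not state the actual classes; this split is illustrative.
GROUPS = {
    "A": "AVLIMFWPGC",  # treated here as nonpolar
    "B": "STYNQH",      # treated here as polar, uncharged
    "C": "DEKR",        # treated here as charged
}
LETTER_OF = {aa: letter for letter, members in GROUPS.items() for aa in members}


def reduce_sequence(protein):
    """Map a protein primary sequence onto the three-letter alphabet.

    Raises KeyError for residues outside the 20 standard amino acids.
    """
    return "".join(LETTER_OF[aa] for aa in protein.upper())


def simple_graph_adjacency(reduced):
    """Return a 0/1 adjacency matrix for a simple graph on sequence positions.

    Assumed edge rule (not taken from the paper): each position is joined
    to the next position that carries the same letter. Because edges only
    link distinct positions and each pair is set at most once, the graph
    has no loops and no multiple edges.
    """
    n = len(reduced)
    adj = [[0] * n for _ in range(n)]
    last_seen = {}  # letter -> most recent position carrying it
    for j, letter in enumerate(reduced):
        i = last_seen.get(letter)
        if i is not None:
            adj[i][j] = adj[j][i] = 1  # undirected edge, stored symmetrically
        last_seen[letter] = j
    return adj


reduced = reduce_sequence("MKVDS")
adjacency = simple_graph_adjacency(reduced)
print(reduced)  # ACACB
```

Numerical features for a PseAAC-style descriptor would then be derived from a matrix of this kind; how the paper does so is not described in the abstract and is left out of the sketch.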


An Entity of Type : fabio:Abstract, within Data Space : wasabi.inria.fr associated with source document(s)

Attributes	Values
type	abstract
Subject	Algorithms Protein structure Alternatives to animal testing Computational fields of study Scientific modeling Stereochemistry Virtual reality Mathematical logic Theoretical computer science Computational science Simulation software Game theory Board game gameplay and terminology Perfect competition
part of	Protein Sequence Comparison and DNA-binding Protein Identification with Generalized PseAAC and Graphical Representation
is abstract of	Protein Sequence Comparison and DNA-binding Protein Identification with Generalized PseAAC and Graphical Representation
is hasSource of	covid:ann/target/0d047ba37d79f85fd9f904fe41ac085c291433bd covid:ann/target/a5c88c9bdfb8ff19eabbde8023f41b2fc6d0524f covid:ann/target/d2037c095f2b92bf443f1f3c622019d3be9fda1d covid:ann/target/da9109478d066f5b70ca0c385e1e4f5badf4a392 covid:ann/target/9bd61b4edcf87d121546da41696c04f8aa5068f9 covid:ann/target/a6f00ac19551e43014390fbdcd74b41e4075e683 covid:ann/target/b50d5ba79eb8e28761f566085907d3145ff21565 covid:ann/target/e05e44da5ff8af081e1e58cafc5e2c289e6bc5ad covid:ann/target/f34532b418aa26c9aacc68feddb76910e85a79bb covid:ann/target/19f32465b8eaa1ea8b16b4bb8707352f4a2d5489 covid:ann/target/4cc9b025d08d06332492826e80112c6869e5ba21 covid:ann/target/62d8aba8d31af89f432039c91b699c88337742c0 covid:ann/target/4b8502854e9f9b0a8e9a510641447f2d3e6117d1 covid:ann/target/57840ebc109a1e4781205af58c7e9646f2fb612f covid:ann/target/477ab3a825b4ed24c59d0e8f711fff125672a735 covid:ann/target/1791813481dc7ce7c8245ec9fa6dea3ad302c447 covid:ann/target/e9e0d30880956c4e06e4b373a3a18ec629e2be90 covid:ann/target/226c1d97b6091b074318c193de6b5337a72547f8 covid:ann/target/7a5ab65a39f2469da902ee8c932523321d823fbc covid:ann/target/85c372021d1416b7b4798c6e7822dda4fda0ee85 covid:ann/target/b09d13a484c50955d6b9511fa59784b157f77f03 covid:ann/target/f4e6cf36abf906bf645547e099e4637e190a9b4c covid:ann/target/c4d3ff5b566d32bb4c06cd4ea55d693470067625 covid:ann/target/71edc006ed875f02a0566a679df0398024780458 covid:ann/target/320aaee8d8540af528081a3d103b9b323451afb1 covid:ann/target/70a412cdde60d01ea40823e81f5fdbbc927cf5c4 covid:ann/target/497b649f68101a198c79ba9816962ae6a786a4ef covid:ann/target/d6bd9dcddf3dc1a21d122f1b8bdc70ebcc01f32e
