TY - JOUR
T1 - BacPaCS-Bacterial Pathogenicity Classification via Sparse-SVM
AU - Barash, Eran
AU - Sal-Man, Neta
AU - Sabato, Sivan
AU - Ziv-Ukelson, Michal
N1 - Funding Information:
The research of N.S. was partially supported by the Israel Science Foundation [grant number 559/15]. The research of S.S. and E.B. was partially supported by the Israel Science Foundation [grant number 555/15]. The research of M.Z.-U. and E.B. was partially supported by the Israel Science Foundation [grant numbers 179/14 and 939/18].
Publisher Copyright:
© The Author(s) 2018.
PY - 2019/6/1
Y1 - 2019/6/1
N2 - Motivation: Bacterial infections are a major cause of illness worldwide. However, most bacterial strains pose no threat to human health and may even be beneficial. Thus, developing powerful diagnostic bioinformatic tools that differentiate pathogenic from commensal bacteria is critical for effective treatment of bacterial infections. Results: We propose a machine-learning approach for classifying human-hosted bacteria as pathogenic or non-pathogenic based on their genome-derived proteomes. Our approach is based on sparse Support Vector Machines (SVM), which autonomously select a small set of genes that are related to bacterial pathogenicity. We implement our approach as a tool, 'Bacterial Pathogenicity Classification via sparse-SVM' (BacPaCS), which is fully automated and handles datasets significantly larger than those previously used. BacPaCS shows high accuracy in distinguishing pathogenic from non-pathogenic bacteria in a clinically relevant dataset comprising only human-hosted bacteria. Among the genes that received the highest positive weight in the resulting classifier, we found genes that are known to be related to bacterial pathogenicity, in addition to novel candidates whose involvement in bacterial virulence has never been reported.
AB - Motivation: Bacterial infections are a major cause of illness worldwide. However, most bacterial strains pose no threat to human health and may even be beneficial. Thus, developing powerful diagnostic bioinformatic tools that differentiate pathogenic from commensal bacteria is critical for effective treatment of bacterial infections. Results: We propose a machine-learning approach for classifying human-hosted bacteria as pathogenic or non-pathogenic based on their genome-derived proteomes. Our approach is based on sparse Support Vector Machines (SVM), which autonomously select a small set of genes that are related to bacterial pathogenicity. We implement our approach as a tool, 'Bacterial Pathogenicity Classification via sparse-SVM' (BacPaCS), which is fully automated and handles datasets significantly larger than those previously used. BacPaCS shows high accuracy in distinguishing pathogenic from non-pathogenic bacteria in a clinically relevant dataset comprising only human-hosted bacteria. Among the genes that received the highest positive weight in the resulting classifier, we found genes that are known to be related to bacterial pathogenicity, in addition to novel candidates whose involvement in bacterial virulence has never been reported.
UR - http://www.scopus.com/inward/record.url?scp=85068411453&partnerID=8YFLogxK
U2 - 10.1093/bioinformatics/bty928
DO - 10.1093/bioinformatics/bty928
M3 - Article
C2 - 30407484
AN - SCOPUS:85068411453
SN - 1367-4803
VL - 35
SP - 2001
EP - 2008
JO - Bioinformatics
JF - Bioinformatics
IS - 12
ER -