Motivation: A large fraction of open reading frames (ORFs) identified as 'hypothetical' proteins correspond either to 'conserved hypothetical' proteins, i.e. sequences homologous to ORFs of unknown function in other organisms, or to hypothetical proteins lacking any significant sequence similarity to other ORFs in the databases. Elucidating the functions and three-dimensional structures of such orphan ORFs, termed ORFans or poorly conserved ORFs (PCOs), is essential for understanding biodiversity. However, it has been claimed that many ORFans may not encode expressed proteins.

Results: A genome-wide experimental study of 'paralogous PCOs' in the halophilic archaeon Halobacterium sp. NRC-1 was conducted. Paralogous PCOs are ORFs with at least one homolog in the same organism but no clear homologs in other organisms. The results reveal that mRNA is synthesized for a majority of the Halobacterium sp. NRC-1 paralogous PCO families, including those comprising relatively short proteins. This strongly suggests that these paralogous PCOs correspond to true, expressed proteins. Hence, further computational and experimental studies aimed at characterizing PCOs in this and other organisms are merited. Such efforts could shed light on the functions and origins of PCOs, thereby helping to elucidate the vast diversity observed in genetic material.
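The category definitions above can be made concrete with a small sketch. It assumes that homology hits for each hypothetical ORF have already been computed (for example by an all-against-all sequence search) and filtered at some significance threshold; the study's actual search method and threshold are not stated here, so both are assumptions. The function names, the labels "singleton PCO" and "conserved hypothetical", and the choice to define a paralogous PCO family as a connected component of within-genome homology links are all hypothetical illustrations, not the authors' procedure.

```python
# Hypothetical sketch: classify hypothetical ORFs from precomputed homology hits
# and group paralogous PCOs into families. Not the method used in the study.

def classify_orfs(hits, genome):
    """hits: dict mapping ORF id -> list of (homolog_id, organism) pairs that
    passed an (assumed) significance threshold. Returns ORF id -> label."""
    labels = {}
    for orf, homologs in hits.items():
        # Ignore self-hits; collect the organisms in which homologs were found.
        organisms = {org for other, org in homologs if other != orf}
        if organisms - {genome}:
            labels[orf] = "conserved hypothetical"  # homolog in another organism
        elif genome in organisms:
            labels[orf] = "paralogous PCO"  # homologs only in the same genome
        else:
            labels[orf] = "singleton PCO"  # no detectable homolog at all
    return labels


def pco_families(hits, genome, labels):
    """Group paralogous PCOs into families, taken here (an assumption) as the
    connected components of within-genome homology links among them."""
    graph = {}
    for orf, homologs in hits.items():
        if labels.get(orf) != "paralogous PCO":
            continue
        graph.setdefault(orf, set())  # every paralogous PCO gets a node
        for other, org in homologs:
            if org == genome and labels.get(other) == "paralogous PCO":
                # Treat links as undirected, since search hits may be asymmetric.
                graph[orf].add(other)
                graph.setdefault(other, set()).add(orf)

    seen, families = set(), []
    for start in sorted(graph):
        if start in seen:
            continue
        stack, family = [start], set()
        while stack:  # iterative depth-first search over one component
            node = stack.pop()
            if node in seen:
                continue
            seen.add(node)
            family.add(node)
            stack.extend(graph[node] - seen)
        families.append(sorted(family))
    return families


if __name__ == "__main__":
    # Toy, invented data: "NRC1" stands for the genome under study.
    hits = {
        "orf1": [("orf2", "NRC1")],
        "orf2": [("orf1", "NRC1")],
        "orf3": [("x1", "other_org")],
        "orf4": [],
        "orf5": [("orf1", "NRC1")],
    }
    labels = classify_orfs(hits, "NRC1")
    print(labels)
    print(pco_families(hits, "NRC1", labels))
```

In this toy input, orf5 reports orf1 as a homolog even though orf1 does not list orf5; treating links as undirected places both in the same family. Whether such asymmetric hits should be accepted is a design choice that a real analysis would have to settle.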