Many biological studies today are based on reading millions of DNA sequences in parallel. This approach, known as high-throughput sequencing, has revolutionized biology by enabling many molecular measurements in a relatively short time and at low cost. The accumulation of data produced by these techniques has given rise to many computational challenges. Almost all of these challenges are addressed by indexing DNA sequences using short words selected from each sequence according to pre-defined rules. Recently, we introduced new rules for choosing these index words, which lead to a better choice of words for many DNA sequence analysis tasks, but the methods for finding these words inside a sequence are still inefficient. In this proposal, we will develop new algorithms that generate improved rules for choosing index words in a DNA sequence more efficiently, and we will demonstrate their use in high-throughput DNA sequence analysis tasks to reduce runtime and memory usage. The successful completion of the project will pave the way for improving many more sequence analysis tasks, and thus have a large impact on the field of bioinformatics.
Effective start/end date: 1/01/20 → …
- United States-Israel Binational Science Foundation (BSF)