CLEMENTE, JHOIRENE. FINDING MOTIFS IN PARALLEL USING RANDOM PROJECTION ON GPUS (Under the direction of HENRY N. ADORNA, Ph.D.)

Biological motifs are short patterns that occur a significant number of times in a set of DNA sequences. These motifs are transcription factor binding sites that help regulate transcription and therefore gene expression. Detecting these patterns helps in discovering gene function and building regulatory networks. Mutations may occur at random positions in the genome, and the patterns themselves are subject to modification, which makes the problem more challenging. A variant called planted (l, d)-motif finding models the detection of these subtle motif patterns in DNA. However, several existing algorithms fail to recover most planted (l, d)-motifs. To address this problem, a hybrid algorithm was proposed in the literature, which we refer to as FMURP (Finding Motifs Using Random Projection). It uses an initialization method called Projection to avoid being trapped in local maxima, and therefore increases the chance of recovering the planted motifs. This algorithm has been shown to achieve high accuracy on the motif finding and planted (l, d)-motif finding problems. This research presents a parallel algorithm and an implementation of FMURP on Graphics Processing Units (GPUs) using CUDA. It also provides details on the implementation and on the optimizations made on the GPU to minimize memory usage. The implementation, called CUDA-FMURP, was tested on randomly generated (l, d)-motif instances and is shown to recover the majority of the planted motifs. It is also shown that CUDA-FMURP obtains a maximum speedup of 6.8 using a 512-core GPU with compute capability 2.0.
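
To make the planted (l, d)-motif setting and the Projection idea concrete, the following is a minimal sequential sketch in Python. It is not the thesis's CUDA-FMURP code. The function names (`plant_instance`, `project_into_buckets`, `hamming`), the projection size `k`, and the instance parameters are assumptions chosen for illustration. The sketch generates random sequences, plants one copy of a motif mutated at exactly d positions in each, and then hashes every l-mer into a bucket keyed by its letters at k randomly chosen positions.

```python
import random
from collections import defaultdict

ALPHABET = "ACGT"

def plant_instance(t, n, l, d, rng):
    """Build t random sequences of length n. Each one receives a copy of a
    random length-l motif, mutated at exactly d positions, at a random offset."""
    motif = "".join(rng.choice(ALPHABET) for _ in range(l))
    sequences = []
    for _ in range(t):
        background = [rng.choice(ALPHABET) for _ in range(n)]
        variant = list(motif)
        for pos in rng.sample(range(l), d):
            # Substitute a different letter so the position really changes.
            variant[pos] = rng.choice([c for c in ALPHABET if c != motif[pos]])
        start = rng.randrange(n - l + 1)
        background[start:start + l] = variant
        sequences.append("".join(background))
    return motif, sequences

def project_into_buckets(sequences, l, k, rng):
    """Hash every l-mer by its letters at k randomly chosen positions.
    Returns the chosen positions and a map: key -> list of (sequence, offset)."""
    positions = sorted(rng.sample(range(l), k))
    buckets = defaultdict(list)
    for i, seq in enumerate(sequences):
        for j in range(len(seq) - l + 1):
            key = "".join(seq[j + p] for p in positions)
            buckets[key].append((i, j))
    return positions, buckets

def hamming(a, b):
    """Number of positions at which two equal-length strings differ."""
    return sum(x != y for x, y in zip(a, b))

# Illustrative parameters only (assumed, not taken from the thesis).
rng = random.Random(0)
motif, seqs = plant_instance(t=20, n=600, l=15, d=4, rng=rng)
positions, buckets = project_into_buckets(seqs, l=15, k=7, rng=rng)
```

Planted occurrences whose mutations all fall outside the sampled positions share a key, so they land in the same bucket. Buckets that collect unusually many l-mers are natural starting points for a refinement step, which is how an initialization of this kind can steer a local search away from poor local maxima.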