# Candidacy Lecture: Introduction to Reoptimization

Unless P=NP, we cannot obtain a polynomial-time algorithm solving hard combinatorial problems. One practical approach in solving this kind of problem is to relax the condition of always finding the optimal solution for an instance and settle for “good enough” solutions. The kind of algorithms which are guaranteed to obtain a solution with a certain quality are called approximative algorithms. However, not all hard problems are approximable, i.e., we can obtain a polynomial-time algorithm that can guarantee the goodness of the solution for a problem.

In this lecture, we will present the concept of reoptimization. In this approach, given an instance I of some problem Π, an optimal solution OPT for Π in I, and a modified instance I’ resulting from a local perturbation of I, we wish to use OPT in order to solve Π in I’. With this additional information, reoptimization may help to improve the approximability of the problem or the running time of the solution to it. In fact, we can obtain a polynomial-time approximation scheme (PTAS) for a reoptimization variant of a problem given that the unmodified problem is approximable.

# Approximate Gene Cluster Discovery Problem (AGCDP)

Several genes across different genomes are grouped together as a result of functional dependencies. These set of genes that are kept together are called gene cluster. Identifying the set of clustered genes in the literature is widely studied in the fields of Biology and Computer Science. In the field of computation, biological problems as such are modelled as optimization problems.

A model introduced by Rahman et.al. is the Approximate Gene Cluster Discovery Problem (AGCDP) where the genes are treated as integers and genomes are treated as a string of integers.

The input of AGCDP is the following.

1. Set of integer strings which represent the genomes.
2. Number of genes expected to be in the cluster

The solution of the problem is a set of genes which minimizes a certain score. Details on the score computation is detailed in the paper by Rahman. They also presented an integer linear programming formulation of the problem.

# Randomized Computation

Lecture on Randomized Computation for our Computational Complexity class.

This lecture includes a short introduction to randomized algorithms.  It includes discussion on probabilistic Turing Machines (PTMs), TMs that make use of random number in its computation, as well as different complexity classes that it recognize. The complexity classes include the Bounded Error Probability Polynomial (BPP), RP, co-RP, and ZPP. Relationship of the said classes where also presented as well as their relationship to known complexity class such as P and NP.

# Installing GMAC

GMAC is a user-level library that implements an Asymmetric Distributed Shared Memory model to be used by CUDA programs. An ADSM model allows CPU code to access data hosted in accelerator (GPU) memory.

GMAC is being developed by the Operating System Group at the Universitat Politecnica de Catalunya and the IMPACT Research Group at theUniversity of Illinois

_________________________________________________________

To install gmac,

2. Unzip gmac-0.0.10.tar.gz using the following command

```gunzip gmac-0.0.10.tar.gz
tar -xvf gmac-0.0.10.tar```

3. Go inside the directory gmac-0.0.10 and make another directory build to separate the installation files from the source.

```cd gmac-0.0.10
mkdir build```

4. Go inside the directory build and run configuration file

```cd build
../configure```

5. Install using make all install command

`sudo make all install`

________________________________________________________________________________

I am installing gmac on a 32 bit UNIX machine with gcc 4.3 and CUDA 3.0.

After installing from the Makefile I encountered the following error

```/usr/bin/ld: cannot find -lcuda
collect2: ld returned 1 exit status
make[5]: *** [libgmac.la] Error 1
make[5]: Leaving directory `/home/imacbuntu/Projects/gmac-0.0.10/build/
src/gmac'
make[4]: *** [all-recursive] Error 1
make[4]: Leaving directory `/home/imacbuntu/Projects/gmac-0.0.10/build/
src/gmac'
make[3]: *** [all] Error 2
make[3]: Leaving directory `/home/imacbuntu/Projects/gmac-0.0.10/build/
src/gmac'
make[2]: *** [all-recursive] Error 1
make[2]: Leaving directory `/home/imacbuntu/Projects/gmac-0.0.10/build/
src'
make[1]: *** [all] Error 2
make[1]: Leaving directory
`/home/imacbuntu/Projects/gmac-0.0.10/build/
src'
make: *** [all-recursive] Error 1```

The error has something to do with the library of cuda,

when I tried to link the libcuda.so in /usr/lib by performing the following command, installation of gmac became successful.

```sudo ln -s /usr/lib/nvidia-current/libcuda.so /usr/lib/libcuda.so

# Every great and…

Every great and deep difficulty bears in itself its own solution. It forces us to change our thinking in order to find it.
-Niels Bohr

# Motif Finding on GPUs using Random Projections

CLEMENTE, JHOIRENE. FINDING MOTIFS IN PARALLEL USING RANDOM PROJECTION ON GPUS (Under the direction of HENRY N. ADORNA, Ph.D.) Biological motifs are short patterns that have significant number of occurrences in the set of DNA sequences. These motifs are transcription binding sites that help regulate transcription and therefore gene expression. Detection of these patterns helps in gene function discovery and building regulatory networks. Mutations may occur at random positions of the genome and these patterns are also subject to modifications, making the problem more challenging. A variant called planted (l, d)-motif finding models the detection of these subtle motif patterns in the DNA. However, several algorithms used fail to recover most of the planted (l,d)-motifs. To address this problem, a hybrid algorithm was proposed in the literature which we will refer to as FMURP (Finding Motifs using Random Projection). It uses an initialization method called Projection to avoid being trapped in the local maxima and therefore increases the chance of getting the planted motifs. This algorithm is shown to have a high accuracy on solving motif finding and planted (l,d)-motif finding problem. This research presents a parallel algorithm and implementation of FMURP on Graphics Processing Units(GPUs) using CUDA. It also provides details on the implementation and optimizations done in GPU in order to minimize usage of space. The implementation called CUDA-FMURP was tested on randomly generated (l,d)-motif instances and is shown to have recovered majority of the planted motifs. It is also shown that CUDA-FMURP obtains a maximum speedup of 6.8 using a 512 core GPU with 2.0 compute capability.

# Analysis of Gene Expression Data

University of the Philippines Manila (UPM) invited us to talk about our research in Bioinformatics. I will discuss our research on visualization of yeast gene expression data.