Several genes across different genomes are grouped together as a result of functional dependencies. These set of genes that are kept together are called gene cluster. Identifying the set of clustered genes in the literature is widely studied in the fields of Biology and Computer Science. In the field of computation, biological problems as such are modelled as optimization problems.
A model introduced by Rahman et.al. is the Approximate Gene Cluster Discovery Problem (AGCDP) where the genes are treated as integers and genomes are treated as a string of integers.
The input of AGCDP is the following.
- Set of integer strings which represent the genomes.
- Number of genes expected to be in the cluster
The solution of the problem is a set of genes which minimizes a certain score. Details on the score computation is detailed in the paper by Rahman. They also presented an integer linear programming formulation of the problem.