- Sat 20 September 2014
- Go
- #rosalind, #bioinformatics
About a year ago I took a wonderful MOOC called Bioinformatics Algorithms. The people behind this MOOC also developed stepic.org and rosalind.info. Project Rosalind is very similar to Project Euler, a website where you can find problems that are meant to be solved with some programming. You can find all the homework problems from the MOOC in rosalind.info under "Textbook Track".
The MOOC is going to be offered again this coming October, and next February the second part will be offered for the first time. Last year I used Python 2.x for all my MOOC work. Since I liked the MOOC so much, and I am a bit crazy, I decided to take it again this year. At first I thought of translating last year's Python 2.x source code to Python 3.x. However, that might be somewhat trivial (at some point I should switch to Python 3.x anyway). So instead, I have decided to adapt my Python 2.x source code to Go. Currently I know very little Go, but last year I knew very little Python and still managed to pass the MOOC. My goal this year is to learn as much as possible, both about bioinformatics and about the Go programming language.
This week I started by establishing a workspace and workflow. The first thing I did was update my Go version. I use Ubuntu Linux, and the Go version that is available via apt-get is version 1.2.1. Updating was surprisingly easy and painless. I followed the steps listed here. The only nontrivial step was adding this line to my .bashrc file:
export PATH=$PATH:/usr/local/go/bin
The second thing I did was change my GOPATH variable. Previously I used a folder in my Dropbox folder as my main Go directory. Recently I started using Copy since it provides more free space. Inside my Copy directory I made a subdirectory called go:
$ mkdir $HOME/Copy/go
Then I added the following line to my .bashrc file:
export GOPATH=$HOME/Copy/go
After this I went back to the GOPATH directory and made three subdirectories (following the suggestions from here):
$ cd $GOPATH
$ mkdir src pkg bin
Next I made a directory for my Rosalind work. Since I have a GitHub account, I used:
$ cd $GOPATH/src
$ mkdir -p github.com/meirizarrygelpi/rosalind
Being ambitious, I plan to eventually solve all the problems in Rosalind. For now I will concentrate on the problems in the Textbook Track:
$ cd $GOPATH/src/github.com/meirizarrygelpi/rosalind
$ mkdir textbook-track
This past week I solved the first six problems (1A, 1B, 1C, 1D, 1E, and 1F) and started working on 1G. For each problem you get a .txt file with input, so inside the textbook-track directory I made five subdirectories:
$ cd $GOPATH/src/github.com/meirizarrygelpi/rosalind/textbook-track
$ mkdir data output problems solvers week01
The week01 directory contains a .go file with the week01 package: the functions that do the hard work. I am not going to give any spoilers by sharing the contents of that file! The solvers directory contains the solvers package: the functions that read the data and print the answer in a specific way. Inside the solvers directory I have a .go file with the following content:
package solvers

import (
	"fmt"
	"strings"

	w01 "github.com/meirizarrygelpi/rosalind/textbook-track/week01"
)

func Solve_1A() {
	var dna string
	var k int
	fmt.Scanln(&dna)
	fmt.Scanln(&k)
	fmt.Println(strings.Join(w01.MostFrequentExactKMers(dna, k), " "))
}

func Solve_1B() {
	var dna string
	fmt.Scanln(&dna)
	fmt.Println(w01.ReverseComplement(dna))
}

func Solve_1C() {
	var p, g string
	fmt.Scanln(&p)
	fmt.Scanln(&g)
	fmt.Println(strings.Join(w01.ExactPatternIndex(p, g), " "))
}

func Solve_1D() {
	var g string
	var k, L, t int
	fmt.Scanln(&g)
	fmt.Scanln(&k, &L, &t)
	fmt.Println(strings.Join(w01.LtClump(g, k, L, t), " "))
}

func Solve_1E() {
	var g string
	fmt.Scanln(&g)
	fmt.Println(strings.Join(w01.MinGCSkew(g), " "))
}

func Solve_1F() {
	var p, t string
	var d int
	fmt.Scanln(&p)
	fmt.Scanln(&t)
	fmt.Scanln(&d)
	fmt.Println(strings.Join(w01.ApproximatePatternIndex(p, t, d), " "))
}
My plan is to keep the input .txt files in the data directory, and the output .txt files in the output directory. With Go you can get an executable file by installing your program with the go install tool. Inside the problems directory I have subdirectories of the form rosalind1* where * is a letter from A to H. As an explicit example, let me consider rosalind1A. Inside this directory I have a .go file with the following content:
package main

import (
	sol "github.com/meirizarrygelpi/rosalind/textbook-track/solvers"
)

func main() {
	sol.Solve_1A()
}
Back in the textbook-track directory, I create a .sh file with the following content:
#!/bin/bash
go install ./problems/rosalind1A
cat ./data/1A.txt | rosalind1A > ./output/1A.txt
This .sh file allows me to automate the process of answering a Rosalind problem. The first command uses the go install tool to make an executable file rosalind1A. This executable file lives in the $GOPATH/bin directory, so that directory must be on the PATH for the second command to find it. The second command takes the contents of data/1A.txt, pipes them to the executable rosalind1A, and sends the result to output/1A.txt. A similar process is repeated for the other Rosalind problems. Note that using pipes and output redirection in the .sh file avoids the need to read and write files with Go.
Just in case the discussion is not clear, here is the output of running tree inside the textbook-track directory:
$ tree
.
├── data
│   ├── 1A1.txt
│   ...
│   └── 1G2.txt
├── output
│   ├── 1A1.txt
│   ...
│   └── 1F2.txt
├── problems
│   ├── rosalind1A
│   │   └── 1A.go
│   ...
│   └── rosalind1H
│       └── 1H.go
├── solvers
│   └── solvers01.go
├── week01
│   └── week01.go
├── week02
│   └── week02.go
└── work.sh

14 directories, 38 files
It is a bit messy, but it works well so far.