M.E. Irizarry-Gelpí

Physics impostor. Mathematics interloper. Husband. Father.

My Go Workspace for Rosalind


About a year ago I took a wonderful MOOC called Bioinformatics Algorithms. The people behind this MOOC also developed stepic.org and rosalind.info. Project Rosalind is very similar to Project Euler: a website with problems that are meant to be solved by writing programs. You can find all the homework problems from the MOOC on rosalind.info under "Textbook Track".

The MOOC is going to be offered again this coming October, and next February the second part will be offered for the first time. Last year I used Python 2.x for all my MOOC work. Since I liked the MOOC so much, and I am a bit crazy, I decided to take it again this year. First I thought of translating last year's Python 2.x source code to Python 3.x. However, this might be somewhat trivial (and at some point I should switch to Python 3.x anyway). So instead, I have decided to adapt my Python 2.x source code to Go. Currently I know very little Go, but last year I knew very little Python and still managed to pass the MOOC. My goal this year is to learn as much as possible, both about bioinformatics and about the Go programming language.

This week I started by establishing a workspace and workflow. The first thing I did was update my Go version. I use Ubuntu Linux, and the Go version that is available via apt-get is version 1.2.1. Updating was surprisingly easy and painless. I followed the steps listed here. The only nontrivial step was adding this line to my .bashrc file:

export PATH=$PATH:/usr/local/go/bin

The second thing I did was change my GOPATH variable. Previously I used a folder in my Dropbox folder as my main Go directory. Recently I started using Copy, a cloud storage service, since it provides more free space. Inside my Copy directory I made a subdirectory called go:

$ mkdir $HOME/Copy/go

Then I added the following line to my .bashrc file:

export GOPATH=$HOME/Copy/go

After this I went back to the GOPATH directory and made three subdirectories (following the suggestions from here):

$ cd $GOPATH
$ mkdir src pkg bin

Next I made a directory for my Rosalind work. Since I have a GitHub account, I used:

$ cd $GOPATH/src
$ mkdir -p github.com/meirizarrygelpi/rosalind

Being ambitious, I plan to eventually solve all the problems in Rosalind. For now I will concentrate on the problems in the Textbook Track:

$ cd $GOPATH/src/github.com/meirizarrygelpi/rosalind
$ mkdir textbook-track

This past week I solved the first six problems (1A, 1B, 1C, 1D, 1E, and 1F) and started working on 1G. For each problem you get a .txt file with the input data. To keep everything organized, I made five subdirectories inside the textbook-track directory:

$ cd $GOPATH/src/github.com/meirizarrygelpi/rosalind/textbook-track
$ mkdir data output problems solvers week01

The week01 directory contains a .go file with the week01 package: the functions that do the hard work. I am not going to give any spoilers by sharing the contents of that file! The solvers directory contains the solvers package: the functions that read the data and print the answer in a specific way. Inside the solvers directory I have a .go file with the following content:

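// Package solvers reads each problem's input from standard input
// and prints the answer in the format Rosalind expects.
package solvers

import (
    "fmt"
    "strings"

    w01 "github.com/meirizarrygelpi/rosalind/textbook-track/week01"
)

// Solve_1A reads a DNA string and an integer k, then prints the
// most frequent k-mers separated by spaces.
func Solve_1A() {
    var dna string
    var k int
    fmt.Scanln(&dna)
    fmt.Scanln(&k)
    fmt.Println(strings.Join(w01.MostFrequentExactKMers(dna, k), " "))
}

// Solve_1B reads a DNA string and prints its reverse complement.
func Solve_1B() {
    var dna string
    fmt.Scanln(&dna)
    fmt.Println(w01.ReverseComplement(dna))
}

// Solve_1C reads a pattern and a genome, then prints the positions
// where the pattern occurs exactly in the genome.
func Solve_1C() {
    var p, g string
    fmt.Scanln(&p)
    fmt.Scanln(&g)
    fmt.Println(strings.Join(w01.ExactPatternIndex(p, g), " "))
}

// Solve_1D reads a genome and the integers k, L, t (on one line),
// then prints the k-mers that form (L, t)-clumps.
func Solve_1D() {
    var g string
    var k, L, t int
    fmt.Scanln(&g)
    fmt.Scanln(&k, &L, &t)
    fmt.Println(strings.Join(w01.LtClump(g, k, L, t), " "))
}

// Solve_1E reads a genome and prints the positions of minimum GC skew.
func Solve_1E() {
    var g string
    fmt.Scanln(&g)
    fmt.Println(strings.Join(w01.MinGCSkew(g), " "))
}

// Solve_1F reads a pattern, a text, and an integer d, then prints the
// positions where the pattern occurs with at most d mismatches.
func Solve_1F() {
    var p, t string
    var d int
    fmt.Scanln(&p)
    fmt.Scanln(&t)
    fmt.Scanln(&d)
    fmt.Println(strings.Join(w01.ApproximatePatternIndex(p, t, d), " "))
}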

My plan is to keep the input .txt files in the data directory, and the output .txt files in the output directory. With Go you can get an executable file by installing your program with the go install tool. Inside the problems directory I have subdirectories of the form rosalind1*, where * is a letter from A to H. As an explicit example, consider rosalind1A. Inside this directory I have a .go file with the following content:

package main

import (
    sol "github.com/meirizarrygelpi/rosalind/textbook-track/solvers"
)

func main() {
    sol.Solve_1A()
}
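Having one main package per problem means one directory per problem. A possible alternative, sketched here with hypothetical names, is a single executable that takes the problem ID as a command-line argument and looks up the solver in a map. The placeholder functions in the map would be replaced by sol.Solve_1A, sol.Solve_1B, and so on.

```go
package main

import (
	"fmt"
	"os"
	"sort"
)

// solvers maps a problem ID to the function that solves it. These are
// placeholders so the sketch compiles on its own.
var solvers = map[string]func(){
	"1A": func() { fmt.Println("solving 1A") },
	"1B": func() { fmt.Println("solving 1B") },
}

// dispatch runs the solver registered under id, or returns an error
// listing the known IDs in sorted order.
func dispatch(id string, table map[string]func()) error {
	solve, ok := table[id]
	if !ok {
		known := make([]string, 0, len(table))
		for k := range table {
			known = append(known, k)
		}
		sort.Strings(known)
		return fmt.Errorf("unknown problem %q (known: %v)", id, known)
	}
	solve()
	return nil
}

func main() {
	if len(os.Args) != 2 {
		fmt.Fprintln(os.Stderr, "usage: rosalind <problem-id>")
		os.Exit(2)
	}
	if err := dispatch(os.Args[1], solvers); err != nil {
		fmt.Fprintln(os.Stderr, err)
		os.Exit(1)
	}
}
```

With such a binary (hypothetically named rosalind), the .sh file would need only one go install line, and each problem would be run as rosalind 1A with the same input and output redirection. The trade-off is a central map that must be updated for every new problem.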

Back in the textbook-track directory, I create a .sh file with the following content:

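#!/bin/bash
# Build rosalind1A and install the executable into $GOPATH/bin.
go install ./problems/rosalind1A
# Use the full path, since $GOPATH/bin was never added to PATH.
cat ./data/1A.txt | "$GOPATH/bin/rosalind1A" > ./output/1A.txt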

This .sh file automates the process of answering a Rosalind problem. The first command uses the go install tool to build an executable file named rosalind1A, which lives in the $GOPATH/bin directory. The second command takes the contents of data/1A.txt, pipes them to the executable rosalind1A, and writes the result to output/1A.txt. A similar process is repeated for the other Rosalind problems. Note that using pipes and output redirection in the .sh file avoids the need to read and write files in Go.

Just in case the discussion is not clear, here is the output of running tree inside the textbook-track directory:

$ tree
.
├── data
│   ├── 1A1.txt
│   ...
│   └── 1G2.txt
├── output
│   ├── 1A1.txt
│   ...
│   └── 1F2.txt
├── problems
│   ├── rosalind1A
│   │   └── 1A.go
│   ...
│   └── rosalind1H
│       └── 1H.go
├── solvers
│   └── solvers01.go
├── week01
│   └── week01.go
├── week02
│   └── week02.go
└── work.sh

14 directories, 38 files

It is a bit messy, but it works well so far.