Pfam curation tools

Contents:

Requirements

  • A UNIX/Linux/Apple computer with a 64-bit architecture (e.g x86_64) with administrative rights

For unsupported architectures the docker image will have to be built from a Dockerfile.

Preparation

Install Docker

You will need admininstration privileges to do this (or have a sys admin do it). On Debian/Ubuntu flavour machines, install the package docker.io:

sudo apt-get install docker.io

Ensure user is a member of the docker group:

groups

and add docker if that does not appear in the output above.

sudo adduser USERNAME docker

Download docker image

Download the latest version of the Pfam curation tools container from Dockerhub:

docker pull dockerhub.ebi.ac.uk/pfam/pfam-curation

Set up computer

Choose a directory on your machine where you will run the container; this can be anywhere you have write access but it’s important that you always use this location.

mkdir pfam_curation
cd pfam_curation

Within that directory create two additional directories: one will contain the sequence files required by the pfam tools and the other will be a working directory:

mkdir seqlib
mkdir pfam_data

Also, create a pfam.conf file in the pfam_curation directory. You can use the template.

Obtain the pfamseq (and, optionally) the uniprot fasta file. Either download the pfamseq.gz (and uniprot.gz) files from Pfam ftp or run the download_pfamseq.sh script.

bash download_pfamseq.sh

If downloading manually you will need to move the files to the seqlib directory and uncompress.

In the directory above pfam_curation directory, get a checkout of the Pfam dictionary:

svn co https://xfamsvn.ebi.ac.uk/svn/pfam/trunk/Data/Dictionary/

Running the container

docker run --rm -it -v $(pwd)/pfam_data:/home/pfam/pfam_data -v $(pwd)/pfam.conf:/home/pfam/pfam.conf -v $(pwd)/seqlib:/data/seqlib -v $(pwd)/Dictionary/dictionary:/home/pfam/Dictionary/dictionary -e DISPLAY=$DISPLAY -v /tmp/.X11-unix:/tmp/.X11-unix dockerhub.ebi.ac.uk/pfam/pfam-curation

This will give you a command prompt inside the container where you will have access to all the Pfam curation tools and scripts.

The first time you run you will need to index the sequence file(s) you downloaded previously:

esl-sfetch --index /data/seqlib/pfamseq
esl-sfetch --index /data/seqlib/uniprot

To exit the container, type control-D.

If you use the directory pfam_data to download and build families, then the files in this directory will be preserved if you exit the container and re-run later. They will also be accessible from the host computer in the directory of that name.

The command above should work on Linux-based machines and will permit applications such as belvu to run correctly. On a Mac the command above will require some modification. If a workaround does not exist, it’s possible to install and run these programs from the host computer. If you change to the pfam_data directory first it will be equivalent to running the command within the container.

Troubleshooting

If you see an error similar to this:

Temporary failure in name resolution: Unable to connect to a repository at URL 'https://xfamsvn.ebi.ac.uk/svn/pfam/trunk/Data/Families/PF00023' at /opt/Pfam/PfamLib/Bio/Pfam/SVN/Client.pm line 232

This means your Docker container is unable to access your network. To resolve the issue, you can either disable dnsmasq in NetworkManager, or specify a DNS server for Docker to use.

To disable dnsmasq, comment out the dns line (dns=dnsmasq line -> #dns=dnsmasq) in /etc/NetworkManager/NetworkManager.conf and then restart the network-manager service:

sudo vi /etc/NetworkManager/NetworkManager.conf
sudo service network-manager restart

To specify a DNS server, add your network’s DNS server to /etc/docker/daemon.json, and then restart the docker service:

sudo vi /etc/docker/daemon.json
sudo service docker restart

An example DNS line is given below:

{
  "dns": ["10.0.0.2", "8.8.8.8"]
}