Cypiripi

A tool for determining CYP2D6/CYP2D7 genotypes from HTS reads


Introduction

Hmm… so what exactly is Cypiripi?

Cypiripi is a tool for exact genotyping of CYP2D6 using high-throughput sequencing (HTS) data. A paper describing the algorithm is about to be published in ISMB 2015.

How do I get Cypiripi?

Just clone our repository to get the latest binary:

git clone https://github.com/sfu-compbio/cypiripi.git

DISCLAIMER: Due to CPLEX licensing restrictions and the fact that CPLEX is statically linked into the binary, this software is only available for ACADEMIC use. It is strictly forbidden to use it for any commercial purpose, unless explicitly allowed in writing by the authors. This might change in the future, though. Thank you!

How do I run Cypiripi?

Ideally, just invoke cypiripi.py with the following options:

python cypiripi.py --fasta reference --fastq [mygenome.fastq] --cov [coverage]

where:

  • --fasta is the prefix of a pre-processed CYP2D6 reference. One reference is already provided in the package; it is named CYP2D6.
  • --fastq is your interleaved, paired-end FASTQ file.
  • --cov is the coverage per chromosome. E.g. for a 40x sample, --cov should be 20.

Please check the next section for more details and requirements.
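As a rough illustration of the --cov rule above, here is a minimal Python sketch that halves a diploid sample's coverage and assembles the wrapper invocation. The helper name wrapper_command is hypothetical and not part of Cypiripi, and whether --cov accepts fractional values is an assumption this sketch does not verify.

```python
def wrapper_command(fastq, sample_coverage, reference="CYP2D6"):
    """Build the cypiripi.py argument list (hypothetical helper)."""
    # --cov is the coverage per chromosome, so a diploid sample's
    # overall coverage is halved: a 40x sample gives --cov 20.
    per_chromosome = sample_coverage / 2.0
    return [
        "python", "cypiripi.py",
        "--fasta", reference,
        "--fastq", fastq,
        "--cov", "{:g}".format(per_chromosome),
    ]


print(" ".join(wrapper_command("mygenome.fastq", 40)))
# python cypiripi.py --fasta CYP2D6 --fastq mygenome.fastq --cov 20
```

The returned list can be handed to subprocess.call once mrfast and mrsfast are available.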

Requirements

  1. The wrapper script needs at least Python 2.7 in order to run.
  2. mrfast and mrsfast must be located within the PATH in order for the wrapper script to complete.
  3. The binary was compiled on CentOS 5.x with gcc 5.2, and it might not work with older distributions.
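To confirm the second requirement before starting a long run, something like the following pre-flight check may help. It is a standalone sketch, not part of the wrapper script, and it relies on shutil.which, which needs Python 3.3 or newer. The function name missing_tools is made up for this example.

```python
import shutil

REQUIRED_TOOLS = ("mrfast", "mrsfast")


def missing_tools(tools=REQUIRED_TOOLS):
    """Return the names from `tools` that cannot be found on the PATH."""
    return [name for name in tools if shutil.which(name) is None]


missing = missing_tools()
if missing:
    print("Not found on PATH: " + ", ".join(missing))
else:
    print("mrfast and mrsfast are both on the PATH")
```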

Caveats

While genotyping is usually very fast, mapping can take a lot of time. In addition, mrfast (unlike mrsfast) is known to load all of the reads into memory, which can cause problems for samples with large FASTQ files. We recommend splitting large FASTQ files into smaller ones (we usually use 20,000,000 lines per FASTQ file in our test configuration).
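One possible way to do the splitting is sketched below. It assumes the usual four-line FASTQ records (so an interleaved pair is eight lines) and insists that the chunk size is a multiple of eight, so that no read pair is cut across two files. The function name split_fastq and the .partNNNN.fastq naming scheme are choices made for this example, not something Cypiripi prescribes.

```python
import os


def split_fastq(path, lines_per_chunk=20000000, out_dir=None):
    """Split `path` into numbered chunks of at most `lines_per_chunk` lines.

    Returns the chunk paths in the order they were written.
    """
    # 4 lines per record, 8 per interleaved pair: a multiple of 8 keeps
    # both mates of every pair in the same chunk.
    if lines_per_chunk <= 0 or lines_per_chunk % 8 != 0:
        raise ValueError("lines_per_chunk must be a positive multiple of 8")
    out_dir = out_dir or os.path.dirname(os.path.abspath(path))
    stem = os.path.splitext(os.path.basename(path))[0]
    chunks, out, written = [], None, 0
    with open(path) as src:
        for line in src:
            # Start a new chunk at the first line and whenever one fills up.
            if out is None or written == lines_per_chunk:
                if out is not None:
                    out.close()
                name = "{}.part{:04d}.fastq".format(stem, len(chunks) + 1)
                chunks.append(os.path.join(out_dir, name))
                out = open(chunks[-1], "w")
                written = 0
            out.write(line)
            written += 1
    if out is not None:
        out.close()
    return chunks
```

Calling split_fastq("mygenome.fastq") would write mygenome.part0001.fastq, mygenome.part0002.fastq and so on next to the input, each of which can then be mapped separately.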

The provided script assumes that you will invoke the mapping on only one (small) FASTQ file. If you want to map the FASTQ files manually, you can pass the --sam [mymap.sam] parameter to the wrapper script instead of the --fastq parameter. Please note that both mymap.sam and mymap.sam.paired need to be present for Cypiripi to produce correct results.
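When mapping manually, a quick check that both files are in place can save a failed run. The sketch below only tests for their existence; the helper name missing_sam_inputs is hypothetical.

```python
import os


def missing_sam_inputs(sam_path):
    """Return the expected mapping files that are not present on disk."""
    # Cypiripi needs the SAM file and its companion with a .paired suffix.
    expected = [sam_path, sam_path + ".paired"]
    return [p for p in expected if not os.path.isfile(p)]
```

For example, missing_sam_inputs("mymap.sam") returns an empty list only when both mymap.sam and mymap.sam.paired exist.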

Lines 128–141 in cypiripi.py contain the commands and parameters used to generate the necessary SAM files.

DISCLAIMER: Cypiripi is not yet intended to be used with variable coverage data (e.g. PGRNSeq or similar technologies). It might work, but it is not guaranteed to produce correct results. Also, if your coverage is very high (>300x), Cypiripi might require a large amount of RAM. We’re currently working on resolving these issues.

Usage of Cypiripi binary

Parameter explanation

  • -f [reference file]

    Pre-processed gene reference file. Use reference.combined.align provided in the package for CYP2D6/CYP2D7 gene.

  • -s [SAM file]

    Input SAM (mapping) file.

  • -p [paired SAM file]

    Specify the input SAM file containing the paired-end mappings.

    Note: For every read, the mapping of its mate should appear on the line immediately after the read’s own mapping (e.g. lines should be ordered as R1/1, R1/2, R2/1, R2/2 …). mrfast and mrsfast produce this order by default. Do not sort your mapping files by mapping coordinate, otherwise Cypiripi will fail.

    Default: SAM file with .paired suffix

  • -E [exclusions]

    A file containing the read names of the reads to be excluded (e.g. CYP2D8-originating reads).

    Default: SAM file with .discard suffix

  • -C [coverage]

    Expected coverage per chromosome for the sample.

    Default: 20

  • -T [threshold]

    Expected minimum cut-off threshold. We recommend 25% or 30% of the coverage.
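The defaults and the threshold recommendation above can be worked out with a few lines of Python. This is only a sketch: both function names are hypothetical, and it assumes that -T is given in the same units as -C, which the parameter description suggests but does not state outright.

```python
def default_companions(sam_path):
    """Default -p and -E files, derived from the SAM file name."""
    return {"-p": sam_path + ".paired", "-E": sam_path + ".discard"}


def recommended_thresholds(coverage=20):
    """Return 25% and 30% of the per-chromosome coverage, in that order."""
    return (coverage * 25 / 100.0, coverage * 30 / 100.0)


print(default_companions("mymap.sam"))
print(recommended_thresholds(20))  # (5.0, 6.0)
```

With the default -C of 20, the recommendation therefore corresponds to a -T between 5 and 6.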

Support

Contact & Support

Feel free to send any inquiry to inumanag at sfu dot oh canada. Since this software is still in the beta stage and thus unstable, we will be glad to troubleshoot any problem you might encounter!

Authors

Cypiripi has been brought to you by:

from the Lab for Computational Biology at Simon Fraser University.

Funding

  • NSERC Discovery Grant

  • Vanier Canada Graduate Fellowships

Licence

Copyright (c) 2014, 2015, Simon Fraser University, Indiana University Bloomington. All rights reserved. Redistribution and use in binary forms, without modification, is permitted provided that the following conditions are met:

  • This software shall NOT BE USED in any commercial environment, unless explicitly allowed by the authors in writing.
  • Redistributions in binary form must reproduce the above copyright notice, this list of conditions and the following disclaimer in the documentation and/or other materials provided with the distribution.
  • Neither the name of the Simon Fraser University, Indiana University Bloomington nor the names of its contributors may be used to endorse or promote products derived from this software without specific prior written permission.

THIS SOFTWARE IS PROVIDED BY THE COPYRIGHT HOLDERS AND CONTRIBUTORS “AS IS” AND ANY EXPRESS OR IMPLIED WARRANTIES, INCLUDING, BUT NOT LIMITED TO, THE IMPLIED WARRANTIES OF MERCHANTABILITY AND FITNESS FOR A PARTICULAR PURPOSE ARE DISCLAIMED. IN NO EVENT SHALL THE COPYRIGHT OWNER OR CONTRIBUTORS BE LIABLE FOR ANY DIRECT, INDIRECT, INCIDENTAL, SPECIAL, EXEMPLARY, OR CONSEQUENTIAL DAMAGES (INCLUDING, BUT NOT LIMITED TO, PROCUREMENT OF SUBSTITUTE GOODS OR SERVICES; LOSS OF USE, DATA, OR PROFITS; OR BUSINESS INTERRUPTION) HOWEVER CAUSED AND ON ANY THEORY OF LIABILITY, WHETHER IN CONTRACT, STRICT LIABILITY, OR TORT (INCLUDING NEGLIGENCE OR OTHERWISE) ARISING IN ANY WAY OUT OF THE USE OF THIS SOFTWARE, EVEN IF ADVISED OF THE POSSIBILITY OF SUCH DAMAGE.

Release notes

  • (15-Apr-2015) Cypiripi version 1.0 release
    • Initial public release