DeeZ

A tool for compressing SAM/BAM files


Introduction

So what is DeeZ?

DeeZ (a.k.a. DeeNA-Zip) is a tool for compressing SAM/BAM files, or more formally, a tool that performs reference-based compression by local assembly. DeeZ was published in Nature Methods in November 2014.

How do I get DeeZ?

Just clone our repository and run make:

git clone https://github.com/sfu-compbio/deez.git
cd deez && make -j

Alternatively, use:

git clone https://bitbucket.org/compbio/dz.git
git clone git://git.code.sf.net/p/deez-compression/deez deez-compression-deez

If you don’t have git, you can always fetch pre-packaged DeeZ archives:

Note: You will need at least g++ 4.4 to compile the sources.
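
As a rough pre-build check of the compiler requirement above, the following sketch compares a version string against 4.4. The helper name gxx_version_ok is hypothetical and not part of DeeZ; it only assumes that the version string starts with "major" or "major.minor", as printed by g++ -dumpversion.

```shell
# Hypothetical helper (not part of DeeZ): succeed if the given
# version string ("major" or "major.minor[.patch]") is at least 4.4.
gxx_version_ok() {
    major=${1%%.*}
    case $1 in
        *.*) minor=${1#*.}; minor=${minor%%.*} ;;
        *)   minor=0 ;;   # newer compilers may print only the major version
    esac
    [ "$major" -gt 4 ] || { [ "$major" -eq 4 ] && [ "$minor" -ge 4 ]; }
}

# Possible use before building (requires g++ on PATH):
#   gxx_version_ok "$(g++ -dumpversion)" && make -j
```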

How do I use DeeZ?

DeeZ is invoked as follows:

Compression

deez -r [reference] [input.sam] -o [output]

This will compress input.sam to input.sam.dz.
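
Because chromosome names in the SAM and FASTA files must match (see the --reference parameter), it can help to compare them before compressing. The sketch below is not part of DeeZ: the function names are hypothetical, and it assumes a single FASTA reference file and a SAM file with @SQ header lines. It prints every chromosome that is declared in the SAM header but missing from the FASTA file.

```shell
# Hypothetical helpers (not part of DeeZ).

# List chromosome names declared in a SAM header (@SQ lines, SN: field).
sam_chroms() {
    awk -F'\t' '$1 == "@SQ" {
        for (i = 2; i <= NF; i++) if ($i ~ /^SN:/) print substr($i, 4)
    }' "$1" | sort -u
}

# List sequence names in a FASTA file (first word after ">").
fasta_chroms() {
    awk '/^>/ { sub(/^>/, "", $1); print $1 }' "$1" | sort -u
}

# Print SAM chromosomes without a FASTA entry; no output means all names match.
missing_in_fasta() {
    names=$(mktemp) || return 1
    fasta_chroms "$2" > "$names"
    # grep exits non-zero when nothing is missing, which is the good case here.
    sam_chroms "$1" | grep -Fxv -f "$names" || true
    rm -f "$names"
}

# Possible use (input.sam and reference.fa are placeholder file names):
#   missing_in_fasta input.sam reference.fa
```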

Decompression

deez -r [reference] [input.dz] -o [output] ([region])

This will decompress input.dz to input.dz.sam. [region] is optional.

Random Access

You can also specify the region of interest while decompressing (i.e. randomly access the region). For example, to extract some reads from chr16 to standard output, you should run:

deez -r [reference] input.dz -c chr16:15000000-16000000
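
When regions come from a script, a small helper can assemble the region string in the chrom:start-end form used above. This is only a sketch: make_region is a hypothetical name, and the start <= end check is an added sanity check, not documented DeeZ behaviour.

```shell
# Hypothetical helper (not part of DeeZ): build "chrom:start-end".
make_region() {
    chrom=$1; start=$2; end=$3
    if [ "$start" -gt "$end" ]; then
        echo "make_region: start must not exceed end" >&2
        return 1
    fi
    printf '%s:%s-%s\n' "$chrom" "$start" "$end"
}

# Possible use (requires deez on PATH; reference.fa is a placeholder name):
#   deez -r reference.fa input.dz -c "$(make_region chr16 15000000 16000000)"
```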

Don’t forget to cite us if you use it in your research :)

Usage

Parameter explanation

  • --threads, -t [number]

    Set the number of threads DeeZ may use for compression and decompression.

    Default value: 4

  • --header, -h

    Outputs the SAM header.

  • --reference, -r [file|directory]

    Specify the FASTA reference file.

    Note: Chromosome names in the SAM and FASTA files must match. Also, instead of one big FASTA file, DeeZ supports reference lookup in the given directory for chr*.fa files, where chr* is the chromosome ID from the SAM file.

  • --force, -!

    Force overwrite of existing files.

  • --stdout, -c

    Compress/decompress to standard output.

  • --output, -o [file]

    Compress/decompress to the given file.

  • --lossy, -l

    Set the lossy parameter for lossy quality encoding (for more information, please see SCALCE).

  • --quality, -q [mode]

    If mode is 1 or samcomp, DeeZ will use the sam_comp quality model to encode the qualities. Random access to qualities is not supported for such files.

  • --withflag, -f [flag]

    Decompress only mappings that have the given flag bits set.

  • --withoutflag, -F [flag]

    Decompress only mappings that do not have the given flag bits set.

  • --stats, -S

    Display mapping statistics (requires a DeeZ file as input).

  • --sort, -s

    Sort the input SAM/BAM file by mapping location.

  • --sortmem, -M [size]

    Maximum memory used for sorting.

    Default value: 1G
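
The --withflag and --withoutflag options take SAM flag bits. As a sketch, the helpers below turn readable names into a combined bitmask using the bit values from the SAM specification. The function names are hypothetical and not part of DeeZ. Passing the mask as a decimal number is an assumption, as is the way DeeZ treats a mask with several bits (all of them or any of them).

```shell
# Hypothetical helpers (not part of DeeZ).

# Map a flag name to its bit value as defined in the SAM specification.
flag_bit() {
    case $1 in
        paired)        echo 1 ;;
        proper_pair)   echo 2 ;;
        unmapped)      echo 4 ;;
        mate_unmapped) echo 8 ;;
        reverse)       echo 16 ;;
        secondary)     echo 256 ;;
        qc_fail)       echo 512 ;;
        duplicate)     echo 1024 ;;
        supplementary) echo 2048 ;;
        *) echo "unknown flag name: $1" >&2; return 1 ;;
    esac
}

# OR together the bits of all names given as arguments; print the decimal mask.
flag_mask() {
    mask=0
    for name in "$@"; do
        bit=$(flag_bit "$name") || return 1
        mask=$((mask | bit))
    done
    echo "$mask"
}

# Possible use (requires deez on PATH; reference.fa is a placeholder name):
#   deez -r reference.fa input.dz -F "$(flag_mask unmapped)" -c
```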

Support

Contact & Support

Feel free to send any inquiries to inumanag at sfu dot oh canada or fhach at sfu dot oh canada.

Authors

DeeZ has been brought to you by:

from the Lab for Computational Biology at Simon Fraser University.

Funding

  • NSERC Discovery Grant


  • Vanier Canada Graduate Fellowships


Licence

Copyright (c) 2013, 2014, Simon Fraser University, Indiana University Bloomington. All rights reserved. Redistribution and use in source and binary forms, with or without modification, are permitted provided that the following conditions are met:

  • Redistributions of source code must retain the above copyright notice, this list of conditions and the following disclaimer.
  • Redistributions in binary form must reproduce the above copyright notice, this list of conditions and the following disclaimer in the documentation and/or other materials provided with the distribution.
  • Neither the name of the Simon Fraser University, Indiana University Bloomington nor the names of its contributors may be used to endorse or promote products derived from this software without specific prior written permission.

THIS SOFTWARE IS PROVIDED BY THE COPYRIGHT HOLDERS AND CONTRIBUTORS “AS IS” AND ANY EXPRESS OR IMPLIED WARRANTIES, INCLUDING, BUT NOT LIMITED TO, THE IMPLIED WARRANTIES OF MERCHANTABILITY AND FITNESS FOR A PARTICULAR PURPOSE ARE DISCLAIMED. IN NO EVENT SHALL THE COPYRIGHT OWNER OR CONTRIBUTORS BE LIABLE FOR ANY DIRECT, INDIRECT, INCIDENTAL, SPECIAL, EXEMPLARY, OR CONSEQUENTIAL DAMAGES (INCLUDING, BUT NOT LIMITED TO, PROCUREMENT OF SUBSTITUTE GOODS OR SERVICES; LOSS OF USE, DATA, OR PROFITS; OR BUSINESS INTERRUPTION) HOWEVER CAUSED AND ON ANY THEORY OF LIABILITY, WHETHER IN CONTRACT, STRICT LIABILITY, OR TORT (INCLUDING NEGLIGENCE OR OTHERWISE) ARISING IN ANY WAY OUT OF THE USE OF THIS SOFTWARE, EVEN IF ADVISED OF THE POSSIBILITY OF SUCH DAMAGE.