ngs_mapper is a configurable next-generation sequencing pipeline that aims to be easy for the bioinformatician to install and use. It focuses on documentation as well as easy configuration and running. The pipeline is also meant to get you from data to a completed genome as easily as possible.
The documentation is a work in progress, as the aim is to keep it as up to date as possible with the pipeline as it changes. If anything seems out of place or wrong, please open a bug report on GitHub.
- Data Structure
- primer file
- Running the Pipeline
- Pipeline Info
- TODO List
- Continuous Delivery support added for travis
- nfilter now simply creates symlinks if no options are supplied, essentially skipping itself
- nfilter utilizes threads from config file
- Froze the versions of all dependencies to avoid unwanted side effects when their authors release updates
- config file now has THREADS default
- fix for bug where some MiSeq reads were not identified correctly in tagreads
- convert functions now support output directory
- bug fix for nfilter symlinking
- fix for qsub job output from runsample
- no longer name files with filtered. prefix
- Pipeline now works with fasta files using the --fasta flag
- Pipeline now handles gzip (.gz) input files
- Pipeline now handles ab1 input files
- Added Zenodo badge
- IGV is installed with pipeline
- samtools version reverted to the version used before 1.4.0
- Installation now utilizes miniconda to handle system dependencies such as bwa, samtools, trimmomatic, and imagemagick. This is a substantial difference and will require a complete reinstall of the pipeline to upgrade. Installing via miniconda removes a lot of code that needed to be maintained, streamlines the installation, and makes it much faster.
- Added install.sh that makes installing/upgrading much easier. The tests also use this so the installation is tested much better now.
- Pipeline utilizes requirements-conda.txt to determine Python and system software dependencies. This allows specifying versions and removes the need for a system administrator to install them.
- runsample now supports the --primer-file option and other primer trimming options, which utilize trimmomatic’s ILLUMINACLIP option
- runsamplesheet.sh supports an optional additional column in a given samplesheet that represents the primer fasta file to use to find sequences to trim out.
- Pipeline now looks for the number of threads instead of CPU cores. On systems with hyperthreading, this means that twice as many samples will run in parallel as before.
- Fixed bug where some parts of pipeline were not logging at all
- Fixed bug where graphs.sh could fail, yet pipeline would continue as if nothing was wrong
- Updated functional tests to include primer test
- Updated functional tests to output more information
- Added ngs_filter stage/script that can filter based on index fastq files as well as reads that contain an N. This stage is off by default.
- Fixed a bug where some scripts were not logging properly
- Fixed documentation issue with umask for sync user
- Added travis-ci support to automatically run tests when code is pushed to github
- Projects now default to running inside of a temporary directory inside of the specified output directory (-od)
- runsample now sets TMPDIR to tmpdir inside of output directory so that all analysis is run within that directory
- runsample accepts the --qsub_m and --qsub_l options, which direct it to return a PBS qsub job that can be piped into qsub
- Added Python 2.6 support
- Removed all occurrences of bqd.mpileup and replaced them with samtools.mpileup
- Changed bqd.parse_pileup such that it utilizes samtools.MPileupColumn to generate the dictionary items
- Removed legacy BamCoverage code that is not used anywhere
- Added support to select reads by specific platforms in runsample.py
- Fixed bug where MiSeq Index reads were being included in the mapping
- Renamed unpaired read file name that is produced by trim_reads from a generic Roche454 read name to simply unpaired_trimmed.fastq
- Added reflen to qualdepth.json files since length only told you the length of the assembly and not the reference.
- Fixed issue where coverage graphic was not drawing gap lines at the end of references because there was no data.
- sample_coverage colors were hard to distinguish, so they were changed
- Fixed bug in sample_coverage where certain combinations of the number of references and the number of samples would generate a graphic in which the sub-plots for each reference overlapped
- Fixed incorrect command in doc/README.rst for how to open documentation with Firefox
- Fixed issue with sample_coverage’s usage statement and arguments description
- Fixed issue where graphsample.py would raise an exception when no reads mapped
- Fixed an issue that occurred when the specified path containing read files also contained directories
- Replaced all .py scripts with scripts of the same name without the .py extension. This is the correct way to provide executable scripts for Python. For example, runsample.py is now just runsample
- Documentation updates
- Platforms now identified via identifiers inside read files instead of filenames
- IonTorrent sync added
- Various bug fixes
- base_caller.py can now utilize multiple processes to speed up analysis
- Documentation now installs with the pipeline
- run_bwa no longer makes temp directory but instead uses output path
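
The nfilter and ngs_filter entries above describe two behaviors: dropping reads that contain an N, and falling back to a plain symlink when no filtering options are given. The following is a minimal standalone sketch of that idea, not the pipeline's actual code. The names `read_fastq`, `filter_or_link`, and `drop_ns` are hypothetical and chosen only for illustration, and the sketch assumes uncompressed FASTQ input with exactly four lines per record.

```python
import os


def read_fastq(handle):
    """Yield (header, sequence, plus, quality) from a four-line-per-record FASTQ."""
    while True:
        header = handle.readline().rstrip("\n")
        if not header:
            return  # end of file
        seq = handle.readline().rstrip("\n")
        plus = handle.readline().rstrip("\n")
        qual = handle.readline().rstrip("\n")
        yield header, seq, plus, qual


def filter_or_link(src, dst, drop_ns=False):
    """Hypothetical helper: filter src into dst, or symlink when nothing is requested.

    Returns the number of reads that were dropped.
    """
    if not drop_ns:
        # No filtering option supplied: the stage effectively skips itself.
        os.symlink(os.path.abspath(src), dst)
        return 0

    dropped = 0
    with open(src) as fin, open(dst, "w") as fout:
        for header, seq, plus, qual in read_fastq(fin):
            if "N" in seq.upper():
                dropped += 1  # read contains an ambiguous base
                continue
            fout.write("\n".join((header, seq, plus, qual)) + "\n")
    return dropped
```

Calling `filter_or_link("reads.fastq", "out.fastq")` with no option only creates a symlink, so an unconfigured filter stage costs almost nothing; passing `drop_ns=True` writes a new file without the N-containing reads.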