We specialize in processing raw sequencing reads and cleaning them of adapters, primers, barcodes and various artefacts. The corrected data assemble better in de novo assemblies and also map better to a reference sequence. Our custom software was tested on the following types of data.
Supported IonTorrent datasets
From our perspective, two groups of IonTorrent/IonProton datasets exist.
- IonTorrent adopted the adapters used in 454 protocols (did you know the company was started by former 454 Life Sciences employees?). Therefore, our pipeline, crafted to work on 454-based data, can smoothly process these IonTorrent datasets as well and remove forgotten adapters, their remnants and other artefacts.
- As an example, the project SRP010796 (Salmonella montevideo strain MB110209-0055) consists of several datasets, one of which is stored under accession SRR408493 (IonTorrentPGM sequencing, 520 flows during sequencing). As already mentioned, the adapters used were in principle the same as in the Roche/454 Titanium General library protocol. The sample was processed using Ion Torrent fragment library kit III. For platform comparison, the authors also generated Roche GS FLX data (SRR359782).
- Loman et al., 2012 used an IonTorrentPGM machine with 260 sequencing flows for the SRP009823 project of Escherichia coli O104:H4 STEC strain 280, with sequence files SRR389193 and SRR389194. Again, the adapters were the same as in the Roche/454 Titanium General library protocol. For platform comparison they also generated Illumina MiSeq data (SRX111764) and Roche GS FLX/Junior data (SRA048574).
- Barcoded datasets from IonTorrentPGM, 520 flows during sequencing:…
- SRP039521 Listeria fleischmannii FSL S10-1203: the Ion Xpress Plus Fragment Library Kit (200 bp) was used for library preparation and barcoded adapters were ligated afterwards
- SRP039516 Listeriaceae bacterium FSL S10-1204: the Ion Xpress Plus Fragment Library Kit (200 bp) was used for library preparation and barcoded adapters were ligated afterwards
- DRP000735 Clostridium saccharogumia VE202-01
- More examples: Staphylococcus aureus ST93-MRSA-IV isolates (SRP004474), …
- IonProton datasets with 640 sequencing flows (SRP039519 Listeria weihenstephanensis FSL R9-0317 and SRP039517 Listeria rocourtiae FSL F6-920)
- In some cases where custom adapters were used we can still help, for example with transcriptomic datasets from samples pre-processed via Clontech/Evrogen/full-length cDNA protocols, and maybe via some others as well. Please refer to the pre-processing protocols section of the Supported sample pre-processing methods page. In brief, these transcriptomic protocols introduce zillions of adapters/artefacts into the “sample” sequence and it is a huge mess. It does not matter which technology was later used to sequence the “sample” decorated by these adapters/artefacts. For the sequencing itself, technology-specific adapters are ligated onto the “sample” (we do NOT look for these using our software but use some 3rd party tools instead), but the inner cassette with the “decorated sample” stays just the same. Therefore, removing the decorating, inner-most adapters/artefacts surrounding the original sample insert also removes the sequencing-technology-specific adapters, which are located further outwards. So, we can clean these datasets as well.
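The nesting argument from the last item can be illustrated with a toy sketch in Python. This is not our production software: the function name `trim_to_insert` and all adapter sequences below are hypothetical placeholders made up for the illustration, and the matching is exact, whereas real reads need error-tolerant matching and handling of adapter remnants. The point is only that cutting at the inner adapters discards everything located further outwards, whatever the sequencing technology was.

```python
def trim_to_insert(read: str, inner_5p: str, inner_3p: str) -> str:
    """Keep only the sequence enclosed by the inner adapters.

    Toy version: exact string matching, first occurrence only.
    """
    # Cut away everything up to and including the inner 5' adapter.
    # Any technology-specific adapter sits before it and goes away too.
    pos = read.find(inner_5p)
    if pos != -1:
        read = read[pos + len(inner_5p):]

    # Cut away the inner 3' adapter and everything after it.
    pos = read.find(inner_3p)
    if pos != -1:
        read = read[:pos]
    return read


# All sequences below are made-up placeholders, not real adapter sequences.
OUTER_5P = "CCATCTCATC"   # hypothetical sequencing-technology adapter
OUTER_3P = "GATGAGATGG"   # hypothetical sequencing-technology adapter
INNER_5P = "AAGCAGTGGT"   # hypothetical cDNA-protocol adapter
INNER_3P = "ACCACTGCTT"   # hypothetical cDNA-protocol adapter
INSERT = "ATGGCCATTGTAATGGG"  # the original sample insert

raw_read = OUTER_5P + INNER_5P + INSERT + INNER_3P + OUTER_3P
cleaned = trim_to_insert(raw_read, INNER_5P, INNER_3P)

# The outer adapters were never searched for, yet they are gone.
assert cleaned == INSERT
```

Swapping `OUTER_5P` and `OUTER_3P` for any other placeholder gives the same cleaned insert, which is why the choice of sequencing technology does not matter for this step.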
We offer data cleanup, error correction, normalization, assembly, reference mapping, variant calling and annotation.