We provide three types of datasets, each satisfying two of the three desired properties (human, real, and comprehensive): synthetic (human and comprehensive), mouse (real and comprehensive), and sampled human (human and real).
Click the file type links to see the individual files. Then click a file's link to download it, or copy its URL to download it with wget or your preferred utility.
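For scripted downloads, the copied URL can also be fetched from Python. The following is only a minimal sketch using the standard library: the function name `download` and the example URL in the comment are hypothetical and not part of this site. It streams the response to disk rather than loading it into memory, since read files can be large.

```python
import shutil
import urllib.request
from pathlib import Path


def download(url: str, dest_dir: str = ".") -> Path:
    """Stream the file at `url` into `dest_dir` and return the local path."""
    # Use the last path component of the URL as the local file name.
    name = url.rstrip("/").rsplit("/", 1)[-1]
    target = Path(dest_dir) / name
    target.parent.mkdir(parents=True, exist_ok=True)
    # Copy in chunks so large read files never sit fully in memory.
    with urllib.request.urlopen(url) as response, open(target, "wb") as out:
        shutil.copyfileobj(response, out)
    return target


# Hypothetical usage; replace the placeholder with a URL copied from a file link:
# download("https://example.org/path/to/reads.fastq.gz", dest_dir="data")
```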
Venter: Synthetic reads generated by TVSim using Venter's variants from HuRef.
Contaminated Venter: The Venter reads, contaminated with synthetic reads produced from Watson's variants at a rate of 10%.
Mouse: Short reads from the homozygous mouse strain C57BL/6.
Contaminated Mouse: B6 mouse reads contaminated at a rate of 10% with NA12878 reads generated by the same sequencer.
NA12878: High-coverage reads from NA12878.
Contaminated NA12878: NA12878 reads contaminated with reads from NA12877 at a rate of 10%.
NA18507: High-coverage reads from NA18507.
NA19240: High-coverage reads from NA19240.
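To illustrate what a 10% contamination rate can mean, here is a minimal sketch of mixing two read sets. It is not the procedure used to build the contaminated datasets above. It assumes that "rate" is the contaminant's share of the final mixture and that the total read count stays equal to the size of the primary set. The function `contaminate` and its parameters are hypothetical names chosen for this example.

```python
import random


def contaminate(primary, contaminant, rate=0.10, seed=0):
    """Return a shuffled mix in which about `rate` of the reads are contaminant.

    Assumption: `rate` is the contaminant share of the final mixture, and the
    mixture has as many reads as `primary`.
    """
    if not 0.0 <= rate <= 1.0:
        raise ValueError("rate must be between 0 and 1")
    rng = random.Random(seed)  # fixed seed keeps the mix reproducible
    total = len(primary)
    n_contaminant = round(total * rate)
    if n_contaminant > len(contaminant):
        raise ValueError("not enough contaminant reads for the requested rate")
    # Draw without replacement from each source, then interleave randomly.
    mixed = rng.sample(primary, total - n_contaminant)
    mixed += rng.sample(contaminant, n_contaminant)
    rng.shuffle(mixed)
    return mixed


# Toy reads standing in for real sequencing records.
primary_reads = [f"primary_{i}" for i in range(100)]
contaminant_reads = [f"contaminant_{i}" for i in range(50)]
mixed_reads = contaminate(primary_reads, contaminant_reads, rate=0.10)
```

In practice the reads would come from FASTQ or BAM files instead of in-memory lists, and paired-end mates would need to be sampled together.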