Impact of Mark Duplicate Reads During Variant Calling in Next Generation Sequencing (NGS) Data of Pistacia vera L.

Harun KARCI, Salih KAFKAS

  •  Year : 2024
  •  Vol : 4
  •  No : 1
  •  Page : 11-18
Pistachio (Pistacia vera L.) is a member of the Anacardiaceae family and the only cultivated species of the genus Pistacia. P. vera is dioecious, although a few hermaphrodite and monoecious flowering types occur within Pistacia. Pistachio breeding is a long process because of several limiting factors, such as the dioecious flowering habit, a long juvenile period, and alternate bearing. Recently, chromosome-level pistachio genome assemblies have been released, with a genome size of about 600 Mb. In the current paper, Next Generation Sequencing (NGS) data of the Siirt cultivar were analyzed to assess the impact of ignoring duplicate reads at the variant calling stage. About 5.2 Gb of data was used to detect short InDels and SNPs. The highest mapping rate was 99.83%, and about 35 million reads were successfully aligned to the reference. Filtering was applied for mapping quality (MQ>30) and read coverage depth (DP>2). In total, 7.18% of the reads (2.5 million reads) were duplicates. The BAM file without MarkDuplicates (MD) yielded 1,022,161 SNPs and 124,762 InDels, whereas the BAM file with MD yielded 1,050,788 SNPs and 128,109 InDels. The two VCF files were compared by position. Variants at the same position were recorded separately as identical or as different (same reference allele but different alternate alleles). In addition, skipping the MD step before variant calling caused the loss of 42,413 true negative loci (TNL) and the gain of 10,439 false positive loci (FPL). Therefore, MD is an important step of variant calling in all organisms and should be carried out to eliminate false positive loci. The results of the present study can be useful for variant detection in future breeding programs.
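The position-based comparison of the two VCF files described above can be illustrated with a minimal Python sketch. It is not the authors' pipeline: the function name `compare_by_position`, the dictionary layout, and the toy records are all hypothetical, and real VCF files would first need to be parsed into this form.

```python
from typing import Dict, List, Tuple

Key = Tuple[str, int]        # (chromosome, position)
Alleles = Tuple[str, str]    # (reference allele, alternate allele)


def compare_by_position(
    no_md: Dict[Key, Alleles], with_md: Dict[Key, Alleles]
) -> Dict[str, List[Key]]:
    """Classify variant positions from two call sets (hypothetical helper)."""
    shared = no_md.keys() & with_md.keys()
    return {
        # Same position, same reference and alternate alleles.
        "same": sorted(k for k in shared if no_md[k] == with_md[k]),
        # Same position and reference allele, different alternate allele.
        "different_alt": sorted(
            k for k in shared
            if no_md[k][0] == with_md[k][0] and no_md[k][1] != with_md[k][1]
        ),
        # Positions called in only one of the two sets.
        "only_no_md": sorted(no_md.keys() - with_md.keys()),
        "only_with_md": sorted(with_md.keys() - no_md.keys()),
    }


# Toy records for illustration only; these are not data from the study.
no_md_calls = {
    ("chr1", 100): ("A", "G"),
    ("chr1", 250): ("C", "T"),
    ("chr2", 40): ("G", "A"),
    ("chr2", 90): ("T", "C"),
}
with_md_calls = {
    ("chr1", 100): ("A", "G"),
    ("chr1", 250): ("C", "G"),
    ("chr2", 40): ("G", "A"),
    ("chr3", 15): ("A", "T"),
}

result = compare_by_position(no_md_calls, with_md_calls)
for label, positions in result.items():
    print(label, positions)
```

In this sketch, positions found only in the call set without MD correspond to candidate false positives, and positions found only in the call set with MD correspond to loci that are lost when MD is skipped, under the assumption that the MD-based calls are treated as the reference set.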
Cite this Article As : Karcı, H. & Kafkas, S. (2024). Impact of mark duplicate reads during variant calling in next generation sequencing (NGS) data of Pistacia vera L. Eregli Journal of Agricultural Sciences, 4(1), 11-18. https://doi.org/10.54498/ETBD.2024.29

Description : None of the authors has a financial interest in any product, device, or drug mentioned in this article. The research was not supported by any external organization. The authors agree to grant full access to the primary data of the study and to allow the journal to examine the data upon request.
Research Article, Vol. 4 (1)
Received : 03.01.2024, Accepted : 28.06.2024 , Published Online : 28.06.2024
Ereğli Tarım Bilimleri Dergisi
E-ISSN: 2822-4167