Data access
Last update:
1372 goats - November 9th 2020.
Data access (authentication required):
Shared Data.
8 directories available:
- The Genome directory contains:
- genome FASTA file (29 chromosomes + MT + 29 877 scaffolds = 29 907 sequences)
- DICT file
- tab-delimited text file (chromosome_name, length) sorted by length.
- The BQSR directory contains the VCF file used for GATK base quality score recalibration
- The VQSR directory contains all the VCF files used for GATK variant quality score recalibration (see README for details)
- The Vargoats_1159_20201109 directory contains:
- VCF file for indels (VQSR filtered) - 13 607 850 indels
- VCF file for SNPs (VQSR filtered) - 114 244 623 SNPs
- 30 VCF files split by chromosome (VQSR + GATK QUAL>100 + countVariant()>=2 + biallelic) - 74 274 427 SNPs
- MD5 file to check the integrity of the download process.
- The Vargoats_1159_20201109_filtered_20210201 directory contains:
- 30 VCF files split by chromosome (see README for filtering parameters) - 26 751 539 SNPs
- MD5 file to check the integrity of the download process.
- The Vargoats_1372_20201008 directory contains:
- VCF file for indels (VQSR filtered) - 13 820 197 indels
- 30 VCF files split by chromosome (VQSR + GATK QUAL>100 + countVariant()>=2 + biallelic) - 77 280 295 SNPs
- MD5 file to check the integrity of the download process.
- The Vargoats_1372_20201008_filtered_20210716 directory contains:
- 30 VCF files split by chromosome (see README for filtering parameters) - 28 645 747 SNPs
- MD5 file to check the integrity of the download process.
- The Vargoats_1372_20201008_filtered_phasedBeagle5.3 directory contains:
- 29 VCF files split by chromosome - Haplotype phasing performed with Beagle 5.3
- beagle5.3_phasing_summary text file.
- The Vargoats_1372_20230313 directory contains the same VCF files as the Vargoats_1372_20201008 directory, but reannotated with snpEff (version 5.1) against ARS1.105
- Note: For the Vargoats_1372_20201008... directories, the VCF annotation was done using snpEff (version 4.3t) and the NCBI Capra hircus annotation release 102.
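Several of the directories above include an MD5 file for checking that a download completed intact. A minimal Python sketch of such a check follows. It assumes the MD5 file uses the common `md5sum` layout (one `<hex digest>  <file name>` pair per line); the actual layout and file names are not described here, so the function names and any paths passed to them are hypothetical.

```python
import hashlib
from pathlib import Path


def md5_of_file(path, chunk_size=1 << 20):
    """Return the hex MD5 digest of a file, read in chunks to limit memory use."""
    digest = hashlib.md5()
    with open(path, "rb") as handle:
        for chunk in iter(lambda: handle.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def parse_md5_listing(text):
    """Parse md5sum-style lines ("<digest>  <name>") into {name: digest}.

    Assumption: the MD5 file follows the md5sum layout; adjust if it differs.
    """
    expected = {}
    for line in text.splitlines():
        line = line.strip()
        if not line or line.startswith("#"):
            continue
        digest, name = line.split(maxsplit=1)
        # md5sum marks binary mode with a leading "*" on the file name.
        expected[name.lstrip("*")] = digest.lower()
    return expected


def verify_directory(directory, md5_file):
    """Compare each listed file with its expected digest.

    Returns {name: "ok" | "mismatch" | "missing"}.
    """
    directory = Path(directory)
    expected = parse_md5_listing(Path(md5_file).read_text())
    report = {}
    for name, digest in expected.items():
        target = directory / name
        if not target.is_file():
            report[name] = "missing"
        elif md5_of_file(target) == digest:
            report[name] = "ok"
        else:
            report[name] = "mismatch"
    return report
```

Calling `verify_directory` on a downloaded directory and its MD5 file returns one status per listed file, so any file reported as "mismatch" or "missing" can be downloaded again.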
Summary information on available data (based on 1372 goats - 77 280 295 SNPs)
- ID: Internal animal name
- MeanDP: Mean depth of coverage (from VCF file)
- MeanGQ: Mean genotype quality (from VCF file)
- Ts/Tv: Ratio of transitions to transversions (from SnpSift TsTv)
- HomoRef: Hom/Het stats Homozygous ref. (from SnpSift TsTv)
- OneAlt: Hom/Het stats One ALT (from SnpSift TsTv)
- TwoAlt: Hom/Het stats Two ALTs (from SnpSift TsTv)
- Missing: Hom/Het stats Missing (from SnpSift TsTv)
- SNP: Variant type SNP (from SnpSift TsTv)
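The Ts/Tv column is the ratio of transitions (A<->G, C<->T) to transversions (all other single-base substitutions). The Python sketch below illustrates that definition only; it is not how SnpSift computes the value internally, and the function name and the (REF, ALT) input layout are assumptions made for the example.

```python
BASES = {"A", "C", "G", "T"}
# Purine<->purine and pyrimidine<->pyrimidine substitutions are transitions.
TRANSITIONS = {("A", "G"), ("G", "A"), ("C", "T"), ("T", "C")}


def ts_tv_ratio(snps):
    """Return the transition/transversion ratio for (ref, alt) base pairs.

    Returns None when there are no transversions (ratio undefined).
    """
    ts = 0
    tv = 0
    for ref, alt in snps:
        ref, alt = ref.upper(), alt.upper()
        if ref not in BASES or alt not in BASES or ref == alt:
            raise ValueError(f"not a single-base substitution: {ref}>{alt}")
        if (ref, alt) in TRANSITIONS:
            ts += 1
        else:
            tv += 1
    if tv == 0:
        return None
    return ts / tv
```

For example, two transitions and one transversion, as in `ts_tv_ratio([("A", "G"), ("C", "T"), ("A", "C")])`, give a ratio of 2.0.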
Number of individuals per breed and country of origin (Supplementary Table S1)
- Description: Distribution of sequenced individuals per breed, with an explanation of the abbreviations
- Download: [XLSX]
Detailed information for each sequenced individual (Supplementary Table S2)
- Description: Description of each individual (species, breed, country of origin, location, sex, sample provider and details about its sequence)
- Download: [XLSM]
- Note on the AdaptMap ID column (only applicable to animals already present in the 1159 dataset):
- 25 problematic individuals were labeled with * (2 outliers with unconfirmed breed status were labeled as UNK in the Working name column)
- 11 animals with missing genotypes were labeled with *#
- 14 animals with low concordance rate were labeled with *§
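When the table is processed programmatically, the markers in the AdaptMap ID column can be separated from the identifier itself. The Python sketch below assumes the markers (`*`, `*#`, `*§`) are appended to the end of the ID string; this placement is not stated here, and the function name, category names and example IDs are hypothetical.

```python
def classify_adaptmap_id(raw_id):
    """Split an AdaptMap ID into (clean_id, flag).

    Assumption: markers are suffixes on the ID. The flag is one of
    "missing_genotypes" (*#), "low_concordance" (*§),
    "problematic" (bare *), or None when no marker is present.
    """
    raw = raw_id.strip()
    # Check the two-character markers first, since both start with "*".
    if raw.endswith("*#"):
        return raw[:-2].strip(), "missing_genotypes"
    if raw.endswith("*§"):
        return raw[:-2].strip(), "low_concordance"
    if raw.endswith("*"):
        return raw[:-1].strip(), "problematic"
    return raw, None
```

With made-up IDs, `classify_adaptmap_id("ID001*#")` yields `("ID001", "missing_genotypes")`, and an unmarked ID is returned unchanged with a flag of `None`.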