VarGoats: data access

Data access

Last update: 1372 goats - November 9th 2020. Data access (authentication required): Shared Data.

13 directories available:

The BQSR directory contains the VCF file used for GATK base recalibration
The VQSR directory contains all the VCF files used for GATK variant recalibration (see README for details)
The Vargoats_1159_20201109 directory contains:

VCF file for indels (VQSR filtered) - 13 607 850 indels
VCF file for SNPs (VQSR filtered) - 114 244 623 SNPs
30 VCF files split by chromosome (VQSR + GATK QUAL>100 + countVariant()>=2 + biallelic) - 74 274 427 SNPs
MD5 file to check the integrity of the download process.
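The MD5 check can be scripted. The following is a minimal sketch, assuming the MD5 file uses the common md5sum layout (one "<digest>  <filename>" pair per line); the actual layout of the VarGoats MD5 files is not described here, and the function names and file names are hypothetical.

```python
import hashlib
from pathlib import Path


def md5_of_file(path, chunk_size=1 << 20):
    """Return the hex MD5 digest of a file, read in chunks to limit memory use."""
    digest = hashlib.md5()
    with open(path, "rb") as handle:
        for chunk in iter(lambda: handle.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def parse_md5_manifest(text):
    """Parse md5sum-style lines (assumed layout: '<digest>  <filename>')."""
    expected = {}
    for line in text.splitlines():
        line = line.strip()
        if not line:
            continue
        checksum, name = line.split(maxsplit=1)
        # md5sum marks binary mode with a leading '*' on the file name.
        if name.startswith("*"):
            name = name[1:]
        expected[name] = checksum.lower()
    return expected


def verify_downloads(manifest_path, directory):
    """Map each listed file name to True if it exists and its MD5 matches."""
    expected = parse_md5_manifest(Path(manifest_path).read_text())
    results = {}
    for name, checksum in expected.items():
        target = Path(directory) / name
        results[name] = target.is_file() and md5_of_file(target) == checksum
    return results
```

Calling verify_downloads("checksums.md5", ".") (hypothetical file name) returns a dictionary in which any False value points to a missing or corrupted download.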

30 VCF files split by chromosome (see README for filtering parameters) - 26 751 539 SNPs
MD5 file to check the integrity of the download process.

VCF file for indels (VQSR filtered) - 13 820 197 indels
30 VCF files split by chromosome (VQSR + GATK QUAL>100 + countVariant()>=2 + biallelic) - 77 280 295 SNPs
MD5 file to check the integrity of the download process.
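To make the filter string "VQSR + GATK QUAL>100 + countVariant()>=2 + biallelic" concrete, here is a rough sketch of the same criteria applied to a single VCF data line. It is an illustration only, not the pipeline used to build the dataset: it assumes VQSR-passing records carry FILTER=PASS, that GT is the first FORMAT key, and that countVariant() counts samples carrying at least one non-reference allele. The function names are hypothetical.

```python
def count_variant_samples(sample_fields):
    """Count samples whose genotype has at least one non-reference allele."""
    count = 0
    for sample in sample_fields:
        genotype = sample.split(":", 1)[0]  # assumes GT is the first FORMAT key
        alleles = genotype.replace("|", "/").split("/")
        if any(allele not in ("0", ".") for allele in alleles):
            count += 1
    return count


def passes_filters(vcf_line, min_qual=100.0, min_variant_samples=2):
    """Return True if a tab-separated VCF data line meets all four criteria."""
    fields = vcf_line.rstrip("\n").split("\t")
    ref, alt, qual, filt = fields[3], fields[4], fields[5], fields[6]
    if filt != "PASS":  # assumption: VQSR-passing records are marked PASS
        return False
    if qual == "." or float(qual) <= min_qual:  # strict QUAL > 100
        return False
    if len(ref) != 1 or alt not in ("A", "C", "G", "T"):  # biallelic SNP only
        return False
    return count_variant_samples(fields[9:]) >= min_variant_samples
```

In practice such filtering would normally be done with GATK, SnpSift or bcftools on the compressed files; the sketch only spells out what each criterion means.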

30 VCF files split by chromosome (see README for filtering parameters) - 28 645 747 SNPs
MD5 file to check the integrity of the download process.

29 VCF files split by chromosome - haplotype phasing performed with Beagle 5.3
beagle5.3_phasing_summary text file.

The Vargoats_1372_20220809_imputed_ancient directory
The Vargoats_1372_20230313 directory contains the same VCF files as the Vargoats_1372_20201008 directory, reannotated with snpEff (version 5.1) against ARS1.105
The Vargoats_1372_Relabeling_and_Subsampling directory
The Vargoats_1372_20201008_929K_phased_dataset directory
The Balanced_datasets directory
Note: For Vargoats_1372_20201008... directories, the VCF annotation was done using snpEff (version 4.3t) and the NCBI Capra hircus annotation release 102.

ID: Internal animal name

MeanDP: Mean depth of coverage (from VCF file)

MeanGQ: Mean genotype quality (from VCF file)

Ts/Tv: Ratio of transitions to transversions (from SnpSift TsTv)

HomoRef: Hom/Het stats Homozygous ref. (from SnpSift TsTv)

OneAlt: Hom/Het stats One ALT (from SnpSift TsTv)

TwoAlt: Hom/Het stats Two ALTs (from SnpSift TsTv)

Missing: Hom/Het stats Missing (from SnpSift TsTv)

SNP: Variant type SNP (from SnpSift TsTv)
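As a reminder of what the Ts/Tv column measures, the sketch below computes a transition/transversion ratio from (REF, ALT) pairs of single-base substitutions. It is an illustration of the definition only, not a reimplementation of SnpSift TsTv; the function names are hypothetical.

```python
PURINES = {"A", "G"}
PYRIMIDINES = {"C", "T"}


def is_transition(ref, alt):
    """True for purine<->purine or pyrimidine<->pyrimidine substitutions."""
    pair = {ref.upper(), alt.upper()}
    return len(pair) == 2 and (pair <= PURINES or pair <= PYRIMIDINES)


def ts_tv_ratio(snps):
    """Return transitions / transversions, or None if there are no transversions."""
    transitions = transversions = 0
    for ref, alt in snps:
        if ref.upper() == alt.upper():
            continue  # not a substitution
        if is_transition(ref, alt):
            transitions += 1
        else:
            transversions += 1
    if transversions == 0:
        return None
    return transitions / transversions
```

For example, two transitions (A>G, C>T) and one transversion (G>T) give a ratio of 2.0.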

Description: Distribution of sequenced individuals per breed and abbreviations explanation
Download: [XLSX]

Description: Description of each individual (species, breed, country of origin, localization, sex, sample provider and details about its sequence)
Download: [XLSM]
Note on the AdaptMap ID column (only applicable to animals already present in the 1159 dataset):
- 25 problematic individuals were labeled with * (2 outliers with unconfirmed breed status were labeled as UNK in the Working name column)
- 11 animals with missing genotypes were labeled with *#
- 14 animals with low concordance rate were labeled with *§