Data access

Last update: 1372 goats, November 9th 2020. Data access (authentication required): Shared Data.
8 directories available:
  1. The Genome directory contains:
    • genome FASTA file (29 chromosomes + MT + 29 877 scaffolds = 29 907 sequences)
    • DICT file
    • tab-delimited text file (chromosome_name, length) sorted by length.
  2. The BQSR directory contains the VCF file used for GATK base quality score recalibration (BQSR).
  3. The VQSR directory contains all the VCF files used for GATK variant quality score recalibration (VQSR); see the README for details.
  4. The Vargoats_1159_20201109 directory contains:
    • VCF file for indels (VQSR filtered) - 13 607 850 indels
    • VCF file for SNPs (VQSR filtered) - 114 244 623 SNPs
    • 30 per-chromosome VCF files (VQSR + GATK QUAL>100 + countVariant()>=2 + biallelic) - 74 274 427 SNPs
    • MD5 file to check the integrity of the download process.
  5. The Vargoats_1159_20201109_filtered_20210201 directory contains:
    • 30 per-chromosome VCF files (see the README for filtering parameters) - 26 751 539 SNPs
    • MD5 file to check the integrity of the download process.
  6. The Vargoats_1372_20201008 directory contains:
    • VCF file for indels (VQSR filtered) - 13 820 197 indels
    • 30 per-chromosome VCF files (VQSR + GATK QUAL>100 + countVariant()>=2 + biallelic) - 77 280 295 SNPs
    • MD5 file to check the integrity of the download process.
  7. The Vargoats_1372_20201008_filtered_20210716 directory contains:
    • 30 per-chromosome VCF files (see the README for filtering parameters) - 28 645 747 SNPs
    • MD5 file to check the integrity of the download process.
  8. The Vargoats_1372_20201008_filtered_phasedBeagle5.3 directory contains:
    • 29 per-chromosome VCF files, haplotype-phased with Beagle 5.3
    • a beagle5.3_phasing_summary text file.
  Note: the VCF files were annotated with snpEff (version 4.3t) using the NCBI Capra hircus Annotation Release 102.
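Each release directory ships an MD5 file for checking download integrity. A minimal sketch of that check in standard-library Python follows. It assumes the MD5 file uses the common md5sum layout (one "<hex digest>  <filename>" line per file, or "<digest> *<filename>" in binary mode) and that the listed files sit next to it; the exact file names on the server are not stated here.

```python
import hashlib
from pathlib import Path


def md5_of(path, chunk_size=1 << 20):
    """Compute the MD5 hex digest of a file, reading it in 1 MiB chunks."""
    digest = hashlib.md5()
    with open(path, "rb") as handle:
        for chunk in iter(lambda: handle.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def parse_md5_file(md5_path):
    """Parse md5sum-style lines (assumed layout): '<digest>  <name>' or '<digest> *<name>'."""
    expected = {}
    for line in Path(md5_path).read_text().splitlines():
        line = line.strip()
        if not line:
            continue
        digest, name = line.split(maxsplit=1)
        expected[name.lstrip("*")] = digest.lower()
    return expected


def verify_downloads(md5_path):
    """Return {filename: True/False}; files are looked up next to the MD5 file."""
    base = Path(md5_path).parent
    results = {}
    for name, digest in parse_md5_file(md5_path).items():
        target = base / name
        results[name] = target.exists() and md5_of(target) == digest
    return results
```

For example, `verify_downloads("Vargoats_1372_20201008/checksums.md5")` (a hypothetical file name) returns False for any file that is missing or whose content changed during transfer. On Linux, `md5sum -c` on the same file performs the equivalent check.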
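The per-chromosome files in the Vargoats_1159_20201109 and Vargoats_1372_20201008 directories combine four criteria: VQSR, GATK QUAL>100, countVariant()>=2 and biallelic. The sketch below re-expresses these criteria for one uncompressed VCF data line to make them concrete. It is an illustration, not the pipeline that produced the files, and it rests on several assumptions: sites passing VQSR carry PASS in the FILTER column; countVariant() (a SnpSift filter function) is read as the number of samples whose genotype carries at least one non-reference allele; and "biallelic" is read as a single-base REF with exactly one single-base ALT.

```python
BASES = {"A", "C", "G", "T"}


def is_nonref(gt):
    """True if a GT string (e.g. '0/1', '1|1') carries at least one ALT allele."""
    alleles = gt.replace("|", "/").split("/")
    return any(a not in ("0", ".") for a in alleles)


def passes_release_filters(vcf_line, min_qual=100.0, min_variant_samples=2):
    """Illustrative 'VQSR + QUAL>100 + countVariant()>=2 + biallelic' check (not the original pipeline)."""
    fields = vcf_line.rstrip("\n").split("\t")
    ref, alt, qual, filt, fmt = fields[3], fields[4], fields[5], fields[6], fields[8]
    samples = fields[9:]

    # VQSR: assumption that recalibrated sites passing the tranche are marked PASS.
    if filt != "PASS":
        return False
    # GATK site quality strictly above the threshold ('.' means unknown quality).
    if qual == "." or float(qual) <= min_qual:
        return False
    # Biallelic SNP: single-base REF and one single-base ALT ('A,G' is rejected).
    if ref.upper() not in BASES or alt.upper() not in BASES:
        return False
    # countVariant() as assumed above: samples carrying a non-reference genotype.
    gt_index = fmt.split(":").index("GT")
    n_variant = sum(is_nonref(s.split(":")[gt_index]) for s in samples)
    return n_variant >= min_variant_samples
```

Header lines (starting with "#") must be skipped before calling the function. Compressed files can be read line by line with `gzip.open(path, "rt")`.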
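The Genome directory's tab-delimited file lists each sequence name with its length, sorted by length. As an illustration of what that table contains, it can be recomputed from the FASTA with standard-library Python. The sketch assumes an uncompressed FASTA, takes the sequence name to be the first word of each header line, and sorts in descending order; the page does not say which direction the provided file uses.

```python
def fasta_lengths(lines):
    """Return [(name, length)] from FASTA lines; name = first word after '>'."""
    records = []
    name, length = None, 0
    for line in lines:
        line = line.strip()
        if line.startswith(">"):
            if name is not None:
                records.append((name, length))
            name, length = line[1:].split()[0], 0
        elif line:
            length += len(line)
    if name is not None:
        records.append((name, length))
    return records


def length_table(records, descending=True):
    """Tab-delimited 'name<TAB>length' rows sorted by length (descending order is an assumption)."""
    ordered = sorted(records, key=lambda r: r[1], reverse=descending)
    return "\n".join(f"{name}\t{length}" for name, length in ordered)
```

Passing an open file handle to `fasta_lengths` and checking `len(records)` is a quick sanity check against the 29 907 sequences (29 chromosomes + MT + 29 877 scaffolds) stated above.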

Summary statistics per individual (based on 1372 goats and 77 280 295 SNPs)

  • ID: Internal animal name
  • MeanDP: Mean depth of coverage (from VCF file)
  • MeanGQ: Mean genotype quality (from VCF file)
  • Ts/Tv: Ratio of transitions to transversions (from SnpSift TsTv)
  • HomoRef: number of homozygous reference genotypes (Hom/Het stats, from SnpSift TsTv)
  • OneAlt: number of genotypes with one ALT allele (Hom/Het stats, from SnpSift TsTv)
  • TwoAlt: number of genotypes with two ALT alleles (Hom/Het stats, from SnpSift TsTv)
  • Missing: number of missing genotypes (Hom/Het stats, from SnpSift TsTv)
  • SNP: number of variants of type SNP (from SnpSift TsTv)
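To make these column definitions concrete, the sketch below computes comparable per-sample counts from genotype calls. It is not SnpSift's implementation, and SnpSift's exact counting rules may differ. It assumes biallelic SNP sites (as in the 77 280 295-SNP release) and diploid GT strings. Each genotype is classified as homozygous reference, one ALT allele, two ALT alleles, or missing, and a transition (A<->G, C<->T) or transversion is counted at every site where the sample carries an ALT allele.

```python
TRANSITIONS = {("A", "G"), ("G", "A"), ("C", "T"), ("T", "C")}


def classify_gt(gt):
    """Map a diploid GT string to HomoRef / OneAlt / TwoAlt / Missing."""
    alleles = gt.replace("|", "/").split("/")
    if "." in alleles:
        return "Missing"
    n_alt = sum(a != "0" for a in alleles)
    return ("HomoRef", "OneAlt", "TwoAlt")[min(n_alt, 2)]


def sample_stats(sites):
    """sites: iterable of (ref, alt, gt) for one sample at biallelic SNPs (simplified, not SnpSift)."""
    counts = {"HomoRef": 0, "OneAlt": 0, "TwoAlt": 0, "Missing": 0}
    ts = tv = 0
    for ref, alt, gt in sites:
        category = classify_gt(gt)
        counts[category] += 1
        # Only sites where the sample carries an ALT allele contribute to Ts/Tv.
        if category in ("OneAlt", "TwoAlt"):
            if (ref.upper(), alt.upper()) in TRANSITIONS:
                ts += 1
            else:
                tv += 1
    counts["Ts/Tv"] = ts / tv if tv else float("nan")
    return counts
```

For whole-genome data, the Ts/Tv ratio is a common quality indicator: an unusually low value for one individual can flag an excess of false-positive calls.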

Number of individuals per breed and country of origin (Supplementary Table S1)

  • Description: Number of sequenced individuals per breed, with an explanation of the breed abbreviations
  • Download: [XLSX]

Detailed information for each sequenced individual (Supplementary Table S2)

  • Description: Description of each individual (species, breed, country of origin, location, sex, sample provider and sequencing details)
  • Download: [XLSM]
  • Notes on the AdaptMap ID column (applies only to animals already present in the 1159-goat dataset):
    • 25 problematic individuals were labeled with * (2 outliers with unconfirmed breed status were labeled as UNK in the Working name column)
    • 11 animals with missing genotypes were labeled with *#
    • 14 animals with low concordance rate were labeled with *§
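When Supplementary Table S2 is filtered programmatically, the AdaptMap ID flags listed above can be decoded with a small helper. This is a hypothetical sketch. It assumes the marks are appended as a suffix to the AdaptMap ID value (for example, a value ending in "*#"), which the page does not state explicitly. The flag meanings are taken from the notes above.

```python
# Flag meanings as given in the notes on the AdaptMap ID column.
FLAG_MEANINGS = {
    "*#": "missing genotypes",
    "*§": "low concordance rate",
    "*": "problematic individual",
}


def decode_adaptmap_flag(adaptmap_id):
    """Split an AdaptMap ID into (clean_id, meaning or None); suffix format is an assumption."""
    value = adaptmap_id.strip()
    # Check the two-character marks first so '*#' is not read as a bare '*'.
    for mark in ("*#", "*§", "*"):
        if value.endswith(mark):
            return value[: -len(mark)], FLAG_MEANINGS[mark]
    return value, None
```

Rows for which the helper returns a meaning can then be excluded or reviewed before analyses that combine the 1159 and 1372 datasets.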

Coordination

French National Institute for Agricultural Research

Gwenola Tosser-Klopp