Documentation for SLiM function readFromVCF, which is a method of the SLiM class Genome. Note that the R function is a stub: it does not do anything in R (except bring up this documentation). It only does something useful when used inside a slim_block function call nested within a slim_script function call, where it is translated into valid SLiM code as part of a full SLiM script.

readFromVCF(filePath, mutationType)

Arguments

filePath

An object of type string. Must be of length 1 (a singleton). See details for description.

mutationType

An object of type null or integer or MutationType object. Must be of length 1 (a singleton). The default value is NULL. See details for description.

Value

An object of type Mutation object.

Details

Documentation for this function can be found in the official SLiM manual: page 675.

Read new mutations from the VCF format file at filePath and add them to the target genomes. The number of target genomes must match the number of genomes represented in the VCF file (i.e., two times the number of samples, if each sample is diploid). To read into all of the genomes in a given subpopulation pN, simply call pN.genomes.readFromVCF(), assuming the subpopulation's size matches that of the VCF file, taking ploidy into account. A vector containing all of the mutations created by readFromVCF() is returned.

SLiM's VCF parsing is quite primitive. The header is parsed only inasmuch as SLiM looks to see whether SLiM-specific VCF fields (see sections 27.2.3 and 27.2.4) are defined or not; the rest of the header information is ignored. Call lines are assumed to follow the format:

#CHROM POS ID REF ALT QUAL FILTER INFO FORMAT i0...iN

The CHROM, ID, QUAL, FILTER, and FORMAT fields are ignored, and information in the genotype fields beyond the GT genotype subfield is also ignored. SLiM's own VCF annotations (see section 27.2.3) are honored; in particular, mutations will be created using the given values of MID, S, PO, TO, and MT if those subfields are present, and DOM, if it is present, must match the dominance coefficient of the mutation type.

The parameter mutationType (a MutationType object or id) will be used for any mutations that have no supplied mutation type id in the MT subfield; if mutationType would be used but is NULL, an error will result. Mutation IDs supplied in MID will be used if no mutation IDs have been used in the simulation so far; if any have been used, it is difficult for SLiM to guarantee that there are no conflicts, so a warning will be emitted and the MID values will be ignored. If selection coefficients are not supplied with the S subfield, they will be drawn from the mutation type used for the mutation. If a population of origin is not supplied with the PO subfield, -1 will be used.
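As a rough illustration of the call-line layout described above, the following Python sketch splits one tab-separated call line into its fixed columns and picks out the SLiM annotations MID, S, PO, TO, and MT from INFO. It is not SLiM's actual parser. The function name parse_call_line and the example line are hypothetical, INFO subfields are assumed to be semicolon-separated KEY=value pairs, and GT is assumed to be the first genotype subfield, as is conventional in VCF.

```python
# Illustrative sketch only; this is not how SLiM itself parses VCF.
SLIM_KEYS = ("MID", "S", "PO", "TO", "MT")


def parse_call_line(line):
    """Split one VCF call line and keep the parts the text says are used."""
    fields = line.rstrip("\n").split("\t")
    if len(fields) < 10:
        raise ValueError("expected 9 fixed columns plus at least one sample")
    # CHROM, ID, QUAL, FILTER, and FORMAT are read here but otherwise unused.
    chrom, pos, vid, ref, alt, qual, flt, info, fmt = fields[:9]
    samples = fields[9:]

    # Collect INFO subfields; bare flags (no "=") are stored as True.
    info_map = {}
    for item in info.split(";"):
        if item in ("", "."):
            continue
        key, sep, value = item.partition("=")
        info_map[key] = value if sep else True

    # Keep only the SLiM annotations named in the text, when present.
    slim = {k: info_map[k] for k in SLIM_KEYS if k in info_map}

    # Assumption: GT is the first colon-separated genotype subfield.
    genotypes = [s.split(":")[0] for s in samples]
    return {"pos": int(pos), "ref": ref, "alt": alt,
            "slim": slim, "genotypes": genotypes}


# Hypothetical call line with two diploid samples.
example = "1\t1001\t.\tA\tT\t1000\tPASS\tMID=5;S=0.1;PO=1;TO=10;MT=1\tGT\t0|1\t1|1"
record = parse_call_line(example)
```

The values are left as strings because how they should be converted depends on the subfield; a caller that needs numbers would convert them explicitly.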
If a tick of origin is not supplied with the TO subfield (or a generation of origin GO field, which was the SLiM convention before SLiM 4), the current tick will be used.

REF and ALT must always consist of simple nucleotides (A/C/G/T) rather than values representing indels or other complex states. Beyond this, the handling of the REF and ALT fields depends upon several factors.

In non-nucleotide-based models, these fields are ignored, although they are still checked for conformance.

In nucleotide-based models, when a header definition for SLiM's NONNUC tag is present (as when nucleotide-based output is generated by SLiM), the handling depends on the call line. If a NONNUC field is present in the INFO field, the call line is taken to represent a non-nucleotide-based mutation, and REF and ALT are again ignored; in this case the mutation type used must be non-nucleotide-based. If NONNUC is not present, the call line is taken to represent a nucleotide-based mutation; in this case the mutation type used must be nucleotide-based, and the specified reference nucleotide must match the existing ancestral nucleotide at the given position.

In nucleotide-based models, when a header definition for SLiM's NONNUC tag is not present (as when loading a non-SLiM-generated VCF file), the mutation type governs the way nucleotides are handled. If the mutation type used for a mutation is nucleotide-based, the nucleotide provided in the VCF file for that allele will be used. If the mutation type is non-nucleotide-based, the nucleotide provided will be ignored.

If multiple alleles using the same nucleotide at the same position are specified in the VCF file, a separate mutation will be created for each, mirroring SLiM's behavior with independent mutational lineages when writing VCF (see section 27.2.4).
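The REF/ALT rules above form a small decision tree, which the following Python sketch restates. It is an illustration of the documented rules, not SLiM's implementation. The function names and the boolean parameters are hypothetical, and raising ValueError is only this sketch's assumption about how a violated requirement would be reported. The check that REF matches the ancestral nucleotide is not modelled.

```python
# Illustrative restatement of the documented REF/ALT rules; not SLiM's code.
NUCLEOTIDES = {"A", "C", "G", "T"}


def is_simple_nucleotide(allele):
    """Conformance check: the allele must be exactly one of A/C/G/T."""
    return allele in NUCLEOTIDES


def ref_alt_handling(model_is_nucleotide, header_defines_nonnuc,
                     info_has_nonnuc, muttype_is_nucleotide):
    """Return "use" or "ignore" for the nucleotides of one call line."""
    if not model_is_nucleotide:
        # Non-nucleotide-based model: REF and ALT are ignored.
        return "ignore"
    if header_defines_nonnuc:
        if info_has_nonnuc:
            # The call line represents a non-nucleotide-based mutation.
            if muttype_is_nucleotide:
                raise ValueError("NONNUC call line needs a "
                                 "non-nucleotide-based mutation type")
            return "ignore"
        # No NONNUC in INFO: the call line is a nucleotide-based mutation.
        if not muttype_is_nucleotide:
            raise ValueError("call line without NONNUC needs a "
                             "nucleotide-based mutation type")
        return "use"
    # No NONNUC header definition: the mutation type decides.
    return "use" if muttype_is_nucleotide else "ignore"
```

For a VCF file that was not generated by SLiM, only the last branch applies, so the choice of mutation type alone determines whether the nucleotides in the file are used.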
The MULTIALLELIC flag is ignored by readFromVCF(); call lines for mutations at the same base position in the same genome will result in stacked mutations whether or not MULTIALLELIC is present.

The target genomes correspond, in order, to the haploid or diploid calls provided for i0…iN (the sample IDs) in the VCF file. In sex-based models that simulate the X or Y chromosome, null genomes in the target vector will be skipped, and will not be used to correspond to any of i0…iN; however, care should be taken in this case that the genomes in the VCF file correspond to the target genomes in the manner desired.
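Because the number of target genomes must match the number of genomes in the file, it can help to count the genomes in a VCF call line before running the simulation. The Python sketch below does this under stated assumptions: columns are tab-separated, GT is the first genotype subfield, and alleles within GT are separated by "|" or "/". The function names count_vcf_genomes and targets_match are hypothetical, and the null-genome handling only mirrors the description above rather than SLiM's internals.

```python
import re


def count_vcf_genomes(call_line):
    """Count the haploid genomes implied by the GT subfields of one call line."""
    samples = call_line.rstrip("\n").split("\t")[9:]
    total = 0
    for sample in samples:
        gt = sample.split(":")[0]             # assumed: GT comes first
        total += len(re.split(r"[|/]", gt))   # one genome per allele call
    return total


def targets_match(call_line, target_is_null):
    """Check the count against the targets, skipping null genomes.

    target_is_null holds one boolean per target genome (True = null genome).
    """
    usable = sum(1 for is_null in target_is_null if not is_null)
    return usable == count_vcf_genomes(call_line)


# Hypothetical call line: two diploid samples and one haploid sample.
example = "1\t1001\t.\tA\tT\t.\t.\t.\tGT\t0|1\t1|1\t1"
n_genomes = count_vcf_genomes(example)
```

A mismatch found this way indicates that the subpopulation size or the set of target genomes needs adjusting before readFromVCF() is called.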

Author

Benjamin C Haller (bhaller@benhaller.com) and Philipp W Messer (messer@cornell.edu)