genegateuae supports genome-scale analyses of the extent and direction of gene flow across barriers to dispersal. It does this by polarising the labelling of bistate markers with respect to the two sides of the barrier, and then counting the proportion of side-A alleles carried by each individual.
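The counting step can be sketched in a few lines. This assumes genotypes are already polarised and coded as the number of side-A alleles (0, 1 or 2 for a diploid), with None for missing data. The function name and the coding are illustrative assumptions, not the genegateuae or diem interface.

```python
from typing import Optional, Sequence


def hybrid_index(genotypes: Sequence[Optional[int]], ploidy: int = 2) -> float:
    """Proportion of side-A alleles carried by one individual.

    Each entry is the number of side-A alleles (0..ploidy) at one polarised
    bistate marker, or None when the genotype is missing (skipped).
    """
    called = [g for g in genotypes if g is not None]
    if not called:
        raise ValueError("individual has no called genotypes")
    return sum(called) / (ploidy * len(called))


# Homozygous side B at one marker, heterozygous at one, homozygous side A at two:
# (0 + 1 + 2 + 2) / (2 * 4)
print(hybrid_index([0, 1, 2, 2]))  # 0.625
```

An individual with a hybrid index of 1 carries only side-A alleles, one with 0 carries only side-B alleles, and intermediate values indicate mixed ancestry.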
Barriers to gene flow
Barriers to gene flow affect the genetic diversity of species and are an important factor in biodiversity conservation. In plants that are dispersed primarily by ocean currents, physical features of the landscape may influence rates of gene flow by blocking or facilitating the transport of buoyant propagules. In this study, we examined the effect of these features on the genetic structure of black mangrove (Avicennia germinans) populations along the Pacific and Atlantic coasts of Mexico.
We used a genomic method that extends classic diagnostic allele counting to genome-scale data. It uses a novel labelling strategy that enforces an initial state of ignorance and then polarises marker states with respect to a barrier, which allows hybrid indices to be estimated for individuals from taxa separated by that barrier. The method does not require prior knowledge of the barrier type, taxonomy, or genome architecture, which makes it well suited to non-model organisms and cryptic barriers.
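Polarising a bistate marker amounts to choosing which of its two states counts as the side-A state. Relabelling a marker turns a genotype with k side-A alleles into one with ploidy - k. The sketch below shows that relabelling and one way to start from ignorance: a random labelling, under which the input carries no information about which side of the barrier each state belongs to. The names are hypothetical, and whether diem initialises exactly this way is an assumption here.

```python
import random
from typing import List, Optional

Genotype = Optional[int]  # number of side-A alleles, or None if missing


def relabel(g: Genotype, ploidy: int = 2) -> Genotype:
    """Swap which state counts as side A: k side-A alleles become ploidy - k."""
    return None if g is None else ploidy - g


def random_labelling(n_markers: int, seed: int = 0) -> List[bool]:
    """True means 'relabel this marker'; each marker is flipped with probability 1/2."""
    rng = random.Random(seed)
    return [rng.random() < 0.5 for _ in range(n_markers)]


def apply_labelling(matrix: List[List[Genotype]], flips: List[bool],
                    ploidy: int = 2) -> List[List[Genotype]]:
    """matrix[i][j] is individual i's genotype at marker j."""
    return [
        [relabel(g, ploidy) if flips[j] else g for j, g in enumerate(row)]
        for row in matrix
    ]


matrix = [[2, 0, None], [0, 2, 1]]
flips = [True, False, True]
print(apply_labelling(matrix, flips))  # [[0, 0, None], [2, 2, 1]]
```

Relabelling is its own inverse, so applying the same labelling twice restores the original data; heterozygotes and missing genotypes are unaffected.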
We compared the genetic composition of pools above and below each barrier, using a statistical model that accounts for variation in population size. Pools above and below each barrier were genetically similar, and the barriers have had only a small effect on pooled allele richness. However, allele richness in the pools above the barriers declines progressively with the number of generations since the barriers were built. This pattern is consistent with genetic drift: pools isolated above a barrier receive fewer migrants, so alleles lost by chance are not replenished, and the losses accumulate over successive generations.
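The drift argument can be illustrated with a toy neutral Wright-Fisher model. This is an assumption for illustration only, with no mutation, migration, or selection, and it is not the statistical model used in the study. Each generation resamples the same number of gene copies with replacement from the previous one, so an allele lost by chance can never return. Allele richness can therefore only stay level or fall, and it tends to fall faster in smaller pools.

```python
import random
from typing import List, Sequence


def allele_richness(pool: Sequence[str]) -> int:
    """Number of distinct alleles present in a pool of gene copies."""
    return len(set(pool))


def drift_richness(pool: Sequence[str], generations: int, seed: int = 1) -> List[int]:
    """Allele richness per generation under neutral Wright-Fisher resampling
    (no mutation, migration, or selection)."""
    rng = random.Random(seed)
    current = list(pool)
    history = [allele_richness(current)]
    for _ in range(generations):
        # Every gene copy in the next generation is drawn from the current one.
        current = rng.choices(current, k=len(current))
        history.append(allele_richness(current))
    return history


alleles = [f"a{i}" for i in range(10)]
small = drift_richness(alleles * 2, generations=50)    # 20 gene copies, isolated pool
large = drift_richness(alleles * 200, generations=50)  # 2000 gene copies
print(small[-1], large[-1])
```

Running it with different seeds should typically show the 20-copy pool losing alleles much faster than the 2000-copy pool, which usually keeps all ten.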
Diagnostic index expectation maximisation (diem)
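diem is a modified form of expectation maximisation (EM) that polarises markers with respect to a barrier. Starting from a labelling that carries no information about the barrier, each iteration first estimates every individual's hybrid index from the currently polarised markers. Then, for each marker, it compares the likelihood of the marker's genotypes under its two possible polarities, given those hybrid indices, and keeps the more likely one. Iteration continues until the polarities stop changing or a maximum number of iterations is reached. Each marker also receives a diagnostic index that summarises how well its states fit the barrier model. Together, the polarities and diagnostic indices allow diem to detect barriers to gene flow and to identify introgressed blocks of genetic material. For more details please visit genegateuae.com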
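The alternating loop can be sketched as a hard-assignment, coordinate-ascent variant of EM. This sketch assumes a simple binomial genotype model: at a correctly polarised marker, an individual's number of side-A alleles is drawn with its hybrid index as the allele frequency. It is not diem's likelihood or its diagnostic index. The model, the clipping constant EPS, the per-marker scores, and all function names are hypothetical, and real analyses should use diemr, diemmca, or diempy.

```python
import math
import random
from typing import List, Optional, Tuple

Genotype = Optional[int]  # side-A allele count (0..ploidy), None if missing
Matrix = List[List[Genotype]]
EPS = 1e-3  # hypothetical clip so log-likelihoods stay finite


def hybrid_indices(matrix: Matrix, ploidy: int = 2) -> List[float]:
    """Proportion of side-A alleles per individual (row); 0.5 if nothing is called."""
    out = []
    for row in matrix:
        called = [g for g in row if g is not None]
        out.append(sum(called) / (ploidy * len(called)) if called else 0.5)
    return out


def genotype_loglik(g: int, h: float, ploidy: int = 2) -> float:
    """Binomial log-probability of g side-A alleles given hybrid index h (assumed model)."""
    h = min(max(h, EPS), 1.0 - EPS)
    return math.log(math.comb(ploidy, g)) + g * math.log(h) + (ploidy - g) * math.log(1.0 - h)


def marker_loglik(matrix: Matrix, j: int, h: List[float], flipped: bool, ploidy: int = 2) -> float:
    """Log-likelihood of marker j's observed genotypes under one labelling."""
    total = 0.0
    for i, row in enumerate(matrix):
        g = row[j]
        if g is not None:
            total += genotype_loglik(ploidy - g if flipped else g, h[i], ploidy)
    return total


def em_polarise(matrix: Matrix, ploidy: int = 2, max_iter: int = 50,
                seed: Optional[int] = None) -> Tuple[List[bool], List[float], List[float]]:
    n_markers = len(matrix[0])
    if seed is None:
        polarity = [False] * n_markers  # treat the input labelling as arbitrary
    else:
        rng = random.Random(seed)
        polarity = [rng.random() < 0.5 for _ in range(n_markers)]

    def polarised() -> Matrix:
        return [[None if g is None else (ploidy - g if polarity[j] else g)
                 for j, g in enumerate(row)] for row in matrix]

    for _ in range(max_iter):
        h = hybrid_indices(polarised(), ploidy)  # step 1: hybrid indices
        changed = False
        for j in range(n_markers):  # step 2: more likely polarity per marker
            keep = marker_loglik(matrix, j, h, polarity[j], ploidy)
            flip = marker_loglik(matrix, j, h, not polarity[j], ploidy)
            if flip > keep:  # strict gains only, so the likelihood keeps rising
                polarity[j] = not polarity[j]
                changed = True
        if not changed:
            break
    h = hybrid_indices(polarised(), ploidy)
    scores = [marker_loglik(matrix, j, h, polarity[j], ploidy) for j in range(n_markers)]
    return polarity, h, scores


data = [
    [2, 0, 2, 0, 2],     # side-A individuals; markers 1 and 3 carry the opposite labelling
    [2, 0, 2, 0, None],
    [0, 2, 0, 2, 0],     # side-B individuals
    [0, 2, 0, 2, 0],
]
polarity, h, scores = em_polarise(data)
print(polarity)  # [False, True, False, True, False]
print(h)         # [1.0, 1.0, 0.0, 0.0]
```

Because the two sides of a barrier are interchangeable, a run started from a different labelling (for example, with a seed) may return the mirror image, with every polarity and hybrid index flipped. Only the separation between the groups is meaningful.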
The diem algorithm can be applied to a wide range of genomic data, including resequencing, RNA-seq, RADseq, and SNP-chip data. It also handles mixed data, such as resequencing combined with long-read phasing from a single genome, as well as genome-wide correlated markers and hybrid zone data.
Classic hybrid zone analysis requires a preliminary step of constructing reference sets and surveying candidate markers for diagnosticity, which can consume a large part of the resequencing budget. The diem method eliminates this step, which makes the approach more flexible and extends it to a wider range of genomic datasets.
The diem algorithm runs on Linux, macOS, or Windows. The process must be able to read every file in the input folder, and the working directory, or a location specified in the verbose argument, must have write permissions. The diemr, diemmca, and diempy packages provide file-handling utilities to help you prepare your input data.
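Before a long run, it can help to check those permissions up front. The sketch below is a hypothetical helper, not part of diemr, diemmca, or diempy, and the names preflight and output_dir are illustrative. It relies on Python's os.access, which reports the permissions of the current process but can be inaccurate on, for example, network file systems whose permission model goes beyond POSIX permission bits. Treat a clean result as a sanity check rather than a guarantee.

```python
import os
from pathlib import Path
from typing import List


def preflight(input_folder: str, output_dir: str = ".") -> List[str]:
    """Return a list of problems; an empty list means the checks passed."""
    folder = Path(input_folder)
    if not folder.is_dir():
        return [f"input folder not found: {folder}"]

    problems = []
    files = [p for p in folder.iterdir() if p.is_file()]
    if not files:
        problems.append(f"no files in {folder}")
    for p in files:
        # Every input file must be readable by the process running diem.
        if not os.access(p, os.R_OK):
            problems.append(f"cannot read {p}")

    out = Path(output_dir)
    # Results and diagnostics need a writable directory.
    if not (out.is_dir() and os.access(out, os.W_OK)):
        problems.append(f"cannot write to {out}")
    return problems
```

For example, `preflight("genotypes", ".")` returns an empty list when every file in a folder named genotypes is readable and the current directory is writable; any returned strings name the files or directories to fix.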