This notebook tracks progress on the development of resevol R package, a BBSRC FAPESP Newton Funded Project to model how the spatial scale of heterogeneity in fungal isolate application and crop plant cultivation affects variation in the selection landscape for biopesticide resistance.
During the revision process of the software note, Matt suggested
another feature that I really should have thought of making earlier. In
version 0.3.3.0, after crop rotation, landscape cell values are
initialised with whatever value is used for crop_per_cell
.
This means that there is no way to simulate crop growth on a landscape
in between crop rotations. In version 0.3.4.0, I have fixed this with
two new arguments, crop_growth
and
crop_growth_type
.
The argument crop_growth
indicates how much a cell value
representing a given crop should change from one time step to the next
(before the potential effects of feeding by pests). By default, this
value is crop_growth = 0
, meaning that all biomass on a
cell will be initialised at the start of a new crop rotation. Nonzero
values will affect the change of a cell’s crop value in a way dependent
up on crop_growth_type
. Note that crop_growth
can either take a single value, or a vector of the same length as
crop_number
. If, for example, there are two crops
(crop_number = 2
), then we could set
crop_growth = c(0.1, 0.4)
, meaning that crop 1 would
increase by a value of 0.1 each time step, and crop 2 would increase by
a value of 0.4 each time step. If we set a crop_growth
to a
single value (e.g., crop_growth = 0.1
), then all crops grow
at the same rate of crop_growth
.
The argument crop_growth_type
specifies how growth is
enacted (i.e., how the value on the landscape changes). The default
option is crop_growth_type = "none"
, in which case, no
growth occurs. If crop_growth_type = "geometric"
, then the
value of crop_growth
specifies the proportion by which the
landscape value increases in a time step,
\[value_{t+1} = value_{t} \left(1 + crop\_growth \right).\]
So if we initialise a crop amount at 2 on the landscape cell, and
crop_growth = 0.1
, then in the next time step (unless some
crop is eaten by pests), we can predict that the crop amount will be 2.2
(i.e., an increase of 10 per cent).
If crop_growth_type = "linear"
, then the increment
crop_growth
is simply added to the existing cell value in
each time step.
\[value_{t+1} = value_{t} + crop\_growth.\]
My hope is that these options will be useful for a lot of purposes, but there is plenty of room for expanding the package with different rules for crop growth.
I don’t know why I didn’t think of this sooner, but there really
should be a feature that allows users to set the initial mean values of
traits. This is easy because trait means are of course independent of
the covariance structure of the traits, so there just needs to be some
specified increment to be added to each trait throughout the simulation.
I have introduced this in a new version 0.3.2.0 with the
trait_means
argument, which simply takes a vector of the
means for each customised trait. By default, values are set to zero
(i.e., what they always were originally), but the argument can also take
a vector of the same length as the number of traits, with each element
position corresponding to the specific trait. That is,
trait_means[1]
is the mean value of "T1"
,
trait_means[2]
is the mean value of "T2"
, et
cetera. An error is given if the user attempts to specify a
trait_means
vector that has a length different from the
number of traits. Just to show this in action, I will use the resevol
mine_gmatrix output mg_v1
, which has the covariance
structure below.
## [,1] [,2] [,3] [,4]
## [1,] 0.9936103 -0.5106387 0.2324040 0.2367691
## [2,] -0.5106387 1.0193888 0.2573216 0.2723028
## [3,] 0.2324040 0.2573216 0.9334559 -0.4480018
## [4,] 0.2367691 0.2723028 -0.4480018 0.9689539
We have four traits, which I will use to define consumption rate of
crops 1 and 2 (T1
and T2
, respectively), and
the consumption rates of pesticides 1 and 2 (T3
and
T4
, respectively). I will set the trait means to T1 = 1, T2
= 2, T3 = 0.2, and T4 = 0.2 and run a simulation with the code
below.
sim <- run_farm_sim(mine_output = mg_v1, N = 1000, xdim = 10, ydim = 10,
repro = "asexual", time_steps = 20, farms = 12,
crop_number = 2,
crop_init = "random",
crop_rotation_type = 2,
pesticide_number = 2,
pesticide_init = "random",
pesticide_rotation_type = 2,
food_consume = c("T1", "T2"),
pesticide_consume = c("T3", "T4"),
trait_means = c(1, 2, 0.2, 0.2),
food_needed_surv = 1,
food_needed_repr = 3,
print_last = TRUE);
At the end of the simulation (after some evolution has occurred), we can look at the evolution of traits and final trait distributions to confirm that they are around what we expect. Here is the evolution of the different traits over time.
Here are the distributions of each trait, individually.
Finally, here is a pairs plot showing the trait correlations.
Overall, this is doing what we want it to. I have also written a new
test into testthat for the package. New documentation for the
trait_means
argument has been written, and I will
incorporate this option into the working manuscript.
Version 0.3.0.1 has been submitted to CRAN and is passing all
checks. I am now a bit under halfway through writing a first draft
of the software note. This note be submitted as a pre-print, and for
publication, and will explain how to use the package. It will also
replace the current Get
Started article and CRAN
vignette. The structure of the publication introduces some key
background, the mine_gmatrix
function and its underlying
logic, the ecology and evolution of pests in the model, and a simple
example. I think that two supporting information documents are also in
order. The first will explain the evolutionary algorithm in more detail
and basically be a version of the current
vignette with maybe some minor revisions. The second will be a more
advanced example with complex dynamics and traits, presenting
simulations like the ones shown in this journal on 26 AUG 2021. My hope is that this plus the
documentation is sufficient for anyone to start using the package.
Having submitted version 0.3.0.1, I almost immediately discovered the
need for an update that is now version 0.3.1.0. I will hold off on
submitting this one to CRAN just yet, as the update is quite minor. The
cause was that the mine_gmatrix
function kept giving a
discrepancy between the stress threshold that caused it to terminate and
the actual predicted stress of the network it found. The reason for this
was that the algorithm was terminating due to finding unusually low
stress caused sampling error for the standard random normal loci. In
other words, a luck draw for random loci often caused networks to look
better at producing desired trait covariances than they actually were.
I’ve addressed this now by first using the mean stress for a whole
population of networks to test the termination criteria
(term_cri
) in the mine_gmatrix
function,
rather than the best stress out of all networks in the
population. But the mine_gmatrix
function still returns the
network that produces the lowest found stress.
Second, I have introduced a new diagnostic function to give the distribution of stress predicted from loci.
stress_test <- function(mine_output, indivs = 1000, reps = 10){
target_VCV <- mine_output[[2]];
Loci_to_Traits <- mine_output[[5]];
loci <- mine_output[[1]][1];
Ln_stress <- rep(x = NA, times = reps);
while(reps > 0){
sim_loci <- rnorm(n = indivs * loci);
individuals <- matrix(data = sim_loci, nrow = indivs, ncol = loci);
trait_vals <- individuals %*% Loci_to_Traits;
VCV <- cov(trait_vals);
stress <- mean((target_VCV - VCV) * (target_VCV - VCV));
Ln_stress[reps] <- log(stress);
reps <- reps - 1;
}
return(Ln_stress);
}
This simple function now forms part of the package itself, so users
can run the output of mine_gmatrix
and get a better overall
picture on how the chosen network performs. The example used in the
documentation is below (I’ve increased the replicates using the argument
reps = 1000
; the default is 10).
gmt <- matrix(data = 0, nrow = 4, ncol = 4);
diag(gmt) <- 1;
mg <- mine_gmatrix(gmatrix = gmt, loci = 4, layers = 3, indivs = 100,
npsize = 100, max_gen = 2, prnt_out = FALSE);
stresses <- stress_test(mine_output = mg, reps = 1000);
The output stresses
is just a vector of stresses for
some number (indivs
) of initialised genomes. The histogram
below shows the distribution for the above example.
This should at least make it easier for users to decide if they do or
do not like the output of mine_gmatrix
for use in a new set
of simulations. The above example shows stress that is quite consistent,
but if we let the simulation run for longer to find a better solution,
the stress variance seems to increase (note that now
max_gen = 400
below).
gmt <- matrix(data = 0, nrow = 4, ncol = 4);
diag(gmt) <- 1;
mg <- mine_gmatrix(gmatrix = gmt, loci = 4, layers = 3, indivs = 100,
npsize = 100, max_gen = 400, prnt_out = FALSE);
stresses <- stress_test(mine_output = mg, reps = 1000);
hist(stresses, xlab = "Stress values for initialised genomes",
main = "", breaks = 20);
It might be good to recommend this kind of diagnostic for users, particularly when simulating very complex individuals. I think that I will do this for the supporting information showing an advanced example.
Custom crop and pesticide initialisation and rotation
I have now introduced custom crop and pesticide initialisation and
rotation, as I set out to do in the 13 JUN
2022 notes below. The code appears to be stable, and I have
introduced two new automated tests in the package to make sure that the
feature remains stable. Below is an example of how to have a custom
rotation regime for crops and pesticides; I will start with crops.
First, I will use the mg_n1
gmatrix that comes with the
resevol
package (I could have also made a dummy one; the
details are not important here).
data("gmatrices");
Now I will run a simulation. The run_farm_sim
takes some
simple options for crop_init
and
crop_rotation_type
. This includes
crop_init = "random"
, which randomly assigns each farm to
one of the potential crops
. There are also three
crop_rotation_type
arguments, including 1
(no
crop rotation), 2
(random crop choice whenever it is time
to change), and 3
(cycling through crops by number; i.e.,
1, 2, 3, 1, 2, 3). I might add more preset options like this later, but
here is how to run a simulation with two crops, random crop
initialisation on farms, and random rotation between crops.
sim <- run_farm_sim(mine_output = mg_n1, N = 100, xdim = 4, ydim = 4,
repro = "asexual", time_steps = 10, farms = 4,
crop_number = 2,
crop_init = "random",
crop_rotation_type = 2,
print_inds = FALSE, print_gens = FALSE,
print_last = FALSE, get_stats = FALSE);
The new feature is that crop_init
and
crop_rotation_type
can take custom arguments, so
potentially any initialisation and cycling of crop choices is possible.
With crop_init
, we can specify which farm gets what crop at
the start of the simulation using a fector of length farms
.
If, for example, we want farms 1 and 4 to have crop 1, and farms 2 and 3
to have crop 2, we could run the below.
sim <- run_farm_sim(mine_output = mg_n1, N = 100, xdim = 4, ydim = 4,
repro = "asexual", time_steps = 10, farms = 4,
crop_number = 2,
crop_init = c(1, 2, 2, 1),
crop_rotation_type = 2,
print_inds = FALSE, print_gens = FALSE,
print_last = FALSE, get_stats = FALSE);
Note that the position of farms on the landscape is shown below.
## [,1] [,2] [,3] [,4]
## [1,] 1 1 2 2
## [2,] 1 1 2 2
## [3,] 3 3 4 4
## [4,] 3 3 4 4
Hence, this landscape would start out with farms numbered 1 and 4,
diagonal from one another, applying crop 1, and farms numbered 2 and 3,
also diagonal, applying crop 2. Now, suppose that we want users of crop
1 to switch to crop 2 with a probability of 0.75, and retain the use of
crop 1 with a probability of 0.25. Likewise, suppose that we want users
of crop 2 to switch to crop 1 with a probability of 0.75, and retain the
use of crop 2 with a probability of 0.25. We can now specify this with a
square matrix to input in place of the number for the argument
crop_rotation_type
.
change_mat <- matrix(data = c(0.25, 0.75, 0.75, 0.25), nrow = 2, ncol = 2);
## [,1] [,2]
## [1,] 0.25 0.75
## [2,] 0.75 0.25
Note that matrix elements \(M_{i,j}\) specify the probability of
changing from crop \(i\) to crop \(j\), meaning that row elements must sum to
1. This can be binary too, as in the default
crop_rotation_type = 3
.
## [,1] [,2]
## [1,] 0 1
## [2,] 1 0
The matrix above cycles back and forth between crops 1 and 2.
Overall, this should give a lot more flexibility in terms of specifying
how crops cycle over time. We can input the change_mat
directly as an argument in run_farm_sim
.
sim <- run_farm_sim(mine_output = mg_n1, N = 100, xdim = 4, ydim = 4,
repro = "asexual", time_steps = 10, farms = 4,
crop_number = 2,
crop_init = c(1, 2, 2, 1),
crop_rotation_type = change_mat,
print_inds = FALSE, print_gens = FALSE,
print_last = FALSE, get_stats = FALSE);
Nevertheless, the cycling is restricted to Markov chains, meaning that it cannot include cycles of crops such as 1, 1, 2, 1, 1, 2, 1, 1, etc., where the cycling repeats complex patterns. This is mainly due to the awkwardness of trying to allow for such cycles given simulations that can have high numbers of time steps (i.e., I do not think it is worth figuring out a system that allows for these kinds of cycles as an option in the code).
Pesticide rotation options work exactly the same way as crop rotation options, so I will not spend to much time on them. Here is an example of running one.
sim <- run_farm_sim(mine_output = mg_n1, N = 100, xdim = 4, ydim = 4,
repro = "asexual", time_steps = 10, farms = 4,
pesticide_number = 2,
pesticide_init = c(1, 2, 2, 1),
pesticide_rotation_type = change_mat,
print_inds = FALSE, print_gens = FALSE,
print_last = FALSE, get_stats = FALSE);
In conclusion, this new feature will allow for a very flexible range of different crop and pesticide cycling options.
Custom landscape option
I have also added an option to customise the farming landscape, also
as I set out to do in the 13 JUN 2022 notes
below. The customisation is on the top layer of the landscape that
differentiates independent farms. What is required is a matrix that
includes a sequence of natural numbers in all elements, so if there are
4 farms, then all matrix elements must be either 1, 2, 3, or 4. Beyond
this requirement, there is no restriction on where different farms are
placed; the do not even need to be contiguous on the landscape. Custom
terrains can be inserted using the terrain
argument in
run_farm_sim
. Below makes a custom terrain with three
farms.
custom_terrain <- matrix(data = 0, nrow = 8, ncol = 8);
custom_terrain[,1:2] <- 1;
custom_terrain[,2:6] <- 2;
custom_terrain[,7:8] <- 3;
## [,1] [,2] [,3] [,4] [,5] [,6] [,7] [,8]
## [1,] 1 2 2 2 2 2 3 3
## [2,] 1 2 2 2 2 2 3 3
## [3,] 1 2 2 2 2 2 3 3
## [4,] 1 2 2 2 2 2 3 3
## [5,] 1 2 2 2 2 2 3 3
## [6,] 1 2 2 2 2 2 3 3
## [7,] 1 2 2 2 2 2 3 3
## [8,] 1 2 2 2 2 2 3 3
We can insert the terrain into the landscape with the
terrain
argument.
sim <- run_farm_sim(mine_output = mg_n1, N = 100, xdim = 4, ydim = 4,
repro = "asexual", time_steps = 10, farms = 4,
terrain = custom_terrain,
print_inds = FALSE, print_gens = FALSE,
print_last = FALSE, get_stats = FALSE);
Note that a custom terrain will override the arguments
farms
, xdim
, and ydim
, so even
though the above sets farms = 4
, there are actually only
three farms on the landscape. I am tempted to put up a warning for this,
but I am actually not sure if it is needed; my hope is that anyone
customising the landscape would recognise this from the
documentation.
Also note that these terrain values do not necessarily need to be farms. Through the use of a custom landscape and pesticide rotation option, these cells could represent something like diversionary feeding sites or even buildings or rivers. Consider, for example, the following matrix.
complex_terrain <- matrix(data = 0, nrow = 12, ncol = 12)
complex_terrain[1:2, 1:3] <- 1;
complex_terrain[3:7, 1:2] <- 1;
complex_terrain[8:12, 1:2] <- 2;
complex_terrain[1:2, 6:9] <- 3;
complex_terrain[3:6, 7:9] <- 3;
complex_terrain[1:6, 10:12] <- 4;
complex_terrain[10:12, 7] <- 5;
complex_terrain[7:12, 8:12] <- 5;
complex_terrain[3:12, 3:4] <- 6;
complex_terrain[7:9, 5] <- 6;
complex_terrain[1:2, 4:5] <- 7;
complex_terrain[3:6, 5:6] <- 7;
complex_terrain[7:9, 6:7] <- 7;
complex_terrain[10:12, 5:6] <- 7;
## [,1] [,2] [,3] [,4] [,5] [,6] [,7] [,8] [,9] [,10] [,11] [,12]
## [1,] 1 1 1 7 7 3 3 3 3 4 4 4
## [2,] 1 1 1 7 7 3 3 3 3 4 4 4
## [3,] 1 1 6 6 7 7 3 3 3 4 4 4
## [4,] 1 1 6 6 7 7 3 3 3 4 4 4
## [5,] 1 1 6 6 7 7 3 3 3 4 4 4
## [6,] 1 1 6 6 7 7 3 3 3 4 4 4
## [7,] 1 1 6 6 6 7 7 5 5 5 5 5
## [8,] 2 2 6 6 6 7 7 5 5 5 5 5
## [9,] 2 2 6 6 6 7 7 5 5 5 5 5
## [10,] 2 2 6 6 7 7 5 5 5 5 5 5
## [11,] 2 2 6 6 7 7 5 5 5 5 5 5
## [12,] 2 2 6 6 7 7 5 5 5 5 5 5
We can visualise the terrain a bit more clearly with an image below.
We can imagine the dark maroon colour as being a river (7), and the red region on the immediate left of the river as being a nature reserve (6). To get the simulations to run treating it this way, we would only need to spray a crop and pesticide that has no effect on the pests on the river (effectively making it an empty area of landscape in terms of feeding or resistance), and a pesticide but not crop in the nature reserve. First we can initialise the crop and pesticide vectors. Assume that there are 2 legitimate crops and pesticides that the actual farms (1-5) will use.
initial_crops <- c(1, 2, 1, 2, 1, 3, 4);
initial_pesticides <- c(1, 1, 2, 2, 1, 3, 3);
We can then clarify the matrix so that there is only a cycle between 1 and 2, but never 3 and 4 for either the pesticide or the crops. First I will do the crops.
crop_cycle <- matrix(data = 0, nrow = 4, ncol = 4);
crop_cycle[1, 2] <- 1;
crop_cycle[2, 1] <- 1;
crop_cycle[3, 3] <- 1;
crop_cycle[4, 4] <- 1;
## [,1] [,2] [,3] [,4]
## [1,] 0 1 0 0
## [2,] 1 0 0 0
## [3,] 0 0 1 0
## [4,] 0 0 0 1
Now I will do the pesticide.
pesticide_cycle <- matrix(data = 0, nrow = 3, ncol = 3);
pesticide_cycle[1, 2] <- 1;
pesticide_cycle[2, 1] <- 1;
pesticide_cycle[3, 3] <- 1;
## [,1] [,2] [,3]
## [1,] 0 1 0
## [2,] 1 0 0
## [3,] 0 0 1
So farms (1-5) will swap between crops 1 and 2, while the reserve (6) will maintain crop 3 and the river (7) will be ‘crop’ 4. And farms will swap between pesticides 1 and 2, while the reserve and river will hold pesticide 3. Then, to make crop 4 and pesticide 3 not really exist, we just make it impossible for the pests to consume these.
sim <- run_farm_sim(mine_output = mg_n1, N = 100,
repro = "asexual", time_steps = 10,
crop_number = 4, pesticide_number = 3,
terrain = complex_terrain,
food_consume = c(0.25, 0.25, 0.25, 0),
pesticide_consume = c(0.1, 0.1, 0),
crop_init = initial_crops,
crop_rotation_type = crop_cycle,
pesticide_init = initial_pesticides,
pesticide_rotation_type = pesticide_cycle,
print_inds = FALSE, print_gens = FALSE,
print_last = FALSE, get_stats = FALSE);
The output shows that the pests leave the ‘food’ on the river alone (see layer 5), while the reserve (see layer 4) gets used very much.
## , , 1
##
## [,1] [,2] [,3] [,4] [,5] [,6] [,7] [,8] [,9] [,10] [,11] [,12]
## [1,] 1 1 1 7 7 3 3 3 3 4 4 4
## [2,] 1 1 1 7 7 3 3 3 3 4 4 4
## [3,] 1 1 6 6 7 7 3 3 3 4 4 4
## [4,] 1 1 6 6 7 7 3 3 3 4 4 4
## [5,] 1 1 6 6 7 7 3 3 3 4 4 4
## [6,] 1 1 6 6 7 7 3 3 3 4 4 4
## [7,] 1 1 6 6 6 7 7 5 5 5 5 5
## [8,] 2 2 6 6 6 7 7 5 5 5 5 5
## [9,] 2 2 6 6 6 7 7 5 5 5 5 5
## [10,] 2 2 6 6 7 7 5 5 5 5 5 5
## [11,] 2 2 6 6 7 7 5 5 5 5 5 5
## [12,] 2 2 6 6 7 7 5 5 5 5 5 5
##
## , , 2
##
## [,1] [,2] [,3] [,4] [,5] [,6] [,7] [,8] [,9] [,10] [,11] [,12]
## [1,] 1.00 0.00 0 0 0 0 0.75 1.00 1.00 0 0 0
## [2,] 1.00 1.00 0 0 0 1 1.00 1.00 1.00 0 0 0
## [3,] 1.00 0.50 0 0 0 0 0.25 1.00 1.00 0 0 0
## [4,] 1.00 0.00 0 0 0 0 1.00 0.75 1.00 0 0 0
## [5,] 1.00 0.00 0 0 0 0 0.00 1.00 1.00 0 0 0
## [6,] 0.75 0.25 0 0 0 0 0.00 1.00 1.00 0 0 0
## [7,] 0.50 0.00 0 0 0 0 0.00 0.75 1.00 1 1 1
## [8,] 0.00 0.00 0 0 0 0 0.00 0.00 0.75 1 1 1
## [9,] 0.00 0.00 0 0 0 0 0.00 0.50 1.00 1 1 1
## [10,] 0.00 0.00 0 0 0 0 0.25 1.00 1.00 1 1 1
## [11,] 0.00 0.00 0 0 0 0 0.00 0.75 1.00 1 1 1
## [12,] 0.00 0.00 0 0 0 0 0.00 1.00 1.00 1 1 1
##
## , , 3
##
## [,1] [,2] [,3] [,4] [,5] [,6] [,7] [,8] [,9] [,10] [,11] [,12]
## [1,] 0.00 0 0 0 0 0 0 0 0 1 1 1
## [2,] 0.00 0 0 0 0 0 0 0 0 1 1 1
## [3,] 0.00 0 0 0 0 0 0 0 0 1 1 1
## [4,] 0.00 0 0 0 0 0 0 0 0 1 1 1
## [5,] 0.00 0 0 0 0 0 0 0 0 1 1 1
## [6,] 0.00 0 0 0 0 0 0 0 0 1 1 1
## [7,] 0.00 0 0 0 0 0 0 0 0 0 0 0
## [8,] 0.75 0 0 0 0 0 0 0 0 0 0 0
## [9,] 0.75 0 0 0 0 0 0 0 0 0 0 0
## [10,] 0.75 0 0 0 0 0 0 0 0 0 0 0
## [11,] 1.00 0 0 0 0 0 0 0 0 0 0 0
## [12,] 0.50 0 0 0 0 0 0 0 0 0 0 0
##
## , , 4
##
## [,1] [,2] [,3] [,4] [,5] [,6] [,7] [,8] [,9] [,10] [,11] [,12]
## [1,] 0 0 0 0 0 0 0 0 0 0 0 0
## [2,] 0 0 0 0 0 0 0 0 0 0 0 0
## [3,] 0 0 1 0 0 0 0 0 0 0 0 0
## [4,] 0 0 0 0 0 0 0 0 0 0 0 0
## [5,] 0 0 0 0 0 0 0 0 0 0 0 0
## [6,] 0 0 0 0 0 0 0 0 0 0 0 0
## [7,] 0 0 0 0 0 0 0 0 0 0 0 0
## [8,] 0 0 0 0 0 0 0 0 0 0 0 0
## [9,] 0 0 0 0 0 0 0 0 0 0 0 0
## [10,] 0 0 0 0 0 0 0 0 0 0 0 0
## [11,] 0 0 0 0 0 0 0 0 0 0 0 0
## [12,] 0 0 0 0 0 0 0 0 0 0 0 0
##
## , , 5
##
## [,1] [,2] [,3] [,4] [,5] [,6] [,7] [,8] [,9] [,10] [,11] [,12]
## [1,] 0 0 0 1 1 0 0 0 0 0 0 0
## [2,] 0 0 0 1 1 0 0 0 0 0 0 0
## [3,] 0 0 0 0 1 1 0 0 0 0 0 0
## [4,] 0 0 0 0 1 1 0 0 0 0 0 0
## [5,] 0 0 0 0 1 1 0 0 0 0 0 0
## [6,] 0 0 0 0 1 1 0 0 0 0 0 0
## [7,] 0 0 0 0 0 1 1 0 0 0 0 0
## [8,] 0 0 0 0 0 1 1 0 0 0 0 0
## [9,] 0 0 0 0 0 1 1 0 0 0 0 0
## [10,] 0 0 0 0 1 1 0 0 0 0 0 0
## [11,] 0 0 0 0 1 1 0 0 0 0 0 0
## [12,] 0 0 0 0 1 1 0 0 0 0 0 0
Similarly, the pests will only be affected by pesticides 1 and 2, while 3 is effectively nothing. Note that this is of course a trick for modelling landscape areas without food or pesticide, but I do not see a reason to make this more explicit in the function (unless I receive requests due to confusion, but my hope is it is clear from these notes and documentation).
As I prepare the pre-print and software note for the resevol package, I am thinking that might be worth doing just a bit more coding so that the publication can include four more features:
run_farm_sim
argument
crop_rotation_type
accepts either 1 or a 2. A 1 results in
no rotation of crops on the landscape, and a 2 results in a random
transition between types. I think that a custom transition matrix should
be possible to insert, whereby each element should specify the
probability of transitioning from crop i (row) to crop
j (column). In the simplest case (e.g., could be
crop_rotation_type = 3
), this could just be a binary matrix
that forms a cyclical graph. The current options 1 and 2 could even be
subsumed in the transition matrix.custom_landscape
function could check this matrix, then use
it as the top layer of an array.mine_gmatrix
option. This could be fairly easy to
implement, and for some cases might actually find genome solutions
faster than the evolutionary algorithm.My hope is that these different tasks could be done fairly quickly, and that they will make writing up the paper go more smoothly.
There had for some time been an annoying issue of resevol CRAN checks having ERRORs on Windows systems with ix86 architecture. As it turns out, this was caused by some errors in specifying the memory for arrays, which was fixed in this commit. Just to make sure that everything is clear and correct, I have made the cast much clearer in a new commit. Tests using the rhub package and on Winbuilder confirm that everything works as intended. Valgrind found no memory leaks; everything appears fine, so I have updated the version to v0.2.0.9 and submitted the fix to CRAN.
I have now run the same analysis that I did on 19 AUG 2021, but with traits that evolve
independently. This was done by setting a covariance matrix to be the
identity matrix for four traits. Here is the mine_gmatrix
output that was used for the simulations.
## [[1]]
## [1] 12.00 6.00 2000.00 18000.00 0.01 0.01 3200.00 0.01
## [9] 1200.00 6.00 -9.00 0.10 0.00
##
## [[2]]
## [,1] [,2] [,3] [,4]
## [1,] 1 0 0 0
## [2,] 0 1 0 0
## [3,] 0 0 1 0
## [4,] 0 0 0 1
##
## [[3]]
## [,1] [,2] [,3] [,4]
## [1,] 1.01323287 -0.91056130 0.22300841 0.978403744
## [2,] 0.27599802 -0.37879572 -1.01693690 0.762334090
## [3,] -0.03203048 -0.51511158 0.79037437 0.002639731
## [4,] -0.09193560 0.07907323 0.14790265 -0.039062606
## [5,] -0.73131494 0.07549138 0.17234648 -0.009426905
## [6,] -0.50890530 0.18561143 0.72008751 0.496708973
## [7,] -0.01427086 -0.28208810 -0.11196553 -0.537226514
## [8,] 0.25274376 0.15911871 -0.11647583 0.011376558
## [9,] -0.21570241 -0.33741651 0.47711231 -0.459624091
## [10,] -0.23138642 -0.92727329 0.04955606 -0.017246277
## [11,] -0.19430907 -0.38257361 -0.26531432 0.630049848
## [12,] -0.66903676 0.17674572 -0.16014889 0.067177474
##
## [[4]]
## , , 1
##
## [,1] [,2] [,3] [,4]
## [1,] 0.3666743 0.86477950 0.61013519 0.4585473
## [2,] -0.7287975 0.89967854 -0.26984724 0.1715412
## [3,] -0.2753835 -0.27174429 -0.02783652 0.5228947
## [4,] 0.2106197 -0.09700542 -0.95923363 0.2126882
##
## , , 2
##
## [,1] [,2] [,3] [,4]
## [1,] -0.5464374 -0.57925350 0.2394385 0.13922962
## [2,] -0.2060458 0.01186567 -0.2479210 0.52160749
## [3,] 0.6099824 -0.40519047 -0.1605294 -0.05806732
## [4,] 0.7619924 -0.03927812 1.0184070 0.76831379
##
## , , 3
##
## [,1] [,2] [,3] [,4]
## [1,] 0.01579773 -0.0176360 0.1042295 -0.6998017767
## [2,] 1.12913619 -0.1744167 0.1694019 -0.0537070606
## [3,] 0.11367376 -0.5660887 0.5790242 -0.0007441308
## [4,] 0.00766533 -0.7063434 -0.6071945 -0.0742630499
##
## , , 4
##
## [,1] [,2] [,3] [,4]
## [1,] -0.4415180 -0.45882179 0.189424993 -0.4583560
## [2,] 0.4881191 -0.07763796 0.534154371 -0.3413463
## [3,] -0.1194739 0.71146855 0.622996667 0.7601848
## [4,] -0.2681222 0.75851682 0.004319161 -0.7493150
##
## , , 5
##
## [,1] [,2] [,3] [,4]
## [1,] 0.3594965 -0.44704612 0.1057529 0.6412851
## [2,] -0.4734836 -0.02154132 0.8362356 0.0703729
## [3,] -0.1015575 0.59180270 -0.2234910 0.3626638
## [4,] 0.7883569 0.48251087 0.6203973 0.1364310
##
## , , 6
##
## [,1] [,2] [,3] [,4]
## [1,] 0.6413329 -0.0520821 0.0313601 -0.7862978
## [2,] 0.2766572 0.5259953 1.0408831 0.7330976
## [3,] -0.4612240 0.6870955 -0.2413829 -0.1735197
## [4,] -0.8082452 -0.7595234 0.7075496 -0.6229779
##
##
## [[5]]
## [,1] [,2] [,3] [,4]
## [1,] -0.11900916 0.72971816 -0.33534413 -0.16856932
## [2,] -0.56655781 -0.01421153 -0.44314434 0.25331983
## [3,] 0.02974235 0.39626672 0.35083549 -0.16975625
## [4,] 0.08175985 0.01612911 0.06762713 0.03284772
## [5,] -0.10161875 -0.04482800 0.34340020 0.33128165
## [6,] 0.27189293 0.34943436 0.27667359 0.51511528
## [7,] -0.22418646 -0.17966680 0.14259016 -0.36500830
## [8,] 0.11894101 -0.03845361 -0.18004009 -0.05569821
## [9,] -0.06241772 0.05742287 0.38727606 -0.27966268
## [10,] -0.59027383 0.15893478 0.33824411 -0.16558486
## [11,] -0.38817067 0.16736336 -0.01299629 0.34651855
## [12,] -0.16592813 -0.15620554 0.18829157 0.39244612
##
## [[6]]
## [,1] [,2] [,3] [,4]
## [1,] 1.06913329 -0.072243449 0.064732772 -0.056487174
## [2,] -0.07224345 0.874066557 0.007063041 -0.007362562
## [3,] 0.06473277 0.007063041 0.948800593 -0.009399430
## [4,] -0.05648717 -0.007362562 -0.009399430 1.001389459
##
## [[7]]
## [1] 1.0132328710 -0.9105612976 0.2230084069 0.9784037440 0.2759980196
## [6] -0.3787957167 -1.0169369015 0.7623340902 -0.0320304800 -0.5151115759
## [11] 0.7903743678 0.0026397307 -0.0919356017 0.0790732281 0.1479026489
## [16] -0.0390626056 -0.7313149438 0.0754913824 0.1723464850 -0.0094269050
## [21] -0.5089053016 0.1856114264 0.7200875059 0.4967089726 -0.0142708566
## [26] -0.2820881047 -0.1119655280 -0.5372265136 0.2527437636 0.1591187075
## [31] -0.1164758339 0.0113765575 -0.2157024076 -0.3374165063 0.4771123077
## [36] -0.4596240914 -0.2313864218 -0.9272732890 0.0495560606 -0.0172462772
## [41] -0.1943090688 -0.3825736088 -0.2653143217 0.6300498485 -0.6690367591
## [46] 0.1767457169 -0.1601488935 0.0671774739 0.3666742930 0.8647795015
## [51] 0.6101351924 0.4585473120 -0.7287975411 0.8996785444 -0.2698472400
## [56] 0.1715411534 -0.2753834548 -0.2717442880 -0.0278365186 0.5228947384
## [61] 0.2106197266 -0.0970054220 -0.9592336331 0.2126882478 -0.5464374357
## [66] -0.5792534966 0.2394384985 0.1392296218 -0.2060458352 0.0118656721
## [71] -0.2479210148 0.5216074862 0.6099824149 -0.4051904672 -0.1605293560
## [76] -0.0580673216 0.7619923905 -0.0392781166 1.0184070112 0.7683137886
## [81] 0.0157977321 -0.0176359999 0.1042294944 -0.6998017767 1.1291361885
## [86] -0.1744166518 0.1694018671 -0.0537070606 0.1136737596 -0.5660887218
## [91] 0.5790242132 -0.0007441308 0.0076653298 -0.7063434037 -0.6071945083
## [96] -0.0742630499 -0.4415179511 -0.4588217919 0.1894249926 -0.4583559714
## [101] 0.4881190547 -0.0776379596 0.5341543712 -0.3413463261 -0.1194739358
## [106] 0.7114685497 0.6229966674 0.7601847997 -0.2681222285 0.7585168195
## [111] 0.0043191611 -0.7493149802 0.3594964704 -0.4470461217 0.1057529089
## [116] 0.6412850683 -0.4734836453 -0.0215413243 0.8362356329 0.0703728975
## [121] -0.1015575348 0.5918026977 -0.2234910091 0.3626638219 0.7883569250
## [126] 0.4825108655 0.6203972867 0.1364309830 0.6413329012 -0.0520821005
## [131] 0.0313601037 -0.7862978038 0.2766571825 0.5259952870 1.0408830938
## [136] 0.7330976036 -0.4612239907 0.6870955302 -0.2413828743 -0.1735197274
## [141] -0.8082452080 -0.7595233690 0.7075495824 -0.6229778721
##
## [[8]]
## [1] -5.625305
Note that the parameter values for mining the matrix were identical to those chosen in the simulations from 19 AUG 2021. The results are as expected; all traits evolve in response to selection to feed on both crop types and resist both pesticide types.
Asexual, no rotation
Asexual, pesticide rotation
Asexual, both rotation
Selfing, no rotation
Selfing, pesticide rotation
Selfing, both rotation
Non-selfing, no rotation
Non-selfing, pesticide rotation
Non-selfing, both rotation
Biparental, no rotation
Biparental, pesticide rotation
Biparental, both rotation
For full simulations, I think it will be important to have each replicate (and I’m including all of the above as one replicate) with its own mined gmatrix.
The helicoverpa package will be submitted to CRAN sometime before early October What this will require is some appropriate testing functions and documentation, and maybe one or two vignettes that show how the package works. I’ll also make a website similar to the GMSE website. We will want to decide if the package name should actually be ‘helicoverpa’, or if we should switch to something else (endorse is unfortunately taken; I’m not sure if ‘ENDORSE’ in all caps would work, but probably best to not try this and avoid confusion). I will update with more details about this later.
For now, I want to focus on some simulations that I have recently run with the idea of laying the foundations for a general paper on landscape heterogeneity and pesticide resistance (not so much focusing on the details of the Helicoverpa system). I’ve not yet checked to see how novel the general ideas are because this started more as a verification of the model (the code is working as intended, as far as I can tell).
The model focuses on the way that negative genetic correlations affect niche differentiation across different ploidy and mating types. The idea is that sex can act as a constraint to adaptation due to outbreeding depression in a way that cannot happen given asexual reproduction. Given a heterogeneous landscape, specifically in the context of the evolution of pesticide resistance and crop specialisation, asexually reproducing pests can get the upper hand by specialising on specific combinations of crop and pesticide and ensuring that their descendants will keep these specialised genotypes intact. In contrast, sexually reproducing organisms have a risk of outbreeding depression, breeding with a conspecific that is specialised to different conditions and having offspring that is not good at dealing with any environment. I have not seen this point in the literature, and I need to figure out the extent to which it has or has not been made and modelled to determine whether or not it is actually novel.
What follows is a lot of simulation results from four different types
of species set by the repro
argument in the R package:
For each of these different organism types, I included two different crop types and two pesticide types, and I ran three different pesticide and crop rotation procedures on a landscape with 24 farms for 480 time steps (maybe very loosely thinking of 12 time steps as a year):
I’m deliberately not having a crop rotation only simulation as it
seems unrealistic and potentially a bit redundant, at least for now. The
details of the simulation are as follows. The genome was created with
the mine_gmatrix
settings below.
mg <- mine_gmatrix(gmatrix = gmt, loci = 12, indivs = 2000, npsize = 12000,
max_gen = 2400, sampleK = 1200, chooseK = 6, layers = 6,
mu_pr = 0.05, pr_cross = 0.05, mu_sd = 0.01,
term_cri = -12);
The output of mg
is below. Note that list element 2
shows the target variance covariance matrix of four traits. Element 6
below gives the mined matrix.
## [[1]]
## [1] 12.00 6.00 2000.00 12000.00 0.05 0.01 2400.00 0.05
## [9] 1200.00 6.00 -12.00 0.10 0.00
##
## [[2]]
## [,1] [,2] [,3] [,4]
## [1,] 1.0 -0.5 0.2 0.2
## [2,] -0.5 1.0 0.2 0.2
## [3,] 0.2 0.2 1.0 -0.5
## [4,] 0.2 0.2 -0.5 1.0
##
## [[3]]
## [,1] [,2] [,3] [,4]
## [1,] -0.62181158 -0.23246088 1.027398e-01 0.36089789
## [2,] 0.11516718 0.35628744 1.932607e-06 0.38293051
## [3,] -0.33493318 0.32627944 6.940649e-02 -0.30558144
## [4,] -0.07520604 -0.32440111 3.741366e-01 0.24383415
## [5,] -0.51259791 -0.06048184 -8.689474e-01 0.60746638
## [6,] 0.21803456 -0.38792630 2.014267e-01 -0.42601954
## [7,] -0.22419755 -0.84531007 5.902170e-01 -0.10942012
## [8,] 0.73822815 -0.30174497 2.573475e-01 0.23633850
## [9,] -0.45722066 0.09532205 5.540713e-01 -0.06608855
## [10,] 0.28739580 -0.12243227 -1.553612e-01 -0.01861052
## [11,] -0.09843707 -0.18971567 6.172619e-01 -0.17746038
## [12,] 0.64383533 -0.19503881 -1.054181e-01 -0.37567049
##
## [[4]]
## , , 1
##
## [,1] [,2] [,3] [,4]
## [1,] 0.1876383 -0.7766931 0.2561096 -0.1594451
## [2,] -0.7218431 -0.1720015 0.1051845 -0.1391646
## [3,] -0.5724185 -0.6303034 -0.1351206 -0.1521792
## [4,] -0.1518009 -0.4726420 0.6897870 -0.2687902
##
## , , 2
##
## [,1] [,2] [,3] [,4]
## [1,] -0.89598885 -0.2361661 0.03713508 -0.4339221
## [2,] -0.03818028 0.1322045 0.52982046 -0.2206081
## [3,] -0.16489913 0.2119842 0.44171859 0.7881612
## [4,] 0.35726940 -0.5271966 0.13122182 -0.2990653
##
## , , 3
##
## [,1] [,2] [,3] [,4]
## [1,] 0.41092320 0.28480202 -0.1120791 -0.4511020
## [2,] 0.19794635 0.70486627 -0.2855835 -0.3474222
## [3,] -0.84575895 -0.06894474 0.2846113 -0.5004831
## [4,] 0.08297854 0.31942452 0.8976090 0.4573121
##
## , , 4
##
## [,1] [,2] [,3] [,4]
## [1,] 0.2253143 0.2004491 0.74271147 1.0243443
## [2,] -0.7223198 0.5598523 0.41619436 -0.1152639
## [3,] -0.2417876 0.5954666 -0.03304092 -0.4663887
## [4,] 0.4589945 0.5626436 -0.42824309 -0.3633294
##
## , , 5
##
## [,1] [,2] [,3] [,4]
## [1,] 0.05722936 -0.7678113 0.7009198 0.15491320
## [2,] 0.08709396 -0.6969477 0.4913277 -0.73493561
## [3,] -0.70780111 0.7518397 0.4779771 0.41039047
## [4,] -0.18146366 -0.6877152 0.4546469 -0.01592507
##
## , , 6
##
## [,1] [,2] [,3] [,4]
## [1,] -0.484868322 0.2352634 -0.5067399 0.51615705
## [2,] 1.272293446 -0.7279591 -0.4885374 1.19602908
## [3,] 0.653742180 -0.6679193 0.2497727 -0.05637619
## [4,] -0.008199929 1.2147659 0.9983367 0.26934001
##
##
## [[5]]
## [,1] [,2] [,3] [,4]
## [1,] 0.11355835 0.002908792 -0.122576128 0.24062456
## [2,] 0.19860768 -0.560485161 -0.238001247 -0.13433740
## [3,] 0.35488346 0.057368027 0.251966132 0.15915978
## [4,] -0.01866734 -0.145837880 -0.007194782 -0.16074206
## [5,] -0.23320888 0.208434036 -0.643175344 0.63047911
## [6,] -0.24710427 0.412170404 0.305025482 -0.13385957
## [7,] -0.21817543 0.395054251 0.293741747 -0.11102551
## [8,] -0.33650514 -0.312956818 -0.128296727 -0.52754011
## [9,] 0.46183511 -0.197388878 0.295203638 -0.03835963
## [10,] -0.24765628 0.105117973 -0.078117455 -0.06127897
## [11,] 0.17745685 -0.052061251 0.344817622 -0.22442234
## [12,] -0.40187122 0.296093799 0.120292139 -0.22079164
##
## [[6]]
## [,1] [,2] [,3] [,4]
## [1,] 0.9936103 -0.5106387 0.2324040 0.2367691
## [2,] -0.5106387 1.0193888 0.2573216 0.2723028
## [3,] 0.2324040 0.2573216 0.9334559 -0.4480018
## [4,] 0.2367691 0.2723028 -0.4480018 0.9689539
##
## [[7]]
## [1] -6.218116e-01 -2.324609e-01 1.027398e-01 3.608979e-01 1.151672e-01
## [6] 3.562874e-01 1.932607e-06 3.829305e-01 -3.349332e-01 3.262794e-01
## [11] 6.940649e-02 -3.055814e-01 -7.520604e-02 -3.244011e-01 3.741366e-01
## [16] 2.438342e-01 -5.125979e-01 -6.048184e-02 -8.689474e-01 6.074664e-01
## [21] 2.180346e-01 -3.879263e-01 2.014267e-01 -4.260195e-01 -2.241976e-01
## [26] -8.453101e-01 5.902170e-01 -1.094201e-01 7.382282e-01 -3.017450e-01
## [31] 2.573475e-01 2.363385e-01 -4.572207e-01 9.532205e-02 5.540713e-01
## [36] -6.608855e-02 2.873958e-01 -1.224323e-01 -1.553612e-01 -1.861052e-02
## [41] -9.843707e-02 -1.897157e-01 6.172619e-01 -1.774604e-01 6.438353e-01
## [46] -1.950388e-01 -1.054181e-01 -3.756705e-01 1.876383e-01 -7.766931e-01
## [51] 2.561096e-01 -1.594451e-01 -7.218431e-01 -1.720015e-01 1.051845e-01
## [56] -1.391646e-01 -5.724185e-01 -6.303034e-01 -1.351206e-01 -1.521792e-01
## [61] -1.518009e-01 -4.726420e-01 6.897870e-01 -2.687902e-01 -8.959888e-01
## [66] -2.361661e-01 3.713508e-02 -4.339221e-01 -3.818028e-02 1.322045e-01
## [71] 5.298205e-01 -2.206081e-01 -1.648991e-01 2.119842e-01 4.417186e-01
## [76] 7.881612e-01 3.572694e-01 -5.271966e-01 1.312218e-01 -2.990653e-01
## [81] 4.109232e-01 2.848020e-01 -1.120791e-01 -4.511020e-01 1.979463e-01
## [86] 7.048663e-01 -2.855835e-01 -3.474222e-01 -8.457590e-01 -6.894474e-02
## [91] 2.846113e-01 -5.004831e-01 8.297854e-02 3.194245e-01 8.976090e-01
## [96] 4.573121e-01 2.253143e-01 2.004491e-01 7.427115e-01 1.024344e+00
## [101] -7.223198e-01 5.598523e-01 4.161944e-01 -1.152639e-01 -2.417876e-01
## [106] 5.954666e-01 -3.304092e-02 -4.663887e-01 4.589945e-01 5.626436e-01
## [111] -4.282431e-01 -3.633294e-01 5.722936e-02 -7.678113e-01 7.009198e-01
## [116] 1.549132e-01 8.709396e-02 -6.969477e-01 4.913277e-01 -7.349356e-01
## [121] -7.078011e-01 7.518397e-01 4.779771e-01 4.103905e-01 -1.814637e-01
## [126] -6.877152e-01 4.546469e-01 -1.592507e-02 -4.848683e-01 2.352634e-01
## [131] -5.067399e-01 5.161570e-01 1.272293e+00 -7.279591e-01 -4.885374e-01
## [136] 1.196029e+00 6.537422e-01 -6.679193e-01 2.497727e-01 -5.637619e-02
## [141] -8.199929e-03 1.214766e+00 9.983367e-01 2.693400e-01
##
## [[8]]
## [1] -6.237817
This output was used to run simulation with the following parameter values (asexual without rotation is shown below).
sim <- run_farm_sim(mine_output = mg,
N = 1000,
neutral_loci = 1000,
xdim = 100,
ydim = 100,
repro = "asexual",
max_age = 6,
selfing = TRUE,
food_consume = c("T1", "T2"),
pesticide_consume = c("T3", "T4"),
food_needed_surv = 1,
food_needed_repr = 1,
reproduction_type = "food_based",
pesticide_tolerated_surv = 0,
pesticide_rotation_type = 2,
crop_rotation_type = 2,
min_age_reproduce = 4,
max_age_feed = 3,
lambda_value = 1.5,
farms = 24,
time_steps = 480,
mutation_pr = 0.001,
crossover_pr = 0.1,
net_mu_layers = 0,
crop_rotation_time = 12,
pesticide_rotation_time = 12,
crop_per_cell = 8,
pesticide_per_cell = 0.4,
crop_number = 2,
pesticide_number = 2,
K_on_birth = 1000000,
min_age_move = 4,
age_food_threshold = 2,
min_age_feed = 1,
pesticide_start = 100,
print_last = TRUE,
mating_distance = 2,
);
I have left out some arguments that were set to their default options. Note that we have a 100 by 100 landscape. Two traits determine how much of crop an individual consumes at a time (“T1” and “T2”), and two separate traits determine how much of two types of pesticide an individual imbibes (“T3”, “T4”). Individuals start feeding in their first time step and stop feeding after their third time step. On their four time step, individuals can move and reproduce, and they have a maximum age of 6 (though mortality happens at the end of a time step). Each cell has 8 units of crop that an individual can eat, and the crop is replaced each time the land is rotated (i.e., individuals can keep eating and feeding on it until there is either nothing left on a cell or the farmer plants a new crop after 12 time steps). Individuals must eat at least 1 unit to survive and reproduce, and can starve on or after their second time step. Individuals also cannot tolerate any pesticide (note, ‘zero’ is kind of arbitrary, but it’s an easy number to use).
The following shows images of all of the different scenarios. For each scenario, I will show an image of the landscape on the last time step. Different coloured squares show different farms, and the points on top of them are pests. I will also show the dynamics of pest abundance and traits over time, and I will show scatterplots of the correlations between traits.
Note that the pesticide application starts on time step 100 to allow a period of burn-in.
Asexual, no rotation
Asexual, pesticide rotation
Asexual, both rotation
Selfing, no rotation
Selfing, pesticide rotation
Selfing, both rotation
Non-selfing, no rotation
Non-selfing, pesticide rotation
Non-selfing, both rotation
Biparental, no rotation
Biparental, pesticide rotation
Biparental, both rotation
It’s worth mentioning that the reason for the oscillating population size is due to individuals sometimes consuming all of the food and starting to decrease in population density before the next crop is put down. To get a full picture on what is going on, I will need to run multiple replicate simulations with different mined genomes (i.e., same genetic correlations, but potentially achieved in a different way). I will also want to run the same simulations with an identity matrix (i.e., no genetic correlations) to act as a control so that we can understand the role of the negative genetic correlations in all of these results.
I have not really given all of these figures a hard look to try to think more clearly about what’s going on yet – at least not apart from the outbreeding depression idea.
I just want to walk through some of the work and simulations that I have been running in the past two weeks. I am not going to get into the details, but I just want to show what the model can do and roughly what kind of output we might expect for a larger set of simulations. Everything here is reproducible using the helicoverpa package, which can be downloaded from the R console using the code below.
install.packages("devtools"); # If you've not got devtools already installed
library(devtools);
install_github("bradduthie/helicoverpa");
There are no real safeguards or error messagese built into the package yet, so it’s probably best to keep to a small simulations when experimenting. There are a few key points I want to make before presenting the model.
The code is written in a very modular way. We can take bits out and rearrange most things without too much effort, so please do make suggestions if you think something is important to change. Most changes will take very little time for me to implement now that the broad structure of the code is written.
The results that I present below are really just about getting a feel for what the model can do, and for developing an intuition about what kind of results we might expect when we set things up more formally. It’s also another layer of checking to make sure that there are no major bugs in the code. Hence, the simulations that I have run reflect my own testing of what the model can do rather than the best test of any specific question.
I’m not going to explain all of the things that the software can do since I don’t think this would be terribly productive. I just want to give a sense of what we’re working with and what the possibilities are in relation to ENDORSE. There are some hidden hooks in here that were just too tempting not to code, which can be used for future papers. If I’ve done a good job, then it should be possible to use this one R package to address a range of potentially interesting questions. I don’t need to take the lead in everything, and I would be delighted if folks were to use the software for individual-based modelling.
Here are the basics of how to run simulations. You really only need to know two R functions, each of which can take a long list of arguments. The heavy lifting of both functions is done in C rather than R. The first function is the mine_gmatrix function. The second function is the run_farm_sim function. The role of mine_gmatrix is to generate a genome that produces an arbitrary number of phenotypic traits with an arbitrary covariance structure given a set of allele values with a standard normal distribution. The role of run_farm_sim is to do everything else – that is, to generate individuals with the mined genomes simulate the ecology and evolution of the system over time. The output of the simulation is the creation of one or more CSV files in the directory where the simulation is run (more on this later).
mine_gmatrix
functionThe goal of the mine_gmatrix
function is to get from
standard normal (mean = 0, stdev = 1) allele values at some number of
loci (loci
) through a network of some number of hidden
layers (layers
) to produce a set of traits that have some
variance covariance structure (gmatrix
, which is a square
covariance matrix).
The figure above shows an example with 18 loci and 9 traits, separated by 6 hidden layers. Each of the black arrows can be represented by a number indicating how much one locus (green) or node (blue) should contribute to the next (mathematically, arrows between two sets of nodes are represented by matrices that get multiplied together). These numbers become part of the genome for individuals, determining what a newly born individual’s traits will be given their allele values at each locus. For now, an evolutionary algorithm is used to find a set of numbers that will produce traits with a desired covariance structure given random standard normal allele values. How long it takes to find these numbers depends on the number of loci, layers, and traits, and how much error we are willing to accept. Here’s a very simple example of running the function with four loci, two layers, and two traits that have a variance of 1 and a covariance of -0.5.
library(helicoverpa);
gmt <- matrix(data = 0, nrow = 2, ncol = 2);
gmt[1, 1] <- 1;
gmt[2, 2] <- 1;
gmt[1, 2] <- -0.5;
gmt[2, 1] <- -0.5;
The covariance matrix is shown below.
## [,1] [,2]
## [1,] 1.0 -0.5
## [2,] -0.5 1.0
This matrix is set as the argument gmatrix = gmt
. The
number of loci is set with loci = 4
, and the number of
layers is set with layers = 2
(there is no need to set
trait number because the function figures this out from the
gmatrix
). The rest of the arguments specify how the
evolutionary algorithm is to run:
Argument | What it does |
---|---|
npsize | How big a population of networks evolve in the algorithm |
indivs | How many random normals to test the fitness of each network |
max_gen | The maximum number of generations allowed to evolve |
sampleK | Number of randomly selected networks to compete |
chooseK | Number of tournament winners (highest fitness) |
mu_pr | Probability of a value in a network mutating |
pr_cross | Probability of crossing over between networks |
mu_sd | Standard deviation of mutation effect size (mean is 0) |
term_cri | Logged stress criteria for termination |
The details here are not terribly important, but the key concepts are
explained in more detail in Hamblin (2013)
and Duthie, Cusack, et al. (2018) in one
of the GMSE
R package vignettes. Basically, what we’re doing is creating a
random population of npsize
networks like the one in the
figure above with random arrow values. We’re then letting this network
evolve over time with selection for networks that do a better job of
producing the gmatrix
from a standard normal loci values.
Evolution stops after max_gen
generations have passed or
the mean squared deviation of elements from the evolved variance
covariance matrix and gmatrix
is less than
exp(term_cri)
. Finding a genome is a bit of an art relying
on trying different parameter combinations and seeing what works. Big
npsize
(higher than 10000) with low mutation rates and
standard deviations (both below 0.01) are slow and steady, but usually
necessary for a large gmatrix
. For smaller
gmatrix
, smaller populations and higher mutation and
crossover rates are fine and quite fast. For the matrix gmt
above, we can probably get away with the following:
mg <- mine_gmatrix(gmatrix = gmt, loci = 4, layers = 2, npsize = 2000,
indivs = 2000, max_gen = 800, sampleK = 1200, chooseK = 6,
mu_pr = 0.3, pr_cross = 0.3, mu_sd = 0.05, term_cri = -4);
I’m being extremely relaxed here with the term_cri
and
max_gen
, but I just want to show a quick example. The
function prints off each generation, the mean stress, and the minimum
stress (best) genome found. This is mostly just to reassure you that
it’s doing the job and has not crashed. The output in mg
is
a list with seven elements:
mine_gmatrix
.gmatrix
.layers
square matrices,
representing all of the rest of the arrows in the figure above.gmatrix
).Here is the output for mg
above.
print(mg);
## [[1]]
## [1] 4.00 2.00 2000.00 2000.00 0.30 0.05 800.00 0.30 1200.00
## [10] 6.00 -4.00 0.10 0.00
##
## [[2]]
## [,1] [,2]
## [1,] 1.0 -0.5
## [2,] -0.5 1.0
##
## [[3]]
## [,1] [,2]
## [1,] -0.7780116 -0.8220788
## [2,] 0.1761964 0.2002705
## [3,] 0.7962038 -0.5952858
## [4,] 0.4842877 -0.2810729
##
## [[4]]
## , , 1
##
## [,1] [,2]
## [1,] 0.8775310 0.3140504
## [2,] -0.1439212 0.7641448
##
## , , 2
##
## [,1] [,2]
## [1,] 0.5217076 -0.8884919
## [2,] 0.7151524 0.1385427
##
##
## [[5]]
## [,1] [,2]
## [1,] -0.9184457 0.38059627
## [2,] 0.2146444 -0.08289933
## [3,] 0.2627214 -0.72528297
## [4,] 0.1979856 -0.42221579
##
## [[6]]
## [,1] [,2]
## [1,] 1.0484333 -0.6448085
## [2,] -0.6448085 0.8521395
##
## [[7]]
## [1] -0.7780116 -0.8220788 0.1761964 0.2002705 0.7962038 -0.5952858
## [7] 0.4842877 -0.2810729 0.8775310 0.3140504 -0.1439212 0.7641448
## [13] 0.5217076 -0.8884919 0.7151524 0.1385427
##
## [[8]]
## [1] -4.195757
Note that the variance covariance matrix in [[6]]
is
reasonable, but probably not good enough for real simulations. It took
less than 3 minutes to get, but the time taken appears to scale
exponentially with gmatrix
size and term_cri
.
But if we accept this result, we have all that we need to move ahead
with the next function for actual simulation.
run_farm_sim
functionThis function runs a single replicate simulation from start to
finish. It generates a landscape divided into some arbitrary number of
farms
and places new individuals on them. A series of
events then happen in pretty much any order that we want them to happen
over the course of time_steps
discrete time steps.
First, individuals are initialised with their genomes as found in
mine_gmatrix
. They are given random allele values for their
loci. Other options for individuals include the following (this is not
exhaustive):
repro
can be set as
asexual
, sexual
(all individuals female and
male with optional selfing
), or biparental
(females and males separate).max_age
that an individual can be before death.min_age_move
and max_age_move
).min_age_reproduce
and
max_age_reproduce
).min_age_feed
).food_consume
, which can be a vector of up to length 10 –
an element for each possible food type).pesticide_consume
, which can be a vector of up to length
10 – an element for each possible pesticide type).move_distance
).food_need_surve
) and the age at which this threshold is
assessed (age_food_threshold
).pesticide_tolerated_repr
) and the age at
which this threshold is assessed
(age_pesticide_threshold
).mutation_pr
) and crossover
(crossover_pr
).net_mu_layers
).neutral_loci
).When the simulation starts, the following happens:
crop_number
crops (up to 10 possible) grown on
farms
unique farms are changed at
crop_rotation_time
intervals (rules for changing can be
adjusted).pesticide_number
pesticides (up to 10 possible)
applied on each farm is changed at pesticide_rotation_time
intervals (here too, rules for changing can be adjusted).min_age_feed
and not over
max_age_feed
, then they eat on the landscape cell in which
they are located. The amount of food that they consume on the cell can
be set as food_consume
, which can be a vector of up to 10
elements (one for each food type). Food consumption rates can also be
specified as a trait (more on that later). Food consumption occurs in a
random order of individuals.min_age_feed
and not
over max_age_feed
(same as feeding) imbibe biopesticde on
the landscape cell in which they are located at a rate set by
pesticide_consume
, which can also be a vector of up to 10
elements (one for each pesticide type). Biopesticide consumption rates
can also be specified as a trait. Biopesticide consumption occurs in a
random order of individuals.min_age_move
(there are more complex rules available for
feeding and moving; the simplest is just moving once in any
direction).The simulation ends after a fixed number of time steps
(time_steps
) have passed, or if the population goes
extinct. I’ll now show a very simplified example of a simulation being
run, then a more complicated example.
In this simulation, I will just consider a very simple scenario of two traits, which will model the rate at which food is consumed (Trait 1) and the rate at which pesticide is imbibed (Trait 2). Just to get started, let each of these traits have a variance of 1 and a covariance of about 0.5 to model the idea that the rate at which you consume pesticide increases with the rate at which you consume food.
## [,1] [,2]
## [1,] 1.0 0.5
## [2,] 0.5 1.0
Using the mine_gmatrix
function with similar parameter
values tried above, I got a genome that produces the following matrix
given random standard normal allele values.
## [,1] [,2]
## [1,] 1.0186185 0.5655697
## [2,] 0.5655697 1.0950818
A convenient part of the code that I’ve not yet mentioned is that we
can replace several numerical parameter arguments with evolving traits.
This is possible for move_distance
,
food_needed_surv
, pesticide_tolerated_surv
,
food_needed_repr
, pesticide_tolerated_repr
,
mating_distance
, lambda_value
(raw
reproduction value), movement_bouts
,
food_consume
, and pesticide_consume
. If, for
example, we want movement distance in cells to be defined by trait one,
we just set move_distance = "T1"
(in this case, the value
will be floored since cells are discrete). With that in mind, we can run
the following simulation.
sim <- run_farm_sim(mine_output = mg,
N = 1000,
neutral_loci = 1000,
xdim = 40,
ydim = 40,
repro = "sexual",
max_age = 5,
selfing = FALSE,
food_consume = "T1",
pesticide_consume = c("T2", 0),
food_needed_surv = 1,
food_needed_repr = 1,
reproduction_type = "food_based",
pesticide_tolerated_surv = 0,
pesticide_rotation_type = 2,
min_age_reproduce = 3,
max_age_feed = 2,
farms = 4,
time_steps = 120,
mutation_pr = 0.01,
crossover_pr = 0.01,
net_mu_layers = 0,
crop_rotation_time = 4,
pesticide_rotation_time = 4,
crop_per_cell = 2,
pesticide_per_cell = 0,
crop_number = 1,
pesticide_number = 2,
K_on_birth = 1000000,
print_last = TRUE
);
Note that the mine_output
is just the whole vector
output we got from the mine_gmatrix
function.
N
is the initialsed number of individuals,
xdim
and ydim
set the landscape dimensions (so
we have a 40 by 40 square landscape). Individuals are sexual but cannot
self. Note how food_consume = "T1"
, so the amount of food
consumed by each individual will be determined by its value in trait 1
(had we set food_consume = 1
, then all individuals would
consume 1 unit of food; note that individuals cannot consume negative
food). The amount of pesticide 1 consumed by an individual will be
determined by "T2"
, while no individuals consume pesticide
2 (which can just be a trick for making some of the 4 farms
not apply any pesticide – since anything applied has no effect). Note
that we indicate one crop by crop_number = 1
and two
pesticides by pesticide_number = 2
(the second one being
irrelevant), and each cell applies 0 pesticide
(pesticide_per_cell = 0
), so this is actually a null model
in which no pesticide is applied at all.
The argument K_on_birth
limits the number of eggs that
can be made in the population; if the population goes over, then eggs
are randomly removed (this prevents the time consuming process of some
individuals being initialised unnecessarily – if they were just going to
die immediately anyway). The argument print_last
indicates
that we want to print all of the information (everything, genome and
all) for each individual in the last generation. The resulting file is
often in the 100s of MBs.
After the simulation runs, we get a file
population_data.csv
that gives information for each
generation on the time step, population size, mean age, mean food
consumed, mean pesticide consumed, mortality rate, mean food consumed of
each type, mean pesticide consumed of each type, mean trait values, and
mean inbreeding coefficient. Here’s a plot of population size, trait 1
value, trait 2 value, and mortality rate.
This makes sense (ignore the small spikes caused by the age
structure). The population size increases, the trait 1 mean increases as
individuals evolve to consume more food (which gives higher reproductive
output), and trait 2 gets dragged along due to the correlation with
trait 1 (despite not having any effect). Now we can see what happens
when some of the farms start to apply pesticide by running the same
model with pesticide_per_cell = 0
.
Note that the population size grows more slowly now, and that while trait 1 underlying food consumption evolves to be positive, the value is not as high as it was without the pesticide. This is because trait 2 now evolves to be a negative value, and traits 1 and 2 are positively correlated.
The plot above shows traits 1 and 2 for all individuals in the population where no pesticide is applied. The correlation here is 0.3219144. We can compare this to a plot of the same traits for the simulation in which pesticide was applied.
Note the shifted position of the y-axis, in particular, as selection
pushes pesticide consumption trait values down while pushing food
consumption values up. Even though the network itself has not changed
(i.e., standard random normals will still give individuals with traits
of the original correlation structure), the correlation between trait 1
and 2 is now 0.0577854. This is interesting, to me at least. I’m also
not entirely sure if I understand what is maintaining the variance in
traits, nor did I expect the realised correlation to change so much just
as a consequence of alleles being selected. Note that if you set
net_mu_layers = 4
so that the entire genome can evolve, you
get a strong negative covariance in 400 generations.
We can have a look how each trait is correlated with fitness, just measured by offspring production. Here is offspring production against the food consumption trait.
And here is offspring production against pesticide consumption trait.
The correlations are 0.0636287 and 0.0300173 for food consumption (trait 1) and pesticide consumption (trait 2) with offspring produced, respectively. This would seem to suggest that selection is quite weak after 120 generations? We could dive deeper into the mechanics of this if we wanted to, looking at how much food each individual consumed, exactly, and how much pesticide each individual consumed (and when, even, if needed). I’m tempted to let these run for a while longer to see if the correlation between traits drops to negative, and wondering if there is some reason for us to see a pattern of a ‘trade-on’ despite a hard coded loci to trait genome that causes a trade-off. Maybe there’s a reason to expect mutation of allele values alone to easily find a way to create this pattern, meaning that constraints are not as important as would be intuitive? I definitely need to think more about it.
I have now also looked through the following simulation scenarios in which there are four crops and two traits.
library(helicoverpa);
set.seed(Sys.time());
mg <- readRDS(file = "mg.rds");
sim <- run_farm_sim(mine_output = mg,
N = 10000,
neutral_loci = 1000,
xdim = 100,
ydim = 100,
repro = "biparental",
max_age = 10,
selfing = FALSE,
food_consume = c("T1", "T2"),
pesticide_consume = c("T3", "T4"),
food_needed_surv = 1,
food_needed_repr = 1,
reproduction_type = "food_based",
pesticide_tolerated_surv = 0,
pesticide_rotation_type = 2,
crop_rotation_type = 2,
min_age_reproduce = 8,
max_age_feed = 6,
lambda_value = 1.5,
farms = 16,
time_steps = 600,
mutation_pr = 0.01,
crossover_pr = 0.01,
net_mu_layers = 0,
crop_rotation_time = 24,
pesticide_rotation_time = 24,
crop_per_cell = 2,
pesticide_per_cell = 2,
crop_number = 2,
pesticide_number = 2,
K_on_birth = 1000000,
min_age_move = 8,
age_food_threshold = 7,
min_age_feed = 1,
pesticide_start = 200,
);
These are a bit more high powered, but the general idea is that we now have four traits, two underlying crop consumption (one for each crop) and two underlying pesticide consumption (one for each pesticide). Here is what the covariance matrix was set to be.
## [,1] [,2] [,3] [,4]
## [1,] 1.0 -0.5 0.2 0.2
## [2,] -0.5 1.0 0.2 0.2
## [3,] 0.2 0.2 1.0 -0.5
## [4,] 0.2 0.2 -0.5 1.0
Here is what the mining process came up with at some reasonable constraints.
## [,1] [,2] [,3] [,4]
## [1,] 0.8969052 -0.5597788 0.1942680 0.1290581
## [2,] -0.5597788 0.9774151 0.1447646 0.3630678
## [3,] 0.1942680 0.1447646 1.0671198 -0.5870404
## [4,] 0.1290581 0.3630678 -0.5870404 1.0399424
The general idea is that if you are good at consuming one crop, then you are probably not so good at consuming another crop. If you are good at consuming either crop, then you probably also imbibe a lot of pesticide. And if you are good at avoiding pesticide consumption in one trait, then you probably are in the other. I’ve run simulations above for all combinations of the following:
asexual
, sexual
, and
biparental
)When rotated, each of 16 farms
independently selects a
crop and a pesticide at random. Below, I’m just going to show plots of
population size, mortality rate, and each of the four trait values over
time. I’ve run all of these simulations for 300 generations, and I’m
considering the first 100 as a burn in before the pesticide is added.
The pesticide is then added for the first time on time step 210. I’ve
indicated these time steps with red vertical lines on the plots below.
Note that these are just single runs – we would need to run many
replicates to make robust inferences.
No rotation: Asexual population
No rotation: Sexual population
No rotation: Biparental population
Pesticide rotation only: Asexual population
Pesticide rotation only: Sexual population
Pesticide rotation only: Biparental population
Crop and pesticide rotation: Asexual population
Crop and pesticide rotation: Sexual population
Crop and pesticide rotation: Biparental population
These simulations show some interesting dynamics, though we will need to think more carefully before running them at a larger scale. Despite having population sizes on the order of tens of thousands, these ran quite quickly, with most simulations finishing up within 10-20 minutes. It would not be difficult to run a huge number in 1-2 months if necessary. Annoyingly, when we increase by another order of magnitude, the pace is glacial, with me stopping after trying it after 72 hours and no more than 70 generations. We might be able to get a bit more power through some creative refactoring of the code, and a bit if I made a version entirely in C. The longest step by far is sexual reproduction and allocating new offspring with diploid genomes. Haploid genomes and asexual populations are much, much faster in comparison.
I am thinking that it might be worthwhile to monitor the allelic diversity in the population in some way. Individuals are initialised with unique alleles, which at first I feared my be a bit unrealistic, but the population size starts out small, so perhaps not so much. We can always allow more burn-in to be safe.
It is nice to recover the sudden drop given the application of pesticide followed by the evolution of resistance and increase in population soon after. The strong selection is also reflected in the evolution of traits such that both pesticide traits are negative (indicating resistance).
One thing that concerned me at first was that only one of the crop feeding traits is positive in each simulation. This means that the population is quickly specialising on one type of food resource. But this is, I think, exactly what we would predict from theory in an evolving population. Evolutionary branching should only happen under some specific circumstances, which we would need to parameterise explicitly if we wanted to recreate. Perhaps we need to have a bit weaker of a negative correlation for feeding, or model multiple specialist populations?
Work Package 1C develops an individual-based model to investigate how landscape diversity affects the maintenance of susceptibility alleles, and will focus especially on the scale at which diversifying biopesticide strain use and crop species production can overcome pest resistance in the long term. In our prototype model written in R, we did this with a very simple genetic architecture that used matching alleles at a single locus to determine pesticide resistance and feeding ability across crops. While this was useful as an initial model to get a general idea of the scale at which different biopesticide application and crop production will need to be applied, the matching alleles model forces a strong negative genetic correlation between biopesticide resistance and feeding ability across crops. This is unrealistic, and a large part of this work package will be to try to understand the scale at which diversity is needed when the genetic architecture of pests is more realistic. A major modelling goal is to develop a method for constructing a more realistic genetic architecture for simulating pest management in silico, such that arbitrary and potentially evolving genetic correlations between pest traits (resistance, feeding efficiency) can be constructed. This entry is an initial plan for this genetic architecture; what follows is just the starting point, all subject to further revision.
When modelling the evolution of pest resistance and crop feeding ability, we need to get from pest genome to pest traits. Everything in between is the challenging bit, but these two end points are actually fairly straightfoward.
Genome can be modelled as a vector of real numbers. Each individual pest gets its own genome, which is essentially a very long vector in which each element is a single locus. There are relevant details to fill in later. For example, there are several ways to model how alleles work. One of the simplest ways is an infinite alleles model, in which each locus holds a real number of some arbitrary value. New mutations are then introduced by randomly selecting a new real number from some distribution. A slightly more challenging way (computationally) is to have some fixed number of discrete alleles that are possible for an individual to have (e.g., 100 alleles at each locus), but I do not think that this is the best way to go unless there is some compelling reason for it. Ploidy level can also be determined in advance; the simplest is haploid, but diploid organisms can be created simply by pairing vector elements (loci) and having rules for homozygosity and heterozygosity. In the end, however, we will have a long vector of numbers (1000s to 10000s of elements) with values that affect pest traits in some way, and are expected to change over the course of the simulation (evolution).
Traits can be modelled, again, as some vector of real numbers. Each individual gets their own values for a fixed number of ‘traits’ (e.g., sex, location, dispersal ability, resistance to a particular biopesticide, ability to feed on a particular crop). Any number of traits are possible, making the model as complex as it needs to be, but the idea is that each pest has its own set of values that make it unique. Some traits will not be directly related to the genome (e.g., location on the landscape). But most trait values, and especially bipesticide resistance and feeding ability, will be some function of the underlying genome values.
The challenge is to get from genome to traits in a way that allows traits to be quantitative and correlated in some realistic and potentially evolving way. Below is a starting point, though the details are likely to change. Feel free to leave comments here
A diploid genome will be constructed with \(N_{L}\) total loci, so that each pest genome will represented by a vector of length \(2L\). Vector elements will be able to take any real number, thereby modelling potentially infinite alleles. Allele values (i.e., the numbers at the loci) will be used in a function that ultimately determines the values of up to \(N_{T}\) pest traits. The simplest way to do this is to just have a single locus for a single trait with additive effects of alleles. For example, for a locus \(L_{i}\) with alleles \(A_{i1}\) and \(A_{i2}\), the value of \(T_{i} = 1/2(A_{i1} + A_{i2})\). More generally, any different rules for making a trait (e.g., complete or partial dominance) could also be modelled by setting a function \(T_{i} = f(A_{i1}, A_{i2})\). Similarly, we can also make traits polygenic by having them determined by multiple loci. For example, we might want five traits, each of which take values that are determined by 10 total loci in the genome (20 alleles). So, e.g., \(T_{1} = \sum_{i=1}^{10}(A_{i1} + A_{i2})\), \(T_{2} = \sum_{i=11}^{20}(A_{i1} + A_{i2})\), etc. This would give us five traits, each of which takes a value that is the sum of 20 individual real numbers within the genome.
One way to allow for correlated traits would be to have two separate trait values (e.g., \(T_{1}\) and \(T_{2}\)) be determined by some overlapping set of loci. Correlation between traits could be adjusted by adjusting the size of this overlap. For complete correlation, e.g., the two traits would be determined by the exact same loci. Negative correlations could be constructed by inverting the effects of loci such that \(L_{i} \to T_{1}\) but \(-L_{i} \to T_{2}\). Still, this means that the correlations must be set a priori, and sets some restrictions on the kinds of correlations that are possible (e.g., precision as a consequence of the total number of alleles that can be overlapping versus non-overlapping). Hence, I propose a slightly different construction, which will allow us to go from genome to traits in a way that allows us to set genetic correlations without needing to fix them a priori through some pre-specified mechanism. First, here is a simplified version of what I want to try.
The basic idea
Assume a large population of individuals with only two haploid loci \(L0\) and \(L1\), and two traits \(T0\) and \(T1\). There are four possible links between \(L_{i} \to T_{j}\), which we can call \(X_{ij}\) describing the link between \(L_{i}\) and \(T_{j}\): \(X_{00}\), \(X_{01}\), \(X_{10}\), and \(X_{11}\). Assume that \(X_{00} = 1\), \(X_{01} = 0.3\), \(X_{10} = 0\), and \(X_{11} = \sqrt{1 - 0.3^{2}}\). That is,
\[T_{0} = X_{00},\]
while,
\[T_{1} = \sqrt{1 - 0.3^{2}} X_{11} + 0.3 X_{01}.\]
If we randomly sample \(L0 \sim \mathcal{N}(0, 1)\) and \(L1 \sim \mathcal{N}(0, 1)\), then trait variances equal \(1\), while \(cov(T_{0}, T_{1}) = 0.3\). We can confirm this by simulating a population of \(N = 100000\) individuals.
N <- 100000;
inds <- matrix(data = 0, nrow = N, ncol = 9);
colnames(inds) <- c("ID", "L0", "L1", "X00", "X01", "X10", "X11", "T0", "T1");
inds[,1] <- 1:N;
inds[,2] <- rnorm(n = N, mean = 0, sd = 1);
inds[,3] <- rnorm(n = N, mean = 0, sd = 1);
inds[,4] <- 1;
inds[,5] <- 0.3;
inds[,6] <- 0;
inds[,7] <- sqrt(1 - 0.3^2);
inds[,8] <- (inds[,2] * inds[,4]) + (inds[,3] * inds[,6]);
inds[,9] <- (inds[,2] * inds[,5]) + (inds[,3] * inds[,7]);
So the individuals are in rows, while their attributes (IDs, genomes, interactions, traits) are in columns.
## ID L0 L1 X00 X01 X10 X11 T0 T1
## [1,] 1 1.2151489 -0.7894837 1 0.3 0 0.9539392 1.2151489 -0.3885747
## [2,] 2 1.1639301 0.6970085 1 0.3 0 0.9539392 1.1639301 1.0140827
## [3,] 3 0.6699372 0.3480979 1 0.3 0 0.9539392 0.6699372 0.5330454
## [4,] 4 -1.5380836 -1.0903007 1 0.3 0 0.9539392 -1.5380836 -1.5015056
## [5,] 5 -0.5447039 -0.2878480 1 0.3 0 0.9539392 -0.5447039 -0.4380007
## [6,] 6 1.5296328 0.9251678 1 0.3 0 0.9539392 1.5296328 1.3414436
We can look at the covariance between traits \(T0\) and \(T1\) in columns 8 and 9 (which might then code for ‘resistance ability’ or ‘feeding efficiency’, or something like it).
cov(inds[,8:9]);
## T0 T1
## T0 1.0156514 0.3069984
## T1 0.3069984 1.0093656
This is roughly what we want as our variance covariance matrix. One really simple way to acheive this for five traits, for example, could be just to consider five loci (or blocks of some number of loci) \(L_{i}\) and all of the interactions between locus \(i\) and \(j\), \(X_{ij}\) to get to each trait.
In the figure above, five loci \(L_{i}\) are connected to each trait \(T_{j}\) by arrows \(X_{ij}\). Arrow values could be represented
within the genome, as with the individual attributes
"X00"
, "X01"
, "X10"
, and
"X11"
in the toy example above. For a given desired
covariance matrix of trait values, calculations of \(X_{ij}\) become longer as trait number
increases, but I should be able to write the code to get this for an
arbitrary number of traits.
This is a simple way of doing things, but I think that it still leaves something to be desired. If we want the genetic covariances themselves to evolve (rather than just the traits via changing allele frequences in \(L_{i}\)), then we are forced to do it in a simplified way in which genetic covariances will almost inevitably evolve to become positive so that a ‘good’ value of \(T0\) and \(T1\) can be assured by a joint change with a change in \(L0\) by modifying \(X_{01}\) accordingly. In other words, any variation stuck into the genome that modifies \(X_{ij}\) values can be done so that there is directional selection increasing the fitness of \(T_{j}\) via change in \(L_{i}\), I think. What we really want is to tangle things so that changes along the path from alleles to traits is more complex. Below is my initial plan for increasing the complexity in such a way as to make the genetic architecture itself evolve a bit more realistically.
Adding some complexity
I propose the addition of multiple hidden layers beween loci and traits, effectively building something resembling a neural network (see here for video), but perhaps loosely interpreting these layers as steps along biochemical or developmental pathways from alleles to traits affecting fitness.
These hidden layers (blue) can now make each trait value a linear combination of all of the paths from each locus to each trait. For example, the value of \(L0\) can now get to \(T0\) in up to 100 different ways. Each link between nodes has its own value to modify the effect of a locus’ value on the end value of each trait. For example if the link from \(L0 \to H0 = 0.5\), \(H0 \to H0 = 0.5\), and \(H0 \to T0 = -1\), then this one complete path causes \(L0\) to decrease \(T0\) by \(L0(0.5 \times 0.5 \times -1) = -0.25L0\). This effect would then be added to the effects of all 99 other possible paths from \(L0\) to \(T0\) (I think it would probably make sense in the end to restrict values of edges to \(0 \leq e_{i} \leq 1\); we could also fix some edge values to be zero, perhaps giving a more realistic network structure).
This looks like a mess, but that’s kind of the point. Having many paths from loci to traits should give us the freedom to allow genetic covariance to evolve with some restrictions. As with the example above, we can have values for the 250 edges that form the paths from loci to traits, but the hidden layers give us more flexibility in setting evolutionary constraints.
We could, for example, allow e1 to e250 above to be represented by potentially changing values in an individual’s genome (through mutation, selection, and drift). Given this, selection across all L0-L9 and e1-e250 would almost certainly rapidly increase fitness of all traits. But we could also, e.g., block mutation from happening at any of the hidden layers after initialising, e.g., e1-e200 to random values (i.e., alleles would have polygenic effects that potentially lead to fitness tradeoffs, but only parts or proportions of the genetic architecture that we want to evolve actually could). Or we could change the initial genetic variation or mutation rates of edges relative to loci (e.g., make the genetic architecture itself evolve more or less rapidly). We could then ask several potentially interesting questions by controlling the extent to which genetic covariances can evolve and how processes such as drift might affect loci that underly traits versus loci that underly covariance among traits.
To initialise a desired genetic covariance matrix, an evolutionary algorithm could be used. Basically, this would iteratively tweak the hidden layer edge values and check for a better or worse fit to the desired trait covariance matrix. The advantage of this would be that we could tell a function to build a genetic architecture that confroms to a covariance matrix without forcing it to specify how (i.e., the genetic architecture would be ‘found’ rather than imposed a priori, reducing the number of assumptions that we need to make).
Alternatively, I think it might be reasonable to introduce selection on traits by modelling a process of sequential invasion. That is, initialise all traits at some random value, but don’t immediately impose selection. Hence, the traits and the genetic architecture underlying them could be random and selectively neutral for a burn-in period before introducing some sort of selective pressure (e.g., after 100 generations of burn-in, feeding efficiency gets turned on such that there is selection on feeding efficiency for one crop, \(T0\)). One by one, new selection pressures could arise. This could avoid having to make the assumption that the genetic architecture evolves as a consequence of early simultaneous selection for multiple traits – it’s more realistic to assume that some traits arise and evolve first (e.g., the ability to feed on one crop), then a new selective pressure arise after an evolved genome and genetic architecture already exists.
Plans for the full helicoverpa R package
Given successful funding of the ENDORSE project, these notes will now continue in development of work package C1c of the project. The following very rough timeline is suggested.
This timeline is subject to revision, but the initial steps are more detailed planning for the software and model, with input from colleagues across all work packages.
Success with the graphical display
While I did not update this journal during its development very much, the graphical display looks really good, and received positive feedback from colleagues in Brazil.
Simulations for Phase II
While the toy model of the graphical display
demonstrated the key points, a couple more functions are needed to do
some initial simulations to get results to put in Phase II. First, new
movement rules are needed for more realistic dispersal of
Helicoverpa among cells. This is done now with the
toy_pois_move_pest
function. In the function below, for
each pest, movement in a time step proceeds by (1) selecting how many
jumps that the pest gets by sampling from a poisson distribution with
mean dist
, then moving up to dist
cells away
in any direction during each jump (uniform probability of all cells
within dist
of the original cell).
# Simulate Poisson movement of toys -- a nasty loop here -- faster in C
toy_pois_move_pest <- function(PEST, LAND, dist = 1){
pest_n <- dim(PEST)[1];
xlim <- dim(LAND)[2];
ylim <- dim(LAND)[1];
distance <- rpois(n = pest_n, lambda = dist);
for(pest in 1:pest_n){
move <- distance[pest];
while(move > 0){
move_x <- sample(x = -move:move, size = 1);
move_y <- sample(x = -move:move, size = 1);
PEST[pest, 3] <- PEST[pest, 3] + move_x;
PEST[pest, 4] <- PEST[pest, 4] + move_y;
move <- move - 1;
}
while(PEST[pest, 3] < 1){
PEST[pest, 3] <- PEST[pest, 3] + xlim;
}
while(PEST[pest, 3] > xlim){
PEST[pest, 3] <- PEST[pest, 3] - xlim;
}
while(PEST[pest, 4] < 1){
PEST[pest, 4] <- PEST[pest, 4] + ylim;
}
while(PEST[pest, 4] > ylim){
PEST[pest, 4] <- PEST[pest, 4] - ylim;
}
}
return(PEST);
}
We can check out the distribution of distance away from the original cell when the above function is implemented.
LAND <- toy_initialise_land(xdim = 50, ydim = 50, pathogens = 1, crops = 1);
PEST_1 <- toy_initialise_pest(LAND, N = 10000);
PEST_2 <- toy_pois_move_pest(PEST_1, LAND, dist = 5);
diffx_1 <- (PEST_1[,3] - PEST_2[,3])^2;
diffx_2 <- (PEST_1[,3] - PEST_2[,3] + 50)^2; # Account for the torus
diffx_3 <- (PEST_1[,3] - PEST_2[,3] - 50)^2;
diffy_1 <- (PEST_1[,4] - PEST_2[,4])^2;
diffy_2 <- (PEST_1[,4] - PEST_2[,4] + 50)^2; # Account for the torus
diffy_3 <- (PEST_1[,4] - PEST_2[,4] - 50)^2;
diffx <- rep(x = 0, times = 10000);
diffy <- rep(x = 0, times = 10000);
for(i in 1:10000){ # Assuming no one goes too far
diffx[i] <- min(c(diffx_1[i], diffx_2[i], diffx_3[i]));
diffy[i] <- min(c(diffy_1[i], diffy_2[i], diffy_3[i]));
}
dist_mv1 <- sqrt(diffx + diffy);
par(mar = c(5, 5, 2, 1));
hist(dist_mv1, main = "", col = "grey", cex.lab = 1.5, cex.axis = 1.5,
xlab = "Total distance travelled (km)", ylab = "Count",
ylim = c(0, 2500), xlim = c(0, 50));
The next part is to initialise the land in blocks so that we can control how big of an area is deadicated toward a paricular fungal biopesticide and crop area. We want to be able to control how many blocks are put down over the landscape. The code below does this, albeit in not the most efficient way.
toy_block_land <- function(xdim, ydim, pathogens, crops, block_len){
cells <- xdim * ydim;
cell_left <- cells %% (block_len * block_len);
if(cell_left > 0){
stop("Landscape cannot be divided into squares of size 'block_len'");
}
LAND <- array(data = 0, dim = c(xdim, ydim, 3));
tx0 <- 1;
tx1 <- block_len;
ty0 <- 1;
ty1 <- block_len;
block_num <- 1;
while(tx0 <= xdim & ty0 <= ydim){
LAND[tx0:tx1, ty0:ty1, 1] <- block_num;
tx0 <- tx1 + 1;
tx1 <- tx1 + block_len;
if(tx0 > xdim){
tx0 <- 0;
tx1 <- block_len;
ty0 <- ty1 + 1;
ty1 <- ty1 + block_len;
}
block_num <- block_num + 1;
}
block_num <- block_num - 1;
for(block in 1:block_num){
rand_path <- sample(x = 1:pathogens, size = 1);
rand_crop <- sample(x = 1:crops, size = 1);
for(xx in 1:xdim){
for(yy in 1:ydim){
if(LAND[xx, yy, 1] == block){
LAND[xx, yy, 2] <- rand_path;
LAND[xx, yy, 3] <- rand_crop;
}
}
}
}
return(LAND);
}
One reason I don’t mind all of the loops in the toy model is that
they will need to be used when this is translated to C – there’s not
really a way around it (I don’t think), so there’s no reason to worry
too much about making everything elegant and avoiding R loops. It does
take a while for landscapes to be initialised in the toy model, but not
nearly as long as it takes most generations to run (at least, not ones
with lots of pests). Here are some ways of using
toy_block_land
to make landscapes. I’ll show fungal
biopesticide blocks being put down randomly below
Block length 1: Each cell is its own block, corresponding to applying fungal biopesticides in each 1 km by 1 km square on the landscape.
LAND1 <- toy_block_land(xdim = 100, ydim = 100, pathogens = 3, crops = 3,
block_len = 1);
par(mar = c(1, 1, 1, 1));
image(LAND1[,,2], xaxt = "n", yaxt = "n");
Block length 5: Each cell is its own block, corresponding to applying fungal biopesticides in each 2 km by 2 km square on the landscape.
LAND5 <- toy_block_land(xdim = 100, ydim = 100, pathogens = 3, crops = 3,
block_len = 5);
par(mar = c(1, 1, 1, 1));
image(LAND5[,,2], xaxt = "n", yaxt = "n");
Block length 10: Each cell is its own block, corresponding to applying fungal biopesticides in each 10 km by 10 km square on the landscape.
LAND10 <- toy_block_land(xdim = 100, ydim = 100, pathogens = 3, crops = 3,
block_len = 10);
par(mar = c(1, 1, 1, 1));
image(LAND10[,,2], xaxt = "n", yaxt = "n");
Block length 25: Each cell is its own block, corresponding to applying fungal biopesticides in each 2 km by 2 km square on the landscape.
LAND25 <- toy_block_land(xdim = 100, ydim = 100, pathogens = 3, crops = 3,
block_len = 25);
par(mar = c(1, 1, 1, 1));
image(LAND25[,,2], xaxt = "n", yaxt = "n");
Of course, for even the Phase II toy simulations, we’ll want a bigger landscape. I’ll try to do this for 1000 km by 1000 km and see how it goes.
Update to transferring to JSON
The function below has now been modified, along with the outer function for simulating resistance.
toy_simulate_resistance <- function(generations = 20, # Generations to sim
xdim = 2, # Land dimension 1
ydim = 2, # Land dimension 2
pathogens = 1, # Pathogen strains
crops = 1, # Crop species
path_alleles = 3, # Pathogen alleles
crop_alleles = 3, # Crop alleles
pest_init = 2000, # Initial pests
crop_rotate = "static", # Crops rotated
path_rotate = "static", # Pathogens rotated
pest_move_pr = 0.1, # Pest movement
pest_move_dist = 1, # Pest move distance
fecundity = 8, # Offspring per fem
cell_K = 2000 # K per cell
){
if(pest_move_dist > xdim & pest_move_dist > ydim){
pest_move_dist <- max(c(xdim, ydim)); # Avoids error
}
# Start initialising the landscape and pests
LAND <- toy_initialise_land(xdim = xdim, ydim = ydim,
pathogens = pathogens, crops = crops);
PEST <- toy_initialise_pest(LAND, N = pest_init, p_al = path_alleles,
c_al = crop_alleles);
# Start the generations
PEST_DATA <- NULL;
LAND_DATA <- NULL;
gen <- 1;
while(gen < generations){
LAND <- toy_set_crops(LAND, crops, crop_rotate);
LAND <- toy_set_paths(LAND, pathogens, path_rotate);
PEST <- toy_move_pest(PEST, LAND, pest_move_pr, pest_move_dist);
PEST <- toy_feed_pest(PEST, LAND);
if(toy_check_extinction(PEST, gen) == TRUE){ # Hate these if breaks here
break;
}
PEST <- toy_kill_pest(PEST, LAND);
if(toy_check_extinction(PEST, gen) == TRUE){
break;
}
PEST <- toy_reproduce_pest(PEST, LAND, path_alleles, crop_alleles,
fecundity, cell_K);
if(toy_check_extinction(PEST, gen) == TRUE){
break;
}
PEST_DATA[[gen]] <- PEST;
LAND_DATA[[gen]] <- LAND;
gen <- gen + 1;
}
return(list(PEST_DATA = PEST_DATA, LAND_DATA = LAND_DATA));
}
results_to_json <- function(pest, land, printit = TRUE, filename = "sim.json"){
if("package:jsonlite" %in% search() == FALSE){
stop("Error: Need to load the R package 'jsonlite'")
}
inds <- dim(pest)[1];
p_geno <- rep(x = 0, times = inds);
c_geno <- rep(x = 0, times = inds);
path <- rep(x = 0, times = inds);
crop <- rep(x = 0, times = inds);
r_path <- rep(x = 0, times = inds);
r_crop <- rep(x = 0, times = inds);
for(i in 1:inds){ # Doing this a bit lazily, without refactoring the rest
p_geno[i] <- as.numeric( paste(pest[i,5], pest[i,6], sep = ""));
c_geno[i] <- as.numeric( paste(pest[i,7], pest[i,8], sep = ""));
xloc <- pest[i, 3];
yloc <- pest[i, 4];
path[i] <- land[xloc, yloc, 2];
crop[i] <- land[xloc, yloc, 3];
if(pest[i,5] == path[i] | pest[i,5] == path[i]){
r_path <- 1;
}
if(pest[i,7] == crop[i] | pest[i,8] == crop[i]){
r_crop <- 1;
}
}
data <- matrix(data = 0, nrow = inds, ncol = 10);
data[,1] <- pest[,1];
data[,2] <- pest[,2];
data[,3] <- pest[,3];
data[,4] <- pest[,4];
data[,5] <- path;
data[,6] <- crop;
data[,7] <- p_geno;
data[,8] <- c_geno;
data[,9] <- r_path;
data[,10] <- r_crop;
colnames(data) <- c("ID", "sex", "xloc", "yloc", "path", "crop",
"p_geno", "c_geno", "resist_path", "eat_crop");
modsim <- list( traits = colnames(data),
values = unname(apply(data, 1,
function(x) as.data.frame(t(x))))
);
sim_json <- toJSON(list(traits = names(modsim), values = modsim),
pretty = TRUE);
if(printit == TRUE){
write(sim_json, filename);
}
return(sim_json);
}
We can run things the same way to generate the output. Note that the
output from toy_simulate_resistance
has changed, so now we
need to read the list into results_to_json
. The list
contains the elements PEST_DATA
and
LAND_DATA
.
library(jsonlite);
##
## Attaching package: 'jsonlite'
## The following object is masked from 'package:shiny':
##
## validate
sim <- toy_simulate_resistance(pathogens = 3, crops = 3);
json <- results_to_json(pest = sim$PEST_DATA[[19]], land = sim$LAND_DATA[[19]],
filename = "toy_model/data/sample_sim.json");
The output is here.
Transferring simulation results to json
I’ve written a short bit of code to link the simulation results to
the code. The input for this is a single generation of the
model (this could be easily changed). So if we run the
toy_simulate_resistance
function as before, we get a list with elements being
each generation.
sim <- toy_simulate_resistance(crops = 3, pathogens = 3);
We can look at just the last generation below.
head(sim[[19]]);
## [,1] [,2] [,3] [,4] [,5] [,6] [,7] [,8]
## [1,] 139435 0 1 1 1 1 3 3
## [2,] 139436 1 1 1 1 1 3 3
## [3,] 139437 1 1 1 1 1 3 3
## [4,] 139438 0 1 1 1 1 3 3
## [5,] 139439 0 1 1 1 1 3 3
## [6,] 139440 1 1 1 1 1 3 3
The code below sticks all of this into the json format (for the one
generation), prints the json output to a file if
printit == TRUE
, and returns the json output.
results_to_json <- function(sim, printit = TRUE, filename = "sim.json"){
if("package:jsonlite" %in% search() == FALSE){
stop("Error: Need to load the R package 'jsonlite'")
}
colnames(sim) <- c("ID", "sex", "xloc", "yloc", "p_al_1", "p_al_2",
"c_al_1", "c_al_2");
modsim <- list( traits = colnames(sim),
values = unname(apply(sim, 1,
function(x) as.data.frame(t(x))))
);
sim_json <- toJSON(list(traits = names(modsim), values = modsim),
pretty = TRUE);
if(printit == TRUE){
write(sim_json, filename);
}
return(sim_json);
}
Using the function below to print and return the json. This relies on
the jsonlite
package, which must be installed and
loaded.
library(jsonlite);
sim_json <- results_to_json(sim = sim[[19]],
filename = "toy_model/data/sample_sim.json");
The sim_json
output should be easy to modify as
necessary. I’ve not added it here because it’s much too large for the
notebook, but it can be accessed on the dev branch in the file sample_sim.json.
More playing around with the toy model
Following up on yesterday, I have tweaked two functions to allow us to manipulate the number of alleles underlying pathogen resistance and crop feeding. In effect, this allows us to see how selection can sweep through to nearly fix an allele for resistance to a particular pathogen or for feeding on a particular crop (since only one allele is needed, other neutral alleles can hide in the heterozygotes). But, when crops are rotated, or pathogens are rotated, we have higher genetic diversity – note that we are making an implicit assumption of a strong negative genetic correlation; pests can only have two alleles for resistance, e.g., and if there are three pathogens, then all pests must necessarily be susceptible to 1-2 pathogens.
# ==============================================================================
# This is just a toy version of the model, for the presentation
# ==============================================================================
# Note that I'm adding 'toy' at the start of each function; this is just in case
# we actually want to keep some of these functions in the eventual R package,
# but also don't want to worry about not being able to name functions like
# "initialise_land" the same here as elsewhere in the eventual bigger model.
# ==============================================================================
toy_simulate_resistance <- function(generations = 20, # Generations to sim
xdim = 2, # Land dimension 1
ydim = 2, # Land dimension 2
pathogens = 1, # Pathogen strains
crops = 1, # Crop species
path_alleles = 3, # Pathogen alleles
crop_alleles = 3, # Crop alleles
pest_init = 2000, # Initial pests
crop_rotate = "static", # Crops rotated
path_rotate = "static", # Pathogens rotated
pest_move_pr = 0.1, # Pest movement
pest_move_dist = 1, # Pest move distance
fecundity = 8, # Offspring per fem
cell_K = 2000 # K per cell
){
if(pest_move_dist > xdim & pest_move_dist > ydim){
pest_move_dist <- max(c(xdim, ydim)); # Avoids error
}
# Start initialising the landscape and pests
LAND <- toy_initialise_land(xdim = xdim, ydim = ydim,
pathogens = pathogens, crops = crops);
PEST <- toy_initialise_pest(LAND, N = pest_init, p_al = path_alleles,
c_al = crop_alleles);
# Start the generations
PEST_DATA <- NULL;
gen <- 1;
while(gen < generations){
LAND <- toy_set_crops(LAND, crops, crop_rotate);
LAND <- toy_set_paths(LAND, pathogens, path_rotate);
PEST <- toy_move_pest(PEST, LAND, pest_move_pr, pest_move_dist);
PEST <- toy_feed_pest(PEST, LAND);
if(toy_check_extinction(PEST, gen) == TRUE){ # Hate these if breaks here
break;
}
PEST <- toy_kill_pest(PEST, LAND);
if(toy_check_extinction(PEST, gen) == TRUE){
break;
}
PEST <- toy_reproduce_pest(PEST, LAND, path_alleles, crop_alleles,
fecundity, cell_K);
if(toy_check_extinction(PEST, gen) == TRUE){
break;
}
PEST_DATA[[gen]] <- PEST;
gen <- gen + 1;
}
return(PEST_DATA);
}
summarise_pest_data <- function(PEST_DATA){
# Density estimates
den_list <- unlist(lapply(PEST_DATA, dim));
den_vect <- den_list[c(TRUE, FALSE)];
gens <- length(PEST_DATA);
p_alleles <- max(c(PEST_DATA[[1]][,5], PEST_DATA[[1]][,6]));
c_alleles <- max(c(PEST_DATA[[1]][,7], PEST_DATA[[1]][,8]));
p_tabl <- matrix(data = 0, nrow = gens, ncol = p_alleles);
c_tabl <- matrix(data = 0, nrow = gens, ncol = c_alleles);
# Allele frequencies
for(gen in 1:length(PEST_DATA)){
total_p_alleles <- length(PEST_DATA[[gen]][,5:6]);
total_c_alleles <- length(PEST_DATA[[gen]][,7:8]);
for(allele in 1:p_alleles){
allele_count <- sum(PEST_DATA[[gen]][,5:6] == allele);
p_tabl[gen, allele] <- allele_count / total_p_alleles
}
for(allele in 1:c_alleles){
allele_count <- sum(PEST_DATA[[gen]][,7:8] == allele);
c_tabl[gen, allele] <- allele_count / total_c_alleles
}
}
return(list(densities = den_vect, pathogen_fr = p_tabl, crop_fr = c_tabl));
}
I’m going to run the same analysis as yesterday, but now show how allele frequencies change over generations. I’ve not broken this down by landscape cell yet, but this will be easy to do (just let me know the format the data would be best presented in). In each case, now there will be 3 alleles underlying pathogen resistance, and 3 alleles underlying crop feeding.
sim1 <- toy_simulate_resistance(pathogens = 1, crops = 1, cell_K = 2000,
pest_init = 2000, fecundity = 8,
generations = 20, pest_move_pr = 0.1,
crop_rotate = "static", path_rotate = "static");
dat1 <- summarise_pest_data(sim1);
plot(x = 1:length(dat1$densities), y = dat1$densities, type = "b", pch = 20,
lwd = 2, cex.lab = 1.5, cex.axis = 1.5, ylim = c(0, 8000), xlim = c(1, 19),
xlab = "Generation", ylab = "Pest density");
plot(x = 1:length(dat1$pathogen_fr[,1]), y = dat1$pathogen_fr[,1], type = "b",
pch = 20, lwd = 2, cex.lab = 1.5, cex.axis = 1.5, ylim = c(0, 1),
xlim = c(1, 19), xlab = "Generation", ylab = "Allele frequency");
points(x = 1:length(dat1$pathogen_fr[,2]), y = dat1$pathogen_fr[,2], type = "b",
pch = 20, lwd = 2, col = "red");
points(x = 1:length(dat1$pathogen_fr[,3]), y = dat1$pathogen_fr[,3], type = "b",
pch = 20, lwd = 2, col = "blue");
Next, we can see what happens when we instead have 3 pathogens under otherwise identical conditions. In each generation, one of three pathogens is randomly applied to each landscape cell.
sim2 <- toy_simulate_resistance(pathogens = 3, crops = 1, cell_K = 2000,
pest_init = 2000, fecundity = 8,
generations = 20, pest_move_pr = 0.1,
crop_rotate = "static", path_rotate = "random");
dat2 <- summarise_pest_data(sim2);
plot(x = 1:length(dat2$densities), y = dat2$densities, type = "b", pch = 20,
lwd = 2, cex.lab = 1.5, cex.axis = 1.5, ylim = c(0, 8000), xlim = c(1, 19),
xlab = "Generation", ylab = "Pest density");
plot(x = 1:length(dat2$pathogen_fr[,1]), y = dat2$pathogen_fr[,1], type = "b",
pch = 20, lwd = 2, cex.lab = 1.5, cex.axis = 1.5, ylim = c(0, 1),
xlim = c(1, 19), xlab = "Generation", ylab = "Allele frequency");
points(x = 1:length(dat2$pathogen_fr[,2]), y = dat2$pathogen_fr[,2], type = "b",
pch = 20, lwd = 2, col = "red");
points(x = 1:length(dat2$pathogen_fr[,3]), y = dat2$pathogen_fr[,3], type = "b",
pch = 20, lwd = 2, col = "blue");
Next, we can see what happens when we instead have 3 pathogens and 3 crops under otherwise identical conditions. In each generation, one of three pathogens and one of three crops are randomly applied to each landscape cell.
sim3 <- toy_simulate_resistance(pathogens = 3, crops = 3, cell_K = 2000,
pest_init = 2000, fecundity = 8,
generations = 20, pest_move_pr = 0.1,
crop_rotate = "random", path_rotate = "random");
## [1] "Extinction occurred in generation 12"
dat3 <- summarise_pest_data(sim3);
plot(x = 1:length(dat3$densities), y = dat3$densities, type = "b", pch = 20,
lwd = 2, cex.lab = 1.5, cex.axis = 1.5, ylim = c(0, 8000), xlim = c(1, 19),
xlab = "Generation", ylab = "Pest density");
plot(x = 1:length(dat3$pathogen_fr[,1]), y = dat3$pathogen_fr[,1], type = "b",
pch = 20, lwd = 2, cex.lab = 1.5, cex.axis = 1.5, ylim = c(0, 1),
xlim = c(1, 19), xlab = "Generation", ylab = "Allele frequency");
points(x = 1:length(dat3$pathogen_fr[,2]), y = dat3$pathogen_fr[,2], type = "b",
pch = 20, lwd = 2, col = "red");
points(x = 1:length(dat3$pathogen_fr[,3]), y = dat3$pathogen_fr[,3], type = "b",
pch = 20, lwd = 2, col = "blue");
As is evident, pathogen and crop rotation decreases pest density and increases the diversity of alleles underlying pathogen resistance and crop feeding.
Visual stuff looks good!
Rose has created a very nice visual display of the landscape, with sliders allowing for multiple crop species and pathogen strains.
Toy model prototype
I have managed to complete a toy model today, which should be sufficient to produce the simulated data needed to get the key ideas across. I’m going to dump the code below, which is located in the toy_model.R data file, not yet pushed on to master. We’ll need to merge the interface branch with the dev branch soon, which should be easy enough. The whole thing is in R, and includes all of the basic steps from the code structure mentioned on 30 JUL 2018, including mutation, recombination, dispersal, and pathogen and crop rotation (random or sequential). The output isn’t in that helpful of a form yet, but it will get there soon. To skip the many lines of code from the toy_model.R file, go here.
To quickly summarise, there are 12 parameters we can play with in the toy simulation model:
generations
)xdim
)ydim
)pathogens
)crops
)pest_init
)crop_rotate
:
where crop_rotate = "static"
means don’t rotate between
generations, crop_rotate = "rotate"
means sequence through
each crop one at a time over generations, and
crop_rotate = "random"
means apply a random crop in each
generation – all done independently for each landscape cell)path_rotate
: where path_rotate = "static"
means don’t rotate between generations,
path_rotate = "rotate"
means sequence through each pathogen
one at a time over generations, and path_rotate = "random"
means apply a random pathogen in each generation – all done
independently for each landscape cell)pest_move_pr
)pest_move_dist
; note, the landscape is a torus, and there
is therefore no edge – pests that move off of one side end up on the
other)fecundity
)cell_K
)Default parameter values are shown below. Note that the model is an individual-based simulation with an extremely simple diploid genetic architecture with two loci (one for pathogens, and one for crops). If there are \(N\) pathogens, then there are \(N\) alleles for pathogen resistance, and pests must have at least one allele to resist (else they die). Similarly, if there are \(N\) crops, then there are \(N\) alleles for being able to eat crops, and pests must have at least one to eat (else they die).
toy_simulate_resistance <- function(generations = 20, # Generations to sim
xdim = 2, # Land dimension 1
ydim = 2, # Land dimension 2
pathogens = 1, # Pathogen strains
crops = 1, # Crop species
pest_init = 2000, # Initial pests
crop_rotate = "static", # Crops rotated
path_rotate = "static", # Pathogens rotated
pest_move_pr = 0.1, # Pest movement
pest_move_dist = 1, # Pest move distance
fecundity = 8, # Offspring per fem
cell_K = 2000 # K per cell
){
if(pest_move_dist > xdim & pest_move_dist > ydim){
pest_move_dist <- max(c(xdim, ydim)); # Avoids error
}
# Start initialising the landscape and pests
LAND <- toy_initialise_land(xdim = xdim, ydim = ydim,
pathogens = pathogens, crops = crops);
PEST <- toy_initialise_pest(LAND, N = pest_init, p_al = pathogens,
c_al = crops);
# Start the generations
PEST_DATA <- NULL;
gen <- 1;
while(gen < generations){
LAND <- toy_set_crops(LAND, crops, crop_rotate);
LAND <- toy_set_paths(LAND, pathogens, path_rotate);
PEST <- toy_move_pest(PEST, LAND, pest_move_pr, pest_move_dist);
PEST <- toy_feed_pest(PEST, LAND);
if(toy_check_extinction(PEST, gen) == TRUE){ # Hate these if breaks here
break;
}
PEST <- toy_kill_pest(PEST, LAND);
if(toy_check_extinction(PEST, gen) == TRUE){
break;
}
PEST <- toy_reproduce_pest(PEST, LAND, pathogens, crops, fecundity,
cell_K);
if(toy_check_extinction(PEST, gen) == TRUE){
break;
}
PEST_DATA[[gen]] <- PEST;
gen <- gen + 1;
}
return(PEST_DATA);
}
summarise_pest_data <- function(PEST_DATA){
# Density estimates
den_list <- unlist(lapply(PEST_DATA, dim));
den_list <- den_list[c(TRUE, FALSE)];
return(den_list);
}
toy_check_extinction <- function(PEST, gen){
if(is.vector(PEST) == TRUE){
print(paste("Extinction occurred in generation", gen));
return(TRUE);
}
if(dim(PEST)[1] < 4){
print(paste("Extinction occurred in generation", gen));
return(TRUE);
}
if(sum(PEST[,2] == 0) < 1 | sum(PEST[,2] == 1) < 1){
print(paste("Extinction occurred in generation", gen));
return(TRUE);
}
return(FALSE);
}
# Initialise the landscape
toy_initialise_land <- function(xdim = 2, ydim = 2, pathogens = 1, crops = 1){
LAND <- array(data = 0, dim = c(xdim, ydim, 3));
p_values <- sample(x = 1:pathogens, size = xdim * ydim, replace = TRUE);
p_layer <- matrix(data = p_values, ncol = xdim, nrow = ydim);
c_values <- sample(x = 1:crops, size = xdim * ydim, replace = TRUE);
c_layer <- matrix(data = c_values, ncol = xdim, nrow = ydim);
LAND[,,2] <- p_layer; # Just occurred to me that we might want a layer for
LAND[,,3] <- c_layer; # carrying capacity of the cell in LAND later
return(LAND);
}
# Initialise some *very* simple pests. Each allele is just going to map one to
# one for whether a crop can be attacked or a pathogen resisted. I am not even
# going to add traits affecting dispersal in the toy version -- this can be
# uniform for all individuals
toy_initialise_pest <- function(LAND, N = 10, p_al = 1, c_al = 1){
xdim <- dim(LAND)[1]; # This is needed to put the pests on some
ydim <- dim(LAND)[2]; # landscape cell
PEST <- matrix(data = 0, nrow = N, ncol = 8);
PEST[,1] <- 1:N; # ID
PEST[,2] <- sample(x = c(0, 1), size = N, replace = TRUE); # Sex
PEST[,3] <- sample(x = 1:xdim, size = N, replace = TRUE); # x-location
PEST[,4] <- sample(x = 1:ydim, size = N, replace = TRUE); # y-location
PEST[,5] <- sample(x = 1:p_al, size = N, replace = TRUE); # p allele 1
PEST[,6] <- sample(x = 1:p_al, size = N, replace = TRUE); # p allele 2
PEST[,7] <- sample(x = 1:c_al, size = N, replace = TRUE); # c allele 1
PEST[,8] <- sample(x = 1:c_al, size = N, replace = TRUE); # c allele 2
return(PEST);
}
# Just going to make this a simple function at first, randomly changing crops
toy_set_crops <- function(LAND, crops = 1, type = "static"){
if(crops == 1 | type == "static"){
return(LAND);
}
if(type == "rotate"){
old_crop <- LAND[,,3];
new_crop <- old_crop + 1;
new_crop[new_crop > crops] <- 1;
LAND[,,3] <- new_crop;
}else{ # Else it just randomises -- can think about fancier ways later
xdim <- dim(LAND)[1];
ydim <- dim(LAND)[2];
new_cval <- sample(x = 1:crops, size = xdim * ydim, replace = TRUE);
new_crop <- matrix(data = new_cval, nrow = xdim, ncol = ydim);
LAND[,,3] <- new_crop;
}
return(LAND);
}
# Ditto here -- just a simple function changing pathogens
toy_set_paths <- function(LAND, paths = 1, type = "static"){
if(paths == 1 | type == "static"){
return(LAND);
}
if(type == "rotate"){
old_path <- LAND[,,3];
new_path <- old_path + 1;
new_path[new_path > paths] <- 1;
LAND[,,2] <- new_path;
}else{ # Else it just randomises -- can think about fancier ways later
xdim <- dim(LAND)[1];
ydim <- dim(LAND)[2];
new_pval <- sample(x = 1:paths, size = xdim * ydim, replace = TRUE);
new_path <- matrix(data = new_pval, nrow = xdim, ncol = ydim);
LAND[,,2] <- new_path;
}
return(LAND);
}
# Just a function to move pests at a given value, randomly on the landscape. The
# `prob` is just probability of moving in a time step, while `dist` is just the
# maximum distance moved in cells, in any direction.
toy_move_pest <- function(PEST, LAND, prob = 0.1, dist = 1){
xdim <- dim(LAND)[1];
ydim <- dim(LAND)[2];
pests <- dim(PEST)[1];
to_move <- rbinom(n = pests, size = 1, pr = prob);
move_x <- sample(x = -dist:dist, size = pests,
replace = TRUE);
move_y <- sample(x = -dist:dist, size = pests,
replace = TRUE);
PEST[to_move == 1, 3] <- PEST[to_move == 1, 3] + move_x[to_move == 1];
PEST[to_move == 1, 4] <- PEST[to_move == 1, 4] + move_y[to_move == 1];
# The ifs below make a torus landscape so that pests don't move off of it
# In C, this will be done with a loop that looks cleaner; the below is just
# avoiding using a loop in R, though it would probably be fine.
if(length(PEST[PEST[,3] < 1, 3]) > 0){
PEST[PEST[,3] < 1, 3] <- PEST[PEST[,3] < 1, 3] + xdim;
}
if(length(PEST[PEST[,3] > xdim, 3]) > 0){
PEST[PEST[,3] > xdim, 3] <- PEST[PEST[,3] > xdim, 3] - xdim;
}
if(length(PEST[PEST[,4] < 1, 4]) > 0){
PEST[PEST[,4] < 1, 4] <- PEST[PEST[,4] < 1, 4] + ydim;
}
if(length(PEST[PEST[,4] > ydim, 4]) > 0){
PEST[PEST[,4] > ydim, 4] <- PEST[PEST[,4] > ydim, 4] - ydim;
}
return(PEST);
}
# Need to now feed the pests and remove those that don't feed (crop unavailable)
toy_feed_pest <- function(PEST, LAND){
# I'm just going to use a loop here, else matching to LAND cells is rough
pests <- dim(PEST)[1];
eaten <- rep(x = 0, times = pests);
for(i in 1:pests){
x_loc <- PEST[i, 3];
y_loc <- PEST[i, 4];
food <- LAND[x_loc, y_loc, 3];
if(PEST[i, 7] == food | PEST[i, 8] == food){
eaten[i] <- 1;
}
}
PEST <- PEST[eaten == 1,]; # You don't eat, you don't live
return(PEST);
}
# Then need to have pests resist pathogens, kill those that cannot
toy_kill_pest <- function(PEST, LAND){
# I'm just going to use a loop here, else matching to LAND cells is rough
pests <- dim(PEST)[1];
survived <- rep(x = 0, times = pests);
for(i in 1:pests){
x_loc <- PEST[i, 3];
y_loc <- PEST[i, 4];
patho <- LAND[x_loc, y_loc, 2];
if(PEST[i, 5] == patho | PEST[i, 6] == patho){
survived[i] <- 1;
}
}
PEST <- PEST[survived == 1,]; # You don't eat, you don't live
return(PEST);
}
# Then, reproduce pests that are left
toy_reproduce_pest <- function(PEST, LAND, pa, cr, births = 2, K = 100){
x_dim <- dim(LAND)[1];
y_dim <- dim(LAND)[2];
offspring <- NULL;
total_offspring <- 0; # This and the below are just to avoid an rbind(),
cell <- 1; # which is a massive memory sink
lst_ID <- max(PEST[,1]);
for(xloc in 1:x_dim){ # Double loop is just much cleaner here
for(yloc in 1:y_dim){
locals <- which(PEST[,3] == xloc & PEST[,4] == yloc);
local_offs <- NULL;
if( length(locals) > 1 ){
local_offs <- toy_breed_locals(PEST, locals, births, K, lst_ID);
}
if(length(local_offs) > 0){
lst_ID <- lst_ID + dim(local_offs)[1];
total_offspring <- total_offspring + dim(local_offs)[1];
offspring[[cell]] <- local_offs;
}else{
offspring[[cell]] <- NULL;
}
cell <- cell + 1;
}
}
offspring <- build_new_pest(offspring, total_offspring, pa, cr);
return(offspring);
}
# Need to recombine the genomes correctly in the offspring
toy_breed_locals <- function(PEST, locals, births, K, last_ID){
loc_PEST <- PEST[locals,]; # Need two of each sex (allee effect)
if(sum(loc_PEST[,2] == 0) < 2 | sum(loc_PEST[,2] == 1) < 2){
return(NULL);
}
females <- loc_PEST[loc_PEST[,2] == 0,];
males <- loc_PEST[loc_PEST[,2] == 1,];
new_offs <- dim(females)[1] * floor(births);
offspring <- matrix(data = 0, nrow = new_offs, ncol = 8);
offspring[,1] <- (last_ID + 1):(last_ID + new_offs);
offspring[,2] <- sample(x = c(0, 1), size = new_offs, replace = TRUE);
offspring[,3] <- loc_PEST[1, 3];
offspring[,4] <- loc_PEST[1, 4];
# Below grabs all of the alleles from females and males
p_fem_alleles <- c(females[,5], females[,6]);
p_mal_alleles <- c(males[,5], males[,6]);
c_fem_alleles <- c(females[,7], females[,8]);
c_mal_alleles <- c(males[,7], males[,8]);
# Now add them to the offspring randomly, one from female and one from male
for(i in 1:new_offs){
if(runif(n = 1) < 0.5){
offspring[i, 5] <- sample(x = p_fem_alleles, size = 1);
offspring[i, 6] <- sample(x = p_mal_alleles, size = 1);
}else{
offspring[i, 5] <- sample(x = p_mal_alleles, size = 1);
offspring[i, 6] <- sample(x = p_fem_alleles, size = 1);
}
if(runif(n = 1) < 0.5){
offspring[i, 7] <- sample(x = c_fem_alleles, size = 1);
offspring[i, 8] <- sample(x = c_mal_alleles, size = 1);
}else{
offspring[i, 7] <- sample(x = c_mal_alleles, size = 1);
offspring[i, 8] <- sample(x = c_fem_alleles, size = 1);
}
}
if(dim(offspring)[1] > K){
offspring <- offspring[1:K,];
}
return(offspring);
}
# Merge the different layers into a new pest array
build_new_pest <- function(offspring, total_offspring, pa, cr, mutation = 0.01){
new_PEST <- matrix(data = 0, nrow = total_offspring, ncol = 8);
start_row <- 1; # Again, avoids an rbind memory issue
for(cell in 1:length(offspring)){
if(length(offspring[[cell]]) > 0){
cell_offs <- dim(offspring[[cell]])[1];
rows <- start_row:(start_row + cell_offs - 1);
new_PEST[rows,] <- offspring[[cell]];
start_row <- start_row + cell_offs;
}
}
mu5 <- which(rbinom(n = total_offspring, size = 1, pr = mutation) == 1);
mu6 <- which(rbinom(n = total_offspring, size = 1, pr = mutation) == 1);
mu7 <- which(rbinom(n = total_offspring, size = 1, pr = mutation) == 1);
mu8 <- which(rbinom(n = total_offspring, size = 1, pr = mutation) == 1);
new_PEST[mu5, 5] <- sample(x = 1:cr, size = length(mu5), replace = TRUE);
new_PEST[mu6, 6] <- sample(x = 1:cr, size = length(mu6), replace = TRUE);
new_PEST[mu7, 7] <- sample(x = 1:pa, size = length(mu7), replace = TRUE);
new_PEST[mu8, 8] <- sample(x = 1:pa, size = length(mu8), replace = TRUE);
return(new_PEST);
}
Okay, so what can the above code do? I’ll
get some fancy results later, but for now, let’s just look at many
simulations across different pathogens
and
crops
to see how it affects pest global density. Below I
run and plot the situation with just one pathogen and one crop. Note
that when there is only 1 pathogen or crop, set rotation to
static
.
sim1 <- toy_simulate_resistance(pathogens = 1, crops = 1, cell_K = 2000,
pest_init = 2000, fecundity = 8,
generations = 20, pest_move_pr = 0.1,
crop_rotate = "static", path_rotate = "static");
dat1 <- summarise_pest_data(sim1);
plot(x = 1:length(dat1), y = dat1, type = "b", pch = 20, lwd = 2, cex.lab = 1.5,
cex.axis = 1.5, ylim = c(0, 8000), xlim = c(1, 19), xlab = "Generation",
ylab = "Pest density");
Next, we can see what happens when we instead have 3 pathogens under otherwise identical conditions. In each generation, one of three pathogens is randomly applied to each landscape cell.
sim2 <- toy_simulate_resistance(pathogens = 3, crops = 1, cell_K = 2000,
pest_init = 2000, fecundity = 8,
generations = 20, pest_move_pr = 0.1,
crop_rotate = "static", path_rotate = "random");
dat2 <- summarise_pest_data(sim2);
plot(x = 1:length(dat2), y = dat2, type = "b", pch = 20, lwd = 2, cex.lab = 1.5,
cex.axis = 1.5, ylim = c(0, 8000), xlim = c(1, 19), xlab = "Generation",
ylab = "Pest density");
Next, we can see what happens when we instead have 3 pathogens and 3 crops under otherwise identical conditions. In each generation, one of three pathogens and one of three crops are randomly applied to each landscape cell.
sim3 <- toy_simulate_resistance(pathogens = 3, crops = 3, cell_K = 2000,
pest_init = 2000, fecundity = 8,
generations = 20, pest_move_pr = 0.1,
crop_rotate = "random", path_rotate = "random");
## [1] "Extinction occurred in generation 18"
dat3 <- summarise_pest_data(sim3);
plot(x = 1:length(dat3), y = dat3, type = "b", pch = 20, lwd = 2, cex.lab = 1.5,
cex.axis = 1.5, ylim = c(0, 8000), xlim = c(1, 19), xlab = "Generation",
ylab = "Pest density");
As is evident, pathogen and crop rotation decreases pest density. More on this later, including diversity results, but this demonstrates a working toy model. As the heterogeneity of pathogens and crops increases, pest resistance seems to go down.
Some general thoughts about code structure
There are some high-level functions that will be useful to think
about before coding; that is, functions that need to do something in
sequence during simulation initialisation and then during each time step
of the simulation. These functions will call subfunctions that run key
routines – I think that all of these function, and the outer function
(e.g., simulate_resistance
, which runs through multiple
time steps to test a particular set of parameter combinations), should
be written in C for efficiency. An R function can call
simulate_resistance
and return the results to R, or an
outer C function can also call it and print results to a text file
(faster and easier with the computing cluster). The point is to have a
single funtion that everything feeds to that does the work by calling
multiple subfunctions – this workhorse function
simulate_resistance
will contain the main loop over \(N\) time_steps
.
Prior to the loop, there will be some initialisation of the landscape
LAND
and the pests PEST
. This will include all
of the values in each layer LAND
based on parameter
initialisation. The functions initialise_land
and
initialise_pest
are therefore called once at the start of
the simulation; they will need to decide the distribution of landscape
values and pests, and the sizes of each. It will not be necessary to
initialise the PEST
traits, only their genomes (based on
allele and loci parameters, dominance, etc.) and non-genetic attributes
such as location and ID (sex might or might not be based on genome).
Before starting the big loop, the key structures of PEST
and LAND
need to otherwise be built. Any other key data
structures will need to be initialised – I am thinking that the
simulate_resistance
function, or maybe some small other
function, should send a vector of information so that everything can be
stored handily, then individual parameter values can be set in the small
outer function then compiled into the vector for ease of use.
After initialisation, one big loop will run the simulation over multiple time steps (some pseudocode below).
simulate_resistance(parameter_values){
error_checks(parameter_values);
initialise_land(LAND);
initialise_pest(PEST);
for(time_step = 0; time_step < time_max; time_step++){
build_pest_traits(PEST, build_type);
set_crops(LAND, crop_parameters);
set_biopesticide(LAND, biopesticide_parameters);
move_pest(PEST, move_parameters);
feed_pest(PEST, LAND, feeding_parameters);
kill_pest(PEST, LAND, death_parameters);
reproduce_pest(PEST, evolution_parameters);
collect_PEST_data(PEST);
collect_LAND_data(LAND);
summarise_time_step(PEST, LAND);
status_check(PEST, LAND);
}
summarise_simulation();
}
Multiple events will occur in each time step, including (but not
limited to) the following: The build_pest_traits()
function
will simply read in and out the PEST
array, building the
traits from the pest genome according to some build_type
in
each generation. The set_crops
function will assign
LAND
cel layer values basedon on the crop parameter values
underlying, e.g., spatial heterogeneity; similarly,
set_biopesticide
will affect LAND
cell layer
values. Then the move_pest
function will cause the pests to
move according to some sort of disperal rules set in
parameter_values
. Note that the landscape and the pests
have not interacted in the time step yet (though we can imagine the
landscape affecting pest movement rules), and will now do so in
feed_pest
, where they will affect crops, followed by
kill_pest
and reproduce_pest
, where they will
suffer mortality and reproduce as a consequence of their interactions.
We might want to think about the order of these
operations. After pest reproduction (which will include sex and
genetic – but not trait – transmission), then summary data on the pests
and landscape need to be collected (could even have an option to print
everything out at this point, but it might be a lot of information) –
followed by a sort of grand summary of the time step. When the whole
simulation has finished, then the results can be printed.
Note also that some error checks and status checks will be useful,
since we’ll want to make this function able to run from R. The error
checks will just make sure nothing will crash the simulation (some of
this might be better programmed in R instead of C), while the
status_check
might look to see, e.g., if pests have gone
extinct, or have some sort of rule for termination of the simulation if
something happens.
I suspect that the above structure will need rearranging and rethinking, but it’s a starting point. Subfunctions will be the difficult part, coding efficient ways of building traits from the genome, movement rules, reproduction, etc.
I have attempted to set this notebook up with Disqus commenting enabled with the universal code options (embedded HTML), which will ideally allow everyone on the project to make comments on the notes.
There are two data structures that we need to think about in particular, and these include the pests, which will be represented individually, and the landscape, which will be a large 3D array. The biopesticide will not be modelled discretely, but through a change in landscape values.
Pests
Pests will need to be represented by some sort of data structure. We could create a custom structure in C, but this would get messy if and when we decide to convert the model into an R package so that users can simulate by calling an R function. I really don’t see why pests cannot be represented by a large table, which is how I have always represented individuals in the past, and I already have the code for reading R tables into C. The way that this will work then is to have each individual represented by a different row in the table, with columns corresponding to individual traits. Some traits that will be important include:
double
values will be used
for each haploid loci: 1. allele number (\(i\)), value (\(v\)), and dominance (\(h\)). Since alleles are diploid, we will
need \(6L\) double
columns
in total, which will be the last columns of the table (this is just
easier)The idea of each individual having its own ID and sex is fairly
straightforward, but what I’m thinking we can do with the traits will be
a bit different. We want the evolving genome to affect resistance
through a evolving traits, such that the genome affects traits
underlying resistance, which then affect the evolution of the genetic
architecture itself. The idea then is that each trait will be modelled
as a double
real number that is ‘visible’ to selection – in
most cases we will want the value of each trait will be a function of
the genome, but this wouldn’t necessarily be a requirement (e.g., if we
just wanted to fix dispersal ability to some value, we could set it to
whatever we desire in the relevant row and column – this will also be
useful for location on the landscape, which will be represented by two
values x and y). We could then write a specific modular function
build_traits
, which loops through each individual and
derives a single number from the individual’s genome, which is then
placed in the relevant trait column (1-T). Trait columns can by dynamic
– we can have anywhere from 1 to T of them, and the value of T will need
to be stored somewhere – better, if in the code we define ID and Sex as
traits, we can just have loci start at T (or T + 1 in R).
By having a modular build_traits
function, we can keep
the code flexible enough to have different ways of deriving single trait
values for any given genome. The genome itself is probably easiest to
model by breaking it down into L blocks of six, taking up 6L total
columns of the 2D pest array. The first three values of
a block will be the number (1), value (2), and dominance (3) for an
allele, while the next three values will be the same for a homologous
allele at the locus. The dominance values could be compared to calculate
which allele has a stronger effect, similar to Duthie and Reid (2016, code), and this could be pre-set or
randomised in some way for different allele types. The values could mean
whatever we want them to, but a good starting point would be to think of
them as additively effecting one or more trait as in Duthie, Bocedi, et al. (2018, code). We could of course just have the
allele numbers in the genome, with a separate lookup table to match them
with values and dominance coefficients, but I actually think this would
be slower and more difficult to code. The build_traits
function will loop through all of the loci, gathering in the relevant
values to build each trait – a sub-function (e.g.,
calc_trait
) could do this for an individual trait,
identifying where and what in the genome should be used to make a single
trait such as resistance to a particular biopesticide. So, for example,
the pest data structure for one (non ID or Sex) trait and one locus
would look like this:
simple_pests <- matrix(data = 0, nrow = 4, ncol = 9);
colnames(simple_pests) <- c("ID", "Sex", "Trait_1", "L1A1_i", "L1A1_v",
"L1A1_h", "L1A2_i", "L1A2_v", "L1A2_h");
simple_pests[,1] <- 1:4;
simple_pests[,2] <- c(0, 1, 0, 1);
simple_pests[,3] <- runif(n = 4);
simple_pests[,4] <- sample(x = 1:2, size = 4, replace = TRUE);
simple_pests[,7] <- sample(x = 1:2, size = 4, replace = TRUE);
print(simple_pests);
## ID Sex Trait_1 L1A1_i L1A1_v L1A1_h L1A2_i L1A2_v L1A2_h
## [1,] 1 0 0.71220863 2 0 0 2 0 0
## [2,] 2 1 0.81876411 1 0 0 2 0 0
## [3,] 3 0 0.26913112 1 0 0 1 0 0
## [4,] 4 1 0.01149918 2 0 0 1 0 0
I’ve not included the allele values or dominance coefficients for alleles 1 and 2. Note that in the columns ‘L1’ refers to locus 1, and ‘A1’ and ‘A2’ refer to alleles 1 and 2, respectively. As mentioned earlier, these will need to be represented by doubles in C, and the table could get quite big (well over 1000 columns, and potentially tens of thousands of rows). The benefit of doing all this is that we can potentially use multiple different types of infection models, as desired (e.g., gene for gene, quantative, threshold).
Landscape
The landscape will be modelled using a three dimensional array. We want space to be explicit, and the easiest way to do this is to have the first two dimensions of the array correspond to the x and y locations on a region of landscape. Interactions can happen at the scale of landscape cells (array elements), and distance between cells in any location can allow for explicit spatial processes (typically, count the number of cells in any of 8 directions, including diagonally, but other neighbour distances could be used, or cells could be hexagonal if there is good reason to do it). Having different cells with different values allows for landscape level heterogeneity (could also make values autocorrelated as desired as in Duthie and Falcy (2013)). Perhaps each cell starts with having some ID for the farm, as in the case of the 8 by 8 landscape below (actually x and y dimensions will be a variable in the model – I usually use 100 by 100, with a torus landscape that wraps around itself to avoid edge effects).
land_IDs <- matrix(data = 0, nrow = 8, ncol = 8);
land_IDs[1:2, 1:8] <- 1;
land_IDs[3:6, 1:5] <- 2;
land_IDs[3:8, 6:8] <- 3;
land_IDs[7:8, 1:3] <- 4;
land_IDs[7:8, 4:5] <- 5;
print(land_IDs);
## [,1] [,2] [,3] [,4] [,5] [,6] [,7] [,8]
## [1,] 1 1 1 1 1 1 1 1
## [2,] 1 1 1 1 1 1 1 1
## [3,] 2 2 2 2 2 3 3 3
## [4,] 2 2 2 2 2 3 3 3
## [5,] 2 2 2 2 2 3 3 3
## [6,] 2 2 2 2 2 3 3 3
## [7,] 4 4 4 5 5 3 3 3
## [8,] 4 4 4 5 5 3 3 3
So each of five farms or types of farms (or something else) are
represented in the landscape. But we need lots of pieces of information
for each cell, including what is planted on the cell, local
environmental variables, and biopesticide applied. To get this, we can
then add multiple layers on the z axis of a 3D array. So the landscape
layer above is just the ‘top’, with multiple layers beneath coding
critical information in a land
array. Assume, for now that
we need three layers (we’ll actually need a lot more): (1)
land_IDs
, (2) crop used, (3) biopesticide 1 applied.
land <- array(data = 0, dim = c(8, 8, 3));
land[,,1] <- land_IDs;
land[,,2] <- sample(x = 1:3, size = 64, replace = TRUE);
land[,,3] <- round(x = runif(n = 64, min = 0, max = 1), digits = 3);
print(land);
## , , 1
##
## [,1] [,2] [,3] [,4] [,5] [,6] [,7] [,8]
## [1,] 1 1 1 1 1 1 1 1
## [2,] 1 1 1 1 1 1 1 1
## [3,] 2 2 2 2 2 3 3 3
## [4,] 2 2 2 2 2 3 3 3
## [5,] 2 2 2 2 2 3 3 3
## [6,] 2 2 2 2 2 3 3 3
## [7,] 4 4 4 5 5 3 3 3
## [8,] 4 4 4 5 5 3 3 3
##
## , , 2
##
## [,1] [,2] [,3] [,4] [,5] [,6] [,7] [,8]
## [1,] 3 3 1 1 2 2 2 3
## [2,] 1 3 3 3 2 1 2 3
## [3,] 3 1 1 2 1 1 2 2
## [4,] 3 3 3 2 1 3 1 2
## [5,] 3 2 2 2 3 2 1 1
## [6,] 2 3 3 1 1 2 1 1
## [7,] 1 1 1 1 3 1 1 1
## [8,] 3 1 2 2 1 3 2 2
##
## , , 3
##
## [,1] [,2] [,3] [,4] [,5] [,6] [,7] [,8]
## [1,] 0.534 0.804 0.478 0.874 0.622 0.997 0.469 0.939
## [2,] 0.548 0.905 0.414 0.611 0.552 0.085 0.286 0.137
## [3,] 0.879 0.180 0.539 0.184 0.240 0.718 0.851 0.760
## [4,] 0.370 0.090 0.808 0.779 0.616 0.430 0.047 0.979
## [5,] 0.647 0.492 0.475 0.569 0.701 0.469 0.958 0.081
## [6,] 0.568 0.152 0.027 0.284 0.249 0.707 0.007 0.061
## [7,] 0.784 0.611 0.784 0.826 0.199 0.820 0.590 0.246
## [8,] 0.904 0.203 0.312 0.641 0.709 0.764 0.107 0.118
The details aren’t so important for now; what matters is the idea that we can represent information about landscape cells dynamically using array layers, which can therefore correspond to any number of things that we want to affect pests. The above structures can be initialised in R and diverted to a toy model that could help accomplish the short-term goals of the project, but also diverted to C to do the more intense simulatoin involving complex genetic architecture (we’ll want a standalone C version too for speed and high-performance cluster computing). The toy model is probably the fist step – to have something loop over time steps showing that something (e.g., allelic diversity) is maintained under variable landscapes, but is not when landscapes are homogenous (or when dispersal is infinite). Probably best to simplify the genetics as much as possible (e.g., 1 locus, fewer than 5 alleles) for this, then return some numbers that can be extracted from these data structures to build a graphic.
I see these two structures (PEST and LAND) as being the two most important ones to get right early on because these are the ones that will be used in almost every part of the model. The code will otherwise be written modularly, meaning that if we don’t like the way that we’re, e.g., deriving pest traits from their genetic architecture, or setting dispersal rules, or modelling sex, then we can simply rewrite the specific function that does these things and plug it back into the model. With this, some list or vector is also needed to hold parameter values (PARA), at lesat from the transition from R to C.
I have now initialised this lab notebook for the helicoverpa R package and model.
Comments