This notebook tracks progress on the development of resevol R package, a BBSRC FAPESP Newton Funded Project to model how the spatial scale of heterogeneity in fungal isolate application and crop plant cultivation affects variation in the selection landscape for biopesticide resistance.

Go to comments

Project updates:

Update: 24 NOV 2022

During the revision process of the software note, Matt suggested another feature that I really should have thought of making earlier. In version 0.3.3.0, after crop rotation, landscape cell values are initialised with whatever value is used for crop_per_cell. This means that there is no way to simulate crop growth on a landscape in between crop rotations. In version 0.3.4.0, I have fixed this with two new arguments, crop_growth and crop_growth_type.

The argument crop_growth indicates how much a cell value representing a given crop should change from one time step to the next (before the potential effects of feeding by pests). By default, this value is crop_growth = 0, meaning that all biomass on a cell will be initialised at the start of a new crop rotation. Nonzero values will affect the change of a cell’s crop value in a way dependent up on crop_growth_type. Note that crop_growth can either take a single value, or a vector of the same length as crop_number. If, for example, there are two crops (crop_number = 2), then we could set crop_growth = c(0.1, 0.4), meaning that crop 1 would increase by a value of 0.1 each time step, and crop 2 would increase by a value of 0.4 each time step. If we set a crop_growth to a single value (e.g., crop_growth = 0.1), then all crops grow at the same rate of crop_growth.

The argument crop_growth_type specifies how growth is enacted (i.e., how the value on the landscape changes). The default option is crop_growth_type = "none", in which case, no growth occurs. If crop_growth_type = "geometric", then the value of crop_growth specifies the proportion by which the landscape value increases in a time step,

\[value_{t+1} = value_{t} \left(1 + crop\_growth \right).\]

So if we initialise a crop amount at 2 on the landscape cell, and crop_growth = 0.1, then in the next time step (unless some crop is eaten by pests), we can predict that the crop amount will be 2.2 (i.e., an increase of 10 per cent).

If crop_growth_type = "linear", then the increment crop_growth is simply added to the existing cell value in each time step.

\[value_{t+1} = value_{t} + crop\_growth.\]

My hope is that these options will be useful for a lot of purposes, but there is plenty of room for expanding the package with different rules for crop growth.

Update: 30 JUL 2022

I don’t know why I didn’t think of this sooner, but there really should be a feature that allows users to set the initial mean values of traits. This is easy because trait means are of course independent of the covariance structure of the traits, so there just needs to be some specified increment to be added to each trait throughout the simulation. I have introduced this in a new version 0.3.2.0 with the trait_means argument, which simply takes a vector of the means for each customised trait. By default, values are set to zero (i.e., what they always were originally), but the argument can also take a vector of the same length as the number of traits, with each element position corresponding to the specific trait. That is, trait_means[1] is the mean value of "T1", trait_means[2] is the mean value of "T2", et cetera. An error is given if the user attempts to specify a trait_means vector that has a length different from the number of traits. Just to show this in action, I will use the resevol mine_gmatrix output mg_v1, which has the covariance structure below.

##            [,1]       [,2]       [,3]       [,4]
## [1,]  0.9936103 -0.5106387  0.2324040  0.2367691
## [2,] -0.5106387  1.0193888  0.2573216  0.2723028
## [3,]  0.2324040  0.2573216  0.9334559 -0.4480018
## [4,]  0.2367691  0.2723028 -0.4480018  0.9689539

We have four traits, which I will use to define consumption rate of crops 1 and 2 (T1 and T2, respectively), and the consumption rates of pesticides 1 and 2 (T3 and T4, respectively). I will set the trait means to T1 = 1, T2 = 2, T3 = 0.2, and T4 = 0.2 and run a simulation with the code below.

sim       <- run_farm_sim(mine_output = mg_v1, N = 1000, xdim = 10, ydim = 10, 
                          repro = "asexual", time_steps = 20, farms = 12,
                          crop_number             = 2,
                          crop_init               = "random",
                          crop_rotation_type      = 2,
                          pesticide_number        = 2,
                          pesticide_init          = "random",
                          pesticide_rotation_type = 2,
                          food_consume            = c("T1", "T2"),
                          pesticide_consume       = c("T3", "T4"),
                          trait_means             = c(1, 2, 0.2, 0.2),
                          food_needed_surv        = 1,
                          food_needed_repr        = 3,
                          print_last              = TRUE);

At the end of the simulation (after some evolution has occurred), we can look at the evolution of traits and final trait distributions to confirm that they are around what we expect. Here is the evolution of the different traits over time.

Here are the distributions of each trait, individually.

Finally, here is a pairs plot showing the trait correlations.

Overall, this is doing what we want it to. I have also written a new test into testthat for the package. New documentation for the trait_means argument has been written, and I will incorporate this option into the working manuscript.

Update: 19 JUL 2022

Version 0.3.0.1 has been submitted to CRAN and is passing all checks. I am now a bit under halfway through writing a first draft of the software note. This note be submitted as a pre-print, and for publication, and will explain how to use the package. It will also replace the current Get Started article and CRAN vignette. The structure of the publication introduces some key background, the mine_gmatrix function and its underlying logic, the ecology and evolution of pests in the model, and a simple example. I think that two supporting information documents are also in order. The first will explain the evolutionary algorithm in more detail and basically be a version of the current vignette with maybe some minor revisions. The second will be a more advanced example with complex dynamics and traits, presenting simulations like the ones shown in this journal on 26 AUG 2021. My hope is that this plus the documentation is sufficient for anyone to start using the package.

Having submitted version 0.3.0.1, I almost immediately discovered the need for an update that is now version 0.3.1.0. I will hold off on submitting this one to CRAN just yet, as the update is quite minor. The cause was that the mine_gmatrix function kept giving a discrepancy between the stress threshold that caused it to terminate and the actual predicted stress of the network it found. The reason for this was that the algorithm was terminating due to finding unusually low stress caused sampling error for the standard random normal loci. In other words, a luck draw for random loci often caused networks to look better at producing desired trait covariances than they actually were. I’ve addressed this now by first using the mean stress for a whole population of networks to test the termination criteria (term_cri) in the mine_gmatrix function, rather than the best stress out of all networks in the population. But the mine_gmatrix function still returns the network that produces the lowest found stress.

Second, I have introduced a new diagnostic function to give the distribution of stress predicted from loci.

stress_test <- function(mine_output, indivs = 1000, reps = 10){
    
    target_VCV     <- mine_output[[2]];
    Loci_to_Traits <- mine_output[[5]];
    loci           <- mine_output[[1]][1];
    
    Ln_stress <- rep(x = NA, times = reps);
    while(reps > 0){
        sim_loci        <- rnorm(n = indivs * loci);
        individuals     <- matrix(data = sim_loci, nrow = indivs, ncol = loci);
        trait_vals      <- individuals %*% Loci_to_Traits;
        VCV             <- cov(trait_vals);
        stress          <- mean((target_VCV - VCV) * (target_VCV - VCV));
        Ln_stress[reps] <- log(stress);
        reps            <- reps - 1;
    }
    
    return(Ln_stress);
}

This simple function now forms part of the package itself, so users can run the output of mine_gmatrix and get a better overall picture on how the chosen network performs. The example used in the documentation is below (I’ve increased the replicates using the argument reps = 1000; the default is 10).

gmt       <- matrix(data = 0, nrow = 4, ncol = 4);
diag(gmt) <- 1;
mg        <- mine_gmatrix(gmatrix = gmt, loci = 4, layers = 3, indivs = 100, 
                          npsize = 100, max_gen = 2, prnt_out = FALSE);
stresses  <- stress_test(mine_output = mg, reps = 1000);

The output stresses is just a vector of stresses for some number (indivs) of initialised genomes. The histogram below shows the distribution for the above example.

This should at least make it easier for users to decide if they do or do not like the output of mine_gmatrix for use in a new set of simulations. The above example shows stress that is quite consistent, but if we let the simulation run for longer to find a better solution, the stress variance seems to increase (note that now max_gen = 400 below).

gmt       <- matrix(data = 0, nrow = 4, ncol = 4);
diag(gmt) <- 1;
mg        <- mine_gmatrix(gmatrix = gmt, loci = 4, layers = 3, indivs = 100, 
                          npsize = 100, max_gen = 400, prnt_out = FALSE);
stresses  <- stress_test(mine_output = mg, reps = 1000);
hist(stresses, xlab = "Stress values for initialised genomes", 
     main = "", breaks = 20);

It might be good to recommend this kind of diagnostic for users, particularly when simulating very complex individuals. I think that I will do this for the supporting information showing an advanced example.

Update: 28 JUN 2022

Custom crop and pesticide initialisation and rotation

I have now introduced custom crop and pesticide initialisation and rotation, as I set out to do in the 13 JUN 2022 notes below. The code appears to be stable, and I have introduced two new automated tests in the package to make sure that the feature remains stable. Below is an example of how to have a custom rotation regime for crops and pesticides; I will start with crops. First, I will use the mg_n1 gmatrix that comes with the resevol package (I could have also made a dummy one; the details are not important here).

data("gmatrices");

Now I will run a simulation. The run_farm_sim takes some simple options for crop_init and crop_rotation_type. This includes crop_init = "random", which randomly assigns each farm to one of the potential crops. There are also three crop_rotation_type arguments, including 1 (no crop rotation), 2 (random crop choice whenever it is time to change), and 3 (cycling through crops by number; i.e., 1, 2, 3, 1, 2, 3). I might add more preset options like this later, but here is how to run a simulation with two crops, random crop initialisation on farms, and random rotation between crops.

sim       <- run_farm_sim(mine_output = mg_n1, N = 100, xdim = 4, ydim = 4, 
                          repro = "asexual", time_steps = 10, farms = 4,
                          crop_number        = 2,
                          crop_init          = "random",
                          crop_rotation_type = 2,
                          print_inds = FALSE, print_gens = FALSE,
                          print_last = FALSE, get_stats = FALSE);

The new feature is that crop_init and crop_rotation_type can take custom arguments, so potentially any initialisation and cycling of crop choices is possible. With crop_init, we can specify which farm gets what crop at the start of the simulation using a fector of length farms. If, for example, we want farms 1 and 4 to have crop 1, and farms 2 and 3 to have crop 2, we could run the below.

sim       <- run_farm_sim(mine_output = mg_n1, N = 100, xdim = 4, ydim = 4, 
                          repro = "asexual", time_steps = 10, farms = 4,
                          crop_number        = 2,
                          crop_init          = c(1, 2, 2, 1),
                          crop_rotation_type = 2,
                          print_inds = FALSE, print_gens = FALSE,
                          print_last = FALSE, get_stats = FALSE);

Note that the position of farms on the landscape is shown below.

##      [,1] [,2] [,3] [,4]
## [1,]    1    1    2    2
## [2,]    1    1    2    2
## [3,]    3    3    4    4
## [4,]    3    3    4    4

Hence, this landscape would start out with farms numbered 1 and 4, diagonal from one another, applying crop 1, and farms numbered 2 and 3, also diagonal, applying crop 2. Now, suppose that we want users of crop 1 to switch to crop 2 with a probability of 0.75, and retain the use of crop 1 with a probability of 0.25. Likewise, suppose that we want users of crop 2 to switch to crop 1 with a probability of 0.75, and retain the use of crop 2 with a probability of 0.25. We can now specify this with a square matrix to input in place of the number for the argument crop_rotation_type.

change_mat <- matrix(data = c(0.25, 0.75, 0.75, 0.25), nrow = 2, ncol = 2);

##      [,1] [,2]
## [1,] 0.25 0.75
## [2,] 0.75 0.25

Note that matrix elements \(M_{i,j}\) specify the probability of changing from crop \(i\) to crop \(j\), meaning that row elements must sum to 1. This can be binary too, as in the default crop_rotation_type = 3.

##      [,1] [,2]
## [1,]    0    1
## [2,]    1    0

The matrix above cycles back and forth between crops 1 and 2. Overall, this should give a lot more flexibility in terms of specifying how crops cycle over time. We can input the change_mat directly as an argument in run_farm_sim.

sim       <- run_farm_sim(mine_output = mg_n1, N = 100, xdim = 4, ydim = 4, 
                          repro = "asexual", time_steps = 10, farms = 4,
                          crop_number        = 2,
                          crop_init          = c(1, 2, 2, 1),
                          crop_rotation_type = change_mat,
                          print_inds = FALSE, print_gens = FALSE,
                          print_last = FALSE, get_stats = FALSE);

Nevertheless, the cycling is restricted to Markov chains, meaning that it cannot include cycles of crops such as 1, 1, 2, 1, 1, 2, 1, 1, etc., where the cycling repeats complex patterns. This is mainly due to the awkwardness of trying to allow for such cycles given simulations that can have high numbers of time steps (i.e., I do not think it is worth figuring out a system that allows for these kinds of cycles as an option in the code).

Pesticide rotation options work exactly the same way as crop rotation options, so I will not spend to much time on them. Here is an example of running one.

sim       <- run_farm_sim(mine_output = mg_n1, N = 100, xdim = 4, ydim = 4, 
                          repro = "asexual", time_steps = 10, farms = 4,
                          pesticide_number        = 2,
                          pesticide_init          = c(1, 2, 2, 1),
                          pesticide_rotation_type = change_mat,
                          print_inds = FALSE, print_gens = FALSE,
                          print_last = FALSE, get_stats = FALSE);

In conclusion, this new feature will allow for a very flexible range of different crop and pesticide cycling options.

Custom landscape option

I have also added an option to customise the farming landscape, also as I set out to do in the 13 JUN 2022 notes below. The customisation is on the top layer of the landscape that differentiates independent farms. What is required is a matrix that includes a sequence of natural numbers in all elements, so if there are 4 farms, then all matrix elements must be either 1, 2, 3, or 4. Beyond this requirement, there is no restriction on where different farms are placed; the do not even need to be contiguous on the landscape. Custom terrains can be inserted using the terrain argument in run_farm_sim. Below makes a custom terrain with three farms.

custom_terrain       <- matrix(data = 0, nrow = 8, ncol = 8);
custom_terrain[,1:2] <- 1;
custom_terrain[,2:6] <- 2;
custom_terrain[,7:8] <- 3;

##      [,1] [,2] [,3] [,4] [,5] [,6] [,7] [,8]
## [1,]    1    2    2    2    2    2    3    3
## [2,]    1    2    2    2    2    2    3    3
## [3,]    1    2    2    2    2    2    3    3
## [4,]    1    2    2    2    2    2    3    3
## [5,]    1    2    2    2    2    2    3    3
## [6,]    1    2    2    2    2    2    3    3
## [7,]    1    2    2    2    2    2    3    3
## [8,]    1    2    2    2    2    2    3    3

We can insert the terrain into the landscape with the terrain argument.

sim       <- run_farm_sim(mine_output = mg_n1, N = 100, xdim = 4, ydim = 4, 
                          repro = "asexual", time_steps = 10, farms = 4,
                          terrain = custom_terrain,
                          print_inds = FALSE, print_gens = FALSE,
                          print_last = FALSE, get_stats = FALSE);

Note that a custom terrain will override the arguments farms, xdim, and ydim, so even though the above sets farms = 4, there are actually only three farms on the landscape. I am tempted to put up a warning for this, but I am actually not sure if it is needed; my hope is that anyone customising the landscape would recognise this from the documentation.

Also note that these terrain values do not necessarily need to be farms. Through the use of a custom landscape and pesticide rotation option, these cells could represent something like diversionary feeding sites or even buildings or rivers. Consider, for example, the following matrix.

complex_terrain              <- matrix(data = 0, nrow = 12, ncol = 12)
complex_terrain[1:2, 1:3]    <- 1;
complex_terrain[3:7, 1:2]    <- 1;
complex_terrain[8:12, 1:2]   <- 2;
complex_terrain[1:2, 6:9]    <- 3;
complex_terrain[3:6, 7:9]    <- 3;
complex_terrain[1:6, 10:12]  <- 4;
complex_terrain[10:12, 7]    <- 5;
complex_terrain[7:12, 8:12]  <- 5;
complex_terrain[3:12, 3:4]   <- 6;
complex_terrain[7:9, 5]      <- 6;
complex_terrain[1:2, 4:5]    <- 7;
complex_terrain[3:6, 5:6]    <- 7;
complex_terrain[7:9, 6:7]    <- 7;
complex_terrain[10:12, 5:6]  <- 7;

##       [,1] [,2] [,3] [,4] [,5] [,6] [,7] [,8] [,9] [,10] [,11] [,12]
##  [1,]    1    1    1    7    7    3    3    3    3     4     4     4
##  [2,]    1    1    1    7    7    3    3    3    3     4     4     4
##  [3,]    1    1    6    6    7    7    3    3    3     4     4     4
##  [4,]    1    1    6    6    7    7    3    3    3     4     4     4
##  [5,]    1    1    6    6    7    7    3    3    3     4     4     4
##  [6,]    1    1    6    6    7    7    3    3    3     4     4     4
##  [7,]    1    1    6    6    6    7    7    5    5     5     5     5
##  [8,]    2    2    6    6    6    7    7    5    5     5     5     5
##  [9,]    2    2    6    6    6    7    7    5    5     5     5     5
## [10,]    2    2    6    6    7    7    5    5    5     5     5     5
## [11,]    2    2    6    6    7    7    5    5    5     5     5     5
## [12,]    2    2    6    6    7    7    5    5    5     5     5     5

We can visualise the terrain a bit more clearly with an image below.

We can imagine the dark maroon colour as being a river (7), and the red region on the immediate left of the river as being a nature reserve (6). To get the simulations to run treating it this way, we would only need to spray a crop and pesticide that has no effect on the pests on the river (effectively making it an empty area of landscape in terms of feeding or resistance), and a pesticide but not crop in the nature reserve. First we can initialise the crop and pesticide vectors. Assume that there are 2 legitimate crops and pesticides that the actual farms (1-5) will use.

initial_crops      <- c(1, 2, 1, 2, 1, 3, 4);
initial_pesticides <- c(1, 1, 2, 2, 1, 3, 3);

We can then clarify the matrix so that there is only a cycle between 1 and 2, but never 3 and 4 for either the pesticide or the crops. First I will do the crops.

crop_cycle <- matrix(data = 0, nrow = 4, ncol = 4);
crop_cycle[1, 2] <- 1;
crop_cycle[2, 1] <- 1;
crop_cycle[3, 3] <- 1;
crop_cycle[4, 4] <- 1;

##      [,1] [,2] [,3] [,4]
## [1,]    0    1    0    0
## [2,]    1    0    0    0
## [3,]    0    0    1    0
## [4,]    0    0    0    1

Now I will do the pesticide.

pesticide_cycle <- matrix(data = 0, nrow = 3, ncol = 3);
pesticide_cycle[1, 2] <- 1;
pesticide_cycle[2, 1] <- 1;
pesticide_cycle[3, 3] <- 1;

##      [,1] [,2] [,3]
## [1,]    0    1    0
## [2,]    1    0    0
## [3,]    0    0    1

So farms (1-5) will swap between crops 1 and 2, while the reserve (6) will maintain crop 3 and the river (7) will be ‘crop’ 4. And farms will swap between pesticides 1 and 2, while the reserve and river will hold pesticide 3. Then, to make crop 4 and pesticide 3 not really exist, we just make it impossible for the pests to consume these.

sim       <- run_farm_sim(mine_output = mg_n1, N = 100, 
                          repro = "asexual", time_steps = 10,
                          crop_number = 4, pesticide_number = 3,
                          terrain                 = complex_terrain,
                          food_consume            = c(0.25, 0.25, 0.25, 0),
                          pesticide_consume       = c(0.1, 0.1, 0),
                          crop_init               = initial_crops,
                          crop_rotation_type      = crop_cycle,
                          pesticide_init          = initial_pesticides,
                          pesticide_rotation_type = pesticide_cycle,
                          print_inds = FALSE, print_gens = FALSE,
                          print_last = FALSE, get_stats = FALSE);

The output shows that the pests leave the ‘food’ on the river alone (see layer 5), while the reserve (see layer 4) gets used very much.

## , , 1
## 
##       [,1] [,2] [,3] [,4] [,5] [,6] [,7] [,8] [,9] [,10] [,11] [,12]
##  [1,]    1    1    1    7    7    3    3    3    3     4     4     4
##  [2,]    1    1    1    7    7    3    3    3    3     4     4     4
##  [3,]    1    1    6    6    7    7    3    3    3     4     4     4
##  [4,]    1    1    6    6    7    7    3    3    3     4     4     4
##  [5,]    1    1    6    6    7    7    3    3    3     4     4     4
##  [6,]    1    1    6    6    7    7    3    3    3     4     4     4
##  [7,]    1    1    6    6    6    7    7    5    5     5     5     5
##  [8,]    2    2    6    6    6    7    7    5    5     5     5     5
##  [9,]    2    2    6    6    6    7    7    5    5     5     5     5
## [10,]    2    2    6    6    7    7    5    5    5     5     5     5
## [11,]    2    2    6    6    7    7    5    5    5     5     5     5
## [12,]    2    2    6    6    7    7    5    5    5     5     5     5
## 
## , , 2
## 
##       [,1] [,2] [,3] [,4] [,5] [,6] [,7] [,8] [,9] [,10] [,11] [,12]
##  [1,] 1.00 0.00    0    0    0    0 0.75 1.00 1.00     0     0     0
##  [2,] 1.00 1.00    0    0    0    1 1.00 1.00 1.00     0     0     0
##  [3,] 1.00 0.50    0    0    0    0 0.25 1.00 1.00     0     0     0
##  [4,] 1.00 0.00    0    0    0    0 1.00 0.75 1.00     0     0     0
##  [5,] 1.00 0.00    0    0    0    0 0.00 1.00 1.00     0     0     0
##  [6,] 0.75 0.25    0    0    0    0 0.00 1.00 1.00     0     0     0
##  [7,] 0.50 0.00    0    0    0    0 0.00 0.75 1.00     1     1     1
##  [8,] 0.00 0.00    0    0    0    0 0.00 0.00 0.75     1     1     1
##  [9,] 0.00 0.00    0    0    0    0 0.00 0.50 1.00     1     1     1
## [10,] 0.00 0.00    0    0    0    0 0.25 1.00 1.00     1     1     1
## [11,] 0.00 0.00    0    0    0    0 0.00 0.75 1.00     1     1     1
## [12,] 0.00 0.00    0    0    0    0 0.00 1.00 1.00     1     1     1
## 
## , , 3
## 
##       [,1] [,2] [,3] [,4] [,5] [,6] [,7] [,8] [,9] [,10] [,11] [,12]
##  [1,] 0.00    0    0    0    0    0    0    0    0     1     1     1
##  [2,] 0.00    0    0    0    0    0    0    0    0     1     1     1
##  [3,] 0.00    0    0    0    0    0    0    0    0     1     1     1
##  [4,] 0.00    0    0    0    0    0    0    0    0     1     1     1
##  [5,] 0.00    0    0    0    0    0    0    0    0     1     1     1
##  [6,] 0.00    0    0    0    0    0    0    0    0     1     1     1
##  [7,] 0.00    0    0    0    0    0    0    0    0     0     0     0
##  [8,] 0.75    0    0    0    0    0    0    0    0     0     0     0
##  [9,] 0.75    0    0    0    0    0    0    0    0     0     0     0
## [10,] 0.75    0    0    0    0    0    0    0    0     0     0     0
## [11,] 1.00    0    0    0    0    0    0    0    0     0     0     0
## [12,] 0.50    0    0    0    0    0    0    0    0     0     0     0
## 
## , , 4
## 
##       [,1] [,2] [,3] [,4] [,5] [,6] [,7] [,8] [,9] [,10] [,11] [,12]
##  [1,]    0    0    0    0    0    0    0    0    0     0     0     0
##  [2,]    0    0    0    0    0    0    0    0    0     0     0     0
##  [3,]    0    0    1    0    0    0    0    0    0     0     0     0
##  [4,]    0    0    0    0    0    0    0    0    0     0     0     0
##  [5,]    0    0    0    0    0    0    0    0    0     0     0     0
##  [6,]    0    0    0    0    0    0    0    0    0     0     0     0
##  [7,]    0    0    0    0    0    0    0    0    0     0     0     0
##  [8,]    0    0    0    0    0    0    0    0    0     0     0     0
##  [9,]    0    0    0    0    0    0    0    0    0     0     0     0
## [10,]    0    0    0    0    0    0    0    0    0     0     0     0
## [11,]    0    0    0    0    0    0    0    0    0     0     0     0
## [12,]    0    0    0    0    0    0    0    0    0     0     0     0
## 
## , , 5
## 
##       [,1] [,2] [,3] [,4] [,5] [,6] [,7] [,8] [,9] [,10] [,11] [,12]
##  [1,]    0    0    0    1    1    0    0    0    0     0     0     0
##  [2,]    0    0    0    1    1    0    0    0    0     0     0     0
##  [3,]    0    0    0    0    1    1    0    0    0     0     0     0
##  [4,]    0    0    0    0    1    1    0    0    0     0     0     0
##  [5,]    0    0    0    0    1    1    0    0    0     0     0     0
##  [6,]    0    0    0    0    1    1    0    0    0     0     0     0
##  [7,]    0    0    0    0    0    1    1    0    0     0     0     0
##  [8,]    0    0    0    0    0    1    1    0    0     0     0     0
##  [9,]    0    0    0    0    0    1    1    0    0     0     0     0
## [10,]    0    0    0    0    1    1    0    0    0     0     0     0
## [11,]    0    0    0    0    1    1    0    0    0     0     0     0
## [12,]    0    0    0    0    1    1    0    0    0     0     0     0

Similarly, the pests will only be affected by pesticides 1 and 2, while 3 is effectively nothing. Note that this is of course a trick for modelling landscape areas without food or pesticide, but I do not see a reason to make this more explicit in the function (unless I receive requests due to confusion, but my hope is it is clear from these notes and documentation).

Update: 13 JUN 2022

As I prepare the pre-print and software note for the resevol package, I am thinking that might be worth doing just a bit more coding so that the publication can include four more features:

More explicit crop and pesticide rotation, as mentioned in Issue #43. I think that this would be fairly easy to implement and worth the time. Currently, the run_farm_sim argument crop_rotation_type accepts either 1 or a 2. A 1 results in no rotation of crops on the landscape, and a 2 results in a random transition between types. I think that a custom transition matrix should be possible to insert, whereby each element should specify the probability of transitioning from crop i (row) to crop j (column). In the simplest case (e.g., could be crop_rotation_type = 3), this could just be a binary matrix that forms a cyclical graph. The current options 1 and 2 could even be subsumed in the transition matrix.
Allow custom landscapes. I think that this should be possible by allowing users to put their own matrix in a function, with numbers on the matrix representing different farms (integers greater than zeros) or public land (zeros). A new custom_landscape function could check this matrix, then use it as the top layer of an array.
Custom plotting functions. This one is tricky given the flexibility of simulations and therefore the complexity of the output, but it might be worth including at least some basic plotting functions, or summary functions.
Simulated annealing algorithm for the mine_gmatrix option. This could be fairly easy to implement, and for some cases might actually find genome solutions faster than the evolutionary algorithm.

My hope is that these different tasks could be done fairly quickly, and that they will make writing up the paper go more smoothly.

Update: 06 JAN 2022

There had for some time been an annoying issue of resevol CRAN checks having ERRORs on Windows systems with ix86 architecture. As it turns out, this was caused by some errors in specifying the memory for arrays, which was fixed in this commit. Just to make sure that everything is clear and correct, I have made the cast much clearer in a new commit. Tests using the rhub package and on Winbuilder confirm that everything works as intended. Valgrind found no memory leaks; everything appears fine, so I have updated the version to v0.2.0.9 and submitted the fix to CRAN.

Update: 26 AUG 2021

I have now run the same analysis that I did on 19 AUG 2021, but with traits that evolve independently. This was done by setting a covariance matrix to be the identity matrix for four traits. Here is the mine_gmatrix output that was used for the simulations.

## [[1]]
##  [1]    12.00     6.00  2000.00 18000.00     0.01     0.01  3200.00     0.01
##  [9]  1200.00     6.00    -9.00     0.10     0.00
## 
## [[2]]
##      [,1] [,2] [,3] [,4]
## [1,]    1    0    0    0
## [2,]    0    1    0    0
## [3,]    0    0    1    0
## [4,]    0    0    0    1
## 
## [[3]]
##              [,1]        [,2]        [,3]         [,4]
##  [1,]  1.01323287 -0.91056130  0.22300841  0.978403744
##  [2,]  0.27599802 -0.37879572 -1.01693690  0.762334090
##  [3,] -0.03203048 -0.51511158  0.79037437  0.002639731
##  [4,] -0.09193560  0.07907323  0.14790265 -0.039062606
##  [5,] -0.73131494  0.07549138  0.17234648 -0.009426905
##  [6,] -0.50890530  0.18561143  0.72008751  0.496708973
##  [7,] -0.01427086 -0.28208810 -0.11196553 -0.537226514
##  [8,]  0.25274376  0.15911871 -0.11647583  0.011376558
##  [9,] -0.21570241 -0.33741651  0.47711231 -0.459624091
## [10,] -0.23138642 -0.92727329  0.04955606 -0.017246277
## [11,] -0.19430907 -0.38257361 -0.26531432  0.630049848
## [12,] -0.66903676  0.17674572 -0.16014889  0.067177474
## 
## [[4]]
## , , 1
## 
##            [,1]        [,2]        [,3]      [,4]
## [1,]  0.3666743  0.86477950  0.61013519 0.4585473
## [2,] -0.7287975  0.89967854 -0.26984724 0.1715412
## [3,] -0.2753835 -0.27174429 -0.02783652 0.5228947
## [4,]  0.2106197 -0.09700542 -0.95923363 0.2126882
## 
## , , 2
## 
##            [,1]        [,2]       [,3]        [,4]
## [1,] -0.5464374 -0.57925350  0.2394385  0.13922962
## [2,] -0.2060458  0.01186567 -0.2479210  0.52160749
## [3,]  0.6099824 -0.40519047 -0.1605294 -0.05806732
## [4,]  0.7619924 -0.03927812  1.0184070  0.76831379
## 
## , , 3
## 
##            [,1]       [,2]       [,3]          [,4]
## [1,] 0.01579773 -0.0176360  0.1042295 -0.6998017767
## [2,] 1.12913619 -0.1744167  0.1694019 -0.0537070606
## [3,] 0.11367376 -0.5660887  0.5790242 -0.0007441308
## [4,] 0.00766533 -0.7063434 -0.6071945 -0.0742630499
## 
## , , 4
## 
##            [,1]        [,2]        [,3]       [,4]
## [1,] -0.4415180 -0.45882179 0.189424993 -0.4583560
## [2,]  0.4881191 -0.07763796 0.534154371 -0.3413463
## [3,] -0.1194739  0.71146855 0.622996667  0.7601848
## [4,] -0.2681222  0.75851682 0.004319161 -0.7493150
## 
## , , 5
## 
##            [,1]        [,2]       [,3]      [,4]
## [1,]  0.3594965 -0.44704612  0.1057529 0.6412851
## [2,] -0.4734836 -0.02154132  0.8362356 0.0703729
## [3,] -0.1015575  0.59180270 -0.2234910 0.3626638
## [4,]  0.7883569  0.48251087  0.6203973 0.1364310
## 
## , , 6
## 
##            [,1]       [,2]       [,3]       [,4]
## [1,]  0.6413329 -0.0520821  0.0313601 -0.7862978
## [2,]  0.2766572  0.5259953  1.0408831  0.7330976
## [3,] -0.4612240  0.6870955 -0.2413829 -0.1735197
## [4,] -0.8082452 -0.7595234  0.7075496 -0.6229779
## 
## 
## [[5]]
##              [,1]        [,2]        [,3]        [,4]
##  [1,] -0.11900916  0.72971816 -0.33534413 -0.16856932
##  [2,] -0.56655781 -0.01421153 -0.44314434  0.25331983
##  [3,]  0.02974235  0.39626672  0.35083549 -0.16975625
##  [4,]  0.08175985  0.01612911  0.06762713  0.03284772
##  [5,] -0.10161875 -0.04482800  0.34340020  0.33128165
##  [6,]  0.27189293  0.34943436  0.27667359  0.51511528
##  [7,] -0.22418646 -0.17966680  0.14259016 -0.36500830
##  [8,]  0.11894101 -0.03845361 -0.18004009 -0.05569821
##  [9,] -0.06241772  0.05742287  0.38727606 -0.27966268
## [10,] -0.59027383  0.15893478  0.33824411 -0.16558486
## [11,] -0.38817067  0.16736336 -0.01299629  0.34651855
## [12,] -0.16592813 -0.15620554  0.18829157  0.39244612
## 
## [[6]]
##             [,1]         [,2]         [,3]         [,4]
## [1,]  1.06913329 -0.072243449  0.064732772 -0.056487174
## [2,] -0.07224345  0.874066557  0.007063041 -0.007362562
## [3,]  0.06473277  0.007063041  0.948800593 -0.009399430
## [4,] -0.05648717 -0.007362562 -0.009399430  1.001389459
## 
## [[7]]
##   [1]  1.0132328710 -0.9105612976  0.2230084069  0.9784037440  0.2759980196
##   [6] -0.3787957167 -1.0169369015  0.7623340902 -0.0320304800 -0.5151115759
##  [11]  0.7903743678  0.0026397307 -0.0919356017  0.0790732281  0.1479026489
##  [16] -0.0390626056 -0.7313149438  0.0754913824  0.1723464850 -0.0094269050
##  [21] -0.5089053016  0.1856114264  0.7200875059  0.4967089726 -0.0142708566
##  [26] -0.2820881047 -0.1119655280 -0.5372265136  0.2527437636  0.1591187075
##  [31] -0.1164758339  0.0113765575 -0.2157024076 -0.3374165063  0.4771123077
##  [36] -0.4596240914 -0.2313864218 -0.9272732890  0.0495560606 -0.0172462772
##  [41] -0.1943090688 -0.3825736088 -0.2653143217  0.6300498485 -0.6690367591
##  [46]  0.1767457169 -0.1601488935  0.0671774739  0.3666742930  0.8647795015
##  [51]  0.6101351924  0.4585473120 -0.7287975411  0.8996785444 -0.2698472400
##  [56]  0.1715411534 -0.2753834548 -0.2717442880 -0.0278365186  0.5228947384
##  [61]  0.2106197266 -0.0970054220 -0.9592336331  0.2126882478 -0.5464374357
##  [66] -0.5792534966  0.2394384985  0.1392296218 -0.2060458352  0.0118656721
##  [71] -0.2479210148  0.5216074862  0.6099824149 -0.4051904672 -0.1605293560
##  [76] -0.0580673216  0.7619923905 -0.0392781166  1.0184070112  0.7683137886
##  [81]  0.0157977321 -0.0176359999  0.1042294944 -0.6998017767  1.1291361885
##  [86] -0.1744166518  0.1694018671 -0.0537070606  0.1136737596 -0.5660887218
##  [91]  0.5790242132 -0.0007441308  0.0076653298 -0.7063434037 -0.6071945083
##  [96] -0.0742630499 -0.4415179511 -0.4588217919  0.1894249926 -0.4583559714
## [101]  0.4881190547 -0.0776379596  0.5341543712 -0.3413463261 -0.1194739358
## [106]  0.7114685497  0.6229966674  0.7601847997 -0.2681222285  0.7585168195
## [111]  0.0043191611 -0.7493149802  0.3594964704 -0.4470461217  0.1057529089
## [116]  0.6412850683 -0.4734836453 -0.0215413243  0.8362356329  0.0703728975
## [121] -0.1015575348  0.5918026977 -0.2234910091  0.3626638219  0.7883569250
## [126]  0.4825108655  0.6203972867  0.1364309830  0.6413329012 -0.0520821005
## [131]  0.0313601037 -0.7862978038  0.2766571825  0.5259952870  1.0408830938
## [136]  0.7330976036 -0.4612239907  0.6870955302 -0.2413828743 -0.1735197274
## [141] -0.8082452080 -0.7595233690  0.7075495824 -0.6229778721
## 
## [[8]]
## [1] -5.625305

Note that the parameter values for mining the matrix were identical to those chosen in the simulations from 19 AUG 2021. The results are as expected; all traits evolve in response to selection to feed on both crop types and resist both pesticide types.

Asexual, no rotation

Asexual, pesticide rotation

Asexual, both rotation

Selfing, no rotation

Selfing, pesticide rotation

Selfing, both rotation

Non-selfing, no rotation

Non-selfing, pesticide rotation

Non-selfing, both rotation

Biparental, no rotation

Biparental, pesticide rotation

Biparental, both rotation

For full simulations, I think it will be important to have each replicate (and I’m including all of the above as one replicate) with its own mined gmatrix.

Update: 19 AUG 2021

The helicoverpa package will be submitted to CRAN sometime before early October What this will require is some appropriate testing functions and documentation, and maybe one or two vignettes that show how the package works. I’ll also make a website similar to the GMSE website. We will want to decide if the package name should actually be ‘helicoverpa’, or if we should switch to something else (endorse is unfortunately taken; I’m not sure if ‘ENDORSE’ in all caps would work, but probably best to not try this and avoid confusion). I will update with more details about this later.

For now, I want to focus on some simulations that I have recently run with the idea of laying the foundations for a general paper on landscape heterogeneity and pesticide resistance (not so much focusing on the details of the Helicoverpa system). I’ve not yet checked to see how novel the general ideas are because this started more as a verification of the model (the code is working as intended, as far as I can tell).

The model focuses on the way that negative genetic correlations affect niche differentiation across different ploidy and mating types. The idea is that sex can act as a constraint to adaptation due to outbreeding depression in a way that cannot happen given asexual reproduction. Given a heterogeneous landscape, specifically in the context of the evolution of pesticide resistance and crop specialisation, asexually reproducing pests can get the upper hand by specialising on specific combinations of crop and pesticide and ensuring that their descendants will keep these specialised genotypes intact. In contrast, sexually reproducing organisms have a risk of outbreeding depression, breeding with a conspecific that is specialised to different conditions and having offspring that is not good at dealing with any environment. I have not seen this point in the literature, and I need to figure out the extent to which it has or has not been made and modelled to determine whether or not it is actually novel.

What follows is a lot of simulation results from four different types of species set by the repro argument in the R package:

asexual: A haploid organism that reproduces asexually.
selfing: A monoecious diploid organism that reproduces sexually and can self-fertilise.
non-selfing: A monoecious diploid organism that reproduces sexually and cannot self-fertilise.
biparental: A dioecious diploid organism that reproduces sexually.

For each of these different organism types, I included two different crop types and two pesticide types, and I ran three different pesticide and crop rotation procedures on a landscape with 24 farms for 480 time steps (maybe very loosely thinking of 12 time steps as a year):

no_rotation: No rotation of either crop or pesticide. Farms randomly select a crop type and a pesticide to use at the start of the simulation and use it for the whole simulation.
pesticide_rotation: Crop is not rotated, but pesticide is randomly selected every 12 time steps. Farms randomly select a crop type and pesticide to use at the start of the simulation, and farmers use the same crop each time step.
both: Rotation of both crop and pesticide. Farms randomly select a crop and pesticide to use at the start of the simulation, and then randomly select a new crop and pesticide to use every 12 time steps.

I’m deliberately not having a crop rotation only simulation as it seems unrealistic and potentially a bit redundant, at least for now. The details of the simulation are as follows. The genome was created with the mine_gmatrix settings below.

mg  <- mine_gmatrix(gmatrix = gmt, loci = 12, indivs = 2000, npsize = 12000, 
                    max_gen = 2400, sampleK = 1200, chooseK = 6, layers = 6,
                    mu_pr = 0.05, pr_cross = 0.05, mu_sd = 0.01, 
                    term_cri = -12);

The output of mg is below. Note that list element 2 shows the target variance covariance matrix of four traits. Element 6 below gives the mined matrix.

## [[1]]
##  [1]    12.00     6.00  2000.00 12000.00     0.05     0.01  2400.00     0.05
##  [9]  1200.00     6.00   -12.00     0.10     0.00
## 
## [[2]]
##      [,1] [,2] [,3] [,4]
## [1,]  1.0 -0.5  0.2  0.2
## [2,] -0.5  1.0  0.2  0.2
## [3,]  0.2  0.2  1.0 -0.5
## [4,]  0.2  0.2 -0.5  1.0
## 
## [[3]]
##              [,1]        [,2]          [,3]        [,4]
##  [1,] -0.62181158 -0.23246088  1.027398e-01  0.36089789
##  [2,]  0.11516718  0.35628744  1.932607e-06  0.38293051
##  [3,] -0.33493318  0.32627944  6.940649e-02 -0.30558144
##  [4,] -0.07520604 -0.32440111  3.741366e-01  0.24383415
##  [5,] -0.51259791 -0.06048184 -8.689474e-01  0.60746638
##  [6,]  0.21803456 -0.38792630  2.014267e-01 -0.42601954
##  [7,] -0.22419755 -0.84531007  5.902170e-01 -0.10942012
##  [8,]  0.73822815 -0.30174497  2.573475e-01  0.23633850
##  [9,] -0.45722066  0.09532205  5.540713e-01 -0.06608855
## [10,]  0.28739580 -0.12243227 -1.553612e-01 -0.01861052
## [11,] -0.09843707 -0.18971567  6.172619e-01 -0.17746038
## [12,]  0.64383533 -0.19503881 -1.054181e-01 -0.37567049
## 
## [[4]]
## , , 1
## 
##            [,1]       [,2]       [,3]       [,4]
## [1,]  0.1876383 -0.7766931  0.2561096 -0.1594451
## [2,] -0.7218431 -0.1720015  0.1051845 -0.1391646
## [3,] -0.5724185 -0.6303034 -0.1351206 -0.1521792
## [4,] -0.1518009 -0.4726420  0.6897870 -0.2687902
## 
## , , 2
## 
##             [,1]       [,2]       [,3]       [,4]
## [1,] -0.89598885 -0.2361661 0.03713508 -0.4339221
## [2,] -0.03818028  0.1322045 0.52982046 -0.2206081
## [3,] -0.16489913  0.2119842 0.44171859  0.7881612
## [4,]  0.35726940 -0.5271966 0.13122182 -0.2990653
## 
## , , 3
## 
##             [,1]        [,2]       [,3]       [,4]
## [1,]  0.41092320  0.28480202 -0.1120791 -0.4511020
## [2,]  0.19794635  0.70486627 -0.2855835 -0.3474222
## [3,] -0.84575895 -0.06894474  0.2846113 -0.5004831
## [4,]  0.08297854  0.31942452  0.8976090  0.4573121
## 
## , , 4
## 
##            [,1]      [,2]        [,3]       [,4]
## [1,]  0.2253143 0.2004491  0.74271147  1.0243443
## [2,] -0.7223198 0.5598523  0.41619436 -0.1152639
## [3,] -0.2417876 0.5954666 -0.03304092 -0.4663887
## [4,]  0.4589945 0.5626436 -0.42824309 -0.3633294
## 
## , , 5
## 
##             [,1]       [,2]      [,3]        [,4]
## [1,]  0.05722936 -0.7678113 0.7009198  0.15491320
## [2,]  0.08709396 -0.6969477 0.4913277 -0.73493561
## [3,] -0.70780111  0.7518397 0.4779771  0.41039047
## [4,] -0.18146366 -0.6877152 0.4546469 -0.01592507
## 
## , , 6
## 
##              [,1]       [,2]       [,3]        [,4]
## [1,] -0.484868322  0.2352634 -0.5067399  0.51615705
## [2,]  1.272293446 -0.7279591 -0.4885374  1.19602908
## [3,]  0.653742180 -0.6679193  0.2497727 -0.05637619
## [4,] -0.008199929  1.2147659  0.9983367  0.26934001
## 
## 
## [[5]]
##              [,1]         [,2]         [,3]        [,4]
##  [1,]  0.11355835  0.002908792 -0.122576128  0.24062456
##  [2,]  0.19860768 -0.560485161 -0.238001247 -0.13433740
##  [3,]  0.35488346  0.057368027  0.251966132  0.15915978
##  [4,] -0.01866734 -0.145837880 -0.007194782 -0.16074206
##  [5,] -0.23320888  0.208434036 -0.643175344  0.63047911
##  [6,] -0.24710427  0.412170404  0.305025482 -0.13385957
##  [7,] -0.21817543  0.395054251  0.293741747 -0.11102551
##  [8,] -0.33650514 -0.312956818 -0.128296727 -0.52754011
##  [9,]  0.46183511 -0.197388878  0.295203638 -0.03835963
## [10,] -0.24765628  0.105117973 -0.078117455 -0.06127897
## [11,]  0.17745685 -0.052061251  0.344817622 -0.22442234
## [12,] -0.40187122  0.296093799  0.120292139 -0.22079164
## 
## [[6]]
##            [,1]       [,2]       [,3]       [,4]
## [1,]  0.9936103 -0.5106387  0.2324040  0.2367691
## [2,] -0.5106387  1.0193888  0.2573216  0.2723028
## [3,]  0.2324040  0.2573216  0.9334559 -0.4480018
## [4,]  0.2367691  0.2723028 -0.4480018  0.9689539
## 
## [[7]]
##   [1] -6.218116e-01 -2.324609e-01  1.027398e-01  3.608979e-01  1.151672e-01
##   [6]  3.562874e-01  1.932607e-06  3.829305e-01 -3.349332e-01  3.262794e-01
##  [11]  6.940649e-02 -3.055814e-01 -7.520604e-02 -3.244011e-01  3.741366e-01
##  [16]  2.438342e-01 -5.125979e-01 -6.048184e-02 -8.689474e-01  6.074664e-01
##  [21]  2.180346e-01 -3.879263e-01  2.014267e-01 -4.260195e-01 -2.241976e-01
##  [26] -8.453101e-01  5.902170e-01 -1.094201e-01  7.382282e-01 -3.017450e-01
##  [31]  2.573475e-01  2.363385e-01 -4.572207e-01  9.532205e-02  5.540713e-01
##  [36] -6.608855e-02  2.873958e-01 -1.224323e-01 -1.553612e-01 -1.861052e-02
##  [41] -9.843707e-02 -1.897157e-01  6.172619e-01 -1.774604e-01  6.438353e-01
##  [46] -1.950388e-01 -1.054181e-01 -3.756705e-01  1.876383e-01 -7.766931e-01
##  [51]  2.561096e-01 -1.594451e-01 -7.218431e-01 -1.720015e-01  1.051845e-01
##  [56] -1.391646e-01 -5.724185e-01 -6.303034e-01 -1.351206e-01 -1.521792e-01
##  [61] -1.518009e-01 -4.726420e-01  6.897870e-01 -2.687902e-01 -8.959888e-01
##  [66] -2.361661e-01  3.713508e-02 -4.339221e-01 -3.818028e-02  1.322045e-01
##  [71]  5.298205e-01 -2.206081e-01 -1.648991e-01  2.119842e-01  4.417186e-01
##  [76]  7.881612e-01  3.572694e-01 -5.271966e-01  1.312218e-01 -2.990653e-01
##  [81]  4.109232e-01  2.848020e-01 -1.120791e-01 -4.511020e-01  1.979463e-01
##  [86]  7.048663e-01 -2.855835e-01 -3.474222e-01 -8.457590e-01 -6.894474e-02
##  [91]  2.846113e-01 -5.004831e-01  8.297854e-02  3.194245e-01  8.976090e-01
##  [96]  4.573121e-01  2.253143e-01  2.004491e-01  7.427115e-01  1.024344e+00
## [101] -7.223198e-01  5.598523e-01  4.161944e-01 -1.152639e-01 -2.417876e-01
## [106]  5.954666e-01 -3.304092e-02 -4.663887e-01  4.589945e-01  5.626436e-01
## [111] -4.282431e-01 -3.633294e-01  5.722936e-02 -7.678113e-01  7.009198e-01
## [116]  1.549132e-01  8.709396e-02 -6.969477e-01  4.913277e-01 -7.349356e-01
## [121] -7.078011e-01  7.518397e-01  4.779771e-01  4.103905e-01 -1.814637e-01
## [126] -6.877152e-01  4.546469e-01 -1.592507e-02 -4.848683e-01  2.352634e-01
## [131] -5.067399e-01  5.161570e-01  1.272293e+00 -7.279591e-01 -4.885374e-01
## [136]  1.196029e+00  6.537422e-01 -6.679193e-01  2.497727e-01 -5.637619e-02
## [141] -8.199929e-03  1.214766e+00  9.983367e-01  2.693400e-01
## 
## [[8]]
## [1] -6.237817

This output was used to run simulation with the following parameter values (asexual without rotation is shown below).

sim <- run_farm_sim(mine_output              = mg,
                    N                        = 1000,
                    neutral_loci             = 1000,
                    xdim                     = 100,
                    ydim                     = 100,
                    repro                    = "asexual",
                    max_age                  = 6,
                    selfing                  = TRUE,
                    food_consume             = c("T1", "T2"),
                    pesticide_consume        = c("T3", "T4"),
                    food_needed_surv         = 1,
                    food_needed_repr         = 1,
                    reproduction_type        = "food_based",
                    pesticide_tolerated_surv = 0,
                    pesticide_rotation_type  = 2,
                    crop_rotation_type       = 2,
                    min_age_reproduce        = 4,
                    max_age_feed             = 3,
                    lambda_value             = 1.5,
                    farms                    = 24,
                    time_steps               = 480,
                    mutation_pr              = 0.001,
                    crossover_pr             = 0.1,
                    net_mu_layers            = 0,
                    crop_rotation_time       = 12,
                    pesticide_rotation_time  = 12,
                    crop_per_cell            = 8,
                    pesticide_per_cell       = 0.4,
                    crop_number              = 2,
                    pesticide_number         = 2,
                    K_on_birth               = 1000000,
                    min_age_move             = 4,
                    age_food_threshold       = 2,
                    min_age_feed             = 1,
                    pesticide_start          = 100,
                    print_last               = TRUE,
                    mating_distance          = 2,
);

I have left out some arguments that were set to their default options. Note that we have a 100 by 100 landscape. Two traits determine how much of crop an individual consumes at a time (“T1” and “T2”), and two separate traits determine how much of two types of pesticide an individual imbibes (“T3”, “T4”). Individuals start feeding in their first time step and stop feeding after their third time step. On their four time step, individuals can move and reproduce, and they have a maximum age of 6 (though mortality happens at the end of a time step). Each cell has 8 units of crop that an individual can eat, and the crop is replaced each time the land is rotated (i.e., individuals can keep eating and feeding on it until there is either nothing left on a cell or the farmer plants a new crop after 12 time steps). Individuals must eat at least 1 unit to survive and reproduce, and can starve on or after their second time step. Individuals also cannot tolerate any pesticide (note, ‘zero’ is kind of arbitrary, but it’s an easy number to use).

The following shows images of all of the different scenarios. For each scenario, I will show an image of the landscape on the last time step. Different coloured squares show different farms, and the points on top of them are pests. I will also show the dynamics of pest abundance and traits over time, and I will show scatterplots of the correlations between traits.

Note that the pesticide application starts on time step 100 to allow a period of burn-in.

Asexual, no rotation

Asexual, pesticide rotation

Asexual, both rotation

Selfing, no rotation

Selfing, pesticide rotation

Selfing, both rotation

Non-selfing, no rotation

Non-selfing, pesticide rotation

Non-selfing, both rotation

Biparental, no rotation

Biparental, pesticide rotation

Biparental, both rotation

It’s worth mentioning that the reason for the oscillating population size is due to individuals sometimes consuming all of the food and starting to decrease in population density before the next crop is put down. To get a full picture on what is going on, I will need to run multiple replicate simulations with different mined genomes (i.e., same genetic correlations, but potentially achieved in a different way). I will also want to run the same simulations with an identity matrix (i.e., no genetic correlations) to act as a control so that we can understand the role of the negative genetic correlations in all of these results.

I have not really given all of these figures a hard look to try to think more clearly about what’s going on yet – at least not apart from the outbreeding depression idea.

Update: 23 JUL 2021

I just want to walk through some of the work and simulations that I have been running in the past two weeks. I am not going to get into the details, but I just want to show what the model can do and roughly what kind of output we might expect for a larger set of simulations. Everything here is reproducible using the helicoverpa package, which can be downloaded from the R console using the code below.

install.packages("devtools"); # If you've not got devtools already installed
library(devtools);
install_github("bradduthie/helicoverpa");

There are no real safeguards or error messagese built into the package yet, so it’s probably best to keep to a small simulations when experimenting. There are a few key points I want to make before presenting the model.

The code is written in a very modular way. We can take bits out and rearrange most things without too much effort, so please do make suggestions if you think something is important to change. Most changes will take very little time for me to implement now that the broad structure of the code is written.
The results that I present below are really just about getting a feel for what the model can do, and for developing an intuition about what kind of results we might expect when we set things up more formally. It’s also another layer of checking to make sure that there are no major bugs in the code. Hence, the simulations that I have run reflect my own testing of what the model can do rather than the best test of any specific question.
I’m not going to explain all of the things that the software can do since I don’t think this would be terribly productive. I just want to give a sense of what we’re working with and what the possibilities are in relation to ENDORSE. There are some hidden hooks in here that were just too tempting not to code, which can be used for future papers. If I’ve done a good job, then it should be possible to use this one R package to address a range of potentially interesting questions. I don’t need to take the lead in everything, and I would be delighted if folks were to use the software for individual-based modelling.

Here are the basics of how to run simulations. You really only need to know two R functions, each of which can take a long list of arguments. The heavy lifting of both functions is done in C rather than R. The first function is the mine_gmatrix function. The second function is the run_farm_sim function. The role of mine_gmatrix is to generate a genome that produces an arbitrary number of phenotypic traits with an arbitrary covariance structure given a set of allele values with a standard normal distribution. The role of run_farm_sim is to do everything else – that is, to generate individuals with the mined genomes simulate the ecology and evolution of the system over time. The output of the simulation is the creation of one or more CSV files in the directory where the simulation is run (more on this later).

The `mine_gmatrix` function

The goal of the mine_gmatrix function is to get from standard normal (mean = 0, stdev = 1) allele values at some number of loci (loci) through a network of some number of hidden layers (layers) to produce a set of traits that have some variance covariance structure (gmatrix, which is a square covariance matrix).

The figure above shows an example with 18 loci and 9 traits, separated by 6 hidden layers. Each of the black arrows can be represented by a number indicating how much one locus (green) or node (blue) should contribute to the next (mathematically, arrows between two sets of nodes are represented by matrices that get multiplied together). These numbers become part of the genome for individuals, determining what a newly born individual’s traits will be given their allele values at each locus. For now, an evolutionary algorithm is used to find a set of numbers that will produce traits with a desired covariance structure given random standard normal allele values. How long it takes to find these numbers depends on the number of loci, layers, and traits, and how much error we are willing to accept. Here’s a very simple example of running the function with four loci, two layers, and two traits that have a variance of 1 and a covariance of -0.5.

library(helicoverpa);
gmt       <- matrix(data = 0, nrow = 2, ncol = 2);
gmt[1, 1] <- 1;
gmt[2, 2] <- 1;
gmt[1, 2] <- -0.5;
gmt[2, 1] <- -0.5;

The covariance matrix is shown below.

##      [,1] [,2]
## [1,]  1.0 -0.5
## [2,] -0.5  1.0

This matrix is set as the argument gmatrix = gmt. The number of loci is set with loci = 4, and the number of layers is set with layers = 2 (there is no need to set trait number because the function figures this out from the gmatrix). The rest of the arguments specify how the evolutionary algorithm is to run:

Argument	What it does
npsize	How big a population of networks evolve in the algorithm
indivs	How many random normals to test the fitness of each network
max_gen	The maximum number of generations allowed to evolve
sampleK	Number of randomly selected networks to compete
chooseK	Number of tournament winners (highest fitness)
mu_pr	Probability of a value in a network mutating
pr_cross	Probability of crossing over between networks
mu_sd	Standard deviation of mutation effect size (mean is 0)
term_cri	Logged stress criteria for termination

The details here are not terribly important, but the key concepts are explained in more detail in Hamblin (2013) and Duthie, Cusack, et al. (2018) in one of the GMSE R package vignettes. Basically, what we’re doing is creating a random population of npsize networks like the one in the figure above with random arrow values. We’re then letting this network evolve over time with selection for networks that do a better job of producing the gmatrix from a standard normal loci values. Evolution stops after max_gen generations have passed or the mean squared deviation of elements from the evolved variance covariance matrix and gmatrix is less than exp(term_cri). Finding a genome is a bit of an art relying on trying different parameter combinations and seeing what works. Big npsize (higher than 10000) with low mutation rates and standard deviations (both below 0.01) are slow and steady, but usually necessary for a large gmatrix. For smaller gmatrix, smaller populations and higher mutation and crossover rates are fine and quite fast. For the matrix gmt above, we can probably get away with the following:

mg  <- mine_gmatrix(gmatrix = gmt, loci = 4, layers = 2, npsize = 2000, 
                    indivs = 2000, max_gen = 800, sampleK = 1200, chooseK = 6,
                    mu_pr = 0.3, pr_cross = 0.3, mu_sd = 0.05, term_cri = -4);

I’m being extremely relaxed here with the term_cri and max_gen, but I just want to show a quick example. The function prints off each generation, the mean stress, and the minimum stress (best) genome found. This is mostly just to reassure you that it’s doing the job and has not crashed. The output in mg is a list with seven elements:

The parameters input into mine_gmatrix.
The specified gmatrix.
The first set of arrows from loci to network (green to blue above) in matrix form.
An array made up of layers square matrices, representing all of the rest of the arrows in the figure above.
The summed effects that each loci has on each trait.
The end variance covariance matrix.
The whole network (figure above) as a genome to be inserted in individuals.
The stress of the final network (mean squared deviation from gmatrix).

Here is the output for mg above.

print(mg);

## [[1]]
##  [1]    4.00    2.00 2000.00 2000.00    0.30    0.05  800.00    0.30 1200.00
## [10]    6.00   -4.00    0.10    0.00
## 
## [[2]]
##      [,1] [,2]
## [1,]  1.0 -0.5
## [2,] -0.5  1.0
## 
## [[3]]
##            [,1]       [,2]
## [1,] -0.7780116 -0.8220788
## [2,]  0.1761964  0.2002705
## [3,]  0.7962038 -0.5952858
## [4,]  0.4842877 -0.2810729
## 
## [[4]]
## , , 1
## 
##            [,1]      [,2]
## [1,]  0.8775310 0.3140504
## [2,] -0.1439212 0.7641448
## 
## , , 2
## 
##           [,1]       [,2]
## [1,] 0.5217076 -0.8884919
## [2,] 0.7151524  0.1385427
## 
## 
## [[5]]
##            [,1]        [,2]
## [1,] -0.9184457  0.38059627
## [2,]  0.2146444 -0.08289933
## [3,]  0.2627214 -0.72528297
## [4,]  0.1979856 -0.42221579
## 
## [[6]]
##            [,1]       [,2]
## [1,]  1.0484333 -0.6448085
## [2,] -0.6448085  0.8521395
## 
## [[7]]
##  [1] -0.7780116 -0.8220788  0.1761964  0.2002705  0.7962038 -0.5952858
##  [7]  0.4842877 -0.2810729  0.8775310  0.3140504 -0.1439212  0.7641448
## [13]  0.5217076 -0.8884919  0.7151524  0.1385427
## 
## [[8]]
## [1] -4.195757

Note that the variance covariance matrix in [[6]] is reasonable, but probably not good enough for real simulations. It took less than 3 minutes to get, but the time taken appears to scale exponentially with gmatrix size and term_cri. But if we accept this result, we have all that we need to move ahead with the next function for actual simulation.

The `run_farm_sim` function

This function runs a single replicate simulation from start to finish. It generates a landscape divided into some arbitrary number of farms and places new individuals on them. A series of events then happen in pretty much any order that we want them to happen over the course of time_steps discrete time steps.

First, individuals are initialised with their genomes as found in mine_gmatrix. They are given random allele values for their loci. Other options for individuals include the following (this is not exhaustive):

The reproduction type repro can be set as asexual, sexual (all individuals female and male with optional selfing), or biparental (females and males separate).
A max_age that an individual can be before death.
A minimum and maximum age at which the individual can move (min_age_move and max_age_move).
A minimum and maximum age at which the individual can reproduce (min_age_reproduce and max_age_reproduce).
A minimum and maximum age at which the individual can feed and imbibe pesticide (min_age_feed).
The amount of food consumed during feeding on a cell (food_consume, which can be a vector of up to length 10 – an element for each possible food type).
The amount of pesticide consumed during feeding on a cell (pesticide_consume, which can be a vector of up to length 10 – an element for each possible pesticide type).
The distance in cells an individual can travel when moving (move_distance).
The amount food that an individual needs to consume to survive (food_need_surve) and the age at which this threshold is assessed (age_food_threshold).
The amount of pesticide imbibed that an individual can tolerate before death (pesticide_tolerated_repr) and the age at which this threshold is assessed (age_pesticide_threshold).
The probability of mutation (mutation_pr) and crossover (crossover_pr).
How much of the genome can mutate (net_mu_layers).
How many neutral loci individuals have (neutral_loci).

When the simulation starts, the following happens:

The crop_number crops (up to 10 possible) grown on farms unique farms are changed at crop_rotation_time intervals (rules for changing can be adjusted).
The pesticide_number pesticides (up to 10 possible) applied on each farm is changed at pesticide_rotation_time intervals (here too, rules for changing can be adjusted).
Individuals age by a time step (this is fixed; all individuals do it at the start of a time step, but individuals can be as old as we like)
If individuals are at least min_age_feed and not over max_age_feed, then they eat on the landscape cell in which they are located. The amount of food that they consume on the cell can be set as food_consume, which can be a vector of up to 10 elements (one for each food type). Food consumption rates can also be specified as a trait (more on that later). Food consumption occurs in a random order of individuals.
Similarly, if individuals at leastmin_age_feed and not over max_age_feed (same as feeding) imbibe biopesticde on the landscape cell in which they are located at a rate set by pesticide_consume, which can also be a vector of up to 10 elements (one for each pesticide type). Biopesticide consumption rates can also be specified as a trait. Biopesticide consumption occurs in a random order of individuals.
Individuals move on the landscape if they are at least min_age_move (there are more complex rules available for feeding and moving; the simplest is just moving once in any direction).
The number of offspring that an individual produces is calculated, potentially based on their food consumption, mate access, parameter values, or traits.
New offspring are made, potentially with recombination in the case of sexual reproduction.
Individuals die according to some criteria (food and biopesticide consumption, age).

The simulation ends after a fixed number of time steps (time_steps) have passed, or if the population goes extinct. I’ll now show a very simplified example of a simulation being run, then a more complicated example.

A simple example of the simulation

In this simulation, I will just consider a very simple scenario of two traits, which will model the rate at which food is consumed (Trait 1) and the rate at which pesticide is imbibed (Trait 2). Just to get started, let each of these traits have a variance of 1 and a covariance of about 0.5 to model the idea that the rate at which you consume pesticide increases with the rate at which you consume food.

##      [,1] [,2]
## [1,]  1.0  0.5
## [2,]  0.5  1.0

Using the mine_gmatrix function with similar parameter values tried above, I got a genome that produces the following matrix given random standard normal allele values.

##           [,1]      [,2]
## [1,] 1.0186185 0.5655697
## [2,] 0.5655697 1.0950818

A convenient part of the code that I’ve not yet mentioned is that we can replace several numerical parameter arguments with evolving traits. This is possible for move_distance, food_needed_surv, pesticide_tolerated_surv, food_needed_repr, pesticide_tolerated_repr, mating_distance, lambda_value (raw reproduction value), movement_bouts, food_consume, and pesticide_consume. If, for example, we want movement distance in cells to be defined by trait one, we just set move_distance = "T1" (in this case, the value will be floored since cells are discrete). With that in mind, we can run the following simulation.

sim <- run_farm_sim(mine_output              = mg,
                    N                        = 1000,
                    neutral_loci             = 1000,
                    xdim                     = 40,
                    ydim                     = 40,
                    repro                    = "sexual",
                    max_age                  = 5,
                    selfing                  = FALSE,
                    food_consume             = "T1",
                    pesticide_consume        = c("T2", 0),
                    food_needed_surv         = 1,
                    food_needed_repr         = 1,
                    reproduction_type        = "food_based",
                    pesticide_tolerated_surv = 0,
                    pesticide_rotation_type  = 2,
                    min_age_reproduce        = 3,
                    max_age_feed             = 2,
                    farms                    = 4,
                    time_steps               = 120,
                    mutation_pr              = 0.01,
                    crossover_pr             = 0.01,
                    net_mu_layers            = 0,
                    crop_rotation_time       = 4,
                    pesticide_rotation_time  = 4,
                    crop_per_cell            = 2,
                    pesticide_per_cell       = 0,
                    crop_number              = 1,
                    pesticide_number         = 2,
                    K_on_birth               = 1000000,
                    print_last               = TRUE
);

Note that the mine_output is just the whole vector output we got from the mine_gmatrix function. N is the initialsed number of individuals, xdim and ydim set the landscape dimensions (so we have a 40 by 40 square landscape). Individuals are sexual but cannot self. Note how food_consume = "T1", so the amount of food consumed by each individual will be determined by its value in trait 1 (had we set food_consume = 1, then all individuals would consume 1 unit of food; note that individuals cannot consume negative food). The amount of pesticide 1 consumed by an individual will be determined by "T2", while no individuals consume pesticide 2 (which can just be a trick for making some of the 4 farms not apply any pesticide – since anything applied has no effect). Note that we indicate one crop by crop_number = 1 and two pesticides by pesticide_number = 2 (the second one being irrelevant), and each cell applies 0 pesticide (pesticide_per_cell = 0), so this is actually a null model in which no pesticide is applied at all.

The argument K_on_birth limits the number of eggs that can be made in the population; if the population goes over, then eggs are randomly removed (this prevents the time consuming process of some individuals being initialised unnecessarily – if they were just going to die immediately anyway). The argument print_last indicates that we want to print all of the information (everything, genome and all) for each individual in the last generation. The resulting file is often in the 100s of MBs.

After the simulation runs, we get a file population_data.csv that gives information for each generation on the time step, population size, mean age, mean food consumed, mean pesticide consumed, mortality rate, mean food consumed of each type, mean pesticide consumed of each type, mean trait values, and mean inbreeding coefficient. Here’s a plot of population size, trait 1 value, trait 2 value, and mortality rate.

This makes sense (ignore the small spikes caused by the age structure). The population size increases, the trait 1 mean increases as individuals evolve to consume more food (which gives higher reproductive output), and trait 2 gets dragged along due to the correlation with trait 1 (despite not having any effect). Now we can see what happens when some of the farms start to apply pesticide by running the same model with pesticide_per_cell = 0.

Note that the population size grows more slowly now, and that while trait 1 underlying food consumption evolves to be positive, the value is not as high as it was without the pesticide. This is because trait 2 now evolves to be a negative value, and traits 1 and 2 are positively correlated.

The plot above shows traits 1 and 2 for all individuals in the population where no pesticide is applied. The correlation here is 0.3219144. We can compare this to a plot of the same traits for the simulation in which pesticide was applied.

Note the shifted position of the y-axis, in particular, as selection pushes pesticide consumption trait values down while pushing food consumption values up. Even though the network itself has not changed (i.e., standard random normals will still give individuals with traits of the original correlation structure), the correlation between trait 1 and 2 is now 0.0577854. This is interesting, to me at least. I’m also not entirely sure if I understand what is maintaining the variance in traits, nor did I expect the realised correlation to change so much just as a consequence of alleles being selected. Note that if you set net_mu_layers = 4 so that the entire genome can evolve, you get a strong negative covariance in 400 generations.

We can have a look how each trait is correlated with fitness, just measured by offspring production. Here is offspring production against the food consumption trait.

And here is offspring production against pesticide consumption trait.

The correlations are 0.0636287 and 0.0300173 for food consumption (trait 1) and pesticide consumption (trait 2) with offspring produced, respectively. This would seem to suggest that selection is quite weak after 120 generations? We could dive deeper into the mechanics of this if we wanted to, looking at how much food each individual consumed, exactly, and how much pesticide each individual consumed (and when, even, if needed). I’m tempted to let these run for a while longer to see if the correlation between traits drops to negative, and wondering if there is some reason for us to see a pattern of a ‘trade-on’ despite a hard coded loci to trait genome that causes a trade-off. Maybe there’s a reason to expect mutation of allele values alone to easily find a way to create this pattern, meaning that constraints are not as important as would be intuitive? I definitely need to think more about it.

A more complex simulation

I have now also looked through the following simulation scenarios in which there are four crops and two traits.

library(helicoverpa);
set.seed(Sys.time());
mg <- readRDS(file = "mg.rds");

sim <- run_farm_sim(mine_output              = mg,
                    N                        = 10000,
                    neutral_loci             = 1000,
                    xdim                     = 100,
                    ydim                     = 100,
                    repro                    = "biparental",
                    max_age                  = 10,
                    selfing                  = FALSE,
                    food_consume             = c("T1", "T2"),
                    pesticide_consume        = c("T3", "T4"),
                    food_needed_surv         = 1,
                    food_needed_repr         = 1,
                    reproduction_type        = "food_based",
                    pesticide_tolerated_surv = 0,
                    pesticide_rotation_type  = 2,
                    crop_rotation_type       = 2,
                    min_age_reproduce        = 8,
                    max_age_feed             = 6,
                    lambda_value             = 1.5,
                    farms                    = 16,
                    time_steps               = 600,
                    mutation_pr              = 0.01,
                    crossover_pr             = 0.01,
                    net_mu_layers            = 0,
                    crop_rotation_time       = 24,
                    pesticide_rotation_time  = 24,
                    crop_per_cell            = 2,
                    pesticide_per_cell       = 2,
                    crop_number              = 2,
                    pesticide_number         = 2,
                    K_on_birth               = 1000000,
                    min_age_move             = 8,
                    age_food_threshold       = 7,
                    min_age_feed             = 1,
                    pesticide_start          = 200,
                    );

These are a bit more high powered, but the general idea is that we now have four traits, two underlying crop consumption (one for each crop) and two underlying pesticide consumption (one for each pesticide). Here is what the covariance matrix was set to be.

##      [,1] [,2] [,3] [,4]
## [1,]  1.0 -0.5  0.2  0.2
## [2,] -0.5  1.0  0.2  0.2
## [3,]  0.2  0.2  1.0 -0.5
## [4,]  0.2  0.2 -0.5  1.0

Here is what the mining process came up with at some reasonable constraints.

##            [,1]       [,2]       [,3]       [,4]
## [1,]  0.8969052 -0.5597788  0.1942680  0.1290581
## [2,] -0.5597788  0.9774151  0.1447646  0.3630678
## [3,]  0.1942680  0.1447646  1.0671198 -0.5870404
## [4,]  0.1290581  0.3630678 -0.5870404  1.0399424

The general idea is that if you are good at consuming one crop, then you are probably not so good at consuming another crop. If you are good at consuming either crop, then you probably also imbibe a lot of pesticide. And if you are good at avoiding pesticide consumption in one trait, then you probably are in the other. I’ve run simulations above for all combinations of the following:

Reproductive type (asexual, sexual, and biparental)
No rotation of crops or pesticide
Only rotation of pesticide
Rotation of both crops and pesticide

When rotated, each of 16 farms independently selects a crop and a pesticide at random. Below, I’m just going to show plots of population size, mortality rate, and each of the four trait values over time. I’ve run all of these simulations for 300 generations, and I’m considering the first 100 as a burn in before the pesticide is added. The pesticide is then added for the first time on time step 210. I’ve indicated these time steps with red vertical lines on the plots below. Note that these are just single runs – we would need to run many replicates to make robust inferences.

No rotation: Asexual population

No rotation: Sexual population

No rotation: Biparental population

Pesticide rotation only: Asexual population

Pesticide rotation only: Sexual population

Pesticide rotation only: Biparental population

Crop and pesticide rotation: Asexual population

Crop and pesticide rotation: Sexual population

Crop and pesticide rotation: Biparental population

Initial conclusions

These simulations show some interesting dynamics, though we will need to think more carefully before running them at a larger scale. Despite having population sizes on the order of tens of thousands, these ran quite quickly, with most simulations finishing up within 10-20 minutes. It would not be difficult to run a huge number in 1-2 months if necessary. Annoyingly, when we increase by another order of magnitude, the pace is glacial, with me stopping after trying it after 72 hours and no more than 70 generations. We might be able to get a bit more power through some creative refactoring of the code, and a bit if I made a version entirely in C. The longest step by far is sexual reproduction and allocating new offspring with diploid genomes. Haploid genomes and asexual populations are much, much faster in comparison.

I am thinking that it might be worthwhile to monitor the allelic diversity in the population in some way. Individuals are initialised with unique alleles, which at first I feared my be a bit unrealistic, but the population size starts out small, so perhaps not so much. We can always allow more burn-in to be safe.

It is nice to recover the sudden drop given the application of pesticide followed by the evolution of resistance and increase in population soon after. The strong selection is also reflected in the evolution of traits such that both pesticide traits are negative (indicating resistance).

One thing that concerned me at first was that only one of the crop feeding traits is positive in each simulation. This means that the population is quickly specialising on one type of food resource. But this is, I think, exactly what we would predict from theory in an evolving population. Evolutionary branching should only happen under some specific circumstances, which we would need to parameterise explicitly if we wanted to recreate. Perhaps we need to have a bit weaker of a negative correlation for feeding, or model multiple specialist populations?

Update: 27 JAN 2020

Work Package 1C develops an individual-based model to investigate how landscape diversity affects the maintenance of susceptibility alleles, and will focus especially on the scale at which diversifying biopesticide strain use and crop species production can overcome pest resistance in the long term. In our prototype model written in R, we did this with a very simple genetic architecture that used matching alleles at a single locus to determine pesticide resistance and feeding ability across crops. While this was useful as an initial model to get a general idea of the scale at which different biopesticide application and crop production will need to be applied, the matching alleles model forces a strong negative genetic correlation between biopesticide resistance and feeding ability across crops. This is unrealistic, and a large part of this work package will be to try to understand the scale at which diversity is needed when the genetic architecture of pests is more realistic. A major modelling goal is to develop a method for constructing a more realistic genetic architecture for simulating pest management in silico, such that arbitrary and potentially evolving genetic correlations between pest traits (resistance, feeding efficiency) can be constructed. This entry is an initial plan for this genetic architecture; what follows is just the starting point, all subject to further revision.

When modelling the evolution of pest resistance and crop feeding ability, we need to get from pest genome to pest traits. Everything in between is the challenging bit, but these two end points are actually fairly straightfoward.

Genome can be modelled as a vector of real numbers. Each individual pest gets its own genome, which is essentially a very long vector in which each element is a single locus. There are relevant details to fill in later. For example, there are several ways to model how alleles work. One of the simplest ways is an infinite alleles model, in which each locus holds a real number of some arbitrary value. New mutations are then introduced by randomly selecting a new real number from some distribution. A slightly more challenging way (computationally) is to have some fixed number of discrete alleles that are possible for an individual to have (e.g., 100 alleles at each locus), but I do not think that this is the best way to go unless there is some compelling reason for it. Ploidy level can also be determined in advance; the simplest is haploid, but diploid organisms can be created simply by pairing vector elements (loci) and having rules for homozygosity and heterozygosity. In the end, however, we will have a long vector of numbers (1000s to 10000s of elements) with values that affect pest traits in some way, and are expected to change over the course of the simulation (evolution).
Traits can be modelled, again, as some vector of real numbers. Each individual gets their own values for a fixed number of ‘traits’ (e.g., sex, location, dispersal ability, resistance to a particular biopesticide, ability to feed on a particular crop). Any number of traits are possible, making the model as complex as it needs to be, but the idea is that each pest has its own set of values that make it unique. Some traits will not be directly related to the genome (e.g., location on the landscape). But most trait values, and especially bipesticide resistance and feeding ability, will be some function of the underlying genome values.

The challenge is to get from genome to traits in a way that allows traits to be quantitative and correlated in some realistic and potentially evolving way. Below is a starting point, though the details are likely to change. Feel free to leave comments here

A diploid genome will be constructed with \(N_{L}\) total loci, so that each pest genome will represented by a vector of length \(2L\). Vector elements will be able to take any real number, thereby modelling potentially infinite alleles. Allele values (i.e., the numbers at the loci) will be used in a function that ultimately determines the values of up to \(N_{T}\) pest traits. The simplest way to do this is to just have a single locus for a single trait with additive effects of alleles. For example, for a locus \(L_{i}\) with alleles \(A_{i1}\) and \(A_{i2}\), the value of \(T_{i} = 1/2(A_{i1} + A_{i2})\). More generally, any different rules for making a trait (e.g., complete or partial dominance) could also be modelled by setting a function \(T_{i} = f(A_{i1}, A_{i2})\). Similarly, we can also make traits polygenic by having them determined by multiple loci. For example, we might want five traits, each of which take values that are determined by 10 total loci in the genome (20 alleles). So, e.g., \(T_{1} = \sum_{i=1}^{10}(A_{i1} + A_{i2})\), \(T_{2} = \sum_{i=11}^{20}(A_{i1} + A_{i2})\), etc. This would give us five traits, each of which takes a value that is the sum of 20 individual real numbers within the genome.

One way to allow for correlated traits would be to have two separate trait values (e.g., \(T_{1}\) and \(T_{2}\)) be determined by some overlapping set of loci. Correlation between traits could be adjusted by adjusting the size of this overlap. For complete correlation, e.g., the two traits would be determined by the exact same loci. Negative correlations could be constructed by inverting the effects of loci such that \(L_{i} \to T_{1}\) but \(-L_{i} \to T_{2}\). Still, this means that the correlations must be set a priori, and sets some restrictions on the kinds of correlations that are possible (e.g., precision as a consequence of the total number of alleles that can be overlapping versus non-overlapping). Hence, I propose a slightly different construction, which will allow us to go from genome to traits in a way that allows us to set genetic correlations without needing to fix them a priori through some pre-specified mechanism. First, here is a simplified version of what I want to try.

The basic idea

Assume a large population of individuals with only two haploid loci \(L0\) and \(L1\), and two traits \(T0\) and \(T1\). There are four possible links between \(L_{i} \to T_{j}\), which we can call \(X_{ij}\) describing the link between \(L_{i}\) and \(T_{j}\): \(X_{00}\), \(X_{01}\), \(X_{10}\), and \(X_{11}\). Assume that \(X_{00} = 1\), \(X_{01} = 0.3\), \(X_{10} = 0\), and \(X_{11} = \sqrt{1 - 0.3^{2}}\). That is,

\[T_{0} = X_{00},\]

while,

\[T_{1} = \sqrt{1 - 0.3^{2}} X_{11} + 0.3 X_{01}.\]

If we randomly sample \(L0 \sim \mathcal{N}(0, 1)\) and \(L1 \sim \mathcal{N}(0, 1)\), then trait variances equal \(1\), while \(cov(T_{0}, T_{1}) = 0.3\). We can confirm this by simulating a population of \(N = 100000\) individuals.

N    <- 100000;
inds <- matrix(data = 0, nrow = N, ncol = 9);
colnames(inds) <- c("ID", "L0", "L1", "X00", "X01", "X10", "X11", "T0", "T1");
inds[,1]  <- 1:N;
inds[,2]  <- rnorm(n = N, mean = 0, sd = 1);
inds[,3]  <- rnorm(n = N, mean = 0, sd = 1);
inds[,4]  <- 1;
inds[,5]  <- 0.3;
inds[,6]  <- 0;
inds[,7]  <- sqrt(1 - 0.3^2);
inds[,8]  <- (inds[,2] * inds[,4]) + (inds[,3] * inds[,6]);
inds[,9]  <- (inds[,2] * inds[,5]) + (inds[,3] * inds[,7]);

So the individuals are in rows, while their attributes (IDs, genomes, interactions, traits) are in columns.

##      ID         L0         L1 X00 X01 X10       X11         T0         T1
## [1,]  1  1.2151489 -0.7894837   1 0.3   0 0.9539392  1.2151489 -0.3885747
## [2,]  2  1.1639301  0.6970085   1 0.3   0 0.9539392  1.1639301  1.0140827
## [3,]  3  0.6699372  0.3480979   1 0.3   0 0.9539392  0.6699372  0.5330454
## [4,]  4 -1.5380836 -1.0903007   1 0.3   0 0.9539392 -1.5380836 -1.5015056
## [5,]  5 -0.5447039 -0.2878480   1 0.3   0 0.9539392 -0.5447039 -0.4380007
## [6,]  6  1.5296328  0.9251678   1 0.3   0 0.9539392  1.5296328  1.3414436

We can look at the covariance between traits \(T0\) and \(T1\) in columns 8 and 9 (which might then code for ‘resistance ability’ or ‘feeding efficiency’, or something like it).

cov(inds[,8:9]);

##           T0        T1
## T0 1.0156514 0.3069984
## T1 0.3069984 1.0093656

This is roughly what we want as our variance covariance matrix. One really simple way to acheive this for five traits, for example, could be just to consider five loci (or blocks of some number of loci) \(L_{i}\) and all of the interactions between locus \(i\) and \(j\), \(X_{ij}\) to get to each trait.

In the figure above, five loci \(L_{i}\) are connected to each trait \(T_{j}\) by arrows \(X_{ij}\). Arrow values could be represented within the genome, as with the individual attributes "X00", "X01", "X10", and "X11" in the toy example above. For a given desired covariance matrix of trait values, calculations of \(X_{ij}\) become longer as trait number increases, but I should be able to write the code to get this for an arbitrary number of traits.

This is a simple way of doing things, but I think that it still leaves something to be desired. If we want the genetic covariances themselves to evolve (rather than just the traits via changing allele frequences in \(L_{i}\)), then we are forced to do it in a simplified way in which genetic covariances will almost inevitably evolve to become positive so that a ‘good’ value of \(T0\) and \(T1\) can be assured by a joint change with a change in \(L0\) by modifying \(X_{01}\) accordingly. In other words, any variation stuck into the genome that modifies \(X_{ij}\) values can be done so that there is directional selection increasing the fitness of \(T_{j}\) via change in \(L_{i}\), I think. What we really want is to tangle things so that changes along the path from alleles to traits is more complex. Below is my initial plan for increasing the complexity in such a way as to make the genetic architecture itself evolve a bit more realistically.

Adding some complexity

I propose the addition of multiple hidden layers beween loci and traits, effectively building something resembling a neural network (see here for video), but perhaps loosely interpreting these layers as steps along biochemical or developmental pathways from alleles to traits affecting fitness.

These hidden layers (blue) can now make each trait value a linear combination of all of the paths from each locus to each trait. For example, the value of \(L0\) can now get to \(T0\) in up to 100 different ways. Each link between nodes has its own value to modify the effect of a locus’ value on the end value of each trait. For example if the link from \(L0 \to H0 = 0.5\), \(H0 \to H0 = 0.5\), and \(H0 \to T0 = -1\), then this one complete path causes \(L0\) to decrease \(T0\) by \(L0(0.5 \times 0.5 \times -1) = -0.25L0\). This effect would then be added to the effects of all 99 other possible paths from \(L0\) to \(T0\) (I think it would probably make sense in the end to restrict values of edges to \(0 \leq e_{i} \leq 1\); we could also fix some edge values to be zero, perhaps giving a more realistic network structure).

This looks like a mess, but that’s kind of the point. Having many paths from loci to traits should give us the freedom to allow genetic covariance to evolve with some restrictions. As with the example above, we can have values for the 250 edges that form the paths from loci to traits, but the hidden layers give us more flexibility in setting evolutionary constraints.

We could, for example, allow e1 to e250 above to be represented by potentially changing values in an individual’s genome (through mutation, selection, and drift). Given this, selection across all L0-L9 and e1-e250 would almost certainly rapidly increase fitness of all traits. But we could also, e.g., block mutation from happening at any of the hidden layers after initialising, e.g., e1-e200 to random values (i.e., alleles would have polygenic effects that potentially lead to fitness tradeoffs, but only parts or proportions of the genetic architecture that we want to evolve actually could). Or we could change the initial genetic variation or mutation rates of edges relative to loci (e.g., make the genetic architecture itself evolve more or less rapidly). We could then ask several potentially interesting questions by controlling the extent to which genetic covariances can evolve and how processes such as drift might affect loci that underly traits versus loci that underly covariance among traits.

To initialise a desired genetic covariance matrix, an evolutionary algorithm could be used. Basically, this would iteratively tweak the hidden layer edge values and check for a better or worse fit to the desired trait covariance matrix. The advantage of this would be that we could tell a function to build a genetic architecture that confroms to a covariance matrix without forcing it to specify how (i.e., the genetic architecture would be ‘found’ rather than imposed a priori, reducing the number of assumptions that we need to make).

Alternatively, I think it might be reasonable to introduce selection on traits by modelling a process of sequential invasion. That is, initialise all traits at some random value, but don’t immediately impose selection. Hence, the traits and the genetic architecture underlying them could be random and selectively neutral for a burn-in period before introducing some sort of selective pressure (e.g., after 100 generations of burn-in, feeding efficiency gets turned on such that there is selection on feeding efficiency for one crop, \(T0\)). One by one, new selection pressures could arise. This could avoid having to make the assumption that the genetic architecture evolves as a consequence of early simultaneous selection for multiple traits – it’s more realistic to assume that some traits arise and evolve first (e.g., the ability to feed on one crop), then a new selective pressure arise after an evolved genome and genetic architecture already exists.

Update: 14 MAY 2019

Plans for the full helicoverpa R package

Given successful funding of the ENDORSE project, these notes will now continue in development of work package C1c of the project. The following very rough timeline is suggested.

General software planning (language use, file organisation): SEP 2019
Model planning (data structures, flow chart): AUG 2019
Discussions with colleagues, revision, final planning: OCT 2019
Ethical approval for software development: NOV 2019
Coding and R package development: JAN-JUN 2020
Code testing and simulations: JUL-SEP 2020
Submission to rOpenSci: OCT 2020
Publish R package on CRAN and GitHub: NOV 2020
Final simulations and write-up: JAN-MAR 2021

This timeline is subject to revision, but the initial steps are more detailed planning for the software and model, with input from colleagues across all work packages.

Update: 21 SEP 2018

Success with the graphical display

While I did not update this journal during its development very much, the graphical display looks really good, and received positive feedback from colleagues in Brazil.

Simulations for Phase II

While the toy model of the graphical display demonstrated the key points, a couple more functions are needed to do some initial simulations to get results to put in Phase II. First, new movement rules are needed for more realistic dispersal of Helicoverpa among cells. This is done now with the toy_pois_move_pest function. In the function below, for each pest, movement in a time step proceeds by (1) selecting how many jumps that the pest gets by sampling from a poisson distribution with mean dist, then moving up to dist cells away in any direction during each jump (uniform probability of all cells within dist of the original cell).

# Simulate Poisson movement of toys -- a nasty loop here -- faster in C
toy_pois_move_pest <- function(PEST, LAND, dist = 1){
    pest_n   <- dim(PEST)[1];
    xlim     <- dim(LAND)[2];
    ylim     <- dim(LAND)[1];
    distance <- rpois(n = pest_n, lambda = dist);
    for(pest in 1:pest_n){
        move <- distance[pest];
        while(move > 0){
            move_x  <- sample(x = -move:move, size = 1); 
            move_y  <- sample(x = -move:move, size = 1);
            PEST[pest, 3] <- PEST[pest, 3] + move_x;
            PEST[pest, 4] <- PEST[pest, 4] + move_y;
            move    <- move - 1;
        }
        while(PEST[pest, 3] < 1){
            PEST[pest, 3] <- PEST[pest, 3] + xlim;
        }
        while(PEST[pest, 3] > xlim){
            PEST[pest, 3] <- PEST[pest, 3] - xlim;
        }
        while(PEST[pest, 4] < 1){
            PEST[pest, 4] <- PEST[pest, 4] + ylim;
        }
        while(PEST[pest, 4] > ylim){
            PEST[pest, 4] <- PEST[pest, 4] - ylim;
        }
    }
    return(PEST);
}

We can check out the distribution of distance away from the original cell when the above function is implemented.

LAND     <- toy_initialise_land(xdim = 50, ydim = 50, pathogens = 1, crops = 1);
PEST_1   <- toy_initialise_pest(LAND, N = 10000);
PEST_2   <- toy_pois_move_pest(PEST_1, LAND, dist = 5);
diffx_1  <- (PEST_1[,3] - PEST_2[,3])^2;
diffx_2  <- (PEST_1[,3] - PEST_2[,3] + 50)^2; # Account for the torus
diffx_3  <- (PEST_1[,3] - PEST_2[,3] - 50)^2;
diffy_1  <- (PEST_1[,4] - PEST_2[,4])^2;
diffy_2  <- (PEST_1[,4] - PEST_2[,4] + 50)^2; # Account for the torus
diffy_3  <- (PEST_1[,4] - PEST_2[,4] - 50)^2;
diffx    <- rep(x = 0, times = 10000);
diffy    <- rep(x = 0, times = 10000); 
for(i in 1:10000){ # Assuming no one goes too far
    diffx[i] <- min(c(diffx_1[i], diffx_2[i], diffx_3[i]));
    diffy[i] <- min(c(diffy_1[i], diffy_2[i], diffy_3[i]));
}
dist_mv1 <- sqrt(diffx + diffy);
par(mar = c(5, 5, 2, 1));
hist(dist_mv1, main = "", col = "grey", cex.lab = 1.5, cex.axis = 1.5, 
     xlab = "Total distance travelled (km)", ylab = "Count", 
     ylim = c(0, 2500), xlim = c(0, 50));

The next part is to initialise the land in blocks so that we can control how big of an area is deadicated toward a paricular fungal biopesticide and crop area. We want to be able to control how many blocks are put down over the landscape. The code below does this, albeit in not the most efficient way.

toy_block_land <- function(xdim, ydim, pathogens, crops, block_len){
    cells     <- xdim * ydim;
    cell_left <- cells %% (block_len * block_len);
    if(cell_left > 0){
        stop("Landscape cannot be divided into squares of size 'block_len'");
    }
    LAND      <- array(data = 0, dim = c(xdim, ydim, 3));
    tx0       <- 1;
    tx1       <- block_len;
    ty0       <- 1;
    ty1       <- block_len;
    block_num <- 1;
    while(tx0 <= xdim & ty0 <= ydim){
        LAND[tx0:tx1, ty0:ty1, 1] <- block_num;
        tx0                       <- tx1 + 1;
        tx1                       <- tx1 + block_len;
        if(tx0 > xdim){
            tx0 <- 0;
            tx1 <- block_len;
            ty0 <- ty1 + 1;
            ty1 <- ty1 + block_len;
        }
        block_num <- block_num + 1;
    }
    block_num  <- block_num - 1;
    for(block in 1:block_num){
        rand_path <- sample(x = 1:pathogens, size = 1);
        rand_crop <- sample(x = 1:crops, size = 1);
        for(xx in 1:xdim){
            for(yy in 1:ydim){
                if(LAND[xx, yy, 1] == block){
                    LAND[xx, yy, 2] <- rand_path;
                    LAND[xx, yy, 3] <- rand_crop;
                }
            }
        }
    }
    return(LAND);
}

One reason I don’t mind all of the loops in the toy model is that they will need to be used when this is translated to C – there’s not really a way around it (I don’t think), so there’s no reason to worry too much about making everything elegant and avoiding R loops. It does take a while for landscapes to be initialised in the toy model, but not nearly as long as it takes most generations to run (at least, not ones with lots of pests). Here are some ways of using toy_block_land to make landscapes. I’ll show fungal biopesticide blocks being put down randomly below

Block length 1: Each cell is its own block, corresponding to applying fungal biopesticides in each 1 km by 1 km square on the landscape.

LAND1 <- toy_block_land(xdim = 100, ydim = 100, pathogens = 3, crops = 3,
                        block_len = 1);
par(mar = c(1, 1, 1, 1));
image(LAND1[,,2], xaxt = "n", yaxt = "n");

Block length 5: Each cell is its own block, corresponding to applying fungal biopesticides in each 2 km by 2 km square on the landscape.

LAND5 <- toy_block_land(xdim = 100, ydim = 100, pathogens = 3, crops = 3,
                        block_len = 5);
par(mar = c(1, 1, 1, 1));
image(LAND5[,,2], xaxt = "n", yaxt = "n");

Block length 10: Each cell is its own block, corresponding to applying fungal biopesticides in each 10 km by 10 km square on the landscape.

LAND10 <- toy_block_land(xdim = 100, ydim = 100, pathogens = 3, crops = 3,
                        block_len = 10);
par(mar = c(1, 1, 1, 1));
image(LAND10[,,2], xaxt = "n", yaxt = "n");

Block length 25: Each cell is its own block, corresponding to applying fungal biopesticides in each 2 km by 2 km square on the landscape.

LAND25 <- toy_block_land(xdim = 100, ydim = 100, pathogens = 3, crops = 3,
                        block_len = 25);
par(mar = c(1, 1, 1, 1));
image(LAND25[,,2], xaxt = "n", yaxt = "n");

Of course, for even the Phase II toy simulations, we’ll want a bigger landscape. I’ll try to do this for 1000 km by 1000 km and see how it goes.

Update: 27 AUG 2018

Update to transferring to JSON

The function below has now been modified, along with the outer function for simulating resistance.

toy_simulate_resistance <- function(generations = 20,       # Generations to sim
                                    xdim = 2,               # Land dimension 1
                                    ydim = 2,               # Land dimension 2
                                    pathogens = 1,          # Pathogen strains
                                    crops = 1,              # Crop species
                                    path_alleles = 3,       # Pathogen alleles
                                    crop_alleles = 3,       # Crop alleles
                                    pest_init = 2000,       # Initial pests 
                                    crop_rotate = "static", # Crops rotated
                                    path_rotate = "static", # Pathogens rotated
                                    pest_move_pr = 0.1,     # Pest movement
                                    pest_move_dist = 1,     # Pest move distance
                                    fecundity = 8,          # Offspring per fem
                                    cell_K = 2000           # K per cell
                                    ){
    
    if(pest_move_dist > xdim & pest_move_dist > ydim){
        pest_move_dist <- max(c(xdim, ydim)); # Avoids error
    }
    # Start initialising the landscape and pests
    LAND <- toy_initialise_land(xdim  = xdim, ydim = ydim, 
                                pathogens = pathogens, crops = crops);
    PEST <- toy_initialise_pest(LAND, N = pest_init, p_al = path_alleles, 
                                c_al = crop_alleles);
    # Start the generations
    PEST_DATA   <- NULL;
    LAND_DATA   <- NULL;
    gen         <- 1;
    while(gen < generations){
        LAND <- toy_set_crops(LAND, crops, crop_rotate);                        
        LAND <- toy_set_paths(LAND, pathogens, path_rotate);                    
        PEST <- toy_move_pest(PEST, LAND, pest_move_pr, pest_move_dist);        
        PEST <- toy_feed_pest(PEST, LAND);                                      
        if(toy_check_extinction(PEST, gen) == TRUE){ # Hate these if breaks here
            break;
        }
        PEST <- toy_kill_pest(PEST, LAND);
        if(toy_check_extinction(PEST, gen) == TRUE){
            break;
        }
        PEST <- toy_reproduce_pest(PEST, LAND, path_alleles, crop_alleles, 
                                   fecundity, cell_K);
        if(toy_check_extinction(PEST, gen) == TRUE){
            break;
        }
        PEST_DATA[[gen]] <- PEST;
        LAND_DATA[[gen]] <- LAND;
        gen <- gen + 1;
    }
    return(list(PEST_DATA = PEST_DATA, LAND_DATA = LAND_DATA));
}

results_to_json <- function(pest, land, printit = TRUE, filename = "sim.json"){
    if("package:jsonlite" %in% search() == FALSE){
        stop("Error: Need to load the R package 'jsonlite'")
    }
    inds   <- dim(pest)[1];
    p_geno <- rep(x = 0, times = inds);
    c_geno <- rep(x = 0, times = inds);
    path   <- rep(x = 0, times = inds);
    crop   <- rep(x = 0, times = inds);
    r_path <- rep(x = 0, times = inds);
    r_crop <- rep(x = 0, times = inds);
    for(i in 1:inds){ # Doing this a bit lazily, without refactoring the rest
        p_geno[i] <- as.numeric( paste(pest[i,5], pest[i,6], sep = ""));
        c_geno[i] <- as.numeric( paste(pest[i,7], pest[i,8], sep = ""));
        xloc      <- pest[i, 3];
        yloc      <- pest[i, 4];
        path[i]   <- land[xloc, yloc, 2];
        crop[i]   <- land[xloc, yloc, 3];
        if(pest[i,5] == path[i] | pest[i,5] == path[i]){
            r_path <- 1;
        }
        if(pest[i,7] == crop[i] | pest[i,8] == crop[i]){
            r_crop <- 1;
        }
    }
    data      <- matrix(data = 0, nrow = inds, ncol = 10);
    data[,1]  <- pest[,1];
    data[,2]  <- pest[,2];
    data[,3]  <- pest[,3];
    data[,4]  <- pest[,4];
    data[,5]  <- path;
    data[,6]  <- crop;
    data[,7]  <- p_geno;
    data[,8]  <- c_geno;
    data[,9]  <- r_path;
    data[,10] <- r_crop;
    colnames(data) <- c("ID", "sex", "xloc", "yloc", "path", "crop", 
                        "p_geno", "c_geno", "resist_path", "eat_crop");
    modsim <- list( traits = colnames(data), 
                    values = unname(apply(data, 1, 
                                          function(x) as.data.frame(t(x))))
    );
    sim_json <- toJSON(list(traits = names(modsim), values = modsim), 
                       pretty = TRUE);
    if(printit == TRUE){
        write(sim_json, filename);
    }
    return(sim_json);
}

We can run things the same way to generate the output. Note that the output from toy_simulate_resistance has changed, so now we need to read the list into results_to_json. The list contains the elements PEST_DATA and LAND_DATA.

library(jsonlite);

## 
## Attaching package: 'jsonlite'

## The following object is masked from 'package:shiny':
## 
##     validate

sim  <- toy_simulate_resistance(pathogens = 3, crops = 3);
json <- results_to_json(pest = sim$PEST_DATA[[19]], land = sim$LAND_DATA[[19]],
                        filename = "toy_model/data/sample_sim.json");

The output is here.

Transferring simulation results to json

I’ve written a short bit of code to link the simulation results to the code. The input for this is a single generation of the model (this could be easily changed). So if we run the toy_simulate_resistance function as before, we get a list with elements being each generation.

sim <- toy_simulate_resistance(crops = 3, pathogens = 3);

We can look at just the last generation below.

head(sim[[19]]);

##        [,1] [,2] [,3] [,4] [,5] [,6] [,7] [,8]
## [1,] 139435    0    1    1    1    1    3    3
## [2,] 139436    1    1    1    1    1    3    3
## [3,] 139437    1    1    1    1    1    3    3
## [4,] 139438    0    1    1    1    1    3    3
## [5,] 139439    0    1    1    1    1    3    3
## [6,] 139440    1    1    1    1    1    3    3

The code below sticks all of this into the json format (for the one generation), prints the json output to a file if printit == TRUE, and returns the json output.

results_to_json <- function(sim, printit = TRUE, filename = "sim.json"){
    if("package:jsonlite" %in% search() == FALSE){
        stop("Error: Need to load the R package 'jsonlite'")
    }
    colnames(sim) <- c("ID", "sex", "xloc", "yloc", "p_al_1", "p_al_2", 
                       "c_al_1", "c_al_2");
    modsim <- list( traits = colnames(sim), 
                    values = unname(apply(sim, 1, 
                                          function(x) as.data.frame(t(x))))
              );
    sim_json <- toJSON(list(traits = names(modsim), values = modsim), 
                   pretty = TRUE);
    if(printit == TRUE){
        write(sim_json, filename);
    }
    return(sim_json);
}

Using the function below to print and return the json. This relies on the jsonlite package, which must be installed and loaded.

library(jsonlite);
sim_json <- results_to_json(sim = sim[[19]], 
                            filename = "toy_model/data/sample_sim.json");

The sim_json output should be easy to modify as necessary. I’ve not added it here because it’s much too large for the notebook, but it can be accessed on the dev branch in the file sample_sim.json.

Update: 14 AUG 2018

More playing around with the toy model

Following up on yesterday, I have tweaked two functions to allow us to manipulate the number of alleles underlying pathogen resistance and crop feeding. In effect, this allows us to see how selection can sweep through to nearly fix an allele for resistance to a particular pathogen or for feeding on a particular crop (since only one allele is needed, other neutral alleles can hide in the heterozygotes). But, when crops are rotated, or pathogens are rotated, we have higher genetic diversity – note that we are making an implicit assumption of a strong negative genetic correlation; pests can only have two alleles for resistance, e.g., and if there are three pathogens, then all pests must necessarily be susceptible to 1-2 pathogens.

# ==============================================================================
# This is just a toy version of the model, for the presentation
# ==============================================================================
# Note that I'm adding 'toy' at the start of each function; this is just in case
# we actually want to keep some of these functions in the eventual R package, 
# but also don't want to worry about not being able to name functions like
# "initialise_land" the same here as elsewhere in the eventual bigger model.
# ==============================================================================
toy_simulate_resistance <- function(generations = 20,       # Generations to sim
                                    xdim = 2,               # Land dimension 1
                                    ydim = 2,               # Land dimension 2
                                    pathogens = 1,          # Pathogen strains
                                    crops = 1,              # Crop species
                                    path_alleles = 3,       # Pathogen alleles
                                    crop_alleles = 3,       # Crop alleles
                                    pest_init = 2000,       # Initial pests 
                                    crop_rotate = "static", # Crops rotated
                                    path_rotate = "static", # Pathogens rotated
                                    pest_move_pr = 0.1,     # Pest movement
                                    pest_move_dist = 1,     # Pest move distance
                                    fecundity = 8,          # Offspring per fem
                                    cell_K = 2000           # K per cell
                                    ){
    
    if(pest_move_dist > xdim & pest_move_dist > ydim){
        pest_move_dist <- max(c(xdim, ydim)); # Avoids error
    }
    # Start initialising the landscape and pests
    LAND <- toy_initialise_land(xdim  = xdim, ydim = ydim, 
                                pathogens = pathogens, crops = crops);
    PEST <- toy_initialise_pest(LAND, N = pest_init, p_al = path_alleles, 
                                c_al = crop_alleles);
    # Start the generations
    PEST_DATA   <- NULL;
    gen         <- 1;
    while(gen < generations){
        LAND <- toy_set_crops(LAND, crops, crop_rotate);                        
        LAND <- toy_set_paths(LAND, pathogens, path_rotate);                    
        PEST <- toy_move_pest(PEST, LAND, pest_move_pr, pest_move_dist);        
        PEST <- toy_feed_pest(PEST, LAND);                                      
        if(toy_check_extinction(PEST, gen) == TRUE){ # Hate these if breaks here
            break;
        }
        PEST <- toy_kill_pest(PEST, LAND);
        if(toy_check_extinction(PEST, gen) == TRUE){
            break;
        }
        PEST <- toy_reproduce_pest(PEST, LAND, path_alleles, crop_alleles, 
                                   fecundity, cell_K);
        if(toy_check_extinction(PEST, gen) == TRUE){
            break;
        }
        PEST_DATA[[gen]] <- PEST;
        gen <- gen + 1;
    }
    return(PEST_DATA);
}

summarise_pest_data <- function(PEST_DATA){
    # Density estimates
    den_list  <- unlist(lapply(PEST_DATA, dim));
    den_vect  <- den_list[c(TRUE, FALSE)];
    gens      <- length(PEST_DATA);
    p_alleles <- max(c(PEST_DATA[[1]][,5], PEST_DATA[[1]][,6]));
    c_alleles <- max(c(PEST_DATA[[1]][,7], PEST_DATA[[1]][,8])); 
    p_tabl    <- matrix(data = 0, nrow = gens, ncol = p_alleles);
    c_tabl    <- matrix(data = 0, nrow = gens, ncol = c_alleles);
    # Allele frequencies
    for(gen in 1:length(PEST_DATA)){
        total_p_alleles <- length(PEST_DATA[[gen]][,5:6]);
        total_c_alleles <- length(PEST_DATA[[gen]][,7:8]);
        for(allele in 1:p_alleles){
            allele_count         <- sum(PEST_DATA[[gen]][,5:6] == allele);
            p_tabl[gen, allele]  <- allele_count / total_p_alleles
        }
        for(allele in 1:c_alleles){
            allele_count         <- sum(PEST_DATA[[gen]][,7:8] == allele);
            c_tabl[gen, allele]  <- allele_count / total_c_alleles
        }
    }
    return(list(densities = den_vect, pathogen_fr = p_tabl, crop_fr = c_tabl));
}

I’m going to run the same analysis as yesterday, but now show how allele frequencies change over generations. I’ve not broken this down by landscape cell yet, but this will be easy to do (just let me know the format the data would be best presented in). In each case, now there will be 3 alleles underlying pathogen resistance, and 3 alleles underlying crop feeding.

sim1 <- toy_simulate_resistance(pathogens = 1, crops = 1, cell_K = 2000, 
                                pest_init = 2000, fecundity = 8, 
                                generations = 20, pest_move_pr = 0.1, 
                                crop_rotate = "static", path_rotate = "static");
dat1 <- summarise_pest_data(sim1);
plot(x = 1:length(dat1$densities), y = dat1$densities, type = "b", pch = 20, 
     lwd = 2, cex.lab = 1.5, cex.axis = 1.5, ylim = c(0, 8000), xlim = c(1, 19),
     xlab = "Generation", ylab = "Pest density");

plot(x = 1:length(dat1$pathogen_fr[,1]), y = dat1$pathogen_fr[,1], type = "b", 
     pch = 20, lwd = 2, cex.lab = 1.5, cex.axis = 1.5, ylim = c(0, 1), 
     xlim = c(1, 19), xlab = "Generation", ylab = "Allele frequency");
points(x = 1:length(dat1$pathogen_fr[,2]), y = dat1$pathogen_fr[,2], type = "b", 
     pch = 20, lwd = 2, col = "red");
points(x = 1:length(dat1$pathogen_fr[,3]), y = dat1$pathogen_fr[,3], type = "b", 
     pch = 20, lwd = 2, col = "blue");

Next, we can see what happens when we instead have 3 pathogens under otherwise identical conditions. In each generation, one of three pathogens is randomly applied to each landscape cell.

sim2 <- toy_simulate_resistance(pathogens = 3, crops = 1, cell_K = 2000, 
                                pest_init = 2000, fecundity = 8, 
                                generations = 20, pest_move_pr = 0.1, 
                                crop_rotate = "static", path_rotate = "random");
dat2 <- summarise_pest_data(sim2);
plot(x = 1:length(dat2$densities), y = dat2$densities, type = "b", pch = 20, 
     lwd = 2, cex.lab = 1.5, cex.axis = 1.5, ylim = c(0, 8000), xlim = c(1, 19),
     xlab = "Generation", ylab = "Pest density");

plot(x = 1:length(dat2$pathogen_fr[,1]), y = dat2$pathogen_fr[,1], type = "b", 
     pch = 20, lwd = 2, cex.lab = 1.5, cex.axis = 1.5, ylim = c(0, 1), 
     xlim = c(1, 19), xlab = "Generation", ylab = "Allele frequency");
points(x = 1:length(dat2$pathogen_fr[,2]), y = dat2$pathogen_fr[,2], type = "b", 
     pch = 20, lwd = 2, col = "red");
points(x = 1:length(dat2$pathogen_fr[,3]), y = dat2$pathogen_fr[,3], type = "b", 
     pch = 20, lwd = 2, col = "blue");

Next, we can see what happens when we instead have 3 pathogens and 3 crops under otherwise identical conditions. In each generation, one of three pathogens and one of three crops are randomly applied to each landscape cell.

sim3 <- toy_simulate_resistance(pathogens = 3, crops = 3, cell_K = 2000, 
                                pest_init = 2000, fecundity = 8, 
                                generations = 20, pest_move_pr = 0.1, 
                                crop_rotate = "random", path_rotate = "random");

## [1] "Extinction occurred in generation 12"

dat3 <- summarise_pest_data(sim3);
plot(x = 1:length(dat3$densities), y = dat3$densities, type = "b", pch = 20, 
     lwd = 2, cex.lab = 1.5, cex.axis = 1.5, ylim = c(0, 8000), xlim = c(1, 19),
     xlab = "Generation", ylab = "Pest density");

plot(x = 1:length(dat3$pathogen_fr[,1]), y = dat3$pathogen_fr[,1], type = "b", 
     pch = 20, lwd = 2, cex.lab = 1.5, cex.axis = 1.5, ylim = c(0, 1), 
     xlim = c(1, 19), xlab = "Generation", ylab = "Allele frequency");
points(x = 1:length(dat3$pathogen_fr[,2]), y = dat3$pathogen_fr[,2], type = "b", 
     pch = 20, lwd = 2, col = "red");
points(x = 1:length(dat3$pathogen_fr[,3]), y = dat3$pathogen_fr[,3], type = "b", 
     pch = 20, lwd = 2, col = "blue");

As is evident, pathogen and crop rotation decreases pest density and increases the diversity of alleles underlying pathogen resistance and crop feeding.

Update: 13 AUG 2018

Visual stuff looks good!

Rose has created a very nice visual display of the landscape, with sliders allowing for multiple crop species and pathogen strains.

Toy model prototype

I have managed to complete a toy model today, which should be sufficient to produce the simulated data needed to get the key ideas across. I’m going to dump the code below, which is located in the toy_model.R data file, not yet pushed on to master. We’ll need to merge the interface branch with the dev branch soon, which should be easy enough. The whole thing is in R, and includes all of the basic steps from the code structure mentioned on 30 JUL 2018, including mutation, recombination, dispersal, and pathogen and crop rotation (random or sequential). The output isn’t in that helpful of a form yet, but it will get there soon. To skip the many lines of code from the toy_model.R file, go here.

To quickly summarise, there are 12 parameters we can play with in the toy simulation model:

Number of generations to simulate (generations)
X dimension (longitude), i.e., cell number, on the landscape (xdim)
Y dimension (latitude), i.e., cell number, on the landscape (ydim)
Total number of pathogen strains that can be applied (pathogens)
Total number of crops that can be planted (crops)
Initial number of pests (pest_init)
How to rotate crops on the landscape (crop_rotate: where crop_rotate = "static" means don’t rotate between generations, crop_rotate = "rotate" means sequence through each crop one at a time over generations, and crop_rotate = "random" means apply a random crop in each generation – all done independently for each landscape cell)
How to rotate pathogen application on the landscape (path_rotate: where path_rotate = "static" means don’t rotate between generations, path_rotate = "rotate" means sequence through each pathogen one at a time over generations, and path_rotate = "random" means apply a random pathogen in each generation – all done independently for each landscape cell)
Probability that a pest will move to a new cell in the generation (pest_move_pr)
Distance a pest will move up to in any direction from its natal cell (pest_move_dist; note, the landscape is a torus, and there is therefore no edge – pests that move off of one side end up on the other)
Number of offspring each female pest produces (fecundity)
Carrying capacity of pests per cell (cell_K)

Default parameter values are shown below. Note that the model is an individual-based simulation with an extremely simple diploid genetic architecture with two loci (one for pathogens, and one for crops). If there are \(N\) pathogens, then there are \(N\) alleles for pathogen resistance, and pests must have at least one allele to resist (else they die). Similarly, if there are \(N\) crops, then there are \(N\) alleles for being able to eat crops, and pests must have at least one to eat (else they die).

toy_simulate_resistance <- function(generations = 20,       # Generations to sim
                                    xdim = 2,               # Land dimension 1
                                    ydim = 2,               # Land dimension 2
                                    pathogens = 1,          # Pathogen strains
                                    crops = 1,              # Crop species
                                    pest_init = 2000,       # Initial pests 
                                    crop_rotate = "static", # Crops rotated
                                    path_rotate = "static", # Pathogens rotated
                                    pest_move_pr = 0.1,     # Pest movement
                                    pest_move_dist = 1,     # Pest move distance
                                    fecundity = 8,          # Offspring per fem
                                    cell_K = 2000           # K per cell
                                    ){
    
    if(pest_move_dist > xdim & pest_move_dist > ydim){
        pest_move_dist <- max(c(xdim, ydim)); # Avoids error
    }
    # Start initialising the landscape and pests
    LAND <- toy_initialise_land(xdim  = xdim, ydim = ydim, 
                                pathogens = pathogens, crops = crops);
    PEST <- toy_initialise_pest(LAND, N = pest_init, p_al = pathogens, 
                                c_al = crops);
    # Start the generations
    PEST_DATA   <- NULL;
    gen         <- 1;
    while(gen < generations){
        LAND <- toy_set_crops(LAND, crops, crop_rotate);                        
        LAND <- toy_set_paths(LAND, pathogens, path_rotate);                    
        PEST <- toy_move_pest(PEST, LAND, pest_move_pr, pest_move_dist);        
        PEST <- toy_feed_pest(PEST, LAND);                                      
        if(toy_check_extinction(PEST, gen) == TRUE){ # Hate these if breaks here
            break;
        }
        PEST <- toy_kill_pest(PEST, LAND);
        if(toy_check_extinction(PEST, gen) == TRUE){
            break;
        }
        PEST <- toy_reproduce_pest(PEST, LAND, pathogens, crops, fecundity, 
                                   cell_K);
        if(toy_check_extinction(PEST, gen) == TRUE){
            break;
        }
        PEST_DATA[[gen]] <- PEST;
        gen <- gen + 1;
    }
    return(PEST_DATA);
}

summarise_pest_data <- function(PEST_DATA){
    # Density estimates
    den_list <- unlist(lapply(PEST_DATA, dim));
    den_list <- den_list[c(TRUE, FALSE)];
    return(den_list);
}

toy_check_extinction <- function(PEST, gen){
    if(is.vector(PEST) == TRUE){
        print(paste("Extinction occurred in generation", gen));
        return(TRUE);
    }
    if(dim(PEST)[1] < 4){
        print(paste("Extinction occurred in generation", gen));
        return(TRUE);
    }
    if(sum(PEST[,2] == 0) < 1 | sum(PEST[,2] == 1) < 1){
        print(paste("Extinction occurred in generation", gen));
        return(TRUE);
    }
    return(FALSE);
}

# Initialise the landscape
toy_initialise_land <- function(xdim = 2, ydim = 2, pathogens = 1, crops = 1){
    LAND      <- array(data = 0, dim = c(xdim, ydim, 3));
    p_values  <- sample(x = 1:pathogens, size = xdim * ydim, replace = TRUE);
    p_layer   <- matrix(data = p_values, ncol = xdim, nrow = ydim);
    c_values  <- sample(x = 1:crops, size = xdim * ydim, replace = TRUE);
    c_layer   <- matrix(data = c_values, ncol = xdim, nrow = ydim);
    LAND[,,2] <- p_layer; # Just occurred to me that we might want a layer for
    LAND[,,3] <- c_layer; # carrying capacity of the cell in LAND later
    return(LAND);
}

# Initialise some *very* simple pests. Each allele is just going to map one to
# one for whether a crop can be attacked or a pathogen resisted. I am not even
# going to add traits affecting dispersal in the toy version -- this can be
# uniform for all individuals
toy_initialise_pest <- function(LAND, N = 10, p_al = 1, c_al = 1){
    xdim     <- dim(LAND)[1]; # This is needed to put the pests on some
    ydim     <- dim(LAND)[2]; # landscape cell
    PEST     <- matrix(data = 0, nrow = N, ncol = 8);
    PEST[,1] <- 1:N;                                           # ID
    PEST[,2] <- sample(x = c(0, 1), size = N, replace = TRUE); # Sex
    PEST[,3] <- sample(x = 1:xdim,  size = N, replace = TRUE); # x-location
    PEST[,4] <- sample(x = 1:ydim,  size = N, replace = TRUE); # y-location
    PEST[,5] <- sample(x = 1:p_al,  size = N, replace = TRUE); # p allele 1
    PEST[,6] <- sample(x = 1:p_al,  size = N, replace = TRUE); # p allele 2
    PEST[,7] <- sample(x = 1:c_al,  size = N, replace = TRUE); # c allele 1
    PEST[,8] <- sample(x = 1:c_al,  size = N, replace = TRUE); # c allele 2
    return(PEST);
}

# Just going to make this a simple function at first, randomly changing crops
toy_set_crops <- function(LAND, crops = 1, type = "static"){
    if(crops == 1 | type == "static"){
        return(LAND);
    }
    if(type == "rotate"){
        old_crop                   <- LAND[,,3];
        new_crop                   <- old_crop + 1;
        new_crop[new_crop > crops] <- 1;
        LAND[,,3]                  <- new_crop;
    }else{ # Else it just randomises -- can think about fancier ways later
        xdim      <- dim(LAND)[1];
        ydim      <- dim(LAND)[2];
        new_cval  <- sample(x = 1:crops, size = xdim * ydim, replace = TRUE);
        new_crop  <- matrix(data = new_cval, nrow = xdim, ncol = ydim);
        LAND[,,3] <- new_crop;
    }
    return(LAND);
}

# Ditto here -- just a simple function changing pathogens
toy_set_paths <- function(LAND, paths = 1, type = "static"){
    if(paths == 1 | type == "static"){
        return(LAND);
    }
    if(type == "rotate"){
        old_path                   <- LAND[,,3];
        new_path                   <- old_path + 1;
        new_path[new_path > paths] <- 1;
        LAND[,,2]                  <- new_path;
    }else{ # Else it just randomises -- can think about fancier ways later
        xdim      <- dim(LAND)[1];
        ydim      <- dim(LAND)[2];
        new_pval  <- sample(x = 1:paths, size = xdim * ydim, replace = TRUE);
        new_path  <- matrix(data = new_pval, nrow = xdim, ncol = ydim);
        LAND[,,2] <- new_path;
    }
    return(LAND);
}

# Just a function to move pests at a given value, randomly on the landscape. The
# `prob` is just probability of moving in a time step, while `dist` is just the
# maximum distance moved in cells, in any direction.
toy_move_pest <- function(PEST, LAND, prob = 0.1, dist = 1){
    xdim                     <- dim(LAND)[1];
    ydim                     <- dim(LAND)[2];
    pests                    <- dim(PEST)[1];
    to_move                  <- rbinom(n = pests, size = 1, pr = prob);
    move_x                   <- sample(x = -dist:dist, size = pests, 
                                       replace = TRUE);
    move_y                   <- sample(x = -dist:dist, size = pests, 
                                       replace = TRUE);
    PEST[to_move == 1, 3]    <- PEST[to_move == 1, 3] + move_x[to_move == 1];
    PEST[to_move == 1, 4]    <- PEST[to_move == 1, 4] + move_y[to_move == 1];
    # The ifs below make a torus landscape so that pests don't move off of it
    # In C, this will be done with a loop that looks cleaner; the below is just
    # avoiding using a loop in R, though it would probably be fine.
    if(length(PEST[PEST[,3] < 1, 3]) > 0){ 
        PEST[PEST[,3] < 1, 3] <- PEST[PEST[,3] < 1, 3] + xdim;
    }
    if(length(PEST[PEST[,3] > xdim, 3]) > 0){
        PEST[PEST[,3] > xdim, 3] <- PEST[PEST[,3] > xdim, 3] - xdim;
    }
    if(length(PEST[PEST[,4] < 1, 4]) > 0){
        PEST[PEST[,4] < 1, 4] <- PEST[PEST[,4] < 1, 4] + ydim;
    }
    if(length(PEST[PEST[,4] > ydim, 4]) > 0){
        PEST[PEST[,4] > ydim, 4] <- PEST[PEST[,4] > ydim, 4] - ydim;
    }
    return(PEST);
}

# Need to now feed the pests and remove those that don't feed (crop unavailable)
toy_feed_pest <- function(PEST, LAND){
    # I'm just going to use a loop here, else matching to LAND cells is rough
    pests <- dim(PEST)[1];
    eaten <- rep(x = 0, times = pests); 
    for(i in 1:pests){
        x_loc <- PEST[i, 3];
        y_loc <- PEST[i, 4];
        food  <- LAND[x_loc, y_loc, 3];
        if(PEST[i, 7] == food | PEST[i, 8] == food){
            eaten[i] <- 1;
        }
    }
    PEST <- PEST[eaten == 1,]; # You don't eat, you don't live
    return(PEST);
}

# Then need to have pests resist pathogens, kill those that cannot
toy_kill_pest <- function(PEST, LAND){
    # I'm just going to use a loop here, else matching to LAND cells is rough
    pests    <- dim(PEST)[1];
    survived <- rep(x = 0, times = pests);
    for(i in 1:pests){
        x_loc <- PEST[i, 3];
        y_loc <- PEST[i, 4];
        patho <- LAND[x_loc, y_loc, 2];
        if(PEST[i, 5] == patho | PEST[i, 6] == patho){
            survived[i] <- 1;
        }
    }
    PEST <- PEST[survived == 1,]; # You don't eat, you don't live
    return(PEST);
}

# Then, reproduce pests that are left
toy_reproduce_pest <- function(PEST, LAND, pa, cr, births = 2, K = 100){
    x_dim           <- dim(LAND)[1];
    y_dim           <- dim(LAND)[2];
    offspring       <- NULL;
    total_offspring <- 0; # This and the below are just to avoid an rbind(),
    cell            <- 1; # which is a massive memory sink
    lst_ID          <- max(PEST[,1]);
    for(xloc in 1:x_dim){ # Double loop is just much cleaner here
        for(yloc in 1:y_dim){
            locals     <- which(PEST[,3] == xloc & PEST[,4] == yloc);
            local_offs <- NULL;
            if( length(locals) > 1 ){
                local_offs <- toy_breed_locals(PEST, locals, births, K, lst_ID);
            }
            if(length(local_offs) > 0){
                lst_ID            <- lst_ID + dim(local_offs)[1];
                total_offspring   <- total_offspring + dim(local_offs)[1];
                offspring[[cell]] <- local_offs;
            }else{
                offspring[[cell]] <- NULL;
            }
            cell              <- cell + 1;
        }
    }
    offspring <- build_new_pest(offspring, total_offspring, pa, cr);
    return(offspring);
}

# Need to recombine the genomes correctly in the offspring
toy_breed_locals <- function(PEST, locals, births, K, last_ID){
    loc_PEST <- PEST[locals,]; # Need two of each sex (allee effect)
    if(sum(loc_PEST[,2] == 0) < 2 | sum(loc_PEST[,2] == 1) < 2){
        return(NULL);
    }
    females       <- loc_PEST[loc_PEST[,2] == 0,];
    males         <- loc_PEST[loc_PEST[,2] == 1,];
    new_offs      <- dim(females)[1] * floor(births);  
    offspring     <- matrix(data = 0, nrow = new_offs, ncol = 8);     
    offspring[,1] <- (last_ID + 1):(last_ID + new_offs);
    offspring[,2] <- sample(x = c(0, 1), size = new_offs, replace = TRUE);
    offspring[,3] <- loc_PEST[1, 3];
    offspring[,4] <- loc_PEST[1, 4];
    # Below grabs all of the alleles from females and males
    p_fem_alleles <- c(females[,5], females[,6]);
    p_mal_alleles <- c(males[,5], males[,6]);
    c_fem_alleles <- c(females[,7], females[,8]);
    c_mal_alleles <- c(males[,7], males[,8]);
    # Now add them to the offspring randomly, one from female and one from male
    for(i in 1:new_offs){
        if(runif(n = 1) < 0.5){
            offspring[i, 5] <- sample(x = p_fem_alleles, size = 1);
            offspring[i, 6] <- sample(x = p_mal_alleles, size = 1);
        }else{
            offspring[i, 5] <- sample(x = p_mal_alleles, size = 1);
            offspring[i, 6] <- sample(x = p_fem_alleles, size = 1);            
        }
        if(runif(n = 1) < 0.5){
            offspring[i, 7] <- sample(x = c_fem_alleles, size = 1);
            offspring[i, 8] <- sample(x = c_mal_alleles, size = 1);
        }else{
            offspring[i, 7] <- sample(x = c_mal_alleles, size = 1);
            offspring[i, 8] <- sample(x = c_fem_alleles, size = 1);            
        }
    }
    if(dim(offspring)[1] > K){
        offspring <- offspring[1:K,];
    }
    return(offspring);
}

# Merge the different layers into a new pest array
build_new_pest <- function(offspring, total_offspring, pa, cr, mutation = 0.01){
    new_PEST   <- matrix(data = 0, nrow = total_offspring, ncol = 8);
    start_row  <- 1; # Again, avoids an rbind memory issue
    for(cell in 1:length(offspring)){
        if(length(offspring[[cell]]) > 0){
            cell_offs       <- dim(offspring[[cell]])[1];
            rows            <- start_row:(start_row + cell_offs - 1);
            new_PEST[rows,] <- offspring[[cell]];
            start_row       <- start_row + cell_offs;
        }
    }
    mu5 <- which(rbinom(n = total_offspring, size = 1, pr = mutation) == 1);
    mu6 <- which(rbinom(n = total_offspring, size = 1, pr = mutation) == 1);
    mu7 <- which(rbinom(n = total_offspring, size = 1, pr = mutation) == 1);
    mu8 <- which(rbinom(n = total_offspring, size = 1, pr = mutation) == 1);
    new_PEST[mu5, 5] <- sample(x = 1:cr, size = length(mu5), replace = TRUE);
    new_PEST[mu6, 6] <- sample(x = 1:cr, size = length(mu6), replace = TRUE);
    new_PEST[mu7, 7] <- sample(x = 1:pa, size = length(mu7), replace = TRUE);
    new_PEST[mu8, 8] <- sample(x = 1:pa, size = length(mu8), replace = TRUE);
    return(new_PEST);
}

Okay, so what can the above code do? I’ll get some fancy results later, but for now, let’s just look at many simulations across different pathogens and crops to see how it affects pest global density. Below I run and plot the situation with just one pathogen and one crop. Note that when there is only 1 pathogen or crop, set rotation to static.

sim1 <- toy_simulate_resistance(pathogens = 1, crops = 1, cell_K = 2000, 
                                pest_init = 2000, fecundity = 8, 
                                generations = 20, pest_move_pr = 0.1, 
                                crop_rotate = "static", path_rotate = "static");
dat1 <- summarise_pest_data(sim1);
plot(x = 1:length(dat1), y = dat1, type = "b", pch = 20, lwd = 2, cex.lab = 1.5,
     cex.axis = 1.5, ylim = c(0, 8000), xlim = c(1, 19), xlab = "Generation",
     ylab = "Pest density");

Next, we can see what happens when we instead have 3 pathogens under otherwise identical conditions. In each generation, one of three pathogens is randomly applied to each landscape cell.

sim2 <- toy_simulate_resistance(pathogens = 3, crops = 1, cell_K = 2000, 
                                pest_init = 2000, fecundity = 8, 
                                generations = 20, pest_move_pr = 0.1, 
                                crop_rotate = "static", path_rotate = "random");
dat2 <- summarise_pest_data(sim2);
plot(x = 1:length(dat2), y = dat2, type = "b", pch = 20, lwd = 2, cex.lab = 1.5,
     cex.axis = 1.5, ylim = c(0, 8000), xlim = c(1, 19), xlab = "Generation",
     ylab = "Pest density");

sim3 <- toy_simulate_resistance(pathogens = 3, crops = 3, cell_K = 2000, 
                                pest_init = 2000, fecundity = 8, 
                                generations = 20, pest_move_pr = 0.1, 
                                crop_rotate = "random", path_rotate = "random");

## [1] "Extinction occurred in generation 18"

dat3 <- summarise_pest_data(sim3);
plot(x = 1:length(dat3), y = dat3, type = "b", pch = 20, lwd = 2, cex.lab = 1.5,
     cex.axis = 1.5, ylim = c(0, 8000), xlim = c(1, 19), xlab = "Generation",
     ylab = "Pest density");

As is evident, pathogen and crop rotation decreases pest density. More on this later, including diversity results, but this demonstrates a working toy model. As the heterogeneity of pathogens and crops increases, pest resistance seems to go down.

Update: 30 JUL 2018

Some general thoughts about code structure

There are some high-level functions that will be useful to think about before coding; that is, functions that need to do something in sequence during simulation initialisation and then during each time step of the simulation. These functions will call subfunctions that run key routines – I think that all of these function, and the outer function (e.g., simulate_resistance, which runs through multiple time steps to test a particular set of parameter combinations), should be written in C for efficiency. An R function can call simulate_resistance and return the results to R, or an outer C function can also call it and print results to a text file (faster and easier with the computing cluster). The point is to have a single funtion that everything feeds to that does the work by calling multiple subfunctions – this workhorse function simulate_resistance will contain the main loop over \(N\) time_steps.

Prior to the loop, there will be some initialisation of the landscape LAND and the pests PEST. This will include all of the values in each layer LAND based on parameter initialisation. The functions initialise_land and initialise_pest are therefore called once at the start of the simulation; they will need to decide the distribution of landscape values and pests, and the sizes of each. It will not be necessary to initialise the PEST traits, only their genomes (based on allele and loci parameters, dominance, etc.) and non-genetic attributes such as location and ID (sex might or might not be based on genome). Before starting the big loop, the key structures of PEST and LAND need to otherwise be built. Any other key data structures will need to be initialised – I am thinking that the simulate_resistance function, or maybe some small other function, should send a vector of information so that everything can be stored handily, then individual parameter values can be set in the small outer function then compiled into the vector for ease of use.

After initialisation, one big loop will run the simulation over multiple time steps (some pseudocode below).

simulate_resistance(parameter_values){
    error_checks(parameter_values);
    initialise_land(LAND);
    initialise_pest(PEST);
    for(time_step = 0; time_step < time_max; time_step++){
        build_pest_traits(PEST, build_type);
        set_crops(LAND, crop_parameters);
        set_biopesticide(LAND, biopesticide_parameters);
        move_pest(PEST, move_parameters);
        feed_pest(PEST, LAND, feeding_parameters);
        kill_pest(PEST, LAND, death_parameters);
        reproduce_pest(PEST, evolution_parameters);
        collect_PEST_data(PEST);
        collect_LAND_data(LAND);
        summarise_time_step(PEST, LAND);
        status_check(PEST, LAND);
    }
    summarise_simulation();
}

Multiple events will occur in each time step, including (but not limited to) the following: The build_pest_traits() function will simply read in and out the PEST array, building the traits from the pest genome according to some build_type in each generation. The set_crops function will assign LAND cel layer values basedon on the crop parameter values underlying, e.g., spatial heterogeneity; similarly, set_biopesticide will affect LAND cell layer values. Then the move_pest function will cause the pests to move according to some sort of disperal rules set in parameter_values. Note that the landscape and the pests have not interacted in the time step yet (though we can imagine the landscape affecting pest movement rules), and will now do so in feed_pest, where they will affect crops, followed by kill_pest and reproduce_pest, where they will suffer mortality and reproduce as a consequence of their interactions. We might want to think about the order of these operations. After pest reproduction (which will include sex and genetic – but not trait – transmission), then summary data on the pests and landscape need to be collected (could even have an option to print everything out at this point, but it might be a lot of information) – followed by a sort of grand summary of the time step. When the whole simulation has finished, then the results can be printed.

Note also that some error checks and status checks will be useful, since we’ll want to make this function able to run from R. The error checks will just make sure nothing will crash the simulation (some of this might be better programmed in R instead of C), while the status_check might look to see, e.g., if pests have gone extinct, or have some sort of rule for termination of the simulation if something happens.

I suspect that the above structure will need rearranging and rethinking, but it’s a starting point. Subfunctions will be the difficult part, coding efficient ways of building traits from the genome, movement rules, reproduction, etc.

Update: 23 JUL 2018

I have attempted to set this notebook up with Disqus commenting enabled with the universal code options (embedded HTML), which will ideally allow everyone on the project to make comments on the notes.

Data structures

There are two data structures that we need to think about in particular, and these include the pests, which will be represented individually, and the landscape, which will be a large 3D array. The biopesticide will not be modelled discretely, but through a change in landscape values.

Pests

Pests will need to be represented by some sort of data structure. We could create a custom structure in C, but this would get messy if and when we decide to convert the model into an R package so that users can simulate by calling an R function. I really don’t see why pests cannot be represented by a large table, which is how I have always represented individuals in the past, and I already have the code for reading R tables into C. The way that this will work then is to have each individual represented by a different row in the table, with columns corresponding to individual traits. Some traits that will be important include:

Individual ID
Sex (female or male)
Traits 1-T: This includes traits such as dispersal ability, location (x and y), fecundity, and resistance to different biopesticides
Loci 1-L: This will be a diploid system with allele values (\(A^{v}_{i}\)) and dominance coefficients \(A^{h}_{v}\), the notation of which will likely change, but three double values will be used for each haploid loci: 1. allele number (\(i\)), value (\(v\)), and dominance (\(h\)). Since alleles are diploid, we will need \(6L\) double columns in total, which will be the last columns of the table (this is just easier)

The idea of each individual having its own ID and sex is fairly straightforward, but what I’m thinking we can do with the traits will be a bit different. We want the evolving genome to affect resistance through a evolving traits, such that the genome affects traits underlying resistance, which then affect the evolution of the genetic architecture itself. The idea then is that each trait will be modelled as a double real number that is ‘visible’ to selection – in most cases we will want the value of each trait will be a function of the genome, but this wouldn’t necessarily be a requirement (e.g., if we just wanted to fix dispersal ability to some value, we could set it to whatever we desire in the relevant row and column – this will also be useful for location on the landscape, which will be represented by two values x and y). We could then write a specific modular function build_traits, which loops through each individual and derives a single number from the individual’s genome, which is then placed in the relevant trait column (1-T). Trait columns can by dynamic – we can have anywhere from 1 to T of them, and the value of T will need to be stored somewhere – better, if in the code we define ID and Sex as traits, we can just have loci start at T (or T + 1 in R).

By having a modular build_traits function, we can keep the code flexible enough to have different ways of deriving single trait values for any given genome. The genome itself is probably easiest to model by breaking it down into L blocks of six, taking up 6L total columns of the 2D pest array. The first three values of a block will be the number (1), value (2), and dominance (3) for an allele, while the next three values will be the same for a homologous allele at the locus. The dominance values could be compared to calculate which allele has a stronger effect, similar to Duthie and Reid (2016, code), and this could be pre-set or randomised in some way for different allele types. The values could mean whatever we want them to, but a good starting point would be to think of them as additively effecting one or more trait as in Duthie, Bocedi, et al. (2018, code). We could of course just have the allele numbers in the genome, with a separate lookup table to match them with values and dominance coefficients, but I actually think this would be slower and more difficult to code. The build_traits function will loop through all of the loci, gathering in the relevant values to build each trait – a sub-function (e.g., calc_trait) could do this for an individual trait, identifying where and what in the genome should be used to make a single trait such as resistance to a particular biopesticide. So, for example, the pest data structure for one (non ID or Sex) trait and one locus would look like this:

simple_pests <- matrix(data = 0, nrow = 4, ncol = 9);
colnames(simple_pests) <- c("ID", "Sex", "Trait_1", "L1A1_i", "L1A1_v", 
                            "L1A1_h", "L1A2_i", "L1A2_v", "L1A2_h");
simple_pests[,1] <- 1:4;
simple_pests[,2] <- c(0, 1, 0, 1);
simple_pests[,3] <- runif(n = 4);
simple_pests[,4] <- sample(x = 1:2, size = 4, replace = TRUE);
simple_pests[,7] <- sample(x = 1:2, size = 4, replace = TRUE);
print(simple_pests);

##      ID Sex    Trait_1 L1A1_i L1A1_v L1A1_h L1A2_i L1A2_v L1A2_h
## [1,]  1   0 0.71220863      2      0      0      2      0      0
## [2,]  2   1 0.81876411      1      0      0      2      0      0
## [3,]  3   0 0.26913112      1      0      0      1      0      0
## [4,]  4   1 0.01149918      2      0      0      1      0      0

I’ve not included the allele values or dominance coefficients for alleles 1 and 2. Note that in the columns ‘L1’ refers to locus 1, and ‘A1’ and ‘A2’ refer to alleles 1 and 2, respectively. As mentioned earlier, these will need to be represented by doubles in C, and the table could get quite big (well over 1000 columns, and potentially tens of thousands of rows). The benefit of doing all this is that we can potentially use multiple different types of infection models, as desired (e.g., gene for gene, quantative, threshold).

Landscape

The landscape will be modelled using a three dimensional array. We want space to be explicit, and the easiest way to do this is to have the first two dimensions of the array correspond to the x and y locations on a region of landscape. Interactions can happen at the scale of landscape cells (array elements), and distance between cells in any location can allow for explicit spatial processes (typically, count the number of cells in any of 8 directions, including diagonally, but other neighbour distances could be used, or cells could be hexagonal if there is good reason to do it). Having different cells with different values allows for landscape level heterogeneity (could also make values autocorrelated as desired as in Duthie and Falcy (2013)). Perhaps each cell starts with having some ID for the farm, as in the case of the 8 by 8 landscape below (actually x and y dimensions will be a variable in the model – I usually use 100 by 100, with a torus landscape that wraps around itself to avoid edge effects).

land_IDs <- matrix(data = 0, nrow = 8, ncol = 8);
land_IDs[1:2, 1:8] <- 1;
land_IDs[3:6, 1:5] <- 2;
land_IDs[3:8, 6:8] <- 3;
land_IDs[7:8, 1:3] <- 4;
land_IDs[7:8, 4:5] <- 5;
print(land_IDs);

##      [,1] [,2] [,3] [,4] [,5] [,6] [,7] [,8]
## [1,]    1    1    1    1    1    1    1    1
## [2,]    1    1    1    1    1    1    1    1
## [3,]    2    2    2    2    2    3    3    3
## [4,]    2    2    2    2    2    3    3    3
## [5,]    2    2    2    2    2    3    3    3
## [6,]    2    2    2    2    2    3    3    3
## [7,]    4    4    4    5    5    3    3    3
## [8,]    4    4    4    5    5    3    3    3

So each of five farms or types of farms (or something else) are represented in the landscape. But we need lots of pieces of information for each cell, including what is planted on the cell, local environmental variables, and biopesticide applied. To get this, we can then add multiple layers on the z axis of a 3D array. So the landscape layer above is just the ‘top’, with multiple layers beneath coding critical information in a land array. Assume, for now that we need three layers (we’ll actually need a lot more): (1) land_IDs, (2) crop used, (3) biopesticide 1 applied.

land      <- array(data = 0, dim = c(8, 8, 3));
land[,,1] <- land_IDs;
land[,,2] <- sample(x = 1:3, size = 64, replace = TRUE);
land[,,3] <- round(x = runif(n = 64, min = 0, max = 1), digits = 3);
print(land);

## , , 1
## 
##      [,1] [,2] [,3] [,4] [,5] [,6] [,7] [,8]
## [1,]    1    1    1    1    1    1    1    1
## [2,]    1    1    1    1    1    1    1    1
## [3,]    2    2    2    2    2    3    3    3
## [4,]    2    2    2    2    2    3    3    3
## [5,]    2    2    2    2    2    3    3    3
## [6,]    2    2    2    2    2    3    3    3
## [7,]    4    4    4    5    5    3    3    3
## [8,]    4    4    4    5    5    3    3    3
## 
## , , 2
## 
##      [,1] [,2] [,3] [,4] [,5] [,6] [,7] [,8]
## [1,]    3    3    1    1    2    2    2    3
## [2,]    1    3    3    3    2    1    2    3
## [3,]    3    1    1    2    1    1    2    2
## [4,]    3    3    3    2    1    3    1    2
## [5,]    3    2    2    2    3    2    1    1
## [6,]    2    3    3    1    1    2    1    1
## [7,]    1    1    1    1    3    1    1    1
## [8,]    3    1    2    2    1    3    2    2
## 
## , , 3
## 
##       [,1]  [,2]  [,3]  [,4]  [,5]  [,6]  [,7]  [,8]
## [1,] 0.534 0.804 0.478 0.874 0.622 0.997 0.469 0.939
## [2,] 0.548 0.905 0.414 0.611 0.552 0.085 0.286 0.137
## [3,] 0.879 0.180 0.539 0.184 0.240 0.718 0.851 0.760
## [4,] 0.370 0.090 0.808 0.779 0.616 0.430 0.047 0.979
## [5,] 0.647 0.492 0.475 0.569 0.701 0.469 0.958 0.081
## [6,] 0.568 0.152 0.027 0.284 0.249 0.707 0.007 0.061
## [7,] 0.784 0.611 0.784 0.826 0.199 0.820 0.590 0.246
## [8,] 0.904 0.203 0.312 0.641 0.709 0.764 0.107 0.118

The details aren’t so important for now; what matters is the idea that we can represent information about landscape cells dynamically using array layers, which can therefore correspond to any number of things that we want to affect pests. The above structures can be initialised in R and diverted to a toy model that could help accomplish the short-term goals of the project, but also diverted to C to do the more intense simulatoin involving complex genetic architecture (we’ll want a standalone C version too for speed and high-performance cluster computing). The toy model is probably the fist step – to have something loop over time steps showing that something (e.g., allelic diversity) is maintained under variable landscapes, but is not when landscapes are homogenous (or when dispersal is infinite). Probably best to simplify the genetics as much as possible (e.g., 1 locus, fewer than 5 alleles) for this, then return some numbers that can be extracted from these data structures to build a graphic.

I see these two structures (PEST and LAND) as being the two most important ones to get right early on because these are the ones that will be used in almost every part of the model. The code will otherwise be written modularly, meaning that if we don’t like the way that we’re, e.g., deriving pest traits from their genetic architecture, or setting dispersal rules, or modelling sex, then we can simply rewrite the specific function that does these things and plug it back into the model. With this, some list or vector is also needed to hold parameter values (PARA), at lesat from the transition from R to C.

Update: 18 JUL 2018

I have now initialised this lab notebook for the helicoverpa R package and model.

Comments

References

Duthie, A Bradley, Greta Bocedi, Ryan R Germain, and Jane M Reid. 2018. “Evolution of pre-copulatory and post-copulatory strategies of inbreeding avoidance and associated polyandry.” Journal of Evolutionary Biology. https://doi.org/10.1111/jeb.13189.

Duthie, A Bradley, Jeremy J Cusack, Isabel L Jones, Erlend B Nilsen, Rocio A Pozo, O. Sarobidy Rakotonarivo, Bram Van Moorter, and Nils Bunnefeld. 2018. “GMSE: an R package for generalised management strategy evaluation.” Methods in Ecology and Evolution. https://doi.org/10.1101/221432.

Duthie, A Bradley, and Matthew R Falcy. 2013. “The influence of habitat autocorrelation on plants and their seed-eating pollinators.” Ecological Modelling 251: 260–70. https://doi.org/10.1016/j.ecolmodel.2012.12.019.

Duthie, A Bradley, and Jane M Reid. 2016. “Evolution of inbreeding avoidance and inbreeding preference through mate choice among interacting relatives.” American Naturalist 188 (6): 000–000. https://doi.org/10.1086/688919.

Hamblin, Steven. 2013. “On the practical usage of genetic algorithms in ecology and evolution.” Methods in Ecology and Evolution 4 (2): 184–94. https://doi.org/10.1111/2041-210X.12000.

The resevol package

Brad Duthie and Rose McKeon

Contents:

Project updates:

The `mine_gmatrix` function

The `run_farm_sim` function

A simple example of the simulation

A more complex simulation

Initial conclusions

Data structures

Comments

References

The resevol package

Brad Duthie and Rose McKeon

Contents:

Project updates:

The mine_gmatrix function

The run_farm_sim function

A simple example of the simulation

A more complex simulation

Initial conclusions

Data structures

Comments

References

The `mine_gmatrix` function

The `run_farm_sim` function