This notebook tracks progress on the development of the GMSE R package for game-theoretic management strategy evaluation, and on related issues surrounding the development and application of game theory for addressing questions of biodiversity and food security.


Contents:

Project updates

Towards a Game-theoretic Management Strategy Evaluation (G-MSE)

General model development

General software development

Game-theory modelling (game.c; green box above)

Game-theory and modelling

Notes regarding Nilsen’s MSE

Some side-notes that might be of use

Potentially relevant conferences and workshops

References consulted and annotated (Mendeley)

References cited


Project updates:

Update: 15 MAR 2019

GMSE v0.4.0.11 is now available on GitHub. Some minor changes and bug fixes are included in the updated version.

I will update once more when GMSE v0.4.0.11 is successfully submitted to CRAN.

Update: 21 AUG 2018

GMSE v0.4.0.7 is now available on GitHub, and the official GMSE repository is now transferred to the ConFooBio organisation. I am currently in the process of fixing some website issues, but most of the important stuff has transferred to the new website location.

Update: 16 MAY 2018

GMSE v0.4.0.3 is now available on CRAN. A new website for GMSE has also been launched. This website was built with the R package pkgdown, recently released on CRAN. The site contains all of the vignettes and documentation for GMSE, and also includes a link to this lab notebook. A submission of the accompanying manuscript will soon be uploaded on bioRxiv.

Update: 14 MAY 2018

A new GMSE v0.4.0.3 has now been pushed to the master branch on GitHub and has been submitted to CRAN. The biggest update in this new version is a series of vignettes, plus a minor improvement to the genetic algorithm. More updates will follow soon, including some re-organisation of the GMSE project and a new manuscript submission.

Update: 13 APR 2018

I have re-worked the way that a manager re-estimates how the change in their policy affects users’ actions. The new new_act function in the genetic algorithm (game.c) performs well for getting more precise cost settings. The former way of doing it was much more of a blunt instrument, and it had a ceiling issue – that is, the manager would believe that higher costs caused fewer actions even when the resulting cost was over the users’ budgets.

/* =============================================================================
 * This function updates an action based on the change in costs & paras
 *     old_cost: The old cost of an action, as calculated in policy_to_counts
 *     new_cost: The new cost of an action, as calculated in policy_to_counts
 *     old_act:  The number of actions performed at the old cost
 *     paras: Vector of global parameters
 * ========================================================================== */
int new_act(double old_cost, double new_cost, double old_act, double *paras){
    
    int total_acts;
    double users, max_to_spend, acts_per_user, cost_per_user, total_cost;
    double pr_on_act, budget_for_act, mgr_budget, min_cost;
    
    users        = paras[54] - 1; /* Minus one for the manager */
    min_cost     = paras[96];     /* Minimum cost of an action */
    max_to_spend = paras[97];     /* Maximum per user budget   */
    mgr_budget   = paras[105];    /* Manager's total budget    */
    
    total_cost    = 0.0;
    if(old_cost < mgr_budget){
        total_cost    = old_act * old_cost; /* Total cost devoted to action */
    }

    cost_per_user = (total_cost / users); /* Cost devoted per user */
    pr_on_act     = cost_per_user / max_to_spend;    /* Pr. devoted to action */

    /* Assume that the proportion of the budget a user spends will not change */
    budget_for_act = max_to_spend * pr_on_act;

    /* Calculate how many actions to expect given acts per user and users */
    acts_per_user  = budget_for_act / (new_cost + min_cost);  
    total_acts     = (int) (users * acts_per_user); /* Truncate to whole acts */

    return(total_acts);
}

This new way of assessing how users will act is now the function to be run in the background of all manager genetic algorithms. Very nicely, this also resolves an annoyance with the maximum allowed budgets. Previously, it was unclear why maximum budgets greater than 10000 were causing problems (managers were making bad predictions). I have now set the maximum budget to an order of magnitude higher, and there are no longer any apparent issues. A new version of GMSE will soon have this update.
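As a rough check of the arithmetic in new_act, the sketch below restates the same calculation with explicit arguments in place of the paras vector. The function name predict_acts and its argument list are hypothetical and are not part of GMSE; they are used here only so that the example stands alone.

```c
/* Hypothetical restatement of new_act with explicit arguments (not GMSE code)
 *     users:        Number of users (manager already excluded)
 *     min_cost:     Minimum cost of an action
 *     max_to_spend: Maximum per user budget
 *     mgr_budget:   Manager's total budget
 */
int predict_acts(double old_cost, double new_cost, double old_act,
                 double users, double min_cost, double max_to_spend,
                 double mgr_budget){

    double total_cost, cost_per_user, pr_on_act, budget_for_act, acts_per_user;

    if(users <= 0.0 || max_to_spend <= 0.0){
        return 0; /* Guard against division by zero in this sketch */
    }

    total_cost = 0.0;
    if(old_cost < mgr_budget){
        total_cost = old_act * old_cost; /* Total cost devoted to action */
    }

    cost_per_user  = total_cost / users;           /* Cost devoted per user */
    pr_on_act      = cost_per_user / max_to_spend; /* Pr. devoted to action */
    budget_for_act = max_to_spend * pr_on_act;     /* Budget share held fixed */
    acts_per_user  = budget_for_act / (new_cost + min_cost);

    return (int) (users * acts_per_user);
}
```

With 4 users, a minimum cost of 10, and budgets of 1000, a total of 100 old actions at a cost of 20 amounts to a spend of 2000, or 500 per user. Raising the cost to 40 gives 500 / (40 + 10) = 10 actions per user, so predict_acts(20, 40, 100, 4, 10, 1000, 1000) returns 40.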

Update: 21 DEC 2017

New Issue #40: Age distribution bump

Running simulations using gmse_apply, jeremycusack noticed a small but sharp decline in population size at a generation equal to the maximum age of resources in the population (a maximum age of 20 was used). This decline is caused by the initial seed of resources having a uniform age distribution. In the first generation, these resources produce offspring that all have an age of zero, leading to an age structure with many zero-age individuals and a uniform distribution of ages greater than zero. The initial seed of individuals with random ages died gradually, but enough individuals in the initial offspring cohort made it to the maximum age for their deaths to have a noticeable effect in decreasing population size (i.e., all of these resources died on the maximum_age + 1 time step).

This effect can be avoided entirely given sufficient burn-in generations of a model, and is less of a problem when the maximum age is low because this allows the age distribution to stabilise sooner. Further, using gmse_apply can avoid the issue by directly manipulating resource ages after the initial generation. Nevertheless, it would be useful to have a default age distribution other than a uniform distribution.

One way to do this would be to find the age (\(A\)) at which a resource is expected to be alive with a probability of \(0.5\), after accounting for mortality (\(\mu\)). This is simply calculated below:

\((1 - \mu)^A = 0.5\)

The above can be re-arranged to find \(A\),

\(A = \frac{\log(0.5)}{\log(1 - \mu)}\).

Note that we could use a switch function (or something like it in R) to make \(A = 0\) when \(\mu = 1\), and revert to a uniform distribution if \(\mu = 0\) (though this should rarely happen).
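As a worked illustration, the cases can be written out together, and a mortality of \(\mu = 0.1\) (an arbitrary value chosen for the example, not a GMSE default) can be plugged in:

\(A = \begin{cases} 0, & \mu = 1 \\ \frac{\log(0.5)}{\log(1 - \mu)}, & 0 < \mu < 1 \end{cases}\)

\(A = \frac{\log(0.5)}{\log(0.9)} \approx \frac{-0.6931}{-0.1054} \approx 6.58\)

In this example, roughly half of a cohort would be expected to still be alive between ages 6 and 7.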

The value of \(\mu\) would depend on res_death_type, and be processed in make_resource, which is used in both gmse and gmse_apply. If res_death_type = 1 (density independent, rarely used), then \(\mu\) is simply equal to remove_pr. If res_death_type = 2 (density dependent), then \(\mu\) could be found perhaps using something like the following:

mu = (RESOURCE_ini * lambda) / (RESOURCE_ini + RESOURCE_ini * lambda)

This would get a value that is at least proportional to the expected mortality rate of a resource (if res_death_type = 3, then we could use the sum of types 1 and 2). Overall, the documentation should perhaps recommend finding a stable set of age distributions for a particular set of parameter combinations when using gmse_apply (i.e., through simulation), then using that distribution as an initial condition. But something like the above could probably get close to whatever the stable age distribution would be, at least close enough to make the decline in population size trivial.
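One thing worth noting about the expression for mu is that RESOURCE_ini cancels, so under this proposal \(\mu\) would depend only on lambda. Writing \(R\) for RESOURCE_ini and \(\lambda\) for lambda, and taking \(\lambda = 0.3\) purely as an illustrative value:

\(\mu = \frac{R \lambda}{R + R \lambda} = \frac{\lambda}{1 + \lambda}\)

\(\mu = \frac{0.3}{1.3} \approx 0.231, \quad A = \frac{\log(0.5)}{\log(1 - 0.231)} \approx 2.64\)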

I will start to consider some of the above as a potential default for the next version of GMSE. The best way to do this is probably to look at how code from the res_remove function in the resource.c file can best be integrated into a function called by the R function make_resource (i.e., either use the equations, or estimates of them, or somehow call res_remove directly).

Update: 13 DEC 2017

Improved convergence criteria

I have introduced and then immediately resolved Issue #39.

The convergence criterion has now been fixed with commit f598d8e52b47ef2017cac13d09aac1fb7aa6b506. To do this, I re-configured some of the genetic algorithm code into easier-to-read functions for checking the fitness increase. Two separate ways of checking the increase in fitness from one genetic algorithm generation to the next now exist: one for managers and one for users. This is needed because user fitness values are greater than zero and increase as their utility is maximised, but manager fitness values are less than zero and increase toward zero as their utility is maximised. The genetic algorithm now checks for a percentage improvement in fitness.

Now the default value of converge_crit equals 1, which means it does actually play a role sometimes (or is expected to). The genetic algorithm will continue until the percent increase in fitness from the previous generation is less than one percent. In practice, this doesn’t noticeably affect much, but it does allow better strategies to be found more quickly, and without having to play with ga_mingen to find them under extreme parameter settings (e.g., huge budgets and rapid shifts in abundance).
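The two fitness checks might look something like the sketch below. This is an assumption about the general form of the calculation, not the code from the commit above, and the function names user_pct_increase, manager_pct_increase, and ga_converged are hypothetical. Dividing by the magnitude of the old fitness gives a positive percentage for an improvement whether fitness is positive (users) or negative and rising toward zero (managers).

```c
/* Percent increase in fitness for users, whose fitness values are positive */
double user_pct_increase(double old_fit, double new_fit){
    if(old_fit <= 0.0){
        return 100.0; /* No valid baseline: treat as not yet converged */
    }
    return 100.0 * (new_fit - old_fit) / old_fit;
}

/* Percent increase for managers, whose fitness is negative and rises to zero */
double manager_pct_increase(double old_fit, double new_fit){
    if(old_fit >= 0.0){
        return 0.0; /* Already at the maximum of zero: nothing left to gain */
    }
    return 100.0 * (new_fit - old_fit) / (-1.0 * old_fit);
}

/* Returns 1 once the percent increase falls below converge_crit, else 0 */
int ga_converged(double pct_increase, double converge_crit){
    return pct_increase < converge_crit;
}
```

For example, a user whose fitness moves from 100 to 100.5 has improved by 0.5 percent, which is below a converge_crit of 1, so the algorithm would stop. A manager whose fitness moves from -200 to -150 has improved by 25 percent, so it would continue.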

The new fix has now been checked and built with win-builder into v0.3.2.0, but I am leaving this on the development branch for now in anticipation of other potential improvements to be made soon.

Update: 1 NOV 2017

CRAN ready GMSE v0.3.1.7 – more flexibility, better error messages

I have now completed some substantial coding of error messages, which will be called in both gmse and gmse_apply. Essentially, these provide some help to software users who parameterise their models in a way that does not work with GMSE. For example, if the parameter stakeholders is set to equal a negative number, an error message will be returned that informs the user that at least one stakeholder is required in the model. These error messages become a bit more important in gmse_apply, where it is possible for users to include arguments that don’t make sense (e.g., arrays of incorrect dimensions, or arguments that contradict one another).

The function gmse_apply has also been improved to make looping it easier. What had been happening during testing was that we were finding it all too easy to crash R by reading in parameters that contradicted one another (e.g., setting the landscape dimensions through land_dim_1 and land_dim_2 caused a crash when also trying to add in a LAND of different dimension – now this returns an error that LAND and land_dim_1 disagree about landscape size). This has been resolved in two ways. First, I have included many error messages meant to catch bad and contradictory arguments in gmse_apply (and, to a lesser extent, gmse); it is still possible to crash R by setting things incorrectly, but you have to work very hard to do it – i.e., it almost has to be deliberate, as far as I can tell. Second, I have added the argument old_list to gmse_apply, which is FALSE by default, but can instead take the output of a previous full list return of gmse_apply (where get_res = "Full"). An element of the full list includes the basic output from which key parameters can be pulled. As a reminder, the basic gmse_apply output looks like the below.

$resource_results
[1] 1062

$observation_results
[1] 680.2721

$manager_results
         resource_type scaring culling castration feeding help_offspring
policy_1             1      NA     110         NA      NA             NA

$user_results
        resource_type scaring culling castration feeding help_offspring tend_crops kill_crops
Manager             1      NA       0         NA      NA             NA         NA         NA
user_1              1      NA       9         NA      NA             NA         NA         NA
user_2              1      NA       9         NA      NA             NA         NA         NA
user_3              1      NA       9         NA      NA             NA         NA         NA
user_4              1      NA       9         NA      NA             NA         NA         NA

An example of gmse_apply used in a loop is below.

to_scare <- FALSE;
sim_old  <- gmse_apply(scaring = to_scare, get_res = "Full", stakeholders = 6);
sim_sum  <- matrix(data = NA, nrow = 20, ncol = 7);
for(time_step in 1:20){
    sim_new               <- gmse_apply(scaring = to_scare, get_res = "Full", 
                                        old_list = sim_old);
    sim_sum[time_step, 1] <- time_step;
    sim_sum[time_step, 2] <- sim_new$basic_output$resource_results[1];
    sim_sum[time_step, 3] <- sim_new$basic_output$observation_results[1];
    sim_sum[time_step, 4] <- sim_new$basic_output$manager_results[2];
    sim_sum[time_step, 5] <- sim_new$basic_output$manager_results[3];
    sim_sum[time_step, 6] <- sum(sim_new$basic_output$user_results[,2]); 
    sim_sum[time_step, 7] <- sum(sim_new$basic_output$user_results[,3]); 
    sim_old               <- sim_new;
    print(time_step);
}
colnames(sim_sum) <- c("Time", "Pop_size", "Pop_est", "Scare_cost", 
                       "Cull_cost", "Scare_count", "Cull_count");

The output sim_sum is shown below.

      Time Pop_size   Pop_est Scare_cost Cull_cost Scare_count Cull_count
 [1,]    1      733  839.0023         NA       110          NA         54
 [2,]    2      768  702.9478         NA       110          NA         54
 [3,]    3      824  725.6236         NA       110          NA         54
 [4,]    4      933  907.0295         NA       110          NA         54
 [5,]    5     1180  816.3265         NA       110          NA         54
 [6,]    6     1345 1224.4898         NA        10          NA        426
 [7,]    7     1114 1269.8413         NA        10          NA        425
 [8,]    8      820  884.3537         NA       110          NA         54
 [9,]    9      952  793.6508         NA       110          NA         54
[10,]   10     1101  884.3537         NA       110          NA         54
[11,]   11     1299 1111.1111         NA        12          NA        402
[12,]   12     1079  907.0295         NA       110          NA         54
[13,]   13     1227 1564.6259         NA        10          NA        431
[14,]   14      934  839.0023         NA       110          NA         54
[15,]   15     1065 1133.7868         NA        10          NA        423
[16,]   16      768  725.6236         NA       110          NA         54
[17,]   17      869  929.7052         NA       110          NA         54
[18,]   18      949  907.0295         NA       110          NA         54
[19,]   19     1049  884.3537         NA       110          NA         54
[20,]   20     1200 1020.4082         NA        64          NA         90

We can take advantage of gmse_apply to dynamically change parameter values mid-loop. For example, the code below is the same as above, but with a policy of scaring switched on at the end of time step 10, so that it first takes effect in time step 11.

to_scare <- FALSE;
sim_old  <- gmse_apply(scaring = to_scare, get_res = "Full", stakeholders = 6);
sim_sum  <- matrix(data = NA, nrow = 20, ncol = 7);
for(time_step in 1:20){
    sim_new               <- gmse_apply(scaring = to_scare, get_res = "Full", 
                                        old_list = sim_old);
    sim_sum[time_step, 1] <- time_step;
    sim_sum[time_step, 2] <- sim_new$basic_output$resource_results[1];
    sim_sum[time_step, 3] <- sim_new$basic_output$observation_results[1];
    sim_sum[time_step, 4] <- sim_new$basic_output$manager_results[2];
    sim_sum[time_step, 5] <- sim_new$basic_output$manager_results[3];
    sim_sum[time_step, 6] <- sum(sim_new$basic_output$user_results[,2]); 
    sim_sum[time_step, 7] <- sum(sim_new$basic_output$user_results[,3]); 
    sim_old               <- sim_new;
    if(time_step == 10){
        to_scare <- TRUE;
    }
    print(time_step);
}
colnames(sim_sum) <- c("Time", "Pop_size", "Pop_est", "Scare_cost", 
                       "Cull_cost", "Scare_count", "Cull_count");

The above simulation results in the following output for sim_sum.

      Time Pop_size   Pop_est Scare_cost Cull_cost Scare_count Cull_count
 [1,]    1      745  657.5964         NA       110          NA         54
 [2,]    2      805 1111.1111         NA        12          NA        400
 [3,]    3      473  634.9206         NA       110          NA         54
 [4,]    4      504  566.8934         NA       110          NA         54
 [5,]    5      577  498.8662         NA       110          NA         54
 [6,]    6      600  430.8390         NA       110          NA         54
 [7,]    7      648  612.2449         NA       110          NA         54
 [8,]    8      714  702.9478         NA       110          NA         54
 [9,]    9      813  612.2449         NA       110          NA         54
[10,]   10      914 1020.4082         NA        64          NA         90
[11,]   11     1011 1179.1383         57        10          49        301
[12,]   12      858  725.6236         10       110         193         37
[13,]   13     1011 1043.0839         37        30           0        198
[14,]   14      989 1043.0839         57        30           0        198
[15,]   15      983 1065.7596         48        20          10        270
[16,]   16      851  839.0023         10       110         193         37
[17,]   17      962 1111.1111         38        12          58        306
[18,]   18      783  612.2449         10       110         193         37
[19,]   19      862  816.3265         10       110         193         37
[20,]   20      963  702.9478         10       110         182         38

Hence, in addition to all of the other benefits of gmse_apply, one new feature is that we can use it to study change in policy availability – in this case, what happens when scaring is introduced as a possible policy option. Similar things can be done, for example, to see how manager or user power changes over time. In the example below, users’ budgets increase by 100 every time step, with the manager’s budget remaining the same. The consequence appears to be decreased population stability and a higher likelihood of extinction.

ub          <- 500;
sim_old     <- gmse_apply(get_res = "Full", stakeholders = 6, user_budget = ub);
sim_sum     <- matrix(data = NA, nrow = 20, ncol = 6);
for(time_step in 1:20){
    sim_new               <- gmse_apply(get_res = "Full", old_list = sim_old,
                                        user_budget = ub);
    sim_sum[time_step, 1] <- time_step;
    sim_sum[time_step, 2] <- sim_new$basic_output$resource_results[1];
    sim_sum[time_step, 3] <- sim_new$basic_output$observation_results[1];
    sim_sum[time_step, 4] <- sim_new$basic_output$manager_results[3];
    sim_sum[time_step, 5] <- sum(sim_new$basic_output$user_results[,3]);
    sim_sum[time_step, 6] <- ub;
    sim_old               <- sim_new;
    ub                    <- ub + 100;
    print(time_step);
}
colnames(sim_sum) <- c("Time", "Pop_size", "Pop_est", "Cull_cost", "Cull_count",
                       "User_budget");

The output of sim_sum is below.

      Time Pop_size   Pop_est Cull_cost Cull_count User_budget
 [1,]    1     1215 1405.8957        10        292         500
 [2,]    2     1065 1224.4898        10        336         600
 [3,]    3      833  680.2721       110         36         700
 [4,]    4      936  907.0295       110         42         800
 [5,]    5     1174 1224.4898        10        401         900
 [6,]    6      887  521.5420       110         54        1000
 [7,]    7      988  680.2721       110         60        1100
 [8,]    8     1084  975.0567       110         60        1200
 [9,]    9     1208  861.6780       110         66        1300
[10,]   10     1360 1133.7868        10        520        1400
[11,]   11      975  861.6780       110         78        1500
[12,]   12     1079 1156.4626        10        560        1600
[13,]   13      597  770.9751       110         90        1700
[14,]   14      595  476.1905       110         96        1800
[15,]   15      586  612.2449       110        102        1900
[16,]   16      584  770.9751       110        108        2000
[17,]   17      557  589.5692       110        114        2100
[18,]   18      519  521.5420       110        120        2200
[19,]   19      469  521.5420       110        120        2300
[20,]   20      430  453.5147       110        126        2400

There is an important note to make about changing arguments to gmse_apply when old_list is being used: The function gmse_apply is trying to avoid a crash, so the function will accommodate parameter changes by rebuilding data structures if necessary. For example, if you change the number of stakeholders (and by including an argument stakeholders to gmse_apply, it is assumed that stakeholders are changing even if they are not), then a new array of agents will need to be built. If you change landscape dimensions (or just include the argument land_dim_1 or land_dim_2), then a new landscape will be built. This is mentioned in the documentation.

GMSE v0.3.1.7 passes all CRAN checks in RStudio. I will make sure that the code works with win-builder, then prepare the new submission. Alternatively, as always, the newest GMSE version can be downloaded through GitHub if you have devtools installed in R.

devtools::install_github("bradduthie/GMSE")

I will soon update the manuscript for GMSE and upload it to bioRxiv.

Update: 23 OCT 2017

Bug fix concerning density-based estimation

An error with density-based resource estimation (observe_type = 0) at very high values of agent_view was identified by Jeremy. When managers had a view of the landscape that encompassed a number of cells that was calculated to be larger than the actual number of landscape cells (as defined by land_dim_1 * land_dim_2), the manager would underestimate actual population size. This occurred only in the manager.c file and not in the equivalent R function shown during plotting. The bug was fixed in commit a916b8f8a40041b5f08984cf73348108482dde59 with a simple if statement. This has therefore been resolved in a patched GMSE v0.3.1.3, which is now available on GitHub.
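The kind of check involved can be sketched as below. This is not the code from manager.c: the function name density_estimate is hypothetical, the number of viewed cells is taken as an input rather than derived from agent_view, and the scaling of the count by the ratio of total cells to viewed cells is an assumption about how a density-based estimate works in general.

```c
/* Hypothetical sketch of a density-based estimate with the viewed area capped
 *     observed:     Number of resources counted within the manager's view
 *     cells_viewed: Number of cells the view is calculated to cover
 *     land_dim_1:   Landscape dimension 1
 *     land_dim_2:   Landscape dimension 2
 */
double density_estimate(double observed, double cells_viewed,
                        int land_dim_1, int land_dim_2){

    double total_cells;

    total_cells = (double) land_dim_1 * (double) land_dim_2;

    if(cells_viewed > total_cells){ /* View cannot exceed the landscape */
        cells_viewed = total_cells;
    }
    if(cells_viewed <= 0.0){
        return 0.0; /* Nothing viewed, so nothing to scale up */
    }

    return observed * (total_cells / cells_viewed);
}
```

On a 100 by 100 landscape, 500 resources counted over a view calculated to cover 12000 cells would be scaled by 10000 / 12000 without the cap, giving an underestimate of about 417. With the cap, the estimate is 500.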

Update: 19 OCT 2017

Bug fix concerning resource movement

An error with the res_move_obs parameter was identified by Jeremy. This parameter was supposed to only affect resource movement during observation, but an if statement (corrected in commit 5eeb88d285af57984171e7d72410659b3b441af3) was causing resources to stop moving entirely in the resource model when res_move_obs = FALSE. This has now been resolved in a patched GMSE v0.3.1.1, which is now available on GitHub.

New option for removal of resources

A new option has been included for the argument res_death_type. By setting res_death_type = 3 in gmse or gmse_apply, resources can experience both density dependent (caused by res_death_K) and density independent (caused by remove_pr) removal simultaneously. Effects of each are independent of one another (i.e., both processes occur simultaneously, so the calculation of population size affecting removal due to carrying capacity includes resources that might experience density independent mortality).

Update: 16 OCT 2017

New group_think parameter in GMSE v0.3.1.0

A new group_think parameter has been developed by Jeremy and me, and included into an updated v0.3.1.0. This parameter is defined as FALSE by default, but when set to be TRUE will cause all users to act as a single block instead of independently. In the code, what happens is that a single user (user ID number 2) runs through the genetic algorithm, but then instead of having the resulting actions apply to only this user, they apply to all users so that the genetic algorithm only needs to be run once in the user model. This decreases simulation time, particularly when there are a lot of users to model, but at a cost of removing all variation in actions among users. The group_think parameter can be defined in both gmse() and gmse_apply(), but I have not added it as an option in gmse_gui().
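The copying step can be sketched as below. The flattened row-major table, the assumption that row 0 holds the manager and row 1 holds user ID number 2, and the function name copy_group_think are all hypothetical and are only meant to show the idea; the action arrays in GMSE itself are structured differently.

```c
/* Hypothetical sketch: copy the actions of one user to all other users
 *     actions: Flattened agent-by-action table (row-major)
 *     agents:  Number of rows, with row 0 assumed to be the manager
 *     cols:    Number of action columns per agent
 */
void copy_group_think(int *actions, int agents, int cols){
    int agent, col;

    /* Row 1 is the user that ran the genetic algorithm (user ID number 2) */
    for(agent = 2; agent < agents; agent++){
        for(col = 0; col < cols; col++){
            actions[agent * cols + col] = actions[cols + col];
        }
    }
}
```

Because the genetic algorithm is run for only one row, the cost of the user model in this sketch no longer grows with the number of users, apart from the cheap copy.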

Update: 13 OCT 2017

GMSE v0.3.0.0 now available with gmse_apply

The gmse_apply function is now available in a new GMSE version 0.3.0.0 (minor tweaks to other functions have also been made, but nothing that changes the user experience of gmse – mostly typos corrected in the documentation). The new function allows software users to integrate their own submodels (resource, observation, manager, and user) into GMSE, or to use their own submodels entirely within a single function.


GMSE apply function

The gmse_apply function is a flexible function that allows for user-defined sub-functions calling resource, observation, manager, and user models. Where such models are not specified, GMSE submodels ‘resource’, ‘observation’, ‘manager’, and ‘user’ are run by default. Any type of sub-model (e.g., numerical, individual-based) is permitted as long as the input and output are appropriately specified. Only one time step is simulated per call to gmse_apply, so the function must be looped for simulation over time. Where model parameters are needed but not specified, defaults from gmse are used.

gmse_apply arguments


Example uses of gmse_apply

A simple run of gmse_apply() will return one generation of gmse using default submodels and parameter values.

sim <- gmse_apply();

For sim, the default ‘basic’ results are returned as below.

$resource_results
[1] 1102

$observation_results
[1] 1179.138

$manager_results
       scaring culling castration feeding help_offspring
policy      NA      10         NA      NA             NA

$user_results
        resource_type scaring culling castration feeding help_offspring tend_crops kill_crops
Manager             1      NA       0         NA      NA             NA         NA         NA
user_2              1      NA      70         NA      NA             NA         NA         NA
user_3              1      NA      75         NA      NA             NA         NA         NA
user_4              1      NA      69         NA      NA             NA         NA         NA
user_5              1      NA      74         NA      NA             NA         NA         NA

Note in the case above we have the total abundance of resources returned, the estimate of resource abundance from the observation function, the costs the manager sets for the only available action of culling, and the number of culls attempted by each user.

The above was produced by all of the individual-based functions that are default in GMSE; custom generated subfunctions can instead be included provided that they fit the specifications described above. For example, we can define a very simple logistic growth function to send to res_mod instead.

alt_res <- function(X, K = 2000, rate = 1){
    X_1 <- X + rate*X*(1 - X/K);
    return(X_1);
}

The above function takes in a population size of X and returns a value X_1 based on the population intrinsic growth rate rate and carrying capacity K. Iterating the logistic growth model by itself under default parameter values with a starting population of 100 will cause the population to increase to carrying capacity in roughly 7 generations. The function can be substituted into gmse_apply to use it instead of the default GMSE resource model.

sim <- gmse_apply(res_mod = alt_res, X = 100, rate = 0.3);

The gmse_apply function will find the parameters it needs to run the alt_res function in place of the default resource function, either by running the default function values (e.g., K = 2000) or values specified directly into gmse_apply (e.g., X = 100 and rate = 0.3). If an argument to a custom function is required but not provided either as a default or specified in gmse_apply, then an error will be returned.
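Returning to the logistic growth model itself, the claim that a starting population of 100 approaches carrying capacity in roughly 7 generations can be checked by iterating the function directly. The sketch below is a C translation of alt_res made for this check only; the helper iterate_alt_res is hypothetical, and the R defaults K = 2000 and rate = 1 must be passed explicitly.

```c
/* C translation of the alt_res function: one step of logistic growth */
double alt_res(double X, double K, double rate){
    return X + rate * X * (1.0 - X / K);
}

/* Iterate alt_res for a given number of generations */
double iterate_alt_res(double X, double K, double rate, int generations){
    int gen;

    for(gen = 0; gen < generations; gen++){
        X = alt_res(X, K, rate);
    }
    return X;
}
```

Starting from 100 with K = 2000 and rate = 1, the population sizes over the first seven generations are approximately 195, 371, 673, 1120, 1613, 1925, and 1997.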

To integrate across different types of submodels, gmse_apply translates between vectors and arrays between each submodel. For example, because the default GMSE observation model requires a resource array with particular requirements for column identities, when a resource model subfunction returns a vector, or a list with a named element ‘resource_vector’, this vector is translated into an array that can be used by the observation model. Specifically, each element of the vector identifies the abundance of a resource type (and hence will usually be just a single value denoting abundance of the only focal population). If this is all the information provided, then a resource_array will be made with default GMSE parameter values with an identical number of rows to the abundance value (floored if the value is a non-integer; non-default values can also be put into this transformation from vector to array if they are specified in gmse_apply, e.g., through an argument such as lambda = 0.8). Similarly, a resource_array is also translated into a vector after the default individual-based resource model is run, should the observation model require simple abundances instead of an array. The same is true of observation_vector and observation_array objects returned by observation models, of manager_vector and manager_array (i.e., COST) objects returned by manager models, and of user_vector and user_array (i.e., ACTION) objects returned by user models. At each step, a translation between the two is made, with necessary adjustments that can be tweaked through arguments to gmse_apply when needed. Alternative observation, manager, and user submodels, for example, are defined below; note that each requires a vector from the preceding model.

# Alternative observation submodel
alt_obs <- function(resource_vector){ 
    X_obs <- resource_vector - 0.1 * resource_vector;
    return(X_obs);
}

# Alternative manager submodel
alt_man <- function(observation_vector){
    policy <- observation_vector - 1000;
    if(policy < 0){
        policy <- 0;
    }
    return(policy);
}

# Alternative user submodel
alt_usr <- function(manager_vector){
    harvest <- manager_vector + manager_vector * 0.1;
    return(harvest);
}

All of these submodels are completely deterministic, so when run with the same parameter combinations, they produce replicable outputs.

gmse_apply(res_mod = alt_res, obs_mod = alt_obs, 
           man_mod = alt_man, use_mod = alt_usr, X = 1000);

The above, for example, produces the following output (Note that the X argument needs to be specified, but the rest of the subfunctions take vectors that gmse_apply recognises will become available after a previous submodel is run).

$resource_results
[1] 1500

$observation_results
[1] 1350

$manager_results
[1] 350

$user_results
[1] 385
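These values can be checked by hand from the four functions, using the alt_res defaults of K = 2000 and rate = 1 and the supplied X = 1000:

\(X_1 = 1000 + 1 \times 1000 \times (1 - 1000/2000) = 1500\)

\(X_{obs} = 1500 - 0.1 \times 1500 = 1350\)

\(policy = 1350 - 1000 = 350\)

\(harvest = 350 + 0.1 \times 350 = 385\)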

Note that the manager_results and user_results are ambiguous here, and can be interpreted as desired – e.g., as total allowable catch and catches made, or as something like costs of catching set by the manager and effort put into catching by the user. Hence while manager output is set in terms of costs of performing each action, and user output is set in terms of action attempts, this need not be the case when using gmse_apply (though it should be recognised when using default GMSE manager and user functions).

GMSE default submodels can be added in at any point.

gmse_apply(res_mod = alt_res, obs_mod = observation, 
           man_mod = alt_man, use_mod = alt_usr, X = 1000)

The above produces the results below.

$resource_results
[1] 1500

$observation_results
[1] 1655.329

$manager_results
[1] 655.3288

$user_results
[1] 720.8617

If we wanted to, for example, specify a simple resource and observation model, but then take advantage of the genetic algorithm to predict policy decisions and user actions, we could use the default GMSE manager and user functions (written below explicitly, though this is not necessary).

gmse_apply(res_mod = alt_res, obs_mod = alt_obs, 
           man_mod = manager, use_mod = user, X = 1000)

The above produces the output below returning culling costs and culling actions attempted by four users (note that the default manager target abundance is 1000).

$resource_results
[1] 1500

$observation_results
[1] 1350

$manager_results
       scaring culling castration feeding help_offspring
policy      NA      10         NA      NA             NA

$user_results
        resource_type scaring culling castration feeding help_offspring tend_crops kill_crops
Manager             1      NA       0         NA      NA             NA         NA         NA
user_2              1      NA      70         NA      NA             NA         NA         NA
user_3              1      NA      70         NA      NA             NA         NA         NA
user_4              1      NA      71         NA      NA             NA         NA         NA
user_5              1      NA      73         NA      NA             NA         NA         NA

Instead of using the gmse function, we might simulate multiple generations by calling gmse_apply through a loop, reassigning outputs where necessary for the next generation. Where outputs are not reassigned, new defaults will be inserted in their place; e.g., if we were to just loop without reassigning any variables, nothing would update and we would effectively be running the same model multiple times. Below shows how this might be done.

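sim1      <- gmse_apply(get_res = "full", lambda = 0.3);
RESOURCES <- sim1$resource_array;
LAND      <- sim1$LAND;
PARAS     <- sim1$PARAS;
COST      <- sim1$COST;    # defined here because the loop passes COST
ACTION    <- sim1$ACTION;  # and ACTION on its first iteration
results   <- matrix(data = NA, nrow = 40, ncol = 4);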

for(time_step in 1:40){
    sim_new <- gmse_apply(RESOURCES = RESOURCES, LAND = LAND, PARAS = PARAS,
                          COST = COST, ACTION = ACTION, stakeholders = 10,
                          get_res   = "full", agent_view = 20);
    
    results[time_step, 1] <- sim_new$resource_vector;
    results[time_step, 2] <- sim_new$observation_vector;
    results[time_step, 3] <- sim_new$manager_vector;
    results[time_step, 4] <- sim_new$user_vector;
    
    RESOURCES <- sim_new$resource_array;
    LAND      <- sim_new$LAND;
    PARAS     <- sim_new$PARAS;
    COST      <- sim_new$COST;
    ACTION    <- sim_new$ACTION;
}

colnames(results) <- c("Abundance", "Estimate", "Cull_cost", "Cull_attempts");

The above results in the following output for results.

      Abundance  Estimate Cull_cost Cull_attempts
 [1,]      1195 1165.9726        10   716
 [2,]      1045  939.9167       110   461
 [3,]      1160 1160.0238        10   715
 [4,]      1056 1183.8192        10   715
 [5,]      1014  850.6841       110   468
 [6,]      1171 1237.3587        10   717
 [7,]      1026  993.4563       110   464
 [8,]      1202  957.7632       110   464
 [9,]      1394 1469.3635        10   702
[10,]      1333 1457.4658        10   702
[11,]      1277 1397.9774        10   702
[12,]      1175 1415.8239        10   702
[13,]      1088  701.9631       110   468
[14,]      1275 1207.6145        10   718
[15,]      1200 1332.5402        10   718
[16,]      1116 1029.1493        45   512
[17,]      1249 1814.3962        10   699
[18,]      1141 1273.0518        10   722
[19,]      1019  963.7121       110   455
[20,]      1216 1629.9822        10   708
[21,]      1088 1130.2796        10   708
[22,]       988 1035.0982        38   537
[23,]      1056 1029.1493        45   505
[24,]      1154  749.5538       110   463
[25,]      1344 1499.1077        10   722
[26,]      1268 1386.0797        10   712
[27,]      1165 1493.1588        10   707
[28,]      1061 1070.7912        19   633
[29,]      1019 1076.7400        17   663
[30,]       961  600.8328       110   457
[31,]      1135  874.4795       110   450
[32,]      1338 1189.7680        10   701
[33,]      1275 1600.2380        10   710
[34,]      1174 1362.2844        10   709
[35,]      1104 1112.4331        12   685
[36,]      1003 1302.7960        10   715
[37,]       828 1183.8192        10   712
[38,]       649  785.2469       110   462
[39,]       739 1023.2005        56   488
[40,]       813  910.1725       110   455

Note that managers increase the cost of culling based on the time step’s estimated abundance, and user culling attempts decrease when culling costs increase.
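The relationship between cost and culling attempts can be checked directly from the table. The sketch below copies the first eight rows of the Cull_cost and Cull_attempts columns by hand (the vector names are my own) and compares mean attempts at the two cost levels that occur in those rows.

```r
# First eight rows of the results table above, entered by hand
cull_cost     <- c( 10, 110,  10,  10, 110,  10, 110, 110);
cull_attempts <- c(716, 461, 715, 715, 468, 717, 464, 464);

# Mean number of culling attempts at each cost level
mean_attempts <- tapply(X = cull_attempts, INDEX = cull_cost, FUN = mean);
# Correlation between cost and attempts across the eight time steps
cost_cor      <- cor(cull_cost, cull_attempts);

print(mean_attempts);
print(cost_cor);
```

In these rows, mean attempts are 715.75 at a cost of 10 and 464.25 at a cost of 110, so the correlation is negative.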

In addition to the flexibility of allowing user-defined submodels, gmse_apply is also useful for modellers who might be interested in simulating processes not currently available in gmse by itself. For example, if we wanted to model a sudden environmental perturbation decreasing population size, or a sudden influx of new users, after 30 generations, we could do so in the loop.
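A minimal sketch of the perturbation idea is below. To keep it self-contained, it loops over the simple logistic alt_res rather than gmse_apply, and the perturbation (halving abundance at time step 30) is a hypothetical choice. In a gmse_apply loop, the same kind of conditional would presumably be applied to the reassigned objects between calls.

```r
# Simple logistic resource model used as a stand-in for a full GMSE time step
alt_res <- function(X = 1000, K = 2000, r = 1){
    X_1 <- X + r*X*(1 - X/K);
    return(X_1);
}

time_steps <- 40;
abundance  <- rep(x = NA, times = time_steps);
X          <- 1000;
for(time_step in 1:time_steps){
    X <- alt_res(X = X);
    if(time_step == 30){   # hypothetical sudden perturbation
        X <- 0.5 * X;      # half of the population is lost
    }
    abundance[time_step] <- X;
}
print(round(abundance[28:32]));
```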

In the near future, the gmse_apply function will be included in the GMSE vignette and submitted to CRAN with the rest of v0.3.0.0. In the meantime, I believe that all major bugs have been ironed out, but please let me know or report an issue if you are able to crash the function (i.e., if you run it and it causes R to crash – you should always get an error message before this happens).

To download the latest GMSE v0.3.0.0, simply run the below in R (make sure that devtools is installed).

devtools::install_github("bradduthie/GMSE")

I welcome any feedback, and I expect to submit an update to CRAN around late October.

Update: 12 OCT 2017

New function gmse_apply complete and tested

I have now completed the gmse_apply function, which exploits the full modularity of GMSE by allowing software users to develop their own sub-functions and string them together with any combination of GMSE default sub-functions. As a brief summary, gmse_apply includes the following features:

Any arguments for custom user functions can simply be passed along by specifying them in gmse_apply. For example, if we have a custom resource function alt_res below:

alt_res <- function(X = 1000, K = 2000, r = 1){
    X_1 <- X + r*X*(1 - X/K);
    return(X_1);
}

We can simply include the above in gmse_apply as follows to use the very simple logistic growth sub-model with the individual-based submodels that are defaults of GMSE.

sim_app <- gmse_apply(res_mod = alt_res);

The gmse_apply function simply adds in GMSE defaults for unspecified models, but we can specify them too.

sim_app <- gmse_apply(res_mod = alt_res, obs_mod = observation);

To adjust parameters in the alternative resource model, simply add in the arguments as below.

sim_app <- gmse_apply(res_mod = alt_res, X = 2000, K = 5000, r = 1.2);

The gmse_apply function will know where to place them, and update them should they be needed for other models.
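One way that this kind of argument placement could work in base R is to compare the names of the supplied arguments against the formal arguments of each function. The sketch below is an illustration of that idea only, not the gmse_apply source; the helper call_with_matching and the extra stakeholders entry are hypothetical.

```r
alt_res <- function(X = 1000, K = 2000, r = 1){
    X_1 <- X + r*X*(1 - X/K);
    return(X_1);
}

# Hypothetical helper: pass a function only the arguments that it declares
call_with_matching <- function(fun, arg_list){
    keep <- names(arg_list) %in% names(formals(fun));
    return(do.call(what = fun, args = arg_list[keep]));
}

# A pool of arguments, one of which (stakeholders) alt_res does not use
all_args <- list(X = 2000, K = 5000, r = 1.2, stakeholders = 10);
X_1      <- call_with_matching(fun = alt_res, arg_list = all_args);
print(X_1);
```

Here X_1 is 2000 + 1.2 * 2000 * (1 - 2000/5000) = 3440, and the unused stakeholders argument is ignored by alt_res rather than causing an error.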

I will give a more lengthy description of how to use gmse_apply tomorrow, when I push GMSE v0.3.0.0 to the master branch of GitHub and advertise the update.

Update: 6 OCT 2017

Compensation suggestion

Jeremy has suggested including a compensation option for users. Users could devote some of their budget to compensation, then managers could compensate a proportion of their damaged yield. Implementing this will require consideration from the manager’s perspective with respect to the genetic algorithm – the users’ perspective will be easier because a user can remember their previous losses and assess compensation versus culling. Managers might have to think about how compensation could incentivise non-culling, but this might actually already work given the way the manager anticipates actions; more investigation into this will be useful following the finalisation of gmse_apply(), which is in progress.

Update: 28 SEP 2017

Progress has been made on the gmse_apply() function. My goal is to make this as modular as possible – to allow any four functions to be included in the GMSE framework, including arbitrary arguments to each function. The gmse_apply() function will recognise which arguments go along with which functions, and naturally string together results from one sub-function to the input of the next sub-function (though this will demand that the output from functions is labelled in a way that matches the arguments of the next function; e.g., if you have a ‘N_total’ as input for the observation model, then ‘N_total’ will either need to be labelled output of the resource model or specified directly in gmse_apply()). Default submodels will be the IBMs used in gmse(), and where arguments are not specified by the software user in gmse_apply() (e.g., LAND) they will be built from default gmse() parameters.
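The labelling requirement can be illustrated with a small base R sketch. The functions res_fun and obs_fun, the helper run_chain, and the bias parameter are all hypothetical; only the ‘N_total’ label comes from the example above. Each function returns a named list, and its output is added to a pool from which the next function draws the arguments that it names.

```r
# Hypothetical submodels whose output is labelled to match later arguments
res_fun <- function(X = 1000, r = 0.5){
    return(list(N_total = X + r * X));
}
obs_fun <- function(N_total, bias = 0.9){
    return(list(N_est = bias * N_total));
}

# Run functions in order, giving each only the pooled values that it names
run_chain <- function(fun_list, pool){
    for(fun in fun_list){
        use_args <- pool[names(pool) %in% names(formals(fun))];
        out      <- do.call(what = fun, args = use_args);
        pool[names(out)] <- out;   # labelled output becomes available downstream
    }
    return(pool);
}

pool <- run_chain(fun_list = list(res_fun, obs_fun),
                  pool     = list(X = 1000, bias = 0.9));
print(unlist(pool));
```

If res_fun labelled its output differently (e.g., ‘N’), then obs_fun would have no ‘N_total’ to use unless it were placed in the pool directly, which mirrors the choice described above.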

Update: 27 SEP 2017

The GMSE GUI has been updated with all of the new features in version 0.2.3.0. The gmse_gui() function is likewise updated in a new patch version 0.2.3.1. I did this quickly because the GUI was actually easy to update; plans for the gmse_apply function are now also clear, and I hope to have a working function and version 0.3.0.0 by the end of the week, or by early next week.

Update: 26 SEP 2017


GMSE Version 0.2.3.0 on GitHub

I have pushed a new version 0.2.3.0 of GMSE onto the master branch of GitHub, which means that the most up-to-date version can be installed using the code below (make sure the devtools library is installed).

devtools::install_github("bradduthie/GMSE")

The new version includes multiple new features:

To run a simple default simulation, the gmse function remains unchanged.

sim <- gmse();

To plot the effort of managers and users, use the below.

plot_gmse_effort(agents = sim$agents, paras = sim$paras, 
                 ACTION = sim$action,  COST = sim$cost);

Below summarises the results more cleanly, extracting key information from sim.

gmse_summary(sim);

And as before, the GUI can be called directly from the R console.

gmse_gui();

The GUI does not yet allow you to get a view of the plot_gmse_effort output, or a gmse_summary, but this will be a goal for future versions of GMSE.

If able, I recommend updating to version 0.2.3.0 as soon as possible. In the coming few days, I will also add the gmse_apply function, primarily for developers who will benefit from a more modular way of using GMSE, allowing for different types of submodules to be used within the broader GMSE framework. When the new apply function has been added (and possibly the GUI improved), I will submit a new version 0.3.x.x to CRAN.


Bug Fix and tweaks to agent prediction

I have now fixed a bug in the code that was causing confusion between culling and castration. After recompiling and running simulations, manager and user actions improve. I have also made some minor changes to default gmse() options. Regarding the predicted consequences of manager and user actions (i.e., the predictions from the agents’ perspective that guide their decision making), I have adjusted some things to make them more in line with what is expected in the simulation as follows (recall that managers are interested in global abundance and users are interested specifically in how abundance affects themselves):

  1. Scaring: Managers predict no change in resource abundance, while users predict a decrease of 1
  2. Culling: Managers and users predict a decrease of \(1 + \lambda\) (Note that this brings in knowledge of birth rate a priori – might want to allow for a change in this in the simulation, but it also seems realistic for agents to recognise that adults can reproduce, and a value is needed to reflect this)
  3. Castration: Managers and users predict a decrease of \(\lambda\)
  4. Feeding: Managers and users predict an increase of \(\lambda\)
  5. Help offspring: Managers and users predict an increase of 1

These values are a bit more in line with what will actually happen, so we assume that managers and users are a bit more informed now. It also allows for a bit more differentiation among actions. Overall, the model appears to perform better now – meaning that managers and users appear to be better predictors of the consequences of their actions.
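The five predictions listed above can be collected into a small table for a given birth rate. The sketch below is only a restatement of the list in R; the function name predicted_effect is hypothetical and is not part of GMSE.

```r
# Predicted change in resource abundance per action, from each agent's view
predicted_effect <- function(lambda = 0.3){
    manager <- c(scaring        = 0,
                 culling        = -(1 + lambda),
                 castration     = -lambda,
                 feeding        = lambda,
                 help_offspring = 1);
    user            <- manager;
    user["scaring"] <- -1;   # users predict one fewer resource on their land
    return(rbind(manager = manager, user = user));
}

effects <- predicted_effect(lambda = 0.3);
print(effects);
```

Only the scaring column differs between the two rows, reflecting that scaring moves a resource off a user’s land without changing global abundance.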

Before finishing the gmse_apply() function, I will push an updated version of GMSE to GitHub with these changes, plus new plotting options.

Update: 25 SEP 2017

I have written a gmse_summary function (see below), which returns a simplified list that includes four elements, each of which is a table of data:

  1. resources, a table showing time step in the first column, followed by resource abundance in the second column.
  2. observations, a table showing time step in the first column, followed by the estimate of population size (produced by the manager) in the second column.
  3. costs, a table showing time step in the first column, manager number in the second column (should always be zero), followed by the costs of each action set by the manager (policy); the far-right column indicates budget that is unused and therefore not allocated to any policy.
  4. actions, a table showing time step in the first column, user number in the second column, followed by the actions of each user in the time step; additional columns indicate unused actions, crop yield on the user’s land (if applicable), and the number of resources that a user successfully harvests (i.e., ‘culls’).

At the moment, I have not added in the actual number of resources that a user culls. This will be added shortly, after which I will post a new function. Doing so is a bit more complicated because it requires me to go into the C code and make a recording every time it happens (see how I plan to do this below the function).

gmse_summary <- function(gmse_results){
    time_steps <- dim(gmse_results$paras)[1];
    parameters <- gmse_results$paras[1,];
    #--- First get the resource abundances
    res_types    <- unique(gmse_results$resource[[1]][,2]);
    resources    <- matrix(dat  = 0, nrow = time_steps, 
                           ncol = length(res_types) + 1);
    res_colna    <- rep(x = NA, times = dim(resources)[2]);
    res_colna[1] <- "time_step";
    for(i in 1:length(res_types)){
        res_colna[i+1] <- paste("type_", res_types[i], sep = "");
    }
    colnames(resources) <- res_colna;
    #--- Next get estimates and the costs set by the manager
    observations    <- matrix(dat  = 0, nrow = time_steps, 
                              ncol = length(res_types) + 1);
    costs   <- matrix(dat = NA, nrow = time_steps*length(res_types), ncol = 10);
    agents  <- gmse_results$agents[[1]];
    users   <- agents[agents[,2] > 0, 1];
    actions <- matrix(dat  = NA, ncol = 13,
                      nrow = time_steps * length(res_types) * length(users));
    c_row  <- 1;
    a_row  <- 1;
    for(i in 1:time_steps){
        the_res            <- gmse_results$resource[[i]][,2];
        manager_acts       <- gmse_results$action[[i]][,,1];
        resources[i, 1]    <- i;
        observations[i, 1] <- i;
        land_prod          <- gmse_results$land[[i]][,,2];
        land_own           <- gmse_results$land[[i]][,,3];
        for(j in 1:length(res_types)){
            #---- Resource abundance below
            resources[i,j+1] <- sum(the_res == res_types[j]);
            #---- Manager estimates below
            target_row <- which(manager_acts[,1] == -2 & 
                                    manager_acts[,2] == res_types[j]);
            estim_row  <- which(manager_acts[,1] ==  1 & 
                                    manager_acts[,2] == res_types[j]);
            target <- manager_acts[target_row, 5];
            adjusr <- manager_acts[estim_row,  5];
            observations[i,j+1] <- target - adjusr;
            #---- Cost setting below
            costs[c_row, 1]  <- i;
            costs[c_row, 2]  <- res_types[j];
            estim_row    <- which(manager_acts[,1] ==  1 & 
                                  manager_acts[,2] == res_types[j]);
            if(parameters[89] == TRUE){
                costs[c_row, 3] <- manager_acts[estim_row,  8];
            }
            if(parameters[90] == TRUE){
                costs[c_row, 4] <- manager_acts[estim_row,  9];
            }
            if(parameters[91] == TRUE){
                costs[c_row, 5] <- manager_acts[estim_row,  10];
            }
            if(parameters[92] == TRUE){
                costs[c_row, 6] <- manager_acts[estim_row,  11];
            }
            if(parameters[93] == TRUE){
                costs[c_row, 7] <- manager_acts[estim_row,  12];
            }
            if(parameters[94] == TRUE){
                costs[c_row, 8] <- parameters[97];
            }
            if(parameters[95] == TRUE){
                costs[c_row, 9] <- parameters[97];
            }
            costs[c_row, 10] <- manager_acts[estim_row, 13] - parameters[97];
            c_row <- c_row + 1;
            #--- Action setting below
            for(k in 1:length(users)){
                usr_acts <- gmse_results$action[[i]][,,users[k]];
                actions[a_row, 1] <- i;
                actions[a_row, 2] <- users[k];
                actions[a_row, 3] <- res_types[j];
                res_row <- which(usr_acts[,1] == -2 & 
                                     usr_acts[,2] == res_types[j]);
                if(parameters[89] == TRUE){
                    actions[a_row, 4] <- usr_acts[res_row,  8];
                }
                if(parameters[90] == TRUE){
                    actions[a_row, 5] <- usr_acts[res_row,  9];
                }
                if(parameters[91] == TRUE){
                    actions[a_row, 6] <- usr_acts[res_row,  10];
                }
                if(parameters[92] == TRUE){
                    actions[a_row, 7] <- usr_acts[res_row,  11];
                }
                if(parameters[93] == TRUE){
                    actions[a_row, 8] <- usr_acts[res_row,  12];
                }
                if(j == length(res_types)){
                    if(parameters[104] > 0){
                        land_row <- which(usr_acts[,1] == -1);
                        if(parameters[95] > 0){
                            actions[a_row, 9]  <- usr_acts[land_row, 10];
                        }
                        if(parameters[94] > 0){
                            actions[a_row, 10] <- usr_acts[land_row, 11];
                        }
                    }
                    actions[a_row, 11] <- sum(usr_acts[, 13]);
                }
                if(parameters[104] > 0){
                    max_yield <- sum(land_own == users[k]);
                    usr_yield <- sum(land_prod[land_own == users[k]]);
                    actions[a_row, 12] <- 100 * (usr_yield / max_yield);
                }
                a_row <- a_row + 1;
            }
        }
    }
    cost_col <- c("time_step", "resource_type", "scaring", "culling",
                  "castration", "feeding", "helping", "tend_crop", 
                  "kill_crop", "unused");
    colnames(costs)        <- cost_col;
    colnames(resources)    <- res_colna;
    colnames(observations) <- res_colna;
    action_col <- c("time_step", "user_ID", "resource_type", "scaring", 
                    "culling", "castration", "feeding", "helping", "tend_crop",
                    "kill_crop", "unused", "crop_yield", "harvested");
    colnames(actions) <- action_col;
    the_summary <- list(resources    = resources, 
                        observations = observations, 
                        costs        = costs, 
                        actions      = actions);
    return(the_summary);
}

To record kills, I think that the best way is to use the resource mortality adjustment column (at the moment, column 17 in C and 18 in R of the resource array). Mortality as of now is just adjusted to 1 in the event of a kill, and mortality occurs whenever a random probability is greater than or equal to 1. Hence, I can replace the 1 value with the user’s ID (for non-managers, this must be at least 1), and then the resource array will record the ID of the user that killed it at the particular time step. Note that this cannot be done for other adjustments such as growth rate or offspring production because the values are not interpreted as probabilities.
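A sketch of how kills might then be counted from the resource array is below. The array here is a mock matrix of zeros built only for illustration (the real resource array has other columns filled in), and the user IDs and the kills themselves are hypothetical; only the use of column 18 in R follows the description above.

```r
# Mock resource array: six resources, mortality adjustment in column 18 (R index)
mort_col       <- 18;
resource_array <- matrix(data = 0, nrow = 6, ncol = mort_col);

# Record the ID of the user responsible for each kill instead of a value of 1
resource_array[c(2, 5), mort_col] <- 3;   # resources 2 and 5 killed by user 3
resource_array[4, mort_col]       <- 2;   # resource 4 killed by user 2

# Values of at least 1 mark killed resources, and also identify who killed them
killed        <- resource_array[, mort_col] >= 1;
kills_by_user <- table(resource_array[killed, mort_col]);
print(kills_by_user);
```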

I will do the above tomorrow, which should not take too long. I will then continue work on the gmse_apply function.

Update: 22 SEP 2017

Currently, the gmse() function returns a list that includes all of the data produced by the model, some details of which are required for plotting.

sim_results <- list(resource    = RESOURCE_REC,
                    observation = OBSERVATION_REC,
                    paras       = PARAS_REC,
                    land        = LANDSCAPE_REC,
                    time_taken  = total_time,
                    agents      = AGENT_REC,
                    cost        = COST_REC,
                    action      = ACTION_REC
                );

I think that this list is fine, perhaps necessary, to keep, but the ConFooBio group has also concluded that there should be some easier-to-understand summary of the data. I propose writing a function, gmse_summary(), that summarises the results in a way that is easier to understand. The function could just be run as below.

sim         <- gmse();
sim_summary <- gmse_summary(sim);

The output of gmse_summary() should be a list of all of the relevant information that a user might want to plot or analyse. It should include the following list elements.

  1. sim_summary$resources
  2. sim_summary$observations
  3. sim_summary$costs
  4. sim_summary$actions

More might be needed, but the above should be a good starting point that will provide four clear data tables for the user. The tables will look like the below.

1. Resource abundances over time

time_step  abundance
1          100
2          104
...
99         116
100        108

In the above, only the resource abundance is reported to the software user, though it might also be useful to have additional columns as well eventually.

2. Observation estimates of abundance over time

time_step  estimated_abundance
1          102
2          101
...
99         121
100        112

In the above, only the estimate from the observation submodel is reported to the software user. Additional columns might also be useful for things like confidence intervals, though for now I’m not sure if this is needed.

3. Costs set in each time step

time_step  manager  scaring  castration  culling  feeding  helping  unused
1          0        40       NA          60       NA       NA       0
2          0        36       NA          62       NA       NA       2
...
99         0        0        NA          100      NA       NA       0
100        0        3        NA          97       NA       NA       0

In the above, the manager number is always 0 because this is the number of the agent that has that role in GMSE. All impossible actions (specified by the simulation) are labelled NA, while the possible scaring and culling actions are given values that correspond to the cost of each action for users in each time step. Hence the table summarises policy for each time step in a way that software users can interpret more cleanly.

4. Actions in each time step

time_step  user  scaring  castration  culling  feeding  helping  tend_crop  kill_crop  unused  crop_yield  harvested
1          1     50       NA          50       NA       NA       NA         NA         0       90          12
1          2     59       NA          40       NA       NA       NA         NA         1       92          9
1          3     100      NA          0        NA       NA       NA         NA         0       89          0
2          1     44       NA          66       NA       NA       NA         NA         0       88          16
2          2     52       NA          48       NA       NA       NA         NA         0       94          12
2          3     98       NA          0        NA       NA       NA         NA         2       90          0
...
99         1     36       NA          63       NA       NA       NA         NA         1       79          20
99         2     40       NA          60       NA       NA       NA         NA         0       83          18
99         3     28       NA          72       NA       NA       NA         NA         0       88          12
100        1     35       NA          62       NA       NA       NA         NA         3       82          18
100        2     37       NA          63       NA       NA       NA         NA         0       84          22
100        3     23       NA          77       NA       NA       NA         NA         0       84          13

The above action table has more rows in it than the cost table because a row is needed for each user in each time step. This gives the software user full access to each individual user’s actions, and their results. Note that as above, castration, feeding, and helping are not options. Additionally, in this hypothetical simulation, tending or killing crops are not options, so no actions are performed. Users divide their budget between scaring and culling in each time step. The last two columns also give useful information to the software user. The first is crop yield on the user’s owned land (should probably be NA if land_ownership = FALSE), which will reflect the percentage of the total possible yield (or maybe raw yield?) for each user – hence allowing the table to directly correlate actions with yield. The last column is the number of resources ‘harvested’, which I think should count successful ‘kills’ (rather than just actions devoted to culling). The realised culling might be lower than the actions devoted to culling, for example, if not enough resources are actually on the user’s land to cull. Additional statistics for each user could be added in as columns, but this seems a good place to start. This gmse_summary producing a list of the above four tables will be included in the next version of GMSE, along with the new plotting function highlighting the conflict itself, and the gmse_apply function discussed on 6 SEPT.
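To show the kind of analysis that a table in this form would allow, the sketch below enters the rows for time steps 1 and 2 of the hypothetical actions table by hand (only the columns needed) and summarises them. The data frame is built manually here; it is not output from gmse_summary.

```r
# Rows for time steps 1 and 2 of the hypothetical actions table above
actions <- data.frame(time_step = c( 1,  1,   1,  2,  2,  2),
                      user      = c( 1,  2,   3,  1,  2,  3),
                      scaring   = c(50, 59, 100, 44, 52, 98),
                      culling   = c(50, 40,   0, 66, 48,  0),
                      harvested = c(12,  9,   0, 16, 12,  0));

# Total number of resources harvested by all users in each time step
harvest_by_step <- tapply(X = actions$harvested, INDEX = actions$time_step,
                          FUN = sum);
# Association between culling effort and realised harvest across rows
cull_harvest_cor <- cor(actions$culling, actions$harvested);

print(harvest_by_step);
print(cull_harvest_cor);
```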

Update: 15 SEP 2017

Continued progress has been made on slides for an upcoming talk.

Update: 13 SEP 2017

I will be giving a talk on 19 September 2017 for the Mathematics and Statistics Group at the University of Stirling on GMSE as a general tool for management strategy evaluation. Slides for this talk will be available on GitHub.

Update: 8 SEP 2017

The alternative approach from Wednesday is being implemented smoothly. Passing user-defined functions in a modular way is possible, but inputs and outputs need to be carefully considered within gmse_apply(). The objective is to make things as easy and flexible as possible for the user, while also making sure that the function runs efficiently.

Update: 6 SEP 2017

A modular function for modellers

I am beginning work on a gmse_apply() function, which will improve the modularity of GMSE for developers. The goal behind this function will be to provide a simple tool for allowing developers to use their own resource and observation models and, with the correct inputs, take advantage of the manager and user functions. Hence, simple resource and observation models will be possible, but the flexibility of GMSE should be retained as much as possible. A few starting points include the following:

Inputs and outputs of different functions will then include the following:

As an alternative, at least to the implementation, I think that the call could be made at the level of the individual resource() and observation() R functions. This was kind of always the plan, but there’s a semi-dirty way to mix numerical resource and observation models with the full individual-based manager and user models. This can be done by adding a model option to be user defined through an if( is.function(model) == TRUE ) in the resource.R function. If the condition is satisfied, then resource() will shift to the user generated model. This can actually be done for all of the submodels very easily.
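A minimal sketch of this switch is below. The function resource_sketch is a hypothetical stand-in for resource(), and the default branch is only a placeholder string; the real individual-based model is not reproduced here.

```r
# A user-defined numerical model
alt_res <- function(X = 1000, K = 2000, r = 1){
    X_1 <- X + r*X*(1 - X/K);
    return(X_1);
}

# Hypothetical stand-in for resource(): use the supplied function if there is one
resource_sketch <- function(model = "IBM", ...){
    if( is.function(model) == TRUE ){
        return(model(...));        # shift to the user-generated model
    }
    return("default IBM run");     # placeholder for the individual-based model
}

user_out    <- resource_sketch(model = alt_res, X = 1000);
default_out <- resource_sketch();
print(user_out);
print(default_out);
```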

This alternative might be a better way to go. The aforementioned ‘dirty’ part of the technique might be to check to see if the output is in the correct form, then, if only a vector is returned – turn it into the correct form by making a data frame that has the same number of rows by calling make_resource. The type 1 values could correspond to vector elements. Admittedly, this could get slow for huge population sizes, but population sizes would have to be massive for R to slow down from simply making a matrix with a lot of rows. In any case, it would at least standardise the input and output for the user of gmse_apply in a way that plays nice with everything else in GMSE.

Similarly, the observation function could also call make_resource if a vector is returned (since individual variation wouldn’t be relevant in the numerical model).

With this alternative approach, no changes to the C code need to be made – the inputs and outputs just need to be tweaked into a standardised way when a vector or scalar is returned from any user-defined model (small detail – population size needs to be an integer). This can be an option later for the user and manager models – though I’m not sure how this would work, exactly. A benefit here is that some parts of the model could conceivably be individual-based, with others being numerical – the trade-off being the requirement for discrete resource numbers and a very small amount of slowdown (which will almost certainly not be noticeable for any reasonable model).
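The conversion from a numerical output to an individual-based form might look something like the sketch below. The helper to_resource_rows and its column layout (ID in column 1, type in column 2, four columns in total) are assumptions for illustration; the actual conversion would go through make_resource, which is not called here.

```r
# Hypothetical helper: turn a scalar abundance into one row per individual
to_resource_rows <- function(model_out, n_col = 4){
    N   <- as.integer(floor(model_out[1]));  # population size must be an integer
    res <- matrix(data = 0, nrow = N, ncol = n_col);
    if(N > 0){
        res[, 1] <- seq_len(N);   # assumed ID column
        res[, 2] <- 1;            # assumed type column (all type 1)
    }
    return(res);
}

res_array <- to_resource_rows(model_out = 1499.7);
print(dim(res_array));
```

Flooring is one way of meeting the integer requirement; rounding, or sampling the fractional individual, would be alternatives.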

The gmse_apply function would then initialise a very small number of agents, and a small landscape (unless otherwise specified) in every run. The possibility of passing more options could be applied with a simple .... This would also require a sub-function build_para_vec, which would be used for the sole purpose of taking the list of options included (same as in gmse()) and passing it to the sub-function, with any functions not passed being assumed defaults (and most would be irrelevant). So the default function should then look like

gmse_apply(resource_function = "IBM", observation_function = "IBM", manager_function = "IBM", user_function = "IBM", res_number = 100, ...);

I think at least an initial population needs to be specified, but everything else can be left up to the user, with the ellipsis passing to the function building the parameter vector (which can also be called by gmse(), replacing some clutter). Overall, the function will run without any input if none is specified, defaulting to an IBM with a population size of 100 for one generation. All other options, including non-standard functions, are left to the user.
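A sketch of how the ellipsis could feed a parameter-building sub-function is below. The name build_para_vec is taken from the plan above, but this body, and the default values other than res_number = 100, are hypothetical.

```r
# Hypothetical sketch: merge options passed through '...' with defaults
build_para_vec <- function(...){
    defaults <- list(res_number   = 100,   # default from the plan above
                     time_max     = 1,     # assumed: one generation
                     stakeholders = 4);    # assumed default
    supplied <- list(...);
    paras    <- utils::modifyList(x = defaults, val = supplied);
    return(paras);
}

default_paras <- build_para_vec();
custom_paras  <- build_para_vec(res_number = 200, stakeholders = 10);
print(unlist(custom_paras));
```

Options that are not passed keep their default values, and those that are passed override them, which is the behaviour described for the proposed gmse_apply call.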

Additional thoughts

Working this through, I’m slightly apprehensive about the motivation for including the gmse_apply function. Once you strip the mechanistic approach from the resource and observation models, all you really have are two values: (1) the population abundance or density and (2) the estimate of the population size or density. Once you include the manage_target into gmse_apply (necessary, I believe), then the genetic algorithm is really just a fancy way of getting the difference between the population estimate and the target size, and then setting a number of culling actions acceptable for users. Users then cull as much as possible because they’re assumed to want to use the resource as much as possible. Of course, we can consider other parameters that affect user actions (e.g., maximum catch, effort), but if we’re interested in learning about how these concepts affect harvesting in theory, then they can and should probably be studied using a simpler model. The real point of the genetic algorithm is that it allows for complex, multi-variable, goal-driven behaviour, as might occur given indirect effects (e.g., organisms on crop yield) or multiple options (e.g., culling versus scaring or growing) and spatial complexities. There seems little to be gained by calling the genetic algorithm to tell users to cull as much as possible, which can be done with a (very) simple function.

Update: 28 AUG 2017

I have finally fixed the annoyance in the shiny app of GMSE that caused the bottom of the browser to go black, hence making it difficult to set parameter values in some tabs.

Additionally, by hovering over the different options in the application, the software user can now see a brief description of what each option does in the simulation.

Update: 23 AUG 2017

I am experimenting with ways of demonstrating the conflict between what a manager incentivises, and what the users actually do, in GMSE. Below are some plots that show this for a few sample simulations. The five panels in each plot correspond to the five possible actions where policy is set. Policy set by the manager is shown with the black solid line, with the thin coloured lines reflecting individual user effort expended into each action.

The right axis is fairly easy to interpret – it’s just the percentage of the user’s total budget devoted to a particular action (note, this is not necessarily the number of actions a user performs because different actions can cost different amounts – hence the term ‘effort’).
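The distinction between effort and number of actions can be seen with a small worked example. The budget, costs, and action counts below are hypothetical, and effort is computed as the share of the budget spent on each action, which is my reading of the description above rather than the plotting code itself.

```r
# Hypothetical user budget, per-action costs, and number of actions performed
budget  <- 1000;
costs   <- c(scaring = 10, culling = 40);
actions <- c(scaring = 20, culling = 15);

# Effort is the percentage of the total budget devoted to each action
spent  <- actions * costs;
effort <- 100 * spent / budget;
print(effort);
```

Here the user scares more often than they cull (20 versus 15 actions), but culling takes three times as much effort (60 versus 20 percent of the budget) because each cull costs more.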

The left axis is a bit trickier – it’s how permissive of an action the manager is in practice. High values correspond to an action being highly permitted by the manager (i.e., the manager invests no effort in making these actions costly), whereas low values correspond to an action being less permitted (i.e., the manager invests highly in making these actions costly for users).

The end result is that the lines indicating manager permissiveness are typically correlated with user effort towards any particular action. In the first example below, this is true for scaring and culling (as the manager becomes more permissive of these actions, users tend to take advantage and spend more effort doing them). Note that users do not feed because they have nothing to gain by feeding the resources, even though the manager usually permits feeding (around generation 75, the population started going way over the manager’s target).

In the second example (below), the option for scaring has been removed. Because users want resources off of their land, the only option is to cull, so users will cull as much as permitted even though the manager is incentivising them not to as much as possible.

The below is a final example where all actions except helping are possible options.

Update: 17 AUG 2017

While playing with the prototype GUI, I discovered a minor bug in the plotting function, which I fixed so that plotting no longer throws an error. I have also updated the list of contributors in the description file, and the list of recommended packages (shiny packages for the new gmse_gui function).

I have now also added a new release version 0.2.2.8 to GitHub. This version requires three additional libraries:

  1. shiny
  2. shinydashboard
  3. shinyjs

The above three libraries will be imported as dependencies (or should be) in the new version of GMSE.

Update: 16 AUG 2017

A prototype GUI for the GMSE package is now up on shiny. I’m going to make this look nicer with a CSS style-sheet, but for now this gets the job done.

Update: 15 AUG 2017

I am currently trying to get a handle on creating a GMSE GUI in shiny by looking at the elementR package. The authors of this package, to get their very impressive shiny application running, need to nest multiple sub-functions inside the long (10000+ line) runElementR function. GMSE won’t need to have this much code for the user interface – I have figured out roughly how to make the input look good and functional in a browser, but a tricky part will be to link that input with the gmse() function parameters, then run things.

Update: 7 AUG 2017

In writing a draft manuscript, the term ‘stakeholder’ is being applied to mean both managers and users. This differs from the model itself and therefore in the use of GMSE. To resolve this, I think that it would be worthwhile to change the documentation to match the manuscript. But I don’t want to change the input stakeholder for any existing users of GMSE that might be inconvenienced. Instead, I think just defining stakeholder to be the number of managers and users could be fine by changing stakeholder <- stakeholder + 1 in the gmse() function. This might need revisiting in later versions (if we wanted to have multiple managers and stakeholders), but such a change would be likely part of a much bigger release in which major (and potentially inconvenient) changes would be unavoidable.

Update: 7 JUL 2017

Following the release of GMSE v0.2.2.7 on CRAN, with extended documentation, as introduced on the ConFooBio blog and my blog, I am shifting my attention to the vignette. The vignette in development will eventually be packaged into a future version of GMSE, then submitted as a separate methods paper.

Update: 3 JUL 2017

GMSE v0.2.2.5 is now up on CRAN (13:32 GMT), and my hope is that v0.2.2.7 will replace it soon following some clarification of the documentation. I am avoiding a public announcement of the package on CRAN until I receive confirmation that the new version is accepted.

New logo





Update: 27 JUN 2017

v0.2.2.0: Bug fixes, new feature

While beginning to write up the vignette, I worked out a bug that applied to simulations in which stakeholder number was greater than 4 (tl;dr, these stakeholders were not acting according to their interests). This was fixed with commit 6ae58ec374f48464a0706fcf585dd5f1534e4511, and in fixing this I made the distinction between hunting type scenarios (where stakeholders have an interest in directly using resources) and farmer type scenarios (where stakeholders care about their land, and resources only indirectly because the resources affect the land).

I also added a new feature allowing the software user to adjust the proportion of the landscape that is public_land (commit f88545569a4c3e39906291759f376403b8e665f3). This can be interpreted as land that is unmanaged and therefore available for resources to use without fear of scaring or culling when land_ownership = TRUE. Also, now when land_ownership = FALSE, all land is considered public and this is now reflected accurately with the plots.

I have also opted to change the default res_min_age, the age at which resources can be seen, to zero instead of one. This results in plots that are above the defined carrying capacity sometimes because carrying capacity is applied to adults, not juveniles when res_death_K is set. The result is a total carrying capacity of (res_death_K + (1 * lambda)), which accounts for birth of juveniles in a population at carrying capacity.

Update: 26 JUN 2017

The fixed_recapt is now running as intended as of commit ad9d9e10ead215a703f9accdfbd149d35b350567.

New issues – proposed enhancements for the future GMSE

Before I lose track of all the proposed ideas for improving upon the GMSE package, I want to get all of them up as issues on GitHub. For completeness, I have also included the unresolved Issue 9. I will add to the below to form an organised list of future ideas to work on, all laid out as enhancement issues on GitHub. Anyone should be able to add to this list, or comment on the issues (e.g., if they would be especially useful ones to resolve).

Issue 9: Observation Error

It would be useful to incorporate observation error into the simulations more directly. This could be affected by one or more variables attached to each agent, which would potentially cause the mis-identification (e.g., incorrect return of seeme) or mis-labelling (incorrect traits read into the observation array) of resources. This could be done in either of two ways:

  1. Cause the errors to happen in ‘real time’ – that is, while the observations are happening in the simulation. This would probably be slightly inefficient, but have the benefit of being able to assign errors specifically to agents more directly.

  2. Wait until the resource_array is marked in the observation function, then introduce errors to the array itself, including errors to whether or not resources are recorded and what their trait values are. These errors would then be read into the obs_array, which is returned by the function.
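The second option could be sketched roughly as below. This is a toy illustration in plain R, not GMSE code: the function name add_obs_error, its arguments miss_prob and trait_sd, and the two-column toy array are all hypothetical, and the real obs_array has a different layout.

```r
# Hypothetical helper: add observation error to an already-built observation
# array. Rows are resources; 'trait_cols' are the columns holding trait values.
add_obs_error <- function(obs_array, miss_prob = 0.1, trait_sd = 0.5,
                          trait_cols = 2){
    n_obs  <- nrow(obs_array)
    # Each observed resource is dropped (missed) with probability miss_prob
    keep   <- runif(n_obs) >= miss_prob
    noisy  <- obs_array[keep, , drop = FALSE]
    # Remaining resources have their recorded trait values perturbed
    n_vals <- nrow(noisy) * length(trait_cols)
    noisy[, trait_cols] <- noisy[, trait_cols] + rnorm(n_vals, 0, trait_sd)
    return(noisy)
}

set.seed(1)
obs  <- cbind(ID = 1:20, size = rep(10, 20))  # toy observation array
obs2 <- add_obs_error(obs, miss_prob = 0.2, trait_sd = 1)
```

Applying the error after the array is built keeps the observation loop itself untouched, at the cost of not tying errors to the particular agent that made each observation.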

Issue 30: Manager assumptions about user actions

It would be useful to allow for simulations to dynamically adjust the caution that the manager has when changing actions. At the moment, managers always assume that some specified number of actions will be performed by users, and this number does not change over the course of the simulation. But managers might be able to use the history of user actions to learn to be more or less cautious when setting new policy.

Issue 31: Modify manager’s predicted effects

Currently, the predicted effects of a manager’s actions are set to values that, heuristically, appear to work in the genetic algorithm. This is adjusted with the manager_sense parameter, which has a default of 0.1, such that the manager assumes that if they set costs to increase culling by 100 percent, it will actually only increase by 10 percent (as not all users are going to necessarily cull if given the opportunity). Like real-world management, this is heuristic and results in uncertainty, but future versions of GMSE could dynamically modify this value during the course of the simulation based on real knowledge of how policy changes have affected user actions in previous time steps.
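As a rough illustration of the arithmetic, and of one way the value might be updated dynamically, here is a minimal sketch in plain R. The functions predicted_change and update_sense are hypothetical names, and the ratio-based update is only an assumption about how such a feature could work, not something implemented in GMSE.

```r
# manager_sense scales the manager's naive expectation of how users respond.
predicted_change <- function(intended_change, manager_sense = 0.1){
    intended_change * manager_sense
}
# A policy meant to raise culling by 100 percent is predicted to raise it by 10
predicted_change(100)

# Hypothetical dynamic version: re-estimate manager_sense from past time steps
# as the mean ratio of realised to intended changes in user actions.
update_sense <- function(intended, realised){
    ok <- intended != 0            # avoid dividing by zero
    mean(realised[ok] / intended[ok])
}
new_sense <- update_sense(intended = c(100, 50, -40), realised = c(12, 4, -6))
```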

Issue 32: Long-term histories affect genetic algorithm

Currently, only the history of interactions from the previous time step directly affects the genetic algorithm for stakeholders and managers. For managers especially, this could be made a bit more nuanced. The entire history of total actions and resource dynamics is recorded, and this could easily be made available (e.g., in PARAS_REC) for managers to make decisions. Incorporating these data into the genetic algorithm, and therefore into agent decision making, could be tricky, but one simple example of this could be having managers use the per-time step mean number of stakeholder actions in the last 2-3 time steps to predict future user actions with a bit more inertia. Managers could also use stakeholder action history from earlier time steps, but weighting each by how long ago they occurred.
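A minimal sketch of the weighting idea, in plain R. The function predict_actions and its window and decay arguments are hypothetical; the geometric decay is just one simple choice of weighting by how long ago actions occurred.

```r
# Predict next-step user actions from a history of per-time-step totals.
# 'decay' is a hypothetical weighting parameter: 1 gives equal weights,
# smaller values discount older time steps more strongly.
predict_actions <- function(action_history, window = 3, decay = 0.5){
    recent  <- tail(action_history, window)
    ages    <- rev(seq_along(recent)) - 1   # 0 = most recent time step
    weights <- decay^ages
    sum(recent * weights) / sum(weights)
}

culls     <- c(40, 42, 10, 12, 14)          # toy history of total culls
plain     <- predict_actions(culls, window = 3, decay = 1)   # mean of last 3
decayed   <- predict_actions(culls, window = 5, decay = 0.5) # whole history
```

With decay = 1 and window = 3 this reduces to the per-time step mean of the last three time steps; lowering decay gives the manager less inertia.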

Issue 33: Non fixed mark-recapture sampling number

Currently, to simulate mark-recapture observation and analysis, values for fixed_mark and fixed_recapt need to be specified in GMSE, and the manager would have exactly these numbers of marks and recaptures in each generation, respectively. Instead of specifying exact numbers, it would also be useful to have the manager search a general area, then mark all resources in that area. Next, the manager could search again and recapture, so the exact number is not always set and the observation process probably mimics more closely what happens in the field. This type of sampling is actually already available (observe_type = 0), so I would just need to add some code to have managers interpret some observations as marks and others as recaptures.

Issue 34: Resource interactions

Currently, more than one resource type is permitted, but this is not offered/visible to users of the software. A next major version of GMSE could have multiple resource types with resources actually interacting with one another (could borrow future development code from EcoEdu). Simple interactions could include competition and predator-prey functions in the resource model. The code is also already ready for managers and users to consider multiple resources in making policy decisions and actions, respectively.

Issue 35: Stakeholder lobbying

Currently, GMSE assumes that stakeholders have a negative relationship with resources – they either want to hunt them or scare them from their land. Future versions of GMSE should include an option for a stakeholder type (e.g., activist) that lobbies the manager to adjust the manager’s utilities, effectively increasing or decreasing the target. The data structure to do this already exists; it’s just a matter of figuring out how best to enact it and why. For example, would adding this have any actual effect that differs from just assuming that the manager is being lobbied by conservationists continuously, and that their target is a reflection of that?

Update: 25 JUN 2017

I need to double check that fixed_recapt is doing what I said it did on 23 JUN. My concern is that it is not being implemented properly in the observation model – there needs to be a difference between the first and second times_obs, or times_obs might need to be redefined for the first and second rounds of observation. It looks like the observation model is just doing times_obs observations with the same number of samples in each one.

Update: 23 JUN 2017

A better mark-recapture observation model estimator

Setting the parameters for the mark-recapture observation model (observe_type = 1) was confusing, so much so that I had to remember how to do it. In v0.2.1.3, I have fixed this so that the sampling is clearer. Rather than having a fixed_observe argument in gmse(), I’ve included fixed_mark and fixed_recapt arguments, which only apply when observe_type = 1. Under these conditions, times_observe is ignored and fixed_mark defines how many resources will be marked in each time step; fixed_recapt defines how many recaptures will be made. If the value of fixed_mark or fixed_recapt is greater than the actual size of the resource population, then all resources in the population will be sampled.
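The sampling rule can be illustrated with a toy round of marking and recapturing in plain R. This is not the GMSE C code: the function mark_recapture is hypothetical, and the use of Chapman’s bias-corrected Lincoln-Petersen estimator is my assumption for the sake of a worked example rather than a statement of what GMSE computes.

```r
# Toy mark-recapture round (illustrative only).
mark_recapture <- function(pop_size, fixed_mark, fixed_recapt){
    ids      <- seq_len(pop_size)
    # If more samples are requested than resources exist, sample everything
    n_mark   <- min(fixed_mark, pop_size)
    n_recapt <- min(fixed_recapt, pop_size)
    marked   <- sample(ids, n_mark)
    recapt   <- sample(ids, n_recapt)
    m_again  <- sum(recapt %in% marked)     # marked resources caught again
    # Chapman's bias-corrected Lincoln-Petersen estimator (an assumption here)
    est      <- ((n_mark + 1) * (n_recapt + 1)) / (m_again + 1) - 1
    list(marked = n_mark, recaptured = n_recapt, remarked = m_again,
         estimate = est)
}

set.seed(42)
big_pop   <- mark_recapture(pop_size = 200, fixed_mark = 50, fixed_recapt = 50)
small_pop <- mark_recapture(pop_size = 30,  fixed_mark = 50, fixed_recapt = 50)
```

In the second call the requested sample sizes exceed the population, so every resource is both marked and recaptured and the estimate equals the true population size.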

Get a better confidence interval for the density estimator

The density estimator is giving too few Type 1 errors because of times_observe > 1. This doesn’t affect anything but the visualisation, since managers don’t make decisions based on confidence intervals. Still, fixing the CIs would be a good idea. The CIs should also be correct when times_observe = 1. Really, times_observe > 1 is simulating a weird case in which the central limit theorem would apply to the times_observe estimates, and hence the mean estimate among the times observed should be normally distributed around the mean.

Double-check for memory leaks with Valgrind

Running Valgrind on the R package GMSE revealed no memory leaks.

==15438== 
==15438== HEAP SUMMARY:
==15438==     in use at exit: 67,722,174 bytes in 20,290 blocks
==15438==   total heap usage: 6,398,005 allocs, 6,377,715 frees, 1,211,596,543 bytes allocated
==15438== 
==15438== LEAK SUMMARY:
==15438==    definitely lost: 0 bytes in 0 blocks
==15438==    indirectly lost: 0 bytes in 0 blocks
==15438==      possibly lost: 0 bytes in 0 blocks
==15438==    still reachable: 67,722,174 bytes in 20,290 blocks
==15438==         suppressed: 0 bytes in 0 blocks
==15438== Reachable blocks (those to which a pointer was found) are not shown.
==15438== To see them, rerun with: --leak-check=full --show-leak-kinds=all
==15438== 
==15438== For counts of detected and suppressed errors, rerun with: -v
==15438== ERROR SUMMARY: 0 errors from 0 contexts (suppressed: 0 from 0)

I have changed some default parameters so that when I write up the default example, it provides a description that will be useful to new users.

Update: 20 JUN 2017

I am in the process of organising the vignette. As I’ve done in a previous manuscript, I’ll start with notes and an outline and update now on the Rmarkdown file. The vignette will therefore evolve and be tracked through git, just like the code.

Also, while compensation payments are not yet included as a feature of GMSE, I think that an option to include them should be relatively easy to implement through the manager layers of the COST and ACTION arrays where column 1 equals -1 (landscape). The killem and helpem columns remove and increase crops, respectively, but the additional available columns could be used to track compensation owed (to stakeholders) and paid (by managers) – both at a cost, of course.

Update: 16 JUN 2017

Unit tests are written

Unit tests for all sub-functions of the model are written, with the exception of functions used for plotting, which I don’t think are necessary to unit test because errors to the plot will be very obvious in development. Everything now passes CRAN checks except the licensing, which we’ll need to agree on at some point. As of Monday, I will be able to start on the vignette (manuscript).

Update: 15 JUN 2017

Unit testing for long-term code maintenance

To ensure that the gmse package functions as intended in the long term, I am writing an extensive battery of unit tests that will need to be passed to ensure that any new features do not introduce bugs or break existing functions. To do this, I will use the testthat package in R and follow the advice in Hadley Wickham’s chapter on unit testing R code. I’ve done this already for the gamesGA package, which is now on CRAN, though the gmse package will require many more tests simply because there are many more functions to test.

The unit testing already helped by identifying a potential bug later on down the line when initialising cost arrays for simulations with more than two resources (see commit 65088054481266e67f06513dc368c515e4a9fed0). Unit tests for all initialisation functions except landscape functions are now complete. Next, I will need to do landscape functions and perhaps some (but probably not all) functions associated with plotting, then the four main GMSE model functions.

Update: 14 JUN 2017

The density-based observation estimates were giving incorrect values. Looking into this, the reason for the error was that I was using confidence intervals for proportions (e.g., the proportion of cells with resources on them) rather than counts (which, we assume, will have a Poisson error structure). I have replaced the previous estimate of confidence intervals around local density with a Poisson estimate

\[ \hat{\lambda} \pm 1.96 \times \sqrt{\frac{\hat{\lambda}}{vision^2}} \]

In the above \(\hat{\lambda}\) is the estimated local density and \(vision\) refers to the total number of cells that managers can see.
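For a quick check of the arithmetic, the interval can be computed as in the sketch below. The function density_ci is a hypothetical name, and the code simply follows the expression as written (dividing by vision squared); it is not taken from the GMSE plotting code, and the input values are made up.

```r
# Confidence interval for local density, following the expression above.
density_ci <- function(lambda_hat, vision, z = 1.96){
    half_width <- z * sqrt(lambda_hat / vision^2)
    c(lower = lambda_hat - half_width, upper = lambda_hat + half_width)
}

# Toy values: estimated local density of 0.04 with vision = 20
ci <- density_ci(lambda_hat = 0.04, vision = 20)
```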

With this new correction, and also fairly major bug fixes having to do with fixing an error in landscape actions causing an infinite loop (commit 310fb76b7e3b3499ab74e2f94c61c3276f3c4118) and fixing user actions to tend crops or kill crops appropriately (commit 424fc2eb4f6274763f5ead0fc48ad5dd7f68c422), I am now pushing to master and releasing v0.2.1.1, which effectively patches some major issues and improves plotting (including a new legend for costs and actions).

Update: 13 JUN 2017

Bug fix to the user function

An erroneous condition in a while loop was causing an infinite loop when manager budgets were very high and user actions were not restricted to landscape that they owned. This has been fixed on the development branch but not yet pushed to the master branch.

Update: 9 JUN 2017

Some initial notes: GMSE (beta) package v0.2.1.0

A beta version of GMSE is now available, and is ready to be experimented with and tested as an R package. To download and begin using GMSE, it is necessary to first download the devtools library.

install.packages("devtools")
library(devtools)

Use install_github to install using devtools.

install_github("bradduthie/gmse")

From here, it is possible to run GMSE simulations using the gmse() function. For help using this function, all documentation can be accessed by simply calling the help files.

help(gmse)

The documentation contains a basic description of the gmse() function (the only one that is needed to run simulations – subfunctions for resource, observation, manager, and user models are all accessible as independent R functions, but are not very useful at the moment without the initialisation in the main function – nevertheless, the documentation for these can be accessed with help(resource), help(observation), help(manager), and help(user)). It also contains arguments for most of the variables that might be usefully changed to simulate different types of management scenarios; additional options are not shown for the moment either because more coding is needed to make them useful or because I don’t expect they will be needed. The explanations of the arguments are detailed, along with documentation explaining the (extensive) amount of data that is returned after running a simulation. To get started though, the default simulation can be run simply.

sim <- gmse();

Parameter values can then be adjusted by varying the options in the gmse() function.

R vignette, and the beginning of a methods paper

I will soon begin work on an R vignette, which is essentially a long form documentation that can also be a manuscript to submit to a journal.

Add some formal testing functions for future development

I will also need to add some formal R tests, which are basically ways of automating the kind of testing that is done continually while writing the code. The idea with formal unit tests is to have a process that checks to see if the code breaks when a new feature (and therefore new code) is added. Since the results of the simulation are stochastic, I think the best way to test is to set a seed and use default parameter values, then check to make sure that the results match using the expect_equal_to_reference() function in testthat. It might be useful to do this for each of the resource(), observation(), manager(), user(), and gmse() models – perhaps testing the ith time step for each of the sub-functions, but then the gmse() function also as a whole (perhaps using just 10 time steps would be sufficient for this instead of a default of 100).
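The seed-and-reference idea can be sketched in base R as below. The function toy_sim is a hypothetical stand-in for a stochastic model function (it is not gmse()), and the comparison uses identical() rather than a testing package so that the sketch is self-contained.

```r
# Stand-in for a stochastic model function (hypothetical; not gmse()).
toy_sim <- function(time_max = 10, N0 = 100, lambda = 0.3){
    N    <- numeric(time_max)
    N[1] <- N0
    for(t in 2:time_max){
        births <- rpois(1, lambda * N[t - 1])
        deaths <- rbinom(1, N[t - 1], 0.25)
        N[t]   <- N[t - 1] + births - deaths
    }
    N
}

# Record a reference once with a fixed seed ...
set.seed(2017)
reference <- toy_sim()

# ... then check that a later run with the same seed reproduces it exactly.
set.seed(2017)
current <- toy_sim()
stopifnot(identical(reference, current))
```

In a package, the reference would be saved to a file and compared inside a test, so that any change to the code that alters simulation output under the same seed causes a failure.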

Update: 8 JUN 2017

Introduce Issue #29: No edge effect causes crash

When edge_effect = 0, and therefore nothing happens when resources and agents move off of the edge of the landscape, R crashes. This is almost certainly due to some sort of memory leak. This is a low priority issue at the moment because I cannot think of a reason why anyone would explicitly want the model to just ignore resources moving off of the landscape. If someone wants something other than a torus (edge_effect = 1), such as a reflective edge or emigration upon leaving the landscape, this should be explicitly coded into the edge_effect function in utilities.c. Until someone asks for it, I’ll stick with a torus.
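For reference, a torus edge can be sketched with modular arithmetic. The R function below is only illustrative (torus_wrap is a hypothetical name and this is not the edge_effect function in utilities.c); it assumes 0-indexed cell coordinates, as would be natural in C.

```r
# Wrap a 0-indexed coordinate onto a landscape dimension of size 'dim_size'.
torus_wrap <- function(pos, dim_size){
    pos %% dim_size   # R's %% gives a non-negative result when dim_size > 0
}

# Positions just off either edge of a 100-cell dimension are wrapped around
wrapped <- torus_wrap(c(-1, 0, 99, 100, 101), dim_size = 100)
```

Note that the % operator in C can return a negative remainder for negative operands, so an equivalent C implementation would need to handle moves off the lower edge explicitly.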

New (draft) documentation for the gmse() function


DESCRIPTION: GMSE simulation

The gmse function is the primary function to call to run a simulation. It calls other functions that run resource, observation, management, and user models in each time step. Hence while individual models can be used on their own, gmse() is really all that is needed to run a simulation.

Returns: A large list is returned that includes detailed simulation histories for the resource, observation, management, and user models. This list includes eight elements, most of which are themselves complex lists of arrays:

  1. A list of length time_max in which each element is an array of resources as they exist at the end of each time step. Resource arrays include all resources and their attributes (e.g., locations, growth rates, offspring, how they are affected by stakeholders, etc.).

  2. A list of length time_max in which each element is an array of resource observations from the observation model. Observation arrays are similar to resource arrays, except that they can have a smaller number of rows if not all resources are observed, and they have additional columns that show the history of each resource being observed over the course of times_observe observations in the observation model.

  3. A 2D array showing parameter values at each time step (unique rows); most of these values are static but some (e.g., resource number) change over time steps.

  4. A list of length time_max in which each element is an array of the landscape that identifies proportion of crop production per cell. This allows for looking at where crop production is increased or decreased over time steps as a consequence of resource and stakeholder actions.

  5. The total time the simulation took to run (not counting plotting time).

  6. A 2D array of agents and their traits.

  7. A list of length time_max in which each element is a 3D array of the costs of performing each action for managers and stakeholders (each agent gets its own array layer with an identical number of rows and columns); the change in costs of particular actions can therefore be examined over time.

  8. A list of length time_max in which each element is a 3D array of the actions performed by managers and stakeholders (each agent gets its own array layer with an identical number of rows and columns); the change in actions of agents can therefore be examined over time.

Because the above lists cannot possibly be interpreted by eye all at once in the simulation output, it is highly recommended that the contents of a simulation be stored and interpreted individually if need be; alternatively, simulations can more easily be interpreted through plots when plotting = TRUE.


Update: 7 JUN 2017

GMSE is now a package

I have now made GMSE a package, including documentation for all of the R code except the main gmse() function, which I will complete soon. The package should be available to use as early as tomorrow evening. There are still some additional tweaks that I will continue to make, particularly to the plotting, and I want to add some tests to the model as well. Uploading to CRAN will be done after some beta testing – I’ll mainly follow Hadley Wickham’s book for advice here.

Update: 6 JUN 2017

Progress on new features

A new six by two plot for case 2 and case 3 observation functions has been added. Additionally, G-MSE now records a new array PARAS_REC, which holds parameters each generation, including observation estimates and confidence intervals. The PARAS_REC will allow me to simplify the plotting functions because the relevant data will be calculated in C on the fly and neatly held in PARAS_REC. Additionally, I will add in the total actions for all stake-holders as seven elements in paras (five actions on resource type 1 and two landscape actions), and also the cost of each action. This will not only make all of the plotting code much simpler, it will also allow the potential for the history of actions and costs to affect manager and stake-holder actions in future software development.

It’s always tempting to push the model a bit further with new features or more efficient algorithms and code, but I think that now is the time to turn G-MSE into a package and send it off to colleagues to experiment with, which I will do tomorrow. Nevertheless, I want to hit a few points that will be very useful for future G-MSE features:

Each of the above will take a bit of planning in addition to coding. I’m not sure if they would also require the addition of new data arrays, but I think they are worth considering.

Update: 5 JUN 2017

Resolve Issue #27

The start column (and, because observation column number equals times observed, the end column) was specified incorrectly in the density and mark-recapture estimates in R. This meant that three columns were sampled with values all equaling zero, and three columns were not sampled with values equaling one and zero, to estimate population size. Hence, this produced an underestimate of population size in plots. The issue has now been resolved.

Additional user options

Additional user options now include the following (defaults shown):

 stakeholders   = 4,     # Number of stake-holders
 manage_caution = 1,     # Caution rate of the manager
 land_ownership = FALSE, # Do stake-holders act on their land?
 manage_freq    = 1      # Frequency that management enacted

Within the week the following features will be added:

The above points should not take more than a day to complete, at most, and upon completing them I will then make G-MSE into a package that can be downloaded using devtools from GitHub. More long-term, I want to do the following, but this might not happen until after a draft of the methods paper is written.

Update: 4 JUN 2017

Introduce Issue #27: Observation estimate underestimates real population size

The case 0 observation type is consistently underestimating the true population size. This could be caused by a calculation that assumes that the size of the sampled area is larger than it actually is, or that the size of the landscape is smaller than it actually is; either way, the observation.c file needs to be double-checked and potentially debugged.

Playing around with parameter values

I have made some of the simulation inputs easier to work with on the user end and played around with different variable combinations on a relatively low-power laptop (Lenovo X201 Thinkpad). The simulation is a bit slower than desirable, but not so slow as to cause major issues (takes about a minute or so to simulate a fairly big population of ca 200 with 12 stake-holders).

Introduce Issue #28: More stake-holders have fewer actions

For some reason, having more stakeholders appears to lead to less culling of resources even when all of them are attempting to do it. If there are more stakeholders to act, then actions should happen more often because each has the same budget.

Note that this appears to even occur when users are not restricted to their landscape; it might be something to do with double killing? There just aren’t enough resources dying in the model to match with the actions.

Resolved Issue #28:

Resolved – I had just input the stakeholder number incorrectly. See commit 6b63439b384cab90680f6a36a79f2c94eba46c45.

Update: 31 MAY 2017

Code is finally stable

I have now deliberately tried to crash G-MSE in multiple ways – the goal being to throw parameter combinations or options at the model in such a way as to cause the model to not work accurately. At first, I was successful at this when I forced managers to only allow for one management option (culling, scaring, etc.). After much debugging and testing, I have fixed this so that I am confident that the code runs as advertised, for the moment. Features that have now been added to G-MSE as a consequence of this process include the following.

Finally shift to Friday’s goals

I have started to change some parameter inputs to make it easier to play with parameters, but I’m going to do more of this now that I’m much more confident in the G-MSE software. Once this is done, I will make the whole thing an R package that can be downloaded using github developer tools, and I’ll add documentation before sending instructions around. Additionally, I will then start to write up some sample case studies (e.g., hunters on a public landscape or farmers trying to maximise yield) to show what G-MSE can do. Writing these out into an Rmarkdown file, I’ll have the start of a methods paper introducing the software.

Update: 30 MAY 2017

Minor debugging

There are still a few minor bugs to work out, some of which I was able to take care of (see commit history if need be). I’m now trying to give the option to restrict the number of possible actions, but restricting them seems to still produce some errors – namely, the genetic algorithm for managers doesn’t seem to be responding appropriately.

Update: 29 MAY 2017

Landscape actions added to the user.c file

I have added the function act_on_landscape in user.c so that users can perform actions on the landscape. The only two actions that the users can do at the moment are killem and feedem, which effectively kill the crop yield and increase it, respectively. All other action columns do nothing. I’ve also added a new element to paras that modifies how much a user can increase crop yield (previously, I was allowing users to double crop production on a cell only). Testing confirms that when users value crop yield and can greatly increase it by feedem, they will find this option and do so to increase crop yield.

Resolution of Issue #21: paras now used everywhere

I have now cleaned the code so that paras is effectively used across all G-MSE functions. This effectively resolves Issue #21 and makes the code more readable. Likewise, I have also cleaned up the functions in a few places and introduced get_rand_int for easier sampling.

Next steps: Making it easier for users

The next steps as outlined on Friday are to do the following:

I don’t think that this will be too time-consuming because there is likely to be very little trouble-shooting and debugging for the above. Once all of this is done though, I will want to add the browser interface for G-MSE. This will be challenging, but the recently developed elementR package can provide some inspiration for working with shiny in a package that requires a lot of options to be set.

Update: 26 MAY 2017

Rewritten do_actions successful

I am largely satisfied with a rewrite of the do_actions function, which affects the way that users perform actions on resources by changing the rules to make actions simultaneous instead of sequential by user. Instead of having one user perform actions on resources, then another user perform actions, etc., the new do_actions program instead just grabs the ACTIONS array after the genetic algorithm is called for all users and randomly performs actions until no more actions exist. In other words, the order in which the actions of all users are performed is effectively randomised so that, for example, one user does not have an advantage of acting last and therefore moving all of their resources to a neighbour’s territory after their neighbour has performed all of their actions in a time step. This implementation is probably slightly less efficient, but probably not too much.
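The randomised ordering can be illustrated with a toy version in R. This is not the C implementation: do_actions_random is a hypothetical function, and the users-by-action matrix below is a simplified stand-in for the ACTIONS array.

```r
# Toy version of simultaneous actions: pool every user's planned actions,
# then carry them out one at a time in random order.
do_actions_random <- function(actions){
    # 'actions' is a users-by-action-type matrix of planned action counts
    counts <- as.vector(actions)
    pool   <- data.frame(
        user   = rep(as.vector(row(actions)), times = counts),
        action = rep(colnames(actions)[as.vector(col(actions))], times = counts)
    )
    pool[sample(nrow(pool)), , drop = FALSE]  # shuffled order of execution
}

# Two users with different planned actions (toy values)
planned <- matrix(c(2, 0, 1,
                    1, 3, 0), nrow = 2, byrow = TRUE,
                  dimnames = list(NULL, c("scaring", "culling", "castration")))
set.seed(26)
action_order <- do_actions_random(planned)
```

Each row of action_order is one action by one user; because the rows are shuffled, no user systematically acts first or last within a time step.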

Landscape actions are not implemented at the moment, and will need to be rewritten, though this should be considerably easier as there are fewer actions to perform and the actions occur directly on the landscape. The existing landscape_actions (still not deleted from user.c) might even be easy to edit. Once this is done, the whole model should be in place without any major issues; I’m not sure if Issue #26 is actually a bug, or just a consequence to be expected from a low-seeded genetic algorithm, but the algorithm works either way, and not having a seed is probably always a bad idea.

There are a few things that are definitely left to do.

One weird thing to address, which I actually don’t think is a bug: Sometimes when resource movement is very low (ca 1), the resources become highly autocorrelated on the landscape. I hypothesise that this is caused by some stake-holders doing a relatively poor job of killing resources at some point in the past and leading to a threshold of population growth that is localised and out of control. This happens often, but not consistently to the same agent, and sometimes to more than one agent in a simulation; it would be good to check to make sure that this is the correct interpretation of the patterns from the model.

One potential idea is to give the manager a bit more information: in addition to allowing them to see empirically measured growth rates of species, they could also see how the enactment of policy relates to how it was set. For example, if a manager sets a cost of 10 for killing, does that over or undershoot the target? The degree to which the target is over or undershot could be multiplied by the existing value to get a clearer prediction (e.g., if the manager wants 50 resources to die, but the way that those resources are distributed only allows 30 deaths because some resources are autocorrelated among different users’ land).
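
As a rough illustration of the correction described above, the sketch below scales a predicted number of actions by the ratio of realised to intended actions from the previous time step. The function names enactment_ratio and corrected_prediction are hypothetical and are not part of G-MSE; how the manager would actually store the realised counts is left open.

```c
/* Ratio of what actually happened to what the policy was meant to achieve.
 * Hypothetical helper: returns 1 (no correction) if nothing was intended. */
double enactment_ratio(double realised, double intended){
    if(intended <= 0){
        return 1.0;
    }
    return realised / intended;
}

/* Scale a new prediction by the over or undershoot seen previously */
double corrected_prediction(double predicted, double realised, double intended){
    return predicted * enactment_ratio(realised, intended);
}
```

With the numbers from the example (50 intended deaths, 30 realised), a later naive prediction of 50 deaths would be corrected to about 30.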

Update: 25 MAY 2017

Resolved Issue #22

I’ve finally tracked down the bug that causes multiple resource types to crash in the user function. The error was in land_to_counts, which had a condition in the main while loop that couldn’t be met and was unnecessary (might have to add one more at some point if we want multiple landscape layers to work – later though). I’ve removed this condition, and also initialised the COST and ACTION arrays without the extraneous rows caused by landscape levels being repeated for resource types; if nothing else, these were distracting, but I could see them causing bugs later. As of commit 102018fc0457e510f87e812a97681860bed1a382, G-MSE should be, in theory, functional with multiple resources, though I still have the rewrite of do_actions to do.

Update: 24 MAY 2017

Major test fails – rewrite of do_actions needed

The user function has a major bug that is causing strange things to happen to resources. Consistently, resources are piling up on one or another user’s land – I’ve found little rhyme or reason why, but it is caused at least partly by the location-specific nature of user actions (in other words, once u_loc = 0 and users can affect any resource on the landscape, no spatial pattern exists). Note that this happens even when users cannot move resources (e.g., only kill on their land), so it’s not just that the last agent to act clears all the resources from their landscape. The resources always seem to collect on one owner’s land, and it’s not consistent whose (nor is there any apparent connection between the spatial distribution and the agent actions).

This bug gives me an excuse to re-write do_actions, which I probably needed to do anyway because Issue #22 is still unresolved. A rewrite of do_actions and everything downstream might fix the resource type specification error. As of now the do_actions function is called for each agent sequentially, and each agent then performs their actions on each type of resource and landscape level by moving through rows of the ACTION array (with an error if there is more than one type of resource). Hence, one user does all of their business, then another, and so on.

It would be much better to do this all simultaneously, and it shouldn’t take too much computation time or coding time. Instead of going through agents sequentially, the idea is to copy the entire ACTION array (all agents having gone through their genetic algorithms) into a function. Next, calculate the total number of actions to be performed. Then, sample a random row, column, and layer of ACTION, which will be associated with a randomly selected agent. The lucky winner will then randomly sample rows of the RESOURCE array until they find one that they can affect (e.g., is on their land, has not been killed, and is of the correct type); if they exhaustively search all resources but cannot find one to affect, then they don’t perform the action – note the element in the copied ACTION array should not necessarily be set to zero because another agent might subsequently kick a resource onto their land to kill; it should decrement the action by one though (else a clear risk of infinite loop). Landscape actions can proceed the same way; the random selection simulates people doing things simultaneously over the course of a season.
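
The sampling scheme can be sketched as follows. This is only an illustration under simplifying assumptions: the copied ACTION array is flattened into a hypothetical vector of remaining action counts, elements are drawn with probability proportional to the actions left in them (one of several ways to randomise the order), and the search through RESOURCE for something the agent can affect is left out. None of the function names below come from G-MSE.

```c
#include <stdlib.h>

/* Total number of actions still waiting to be performed */
int total_actions(const int *actions, int n){
    int i, total = 0;
    for(i = 0; i < n; i++){
        total += actions[i];
    }
    return total;
}

/* Draw one remaining action at random and decrement its element by one.
 * The decrement happens whether or not the action could be carried out,
 * which guarantees that the loop in run_all_actions terminates.
 * Returns the index drawn, or -1 if no actions remain. */
int draw_action(int *actions, int n){
    int i, pick, total = total_actions(actions, n);
    if(total <= 0){
        return -1;
    }
    pick = rand() % total;
    for(i = 0; i < n; i++){
        if(pick < actions[i]){
            actions[i]--; /* Used up even if no resource was found */
            return i;
        }
        pick -= actions[i];
    }
    return -1;
}

/* Keep drawing until every action is used; returns the number of draws */
int run_all_actions(int *actions, int n){
    int draws = 0;
    while(draw_action(actions, n) >= 0){
        /* A full version would search RESOURCE for a valid target here */
        draws++;
    }
    return draws;
}
```

Because each draw removes exactly one action, the number of draws equals the initial total, so an agent that cannot find a resource to affect simply loses that action rather than stalling the loop.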

Update: 23 MAY 2017

Introduce Issue #26: Genetic algorithm seed reliance

For some reason, the initial seed of the genetic algorithm appears to be having an effect that it shouldn’t. When there are no individuals seeded in the genetic algorithm from the previous generation, the agents appear to go under-budget. It’s not clear why this is the case. Oddly, managers appear to use a budget of 250 despite it being set at 300 given any seed greater than zero. When the seed is zero, the budget for setting costs drops to ca 100 for reasons that are not at all clear to me. For stake-holders, the cost drops to a fraction of its set budget (about a 30th of it). Yet, the stake-holder cost is still too low even when a seed of 20 is set; most stake-holders spend ca 1/6 of their budget when they should be forced to spend all of it.

Stake-holders are helping resources when they should not

Stake-holders are helping resources. This was caused by some issues in resource_actions in the user model, which have now been partially resolved in commit f1ce95e092739e6e53df05b326c491d917679eb9. Essentially, resources were being helped out too much (i.e., growth rates went from 0.05 to 1 when helping – changed now to 0.05 times two – increasing birth rate 100 percent), and sometimes being helped out even after having been killed or castrated. This is still happening, as is evident when looking at RESOURCE_REC. Resolving it is priority one.

Update: 22 MAY 2017

Cleanup and toward resolving Issue #21

I have done some more clean-up of the manager.c file, mainly reducing the number of arguments passed to functions using the paras vector. I’ve also removed some more hard-coded values, particularly by defining columns for things like resource types by holding column numbers in paras. I’m not sure whether I want to do this for the action and cost array cols 7-12 in set_action_costs yet. It might be a good idea. One thing to keep an eye on is the para[66] value, which now is just the number of resources (also 1 minus the lookup table rows). It holds together for now, and nicely can be affected globally, but I need to pay attention to how it’s affecting management in the set_action_costs function.

Update: 18 MAY 2017

Note on managing-observing trade-off

We could introduce a trade-off between observation and allocating costs for the manager in G-MSE, as in Milner-Gulland (2011). Running this through the genetic algorithm could be a challenge – somehow the observation intensity would need to be put into the fitness function. Storing it would be fairly trivial – could just use bankem, but converting observation time to manager fitness would require some more thought.

Introduce Issue #25: Agent’s action error

For some reason, some initial testing seemed to suggest that resource population growth increases with the number of stake-holders, even if those stake-holders are hostile to the resource. Some further testing confirmed that stake-holders don’t engage in actions when there are more than two of them – it’s possible that I hard-coded something during testing, but it needs to be fixed. For now, I’m shifting the default testing options to 3 stake-holders to isolate the issue.

Resolve Issue #25: Agent’s action error

That was quick. What happened was an issue with the COST and ACTION arrays – I basically had the code to initialise three but not four stake-holders accurately. When a fourth was initialised (or nine, in the case of one test), the stake-holder did nothing because it was devoting itself to costly nonsense actions from the start and couldn’t get out of them. Resources then did better because there were fewer agents able to affect them (those agents owning a smaller amount of land). When this is resolved, a fourth stake-holder performs the expected actions and the population dynamics oscillate even more as a result because more total actions are being performed (and on more land, as I’ve set it).

More progress toward resolving Issue #21

I’ve now reduced the number of arguments and hard-coded values in the functions of observation.c, leaving only the transect and sample_fixed_res functions to go. Overall, I do think that this makes the code more readable, and everything goes back to the paras vector, which will be useful later in input and output during software testing and use.

Update: 17 MAY 2017

Progress toward resolving Issue #21

The functions in resource.c now take the paras vector as an argument where practical (most of the time). This cleans the code up quite a bit and has the nice side effect of giving me an excuse to also remove some of the hard coded values (even if they don’t change, this is probably a good idea).

Update: 16 MAY 2017

Concrete plans for cleaning up the code

Now that the main engine of G-MSE is in place, there are a few things that I want to do in the next week or two to clean up the code.

Following these things, it would also be helpful to do the following.

Update: 15 MAY 2017

Resolved Issue #24 Resources retain helpem and feedem

Issue #24 appears to be resolved, although it was a bit trickier than anticipated to do so. I created three additional columns in the RESOURCE array to store the change in the baseline values of birthrate, death probability, and offspring number. As far as I can tell, there is no longer any carryover in these demographic values, nor do parents pass on their adjusted values to their offspring. Fixing this required several changes to user.c and resource.c. As a consequence, death rate caused by killing is now completely independent of carrying capacity (as seems sensible). Another thing to decide is if increases in birth rate or offspring number caused by user actions should also be independent of carrying capacity; that is, when users helpem or feedem, are they increasing the carrying capacity itself, or just the population growth rate to carrying capacity (as of now, it’s the latter).
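
To make the idea of separate adjustment columns concrete, here is a minimal sketch. It assumes a hypothetical struct in place of a row of the RESOURCE array, and the field and function names are invented for illustration only; the real code stores these values as array columns.

```c
/* Hypothetical stand-in for one row of the RESOURCE array */
typedef struct {
    double base_birth; /* Baseline birth rate                     */
    double base_death; /* Baseline death probability              */
    double adj_birth;  /* Temporary change caused by user actions */
    double adj_death;  /* Temporary change caused by user actions */
} resource;

/* The realised rate in a time step is the baseline plus the adjustment */
double realised_birth(const resource *r){
    return r->base_birth + r->adj_birth;
}

/* Called at the end of each time step so that nothing carries over */
void reset_adjustments(resource *r){
    r->adj_birth = 0.0;
    r->adj_death = 0.0;
}

/* Offspring copy the baseline values but never the adjusted ones */
resource make_offspring(const resource *parent){
    resource child = *parent;
    reset_adjustments(&child);
    return child;
}
```

Keeping the baseline untouched means that helping or feeding a resource only changes its demography for the time step in which the action occurs.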

A working example – but still some debugging to do

After resolving Issue #24, some initial testing shows that the model appears to be working as intended (more testing is obviously needed). The below shows a scenario in which one resource has a small effect on crop yield. The upper left panel shows resources on the landscape. The upper right panel shows land ownership (the blue is public – manager owned – land). The middle left shows population abundance (black) and its estimate (blue); carrying capacity is 400 (red dashed line), but the manager is trying to keep the population around 200 (dashed blue line) – mean percent yield of the crop is shown in the orange line. The mid right panel shows yields of each plot. The lower left panel shows the changing policy set by the manager – red lines show the cost of stakeholders killing or castrating resources (very high values effectively prohibit it). Green shows the cost of moving (scaring) resources off the stake-holder’s land, and blue shows the cost of helping the resource (increasing its birthrate or offspring production). The lower right shows what stake-holders do in response to policy – colours show actions corresponding to the same colour costs in the lower left panel.

Output when manager responds to observation model and sets policy, which is then followed by stake-holders


So in the above example, we have the manager effectively prohibiting killing or castrating resources until about generation 18, when the population gets higher than desired. At this point, the manager switches to allow killing and castrating, and makes moving and helping resources more costly – stake-holders respond to this by doing a bit more killing and castrating, and the population goes down in response.

Looking good, but still need to clean the code

The above example is encouraging, but there is still quite a bit of clean-up to do. More unit testing is necessary to make sure that all resources are doing what they should, and I think the interaction between resources and landscape could be made a bit better in the resource model. Also, setting the initial costs and utilities is quite messy – I need to fix this up a bit so that there is at least one easy place to do this in the code, then an easy way to do it as an argument in the gmse function. It would also be nice not to have managers or stake-holders be quite so short-sighted – but having decisions be made based on history will require quite a bit more work, though the structure is there for it to be done in the code.

Update: 12 MAY 2017

One more debugging in the genetic algorithm

A bug in the code was causing managers to set their marginal fitnesses to zero within the genetic algorithm. The reason for this was that the functions crossover and mutation allowed for util, u_loc, and u_land columns to be changed when the zero column of the action array was positive – i.e., when the actions corresponded to affecting other agents’ utilities in some way. The reason for this coding is so that agents can potentially affect one another’s utilities (e.g., a stake-holder lobbying the manager), but it does not make sense for stake-holders to affect their own utilities. The bug was caused because when the manager mutated (or crossed over) to change their own utilities in some way, the high cost recognised this as over-budget and set the value to zero, hence replacing the marginal utility set in the manager model. This was easily fixed by not allowing an agent to affect its own utility values (i.e., disallow the utility columns to be changed when the first column of the ACTION array equals the agent’s own ID). This would have caused an issue later anyway, so it’s better to spot it now. Re-running the model, the bug is fixed and the manager marginal utilities are retained in the appropriate row of ACTION (see commit 4dacbe83ed1be0d1216b692a1db18f5323ed22f2).
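
The rule behind the fix can be written as a small guard. This is a sketch of the condition only, with a hypothetical function name; the actual check lives inside crossover and mutation and may be structured differently.

```c
/* Returns 1 if the util, u_loc, and u_land columns of a row may be changed
 * by crossover or mutation, and 0 otherwise.
 *     row_agent: The value in the first column of the ACTION row
 *     agentID:   The ID of the agent whose strategy is being evolved
 * Hypothetical helper, not a function in game.c. */
int can_change_utils(double row_agent, int agentID){
    if(row_agent <= 0){
        return 0; /* Row affects resources or landscape, not an agent */
    }
    if(row_agent == agentID){
        return 0; /* An agent may not alter its own utilities */
    }
    return 1;     /* Row targets another agent (e.g., lobbying) */
}
```

For a manager with ID 1, a row whose first column is 1 is therefore protected, while a row targeting agent 2 can still be mutated.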

Another thing that needs de-bugging

For some reason, managers are going way overbudget in allocating actions. Fortunately, they’re at least allocating their actions well, but I need to find out why their budget looks more like 500 when I set it to 100. Note that this only happens when managers want more of a resource, not fewer. Perhaps the marginal utility is getting added into the budget? Yes, this appears to be the case and has been fixed with commit d60312da590630fc2a680a57b8daed8e6d6bfafd, and now the costs no longer go over the manager’s budget.

Valgrind summary

Some initial testing revealed that some memory might have been poorly allocated; allocating space for an int instead of a double in the genetic algorithm was flagged by valgrind. After fixing this (see commit 57d0c78de7e421687870749549d309cf85d31dab), valgrind returns no errors or leaks.

==8048== 
==8048== HEAP SUMMARY:
==8048==     in use at exit: 194,900,716 bytes in 19,004 blocks
==8048==   total heap usage: 12,387,437 allocs, 12,368,433 frees, 2,482,975,628 bytes allocated
==8048== 
==8048== LEAK SUMMARY:
==8048==    definitely lost: 0 bytes in 0 blocks
==8048==    indirectly lost: 0 bytes in 0 blocks
==8048==      possibly lost: 0 bytes in 0 blocks
==8048==    still reachable: 194,900,716 bytes in 19,004 blocks
==8048==         suppressed: 0 bytes in 0 blocks
==8048== Reachable blocks (those to which a pointer was found) are not shown.
==8048== To see them, rerun with: --leak-check=full --show-leak-kinds=all
==8048== 
==8048== For counts of detected and suppressed errors, rerun with: -v
==8048== ERROR SUMMARY: 0 errors from 0 contexts (suppressed: 0 from 0)
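
For reference, one way to avoid the int versus double allocation mistake is to size the allocation from the pointer itself rather than from a named type. The helper below is a hypothetical sketch and is not taken from the G-MSE source.

```c
#include <stdlib.h>

/* Allocate and zero a vector of n doubles. Using sizeof *vec ties the element
 * size to the pointer type, so it cannot drift out of step with the
 * declaration the way sizeof(int) can. Returns NULL if allocation fails. */
double *alloc_doubles(int n){
    int i;
    double *vec;

    if(n <= 0){
        return NULL;
    }
    vec = malloc((size_t) n * sizeof *vec);
    if(vec == NULL){
        return NULL;
    }
    for(i = 0; i < n; i++){
        vec[i] = 0.0;
    }
    return vec;
}
```

The caller remains responsible for calling free on the returned pointer.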

I dare say that this might nearly be an alpha version of the software. I just need to get some more clever ways to input array values, and make sure that I resolve Issue #22.

Issue #24 Resources retain helpem and feedem

Resources are retaining their values of helpem and feedem after being helped for one generation. Worse, they are passing their inherited characteristics on to their offspring. This needs to be changed so that agent actions have the temporary effect of increasing offspring survival probability or reproduction – else populations will never run the risk of crashing.

Update: 11 MAY 2017

Some success with manager fitness function debugging – more testing needed

After much time working on debugging the manager fitness function, I believe that all of the bugs are worked out of it, and that the managers are now responding dynamically to agent actions and resource abundances. I now need to test the whole function in multiple different ways to confirm this, and to make sure that the manager sets policy as predicted for some very simple scenarios.

One potential issue I’ve already noticed – if managers make stake-holder actions so costly that they never perform them, then the manager might operate under the assumption that they will never perform the action even if costs drop. It might therefore be necessary to add an increment to the total actions (e.g., add 10 to each, just to give managers the ability to consider the possibility) or somehow have managers tie predicted actions to stake-holder utilities (I don’t like this as much – too speculative and computationally intense).
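
A sketch of the first option is below. It assumes that predicted actions scale with the ratio of old to new cost, and it adds a fixed increment before scaling; the function name predict_actions and its increment argument are hypothetical.

```c
/* Predict how many actions stake-holders will take after a cost change.
 *     old_actions: Actions observed in the previous time step
 *     old_cost:    Cost of the action in the previous time step
 *     new_cost:    Cost being considered by the manager (must be > 0)
 *     increment:   Baseline added so zero observed actions can still respond
 * Hypothetical helper for illustration only. */
double predict_actions(double old_actions, double old_cost, double new_cost,
                       double increment){
    return (old_actions + increment) * (old_cost / new_cost);
}
```

Without the increment, zero observed actions always predict zero future actions, however far the cost drops; with an increment of 10, cutting a cost from 100 to 10 predicts 100 actions.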

Update: 10 MAY 2017

Debugging the manager fitness function in the genetic algorithm

Today I have spent my time attempting to completely debug the newly created manager_fitness function and its sub-functions. Unfortunately, one bug still appears to remain. For some reason, the function adds actions to the POPULATION array in the first row. This issue has been isolated, and I’m almost sure that it is caused by something in manager_fitness. Tomorrow, the goal is to fix this so that actions are applied correctly where the row’s first column is 1 (the manager agentID).

Update: 9 MAY 2017

New Issue #23: Revise predicted consequences of user and manager actions

In the genetic algorithm functions res_to_counts and policy_to_counts, the projected consequences of actions need to be fine-tuned. As of now, res_to_counts predicts one fewer resource from movem, killem, and castem, and one more resource from feedem and helpem. In policy_to_counts, the prediction is one fewer resource for killem and one more resource for feedem and helpem. Really, there should probably at least be an option to use more precise estimates of what will happen. For the user function, this matters a bit less because stake-holders typically just want more or less of a resource. Managers, however, are trying to hit a middle ground a lot of the time; it is also more reasonable to assume that they have demographic information on the resources of interest.

More writing and re-writing the manager genetic algorithm

I have completed an initial draft of the manager fitness function manager_fitness and its associated sub-functions policy_to_counts and sum_array_layers. The function manager_fitness might need to be pruned a bit by adding a third sub-function, as it’s a bit long at the moment.

/* =============================================================================
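 * This is a preliminary function that checks the fitness of each agent in the
 * manager's genetic algorithm population by predicting how new costs change
 * stake-holder actions and, through jaco, resource counts
 *     fitnesses: Array to order fitnesses of the agents in the population
 *     population: array of the population that is made (malloc needed earlier)
 *     pop_size: The size of the total population (layers to population)
 *     ROWS: Number of rows in the COST and ACTION arrays
 *     agent_array: The agent array
 *     jaco: The jacobian matrix of resource and landscape interactions
 *     interact_table: Lookup table for figuring out rows of jaco and types
 *     interest_num: The number of rows and cols in jaco, and rows in lookup
 *     agentID: The ID of the agent (manager) running the genetic algorithm
 *     COST: The 3D array of costs of performing actions
 *     ACTION: The 3D array of actions performed by agents
 *     COLS: Number of columns in the COST and ACTION arrays
 *     layers: Number of layers in the COST and ACTION arrays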
 * ========================================================================== */
void manager_fitness(double *fitnesses, double ***population, int pop_size, 
                     int ROWS, double **agent_array, double **jaco,
                     int **interact_table, int interest_num, int agentID,
                     double ***COST, double ***ACTION, int COLS, int layers){
    
    int agent, i, row, act_type, action_row, manager_row, type1, type2, type3;
    double agent_fitness, *count_change, foc_effect, change_dev, max_dev;
    double movem, castem, killem, feedem, helpem, *dev_from_util;
    double utility, *utils, **merged_acts, **merged_costs, **act_change;
    
    count_change  = malloc(interest_num * sizeof(double));
    utils         = malloc(interest_num * sizeof(double));
    dev_from_util = malloc(interest_num * sizeof(double));
    merged_acts   = malloc(ROWS * sizeof(double *));
    for(i = 0; i < ROWS; i++){
        merged_acts[i] = malloc(COLS * sizeof(double));
    }
    merged_costs = malloc(ROWS * sizeof(double *));
    for(i = 0; i < ROWS; i++){
        merged_costs[i] = malloc(COLS * sizeof(double));
    }
    act_change = malloc(ROWS * sizeof(double *));
    for(i = 0; i < ROWS; i++){
        act_change[i] = malloc(COLS * sizeof(double));
    }
    
    sum_array_layers(ACTION, merged_acts, 0, ROWS, COLS, layers);
    sum_array_layers(COST,  merged_costs, 1, ROWS, COLS, layers);
    
    max_dev = 0;
    for(agent = 0; agent < pop_size; agent++){
        change_dev = 0.0; /* Reset the summed squared deviation per agent */
        for(i = 0; i < interest_num; i++){
            count_change[i] = 0; /* Initialise at zero */
            utils[i]        = 0; /* Same for utilities */
        }
        for(action_row = 0; action_row < interest_num; action_row++){
            manager_row = 0;
            if(population[action_row][0][agent] < -1){
                type1 = population[action_row][1][agent];
                type2 = population[action_row][2][agent];
                type3 = population[action_row][3][agent];
                /* Find the manager's row with the same resource types */
                while(manager_row < ROWS - 1 &&
                      !(population[manager_row][0][agent] == agentID &&
                        population[manager_row][1][agent] == type1   &&
                        population[manager_row][2][agent] == type2   &&
                        population[manager_row][3][agent] == type3)
                ){
                    manager_row++;
                }
            }
            policy_to_counts(population, merged_acts, agent, merged_costs, 
                             act_change, action_row, manager_row, COLS);
            foc_effect  = 0.0;
            foc_effect -= act_change[action_row][9];  /* See Issue #23 */
            foc_effect += act_change[action_row][10]; 
            foc_effect += act_change[action_row][11]; 
            for(i = 0; i < interest_num; i++){
                count_change[i] += foc_effect * jaco[action_row][i];
            }
            utils[action_row] = population[manager_row][4][agent];
        }
        for(i = 0; i < interest_num; i++){ /* Minimises dev from marg util*/
            change_dev += (count_change[i]-utils[i])*(count_change[i]-utils[i]);
        } 
        if(change_dev > max_dev){
            max_dev = change_dev;
        }
        dev_from_util[agent] = change_dev;
    }
    
    for(agent = 0; agent < pop_size; agent++){
        fitnesses[agent] = max_dev - dev_from_util[agent];
    }
    
    for(i = 0; i < ROWS; i++){
        free(act_change[i]);
    }
    free(act_change);
    for(i = 0; i < ROWS; i++){
        free(merged_costs[i]);
    }
    free(merged_costs);
    for(i = 0; i < ROWS; i++){
        free(merged_acts[i]);
    }
    free(merged_acts);
    free(dev_from_util);
    free(utils);
    free(count_change);
}

The policy_to_counts function feeds new actions back to the main manager fitness function based on the new costs imposed by managers. We assume that new actions are proportional to the percent increase or reduction in costs (e.g., twice as many killem actions if the manager makes it cost half as much). In cases where the cost drops to zero (debating whether I want this to be possible – probably not), we assume the new cost is 0.5 and calculate accordingly.

/* =============================================================================
 * This function updates a temporary action array for changes in policy
 *     population: The population array of agents in the genetic algorithm
 *     merged_acts: The action 2D array of summed elements across 3D ACTION
 *     agent: The agent (layer) in the population being simulated
 *     merged_costs: The mean cost paid for each element in the ACTION array
 *     act_change: The array of predicted new actions given new costs
 *     action_row: The row where the action and old costs are located
 *     manager_row: The row where the new costs from the manager are located
 *     COLS: The number of columns in the ACTION and COST arrays
 * ========================================================================== */
void policy_to_counts(double ***population, double **merged_acts, int agent,
                      double **merged_costs, double **act_change, 
                      int action_row, int manager_row, int COLS){
    
    int col;
    double old_cost, new_cost, cost_change, new_action;
    
    for(col = 0; col < COLS; col++){
        old_cost    = merged_costs[action_row][col];
        new_cost    = population[manager_row][col][agent];
        if(new_cost == 0){
            new_cost = 0.5; /* Need to avoid Inf increase in cost somehow */
        }
        cost_change = old_cost / new_cost;
        new_action  = merged_acts[action_row][col] * cost_change;
        act_change[action_row][col] = floor(new_action);
    }
}

The function sum_array_layers is basically an apply function in R, except that it only works with the COST or ACTION arrays, and only in one dimension.

/* =============================================================================
 * This function sums (or averages) a row of COST or ACTION across all layers
 *    array: The 3D array that is meant to be summed or averaged
 *    out: The 2D array where the summed/average values are to be stored
 *    get_mean: TRUE (1) or FALSE (0) indiciating whether to get mean vs sum
 *    ROWS: Number of rows in array
 *    COLS: Number of cols in array
 *    layers: How many layers there are in array (depth)
 * ========================================================================== */
void sum_array_layers(double ***array, double **out, int get_mean, int ROWS,
                      int COLS, int layers){
    
    int row, col, layer;

    for(row = 0; row < ROWS; row++){
        for(col = 0; col < COLS; col++){
            out[row][col] = 0; /* Zero before accumulating over layers */
            if(get_mean == 1){
                for(layer = 0; layer < layers; layer++){
                    out[row][col] += (array[row][col][layer] / layers);
                }
            }else{
                for(layer = 0; layer < layers; layer++){
                    out[row][col] += array[row][col][layer];
                }                
            }
        }
    }
}

I have not tested any of these functions at all. They almost certainly contain some bugs at the moment, so a lot of work is going to be needed to debug them and make sure that they are actually doing what I want them to do. Tomorrow might be a good time for a thorough debugging and memory leak checks. If all this works though, managers should be able to dynamically change costs in response to stake-holders to manage resources – once the appropriate call from manager.c is in place (it hasn’t been coded yet, but this should be trivial to write). Note that the git history immediately prior to commit 79446e394133bb9e6b4792d334ab863e32ef0881 will show some attempts at getting the above functions working in different ways. I settled on the above after restructuring the code considerably for both speed and readability.

Update: 8 MAY 2017

Linking manager marginal utilities and manager actions remains difficult, but I have decided on the following plan to move forward.

It will be useful to develop a very simple criterion for assessing the fitness of adjusting costs in strategy_fitness in game.c. Do this in the switch function where case 1:, but include an if statement to make sure that if(act_type == agentID), then the genetic algorithm knows that it’s affecting all other user actions in the -2 row. A new function policy_to_counts will be created in game.c which takes in the ACTION and COST arrays. This new function will assume two things.

  1. The proportion of +, -, and 0 actions (from the perspective of a stake-holder) will not change – i.e., stake-holders will try to achieve the same ends in the next time step as they did in the previous time step. The movem column will be defined as - if util_loc = 1 and util_land = 1, else it will be defined as 0 (again, from the stake-holder perspective; from the manager’s perspective this is always 0 – at least, I can’t think of any reason why we would want it not to be zero).

  2. Stake-holders will invest in whatever +, -, or 0 action is least costly. Hence to loosely predict stake-holder actions, the manager could simply assume that the stake-holder invests a proportion of their total budget in the least costly action, and based on the manager’s set cost, puts their budget into those actions accordingly. Hence, if we had a farmer that wanted to increase crop yield by reducing resource abundance, and had to choose between movem, killem, and castem with costs of 10, 2, and 5, respectively, then the farmer would put all of their budget into killem (or a high proportion, at least). This requires the manager (whose actions are already set within the genetic algorithm) to get a proportion for each of the stake-holders’ actions, then divvy their actions out based on the revised costs.
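
The second assumption can be sketched in a few lines. The helpers below are hypothetical and take the extreme case in which the whole budget goes to the cheapest action; a version that only allocates a high proportion of the budget would need an extra parameter.

```c
/* Index of the least costly action in a vector of costs (first one if tied) */
int cheapest_action(const double *costs, int n){
    int i, best = 0;
    for(i = 1; i < n; i++){
        if(costs[i] < costs[best]){
            best = i;
        }
    }
    return best;
}

/* Whole number of actions affordable at a given cost (cost must be > 0) */
int affordable_actions(double budget, double cost){
    return (int) (budget / cost);
}
```

For the farmer above, costs of 10, 2, and 5 for movem, killem, and castem make killem (index 1) the cheapest, so a budget of 100 would buy 50 killem actions.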

The initial plan: getting something to work

Let’s try all of the above again. We’re trying to get from cost adjustments to fitness. We have the cost adjustments in hand; the manager population in the genetic algorithm is in the process of selecting which of these adjustments are best. The difficulty is now translating the cost adjustments to stake-holder actions, and figuring out just how good we want managers to be at assessing stake-holder actions. One extreme is to run the genetic algorithm for each stake-holder within the manager’s decision making to figure out how stake-holders will respond to policy change with a high degree of accuracy; this would take a massive amount of computation time and be a bit unrealistic in that it would kind of assume that managers can read the minds of stake-holders.

Another extreme is to assume the sum total of each action will not change and to adjust costs accordingly. Perhaps, to start, we could define a new array within the new function policy_to_counts, **sum_actions, which would sum up all stake-holder actions for each resource type.

agent  type1  type2  type3  util  u_loc  u_land  movem  castem  killem  feedem  helpem  bankem
   -2      1      0      0    10     10      10     32       2      16       0       0       0
   -2      2      0      0   301     10      10      4       1       0       0       1       1

The hypothetical sum_actions array above adds up all of the rows in the ACTION array where column 1 equals -2. For each resource we then get a picture of what is going to happen in the next generation (to some extent it is unrealistic to assume that managers have even this much detailed information, but then again, these are actions from the previous time step, so it’s perhaps not too much of a stretch to assume that the manager has some idea of what actions were taken by stake-holders). We also get a picture of the sum utilities for each resource type. To project the consequences of manager cost adjustment, managers can compute the proportional change in cost (which will require reading the COST array into the fitness function) and assume that the number of actions changes accordingly. For example, if the manager makes it half as costly to killem for resource type1 = 1 above, then they could assume that killem will double from 16 to 32 total actions in the next generation. These sum total actions, adjusted by manager changes in costs, could then be run through the interaction array to project the change in resource abundance – fitness could be assessed by minimising the difference between the projected change in resources and the marginal utilities.
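
A minimal sketch of this calculation follows. It assumes a 2D array with the 13 columns shown in the table above (agent in column 0, type1 in column 1, killem in column 9) rather than the 3D ACTION array, and the function names are placeholders rather than code from game.c.

```c
#define N_COLS 13 /* agent, type1-3, util, u_loc, u_land, and six actions */
#define KILLEM 9  /* Column of killem in the layout shown above           */

/* Sum utilities and actions over all stake-holder rows (agent column = -2)
 * for one resource type, storing the result in out (length N_COLS) */
void sum_actions(double action[][N_COLS], int rows, int type1, double *out){
    int row, col;
    for(col = 0; col < N_COLS; col++){
        out[col] = 0.0;
    }
    out[0] = -2;
    out[1] = type1;
    for(row = 0; row < rows; row++){
        if(action[row][0] == -2 && action[row][1] == type1){
            for(col = 4; col < N_COLS; col++){
                out[col] += action[row][col];
            }
        }
    }
}

/* Assume actions change in proportion to the change in cost */
double rescale_action(double old_actions, double old_cost, double new_cost){
    return old_actions * (old_cost / new_cost);
}

/* Sum of squared differences between projected changes and marginal utils */
double squared_dev(const double *projected, const double *utils, int n){
    int i;
    double dev = 0.0;
    for(i = 0; i < n; i++){
        dev += (projected[i] - utils[i]) * (projected[i] - utils[i]);
    }
    return dev;
}
```

With 16 summed killem actions and a cost halved from 10 to 5, rescale_action returns 32, matching the example in the text; a candidate policy with a smaller squared_dev would be treated as fitter.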

This isn’t perfect prediction. Sometimes stake-holders will probably radically change behaviour after some cost threshold is met, but I think this is kind of okay (at the very least, managers will respond in the next generation).
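
A minimal sketch of the proportional assumption above, written as a stand-alone C function. The name project_actions and its arguments are hypothetical and not part of GMSE; the rule that a non-positive new cost leaves actions unchanged is also an assumption, made here only to avoid dividing by zero.

```c
/* Hypothetical helper (not in GMSE): projects the total number of actions of
 * one type after a cost change, assuming that actions scale in inverse
 * proportion to cost (i.e., by old_cost / new_cost).
 *     sum_actions: Total actions of this type in the last time step
 *     old_cost:    Cost of the action before the manager's adjustment
 *     new_cost:    Cost of the action after the manager's adjustment
 */
int project_actions(double sum_actions, double old_cost, double new_cost){

    double projected;

    if(new_cost <= 0){ /* Assumption: no projection without a positive cost */
        return (int) sum_actions;
    }

    projected = sum_actions * (old_cost / new_cost);

    return (int) projected; /* Truncation equals floor for counts >= 0 */
}
```

With the killem total of 16 from the sum_actions example, halving the cost (e.g., from 2 to 1) projects 32 actions, while doubling it projects 8.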

Other ideas

I will start coding the above plan, but there are probably other reasonable options to consider. I would like to also add the option of enacting policy via a second resource – representing resources as something like hunting licenses. The effects of these licenses could be understood through the interaction matrix (essentially, they’d be like introducing a predator, but one that goes through stake-holders). The manager could set the number of hunting licenses using the feedem (increases birth rate by action number) and castem (causes one fewer, so that a resource doesn’t reproduce) columns (licenses would otherwise have a birth rate and death rate of one, so each replaces itself in the next generation) – birth type would also need to be changed to not be selected from a random Poisson. The bankem column could be interpreted as buying a license, somehow.

Implementation of the initial plan

This was somewhat difficult because of the way that marginal utilities are handled in the manager.c file. A new vector needed to grab the correct utilities and actions for adjusting costs and it was easier and more readable to just write a separate manager_fitness function (it can still be called by non-managers, though I’m struggling to think of when this would be desirable). The manager_fitness function is unfinished.

/* =============================================================================
 * This is a preliminary function that checks the fitness of each agent by 
 * passing through a loop to payoffs_to_fitness
 *     fitnesses: Array to order fitnesses of the agents in the population
 *     population: array of the population that is made (malloc needed earlier)
 *     pop_size: The size of the total population (layers to population)
 *     ROWS: Number of rows in the COST and ACTION arrays
 *     agent_array: The agent array
 *     jaco: The jacobian matrix of resource and landscape interactions
 *     interact_table: Lookup table for figuring out rows of jaco and types
 *     interest_num: The number of rows and cols in jac, and rows in lookup
 * ========================================================================== */
void manager_fitness(double *fitnesses, double ***population, int pop_size, 
                     int ROWS, double **agent_array, double **jaco,
                     int **interact_table, int interest_num, int agentID){
    
    int agent, i, row, act_type, action_row, manager_row, type1, type2, type3;
    double agent_fitness, *count_change, foc_effect, change_dev;
    double movem, castem, killem, feedem, helpem;
    double utility, *utilities;
    
    count_change = malloc(interest_num * sizeof(double));
    utilities    = malloc(interest_num * sizeof(double));
    
    for(agent = 0; agent < pop_size; agent++){
        for(i = 0; i < interest_num; i++){
            count_change[i] = 0; /* Initialise all count changes at zero */
            utilities[i]    = 0; /* Same for utilities */
        }
        action_row  = 0;
        while(action_row < ROWS && population[action_row][0][agent] < -1){
            type1 = population[action_row][1][agent];
            type2 = population[action_row][2][agent];
            type3 = population[action_row][3][agent];
            manager_row = 0; /* Find the manager's row for the same types */
            while(manager_row < ROWS &&
                  (population[manager_row][0][agent] != agentID ||
                   population[manager_row][1][agent] != type1   ||
                   population[manager_row][2][agent] != type2   ||
                   population[manager_row][3][agent] != type3)
            ){
                manager_row++;
            }
            action_row++; /* Move on to the next resource row */
        }
        
        /* Get the marginal utilities into utilities by running policy_to_counts
         * and get the count_change the same way. The above runs through this
         * for each agent and for each resource. Here, still within the agent
         * loop, we need to get the vectors summed appropriately to a reasonable
         * fitness metric (keeping in mind that it's not just ordinal).
         */
        
        
        fitnesses[agent] = 0;
        for(i = 0; i < interest_num; i++){ /* Minimises dev from marg util*/
            change_dev       =  (count_change[i] - utilities[i]) * 
                                (count_change[i] - utilities[i]) + 1;
            fitnesses[agent] += (1 / change_dev);
        }
    }
    free(utilities);
    free(count_change);
}
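
The fitness sum at the end of manager_fitness can be read in isolation as the sketch below. The function name inverse_sq_fitness is hypothetical; the arithmetic is the same as in the loop above. Each resource contributes a value between 0 and 1, with 1 meaning that the projected count change exactly matches the marginal utility, so the maximum possible fitness equals interest_num.

```c
/* Hypothetical stand-alone version of the fitness sum in manager_fitness.
 *     count_change: Projected change in abundance for each resource
 *     utilities:    Marginal utility (desired change) for each resource
 *     interest_num: Number of elements in the two vectors above
 */
double inverse_sq_fitness(const double *count_change, const double *utilities,
                          int interest_num){

    int i;
    double dev, fitness;

    fitness = 0;
    for(i = 0; i < interest_num; i++){
        dev      = count_change[i] - utilities[i]; /* Deviation from target */
        fitness += 1 / (dev * dev + 1);            /* Equals 1 when dev = 0 */
    }

    return fitness;
}
```

For example, a projected change of -330 against a marginal utility of -330 contributes 1, while a projected change of 10 against a marginal utility of 12 contributes 1 / (4 + 1) = 0.2.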

Likewise, a sub-function that manager_fitness will call also needs some work.

/* =============================================================================
 * This function updates count change and utility arrays for changes in policy
 *     population: The population array of agents in the genetic algorithm
 *     interact_table: The lookup table for figuring out how resources interact
 *     int_num: The number of rows and cols in jac, and rows in the lookup
 *     utilities: A vector of the utilities of each resource/landscape level
 *     agent: The agent in the population whose fitness is being assessed
 *     layers: The number of layers (z dimension) in the COST and ACTION arrays
 *     COST: The cost array, for comparison with how costs change with actions
 *     ACTION: The action array to summarise current stake-holder actions
 *     agentID: The ID of the agent doing policy (should probably always be 1)
 * ========================================================================== */
void policy_to_counts(double ***population, int **interact_table, int int_num,
                      double *utilities, int agent, int layers, double **jaco,
                      double *count_change, double ***COST, double ***ACTION,
                      int agentID, int ROWS, int action_row, int manager_row){
    
    int row, col, layer, act_type, i, type1, type2, type3, cost_row;
    double old_cost, new_cost, cost_change, new_action, mean_cost, sum_actions;
    double **mean_costs, *hold_actions;
    

    hold_actions = malloc(13 * sizeof(double));
    
    for(i = 0; i < 13; i++){
        hold_actions[i] = population[action_row][i][agent];
    }
    
    for(col = 7; col < 13; col++){
        sum_actions = 0; 
        mean_cost   = 0;
        for(layer = 0; layer < layers; layer++){
            sum_actions += ACTION[action_row][col][layer];
            mean_cost   += (COST[action_row][col][layer] / layers);
        }
        old_cost    = mean_cost;
        new_cost    = population[manager_row][col][agent];
        cost_change = old_cost / new_cost;
        new_action  = sum_actions * cost_change;
        population[action_row][col][agent] = floor(new_action);
    }
    
    res_to_counts(population, interact_table, int_num, count_change, utilities, 
                  jaco, action_row, agent);

    for(i = 0; i < 13; i++){
        population[action_row][i][agent] = hold_actions[i];
    }

    free(hold_actions);
}

The history of struggling with these two functions in a way that is accurate, readable, and efficient is in the git history. I’ll consider both functions with fresh eyes tomorrow with the goal of getting something working.

Update: 5 MAY 2017

We have now reached a point where we have a clear link from manager utility to a manager’s desired change in resources. The util column of a manager (layer = 1) action array defines how many resources of a particular type the manager wants there to be when column 1 equals -2 (added below for clarity).

-2.000000   1.000000    0.000000    0.000000    100.000000  1.000000    1.000000    
-1.000000   1.000000    0.000000    0.000000    1.000000    1.000000    1.000000    
1.000000    1.000000    0.000000    0.000000    -330.696014 0.000000    0.000000    
2.000000    1.000000    0.000000    0.000000    0.000000    0.000000    0.000000    
3.000000    1.000000    0.000000    0.000000    0.000000    0.000000    0.000000

From this util value and the estimated abundance from the observation model, we can get to the marginal utility, which is placed in the same action array layer where column 1 equals the manager ID. We now need this value to have some effect; e.g., in the above where the population size is 330 individuals more than the manager wants, the manager needs to adjust the cost array in some way that has the predicted effect of lowering population size by roughly this amount. The way that the genetic algorithm can learn to do this is by assuming that the action array (which will have been the actions run in the last user model) represents what stake-holders will do when constrained appropriately by costs. So, for example, we can consider the ACTION array below.

, , 1

     [,1] [,2] [,3] [,4] [,5] [,6] [,7] [,8] [,9] [,10] [,11] [,12] [,13]
[1,]   -2    1    0    0  100    1    1    0    0     0     0    20     0
[2,]   -1    1    0    0    1    1    1    0    0     1     0     0     0
[3,]    1    1    0    0    0    0    0    0    0     0     0     0     1
[4,]    2    1    0    0    0    0    0    0    1     1     0     0     0
[5,]    3    1    0    0    0    0    0    0    0     0     0     0     0

, , 2

     [,1] [,2] [,3] [,4] [,5] [,6] [,7] [,8] [,9] [,10] [,11] [,12] [,13]
[1,]   -2    1    0    0    1    1    1   81    0     0     0     0     0
[2,]   -1    1    0    0  100    1    1    0    0     0     0     0     0
[3,]    1    1    0    0    0    0    0    0    0     0     0     0     0
[4,]    2    1    0    0    0    0    0    0    0     0     0     0     0
[5,]    3    1    0    0    0    0    0    0    0     0     0     0     0

, , 3

     [,1] [,2] [,3] [,4] [,5] [,6] [,7] [,8] [,9] [,10] [,11] [,12] [,13]
[1,]   -2    1    0    0    1    1    1   72    2     0     0     0     1
[2,]   -1    1    0    0  100    1    1    1    0     0     0     0     0
[3,]    1    1    0    0    0    0    0    0    1     0     0     0     0
[4,]    2    1    0    0    0    0    0    0    0     0     1     0     0
[5,]    3    1    0    0    0    0    0    0    0     1     0     0     0

The first layer is the manager, and the second two are stake-holders that have util = 100 for the landscape layer 1, and util = 1 for the resource defined by type1 = 1, type2 = 0, and type3 = 0. In the above example, the resource might be geese disturbing crops, and the stake-holders might be farmers. In both cases, the stake-holders devote all or nearly all of their budget to moving the resource (column 8 corresponds to movem). The manager can then project the total number of resources increased or decreased by these actions, and – perhaps eventually – whose land they will be on (the code is there for the manager to prefer them on public or private land, but this might need to be implemented later). Assuming that the manager just cares about total resource abundance for now, they should be able to recognise that movem will not decrease total resource abundance; hence, the manager might prefer stake-holders to switch to killem (column 10).

It’s this switch that is a challenge for the model. It’s easy now to have the manager recognise that stake-holder actions are not optimal in terms of policy – more actions are being devoted to something that doesn’t kill resources, and more would be better placed by increasing stake-holder column 10 values. Still, how much the manager should lower costs to get the desired abundance is unclear (and even more so if we were to add multiple resources). The manager can’t exactly use the stake-holder utilities for the resource per se either, because the actions are determined by how resources interact; we also don’t want to run a sub-genetic algorithm for the manager to anticipate stake-holder actions, as this would end up being computationally intense.

Perhaps the manager should simply recognise the plus-minus-neutral effects of each column (from columns 8-13 above, 0 - - + + 0). This gets part of the way there; if the manager wants resources killed, then they could crank up the costs associated with all 0 and + actions (perhaps this shouldn’t be allowed though for bankem, which would effectively prohibit stake-holders from inaction). The magnitude of costs for - actions such as killem or castem could then be decided by assuming that stake-holders would transfer their - and 0 actions (again, excluding bankem) to whichever - column has the lowest cost.

Maybe managers should make a judgement a priori about what stake-holders are trying to do, classifying them as wanting either more or less of a particular resource. Wanting less of a resource would be associated with high values in action columns for killem and castem, and also movem but only if u_loc = 1 and u_land = 1 for the stake-holder (i.e., if the desire and ability to move them depend on the resource being on their land). Wanting more of a resource would be associated with high values in action columns for feedem and helpem. The manager could then assume that stake-holders would allocate their total budget actions proportionally to -, +, or 0 columns, but without discrimination between columns. It could be easy to summarise budget and action totals as in the table below.

Action type Total budget
Increasing 500
Decreasing 200
Neutral 300

In the above, a net 300 more resources would appear if the manager does nothing (ignoring the resource model and consequences of carrying capacity for the moment). Note that the actions should really be run through the interaction array so that interactions between two resources could be hypothetically projected. This wouldn’t be much extra work – the increase in a resource could just be multiplied by the appropriate column in the Jacobian matrix. Also note that costs are only one way to adjust resources – another would be having something like licenses to kill be a resource that stake-holders might want to buy – the manager could make more of these and they could themselves be modelled as a dynamic and affecting the Jacobian matrix.
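
A rough sketch of how such a summary could be computed from one summed row of the ACTION array. The names action_effect, sum_by_effect, and net_change are hypothetical, the signs follow the 0 - - + + 0 pattern noted earlier for columns 8-13 (7-12 in C), and the sketch assumes that each action changes abundance by one resource (ignoring the resource model and the Jacobian matrix).

```c
/* Assumed effect of each action column on abundance, in the order
 * movem, castem, killem, feedem, helpem, bankem (C columns 7 to 12) */
static const int action_effect[6] = {0, -1, -1, 1, 1, 0};

/* Hypothetical helper: sums the actions in one 13 column row by their effect
 *     action_row: One row of summed stake-holder actions
 *     totals:     Output; [0] decreasing, [1] neutral, [2] increasing
 */
void sum_by_effect(const double *action_row, double *totals){

    int col;

    totals[0] = 0;
    totals[1] = 0;
    totals[2] = 0;
    for(col = 7; col < 13; col++){
        totals[action_effect[col - 7] + 1] += action_row[col];
    }
}

/* Net projected change in abundance if the manager does nothing */
double net_change(const double *totals){
    return totals[2] - totals[0];
}
```

With totals of 500 increasing, 200 decreasing, and 300 neutral actions, as in the table above, net_change returns 300.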

Update: 4 MAY 2017

Manager model looped resources and mark-recapture

The density estimate within the manager model manager.c now returns accurate abundances for multiple resources (storing them in a vector est_abun). This was confirmed by shutting off the user function (see Issue: #22).

Additionally, the mark-recapture estimator has been successfully initialised in manager.c, and works for multiple resources. There was a bit of a hiccup here because the test printouts of abundances were consistently different from what was seen in the R plot. This turned out to be a minor error in R, not C (one too many columns were being read in in R to estimate resources marked). By fixing the error in R, both R and C estimates now match and are accurate. The next step is to return estimates for transect-based sampling abundances.

The mark-recapture analysis uses Chapman estimation, which is calculated in two functions. The function rmr_est calls chapman_est for each individual resource, inputting the results into the abun_est vector.

/* =============================================================================
 * This function calculates mark-recapture-based (Chapman) abundance estimates
 *     obs_array:      The observation array
 *     para:           A vector of parameters needed to handle the obs_array
 *     obs_array_rows: Number of rows in the observation array obs_array
 *     obs_array_cols: Number of cols in the observation array obs_array
 *     abun_est:       Vector where abundance estimates for each type are placed
 *     interact_table: Lookup table to get all types of resource values
 *     int_table_rows: The number of rows in the interact_table
 *     trait_number:   The number of traits in the resouce array
 * ========================================================================== */
void rmr_est(double **obs_array, double *para, int obs_array_rows, 
             int obs_array_cols, double *abun_est, int **interact_table, 
             int int_table_rows, int trait_number){
    
    int resource, type1, type2, type3;
    double estimate;
    
    for(resource = 0; resource < int_table_rows; resource++){
        abun_est[resource] = 0;
        if(interact_table[resource][0] == 0){ /* Change when turn off type? */
            type1    = interact_table[resource][1];
            type2    = interact_table[resource][2];
            type3    = interact_table[resource][3];
            estimate = chapman_est(obs_array, para, obs_array_rows, 
                                   obs_array_cols, trait_number, type1, type2,
                                   type3);
            abun_est[resource] = estimate;
        }
    }
}

The function chapman_est itself does all of the maths for estimating population abundance from mark-recapture data in the OBSERVATION ARRAY.

/* =============================================================================
 * This function calculates RMR (chapman) for one resource type
 *     obs_array:      The observation array
 *     para:           A vector of parameters needed to handle the obs_array
 *     obs_array_rows: Number of rows in the observation array obs_array
 *     obs_array_cols: Number of cols in the observation array obs_array
 *     trait_number:   The number of traits in the resource array
 *     type1:          Resource type 1
 *     type2:          Resource type 2
 *     type3:          Resource type 3
 * ========================================================================== */
double chapman_est(double **obs_array, double *para, int obs_array_rows, 
                   int obs_array_cols, int trait_number, int type1, int type2,
                   int type3){
    
    int row, col;
    int total_marks, recaptures, mark_start, recapture_start;
    int *marked, sum_marked, n, K, k;
    double estimate, floored_est;
    
    total_marks     = (int) para[11];
    recaptures      = (int) para[10];
    mark_start      = trait_number + 1;
    recapture_start = mark_start + (total_marks - recaptures);
    
    if(total_marks < 2 || recaptures < 1){
        printf("ERROR: Not enough marks or recaptures for management");
        return 0;
    }
    
    n      = 0;
    marked = malloc(obs_array_rows * sizeof(int));
    for(row = 0; row < obs_array_rows; row++){
        marked[row] = 0;
        if(obs_array[row][1] == type1 && 
           obs_array[row][2] == type2 &&
           obs_array[row][3] == type3
        ){
            for(col = mark_start; col < recapture_start; col++){
                if(obs_array[row][col] > 0){
                    marked[row] = 1;
                    n++;
                    break;
                }
            }
        }
    }
    
    K = 0;
    k = 0;
    for(row = 0; row < obs_array_rows; row++){
        if(obs_array[row][1] == type1 && 
           obs_array[row][2] == type2 &&
           obs_array[row][3] == type3
        ){
            for(col = recapture_start; col < obs_array_cols; col++){
                if(obs_array[row][col] > 0){
                    K++;
                    if(marked[row] > 0){
                        k++;
                    }
                    break;
                }
            }
        }
    }
    
    estimate    = ((double) (n + 1) * (K + 1) / (k + 1)) - 1;
    floored_est = floor(estimate);

    free(marked);  
    
    return floored_est;
}

No confidence intervals are calculated at the moment, since I’m not sure how the simulated manager would use the uncertainty, but if we eventually want real people to be able to ‘play’ the game as managers, then it shouldn’t be too difficult to add confidence intervals to all population size estimates within the C functions of manager.c.
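
As a sketch of what this could look like for the mark-recapture case, the functions below return the unfloored Chapman estimate and a commonly used variance approximation for it. Neither function exists in manager.c; the names chapman_point and chapman_var are hypothetical, and n, K, and k have the same meaning as in chapman_est (number marked, number caught in the recapture period, and number of those that were already marked).

```c
/* Hypothetical helper: Chapman estimate without flooring */
double chapman_point(int n, int K, int k){
    return ((double) (n + 1) * (K + 1) / (k + 1)) - 1;
}

/* Hypothetical helper: approximate variance of the Chapman estimate,
 * (n + 1)(K + 1)(n - k)(K - k) / ((k + 1)^2 (k + 2)) */
double chapman_var(int n, int K, int k){

    double numerator, denominator;

    numerator   = (double) (n + 1) * (K + 1) * (n - k) * (K - k);
    denominator = (double) (k + 1) * (k + 1) * (k + 2);

    return numerator / denominator;
}
```

An approximate 95 per cent interval would then be the estimate plus or minus 1.96 times the square root of the variance. For example, with n = 100, K = 80, and k = 20, the estimate is about 388.6 and the variance is about 4047.5, giving an interval of roughly 388.6 plus or minus 124.7.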

Transect estimation of resource abundances

Manager estimation of abundances collected from transect type sampling (i.e., case 2 and case 3) is considerably easier than for density-based and mark-recapture metrics. The number of times a resource is observed is simply stored in column 12 (in C; 13 in R) of the observation matrix. The transect_est function does the job for any number of resources all in one go.

/* =============================================================================
 * This function calculates transect-based abundance estimates
 *     obs_array:      The observation array
 *     para:           A vector of parameters needed to handle the obs_array
 *     obs_array_rows: Number of rows in the observation array obs_array
 *     abun_est:       Vector where abundance estimates for each type are placed
 *     interact_table: Lookup table to get all types of resource values
 *     int_table_rows: The number of rows in the interact_table
 * ========================================================================== */
void transect_est(double **obs_array, double *para, int obs_array_rows, 
                  double *abun_est, int **interact_table, int int_table_rows){
    
    int resource, observation, type1, type2, type3;
    
    for(resource = 0; resource < int_table_rows; resource++){
        abun_est[resource] = 0;
        if(interact_table[resource][0] == 0){ /* Change when turn off type? */
            type1    = interact_table[resource][1];
            type2    = interact_table[resource][2];
            type3    = interact_table[resource][3];
            for(observation = 0; observation < obs_array_rows; observation++){
                if(obs_array[observation][1] == type1 && 
                   obs_array[observation][2] == type2 && 
                   obs_array[observation][3] == type3
                ){
                    abun_est[resource] += obs_array[observation][12];
                }
                    
            }
        }
    }
}

Abundances now need to be compared to manager utilities. For now, I’m just going to assume that the agent with agentID = 1 is the head manager (other type1 = 0 agents can be ‘managers’ collecting data, but I don’t see how or why we would want multiple managers bargaining over resources with different util values; not yet at least, and probably not ever).

Getting marginal utilities for management and putting them in ACTION

Back to the big picture, I have now finished the first five (easiest) of the tasks below.

/* 1. Get summary statistics for resources from the observation array     */
/* 2. Place estimated resource abundances in a vector the size of int_d0  */
/* 3. Initialise new vector of size int_d0 with temp utilities of manager */  
/* 4. Subtract abundances from temp utilities to get marginal utilities   */
/* 5. Insert the marginal utilities into the agent = 1 col1 of ACTION     */
/* 6. Run the genetic algorithm (add extension to interpret cost effects) */
/* 7. Put in place the new ACTION array from 6                            */
/* 8. Adjust the COST array appropriately from the new manager actions    */

Essentially, the manager.c function now gets estimates for the abundances of each resource, then places those estimates in a temporary vector. Elements in this vector (corresponding to resource abundances) are then subtracted from the manager’s utility values (corresponding to desired resource counts). What’s left is then the marginal utility of resources – if there are more resources than the manager desires, then the marginal utility is negative, and if there are fewer, then the marginal utility is positive. The marginal utility is then placed back into the first layer of the ACTION array (corresponding to the manager) where column 1 equals 1 (i.e., interpreted as actions of the manager affecting their own costs – existing values of which aren’t really being used because the concept doesn’t make a lot of sense, and the values are really just there as place-holders for where they mean things in other layers of the array). Hence the util column then includes values for the ideal resource abundance (where column 1 equals -2 – util is in column 5) and the marginal utility given estimated resource abundance (where column 1 equals 1). See below.

-2.000000   1.000000    0.000000    0.000000    100.000000  1.000000    1.000000    
-1.000000   1.000000    0.000000    0.000000    1.000000    1.000000    1.000000    
1.000000    1.000000    0.000000    0.000000    -330.696014 0.000000    0.000000    
2.000000    1.000000    0.000000    0.000000    0.000000    0.000000    0.000000    
3.000000    1.000000    0.000000    0.000000    0.000000    0.000000    0.000000

In the above, the manager sees 100 as the ideal population size, but there are ca 430 resources of type1 = 1, type2 = 0, type3 = 0 in the population. Hence the manager would like to see ca 330 fewer of these kinds of resources. The -330.696014 printed from a test simulation above will allow the genetic algorithm to adjust the COST array accordingly, decreasing COST columns that correspond to the killing or castrating (but not moving, I suppose) of resources.
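
The subtraction described above can be sketched as follows. The function name get_marginal_util and the vector names are hypothetical and do not necessarily match what is used in manager.c.

```c
/* Hypothetical helper: marginal utility is the desired abundance minus the
 * estimated abundance, so it is negative when there are too many resources.
 *     util:      Manager's desired abundance for each resource type
 *     abun_est:  Estimated abundance for each resource type
 *     marg_util: Output vector of marginal utilities
 *     int_num:   Number of elements in each of the above vectors
 */
void get_marginal_util(const double *util, const double *abun_est,
                       double *marg_util, int int_num){

    int i;

    for(i = 0; i < int_num; i++){
        marg_util[i] = util[i] - abun_est[i];
    }
}
```

With a desired abundance of 100 and an estimated abundance of 430.696014, the marginal utility is -330.696014, as in the printout above.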

Update: 3 MAY 2017

Resolved Issue: #20

Issue: #20 has now been resolved. The res_type has now been removed from the observation model, and observations simply occur for all unique resource types – if some are not needed, then they are not analysed. Doing this required very little modification for transect type sampling (case 2 and case 3), but considerably more for density-based sampling (case 0) and especially mark and recapture (case 1). In these cases, I decided to split the sampling functions up more clearly. Testing revealed some initial errors, but these were ironed out and fixed. Currently, the correct OBSERVATION array is returned, although this array is not analysed correctly when plotting in R for more than one resource (the code for this has not yet been written).

NOTE: there is no code written to ignore subdivisions yet. I’m not sure whether or not we’ll actually want this, but the option could simply be something placed in para and checked in the subfunctions in observation.c

Major changes to observation.c

In resolving Issue: #20, I have re-worked the code in observation.c to be more readable. Previously, the switch(methods) in the main observation function called either density-based estimation or mark-recapture, and both of these functions, confusingly, called the same mark_res sub-function. I now have mark_res being called only for density-based estimation. Hence, each method of observation has its own (considerably smaller) sub-function, each of which calls another sub-function. For example, with density-based estimation, we have the following function called times_obs times.

/* =============================================================================
 * Density method of estimation
 * ===========================================================================*/
/* =============================================================================
 * This simulates the capture-mark-recapture of a resource type
 * Inputs include:
 *     resource_array: data frame of resources to be marked and/or recaptured
 *     agent_array: data frame of agents, potentially doing the marking
 *     land: The landscape on which interactions occur
 *     paras: vector of parameter values
 *     res_rows: Total number of resources that can be sampled
 *     a_row: Total number agents that could possibly sample
 *     obs_col: The number of columns in the observational array
 *     a_type: The type of agent that is doing the marking
 *     by_type: The type column that is being used
 *     find_type: The type of finding that observers do (view-based or rand)
 * Output:
 *     Accumulated markings of resources by agents
 * ========================================================================== */
void mark_res(double **resource_array, double **agent_array, double ***land,
              double *paras, int res_rows, int a_row, int obs_col, int a_type, 
              int by_type, int find_type){
    
    int resource;
    int agent;
    int count;
    int edge;        /* How does edge work? (Affects agent vision & movement) */
    int samp_res;    /* A randomly sampled resource */
    int ldx, ldy;
    int move_t;
    int sample_num;   /* Times resources observed during one time step */

    edge       = (int) paras[1];  /* What type of edge is on the landscape */
    sample_num = (int) paras[11];
    ldx        = (int) paras[12]; /* dimensions of landscape -- x and y */
    ldy        = (int) paras[13];
    move_t     = (int) paras[14]; /* Type of movement being used  */

    for(agent = 0; agent < a_row; agent++){
        if(agent_array[agent][by_type] == a_type){ 
            mark_in_view(resource_array, agent_array, paras, res_rows, agent, 
                         find_type, obs_col);
        }
        if(sample_num > 1){
            a_mover(agent_array, 4, 5, 6, edge, agent, land, ldx, ldy, move_t);
        }
    }
}

The above function calls mark_in_view, which marks all resources in the agent’s view (regardless of type, which will get sorted out later).

/* =============================================================================
 * This simulates an individual agent doing some field work (observing)
 * Inputs include:
 *     resource_array: data frame of resources to be marked and/or recaptured
 *     agent_array: data frame of agents, potentially doing the marking
 *     paras: vector of parameter values
 *     res_rows: Total number of rows in the res_adding data frame
 *     worker: The row of the agent that is doing the working
 *     find_proc: The procedure used for finding and marking resources
 *     obs_col: The number of columns in the observation array
 * Output:
 *     The resource_array is marked by a particular agent
 * ========================================================================== */
void mark_in_view(double **resource_array, double **agent_array, double *paras,
                  int res_rows, int worker, int find_proc, int obs_col){

    int xloc;         /* x location of the agent doing work */
    int yloc;         /* y location of the agent doing work */
    int view;         /* The 'view' (sampling range) around agent's location */
    int edge;         /* What type of edge is being used in the simulation */
    int resource;     /* Index for resource array */
    int r_x;          /* x location of a resource */
    int r_y;          /* y location of a resource */
    int seeme;        /* Test if observer sees/captures the resource */
    int ldx;          /* Landscape dimension on the x-axis */
    int ldy;          /* Landscape dimension on the y-axis */
    int EucD;         /* Is vision based on Euclidean distance? */
    double min_age;   /* Minimum age at which sampling can occur */
    
    xloc  = (int) agent_array[worker][4];
    yloc  = (int) agent_array[worker][5];
    view  = (int) agent_array[worker][8];
    edge  = (int) paras[1];
    ldx   = (int) paras[12];
    ldy   = (int) paras[13];
    EucD  = (int) paras[20];
    
    min_age = paras[16];
    
    for(resource = 0; resource < res_rows; resource++){
        if(resource_array[resource][11] >= min_age){
            r_x   = resource_array[resource][4];
            r_y   = resource_array[resource][5];
            seeme = binos(xloc, yloc, r_x, r_y, edge, view, ldx, ldy, EucD);
            agent_array[worker][10]           += seeme;
            resource_array[resource][obs_col] += seeme;
            resource_array[resource][12]      += seeme;
        }
    }
}

The mark-recapture technique, in contrast, calls the new function sample_fixed_res once (times_obs is taken care of in the sub-function).

/* =============================================================================
 * Mark re-capture method of estimation
 * ===========================================================================*/
/* =============================================================================
 * This simulates the capture-mark-recapture of a resource type
 * Inputs include:
 *     resource_array: data frame of resources to be marked and/or recaptured
 *     agent_array: data frame of agents, potentially doing the marking
 *     land: The landscape on which interactions occur
 *     paras: vector of parameter values
 *     lookup: The table listing resources and landscape layers to lookup
 *     res_rows: Total number of resources that can be sampled
 *     agent_number: Total number of agents in the agent array
 *     a_type: The type of agent that is doing the marking
 *     trait_number: The number of traits (columns) of the resource array
 *     lookup_rows: The number of rows in the lookup table
 * Output:
 *     Accumulated markings of resources by agents
 * ========================================================================== */
void sample_fixed_res(double **resource_array, double **agent_array, 
                      double ***land, double *paras, int **lookup, int res_rows, 
                      int agent_number, int a_type, int trait_number, 
                      int lookup_rows){

    int edge_type, move_type, fixed_sample, times_obs, move_res, by_type;
    int land_x, land_y;
    int obs_iter, agent;
    int row, type1, type2, type3;
    
    edge_type    = (int) paras[1];
    move_type    = (int) paras[2];
    fixed_sample = (int) paras[10];
    land_x       = (int) paras[12];
    land_y       = (int) paras[13];
    by_type      = (int) paras[17];
    move_res     = (int) paras[19];
    
    if(fixed_sample < 1){
        printf("ERROR: Fixed sample must be >= 1 \n ... Making = 1 \n");
        paras[10]    = 1;
        fixed_sample = 1;
    }
    

    for(row = 0; row < lookup_rows; row++){
        if(lookup[row][0] == 0){
            obs_iter     = trait_number + 1; 
            times_obs    = (int) paras[11];
            
            type1 = lookup[row][1];    
            type2 = lookup[row][2];
            type3 = lookup[row][3];
            while(times_obs > 0){
                for(agent = 0; agent < agent_number; agent++){
                    if(agent_array[agent][by_type] == a_type){ 
                        mark_fixed(resource_array, agent_array, paras, res_rows, 
                                   agent, obs_iter, type1, type2, type3);
                    }
                }
                obs_iter++;
                times_obs--;
                if(move_res == 1){ /* Move resources if need for new sample */
                    res_mover(resource_array, 4, 5, 6, res_rows, edge_type, 
                              land, land_x, land_y, move_type); 
                }
            }
        }
    }
}

The sub-function mark_fixed marks a fixed number of a specific type of resource.

/* =============================================================================
 * This simulates an individual agent marking a fixed number of resources
 * Inputs include:
 *     resource_array: data frame of resources to be marked and/or recaptured
 *     agent_array: data frame of agents, potentially doing the marking
 *     paras: vector of parameter values
 *     res_rows: Total number of rows in the res_adding data frame
 *     worker: The row of the agent that is doing the working
 *     obs_col: The number of columns in the observation array
 *     type1: Resource type 1 being marked
 *     type2: Resource type 2 being marked
 *     type3: Resource type 3 being marked
 * Output:
 *     Specific resources in resource_array are marked by a particular agent
 * ========================================================================== */
void mark_fixed(double **resource_array, double **agent_array, double *paras,
                int res_rows, int worker, int obs_col, int type1, int type2, 
                int type3){
    
    int xloc;         /* x location of the agent doing work */
    int yloc;         /* y location of the agent doing work */
    int view;         /* The 'view' (sampling range) around agent's location */
    int edge;         /* What type of edge is being used in the simulation */
    int resource;     /* Index for resource array */
    int r_x;          /* x location of a resource */
    int r_y;          /* y location of a resource */
    int seeme;        /* Test if observer sees/captures the resource */
    int ldx;          /* Landscape dimension on the x-axis */
    int ldy;          /* Landscape dimension on the y-axis */
    int fixn;         /* If procedure is to sample a fixed number; how many? */
    int count;        /* Index for sampling a fixed number of resource */
    int sampled;      /* The resource randomly sampled */
    int type_num;     /* Number of the type of resource to be fixed sampled */
    int EucD;         /* Is vision based on Euclidean distance? */
    double sampl;     /* Random uniform sampling of a resource */
    double min_age;   /* Minimum at which sampling can occur */

    xloc  = (int) agent_array[worker][4];
    yloc  = (int) agent_array[worker][5];
    view  = (int) agent_array[worker][8];
    edge  = (int) paras[1];
    ldx   = (int) paras[12];
    ldy   = (int) paras[13];
    EucD  = (int) paras[20];

    min_age = paras[16];
    
    fixn     = (int) paras[10];
    type_num = 0;
    for(resource = 0; resource < res_rows; resource++){
        if(resource_array[resource][1]  == type1 &&
           resource_array[resource][2]  == type2 &&
           resource_array[resource][3]  == type3 &&
           resource_array[resource][11] >= min_age
        ){
            type_num++;
        }
    }
    if(type_num > fixn){ /* If more resources than the sample number */
        /* Temp tallies are used here to sample without replacement */
        for(resource = 0; resource < res_rows; resource++){
            if(resource_array[resource][1]  == type1 &&
               resource_array[resource][2]  == type2 &&
               resource_array[resource][3]  == type3
            ){
                resource_array[resource][13] = 0; /* Start untallied */
            }
        }
        count = fixn;
        sampl = 0;
        while(count > 0){
            do{ /* Find an un-tallied resource in the array */
                 sampl   = runif(0, 1) * res_rows;
                 sampled = (int) sampl;
              } while(sampled == res_rows /* In case runif returns 1 */ ||
                      resource_array[sampled][13] == 1         || 
                      resource_array[sampled][1]  != type1     ||
                      resource_array[sampled][2]  != type2     ||
                      resource_array[sampled][3]  != type3     ||
                      resource_array[sampled][11] <  min_age
                );
            resource_array[sampled][obs_col]++; /* Marks accumulate  */
            resource_array[sampled][12]++;
            resource_array[sampled][13] = 1;    /* Tally is noted    */
            count--;
        }
        agent_array[worker][10] += fixn;
    }else{ /* Else all of the resources should be marked */
        for(resource = 0; resource < res_rows; resource++){
            if(resource_array[resource][1]  == type1 &&
               resource_array[resource][2]  == type2 &&
               resource_array[resource][3]  == type3 &&
               resource_array[resource][11] >= min_age
            ){
               resource_array[resource][obs_col]++;  /* Mark all */
               resource_array[resource][12]++;
             }
        }
        agent_array[worker][10] += type_num; /* All resources marked */ 
   }
}

This still isn’t the most readable code, but it’s better than it was before. Eventually, I would prefer to get rid of as many of the function arguments as possible and place these as elements within para. The identities of para elements could then be explained in the function more clearly (and consistently). This would lead to less bulky functions and a somewhat clearer structure for the code, but I think it can be implemented later as the code is given a more general clean-up.
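As an aside on the sampling step in mark_fixed: the do-while loop redraws until it lands on an eligible, untallied row, which can take many draws when eligible resources are rare. One possible alternative, sketched below, is to collect the eligible row indices first and then run a partial Fisher-Yates shuffle over them. This is only a sketch under stated assumptions, not the GMSE implementation: sample_rows and unif01 are hypothetical names, and inside the package the uniform draw would presumably come from runif(0, 1) as in mark_fixed.

```c
#include <assert.h>
#include <stdlib.h>

/* Hypothetical helper (not in GMSE): uniform draw in [0, 1).
 * Inside the package, runif(0, 1) would presumably be used instead. */
static double unif01(void){
    return (double) rand() / ((double) RAND_MAX + 1.0);
}

/* Sketch: choose k of the n indices in 'eligible' without replacement.
 * After the call, the first k elements of 'eligible' are the sample.
 * Returns the number actually sampled (k, or n if n < k). */
int sample_rows(int *eligible, int n, int k){
    int i, j, tmp;

    assert(eligible != NULL && n >= 0 && k >= 0);
    if(k > n){
        k = n; /* Fewer eligible rows than requested: take them all */
    }
    for(i = 0; i < k; i++){
        j = i + (int) (unif01() * (n - i)); /* i <= j < n */
        tmp         = eligible[i];
        eligible[i] = eligible[j];
        eligible[j] = tmp;
    }
    return k;
}
```

The number of random draws is then exactly the number of resources sampled, and no temporary tally column is needed to avoid sampling the same resource twice.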

Issue: #22 User function crashes with multiple resources

For some reason, the user.c call appears to crash when there is more than one resource. I only noticed this after the overhaul of the observation model, but I doubt they are related. More likely, one of the many arrays with dimensions that depend on resource number is being built or called improperly, leading to a segmentation fault. This should be fixed, of course, but I suspect the problem is not too far buried.

Next steps still to get abundance estimate from observation array in manager.c

The above issues were progress, but they have set back the pace of the manager model a bit. The next item on the agenda is still to get individual abundance estimates for each unique resource type in the manager model. The density estimate is completed, and the whole thing should work because the code is already written to loop through the interaction table and get abundance estimates for each unique resource. This should be tested by including more than one resource and turning off the user function (see Issue: #22). Once it works, I need to do the same thing for mark-recapture and transect-based estimates of abundance. Then, I’ll try to get through items 2-5 from Monday’s list.

Update: 2 MAY 2017

Issue: #20 Remove res_type from observation model

Currently, the observation model only records resources into the observation array if they are of a particular type1, which is specified in the para vector and used to produce a data array with only one type of specified resource. Originally, this seemed like a good idea, but after spending some initial time writing the management model, I don’t think there is any need or good reason to restrict observation to a specific resource type. Instead, all types should be marked and moved to the observation array. Then, if management analysis wants statistics for only one type of resource, it’s very easy to use an if statement to check that the type is appropriate. It’s much easier to ignore parts of the array than to make more than one array when needed through multiple calls of the observation function.

To fix this, it shouldn’t take much more than a simple removal of the res_type values specified in observation.c. When there is only one resource type, all calculations should proceed normally, but when more resources are introduced, an if is needed for both management and plotting (different groups of resources could even be made, ignoring subdivisions, by skipping the if when the type specified to look at equals -1).
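A minimal sketch of the kind of type check described above, assuming that columns 1 to 3 of the observation array hold type1, type2, and type3 (as in the resource array), and that a requested value of -1 means "do not filter on this type". The function names type_match and count_type are hypothetical and are not part of GMSE.

```c
#include <assert.h>
#include <stddef.h>

/* Sketch: does an observed type value match the requested one?
 * A requested value of -1 is treated as "any", so the check is skipped. */
int type_match(double observed, int wanted){
    return wanted == -1 || (int) observed == wanted;
}

/* Count rows of obs_array that match the requested types. Columns 1-3 are
 * assumed to hold type1, type2, and type3. */
int count_type(double **obs_array, int obs_rows, int type1, int type2,
               int type3){
    int i, count;

    assert(obs_array != NULL && obs_rows >= 0);
    count = 0;
    for(i = 0; i < obs_rows; i++){
        if(type_match(obs_array[i][1], type1) &&
           type_match(obs_array[i][2], type2) &&
           type_match(obs_array[i][3], type3)){
            count++;
        }
    }
    return count;
}
```

With this shape, management and plotting code could use the same observation array and pick out one resource type, or pool across a subdivision, without another call to the observation function.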

Issue: #21 Improve code readability using para

Originally, I had the idea to use the global vector para as a way of storing information easily and using it across all of the models. The vector para would store key information about pretty much everything, then be dynamically updated as need be from higher level functions in the model. In the last two months of coding, I have been specifying parameter names in functions explicitly, which has made sense for my own writing during the coding process, but it will be beneficial to clean all of this up later by reading para into the sub-functions that otherwise sometimes have about a dozen arguments. Most functions would then have considerably fewer arguments, and the variables stored as vector elements in para could be defined immediately within sub-functions and used by name thereafter. The whole program would then have a consistent feel of reading in key arrays and vectors and then specifying the key variables within sub-functions.
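One way this clean-up might look is sketched below: the positions in para are given names once, and a sub-function takes only the vector and reads what it needs by name. The index positions are those read from paras in the observation functions above; the enum names, PARA_LENGTH, and the land_cells function are hypothetical and only illustrate the idea.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical names for positions in the parameter vector; the positions
 * themselves are the ones used in the observation functions. */
enum para_index {
    PARA_EDGE      = 1,
    PARA_TIMES_OBS = 11,
    PARA_LAND_X    = 12,
    PARA_LAND_Y    = 13,
    PARA_MIN_AGE   = 16,
    PARA_EUCD      = 20,
    PARA_LENGTH    = 21  /* Assumed minimum length, for this sketch only */
};

/* Sketch: total landscape cells, taking only the parameter vector rather
 * than separate land_x and land_y arguments. */
int land_cells(const double *para){
    int land_x, land_y;

    assert(para != NULL);
    land_x = (int) para[PARA_LAND_X];
    land_y = (int) para[PARA_LAND_Y];
    return land_x * land_y;
}
```

The argument list shrinks to the vector itself, and the meaning of each element is documented in one place instead of in every function header.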

1. Get summary statistics for resources from the observation array

I have now written the functions for case 0 (density-based estimation) to get abundance estimates from the observation array, as outlined yesterday. This took slightly longer than anticipated because it turns out that there was a minor error in R’s estimation of abundances due to an incorrect column being summed. I had to figure out why C and R did not agree on the same abundance estimates; they now do (i.e., the two independent codings get the same estimate from the same observation array). I now need to do the same for the other three types of observation and get abundance estimates from them. At the highest level of this part of the manager model, the estimate_abundances function is called.

/* =============================================================================
 * This function uses the observation array to estimate resource abundances
 *     obs_array:      The observation array
 *     para:           A vector of parameters needed to handle the obs_array
 *     interact_table: Lookup table to get all types of resource values
 *     agent_array:    Agent array, including managers (agent type 0)
 *     agents:         Total number of agents (rows) in the agents array
 *     obs_x:          Number of rows in the observation array
 *     obs_y:          Number of cols in the observation array
 *     abun_est:       Vector where abundance estimates for each type are placed
 *     int_table_rows: The number of rows in the interact_table
 * ========================================================================== */
void estimate_abundances(double **obs_array, double *para, int **interact_table,
                         double **agent_array, int agents, int obs_x, int obs_y,
                         double *abun_est, int int_table_rows){
    
    int estimate_type, recaptures;
    double abun;
    
    estimate_type = (int) para[8];

    switch(estimate_type){
        case 0:
            dens_est(obs_array, para, agent_array, agents, obs_x, obs_y, 
                     abun_est, interact_table, int_table_rows);
            break;
        case 1:
            recaptures    = (int) para[10];
            break;
        case 2:
            break;
        case 3:
            break;
        default:
            break;
    }
}

The above function will call sub-functions based on estimate type (0 to 3). So far, only the density function dens_est has been written and tested.

/* =============================================================================
 * This function calculates density-based abundance estimates
 *     obs_array:      The observation array
 *     para:           A vector of parameters needed to handle the obs_array
 *     agent_array:    Agent array, including managers (agent type 0)
 *     agents:         Total number of agents (rows) in the agents array
 *     obs_array_rows: Number of rows in the observation array obs_array
 *     obs_array_cols: Number of cols in the observation array obs_array
 *     abun_est:       Vector where abundance estimates for each type are placed
 *     interact_table: Lookup table to get all types of resource values
 *     int_table_rows: The number of rows in the interact_table
 * ========================================================================== */
void dens_est(double **obs_array, double *para, double **agent_array, 
              int agents, int obs_array_rows, int obs_array_cols, 
              double *abun_est, int **interact_table, int int_table_rows){
 
    int i, j, resource;
    int view, a_type, land_x, land_y, type1, type2, type3;
    int vision, area, cells, times_obs, tot_obs;
    double prop_obs, estimate;

    a_type    = (int) para[7];  /* What type of agent does the observing  */
    times_obs = (int) para[11];
    land_x    = (int) para[12];
    land_y    = (int) para[13];
    
    view = 0;
    for(i = 0; i < agents; i++){
        if(agent_array[i][1] == a_type){
            view += agent_array[i][8];
        }
    }

    vision  = (2 * view) + 1;
    area    = vision * vision * times_obs;
    cells   = land_x * land_y; /* Plus one needed for zero index */
    tot_obs = 0;
    
    for(resource = 0; resource < int_table_rows; resource++){
        abun_est[resource] = 0;
        if(interact_table[resource][0] == 0){ /* Change when turn off type? */
            type1   = interact_table[resource][1];
            type2   = interact_table[resource][2];
            type3   = interact_table[resource][3];
            tot_obs = res_obs(obs_array, obs_array_rows, obs_array_cols, type1, 
                              type2, type3);
            prop_obs = (double) tot_obs / area;
            estimate = prop_obs * cells;
            
            abun_est[resource] = estimate;
        }
    }
}

The dens_est function above calls the res_obs function, which returns the number of observations for a specific resource type.

/* =============================================================================
 * This function counts the total observations of a specific resource type
 *     obs_array:  The observation array
 *     obs_rows:   Number of rows in the observation array obs_array
 *     obs_cols:   Number of cols in the observation array obs_array
 *     type1:      Resources of type 1 being observed
 *     type2:      Resources of type 2 being observed
 *     type3:      Resources of type 3 being observed
 * ========================================================================== */
int res_obs(double **obs_array, int obs_rows, int obs_cols, int type1, 
            int type2, int type3){

    int i, j, obs_count;
    
    obs_count = 0;
 
    for(i = 0; i < obs_rows; i++){
        if( (obs_array[i][1] == type1 || obs_array[i][1] < 0) &&
            (obs_array[i][2] == type2 || obs_array[i][2] < 0) &&
            (obs_array[i][3] == type3 || obs_array[i][3] < 0)
        ){
            for(j = 15; j < obs_cols; j++){
                obs_count += obs_array[i][j];
            }
        }
    }
    return obs_count;
}

The point of the above break-down, aside from making things more readable, is that we might want to get abundance estimates for each resource type – at least have G-MSE produce them even if we pretend that managers cannot see them. When Issue: #20 is resolved, all resources will then be estimated. NOTE: This could be an issue because if a fixed number of resource types are sampled, as with mark-recapture, then it could sample different resources. It might be best to change mark-recapture so that it takes a fixed_obs for each unique resource type, somehow. The point is that it’s easier and more computationally efficient to ignore some data (and not allow managers to notice it) than it is to have to run observation multiple times to re-collect using the same protocol.

Eventually, I also want the obs_array[i][1, 2, or 3] to be able to take -1 as a value in here somehow – basically, I want the counts to be able to ignore one of a resource’s types. For example, we could imagine wanting to have separate sexes indicated by type2 = 0 or type2 = 1 in column 2 of the obs_array, but perhaps not want managers to actually use this when estimating abundance (alternatively – could combine observations later).

Update: 1 MAY 2017

Manager function to genetic algorithm link

There is a minor conceptual issue regarding the implementation of the genetic algorithm with the manager function. The manager’s actions need to be based on the OBSERVATION array, but stake-holders need not use this information. There are two options for implementing the genetic algorithm regarding observations.

  1. The OBSERVATION array could just be read into the genetic algorithm and not used for stake-holders. This might require the user function to be initially called after the manager function so that an OBSERVATION array exists (or a dummy could be made easily enough in the user.R function).
  2. Manager utilities could be manipulated within the manager function before going into the genetic algorithm. This could have its own benefits. The point would be for managers to have some sort of absolute utility for resources, but for this utility to be adjusted before reading the manager’s ACTION and COST arrays into the genetic algorithm to zero in on the actions. For example, if there are too many resources, then the util could be adjusted within manager.c (or manager.R) to be negative, hence making the genetic algorithm select strategies that lower costs on killem actions in proportion to how many the manager wants killed.

I think that option 2 is actually a bit faster, and will probably be easier to implement in terms of coding.

Isolating effects of uncertainty

It is worth pointing out in passing that above option 2 offers a very straight-forward way of looking at uncertainty with respect to management decisions. When passing resource abundances to update temporary util values for managers, we could compare the estimates of abundances produced from the observation model to the actual abundances from the resource model. This could be a very simple option in the software, and it might be useful to run the genetic algorithm twice for managers in each time step to simulate side-by-side how decisions would be made in the presence and absence of uncertainty.

Initialising the manager model

To get the ball rolling on the manager function’s implementation of the genetic algorithm, now is as good a time as any to initialise manager.R and manager.c, since the arguments passed to game.c need to be coaxed into the right form via the manager model. It’s important to keep in mind that I still need to implement the lobbying option for stake-holders, but I think that this will be easier once the manager’s genetic algorithm is built. It’s also noteworthy that we’re probably not going to need managers to adjust stake-holders’ utilities. So really, in a pinch, there are three types of actions that are going to be important, probably always.

  1. Stake-holders’ actions affect resource and landscape properties (done)
  2. Stake-holders’ actions affect manager utilities
  3. Manager actions affect stake-holders’ costs of performing actions

To this list of three essential types of actions, there are a few additional actions that would be good to have, ideally fitting seamlessly within the general framework of the model, but which could, if necessary, be add-ons for future development.

  1. Managers’ actions affect resource and landscape properties (effectively done)
  2. Stake-holders affect the costs of one another
  3. Stake-holders affect the utilities of one another

To this, there are a few other possible options that I can’t, at the moment, see why anyone would want to model. I’m not entirely sure these really make sense, actually.

  1. Stake-holders affect the costs of managers’ actions (on resources or other stake-holders’ costs)
  2. Managers affecting stake-holder utilities

Framework for manager actions

The framework for manager actions in both R and C is now entirely built – data structures can be read in and out, so now all that is left is to do the modelling. I’ve commented what will happen within manager.c; each of the numbers below might or might not represent unique sub-functions.

/* Do the biology here now */
/* ====================================================================== */

/* 1. Get summary statistics for resources from the observation array     */
/* 2. Place estimated resource abundances in a vector the size of int_d0  */
/* 3. Initialise new vector of size int_d0 with temp utilities of manager */  
/* 4. Subtract abundances from temp utilities to get marginal utilities   */
/* 5. Insert the marginal utilities into the agent = 1 col1 of ACTION     */
/* 6. Run the genetic algorithm (add extension to interpret cost effects) */
/* 7. Put in place the new ACTION array from 6                            */
/* 8. Adjust the COST array appropriately from the new manager actions    */

/* This code switches from C back to R */
/* ====================================================================== */        

With all of these in place, the end result should be a new COST array based on manager actions. It will be important to make sure that the manager’s costs are defined appropriately so that the manager doesn’t start doing actions themselves. This could actually be a bit of a problem; if we want the manager to do things themselves, then it’s hard to see why they wouldn’t just perform the actions instead of adjusting the costs. Then again, perhaps this is kind of the point? Maybe sufficiently high costs of actions and sufficiently low costs of policy adjustment should cause the genetic algorithm to naturally find policy as a better means of achieving what the manager wants. In fact, this seems almost certain; if managers in the real world could achieve all policy aims single-handedly, then that’s what they would probably be hired to do. In the real world, changing policy is more effective – it’s also possible that we could allow them to do their own direct actions to resources in the user model, like the stake-holders. Costs of setting policy could then be independent from costs of doing actions by changing manager COST between models.
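Steps 3 and 4 in the commented plan above could be sketched roughly as follows, assuming one target utility and one abundance estimate per resource type. The function name marginal_utils and its argument names are hypothetical; this is a sketch of the subtraction only, not of how manager.c stores utilities.

```c
#include <assert.h>
#include <stddef.h>

/* Sketch of steps 3-4: marginal utility as target minus estimated abundance.
 * A negative value means more resources are estimated than the manager
 * wants; a positive value means fewer. All names are hypothetical. */
void marginal_utils(const double *target, const double *abun_est,
                    double *marg_util, int n_types){
    int i;

    assert(target != NULL && abun_est != NULL && marg_util != NULL);
    for(i = 0; i < n_types; i++){
        marg_util[i] = target[i] - abun_est[i];
    }
}
```

Calling the same function once with the estimated abundances and once with the true abundances from the resource model would give the side-by-side comparison with and without uncertainty discussed under "Isolating effects of uncertainty".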

Update: 28 APR 2017

Concrete plan for manager fitness function

The next step in the coding is to allow managers to generate policy by using their utilities to affect the costs of other agents. This will require that managers recognise how the actions of other agents will affect resources and the landscape, then adjust costs to encourage agents to act in a particular way. There are several things to keep in mind here.

Some solutions to account for the above

Given that the manager already has a special status in the rest of G-MSE, maybe it’s not too much of a stretch to make their cost-adjusting actions apply to all non-managers by default by making all cost-adjusting rows in the ACTION array (on the manager’s layer) equivalent. Or, even better, the first row, which corresponds to the manager’s own costs (or any agent’s own cost row, since the cost of adjusting their own cost doesn’t really come into play – or really make much sense), could simply define the cost of affecting all stake-holders cost values.

agent  type1  type2  type3  util  u_loc  u_land  movem  castem  killem  feedem  helpem  bankem
   -2      1      0      0   101    101     101      3       8       4       4       2       1
   -2      2      0      0   101    101     101      4       3       6       2       3       1
   -1      1      0      0   101    101     101    101     101     101     101     101     101
    1      1      0      0   101    101     101    101     101     101     101     101     101
    1      2      0      0   101    101     101    101     101     101     101     101     101
    2      1      0      0   101    101     101    101     101     101     101     101     101
    2      2      0      0   101    101     101    101     101     101     101     101     101
    3      1      0      0   101    101     101    101     101     101     101     101     101
    3      2      0      0   101    101     101    101     101     101     101     101     101

Assume that the above COST array layer corresponds to the manager: agent = 1. In the above COST array, we have three agents (agent = 1 is a manager, while agent = 2 and agent = 3 are stake-holders), two resource types, and one landscape layer. So in rows 4 and 5 above, where the first column agent = 1, we have, essentially, the costs of what it takes to affect the entire array of stake-holders’ actions (all layers where agent = -2 or agent = -1) on a particular resource. This value can then be used to directly implement actions in the first two rows of all layers of the ACTION array. Note that the structure of the code does not allow managers to make policy on landscape use – only resources, which might or might not have to be changed. What we’re essentially saying with this is that a manager cannot tell a farmer to not kill or fertilise crops on the farmer’s own land. Only resources are affected by manager policy (we could, of course, make crops resources – though this would be a bit of a time-consuming workaround). We could find a way around this if need be, but I can’t think of many situations in which we would want a manager to be able to tell a stake-holder that they can’t increase their own crop yield or kill their crops.

Issue #19: Cost and Action arrays: landscape level initialisation

The number of landscape layers in the COST and ACTION arrays is too high – the utility_layer adds one in for every unique resource instead of for every unique landscape layer. This is easily fixed by adding an option to the function to specify landscape layers.

More concrete ideas

A general algorithm-sketch for the managers could be as follows in the fitness function of the genetic algorithm. Note that this will be called in the manager model, not the user model, so there isn’t a worry about the stake-holders updating their actions at the wrong time – their actions will be static while the manager is considering what to do.

  1. Identify the current abundances of all resources based on an analysis of the observation model.
  2. Identify the ideal abundances for management, identified from the agent’s layer of the ACTION array, column util.
  3. Calculate the updated resource abundances predicted from the ACTION matrix of stake-holders (e.g., from all rows where agent = -2). Hence, resources will be decreased by some columns and increased by others based on the past actions of stake-holders. This might or might not also need to include resource abundance projections based on birth and death rates – ideally this would be the case, with projections using estimates from the observation model, but maybe just use the abundances as a first step? Note, the ‘projected’ birth rate need not use the RESOURCE array explicitly, but could be estimated and applied from the history of observation – in fact, this would probably be better.
  4. Apply new costs to all users – have managers assume that higher costs mean that actions will decrease proportionally, and lower costs mean that actions will increase proportionally. It could potentially get more complex than this, but I don’t think that it needs to at the moment; stake-holders will usually be wanting to maximise something rather than balancing between many things, and even if not, costs should still constrain their decisions in line with the direction in which the manager is trying to push – the point isn’t to optimise decisions, but to replicate the common sense and expertise of a manager that does not have complete access to other stake-holders’ minds.
  5. Given the new actions, re-calculate updated resource abundances from 3 to assign fitness.

The algorithm above should be fairly fast, and while it won’t provide the optimal solution for managers, it isn’t actually intended to do so. The point is to find an adaptive strategy based on the tools available to the manager and the limited information that the manager realistically has about resource abundances and stake-holder strategies. Following from the above, I dare say that the fitness function for stake-holders affecting the manager’s util might not be too difficult, but I’ll need to think carefully about the best way to implement it.
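The five steps above can be sketched for the simplest case of one resource type and one action (culling). The sketch makes two assumptions that are not settled in these notes: that "proportionally" means actions scale inversely with cost, and that fitness is the negative absolute deviation of projected abundance from the manager's target. The function names predicted_actions and manager_fitness are hypothetical.

```c
#include <assert.h>

/* Step 4 (assumed form): actions change in inverse proportion to cost. */
double predicted_actions(double old_actions, double old_cost,
                         double new_cost){
    assert(old_cost > 0 && new_cost > 0);
    return old_actions * (old_cost / new_cost);
}

/* Steps 3 and 5: project abundance after the predicted culling and score a
 * candidate cost by closeness to the manager's target (higher is better). */
double manager_fitness(double abun_est, double target, double old_kills,
                       double old_cost, double new_cost){
    double kills, projected, diff;

    kills     = predicted_actions(old_kills, old_cost, new_cost);
    projected = abun_est - kills;
    if(projected < 0){
        projected = 0; /* Abundance cannot go below zero */
    }
    diff = projected - target;
    return diff < 0 ? diff : -diff; /* Negative absolute deviation */
}
```

For example, with an estimated abundance of 120, a target of 100, and stake-holders culling 10 at a cost of 4, halving the cost to 2 predicts 20 culled and a projected abundance of 100, so the fitness is 0; leaving the cost at 4 projects 110 and gives a fitness of -10. The genetic algorithm would then favour the lower cost.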

Update: 26 APR 2017

Resolved issue #12

I have finally resolved issue #12, which was always a bit annoying but never terribly serious. The problem was that density-based estimation as done in Nuno et al. (2013) would only plot correctly when times_observe = 1; that is, when managers went out to observe a sample of the population exactly once per time step. Obviously we want the option to allow multiple trips to sample in a single time step, as sampling once (unless the number of cells viewed on the landscape includes almost the entire landscape) leads to highly variable results – and even more so now that resource distributions tend to become clumped on the landscape when agents scare them off their land. Previously the proportion estimate used the number of unique resources observed, but what we really needed was the number of individual observations. By simply summing all values in columns 16 to 16 + times_observed - 1 of the observation array, we get the total number of observations.
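The arithmetic behind the density-based estimate can be written out as a small stand-alone function. This is a simplified sketch that mirrors the calculation in dens_est for a single observer; the function name density_estimate is hypothetical, and the total number of observations is assumed to have been summed from the observation array already.

```c
#include <assert.h>

/* Sketch of the density-based estimate: observations per cell sampled,
 * scaled up to the whole landscape. 'view' is the sampling range, so each
 * trip covers a (2 * view + 1) by (2 * view + 1) block of cells. */
double density_estimate(int tot_obs, int view, int times_obs, int land_x,
                        int land_y){
    int vision, area, cells;

    assert(view >= 0 && times_obs > 0 && land_x > 0 && land_y > 0);
    vision = (2 * view) + 1;
    area   = vision * vision * times_obs; /* Cells viewed over all trips */
    cells  = land_x * land_y;
    return ((double) tot_obs / area) * cells;
}
```

For example, with view = 10, four sampling trips, a 100 by 100 landscape, and 441 total observations, the cells viewed total 21 × 21 × 4 = 1764, so the estimate is (441 / 1764) × 10000 = 2500. Counting unique resources instead of total observations would shrink the numerator whenever the same resource was seen on more than one trip, while the denominator still grows with every trip.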

Quick note about agents affecting each other’s costs

It’s worth noting that there is no need to save any additional data structure to have agents affect each other’s costs – at least not at the moment. This is because the user and manager models are separated in time. When a manager uses the genetic algorithm, the ACTION array that they have to use has already been updated for stake-holders, so the managers are effectively seeing the most recent actions of stake-holders and will be able to adjust costs and use the recent actions to predict changes (can perhaps assume proportional allocation of actions, so if the manager makes one action more costly, then the stake-holder will shift to increase other actions – or perhaps this is too much to predict; maybe managers should just assume that increasing cost will decrease an action as if the action were isolated. How strategic do we assume managers think?). Likewise, stake-holders are seeing the manager’s most recent priorities and might lobby them accordingly.

Update: 25 APR 2017

Resolved issue #18

The landscape actions within the user model are now affected by the appropriate diagonal element of the interaction matrix. This effectively adjusts the effect of a user’s actions to increase a cell value by some magnitude. For example, if an agent wants to increase their crop yield, they will now do so by multiplying the current cell yield by one plus whatever the appropriate element is in the interaction matrix (the default could be one, which doubles yield). Initial testing shows that this works as intended; stake-holders interested in maximising crop yield do so reliably when they can increase yield on a cell twenty-fold; mean crop yield on the whole landscape increases in turn. When the increase is smaller (50%), then a range of strategies appears possible – one stake-holder chose to kill resources while the other chose to directly increase yield (dependent on costs, which varied among stake-holders). Next, the plan is to address some of the minor clean-up tasks (bulleted list from yesterday) before getting to the ultimate goal of allowing agents to affect one another’s costs in the genetic algorithm.
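A minimal sketch of this scaling rule is below, reading it as new value = current value times (one plus the diagonal element), so that an element of one doubles yield and an element of 0.5 gives a 50% increase. The function name increase_cell_value and its arguments are hypothetical; the actual landscape_actions function in user.c may be organised differently.

```c
/* Hypothetical helper: scale one landscape cell by a diagonal Jacobian element.
 *     cell_value: current value of the cell (e.g., crop yield)
 *     jaco:       the interaction (Jacobian) matrix
 *     layer_row:  row and column of the landscape layer in jaco (assumption)
 * Returns the cell value after one 'increase' action.
 */
double increase_cell_value(double cell_value, double **jaco, int layer_row){
    double magnitude;

    magnitude = jaco[layer_row][layer_row]; /* Diagonal element for the layer */
    return cell_value * (1.0 + magnitude);
}
```

Keeping the magnitude in the matrix rather than in the function means that the twenty-fold and 50% tests described above differ only in one array element.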

Resolved issue #11

I appear to have resolved issue #11 by calling the a_mover function in observation.c from within the anecdotal function. This gives the option to move agents when the R function anecdotal is called during a time step. Later I might also consider giving the option to specify moving agents onto land that they own; this would probably be best accomplished by calling send_agents_home, which is currently in user.c, but could be moved to utilities.c. Tests of anecdotal in R confirm that it is moving agents as expected.

Update: 24 APR 2017

I have begun to implement actions on the landscape as an option. For now, these actions will include increasing crop yield directly in some way (magnitude to be affected by the interaction array, see below), and killing crops.

New issue #18: Make landscape actions from interaction array

It will be helpful to link the appropriate element of the interaction array (Jacobian matrix) to the actions in the landscape_actions function in user.c. As of now, the amount of increase in crop yield (and decrease) is hard-coded in the function, but it really should be linked with the appropriate diagonal element in the interaction array – increasing or decreasing a cell’s value by the magnitude in the array element.

More testing, success

More testing shows that the genetic algorithm and user function are working as intended, and I have looped the genetic algorithm so that it is run for all simulated agents with no issues – even when landscape dimensions or agent number changes. There are, however, some minor things that need to be tweaked.

Update: 20 APR 2017

The minor bug from yesterday has been resolved. The following bit of code needed to be within the larger loop that cycled through the population of agents in the genetic algorithm.

for(i = 0; i < interest_num; i++){
    count_change[i] = 0; /* Initialise all count changes at zero */
    utilities[i]    = 0; /* Same for utilities */
}

The count changes and utilities were not being initialised at zero, meaning that count_change was cumulative over agents. When this is fixed, and agents highly value the resource (utility = 100), they either evolve to feedem or helpem as much as possible, as reflected in the ACTION array.

     [,1] [,2] [,3] [,4] [,5] [,6] [,7] [,8] [,9] [,10] [,11] [,12] [,13]
[1,]   -2    1    0    0  100    1    1    0    0     0    92     0     0
[2,]   -1    1    0    0    1    1    1    0    0     0     0     0     0
[3,]    1    1    0    0    0    0    0    0    0     0     0     0     0
[4,]    2    1    0    0    0    0    0    0    0     0     0     1     0

So with this very simplified test, the function is doing what it is supposed to do. Next, I fixed the utility of the landscape to 100 to see if the agent recognises that it can increase crop yield by killing or scaring the resource via the interaction array.

Test of killing resource to maximise crop yield

After some further debugging, the agents in the genetic algorithm now figure out to kill resources when resources destroy crops.

     [,1] [,2] [,3] [,4] [,5] [,6] [,7] [,8] [,9] [,10] [,11] [,12] [,13]
[1,]   -2    1    0    0    1    1    1   94    0     0     0     0     0
[2,]   -1    1    0    0  100    1    1    0    0     0     0     0     0
[3,]    1    1    0    0    0    0    0    0    0     0     0     0     0
[4,]    2    1    0    0    0    0    0    0    0     0     0     0     1

In the above ACTION array, utility of the crop yield is 100, and the interaction matrix indicates that resource of type1 = 0 decreases crop yield on a cell by one half. In response to this, the stake-holders find the solution of killing resources on their land (indicated by the 94 in the eighth column above). The code for doing this is not terribly readable at the moment.

/* =============================================================================
 * This is a preliminary function that checks the fitness of each agent by 
 * passing through a loop to payoffs_to_fitness
 *     fitnesses: Array to order fitnesses of the agents in the population
 *     population: array of the population that is made (malloc needed earlier)
 *     pop_size: The size of the total population (layers to population)
 *     ROWS: Number of rows in the COST and ACTION arrays
 *     COLS: Number of columns in the COST and ACTION arrays
 *     agent_array: The agent array
 *     jaco: The jacobian matrix of resource and landscape interactions
 *     interact_table: Lookup table for figuring out rows of jaco and types
 *     interest_num: The number of rows and cols in jac, and rows in lookup
 * ========================================================================== */
void strategy_fitness(double *fitnesses, double ***population, int pop_size, 
                      int ROWS, int COLS, double **agent_array, double **jaco,
                      int **interact_table, int interest_num){
    
    int agent, i, row, act_type, type1, type2, type3, interest_row;
    double agent_fitness, *count_change, foc_effect;
    double movem, castem, killem, feedem, helpem;
    double utility, *utilities;
    
    count_change = malloc(interest_num * sizeof(double)); /* double arrays */
    utilities    = malloc(interest_num * sizeof(double));
    
    for(agent = 0; agent < pop_size; agent++){
        for(i = 0; i < interest_num; i++){
            count_change[i] = 0; /* Initialise all count changes at zero */
            utilities[i]    = 0; /* Same for utilities */
        }
        for(row = 0; row < ROWS; row++){
            foc_effect = 0;
            act_type   = (int) population[row][0][agent];
            type1      = population[row][1][agent];
            type2      = population[row][2][agent];
            type3      = population[row][3][agent];
            utility    = population[row][4][agent];
            movem      = population[row][7][agent];
            castem     = population[row][8][agent];
            killem     = population[row][9][agent];
            feedem     = population[row][10][agent];
            helpem     = population[row][11][agent];
            switch(act_type){
                case -2:
                    foc_effect -= movem;  /* Times birth to account for repr? */
                    foc_effect -= castem; /* But only remove E offspring? */
                    foc_effect -= killem; /* But also remove E offspring? */
                    foc_effect += feedem; /* But should less mortality */
                    foc_effect += helpem; /* But should affect offspring? */
                    interest_row = 0;
                    while(interest_row < interest_num){
                        if(interact_table[interest_row][0] == 0     &&
                           interact_table[interest_row][1] == type1 &&
                           interact_table[interest_row][2] == type2 &&
                           interact_table[interest_row][3] == type3
                        ){
                            break;
                        }else{
                            interest_row++;
                        }
                    }
                    for(i = 0; i < interest_num; i++){
                        count_change[i] += foc_effect * jaco[interest_row][i];
                    }
                    utilities[interest_row] = utility;
                    break; /* Avoid falling through to the landscape case */
                case -1:
                    interest_row = 0;
                    while(interest_row < interest_num){
                        if(interact_table[interest_row][0] == 1     &&
                           interact_table[interest_row][1] == type1 &&
                           interact_table[interest_row][2] == type2 &&
                           interact_table[interest_row][3] == type3
                        ){
                            break;
                        }else{
                            interest_row++;
                        }
                    }
                    utilities[interest_row] = utility;
                    break; /* Add landscape effects here */
                default:
                    break;
            }
        }
        fitnesses[agent] = 0;
        for(i = 0; i < interest_num; i++){
            fitnesses[agent] += count_change[i] * utilities[i];
        }
    }
    free(utilities);
    free(count_change);
}

The above can be greatly simplified and made clearer, with the goal towards simple fitness functions for the case in which agents directly affect resources or crops. The indirect interactions will be sub-functions in the above called in the switch where case is greater than zero.

Breaking down the strategy_fitness function

I’ve broken down the strategy_fitness function into three more manageable functions that can be further developed as necessary. The strategy_fitness function now calls functions that update the count_change and utilities arrays as a result of direct actions to resources and the landscape.

/* =============================================================================
 * This is a preliminary function that checks the fitness of each agent by 
 * passing through a loop to payoffs_to_fitness
 *     fitnesses: Array to order fitnesses of the agents in the population
 *     population: array of the population that is made (malloc needed earlier)
 *     pop_size: The size of the total population (layers to population)
 *     ROWS: Number of rows in the COST and ACTION arrays
 *     COLS: Number of columns in the COST and ACTION arrays
 *     agent_array: The agent array
 *     jaco: The jacobian matrix of resource and landscape interactions
 *     interact_table: Lookup table for figuring out rows of jaco and types
 *     interest_num: The number of rows and cols in jac, and rows in lookup
 * ========================================================================== */
void strategy_fitness(double *fitnesses, double ***population, int pop_size, 
                      int ROWS, int COLS, double **agent_array, double **jaco,
                      int **interact_table, int interest_num){
    
    int agent, i, row, act_type, type1, type2, type3, interest_row;
    double agent_fitness, *count_change, foc_effect;
    double movem, castem, killem, feedem, helpem;
    double utility, *utilities;
    
    count_change = malloc(interest_num * sizeof(double)); /* double arrays */
    utilities    = malloc(interest_num * sizeof(double));
    
    for(agent = 0; agent < pop_size; agent++){
        for(i = 0; i < interest_num; i++){
            count_change[i] = 0; /* Initialise all count changes at zero */
            utilities[i]    = 0; /* Same for utilities */
        }
        for(row = 0; row < ROWS; row++){
            act_type   = (int) population[row][0][agent];
            switch(act_type){
                case -2:
                    res_to_counts(population, interact_table, interest_num,
                                  count_change, utilities, jaco, row, agent);
                    break;
                case -1:
                    land_to_counts(population, interact_table, interest_num,
                                   utilities, row, agent);
                    break; 
                default:
                    break;
            }
        }
        fitnesses[agent] = 0;
        for(i = 0; i < interest_num; i++){
            fitnesses[agent] += count_change[i] * utilities[i];
        }
    }
    free(utilities);
    free(count_change);
}

The case -2 calls res_to_counts below.

/* =============================================================================
 * This function updates count change and utility arrays for direct actions on 
 * resources
 *     population: The population array of agents in the genetic algorithm
 *     interact_table: The lookup table for figuring out how resources interact
 *     int_num: The number of rows and cols in jac, and rows in the lookup
 *     count_change: A vector of how counts have changed as a result of actions
 *     utilities: A vector of the utilities of each resource/landscape level
 *     jaco: The interaction table itself (i.e., Jacobian matrix)
 *     row: The row of the interaction and lookup table being examined
 *     agent: The agent in the population whose fitness is being assessed
 * ========================================================================== */
void res_to_counts(double ***population, int **interact_table, int int_num,
                   double *count_change, double *utilities, double **jaco,
                   int row, int agent){
    
    int i, interest_row;
    double foc_effect;
    
    foc_effect  = 0.0;
    foc_effect -= population[row][7][agent];  /* Times birth account for repr?*/
    foc_effect -= population[row][8][agent]; /* But only remove E offspring? */
    foc_effect -= population[row][9][agent]; /* But also remove E offspring? */
    foc_effect += population[row][10][agent]; /* But should less mortality */
    foc_effect += population[row][11][agent]; /* But should affect offspring? */
    interest_row = 0;
    while(interest_row < int_num){
        if(interact_table[interest_row][0] == 0                         &&
           interact_table[interest_row][1] == population[row][1][agent] &&
           interact_table[interest_row][2] == population[row][2][agent] &&
           interact_table[interest_row][3] == population[row][3][agent]
          ){
            break;
        }else{
            interest_row++;
        }
    }
    for(i = 0; i < int_num; i++){
        count_change[i] += foc_effect * jaco[interest_row][i];
    }
    utilities[interest_row] = population[row][4][agent];
}

And the case -1 calls land_to_counts below.

/* =============================================================================
 * This function updates count change and utility arrays for direct actions on 
 * a landscape
 *     population: The population array of agents in the genetic algorithm
 *     interact_table: The lookup table for figuring out how resources interact
 *     int_num: The number of rows and cols in jac, and rows in the lookup
 *     utilities: A vector of the utilities of each resource/landscape level
 *     row: The row of the interaction and lookup table being examined
 *     agent: The agent in the population whose fitness is being assessed
 * ========================================================================== */
void land_to_counts(double ***population, int **interact_table, int int_num,
                    double *utilities, int row, int agent){
    
    int interest_row;
    
    interest_row = 0;
    while(interest_row < int_num){
        if(interact_table[interest_row][0] == 1                         &&
           interact_table[interest_row][1] == population[row][1][agent] &&
           interact_table[interest_row][2] == population[row][2][agent] &&
           interact_table[interest_row][3] == population[row][3][agent]
          ){
            break;
        }else{
            interest_row++;
        }
    }
    utilities[interest_row] = population[row][4][agent];
}

For each sub-case, how the population array is interpreted can be specialised. For example, if castem doesn’t really mean anything on the landscape, then it can simply be ignored and agents will adapt by not doing it. In this sense, these two sub-functions become easy things to tinker with for translating actions to utilities.
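Both sub-functions repeat the same search through the lookup table, and neither checks what happens if no row matches (interest_row would then equal int_num and index past the end of the arrays). One possible tidy-up, sketched here under the assumption that the first column of the table is 0 for resources and 1 for landscape layers as in the code above, is a shared helper that returns -1 when nothing matches. The name find_interest_row is hypothetical and is not a function in the current code.

```c
/* Hypothetical helper: find the row of the lookup table matching a type.
 *     interact_table: the lookup table (col 0: 0 = resource, 1 = landscape)
 *     int_num:        number of rows in the lookup table
 *     is_landscape:   0 to match a resource row, 1 to match a landscape row
 *     type1-type3:    the types to match in columns 1 to 3
 * Returns the matching row index, or -1 if no row matches.
 */
int find_interest_row(int **interact_table, int int_num, int is_landscape,
                      int type1, int type2, int type3){
    int row;

    for(row = 0; row < int_num; row++){
        if(interact_table[row][0] == is_landscape &&
           interact_table[row][1] == type1        &&
           interact_table[row][2] == type2        &&
           interact_table[row][3] == type3
          ){
            return row;
        }
    }
    return -1; /* Caller should skip the update when no row is found */
}
```

A caller would cast the type values from the population array to int, then only update count_change and utilities when the returned row is non-negative.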

Version v0.0.9: A working genetic algorithm

I moved the function do_actions and its dependency resource_actions to the user.c file so that the actions of a particular agent could be performed on the actual population after the genetic algorithm simulated and selected an adaptive strategy. As a test drive, I simulated the actions of only one stake-holder who is trying to maximise crop yield, and whose only avenue for doing so is getting resources off their land one way or another. The figure below shows the output.

Output when agents can make some adaptive decisions

The figure tells an interesting story. The light blue individual in the right-hand panels represents a farmer, who has quickly figured out that little black dots on their farm are decreasing crop yield, which the farmer wants to maximise. Initially, the growing population of black dots causes crop yield to decline, but by generation 8 or 9, the farmer has opted to scare these dots to public land. Consequently, mean farm yield over all land goes up a bit (due to intraspecific competition between black dots), and almost all of the crop damage occurs on the public land (dark blue) while the farmer’s land (light blue) has better yield. The spatial distribution of the black dots is very easy to see – all of the black dots have been ‘scared’ into the public land.

This is exciting – we have a working model in which a genetic algorithm is being used to identify and enact a stake-holder’s strategy given their specific interests. The only major conceptual hurdle now is likely to be the manager’s response, enacting policy by affecting costs of actions, and stake-holders’ actions that affect other agents’ costs (e.g., lobbying a manager). This isn’t even much of a jump though – really, the framework is in place and a lot of the work from here is just grunt work in terms of coding the specifics of what options will be available to what agents. It shouldn’t be too long before we have a working model of conflict that can be applied to real-world case studies. Hence, I’m calling this v0.0.9 and pushing to master. This implementation of the genetic algorithm is also not noticeably slower than previous versions – it took about a second to run the above; my goal is to keep it low.

The next step will be to figure out what options should be available for directly affecting the landscape, and what needs to be done to apply the genetic algorithm to costs of other agents’ actions (hooks for this are already coded in switch functions of the genetic algorithm). I would also like to build the manager.c function with the ability to empirically derive the Jacobian matrix, and (eventually) make it possible for agents to consider the histories of each other’s actions (shouldn’t be too much of a stretch, but this is an extension of the genetic algorithm that can come later).

Update: 19 APR 2017

It’s worth pointing out that the interaction array from yesterday’s make_interaction_array function can be defined more generally as a Jacobian matrix. I think it’s worth doing this for the sake of clarity and generality, and thinking about the elements of the array as first order partial derivatives.
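To make the partial-derivative reading concrete, here is a minimal sketch of how a strategy could be scored from the matrix: element jaco[r][i] is treated as the change in count i per unit change in count r, the net effect of an agent’s actions on each row is propagated through the matrix, and the resulting changes are weighted by the agent’s utilities. This mirrors the count_change and utilities accumulation used in strategy_fitness elsewhere in this notebook, but the function name strategy_value, the net_effect vector, and the numbers in the usage note are assumptions made for illustration.

```c
/* Hypothetical helper: score a strategy from a Jacobian matrix.
 *     net_effect: net direct change an agent causes to each row's count
 *     utilities:  the agent's utility for each resource/landscape row
 *     jaco:       jaco[r][i] is the change in count i per unit change in r
 *     n:          number of rows and columns in jaco
 * Returns the sum over i of utilities[i] times the predicted change in i.
 */
double strategy_value(const double *net_effect, const double *utilities,
                      double **jaco, int n){
    int r, i;
    double change, value;

    value = 0.0;
    for(i = 0; i < n; i++){
        change = 0.0; /* Predicted change in count i from all direct effects */
        for(r = 0; r < n; r++){
            change += net_effect[r] * jaco[r][i];
        }
        value += change * utilities[i];
    }
    return value;
}
```

As a worked example with made-up values: if row 0 is a resource, row 1 is crop yield, jaco[0][1] = -0.5, the agent removes 94 resources (net_effect = {-94, 0}), and only crop yield is valued (utilities = {0, 100}), then the predicted change in crop yield is 47 and the strategy scores 4700.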

Planning with the Jacobian matrix

Note that one benefit of individual-based modelling is that each individual can be unique – for example, an individual’s consumption rate of crops does not have to be completely defined by its type; there can be individual variation within types too. Hence, it is probably undesirable to have the Jacobian matrix of type and landscape cell layers define how individual interactions should occur. There should be some variation and uncertainty at least as an option. Hence, the interaction array should probably be calculated a posteriori as much as possible – ideally from looking at interactions on the landscape (e.g., by the eventual manager.c), or perhaps a function should go through the resource and landscape arrays and figure out the average interaction for each data type somewhere within G-MSE (but not from within the genetic algorithm; it would take too long). These details can be worked out later in the manager.c file, or perhaps somehow with the anecdotal function, which I think now can be made more general. For now, I’m going to manually set values in the matrix and use them to build an efficient genetic algorithm.

Dealing with issues of order in the fitness function

Resource type order needs to be identified for all resource types and landscape layers. The easiest way to do this is to just have a new array that lists all resource and landscape types, such as the below.

Res   Type 1   Type 2   Type 3
1     1        0        0
1     2        0        0
0     1        0        0

The first column just identifies whether the row refers to a resource or a landscape level. The second through fourth columns identify a type (types 2 and 3 are always zero for landscape levels). This strikes me as the clearest way of keeping track of which rows go with which types in both the Jacobian matrix and the resource array; I might eventually want the latter to include columns associated with each row.

The table is initialised with a simple function now in initialise.R to be called from the main gmse.R.

#' Initialise array of resource and landscape-level interactions
#'
#'@param resources the resource array
#'@param landscape the landscape array
#'@export
make_interaction_table <- function(resources, landscape){
    
    resource_types      <- unique(resources[,2:4]);
    resource_part       <- matrix(data=0, nrow=dim(resource_types)[1], ncol=4);
    resource_part[,2:4] <- resource_types;
    
    landscape_count    <- dim(landscape)[3] - 2; # Again, maybe all in later?
    landscape_part     <- matrix(data = 0, nrow = landscape_count, ncol = 4);
    landscape_part[,1] <- 1;
    landscape_part[,2] <- 1:landscape_count;
    
    the_table <- rbind(resource_part, landscape_part);
    return(the_table);
}

The table, along with the Jacobian matrix, is now passed to the user function and into the genetic algorithm where it can be used by the fitness function.

A revised fitness function

A revised fitness function is below. It has not passed unit tests because it doesn’t appear to be maximising utility correctly. There are likely one or more minor bugs in the code that need to be fixed, and it would be better anyway to break the below down into a couple of smaller functions.

/* =============================================================================
 * This is a preliminary function that checks the fitness of each agent by 
 * passing through a loop to payoffs_to_fitness
 *     fitnesses: Array to order fitnesses of the agents in the population
 *     population: array of the population that is made (malloc needed earlier)
 *     pop_size: The size of the total population (layers to population)
 *     ROWS: Number of rows in the COST and ACTION arrays
 *     COLS: Number of columns in the COST and ACTION arrays
 *     agent_array: The agent array
 *     jaco: The jacobian matrix of resource and landscape interactions
 *     interact_table: Lookup table for figuring out rows of jaco and types
 *     interest_num: The number of rows and cols in jac, and rows in lookup
 * ========================================================================== */
void strategy_fitness(double *fitnesses, double ***population, int pop_size, 
                      int ROWS, int COLS, double **agent_array, double **jaco,
                      int **interact_table, int interest_num){
    
    int agent, i, row, act_type, type1, type2, type3, interest_row;
    double agent_fitness, *count_change, foc_effect;
    double movem, castem, killem, feedem, helpem;
    double utility, *utilities;
    
    count_change = malloc(interest_num * sizeof(double)); /* double arrays */
    utilities    = malloc(interest_num * sizeof(double));
    
    for(i = 0; i < interest_num; i++){
        count_change[i] = 0; /* Initialise all count changes at zero */
        utilities[i]    = 0; /* Same for utilities */
    }
    
    for(agent = 0; agent < pop_size; agent++){
        for(row = 0; row < ROWS; row++){
            foc_effect = 0;
            act_type   = (int) population[row][0][agent];
            type1      = population[row][1][agent];
            type2      = population[row][2][agent];
            type3      = population[row][3][agent];
            utility    = population[row][4][agent];
            movem      = population[row][7][agent];
            castem     = population[row][8][agent];
            killem     = population[row][9][agent];
            feedem     = population[row][10][agent];
            helpem     = population[row][11][agent];
            switch(act_type){
                case -2:
                    foc_effect -= movem;  /* Times birth to account for repr? */
                    foc_effect -= castem; /* But only remove E offspring? */
                    foc_effect -= killem; /* But also remove E offspring? */
                    foc_effect += feedem; /* But should less mortality */
                    foc_effect += helpem; /* But should affect offspring? */
                    interest_row = 0;
                    while(interest_row < interest_num){
                        if(interact_table[interest_row][1] == type1 &&
                           interact_table[interest_row][2] == type2 &&
                           interact_table[interest_row][3] == type3
                        ){
                            break;
                        }else{
                            interest_row++;
                        }
                    } /* Found the right row in the look-up table */
                    for(i = 0; i < interest_num; i++){
                        count_change[i] += foc_effect * jaco[interest_row][i];
                    }
                    utilities[interest_row] = utility;
                case -1:
                    break; /* Add landscape effects here */
                default:
                    break;
            }
        }
        fitnesses[agent] = 0;
        for(i = 0; i < interest_num; i++){
            fitnesses[agent] += count_change[i] * utilities[i];
        }
        
        /* The below will be removed -- once a minor bug is found */
        /* fitnesses[agent] = population[0][12][agent]; */
    }
    
    
    free(utilities);
    free(count_change);
}

Nevertheless, this is definitely some progress – and the code is still fast. The next step is to print output from the above function to track down what is incorrect.

Update: 18 APR 2017

An additional thought that could be useful for the genetic algorithm is that it might make sense for the AGENT array to also include abundances of each resource type and landscape level in columns at the end of the array, the order of which matches the order of the 2D array described yesterday. Something like the former anecdotal function could then be used to fill in abundance values as appropriate (e.g., matching resource on the agent’s owned land, or on public land, or nearby to the location of the agent).

Initialise interaction array

A new function in R initialises an array of interactions among resource types and landscape layers.

#' Initialise array of resource and landscape-level interactions
#'
#'@param resources the resource array
#'@param landscape the landscape array
#'@export
make_interaction_array <- function(resources, landscape){
    resource_types  <- unique(resources[,2:4]);
    resource_count  <- dim(resource_types)[1];
    landscape_count <- dim(landscape)[3] - 2; # Maybe put all of them in later?
    total_dims      <- resource_count + landscape_count;
    INTERACTIONS    <- matrix(data = 0, nrow = total_dims, ncol = total_dims);
    
    name_vec <- NULL;                                   
    for(i in 1:dim(resource_types)[1]){
        name_vec <- c( name_vec, 
                       paste(resource_types[i,1],
                             resource_types[i,2],
                             resource_types[i,3],
                             sep = "" )
                      );                                   
    }            
    name_vec <- c(name_vec, as.character(paste("L",1:landscape_count,sep="")));
    rownames(INTERACTIONS) <- name_vec;
    colnames(INTERACTIONS) <- name_vec;
    return(INTERACTIONS);
}

Specific values can be added in outside the make_interaction_array function and updated as need be by G-MSE.

Update: 17 APR 2017

It’s a bit painful, but I’m going to delete some major pieces of code in the genetic algorithm (which will obviously be preserved in version control). The following functions are slowing things down, and given the new approach outlined in option 3 from 13 APR, I’m going to remove them and focus on the ACTION array only, assuming that agents act as if their actions will yield the intended results.

/* =============================================================================
 * This function calculates an individual agent's fitness
 * ========================================================================== */
double calc_agent_fitness(double ***population, int ROWS, int COLS, 
                          int landowner, double ***landscape, 
                          double **resources, int res_number, int land_x, 
                          int land_y, int land_z, int trait_number,
                          double *fitnesses, double *paras){
    
    int agent, resource, resource_new, trait, row, col, xloc, yloc, zloc;
    int res_on_land, res_nums_added, res_nums_subtracted, res_num_total;
    double *payoff_vector, *payoffs_after_actions, *payoff_change;
    double **TEMP_RESOURCE, **TEMP_ACTION, ***TEMP_LANDSCAPE;
    double **ADD_RESOURCES, **NEW_RESOURCES;
    double a_fitness;
    
    payoff_vector         = malloc(ROWS * sizeof(double));
    payoffs_after_actions = malloc(ROWS * sizeof(double));
    payoff_change         = malloc(ROWS * sizeof(double));
    
    /* --- Make temporary resource, action, and landscape arrays below --- */
    TEMP_RESOURCE    = malloc(res_number * sizeof(double *));
    for(resource = 0; resource < res_number; resource++){
        TEMP_RESOURCE[resource] = malloc(trait_number * sizeof(double));   
    } 
    for(resource = 0; resource < res_number; resource++){
        for(trait = 0; trait < trait_number; trait++){
            TEMP_RESOURCE[resource][trait] = resources[resource][trait];
        }
    } 
    
    TEMP_ACTION = malloc(ROWS * sizeof(double *)); /* One pointer per row */
    for(row = 0; row < ROWS; row++){
        TEMP_ACTION[row] = malloc(COLS * sizeof(double));   
    }    
    for(row = 0; row < ROWS; row++){
        for(col = 0; col < COLS; col++){
            TEMP_ACTION[row][col] = population[row][col][landowner];
        }
    }
    
    TEMP_LANDSCAPE = malloc(land_x * sizeof(double **));
    for(xloc = 0; xloc < land_x; xloc++){
        TEMP_LANDSCAPE[xloc] = malloc(land_y * sizeof(double *));
        for(yloc = 0; yloc < land_y; yloc++){
            TEMP_LANDSCAPE[xloc][yloc] = malloc(land_z * sizeof(double));   
        }
    } 
    for(zloc = 0; zloc < land_z; zloc++){
        for(yloc = 0; yloc < land_y; yloc++){
            for(xloc = 0; xloc < land_x; xloc++){
                TEMP_LANDSCAPE[xloc][yloc][zloc] = landscape[xloc][yloc][zloc];
            }
        }
    }
    
    /* ----------------------------------------------------------- */
    
    calc_payoffs(TEMP_ACTION, ROWS, landscape, TEMP_RESOURCE, res_number, 
                 landowner, land_x, land_y, payoff_vector);
   
    do_actions(landscape, TEMP_RESOURCE, land_x, land_y, TEMP_ACTION, ROWS, 
               landowner, res_number, COLS);
    
    /* =====  Below re-creates key parts of the resource model ===== */
    project_res_abund(TEMP_RESOURCE, paras, res_number);
   
    res_nums_added      = 0;
    res_nums_subtracted = 0;
    for(resource = 0; resource < res_number; resource++){
        res_nums_added += TEMP_RESOURCE[resource][10];
        if(TEMP_RESOURCE[resource][8] < 0){
            res_nums_subtracted += 1;
        }
    }
    
    ADD_RESOURCES = malloc(res_nums_added * sizeof(double *));
    for(resource = 0; resource < res_nums_added; resource++){
        ADD_RESOURCES[resource] = malloc(trait_number * sizeof(double));   
    }
    
    res_place(ADD_RESOURCES, TEMP_RESOURCE, res_nums_added, res_number, 
              trait_number, 10, 11);
     
    res_num_total  = res_number + res_nums_added - res_nums_subtracted;
    
    NEW_RESOURCES = malloc(res_num_total * sizeof(double *));
    for(resource = 0; resource < res_num_total; resource++){
        NEW_RESOURCES[resource] = malloc(trait_number * sizeof(double));   
    }   
    
    resource_new = 0;
    for(resource = 0; resource < res_number; resource++){
        if(TEMP_RESOURCE[resource][8] >= 0){
            for(trait=0; trait < trait_number; trait++){
                NEW_RESOURCES[resource_new][trait] = 
                    TEMP_RESOURCE[resource][trait];
            }
            resource_new++; 
        }
    }
    for(resource = 0; resource < res_nums_added; resource++){
        for(trait = 0; trait < trait_number; trait++){
            NEW_RESOURCES[resource_new][trait] = ADD_RESOURCES[resource][trait];
        }
        resource_new++;
    }
 
    res_landscape_interaction(NEW_RESOURCES, 1, 1, 8, res_num_total, 14, 
                              TEMP_LANDSCAPE, 1);
    
    /* ============================================================*/

    calc_payoffs(TEMP_ACTION, ROWS, landscape, NEW_RESOURCES, res_num_total, 
                 landowner, land_x, land_y, payoffs_after_actions);
    
    a_fitness = payoffs_to_fitness(TEMP_ACTION, ROWS, payoffs_after_actions);

    /* ----------------------------------------------------------- */
   
    for(resource = 0; resource < res_num_total; resource++){
        free(NEW_RESOURCES[resource]);
    }
    free(NEW_RESOURCES);
 
    for(resource = 0; resource < res_nums_added; resource++){
        free(ADD_RESOURCES[resource]);
    }
    free(ADD_RESOURCES);
    for(xloc = 0; xloc < land_x; xloc++){
        for(yloc = 0; yloc < land_y; yloc++){
            free(TEMP_LANDSCAPE[xloc][yloc]);   
        }
        free(TEMP_LANDSCAPE[xloc]);        
    }
    free(TEMP_LANDSCAPE); 
    for(row = 0; row < ROWS; row++){
        free(TEMP_ACTION[row]);
    }
    free(TEMP_ACTION);
    for(resource = 0; resource < res_number; resource++){
        free(TEMP_RESOURCE[resource]);
    }
    free(TEMP_RESOURCE);
    free(payoff_change);
    free(payoffs_after_actions);
    free(payoff_vector);
    
    return a_fitness;
}

The above re-creation of the resource model was particularly slow – essentially running a big chunk of resource.c 2000 times (once for each of 100 simulated agents in the genetic algorithm for 20 generations). With more stake-holders or longer convergence times, this would become very time-consuming without much benefit.

The above function calls project_res_abund (below), which is no longer needed.

/* =============================================================================
 * This function looks at the resources and projects how many new resources
 * there will be after deaths and births.
 *     resources: The resource array
 *     paras: Relevant parameter values
 *     res_number: The number of rows in the resource array
 * ========================================================================== */
void project_res_abund(double **resources, double *paras, int res_number){

    int birthtype, deathtype;
    int birth_K, death_K;
    int resource;
    
    birthtype = (int) paras[3];
    deathtype = (int) paras[4];
    birth_K   = (int) paras[5];
    death_K   = (int) paras[6];
    
    res_add(resources, res_number, 9, birthtype, birth_K);
        
    res_remove(resources, res_number, 8, deathtype, death_K);
        
}

The function calc_agent_fitness also calls calc_payoffs, which can be removed.

/* =============================================================================
 * This function calculates each payoff for rows in the action matrix
 *     population: array of the population that is made (malloc needed earlier)
 *     ROWS: Number of rows in the COST and ACTION arrays
 *     landscape: The landscape array
 *     resources: The resource array
 *     res_number: The number of rows in the resource array
 *     landowner: The agent ID of interest -- also the landowner
 *     land_x: The x dimension of the landscape
 *     land_y: The y dimension of the landscape
 *     payoff_vector: A vector of payoffs for each row of the action array
 * ========================================================================== */
void calc_payoffs(double **population, int ROWS, double ***landscape, 
                  double **resources, int res_number, int landowner,
                  int land_x, int land_y, double *payoff_vector){
    
    int xloc, yloc, yield_layer;
    int resource, row;
    int landscape_specific;
    int res_count;
    double cell_yield;
    
    for(row = 0; row < ROWS; row++){
        payoff_vector[row] = 0;
        res_count          = 0; /* Reset for each row; was never initialised */
        if(population[row][0] == -2){
            for(resource = 0; resource < res_number; resource++){
                if(population[row][1] == resources[resource][1]  &&
                   population[row][2] == resources[resource][2]  &&
                   population[row][3] == resources[resource][3]
                ){    
                    landscape_specific = population[row][6];
                    if(landscape_specific == 0){
                        res_count++;
                    }else{
                        xloc = resources[resource][4];
                        yloc = resources[resource][5];
                        if(landscape[xloc][yloc][2] == landowner){
                            res_count++;    
                        }
                    }
                }
            }
            payoff_vector[row] += res_count;
        }
        if(population[row][0] == -1){
            yield_layer = population[row][1];
            for(xloc = 0; xloc < land_x; xloc++){
                for(yloc = 0; yloc < land_y; yloc++){
                    if(landscape[xloc][yloc][2] == landowner){
                        cell_yield = landscape[xloc][yloc][yield_layer];
                        payoff_vector[row] += cell_yield;
                    }
                }
            }
        }
        if(population[row][0] > -1){
            payoff_vector[row] = 0;
        }
    }
}

I’m leaving in the functions do_actions and resource_actions, which, while not part of the genetic algorithm now, might be used in user.c to enact the strategies selected by the genetic algorithm.

In place of all these functions, I’m going to write a modified version of payoffs_to_fitness (below, which will also be removed) called actions_to_fitness, which will need the ACTION array and RESOURCE array to return a value the_fitness.

/* =============================================================================
 * This function translates resource abundances and crop yields to the fitness
 * of an agent
 *     action: The action array
 *     ROWS: Number of rows in the COST and ACTION arrays
 *     payoffs: Payoffs associated with each row of the action array
 * ========================================================================== */
double payoffs_to_fitness(double **action, int ROWS, double *payoffs){
    int row;
    double utility, abundance, the_fitness;
    
    the_fitness = 0; /* Initialise before accumulating */
    for(row = 0; row < ROWS; row++){
        utility      = action[row][4];
        abundance    = payoffs[row];
        the_fitness += utility * abundance;
    }
    
    return the_fitness;
}

The idea will be to have agents assume that their actions will have the intended results (e.g., killing 5 resources) without using the entire resource model to project whether or not this is really expected (e.g., if only 3 resources are available to kill). Since the ACTION array includes utilities, we can multiply the assumed action effects by utility to calculate fitness.

One necessary added complication is that there needs to be some way to model indirect effects on fitness, for example, if resources increase or decrease crop cell values (or other resource abundances) or vice versa. There needs to be some way for agents to recognise that they can, e.g., kill resources to increase crop yield. Rather than go through the computationally intense task of replicating full interactions within the genetic algorithm, I think it would be better to have G-MSE create a 2D array that identifies the effect of each resource type or landscape layer on each other resource type or landscape layer. This array wouldn’t need to be re-created ex nihilo every time the genetic algorithm is run, but could instead be either produced at a higher level from the parameters of the genetic algorithm, or perhaps calculated somehow in the manager function (not yet written). Hence, the consequences of an action on any given resource type or landscape layer could be followed through the 2D array instead of by re-creating the resource algorithm. This would allow us to directly manipulate error as well, if for example some stake-holders don’t recognise certain consequences of affecting one or another resource. For proof of concept, only a two by two array needs to be used.

              Resource_1   Landscape_1
Resource_1    0            -0.5
Landscape_1   0.1          0

In the array above, each row is the focal resource type or landscape layer, and each column shows what the focal row has an effect on, per capita. This could be challenging because the per capita effect might vary with resource abundances, and might be factored through other parameters (e.g., landscape cells affecting resource birth or death). Getting the expected change in abundance will therefore take some work, though it would certainly be less computationally intense than the way I was doing it before. I’ll start with using defined parameter values for proof of concept, but I do think that this array would be best built in the manager model, perhaps with multiple options that can incorporate error and uncertainty.
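
A minimal sketch of how such an array might be used follows. It assumes the two by two values above and only first-order effects (a direct change in one type shifts each other type by the direct change times the per capita effect). The names N_TYPES, project_change, and changes_to_fitness are hypothetical and do not exist in G-MSE.

```c
#include <stddef.h>

#define N_TYPES 2 /* Hypothetical: Resource_1 and Landscape_1 only */

/* First-order projection of direct changes through the interaction array.
 *     effect: rows are the focal type, columns the type being affected
 *     direct: assumed direct change in each type (e.g., -5 if 5 are killed)
 *     total:  output, direct change plus indirect per capita effects      */
void project_change(double effect[N_TYPES][N_TYPES], const double *direct,
                    double *total){
    size_t focal, target;

    for(target = 0; target < N_TYPES; target++){
        total[target] = direct[target];
        for(focal = 0; focal < N_TYPES; focal++){
            total[target] += direct[focal] * effect[focal][target];
        }
    }
}

/* Fitness as the utility-weighted sum of projected changes */
double changes_to_fitness(const double *utility, const double *total){
    size_t type;
    double the_fitness;

    the_fitness = 0;
    for(type = 0; type < N_TYPES; type++){
        the_fitness += utility[type] * total[type];
    }
    return the_fitness;
}
```

With the array above, killing 5 of Resource_1 gives a direct change of -5 in the resource and an indirect change of -5 * -0.5 = 2.5 in Landscape_1. An agent that does not recognise the indirect effect could be modelled by zeroing the relevant element of its copy of the array.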

Update: 13 APR 2017

Double-check resource functions

It’s now time to simulate the recreation of the RESOURCE array within the genetic algorithm, so it was useful to re-check the resource functions to remember what the RESOURCE array looks like after the resource model portion of G-MSE and why. The genetic algorithm needs to simulate births and deaths, making the code below from resource.c particularly relevant.

for(resource = 0; resource < rows; resource++){
    res_adding[resource][realised] = 0;
    rand_pois = rpois(res_adding[resource][add]);
    res_adding[resource][realised] = rand_pois;
    added += (int) rand_pois;
}

The above is in a switch statement that is currently superfluous but might later model different types of reproduction. Hence it is probably best to just run the whole function res_add, which will write the number of new resources that each existing resource produces to column 10 in C.

In fact, it will probably be considerably cleaner and more readable to just make the biology-centred part of the whole resource function in resource.c its own function, resource_dynamics, then run resource_dynamics in the genetic algorithm with appropriate links. As a bonus, this would take care of the landscape-level effects of resources too.

Calculate agent fitness function almost finished

The function calc_agent_fitness is almost complete, which will be an initial draft of the genetic algorithm after I write in the code to translate resource abundances and crop yields to realised utilities. The meat of the function (excluding initialisation and memory management) is below.

    /* ----------------------------------------------------------- */
    
    calc_payoffs(TEMP_ACTION, ROWS, landscape, TEMP_RESOURCE, res_number, 
                 landowner, land_x, land_y, payoff_vector);
   
    do_actions(landscape, TEMP_RESOURCE, land_x, land_y, TEMP_ACTION, ROWS, 
               landowner, res_number, COLS);
    
    /* =====  Below re-creates key parts of the resource model ===== */
    project_res_abund(TEMP_RESOURCE, paras, res_number);
   
    res_nums_added      = 0;
    res_nums_subtracted = 0;
    for(resource = 0; resource < res_number; resource++){
        res_nums_added += TEMP_RESOURCE[resource][10];
        if(TEMP_RESOURCE[resource][8] < 0){
            res_nums_subtracted += 1;
        }
    }
    
    ADD_RESOURCES = malloc(res_nums_added * sizeof(double *));
    for(resource = 0; resource < res_nums_added; resource++){
        ADD_RESOURCES[resource] = malloc(trait_number * sizeof(double));   
    }
    
    res_place(ADD_RESOURCES, TEMP_RESOURCE, res_nums_added, res_number, 
              trait_number, 10, 11);
     
    res_num_total  = res_number + res_nums_added - res_nums_subtracted;
    
    NEW_RESOURCES = malloc(res_num_total * sizeof(double *));
    for(resource = 0; resource < res_num_total; resource++){
        NEW_RESOURCES[resource] = malloc(trait_number * sizeof(double));   
    }   
    
    resource_new = 0;
    for(resource = 0; resource < res_number; resource++){
        if(TEMP_RESOURCE[resource][8] >= 0){
            for(trait=0; trait < trait_number; trait++){
                NEW_RESOURCES[resource_new][trait] = 
                    TEMP_RESOURCE[resource][trait];
            }
            resource_new++; 
        }
    }
    for(resource = 0; resource < res_nums_added; resource++){
        for(trait = 0; trait < trait_number; trait++){
            NEW_RESOURCES[resource_new][trait] = ADD_RESOURCES[resource][trait];
        }
        resource_new++;
    }
 
    res_landscape_interaction(NEW_RESOURCES, 1, 1, 8, res_num_total, 14, 
                              TEMP_LANDSCAPE, 1);
    
    /* ============================================================*/

    calc_payoffs(TEMP_ACTION, ROWS, landscape, NEW_RESOURCES, res_num_total, 
                 landowner, land_x, land_y, payoffs_after_actions);

    /* Need a calc_utilities function */
    
    /* ----------------------------------------------------------- */

The next step is to write the calc_utilities function. Overall, the whole program is noticeably slower, so I will want to optimise a bit if possible. I also need to do some unit testing for all of this to make sure that the genetic algorithm is doing what I intend it to do.

Moving forward: optimisation and error in ACTION

Having completed some initial coding and testing, there is a lot to do on everything downstream of calc_agent_fitness. The function appears to alter agent utilities somehow, and slows down the simulations dramatically, from about half a second to several minutes to get through 100 generations. Options for addressing this include:

  1. Fix whatever is causing the utilities to change, tweak the code as much as possible to speed things up, and accept that the genetic algorithm will cause things to take longer.
  2. Instead of recreating a completely new resource array, just get the added new resources on a landscape by summing them on the old one, subtracting the dead ones, and adding births (maybe expected births, somehow?) – but counting births only from mothers on the relevant land cells (if applicable).
  3. Ignore the detailed projection of the agent’s actions and just have the agent assume that all of them will be successful – so assume that an agent wanting to kill 6 resources on their land really will kill that number of resources even if there are not that many. Then calculate fitness directly from the actions, ignoring the resource array completely. This removes all need for the calc_payoffs function and changes the nature of the payoffs_to_fitness function to directly assess fitness by assuming actions will be successful.

I find the third option most tempting. Perhaps there will be a case for extreme accuracy in predicting the effects of actions, but I think that it’s unlikely that we will lose much if we assume that agent actions are successful in the genetic algorithm. This also builds in the kind of error that would seem to be realistic in terms of human behaviour. It will be necessary, however, to still have agents link resource abundance with changes on the landscape – e.g., the indirect fitness benefit in terms of crop production increase caused by killing a resource needs to be realised in some way. The best way might be to rewrite res_landscape_interaction somehow to link the two without looping through each landscape cell for each resource. I don’t know the best way to do this yet – perhaps something in an observation model that estimates mean crop loss due to a resource of type X?
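
A minimal sketch of the third option follows, assuming the ACTION column layout used elsewhere in these notes (column 0 equals -2 for resource rows, column 4 is util, and columns 7 to 11 hold the five action counts). The function name actions_to_fitness_sketch and the assumed_effect vector (the abundance change an agent assumes per action of each kind) are hypothetical, and the effect values themselves would need to be decided.

```c
#define FIRST_ACTION_COL 7 /* Assumed first action column in ACTION */
#define N_ACTIONS        5 /* Move, castrate, kill, feed, help      */

/* Fitness calculated directly from actions, assuming all of them succeed.
 *     action: The action array
 *     ROWS: Number of rows in the action array
 *     assumed_effect: Hypothetical abundance change per action of each kind */
double actions_to_fitness_sketch(double **action, int ROWS,
                                 const double *assumed_effect){
    int row, act;
    double utility, change, the_fitness;

    the_fitness = 0;
    for(row = 0; row < ROWS; row++){
        if(action[row][0] != -2){ /* Only direct actions on resources */
            continue;
        }
        utility = action[row][4];
        change  = 0;
        for(act = 0; act < N_ACTIONS; act++){
            change += action[row][FIRST_ACTION_COL + act] * assumed_effect[act];
        }
        the_fitness += utility * change;
    }
    return the_fitness;
}
```

For example, an agent with a utility of -1 for a resource that chooses to kill 5 of them, with an assumed effect of -1 per kill, would get a fitness of 5 regardless of how many resources are really on its land.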

Update: 12 APR 2017

Resolved Issue #17

I am now closing Issue #17 introduced yesterday, as the issue is resolved such that action array columns util, u_loc, and u_land are now not touched by the genetic algorithm where the first column of the action array takes a negative value.

Moving on to castration

I have tested to confirm that the moving (i.e., scaring) action is working and actually moving resources as intended. I am now moving on to the code for castrating (decreasing birth rate to zero) resources. As with the moving, there is really no analog on the landscape for this (since crops modelled using the landscape don’t reproduce explicitly – if we wanted them to, we could just model them as a different kind of resource), so I am also only writing a function for this that acts on resources. Any positive values in the ACTION array therefore have no effect on landscape rows (i.e., where the first column equals -1).

The castration function (up and working) reuses a lot of code from the moving function, which initially led me to trying to make all of the actions part of one function.

/* =============================================================================
 * This function causes the agents to castrate a resource
 *     land: The landscape array
 *     resources: The resource array
 *     owner: The agent ID of interest -- also the landowner
 *     u_loc: Whether or not an agent's actions depend on owning land cell
 *     casts_left: The number of remaining times an agent will castrate
 *     res_number: The total number of resources in the resources array
 *     land_x: The x dimension of the landscape
 *     land_y: The y dimension of the landscape
 *     res_type1: Type 1 category of resources being castrated
 *     res_type2: Type 2 category of resources being castrated
 *     res_type3: Type 3 category of resources being castrated
 * ========================================================================== */
void castrate_resource(double ***land, double **resources, int owner, int u_loc, 
                   int casts_left, int res_number, int land_x, int land_y,
                   int res_type1, int res_type2, int res_type3){
    
    int xpos, ypos, xloc, yloc;
    int cell, cast;
    int resource, t1, t2, t3;
    
    resource = 0;
    while(casts_left > 0 && resource < res_number){
        t1 = (int) resources[resource][1];
        t2 = (int) resources[resource][2];
        t3 = (int) resources[resource][3];
        if(t1 == res_type1 && t2 == res_type2 && t3 == res_type3){
            xpos = (int) resources[resource][4];
            ypos = (int) resources[resource][5];
            cell = land[xpos][ypos][2];
            cast = check_if_can_act(u_loc, cell, owner);
            if(cast == 1){
                resources[resource][9] = 0;
                casts_left--; 
            }
        }
        resource++;
    }
}

Nevertheless, having these modular actions makes the code a bit more readable, and to combine all of them would require multiple while loops within the function anyway – the resource type check could be pulled out, but then this would defeat the whole point of being able to switch the order of actions. Then again, combining them could make it easier to avoid having the same resource experience multiple actions, which is probably undesirable.

Even more importantly, there is an issue here that all of these actions will start out with resource = 0, so the first resource will by default experience multiple actions wherever this is possible. Clearly this needs to be either randomised or done systematically in some way. I think that the best solution is to create a function to sample without replacement, put that function in utilities.c, then use it to select resources to be acted on – hence each resource will only experience one action. In the unlikely event that there are more actions than resources, it would be useful to somehow randomise which actions are taken – perhaps smaller action-specific functions should operate within the larger function ordering actions. In any case, the above castrate_resource function and the moving function should change.
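
A minimal sketch of sampling without replacement follows, using a partial Fisher-Yates shuffle. The function name sample_no_replace is hypothetical, and rand() from the C standard library stands in for R's random number generator (it has a slight modulo bias, which a version for utilities.c would likely want to avoid).

```c
#include <stdlib.h>

/* Fills 'sampled' with 'n_samp' distinct indices from 0 to n_total - 1.
 * Returns 0 on success, or 1 if the request is invalid or malloc fails.
 *     sampled: Output vector with room for at least n_samp integers
 *     n_samp: The number of indices to sample
 *     n_total: The number of indices to sample from (e.g., res_number) */
int sample_no_replace(int *sampled, int n_samp, int n_total){
    int i, j, tmp;
    int *pool;

    if(n_samp < 0 || n_samp > n_total){
        return 1;
    }
    if(n_samp == 0){
        return 0;
    }
    pool = malloc(n_total * sizeof(int));
    if(pool == NULL){
        return 1;
    }
    for(i = 0; i < n_total; i++){
        pool[i] = i;
    }
    for(i = 0; i < n_samp; i++){ /* Shuffle only the first n_samp positions */
        j       = i + rand() % (n_total - i);
        tmp     = pool[i];
        pool[i] = pool[j];
        pool[j] = tmp;
        sampled[i] = pool[i];
    }
    free(pool);
    return 0;
}
```

Each sampled index could then be used as the row of the resource array to act on, so no resource would be selected twice and the first row would no longer be favoured.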

Major restructure of actions successful

In working through the separate user actions, I found it challenging to try to code things such that correct resources were affected, but these actionable resources were not affected in any particular order (e.g., all scaring first, then killing, etc.). If there was a particular order, then it’s possible that users could systematically run out of resources to do things to (perhaps because they exhausted the resources on their land) and hence always move resources but not kill them for arbitrary reasons. To work around this, it is necessary to randomly select an action and perform it on an actionable resource. This is solved with a new function resource_actions, which initial testing finds to work as intended.

/* =============================================================================
 * This function enacts all user actions in a random order
 *     resources: The resource array
 *     row: The row of the action array (should be 0)
 *     action: The action array
 *     can_act: Binary vector length res_number where 1 if resource actionable
 *     res_number: The number of rows in the resource array
 *     land_x: The x dimension of the landscape
 *     land_y: The y dimension of the landscape
 * ========================================================================== */
void resource_actions(double **resources, int row, double **action, 
                      int *can_act, int res_number, int land_x, int land_y){
    
    int resource, xloc, yloc, i;
    int util, u_loc, u_land;
    int movem, castem, killem, feedem, helpem;
    int *actions, total_actions, action_col, sample;
    
    actions       = malloc(5 * sizeof(int));
    total_actions = 0;
    for(i = 0; i < 5; i++){
        action_col     = i + 7;
        actions[i]     = action[row][action_col];
        total_actions += action[row][action_col];
    }
    
    resource = 0;
    while(resource < res_number && total_actions > 0){
        if(can_act[resource] == 1){
            do{ /* Sampling avoids having some actions always first */
                sample = (int) floor( runif(0, 5) );
            }while(sample == 5 || actions[sample] == 0); /* Redraw if invalid */
            /* Enact whichever action was randomly sampled */
            switch(sample){
                case 0: /* Move resource */
                    xloc = (int) floor( runif(0, land_x) );
                    yloc = (int) floor( runif(0, land_y) );
                    resources[resource][4] = xloc;
                    resources[resource][5] = yloc;
                    actions[0]--;
                    break;
                case 1: /* Castrate resource */
                    resources[resource][9] = 0;
                    actions[1]--;
                    break;
                case 2: /* Kill resource */
                    resources[resource][8] = 1;
                    actions[2]--;
                    break;
                case 3: /* Feed resource (increase birth-rate)*/
                    resources[resource][9]++;
                    actions[3]--;
                    break;
                case 4: /* Help resource (increase offspring number directly) */
                    resources[resource][10]++;
                    actions[4]--;
                    break;            
                default:
                    break;
            }
            total_actions--;
        }
        resource++;
    }
    free(actions);
}

The above function is called by do_actions within a switch statement. Recall that the resources array here is a temporary array that will later be used to assess the impact of actions with respect to user utility to ultimately assign each agent in the genetic algorithm a fitness value. Some functions within resource.c are going to need to be used for this.

Since I have added multiple functions that allocate memory, now is probably a good time to check for any errors or memory leaks.

 R -d "valgrind --tool=memcheck --leak-check=yes --track-origins=yes" --vanilla < gmse.R

After running valgrind, all appears to be clear.

==5787== HEAP SUMMARY:
==5787==     in use at exit: 95,076,495 bytes in 17,685 blocks
==5787==   total heap usage: 12,369,394 allocs, 12,351,709 frees, 2,246,425,240 bytes allocated
==5787== 
==5787== LEAK SUMMARY:
==5787==    definitely lost: 0 bytes in 0 blocks
==5787==    indirectly lost: 0 bytes in 0 blocks
==5787==      possibly lost: 0 bytes in 0 blocks
==5787==    still reachable: 95,076,495 bytes in 17,685 blocks
==5787==         suppressed: 0 bytes in 0 blocks
==5787== Reachable blocks (those to which a pointer was found) are not shown.
==5787== To see them, rerun with: --leak-check=full --show-leak-kinds=all
==5787== 
==5787== For counts of detected and suppressed errors, rerun with: -v
==5787== ERROR SUMMARY: 0 errors from 0 contexts (suppressed: 0 from 0)

The changes now have been pushed from a local branch to dev. The next thing to work on is to get key parameters from the temporary resource array that was changed (how many resources added, lost, moved, etc.). After this information is collected, then another calc_payoffs can be run on the changed array to get an updated estimate of key values and compare before and after user actions.

Update: 11 APR 2017

I’ve decided that u_loc should actually refer to the actions of a particular agent being taken on their own land (u_loc = 1) or on all land (u_loc = 0). I’m debating whether a third option u_loc = -1 should be available for forcing action to only occur on public land. I have also created a more readable structure for the do_actions function in game.c, which is going to be a bit on the long side, and will therefore need to be written in a way that is easy to follow, going through each action and performing the action through a series of nested while loops.
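
A minimal sketch of how the three u_loc options might be checked follows. This is not the check_if_can_act function used in the code; the name can_act_sketch and the public_id argument (the owner ID assumed to mark public land) are hypothetical.

```c
/* Returns 1 if an agent may act on a resource in a given cell, else 0.
 *     u_loc: 1 = own land only, 0 = all land, -1 = public land only
 *     cell_owner: Owner ID of the cell the resource is on (land layer 2)
 *     owner: The agent ID of interest
 *     public_id: Hypothetical ID assumed to mark public land             */
int can_act_sketch(int u_loc, int cell_owner, int owner, int public_id){
    switch(u_loc){
        case 1:
            return cell_owner == owner;
        case 0:
            return 1;
        case -1: /* The third option still being debated */
            return cell_owner == public_id;
        default:
            return 0;
    }
}
```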

Cost issue: Issue #17

The action array has three columns of util, u_loc, and u_land, which represent the utility of a resource, whether or not actions on the resource are restricted to the user’s land, and whether or not the utility of the resource is dependent on it being on the user’s land. Currently, any positive values correspond to some cost in the cost array, which means that they are changed to zero when the cost is high. In essence, these three columns represent identity, while the remaining columns to the right represent actions. Ideally, we don’t want the users to be affecting, or the constrain_costs function changing, util, u_loc, and u_land columns – only the ones to the right.

What needs to happen next is for util, u_loc, and u_land columns to be untouchable by the genetic algorithm when the first column in the action array (agent) is negative – corresponding to direct actions of the user on resources or landscape layers. Remaining util, u_loc, and u_land should be touchable. Hence, within constrain_costs, it is necessary to block adjustment to the relevant columns.

I appear to have found a fix for this, but I’m going to wait a day before I call the issue resolved. The fix basically involved telling the program not to touch columns below 7 if the first column of the row is less than zero.

start_col = 4;
if(population[row][0][agent] < 0){
    start_col = 7;
} 

The new variable start_col then defines the column to start on when considering whether or not to constrain costs. The above also needs to appear in the functions affecting population initialisation, mutation, and crossover of the genetic algorithm. I’m not sure if there is a more elegant or more readable solution, but the above appears to work fine. The appropriate columns are untouchable in rows where the first column is negative. The constraining part of the constrain_costs function also looks a bit messy.

while(tot_cost > budget){
    do{ /* This do assures xpos never equals ROWS (unlikely) */
        xpos = (int) floor( runif(0,ROWS) );
    }while(xpos == ROWS);
    if(population[xpos][0][agent] > 0){
        do{
            ypos = (int) floor( runif(4,COLS) );
        }while(ypos == COLS);
    }else{
        do{
            ypos = (int) floor( runif(7,COLS) );
        }while(ypos == COLS);               
    }
    if(population[xpos][ypos][agent] > 0){
        population[xpos][ypos][agent]--;
        tot_cost -= COST[xpos][ypos][layer];
    }
}

I think the messiness is really mostly caused by the do loops, which are there as a safety precaution against the unlikely event that the random number selected exactly equals ROWS or COLS, and hence causes an out-of-bounds access and a segfault.
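
A minimal sketch of one way to tidy this follows, folding the start column rule and the bounds check into small helpers. The names first_mutable_col, rand_index, and draw_cost_col are hypothetical, the start column rule follows the start_col snippet above (7 when the first column is negative, otherwise 4), and rand() stands in for runif.

```c
#include <stdlib.h>

/* First column that the genetic algorithm may change in a row, given the
 * value in the first column of that row */
int first_mutable_col(double agent_col_value){
    if(agent_col_value < 0){
        return 7;
    }
    return 4;
}

/* Returns an integer in the range low to high - 1, assuming low < high */
int rand_index(int low, int high){
    double unit;
    int index;

    unit  = rand() / ((double) RAND_MAX + 1.0); /* At least 0, less than 1 */
    index = low + (int) (unit * (high - low));
    if(index >= high){ /* Defensive clamp; replaces the do loops */
        index = high - 1;
    }
    return index;
}

/* Draws a column whose action count may be reduced to constrain costs */
int draw_cost_col(double agent_col_value, int COLS){
    return rand_index(first_mutable_col(agent_col_value), COLS);
}
```

The same first_mutable_col helper could then be called from the initialisation, mutation, and crossover functions so that the rule is written in only one place.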

Next steps

With the util, u_loc, and u_land column situation seemingly resolved in the action array, I’ve done some initial testing again on the move_resource function, which now appears to only move resources when it’s supposed to (i.e., when they’re on the land and the action array says to move them – assuming u_loc = 1). Now, before moving on, I should check to make sure that resources are actually being moved in the resources array. Once this is finished, I will double check Issue #17, then move on to a castrate_resource function.

Update: 10 APR 2017

I have re-arranged the fitness function structure to calculate fitness payoffs more clearly. One top level strategy_fitness function will calculate all strategy fitness in the genetic algorithm by looping through calc_agent_fitness for each agent in the population (note: this is each agent in the genetic algorithm population, from which the new strategy for one agent in the bigger G-MSE will be selected). The calc_agent_fitness will itself call the calc_payoffs function (see below) to get a vector with the same rows as the ACTION and COST arrays. Each element will eventually represent a change in the resource or landscape, corresponding to some utility value which will make it possible to calculate and compare overall fitness of the strategy.

void calc_payoffs(double ***population, int ROWS, double ***landscape, 
                  double **resources, int res_number, int landowner,
                  int land_x, int land_y, double *payoff_vector, int agent){
    
    int xloc, yloc, yield_layer;
    int resource, row;
    int landscape_specific;
    int res_count;
    double cell_yield;
    
    for(row = 0; row < ROWS; row++){
        payoff_vector[row] = 0;
        res_count          = 0; /* Reset for each row; was never initialised */
        if(population[row][0][agent] == -2){
            for(resource = 0; resource < res_number; resource++){
                if(population[row][1][agent] == resources[resource][1]  &&
                   population[row][2][agent] == resources[resource][2]  &&
                   population[row][3][agent] == resources[resource][3]
                ){    
                    landscape_specific = population[row][6][agent];
                    if(landscape_specific == 0){
                        res_count++;
                    }else{
                        xloc = resources[resource][4];
                        yloc = resources[resource][5];
                        if(landscape[xloc][yloc][2] == landowner){
                            res_count++;    
                        }
                    }
                }
            }
            payoff_vector[row] += res_count;
        }
        if(population[row][0][agent] == -1){
            yield_layer = population[row][1][agent];
            for(xloc = 0; xloc < land_x; xloc++){
                for(yloc = 0; yloc < land_y; yloc++){
                    if(landscape[xloc][yloc][2] == landowner){
                        cell_yield = landscape[xloc][yloc][yield_layer];
                        payoff_vector[row] += cell_yield;
                    }
                }
            }
        }
        if(population[row][0][agent] > -1){
            payoff_vector[row] = 0;
        }
    }
}

The above will need to be called twice in calc_agent_fitness so that the difference between vector elements can be calculated.
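As a rough sketch of how the two calls might be combined, the function below takes the payoff vectors from a first and second call to calc_payoffs and sums the utility-weighted differences. The name payoff_change and the utility vector argument are hypothetical and are not part of the GMSE source; this is only one plausible way of turning the two vectors into a single fitness value.

```c
/* Hypothetical helper (not in the GMSE source): sums the utility-weighted
 * change between two payoff vectors, e.g., from calling calc_payoffs before
 * and after a strategy's actions are applied. */
double payoff_change(const double *before, const double *after,
                     const double *utility, int ROWS){
    int row;
    double total;

    total = 0.0;
    for(row = 0; row < ROWS; row++){
        /* Positive utility rewards increases; negative rewards decreases */
        total += utility[row] * (after[row] - before[row]);
    }
    return total;
}
```

With a negative utility for a resource row, a strategy that reduces that resource's count contributes positively to the total.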

Use of memcpy to copy whole arrays

I have saved a bit of hassle by switching from the multiple loops to the simple use of memcpy in C, which works as follows in the calc_agent_fitness function.

void calc_agent_fitness(double ***population, int ROWS, int COLS, int landowner,
                        double ***landscape, double **resources, int res_number,
                        int land_x, int land_y, int trait_number,
                        double *fitnesses){
    
    int agent, resource;
    int res_on_land;
    double *payoff_vector;
    double **TEMP_RESOURCE;
    
    payoff_vector = malloc(ROWS * sizeof(double));
    
    /* Requires <string.h> for memcpy. Each row is copied separately because
     * the rows are allocated separately; copying &resources into
     * &TEMP_RESOURCE would only copy the pointer, not the data. */
    TEMP_RESOURCE    = malloc(res_number * sizeof(double *));
    for(resource = 0; resource < res_number; resource++){
        TEMP_RESOURCE[resource] = malloc(trait_number * sizeof(double));
        memcpy(TEMP_RESOURCE[resource], resources[resource],
               trait_number * sizeof(double));
    } 

    /* Print the first rows (up to 10) to check that the copy matches */
    for(resource = 0; resource < res_number && resource < 10; resource++){
        printf("%f\t%f\t || %f\t%f\n", resources[resource][0],
               resources[resource][1], TEMP_RESOURCE[resource][0],
               TEMP_RESOURCE[resource][1]);
    }
    /*
    calc_payoffs(population, ROWS, landscape, resources, res_number, landowner,
                 land_x, land_y, payoff_vector, agent);
    
    */
    for(resource = 0; resource < res_number; resource++){
        free(TEMP_RESOURCE[resource]);
    }
    free(TEMP_RESOURCE);
    free(payoff_vector);
    
}

The temporary vector TEMP_RESOURCE needs to be made and remade so many times, and it appears that memcpy is slightly faster than for loops. Nevertheless, I fear that use of memcpy might make the code less readable and its implementation could depend on the hardware and compiler, which I don’t want. For now, I’m going to do this the more readable way.
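For comparison, the more readable way might look something like the sketch below, which copies the resource array element by element with explicit loops. The function names copy_resources and free_resources are hypothetical (they are not in the GMSE source), and the sketch assumes that each row of the array is allocated separately, as in the code above.

```c
#include <stdlib.h>

/* Hypothetical helper: returns a deep copy of a res_number x trait_number
 * array of doubles, or NULL if memory cannot be allocated. */
double **copy_resources(double **resources, int res_number, int trait_number){
    int resource, trait;
    double **copy;

    copy = malloc(res_number * sizeof(double *));
    if(copy == NULL){
        return NULL;
    }
    for(resource = 0; resource < res_number; resource++){
        copy[resource] = malloc(trait_number * sizeof(double));
        if(copy[resource] == NULL){ /* Free the rows already made, then quit */
            while(resource > 0){
                free(copy[--resource]);
            }
            free(copy);
            return NULL;
        }
        for(trait = 0; trait < trait_number; trait++){
            copy[resource][trait] = resources[resource][trait];
        }
    }
    return copy;
}

/* Hypothetical helper: frees an array made by copy_resources */
void free_resources(double **array, int res_number){
    int resource;

    for(resource = 0; resource < res_number; resource++){
        free(array[resource]);
    }
    free(array);
}
```

Because the copy owns its own memory, changes made to it while testing a strategy leave the original resource array untouched.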

Using do_actions function

A do_actions function will enact the actions of one (usually out of 100) member of the population in the genetic algorithm. So the general procedure will be to do the following.

This should give a fitness function that is then returned to strategy_fitness (might want to have calc_agent_fitness return an int), which will store all fitnesses in a vector after the above loops. More bells and whistles can be added on to this later, but when this is finished, it should be a working genetic algorithm for modelling complex stake-holder behaviour.

Update: 7 APR 2017

Working through implementing the ideas for the fitness function from yesterday. I’ve linked some key parameters now through user and ga so they can be run in strategy_fitness (namely the resource number and agent_ID, landowner). Now it’s important to note that building local resources has to be conditional – if an agent has no land, they cannot do things on their land. And if their interests are global, this needs to be considered too. I think a collection of small functions called according to parameter options is needed, and landscape-specific changes are really a subset of general actions, so maybe there’s a better way to do this. Really though, it would be nice to have a way for the cost of performing actions on land owned versus land not owned to be different. Then again, it would be nice to have different utilities for resources on and off your land, but this could get very complex very quickly (might, however, be interesting in that maybe a farmer values crops positively on their own land but negatively on the land of other farmers). I think it will also shake out when the manager actions affecting costs come into play – so a manager will naturally up the cost if shooting somehow becomes not tied to a location in a way that affects other stake-holders or management decisions, for whatever reason. The bottom line is that I think it’s okay for now to run the ga with the constraint that if a resource/landscape utility value is tied to land ownership, then the actions should also be tied to owned landscape cells. If not, then actions should happen either on all cells or only public land.

void strategy_fitness(double *fitnesses, double ***population, int pop_size, 
                      int ROWS, int COLS, double ***landscape,  
                      double **resources, double **agent_array,
                      int res_number, int landowner){
    int xloc, yloc;
    int agent, resource;
    int res_on_land;
    double **RESOURCE_LOCAL;
    
    /* Need something here -- check if: 
     *
     *   1) agent has landscape-specific utility
     *   2) agent actually owns some land
     * 
     * If neither are true, then RESOURCE_LOCAL should not be built, and actions
     * of stake-holders should be interpreted accordingly (e.g., agents could be
     * allowed to do some actions on public land, or not at all -- perhaps an
     * option added to paras?
     */
    
    res_on_land = 0; /* Make a sub-function returning an int for this */
    for(resource = 0; resource < res_number; resource++){
        xloc = (int) resources[resource][4];
        yloc = (int) resources[resource][5];
        if(landscape[xloc][yloc][2] == landowner){
            res_on_land++;
        }
    }
    
    for(agent = 0; agent < pop_size; agent++){
        fitnesses[agent] = population[0][12][agent];
    }
}
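The sub-function flagged in the comment above could be sketched roughly as follows. The name count_res_on_land and the land_x and land_y arguments are hypothetical; the sketch assumes, as in strategy_fitness, that columns 4 and 5 of the resource array hold x and y locations and that landscape layer 2 holds the owner ID. The bounds check is an added precaution rather than something the code above does.

```c
/* Hypothetical sub-function: returns the number of resources located on
 * cells owned by landowner. Assumes resource columns 4 and 5 are x and y
 * locations and landscape layer 2 is the owner ID. */
int count_res_on_land(double **resources, int res_number, double ***landscape,
                      int land_x, int land_y, int landowner){
    int resource, xloc, yloc, res_on_land;

    res_on_land = 0;
    for(resource = 0; resource < res_number; resource++){
        xloc = (int) resources[resource][4];
        yloc = (int) resources[resource][5];
        if(xloc < 0 || xloc >= land_x || yloc < 0 || yloc >= land_y){
            continue; /* Skip any resource recorded off the landscape */
        }
        if((int) landscape[xloc][yloc][2] == landowner){
            res_on_land++;
        }
    }
    return res_on_land;
}
```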

Definitions of the COST and ACTION arrays

A quick reminder to myself of what’s going on in the COST and ACTION arrays, as it is most relevant for fitness functions. In the example ACTION array below there are two agents (very simple – just a manager and a stake-holder) and one resource. The table below is one layer of a 3D 2-layer array where each layer identifies actions for each unique agent.

agent  type1  type2  type3  util  u_loc  u_land  movem  castem  killem  feedem  helpem  bankem
   -2      1      0      0     0      0       0      0       0       2       0       0       0
   -1      1      0      0     2      1       0      0       0       0       0       0       0
    1      1      0      0     0      0       0      0       0       0       0       0       0
    2      1      0      0     0      0       0      0       0       0       0       0       0

The way to read the above in the code is as follows. All of the actions are for the one stake-holder (i.e., assume we’re looking at the second layer in the 3D array). The first row of actions is special because it represents the degree to which the stake-holder themselves values, and the things they will do to, a resource of type1 = 1, type2 = 0, and type3 = 0. Value is indicated by util, whether or not that value is ‘visible’ to the agent (which may be implemented in different ways, but for now is within some distance of their location) is indicated by u_loc, and whether or not that value is tied to it being on land that the stake-holder owns is indicated by u_land. Actions are indicated by the remaining columns, and if u_land = TRUE, then we assume that actions are restricted to resources on the agent’s owned land – though I am tempted to change u_loc to mean whether or not actions are (1) or are not (0) restricted in this way.

So the first row where agent = -2 represents values and actions of the focal agent (indicated by this layer of the 3D array) for a particular type of resource (note that more rows where agent = -2 would be needed for more resources). The second row where agent = -1 refers specifically to the values and actions of a particular layer of the landscape, which is indicated in the type1 column (making type2 and type3 effectively useless in this row, for now). Right now this is type1 = 1, which is the index (for C – R is of course 2) where the values of crop production are stored; I’m not sure if it’s worth adding more rows for additional layers later, but this framework at least allows the possibility for other landscape properties to be valued. Of course, whenever agent = -1 and we’re looking at the landscape, actions such as movem and castem will need to have different meanings – or no meaning, but feedem and killem could be fairly straightforward. The third row holds actions directed at the agent whose ID is 1 with reference to resource type1 = 1, type2 = 0, and type3 = 0. Any nonzero values here in util, u_loc, and u_land cause the focal agent to change the value of another agent, while any nonzero values in movem, castem, …, bankem cause the focal agent to change the cost of another agent taking a particular action (i.e., it affects the other agent’s layer at agent = -2), increasing it or decreasing it (NOTE: I just noticed that I really need to set up these tables with values for type1 = -1 or something here – to allow agents to affect other agents’ costs of actions affecting the landscape). So in theory the agent in the above table could change the manager’s values (e.g., modelling lobbying) or the cost of them performing actions (e.g., modelling something like protesting or lobbying third parties?). They could also in theory change their own values and costs of performing actions, though I think this should almost always be prohibited by making the cost of doing so effectively infinite, which brings me to the cost array.

agent  type1  type2  type3  util  u_loc  u_land  movem  castem  killem  feedem  helpem  bankem
   -2      1      0      0   101    101     101      3       8       4       4       2       1
   -1      1      0      0   101    101     101    101     101     101     101     101     101
    1      1      0      0   101    101     101    101     101     101     101     101     101
    2      1      0      0   101    101     101    101     101     101     101     101     101

Assume that each agent gets a total budget of 100. In that case, all of the table elements equal to 101 are effectively off limits because they are too costly (I might actually want to make them 1000 just in case some agent gets crafty and tries to lower another agent’s cost). So in the above, the stake-holder can only take six possible actions, all of which directly affect resources and not other stake-holders’ values or costs. By setting it up this way, the genetic algorithm will converge on the best set of these actions and the ACTION array above will never change values where COST elements are 101. Later, we might decrease the value of util in row 3 to allow the stake-holder to lobby the manager. Managers (different layer) would simply have cost arrays that have lower values allowing them to affect actions of stake-holders. With these tables now (I hope) completely clear, the code writing itself should become much smoother – I’ve anchored to the title immediately above because I know it’s going to be necessary to come back to these two difficult to remember tables.
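To make the budget idea concrete, here is a minimal sketch of how the total cost of one strategy might be checked against a budget. It assumes that each unit of action in an ACTION element costs the amount in the matching COST element, and that columns 4 to 12 (util through bankem, indexing from zero) are the ones that can be bought; both assumptions and the function names strategy_cost and within_budget are illustrative, not taken from the GMSE source.

```c
/* Hypothetical helper: total cost of one agent's ACTION layer given the
 * matching COST layer. Assumes cost is paid per unit of action and that
 * columns 0 to 3 (agent and type columns) are identifiers, not actions. */
double strategy_cost(double **action, double **cost, int ROWS, int COLS){
    int row, col;
    double total;

    total = 0.0;
    for(row = 0; row < ROWS; row++){
        for(col = 4; col < COLS; col++){
            total += action[row][col] * cost[row][col];
        }
    }
    return total;
}

/* Hypothetical helper: returns 1 if the strategy is affordable, else 0 */
int within_budget(double **action, double **cost, int ROWS, int COLS,
                  double budget){
    return strategy_cost(action, cost, ROWS, COLS) <= budget;
}
```

Under these assumptions, a single unit of any action with a cost of 101 already exceeds a budget of 100, which is what makes those elements off limits.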

Objectives for fitness function

Eventually we’ll want to find some kind of multi-objective fitness function (Lee 2012), especially for the managers. For now I’m going to simplify and just make fitness a simple matter of abundance times deviation from utility – where for a landscape ‘abundance’ is replaced by owned cell crop yield, then sum over all resources and landscape layers. For now, utilities can be set unreasonably high – so conservationists might want 1000000 geese and farmers might want 1000000 in yield – much higher than is possible so that more is always better. The fitness function will then assign fitnesses minimising the deviation from utility somehow. The deviation from utility will eventually allow managers to have more reasonable goals – allowing the genetic algorithm to find more flexible and dynamic strategies.

Update: 6 APR 2017

Quick check on neural networks

I want to make sure I understand neural networks well enough to be able to explain why I’m not using them (yet). Daniel Shiffman’s book chapter in The Nature of Code helps out here. Because the simulated agents aren’t trying to recognise a particular pattern, I don’t think a neural network is how I would describe the COST and ACTION arrays – nor would a more explicit network structure going from input (e.g., costs and resource abundances) to output (actions) be terribly useful for current purposes. Nevertheless, a neural network will be useful if combined with empirical data to mimic human behaviour. For example, if we want to make an agent that predicts stake-holders’ actions based off of empirically collected data from behavioural games, then a neural network could be fed input and then calibrated to the ‘correct’ behaviour observed in humans through correlations with specific conditions. This would, in effect, create an artificial bot that does what a human would do based on correlating situations with actions. I’m not sure how much data it would require to parameterise effectively, but I suspect it would be a lot – we would probably need to have dozens of people act as stake-holders or managers within G-MSE and collect their actions.

Back to fitness functions

I now need to complete the genetic algorithm with a useful fitness function. The fitness function should calculate the change in resources caused by a stake-holder’s actions, then match them to utility. Note that this doesn’t require all resources to be calculated to figure out what total utility is before and after an agent has acted – only how the agent’s actions have increased or decreased resources, and the weight (utility) assigned to each.

A starting point is to do some clean-up. There are several values hard-coded into the ga() function that need to be assigned variables to be set in gmse.R. Once I have the ga() function a bit more readable, then I can move on to the specifics of the fitness function.

Having done the clean-up, I now note again that the utility from an action isn’t always direct – e.g., killing one resource might increase crop yield. Somehow, the action of removing an individual from the population must therefore be recognised by the agent as increasing yield by a particular amount. There are a few ways that I could think to do this:

  1. Have stake-holders correlate resources with crop production on cells. This would be the most complex way of doing things – probably the most flexible too, but I’m not sure if it would actually be the most realistic. Not for farmers watching their crops being eaten at least; the cause and effect is something I think stake-holders could probably observe pretty clearly.

  2. Give stake-holders complete access to the resource array and have them figure out exactly how much damage their land is going to sustain by seeing the number of resources on it and the amount of damage that each resource causes per cell (column 14 in C, 15 in R). Maybe this is the best starting point, though it does seem to be a bit too exact; no farmer is going to know exactly how many animals are on their farm and exactly how much damage they will do. Still, perhaps we assume this and add in error later.

  3. Give stake-holders access to the resource array column in which crop damage is specified, then have them associate mean damage per cell with each resource type. Do not, however, give them access to resource locations, and require that they instead estimate the density of resources on their landscape in the same way that managers might in an observation model type 0 (i.e., look at a few cells on their property, then infer the total number of resources and how much damage they’ll do). I like this because it seems reasonable that a farmer could know roughly how much damage an animal does to their crop in the area, but probably doesn’t have the time or ability to sample every corner of their land to find exactly the number of animals on it. It also doesn’t give stake-holders a superior ability to estimate local population size.

  4. Ignore the resource array – just have the stake-holders act some way (e.g., invest in killing stuff on the land or feeding stuff), then run a sub-routine mimicking the population landscape interaction (e.g., call res_landscape_interaction from resource.c directly). If some resources are created or destroyed, then this would need to be accounted for by making a dummy resource array. Perhaps the following:
    • Count the number of resources (of any type) in the RESOURCE array on the stake-holder’s land
    • Initialise a dummy resource array RESOURCE_LOCAL with only resources on the stake-holder’s land
    • Have a stake-holder’s actions affect the RESOURCE_LOCAL array in relevant columns (e.g., birth, remove_pr, etc.)
    • Use key bits of res_add and res_remove to get the number of individuals being added or removed.
    • Use the res_landscape_interaction function to find the effect of the added and removed individuals on landscape
    • End with changed resource amounts and changed landscape production as a consequence for one strategy
    • Calculate fitness by summing the utilities of resources and local landscape cells
    • Note, no need to worry if a stake-holder is, e.g., trying to kill 10 resources when there are only 6 on their land – this just results in 6 kills and wasted energy (perhaps should also add probability of success later)
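The first two steps and the final note in option 4 might be sketched as below. This is only an illustration under stated assumptions: resource columns 4 and 5 are x and y locations, landscape layer 2 is the owner ID, all locations are on the landscape, and RESOURCE_LOCAL has already been allocated with at least as many rows as there are resources on the land. The function names fill_resource_local and realised_kills are hypothetical.

```c
/* Hypothetical helper: copies every resource on landowner's cells into the
 * pre-allocated RESOURCE_LOCAL array and returns how many were copied. */
int fill_resource_local(double **resources, int res_number, int trait_number,
                        double ***landscape, int landowner,
                        double **RESOURCE_LOCAL){
    int resource, trait, xloc, yloc, local;

    local = 0;
    for(resource = 0; resource < res_number; resource++){
        xloc = (int) resources[resource][4];
        yloc = (int) resources[resource][5];
        if((int) landscape[xloc][yloc][2] == landowner){
            for(trait = 0; trait < trait_number; trait++){
                RESOURCE_LOCAL[local][trait] = resources[resource][trait];
            }
            local++;
        }
    }
    return local;
}

/* Hypothetical helper: a stake-holder cannot kill more resources than are on
 * their land, so any extra attempted kills are simply wasted. */
int realised_kills(int attempted, int res_on_land){
    if(attempted < 0){
        return 0;
    }
    return attempted < res_on_land ? attempted : res_on_land;
}
```

For example, realised_kills(10, 6) returns 6, matching the case in the note where 10 kills are attempted but only 6 resources are present.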

Although I initially thought option 3 was pretty good, I’m now leaning toward option 4 as being the best one to try out first; it seems more flexible. Eventually, of course, we can specify options for different ways of calculating fitness, but I think it’s best to pick one option first and go with it. I think option 4 will be slightly slower than option 3, but I’m curious as to exactly how much slower. Hence, I’ll try number 4 first, then potentially move to 3 as a default if it’s too clunky. I have to keep in mind as well that managers are probably going to need to run the user functions to set policy eventually (unless I can find a work-around that gets managers to anticipate stake-holder actions in making policy), and this will likely slow things down exponentially. Time isn’t much of an issue now, but I want to keep things as efficient as possible. Also important, I need to make sure that there is some if statement that only deals with the landscape if the stake-holder owns land. If they don’t own any, then their actions need to be restricted accordingly – maybe to lobbying the manager or only doing things on public land? As a next step, I will attempt to write the code for option 4 above, perhaps excluding (for now) stake-holders that own no land.

Update: 5 APR 2017

More thoughts on genetic algorithms

I’ve come back to thinking more about how to write the fitness function of the G-MSE genetic algorithm, and about the relationship between evolution and individual learning, more generally. Watson and Szathmary (2016) argue that learning and (adaptive) evolution are formally linked. In practice, they note that “In a good model space, desirable future behaviours should be similar (nearby) to behaviours that were useful in the past. For example, perhaps ‘eating apples’ should be close to ‘eating pears’ but far from ‘eating red things’.” Watson and Szathmary (2016) also note that “The representation of associations or correlations has the same fundamental relationship to learning as transistors have to electronics or logic gates to computation (and synapses to neural networks). Although mechanisms to learn a single correlation between two features can be trivial, these are also sufficient, when built up in appropriate networks, to learn arbitrarily complex functions”. A potentially confusing aspect of this with respect to G-MSE is that we have two scales of time of interest. The first scale is within a single time step (i.e., inside the user model), and the second scale is over multiple time steps (population model \(\to\) observation model \(\to\) management model \(\to\) user model). Most of the time, when we focus on learning, we’re talking about the program learning to make a decision within a time step rather than stake-holders learning to make decisions across time steps. I’m not opposed to modelling the latter, but the former needs to come first in software development. So when we model learning through the genetic algorithm, it’s the iterative processes in ga() – there is less worry, I think, about the correlations that Watson and Szathmary (2016) describe; rather, the associations are explicit. A value in the ACTION array is associated with a particular outcome that can be tied to stake-holder interests. More abstract learning over G-MSE generations can be added in later with estimates of correlations between actions and outcomes.

Update: 23 MAR 2017

Major updates merged to master

I have merged all of the recent updates on the genetic algorithm to the master branch. We now have a bug-free G-MSE model v0.0.8 that has all of the necessary framework of proper machine learning once a fitness function is written that links costs and utilities of each agent to agent actions. There are a few things that will need to be updated thereafter, which I am putting off until later when the full genetic algorithm is complete and I am sure how it should be called by user.c. As of now, the function runs only once for the first agent in the AGENT array. Eventually, the function ga will need to be looped within user.c for each stake-holder (and called in manager.c, not yet written, for the manager). I also still need to pass the parameter vector to ga with values for the genetic algorithm which are currently hard coded into ga.

Update: 22 MAR 2017

After some additional debugging of the find_descending_order in utilities.c (which was returning the incorrect index and therefore not selecting for high fitness strategies), I have a working genetic algorithm with a very simple fitness function.

void strategy_fitness(double *fitnesses, double ***population, int pop_size, 
                      int ROWS, int COLS, double ***landscape,  
                      double **resources, double **agent_array){
    int agent;
    
    for(agent = 0; agent < pop_size; agent++){
        fitnesses[agent] = population[0][12][agent];
    }
}

Essentially, the above function checks row zero and column 12 in an agent’s action array, and defines fitness as whatever value is in this array element. Fitness cannot increase indefinitely because of the cost constraints from the COST array. Hence the genetic algorithm should increase fitness up to the point where it can’t any longer because it is constrained by costs. We can see this over 20 generations of the genetic algorithm (note, this is different than simulation time steps – each simulated time step of G-MSE includes, in this example, a genetic algorithm where strategies are updated over 20 generations). The plot below therefore represents an agent “evolving” the best strategy for one G-MSE time step.

The ACTION array for the zero agent (the only one run for a genetic algorithm in test simulations) showed a corresponding change in each simulated G-MSE time step, with agents having the actions below (or very similar actions).

     [,1] [,2] [,3] [,4] [,5] [,6] [,7] [,8] [,9] [,10] [,11] [,12] [,13]
[1,]   -2    1    0    0    0    0    0    0    0     0     0     0    12
[2,]   -1    1    0    0    0    0    0    0    0     0     0     0     0
[3,]    1    1    0    0    0    0    0    0    0     0     0     0     0
[4,]    2    1    0    0    0    0    0    0    0     0     0     0     0

In the above, the agent’s only action is to invest all of their energy in doing the action in ACTION[1,13] (bankem), as predicted given the simple fitness function assigned a priori. Hence, with a working genetic algorithm for agents, what is necessary now is to clarify the fitness function to reflect agent utilities. Some clean-up is also necessary to call genetic algorithm specific parameters from the main gmse.R file – right now there are some hard-coded values in the ga function, and user.c doesn’t loop through multiple agents (or check and use only stake-holders).

Update: 21 MAR 2017

Part of the problem from last Friday was that the arrays fitnesses and winners were uninitialised in the genetic algorithm before being used. Fixing this and running Valgrind returns no errors and no memory leaks.

==32451== 
==32451== HEAP SUMMARY:
==32451==     in use at exit: 89,001,346 bytes in 13,024 blocks
==32451==   total heap usage: 5,218,764 allocs, 5,205,740 frees, 621,820,827 bytes allocated
==32451== 
==32451== LEAK SUMMARY:
==32451==    definitely lost: 0 bytes in 0 blocks
==32451==    indirectly lost: 0 bytes in 0 blocks
==32451==      possibly lost: 0 bytes in 0 blocks
==32451==    still reachable: 89,001,346 bytes in 13,024 blocks
==32451==         suppressed: 0 bytes in 0 blocks
==32451== Reachable blocks (those to which a pointer was found) are not shown.
==32451== To see them, rerun with: --leak-check=full --show-leak-kinds=all
==32451== 
==32451== For counts of detected and suppressed errors, rerun with: -v
==32451== ERROR SUMMARY: 0 errors from 0 contexts (suppressed: 0 from 0)

So we have an error that Valgrind can’t figure out for some reason. It’s worth noting that the crash never occurs on the first simulation; it always takes a couple of re-runs of gmse in succession for it to crash. It could just be overloading RStudio, but I want to keep pushing to figure it out. I’m now going to test by simply running 100 times in succession.

test <- NULL;
for(i in 1:100){
   test <- gmse( observe_type  = 0,
                 agent_view    = 20,
                 res_death_K   = 400,
                 plotting      = FALSE,
                 hunt          = FALSE,
                 start_hunting = 95,
                 fixed_observe = 1,
                 times_observe = 1,
                 land_dim_1    = 100,
                 land_dim_2    = 100,
                 res_consume   = 0.5
    )
   print(i);
}

Unfortunately, the above crashed in the first loop upon running. Then in the second attempt, it crashed on the 11th loop. When I stop running the genetic algorithm, it never crashes though, so I can at least start to isolate the problem. I have a feeling it’s in the utilities.c file.

Except that now the crash occurs when the ga is commented out. The issue actually appears to be somewhere else in user.c because I can run the above for i in 1:1000 and not get an error if I don’t call user.c from R at all. Now I need to try to really examine user.c and see what’s happening. I’m going to start by not calling send_agents_home or count_cell_yield within the user function (while still calling the genetic algorithm) to see if a crash occurs.

The problem, as it turns out, was in the function send_agents_home. After much hassle and multiple times running Valgrind, I found that initialising agent_xloc or agent_yloc to the agent’s array values would occasionally produce a segfault because the values were not always within the landscape. This was corrected by initialising these values to zero before “sending agents home”, but I’m not sure why it arose at all in the first place. Where the agents are has never been a focus, except for when manager agents (type1 = 0) are observing (the code for which is stable). To solve the problem more flexibly, I’ve replaced a straight assignment with the below code.

agent_xloc = agent_array[agent][4];
agent_yloc = agent_array[agent][5];            
if(agent_xloc < 0 || agent_xloc >= xdim){
    agent_xloc = 0;
}
if(agent_yloc < 0 || agent_yloc >= ydim){
    agent_yloc = 0;
}

Now in the very rare cases where agent locations are off the map (and it might be worth figuring out why – perhaps they’re getting moved somewhere arbitrarily and not moved back?), they will be placed on a cell that they own. This was the point of the function anyway, so it’s not a huge deal. It’s still a bit odd though, and I’m not sure why it was affecting only about one in thirty simulations. I’ll consider Issue #16: Potential bug: In user.c closed now, and move on to the genetic algorithm again.

Update: 17 MAR 2017

Placing tournament winners into a new array

At the end of the tournament function, we have a vector of winners with high fitness. These winners represent the array layers that need to comprise the new 3D array, which will be the start of the next generation of the genetic algorithm. Hence the need for a place_winners function to make a new POPULATION array to replace the old one. This could be done by individually replacing elements of a NEW_POPULATION into the old array POPULATION, but a handy swapping of pointers can do this without the multiple loops.

/* =============================================================================
 * Swap pointers to rewrite ARRAY_B into ARRAY_A for an array of any dimension
 * ========================================================================== */
void swap_arrays(void **ARRAY_A, void **ARRAY_B){

    void *TEMP_ARRAY;

    TEMP_ARRAY = *ARRAY_A;
    *ARRAY_A   = *ARRAY_B;
    *ARRAY_B   = TEMP_ARRAY;
}

The above function works for 2D and 3D arrays by running the below.

swap_arrays((void*)&MAT1, (void*)&MAT2); 

We can see the arrays swapped in the output (the first 3 columns before the “|” partition denote layer 1, and those after denote layer 2, so the array is \(3 \times 3 \times 2\) dimensions).

========================================= 

---------------- Pre-swap MAT 1 ------------ 
0   0   1     |  6  5   0   
2   8   6     |  9  2   4   
1   2   1     |  9  2   5   

---------------- Pre-swap MAT 2 ------------ 
1   4   3     |  8  8   6   
1   5   8     |  3  2   4   
3   9   2     |  8  3   8   


---------------- Post-swap MAT 1 ------------ 
1   4   3     |  8  8   6   
1   5   8     |  3  2   4   
3   9   2     |  8  3   8   

---------------- Post-swap MAT 2 ------------ 
0   0   1     |  6  5   0   
2   8   6     |  9  2   4   
1   2   1     |  9  2   5

Since this works, we can use swap_arrays to write a concise function for placing the new individuals.
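One caveat with the generic version is that converting the address of a double*** to void** and swapping through it is not strictly guaranteed to be portable by the C standard, although it works on common platforms. A type-specific variant, sketched below, does the same pointer swap for 3D arrays of doubles and lets the compiler check the argument types. The name swap_populations is hypothetical and is not in the GMSE source.

```c
/* Hypothetical type-specific variant of swap_arrays: swaps two pointers to
 * 3D arrays of doubles without copying any of the array elements. */
void swap_populations(double ****POP_A, double ****POP_B){

    double ***TEMP_POP;

    TEMP_POP = *POP_A;
    *POP_A   = *POP_B;
    *POP_B   = TEMP_POP;
}
```

It would be called as swap_populations(&POPULATION, &NEW_POPULATION), with no casts needed.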

Potential bug: In user.c

I can’t tell if I’m just overloading R by running the simulation too many times too quickly (clicking too fast), or if there’s actually a bug here. But when I comment out the below lines of code in the send_agents_home function of user.c, things seem fine.

while(agent_ID != landowner){
    do{
        agent_xloc = (int) floor( runif(0, xdim) ); 
    }while(agent_xloc == xdim);
    do{
        agent_yloc = (int) floor( runif(0, ydim) ); 
    }while(agent_yloc == ydim);
    landowner = (int) landscape[agent_xloc][agent_yloc][layer];
}

When I re-run the code quickly in succession, the above (I think) will very rarely crash the G-MSE program. I can’t figure out why yet. It’s logged as an issue now. Valgrind report below.

==15500== Invalid read of size 8
==15500==    at 0xC298756: is_number_on_landscape (user.c:19)
==15500==    by 0xC298811: send_agents_home (user.c:50)
==15500==    by 0xC299166: user (user.c:303)

Valgrind doesn’t appear to like the comparing of a landscape value (double) with an int, so I’m going to change this now. So the function is_number_on_landscape now defines land_num = (int) landscape[xval][yval][layer]; instead of calling the landscape value directly. I have also gotten rid of the sub-function is_number_on_landscape, but the crash still sometimes happens. It’s possible that this was actually two bugs though, one affecting the ga. From Valgrind below now (invalid read is gone).

==16758== Conditional jump or move depends on uninitialised value(s)
==16758==    at 0xC29819E: sort_vector_by (utilities.c:63)
==16758==    by 0xC29A1E1: tournament (game.c:280)
==16758==    by 0xC29A66D: ga (game.c:415)
==16758==    by 0xC29914F: user (user.c:294)
==16758==    by 0x4F0A57F: ??? (in /usr/lib/R/lib/libR.so)
==16758==    by 0x4F4272E: Rf_eval (in /usr/lib/R/lib/libR.so)
==16758==    by 0x4F44E47: ??? (in /usr/lib/R/lib/libR.so)
==16758==    by 0x4F42520: Rf_eval (in /usr/lib/R/lib/libR.so)
==16758==    by 0x4F43DDC: Rf_applyClosure (in /usr/lib/R/lib/libR.so)
==16758==    by 0x4F422FC: Rf_eval (in /usr/lib/R/lib/libR.so)
==16758==    by 0x4F45FB5: ??? (in /usr/lib/R/lib/libR.so)
==16758==    by 0x4F42520: Rf_eval (in /usr/lib/R/lib/libR.so)
==16758==  Uninitialised value was created by a heap allocation
==16758==    at 0x4C2DB8F: malloc (in /usr/lib/valgrind/vgpreload_memcheck-amd64-linux.so)
==16758==    by 0xC29A583: ga (game.c:390)
==16758==    by 0xC29914F: user (user.c:294)
==16758==    by 0x4F0A57F: ??? (in /usr/lib/R/lib/libR.so)
==16758==    by 0x4F4272E: Rf_eval (in /usr/lib/R/lib/libR.so)
==16758==    by 0x4F44E47: ??? (in /usr/lib/R/lib/libR.so)
==16758==    by 0x4F42520: Rf_eval (in /usr/lib/R/lib/libR.so)
==16758==    by 0x4F43DDC: Rf_applyClosure (in /usr/lib/R/lib/libR.so)
==16758==    by 0x4F422FC: Rf_eval (in /usr/lib/R/lib/libR.so)
==16758==    by 0x4F45FB5: ??? (in /usr/lib/R/lib/libR.so)
==16758==    by 0x4F42520: Rf_eval (in /usr/lib/R/lib/libR.so)
==16758==    by 0x4F44E47: ??? (in /usr/lib/R/lib/libR.so)
==16758== 

This all goes back to sort_vector_by, which I should probably look at and potentially rewrite. The sort function is called by the tournament function.
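One possible shape for a rewritten sort is sketched below: a plain insertion sort that reorders an index vector together with a paired vector of fitness values, highest value first. The name sort_by_fitness_desc is hypothetical, and the descending order is an assumption inferred from tournament taking the first chooseK entries after sorting; this is not the sort_vector_by in utilities.c. Note also that the Valgrind report traces the uninitialised value back to a heap allocation in ga, so any sort would trigger the same warning if the values handed to it were never set.

```c
/* Hypothetical helper: sort `index` by the paired values in `by`, highest
 * value first. Both arrays have length `n` and are reordered together. */
void sort_by_fitness_desc(int *index, double *by, int n){
    int i, j, idx_tmp;
    double by_tmp;

    for(i = 1; i < n; i++){
        idx_tmp = index[i];
        by_tmp  = by[i];
        j       = i - 1;
        /* Shift smaller values one place right to open a slot */
        while(j >= 0 && by[j] < by_tmp){
            index[j + 1] = index[j];
            by[j + 1]    = by[j];
            j--;
        }
        index[j + 1] = idx_tmp;
        by[j + 1]    = by_tmp;
    }
}
```

Insertion sort is quadratic, but the vectors sorted in a tournament are only sampleK long, so simplicity seems a reasonable trade here.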

Some progress on the genetic algorithm

Despite this, there has been progress on the genetic algorithm. Enough that I want to merge the local branch to dev and rev, but not master yet. The place_winners function appears to work fine.

void place_winners(double ****population, int *winners, int pop_size, int ROWS, 
                   int COLS){

    int i, row, col, layer, winner;
    double a_value;
    double ***NEW_POP;
    
    NEW_POP    = malloc(ROWS * sizeof(double **));
    for(row = 0; row < ROWS; row++){
        NEW_POP[row]    = malloc(COLS * sizeof(double *));
        for(col = 0; col < COLS; col++){
            NEW_POP[row][col]    = malloc(pop_size * sizeof(double));
        }
    }
    
    for(i = 0; i < pop_size; i++){
        winner = winners[i];
        for(row = 0; row < ROWS; row++){
            for(col = 0; col < COLS; col++){
                a_value              = (*population)[row][col][winner];
                NEW_POP[row][col][i] = a_value;
            }
        }
    }
    
    swap_arrays((void*)&(*population), (void*)&NEW_POP);
    
    for(row = 0; row < ROWS; row++){
        for(col = 0; col < COLS; col++){
            free(NEW_POP[row][col]);
        }
        free(NEW_POP[row]); 
    }
    free(NEW_POP);
}

Once I get the bugs worked out of it, the genetic algorithm should start to work. Then a fitness function needs to be made that is more realistic. Fortunately, all of the bugs now appear to be isolated in the genetic algorithm, but I might need to keep testing to be sure.

Update: 16 MAR 2017

Initialise new function to constrain costs in the genetic algorithm

A new function has been written to constrain costs in the genetic algorithm when they go over budget as a consequence of crossover and mutation.

/* =============================================================================
 * This function will ensure that the actions of individuals in the population
 * are within the cost budget after crossover and mutation have taken place
 * Necessary variable inputs include:
 *     population: array of the population that is made (malloc needed earlier)
 *     COST: A 3D array of costs of performing actions
 *     layer: The 'z' layer of the COST and ACTION arrays to be initialised
 *     pop_size: The size of the total population (layers to population)
 *     ROWS: Number of rows in the COST and ACTION arrays
 *     COLS: Number of columns in the COST and ACTION arrays
 *     budget: The budget that random agents have to work with
 * ========================================================================== */
void constrain_costs(double ***population, double ***COST, int layer, 
                     int pop_size, int ROWS, int COLS, double budget){
    
    int xpos, ypos;
    int agent, row, col;
    double tot_cost, action_val, action_cost;

    for(agent = 0; agent < pop_size; agent++){
        tot_cost = 0;
        for(row = 0; row < ROWS; row++){
            for(col = 4; col < COLS; col++){
                action_val  = population[row][col][agent];
                action_cost = COST[row][col][layer];
                tot_cost   += (action_val * action_cost);
            }
        }
        while(tot_cost > budget){
            do{ /* This do assures xpos never equals ROWS (unlikely) */
                xpos = floor( runif(0,ROWS) );
            }while(xpos == ROWS);
            do{
                ypos = floor( runif(4,COLS) );
            }while(ypos == COLS);
            if(population[xpos][ypos][agent] > 0){
                population[xpos][ypos][agent]--;
                tot_cost -= COST[xpos][ypos][layer];
            }
        }
    }
}

The function has been tested, and works as intended. When the sum of an individual’s action elements multiplied by the cost of each action (tot_cost in the above function) is higher than the allowable budget, actions are randomly removed until the total cost is at or under budget. Note that lower-cost actions are not removed preferentially so as not to bias evolution toward low-cost actions.
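The same trimming rule can be checked in isolation on a single individual. The sketch below is a simplified stand-in, not the package code: the fixed dimensions, the names total_cost and trim_to_budget, and the use of rand() from the C standard library in place of R's runif are all assumptions made so that the example runs on its own. It also assumes a non-negative budget and positive costs, otherwise the loop would not be guaranteed to end.

```c
#include <stdlib.h>

#define N_ROWS 2            /* hypothetical dimensions for the sketch */
#define N_COLS 6
#define FIRST_ACTION_COL 4  /* columns 0-3 are identifiers, as in the notebook */

/* Sum of action counts multiplied by their per-action costs */
double total_cost(double act[N_ROWS][N_COLS], double cost[N_ROWS][N_COLS]){
    int row, col;
    double tot = 0;

    for(row = 0; row < N_ROWS; row++){
        for(col = FIRST_ACTION_COL; col < N_COLS; col++){
            tot += act[row][col] * cost[row][col];
        }
    }
    return tot;
}

/* Remove randomly chosen actions one at a time until within budget */
void trim_to_budget(double act[N_ROWS][N_COLS], double cost[N_ROWS][N_COLS],
                    double budget){
    int row, col;
    double tot = total_cost(act, cost);

    while(tot > budget){
        row = rand() % N_ROWS;
        col = FIRST_ACTION_COL + rand() % (N_COLS - FIRST_ACTION_COL);
        if(act[row][col] > 0){ /* Only cells with an action can lose one */
            act[row][col]--;
            tot -= cost[row][col];
        }
    }
}
```

Cells are drawn uniformly, so an expensive action is no more likely to be removed than a cheap one, which mirrors the behaviour described above.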

Initial thoughts on the fitness function

Having now completed functions modelling crossover, mutation, and cost-constraints in C, there are two functions left in the genetic algorithm that are needed. The second is a tournament function modelling selection – this will be relatively easy to code once I have individual fitnesses in the population. The first is the fitness function, which will be very complex – so much so that I’m planning to write a very quick simplified version of the fitness function before expanding it out to deal with more difficult questions. What has to happen with the fitness function is that each simulated individual in the population has to use whatever information is available to an agent (e.g., manager observations, anecdotal surveys, past decisions of other agents, landscape status, etc.) to predict what the future status of the resources and landscape will be, then assign a fitness to that prediction. Utilities of each resource are in the (truncated) action and cost arrays, as below.

agent  type1  type2  type3  util  u_loc  u_land  movem  castem  killem  feedem  helpem  bankem
-1     1      0      0      2     1      0       0      0       0       0       0       1
-1     2      0      0      0     1      0       0      0       2       0       0       1
1      1      0      0      0     0      0       0      0       0       0       0       1
1      2      0      0      0     0      0       0      0       0       0       0       1
2      1      0      0      0     0      0       0      0       0       0       0       1
2      2      0      0      0     0      0       0      0       0       0       0       1
3      1      0      0      0     0      0       0      0       0       0       0       1
3      2      0      0      0     0      0       0      0       0       0       0       1

Above we have the utilities of each resource type (type1), but I’m just realising that the utilities of the landscape are absent. There isn’t really anything in the above table, for example, to say that a stake-holder assigns a utility to the value of a given landscape cell. But this needs to be the case if we want something like crop yield (perhaps I should more generally be calling it “food security”) to be modelled as part of the landscape. I think the best solution for this is to include the landscape in type1 as a negative integer. The landscape layer identifying crop yield is 1 in C (2 in R) – if I placed a new row of type1 = -1 in the COST and ACTION arrays for each agent, then the negative could simply indicate that we are looking at the LANDSCAPE array instead of the RESOURCE array. I also don’t think more than one layer of landscape will ever be used, so I’m not seeing a confusing mess of negative and positive types. The corresponding action columns (movem, castem, etc.) could have interpretations for landscape; some of them, such as feedem, are obvious, while others could just be ignored because they don’t really apply. In the end the arrays would then look something like the below.

agent  type1  type2  type3  util  u_loc  u_land  movem  castem  killem  feedem  helpem  bankem
-1     -1     0      0      1     0      0       0      0       0       0       0       1
-1     1      0      0      2     1      0       0      0       0       0       0       1
-1     2      0      0      0     1      0       0      0       2       0       0       1
1      -1     0      0      0     0      0       0      0       0       0       0       1
1      1      0      0      0     0      0       0      0       0       0       0       1
1      2      0      0      0     0      0       0      0       0       0       0       1
2      -1     0      0      0     0      0       0      0       0       0       0       1
2      1      0      0      0     0      0       0      0       0       0       0       1
2      2      0      0      0     0      0       0      0       0       0       0       1
3      -1     0      0      0     0      0       0      0       0       0       0       1
3      1      0      0      0     0      0       0      0       0       0       0       1
3      2      0      0      0     0      0       0      0       0       0       0       1

Maybe not the most elegant solution, but it keeps everything on a single array and the interpretations of types are fairly straightforward. I’ll implement this next as new array initialisations, then build a prototype fitness function that attempts to maximise crop yield through feedem (not sure if this should actually be an action in the model).
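A minimal sketch of how the sign convention on type1 might be decoded when a row is read is given below. The struct row_target and the function decode_type1 are hypothetical names, and treating the magnitude of a negative type1 as the landscape layer index (so that -1 maps to layer 1 in C) is an assumption based on the description above rather than anything implemented in game.c.

```c
/* Hypothetical result of decoding the type1 column of a COST/ACTION row */
typedef struct {
    int is_landscape; /* 1 if the row refers to the LANDSCAPE array, else 0 */
    int index;        /* landscape layer (if negative type1) or resource type */
} row_target;

/* A negative type1 points at a landscape layer; otherwise a resource type */
row_target decode_type1(int type1){
    row_target target;

    target.is_landscape = (type1 < 0);
    target.index        = (type1 < 0) ? -type1 : type1;
    return target;
}
```

With this convention a fitness function could branch once on is_landscape and then use index to look up either the landscape layer or the resource type, keeping everything in a single array as intended.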

Manager summary missing

In working with the fitness function in the user model, I realised that the manager information was obviously missing, so this will have to be added in later (should be easy to do so). One reason for doing the user model first is because the manager model (particularly the genetic algorithm) is going to get much more complicated. Nevertheless, the manager model’s use of the genetic algorithm necessitates that the genetic algorithm be able to use both the OBSERVATION array and the manager’s OBS_SUMMARY of the array. Different users will have access to different information, but I’m starting small to make sure everything is built clearly.

/* =============================================================================
 * This is a preliminary function that checks the fitness of each agent -- as of
 * now, fitness is just defined by how much action is placed into bankem (last
 * column). Things will get much more complex in a bit, but there needs to be
 * some sort of framework in place to first check to see that everything else is
 * working so that I can isolate the fitness function's effect later.
 *     fitnesses: Array to order fitnesses of the agents in the population
 *     population: array of the population that is made (malloc needed earlier)
 *     pop_size: The size of the total population (layers to population)
 *     ROWS: Number of rows in the COST and ACTION arrays
 *     COLS: Number of columns in the COST and ACTION arrays
 *     landscape: The landscape array
 *     resources: The resource array
 *     agent_array: The agent array
 * ========================================================================== */
void strategy_fitness(double *fitnesses, double ***population, int pop_size, 
                      int ROWS, int COLS, double ***landscape,  
                      double **resources, double **agent_array){
    int agent;
    
    for(agent = 0; agent < pop_size; agent++){
        fitnesses[agent] = population[0][12][agent]; /* Assign, not add */
    }
    
}

The above function therefore simply returns the last column (bankem) as the individual’s fitness. I’m now going to maximise this using a tournament approach to fitness, as suggested by Hamblin (2013).

Functioning tournament function

After some toiling with swaps and pointers, I’ve managed to come up with a somewhat concise and clear function that randomly samples sampleK individuals from the population and selects the chooseK individuals with highest fitness.

/* =============================================================================
 * This function takes an array of fitnesses and returns an equal size array of
 * indices, the values of which will define which new individuals will make it
 * into the next population array, and in what proportions.
 *     fitnesses: Array to order fitnesses of the agents in the population
 *     winners: Array of the winners of the tournament
 *     pop_size: The size of the total population (layers to population)
 *     sampleK: The size of the subset of fitnesses sampled to compete
 *     chooseK: The number of individuals selected from the sample
 * ========================================================================== */
void tournament(double *fitnesses, int *winners, int pop_size, 
                      int sampleK, int chooseK){
    int samp;
    int *samples;
    int left_to_place, placed;
    int rand_samp;
    double *samp_fit;
    
    samples  = malloc(sampleK * sizeof(int));
    samp_fit = malloc(sampleK * sizeof(double));
    placed   = 0;
    
    while(placed < pop_size){ /* Note sampling is done with replacement */
        for(samp = 0; samp < sampleK; samp++){
            do{ /* Redraw in the unlikely event that runif returns pop_size */
                rand_samp = (int) floor( runif(0, pop_size) );
            }while(rand_samp == pop_size);
            samples[samp]  = rand_samp;
            samp_fit[samp] = fitnesses[rand_samp];
        }
        sort_vector_by(samples, samp_fit, sampleK);
        if( (chooseK + placed) >= pop_size){
            chooseK = pop_size - placed;    
        }
        samp = 0;
        while(samp < chooseK && placed < pop_size){
            winners[placed] = samples[samp];
            placed++;
            samp++;
        }
    }
    free(samp_fit);
    free(samples);
}

Note that in writing the above, I had to write a simple sort (sort_vector_by) and swap function in utilities.c. I also need to write some error messages into the above (or in ga itself); chooseK cannot be larger than sampleK. Next up will be to iterate the ga functions and make sure that fitnesses asymptote to high fitness. The framework for the genetic algorithm will then be in place, and it will be time to switch to the complex part of more interesting fitness functions.
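The argument check mentioned above could look something like the sketch below. The function name check_tournament_args and its return codes are hypothetical; the only constraint taken from the notes is that chooseK cannot be larger than sampleK, and the other two checks are assumptions about what would make the inputs unusable. Because sampling is done with replacement, sampleK is not required to be smaller than pop_size here.

```c
/* Hypothetical check of tournament arguments.
 * Returns 0 if the arguments are usable, otherwise a nonzero code. */
int check_tournament_args(int pop_size, int sampleK, int chooseK){
    if(pop_size < 1){
        return 1; /* Nothing to select from */
    }
    if(sampleK < 1 || chooseK < 1){
        return 2; /* Need to sample and choose at least one individual */
    }
    if(chooseK > sampleK){
        return 3; /* Cannot choose more individuals than were sampled */
    }
    return 0;
}
```

In code called from R, a nonzero code could be turned into an R-level error with error() from R's C API instead of being returned, but whether the check belongs in tournament or in ga is left open.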

Update: 15 MAR 2017

Initialisation of action populations

A new function has been written to initialise a population of agents, duplicated from a single agent in the larger G-MSE model and to be used for the genetic algorithm. Initial testing of this function shows that it returns appropriate arrays, in which actions are selected appropriately based on their cost values in the COST array.

/* =============================================================================
 * This function will initialise a population from the ACTION and COST arrays, a
 * particular focal agent, and specification of how many times an agent should
 * be exactly replicated versus how many times random values should be used.
 * Necessary variable inputs include:
 *     ACTION: A 3D array of action values
 *     COST: A 3D array of costs of performing actions
 *     layer: The 'z' layer of the COST and ACTION arrays to be initialised
 *     pop_size: The size of the total population (layers to population)
 *     carbon_copies: The number of identical agents used as seeds
 *     budget: The budget that random agents have to work with
 *     ROWS: Number of rows in the COST and ACTION arrays
 *     COLS: Number of columns in the COST and ACTION arrays
 *     population: array of the population that is made (malloc needed earlier)
 * ========================================================================== */
void initialise_pop(double ***ACTION, double ***COST, int layer, int pop_size,
                    int budget, int carbon_copies, int ROWS, int COLS,
                    double ***population){
    
    int xpos, ypos;
    int agent;
    int row, col;
    double lowest_cost;
    double budget_count;
    double check_cost;

    /* First read in pop_size copies of the ACTION layer of interest */
    for(agent = 0; agent < pop_size; agent++){
        for(row = 0; row < ROWS; row++){
            population[row][0][agent] = ACTION[row][0][layer];
            population[row][1][agent] = ACTION[row][1][layer];
            population[row][2][agent] = ACTION[row][2][layer];
            population[row][3][agent] = ACTION[row][3][layer];
            if(agent < carbon_copies){
                for(col = 4; col < COLS; col++){
                    population[row][col][agent] = ACTION[row][col][layer];
                }
            }else{
                for(col = 4; col < COLS; col++){
                    population[row][col][agent] = 0;
                }
            }
        }
        lowest_cost  =  min_cost(COST, layer, budget, ROWS, COLS);
        budget_count =  budget;
        if(lowest_cost <= 0){
            printf("Lowest cost is too low (must be positive) \n");
            break;
        }
        while(budget_count > lowest_cost){
            do{
                do{ /* This do assures xpos never equals ROWS (unlikely) */
                    xpos = floor( runif(0,ROWS) );
                }while(xpos == ROWS);
                do{
                    ypos = floor( runif(4,COLS) );
                }while(ypos == COLS);
            }while(COST[xpos][ypos][layer] > budget_count);
            population[xpos][ypos][agent]++;
            budget_count -= COST[xpos][ypos][layer];
        } /* Should now make random actions allowed by budget */
    }
}

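The above function calls the min_cost function, which simply examines the COST array to find the lowest cost action. The initialisation then keeps adding random actions to each individual until its budget is used up (i.e., until the remaining budget no longer exceeds the lowest cost).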

/* =============================================================================
 * This function will find the minimum cost of an action in the COST array
 * for a particular agent (layer). Inputs include:
 *     COST: A full 3D COST array
 *     layer: The layer on which the minimum is going to be found
 *     budget: The total budget that the agent has to work with (initialises the_min)
 *     rows: The total number of rows in the COST array
 *     cols: The total number of cols in the COST array
 * ========================================================================== */
double min_cost(double ***COST, int layer, double budget, int rows, int cols){
    int i, j;
    double the_min;
    
    the_min = budget;
    for(i = 0; i < rows; i++){
        for(j = 0; j < cols; j++){
            if(COST[i][j][layer] < the_min){
                the_min = COST[i][j][layer];
            }
        }
    }
    return the_min; 
}

We now have a functioning way to initialise a population of agents that will later go through a genetic algorithm to select the best actions. In working through this, I’ve seen that an earlier idea of mine (not sure if I wrote this down below) might be useful – have a column in both COST and ACTION that is simply bankem – essentially stashing costs in a way that doesn’t do anything. This might be important for situations in which an agent actually benefits by doing nothing, or when we want some general way to consider the benefits of stake-holder actions that affect utility but have no effect on resources or other stake-holders (e.g., holiday time).

Add new bankem action on COST and ACTION arrays

I have added a new action bankem onto the COST and ACTION arrays, which was not too difficult at all in practice. I envision this category of actions as (probably) always having a cost equal to one. Essentially, it’s a way to shift unspent costs to a category, which might or might not affect the agent’s overall utility.
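As a rough illustration of shifting unspent budget into bankem, the sketch below works on a single row of actions and costs. The function name bank_unspent, the flat one-row arrays, and the rule of converting all leftover budget into whole bankem units are assumptions made for the example; the notebook only states that bankem would probably always cost one and acts as a sink for unspent costs.

```c
/* Hypothetical helper: put unspent budget into the bankem column of one row.
 *     actions:  action values for the row (columns 0-3 are identifiers)
 *     costs:    matching cost of one unit of each action
 *     n_cols:   number of columns in the row
 *     bank_col: index of the bankem column
 *     budget:   total budget available
 * Returns the number of bankem units added. */
double bank_unspent(double *actions, const double *costs, int n_cols,
                    int bank_col, double budget){
    int col;
    double spent, units;

    spent = 0;
    for(col = 4; col < n_cols; col++){
        spent += actions[col] * costs[col];
    }
    if(spent >= budget || costs[bank_col] <= 0){
        return 0; /* Nothing left over, or bankem has no usable cost */
    }
    /* Truncate to whole units; the value is non-negative at this point */
    units = (double) (long) ((budget - spent) / costs[bank_col]);
    actions[bank_col] += units;
    return units;
}
```

With a bankem cost of one, the row ends up spending exactly its budget whenever the leftover is a whole number.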

Initialise a new crossover function

I have written a crossover function that, for each individual in the population, assigns a crossover partner (e.g., as would occur in sexual reproduction). With the partner assigned, the function then swaps ACTION array elements with some fixed probability (uniform crossover method). I don’t see any reason to consider multiple types of crossover at this point, so I believe this method will be sufficient.

/* =============================================================================
 * This function will use the initialised population from initialise_pop to make
 * the population array undergo crossing over at random locations for 
 * individuals in the population. Note that we'll later keep things in budget
 * Necessary variable inputs include:
 *     population: array of the population that is made (malloc needed earlier)
 *     pop_size: The size of the total population (layers to population)
 *     ROWS: Number of rows in the COST and ACTION arrays
 *     COLS: Number of columns in the COST and ACTION arrays
 *     pr: Probability of a crossover site occurring at an element.
 * ========================================================================== */
void crossover(double ***population, int pop_size, int ROWS, int COLS, 
               double pr){
    
    int agent, row, col;
    int cross_partner;
    double do_cross;
    double agent_val, partner_val;
    
    for(agent = 0; agent < pop_size; agent++){
        do{
            cross_partner = floor( runif(0, pop_size) );
        }while(cross_partner == agent || cross_partner == pop_size);
        for(row = 0; row < ROWS; row++){
            for(col = 4; col < COLS; col++){
                do_cross = runif(0,1);
                if(do_cross < pr){
                    agent_val   = population[row][col][agent];
                    partner_val = population[row][col][cross_partner];
                    population[row][col][agent]         = partner_val;
                    population[row][col][cross_partner] = agent_val;
                }
            }
        }
    }
}

Originally, I was going to use a swap function to swap agent and partner values. The swap function is still in the utilities.c file, but I think the above code is more readable.

I think it will make more sense to deal with the budget after mutation. That is, as a result of crossover and mutation, some individuals might go over budget on their actions. I think randomly removing actions in the event of being over budget is best done once, after mutation, to prevent redundancy; this was a constrain_cost command originally written in R, so I can use that as a template.

Mutation function created

I have written a function to cause random mutations in the population array during the genetic algorithm.

/* =============================================================================
 * This function will use the initialised population from initialise_pop to make
 * the population array undergo mutations at random elements in their array
 * Necessary variable inputs include:
 *     population: array of the population that is made (malloc needed earlier)
 *     pop_size: The size of the total population (layers to population)
 *     ROWS: Number of rows in the COST and ACTION arrays
 *     COLS: Number of columns in the COST and ACTION arrays
 *     pr: Probability of a mutation occurring at an element.
 * ========================================================================== */
void mutation(double ***population, int pop_size, int ROWS, int COLS, 
               double pr){
    
    int agent, row, col;
    double do_mutation;
    double agent_val;
    double half_pr;
    
    half_pr = 0.5 * pr;
    
    /* Mutate each action element with probability pr */
    for(agent = 0; agent < pop_size; agent++){
        for(row = 0; row < ROWS; row++){
            for(col = 4; col < COLS; col++){
                do_mutation = runif(0,1);
                if( do_mutation < half_pr ){
                    population[row][col][agent]--;
                }
                if( do_mutation > (1 - half_pr) ){
                    population[row][col][agent]++;
                }
                if( population[row][col][agent] < 0 ){
                    population[row][col][agent] *= -1;    
                } /* Change sign if mutates to a negative value */
            }
        }
    }
}

I might or might not want to tweak this later on because I’m not sure if this type of mutation is aggressive enough to search for adaptive strategies. This issue will be greatly mitigated by the seeding of random action arrays and by crossover, but I might want to come back to allow mutation to a wider range of numbers later. For now, there is simply a probability of a mutation occurring at each element; if a mutation occurs, the action value will either increase by one or decrease by one (if the original value was zero, it will increase to one). It’s tempting to allow for bigger jumps, but if they are too big then they will regularly go over budget and hence cause the whole array to reshuffle again (essentially creating a random array and removing a potential opportunity for increased fitness).
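The per-element rule can be pulled out as a small pure function, which makes the behaviour at zero easy to verify. In the sketch below the uniform draw u is passed in as an argument rather than generated with runif, and the name mutate_value is hypothetical; the logic otherwise follows the three if statements in mutation above.

```c
/* Sketch of the mutation rule for one element.
 *     val: current action value
 *     u:   a uniform random draw between 0 and 1
 *     pr:  probability of a mutation occurring at an element */
double mutate_value(double val, double u, double pr){
    double half_pr = 0.5 * pr;

    if(u < half_pr){
        val--; /* Lower tail of the draw: decrease by one */
    }
    if(u > (1 - half_pr)){
        val++; /* Upper tail of the draw: increase by one */
    }
    if(val < 0){
        val *= -1; /* Change sign if mutated to a negative value */
    }
    return val;
}
```

For example, with pr = 0.1 a value of zero and a draw of 0.01 first drops to -1 and is then reflected to 1, so a zero can only ever mutate upward.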

The next function that needs to be written is one that constrains the costs to be at or under budget after crossover and mutation, then a fitness function is needed (which will probably require several sub-functions to keep the code readable).

Update: 14 MAR 2017

Separate ACTION and COST arrays

I’ve now separated the arrays that hold an agent’s actions from the arrays that hold the agent’s costs (from a total budget) for performing those actions. The indices of these arrays will match at all times, such that COST[i][j][k] will be the cost of agent k performing ACTION[i][j][k]. Each agent will therefore have its own 2D layer that will include rows of other agents and columns of utilities and actions. This adds an extra array to a considerable number of things that we already need to keep track of, but I think it is less confusing than what I was doing before, and in the end separating costs from actions will be worth it. Ideally, all of this would just be some special struct in C, but, as mentioned yesterday, this won’t work because R and C need to work together seamlessly.
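A minimal sketch of two index-matched 3D arrays is given below, assuming the same row, column, layer ordering used in the code elsewhere in this notebook. The helpers alloc_3d, free_3d, and action_cost are hypothetical names introduced for illustration and are not functions in game.c.

```c
#include <stdlib.h>

/* Hypothetical helper: free an array made by alloc_3d (NULL-safe) */
void free_3d(double ***arr, int rows, int cols){
    int i, j;

    if(arr == NULL){
        return;
    }
    for(i = 0; i < rows; i++){
        if(arr[i] != NULL){
            for(j = 0; j < cols; j++){
                free(arr[i][j]); /* free(NULL) is a no-op */
            }
            free(arr[i]);
        }
    }
    free(arr);
}

/* Hypothetical helper: allocate a rows x cols x layers array of zeros.
 * calloc zero-fills, which is treated as NULL for the pointer levels here
 * (true on mainstream platforms). Returns NULL if any allocation fails. */
double ***alloc_3d(int rows, int cols, int layers){
    int i, j;
    double ***arr;

    arr = calloc(rows, sizeof(double **));
    if(arr == NULL){
        return NULL;
    }
    for(i = 0; i < rows; i++){
        arr[i] = calloc(cols, sizeof(double *));
        if(arr[i] == NULL){
            free_3d(arr, rows, cols);
            return NULL;
        }
        for(j = 0; j < cols; j++){
            arr[i][j] = calloc(layers, sizeof(double));
            if(arr[i][j] == NULL){
                free_3d(arr, rows, cols);
                return NULL;
            }
        }
    }
    return arr;
}

/* Budget used by agent k for the action stored at row i, column j */
double action_cost(double ***ACTION, double ***COST, int i, int j, int k){
    return ACTION[i][j][k] * COST[i][j][k];
}
```

Because both arrays share one set of dimensions, a single triple of indices is enough to find an action and what it costs, which is the main readability gain of keeping them separate but aligned.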

This is much more comprehensible in another respect; the genetic algorithm only needs to deal with the ACTION array, using the COST array as a reference. The improved readability of the code alone will probably be worthwhile. As another bonus, while re-writing the code, it became obvious that it is unnecessary to mutate, crossover, etc., only a select few rows; in the ACTION array, they are all fair game as determined by COST (columns 0-3 cannot be changed, but this is easy to remember).

Working call to game.c, but bad action return

There is now a working game.c file that user.c functions call, with proper header files to link. For some reason, the action arrays returned right now are incorrect, so this is the next thing that needs to be done. In general, I think it will be a good idea to make sure that calls from gmse.R are maintained without crash.

Update: 13 MAR 2017

Begin working on the genetic algorithm

I have now initialised the file game.c, which will hold everything related to the genetic algorithm, including multiple functions for running each individual process. The file will include a high-level function that brings in five arrays.

  1. The UTILITY array. The whole thing will need to be read in because agents need to have the option to affect one another’s arrays (e.g., the potential to affect the cost of each other’s actions). I’ll need to be careful, eventually, regarding the order of agent actions to make sure that the order in which stake-holders are put through the genetic algorithm doesn’t affect resulting agent strategies (or, if this is inevitable, then stake-holder order should be randomised).
  2. The AGENTS array will be necessary for agents to look up one another’s (and their own) locations, yield, etc.
  3. The RESOURCES array will be needed for agents to look up how many resources there are of each type, where they are located, and what consequences of these agents might be expected.
  4. The para array of parameter values will be needed for any specifications of the genetic algorithm (e.g., mutation and crossover rate) we might want to implement from R.
  5. The LANDSCAPE array needs to be read in to identify both the owners of cells and the yield from cells, and anything else that might be of interest.

A couple other challenges that I need to keep in mind (but do not want to implement yet).

I think it will be best to force ga to specify a single agent whose fitness will be maximised (as this agent will need to be replicated 100ish times for the evolution of a single agent to be simulated). If nothing else, this will make the code easier to follow. Hence, the main functions of both manager.c and user.c will call ga (linked with the game header file #include "game.c"), reading in all of the five arrays above and specifying for which agent it is running the genetic algorithm. In manager.c, for example, only type1 = 0 agents will be run, while these agents will be excluded in user.c.

Progress while coding the initialisation of a population

I think it makes sense to keep these functions general and very explicit about what can and cannot be tweaked. For example, given a 2D array, I am using x0, x1, y0 and y1 as indices that determine where to start and stop in terms of changing things. The function below, which will be called from the initialise_pop function, specifies exactly where to search the UTILITY layer for the lowest possible cost (needed for later).

/* =============================================================================
 * This function will find the minimum cost of an action in the UTILITY array
 * for a particular agent (layer). Inputs include:
 *     UTILITY: A full 3D utility array
 *     layer: The layer on which the minimum is going to be found
 *     budget: The total budget that the agent has to work with (initialises the_min)
 * ========================================================================== */
double min_cost(double ***UTILITY, int layer, double budget, int x0, int x1,
             int y0, int y1){
    int i, j;
    double the_min;
    
    the_min = budget;
    for(i = x0; i < x1; i++){
        for(j = y0; j < y1; j++){
            if(UTILITY[i][j][layer] < the_min){
                the_min = UTILITY[i][j][layer];
            }
        }
    }

    return the_min; 
}

This requires more input, but I think it’s also clearer what is meant to happen. The above function compiles without error.

Change to the UTILITY array

Having started coding in C, I’ve decided that it will be much easier to code if I switch what is represented in the first four rows of a layer of the UTILITY array. Now, the first two rows, in which agent = -2, will be the focal agent’s costs, while rows 3 and 4 will be the focal agent’s actions. This will make it easier to code for the manager’s actions later.

agent  type1  type2  type3  util  u_loc  u_land  movem  castem  killem  feedem  helpem
-2     1      0      0      2     1      0       1      1       2       3       3
-2     2      0      0      0     1      0       5      20      12      5       10
-1     1      0      0      2     1      0       0      0       0       0       0
-1     2      0      0      0     1      0       0      0       0       0       0
1      1      0      0      Inf   Inf    Inf     Inf    Inf     Inf     Inf     Inf
1      2      0      0      Inf   Inf    Inf     Inf    Inf     Inf     Inf     Inf
2      1      0      0      Inf   Inf    Inf     Inf    Inf     Inf     Inf     Inf
2      2      0      0      Inf   Inf    Inf     Inf    Inf     Inf     Inf     Inf
3      1      0      0      Inf   Inf    Inf     Inf    Inf     Inf     Inf     Inf
3      2      0      0      Inf   Inf    Inf     Inf    Inf     Inf     Inf     Inf

The reason this is easier is because now I can just randomise elements in the genetic algorithm below some value of rows. Agents should never be able to change their own costs, but can always change their own actions (agent = -1), and potentially the actions of other agents (agent > -1). Fortunately, this doesn’t require any extra coding of the initialisation of UTILITY – I just need to note that I’m doing it this way from now on.

Scrap the above idea completely

It was better the way it was – I confused myself with the 3 dimensions. The only actions on resources are in the focal agent’s first two rows. The second two rows will always be the costs of the focal agent for performing the first two rows of actions, and every other row is a cost associated with adjusting the cost of each other agent – but the actual change that is made where these costs are not infinite (i.e., for the managers) will be made in other layers of the UTILITY array.

Here’s how it will work: Agents can do things to resources movem, castem, killem, feedem, helpem at a cost. What they do is specified by the first two rows of their UTILITY layer (agent = -2). The cost of doing each of these is specified in the second two rows (agent = -2). They can also potentially change the cost of other agents doing things to resources; this is determined by other remaining rows. But the tricky bit is that their actions need to take effect in the other layers of UTILITY. Hence, we need to somehow hold the actions as they apply to UTILITY without affecting the UTILITY array itself throughout the process of the genetic algorithm (if we start changing UTILITY, then we need some way to test changes with respect to agent fitness and then put the array back as it was – actions therefore need to be recorded).

I didn’t want to do this, but I think it might actually be necessary to have two arrays instead of one UTILITY array. These two arrays would include:

  1. A COST array, which would be a 3D array (layers are agents) that identifies the cost of each agent changing something that affects agent actions.
agent type1 type2 type3 util u_loc u_land movem castem killem feedem helpem
-1 1 0 0 2 1 0 1 1 2 3 3
-1 2 0 0 0 1 0 5 20 12 5 10
1 1 0 0 Inf Inf Inf Inf Inf Inf Inf Inf
1 2 0 0 Inf Inf Inf Inf Inf Inf Inf Inf
2 1 0 0 Inf Inf Inf Inf Inf Inf Inf Inf
2 2 0 0 Inf Inf Inf Inf Inf Inf Inf Inf
3 1 0 0 Inf Inf Inf Inf Inf Inf Inf Inf
3 2 0 0 Inf Inf Inf Inf Inf Inf Inf Inf

The agent = -1 here would just be the direct cost of the focal agent in the layer affecting resources.

  2. An ACTION array, which would be a 3D array of dimensions identical to that of COST that would determine what an agent actually does.
agent type1 type2 type3 util u_loc u_land movem castem killem feedem helpem
-1 1 0 0 2 1 0 0 0 0 0 0
-1 2 0 0 0 1 0 0 0 2 0 0
1 1 0 0 0 0 0 0 0 0 0 0
1 2 0 0 0 0 0 0 0 0 0 0
2 1 0 0 0 0 0 0 0 0 0 0
2 2 0 0 0 0 0 0 0 0 0 0
3 1 0 0 0 0 0 0 0 0 0 0
3 2 0 0 0 0 0 0 0 0 0 0

The benefit here is that the elements would line up completely so that it would be easy to keep track of actions and costs, and the ACTION array would be all that needs to be tweaked for the genetic algorithm.

It would be nice to specify a new struct in C for all of this, but that wouldn’t change the fact that everything needs to read in and out seamlessly with R, so I don’t think that this is possible.

Update: 10 MAR 2017

Regrouping and finding a way forward on the utility functions

Reviewing my old thoughts on getting the genetic algorithm to work and getting agents to do something to maximise their own utilities. The first thing to do is to initialise a UTILITY array. I don’t see any way around this – what is needed is a three dimensional array where each layer z is an agent. A single agent’s utility and decision-making process is therefore represented in a matrix like the one below.

agent type1 type2 type3 util u_loc u_land movem castem killem feedem helpem
-2 1 0 0 2 1 0 0 0 0 0 0
-2 2 0 0 0 1 0 0 0 0 0 0
-1 1 0 0 2 1 0 1 1 2 3 3
-1 2 0 0 0 1 0 5 20 12 5 10
1 1 0 0 Inf Inf Inf Inf Inf Inf Inf Inf
1 2 0 0 Inf Inf Inf Inf Inf Inf Inf Inf
2 1 0 0 Inf Inf Inf Inf Inf Inf Inf Inf
2 2 0 0 Inf Inf Inf Inf Inf Inf Inf Inf
3 1 0 0 Inf Inf Inf Inf Inf Inf Inf Inf
3 2 0 0 Inf Inf Inf Inf Inf Inf Inf Inf

Each agent will need to have a total cost budget, which will be specified in the AGENT array in its own column. In the UTILITY array above, the rows where agent = -2 (column 1) identify the actions of an agent – these are the things that an agent can do to resources. In the above example, the agent is not doing anything to resources (all are zeros). The rows where agent = -1 indicate the costs of doing things that affect the resources (i.e., the costs of the actions in the rows where agent = -2). The agent represented by this z layer of the 3D array can therefore spend from their total budget where agent = -1 to add actions where agent = -2, which in turn affects resources in one way or another. All of the remaining rows (agent = 1 to agent = 3) define actions that would affect the costs of other agents. Essentially, these values (all currently Inf) represent the cost of changing another agent’s cost by 1. So if we imagine a manager that wants to change the cost of movem for a stake-holder from 5 to 10, and their cost value in the table is 0.5, then it will cost 2.5 from their budget to increase (or decrease) the cost by this amount. Note that there is also the opportunity for stake-holders to directly affect the utilities of other stake-holders – for a cost. I’m not going to play around with these options yet because it will get very complicated. Instead, I will now write a function for initialising this array in R. Once the simple case of a genetic algorithm for affecting resources based on utilities and budgets is up and running, then I will start doing more complex things like having stake-holders affect one another’s utilities and costs.

Note that column 1 refers to the agent ID, not the agent type. Hence, agent = 1 will be a manager, not a stake-holder. It’s possible that there could be other managers too, but the status of an agent can be accessed with the AGENT array.

Initial function making utility array

A function below returns all of the necessary information for the table above, but with random numbers placed for all columns after type3.

make_utilities <- function(AGENTS, RESOURCES){
    UTILITY <- NULL;
    
    agent_IDs     <- c(-2, -1, unique(AGENTS[,1]) );
    agent_number  <- length(agent_IDs);
    res_types     <- unique(RESOURCES[,2:4]);
    unique_types  <- dim(res_types)[1];
    types_data    <- lapply(X   = 1:agent_number, 
                           FUN = function(quick_rep_list) res_types);
    
    column_1      <- sort( rep(x = agent_IDs, times = unique_types) );
    columns_2_4   <- do.call(what = rbind, args = types_data);
    static_types  <- cbind(column_1, columns_2_4);

    dynamic_types <- matrix(data = 0, nrow = dim(static_types)[1], ncol = 8);
    
    dynamic_vals  <- sample(x = 1:10, size = length(dynamic_types), 
                            replace = TRUE);    
    
    dynamic_types <- matrix(data = dynamic_vals, nrow = dim(static_types)[1], 
                            ncol = 8);
    
    colnames(static_types)  <- c("agent", "type1", "type2", "type3");
    colnames(dynamic_types) <- c("util", "u_loc", "u_land", "movem", "castem",
                                 "killem", "feedem", "helpem");
    
    UTILITY <- cbind(static_types, dynamic_types);
    
    return( UTILITY );
}

I’m not sure of the best way to add the currently random numbers in a function, except that these values might need to be put into the array by the user, who will want to specify which agents care about which resources and how much it will cost to change things. Better, the user could eventually just specify the utilities of each stake-holder with each type (this is less to input). Then, once the genetic algorithm for the manager is up and running, all of the costs will be initialised by the manager, somehow – with default costs for the manager to affect stake-holder costs. This scheme would minimise user input and have the costs arise organically from the model and management system, while the utilities would be specified by the user. For now though, I’ll have to input the cost values by hand.

Function tweak to make 3D array

The previous function wasn’t quite right because it only made one layer of the 3D UTILITY array. Really, each layer needs to be replicated for each agent, as below.

#' Utility initialisation
#'
#' Function to initialise the utilities of the G-MSE model
#'
#'@param AGENTS The agent array 
#'@param RESOURCES The resource array
#'@export
make_utilities <- function(AGENTS, RESOURCES){

    agent_IDs     <- c(-2, -1, unique(AGENTS[,1]) );
    agent_number  <- length(agent_IDs);
    res_types     <- unique(RESOURCES[,2:4]);
    
    UTIL_LIST <- NULL;
    
    agent  <- 1;
    agents <- agent_number - 2;
    while(agent <= agents){
        UTIL_LIST[[agent]] <- utility_layer(agent_IDs, agent_number, res_types);
        agent            <- agent + 1;
    }
    
    dim_u <- c( dim(UTIL_LIST[[1]]), length(UTIL_LIST) );
    
    UTILITY <- array(data = unlist(UTIL_LIST), dim = dim_u);
    
    return( UTILITY );
}

#' Utility layer for initialisation
#'
#' Function to initialise a layer of the UTILITY array of the G-MSE model
#'
#'@param agent_IDs Vector of agent IDs to use (including -1 and -2)
#'@param agent_number The number of agents to use (length of agent_IDs)
#'@param res_types The unique resource types (unique rows of cols 2-4 of RESOURCES)
#'@export
utility_layer <- function(agent_IDs, agent_number, res_types){
 
    LAYER <- NULL;
    
    unique_types  <- dim(res_types)[1];
    types_data    <- lapply(X   = 1:agent_number, 
                            FUN = function(quick_rep_list) res_types);
    
    column_1      <- sort( rep(x = agent_IDs, times = unique_types) );
    columns_2_4   <- do.call(what = rbind, args = types_data);
    static_types  <- cbind(column_1, columns_2_4);
    
    dynamic_types <- matrix(data = 0, nrow = dim(static_types)[1], ncol = 8);
    
    dynamic_vals  <- sample(x = 1:10, size = length(dynamic_types), 
                            replace = TRUE); # TODO: Change me?
    
    dynamic_types <- matrix(data = dynamic_vals, nrow = dim(static_types)[1], 
                            ncol = 8);
    
    colnames(static_types)  <- c("agent", "type1", "type2", "type3");
    colnames(dynamic_types) <- c("util", "u_loc", "u_land", "movem", "castem",
                                 "killem", "feedem", "helpem");
    
    LAYER <- cbind(static_types, dynamic_types);
    
    return( LAYER ); 
}

So when there are two agents, the make_utilities function returns a 3D array of 4 rows, 12 columns, and 2 layers.

, , 1

     [,1] [,2] [,3] [,4] [,5] [,6] [,7]
[1,]   -2    1    0    0    9    2    1
[2,]   -1    1    0    0    7    3    1
[3,]    1    1    0    0    9    8    4
[4,]    2    1    0    0    8    5    1
     [,8] [,9] [,10] [,11] [,12]
[1,]    8    8     8     3     9
[2,]    2    7    10     2     1
[3,]    5   10     6     3     8
[4,]    2    6     6     1     5

, , 2

     [,1] [,2] [,3] [,4] [,5] [,6] [,7]
[1,]   -2    1    0    0    1    8    9
[2,]   -1    1    0    0    3    7    2
[3,]    1    1    0    0    6    2    4
[4,]    2    1    0    0    4    3    7
     [,8] [,9] [,10] [,11] [,12]
[1,]    9    5     3     9    10
[2,]    6    5     7     5     7
[3,]    1    2     2     2     9
[4,]    4    8     9     7    10

I’ll record changes in the UTILITY array over time to track social changes and game strategy. For now, the next goal is to write a genetic algorithm that will work on the UTILITY array (with input from the AGENT, LANDSCAPE, and RESOURCE arrays) to optimise stake-holder actions. The simplest case will be maximising crop yield.

Plans for the genetic algorithm, short and long term

In the short term, it is therefore necessary to write a set of functions for a genetic algorithm, starting first with the functions written in R on 7 FEB 2017 to show proof of concept. I will use these on the UTILITY arrays that I made today and show how agent actions can be simulated to maximise a simple scenario – trying to make as much crop yield as possible, where resources decrease yield if they are on the land. The most difficult part of this will be the fitness function. Essentially, stake-holder agents are going to need to learn or know the relationship between resources and their crop yields, then do something to affect the resources. There are two ways that the relationship between resource and crop yield could be implemented in the model:

  1. Agents know how resources affect yield by looking at their consume column in the RESOURCES array. This is pretty straightforward to implement. Each agent could simply count the number of resources on its cells, look at the landscape cell values, then calculate the proportion by which their crop yield is predicted to decrease and act accordingly to maximise yield (e.g., by killing resources). This is probably the first implementation to try.
  2. Agents learn how resources affect yield over time by correlating resource abundance on landscape cells with cell production. This would take longer (time steps would have to pass or be burned in before agents got the hang of things) and be more computationally intense (agents would need to dig through the history of interactions), but ultimately be more flexible. In the future, for example, if there was an interaction between two different types of resources, or between resources and other agents, that was not obvious, then an on-the-fly correlation analysis could pick up unexpected links between interacting resources and stake-holders to affect stake-holder utility. Stake-holders could then adjust their behaviours accordingly, possibly resulting in unexpected decisions (e.g., maybe in a complex system where managers, stake-holders, and resources all affect dynamics, it sometimes is beneficial to have resources on your land but lobby the manager as if it’s a cost). This kind of implementation will require some more interesting correlational analyses that require agents to see the history of interactions and outcomes. It will be fun to build a framework for this, but I’ll want to get the model up and running without these complications first, then add them in later.

Bringing in the manager will, of course, make things even more complex. I think the best order to do all of this is to focus on 1 above first, then build managers into the model with 1, and then work on thinking about how to implement 2.
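Option 1 above could be sketched roughly as follows. This is a sketch under stated assumptions rather than the eventual GMSE fitness function: the function name and arguments are hypothetical, the landscape is reduced to one flat vector per quantity, and each resource on a cell is assumed to remove the same proportion (`consume`) of whatever yield remains there.

```c
#include <stddef.h>

/* Predicted yield for one land owner (hypothetical helper).
 *   cell_val:  yield value of each landscape cell
 *   owner:     owner ID of each cell
 *   res_count: number of resources currently on each cell
 *   consume:   assumed proportion of remaining yield removed per resource */
double predicted_yield(const double *cell_val, const int *owner,
                       const int *res_count, size_t cells,
                       int agent_ID, double consume)
{
    double total = 0.0;
    size_t c;
    int r;

    for (c = 0; c < cells; c++) {
        if (owner[c] == agent_ID) {
            double remaining = cell_val[c];
            for (r = 0; r < res_count[c]; r++) {
                remaining *= (1.0 - consume); /* Each resource compounds */
            }
            total += remaining;
        }
    }
    return total;
}
```

An agent could compare this prediction with and without a candidate action (e.g., removing some resources from its cells) to decide whether the action is worth its cost.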

Update: 9 MAR 2017

The plotting of \(2 \times 2\) figures that include maps of land ownership and individual stake-holder yields is now complete for observation types 2 and 3. With this complete, I will now turn to writing yesterday’s R function in C (which needs to happen anyway – may as well do it now to keep things fast). Once this is complete, then it will be easier to start building a genetic algorithm for maximising the utility of one stake-holder. Ignoring manager decision-making and conflicting stake-holders for the time being, I will focus on a stake-holder type with a relatively clear goal: maximise crop yield. Using the utility matrices and genetic algorithm notes from earlier, I’ll be able to write a general function in C that affects user behaviour.

User function now written in C

The user function that was written originally in R has now been coded in C. This makes it much faster to first place agents on their own land (if they own land), then count up their yield from the landscape. Testing of this function finds that everything appears to work normally for all observation types and different land dimensions.

I have run valgrind to check for memory leaks again (since it’s been a while).

 R -d "valgrind --tool=memcheck --leak-check=yes --track-origins=yes" --vanilla < gmse.R

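No memory leaks were reported (definitely, indirectly, and possibly lost bytes are all zero), though the error summary does list 196884 errors from 2 contexts that are not leaks.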

==26147== HEAP SUMMARY:
==26147==     in use at exit: 104,719,416 bytes in 18,583 blocks
==26147==   total heap usage: 5,168,708 allocs, 5,150,125 frees, 953,760,506 bytes allocated
==26147== 
==26147== LEAK SUMMARY:
==26147==    definitely lost: 0 bytes in 0 blocks
==26147==    indirectly lost: 0 bytes in 0 blocks
==26147==      possibly lost: 0 bytes in 0 blocks
==26147==    still reachable: 104,719,416 bytes in 18,583 blocks
==26147==         suppressed: 0 bytes in 0 blocks
==26147== Reachable blocks (those to which a pointer was found) are not shown.
==26147== To see them, rerun with: --leak-check=full --show-leak-kinds=all
==26147== 
==26147== For counts of detected and suppressed errors, rerun with: -v
==26147== ERROR SUMMARY: 196884 errors from 2 contexts (suppressed: 0 from 0)

Next, I can start to make the users actually do things that might maximise their own yield (e.g., shoot resources or farm cells more effectively). I plan to write a flexible genetic algorithm function in C. The function itself could be called from a higher-level function so as to be used directly in R (I don’t plan to do this for normal G-MSE operations, but it might be useful to include direct R call options once the package is complete).

Update: 8 MAR 2017

New landscape layer identifying land ownership

There is now a new layer of landscape, and I have tweaked things to make the current default three layers. These layers include:

  1. A layer of terrain type (not currently in use)
  2. A layer of values that can be used to model cell productivity
  3. A layer identifying the owner of each cell (corresponds to agent ID)

When the cell owner is 0, this effectively means the land is under manager (e.g., public) control. The new initialise landscape function now allows the user to explicitly set the proportion of cells that should go to each owner (vector input).

make_landscape <- function(model, rows, cols, cell_types, cell_val_mn, 
                           cell_val_sd, cell_val_max = 1, cell_val_min = 0,
                           layers = 3, ownership = 0, owner_pr = NULL){
    the_land  <- NULL;
    if(model == "IBM"){
        if(rows < 2){
            stop("Landscape dimensions in IBM must be 2 by 2 or greater");   
        }         
        if(cols < 2){ # Check to make sure the landscape is big enough
            stop("Landscape dimensions in IBM must be 2 by 2 or greater");   
        }
        cell_count     <- cols * rows;
        the_terrain    <- sample(x = cell_types, size = cell_count, 
                                 replace = TRUE);
        the_terrain2   <- rnorm(n = cell_count, mean = cell_val_mn,
                                sd = cell_val_sd);
        if( length(ownership) == 1 ){
            who_owns     <- sample(x = 0:ownership, size = cell_count, 
                                   replace = TRUE);
            the_terrain3 <- sort(who_owns); # Make contiguous for now
        }else{
            who_owns     <- sample(x = ownership, size = cell_count, 
                                   replace = TRUE, prob = owner_pr);
            the_terrain3 <- sort(who_owns); 
        }
        the_terrain2[the_terrain2 > cell_val_max] <- cell_val_max;
        the_terrain2[the_terrain2 < cell_val_min] <- cell_val_min;
        alldata        <- c(the_terrain, the_terrain2, the_terrain3);
        the_land       <- array(data = alldata, dim = c(rows, cols, layers));
    }
    if( is.null(the_land) ){
        stop("Invalid model selected (Must be 'IBM')");
    }
    return(the_land);
}

Hence in the above, if ownership = 0, then the layer is effectively ignored, or if it is a non-zero scalar, then ownership of landscape cells is divided roughly equally (cells are sampled at random) among integer values from zero to the scalar. However, the most thorough way to set ownership will be by setting ownership to a vector of possible owners and owner_pr to their relative proportions of cells owned. Addition of this landscape layer has been tested and runs without error.

Linking cell yield with agents

I have now begun a user R function, which currently (1) moves agents to somewhere on their owned landscape (if not already there) and (2) calculates the amount of their total yield from the landscape and stores this total amount in the AGENTS array.

user <- function(resource   = NULL,
                 agent      = NULL,
                 landscape  = NULL, 
                 paras      = NULL,
                 model      = "IBM"
) {
    check_model <- 0;
    if(model == "IBM"){
        # Relevant warnings below if the inputs are not of the right type
        if(!is.array(resource)){
            stop("Warning: Resources need to be in an array");   
        }
        if(!is.array(agent)){
            stop("Warning: Agents need to be in an array");   
        }
        if(!is.array(landscape)){
            stop("Warning: Landscape need to be in an array");
        } # TODO: make sure paras is right length below
        if(!is.vector(paras) | !is.numeric(paras)){
            stop("Warning: Parameters must be in a numeric vector");
        }
        # If all checks out, then run the population model

        #======================================================================
        # TEMPORARY R CODE TO DO USER ACTIONS (WILL BE RUN FROM C EVENTUALLY)
        #======================================================================
        
        for(agent_ID in 1:dim(agent)[1]){
            owned_cells <- sum(landscape[,,3] == agent[agent_ID, 1]); # Match on ID
            # --- Put the agent on its own land
            if(owned_cells > 0){ # If the agent owns some land
                a_xloc <- agent[agent_ID, 5];
                a_yloc <- agent[agent_ID, 6];
                while(agent[agent_ID,1] != landscape[a_xloc, a_yloc, 3]){
                    a_xloc <- sample(x = 1:dim(landscape)[1], size = 1);
                    a_yloc <- sample(x = 1:dim(landscape)[2], size = 1);
                }
                agent[agent_ID, 5] <- a_xloc;
                agent[agent_ID, 6] <- a_yloc;
            } 
            # --- count up yield on cells
            agent_yield <- 0;
            xdim        <- dim(landscape[,,3])[1]
            ydim        <- dim(landscape[,,3])[2]
            for(i in 1:xdim){
                for(j in 1:ydim){
                    if(landscape[i,j,3] == agent[agent_ID,1]){
                        agent_yield <- agent_yield + landscape[i,j,2];
                    }
                }
            }
            agent[agent_ID, 15] <- agent_yield
        }
        USER_OUT <- list(resource, landscape, agent);
        # TODO: User actions are next...
        #======================================================================

        check_model <- 1;
    }
    if(check_model == 0){
        stop("Invalid model selected (Must be 'IBM')");
    }
    return(USER_OUT);
}

It might be useful to also have a column in the AGENTS array that records percent capacity of yield for stake-holders, perhaps by saving the original landscape (before resources remove yield) and calculating a proportion. A couple of notes: the indicated code above will need to be put into C – it’s much too slow for R already. Also, for some reason, if I don’t store a_xloc and a_yloc back into the appropriate agent[agent_ID, 5] and agent[agent_ID, 6], respectively, a weird bug appears. The actual resource population (but not its estimate) flatlines after 20 or so generations at some value. This is very weird because the file gmse.R doesn’t even return the resource or landscape arrays – not yet. I’m not sure why a bug in this code affects population demographics, but fixing it also appears to correct the problem completely. This is something to watch out for, however.
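As a rough illustration of the yield count and the percent-capacity idea in C, the sketch below sums a value layer over the cells owned by one agent and divides by the same sum on a saved copy of the original landscape. All names here are hypothetical, and the landscape is assumed to be a flat vector laid out as R stores a 3D array, with zero-based layer indices (so the value and owner layers would be 1 and 2).

```c
#include <stddef.h>

/* Position of element [i][j][layer] in a flat vector laid out the way R
 * stores a 3D array (first index varies fastest). */
static size_t land_index(size_t i, size_t j, size_t layer,
                         size_t xdim, size_t ydim)
{
    return i + xdim * j + xdim * ydim * layer;
}

/* Total of `value_layer` over cells whose `owner_layer` equals agent_ID */
double owned_yield(const double *land, size_t xdim, size_t ydim,
                   size_t value_layer, size_t owner_layer, int agent_ID)
{
    double total = 0.0;
    size_t i, j;

    for (j = 0; j < ydim; j++) {
        for (i = 0; i < xdim; i++) {
            size_t own = land_index(i, j, owner_layer, xdim, ydim);
            if ((int) land[own] == agent_ID) {
                total += land[land_index(i, j, value_layer, xdim, ydim)];
            }
        }
    }
    return total;
}

/* Proportion of the original (undamaged) yield that remains; returns 0
 * when the agent had no yield to begin with. */
double yield_capacity(const double *land_now, const double *land_orig,
                      size_t xdim, size_t ydim, size_t value_layer,
                      size_t owner_layer, int agent_ID)
{
    double orig = owned_yield(land_orig, xdim, ydim, value_layer,
                              owner_layer, agent_ID);
    double now  = owned_yield(land_now, xdim, ydim, value_layer,
                              owner_layer, agent_ID);

    return (orig > 0.0) ? now / orig : 0.0;
}
```

The owner test mirrors the `landscape[i,j,3] == agent[agent_ID,1]` comparison in the R loop above, matching on the agent's ID rather than its row position.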

Plotting owned landscape and stake-holder yield

The figure below shows some new output for G-MSE. The left column of the figure is familiar, but the right column now provides some feedback for five simulated stake-holders that own roughly equal amounts of land. The actual plots of land are shown in the upper right, while the individual yields for each stake-holder’s plots are shown over time in the lower right.

Updated G-MSE output showing stake-holder yields from landscape

As of now, this image is only produced for the first two observation functions (case 0 and 1), so I need to replicate it in the other two observation functions. Eventually, it would be better to just have one function for plotting so that any changes made would really be global.

Update: 7 MAR 2017

Tracking crop yield over time

Given that resources now can affect the second layer of the landscape, which can model the percent crop yield (or anything else), we can now plot the mean percent yield per cell (orange) over time along with resource abundance (black) and its estimate (blue). The figure below shows this for an example in which each independent visit by a resource reduces crop yield by 50% (e.g., the individual consumes half of the resources on a cell if it arrives there at a time step).

Updated G-MSE output with mean landscape yield over time

So far, this has only been coded for the mark-recapture plot, so the next task is to fill this out for all of the plot types, then add a new layer of the landscape that will designate each cell with a number that identifies the owner of the land, or if the land is public (type 0). This will allow me to link crop yield to a specific agent.

Update: 6 MAR 2017

Fix read in and out of landscape array from R to C

While testing the resource-landscape interaction, there was an issue with the landscape array not being read into C correctly. When R sends an array or vector into C, it is sending the contents as a flat vector (i.e., what might be a \(2 \times 2\) array in R gets read in as if each element were in a list of four elements). The structure of the array then needs to be correctly defined in C so that it matches what it was in R. This requires placing the contents of the elements coming in from R in the correct order with respect to pointers in C, and the elements arrive column by column (R stores arrays in column-major order, so the first index varies fastest), so if we had a table in R

   Y1 Y2
X1  1  2
X2  3  4

The list would be read in (apparently) as [1, 3, 2, 4], so if we want to read this into an array in C, and we prefer to make a pointer to X1 and X2 location (which is easier for my brain because it allows array[i][j] to refer to the i individual and j trait), then we need to read in the array as follows:

the_array   = malloc(x_size * sizeof(double *));
for(i = 0; i < x_size; i++){
    the_array[i] = malloc(y_size * sizeof(double));   
} 
vec_pos = 0;
for(j = 0; j < y_size; j++){
    for(i = 0; i < x_size; i++){
        the_array[i][j] = R_ptr[vec_pos];
        vec_pos++;
    }
}

This is not quite intuitive at first, but doing it this way gets R and C on the same page. For example, here is the RESOURCES array moving from R to C and back again. Printed in each environment, the array is the same (note, they could be differently structured and still be technically consistent – e.g., if all arrays were transposed – but this would be a nightmare to code).

> RESOURCES[1:4,1:4]
     IDs type1 type2 type3
[1,]   1     1     0     0
[2,]   2     1     0     0
[3,]   3     1     0     0
[4,]   4     1     0     0
> RESOURCE_NEW      <- resource(resource   = RESOURCES,
+                                       landscape  = LANDSCAPE_r,
+                                       paras      = paras,
+                                       move_res   = TRUE,
+                                       model      = "IBM"
+         );
1.000000    1.000000    0.000000    0.000000    
2.000000    1.000000    0.000000    0.000000    
3.000000    1.000000    0.000000    0.000000    
4.000000    1.000000    0.000000    0.000000    
>         RESOURCES             <- RESOURCE_NEW[[1]];
> RESOURCES[1:4,1:4]
     [,1] [,2] [,3] [,4]
[1,]    1    1    0    0
[2,]    2    1    0    0
[3,]    3    1    0    0
[4,]    4    1    0    0

When reading in the landscape, this got confusing because the same thing had to be done in three dimensions, and initially I lost track of the pointers, causing the layers to mix. This has been resolved now, and I have tested to ensure that landscape elements are identical when read into C and when returned back into R.

> LANDSCAPE_r[1:4,1:4,1:2]
, , 1

     [,1] [,2] [,3] [,4]
[1,]    1    2    2    2
[2,]    2    2    2    2
[3,]    1    1    1    2
[4,]    1    1    2    2

, , 2

         [,1]       [,2]       [,3]
[1,] 1.540700 -1.7960987  2.7759525
[2,] 1.483312  0.5855166 -0.4789347
[3,] 1.579536  1.0600302  2.2923279
[4,] 1.745043  0.2437264  0.6171671
          [,4]
[1,] 0.6596141
[2,] 0.8117666
[3,] 2.0330554
[4,] 1.1496975

> RESOURCE_NEW      <- resource(resource   = RESOURCES,
+                                       landscape  = LANDSCAPE_r,
+                                       paras      = paras,
+                                       move_res   = TRUE,
+                                       model      = "IBM"
+         );
1.000000    2.000000    2.000000    2.000000    
2.000000    2.000000    2.000000    2.000000    
1.000000    1.000000    1.000000    2.000000    
1.000000    1.000000    2.000000    2.000000    


1.540700    -1.796099   2.775952    0.659614    
1.483312    0.585517    -0.478935   0.811767    
1.579536    1.060030    2.292328    2.033055    
1.745043    0.243726    0.617167    1.149697    
>         RESOURCES             <- RESOURCE_NEW[[1]];
>         RESOURCE_REC[[time]]  <- RESOURCES;
>         
>         LANDSCAPE_r           <- RESOURCE_NEW[[2]];
> LANDSCAPE_r[1:4,1:4,1:2]
, , 1

     [,1] [,2] [,3] [,4]
[1,]    1    2    2    2
[2,]    2    2    2    2
[3,]    1    1    1    2
[4,]    1    1    2    2

, , 2

         [,1]       [,2]       [,3]      [,4]
[1,] 1.540700 -1.7960987  2.7759525 0.6596141
[2,] 1.483312  0.5855166 -0.4789347 0.8117666
[3,] 1.579536  1.0600302  2.2923279 2.0330554
[4,] 1.745043  0.2437264  0.6171671 1.1496975

>
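For reference, the three-dimensional read could look something like the sketch below. It extends the two-dimensional loop shown earlier by one more level and is not the code in GMSE itself: the function names are hypothetical, and indexing as `land[i][j][k]` with k as the layer is an assumption. The order of the loops (layer outermost, first index innermost) is what keeps the layers from mixing.

```c
#include <stdlib.h>

/* Free an array built by read_land(); safe on partially built arrays */
void free_land(double ***land, int x_size, int y_size)
{
    int i, j;

    if (land == NULL) {
        return;
    }
    for (i = 0; i < x_size; i++) {
        if (land[i] != NULL) {
            for (j = 0; j < y_size; j++) {
                free(land[i][j]);
            }
            free(land[i]);
        }
    }
    free(land);
}

/* Copy a flat vector from R (first index varies fastest) into a
 * pointer-based array indexed as land[i][j][k]. Returns NULL if any
 * allocation fails. */
double ***read_land(const double *R_ptr, int x_size, int y_size, int z_size)
{
    int i, j, k, vec_pos;
    double ***land = calloc((size_t) x_size, sizeof(double **));

    if (land == NULL) {
        return NULL;
    }
    for (i = 0; i < x_size; i++) {
        land[i] = calloc((size_t) y_size, sizeof(double *));
        if (land[i] == NULL) {
            free_land(land, x_size, y_size);
            return NULL;
        }
        for (j = 0; j < y_size; j++) {
            land[i][j] = malloc((size_t) z_size * sizeof(double));
            if (land[i][j] == NULL) {
                free_land(land, x_size, y_size);
                return NULL;
            }
        }
    }
    vec_pos = 0;
    for (k = 0; k < z_size; k++) {         /* Layers change slowest */
        for (j = 0; j < y_size; j++) {
            for (i = 0; i < x_size; i++) { /* Rows change fastest   */
                land[i][j][k] = R_ptr[vec_pos];
                vec_pos++;
            }
        }
    }
    return land;
}
```

Writing the array back out for R would use the same three nested loops with the assignment reversed, so that the flat vector is filled in the order R expects.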

The biological interaction function (i.e., the function from 3 MAR) now does what it is supposed to do, and I will move on to make the landscape interactions more interesting.

Allow layers to change by themselves each generation

Given that some resources will affect layers of the landscape, modelling consumption of biomass on cells, it is necessary to also include a function that changes the landscape cell values without any input from resources. This can model the growth of biomass on cells between time steps. I’ve therefore written a new function that does this in R (I don’t think this will be complex enough to require it in C).

update_landscape <- function(model = "IBM", landscape, layer, mean_change,
                             sd_change = 0, max_val = 1, min_val = 0){
    the_land <- NULL;
    if(model == "IBM"){
        xlength   <- dim(landscape[,,layer])[1];
        ylength   <- dim(landscape[,,layer])[2];
        lsize     <- xlength * ylength;
        adj_vals  <- rnorm(n = lsize, mean = mean_change, sd = sd_change);
        adj_layer <- matrix(data = adj_vals, nrow = xlength, ncol = ylength);
        new_layer <- landscape[,,layer] + adj_layer;
        
        new_layer[new_layer > max_val] <- max_val;
        new_layer[new_layer < min_val] <- min_val;
        landscape[,,layer]             <- new_layer;
        the_land                       <- landscape;
    }else{
        stop("Invalid model selected (Must be 'IBM')");
    }
    return(the_land);
}

One feature of the G-MSE model is now that, in addition to a hard imposed carrying capacity on resource types, it is also possible to make the carrying capacity a natural function of the landscape. For example, we might force individuals on the landscape to consume a certain amount of resources on the landscape to survive or reproduce. Hence, as landscape cell values decrease, modelling the consumption of biomass, fewer individual resources can survive or reproduce.

Ideally, it will then be possible to parameterise the model using data for, e.g., how much damage to biomass a goose can do to a patch of land. As of now, by default, I’m just assuming that it decreases crop yield by 10%, and increases its own survival probability by the same when it lands on a cell.

For some reason, a function that I wrote to reset the landscape values screwed with the resource abundances (flat-lined after 20 gens for no clear reason). I’ve reverted to a simpler function, and will build up off of this tomorrow, but it would be nice to know why the R function was affecting the population dynamics even when it returned the same landscape that it took in. Tomorrow, I will build up a new function with similar features piece by piece to make sure it works. Then, I will do some initial simulations modelling crop growth as affected by resources on a landscape, and resource dynamics in turn affected by crops. Things to add after include:

  1. User’s ability to grow more crops or shoot resources
  2. Resource movement behaviour that seeks out resources

I’m not sure which to tackle first just yet – perhaps the former because the latter doesn’t seem necessary now.

Update: 3 MAR 2017

Resource-landscape interactions

Having now resolved the issue concerning multi-layered landscapes, it’s time to actually use one of these layers in the model. The goal here is to do the following:

  1. Allow resources of any given type (e.g., individuals in a population of conservation interest) to affect values on the landscape (e.g., representing possible crop yield).
  2. Have values on the landscape affect resources, potentially in one or all of the ways below.
    • Affect individual death rate
    • Affect individual birth rate
    • Affect individual movement

It would be nice if, for example, individuals could have their probability of death decrease if they are on a cell of high value (modelling increased food consumption), or their probability of giving birth (or number of offspring) increase. Movement rules could also allow individuals to gravitate towards high value cells (or stop when landing on one), thereby modelling behavioural change to move toward areas where opportunities for foraging (or nesting, or something else) are high. This could affect consumption of food on different landscape types (e.g., cropland) and hence make it possible to also model management strategies of diversionary feeding.

To incorporate the above, a new function in c is going to be needed that models interaction between resources and landscapes. This function will require input of:

I will program this in a flexible way within c, and use some default features that will probably decrease a trait and landscape value by a uniform proportion each time (which seems intuitively more reasonable than a uniform value if we’re thinking about probabilities of mortality and proportion of food on a landscape eaten). Key options will be called from R.

Progress on resource-landscape interactions

The initial code to allow interaction is written in the form of the following function, located on a local branch (not pushed on GitHub).

/* =============================================================================
 * This function reads in resources and landscape values, then determines how
 * each should affect the other based on resource position and trait values
 * Inputs include:
 *     resource_array: resource array of individuals to interact
 *     resource_type_col: which type column defines the type of resource
 *     resource_type: type of resources to do the interacting
 *     resource_col: the column of the resources that affects or is affected
 *     rows: the number of resources (represented by rows) in the array
 *     resource_effect: the column of the resources of landscape effect size
 *     landscape: landscape array of cell values that affect individuals
 *     landscape_layer: layer of the landscape that is affected
 * ========================================================================== */
void res_landscape_interaction(double **resource_array, int resource_type_col,
                               int resource_type, int resource_col, int rows,
                               int resource_effect, double ***landscape, 
                               int landscape_layer){
    
    int resource;
    int x_pos, y_pos;
    double c_rate;
    double current_val;
    double esize;
    
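    for(resource = 0; resource < rows; resource++){
        /* Only resources of the focal type interact with the landscape */
        if(resource_array[resource][resource_type_col] == resource_type){
            /* Columns 4 and 5 hold the x and y location; 14 holds c_rate */
            x_pos  = (int) resource_array[resource][4];
            y_pos  = (int) resource_array[resource][5];
            c_rate = resource_array[resource][14];
        
            /* The cell value decreases by the proportion c_rate */
            landscape[x_pos][y_pos][landscape_layer] *= (1 - c_rate);

            /* The trait moves toward 1 by the proportion esize of what remains */
            current_val = resource_array[resource][resource_col];
            esize       = resource_array[resource][resource_effect];
            resource_array[resource][resource_col] += (1 - current_val) * esize;
 
        }                
    }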
}

This needs to be tested more carefully – for some reason both layers are being affected, and I need to make sure that the landscape is being read in correctly.
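One way to narrow this down is a small stand-alone check, sketched below under some loud assumptions: the function is copied from above (with explicit casts added), the landscape is indexed as land[x][y][layer], and the helper untouched_layer_changes, the 2 by 2 landscape, and the column choices 9 and 10 are all hypothetical and not part of G-MSE. If a check like this passes, the interaction logic itself leaves the other layer alone, which would point toward how the landscape is read in.

```c
/* Copy of the interaction function above, with the hard-coded columns kept. */
void res_landscape_interaction(double **resource_array, int resource_type_col,
                               int resource_type, int resource_col, int rows,
                               int resource_effect, double ***landscape,
                               int landscape_layer){
    int resource, x_pos, y_pos;
    double c_rate, current_val, esize;

    for(resource = 0; resource < rows; resource++){
        if(resource_array[resource][resource_type_col] == resource_type){
            x_pos  = (int) resource_array[resource][4];
            y_pos  = (int) resource_array[resource][5];
            c_rate = resource_array[resource][14];
            landscape[x_pos][y_pos][landscape_layer] *= (1 - c_rate);
            current_val = resource_array[resource][resource_col];
            esize       = resource_array[resource][resource_effect];
            resource_array[resource][resource_col] += (1 - current_val) * esize;
        }
    }
}

/* Hypothetical check: builds a 2 x 2 landscape with two layers (all cells 1),
 * places one resource on cell (1, 0), runs the interaction on layer 1, and
 * returns how many cells of layer 0 changed. The value of the affected cell
 * in layer 1 is written to affected_cell. */
int untouched_layer_changes(double *affected_cell){
    int x, y, changed = 0;
    double cells[2][2][2];     /* cells[x][y][layer] */
    double *ycols[2][2];
    double **xrows[2];
    double res[15] = {0};      /* one resource with columns 0 to 14 */
    double *res_rows[1];

    for(x = 0; x < 2; x++){
        for(y = 0; y < 2; y++){
            cells[x][y][0] = 1.0;
            cells[x][y][1] = 1.0;
            ycols[x][y]    = cells[x][y];
        }
        xrows[x] = ycols[x];
    }
    res[0]  = 1;     /* type (column 0 is used as the type column here) */
    res[4]  = 1;     /* x location */
    res[5]  = 0;     /* y location */
    res[9]  = 0.5;   /* trait to be changed (arbitrary column choice) */
    res[10] = 0.1;   /* effect size on the trait (arbitrary column choice) */
    res[14] = 0.1;   /* proportion of the cell value removed */
    res_rows[0] = res;

    res_landscape_interaction(res_rows, 0, 1, 9, 1, 10, xrows, 1);

    for(x = 0; x < 2; x++){
        for(y = 0; y < 2; y++){
            changed += (cells[x][y][0] != 1.0);
        }
    }
    *affected_cell = cells[1][0][1];
    return changed;
}
```

Calling untouched_layer_changes(&v) should return zero changed cells in layer 0, with v holding the reduced value of the one affected cell in layer 1.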

Update: 24 FEB 2017

RESOLVED ISSUE #14: Success on multi-layered landscapes

Initial testing suggests that I have successfully coded landscapes into G-MSE that have more than one layer. The G-MSE program now initialises (for the moment) landscapes that have depth of two layers, such as the below.

## , , 1
## 
##       [,1] [,2] [,3] [,4] [,5] [,6] [,7] [,8] [,9] [,10]
##  [1,]    9    3    7    8    3    2    8    4    1     1
##  [2,]    8    5   10    6    5   10    6    1    9     2
##  [3,]    6    8    5    9    2    3    4    9    7     9
##  [4,]    9    2    8    9   10    4    8    7    2     1
##  [5,]    1    7    5    3    9    9    3    7    1     1
##  [6,]    3    7    4   10    7    5    9    1    8     8
##  [7,]   10    9    4    4    5    7   10    8    3     5
##  [8,]    3    4    4    2    5    5    2    5    7     2
##  [9,]    3    3    3    7    6    5    3   10    9     7
## [10,]    8    8    2    7    4    7    7    8   10     6
## 
## , , 2
## 
##       [,1] [,2] [,3] [,4] [,5] [,6] [,7] [,8] [,9] [,10]
##  [1,] 0.99 0.76 0.12 0.83 0.67 0.28 0.26 0.30 0.51  0.68
##  [2,] 0.68 0.71 0.79 0.74 0.28 0.66 0.53 0.59 0.18  0.39
##  [3,] 0.85 0.42 0.56 0.50 0.92 0.05 0.78 0.98 0.14  0.59
##  [4,] 0.90 0.29 0.42 0.82 0.07 0.29 0.37 0.95 0.32  0.14
##  [5,] 0.63 0.11 0.33 0.46 0.18 0.26 0.73 0.66 0.77  0.42
##  [6,] 0.36 0.69 0.15 0.96 0.94 0.52 0.04 0.27 0.79  0.83
##  [7,] 0.33 0.85 0.16 0.30 0.99 0.95 0.44 0.27 0.41  0.44
##  [8,] 0.93 0.68 0.75 0.15 0.67 0.20 0.67 0.71 0.91  0.87
##  [9,] 0.06 0.53 0.94 0.13 0.14 0.14 0.42 0.22 0.12  0.31
## [10,] 0.28 0.46 0.27 0.01 0.86 0.68 0.05 0.69 0.18  0.33

Hence, we can now have different layers representing different aspects of the landscape. For example, the first layer of the array above (,,1) could represent the kind of terrain type for each cell, while the second layer (,,2) could represent the potential crop yield of the cell. The resource function now returns both the resource array and the multi-layer landscape, meaning the code structure is now in place to do some actual biology. We might have resources located on a particular cell increase or decrease the values of one or more layers. This can then be returned as information to agents or retained somehow. It might get fairly memory-intensive if G-MSE is saving every layer of landscape for every time step, so it’s worth thinking about how to use the dynamic landscape in each generation.

Running valgrind on the new program reassures me that I’ve not done anything too bone-headed in allocating memory for a three dimensional landscape array.

 R -d "valgrind --tool=memcheck --leak-check=yes --track-origins=yes" --vanilla < gmse.R

It all appears to be freed successfully, with no memory leaks picked up.

==27105== 
==27105== HEAP SUMMARY:
==27105==     in use at exit: 78,219,488 bytes in 16,679 blocks
==27105==   total heap usage: 2,791,027 allocs, 2,774,348 frees, 621,207,897 bytes allocated
==27105== 
==27105== LEAK SUMMARY:
==27105==    definitely lost: 0 bytes in 0 blocks
==27105==    indirectly lost: 0 bytes in 0 blocks
==27105==      possibly lost: 0 bytes in 0 blocks
==27105==    still reachable: 78,219,488 bytes in 16,679 blocks
==27105==         suppressed: 0 bytes in 0 blocks
==27105== Reachable blocks (those to which a pointer was found) are not shown.
==27105== To see them, rerun with: --leak-check=full --show-leak-kinds=all
==27105== 
==27105== For counts of detected and suppressed errors, rerun with: -v
==27105== ERROR SUMMARY: 0 errors from 0 contexts (suppressed: 0 from 0)

Hence, I have merged from a local branch to branch dev. I have also read in the new landscape into the observation function.

Where all of this is going now

Now that I have the hang of returning multiple elements from C to R simultaneously (which is accomplished basically by allocating a SEXP as a list of type VECSXP, each element of which can be an array), it will be easier to think about the code more holistically – each part of the model can potentially take in and return every type of object, hence there are no restrictions on what one model can affect. To model geese, which is probably the first type of conflict I’m tempted to look at, I can use the population model to allow landscape layers to affect probability of geese mortality and for geese (RESOURCE array) in turn to affect the landscape, and hence crop output. Next week, I’m hoping to get the required code for doing this in place, and to also get some feedback regarding how utility functions of agents should be modelled at the upcoming workshop after my presentation. The game-theoretic component can probably be a work in progress though, and it should be possible to model geese without adding these complexities until after receiving feedback, though thinking about the game-theoretic algorithm and data structures continues to be a high priority of mine.

Update: 23 FEB 2017

Future decision-making algorithm

A recent paper by Miyasaka et al. (2017) looks at land use in a social-ecological system using an agent-based model and some interesting decision-making rules. Individuals calculate utility “(expressed in terms of probability) for all land-use and location options […] and select the option with the highest utility”. Basically, agents in this model rank probabilities of all land-use options, then try the one with highest probability, then go down the line if the first is not successful (I assume that the payoffs after success are identical, though this isn’t entirely clear to me). Agents also shift decisions and labour allocation based on similar households (imitation).

Coding goals for the day

  1. Break down move functions into easier to digest pieces by creating functions:
    • movement_dir: Causes movement in the x or y direction
    • edge_effect: Does something at the edge when encountered
  2. Initialise landscape layers

Further progress

Goal 1 has been completed, and, as I’ve been tempted to do before, I have added a new utilities.c file for holding functions that need to be called by other c files (e.g., moving resources).

The next thing I have done (on an unpushed branch, now merged) is to tweak the observation function in C to return a list with two elements. The first element is the set of observations in array form (as before), and the second element is the AGENTS array. With the new c code, I can now return multiple things from C to R with the same function, which opens up some new possibilities, in particular allowing the landscape to change along with the resource and observation arrays within the same C function. It also potentially makes the anecdotal function obsolete, as the change in the AGENTS array can just be made within observation instead of calling a new R function.

The next challenge is to get a multi-layered landscape in and out of C from R. I’m not sure how many ways there are to do this, exactly, but the simplest might actually be to write a three dimensional array to read in the same way as the two dimensional arrays. This will require a very nasty set of loops allocating pointers of pointers of pointers (i.e., ***land), but the idea should be easy enough.
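A minimal sketch of what those loops might look like, assuming the landscape arrives in C as a flat vector of doubles in R’s column-major order (so element [x, y, layer], counting from zero, sits at position x + y * xdim + layer * xdim * ydim). The function names land_from_vector and free_land are hypothetical, and checks on failed memory allocation are left out to keep the idea visible.

```c
#include <stdlib.h>

/* Builds land[x][y][layer] from a flat vector stored in column-major order.
 * Allocation failures are not checked in this sketch. */
double ***land_from_vector(const double *vec, int xdim, int ydim, int layers){
    int x, y, layer;
    double ***land;

    land = malloc(xdim * sizeof(double **));
    for(x = 0; x < xdim; x++){
        land[x] = malloc(ydim * sizeof(double *));
        for(y = 0; y < ydim; y++){
            land[x][y] = malloc(layers * sizeof(double));
            for(layer = 0; layer < layers; layer++){
                land[x][y][layer] = vec[x + (y * xdim) + (layer * xdim * ydim)];
            }
        }
    }
    return land;
}

/* Frees the landscape in the reverse order of allocation. */
void free_land(double ***land, int xdim, int ydim){
    int x, y;

    for(x = 0; x < xdim; x++){
        for(y = 0; y < ydim; y++){
            free(land[x][y]);
        }
        free(land[x]);
    }
    free(land);
}
```

Every malloc has a matching free here, so a valgrind run over code like this should report nothing definitely lost once free_land has been called.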

Update: 22 FEB 2017

Moving back to machine learning in G-MSE

Having now completed a new R package modelling a genetic algorithm for iterated games played on \(2 \times 2\) symmetric payoff matrices, and applied this package to an upcoming presentation on the future of G-MSE, I now turn back to coming up with a functional genetic algorithm for G-MSE software.

Additional issues

Currently, there are five outstanding issues in G-MSE. Issues 9, 11, and 12 are rather trivial and easy to implement. Issues 10 (dealing with the implementation of multiple resources) and 14 (dealing with multiple landscape dimensions) require more serious consideration. Perhaps partly because of the recent ConFooBio focus on geese and the sizeable special issue in Ambio that just came out, I am thinking about first coding in multiple layers to LANDSCAPE. As noted in issue 10, this can be done fairly straightforwardly in R and C – the different layers can simply be list elements in R (so LANDSCAPE[[i]] is one layer that is actually a matrix of real values) and read in as a three dimensional array in C. I was able to do this while making the gamesGA package, with the agents array being set in R, then being unlisted with unlist() before being changed into array form and read into C in fitness.R. This isn’t a particularly elegant solution, but it’s one that could work with code such as the following:

land_r_vec <- unlist(LANDSCAPE_r);
# Rows and columns come from the first layer; one slice per list element
land_r_arr <- array(data = land_r_vec, 
                    dim  = c(dim(LANDSCAPE_r[[1]]), length(LANDSCAPE_r)));

Alternatively, it would probably be better in the long run for efficiency (though I doubt that the above call would lose much) if the lists were read directly into C and returned as lists. I’m not sure if this is possible, but if it is, I’ll have to make use of C data structures that are read in from R’s C interface.

The reason that I’m keen to start with the landscape layer implementation is that I think this might be the best way to initially model crop production. The more flexible way to do it would be to put crops in the RESOURCE array, but this would require much more memory and computation time than I really think is necessary for what, in all cases that I can currently conceive, really comes down to just a real number at a location. By adding this real number to the landscape and letting it be increased or decreased by agents and resources, we can have the most straightforward method of modelling crop production as affected by farmers, managers, and animals. Another layer, however, is potentially needed mapping x and y locations to ownership of a particular stake-holder. Thus, I can imagine a landscape with three layers, the combination of which will let us address some basic questions concerning conflict between farmers and conservationists:

## [[1]]
##       [,1] [,2] [,3] [,4] [,5] [,6] [,7] [,8] [,9] [,10]
##  [1,]    1    2    4    2    4    4    3    4    1     1
##  [2,]    2    1    1    1    1    1    1    3    4     2
##  [3,]    4    4    1    2    2    1    1    3    1     1
##  [4,]    1    2    4    3    4    3    1    2    1     2
##  [5,]    3    3    4    2    2    2    4    3    3     3
##  [6,]    1    1    3    2    3    2    1    1    3     3
##  [7,]    4    3    4    4    2    4    2    2    3     3
##  [8,]    4    4    4    3    2    2    2    2    2     3
##  [9,]    3    3    3    3    2    3    1    4    4     2
## [10,]    4    3    1    4    2    1    4    3    4     4
## 
## [[2]]
##            [,1]       [,2]       [,3]       [,4]      [,5]      [,6]
##  [1,] 0.1216903 0.87620468 0.11585701 0.63549572 0.5499610 0.7247075
##  [2,] 0.7033615 0.19920331 0.16832448 0.75077540 0.2100236 0.4698127
##  [3,] 0.5296123 0.67776576 0.53901556 0.66568077 0.3636397 0.1929119
##  [4,] 0.3904591 0.35405551 0.05141652 0.02382895 0.7146165 0.4595963
##  [5,] 0.7282723 0.87061639 0.52147139 0.95831419 0.0962034 0.1766737
##  [6,] 0.5647454 0.71690912 0.14866888 0.24325914 0.1882054 0.2663904
##  [7,] 0.9803632 0.09975753 0.61459262 0.06857492 0.8732086 0.6268729
##  [8,] 0.5676452 0.69077684 0.57366414 0.18195190 0.1786565 0.3679847
##  [9,] 0.4064361 0.61380977 0.36716846 0.89387672 0.3675970 0.6841347
## [10,] 0.6857836 0.81209502 0.99363538 0.94351049 0.5384275 0.3058252
##             [,7]       [,8]       [,9]     [,10]
##  [1,] 0.08078416 0.40033129 0.36557829 0.3415115
##  [2,] 0.32981568 0.44815580 0.66727945 0.6074423
##  [3,] 0.27434346 0.55946386 0.35760666 0.1560804
##  [4,] 0.65027376 0.85410670 0.02720575 0.7191878
##  [5,] 0.10279874 0.10008282 0.38078379 0.4057249
##  [6,] 0.55986298 0.08748274 0.29607922 0.0820963
##  [7,] 0.64130285 0.14691714 0.28797907 0.6648468
##  [8,] 0.23655619 0.42326755 0.91423327 0.1083085
##  [9,] 0.06163966 0.15931278 0.39174257 0.8304182
## [10,] 0.71175442 0.02287805 0.28183796 0.2976164
## 
## [[3]]
##       [,1] [,2] [,3] [,4] [,5] [,6] [,7] [,8] [,9] [,10]
##  [1,]    1    1    2    1    2    1    1    2    1     2
##  [2,]    2    1    2    1    2    2    1    2    1     2
##  [3,]    2    2    1    2    1    1    2    2    2     1
##  [4,]    2    2    2    2    2    1    2    2    1     1
##  [5,]    1    2    2    1    1    1    2    2    2     2
##  [6,]    1    2    1    2    1    1    1    2    1     1
##  [7,]    1    2    1    1    2    2    1    2    2     1
##  [8,]    1    1    2    2    1    1    1    2    1     2
##  [9,]    2    1    1    1    2    2    1    1    2     1
## [10,]    2    2    1    1    2    2    2    2    2     2

Where, above, the first element (i.e., layer) is terrain type (already in G-MSE), the second element is the production potential of crops, and the third element is the stake-holder that owns the land (a zero could be included for public land). It would make sense if the land was contiguous – I don’t see a good algorithm for this, so it might be necessary to make one. I could imagine something that breaks down the map into equal segments (e.g., like programs created to avoid gerrymandering, but easier because we don’t have to worry about population size – at the moment). Of course, if the number of cells does not evenly divide by the number of simulated farmers, then some farmers are going to have bigger farms than others, but perhaps we want this to be the case? It would be nice to be able to specify the variation in farm size. In fact, it would probably be good to use this to also incorporate the total amount of farm space. Something like the below:

#farmers     <- 10;
#total_cells <- dim(landscape[[3]])[1] * dim(landscape[[3]])[2]
#pr_farmland <- 0.9;
#farm_cells  <- floor( pr_farmland * total_cells );
#extras      <- farm_cells - farmers; # Every farmer needs at least one cell
#farm_prp    <- rep(x = extras / farmers, times = farmers); # Vec can change
#farm_cells  <- sample(x = farmers, size = extras, 
#                      prob = farm_prp, replace = TRUE);
#farm_cells  <- c(farm_cells, 1:farmers);
#cell_table  <- table(farm_cells);
#print(cell_table);

NOTE: The above has been commented out due to errors in making the page, for some reason.

The above cell_table therefore shows how many cells each farmer gets. Some sort of (simple) cluster algorithm is needed to distribute each farmer’s allocated cells to an area of the landscape. Remaining cells will be 0 cells, indicating other land. Note that the pr_farmland could also be a function of the number of cell types in landscape[[1]]. I haven’t decided if this should be done in R (as above) or C. I’m leaning toward C (or, at least, kicking things to C when they get complicated) because I can imagine the need to specify some detail in these landscapes.
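If the counting step were kicked to C, a rough equivalent of the R lines above might look like the sketch below. The function name allocate_farm_cells and its return convention are hypothetical, rand() is used only as a placeholder random number generator (it is very slightly non-uniform with the modulo), and the clustering of each farmer’s cells on the landscape is not attempted here.

```c
#include <stdlib.h>

/* Gives every farmer one cell, then hands the remaining farm cells out one
 * at a time to randomly chosen farmers. Returns the number of farm cells
 * allocated, or -1 if there are fewer farm cells than farmers. */
int allocate_farm_cells(int farmers, int total_cells, double pr_farmland,
                        int *cells_per_farmer){
    int farmer, extra;
    int farm_cells = (int) (pr_farmland * total_cells); /* rounds down */
    int extras     = farm_cells - farmers;

    if(farmers < 1 || extras < 0){
        return -1;
    }
    for(farmer = 0; farmer < farmers; farmer++){
        cells_per_farmer[farmer] = 1;  /* Every farmer needs at least one */
    }
    for(extra = 0; extra < extras; extra++){
        farmer = rand() % farmers;     /* Equal chance for each farmer */
        cells_per_farmer[farmer]++;
    }
    return farm_cells;
}
```

Unequal farm sizes could be introduced by replacing the equal-chance draw with a weighted one, which is the role that farm_prp plays in the R version.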

Landscape connections to birth rate

Given the above proposed additions to the landscape, it’s worth perhaps considering an option in the resource function to link resource birth rate to properties of the landscape. As of now, carrying capacity is just assumed to be a parameter of the model that is static, but it would be particularly interesting if the parameter value could change based on properties of the landscape. For example, all of the values of landscape[[2]] could be summed up (perhaps multiplied by a scalar) to determine number of offspring produced on any given cell. If, instead of random uniforms between zero and one, landscape[[2]] represented something more concrete such as kilograms of edible biomass produced (or whatever), this could be directly translated to offspring reared. Of course, with the geese, this opens up some issues – mainly that breeding is done somewhere else; perhaps landscape[[2]] should affect survival instead of birth rate?
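As a sketch of the summing idea only, assuming the landscape is held in C as land[x][y][layer]: the function name layer_carrying_capacity and the scalar argument are hypothetical, and whether the result should feed into birth rate or survival is left open, as above.

```c
/* Sums one landscape layer and scales the total. The scalar converts summed
 * cell values (e.g., biomass) into a number of individuals supported. */
double layer_carrying_capacity(double ***land, int xdim, int ydim, int layer,
                               double scalar){
    int x, y;
    double total = 0.0;

    for(x = 0; x < xdim; x++){
        for(y = 0; y < ydim; y++){
            total += land[x][y][layer];
        }
    }
    return scalar * total;
}
```

Because the value is recomputed from the current landscape, calling this once per time step would let carrying capacity fall as resources deplete the layer.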

It’s important to think about the scale here too – as a habit, I’ve been thinking of cells as kind of mid-sized things, perhaps a square kilometer at most, but it might be better to think of them as much larger, so each cell could be its own farm with potentially many geese. Of course, we’ll want the option to do both, but given the scale of the geese scenario, I’m thinking bigger might be better. It also would be useful to have managers be able to allocate refuge space from their budgets.

Update: 17 FEB 2017

Big picture notes regarding G-MSE presentation

In presenting G-MSE, I think it is important to emphasise that game theory is the standard, formal tool for understanding conflict between rational agents. Hence, it is the natural tool for addressing issues of cooperation and conflict in conservation (Colyvan et al. 2011; Lee 2012; Kark et al. 2015; Adami et al. 2016; Tilman et al. 2016). It’s important to also recognise that game theory is broader in scope than the application of standard pay-off matrices, and includes extensions such as adaptive dynamics. Where strategies are complex, machine learning techniques such as the use of genetic algorithms can be used to find adaptive strategies for games (e.g., Balmann and Happe 2000; Tu et al. 2000). And a game-theoretic framework is entirely compatible with agent-based modelling (An 2012; Tesfatsion et al. 2017). Bonabeau (2002), citing Axelrod, argues strongly for an agent-based approach to game theory. Hence, a good summary of the key concepts of G-MSE is shown below.

G-MSE flowchart

The big green circle is the engine that drives the decision-making of rational agents (i.e., managers and stake-holders) under complex choices and payoffs.

Update: 16 FEB 2017

New considerations for machine learning in gmse

Finishing the gamesGA R package has given me a bit more perspective on the eventual structure of the genetic algorithm of gmse. Taking into account the history of interactions between two agents was straightforward in the case of Prisoner’s dilemma, or any symmetrical \(2 \times 2\) payoff matrix. Because there were only two options to consider (‘cooperate’ and ‘defect’), every locus of each agent’s strategy just represented a response to a different interaction history. By changing the default parameters, I was also able to recreate the results of Darwen and Yao (1995), who found that strategy evolution under the following payoff values tended to result in long periods of defection or cooperation punctuated by rapid transitions from one to the other.

                          Opponent cooperates   Opponent defects
Focal player cooperates   3, 3                  0, 5
Focal player defects      5, 0                  1, 1
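To make the idea of loci as responses to interaction histories concrete, here is a minimal sketch of an iterated game between two strategies using the payoffs above (the first value of each pair is the focal player’s payoff). The encoding is an assumption for illustration and may well differ from what gamesGA actually does: each strategy has one locus per possible sequence of the opponent’s last three moves, and both players start as if the opponent had always cooperated. The function name play_game is hypothetical.

```c
#define MEMORY 3                /* Opponent moves remembered */
#define LOCI   (1 << MEMORY)    /* One response per possible history */

/* Payoff to the focal player: rows are the focal move, columns the opponent
 * move, with 0 = cooperate and 1 = defect. */
static const int PAYOFF[2][2] = { {3, 0}, {5, 1} };

/* Plays `rounds` rounds between two strategies and records total payoffs.
 * A strategy is an array of LOCI moves; locus i is the response to the
 * opponent history whose bits (oldest to newest) spell out i.
 * Assumption: histories start at 0, i.e., as if the opponent had always
 * cooperated. */
void play_game(const int *strat_a, const int *strat_b, int rounds,
               int *fit_a, int *fit_b){
    int round, move_a, move_b;
    int hist_a = 0;         /* What A remembers of B's moves */
    int hist_b = 0;         /* What B remembers of A's moves */
    int mask   = LOCI - 1;  /* Keeps only the last MEMORY moves */

    *fit_a = 0;
    *fit_b = 0;
    for(round = 0; round < rounds; round++){
        move_a  = strat_a[hist_a];
        move_b  = strat_b[hist_b];
        *fit_a += PAYOFF[move_a][move_b];
        *fit_b += PAYOFF[move_b][move_a];
        hist_a  = ((hist_a << 1) | move_b) & mask;
        hist_b  = ((hist_b << 1) | move_a) & mask;
    }
}
```

Under this encoding, tit-for-tat is the strategy whose locus i equals the newest bit of i (i & 1), and always-defect has a 1 at every locus. Over 100 rounds, two tit-for-tat players earn 300 each, whereas tit-for-tat against always-defect earns 99 to the defector’s 104.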

Typical evolution of strategies given the above payoff matrix and a three-move memory history with 100 rounds per opponent looks like the below. Periods of low fitness show areas where most strategies have evolved to defect, while periods of high fitness show areas where most strategies cooperate.

Typical evolution of cooperation in gamesGA simulation of Prisoner’s dilemma

This simple example highlights something that is potentially important for understanding conflict in conservation scenarios: cooperation (and conflict) might be fragile, with rapid shifts from one strategy dominating to another taking over without much external pressure.

The fragility or robustness of conflict in conservation could have major influences on policy, particularly where we’re interested in coming up with long-term sustainable solutions. Two key questions immediately come to mind:

  1. How does the (lack of) robustness of cooperation in the simplified model of gamesGA scale up with complexity? That is, real-world conflicts are much more complex than this simplified model, so will this complexity make existing cooperation and conflict more robust, or more fragile? We can draw a comparison here, perhaps, with the community ecology literature, where the question has long been posed: are more complex communities more stable and more productive? There is a lot of recent literature on this, both from theoretical and empirical studies, and stretching all the way back to the early works of Elton and May. Applying similar ideas to social-ecological systems could be useful – it could be that, like community ecology, there are qualities of such systems that make cooperation or conflict more robust (I’ve been particularly interested in degeneracy). I’ll revisit some of the community ecology literature to remind myself what the key points are.
  2. How do external influences, including management actions, affect the robustness of cooperation? If we find that cooperation is fragile in social-ecological systems, are there some kinds of management options that tend to reduce that fragility, or at least slow down what might otherwise be a rapid shift from cooperation to conflict (or hasten a rapid shift from conflict to cooperation)? This is one area in which a machine learning and game-theoretic approach could really shine, producing a general framework for exploring the robustness of cooperation and conflict to changing conditions.

I’m wondering whether a couple of papers could be especially useful to the conservation literature – one could be a perspective piece just on the application of game theory to understanding conflict and management, and things that will need to be taken into account (more on this later – but would include time lags, interactions among stake-holders, degeneracy of effects, etc.); the idea of applying game theory to management questions and conflict is now familiar to ConFooBio, but a lot of the thinking we’ve done could risk being lost if not published as a lead-in to more complex modelling, behavioural games, or time-series analyses. A second paper could be a basic starting point for addressing how robust cooperation and conflict are predicted to be – this might be answerable without the full power of a complex gmse software, focusing instead on modelling some simplified scenarios (using a version of gmse with a more simplified ‘g’), then (probably) concluding that additional work will be needed to really get at complex real-world case studies (which we’ll do with gmse).

Other thoughts on strategy

It’s also worth noting that gamesGA does not allow for some strategies that might be relevant, such as the ‘win-stay’ and ‘lose-shift’ strategy (Nowak et al. 1995). There is also considerable work on the robustness of cooperation in games such as Prisoner’s dilemma – a lot of which considers spatial effects (local networks, grid-based cooperation) explicitly. I’ve not found anything that looks at this in the context of conservation though, so I think there could well be scope for a high-level perspective paper here. It could also be worth considering other types of games, such as the snowdrift (chicken, hawk-dove) game, in which cooperation and conflict are potential outcomes.

                          Opponent cooperates   Opponent defects
Focal player cooperates   2, 2                  3, 1
Focal player defects      1, 3                  0, 0

The above payoff matrix produces even more fragile results (shown below).

Typical evolution of cooperation in gamesGA simulation of the snowdrift game.

The robustness of these results gets stronger though when the payoff differences get more severe so that mutual defection is much worse. For example, consider the following payoff matrix.

                          Opponent cooperates   Opponent defects
Focal player cooperates   12, 12                13, 11
Focal player defects      11, 13                0, 0

Defection given the payoffs above has a much more difficult time getting a foothold because anytime defectors become sufficiently frequent in a population, their fitness drops dramatically compared to cooperators.

Typical evolution of cooperation in gamesGA simulation of the snowdrift game with a major penalty for mutual defection.

The point is that the differences between payoff values will matter by increasing the risk associated with defection (or cooperation). Note that values near zero in the first plot implied a population of mostly defectors, whereas equal magnitude of change in mean fitness does not correspond to as great a difference in the proportion of cooperators and defectors in the second figure.

A perspective paper on theory of conflict and cooperation in conservation?

Given that there is little to nothing on application of game theory to conflict and cooperation in conservation, it strikes me that a forward-looking paper could be useful for establishing some things – perhaps making use of the gamesGA R package as a conceptual tool for demonstrating some key points. Relevant topics include:

  1. Background on conflict and cooperation in the context of conservation, particularly biodiversity and food security
  2. Background on game theory, and its use in understanding the logic of conflict and cooperation in both biological and social systems
  3. Key questions in applying game theory to conflict and cooperation in conservation social-ecological systems.

    • At what scale(s) should we model conflict – at the individual or institutional level, or both?
    • How robust will sustained cooperation and conflict be in social-ecological systems, as affected by both internal and external pressures? Here is a place to insert gamesGA point about robustness.
    • How much complexity do we need to build into models before it becomes possible to predict the outcomes of management policies with any degree of confidence, especially given the inherent biological and social uncertainties involved?
  4. Specific points regarding the complexity inherent to predicting social-ecological conflict: how to address these in a way that can be beneficial for management recommendations

    • Machine learning and genetic algorithms: flexible tools for understanding and predicting complex strategies. Discuss how these tools are already used successfully in other contexts.
    • Agent-based modelling as an approach to simulating realistic scenarios of cooperation and conflict
    • Explicit consideration of degeneracy as a potential management strategy for increasing the robustness of cooperation (see recent work by Man et al. 2016)
  5. The ultimate goal: towards a modelling framework that simulates adaptive management of populations under the influence of conflicting stake-holders, and is capable of simulating management options in silico to predict the efficacy of policy.

I’m not sure if this is the best structure or not, but I think I could see a paper like this succeeding in setting up the importance for future work.

Update: 13 FEB 2017

gamesGA: a new R package that also can be run from a browser

In preparation for two upcoming workshop talks, I have developed a new R package to demonstrate the potential of machine learning and genetic algorithms in understanding human conflicts between food security and biodiversity. Package installation instructions are available on the GitHub repository, and the program can also be run directly from a browser courtesy of shiny. This package is mostly to serve as a proof of concept; while limited in its applications (though it could later be incorporated into gmse, if desired), it demonstrates a relevant and flexible application of machine learning to game theory. Further, the fact that the processing time of simulations is very rapid – not noticeable even when run from a browser (note, the fitness function is coded in c; had it been coded in R, most simulations would take up to a minute), shows that it is realistic to simulate multiple genetic algorithms (underlying multiple agent strategies) within a program. I have no desire to upload gamesGA onto the CRAN Repository, unless it is requested.

In the coming days, I will continue to put together a forward-looking talk that outlines how management strategy evaluation can be combined with game theory (making use of genetic algorithms to understand behaviour) to better understand and potentially help resolve conflicts over food security and biodiversity. I think that it will be reasonable to argue that the range of strategies predicted by even a simple iterated Prisoner’s dilemma (or any other two player two decision symmetric payoff scenario) probably reflect, reasonably well, the kind of variation in human behaviour that might be predicted in real systems. Most humans will not act completely rationally, therefore we might expect a lot of uncertainty in human behaviour where conflict arises; most strategies will be aligned with the interests of individual stake-holders, even if they are not perfect at maximising stake-holder interests.

Update: 8 FEB 2017


Project Summary: G-MSE v0.0.52

A central purpose of G-MSE software will be to provide a user-friendly yet flexible tool for simulating the management of populations, with particular emphasis on a mechanistic simulation of uncertainty and interactions among managers and stake-holders. Hence, the software will be able to address key questions concerning conflict in all of the specific ConFooBio case studies, but also provide a general framework for developing social-ecological theory. My hope is to introduce v1.0 by the end of the year, which will take advantage of shiny to let users run simulations and view results within a browser, giving as many users as possible access to the key features of G-MSE. Because shiny is called directly from R, users who are familiar with R will be able to use functions within a gmse package (the name is not yet taken on the CRAN list). All of the code underlying G-MSE, and its complete development history, will be available on GitHub for maintenance, further development, and collaboration (currently, the repository is private, meaning it is viewable by invite only, but I’ve no qualms with making it public). My goal now is to summarise what aspects of G-MSE have already been developed, and to outline my plans for future development. Feedback at this stage is very welcome, particularly concerning what features of the software are most (or least) important. The figure below illustrates a general overview of G-MSE. The left panel represents how users will interact with the software, and the right panel represents the model itself, which uses the G-MSE concept proposed in the ConFooBio ERC proposal.

What I’ve done so far

We now have a working, stable (i.e., bug-free, as far as I can tell), resource model (blue box above) and observation model (yellow box above). Details of how these models are coded and used can be found in the notes below, and I am happy to summarise them. For now, I will avoid the technical details and focus on what these models can do; the code is written with future development in mind, meaning that if there is a feature that is not in either of these models that should be, adding it will almost certainly be a matter of inserting a bit of additional code rather than re-coding major chunks of the model. I’ll start by talking about the resource model.

The resource model is, by default, individual-based. What this means is that each resource is represented as a discrete entity with a number of different attributes. I use ‘resource’ as a general term because these resources can really be anything that we want them to be; potential resources include grouse, hen harriers, geese, fish, elephants, crops, hunting licenses, etc. Basically, anything that we want to represent discretely that is not an agent (a manager or stake-holder) can be considered a resource. Each resource has its own ID, and can take any natural number as its type in each of three type categories (i.e., type1 can take any natural number to group resources in some way, as can type2 and type3). Types could be used for different populations of resources within the same simulation (e.g., hen harriers and grouse; wild and farmed salmon), or further define life-history stages, sex, or something else.

Resources occupy some x y location on a landscape. The landscape can be of any length and width combination, and has a torus edge whereby opposite edges are joined so that resources moving off one side appear on the other (I can easily add a hard boundary, or a reflecting edge if desired). Currently, the landscape has one layer (more could be added), with cells on the landscape taking any real number. I’m not using cell values at the moment, but these could represent anything from terrain types to environmental variables. During one iteration of the resource() function, resources move according to one of four pre-specified rules:

  1. No movement
  2. Random movement in any x and y direction selected from a uniform distribution.
  3. Random movement in any x and y direction selected from a Poisson distribution.
  4. Movement a Poisson-selected number of times, each time moving randomly in any x and y direction selected from a uniform distribution (Duthie and Falcy 2013).
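
As a rough sketch of how the four rules could look in R (this is not the G-MSE code itself), the function below moves positions along one landscape dimension and wraps them around the torus. The names move_res, move_type, and max_move are hypothetical, as are the 0 to 3 codes for the rules, and the way that distance and direction are sampled is an assumption. It would be applied separately to x and y locations.

```r
# Hypothetical sketch: move positions along one landscape dimension
# (cells 0 to land_dim - 1) under one of four rules, wrapping around a torus.
move_res <- function(pos, land_dim, move_type = 1, max_move = 1){
    n        <- length(pos);
    rand_dir <- function(k){ sample(x = c(-1, 1), size = k, replace = TRUE); }
    if(move_type == 0){        # Rule 1: no movement
        shift <- rep(0, n);
    }else if(move_type == 1){  # Rule 2: uniform distance in either direction
        shift <- sample(x = -max_move:max_move, size = n, replace = TRUE);
    }else if(move_type == 2){  # Rule 3: Poisson distance in either direction
        shift <- rand_dir(n) * rpois(n = n, lambda = max_move);
    }else{                     # Rule 4: Poisson number of uniform moves
        times <- rpois(n = n, lambda = max_move);
        shift <- vapply(X = times, FUN = function(k){
            sum(sample(x = -max_move:max_move, size = k, replace = TRUE));
        }, FUN.VALUE = numeric(1));
    }
    return( (pos + shift) %% land_dim ); # Modulo joins opposite edges
}
```

The modulo operator is what makes the edge a torus: a resource at cell 0 that moves by -1 on a landscape of 10 cells ends up at cell 9.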

After movement, each resource can potentially reproduce according to a growth parameter. The number of offspring that a resource produces is determined by using this as the rate parameter in sampling from a Poisson distribution. A carrying capacity can be applied to birth such that if too many offspring are born, then offspring are randomly removed until carrying capacity is reached. Offspring resources have identical traits to their parent resources. It is also possible to not allow any birth for some resources.
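
A minimal sketch of the birth step, under the assumption that the carrying capacity on birth applies to parents plus offspring combined (it might instead apply to offspring only). The function and argument names are hypothetical.

```r
# Hypothetical sketch: Poisson births with a carrying capacity applied to birth
add_births <- function(n_parents, growth, birth_K){
    offspring <- rpois(n = n_parents, lambda = growth); # Offspring per parent
    parent_id <- rep(x = seq_len(n_parents), times = offspring);
    room      <- max(birth_K - n_parents, 0);   # Offspring that can be kept
    if(length(parent_id) > room){               # Too many: remove at random
        keep      <- sample.int(n = length(parent_id), size = room);
        parent_id <- parent_id[keep];
    }
    return(parent_id); # One entry per retained offspring (its parent's index)
}
```

Returning the parent index for each offspring makes it easy to copy the parent's traits afterwards, since offspring have identical traits to their parents.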

After birth, resources that were not just born can be removed (i.e., death) in one of three ways:

  1. No removal possible
  2. Removal with some universal, individual-specific probability determined by an individual’s remove trait.
  3. Removal probability determined by a carrying capacity (potentially different from the carrying capacity affecting birth), where probability of removal for any individual resource increases as the population increases above carrying capacity.
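
The three removal options might be sketched as follows. The names are hypothetical, and the form of the carrying capacity rule in particular is an assumption: here every resource is removed with probability (N - K) / N whenever the population size N exceeds the carrying capacity K, which returns the population to K on average.

```r
# Hypothetical sketch: decide which resources are removed (TRUE = removed)
remove_res <- function(remove_pr, death_type = 1, death_K = 800){
    n <- length(remove_pr);
    if(death_type == 0){         # Option 1: no removal possible
        return( rep(FALSE, n) );
    }
    if(death_type == 1){         # Option 2: each resource's own probability
        return( runif(n = n) < remove_pr );
    }
    pr_K <- max( (n - death_K) / n, 0 ); # Option 3 (assumed form)
    return( runif(n = n) < pr_K );
}
```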

The resource model then returns the new set of individuals; we therefore have the basic processes of movement, birth, and death. These processes could be made more complex (e.g., sexual reproduction, more complex movement rules – toward or away from other resources, perhaps), and any number of other processes might be added into the resource model, including interactions between resources, if desired. I’m considering what we have now as a starting point.

The observation model simulates the process of data collection (but not data analysis, which is done elsewhere – eventually probably in the manager model). It basically generates an uncertain snapshot of the real population(s) by sampling from it in one of four ways (A-D):

  • (A) Sampling resources from a subset of the landscape, inferring population size from the density of the subset (Nuno et al. 2013).
  • (B) Marking a random number of resources in the population, then recapturing them for MRM analysis.
  • (C) Sweeping down a linear transect of the landscape – checking every cell while resources are potentially moving (and therefore generating double-counted and missed resources).
  • (D) Sweeping along blocks of the landscape, one at a time, and checking every cell, again while resources are potentially moving.

The figure below shows the dynamics of a real population (black line) with a carrying capacity on death of 800 (dotted red line), as estimated by each method (blue lines, panels A-D).

Figure above shows four different observation types as applied to the same population model: (A) observation type 0 (sample a random region and then extrapolate population size by calculating density), (B) Mark and recapture individuals and estimate population size using a Chapman estimator, (C) sample along a linear transect while resources can move while sampling, and (D) sample blocks where resources can move while sampling
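
For reference, the population size estimates behind panels A and B can be written in a couple of lines. This is a sketch with hypothetical function and argument names rather than the code used to make the figure; the Chapman estimator itself is the standard bias-corrected version of the Lincoln-Petersen estimator.

```r
# Density-based estimate (A): scale the count in the sampled area up to the
# whole landscape
dens_est <- function(count, area_sampled, area_total){
    return( count * (area_total / area_sampled) );
}

# Chapman estimator (B): marks = number marked, caught = number caught in the
# recapture sample, recaught = number of marked resources among those caught
chapman_est <- function(marks, caught, recaught){
    return( ((marks + 1) * (caught + 1) / (recaught + 1)) - 1 );
}
```

For example, counting 20 resources in 100 of 1000 cells gives a density-based estimate of 200 resources.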

We can run 100 time steps of 800 resources in a trivial amount of time (less than half a second) using any observation method. Of course, things slow down when adding more resources or generations, but even hundreds of thousands of resources can be simulated over 100 time steps in under a minute.

In addition to the resource and observation models, I have played around with a few more minor things that can be called on when desired. This includes a function called anecdotal that allows agents (managers and stake-holders) to see any resources within their local view – essentially mimicking anecdotal observation through seeing how many resources are around them at any given time (this might later affect stake-holder attitudes or decisions).

The most interesting other thing that I’ve added is a prompt for user input. Basically, after a certain number of time steps (or right from the start of the simulation), an option in the program allows the user to act like a stake-holder or manager. After a time step has finished, the user is prompted with a message like the following on the R command line:

Year:  95
The manager says the population size is  181
You observe  11  animals on the farm
Enter the number of animals to shoot

The user then types in how many animals that they wish to shoot, and these animals are removed from the population.

What I’m going to do in the future

A detailed journal of recent development history is below. Here I will summarise how I plan to complete the software, and the rationale behind some (tentative) decisions. Right now, I am focused on getting through the main engine of G-MSE (red, green, and orange boxes from the first figure above), with the primary challenge of integrating game theory into G-MSE. The manager and users models are unique in that both require agents to make decisions that potentially affect each other and the resources. I am simulating agents as discrete individuals, but unlike resources, agents have different traits and are represented by different data structures in the code. Like resources, however, agents can take on any number of three different category types. Category type1 is the type of most importance, which is used primarily for distinguishing among managers and different types of stake-holders. The manager(s) is always of type1 = 0, and plays a special role in observing the population, and will make policy decisions based on the observational model and (eventually) the numbers and past behaviour of stake-holders through the manager model. Other type1 agents act as stake-holders instead of managers, and act through the user model.

I’ve spent some time trying to decide how to incorporate game theory into the G-MSE software. There is more than one way to do it. My first thought was to model games using the traditional \(2 \times 2\) payoff structure, with managers setting the payoffs and two stake-holders acting as players trying to maximise their gains. Given this sort of structure, solving for optimal strategies can be easily accomplished, and we can certainly add this type of mathematical solution as an option in G-MSE. The utility of this kind of mathematical approach starts to unravel, however, when games become more complex (discussion and references all below, mostly from late January). In particular, solving for optimal strategies and equilibria (of which there can be multiple) can become increasingly difficult, or even intractable, given any of the following:

  • Asymmetric payoffs: In other words, when the payoffs of choices such as ‘cooperate’ or ‘defect’ are not the same for each player, but depend instead on player type.
  • More than two strategies: If players are not constrained to doing one of two things, but could do three or more things instead, then the number of possible payoff structures and therefore optimal strategies increases exponentially.
  • More than two players: If there are many different players and types of players that affect payoffs (e.g., as Saro noted by including managers in the payoff structure), then finding optimal strategies quickly becomes intractable.
  • Iterated interactions: If players are able to access interaction histories (previous instances of games played; i.e., an extensive-form game), then the number of possible strategies becomes extremely high – too high to enumerate in a lifetime – after even a few rounds of Prisoner’s dilemma.

Any realistic social-ecological conflict is probably going to include one or more of the above complications. While I really like simple mathematical and conceptual models (particularly those that provide unifying explanations), and believe that they are especially useful in developing theory, I don’t believe that the case studies that we are interested in will be tractable if we exclude the above bulleted possibilities. Nor will the software be very flexible if we confine users to very simple games. Hence, I think a different approach is needed to model the strategies of rational agents.
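
To make the simple case concrete, here is a sketch of how pure-strategy equilibria can be found by brute force in a \(2 \times 2\) game. The payoff values are illustrative only (a standard Prisoner’s dilemma ordering), and the function name pure_nash is hypothetical. Passing a pay_col that is not the transpose of pay_row would give the asymmetric case, and it is easy to see how the nested loops would multiply with more strategies or players.

```r
# Payoffs to the row player; rows are the row player's choice and columns are
# the opponent's choice (1 = cooperate, 2 = defect). In a symmetric game, the
# column player's payoffs are the transpose. Values are illustrative only.
pay_row <- matrix(data = c(3, 0,
                           5, 1), nrow = 2, byrow = TRUE);
pay_col <- t(pay_row);

pure_nash <- function(pay_row, pay_col){
    found <- NULL;
    for(i in 1:2){
        for(j in 1:2){
            row_best <- pay_row[i, j] >= max(pay_row[, j]); # Row cannot gain
            col_best <- pay_col[i, j] >= max(pay_col[i, ]); # Col cannot gain
            if(row_best & col_best){
                found <- rbind(found, c(row = i, col = j));
            }
        }
    }
    return(found);
}

pure_nash(pay_row = pay_row, pay_col = pay_col); # Mutual defection (2, 2)
```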

An increasingly used method of simulating complex, goal-oriented strategies is through the use of machine learning. The idea behind the machine learning approach is to teach a program to learn, so the program can figure out how to solve a problem (e.g., find the best strategy) without actually being told the solution; figuring out the best solution directly would be effectively impossible because there are too many possible solutions to explore and compare. One technique for narrowing down the possibilities and arriving at a very good (though possibly not best) solution is to simulate the process of natural selection using a genetic algorithm. The genetic algorithm starts off with a random set of simulated genomes (genotype), each of which maps to a random strategy (phenotype). It then allows for recombination between genomes, and mutation, and checks each strategy to see how good each is at solving the problem at hand (e.g., maximising the payoff in a game). The most fit strategies reproduce, and more generations are simulated until some criterion is fulfilled (e.g., the mean fitness of strategies is no longer improving, or 100 generations have passed). Once this criterion is met, the evolved strategies have been selected to solve the problem.

Additionally, a machine learning approach can use data to learn how to behave in a particular scenario. Google uses this in some of their software, perhaps most familiarly in gmail sorting incoming emails into different categories. Most timely, and perhaps most excitingly for those of us who are interested in games, a machine learning algorithm has been very recently developed that can consistently beat professional poker players. From the linked article in MIT Technology Review,

‘’DeepStack learned to play poker by playing hands against itself. After each game, it revisits and refines its strategy, resulting in a more optimized approach. Due to the complexity of no-limit poker, this approach normally involves practicing with a more limited version of the game. The DeepStack team coped with this complexity by applying a fast approximation technique that they refined by feeding previous poker situations into a deep-learning algorithm.’’

My goal is to apply a genetic algorithm to G-MSE, which will allow manager and stake-holder behaviour to be modelled for any potential objectives (e.g., maximise crop yield, keep populations near carrying capacity, keep all stake-holders happy, etc.) and allowing for multiple types of actions (e.g., hunt, scare, plant crops, protect offspring, pester the manager, forbid stake-holders). I’m not yet sure if this is realistic or not, but I think the genetic algorithm approach will at least get us further than anything else (save for some sort of brilliant new conceptual theory that shows how we can avoid the aforementioned complications). I’ve drafted a prototype genetic algorithm in R, which conceptually looks like the following figure.

The end result of this kind of implementation of G-MSE could allow us to:

  • Simulate many different kinds of social-ecological conflict (real and hypothetical) and predict how rational managers and stake-holders are likely to act given novel scenarios (i.e., perform in silico tests of new scenarios or new policies).
  • Compare model predictions to observations in the field.
  • Compare, more specifically, how the strategies of real people diverge from those predicted by a model, allowing us to identify stake-holder motivations that we might not otherwise have considered.
  • Build strategy algorithms based on empirically-derived data on people’s behaviour and apply them to new scenarios of conflict.
  • Estimate the uncertainty of model predictions by multiple simulations with identical starting conditions.
  • Explore how uncertainty within the model affects agent strategies.

This concludes the summary. There are a lot of challenges to implementing the genetic algorithm, but a very initial prototype below shows the idea. My hope is to have a beta version of G-MSE up and running sometime in the summer (with a polished version later in the year), and to continue to build upon the model as needed to allow for new scenarios and improved genetic algorithms. I am very open to feedback on what is and is not important for initial versions of G-MSE.


Update: 7 FEB 2017

Prototype of genetic algorithm in R

I have constructed a prototype of a genetic algorithm, written in R, but deliberately avoiding most base R functions that are not available (or that I won’t want to use) in c. Once I have a prototype that I’m happy with, I will write it up in c and start to implement it into G-MSE. There are a few tricks that I’m going to want to use, particularly to swap arrays in the tournament, which I believe can be accomplished just by swapping all pointer addresses. Additional optimisation ideas might be found here; I’ll probably need to be careful to keep this process speedy, but even the initial R code is fairly efficient, so I’m optimistic.

I’ve broken the R code down into five basic functions, representing the boxes from the most recent conceptual diagram from 3 FEB, with the exception of ‘replace’, which is done automatically within ‘tournament’. The first function identifies the focal agent in UTILITY, then initialises a population of 100 agents, 10 of which are identical to the focal agent, and 90 of which are identical in all except their five action columns, which are randomised instead (note, the whole file is recorded by git in scratch.R, which might later be overwritten). We also need a min_cost function to run in initialise_pop to allocate actions according to costs cleanly.

min_cost <- function(budget_total, util, row){
    the_min <- budget_total;
    for(check in 1:5){
         index <- (2*check) + 5;
         if(util[row, index] < the_min){
             the_min <- util[row, index];    
         }
    }
    return( as.numeric(the_min) );
}
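
For checking the loop above in R (where speed matters less than it will in c), the same minimum can be had in one line. This is only a sketch; the row of values is copied from the example agent used later in this entry, and it assumes that the cost columns are 7, 9, 11, 13, and 15.

```r
# Cost columns are 7, 9, 11, 13, and 15 in the prototype's layout
a1        <- c(1, 0, 0, 2, 0, 0, 8, 5, 30, 0, 20, 0, 10, 0, 10, 0);
util      <- rbind(a1);
cost_cols <- seq(from = 7, to = 15, by = 2);
the_min   <- min(100, util[1, cost_cols]); # 100 is the total budget
the_min;                                   # 8, the cheapest action cost
```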

# Add row 10X to 90 random (first brown box)
initialise_pop <- function(UTILITY, focal_agent, population){
    for(agent in 1:dim(population)[1]){
        if(agent < clone_seed){
            for(u_trait in 1:dim(population)[2]){
                population[agent, u_trait] <- UTILITY[focal_agent, u_trait];
            }
        }else{ # No need to bother with a loop here -- unroll to save some time
            population[agent, 1]  <- UTILITY[focal_agent, 1];
            population[agent, 2]  <- UTILITY[focal_agent, 2];
            population[agent, 3]  <- UTILITY[focal_agent, 3];
            population[agent, 4]  <- UTILITY[focal_agent, 4];
            population[agent, 5]  <- UTILITY[focal_agent, 5];
            population[agent, 6]  <- UTILITY[focal_agent, 6];
            population[agent, 7]  <- UTILITY[focal_agent, 7];
            population[agent, 9]  <- UTILITY[focal_agent, 9];
            population[agent, 11] <- UTILITY[focal_agent, 11];
            population[agent, 13] <- UTILITY[focal_agent, 13];
            population[agent, 15] <- UTILITY[focal_agent, 15];
            population[agent, 8]  <- 0; # Action columns start at zero
            population[agent, 10] <- 0;
            population[agent, 12] <- 0;
            population[agent, 14] <- 0;
            population[agent, 16] <- 0;
            lowest_cost <- min_cost(budget_total = budget_total, util = UTILITY,
                                    row = focal_agent);
            budget_count <- budget_total;
            while(budget_count > lowest_cost){
                affect_it <- 2 * floor( runif(n=1) * 5); # In c, do{ }while(!=6)
                cost_col  <- affect_it + 7;
                act_col   <- affect_it + 8;
                the_cost  <- population[agent, cost_col];
                if(budget_count - the_cost > 0){
                    population[agent, act_col] <- population[agent, act_col]+1;
                    budget_count <- budget_count - the_cost;
                } # Inf possible if keeps looping and can't remove 1
            }
        }
    }
    return(population);
}

After the initial population of agents is made, we include functions through which the genetic algorithm will loop, simulating key evolutionary processes. The first such process is crossing over.

# Crossover (second brown box)
# Would really help to define the SWAP function in c here -- use int trick
crossover <- function(population){
    agents     <- dim(population)[1];
    cross_prob <- 0.1;
    for(agent in 1:agents){
        # Each action column is swapped with that of a random agent with
        # probability cross_prob; multiply before floor() so that any agent
        # (not just the first) can be selected
        c1 <- runif(n=1);
        if(c1 < cross_prob){
            cross_with                <- floor(runif(n=1) * agents) + 1;
            temp                      <- population[cross_with, 8];
            population[cross_with, 8] <- population[agent, 8];
            population[agent, 8]      <- temp;
        }
        c2 <- runif(n=1);
        if(c2 < cross_prob){
            cross_with                <- floor(runif(n=1) * agents) + 1;
            temp                      <- population[cross_with, 10];
            population[cross_with,10] <- population[agent, 10];
            population[agent, 10]     <- temp;
        }
        c3 <- runif(n=1);
        if(c3 < cross_prob){
            cross_with                <- floor(runif(n=1) * agents) + 1;
            temp                      <- population[cross_with, 12];
            population[cross_with,12] <- population[agent, 12];
            population[agent, 12]     <- temp;
        }
        c4 <- runif(n=1);
        if(c4 < cross_prob){
            cross_with                <- floor(runif(n=1) * agents) + 1;
            temp                      <- population[cross_with, 14];
            population[cross_with,14] <- population[agent, 14];
            population[agent, 14]     <- temp;
        }
        c5 <- runif(n=1);
        if(c5 < cross_prob){
            cross_with                <- floor(runif(n=1) * agents) + 1;
            temp                      <- population[cross_with, 16];
            population[cross_with,16] <- population[agent, 16];
            population[agent, 16]     <- temp;
        }
    }
    return(population);
}
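
Choosing a crossover partner relies on drawing a uniform random index between 1 and the number of agents. A minimal sketch of that draw, in the c-friendly style used here, is below (rand_index is a hypothetical helper name). The multiplication has to happen before floor() is applied, because floor(runif(n=1)) on its own is always 0.

```r
# Hypothetical helper: uniform random integer from 1 to n_agents
rand_index <- function(n_agents){
    return( floor( runif(n = 1) * n_agents ) + 1 );
}

set.seed(4);
draws <- replicate(n = 1000, expr = rand_index(n_agents = 100));
range(draws); # Should fall within 1 and 100
```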

Crossing over is followed by mutation.

# Mutation (third brown box)
# Note that negative values equate to zero -- there can be a sort of threshold
# evolution, therefore, a la Duthie et al. 2016 Evolution
mutation <- function(population, mutation_prob){
    mutation_prob <- mutation_prob * 0.5;
    for(agent in 1:dim(population)[1]){
        c1 <- runif(n=1);
        if(c1 < mutation_prob){
            population[agent,8] <- population[agent, 8] - 1;    
        }
        if(c1 > (1 - mutation_prob) ){
            population[agent,8] <- population[agent, 8] + 1;    
        }
        c2 <- runif(n=1);
        if(c2 < mutation_prob){
            population[agent,10] <- population[agent, 10] - 1;    
        }
        if(c2 > (1 - mutation_prob) ){
            population[agent,10] <- population[agent, 10] + 1;    
        }     
        c3 <- runif(n=1);
        if(c3 < mutation_prob){
            population[agent,12] <- population[agent, 12] - 1;    
        }
        if(c3 > (1 - mutation_prob) ){
            population[agent,12] <- population[agent, 12] + 1;    
        }         
        c4 <- runif(n=1);
        if(c4 < mutation_prob){
            population[agent,14] <- population[agent, 14] - 1;    
        }
        if(c4 > (1 - mutation_prob) ){
            population[agent,14] <- population[agent, 14] + 1;    
        }  
        c5 <- runif(n=1);
        if(c5 < mutation_prob){
            population[agent,16] <- population[agent, 16] - 1;    
        }
        if(c5 > (1 - mutation_prob) ){
            population[agent,16] <- population[agent, 16] + 1;    
        }                 
    }
    return(population);
}
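
The five repeated blocks above could also be collapsed into an inner loop over the action columns, which would make it easier to add actions later. This is just a sketch of that alternative with a hypothetical function name (mutate_loop); the unrolled version might still end up being preferable in c.

```r
# Hypothetical alternative: same mutation rule, looping over action columns
mutate_loop <- function(population, mutation_prob){
    half_pr  <- mutation_prob * 0.5;   # Half for -1 and half for +1
    act_cols <- c(8, 10, 12, 14, 16);  # Action columns in the prototype
    for(agent in 1:dim(population)[1]){
        for(act_col in act_cols){
            draw <- runif(n = 1);
            if(draw < half_pr){
                population[agent, act_col] <- population[agent, act_col] - 1;
            }
            if(draw > (1 - half_pr)){
                population[agent, act_col] <- population[agent, act_col] + 1;
            }
        }
    }
    return(population);
}
```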

The function below ensures that the costs of agents’ actions are not over the total budget. If they are after crossover and mutation, then actions are randomly removed until they are within the budget.

# Need to incorporate selection on *going over budget* and *negative values*
constrain_cost <- function(population){
    for(agent in 1:dim(population)[1]){
        over <- 0;
        if(population[agent, 8] < 0){
            population[agent, 8] <- 0;   
        }
        over <- over + (population[agent, 8] * population[agent, 7]);
        if(population[agent, 10] < 0){
            population[agent, 10] <- 0;   
        }
        over <- over + (population[agent, 10] * population[agent, 9]);
        if(population[agent, 12] < 0){
            population[agent, 12] <- 0;   
        }
        over <- over + (population[agent, 12] * population[agent, 11]);
        if(population[agent, 14] < 0){
            population[agent, 14] <- 0;   
        }
        over <- over + (population[agent, 14] * population[agent, 13]);
        if(population[agent, 16] < 0){
            population[agent, 16] <- 0;   
        }
        over <- over + (population[agent, 16] * population[agent, 15]);
        while(over > budget_total){
            affect_it <- 2 * floor( runif(n=1) * 5); # Must be a better way
            cost_col  <- affect_it + 7;
            act_col   <- affect_it + 8;
            if(population[agent,act_col] > 0){
                the_cost  <- population[agent, cost_col];
                population[agent, act_col] <- population[agent, act_col] - 1;
                over <- over - the_cost;
            }
        }
    }
    return(population);
}
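
A quick way to check that a population is within budget is to sum each agent’s actions multiplied by their costs. The helper below is a sketch for testing purposes only (total_cost is a hypothetical name), and assumes the same layout of cost columns (7, 9, 11, 13, 15) immediately followed by action columns (8, 10, 12, 14, 16).

```r
# Hypothetical check: total cost of each agent's actions
total_cost <- function(population){
    cost_cols <- c(7, 9, 11, 13, 15);
    act_cols  <- cost_cols + 1;
    return( rowSums(population[, act_cols, drop = FALSE] *
                    population[, cost_cols, drop = FALSE]) );
}
```

After a population has been constrained, all(total_cost(population) <= budget_total) should then be TRUE.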

After mutation, a fitness function checks the fitness of each agent. This will eventually be a complex function balancing actions according to costs and utility, but for now I’ve just given the agent with the highest 16th column the highest fitness (i.e., maximise helpem).

# Fitness -- this is the most challenging function
# Just as proof of concept, let's just say fitness is maximised by helpem (16)
strat_fitness <- function(population){
    fitness <- rep(0, dim(population)[1]);
    for(agent in 1:length(fitness)){
        fitness[agent] <- population[agent,16];    
    }
    return(fitness);
}

Finally, we have tournament selection, which also replaces the original population. Tournament selection proceeds by randomly selecting four agents from the population, and passes the agent out of the four with the highest fitness to the next population. This kind of tournament selection seems effective, and will be more efficient in c than some other tournament types (e.g., best 4 out of 10), I think.

# Tournament selection on population
tournament <- function(population, fitness){
    agents <- dim(population)[1];
    traits <- dim(population)[2];
    winners <- matrix(data = 0, nrow = agents, ncol=traits);
    for(agent in 1:dim(winners)[1]){
        r1   <- floor( runif(n=1) * dim(winners)[1] ) + 1   
        r2   <- floor( runif(n=1) * dim(winners)[1] ) + 1   
        r3   <- floor( runif(n=1) * dim(winners)[1] ) + 1   
        r4   <- floor( runif(n=1) * dim(winners)[1] ) + 1   
        wins <- r1;
        if(fitness[wins] < fitness[r2]){
            wins <- r2;
        }
        if(fitness[wins] < fitness[r3]){
            wins <- r3;
        }
        if(fitness[wins] < fitness[r4]){
            wins <- r4;   
        }
        for(trait in 1:dim(winners)[2]){
            winners[agent, trait] <- population[wins, trait];
        }
    }
    return(winners);
}
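
If the tournament size ever needs to vary, the choice of a single winner could be separated out as below. This is a sketch with hypothetical names (tourn_winner and k); as in the function above, agents are drawn with replacement and ties go to whichever agent was drawn first.

```r
# Hypothetical helper: index of the fittest of k randomly drawn agents
tourn_winner <- function(fitness, k = 4){
    picks <- floor( runif(n = k) * length(fitness) ) + 1;
    return( picks[ which.max(fitness[picks]) ] ); # First maximum wins ties
}
```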

We can therefore simulate the genetic algorithm with the following code, which simulates 30 iterations (i.e., generations) of crossover, mutation, and selection.

clone_seed   <- 11;  # Agents indexed below this are clones of the focal agent
budget_total <- 100; # Total budget from which action costs are drawn
focal_agent  <- 2;   # Row of UTILITY used to seed the population (a1 below)

# Add three agents, representing three stake-holders, to the utility array
a0 <- c(0, 0, 0,  0, 0, 0, 0, 0,  0, 0,  0, 0,  0, 0,  0, 0);
a1 <- c(1, 0, 0,  2, 0, 0, 8, 5, 30, 0, 20, 0, 10, 0, 10, 0);
a2 <- c(2, 0, 0, -1, 1, 1, 0, 0, 50, 0,  0, 1,  1, 2,  2, 1);

UTILITY <- rbind(a0, a1, a2);

population <- matrix(data = 0, ncol = 16, nrow = 100);

population <- initialise_pop(UTILITY = UTILITY, focal_agent = focal_agent,
                             population = population);

mean_fit   <- NULL;
iterations <- 30;
while(iterations > 0){
    population <- crossover(population = population);
    population <- mutation(population = population, mutation_prob = 0.2);
    population <- constrain_cost(population = population);
    fitness    <- strat_fitness(population);
    population <- tournament(population = population, fitness = fitness);
    mean_fit   <- c(mean_fit, mean(fitness));
    iterations <- iterations - 1;
}

The plot below shows that the algorithm converges on the best fitness strategy (mean_fit) quite rapidly.

Note that a strategy fitness of 10 is the highest possible because agents have a total budget of 100 and each helpem costs 10 from this total budget. The rapid convergence is encouraging – the time taken from start to finish for this genetic algorithm is only 0.182 seconds in R (note also, that it found the solution in half the time), and will of course be much, much faster in c. Things will get slower as fitness functions become more complicated, and convergence might take a while given optimisation of multiple things.
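
The ceiling of 10 follows directly from dividing the budget by the cost of each action. The costs below are taken from the focal agent’s row in the code above; attaching the action names to the cost columns in this order is my assumption for illustration (only helpem being the final column is stated above).

```r
budget_total <- 100;
costs    <- c(movem = 8, castem = 30, killem = 20, feedem = 10, helpem = 10);
max_acts <- floor(budget_total / costs); # Most of each action affordable
max_acts[["helpem"]];                    # 10, the highest possible fitness
```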

Also note that the genetic algorithm will need to be run for multiple agents, slowing the processes down.

Re-structuring the UTILITY array

I’m now noticing that there is an error in the genetic algorithm as applied to G-MSE. While the algorithm shows a proof-of-concept well, agents aren’t actually individual rows in UTILITY, they’re list elements made up of data frames. Hence, the code that I just constructed needs to be applied not to an individual row, but to lists.

I think that the above point might be a good excuse to improve upon the data frame itself, and specifically to incorporate costs in a more effective way, then improve upon the algorithm. It’s always important to keep in mind that the goal of the genetic algorithm is to teach agents to learn to maximise their own utility. Different agents will do this in different ways, so we need to keep everything broad – one idea might be to incorporate everyone’s UTILITY in the utility list; this could help out with the manager too. So now the data frame would look like the below.

agent  type1  type2  type3  util  cost_util  cost_h  helpem
0      1      0      0      2     101        101     0
0      2      0      0      0     101        101     0
1      1      0      0      2     0          10      0
1      2      0      0      -1    1          2       1
2      1      0      0      0     101        101     0
2      2      0      0      1     101        101     0

Something is still not quite perfect yet. Note that I’ve added a cost_util, which could be the cost of lobbying another stake-holder, or manager. We could then see stake-holders using some of their budget to affect manager utilities. Each agent has its own array too – so the costs need to be uniquely reflecting the cost of agent index on affecting another agent’s action or utility.

Maybe this is the wrong way to go – first two columns might be the utility then costs of the focal agent (as partially imposed by other agents), where subsequent rows could just be costs for imposing on all other agents? The array could then look something like the below. Note that values of Inf, in the code, should just be some value that is higher than the cost of any agent, making it impossible that such values can be altered.

agent  type1  type2  type3  util  u_loc  u_land  movem  castem  killem  feedem  helpem
-2     1      0      0      2     1      0       0      0       0       0       0
-2     2      0      0      0     1      0       0      0       0       0       0
-1     1      0      0      2     1      0       1      1       2       3       3
-1     2      0      0      0     1      0       5      20      12      5       10
0      1      0      0      Inf   Inf    Inf     Inf    Inf     Inf     Inf     Inf
0      2      0      0      Inf   Inf    Inf     Inf    Inf     Inf     Inf     Inf
1      1      0      0      Inf   Inf    Inf     Inf    Inf     Inf     Inf     Inf
1      2      0      0      Inf   Inf    Inf     Inf    Inf     Inf     Inf     Inf
2      1      0      0      Inf   Inf    Inf     Inf    Inf     Inf     Inf     Inf
2      2      0      0      Inf   Inf    Inf     Inf    Inf     Inf     Inf     Inf

Note that the agent number -2 refers to the utility values of the focal agent – in its rawest form, what does the agent want or value. This is defined for each type of resource (type1). In the above, for example, if we consider resources where type1 = 1 to be crops and type1 = 2 to be geese, then we might have a farmer represented (note – the farmer likes crops, but is neutral on geese per se – I’m just assuming that geese are fine as long as they don’t affect crop production). The farmer can also specify whether the utility of each resource is dependent upon its location u_loc being within its view, or some other range – I’m a bit nervous about this column as I fear that it might constrain the kinds of questions that can be addressed. It might actually be good to make this any natural number, rather than a TRUE or FALSE, then have utility attached to a natural number on some layer of the landscape, so that one layer of LANDSCAPE can be the number of who owns it (zero for manager); a -1 could always just code for within view. The u_land specifies whether or not utility is attached to the value of some landscape layer (e.g., perhaps representing quality of land, or production in some cases). And finally the variables movem, castem, killem, feedem, and helpem are all actions – what the focal agent does (initialised at zero above).

The agent number -1 is identical for all but the last five columns, which refer to the cost values for affecting each of the actions; cost is drawn from some total budget. Finally, the remaining rows are the costs for changing all other agents’ costs with respect to all resource types (note that before now, we’ve effectively only had the first four rows – this adds to things). This means that the focal agent might, potentially, increase or decrease the cost of a different stake-holder performing an action – this will mostly be applied when the manager is the focal agent, changing, for example, how much it costs for stake-holders to scare or hunt resources. But the model allows stake-holders to affect each other’s costs too, if this is useful. Focal agents might also try to affect others’ utility values; for example, a stake-holder might pay some cost to try to close the gap between their utility value for a particular resource and the manager’s (or rival stake-holder’s) value (i.e. ‘lobbying’). Note that a focal agent’s costs are represented twice, once where agent = -1, and once where agent equals the agent’s natural number. However, the latter represents the ‘cost’ of changing its own cost, which I’ve outlawed by setting it to Inf. There are potentially some other uses for this redundancy. It might be useful in the future if we want to simulate negotiations, so an agent has their original cost/utility values, and a copy of what they are after they have been altered in some way.

The data frame above is therefore one of three data frames (one for each agent), each of which is an element in a list. The data frame above shows the utilities, actions, and costs of a focal agent, and the costs of affecting other agents’ costs. In the above, affecting other agents’ costs is always forbidden, so these values are all Inf. A manager should be able to adjust these costs to enact policy – for example, by outlawing killing of resources.
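A minimal sketch of how one focal agent's data frame might be held in c, assuming twelve columns in the order shown above and a large finite sentinel standing in for Inf. The names N_COLS, COST_FORBIDDEN, can_alter, and forbid_agent are hypothetical, chosen only for illustration.

```c
#include <stddef.h>

#define N_COLS 12              /* agent, type1-3, util, u_loc, u_land, 5 actions */
#define COL_AGENT 0
#define COL_UTIL 4             /* first column that can carry a cost             */
#define COST_FORBIDDEN 1.0e9   /* stands in for Inf: above any agent's budget    */

/* Return 1 if the focal agent may change column `col` of row `row`:
 * the stored cost must be below the sentinel and within the budget. */
int can_alter(double layer[][N_COLS], size_t row, size_t col, double budget)
{
    double cost = layer[row][col];
    return cost < COST_FORBIDDEN && cost <= budget;
}

/* Forbid the focal agent from altering anything in the rows that belong
 * to agent `agent_id` by setting every cost in those rows to the sentinel. */
void forbid_agent(double layer[][N_COLS], size_t n_rows, int agent_id)
{
    for (size_t r = 0; r < n_rows; r++) {
        if ((int) layer[r][COL_AGENT] == agent_id) {
            for (size_t c = COL_UTIL; c < N_COLS; c++) {
                layer[r][c] = COST_FORBIDDEN;
            }
        }
    }
}
```

A manager enacting policy would then overwrite individual sentinel values with real costs, or write the sentinel into a single action column to outlaw that action.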

Update: 6 FEB 2017

Modelling crop yields and compensation

The compensation scheme and importance of government funding that Saro has noted in her summary of geese and farming conflicts leads me to believe that some direct form of compensation needs to be included in the genetic algorithm. Adjusting the government funding is simple enough – we can just change the budget of the manager. How compensation – and farming more generally – will work is a bit more complicated. Here are three ways that I see could work:

  • Use a natural number on the LANDSCAPE to represent maximum farm yield. Reduce this yield for each organism on the land – assign one type of stake-holder to a patch of land using type as an index of an individual farmer (note that the extra rows in the AGENTS array might need to be ignored for determining agent actions).
    • Good: Conceptually easy, easy to code, low memory
    • Bad: Might be less flexible in interacting with other resources
  • Make crop land a RESOURCE, thereby allowing it to interact more directly with other resources.
    • Good: Flexible – could have multiple crops in the same location and interaction with other resources would be straightforward
    • Bad: High memory; the RESOURCE array could get quite big, and more loops would probably be required to manage it.
  • Just directly simulate compensation and crop growth by affecting the utility and cost columns in the UTILITY array.
    • Good: Very easy to implement, fast
    • Bad: Not very flexible – crop production and management is not concrete; probably some hidden assumptions.

Of course, there’s no reason that all of these options can’t be implemented depending on the situation; they aren’t mutually exclusive in the code. I’m inclined to try the first option as a default. Adding a real number to each landscape cell could represent expected crop yield, and this number could change depending on the presence of resources and the actions of farmers. Note that the landscape cells are already initialised with a real number that was meant to represent types of landscape – this can just be changed to a real number that represents crop yield. The files resource.c and observation.c already read the landscape in c as an array of type double; it’s really just a matter of using the landscape that is already available.
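A rough sketch of the first option, assuming that each organism on a cell removes a fixed proportion of the remaining yield and that a second layer records which farmer owns each cell. Both assumptions, and the names cell_yield, farmer_yield, and loss_per_resource, are illustrative only and not part of the existing code.

```c
#include <stddef.h>

/* Expected yield on one cell after resources have fed on it. `max_yield` is
 * the value stored on the landscape layer, `n_resources` is the number of
 * organisms on the cell, and `loss_per_resource` is an assumed proportional
 * loss caused by each organism. */
double cell_yield(double max_yield, int n_resources, double loss_per_resource)
{
    double yield = max_yield;
    for (int i = 0; i < n_resources; i++) {
        yield *= (1.0 - loss_per_resource);
    }
    return yield;
}

/* Total yield over the cells owned by one farmer. `owner` holds the ID of
 * the owning agent for each cell (a hypothetical ownership layer). */
double farmer_yield(const double *max_yield, const int *n_resources,
                    const int *owner, size_t n_cells, int farmer_id,
                    double loss_per_resource)
{
    double total = 0.0;
    for (size_t i = 0; i < n_cells; i++) {
        if (owner[i] == farmer_id) {
            total += cell_yield(max_yield[i], n_resources[i],
                                loss_per_resource);
        }
    }
    return total;
}
```

Compensation could then be modelled as the difference between a farmer's total with and without resources present, paid from the manager's budget.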

Crop yield could therefore affect utility – in that some utility value is assigned to (and multiplied by) the value of each cell. Presence of an organism could decrease this value (sidenote: we’ll need to think about order of operations in the model). Compensation could directly off-set the loss of utility. So we could take the data frame from Friday:

type1  type2  type3  util  u_loc  u_land  cost_m  movem  cost_c  castem  cost_k  killem  cost_f  feedem  cost_h  helpem
1      0      0      2     0      0       8       5      30      0       20      0       10      0       10      0
2      0      0      -1    1      1       0       0      50      0       0       1       1       2       2       1

The column util could just be the direct effect that resources of type1 have on the agent, u_loc could be whether or not utility is affected by location, and if it is, then u_land could be its effect on the landscape layer (for now, there is only one layer of landscape values, but another column should probably be added to specify the layer). Hence, representing a goose to a farmer that doesn’t care about geese at all but wants high crop yield could be util = 0, u_loc = 0, and u_land = -1. A farmer that kind of likes geese but also crop yield could be something like util = 1, u_loc = 0, and u_land = -1. Note that the last farmer likes geese and does not care where the geese are (u_loc = 0), but does not want their effects on the farmer’s land (u_land = -1). I think that this is probably the right way to go, though optimising fitness given these multiple interests will be a challenge.

It might also be useful to have a compensation column, which managers could affect, though this could be done in other ways too. Also, I think I have interpreted util in a couple of different ways throughout the notes, so it might be worth having a different column – perhaps target, or something that identifies the target value that affects util in some way. If a resource is at carrying capacity, for example, we might want some mechanism in the model by which stake-holders no longer want it to increase.

Given Saro’s notes, it might also be worth just allowing an option for a traditional \(2 \times 2\) payoff matrix too. This would allow another use of a genetic algorithm – to simulate the bargaining process as in Tu et al. (2000).

Minor error correction

At 12:50, I have corrected a typo in the function ind_to_land(), which was making it impossible to plot non-square landscapes correctly.

Update: 3 FEB 2017

Late night (or early morning) idea

One way to solve the first major concern at the end of yesterday’s notes could be to do the following: Create a new list strategy in which each element of the list corresponds to an individual agent, strategy[[agent_i]]. The element itself would be a data frame that is eight columns wide and 1000 (ish) rows down. The first six of eight columns would identify a particular kind of individual – an agent or a resource. The columns would indicate a type as follows:

  1. Agent versus Resource (which array)
  2. IDs
  3. type1
  4. type2
  5. type3
  6. xloc (perhaps +/- view?)
  7. yloc (perhaps +/- view?)

In the above, any negative values would indicate all individuals (i.e., disregard ID if 1 is -2). In all cases, some value … The remaining columns would define:

  1. The column to affect of the particular type identified in the first six columns (7+ of the resource or agent array)
  2. The amount to affect it by

It could be a messy optimisation procedure, but this would ensure that agents could pinpoint which individuals to target, what to target, and by how much. The cost of doing so could be factored in perhaps by either: only enacting rows until actions are below the cost or constraining all actions to have an effect that is lower than the cost (e.g., by normalisation). There would need to be some error checks in it, but this could be the most flexible way to handle the search algorithm. The new structure would be either a list of data frames or (perhaps interpreted in c) a 3D array that is \(1000 \times 8 \times agent_{number}\).
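The first of the two cost options (only enacting rows until the budget runs out) might look something like the sketch below. It assumes that the cost of a row is the absolute size of its effect times a fixed unit cost; that pricing rule and the name rows_within_budget are hypothetical.

```c
#include <stddef.h>
#include <stdlib.h>

/* Count how many leading rows of a strategy can be enacted within `budget`.
 * `amount[i]` is the size of the effect in row i (the last column of the
 * strategy data frame); `unit_cost` is an assumed cost per unit of effect. */
size_t rows_within_budget(const int *amount, size_t n_rows, int unit_cost,
                          int budget)
{
    int spent = 0;
    size_t row;
    for (row = 0; row < n_rows; row++) {
        int row_cost = abs(amount[row]) * unit_cost;
        if (spent + row_cost > budget) {
            break;              /* this row would overspend, so stop here */
        }
        spent += row_cost;
    }
    return row;
}
```

Rows past the returned index would simply be ignored when the strategy is applied, so they act as unexpressed parts of the 'genome'.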

Another look at the search algorithm

The above proposal seems reasonable, though I now need to think a bit more about the implementation. If doing something is the consequence of an if statement in the code, then inapplicable values (e.g., non-existent types or locations) simply don’t add to the actions (or cost) – they would just be junk. Alternatively, they could add to the cost, and therefore be selected out in favour of better actions. I kind of like the latter more, for now, because I suspect it would cause convergence to happen more quickly and make the ‘genome’ more readable.

I also think that the entire strategy array should probably be an int, with random sampling during a mutation – note that this is a change from my earlier notes (anything before yesterday) in which I was planning to use double values within the AGENTS array. The previous plan would have just been a mess to implement, and this way we have a separate structure that acts as a ‘genome’ for strategies (I’ve not decided on how the utilities of strategies will be held yet – probably a separate UTILITY array). The first seven columns will always be integers anyway, and it will be faster and easier to understand if mutation just causes an integer change – just sample with new_mutation = floor( runif(0,1) * maxcol), where the draw has to be uniform on [0, 1] (rbinom(0,1) would always return zero). Or, because we don’t have to do this biologically, maybe we come up with something like the pseudocode below:

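mutate = runif(0, 1);   /* uniform draw on [0, 1]; rbinom(0,1) would always give 0 */
if(mutate < 0.05){      /* lower 5% of draws: decrease value */
    effect =  floor( runif(0, 1) * maxcol);
    value  -= effect;
}else if(mutate > 0.95){ /* upper 5% of draws: increase value */
    effect =  floor( runif(0, 1) * maxcol);
    value  += effect;
}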

Note that the above avoids drawing random numbers more than necessary, and it is fairly aggressive in searching. My other thought was to just have mutation cause either value-- or value++. My fear is that this could result in local maxima issues because ‘jumping over’ a type would be impossible. This could be fixed by something like the below:

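mutate = runif(0, 1);   /* uniform draw on [0, 1]; rbinom(0,1) would always give 0 */
if(mutate < 0.05){
    effect =  floor(mutate * 100) + 1;          /* step size of 1 to 5 */
    value  -= effect;
}else if(mutate > 0.95){
    effect =  floor( (1 - mutate) * 100) + 1;   /* step size of 1 to 5 */
    value  += effect;
}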

This uses only a single random draw, and avoids local maxima by letting mutations jump over types. It should also result in a mutation rate of 0.1, with equal probability of incrementing by 1 to 5 or decrementing by 1 to 5. But I’m still not terribly excited about the idea of making it add or subtract from value above, as types are not ordinal.
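Because types are not ordinal, one alternative is for a mutation to replace the value outright with a uniformly drawn type, as in the floor-of-a-uniform-draw idea above. The sketch below is only an illustration: rand() stands in for whatever random number generator the c code ends up using, and the names unif01 and mutate_type are hypothetical.

```c
#include <stdlib.h>

/* Uniform draw on [0, 1); rand() is a stand-in for the real generator. */
static double unif01(void)
{
    return (double) rand() / ((double) RAND_MAX + 1.0);
}

/* With probability `mu`, replace `value` with a type drawn uniformly from
 * 0 to max_type - 1; otherwise return `value` unchanged. */
int mutate_type(int value, int max_type, double mu)
{
    if (unif01() < mu) {
        return (int) (unif01() * max_type);
    }
    return value;
}
```

This costs a second draw only when a mutation actually happens, and every type is equally reachable from every other type in one step.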

Here’s another idea: maybe start with the aforementioned utility list/array. This array could be a list of arrays UTILITY[[agent]][row][col] in R and a UTILITY[agent][row][col] 3D array in c in which each agent is a list element or dimension. Rows could exhaust all possible types with their utilities in the final column such that the element corresponding to one agent, e.g., could be:

type0  type1  type2  type3  utility  cost
0      0      0      0      0        1000
0      1      0      0      0        1000
0      2      0      0      0        1000
1      1      0      0      2        8
1      2      0      0      -1       12

In the above, there are three types of agents (type0 = 0), which includes the manager (type1 = 0) and two stake-holders (type1 = 1 & type1 = 2). There are also two different types of resources (type0 = 1), which include type1 = 1 and type1 = 2. Types 2 and 3 are unused for both agents and resources. Each type has a utility – how much the particular agent values the identified agent or resource (though I’m not sure how this will be interpreted for agents, particularly when the identity becomes self-referential). The cost identifies how much expenditure is required to affect the agent or resource – note that this already creates the problem that different attributes of resources and agents should cost different amounts to affect. At the very least, we don’t want the cost of culling versus scaring to have to be the same. At the same time, we don’t want the output of this software to be so messy that end users won’t be able to interpret what is going on – the possibilities should correspond to clear management options, I think (though we also don’t want to constrain the model to force it to do what we presume is best; we want it to find novel solutions, where possible).

Ideally, it would be nice if both managers and stake-holders could potentially affect each other’s costs, but this creates a kind of infinite regress problem – the need for meta-costs for how much it costs to affect another agent’s cost; this is probably too much. Instead, maybe costs are a function of the manager’s utility, and all lobbying occurs on utility. This avoids the ‘cost of costs’ problem – the only thing we lose is that one stake-holder might not be able to directly affect how easy it is to hunt or scare, or do anything else – though they might still affect each other’s utilities? Even this seems to get a bit too complex.

Maybe there’s a starting point that gets the model working but also leaves room for further development. What if the UTILITY array looked more like this:

type1  type2  type3  utility  cost_m  movem  cost_c  castem  cost_k  killem  cost_f  feedem  cost_h  helpem
1      0      0      2        8       5      30      0       20      0       10      0       10      0
2      0      0      -1       0       0      50      0       0       1       1       2       2       1

Now in the above, only resources are actually considered in the UTILITY array. There are five basic things that an agent can do to a resource – two ways to benefit it and three ways to have a negative effect on it. Agents can move resources (movem), castrate resources (castem), or kill resources (killem). And agents can feed resources (feedem) or help resources (helpem). Doing each of these things comes with an associated cost. It’s important not to take these categories too literally, but for now they could loosely correspond to:

  1. Scaring by affecting xloc and yloc (movem),
  2. Castrating by affecting birthrate parameter (castem)
  3. Killing by affecting the mortality parameter (killem)
  4. Feeding by affecting the birthrate parameter (feedem)
  5. Or helping by affecting offspring parameter (helpem)

I’m not sure how this last one will work yet. This sacrifices some of the generality of the code, but in the context of what G-MSE is for, I don’t think that we lose much, and it’s easy to see how we could add columns to UTILITY later as necessary. Then, following the diagram from yesterday, when managers use the genetic algorithm, they could affect their own cost columns and everyone else’s as a function of their own utilities and the population sizes of each resource. The costs would then be updated for the stake-holders, who could adjust their own parameters accordingly to maximise their utility.

Note that the index of each array in the list UTILITY will correspond to the ID in the AGENT array. This could be useful, if we want, for example, to eventually let stake-holders affect one another. The first agent is also always the manager (and it’s hard to see a situation where we have more than one, but even if so, we can always have one head manager), so stake-holders can lobby the manager by indexing the zero index of UTILITY with ease.
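If UTILITY is held in c as one contiguous block, indexing by agent could be sketched as below. The agent-major layout and the names util_dims, util_index, and manager_value are assumptions made for illustration, not the layout the package necessarily uses.

```c
#include <stddef.h>

/* Dimensions of a UTILITY array stored as one contiguous block. */
typedef struct {
    size_t n_agents;
    size_t n_rows;
    size_t n_cols;
} util_dims;

/* Offset of UTILITY[agent][row][col] in the flat block (agent-major). */
size_t util_index(util_dims d, size_t agent, size_t row, size_t col)
{
    return (agent * d.n_rows + row) * d.n_cols + col;
}

/* Read one of the manager's values; the manager is always agent 0, so a
 * stake-holder lobbying the manager only ever needs this first slab. */
double manager_value(const double *utility, util_dims d,
                     size_t row, size_t col)
{
    return utility[util_index(d, 0, row, col)];
}
```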

Focusing first on the stake-holder genetic algorithm

The genetic algorithm can be fairly straightforward, and (I think), fairly efficient given the data structure above. All of the five aforementioned columns can be mutated by any integer value, and unlike the case in which types were random, the numbers are ordinal so that the following code isn’t too much of a problem:

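mutate = runif(0, 1);   /* uniform draw on [0, 1]; rbinom(0,1) would always give 0 */
if(mutate < 0.05){
    value--;            /* 5% chance of a one-step decrease */
}else if(mutate > 0.95){
    value++;            /* 5% chance of a one-step increase */
}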

So the code would do the following in c for a single agent:

  1. malloc a 3D array that is 100 deep, and copy the agent’s entire UTILITY data frame each time.
  2. For 90 of those copies, randomise (-cost to cost, perhaps?) all of the five aforementioned values.
  3. Loop through the 100 deep indices, looking at each of the five columns and swap a column with that of another randomly chosen index with a certain probability.
  4. While still looping also check to see if any mutation occurs and if so, then increment or decrement the value – actually, it might make more sense to mutate first then decide whether or not to swap. Note that this might cause a situation where the mutation rates among loci differ (one might get mutated, swapped, then mutated again), but since we’re not trying to faithfully replicate a biological process, I don’t think this is a problem – it might even add more desirable variation.
  5. Check the fitness of each of the 100 indices, store them in a vector.
  6. Tournament selection on indices using the fitness.
  7. Replace the array so that 100 of the most fit offspring are in it.
  8. Go back to 3 – Do this at least 20 times before checking to see if the mean fitness has increased by some amount; if not, repeat 20 more times.
  9. If 500 times have happened, just use the highest-fitness index as the new agent.
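The numbered steps above could be sketched in c roughly as follows. This is a simplified illustration rather than the planned implementation: it evolves only the five action values instead of a whole UTILITY data frame, bounds actions between 0 and max_act rather than -cost to cost, runs for a fixed number of generations instead of checking convergence, and uses rand() in place of the real generator. All names and the crossover probability are hypothetical, and toy_fitness exists only so that the sketch can be run.

```c
#include <stdlib.h>
#include <string.h>

#define POP     100   /* copies of the agent's strategy              */
#define N_ACT   5     /* movem, castem, killem, feedem, helpem       */
#define P_CROSS 0.1   /* assumed per-column crossover probability    */
#define P_MUT   0.1   /* total mutation rate (0.05 down + 0.05 up)   */

typedef double (*fitness_fn)(const int *actions);

static double unif01(void)
{
    return (double) rand() / ((double) RAND_MAX + 1.0);
}

/* Toy fitness for illustration only: reward killem (index 2) being close
 * to 3 and penalise every other action by its size. */
double toy_fitness(const int *a)
{
    return -(double) (abs(a[2] - 3) + a[0] + a[1] + a[3] + a[4]);
}

/* Evolve the five action values in `agent` for `generations` iterations
 * and overwrite `agent` with the fittest strategy found. */
void evolve_actions(int agent[N_ACT], int max_act, int generations,
                    fitness_fn fit)
{
    int pop[POP][N_ACT], next[POP][N_ACT];
    double w[POP];
    int g, i, j, best = 0;

    /* Steps 1-2: copy the agent 100 times, randomise 90 of the copies. */
    for (i = 0; i < POP; i++) {
        for (j = 0; j < N_ACT; j++) {
            pop[i][j] = (i < 10) ? agent[j]
                                 : (int) (unif01() * (max_act + 1));
        }
    }
    for (g = 0; g < generations; g++) {
        for (i = 0; i < POP; i++) {
            for (j = 0; j < N_ACT; j++) {
                /* Step 4 (done first): mutate one step down or up. */
                double m = unif01();
                if (m < P_MUT / 2 && pop[i][j] > 0) {
                    pop[i][j]--;
                } else if (m > 1.0 - P_MUT / 2 && pop[i][j] < max_act) {
                    pop[i][j]++;
                }
                /* Step 3: swap this column with a random other copy. */
                if (unif01() < P_CROSS) {
                    int k = (int) (unif01() * POP);
                    int tmp = pop[i][j];
                    pop[i][j] = pop[k][j];
                    pop[k][j] = tmp;
                }
            }
        }
        /* Step 5: fitness of every copy. */
        for (i = 0; i < POP; i++) {
            w[i] = fit(pop[i]);
        }
        /* Steps 6-7: tournaments of two fill the next generation. */
        for (i = 0; i < POP; i++) {
            int a = (int) (unif01() * POP);
            int b = (int) (unif01() * POP);
            memcpy(next[i], pop[w[a] >= w[b] ? a : b], sizeof next[i]);
        }
        memcpy(pop, next, sizeof pop);
    }
    /* Step 9: keep the highest-fitness copy as the new agent. */
    for (i = 0; i < POP; i++) {
        w[i] = fit(pop[i]);
        if (w[i] > w[best]) {
            best = i;
        }
    }
    memcpy(agent, pop[best], sizeof pop[best]);
}
```

A call such as evolve_actions(agent, 10, 50, toy_fitness) would overwrite agent with the best strategy found. Tournaments of size two are the smallest option and keep selection pressure gentle; a larger tournament would converge faster at the risk of losing variation.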

Given the above, we need to update the internal structure of the genetic algorithm to the below:

The result is simpler, and therefore it should be faster and easier to implement. I’m hoping that it will be possible to code this to be as flexible as possible – enough to really allow for some complex interactions with agents affecting agents in different ways, but I think this will have to be addressed when I actually start writing the code. For the moment, I think the above is a good balance for stake-holder genetic algorithms. I also think that the spatial effects will better emerge organically through restricting stake-holders to affecting only their own cell (or the view around their cell). If we let the genetic algorithm try to evolve to find the locations where an agent should do something, I think it would slow down considerably. We can always use view = 100, or turn off spatial implementation, to have stake-holders affect across the whole region, and it’s hard to see why we would want stake-holders to arbitrarily pick out parts of the map to care about (if it’s caused by the presence of another resource, then the algorithm should find the right actions based on the resource’s utility).

Looking specifically at the fitness function

Let’s assume that costs (‘policy’ in the updated figure) are fixed for now, and the genetic algorithm takes these as a given. Then, instead of accounting for every other agent’s actions (like the manager might have to do – figure this one out later), each stake-holder could just check to see how their actions affect the abundance or local density of each resource. In fact, why don’t we add another column:

type1  type2  type3  util  u_loc  cost_m  movem  cost_c  castem  cost_k  killem  cost_f  feedem  cost_h  helpem
1      0      0      2     0      8       5      30      0       20      0       10      0       10      0
2      0      0      -1    1      0       0      50      0       0       1       1       2       2       1

For space, utility has been shortened to util, but I’ve also added a column u_loc, which codes for whether or not the utility of the resource depends on that resource being ‘local’, in whatever sense this could be relevant. For example, a zero could be a simple FALSE, while a 1 could be TRUE in the sense that resources do not affect utility or therefore fitness unless they are on the same cell as the agent, or within view. Better, values within this column might correspond to different definitions of ‘local’ – it could mean utility is important when the resource is on any cell of the same type of agent, or on a cell with a particular landscape property.

Now, to implement this, we can’t really just use the RESOURCE array, because neither managers nor stake-holders should have access to it. We have to use either the observation array (or summary statistics from it) or the agent array. When stake-holders have an estimate of how many resources there are, their utility is affected and they can act in a certain way. Note that I’ve placed some odd utility values in the rows above, but maybe they should actually be much different, reflecting the ideal number of resources – perhaps more utility values are needed too:

I wonder if a second utility parameter is needed so that utility does not need to increase or decrease linearly with resource abundance. Perhaps up to 3-5 util and u_loc values are needed – unneeded ones can be ignored later.

The above could see stake-holders switch strategies when satisfied. It might also be worth having some sort of a dummy resource, or a reluctance to spend cost if not necessary – stake-holders have other things to do, and if everything is working fine, then they don’t need to do anything in the model, including use up costs, and perhaps there should be some pressure against it anyway.

In any case, I could see two types of information about resources being most relevant for stake-holders decisions, thereby affecting the fitness function:

  1. How many resources of each type does the manager claim there are globally?
  2. How many resources do you see locally (e.g., through anecdotal observation)?

I can’t, offhand, think of why anything else would be absolutely necessary – not as a starting point, at least. Stake-holders could use one or both estimates to make their decisions, so those values and UTILITY could be read into the fitness function. The fitness function would then estimate how resource abundances would change as a consequence of the stake-holder’s action, with higher fitness being awarded to actions that match utility values.
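One very simple form that such a fitness function might take is sketched below. It assumes that each unit of killem removes one resource and each unit of helpem adds one, and that fitness is just the utility-weighted abundance predicted after the stake-holder acts. The struct, the function name, and that prediction rule are all placeholders rather than a worked-out model.

```c
#include <stddef.h>

/* One row of the UTILITY table, reduced to what this sketch needs. */
typedef struct {
    double util;    /* value the stake-holder places on one resource   */
    int    u_loc;   /* 1: only locally seen resources count; 0: global */
    int    killem;  /* number of resources the stake-holder removes    */
    int    helpem;  /* number of resources the stake-holder adds       */
} util_row;

/* Fitness of a candidate strategy: utility-weighted abundance predicted
 * after the stake-holder's own actions. `global_est` is the manager's
 * claimed abundance per type, `local_count` what the stake-holder sees. */
double strategy_fitness(const util_row *rows, const double *global_est,
                        const double *local_count, size_t n_types)
{
    double fitness = 0.0;
    for (size_t i = 0; i < n_types; i++) {
        double seen = rows[i].u_loc ? local_count[i] : global_est[i];
        double predicted = seen - rows[i].killem + rows[i].helpem;
        if (predicted < 0.0) {
            predicted = 0.0;    /* abundance cannot go negative */
        }
        fitness += rows[i].util * predicted;
    }
    return fitness;
}
```

Costs are deliberately left out here; in practice a candidate strategy whose actions exceed the stake-holder's budget would need to be penalised or repaired before this is evaluated.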

To make things even more complicated, it might be necessary to record the UTILITY actions over time – eventually, stake-holders should be able to correlate them with changes in utility (both as a consequence of their own actions and other stake-holder responses) and use this in the genetic algorithm. For now, I think establishing a record-keeping method is enough. I’ll worry about how to incorporate the game history into decision making after I have a simpler working model.

Additional thoughts

While this is going on, I’ll want to keep in mind the four categories of Tu et al. (2000), which I mentioned on 30 JAN 2017. These define conflict by different utility functions. This might be especially relevant because next I’ll need to figure out how the manager is going to strategise given both stake-holders and resources.

For next week: Consider writing a prototype for the genetic algorithm in R. Make sure that it works and trouble-shoot any unforeseen issues before trying to write the whole thing in c.

Update: 2 FEB 2017

I have now started using GitHub projects to better organise G-MSE development. This appears to improve the workflow a bit, which is good because the workflow is probably going to get more complicated once I start coding the genetic algorithm and integrating it into G-MSE. Below is an updated overview of how I expect G-MSE to work in light of the genetic algorithm approach.

Note that the Genetic algorithm (in green above) is being used twice, once by managers and once by users (stake-holders). What is happening here is that managers are taking in the observation model and updating their management plan by tapping into the genetic algorithm. Likewise, after the managers do this, the stake-holders respond by also tapping into the same genetic algorithm to update their off-take. The flow of the model here needs to be planned carefully. I think the best place to start after the observation model is by reading the observation data into the manager model. Managers can then use the observation data for analysis in manager.R, which will call manager.c (I don’t think that this will need to be a very intense program, but using c for the analysis will at least allow us to build upon the code in a complex way, if need be). The output from the manager model should then be the kinds of summary statistics that are relevant for policy-making. This could be as simple as an estimate of population size or density, or perhaps the abundance of different age classes. To keep things flexible, I think that a new object is needed – a new list or array – as output that is relevant for policy making. Perhaps if the output is just a scalar, then this can be interpreted as an abundance or density estimate, while if it is a vector, then it can represent ages or types?

Perhaps what we really want to do is tidy up the observational data? The manager.c function could serve as a special type of apply or tapply R function, which takes in different column arguments and then calculates summary statistics for the specified column. So, if no column is specified, then manager.c just estimates the population size or density of the entire population based on the observed data provided (using the mark-recapture or density procedures currently implemented by the chapman_est() and dens_est() R functions written in gmse.R). If, instead, we specify a type column, then the estimate is done separately for each value of that type. If a type and an age column, for example, are specified, then estimates for all unique combinations of type and age are performed. The output is then an array with fewer rows – each row corresponds to an averaged out resource array where some columns don’t mean anything (e.g., ID). This will require a lot of error checking, as bad user input could cause the manager to do odd things – in fact, I think only column types and age (4 columns total) should be allowed to be uniquely estimated; even that is a bit much. The zero column of the array could then stand for an estimate instead of an ID in the output. For the simplest of cases where only one resource is being estimated, therefore, the relevant array manage would have a zero index manage[0][0].
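As an illustration of a per-type estimate, the sketch below applies the standard Chapman mark-recapture estimator, ((M + 1)(C + 1) / (R + 1)) - 1, separately to each resource type. Whether chapman_est() in gmse.R uses exactly this form is an assumption, and the function names and the layout of the inputs (one count of each kind per type) are hypothetical.

```c
#include <stddef.h>

/* Chapman estimator of population size from a mark-recapture sample:
 * n_marked individuals were marked, n_caught were caught in the second
 * sample, and n_recaptured of those were already marked. */
double chapman(int n_marked, int n_caught, int n_recaptured)
{
    return ((double) (n_marked + 1) * (double) (n_caught + 1))
           / (double) (n_recaptured + 1) - 1.0;
}

/* Apply the estimator separately to each resource type, much as tapply()
 * would in R; `estimate` must have room for n_types values. */
void chapman_by_type(const int *n_marked, const int *n_caught,
                     const int *n_recaptured, size_t n_types,
                     double *estimate)
{
    for (size_t i = 0; i < n_types; i++) {
        estimate[i] = chapman(n_marked[i], n_caught[i], n_recaptured[i]);
    }
}
```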

There is a bit of fuzziness here that should perhaps be cleared up with better planning. Currently the observation.R function requires a type and category type to be specified. In other words, the observation function has been constructed (deliberately) to look at one type at a time. This is probably good – no need to change it, but for a simulation where multiple resources are being managed, observation() will need to be run multiple times. I think that the best thing to do in this case is to store observation data frames in a list, such that OBSERVATION[[1]] stores type 1 and OBSERVATION[[2]] type 2. This will keep the high-level resource types separate; within the list elements, other types (e.g. sex) and age can be managed, and we can pass each list element to the manager model separately to produce some output. The output can also be stored as a list MANAGED[[1]], with all list elements subsequently being merged into one data frame – ideally in c, but possibly before in R – to go into the genetic algorithm. Again, a lot of the time all of this will just end up as one vector, with really just one number being of interest, but we want the model to stay flexible so that we can deal with eventual demands. The end result will be all of the information the manager is actually going to use to make decisions – which may then be separated by resource type(s) and ages.

G-MSE will run an independent observation model for each type of resource of interest. The output of observations will be represented by a list of arrays in R. Each array in the list will then independently be run by the manager model, each run of which will return an array of summary statistics that will be added to a new list. The list (or an array of concatenated elements – merged data frames) will then be read into the genetic algorithm. Additionally, the manager model should also affect the landscape list/array too – this will give the option of using resource distribution in making decisions; a simple increment for each time a resource is observed on an x y location should do.
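The landscape increment mentioned at the end of the paragraph above could be as small as the sketch below, which assumes a flat integer layer of x_dim by y_dim cells and separate arrays of observed x and y locations. The function name and the row-by-x layout are hypothetical.

```c
#include <stddef.h>

/* Add one to cell (x, y) of `tally` for every observed resource. `tally`
 * is a flat array of x_dim * y_dim counts; observations that fall outside
 * the landscape are skipped. Returns the number of observations recorded. */
size_t tally_observations(int *tally, int x_dim, int y_dim,
                          const int *xloc, const int *yloc, size_t n_obs)
{
    size_t recorded = 0;
    for (size_t i = 0; i < n_obs; i++) {
        if (xloc[i] < 0 || xloc[i] >= x_dim ||
            yloc[i] < 0 || yloc[i] >= y_dim) {
            continue;
        }
        tally[xloc[i] * y_dim + yloc[i]]++;
        recorded++;
    }
    return recorded;
}
```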

Difficulties remain with the genetic algorithm

This brings us to the genetic algorithm itself. Once the ‘managed’ data has been finalised by the manager model (or after the manager model has been run for each resource), then the genetic algorithm will take the managed data, the agent array, the landscape, and the parameter vector and output a new agent array in which elements of the manager’s row have potentially been changed. This models the process of the manager potentially receiving summary statistics, information about stake-holders, distributions of resources and stake-holders on the landscape (along with other landscape-level properties), and other globally relevant parameters and potentially adjusting their policy and even interests accordingly. Actions, interests, and costs are encoded in the agent’s rows:

  • Actions are things that managers can do themselves – acting like a stake-holder, essentially.
  • Utility amount is what the manager values, and it forms the basis of action and cost decisions
  • Costs affect how much of a user’s budget needs to be spent to do some action (Note that this should probably fit into the lobby idea from yesterday, as a specific instance of the more general idea). The manager can, for example, effectively ban hunting by making it extremely costly – or just coerce stake-holders not to hunt by increasing its cost. The manager can also encourage fences or scaring by decreasing their cost; a cost of zero is essentially the manager offering to do this for free, as if it benefits the stake-holder, the genetic algorithm will find it. A negative cost means that the manager pays the stake-holder to do it.

I’m not sure if we need managers to be able to have their own actions given the point about zero costs, but I think leaving this option open is easy, even if it’s rarely used. Having three blocks of column types (actions, utilities, and costs) also might allow stake-holders to affect each other’s costs, potentially, so it’s worth planning this way. I think the best way to do this is to probably have the dimensions of AGENTS be adjustable, based on how many columns are needed. Four columns would actually be needed to optimise

All of these values can be optimised as a consequence of data or other agents (for example, a high population size might cause the manager to allow more resources to be hunted or scared – but more or angrier stake-holders might also cause this to happen). The interests of managers, obviously, should change before the actions, as how they act will depend on what they are interested in.

The actions of managers, as changed through the genetic algorithm will be directly interpreted by users as policy. Hence the relevant row(s) of the agent array will feed into the user model. These rows will affect the costs of stake-holder actions (recall that each stake-holder has a total budget). The stake-holders react to these policies and simultaneously adjust their own actions (and potentially utility).

A major challenge here is the sheer number of things that an agent could potentially do, and getting all of those options into the AGENTS array. Things that a set of columns is going to have to specify for an agent include:

  • The utility associated with each type of resource
  • An action to directly affect each type of resource
  • The cost of taking a particular action on a resource

It seems as though there should be an action to directly affect the landscape in some way. E.g., fencing – though I suppose a fence could just be a type of resource in the RESOURCE array. The problem is that if we allow users to directly affect resources or the landscape, then there has to be some sort of switch in the code to allow this. If, however, things like a fence or a crop are resources, and therefore in the resource array, then agent actions could be restricted to subtracting or adding resources by adjusting the resource array. Then again, if a particular resource (e.g., a fence) is not in the resource array, then there is nothing to adjust. Of course, code-wise, is there really a difference between a fence and scaring? They both adjust the x-y location of a resource. Maybe we really don’t need much to do with the landscape – just work with a displacement cost, leaving how resources are displaced to be abstract and interpreted by the end user of the software.

Just working with the idea that displacement is all the same (maybe leave some hooks in the code for adjusting the landscape), what we really need to know then is the following:

  • How much utility each agent gives to each resource
  • What columns can be tweaked in the agent and resource arrays
  • How much does each tweak of a column cost?

Complicating things even more, costs might be different when adding or subtracting values from resources – e.g., it might be not so costly to greatly decrease the birth rate of an organism, but increasing it by the same amount should be near impossible. I’m not yet sure the best way to structure the arrays, or anything else, to handle all of these complications, so this will be a major project in the near future (added to GitHub projects).
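One way the asymmetry described above might be sketched is with separate per-unit costs for increasing and decreasing a value. The function name `tweak_cost` and its parameters are hypothetical, and the sketch assumes non-negative per-unit costs; nothing here is taken from the GMSE code.

```c
#include <assert.h>

/*
 * Cost of changing a resource value by `delta` (hypothetical helper).
 * Increases are charged at `cost_up` per unit and decreases at
 * `cost_down` per unit, so the two directions can differ in price.
 * Assumption for this sketch: both per-unit costs are non-negative.
 */
double tweak_cost(double delta, double cost_up, double cost_down)
{
    assert(cost_up >= 0.0 && cost_down >= 0.0);
    if (delta >= 0.0) {
        return delta * cost_up;
    }
    return -delta * cost_down;
}
```

With a large `cost_up` and a small `cost_down` for birth rate, decreasing the birth rate stays cheap while increasing it by the same amount quickly exceeds any realistic budget.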

Internal structure of the genetic algorithm

Once I resolve the issue of how to structure the data so that the appropriate values on AGENTS, RESOURCES, and possibly even LAND arrays can be tweaked through the genetic algorithm, the internal structure will look something like the below.

Essentially, the relevant row from AGENTS will be brought into the genetic algorithm; ten copies of the row will be added to 90 copies with random numbers. The fitness of all 100 copies will be checked in the fitness function – the fitness function will adjust the resource and agent values according to the copy being assessed, and then some function needs to be called to predict what will happen to resources and (possibly) agents and the landscape. Originally, I thought that this might be accomplished using the resource function itself, but I don’t think this is best anymore – mainly because it misses an error step (i.e., agents shouldn’t be able to perfectly predict effects on resources). Perhaps it should just loop back through the observation data? Or the resources within view? It could then consider the effect of resources being removed on the agent’s own utility. It could instead be something as simple as: directly scare or remove resources from a location – does this decrease undesired resources at the location? Then lobby the manager: does the change in manager policy affect undesired resources at the location? In other words, should we just have agents look at the direct and immediate effects of what they’re doing on a particular location? In the case of the manager, the location could be the entire landscape, perhaps incorporating birth and death rates into the fitness function?
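The seeding step described above (ten copies of the agent's row plus 90 random rows) could look roughly like the sketch below. The names `init_population`, `rand_range`, `POP_SIZE`, and `N_COPIES` are hypothetical, as is the assumption that all columns share one range `[lo, hi)`; real columns would likely need their own ranges.

```c
#include <assert.h>
#include <stdlib.h>

#define POP_SIZE 100  /* total strategies assessed per generation */
#define N_COPIES  10  /* exact copies of the agent's current row  */

/* Hypothetical helper: uniform random double in [lo, hi). */
static double rand_range(double lo, double hi)
{
    return lo + (hi - lo) * ((double) rand() / ((double) RAND_MAX + 1.0));
}

/*
 * Fill `population` (POP_SIZE rows by n_cols columns, row-major) from a
 * single agent row. The first N_COPIES rows are exact copies of the row;
 * the remaining rows hold random values drawn from [lo, hi).
 */
void init_population(double *population, const double *agent_row,
                     int n_cols, double lo, double hi)
{
    int row, col;
    assert(n_cols > 0 && lo < hi);
    for (row = 0; row < POP_SIZE; row++) {
        for (col = 0; col < n_cols; col++) {
            if (row < N_COPIES) {
                population[row * n_cols + col] = agent_row[col];
            } else {
                population[row * n_cols + col] = rand_range(lo, hi);
            }
        }
    }
}
```

Keeping exact copies in the initial population means the agent's existing strategy is always among the candidates, so the search cannot start out worse than the status quo.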

For tomorrow, it might be worth just working through some of the things that will definitely need to be coded. Alternatively, there are two definite challenges that remain for using the genetic algorithm:

  1. How will agents actually have their strategies tweaked – what is the structure of the array that will identify parts of the resource and agent arrays that can be changed, and at what cost?
  2. How will agents estimate their own fitness, especially given that such judgements will come with error and be limited in scope?

Update: 1 FEB 2017

More thinking about agent fitness functions

While I have a general idea of how to implement the genetic algorithm now, how agents make decisions and act on them is still not clear from a modelling perspective, so more critical thinking needs to be done here before any coding. Unlike the resource and observation models, I also think it might be better, given the complexity, to write a prototype of the code in R to show proof of concept before optimising the code in c. One thing that I think every agent needs will be some sort of total budget (note, this budget is not necessarily currency – at the moment, I’m thinking about it more like a time budget; it’s also possible we’ll need two budgets, giving the option of one used explicitly for time and the other for currency, but I’m keeping it simple for now). This will give us the option of constraining agents’ behaviours if desired so that agents cannot take unrealistic actions to increase their utility, and instead might have to consider trade-offs between different actions. For example, a farmer might be able to either tend crops, scare or kill organisms, build a fence, or lobby the manager to increase utility, even though the best thing to do would be all four. A utility function would then determine how a combination of actions maps to utility, and a genetic algorithm could find the optimal behaviour to get the highest utility. We might consider different stake-holders, or different types of stake-holders, to have different total budgets from which to make decisions – these budgets could also be affected by, and affect, RESOURCES.
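A minimal sketch of the budget constraint described above, assuming that each action has a per-unit cost and that a strategy is feasible when its total cost does not exceed the agent's budget. The names `strategy_cost` and `within_budget` are hypothetical, as is the single shared budget.

```c
#include <assert.h>
#include <stddef.h>

/*
 * Total cost of a strategy: units of each action times that action's
 * per-unit cost. A negative cost models the manager paying the agent.
 */
double strategy_cost(const double *actions, const double *costs, size_t n)
{
    double total = 0.0;
    size_t i;
    assert(actions != NULL && costs != NULL);
    for (i = 0; i < n; i++) {
        total += actions[i] * costs[i];
    }
    return total;
}

/* Returns 1 if the strategy can be paid for from `budget`, else 0. */
int within_budget(const double *actions, const double *costs, size_t n,
                  double budget)
{
    return strategy_cost(actions, costs, n) <= budget;
}
```

For the farmer example, the four entries could be units of crop tending, scaring or killing, fencing, and lobbying. A genetic algorithm would then either discard candidate strategies for which `within_budget` returns 0 or penalise them in the fitness function.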

In the software, what this might look like is each AGENT having the opportunity to modify the following:

  • Other agents’ utility values (lobbying)
  • Other agents’ actions (harassment or intimidation)
  • Resource mortality (hunting)
  • Resource birth rate (castration)
  • Resource location (scaring)
  • Landscape movement ease (fencing)
  • Landscape productivity (farming)

Note that this way of conceptualising the implementation of actions is broad enough to include managers (who might lobby stake-holders, or intimidate them through laws to not do something). There might be other things to consider, but this suggests to me at least seven potential variables that an agent could affect, and agents will need to maximise their utility using a genetic algorithm that tweaks all of these parameter values – ideally it would also take into account past actions of other agents to predict utility.

Agents might also be spatially restricted in their ability to perform any of these actions, thus making strategy dependent upon location (e.g., a farmer might not be able to hunt in certain areas). Here the option to define type2 agents could come in handy – one agent might be represented by multiple rows of the AGENT array with actions for each type2, but each row having a unique xloc and yloc, thereby representing land owned. Managers could own all land, or just public land if they cannot do anything on stake-holder land. Some agents might have locations of -1 (or lower), meaning that they cannot do anything that requires control of land.

Implementing this type of system could be challenging, and will require that the landscape be a three-dimensional array (or list) with the third dimension or [[layer]] list element representing a different layer of the landscape. I’ll make this an ISSUE later.
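On the c side, a layered landscape could be stored as one flat block of memory indexed by x, y, and layer. The sketch below is one possible layout; the `landscape` struct, the function names, and the meaning attached to each layer are all hypothetical.

```c
#include <assert.h>
#include <stdlib.h>

/* Hypothetical layered landscape; names and layout are illustrative. */
typedef struct {
    int n_x;        /* cells along the x dimension                    */
    int n_y;        /* cells along the y dimension                    */
    int n_layers;   /* e.g., layer 0 = cell type, layer 1 = owner ID  */
    double *cells;  /* n_x * n_y * n_layers values, layer by layer    */
} landscape;

/* Allocate a zero-filled landscape; returns NULL if allocation fails. */
landscape *land_alloc(int n_x, int n_y, int n_layers)
{
    landscape *land;
    assert(n_x > 0 && n_y > 0 && n_layers > 0);
    land = malloc(sizeof *land);
    if (land == NULL) {
        return NULL;
    }
    land->cells = calloc((size_t) n_x * n_y * n_layers, sizeof *land->cells);
    if (land->cells == NULL) {
        free(land);
        return NULL;
    }
    land->n_x = n_x;
    land->n_y = n_y;
    land->n_layers = n_layers;
    return land;
}

/* Pointer to the value stored at (x, y) on the given layer. */
double *land_cell(landscape *land, int x, int y, int layer)
{
    assert(x >= 0 && x < land->n_x);
    assert(y >= 0 && y < land->n_y);
    assert(layer >= 0 && layer < land->n_layers);
    return &land->cells[((size_t) layer * land->n_y + y) * land->n_x + x];
}

/* Release a landscape created by land_alloc. */
void land_free(landscape *land)
{
    if (land != NULL) {
        free(land->cells);
        free(land);
    }
}
```

An ownership layer of this kind would let an agent's row be matched against the owner ID of a cell before any action requiring control of land is allowed.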

The game implementation of G-MSE will require several additional AGENT columns corresponding to the bullets above, but also type specifications. These columns would correspond to the G1 to Gn columns suggested earlier. More concretely, they will look something like this:

IDs  type1  type2  see2  see3  budget  lobby_type_1  lobby_col_1  lobby_val_1  farm_product
0    0      0      0     0     1000    0             0            0            4.5
1    1      0      0     0     100     2             14           11.4         19.6
2    1      0      0     0     100     2             14           16.1         10.3
N-1  2      0      0     0     75      0             12           3.2          12.3
N    2      0      0     0     75      0             12           4.2          8.8

Hence in the above, AGENTS has a budget column, and columns for each type of action that can be performed. For lobbying, agents can select a type to lobby (lobby_type_1), the column to try to affect (lobby_col_1), and a value to affect it by (lobby_val_1). This raises an issue that an agent might want to lobby multiple types of agents, or even multiple columns of multiple types of agents. It might be worth thinking about if there is a better way to organise what parameters can be affected and how. Of course, we can make global changes that change the number of columns in AGENTS, giving all of the columns needed, but maybe there is a better way to do it. Between see3 and budget, columns will include utility values on resource types and perhaps cell types on the landscape (these are what lobby_val_1 affects – agents should also be able to potentially affect each other’s lobby_val_1, but probably not lobby types or columns – I can’t see how realistic it would be to convince a different stake-holder to do something with the same value to a different type of agent or a different resource). It might just have to be the other agents’ own cells (or all those of an agent’s type, if we represent type as an individual), or the cells within view.

This setup could offer some interesting insights – potentially figuring out the conditions under which it benefits stake-holders to take different actions for themselves (increasing yield, hunting, scaring, etc.), or taking different types of actions (e.g., doing work for oneself, lobbying managers, harassing other stake-holders). Perhaps it’s possible that conflicts could lead to energy being invested in different types of actions depending upon different costs of those actions (is it easier to lobby the manager or shoot an organism?), or arms races could develop that don’t make a whole lot of sense until we understand the history of the conflict (it is easy to bother other stake-holders, which causes a retaliation, which escalates, etc., with not much action taken to manage). A key here will be to adequately parameterise how much investment each type of action requires so that we have an idea of the kinds of trade-offs that stake-holders experience. Again, in the absence of these trade-offs, it seems like stake-holders should and would try to do everything – interacting with managers and other stake-holders, adding fences, maximising yield, hunting, etc. But I don’t think such an unlimited model would reflect how people actually budget their time and money (i.e., I don’t think that the assumption that agents have unlimited resources is realistic or useful, and I think that it would be both more realistic and more useful to allow for a limited budget).

To summarise briefly here – what we’re going to do is have those columns in the AGENTS array, then use a genetic algorithm allowing each agent to tweak these values, which will produce the effect of changing other values of the AGENTS, RESOURCES, and LANDSCAPE arrays – constrained to a certain budget – to affect the focal agent’s utility. This requires a utility function for the focal agent to somehow predict the consequences of these actions (perhaps by simulating a run of the G-MSE to predict what happens in the next generation if only their actions were to be applied). This could get computationally intense, but I don’t see a speedier option just yet.

Update: 31 JAN 2017

Implementation of agent strategies

More needs to be planned for the input and output of agent strategies. That is, what variables should and should not be available to managers and stake-holders when optimising strategies through the genetic algorithm, and how should these variables be incorporated into a strategy that causes agents to take one or more actions? Note, there are plenty of resources for incorporating multiple objectives into genetic algorithms (e.g., Fonseca and Fleming 1993, 1998; Horn et al. 1993; Jaszkiewicz 2002), so agents can be complex in their utility functions. What I’m talking more about is what do agents get to consider when optimising to maximise their own utility functions? And what kinds of actions do stake-holders engage in upon formulating a strategy? Once the answers to these questions are clear, it will be possible to start the process of coding manager and user functions. Some potential things to consider as variables affecting manager and stake-holder strategies:

  • Observed density of resources produced by manager reports. This could be as reported from the observation function (or as statistics from an analysis on these observations), or perhaps something directly from the RESOURCE array if resources are meant to be known (e.g., if hunting licenses or crop yields are modelled as resources).
  • Direct knowledge of and interest in some spatially restricted area, perhaps representing a stake-holder’s farm or other property – or their common hunting grounds. Stake-holders’ utility functions might therefore be particularly affected by the distribution of resources and any management policy that explicitly considers geography. I’ve not seen models that do anything like this, but given the importance of local interests and therefore spatial dynamics in ConFooBio case studies, and in conservation more generally, I think spatial distribution needs to be included.
  • History of resource abundance distributions, perhaps placing diminishing weight on older resource abundances and distributions.
  • History of previous management policies and stake-holder actions.
  • Uncertainty associated with any of the previous bullet points.

Some things to consider as potential actions (outputs) of manager and user functions:

  • A general approach to management; i.e., will the manager allow users to hunt resources? Will they protect areas of landscape so that some resources cannot move into these areas? Will they cull resources themselves, or castrate them (birth_rate = 0), perhaps at some cost that should be considered explicitly? There are probably some high-level decisions to consider here, and it would be ideal to have many possibilities to choose from.
  • Will manager decisions be biased by resource age or type (e.g., sex), or by resource location? This could get very complex, but space is going to be important.
  • How are stake-holder agents going to respond to management decisions? Will they simply act to maximise their own utility within the rules set by management, or are these rules break-able if the cost to doing so is sufficiently low?
  • Can stake-holders ‘lobby’ managers, or other stake-holders, affecting their utility values and therefore their utility functions?
  • Are stake-holders limited by time or costs in some way? Should each be given some sort of budget (or should managers?) that allows them to take certain actions, hence imposing a trade-off on things that they can do? Should this budget be affected by resources (e.g., crop yield)?

Neither of these lists is exhaustive, and the input and output options could get very complex. I think that this is okay as long as it doesn’t cause the program to be too inefficient, intractable, or unrealistic. We want the options available to managers and stake-holders to reflect those of real systems as much as possible, but it is also worth thinking about whether some options can be safely pruned out of the software, or at least tabled for a later time.

Note, it might be that for most stake-holders, the strategy is really obvious – always act in such a way as to maximise the resources that you’re interested in – no need to optimise much then because the action to take is clear. For managers, however, I can imagine that the decision will always be a bit more challenging, requiring trade-offs between the interests of different stake-holders in determining policy.

Also note, there should be no need to tell managers what kind of approach to take with respect to policy (though this should be an option, of course). The genetic algorithm should be able to handle this sort of thing – indeed, we might just see very different approaches come out of this model organically as a consequence of different resource abundances and distributions and stake-holder interactions. For example, between time steps, we might see managers switch from establishing a global hunting quota to prohibiting hunting and constructing fences (protected areas of landscape) instead; all we need to do is allow some sort of switch to affect the manager’s general approach, then incorporate this switch variable into the genetic algorithm.

Use of genetic algorithms in ecology and evolution

Hamblin (2013) has a nice methods paper on the use of genetic algorithms, focused especially on an ecology and evolution audience. He cites a highly relevant book by Sean Luke, which includes a general introduction to genetic algorithms, but also chapters on coevolution (competing strategies), multiobjective optimisation, and policy optimisation (Luke 2015). Luke (2015) is particularly cited for a quote on the utility of metaheuristics (which include genetic algorithms), which I’ll just include here in full:

‘’Metaheuristics are applied to I know it when I see it problems. They’re algorithms used to find answers to problems when you have very little to help you: you don’t know what the optimal solution looks like, you don’t know how to go about finding it in a principled way, you have very little heuristic information to go on, and brute-force search is out of the question because the space is too large. But if you’re given a candidate solution to your problem, you can test it and assess how good it is. That is, you know a good one when you see it.’’

I think this probably applies well to G-MSE. Hamblin (2013) notes that ‘’fitness evaluation’’ is the largest performance bottleneck, so it is probably not worth investing too much energy on optimising the specifics of structure types, or crossover, mutation, and reproduction algorithms; instead, more attention might be paid to making speedy assessments of the fitness (payoffs) of agent strategies. It’s also possible to control recombination (I’m going to call it that sometimes – ‘’crossover’’ strikes me as a bit of a confused term from the computer science literature) and mutation frequency through a parameter, so they could effectively be turned off if the parameter were set to zero. Hamblin (2013) notes that mutation type (e.g., random per locus or chromosome) is not terribly important (but it’s worth pointing out that the mutation rates from the literature search in Table 3 are generally much lower than Luo et al. (2014) mentioned – 0.1 still seems reasonable to me), but recombination parameters can be important – one point crossover (i.e., forcing cross-over to happen once for all individuals) can break up good linkage combinations – better to just use uniform (probabilistic) crossover. Table 2 of Hamblin (2013) shows that population sizes around 100-200 are common in the references surveyed (some are much lower, but nearly all are less than or equal to 2000), with run lengths commonly around 500 (1000 is also common); reals are about as common as binaries. The most popular selection algorithm is truncation, making up well over half of ecology and evolutionary biology applications of genetic algorithms (Table 1 of Hamblin 2013). To my surprise, truncation selection is not the consensus recommendation for genetic algorithms (and proportional methods are quite bad when multiple strategies are near an optimum, resulting in premature convergence). The recommended selection method according to Correia (2010) is actually tournament selection. 
The algorithm is described by the quote below:

‘’It randomly picks k individuals from the population and copies the fittest of them to the mating pool. All the k individuals go back to the population. The process is repeated until the mating pool has the desired size.’’

So tournament selection is not probabilistic – in that sense, it is like truncation selection, but there is an extra sampling step that is iterated until the new generation is formed. If k is the same size as the mating pool, then this is effectively truncation selection, so really tournament selection is a generalisation of this that will be useful to code. Hamblin (2013) also cites a book chapter by Syswerda (I’m still waiting on the full text, but the link has all of it) that shows that overlapping generations (termed ‘’steady state’’ in the computer science literature) perform better than non-overlapping (termed ‘’generational’’) algorithms. This can be easy to implement – allow selected agents to be placed in a new array, but apply mutation and crossover to only half of them. This will fit especially well with G-MSE given that agent strategies might not be expected to change much from one time step (of the model, not the genetic algorithm) to the next. Hence, the optimal solution from the previous time step will be included in the next time step, and if nothing changes, then convergence will occur as soon as possible.
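The quoted tournament procedure can be sketched as below. This is an illustration rather than the GMSE implementation: the function name and signature are hypothetical, the mating pool stores indices into the fitness array rather than copies of strategies, and the k entrants of each tournament are drawn with replacement for simplicity. Using `rand() % pop_size` also carries a slight modulo bias that is ignored here.

```c
#include <assert.h>
#include <stdlib.h>

/*
 * Tournament selection (sketch). For each slot in the mating pool,
 * draw k individuals at random and record the index of the fittest.
 * All individuals stay in the population, so any of them can be drawn
 * again in a later tournament.
 */
void tournament_select(const double *fitness, int pop_size, int k,
                       int *pool, int pool_size)
{
    int slot, draw;
    assert(pop_size > 0 && k > 0);
    for (slot = 0; slot < pool_size; slot++) {
        int best = rand() % pop_size;           /* first entrant */
        for (draw = 1; draw < k; draw++) {
            int challenger = rand() % pop_size; /* later entrants */
            if (fitness[challenger] > fitness[best]) {
                best = challenger;
            }
        }
        pool[slot] = best;
    }
}
```

Setting k = 1 gives purely random sampling with no selection, and larger values of k increase the strength of selection, so k is a convenient single parameter for tuning the balance between exploration and exploitation.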

It will obviously be important to run diagnostic tests on the G-MSE genetic algorithm. Hamblin (2013) recommends,

‘’Plots of mean population fitness (and its variance) and the fitness of the best individual over time can be important for both diagnostic and reporting purposes; populations that reach a single solution (close to zero variance) within a few generations are a clear sign of premature convergence, likely stemming from a problem in the balance of exploration and exploitation (selection too strong, too little mutation/crossover, population size too small, etc).’’

Testing shouldn’t be too difficult – the results of genetic algorithms can be printed off to a c file, then read in by R and presented in a figure. Hamblin (2013) suggests that genetic algorithms are robust, so it’s unlikely that parameter value choices will cause major problems or affect things greatly, but it’s worth doing all of the quality checks.
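The per-generation diagnostics recommended in the quote (mean fitness, its variance, and the fitness of the best individual) could be computed with something like the sketch below. The struct and function names are hypothetical, and the variance here is the population variance (dividing by n).

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical summary of one generation's fitness values. */
typedef struct {
    double mean;      /* mean population fitness          */
    double variance;  /* population variance (divide by n) */
    double best;      /* fitness of the best individual    */
} fitness_summary;

/* Summarise the n fitness values of one generation (n must be > 0). */
fitness_summary summarise_fitness(const double *fitness, size_t n)
{
    fitness_summary out;
    double sum = 0.0, sq_dev = 0.0;
    size_t i;
    assert(n > 0);
    out.best = fitness[0];
    for (i = 0; i < n; i++) {
        sum += fitness[i];
        if (fitness[i] > out.best) {
            out.best = fitness[i];
        }
    }
    out.mean = sum / (double) n;
    for (i = 0; i < n; i++) {
        double dev = fitness[i] - out.mean;
        sq_dev += dev * dev;
    }
    out.variance = sq_dev / (double) n;
    return out;
}
```

Writing one such summary per generation to a file would give R everything needed to plot the curves; a variance that collapses to near zero within a few generations would flag premature convergence.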

Update: 30 JAN 2017

More review of utility functions in genetic algorithms

I’m turning now to the use of utility functions, particularly the use of them in genetic algorithms and games. It appears that these can be found in economics and business. For example, Luo et al. (2014) address an optimisation problem for product demand using a utility function to be maximised and a genetic algorithm. Luo et al. (2014) use fuzzy numbers to model market segments, which include three numbers representing most pessimistic, most likely, and most optimistic values. The authors use conjoint analysis, apparently a technique to figure out what people will pay for, combined with a ‘part-worth utility model’. Utility is modelled as a USD amount, and as a linear function (summation) of the product of weights, part-worth utilities, and a binary variable linking product profiles and product attributes – summations over levels and product attributes. I’m not too worried about the details here, just that total utility is measured in currency in this case, and is calculated as weighted sub-utilities – this kind of logic is relevant for G-MSE.
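As a rough sketch of the weighted-sum logic described above, the total utility of a product profile might be written as follows. The symbols are shorthand introduced here for illustration, not the notation of Luo et al. (2014), and the fuzzy-number treatment of market segments is left out.

\[
U = \sum_{j} \sum_{l} w_{j} \, u_{jl} \, x_{jl}
\]

Here \(j\) indexes product attributes, \(l\) indexes the levels of an attribute, \(w_{j}\) is the weight on attribute \(j\), \(u_{jl}\) is the part-worth utility of level \(l\) of attribute \(j\), and \(x_{jl}\) is a binary variable equal to 1 if the profile uses level \(l\) of attribute \(j\) and 0 otherwise. An analogous form for G-MSE would sum weighted utilities over resource types rather than product attributes.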

Luo et al. (2014) then go on to model how utility determines a consumer’s choice of product (essentially, consumers pick the product of highest utility, or none at all). Several constraints on product choice and product attribute-profiles are noted in the model, but the genetic algorithm is implemented using int coding – one chromosome has consumer choice and product configuration sections. Genes within the consumer choice section each represent consumers in a particular market segment – values of these genes correspond to choice of different product profiles (if the value is zero, no product is chosen). The product configuration section contains subsections related to product profiles; each subsection has genes whose integer values indicate the level selected for a product attribute. A population’s gene values are initialised randomly, and individual fitness is calculated using the linear models introduced prior to the genetic algorithm. The authors use a uniform crossover procedure, which might be a useful type of algorithm – apparently searching a lot of strategy space, though the costs and benefits of different crossover methods are still unclear to me. Parameters for the genetic algorithm seemed unusual, to me at least; Luo et al. (2014) set a population size of 30, a crossover probability of 0.7, and a mutation probability of 0.4. I would have considered the population size much too low, and the mutation probability much too high, but it’s worth keeping in mind that perhaps these parameter combinations are useful in genetic algorithms even if they appear odd biologically – it’s worth experimenting with them, at least (their Figure 2 suggests that my presumed ideal parameters might be on the low side for crossover and mutation). Surprisingly (at least, to me), Luo et al. (2014) concluded that their algorithm had the best performance (in terms of profit maximisation) when ‘’crossover probability was 0.7 and mutation probability was 0.7’’. The authors used MATLAB to implement the genetic algorithm, so they might have been constrained by computational efficiency – it took 85 seconds for 50 generations on a Pentium IV processor; had the analysis been run in c, it surely would have been faster.

Luo et al. (2014) do note that ‘’low mutation probability (e.g. 0.1) is a good choice’’ for genetic algorithms (as a biologist, of course, this seems very, very high!), but their problem was an exception because the space that needed to be explored was very large. The general take-home I get from this is that the relatively low mutation and recombination rates that we observe as biologists are probably not appropriate for a good genetic algorithm; higher ones should be used by default – of course, this will require citation to reassure reviewers that this is standard practice.
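A minimal sketch of uniform crossover and per-locus mutation on real-valued strategies, each controlled by a single probability. It assumes that the crossover probability is applied per locus (Luo et al. 2014 may instead apply theirs per pair of parents) and that a mutation replaces a value with a uniform draw from `[lo, hi)`. All function names are hypothetical.

```c
#include <assert.h>
#include <stdlib.h>

/* Hypothetical helper: uniform random double in [0, 1). */
static double unif01(void)
{
    return (double) rand() / ((double) RAND_MAX + 1.0);
}

/*
 * Uniform crossover: each locus is swapped between strategies `a` and
 * `b` independently with probability p_cross. Setting p_cross to zero
 * switches recombination off.
 */
void uniform_crossover(double *a, double *b, int n_loci, double p_cross)
{
    int i;
    assert(p_cross >= 0.0 && p_cross <= 1.0);
    for (i = 0; i < n_loci; i++) {
        if (unif01() < p_cross) {
            double tmp = a[i];
            a[i] = b[i];
            b[i] = tmp;
        }
    }
}

/*
 * Mutation: each locus is replaced, with probability p_mut, by a new
 * value drawn uniformly from [lo, hi). Setting p_mut to zero switches
 * mutation off.
 */
void mutate(double *a, int n_loci, double p_mut, double lo, double hi)
{
    int i;
    assert(p_mut >= 0.0 && p_mut <= 1.0 && lo < hi);
    for (i = 0; i < n_loci; i++) {
        if (unif01() < p_mut) {
            a[i] = lo + (hi - lo) * unif01();
        }
    }
}
```

Because both rates are plain arguments, comparing a default of 0.1 against the much higher values used by Luo et al. (2014) is only a matter of changing the values passed in.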

Tu et al. (2000) look at genetic algorithms for negotiations among agents using utility functions, which is exactly the kind of thing that we’re interested in for G-MSE. In addition to being a useful resource for showing an overlap between utility functions and genetic algorithms, this conference proceedings is very interesting in that it has interacting agents, and considers negotiation as ‘’a search for an optimal negotiation outcome with respect to the utility functions of each partner’’ (Tu et al. 2000). I’m not sure if we’ve proposed it this way before, but given that I’ve been conceptualising the manager in G-MSE as a special kind of agent (and, in that sense, similar to stake-holders, but following its utility to make rules rather than work within rules to maximise utility), it would be very interesting if we could use a genetic algorithm and the manager agent’s utility function to optimise negotiation outcomes in addition to management outcomes – or perhaps, define ideal management outcomes as the optimal negotiation outcomes that maximise the interests of stake-holders. We could then use the manager genetic algorithm as a tool in real-world case studies where real or simulated stake-holders play the role of agents.

The use of automated negotiation strategies in online commerce appears to follow a protocol using simple sequential rules and threshold utility values. Tu et al. (2000) created a generic framework for a genetic algorithm, implemented using Java. The three functions needed were mutation, crossover, and reproduction. The algorithm for selection seems a bit unclear. It appears that parent individuals (i.e., reproduction) are chosen based on probability, while selection of offspring simply draws the highest fitness offspring to become the next generation of parents? (‘’The parent individuals are chosen with a probability proportional to their fitness and the operators are chosen randomly. From the new population of size \(\lambda\), the \(\mu\) individuals with the highest fitness are propagated into the next generation as parents’’). This isn’t entirely clear.

The method by which agents reach a consensus is really interesting as a way that an agreement – e.g., a policy – is reached. It occurs to me that there might need to be some utility in inaction as well – rather, some cost associated with doing something as a consequence of low utility, though I’m not yet entirely sure how this would be implemented practically. Stake-holders have other interests, of course. The authors consider four types of scenarios on which negotiations take place:

  1. No conflict – i.e., both agents have ‘’identical utility functions’’, so the optimal outcome for each is the same.
  2. Pure distributive – Completely opposite utility functions, so a gain for one agent is always a loss for the other.
  3. Simple integrative – Contrary utility functions, but the importance of issues under negotiation are a bit different, so a compromise can potentially be reached.
  4. Divorce – Similar to Simple integrative, but more complex and potentially resulting in many different outcomes.

Tu et al. (2000) tweaked crossover and mutation probabilities to get best results (unfortunately, the exact values they used aren’t reported anywhere in the proceedings, that I can find).

Update: 29 JAN 2017

Sunday musings

As a bit of an aside, I’m thinking about how biological degeneracy might fit in to the efficacy of management policies, given that multiple independent agents might affect a biological system in different ways. I think that degeneracy is interesting and probably greatly under-considered across all biological scales, but it appears entirely absent as a theoretical or practical consideration in conservation and the maintenance of ecosystem function. Man et al. (2016) very recently developed the theory to quantify degeneracy, doing so while simulating networks of complex neuronal systems characterised by non-linearity – specifically comparing degeneracy to redundancy and complexity, which were also defined mathematically. I think there’s a lot of room for theoretical development on degeneracy, and a lot of scope for the application of degeneracy theory to big questions in evolutionary ecology, community ecology, and conservation biology. The modelling in G-MSE is general enough to be potentially able to address these kinds of questions, perhaps using the mathematical definitions introduced by Man et al. (2016) for analysis of simulation results.

Update: 27 JAN 2017

I’ve been doing a bit more literature review on the subject of genetic algorithms, particularly as applied to economic and social-ecological questions (e.g., Balmann and Happe 2000; Ascough et al. 2008). Given the need to keep things computationally efficient while also repeatedly updating agent strategies, I think it’s worth defining AGENTS as an integer array (I’m not sure why RESOURCES can’t also be one, actually, so it might be worth checking on this) instead of a double. Supporting this:

  • There is really nothing currently in the AGENT array that needs to be a non-integer. The closest thing is a parameter affecting movement, but this can be made into an int, I should think. It might also help if the parameter affecting error was continuous, though I’m not yet convinced it must be – error could just be the probability of error from zero to 100, interpreted as 0 to 1.0 by increments of 0.01.
  • Similarly, there really isn’t anything in RESOURCES that needs to be a non-integer either. The probabilities of removal (i.e., death) and growth (i.e., birth) are the closest, but I don’t know if there’s any good reason to have these be especially precise – i.e., why not just have an int value from zero to 100, corresponding to a 0 to 1.0 probability of mortality later? That way, the whole array could be int. I suspect the same can be done for the birth parameter, though the case is certainly less convincing than for the agent array.
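The integer-percentage idea in the bullets above could be sketched as a Bernoulli draw in which an int from zero to 100 is read as a probability from 0 to 1.0 in steps of 0.01. The function name `event_occurs` is hypothetical, and `rand() % 100` has a slight modulo bias that is ignored in this sketch.

```c
#include <assert.h>
#include <stdlib.h>

/*
 * Returns 1 with probability pct / 100 and 0 otherwise, where pct is
 * an int in [0, 100]. The draw rand() % 100 takes values 0..99, so
 * pct = 0 never succeeds and pct = 100 always succeeds.
 */
int event_occurs(int pct)
{
    assert(pct >= 0 && pct <= 100);
    return (rand() % 100) < pct;
}
```

A resource with a removal value of 25 would then die in a given time step whenever `event_occurs(25)` returns 1, with no floating-point values stored in the array.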

NEW ISSUE 13: Switch agent array to type int

In light of the above reasoning, I think I’ll plan to switch AGENTS to an int type, then see how this affects things. Using integers to define ‘genotypes’ that affect agent strategies would permit the use of bitwise operators to increase speed at a very computationally intense part of the model (genetic algorithm mutation and selection). The size of an int must be at least 16 bits in C, so a signed int is guaranteed to hold values up to \(2^{15} - 1 = 32,767\) – plenty, I would think, for coding a sufficient number of strategies. I’ll want to do a bit more digging to see how much this could be expected to speed up the genetic algorithm (see here). Of course, if the speed-up is trivial, then using double and columns affecting behaviour is probably just fine. But if speed is an issue, a vector of int values could really be better than several columns of double values; I’m just not sure what would have to be sacrificed yet. Quick random number sampling will be needed.
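
To make the idea of bitwise operations on integer ‘genotypes’ concrete, here is a small sketch using base R’s bitwise functions. The function names flip_bit and cross_over, and the choice of a 15-bit genotype, are assumptions for illustration only; the real gain in speed would come from doing the equivalent operations in C.

```r
# Hypothetical sketch: a strategy 'genotype' stored in the low 15 bits of an int
n_bits <- 15L;

# Mutation: flip the bit at position 'pos' (0 = least significant bit)
flip_bit <- function(geno, pos){
    return(bitwXor(geno, bitwShiftL(1L, pos)));
}

# One-point crossover: low 'pt' bits from parent a, remaining bits from parent b
cross_over <- function(a, b, pt){
    low_mask  <- bitwShiftL(1L, pt) - 1L;
    full_mask <- bitwShiftL(1L, n_bits) - 1L;
    high_mask <- bitwXor(full_mask, low_mask);
    return(bitwOr(bitwAnd(a, low_mask), bitwAnd(b, high_mask)));
}

flip_bit(geno = 5L, pos = 1L);            # 5 is 101 in binary; result is 111 (7)
cross_over(a = 32767L, b = 0L, pt = 8L);  # Low 8 bits of a, rest from b: 255
```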

Having second thoughts about binary encoding

I’m not entirely convinced yet, actually, that binary instead of real encoding is needed. One advantage of real encoding, besides that it fits a bit more easily into the current data structure I’m using, is that it might converge on optimal strategies sooner even if the bitwise calculations are faster (Salomon 1996). Note that phenotypes in binary encoding are affected by both the position and value of bits, whereas phenotypes in real encoding are only affected by the value of real numbers (Kumar 2013). There are some techniques to map binary values to real numbers, though I’ve not yet found anything comparing the efficiency of binary versus real encoding. Salomon (1996) argued that real encoding was the best choice for applying genetic algorithms to optimisation – I think this might be the way to go, though I’ll want to think about how crossing over and mutation will work efficiently. I’m not entirely sure I do want to finish issue 13. In the end, using int instead of double could cut the memory in half, but this would be almost useless for the AGENT array – if it could be done for the RESOURCES array, it might be more useful, but R doesn’t differentiate, so it really won’t matter that much, if at all.

REMOVING ISSUE 13: Convinced myself that this was a bad idea

Note that Balmann and Happe (2000) write that “population size usually ranges between 10 to 50”, though from a population-genetics perspective, this seems too small to me.

Update: 26 JAN 2017

Fleshing out the use of Genetic Algorithms for G-MSE

I’m becoming more convinced that some sort of genetic algorithm is the best way to model the strategies of all agents, including managers and stake-holders. Here is a rough overview of how I see the next step of the software development process:

  • Insert columns into the AGENT data frame that represent utility values associated with each type of resource. This will effectively quantify how much of each resource managers and stake-holders want. For example, while managers might prefer a balance of resources (perhaps the average of stake-holders?), stake-holders might prefer to maximise only one resource with little or no concern for another (or to actually prefer some resource quantities to be minimised). The utility values of each agent will be used as variables in a utility function, which will calculate agents’ satisfaction (or happiness or contentment) with a current situation of resource quantities (note: this utility function need not be linear – for some stake-holders, I’d expect it to be more log-linear, but it might be good to try different functions and ask real stake-holders what they think). Hence, a function calc_utility will be needed.

  • Insert another set of columns into AGENT that influences agent actions – how managers and stake-holders will do something in their environment. This can be thought of as analogous to genes affecting an organism’s phenotype in an evolutionary model, but will have different types of effects for agents:
    • Column values will directly affect manager decisions, establishing rules (i.e., the game) for stake-holders. A manager function will therefore be needed.
    • Column values will directly affect stake-holder actions – i.e., how they will play the game. A user function will therefore be needed.
  • The second set of AGENT columns affecting manager and stake-holder actions will be updated before every decision using a genetic algorithm. This will require a separate opt_utility function. This general function will work as follows:
    • Read in an agent (or a type of agent) and identify their utility values associated with each resource type (first new columns) and their actions (second new columns).
    • Read in the resources.
    • Read in the most recent observer estimates.
    • Read in the most recent game rules.
    • Use calc_utility to calculate the utility of the agent of interest.
    • Use the agent of interest to produce a new data array of 10 pseudo-agents of the same type with randomised action variables. Next:
      1. For 20 generations
      2. Have each pseudo-agent produce 100 offspring with random recombination among pseudo-agents, and random mutation, for each action variable.
      3. Use the manager or user function for managers or stake-holders (perhaps need an R and C version of these functions – C for here, R for later), respectively, to temporarily simulate each offspring’s decision if used in one or more previous time steps (e.g., by using the current AGENT values)
      4. Use calc_utility to find the utility associated with the simulated decision in step 3 – this effectively tests each pseudo-agent to see if their action variables are good at maximising utility.
      5. Grab the 10 pseudo-agents with the highest utility values and go back to step 2.
      6. After 20 generations, or earlier if mean utility values are no longer increasing sufficiently, stop and grab the pseudo-agent with the highest utility value – its action variable values then replace those of the original agent.
    • The above could be sped up by making one of the original pseudo-agents be the actual agent – assuming strategies won’t change much over time.
  • The above genetic algorithm can be used both for managers maximising utility through establishing game rules and for stake-holders maximising utility by affecting resources. The idea is to have the general opt_utility to optimise what an agent does to maximise their utility through the use of a general genetic algorithm (perhaps simulating human planning, if it were as good as adaptation by natural selection, which I don’t think it is).

  • This entire process will need to go into one C function for reasons of efficiency – we’re going to add some time onto these simulations, but I think it will be worth it provided we:
    • Consider some tricks for optimisation, such as using previous time step values as seeds in later time steps.
    • Make this process optional, with alternatives of specifying strategies a priori (perhaps empirically derived ones) and allowing the end user to input decisions as the simulation proceeds.
    • Are not too strict about convergence in optimisation – I don’t think we need to be, as we’re not really trying to model perfectly rational agents so much as intelligent agents.
  • So, perhaps, the manager.r function will take in all of the necessary information and send it to C, and then C will go through the entire process of potentially automating manager interpretation of observation data and decisions about game rules based on manager utility values. Other management options will of course be available.

  • Then, the user.r function will likewise take all of the necessary information and send it to C to go through the entire process of automating stake-holder interpretation of the manager’s rules and updated actions based on stake-holder utility values. Other user options will of course be available.

  • This removes the need for a specific game arena, games.R, because the game is defined by manager.r and effectively played by users in user.r. The novelty is that we’re using evolutionary game theory under the hood in both management and stake-holder actions to infer broader patterns about how cooperation and conflict might arise when all parties are acting according to their own interest.
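
The opt_utility steps outlined in the list above can be sketched in R as follows. This is only a rough illustration under stated assumptions, not the planned C implementation: the log-based form of calc_utility, the stand-in sim_action (used in place of the manager or user function), the quadratic action cost, and all default argument values are hypothetical. Parents are kept in the selection pool, which is a small departure from the outline but guarantees that the best strategy never gets worse.

```r
# Hypothetical utility: weighted sum of log-transformed resource quantities
calc_utility <- function(util_vals, res_quant){
    return(sum(util_vals * log(1 + res_quant)));
}

# Hypothetical stand-in for the user function: actions add to or remove
# from each resource quantity (quantities cannot go below zero)
sim_action <- function(actions, res_quant){
    return(pmax(res_quant + actions, 0));
}

opt_utility <- function(util_vals, res_quant, actions, n_keep = 10,
                        n_off = 100, gens = 20, mu_sd = 0.5, cost = 0.1,
                        tol = 1e-6){
    n_act   <- length(actions);
    n_tot   <- n_keep * n_off;
    fitness <- function(acts){
        new_res <- sim_action(acts, res_quant);
        return(calc_utility(util_vals, new_res) - cost * sum(acts^2));
    }
    # Pseudo-agents: the first is the actual agent, the rest are randomised
    rand      <- matrix(rnorm((n_keep - 1) * n_act), ncol = n_act);
    pop       <- rbind(unname(actions), rand);
    last_mean <- -Inf;
    for(gen in 1:gens){
        # Recombination: each action variable comes from a random parent
        parent <- sample.int(n_keep, size = n_tot * n_act, replace = TRUE);
        column <- rep(1:n_act, each = n_tot);
        off    <- matrix(pop[cbind(parent, column)], ncol = n_act);
        # Mutation: add random normal error to every action variable
        off    <- off + rnorm(n_tot * n_act, mean = 0, sd = mu_sd);
        # Selection: keep the n_keep strategies with the highest fitness
        pool   <- rbind(pop, off);
        fit    <- apply(pool, 1, fitness);
        best   <- order(fit, decreasing = TRUE)[1:n_keep];
        pop    <- pool[best, , drop = FALSE];
        # Stop early if mean fitness has stopped increasing
        new_mean <- mean(fit[best]);
        if(new_mean - last_mean < tol){
            break;
        }
        last_mean <- new_mean;
    }
    return(pop[1, ]); # Action variables of the best pseudo-agent
}

set.seed(2017);
util_vals <- c(1, -0.5);  # Wants more of resource 1, less of resource 2
res_quant <- c(50, 80);
start_act <- c(0, 0);
new_act   <- opt_utility(util_vals, res_quant, actions = start_act);
new_act;
```

Because the actual agent is one of the starting pseudo-agents and parents stay in the selection pool, the returned strategy is never worse than the agent’s current one under the assumed fitness function.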

I think this is getting on the right track, and I am starting to see how the code will look and run. We also might want to include a spatial component to all of this, affecting both manager and user actions. For example, perhaps some stake-holders can only have their utility functions affected by or act in resources within certain areas of the landscape.

Update: 25 JAN 2017

NEW ISSUE 12: Observe multiple times for density estimator

Currently, estimating total population size using a sub-sample of observed area and assuming that the density of this sub-sample reflects global density (method = case 0) only works when one sub-sample is taken. There are multiple ways of fixing this so that the population size estimate takes into account multiple sub-samples. It would be a good idea to think about the most efficient way to do this and program it into R (perhaps with tapply to start, but eventually in the manager.c function, maybe).
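
One possible way of pooling multiple sub-samples, sketched with tapply as suggested above. The data frame obs, its column names, and the assumption that every sub-sample covers the same area are hypothetical and do not reflect the actual G-MSE observation array:

```r
# Hypothetical observation records: one row per sub-sample
obs <- data.frame(time  = c(1, 1, 1, 2, 2, 2),
                  count = c(10, 14, 12, 9, 11, 13));
sub_area  <- 100;    # Area covered by each sub-sample (assumed equal)
land_area <- 10000;  # Total landscape area

# Pool sub-samples within each time step, then scale density to the landscape
tot_count <- tapply(X = obs$count, INDEX = obs$time, FUN = sum);
n_samples <- tapply(X = obs$count, INDEX = obs$time, FUN = length);
dens_est  <- tot_count / (n_samples * sub_area);
pop_est   <- dens_est * land_area;
pop_est;
```

Pooling counts before dividing by the total area sampled weights every sampled cell equally; averaging separate per-sample estimates would be an alternative if sub-sample areas differ.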

NEW ISSUE 11: Permanently move agents

Allow agents to move in each time step, permanently, in some way. This might be best done through the anecdotal function. As of now, they go back to their original place at the end of each time step, and it would be good to have an option to let them move all around the landscape.

Agent-based modelling in economics – potentially useful ideas

Phan (2003) briefly summarises the emerging (at least, at the time emerging) field of Agent-based Computational Economics, noting that agent-based models can complement mathematical theory in economics especially when equilibrium conditions cannot be easily computed or attained by agents. Relating agent-based models to cognitive economics, Phan (2003) notes that the latter “is an attempt to take into account the incompleteness of information in the individual decision making process”, which seems especially relevant to G-MSE. The program SWARM might be useful to explore – written in Java though. Software like SWARM, MODULECO, and CORMAS appear to have a similar interface to the one G-MSE has (or will have), but I think that writing G-MSE from the ground up was definitely the right choice. This makes G-MSE more targeted to a specific social-ecological problem, allowing it to be written in a way that is computationally efficient, but can also be accessible through a browser by end users without proficiency in R. Regarding efficiency, current simulation times for the model itself are: 100 time steps = 0.241 seconds, 1000 time steps = 3.179 seconds, and 10000 time steps = 27.740 seconds. I can’t imagine anyone would want simulations longer than 1000 time steps, but the efficiency allows many replicate simulations in a time frame that will not be an issue for serious research – especially if run in parallel. Things do slow a bit when more individuals are needed, but I’ve simulated 100 time steps with over 100000 individuals and found the simulation to take only 22.8 seconds. Memory might be an issue, but I’m currently storing entire resource and observation histories – an option to not do this would cut back massively.

Phan (2003) discusses how agents might optimise behaviour over the course of some number of iterations, which appears analogous to evolution of traits, except that it’s one individual essentially working through a trial-and-error process of finding the best behaviour to adopt to maximise some sort of utility function (in this case, profit). Beliefs are reported over time as numeric values that affect behaviour. Phan (2003) likewise considers the situation in which individuals buy or don’t buy something to maximise a surplus via a maximisation function that multiplies a binary variable by the difference between the benefits and costs of a good.
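
The buy-or-don’t-buy decision described above can be written compactly. This is only a sketch with hypothetical symbols (none are taken from Phan 2003): \(\omega \in \{0, 1\}\) is the binary decision, \(b\) is the benefit of the good, and \(c\) is its cost.

\[ \max_{\omega \in \{0, 1\}} \; \omega \, (b - c) \]

Under this form, the agent buys (\(\omega = 1\)) whenever \(b > c\), gaining a surplus of \(b - c\), and otherwise does not buy (\(\omega = 0\)), leaving a surplus of zero.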

Marks (1992), in a now fairly dated paper, looked at modelling generalised prisoner’s dilemmas, which involve continuous rather than discrete strategies, and discusses solutions for optimal strategies, including evolutionarily stable strategies as pioneered by John Maynard Smith. The general idea in Marks (1992) overlaps with G-MSE, in that there are agents (perhaps rational agents) attempting to maximise something through interaction. Marks (1992) first introduces the oligopoly problem, stating, “with a small number of competitive sellers, what is the equilibrium pattern of price and quantity across these sellers, if any?” The analogy to managers and stake-holders would seem to be appropriate, perhaps: given a small number of stake-holders, what is the equilibrium value of a set of resources (including population size, farm yield, etc.), if any? To do this we need to understand the agency of the stake-holders and the rules of the game as set by managers.

Marks (1992) considers an economic model of a generalised prisoner’s dilemma with three players, considering the genetic algorithm, a machine-learning technique that makes it unnecessary for a human being to consider a strategy (i.e., the strategies are derived from the conditions of the model). This is the kind of avenue that we want to go down. In fact Marks (1992) puts it quite clearly in the block below:

“Mathematically, the problem of generating winning strategies is equivalent to solving a multi-dimensional, non-linear optimization with many local optima. In population genetic terms, it is equivalent to selecting for fitness”

Hence the overlap between evolutionary game theory and adaptive dynamics models with models that produce optimal strategies for maximising utility in economic situations appears to be quite large, as presumed. Therefore, using evolutionary game theory would appear to be a reasonable way of selecting stake-holder strategies in G-MSE. Delving a bit more into this literature might make the jargon clearer, and identify any subtle differences in the maths or algorithms, though. And I’m still not sure how this fits in with machine learning (e.g., if machine learning is just adaptive dynamics under the hood – a quick search doesn’t give an answer to this, so I think it will be necessary to do a bit more reading to understand the two; Marks (1992) differentiates, “[…] advent of [Genetic Algorithms] (and machine learning) means […]”). Here is an interesting example from a course in machine learning, where the instructor first looks at genetic algorithms – the instructor describes them as the “least practical” of machine learning algorithms in the course, but the instructor is also an engineer, so perhaps they’ll be more practical (probably more general, if I’m thinking correctly) for solving G-MSE type problems.

Perhaps one C function (e.g., adaptive.c) could go through a learning process of maximising utility for each type of agent (doing this for each individual agent might get intense, depending on how many agents there are). The rules of the game could be passed from game.c to adaptive.c, where adaptive.c also takes in the array of AGENTS. From the starting point of each agent’s traits, agents within the program could reproduce themselves with mutation, and selection could minimise some cost function until some sort of optimum is achieved that results in agent trait values that have the highest return on utility. The program adaptive.c could therefore take in AGENTS:

IDs   type1   type2   see2   see3   G1    G2     Gn
0     0       0       0      0      0     0      0
1     1       0       0      0      0.2   1.1    -0.1
2     1       0       0      0      1.0   -0.1   -2.7
N-1   2       0       0      0      0.4   -1.1   0.9
N     2       0       0      0      2.1   3.0    0.5

Where the table above is the data frame of AGENTS as it currently exists with additional columns G1 to Gn that could hold real numbers that affect agent behaviour. A dummy data frame could be created that allows for evo_time generations of reproduction with mutation and selection for minimising a cost function in an attempt to find appropriate values affecting components of an agent’s strategy. I’m not sure how long such an algorithm would take, but I suspect that it could be optimised to not be painfully long – different criteria could be set, e.g., to allow for a maximum number of evolving generations (the aforementioned instructor suggests 1000) or some convergence criterion. Essentially, each agent or type of agent would go through a process of learning an optimal strategy by creating a lineage of strategies, the descendants of which would be selected by strategy performance. Note that given a convergence criterion, strategies might not need to evolve much in each time step of G-MSE – the best strategy might be stable over time in some situations (and if we don’t want strategies to change over time steps, the question of optimal strategy could be solved when initialising agents – still the idea of allowing dynamic strategies seems interesting, and might be important if management is also changing).

While for some simulations, we’ll want to take the time to allow evolution of optimal strategies, in others we might even embrace imperfect strategies evolving as a consequence of short evolution times – this might mimic the limited time that stake-holders have to consider a particular problem.

Update: 24 JAN 2017

A general summary of G-MSE as it exists at the moment

  • We have a working population model that is individual-based and allows for multiple types of individual movement, birth, and death.
  • We have an observation model with four types of possible observation
  • We have a working option to allow end-user dynamic inputs while the simulation is going on, ‘playing’ as a manager/stake-holder
  • The code is flexible enough that we should be able to add to it as need be without restructuring everything.
  • The code appears to be bug-free; it doesn’t crash when used correctly (though some error messages could be added)
  • The code is efficient: computationally intense tasks are passed to C, while tasks done in R are now coded with proper memory management in mind

A summary of some of the challenges of putting the ‘G’ in G-MSE

  1. The possible number of game categories increases exponentially with the number of actions, meaning that game solutions are only available for simple cases (Adami et al. 2016). How this affects G-MSE will depend on how many options are available to agents.
  2. Payoffs in G-MSE will almost certainly be asymmetric, meaning that different agents might perceive themselves to be playing different games. This is a consequence of what evolutionary game theorists would refer to as “genotype asymmetry” – as in, asymmetry caused by something that is inherent to the agent itself (as opposed to its location).
  3. Resources in G-MSE might be asymmetric, meaning that some agents (stake-holders) have an inherent advantage over others, allowing them to dominate interactions.
  4. Payoffs are expected to be stochastic – or, rather – there will be some variation around expected payoffs that might affect agent decision making and management outcomes.

A summary of some ideas for moving forward with G-MSE

  1. As an option, write up the model to allow for participatory agent-based modelling. This would be beneficial in allowing experimentation and therefore having some empirical data on how stake-holders might make decisions, which could then parameterise a decision-making agent in the model. The downside is that decisions would not be well-grounded in theory – we wouldn’t have a clear idea of why they were being made based on game theory. Hence, some sort of game-theoretic derivation of decision rules is needed.
  2. Evolutionary game theory might be useful in deriving strategies for agents to play. I can imagine borrowing from the game-theoretic or adaptive dynamics literature to allow strategies to optimise over time to maximise agent payoffs. This might be challenging if optimal strategies depend on game history (i.e., if we allow for strategies based on extensive-form games). Perhaps what we want instead is something more like a chess engine (e.g., Stockfish, which is publicly available, though it is written in C++; some other engines are written in C). The general idea of these engines is to consider the rules and look forward to evaluate options, pruning branches of moves that are evaluated as bad, as solving for the best move is impossible given the sheer number of possible positions.
  3. Incorporate economic game theory, and more specifically agent-based computational economics, which incorporates utility functions to allow agents to make decisions. This would link nicely to the more social part of the social-ecological modelling, and is a bit away from my comfort zone, so it’s probably good to consider in more detail. Perhaps some game-theoretic model that cleverly incorporates both evolutionary and economic applications of game theory could be good.
  4. Of slightly lesser concern, but worth mentioning, maybe consider the potential scope for applying complexity theory to management. I’ve been particularly thinking about how biological degeneracy might apply to social-ecological modelling (among other things), leading to more robust management decisions by explicitly considering the possibility of multiple components of management fulfilling overlapping functions, hence leading to greater stability or robustness. Degeneracy is defined by dissimilar components of a system performing similar functions (note, different from redundancy, which implies complete interchangeability). Degeneracy is ubiquitous in complex systems, but its importance has been largely overlooked – I wonder if there is a case for thinking about it here, particularly with multiple agents and scales.

Short-term plan

I’m going to finish developing thoughts on evolutionary game theory, then move onto looking at game theory from an economic perspective. I think the biggest thing to consider on the immediate horizon is what kind of approach will be used to simulate agents (stake-holders) playing games and making decisions. Once this is clear, the details can follow. Some sort of utility function will be used. Of particular consideration is how much complexity should be incorporated – or, perhaps – how much mechanistic detail.

Leombruni and Richiardi (2005) make some interesting points regarding the use of mathematical versus agent-based models, noting the tractability issues with mathematical (and game-theoretic) models as things become more complex due to unique individuals needing to be represented.

Quick efficiency fix

The best way to manage memory in R is going to be by avoiding rbind altogether and working instead with lists, as made very clear by the following quick experiment in scratch.r:

################################################################################
# Testing list versus array efficiency

# ARRAY FIRST:
sam <- sample(x = 1:100, size = 14000, replace = TRUE);
dat <- matrix(data=sam, ncol=14);

obs <- NULL;

proc_start <- proc.time();

time <- 1000;
while(time > 0){
   obs   <- rbind(obs, dat);
   time  <- time - 1;
}

proc_end   <- proc.time();
time_taken <- proc_end - proc_start;
# TIME TAKEN: 14.09 seconds

# NOW LIST:
sam <- sample(x = 1:100, size = 14000, replace = TRUE);
dat <- matrix(data=sam, ncol=14);

obs <- list();

proc_start <- proc.time();

time <- 1000;
i    <- 1; # Index of the next list element to fill
while(time > 0){
    obs[[i]] <- dat; # Store the matrix as a new list element
    i        <- i + 1;
    time     <- time - 1;
}

proc_end   <- proc.time();
time_taken <- proc_end - proc_start;
# TIME TAKEN: 0.005 seconds

################################################################################

The output being deposited into a list is much, much faster. Enough to make me want to fix this immediately. Doing so was trivial – it was just a matter of replacing RESOURCE_REC <- rbind(RESOURCE_REC, RESOURCES) with RESOURCE_REC[[time]] <- RESOURCES, then editing the plotting functions accordingly given the new data type. The result is that simulations are now much faster, especially when simulating many time steps. One hundred time steps used to take 10-12 seconds for some observation times – they now all take under a second. For more time steps, the efficiency difference would grow further, because each call to rbind copies the entire accumulated record, so the total cost grows roughly with the square of the number of time steps. The massively increased efficiency occurs because R no longer allocates a whole new massive chunk of memory for each new recorded data frame – it just adds a new element to the list, leaving the previously recorded data in place.
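
If a single array of the full history is still needed at the end of a simulation (e.g., for plotting), the list can be bound together once rather than at every time step. A minimal sketch, in which the object names and dimensions are hypothetical:

```r
# Hypothetical record: the same small matrix stored at each of five time steps
dat     <- matrix(data = 1:28, ncol = 14);
n_steps <- 5;
rec     <- vector(mode = "list", length = n_steps); # Allocate the list once

for(i in seq_len(n_steps)){
    rec[[i]] <- dat; # Record the current time step
}

# Bind all recorded time steps together in a single call at the end
all_rec <- do.call(what = rbind, args = rec);
dim(all_rec); # 10 rows (5 time steps x 2 rows each) and 14 columns
```

This keeps the per-time-step cost low while still allowing code that expects one large array to work after the simulation has finished.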

CONCLUSION: THE TIME IT TAKES TO RUN 100 TIME STEPS HAS DECREASED BY AN ORDER OF MAGNITUDE BY SWITCHING FROM DATA FRAMES TO LISTS IN R (NOW LESS THAN 1 SECOND).

Note that plotting still happens slowly, deliberately, because we’re putting the system to sleep for a tenth of a second in each time step to make the animation smooth. When plotting is turned off, this no longer happens.

Update: 23 JAN 2017

Proof of concept: Interactive user input as a stake-holder

The code below runs the gmse program in a way that is interactive. I have run time steps, and specified that the hunting begins in time step 95.

> sim <- gmse( observe_type  = 0,
+              agent_view    = 10,
+              res_death_K   = 400,
+              plotting      = TRUE,
+              hunt          = TRUE,
+              start_hunting = 95
+ );

This produces the following output. When prompted by the line “Enter the number of animals to shoot”, I have typed in a number and hit enter accordingly.

Year:  95
The manager says the population size is  181
You observe  11  animals on the farm
Enter the number of animals to shoot
10

Year:  96
The manager says the population size is  408
You observe  11  animals on the farm
Enter the number of animals to shoot
10

Year:  97
The manager says the population size is  272
You observe  6  animals on the farm
Enter the number of animals to shoot
10
You can't shoot animals that you can't see
6  animals shot

Year:  98
The manager says the population size is  226
You observe  10  animals on the farm
Enter the number of animals to shoot
0

Year:  99
The manager says the population size is  294
You observe  9  animals on the farm
Enter the number of animals to shoot
5

The output of this also shows the spatial distribution of resources and a population graph over time. My hope was to allow the gmse.so file to be sourced directly from a link so that it could be run by anyone remotely, but I think that this will take a bit more work – worth keeping in mind for later.

Population dynamics (black line) and manager estimate of population size (blue line) over time in a simulation in which the user can act as a stake-holder and shoot animals

I am still trying to get a clear picture of how to incorporate management, user, and game-theoretic modelling components. Given uncertainty in all of these components, some unified approach would seem beneficial. Franco et al. (2016) have recently introduced a comprehensive approach to evaluate effects of disturbance on coral reefs using a Bayesian Belief Network (BBN). This approach “offers a methodological framework to address uncertainty.” It requires some defined outcome state, the probabilities of realisation of which are calculated. Use of BBNs requires a directed acyclic graph and conditional probability tables. It’s not entirely clear to me how BBNs would be incorporated into the G-MSE simulations, except maybe as a type of observation model? With the simulation, we can look at causality directly and thereby quantify direct and indirect effects, and measurement error. It could, however, be useful to know how well BBNs perform using simulated populations, simulated observational data, and appropriate analysis based on BBNs, as would be used on empirically derived data.

Update: 22 JAN 2017

For coauthors, add the G-MSE files onto a public Dropbox so that they can be sourced and run remotely. There are also some useful resources for embedding R in a website. This might be faster than using Shiny, at least at first, so it could be useful for initial demonstrations. It might be useful to show a prototype of G-MSE, or what it might be:

## [1] "Managers estimate the population size is 4230"
## [1] "You encounter 35 animals around your farm"
## [1] "Estimated loss of yield is at 5%"
## [1] "Enter how many animals you intend to hunt"

Demonstrating this (and it would be quick to implement) might be useful for showing how management and games work.

Update: 20 JAN 2017

Side note about computation efficiency

Note that it would really be faster to convert to a list type in R if anything computationally intense needs to be done (e.g., binding rows). C does not appear to let me read in a list via .Call, only a vector, so it’s worth thinking later about whether doing some things on the R side will be faster:

  1. Using the returned arrays from C
  2. Changing arrays to lists and then using them in R
  3. Making new C functions to do some standard R tasks faster (unlikely to work well)

Updated scratch.R to show how option 2 could work, though the change itself might be less efficient than binding or other operations.

Issues related to agent-based complex modelling of human decisions

An (2012) reviews humans as agents in agent-based models of social-ecological systems. An (2012) ties this in with complexity theory, and distinguishes agent-based from individual-based models in a useful way – with agent-based models being defined more by attention to decision making processes (as in models of human behaviour). An (2012) asks,

  1. What methods, in what manner, have been used to model human decision-making and behavior?
  2. What are the potential strengths and caveats of these methods?
  3. What improvements can be made to better model human decisions in coupled human and natural systems?

An (2012) reviews nine different types of decision models, and notes that different types of decision models can be mixed and matched, as we’ll likely need to do for G-MSE. I’m not sure that we can assume that stake-holders are the same types of decision-makers. For example, I suspect that farmers might be better represented by a microeconomic model of decision making, with a focus on maximising some sort of revenue or yield. An (2012) notes the use of utility functions here (seeming to link with some of my earlier thoughts), including one in which ecological indicators are included in place of just money (Nautiyal and Kaechele 2009). Apparently, econometric work by McFadden (1973) is foundational to looking at decisions based on utility, modelling decisions as a probability of an agent choosing an option. An (2012) notes that decisions are unlikely to be completely rational, and humans will tend to seek “satisfactory rather than optimal utility”.
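
One common way to turn utilities into probabilities of choosing an option is a logit rule, the kind of discrete-choice model usually associated with McFadden (1973). The sketch below is an illustration only: the function choice_prob, the option names, and the utility values are hypothetical, and whether this functional form suits G-MSE is an open assumption.

```r
# Hypothetical sketch of a logit choice rule: options with higher utility are
# chosen with higher probability, but not with certainty.
choice_prob <- function(utilities){
    # Subtracting the maximum avoids numerical overflow in exp()
    exp_util <- exp(utilities - max(utilities));
    return(exp_util / sum(exp_util));
}

opt_util <- c(cull = 1.0, scare = 0.5, nothing = 0.0); # Hypothetical utilities
probs    <- choice_prob(opt_util);
probs;

# Simulate one decision using the choice probabilities
set.seed(2017);
sample(x = names(opt_util), size = 1, prob = probs);
```

Because choices are probabilistic, an agent will sometimes pick an option that does not maximise utility, which fits with the point that decisions are unlikely to be completely rational.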

A second of the nine types of decision models includes the psychosocial and cognitive models, which attempt to model individuals’ thoughts based on beliefs and goals – institutions can also be modelled this way, though we might think of institutions as collections of the same type of individual for the purposes of G-MSE coding.

One type of modelling that could be especially interesting is what An (2012) defines as “participatory agent-based modelling”, wherein real stake-holders tell the modeller what they would do under some set of conditions, then the model runs with those decisions. This has been used, apparently, in an agricultural setting (Naivinit et al. 2010), and would be a very interesting addition to G-MSE. If we could have an option for letting a user take over the role of an agent in the model and play against a computer, it could be interesting – though I’d tend to still want to develop some game-theoretic algorithm that grounds predictions of stake-holder behaviour, rather than relying solely on empirically derived data (i.e., asking people what they would do). This could be accomplished in a couple of ways, in principle – one being through the use of a C standalone program (i.e., not linking with R) that prompts the user for input using the scanf function and repeatedly updates the simulation with information in every cycle of the G-MSE loop. The same effect can be accomplished in R with the following code as an example of the concept:

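act_agent <- function(times){
    while(times > 0){
        # Prompt the user for a decision in this iteration
        cat("\n\n\n How many geese do you shoot? \n\n");
        # Read one line of input and convert it to a number
        shot_char   <- readLines(con=stdin(),1);
        shot_num    <- as.numeric(shot_char);
        # Gross production is random; each goose shot costs two units
        gross_prod  <- rpois(n=1, lambda=100);
        net_prod    <- gross_prod - (2 * shot_num);
        cat("\n");
        output      <- paste("Net production = ", net_prod);
        print(output);
        # Count down the remaining iterations
        times       <- times - 1;
    }
}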

If you read the function into R, then run it (e.g., act_agent(times = 2)), it will ask for input over times iterations, prompting once per iteration of the while loop. It would be nice to have an option in G-MSE to allow:

  1. Users to let the program run with stake-holders simulated by well thought out utility functions applied to a game.
  2. Users to interact with the program in each time step, such that every time an agent needs to make a decision, the user is prompted to do so for one type of agent.
  3. Multiple users prompted to enter in decisions, simulating a long history of real-world actors making choices in a simulated game.
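For comparison with the R function above, the standalone C alternative mentioned earlier might look something like the sketch below. This is only an illustration under stated assumptions: the function names are hypothetical, fscanf is used in place of scanf so that input can come from any stream (passing stdin gives the same behaviour as scanf), and gross production is passed in as a fixed number rather than drawn from a Poisson distribution.

```c
#include <stdio.h>

/* Net production after shooting; the factor of 2 mirrors the R example */
int net_production(int gross_prod, int shot_num){
    return gross_prod - (2 * shot_num);
}

/* Prompt for a decision 'times' times, reading from 'in' (normally stdin).
 * gross_prod is fixed here (assumption); the R example draws it at random.
 * Returns the number of decisions that were read successfully. */
int act_agent(FILE *in, int times, int gross_prod){
    int shot_num, decisions;
    decisions = 0;
    while(times > 0){
        printf("\n How many geese do you shoot? \n");
        if(fscanf(in, "%d", &shot_num) != 1){
            break; /* Stop on end of input or on non-numeric input */
        }
        printf("Net production = %d\n", net_production(gross_prod, shot_num));
        decisions++;
        times--;
    }
    return decisions;
}
```

Calling act_agent(stdin, 2, 100) from a main function would prompt twice at the command line, much like act_agent(times = 2) in R.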

All of these would be fun, and An (2012) notes that they are often quite useful. Ideally it would be nice to make the program more user-friendly than a command line interface, but that seems like a concern for a version 2.0, after an initial version has been released. More helpfully, using some sort of loop could make for easy input of the R options in the gmse function – it could ask, in plain language, for users to insert the numbers that are currently only input within gmse() itself (e.g., gmse(time_max = 100)).

It’s possible that we could develop a type of rudimentary artificial intelligence by collecting data on user decisions (i.e., make a ‘bot’ that mimics human decisions). For example, we could have 100 people act as agents in G-MSE, collect data on the decisions that they make when trying to act like a stake-holder, then construct an algorithm based on real user decisions in different situations (alternatively, or in addition, we could also look at actual past decisions from the case studies to make an algorithm). This could be an interesting approach, albeit a somewhat atheoretical one – it doesn’t excite me quite as much, but it might be worth considering because the end result might predict human behaviour better than theory-driven approaches (as humans don’t always act rationally or think things through carefully – I don’t think a citation is needed for this; it’s 20 JAN 2017, and the current time is 17:00 GMT, or 12:00 EST). It could also be interesting to compare different types of approaches (i.e., have a theory-based approach and an empirically-based approach option). An (2012) warns though that ‘’Even though also based on data, researchers usually have to go through relatively complex data compiling, computation, and/or statistical analysis to obtain such rules’’. An (2012) also notes that this kind of data collection does not necessarily identify why decisions are being made. Hence, I do think game theory will be absolutely important, with agents using underlying utility functions to maximise their own utilities as a consequence of games.

Update: 19 JAN 2017

Some notes on the asymmetric nature of stake-holder games

Games between stake-holders, modelled by agents in G-MSE, are typically, if not always, going to be asymmetric. This means that the stake-holders are distinguished by more than their strategies – they are likely to have their own unique payoffs defined by their identities (e.g., as a conservationist, a farmer, etc.). It would seem as though the only way around this – if it’s even possible – might be to make identity part of the game itself. In other words, let agents attempt to maximise some general payoff by deciding to take on a particular role, and then a strategy given their chosen role. It’s an interesting thought, but I don’t think it makes much sense for the practical application of G-MSE. In the context of the games that we’re interested in, stake-holders effectively are conservationists, farmers, hunters, etc. (or some mixture of these roles). Hence, I think we need to work with the idea that the games our stake-holders play and that G-MSE will model are going to be asymmetric.

Maynard Smith and Parker (1976) outlined three specific ways that games might be asymmetric (they were thinking about animal contests, but the general principles apply):

  1. Pay-off asymmetry: Different players might stand to gain different amounts in the game – e.g., perhaps mutual cooperation returns a higher benefit for one player than another, or defection on the part of one player has a more negative effect on its opponent than vice versa.

  2. Resource asymmetry: Intrinsic differences between players might give one player an inherent advantage, allowing them to dominate in an interaction (i.e., there might not be much of a conflict because one side can always win).

  3. Uncorrelated asymmetry: Discussed earlier; Maynard Smith and Parker (1976) define this as asymmetries that ‘’do not affect either the payoffs or the’’ resources that might give one player an intrinsic advantage.

The authors offer some general conclusions about asymmetric games with unequal payoffs, but these are really more about encounters of conflict, and perhaps not so applicable to G-MSE. They state that, where payoffs are unequal but all parties have access to information, it is best to ‘’play high when you have more to gain and zero when you have less to gain’’. In other words, if there is a lot to gain by sticking it out and fighting hard in an interaction, do it – if there’s not much to gain, then back off. Such contests are the central focus of Maynard Smith and Parker (1976), but the general conclusion that ‘’mixed strategies will be the exception’’ when contests are asymmetrical would seem to apply more broadly. Given the many ways that a game can be asymmetrical – rather, that a symmetrical game could be changed to asymmetrical – it would seem likely that there are more ways of causing a strategy to become pure than not pure, because there are more ways of adjusting payoffs to make one strategy the clear winner. This could simplify the game theory in G-MSE, in a sense, if mixed strategies do not require much consideration.

McAvoy and Hauert (2015) recently emphasised the importance of asymmetry in evolutionary games, noting that ‘’cooperation may be tied to individual energy or strength, which is, in turn, determined by a player’s role’’. This would seem to apply to social-ecological conflicts as well – cooperation might reasonably be tied to the power (economic, political, etc.) of stake-holders, meaning that it might be important to take this into account in G-MSE modelling. For something like the Prisoner’s dilemma, we can represent an asymmetry using subscripts, so the standard game would be represented by a payoff matrix,

\[ \left( \begin{array}{ccc} & C & D \\ C & R, R & S, T \\ D & T, S & P, P \end{array} \right). \]

Where the above satisfies: \(T > R > P > S\). An asymmetric game can be represented by,

\[ \left( \begin{array}{ccc} & C & D \\ C & R_{i}, R_{j} & S_{i}, T_{j} \\ D & T_{i}, S_{j} & P_{i}, P_{j} \end{array} \right). \]

The above is for two different types of players, \(i\) and \(j\). Note that I tried working through the same basic concept with a bit different notation earlier on, with each matrix element being defined by a utility function that is unique to each agent type. In the code, this will all be defined by agent types and their respective traits (columns in the agent_array), but it’s good to link this up with theory and the general properties of asymmetric games.
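As a rough illustration of how the asymmetric matrix above could be stored per agent type, the sketch below keeps one set of payoffs for each type and looks up what a focal agent receives given its own action and its opponent's. The struct and function names are hypothetical and are not taken from the G-MSE code, where these values would instead be traits (columns) in the agent_array.

```c
/* Payoffs for one agent type in a two-action game (hypothetical struct) */
typedef struct {
    double R; /* Reward: both players cooperate            */
    double S; /* Sucker: focal cooperates, other defects   */
    double T; /* Temptation: focal defects, other cooperates */
    double P; /* Punishment: both players defect           */
} payoffs;

/* Payoff to the focal agent; actions are 0 = cooperate, 1 = defect */
double payoff_to(const payoffs *own, int own_act, int other_act){
    if(own_act == 0){
        return (other_act == 0) ? own->R : own->S;
    }
    return (other_act == 0) ? own->T : own->P;
}
```

With separate payoffs for types \(i\) and \(j\), the matrix element for \(i\) cooperating and \(j\) defecting is recovered as payoff_to(&p_i, 0, 1) for \(i\) (giving \(S_{i}\)) and payoff_to(&p_j, 1, 0) for \(j\) (giving \(T_{j}\)).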

McAvoy and Hauert (2015) go into the Prisoner’s Dilemma and Snowdrift games given environmental and genotypic asymmetry:

  • Environmental asymmetry refers to asymmetry in payoff matrices caused by differences in individual location.
  • Genotypic asymmetry refers to asymmetry in payoff matrices caused by differences in individual genotype, which we can probably think about as differences in type that are intrinsic to the individuals and are not spatial (e.g., stake-holder roles).

Such asymmetries can complicate evolution of strategies, and, perhaps more relevant for G-MSE, can cause different types of agents to experience different types of games as a result of asymmetry:

‘’[…] Thus, based on the social dilemma implied by the ranking of the payoffs, a player who incurs a cost of \(c_{1}\) for cooperating is always playing a Snowdrift Game while a player who incurs a cost of \(c_{2}\) is always playing a Prisoner’s Dilemma. It follows that ecological asymmetry can account for multiple social dilemmas being played within a single population, even if the players all use the same set of strategies’’ (McAvoy and Hauert 2015 p. 9).

The above quote is with respect to asymmetric payoffs caused by space, but the point is that the asymmetry of the payoff matrix can lead to different players experiencing different games and therefore having different – potentially conflicting – strategies.

We might also apply the concept of genotypic asymmetry with the process of cultural updating, which occurs when the ‘genotypes’ (perhaps stake-holder types) do not change, but the strategies of players can be updated over time. Note that while genotypic asymmetry can be reduced to a broader symmetric game given genetic updating (i.e., births and deaths of players of particular types), this is probably not applicable to G-MSE.

Update: 18 JAN 2017

Some thoughts on the application of game theory

I’m trying to step back a bit to consider the manager and user models, which will both affect and/or be affected by the game-theoretic component of the model. I’ve considered how the game-theoretic component will fit into G-MSE more generally, and also a bit of how it might be implemented and applied in the context of stake-holder actions. Overall, this will require three c files to be closely integrated, but the application (perhaps even development, if necessary) of game theory requires a lot of thought.

The model will be more general if we allow agents to take any number of actions, but the number of games that are possible increases very rapidly with the number of different actions that agents can take (Zeeman 1980). If only two actions are possible (e.g., cooperate and defect), then there are only four types of games that can be played (Prisoner’s dilemma, Snowdrift, Anti-coordination, and Harmony). The number of games increases to 20 for three actions and 228 for four actions (Adami et al. 2016). If we want the software to somehow identify the type of game being played – or rather, if game type identification is to be an essential part of the program – then agent actions will probably need to be limited (there is, of course, always the option to identify games iff there are sufficiently few actions). If most conflicts can be described by a small number of types of agents with a small number of types of actions (and this seems reasonable, perhaps, especially if we think of actions qualitatively), then constraining the software to such cases might be preferable (at least, as a starting point). The benefit is that we might then make clearer predictions for management, e.g.: Right now, stake-holders are playing a Snowdrift game, but by adopting an alternative management decision, they will transition to playing Harmony.

This is appealing, but I think it also relies on payoff matrices being symmetric, meaning that players are distinguished by their strategies and nothing else (McAvoy and Hauert 2015). In the types of games that interest us, this almost certainly won’t be true. The games we’re interested in at ConFooBio will typically be characterised by uncorrelated asymmetry; that is, situations in which agents know that they are of a certain type and will receive payoffs associated with that type of agent. Hence, the payoff structure might look like a Prisoner’s dilemma to one stake-holder, but Harmony to another (i.e., the optimal strategy is always cooperate for one, but always defect for another because each knows the type of agent that they are and how payoffs differ between types).
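To make the idea of two stake-holders experiencing different games concrete, the sketch below classifies the two-action game that a single agent type faces from its own four payoffs (R, S, T, P as in the Prisoner’s dilemma notation). It uses one common convention based on two comparisons, T versus R and P versus S; the category names are an assumption and need not match the labels used by Adami et al. (2016), and ties are treated as ‘not greater’.

```c
/* Game type as seen by one agent type (hypothetical labels) */
typedef enum {
    DEFECT_DOMINANT,    /* T > R and P > S: Prisoner's dilemma if R > P */
    ANTI_COORDINATION,  /* T > R and S >= P: Snowdrift-like             */
    COORDINATION,       /* R >= T and P > S                             */
    COOPERATE_DOMINANT  /* R >= T and S >= P: Harmony-like              */
} game_type;

/* Classify a two-action game from one agent type's own payoffs */
game_type classify_game(double R, double S, double T, double P){
    int tempted = T > R; /* Better to defect against a cooperator */
    int fearful = P > S; /* Better to defect against a defector   */
    if(tempted && fearful){
        return DEFECT_DOMINANT;
    }
    if(tempted){
        return ANTI_COORDINATION;
    }
    if(fearful){
        return COORDINATION;
    }
    return COOPERATE_DOMINANT;
}
```

For example, an agent type with payoffs (R = 3, S = 0, T = 5, P = 1) is classified as DEFECT_DOMINANT, while a type with (R = 4, S = 2, T = 3, P = 1) is COOPERATE_DOMINANT, so the two types would be playing different games within the same interaction.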

I’m starting to work through these ideas with an initial focus on evolutionary games, as this is the application of game theory with which I’m most familiar, and because I think some of the general developments of evolutionary game theory are probably applicable for our purposes. I’ll also need to read more widely into economics and the social sciences, but some recent work by Adami et al. (2016) and McAvoy and Hauert (2015) seems relevant.

Adami et al. (2016) argue that the optimal strategies predicted by simple mathematical games are unlikely to be very useful for predicting agent actions given the complexities associated with real-world decisions; such complexity notably includes stochasticity, which applies to games among all kinds of agents from ‘’microbes to day traders’’ (Adami et al. 2016). Stochasticity can affect the stability of strategies (see also Adami and Hintze 2013). If strategies are conditional or based on memory of previous encounters, then the number of traits (Adami et al. 2016 assume loci, modelling genetics, but the same applies to agents making decisions) required to model decisions increases rapidly – 21 total traits are needed for conditional expression of strategy when agents can remember the previous two games. In practice, I suspect that there is some helpful way to simplify this – perhaps not every detail of the history of interactions and possible conditions really is needed to (or even would be expected to) model stake-holder behaviour. Instead, I suspect that game history could be boiled down into one or two representative variables that, among other things, are likely to influence agent behaviour. Agents are perhaps better thought of as modelling stake-holders guided primarily by heuristics rather than optimally rational behaviour. Hence the agent_array might better be thought of as containing variables underlying human values and traits in the context of games rather than as solutions to games. A couple of recent and potentially relevant papers on decision rules in complex environments include Fawcett et al. (2014) and McNamara et al. (2014). Adami et al. (2016) conclude that ‘’[w]hile evolutionary games can be described succinctly in mathematical terms, they can only be solved exactly for the simplest of cases’’.

Adami et al. (2016) were specifically considering games in an evolutionary context, but I don’t think that their conclusion is limited to evolutionary game theory. In the case of decision-making stake-holders, the complexity associated with stochasticity and uncertainty, the possibility of more than two actions and payoffs, and the asymmetry of payoff matrices would all seem to contribute to the difficulty or impossibility of finding exact solutions. Hence, when scenarios are complex in G-MSE (as we probably need them to be), it is unlikely that analytic solutions will be of much use. However, stake-holders won’t evolve in the same sense as biological organisms, so some techniques used in evolutionary game theory will be unavailable – or have to be modified. It might be worth thinking more about identifying the consequences of practical or observed strategies, or types of strategies, rather than trying to somehow solve for the best strategies. The Axelrod experiments kind of did this before a lot of complex techniques became available to analyse evolutionary games. Users proposed strategies, which were put into a tournament – the point wasn’t to solve the iterated Prisoner’s dilemma so much as to explore different strategies for playing the game.

In this browser app, you can play the iterated prisoner’s dilemma against ‘Lucifer’, an automated agent that responds to your decisions.

Update: 17 JAN 2017

NEW ISSUE 9: Observation Error

It would be useful to incorporate observation error into the simulations more directly. This could be affected by one or more variables attached to each agent, which would potentially cause the mis-identification (e.g., incorrect return of seeme) or mis-labelling (incorrect traits read into the observation array) of resources. This could be done in either of two ways:

  1. Cause the errors to happen in ‘real time’ – that is, while the observations are happening in the simulation. This would probably be slightly inefficient, but have the benefit of being able to assign errors specifically to agents more directly.

  2. Wait until the resource_array is marked in the observation function, then introduce errors to the array itself, including errors to whether or not resources are recorded and what their trait values are. These errors would then be read into the obs_array, which is returned by the function.
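The second option could be sketched roughly as below: errors are applied to an array after resources have been marked, both to whether a resource is recorded and to one of its trait values. This is only a sketch under assumptions; the function name, the column arguments, and the two error parameters are hypothetical, and the uniform error on the trait is an arbitrary choice made for illustration.

```c
#include <stdlib.h>

/* Uniform random number in [0, 1) using the C standard library */
static double unif01(void){
    return (double) rand() / ((double) RAND_MAX + 1.0);
}

/* Apply errors to an array in which observed resources are already marked.
 *   obs:       rows are resources, columns are traits
 *   rows:      number of resources (rows) in obs
 *   mark_col:  column holding the 'observed' mark        (hypothetical)
 *   trait_col: column holding a trait to be mis-labelled (hypothetical)
 *   p_miss:    probability that a marked resource goes unrecorded
 *   trait_err: maximum absolute error added to the trait value
 * Returns the number of resources still marked as observed. */
int add_obs_error(double **obs, int rows, int mark_col, int trait_col,
                  double p_miss, double trait_err){
    int row, marked;
    marked = 0;
    for(row = 0; row < rows; row++){
        if(obs[row][mark_col] > 0 && unif01() < p_miss){
            obs[row][mark_col] = 0; /* Seen, but not recorded */
        }
        if(obs[row][mark_col] > 0){
            /* Recorded, but with an error on the trait value */
            obs[row][trait_col] += trait_err * (2.0 * unif01() - 1.0);
            marked++;
        }
    }
    return marked;
}
```

Setting p_miss and trait_err to zero leaves the array untouched, so the error step could be switched off without changing anything else in the observation function.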

NEW ISSUE 10: Multiple resource

The resource-wide parameter values (e.g., carrying capacities, movement types) will need to be either:

  1. Defined at the individual scale so that each individual resource has its own value, which can then be called in the resource function as necessary, and/or
  2. Input as a vector in the base gmse function, the length of which could determine how many times resource is called in one time step (one for each type of resource, potentially, if carrying capacity is type specific). Alternatively, carrying capacity could be applied within a type in c – perhaps more efficient, but this would require reading in multiple K somehow, either through the paras vector, in the resources array, or something else. How best to do this will need to consider both computational efficiency and clarity/ease of coding.

Note that:

  • res_remove can already be called in a type-specific way by resource, so it might just be better to call resource once and somehow input variable numbers of K into c. I’ll need to think more about this, but it could be something like assigning each individual a competition coefficient alpha for how it is affected by each other type of individual. Intra-type competition could then be modelled generally, with K defined by its inverse. Meanwhile, inter-type competition coefficients could also be useful.

  • Along these lines, it’s also worth considering an option allowing only one resource per cell (equating to a local alpha and K of one). This might be worth making its own issue later.

  • If we were to call resource multiple times, we would also need to paste arrays together in R. This wouldn’t be terrible, but it could lose some efficiency unnecessarily, and I don’t see the benefit.

MEMORY LEAK CHECK OF R CODE

I have tried running simulations at very high population sizes (>100000) to see how the simulation would react. Upon seeing quite a bit of memory being used up, I ran the following valgrind command:

R -d "valgrind --tool=memcheck --leak-check=yes" --vanilla < gmse.R

The program valgrind found a lot of large memory allocations and deallocations (as expected):

Warning: set address range perms: large range

The leak summary was as follows:

==14507== LEAK SUMMARY:
==14507==    definitely lost: 133,373,728 bytes in 469 blocks
==14507==    indirectly lost: 11,472,512 bytes in 55 blocks
==14507==      possibly lost: 120,863,992 bytes in 563 blocks
==14507==    still reachable: 2,319,742,586 bytes in 12,127 blocks
==14507==         suppressed: 0 bytes in 0 blocks
==14507== Reachable blocks (those to which a pointer was found) are not shown.
==14507== To see them, rerun with: --leak-check=full --show-leak-kinds=all
==14507== 
==14507== For counts of detected and suppressed errors, rerun with: -v

If we shift to look only at one run of the resource model, which is run in the new script scratch.R, we get:

==14689== LEAK SUMMARY:
==14689==    definitely lost: 3,584 bytes in 4 blocks
==14689==    indirectly lost: 0 bytes in 0 blocks
==14689==      possibly lost: 0 bytes in 0 blocks
==14689==    still reachable: 28,837,506 bytes in 13,346 blocks
==14689==         suppressed: 0 bytes in 0 blocks
==14689== Reachable blocks (those to which a pointer was found) are not shown.
==14689== To see them, rerun with: --leak-check=full --show-leak-kinds=all

And if we include one run of the observation model too, we get:

==14721== LEAK SUMMARY:
==14721==    definitely lost: 6,296 bytes in 8 blocks
==14721==    indirectly lost: 0 bytes in 0 blocks
==14721==      possibly lost: 0 bytes in 0 blocks
==14721==    still reachable: 28,948,434 bytes in 13,355 blocks
==14721==         suppressed: 0 bytes in 0 blocks
==14721== Reachable blocks (those to which a pointer was found) are not shown.
==14721== To see them, rerun with: --leak-check=full --show-leak-kinds=all

A bit more worrisome, if I run an old R script (a simple individual-based model), I get the following:

==15050== LEAK SUMMARY:
==15050==    definitely lost: 0 bytes in 0 blocks
==15050==    indirectly lost: 0 bytes in 0 blocks
==15050==      possibly lost: 0 bytes in 0 blocks
==15050==    still reachable: 36,846,063 bytes in 15,996 blocks
==15050==         suppressed: 0 bytes in 0 blocks
==15050== Reachable blocks (those to which a pointer was found) are not shown.

Originally, I feared that this might suggest a problem with my c code, or its call to R. All the memory allocated appears to be freed though. Some searching online suggests that valgrind is not always perfect on this front.

‘’You may be surprised to see that valgrind believes that R has leaked memory - unfortunately, it is not perfect, and in this particular case the memory is not so much ’leaked’ as it is ‘cached for the duration of that R session’, and valgrind fails to detect that ‘ownership’ of a particular block of memory is transfered.’’

This is likely what happened (given the original warning). In fact, if we run valgrind and try to track the origin of the leak with --track-origins=yes, it complains in exactly this way – about memory that is allocated but definitely freed:

R -d "valgrind --tool=memcheck --leak-check=yes --track-origins=yes" --vanilla < scratch.R

Below, for example, valgrind is complaining about line 468 in the resource.c file:

==15171== 1,560 bytes in 1 blocks are definitely lost in loss record 165 of 1,867
==15171==    at 0x4C2DB8F: malloc (in /usr/lib/valgrind/vgpreload_memcheck-amd64-linux.so)
==15171==    by 0xC2959DE: resource (resource.c:468)
==15171==    by 0x4F0A57F: ??? (in /usr/lib/R/lib/libR.so)
==15171==    by 0x4F4272E: Rf_eval (in /usr/lib/R/lib/libR.so)
==15171==    by 0x4F44E47: ??? (in /usr/lib/R/lib/libR.so)
==15171==    by 0x4F42520: Rf_eval (in /usr/lib/R/lib/libR.so)
==15171==    by 0x4F43DDC: Rf_applyClosure (in /usr/lib/R/lib/libR.so)
==15171==    by 0x4F422FC: Rf_eval (in /usr/lib/R/lib/libR.so)
==15171==    by 0x4F45FB5: ??? (in /usr/lib/R/lib/libR.so)
==15171==    by 0x4F42520: Rf_eval (in /usr/lib/R/lib/libR.so)
==15171==    by 0x4F44E47: ??? (in /usr/lib/R/lib/libR.so)
==15171==    by 0x4F42520: Rf_eval (in /usr/lib/R/lib/libR.so)

This line allocates memory for the res_new array:

res_new = malloc(res_num_total * sizeof(double *));
for(resource = 0; resource < res_num_total; resource++){
    res_new[resource] = malloc(trait_number * sizeof(double));   
}    

This appeared to have been freed correctly, but on inspection, each malloc to an array was missing a corresponding free. I have fixed this (with thanks to this StackOverflow thread), and now the entire gmse.R program produces the following valgrind output:

==15405== LEAK SUMMARY:
==15405==    definitely lost: 0 bytes in 0 blocks
==15405==    indirectly lost: 0 bytes in 0 blocks
==15405==      possibly lost: 0 bytes in 0 blocks
==15405==    still reachable: 1,544,824,322 bytes in 12,119 blocks
==15405==         suppressed: 0 bytes in 0 blocks

CONCLUSION: MEMORY LEAK HAS BEEN IDENTIFIED AND FIXED

While this wasn’t a huge deal for small scale simulations, for simulations with huge arrays caused by large population sizes, this would have made a difference. The code has therefore been corrected and pushed to dev.
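The general pattern behind this kind of fix is that an array of arrays needs one free for every malloc: each row is freed first, then the array of row pointers. The sketch below shows the pattern with hypothetical helper functions (alloc_array and free_array); it is not the actual code from resource.c.

```c
#include <stdlib.h>

/* Allocate a rows-by-cols array of doubles; returns NULL on failure */
double **alloc_array(int rows, int cols){
    int row;
    double **arr = malloc(rows * sizeof(double *));
    if(arr == NULL){
        return NULL;
    }
    for(row = 0; row < rows; row++){
        arr[row] = malloc(cols * sizeof(double));
        if(arr[row] == NULL){
            /* Undo the rows allocated so far before giving up */
            while(row > 0){
                free(arr[--row]);
            }
            free(arr);
            return NULL;
        }
    }
    return arr;
}

/* Free every row first, then the array of row pointers */
void free_array(double **arr, int rows){
    int row;
    if(arr == NULL){
        return;
    }
    for(row = 0; row < rows; row++){
        free(arr[row]);
    }
    free(arr);
}
```

Calling only free(arr) without the loop releases the row pointers but leaves every row allocated, which valgrind reports as ‘definitely lost’ blocks of the kind shown above.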

With all of this in mind, it is worth thinking about the R side of memory management as it becomes more relevant (see R memory management advice). It might be worth switching to a list structure for input and output so that entire frames are not copied for each operation (which I assume R is doing for the rbind() function). It might also be worth thinking about running rm() and gc() in tandem to release memory during the major loop – or also getting rid of some components of the data frame on the fly. The gmse.R program could potentially switch from a list to an array after the major simulation loop finishes and plotting or returning the array is necessary.

It appears that I’m correct regarding the use of rbind() (or c() or cbind()) – these are terribly inefficient with respect to what’s happening under the hood when R calls C (or C++). I’ve downloaded Svetlana Eden’s Efficiency tips for basic R loop, which might be a useful reference when working on the R side of optimisation. The rbinds really should be avoided, if possible. One way to do this, if nothing else, would be to write to a file instead of cbind (not sure if this would be helpful for a shiny app). StackOverflow suggests using rbindlist, but this would introduce dependencies that I’d prefer to avoid. In the end, it might be worth it to just write a quick add_data.c script in c for the sole purpose of joining old and new arrays. Alternatively, this might not be so important – in the end, it might not even be necessary to record the entire observation history; at least, not in the way it’s currently being done. The history might instead only record a few key things from each time period.
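One way that a small c routine for joining old and new arrays could avoid copying the full history on every time step is to store rows contiguously and double the allocated capacity only when it runs out. The sketch below illustrates that idea; the history struct and the add_data function are hypothetical, and how such a buffer would be passed back to R is not addressed here.

```c
#include <stdlib.h>
#include <string.h>

/* Growing table of rows stored contiguously (hypothetical struct) */
typedef struct {
    double *data;  /* rows * cols values, stored row by row */
    int rows;      /* Number of rows currently in use        */
    int capacity;  /* Number of rows currently allocated     */
    int cols;      /* Number of columns in each row          */
} history;

/* Append new_rows rows to h; returns 0 on success and -1 on failure */
int add_data(history *h, const double *new_data, int new_rows){
    int needed = h->rows + new_rows;
    if(needed > h->capacity){
        int cap = (h->capacity > 0) ? h->capacity : 1;
        double *tmp;
        while(cap < needed){
            cap *= 2; /* Doubling keeps reallocations infrequent */
        }
        tmp = realloc(h->data, (size_t) cap * h->cols * sizeof(double));
        if(tmp == NULL){
            return -1; /* The old data are left intact on failure */
        }
        h->data     = tmp;
        h->capacity = cap;
    }
    memcpy(h->data + (size_t) h->rows * h->cols, new_data,
           (size_t) new_rows * h->cols * sizeof(double));
    h->rows = needed;
    return 0;
}
```

A history initialised as {NULL, 0, 0, cols} can be appended to once per time step and freed with a single free(h.data) after the simulation loop finishes.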

Update: 16 JAN 2017

RESOLVED ISSUE 6: Sampling ability with agent number

This issue has been resolved to my satisfaction. I did this using the second option of addressing it. Now for case 3 in which blocks of the landscape are iteratively sampled (and resources potentially move in between iterations), a transect_eff defining transect efficiency is set as equal to the number of observing agents (working_agents). The transect_eff is a counter, which, after it has counted down to zero, will permit resource movement. Hence, if there is only one agent observing, transect_eff hits zero and movement happens after every iteration; if there are two agents observing, then transect_eff hits zero after two iterations, then movement occurs and transect_eff is reset to working_agents.
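The countdown logic described for transect_eff can be illustrated with the short sketch below, which simply counts how many times resources would be allowed to move over a given number of sampling iterations. The function name is hypothetical, and the actual sampling and movement calls in observation.c are replaced by comments.

```c
/* Count how often resources move over 'iterations' of block sampling.
 * Movement is permitted only when the counter transect_eff reaches zero,
 * after which it is reset to the number of observing agents. */
int count_res_moves(int iterations, int working_agents){
    int iter, moves, transect_eff;
    moves        = 0;
    transect_eff = working_agents;
    for(iter = 0; iter < iterations; iter++){
        /* One block of the landscape would be sampled here */
        transect_eff--;
        if(transect_eff <= 0){
            moves++;                       /* Resources may move now */
            transect_eff = working_agents; /* Reset the counter      */
        }
    }
    return moves;
}
```

With four sampling iterations, one observing agent gives four opportunities for movement and two observing agents give two, so more agents mean fewer chances for resources to move during a survey.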

RESOLVED ISSUE 8: Clear up method sampling type in observation model

This issue has been resolved, albeit with cases in a different order than suggested (the original suggestion, it turns out, was not ideal). Cases are now:

  0. Sampling with a range of view (i.e., don’t rely on the fix_mark > 0 for switching methods)
  1. Sampling fix_mark times randomly on the landscape
  2. Linear transect
  3. Square transect

Of course, there is always room for more, but these are now four clear observation methods. Separating case 0 from case 1 is especially useful now. Now the variable fix_mark is just ignored for all cases except 1. In the code, both case 0 and case 1 still look similar, and both dig deep through mark_res and field_work functions to differentiate between observation methods, but I don’t think that this is necessarily a bad thing – a different argument to mark_res differentiates them now, at least, in the observation function, so it’s not too difficult to trace through what is going on. Note that both cases 0 and 1 add a new column for each times_obs, which isn’t done for the transect methods.

The specially created branch fix_home_bug has been merged. I will keep it alive for a while before removing it entirely.

Update – 14:33, after rewriting the gmse.R code to make an easier catch-all function, with appropriate analysis, I’ve noticed that the binos function in observation.c is defining distance in a way that is no longer really compatible with the point of case 0 (i.e., sample a small area and extrapolate based on the density). The binos function was looking at the Euclidean distance, making, e.g., 3 cells away diagonally farther than 3 cells away left or right (or up or down). This might be useful later, so I’m going to keep it in as an option, but I’m also going to make the default now as within view cells in any direction, such that a block forms around the focal individual, and diagonal distances are not assumed to be longer than length and width. This is the more common way of simulating things, and it makes movement and observation estimates easier – I think the only reason to change it back to Euclidean distance would be if we had an actual map and really needed to be precise with the distance of things on it.
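The difference between the two distance definitions can be sketched as below. The function name in_view is hypothetical (it is not the binos function itself), landscape edge effects are ignored, and whether cells exactly view cells away should count as visible is an assumption made here.

```c
/* Returns 1 if cell (x2, y2) is within 'view' cells of (x1, y1), else 0.
 * If euclid is non-zero, straight-line distance is used; otherwise a square
 * block around the focal cell is used, so that a diagonal step counts the
 * same as a step along a row or column. */
int in_view(int x1, int y1, int x2, int y2, int view, int euclid){
    int dx = (x1 > x2) ? x1 - x2 : x2 - x1;
    int dy = (y1 > y2) ? y1 - y2 : y2 - y1;
    if(euclid){
        /* Compare squared distances to avoid needing sqrt */
        return (dx * dx + dy * dy) <= (view * view);
    }
    return dx <= view && dy <= view;
}
```

A resource 3 cells away diagonally from a focal agent with a view of 3 is visible under the block definition, but not under the Euclidean one, because its straight-line distance is about 4.24 cells.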

I have also simplified the master R file gmse.R to allow for one function to do all of the work, using several default options for simulations. Below, the main gmse() function is shown with its default values.

################################################################################
# PRIMARY FUNCTION (gmse) FOR RUNNING A SIMULATION
# NOTE: RELIES ON SOME OTHER FUNCTIONS BELOW: MIGHT WANT TO READ WHOLE FILE
################################################################################
gmse <- function( time_max       = 100,   # Max number of time steps in sim
                  land_dim_1     = 100,   # x dimension of the landscape
                  land_dim_2     = 100,   # y dimension of the landscape
                  res_movement   = 1,     # How far do resources move
                  remove_pr      = 0.0,   # Density independent resource death
                  lambda         = 0.9,   # Resource growth rate
                  agent_view     = 10,    # Number cells agent view around them
                  agent_move     = 50,    # Number cells agent can move
                  res_birth_K    = 10000, # Carrying capacity applied to birth
                  res_death_K    = 400,   # Carrying capacity applied to death
                  edge_effect    = 1,     # What type of edge on the landscape
                  res_move_type  = 2,     # What type of movement for resources
                  res_birth_type = 2,     # What type of birth for resources
                  res_death_type = 2,     # What type of death for resources
                  observe_type   = 0,     # Type of observation used
                  fixed_observe  = 1,     # How many obs (if type = 1)
                  times_observe  = 1,     # How many times obs (if type = 0)
                  obs_move_type  = 1,     # Type of movement for agents
                  res_min_age    = 1,     # Minimum age recorded and observed
                  res_move_obs   = TRUE,  # Move resources while observing
                  Euclidean_dist = FALSE, # Use Euclidean distance in view
                  plotting       = TRUE   # Plot the results
){}

Using the function defined above, with most parameters set to default values, I looked at the four different observation types below given the following parameters.

# A: Sample of a 10 by 10 region to estimate density
# Simulation time: 1.8 seconds
gmse( observe_type = 0,
      agent_view   = 10,
      res_death_K  = 800,
      plotting     = TRUE
    );

# B: Mark 30 resources 4 times, recapture 30 4 times
# Simulation time: 2.1 seconds
gmse( observe_type  = 1,
      fixed_observe = 30,
      times_observe = 8,
      res_death_K   = 800,
      plotting      = TRUE
    );

# C: Sample agent_view rows at a time -- all across
# Simulation time: 2.3 seconds
gmse( observe_type = 2,
      agent_view   = 10,
      res_death_K  = 800,
      plotting     = TRUE
);

# D: Sample agent_view by agent_view blocks at a time -- all across
# Simulation time: 6.5 seconds
gmse( observe_type = 3,
      agent_view   = 10,
      res_death_K  = 800,
      plotting     = TRUE
);

These four simulations A-D, which had identical population models but different observation methods, produced the four graphs below.

Figure above shows four different observation types as applied to the same population model: (A) observation type 0 (sample a random region and then extrapolate population size by calculating density), (B) mark and recapture individuals and estimate population size using a Chapman estimator, (C) sample along a linear transect while resources can move during sampling, and (D) sample blocks while resources can move during sampling.


Overall, these simulations have been stable throughout testing, and I am (finally) merging the dev branch to master, pushing to GitHub, and declaring this v0.0.5.

Update: 15 JAN 2017

A couple of updates have been made, and a couple of things need to be fixed. I’ll do these tomorrow, as they probably won’t require much more than a few hours in the morning.

I’ve created a new temporary branch, fix_home_bug, after noticing a crash from my home laptop. It seems that I hadn’t initialised the added variable at zero in the res_add function of the resources.c file. At the office computer, it seemed to initialise it at zero automatically (or I’d not played with the right parameters to get it to crash), but at home, it was often getting initialised to very high values and crashing. I’ve fixed the issue on the new branch, but it needs to be merged.

NEW UNRESOLVED ISSUE #8: Clear up method sampling type in observation model

The method sampling for case 0 is too confusing. Sometimes it means randomly sampling fix_mark individuals from the population, and sometimes it means sampling within a particular range of view. Change this so that the switch functions have four clear cases:

  1. Sampling with a range of view (i.e., don’t rely on the fix_mark > 0 for switching methods)
  2. Linear transect
  3. Square transect
  4. Sampling fix_mark times randomly on the landscape.

This will avoid a lot of hassle, even if the code for cases 0 and 3 end up looking the same, or very similar. It’s just very confusing to manage as it is now.

ISSUE #6 STILL NEEDS RESOLVING

I was working on this when I found the bug resolved on the new branch. It didn’t take too long, and it should be an easy fix while I take care of issue 8.

TIME ISSUES: While the simulations run quickly in the office computer, 100 time steps now take about 8 seconds for the loop on my Lenovo Thinkpad X201 – something to be aware of as the coding continues.

FOR TOMORROW: Make a summary that includes an example of all 4 types of observation models and their appropriate analyses (quickly fix the plotting to do the correct analyses automatically):

  • case 0: View-based sampling in which the density is sampled and applied to the whole size of the landscape (as in Nuno et al. 2013)
  • case 1: Mark-recapture sampling where there is some fixed number marked at each time and estimates show Chapman style analysis
  • case 2: Sampling along a linear transect as resources move
  • case 3: Sampling using blocks as resources move

Some updated code is on the fix_home_bug branch, which can be merged into the dev branch once it’s done and is stable after some testing in the office (i.e., try to crash it).

Update: 13 JAN 2017

Below is a bit of additional coding, which resulted in two new ways (which is really just one flexible way) that observation can occur. There are a couple of trivial fixes and additions to make (see new issues 6 and 7), but these should be easy to implement. For now, it’s time to take a step back and plan a bit more generally, especially with respect to implementing the game-theoretic component of the modelling.

RESOLVED ISSUE #5: Sweep observation

This issue has now been resolved. There are now two additional ways to observe populations, as guided by the method variable used in the main switch of the observational model. In biological terms, the observational model allows us to sample in the following two ways:

  1. By sampling view rows at a time, starting from the top of the landscape and working down to the bottom. Each time a new row is sampled, resources on the landscape can move (resource movement can also be turned off if desired). Hence, it is possible for observers to miss or double count resources. The bigger view is, the fewer iterations of sampling are needed to make it all the way across the landscape, hence fewer total times resources will move over the course of sampling.

  2. Identical to 1, but instead of sampling a full row and working down, observers start in the upper left corner of the landscape and sample around a view by view block, and hence a total of view^2 cells. Sampling proceeds with blocks across rows until sampling of the very right side of the landscape has occurred. After sampling all to the end of the right side, observers move down, sampling another row of view by view blocks just beneath the first. This continues until the entire landscape has been sampled, and roughly simulates an observer working their way through the whole landscape over time (time in which resources might move).

Note: The first case is redundant, and therefore will probably be removed later, but it helped as a scaffold for the more general procedure and takes up little space; for now, I’ll leave it.

Testing on both of the above cases was successful (see the figure below). In each case, if resources are not allowed to move, then observers predict resource abundance with 100 percent accuracy (i.e., they sweep through the landscape and count all of the stationary resources). If resources can move, there is a bit of error around the actual abundance (it appears to be normally distributed, as it should be – can look later). Either of these two methods of observation works fairly efficiently until view gets very low (ca 2), in which case a lot of sampling happens in each generation.

Figure above shows a population model (black line) and a new observation model (blue line). The observation model was simulated by having one observer record all of the resources on a 10 by 10 cell block (100 total blocks), each one at a time, and between observing blocks, resources were able to move.


After each sampling, resources moved an average of ca 5 cells away, with a distribution as shown below. The figure below shows the distance that an individual moves in one time step, between successive iterations of observer sampling along a transect.

Figure shows the distance (in cells) travelled by a resource during one time step, between which observers sample. Movement is guided by a Poisson function such that an individual moves a distance of dist a total of Poisson(dist) times in one time step. The figure above shows this for 1000 individuals.


I did not code sampling using the initially considered method, with agents physically moved to locations and then looking around. Instead, resources are just considered counted if they are within the row or block under consideration. To account for multiple agents sampling, view is actually first multiplied by the number of agents sampling (only 1 for now). This makes sense for case 1, but for case 2, sampling ability actually increases with the square of agent number, so this will need to be changed (Adding a new issue).

INTRODUCE NEW ISSUE #6: Sampling ability with agent number

In case two of the observational model, the length and width of a sampling block will both increase linearly with the number of agents doing the sampling; hence, sampling area increases with the square of the number of observers, which is probably unrealistic. There are a few ways to potentially address this:

  1. Simply add another square for each agent observing, in the next place it would otherwise go.
  2. Probably easier, have a countdown defined by += (int) agent_array[agent][8]. Only allow resources to move when this countdown hits zero, and reset it thereafter. Hence, observers will observe n more blocks if there are n more observers.
  3. Could adjust the squared dimensions appropriately to retain a square block, but this seems inefficient, imprecise (due to rounding), and unnecessary.
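
To make the scaling in this issue concrete, write \(v\) for view and \(n\) for the number of observers (both symbols are shorthand introduced here, not names from the code). Multiplying view by \(n\) gives one block of side \(nv\), whereas option 1 samples \(n\) separate blocks of side \(v\):

\[
A_{\text{scaled view}}(n) = (n v)^2 = n^2 v^2, \qquad A_{\text{option 1}}(n) = n v^2 .
\]

With view = 10, for example, three observers would cover 900 cells between resource movements when view is scaled, but 300 cells under option 1.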

INTRODUCE NEW ISSUE #7: DENSITY TYPE SAMPLING

Of course, it will be easy to make this kind of transect sampling random instead of comprehensive over the landscape. This can be done by simply randomly choosing the positions of blocks on a landscape some obs_iter number of times. This could allow an estimate of population size by considering density (i.e., assume that the number counted in a sampled block reflects the density of the larger landscape of known size), as was done by Nuno et al. (2013). This shouldn’t take much time to code and test.
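
The density idea above amounts to scaling the mean count per sampled block up to the area of the whole landscape. A minimal sketch in C, assuming square blocks of side view and resources stored as cell coordinates; res_loc, count_in_block, and density_estimate are hypothetical names used for illustration only, not functions from observation.c:

```c
#include <stddef.h>

/* Hypothetical resource record: only the cell coordinates matter here. */
typedef struct {
    int x;
    int y;
} res_loc;

/* Count resources inside the block whose corner cell is (x0, y0) and
 * whose sides are `view` cells long. */
int count_in_block(const res_loc *res, size_t n_res, int x0, int y0, int view)
{
    size_t i;
    int count = 0;
    for (i = 0; i < n_res; i++) {
        if (res[i].x >= x0 && res[i].x < x0 + view &&
            res[i].y >= y0 && res[i].y < y0 + view) {
            count++;
        }
    }
    return count;
}

/* Scale the mean count per block up to the whole landscape:
 * estimate = (total counted / blocks sampled) * (landscape area / block area). */
double density_estimate(int total_counted, int n_blocks, int view,
                        int land_x, int land_y)
{
    double block_area = (double) view * view;
    double land_area  = (double) land_x * land_y;
    if (n_blocks <= 0 || view <= 0) {
        return 0.0;   /* nothing sampled, so no estimate */
    }
    return ((double) total_counted / n_blocks) * (land_area / block_area);
}
```

For instance, counting 3 resources in a single 10 by 10 block on a 100 by 100 landscape gives an estimate of 300, because the block covers one hundredth of the landscape.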

Update: 11 JAN 2017

I’m going to start referring to issues that are introduced and resolved in the gmse GitHub repository by number.

RESOLVED ISSUE #4: Repeat calls of resource within resource.R

This issue is now poorly named given the solution. The result is a brief update on the addition of a bit of a side function. The function anecdotal is now available in the observation.c file, and is called from the anecdotal() function in the file anecdotal.R. All this function does is cause agents of one or all types to count the number of a particular resource within the agents’ view. It is similar to the observation function, but instead of returning an array of observations of resources (augmented with columns for different observation periods – see 10 JAN) that is intended to be used by R separately, the anecdotal function adds the number of resources viewed in an agent’s vicinity to a column in the agent array. The function is so named because it is meant to add to an agent’s general mood or impression of the quantity of a resource, based on anecdotal evidence for what’s going on around their location. We can imagine such anecdotal evidence as affecting the opinions and behaviours of stake-holders.

INTRODUCE NEW ISSUE #5: Sweep observation

Related to discussions with Jeremy and Tom regarding the Islay geese, we need to have a kind of observational model in which agents move to take measurements, but resources move along roughly the same time scale. This can of course be accomplished one way if we:

  • add an option for resource movement during the course of observation: Do this in the main observation function as an if(resource_movement == 1) type of condition at the tail end before the break (to avoid unnecessary movement). This will also require including the resource movement function (currently in resource.c) in the observation.c file. May as well just dump the whole thing in, in the interest of modularity, though if it stays the same, it will be tempting to create a utils.c file of some sort. This resource movement option can be applied to the existing method case 0, as appears in the switch function of the observation function.

To do a sweep of the landscape while allowing resources to move, I think we’ll want a completely different method of population size estimation (most upstream switch function). What this method will do is:

  1. Start an observer(s) in a fixed x location of x = 0 on the landscape
  2. Have the observer(s) census all individuals on the x locations x to x+view (i.e., observe view rows)
  3. Set a new x = x+view
  4. Move resources
  5. Iterate 2-4 until x+view is greater than the y dimension land_y
  6. If there are any ys left, then iterate the last x to land_y.

The procedure above will simulate observations over a time that is proportional to their view (and thus ability to census) – the more time it takes, the more the resources can move and potentially lead to measurement error. The observational array returned will still be output in the same way – resources will be marked as with the case 0 option and read out as an observational array.
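
The sweep procedure listed above can be sketched in C as follows. This is a minimal illustration rather than the code in observation.c: sweep_count, move_fn, and shift_one_row are hypothetical names, each resource is reduced to a single row coordinate, and the sketch simply counts resources instead of marking them in an observational array.

```c
#include <stddef.h>

/* Hypothetical movement callback, so the sketch assumes no particular
 * movement rule for resources. */
typedef void (*move_fn)(int *rows, int n_res, int land_y);

/* Example movement rule (illustration only): every resource steps one
 * row down, wrapping around at the edge of the landscape. */
void shift_one_row(int *rows, int n_res, int land_y)
{
    int i;
    for (i = 0; i < n_res; i++) {
        rows[i] = (rows[i] + 1) % land_y;
    }
}

/* Census strips of `view` rows from row 0 to land_y, letting resources
 * move between strips. Returns the total number of resources counted,
 * which can differ from n_res when resources move. */
int sweep_count(int *rows, int n_res, int land_y, int view, move_fn mover)
{
    int start, end, i, counted = 0;
    if (view <= 0) {
        return 0;
    }
    for (start = 0; start < land_y; start += view) {
        end = start + view;
        if (end > land_y) {
            end = land_y;              /* leftover rows at the far edge */
        }
        for (i = 0; i < n_res; i++) {
            if (rows[i] >= start && rows[i] < end) {
                counted++;
            }
        }
        if (mover != NULL && end < land_y) {
            mover(rows, n_res, land_y); /* resources move between strips */
        }
    }
    return counted;
}
```

With no movement callback every resource is counted exactly once. With shift_one_row, a resource sitting on the last row of one strip steps into the next strip and is counted twice, which is the kind of measurement error the procedure is meant to generate.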

Note: It would be nice to eventually allow for blocks rather than long linear transects to be sampled, as square blocks might more realistically correspond to the kind of sampling that would be done by a real observer. I don’t think that this would make too much difference in terms of finding sampling error, as there is no bias to resource movement in one direction; hence, the turnover of resources for any particular number of cells will be the same for any N cells sampled. It also stands to reason that this error should be normally distributed as the number of sampling attempts becomes large, and the error should be mean centred around the actual population size, since the probability of missing and double counting would seem to cancel out exactly. This might eventually lead to an analytical estimate of observation and error actually being reasonable under some conditions.

Plan for the near future

I will try to implement this new idea tomorrow, as I don’t think it will take much more than a day’s work, if that. Then, it’s really time to take a step back and think – need to read Nuno et al. (2013) in more detail first, perhaps tonight, and potentially also add the observation model procedure used therein as different implementations of case – this should be very similar to the solution for ISSUE #5, except through the use of random sampling of area and density measuring of resources. We’ll then be in a position of having a stable resource and observation model with a few different options for observation, and I’ll need to think more carefully about the big picture, and how to proceed with the rest of the model.

Update: 10 JAN 2017

We now have a working G-MSE v0.0.4, which includes a stable population model and a stable observation model. The figure below shows the visual output of the new version, with the landscape in the top panel (note: different tan colours don’t mean anything yet – the landscape is effectively uniform); resources (i.e., individuals in the population) are represented in black. In the bottom panel, the solid black line shows the actual change in (adult) population size over time, stabilising around a carrying capacity of 400 (red dotted line). The dark solid cyan line shows an estimate of the population size from the observation model, simulated through mark-recapture (other types of observation are available, see below). The shading around this line shows \(95\%\) confidence interval estimates. More details about this specific estimate below.

Figure above shows population and observation models


I’ve made a few minor updates to the population model code, and included one new type of movement that is allowed – borrowed from individual-based modelling literature on plant-pollinator-exploiter interactions (Bronstein et al. 2003; Duthie and Falcy 2013). This type of movement makes use of an individual’s movement parameter move by having an individual move Poisson(move) times each time step, and with each movement travelling up to move cells away (Euclidean distance). This type of movement is case 0: in the mover function in resource.c.
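
As a rough illustration of this movement rule (not the mover function in resource.c), the sketch below moves one resource Poisson(move) times, each time by a displacement of at most move cells in Euclidean distance, on a landscape that wraps around as a torus. The names unif01, rpois_int, wrap, and move_resource are hypothetical, and the choices of Knuth's multiplication method for the Poisson draw, rejection sampling for the displacement, and an integer move parameter are assumptions made here.

```c
#include <stdlib.h>

#define INV_E 0.36787944117144233   /* 1 / e */

/* Uniform draw in [0, 1). */
static double unif01(void)
{
    return rand() / ((double) RAND_MAX + 1.0);
}

/* Poisson draw with an integer mean, using Knuth's multiplication
 * method; adequate for the small means used here. */
int rpois_int(int lambda)
{
    double limit = 1.0, p = 1.0;
    int i, k = 0;
    for (i = 0; i < lambda; i++) {
        limit *= INV_E;             /* limit = exp(-lambda) */
    }
    do {
        k++;
        p *= unif01();
    } while (p > limit);
    return k - 1;
}

/* Wrap a coordinate onto a torus with `land` cells. */
int wrap(int loc, int land)
{
    int r = loc % land;
    return r < 0 ? r + land : r;
}

/* Move one resource Poisson(move) times; each step is drawn uniformly
 * from the cells no more than `move` away in Euclidean distance. */
void move_resource(int *x, int *y, int move, int land_x, int land_y)
{
    int n_moves, i, dx, dy;
    if (move <= 0) {
        return;
    }
    n_moves = rpois_int(move);
    for (i = 0; i < n_moves; i++) {
        do {                        /* reject corners of the square */
            dx = (rand() % (2 * move + 1)) - move;
            dy = (rand() % (2 * move + 1)) - move;
        } while (dx * dx + dy * dy > move * move);
        *x = wrap(*x + dx, land_x);
        *y = wrap(*y + dy, land_y);
    }
}
```

The modulo draws carry a slight bias that is negligible for small move values; a production version would more likely rely on R's own random number generator from C.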

This update includes the major addition of the observation.c file, called by observation.R to simulate the sampling of resources (i.e., individuals) from the population model. The file observation.R holds the observation() function, which returns a data frame of observed resources. The observation function thereby simulates the process of acquiring observational data, but not analysing those data. Analysis of these data is left to R, or to a (yet to be written) C function (note, current analyses are fairly simple).

The function observation.R requires the following three data frames:

  1. resources: holds all of the resources simulated.
  2. landscape: holds the landscape on which resources and agents are located.
  3. agent: holds all of the agents simulated (this also includes at least one manager of type 0 – even if the manager does not eventually participate in games).

The observation.R function also requires the paras vector, which holds all parameters that might be important throughout the simulation.

Optional inputs include:

  • type, which specifies the type of resource being observed (default = 1).
  • fix_mark, which either sets a fixed number of resources to be sampled during an observation (positive integer value) or sets an observer to ‘observe’ all resources in its view (0 or FALSE).
  • times, which sets how many times an observer will make observations during a time step (must be >0)
  • samp_age, which defines the minimum age at which resources are sampled (the default is set to 1, meaning that resources just added are not sampled – could conceptualise this as sampling only adults; for now, it also makes the initial testing easier because carrying capacity has not yet been applied to juveniles before observation – can change this, of course).
  • agent_type, which identifies which agents are doing the observing. The default value is 0, which identifies the managers in the model. For most purposes, we will only need to have managers doing the observing, but there is definitely some utility in allowing other agents to do their own observing; more on this below.
  • model, which currently has to be “IBM”. Eventually it might be nice to allow observation.R to shunt observations to something not individual-based, such as Nilsen’s model, or another analytical equivalent, but not yet.

The file observation.R calls the function observation in the file observation.c. This C file follows the general protocol below.

The function observation is called, which does the following:

  • Reads the resource, landscape, and agent data arrays into C from R. It also reads in the parameter vector (which includes the optional inputs from the observation.R function).
  • Calls the function mark_res a total of times times – each time simulating a unique trip to do field work. mark_res is a general function for marking individuals. Other functions can eventually be called instead of, or in addition to, mark_res, but the function is already very flexible, so it’s hard to imagine what other function might be needed – mark_res is currently the default and only function called. Details on the function are below.
  • Builds a new array of observations obs_array. This array includes a row for every resource observed and all of the columns that also exist in the resource array (e.g., identifying resource location, identity number, types, life-history parameter values, etc.). Additionally, the observational array also includes one column for each of the times that observations are made. These columns hold values of 0 or 1, which indicate whether (1) or not (0) a resource was observed during a particular observation (can think of times as outings in the field, each producing a column of whether a resource was spotted/marked/recaptured or not).
  • Reads the obs_array into a format that can be returned to R

The function mark_res is called by observation, and does the following:

  • Identifies each observer in the agent array. Agents of a specified type (usually type 0) act as observers and thereby perform the observational tasks. By default, we assume that there is one agent of type 0 that acts as a lone manager (or, perhaps, a very cohesive team) who does all of the work, but if we have more type 0 agents, then each will do the same amount. For each type 0 agent, two functions are called in succession (recall that mark_res might be called multiple times in succession by observation):
  • field_work causes the agent to go out and do some observational field work.
  • a_mover causes the agent to move according to some specified rules, as stored in the parameter vector and agent array. The default is simple uniform movement some Euclidean distance away after doing field work – setting up for field work in a different location. The code is almost identical to the code that moves resources in resource.c, so I’ll not explain this here.

The function field_work simulates the process of an agent looking for and tagging resources in some way (this can later be interpreted as viewing, tagging, marking, recapturing, etc.). There are currently two different tagging procedures possible (with the option to build more):

  1. Tag all resources within some Euclidean distance of the observer. The distance is determined by a parameter in the agent data frame. Resources within this distance are found using the binos function (simulating, e.g., binoculars).
  2. Randomly tag fix_mark resources on the landscape (note: which resources is not a function of space)
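
The two tagging procedures above can be sketched in C as follows. This is an illustration under stated assumptions, not the field_work or binos code: within_view, tag_in_view, and tag_random are hypothetical names, resources are held in plain coordinate arrays, seen is a 0/1 array standing in for one observation column, and distances are not wrapped around the landscape edge.

```c
#include <stdlib.h>

/* Returns 1 if the resource cell lies within `view` cells (Euclidean
 * distance) of the observer cell, else 0. Squared distances avoid sqrt. */
int within_view(int obs_x, int obs_y, int res_x, int res_y, int view)
{
    long dx = (long) res_x - obs_x;
    long dy = (long) res_y - obs_y;
    return dx * dx + dy * dy <= (long) view * view;
}

/* Procedure 1: tag every resource within `view` of the observer.
 * Returns the number of resources tagged on this outing. */
int tag_in_view(const int *res_x, const int *res_y, int *seen, int n_res,
                int obs_x, int obs_y, int view)
{
    int i, tagged = 0;
    for (i = 0; i < n_res; i++) {
        if (within_view(obs_x, obs_y, res_x[i], res_y[i], view)) {
            seen[i] = 1;
            tagged++;
        }
    }
    return tagged;
}

/* Procedure 2: tag fix_mark resources at random, ignoring location.
 * Selection sampling picks exactly min(fix_mark, n_res) resources. */
int tag_random(int *seen, int n_res, int fix_mark)
{
    int i, remaining, needed, tagged = 0;
    needed = fix_mark < n_res ? fix_mark : n_res;
    for (i = 0; i < n_res && needed > 0; i++) {
        remaining = n_res - i;
        if (rand() % remaining < needed) {
            seen[i] = 1;
            needed--;
            tagged++;
        }
    }
    return tagged;
}
```

If fewer than fix_mark resources exist, tag_random tags all of them, matching the fallback described for the fixed-sample case.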

After the observation function is run, we thereby have an observational data frame in which rows are individual resources, and columns include traits of those resources (same as in the resource data frame) and whether or not the resource was observed during a particular simulated outing. Through a combination of specifications for the times and fix_mark options, the observational data frame can then be interpreted in multiple ways and used in a simulated analysis.

There are multiple ways to interpret the observation results. Examples of this are as follows (for now, I’m assuming that there is one observer, but we can substitute the below with any number of observers):

  • Have the observer tag every resource within their range of vision some number of times; take the average of number of resources tagged per time as an estimate of population density.
  • Have the observer tag every resource within their range of vision some fixed number of times, but then interpret some of those times as marks and others as recaptures. Uneven times for marking and recapturing could be interpreted as different investment in each procedure (e.g., go out and mark at 3 different locations, then recapture at 9 locations). Unique marked and recaptured individuals can be summed to estimate population size using capture-mark-recapture techniques. Currently, there is some code in R simulating a Lincoln-Petersen estimator of mark-recapture with a Chapman correction for small sample size (see Pollock et al. 1990). It would be useful to add some other estimators (e.g., Bayesian).
  • Have the observer sample a fixed number of resources on the landscape some number of times (not spatial – resources are just randomly taken). Interpret one or more of these samples as markings, and one or more as recaptures. Then, use these data to estimate population size using some technique such as mark-recapture estimation. This is the technique used in the figure shown above.
  • Note, because observation arrays are stored by R, population size estimation can span multiple time steps (e.g., mark one year, recapture the next – though some individuals might die in the intervening period)

Details of the technique used to produce the above figure include the following:

  • One type 0 agent exists to do all of the observing.
  • A fixed sample of 20 resources are marked (if fewer than 20 sampled resources exist, then all resources are sampled – this never happened though) at each time field work is done.
  • Field work is done 12 times in a time step, perhaps simulating 12 outings over a short time period within a calendar year.
  • Three of these outings are interpreted as periods of marking, where resources are tagged.
  • Nine of these outings are interpreted as periods of recapturing, where resources are caught and recorded.
  • Unique resources tagged by the agent in the first three and last nine outings are interpreted as unique marks and recaptures, respectively.
  • After all time steps are simulated, a function written in gmse.R figures out what the estimate of the population size would be for each time step. The analysis uses a very simple chapman_est function that I wrote in R. This function, or something like it, might be later incorporated as part of the observation model itself (likely by having observation.R call a different C file or R function), or in the manager model, or somewhere in between. I haven’t decided.
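
The analysis step described above can be illustrated with the standard Chapman form of the Lincoln-Petersen estimator, \(\hat{N} = (n + 1)(K + 1)/(k + 1) - 1\), where \(n\) is the number of unique resources marked, \(K\) the number of unique resources caught during recapture outings, and \(k\) the number caught in both. The sketch below is written in C and is not the R function chapman_est; chapman_from_obs is a hypothetical name, and the observation columns are assumed to be stored row by row (one row per resource), with the first n_mark outings treated as marking and the rest as recapturing.

```c
/* Chapman estimate of population size from a 0/1 observation matrix.
 * obs[i * times + t] is 1 if resource i was observed on outing t.
 * Outings 0 .. n_mark - 1 are marking outings; the rest are recaptures. */
double chapman_from_obs(const int *obs, int n_res, int times, int n_mark)
{
    int i, t, marked, recaptured;
    int n_marked = 0, n_caught = 0, n_both = 0;
    for (i = 0; i < n_res; i++) {
        marked = 0;
        recaptured = 0;
        for (t = 0; t < times; t++) {
            if (obs[i * times + t] != 0) {
                if (t < n_mark) {
                    marked = 1;      /* seen on a marking outing */
                } else {
                    recaptured = 1;  /* seen on a recapture outing */
                }
            }
        }
        n_marked += marked;
        n_caught += recaptured;
        n_both   += (marked && recaptured);
    }
    return ((n_marked + 1.0) * (n_caught + 1.0)) / (n_both + 1.0) - 1.0;
}
```

For the figure's setup this would be called with times = 12 and n_mark = 3. Counting each resource at most once per phase is what makes the marks and recaptures "unique" in the sense used above.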

For now, it’s time to take another step back and take stock of what needs to be done next. A manager model and user model will need to start looking at multiple resources for making decisions, and somehow both potentially feed into a game-theoretic model. The complexity involved with the integration of management, games, and user actions should be a bit mitigated by all of these eventual functions revolving mostly around the agent array, with some input from the observation array. Of course, at least one type of agent will need access to the observational data as input (perhaps only to ignore it, sometimes), and users will need access to the resource array for off-take and other things. Some careful planning is needed for what happens next. I am particularly becoming aware that the flexibility of this model, while definitely a good thing, has the potential to tempt me into creating a lot of end user options that no one will actually want. It might be a good idea to develop a list at some point separating key options that we definitely want to be visible to all end users from more obscure options that are available to us by editing the central gmse.R script. It’s also likely that a model of this scope will require a well written R function that translates different combinations of user-friendly inputs into an R list, which can then be interpreted by the script that calls resource.R, observation.R, manager.R, game.R, and user.R, and which places inputs into the vector para appropriately.

It’s worth noting that the flexibility of the observation function might be used to address social questions that interest us. I’ve been mainly conceptualising the observation model as something done by a disinterested third party – a manager rather than a stake-holder per se. The manager would make some decision that then affected payoffs in a game among stake-holders. We can do this of course, but we can also allow the stake-holders themselves to observe, perhaps less thoroughly and with more potential for bias (as we assume that they have less time and expertise). For example, we might imagine some stake-holders to estimate population size or change over time for themselves by observing all of the resources within a short distance around their location – perhaps (incorrectly) biased by large population changes (e.g., way more geese around my location this year than last – estimate a lot of total geese this year overall). These observations could feed into the game and user models.

Also – and this might require some tweaking – the flexibility of the type columns (type1, type2, type3) means that observing can be flexible too. We could allow each individual to observe, or groups of individuals of the same type to observe. NEW: We can also specify the type of individuals doing the observing by any category, including individual ID. This means that we can tell specific agents (assuming they are represented by rows) to observe, or loop through the function with specific agents. The agent’s type (or ID) is stored in the observation output, indicating which agent did the observing if data frames get amalgamated from looping the observation function.

Update: 22 DEC 2016

As a quick update, I now have a working population model for G-MSE, and have reached the point where it will probably be better for me to take a step back and plan a bit, then work on other aspects of the full model rather than add more bells and whistles to the population sub-component. The development that I have done includes five files (happy to send these for the curious):

  1. gmse.R – A master file that I’m currently using to call everything else

  2. landscape.R – A file that constructs an \(m \times n\) landscape (in the code, this is a simple 2D array, the elements of which can contain any real number). Currently, there is an option to make this landscape any size and randomly place any number of ‘resources’ onto it, if desired. In the past, I have used some code to produce autocorrelation of values on the landscape; if it suits us, I can rewrite this code (to improve the readability) for application to G-MSE. I also think it would be useful to have the option of reading in an image (i.e., a map) and converting it to an array to be used as the landscape (e.g., JPG, BMP, etc.) – I suspect some stakeholders might find this especially useful, as it might help them see the applicability more clearly. Also, I’ve left hooks in the R file to allow eventual development of a non-spatial model.

  3. initialise.R – A file that generates a single ‘RESOURCE’ array, which will hold everything that might be of value to stakeholders; this includes, most obviously, individuals in populations of conservation interest, but can also be used to represent things like hunting licenses or crop plots. The idea is to have a data structure that provides maximum flexibility – individuals can be represented as rows (or sets of rows) within the array, and their types and attributes can be indexed by column:

##        IDs type_1 type_2 x_loc y_loc move time remov_pr growth offspr age
## res_1    1      1      0    10    10    2    0      0.1    1.1      0   0
## res_2    2      2      0     6    14    2    0      0.1    1.1      0   0
## res_3    3      2      0    20    18    2    0      0.1    1.1      0   0
## res_4    4      1      0    20    15    2    0      0.1    1.1      0   0
## res_5    5      1      0    12    11    2    0      0.1    1.1      0   0
## res_6    6      1      0     1     1    2    0      0.1    1.1      0   0
## res_7    7      2      0    18    14    2    0      0.1    1.1      0   0
## res_8    8      2      0    20    17    2    0      0.1    1.1      0   0
## res_9    9      2      0     4    17    2    0      0.1    1.1      0   0
## res_10  10      2      0     4    16    2    0      0.1    1.1      0   0
  • Note that the first column is a unique index for the discrete resource – it identifies the resource over time and as it ages. The type columns (cols 2 and 3) can represent anything; perhaps most usefully different types of resources that stake-holders might have interest in (e.g., harriers versus grouse, geese versus crop biomass, reindeer versus ticket sales), but also sub-types (e.g., individual sex) and even individuals at a different scale. For example, we might loosely define individuals as being represented by the index of type_1 instead of rows explicitly. In doing so, we can interpret the above table as having two individuals (1 and 2) with perhaps a shared presence at 4 (individual 1) and 6 (individual 2) different locations (cols x_loc and y_loc); hence the scale of individuals can be coarser than array rows. Similarly, we could represent multiple individuals in a single row by having each row represent a group of individuals of type_1 or type_2, with the quantity of individuals being represented in a column to be defined later. The key point here is that this structure of coding and abstract definition of ‘resource’ will maximise flexibility over how individuals are represented and modelled; a key challenge will be knowing when to use what kind of structure. Note also that these columns are not (yet) set in stone, and we can add more as need be (I’m already tempted to add a third abstract ‘type’, though I’m not sure if it would ever be needed).
  1. resource.R – This file has only one real job, and that is to read in the RESOURCE array, LANDSCAPE array, PARAMETER vector, and MODEL TYPE (currently only individual-based model, “IBM”), and then call the appropriate resource model. this intermediary R file allows us to be flexible in re-routing the whole G-MSE to different population models, if need be. We could even mix and match the extent to which components use simple equation-based modelling (e.g., as in Nilsen’s MSE), and which use the more computationally-intense agent-based simulation (though I really don’t think computation time will be much of an issue, even with the agent-based model). Currently, all this R file is doing is calling the C code and the file resource.c – or, more accurately, it is calling the compiled file resource.so, which allows R to link to C.

  2. resource.c – This is the file that does all of the heavy lifting in terms of simulating resources on a landscape; it is written in C to make the computation run (much) more quickly (probably by two orders of magnitude). The file includes several C functions, one of which links them all by running the resource() function, which reads in the RESOURCE and LANDSCAPE arrays, and a PARAMETER vector (containing any key parameter values) from R, and returns a new RESOURCE array (hence, landscape and parameter values are unchanged). A rough outline of what this key function does is as follows:
    • Reads and edits all of the key input into a form that C can store and use
    • Calls function add_time, which writes a time step and adds an age to all rows (see table above)
    • Calls function mover to move individuals some Euclidean distance according to a parameter (see above) and movement rules (currently: uniform probability of cell distances, Poisson probability of distances). This program also uses a parameter to determine what happens at the edge of the landscape – currently, either nothing happens (i.e., individuals are just ‘out of view’) or the landscape wraps around as a torus (i.e., if you leave on the left side, you come back on the right).
    • Calls the functions res_add and res_place to simulate the addition of new resources (e.g., birth of individuals) and place them in a new array, respectively. Currently, old rows (e.g., individuals) directly create new rows according to a growth parameter (see table above), simulating birth, but this can be changed. A carrying capacity can also be applied to addition of new rows. New rows are also identical to their ‘parent’ rows in everything except ID and age, but this can also be changed.
    • Calls the function remove to remove some of the old rows from the input array – currently removal of rows occurs with some fixed probability (remov_pr, see table above), or probabilistically based on a set carrying capacity.
    • Combines the rows of the original RESOURCE array that were not removed with the newly created resources to make one single array (might want to make this its own function later, for readability).
    • Reads and edits all of the key output back into a form that can be recognised by R as a data frame.
    • Note: There is plenty of room for expanding this population model, and adding components such as immigration and emigration, interaction of resources, more complex movement, spatial heterogeneity of birth and death, sexual reproduction, disturbance, etc. This is just what I consider to be a minimal individual-based model useful for simulating a population. The code appears to be stable, though a bit more error checking would be useful, and some warnings need to be added to the code – also, as of now, it is possible to have divergent growth of population size, maxing out the computer’s memory and causing the program to crash. Some safeguard against this needs to be written in.
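
The outline above can be prototyped in a few lines of R before committing anything to C. The following is only a minimal sketch, not the code in resource.c: the column names, the Poisson birth rule, and the function names `make_resources` and `resource_step` are assumptions made for illustration. Births are truncated at the carrying capacity `K`, which is one simple safeguard against the divergent growth mentioned in the note.

```r
# Hypothetical column names; the real RESOURCE array may be laid out differently.
make_resources <- function(n, land_dim) {
  data.frame(
    id       = seq_len(n),
    x_loc    = sample(land_dim, n, replace = TRUE),
    y_loc    = sample(land_dim, n, replace = TRUE),
    move     = 2,    # maximum cells moved per axis in one time step
    time     = 0,
    remov_pr = 0.1,  # probability of removal in one time step
    growth   = 1.1,  # mean number of new rows produced per row
    age      = 0
  )
}

resource_step <- function(res, land_dim, K = Inf) {
  n <- nrow(res)
  if (n == 0) return(res)
  # add_time: advance the time stamp and age of every row
  res$time <- res$time + 1
  res$age  <- res$age + 1
  # mover: uniform integer displacement in [-move, move], wrapping as a torus
  dx <- floor(runif(n, -res$move, res$move + 1))
  dy <- floor(runif(n, -res$move, res$move + 1))
  res$x_loc <- (res$x_loc + dx - 1) %% land_dim + 1
  res$y_loc <- (res$y_loc + dy - 1) %% land_dim + 1
  # res_add and res_place: Poisson births, truncated at carrying capacity K
  parents <- rep(seq_len(n), times = rpois(n, res$growth))
  room    <- max(0, K - n)
  if (length(parents) > room) {
    parents <- parents[sample.int(length(parents), room)]
  }
  new     <- res[parents, ]
  new$age <- rep(0, nrow(new))
  new$id  <- max(res$id) + seq_len(nrow(new))
  # remove: each old row is dropped with probability remov_pr
  keep <- runif(n) > res$remov_pr
  out  <- rbind(res[keep, ], new)
  rownames(out) <- NULL
  out
}

set.seed(1)
pop   <- make_resources(n = 10, land_dim = 20)
abund <- numeric(50)
for (t in seq_len(50)) {
  pop      <- resource_step(pop, land_dim = 20, K = 200)
  abund[t] <- nrow(pop)
}
```

Because new rows are capped at `K - n` before removal is applied, abundance in this sketch can never exceed `K`; the C version could apply the same cap before allocating the new array.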

A small script can help us see the output of what’s going on in the population, both in terms of individual movement and change in population abundance over time. The run time of the below population is negligible – all of the data underlying the 100 time steps shown in the figure below is produced in a tenth of a second (4 JAN Update: Assuming instead a carrying capacity of 40000, closer to the ball-park of the Islay geese, 100 time steps takes 11 seconds). The upper panel of the figure below shows a landscape (light and dark brown – these colours don’t mean anything at the moment, but could represent different landscape properties) with individuals (black) that move around, reproduce, and die in each time step. The lower panel shows the abundance of these individuals as they increase to carrying capacity (red dashed line), whereafter the population size remains stable (of course, simulating a bigger population takes a bit more time – it takes about nine tenths of a second to simulate 100 time steps at a carrying capacity of 4000).

Figure: an example run of the population model

Towards a Game-theoretic Management Strategy Evaluation (G-MSE):

I would like to develop one general, efficient, open-source, and user- and developer-friendly program for G-MSE that would be a general tool for applying game theory and management strategy evaluation to specific problems of conflict among stake-holders. I’m somewhat flexible on the development, but my preference would be to have software that is:

General model development

MAJOR POINTS: Some major points fleshed out given the thinking below:

  • The G-MSE model will focus on the dynamics of 2+ different objects
  • There will be 2+ stake-holders that each have an interest in the quantity of one or more objects
  • The 2+ different objects modelled will have some effect on one another
  • Effects of objects on one another will cause conflict (or cooperation) between stake-holders

Question: The objects (i.e., populations, resources, commodities) will often be represented as discrete entities (individual animals in populations, but also things like licenses sold and crop patches saved or raided – which could have individual locations). Should the stake-holders also be modelled as (potentially multiple) discrete entities? This is easy to see if, e.g., stake-holders are potential hunters that do or do not buy licenses and engage in hunting, but maybe conservationists could also be considered as discrete – each individually affecting the decision of an organisation in a game.

Given the question above: Stake-holders could then also be represented by a data frame, which could generalise the model to allow many individual stake-holders to play a game (or not, if data frame is single row, or scalar). This could then more naturally incorporate mixed strategies (some will take one strategy, some another) and uncertainty. In the case that it is some sort of organisation making a decision, this would allow the individual stake-holders to collectively affect a single action or policy. This would appear to drift more into the realm of agent-based computational economics, which might be a good thing given the goals of ConFooBio. This could allow for maximum flexibility too, if agents could also be discrete individuals making decisions.

Should the model therefore be focused on at least four data frames modelling individuals? At least two modelling individual species or resources of interest (and at least one being a population of conservation interest), and at least two modelling individuals with interests in the former?

I think that the agent-based model is really going to be the default one to use, with other models being useful only if the end user is really tied to them in some way. In general, to find emergent phenomena and predict dynamics and decisions accurately, I think it will be useful to keep in mind the maxim of keeping situation rules simple while allowing agents to be complex (Volker Grimm said something like this in one of his talks or publications, and given the ConFooBio focus, I think it’s especially applicable).

Before getting into specifics, it will be useful to walk through the G-MSE model conceptually to figure out what kinds of approaches are going to be most useful for the following:

  1. Manager model
  2. User model
  3. Natural resources model
  4. Observation model
  5. Game-theoretic model

Each of these needs a general framework that will be most usefully applied to real-world problems of conflict. Ideally, these models will be modular – i.e., not depend on the type of modelling being done in other areas of G-MSE. That way, we might, e.g., decide to substitute an entirely different kind of natural resources model (e.g., simple numerical Lotka-Volterra versus spatially explicit individual-based model), but still be able to generate input/output in each component to be used by the next.

Nevertheless, there needs to be some conceptual framework that is consistent, in addition to the five above modules. I’ve written down some of these ideas, deliberately avoiding Nilsen’s MSEtools repository for now. Some potential things that are common to G-MSE:

  1. Population dynamics – every case study, and indeed, every conceivable application of G-MSE will include at least two things that are dynamic. In all of the case studies, at least one of these things is a population; sometimes the second is as well (e.g., hen harriers and grouse), but sometimes the second thing is a resource or commodity (e.g., crop biomass harvest, fishing licenses sold, etc.). I think that it is possible to generalise the model by tracking two things (loosely defined) over time with some sort of flexible structure (data frame, matrix, scalar). Different stake-holders will have an interest in one or both of these things. Change in these things over time might be dependent on previous time steps (as is expected for populations), or they might not be (as might be the case for fishing licenses sold).

The model is therefore going to need to generally hold two or more variables or objects that represent populations or resources (including biomass) that can both be affected by any of the sub-components (note: even something like fishing licenses sold can be observed, perhaps with trivial error – we can therefore apply the same process of MSE to both populations and the things with which they are in conflict).

In any case, there will be a need to model how properties of the population change from one time step to the next. Properties of interest for populations might include:

  • Population size
  • Population structure (age or stage classes)
  • Individual attributes (location, phenotype, etc.)

It would seem as though properties for conflicting resources would be more likely to boil down to one number (e.g., crop yield, licenses sold), but maybe not. We could, for example, assign a location to farms and licenses, or units of biomass in some way.

I think an individual-based model that represents individuals and resources with a table is probably the best way to go in most cases. We can perhaps broaden this out so that the observation model will recognise a table (IBM), a vector (classes), or a number (just size), with some indication of the type of data being returned, but most of the time a full table will be the way to go (in fact, we could probably just make everything a data frame, and have \(1 \times 1\) data frames be interpreted as scalar, and \(1 \times n\) data frames be interpreted as a vector). The information about the population will represent all of the relevant information about the natural population being modelled, so it can pass all of this information onto the observation model, which can then run some function to search through it and extract parameters of interest (with error, potentially). Within this model, we’ll want functions to model birth, death, immigration, and emigration.
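
The idea of making everything a data frame and reading its shape to decide how to treat it could look something like the sketch below. The helper name `data_kind` and the example column names are hypothetical; nothing like it exists in the code yet.

```r
# Hypothetical helper: classify a data frame by shape, as proposed in the text.
data_kind <- function(x) {
  stopifnot(is.data.frame(x))
  if (nrow(x) == 1 && ncol(x) == 1) return("scalar")  # just population size
  if (nrow(x) == 1)                 return("vector")  # e.g., sizes of age classes
  "table"                                             # full individual-based table
}

k1 <- data_kind(data.frame(N = 120))
k2 <- data_kind(data.frame(juvenile = 40, adult = 80))
k3 <- data_kind(data.frame(id = 1:3, age = c(0, 2, 5)))
```

The observation model could then switch on the returned label rather than needing a separate argument for the type of data being passed.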

  2. Observation – Every application of G-MSE will include some type of sampling from the population dynamics model to extract statistics that are relevant to management. How exactly this sampling is done (and how it is modelled, perhaps, given different inputs from the population dynamics function) will vary with different techniques, but perhaps not so much. All observation samples a subset of the population (or some metric correlated with abundance – e.g., dung or nest sites, which would be easy to represent). Therefore, there needs to be some sort of process for estimating the key parameters of interest (aforementioned population size, structure, attributes, etc.) from the complete table being inputted from the population dynamics part of the model. This could be something as simple as sampling from the full table:
  • With or without replacement
  • With bias to particular attributes
  • With error (false detection), including error in attributes (age, sex, etc.)

For scalar or vector inputs, observation error could be more directly simulated – just with a parameter for bias and error (e.g., around population size, or sizes of each age or stage class).

Alternatively, a different, more general way of doing it might be to instead simulate some length of time \(t_{obs}\) for modelling the process of observation. Then each time step could include a probability of observing an individual. This might be even better because I think it would be more generalisable. In the case of the IBM, individuals could be observed following a Poisson process at each time step that:

  • Could recognise the same individual, or not
  • Could be biased to particular attributes (including location)
  • Include false detection probability at each time step

The benefit here is that a scalar or vector could be modelled in the same way, just by sampling from a Poisson distribution to find observation number at each time step of some number of individuals (potentially of different ages or classes).
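
A rough sketch of the Poisson-process version of observation is below. It assumes a table with an `id` column, a per-individual detection rate per time step (which can be a vector, to bias detection toward particular attributes or locations), and a separate rate of false detections; the function name `observe_poisson` and all argument names are made up for illustration.

```r
# Hypothetical observation model: detections of each individual follow a
# Poisson process over t_obs time steps.
observe_poisson <- function(pop, t_obs, rate, false_rate = 0) {
  n <- nrow(pop)
  # Rows are individuals, columns are observation time steps; a vector of
  # rates of length n is recycled so that row i always uses rate[i]
  counts     <- matrix(rpois(n * t_obs, rate), nrow = n, ncol = t_obs)
  false_hits <- rpois(t_obs, false_rate)  # false detections per time step
  list(
    per_individual = rowSums(counts),              # usable if individuals are recognised
    seen_ids       = pop$id[rowSums(counts) > 0],  # individuals detected at least once
    per_step       = colSums(counts) + false_hits  # raw counts, including false hits
  )
}

set.seed(1)
pop <- data.frame(id = 1:50)
obs <- observe_poisson(pop, t_obs = 5, rate = 0.3)

# Scalar case: only abundance N is known, so draw the total count per time step
N          <- 50
scalar_obs <- rpois(5, lambda = 0.3 * N)
```

If individuals cannot be recognised, only `per_step` would be passed on to the manager model; if they can, `seen_ids` gives the number of unique individuals detected.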

  3. Management – Here’s where things get a bit more tricky, potentially. The management model will receive whatever the observation model produces, namely, two data frames representing the dynamic things of interest, typically:
  • Population with individuals of conservation interest, where an IBM is being used (perhaps simulated with time stamps, locations, and individual attributes – error in observations already produced), which will be most of the time.
  • Resource of interest, which might be interpreted by a manager, or not (if we’re simulating a manager that is not concerned about the resource in question).

It will then spit out something that will affect both the game that agents play and therefore actions of users.

One job of the management model will be to calculate statistics associated with the uncertainty surrounding these observations (e.g., confidence intervals), which will affect management decisions that are simulated.

TODO: Need to figure out how management decisions are going to be implemented. These decisions will feed directly into the game model, and possibly the user model.

  4. Game model

This part is especially tricky. Need some common framework to convert the dynamic things (resource, population) into a utility function, then into a payoff matrix (or perhaps something even more general). Questions that need addressing before building the model:

  • Do we want the game to model more than two (types of) players?
  • Do we want the game to model more than two strategies per player?
  • Do we want the history of player actions to affect player strategies (i.e., extensive-form game versus a normal-form game)? If so, how complex are we willing to allow strategies to be?
  • How rational are we expecting individuals to be (do we want to solve Nash equilibria by default, or base behaviour on something else)?

We also want to include uncertainty in the games.

General software development

The general structure of the program itself, I think, could fit into Figure 1 of (Bunnefeld et al. 2011) (TREE paper), with a game-theoretic component added into the management model and harvester operating model. Would game theory among agents then be applied to the harvesters who are making decisions? A basic computational model would then proceed something like the following:


Master file: gmse.R [also create standalone gmse.c with int main(void)]

  • Input all relevant variables, data.
  • Run seven functions; 2-6 forming an inner loop:
  1. initialise (initialise.c; start individuals, landscapes, etc)
  2. -> resource (resource.c; resource model, might burn in for a while first)
  3. -> observe (observe.c; observation model)
  4. -> manager (manager.c; management model)
  5. -> game (game.c; game theory applied, games played)
  6. -> user (user.c; actions taken based on game)
  7. summary (summary.R; extract and present information from data frames)
  • Exit program
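
The control flow of the master file might be sketched as below, with steps 2-6 held in a list of functions that each take and return a single state list. Everything inside the stand-in functions (logistic growth, 10% observation error, a 20% proportional quota, an empty game step) is a placeholder chosen only to make the loop runnable; `run_gmse` is a hypothetical name, not the eventual gmse.R.

```r
# Hypothetical driver: initialise, loop over the inner functions, then summarise.
run_gmse <- function(time_max, steps, init, summarise = identity) {
  state <- init
  for (t in seq_len(time_max)) {
    for (step in steps) {  # resource -> observe -> manager -> game -> user
      state <- step(state)
    }
  }
  summarise(state)
}

# Placeholder stand-ins for the five inner modules
steps <- list(
  resource = function(s) {
    s$N     <- max(0, s$N + 0.5 * s$N * (1 - s$N / 200) - s$harvest)
    s$track <- c(s$track, s$N)
    s
  },
  observe  = function(s) { s$est <- max(0, rnorm(1, mean = s$N, sd = 0.1 * s$N)); s },
  manager  = function(s) { s$quota <- 0.2 * s$est; s },
  game     = function(s) s,  # game-theoretic component not yet written
  user     = function(s) { s$harvest <- min(s$N, floor(s$quota)); s }
)

init <- list(N = 100, est = NA, quota = 0, harvest = 0, track = numeric(0))
set.seed(1)
out  <- run_gmse(30, steps, init, summarise = function(s) s$track)
```

Keeping each module behind the same state-in, state-out interface is one way to get the modularity discussed above: swapping the resource model would only mean replacing one element of `steps`.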

initialise.R: code within R to organise key data frames

  • Switch(model_type):
  • case(agent_based):
    • Generate array STAKEHOLDER_1 (Stake-holders can be discrete)
    • Generate array STAKEHOLDER_2 (rows = individuals; cols = attributes)
    • Generate array RESOURCE_1 (note: resources can be populations)
    • Generate array RESOURCE_2 (rows = individuals; cols = attributes)
    • Generate matrix LANDSCAPE (start with an \(m \times m\) matrix)
  • case(matrix):
    • Generate matrices as appropriate
  • case(scalar):
    • Create variables as appropriate

resource.c: sub-functions affect dynamics of resources

  • Read RESOURCE_1, RESOURCE_2, and LANDSCAPE
  • Switch(model_type):
  • case(agent_based):
    • move(double RESOURCE): move individuals or resources on LANDSCAPE
    • reproduce(double RESOURCE): New resources added based on some rules
    • die(double RESOURCE): Resources removed based on some rules
    • immigrate(double RESOURCE): resources added by different rules (later?)
    • emigrate(double RESOURCE): resources removed by different rules (later?)
    • interact(double RESOURCE_1, double RESOURCE_2): Resources interact
  • case(matrix):
    • To be developed later
  • case(scalar):
    • To be developed later
  • End with modified RESOURCE_1, RESOURCE_2, and LANDSCAPE

observe.c: sub functions affecting simulated data collection

manager.c: sub functions affecting management decision model

game.c: sub functions affecting game played based on management decisions

user.c: sub functions affecting implementation of users given game.c

summary.R: Summarise information and plot (also create C standalone)

Note: The C standalone will also need the file gmse_util.c, for all of the other components (e.g., random number generation) which would normally be done in R. In R, these components can be incorporated with the appropriate R.h and Rmath.h header files.

Note: The RESOURCE_2 will have to be optional, because in some scenarios, two stake-holders might simply be in conflict over the use of one resource.


Note that Erlend Nilsen has constructed the basic MSE framework in R already, and I’ve forked his repository on GitHub as a potential starting point. I’ve also starred a repository for calling C from R, as I think that this will be necessary. I’d like a standalone version of the model in C, but the focus should probably first be on writing the intense code in C while immediately making it callable from R – cloning and making a C standalone can come later (maybe avoid using too much of Rmath.h so that a C standalone is easier).

This would allow a harvester operating module or function to fit within the broad simulation or program, G-MSE.

The spatial aspect of some of the key case studies (e.g., Nellemann et al. 2000), and the importance of space more broadly in ecological processes, suggests to me that the G-MSE program will need to have a spatial component – landscapes need to be a part of it, perhaps?

Overall, based on the ERC proposal and Bunnefeld et al. (2011), the model will function something like the below (subject to change):

As long as not too many generations are run (e.g., not too much more than 100), I am cautiously optimistic that this program will be able to include an individual-based model of a focal population, and all of the other game-theoretic components, and not take more than a few minutes to run and produce simulated results (obviously less if it is called directly from C, but I’m shooting for this calling from shiny in a browser). For end users, dynamic graph production can make the wait time a bit more interesting, if it’s possible. For us, the time it will take for me to call in C, especially if using the cluster, will be trivial.

For the natural resources model, it might be nice to have an option of burning in several time steps before starting the loop (if, e.g., no empirical data are available, and the model instead relies on parameters plugged into a Lotka-Volterra or Ricker model). Or, if data are available, long-term demographic data could be used and assumed to represent the true population dynamics (i.e., just use these data to simulate N individuals) before starting the G-MSE model loop. It is worth thinking about how much population structure we might want to add – my inclination is to make the software as flexible as possible (e.g., allow sex, age, etc., to be attributes of discrete individuals), but this will depend on other aspects of the model.
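
As an illustration of the burn-in idea, the sketch below iterates a deterministic Ricker model for a number of time steps and uses the final abundance as the starting number of individuals. The parameter values and the function name `ricker_burn_in` are arbitrary choices for the example, not values from any case study.

```r
# Hypothetical burn-in: iterate N[t+1] = N[t] * exp(r * (1 - N[t] / K))
ricker_burn_in <- function(N0, r, K, steps) {
  N    <- numeric(steps + 1)
  N[1] <- N0
  for (t in seq_len(steps)) {
    N[t + 1] <- N[t] * exp(r * (1 - N[t] / K))
  }
  N
}

traj    <- ricker_burn_in(N0 = 10, r = 0.5, K = 200, steps = 50)
start_N <- round(tail(traj, 1))  # number of individuals to initialise the IBM with
```

With a low growth rate such as this one the trajectory settles at `K`; higher values of `r` produce cycles or chaos in the Ricker model, in which case the burn-in end point would depend on the number of steps.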

Game-theory modelling (game.c; green box above)

In the interest of making this model as general as possible, I believe that we’ll eventually want to use an extensive-form game to allow for the sequence of moves to affect stake-holder actions. Nevertheless, just to get the basic framework underway, I think we can start out with a normal-form game, with the intent of generalising the model later (the code will be modular enough to allow this). Generalisation should be easy if we have a separate function to keep track of the game tree, and then allow agents to access the game tree (or parts of it, in the case of incomplete information) to make decisions about how to act. An extensive-form game package exists in R, published by Kenkel and Signorino (2014) with code available on GitHub, but the focus of this package is on ‘estimating recursive, sequential games, and not simultaneous move games or dynamic games with infinite time horizons’. Since the quoted text probably describes the kinds of games that ConFooBio is interested in, I think the games package will be a useful reference, but not something to directly apply. It incorporates uncertainty, which could be something useful to return to for further reference.

A couple other (Java based) examples of games are available on GitHub, such as GTE, which has a GUI web application and a corresponding published paper (Savani and Stengel 2014). This model leads me to think that it’s probably best to give each player two matrices:

  1. A payoff matrix representing different player actions and payoffs
    • Payoffs are defined by utility functions for each player.
    • These could be represented by a three dimensional array, or a list (list would allow the do.call function to be used – probably easier to deal with in R).
  2. A history matrix showing the results of previous interactions; note that giving each player their own history matrix will allow players to have incomplete (or perhaps incorrect) information about the game.

Another Java extensive-form games package exists, though it seems less useful for ConFooBio purposes.

Some notation to try out: For the purpose of the below, to keep things simple, I’m going to just start with payoff matrices, and assume that history of interactions is not yet used in decisions.

  • Denote \(U^{m}_{k}\) as the utility to agent \(m\) (\(m \in \{1, ..., M\}\)) from the outcome \(k\).
  • The probability that \(m\) chooses an action \(X_{i}\) from all possible options \(i \in I\) can be represented as \(p^{m}_{i}\).
  • The action for \(m\) can be represented \(X^{m}_{i}\), so \(E[X^{m}_{i}] = \int_{I} x^{m}_{i} f(x^{m}_{i})dx\). Or something like it.
  • Hence we can write \(U^{m}_{k}(X^{m}_{i})\) for all \(m\), which defines the payoff, as affected by the actions of \(m\) and other players.

To further simplify, I am going to assume that there are only two players. The general payoff matrices can be represented as below (loosely following the notation of Débarre et al. (2014)):

\[ {\bf A^{1}} = \left( \begin{array}{cc} U^{1}_{a} & U^{1}_{b} \\ U^{1}_{c} & U^{1}_{d} \end{array} \right), {\bf A^{2}} = \left( \begin{array}{cc} U^{2}_{a} & U^{2}_{b} \\ U^{2}_{c} & U^{2}_{d} \end{array} \right). \]

In the above \(a\), \(b\), \(c\), and \(d\) are all different possible outcomes that depend upon the decisions of players 1 and 2. We can think about these in terms of the actions \(X^{1}_{i}\) and \(X^{2}_{i}\), and put these into the familiar payoff table below,

Player 1 (rows) / Player 2 (columns) | Strategy 3                 | Strategy 4
Strategy 1                           | \(a \to \{U^{1}, U^{2}\}\) | \(b \to \{U^{1}, U^{2}\}\)
Strategy 2                           | \(c \to \{U^{1}, U^{2}\}\) | \(d \to \{U^{1}, U^{2}\}\)

For doing the maths though, individual matrices will be used. Note that to keep things general, the above strategies are unique to each player. I think that this will be relevant to ConFooBio because each actor will have a unique role. Hence, a vector \(I\) can represent all possible options for action, with players (normally) only having access to a subset \(i \in I\), though we might conceive of some players being able to do the same thing despite having different roles.
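
To connect the notation to the individual matrices: if player 1 chooses its strategies with probabilities \(p^{1}_{i}\) and player 2 with probabilities \(p^{2}_{j}\), then the expected utility to player \(m\) is \(\sum_{i}\sum_{j} p^{1}_{i} p^{2}_{j} A^{m}_{ij}\). A minimal sketch in R is below, assuming that rows index player 1’s strategy and columns index player 2’s, as in the payoff table; the names `expected_payoff`, `U1`, `U2`, `p1`, and `p2` are just for illustration, and the payoff values are the usual Prisoner’s dilemma ones.

```r
# Expected utility for one player given both players' mixed strategies.
# A[i, j] is the payoff when player 1 plays row i and player 2 plays column j.
expected_payoff <- function(A, p1, p2) {
  stopifnot(length(p1) == nrow(A), length(p2) == ncol(A))
  as.numeric(t(p1) %*% A %*% p2)
}

U1 <- matrix(c(3, 0, 5, 1), nrow = 2, byrow = TRUE)  # payoffs to player 1
U2 <- matrix(c(3, 5, 0, 1), nrow = 2, byrow = TRUE)  # payoffs to player 2
p1 <- c(0.5, 0.5)  # player 1 mixes equally between its two strategies
p2 <- c(1, 0)      # player 2 always plays its first strategy

e1 <- expected_payoff(U1, p1, p2)  # 0.5 * 3 + 0.5 * 5 = 4
e2 <- expected_payoff(U2, p1, p2)  # 0.5 * 3 + 0.5 * 0 = 1.5
```

The same function works unchanged for more than two strategies per player, since it only relies on the matrix dimensions matching the lengths of the probability vectors.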

Making payoff matrices a list with \(M\) elements of vectors is probably the best way to go in R, with \(M=2\) players for most of what we’ll do. Each player \(m\) will have its own options for acting within its own list element (e.g., S[[m]]).

M     <- 2; # Number of players in the game
S     <- list(); # Strategy vectors (elements all possible strategies)
A     <- list(); # Payoff vectors (elements all possible strategy combinations)

For now, let’s just assume that each player has two possible strategies, and we’ll just use the traditional matrix to calculate Nash equilibria; for future reference, Avis et al. (2009) might be useful for quick calculation of Nash equilibria for two player games. Continuing with the above, here’s a basic setup computing the Prisoner’s dilemma:

S[[1]] <- c("C","D"); # Cooperate or defect strategies (change to numeric?);
S[[2]] <- c("C","D");
A[[1]] <- c(3,0,5,1); # Payoffs for player 1
A[[2]] <- c(3,5,0,1); # Payoffs for player 2
A1     <- matrix(data=A[[1]], nrow=length(S[[1]]), byrow=FALSE); 
A2     <- matrix(data=A[[2]], nrow=length(S[[2]]), byrow=FALSE); 
print(A1); # Note the traditional Prisoner's dilemma payoff structure
##      [,1] [,2]
## [1,]    3    5
## [2,]    0    1
print(A2);
##      [,1] [,2]
## [1,]    3    0
## [2,]    5    1

Now check to see if the best possible response for each player is the same regardless of its opponent’s strategy.

best1 <- apply(A1,1,which.max); # Best strategies for Player 1
best2 <- apply(A2,2,which.max); # Best strategies for Player 2
tabl1 <- tabulate(best1); # Frequency of bests
tabl2 <- tabulate(best2); 
str1  <- tabl1 / sum(tabl1); # Frequency of each strategy
str2  <- tabl2 / sum(tabl2);
summ1 <- matrix(data=str1,nrow=1); # Summary vector of strategies
summ2 <- matrix(data=str2,nrow=1);
colnames(summ1) <- S[[1]];
colnames(summ2) <- S[[2]];
rownames(summ1) <- "Proportion";
rownames(summ2) <- "Proportion";
print(summ1); print(summ2);
##            C D
## Proportion 0 1
##            C D
## Proportion 0 1

One goal will be to develop a function that can return optimal strategies for each player, including mixed strategies, for any given \(2 \times 2\) payoff matrix. The function below does not do this; it needs to be fixed. A starting point for looking at appropriate algorithms is Avis et al. (2009), who come up with an efficient solution.

Before investing too much time in this, let’s make sure that finding equilibrium solutions makes sense in the context of games with uncertainty. We might need a different approach, e.g., if the payoffs themselves are uncertain and the optimal strategies are reflected in this uncertainty.

One package in R can solve Nash equilibria, though the documentation for it is not excellent. There’s also a repository that can do it in C, but that might take more time than it is worth – the paper underlying it is Miltersen and Sørensen (2009). A benefit here is that it uses extensive-form games and computes quasi-perfect equilibria, which are specifically equilibria that assume that a player’s opponent is not perfect, and account for past mistakes.

## XXX FIXIT: There is an error in calculating what each should play -- it is tabulating the frequency of best plays, but when mixed strategies occur, it returns a 1/2, 1/2 instead of the proportion based on the value. 
solve.nash <- function(){ #Function to be made to solve Nash equilibrium
   return(NULL);
}


game <- function(payoff1, payoff2){
    if(length(payoff1) != length(payoff2)){
      print("WARNING: Payoff vectors must be the same length");   
      return(NULL);
    }
    if(min(payoff1) < 0){
      payoff1 <- payoff1 - min(payoff1); # Shift so the minimum payoff is zero
    }
    if(min(payoff2) < 0){
      payoff2 <- payoff2 - min(payoff2); # Shift so the minimum payoff is zero
    }    
    if(is.matrix(payoff1)==FALSE){
      payoff1 <- matrix(data=payoff1, nrow=2, byrow=TRUE);   
    }
    if(is.matrix(payoff2)==FALSE){
      payoff2 <- matrix(data=payoff2, nrow=2, byrow=TRUE);   
    }
    S      <- list(); 
    S[[1]] <- c("Strategy_1","Strategy_2"); 
    S[[2]] <- c("Strategy_3","Strategy_4");
    best1  <- apply(payoff1,1,which.max); # Best strategies for Player 1
    best2  <- apply(payoff2,2,which.max); # Best strategies for Player 2
    tabl1  <- tabulate(best1, nbins=ncol(payoff1)); # Frequency of bests
    tabl2  <- tabulate(best2, nbins=nrow(payoff2)); # nbins keeps one count per strategy
    expe1  <- apply(payoff1,2,sum) * tabl1;
    expe2  <- apply(payoff2,1,sum) * tabl2;
    str1   <- expe1 / sum(expe1); # Frequency of each strategy
    str2   <- expe2 / sum(expe2);
    summ1  <- matrix(data=str1,nrow=1); # Summary vector of strategies
    summ2  <- matrix(data=str2,nrow=1);
    colnames(summ1) <- S[[1]];
    colnames(summ2) <- S[[2]];
    rownames(summ1) <- "Proportion";
    rownames(summ2) <- "Proportion";
    strategy_pr     <- list(player1=summ1,player2=summ2);
    return(strategy_pr);
}

We can now use the function above to figure out and return strategies for any given payoff vectors from \(a\), \(b\), \(c\), and \(d\) for each player (1 and 2).

library(shiny) # Needed for shinyUI, shinyServer, and the input and output functions

u <- shinyUI(pageWithSidebar(

  headerPanel(""),
  sidebarPanel(
    textInput('vec1', 'Player 1: a, b, c, d', "3, 5, 0, 1"),
    textInput('vec2', 'Player 2: a, b, c, d', "3, 0, 5, 1")
  ),

  mainPanel(
    h4('Proportion strategy is optimally played: (DOES NOT WORK YET)'),
    verbatimTextOutput("oid1")
  )
))

s <- shinyServer(function(input, output) {

  output$oid1<-renderPrint({
    p1  <- as.numeric(unlist(strsplit(input$vec1,",")))
    p2  <- as.numeric(unlist(strsplit(input$vec2,",")))
    pay <- game(payoff1=p1, payoff2=p2)
    o1  <- as.numeric(pay$player1)
    o2  <- as.numeric(pay$player2)
    cat("Player 1 (Strategy 1, 2):\n")
    print(o1)
    cat("\n\n")
    cat("Player 2 (Strategy 3, 4):\n")
    print(o2)
  }
  )


}
)
#shinyApp(ui = u, server = s)
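
As one possible direction for the FIXIT noted above, the sketch below finds equilibria of a \(2 \times 2\) game directly: pure-strategy equilibria by checking mutual best responses, and an interior mixed-strategy equilibrium from the indifference conditions (each player mixes so that its opponent is indifferent between its two strategies). This is not the algorithm of Avis et al. (2009), and it ignores degenerate games with ties; the function name `nash_2x2` and the convention that rows index player 1’s strategy and columns index player 2’s are assumptions for the example.

```r
# Hypothetical 2 x 2 solver. U1[i, j] and U2[i, j] are the payoffs to players
# 1 and 2 when player 1 plays row i and player 2 plays column j.
nash_2x2 <- function(U1, U2) {
  stopifnot(all(dim(U1) == 2), all(dim(U2) == 2))
  pure <- list()
  for (i in 1:2) {
    for (j in 1:2) {
      # Neither player gains by switching unilaterally
      if (U1[i, j] >= U1[3 - i, j] && U2[i, j] >= U2[i, 3 - j]) {
        pure[[length(pure) + 1]] <- c(row = i, col = j)
      }
    }
  }
  # p: probability that player 1 plays row 1, making player 2 indifferent
  # q: probability that player 2 plays column 1, making player 1 indifferent
  dp    <- U2[1, 1] - U2[1, 2] - U2[2, 1] + U2[2, 2]
  dq    <- U1[1, 1] - U1[1, 2] - U1[2, 1] + U1[2, 2]
  mixed <- NULL
  if (dp != 0 && dq != 0) {
    p <- (U2[2, 2] - U2[2, 1]) / dp
    q <- (U1[2, 2] - U1[1, 2]) / dq
    if (p > 0 && p < 1 && q > 0 && q < 1) {
      mixed <- c(p = p, q = q)
    }
  }
  list(pure = pure, mixed = mixed)
}

# Prisoner's dilemma: only mutual defection (row 2, column 2), no mixed equilibrium
pd  <- nash_2x2(matrix(c(3, 0, 5, 1), 2, byrow = TRUE),
                matrix(c(3, 5, 0, 1), 2, byrow = TRUE))
# Battle of the sexes: two pure equilibria plus a mixed one with p = 2/3, q = 1/3
bos <- nash_2x2(matrix(c(2, 0, 0, 1), 2, byrow = TRUE),
                matrix(c(1, 0, 0, 2), 2, byrow = TRUE))
```

The second example shows why tabulating the frequency of best responses is not enough: the mixed proportions depend on the payoff values of the opponent, not on how often a strategy is a best response.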

Game theory and modelling

Agent 1 (rows) / Agent 2 (columns) | Strategy 1     | Strategy 2
Strategy 1                         | A1 pay, A2 pay | A1 pay, A2 pay
Strategy 2                         | A1 pay, A2 pay | A1 pay, A2 pay

Notes regarding Nilsen’s MSE

The following recreates Nilsen’s MSE modelling work.

  • In the population models, the harvest occurs before population grows in a given time step.
  • Four population models are included, all numerical models with no population structure and identical parameter inputs of initial population size, environmental stochasticity, harvest amount (raw number), carrying capacity, and max growth rate.
  • The observation model is the simplest possible model, taking in the population model data and returning an estimate with some error associated with the coefficient of variation of monitoring around the real abundance/density and some degree of bias.
  • The manager model (called the Harvest Decision Model) includes three types of management decisions:
  1. Proportional harvest
  2. Constant quota
  3. Threshold harvest
  • The manager model receives the single estimate of population size (density or abundance), then returns a total allowable catch. A second function models hunter frustration, and is meant to be run after the first function. The second function checks to see if hunter frustration is within a set of bounds; if it is, then the function returns the original total allowable catch. If it is not, then the function adjusts the total allowable catch.

  • The user model (called the implementation model) includes four separate functions, including a very simple one that just samples from a binomial or Poisson distribution around the total allowable catch.
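
To make the three types of management decision concrete, the sketch below writes each as a rule that maps a population estimate to a total allowable catch. This is a hypothetical stand-in, not Nilsen's Harvest Decision Model. The function name harvest_rule, its arguments, the default values, and the exact form of the threshold rule (a proportion of the estimate above the threshold) are all assumptions.

```r
# Hypothetical stand-in for a harvest decision rule (not Nilsen's code).
#   type      : "proportional", "constant", or "threshold"
#   est       : estimated population size
#   prop      : proportion harvested (proportional and threshold rules)
#   quota     : fixed catch (constant rule)
#   threshold : estimate below which no harvest is allowed (threshold rule)
harvest_rule <- function(type, est, prop = 0.2, quota = 20, threshold = 50){
    tac <- switch(type,
                  proportional = prop * est,
                  constant     = quota,
                  threshold    = if(est > threshold){
                                     prop * (est - threshold);
                                 }else{
                                     0;
                                 },
                  stop("Unknown rule type"));
    return(max(0, floor(tac))); # whole animals, never negative
}

est <- 100;
print(harvest_rule(type = "proportional", est = est));
print(harvest_rule(type = "constant",     est = est));
print(harvest_rule(type = "threshold",    est = est));
```

The rules differ most at low abundance: the constant quota ignores the estimate entirely, while the threshold rule stops harvest altogether.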

Hence, we can put four of these functions together to simulate a very simple MSE model:

pop_abund      <- 100;
harvest        <- 20;
growth_rate    <- 1;
K              <- 200;
pr_harvest     <- 0.7;

time           <- 1;
time_end       <- 30;
track          <- matrix(data=0, nrow=time_end, ncol=5);

while(time <= time_end){
    pop_vars     <- PopMod1(X_t0=pop_abund, sigma2_e=0.2, N_Harv=harvest, K=K, 
                            r_max = growth_rate);
    pop_abund    <- as.numeric(pop_vars[4]);
    obs_vars     <- obs_mod1(scale="Abund", value=pop_abund, bias=1, cv=0.4);
    if(obs_vars < 0){  # Nilsen's model allows estimate to be negative
        obs_vars <- 0; # Make it so that negative equates to est. of extinction  
    }
    har_vars     <- HarvDec1(HD_type="A", qu=0.2, PopState_est=obs_vars);
    imp_vars     <- Impl1(TAC=floor(har_vars), ModType="B", p=pr_harvest);
    track[time,] <- c(time, pop_abund, obs_vars, har_vars, imp_vars);
    time         <- time + 1;
}
colnames(track)  <- c("time", "Pop. Size", "Pop. Est.", "Harv. Rate", "Harv.");
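
The loop above depends on Nilsen's functions being loaded. For readers without them, the sketch below runs the same sequence of steps (population, observation, decision, implementation) with simple stand-in functions. Everything in it is an assumption made for illustration: the function names pop_step, observe, and implement, the logistic growth form, the lognormal noise, and the choice to remove the realised harvest from the population in the next time step. It is not a reimplementation of PopMod1, obs_mod1, HarvDec1, or Impl1.

```r
set.seed(1); # repeatable sketch

# Population: harvest is removed first, then logistic growth with
# lognormal environmental noise (assumed form).
pop_step <- function(N, harvest, r_max, K, sigma2_e){
    N_left <- max(0, N - harvest);
    N_grow <- N_left + r_max * N_left * (1 - N_left / K);
    noise  <- exp(rnorm(n = 1, mean = 0, sd = sqrt(sigma2_e)));
    return(max(0, N_grow * noise));
}

# Observation: biased estimate with error scaled by the CV; negative
# estimates are truncated to zero (an estimate of extinction).
observe <- function(N, bias, cv){
    est <- rnorm(n = 1, mean = bias * N, sd = cv * N);
    return(max(0, est));
}

# Implementation: each animal in the quota is taken with probability p.
implement <- function(TAC, p){
    return(rbinom(n = 1, size = TAC, prob = p));
}

N        <- 100;
taken    <- 20;
time_end <- 30;
sim      <- matrix(data = 0, nrow = time_end, ncol = 5);
colnames(sim) <- c("time", "N", "N_est", "TAC", "taken");
for(i in 1:time_end){
    N        <- pop_step(N = N, harvest = taken, r_max = 1, K = 200,
                         sigma2_e = 0.2);
    N_est    <- observe(N = N, bias = 1, cv = 0.4);
    TAC      <- floor(0.2 * N_est);          # proportional harvest rule
    taken    <- implement(TAC = TAC, p = 0.7);
    sim[i, ] <- c(i, N, N_est, TAC, taken);
}
print(head(sim));
```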

We run the above code, and we can look at how key population and management quantities change over time:

  1. Population size (panel A; solid line)
  2. Population estimate size (panel A; dashed line)
  3. Harvest rate set by manager (panel B; solid line)
  4. Harvested number of animals (panel B; dashed line)

The below figure shows all of these quantities over time.
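
A two-panel plot of this kind could be drawn from track along the following lines. This is a sketch that assumes track has the five column names set above; plot_track is a hypothetical helper, and the axis labels are guesses at sensible wording.

```r
# Hypothetical helper: draw panels A and B from a matrix with columns
# "time", "Pop. Size", "Pop. Est.", "Harv. Rate", and "Harv."
plot_track <- function(track){
    old_par <- par(mfrow = c(1, 2));
    on.exit(par(old_par)); # restore the plotting layout afterwards
    tt <- track[, "time"];
    # Panel A: true population size (solid) and its estimate (dashed)
    yA <- max(track[, c("Pop. Size", "Pop. Est.")]);
    plot(x = tt, y = track[, "Pop. Size"], type = "l", lty = "solid",
         ylim = c(0, yA), xlab = "Time step", ylab = "Abundance", main = "A");
    lines(x = tt, y = track[, "Pop. Est."], lty = "dashed");
    # Panel B: harvest set by the manager (solid) and animals taken (dashed)
    yB <- max(track[, c("Harv. Rate", "Harv.")]);
    plot(x = tt, y = track[, "Harv. Rate"], type = "l", lty = "solid",
         ylim = c(0, yB), xlab = "Time step", ylab = "Harvest", main = "B");
    lines(x = tt, y = track[, "Harv."], lty = "dashed");
    invisible(NULL);
}
```

Calling plot_track(track) after the loop has run draws both panels on the active graphics device.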

We can re-run the code at any point and essentially recreate a run of Nilsen’s MSE model. The hard work is now to come up with a G-MSE, which will allow for much more individual complexity through an agent-based approach.

Some side-notes that might be of use

The function do.call in R calls a function once, passing the elements of a list as that function’s arguments (e.g., do.call("f", list(a, b)) is equivalent to f(a, b), and named list elements are matched to arguments by name). It does not call f separately for every list element; lapply does that instead. This is a base R function.
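
A short check of how do.call behaves, next to lapply for comparison. The function f and the argument values are made up for the example.

```r
f <- function(x, y){
    return(x + 2 * y);
}

# do.call() makes ONE call, using the list elements as the arguments
arg_list <- list(x = 1, y = 10);
one_call <- do.call("f", arg_list);   # same as f(x = 1, y = 10)
print(one_call);

# lapply() makes one call PER list element
per_elem <- lapply(list(1, 2, 3), f, y = 10);
print(unlist(per_elem));

# A common use: bind a list of vectors into a matrix in a single call
rows     <- list(1:3, 4:6);
bound    <- do.call(rbind, rows);     # same as rbind(1:3, 4:6)
print(bound);
```

This makes do.call useful when the arguments to a function are built up programmatically and are only known as a list at run time.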

Potentially relevant conferences and workshops

Scottish Ecology, Environment, and Conservation Conference (“The conference aims to bring together researchers in ecology, conservation, and environmental sciences across Scotland” – “The conference is primarily for PhD, Masters and advanced undergraduate students”) University of Aberdeen: 3-4 APR 2017 6 FEB abstract submission deadline

Modelling Biological Evolution 2017: Developing Novel Approaches (topics include: Evolutionary Game Theory and Solving Social Dilemmas) http://www.math.le.ac.uk/people/ag153/homepage/MBE_2017/MBE_2017_1.htm University of Leicester: 4-5 APR 2017 1 FEB 2017 register and abstract submission deadline.

Workshop on behavioural game theory (topic is Psychological Game Theory) https://www.uea.ac.uk/economics/news-and-events/workshop-on-behavioural-game-theory-2017 University of East Anglia (Norwich): 5-6 JUL 2017 28 FEB 2017 submission deadline (no workshop fee)

Game theory and management (topics include: Game theory and management applications, cooperative games and applications, dynamic games and applications, stochastic games and applications) http://gsom.spbu.ru/en/gsom/research/conferences/gtm/ Saint Petersburg University: 28-30 JUN 2017

6th workshop on stochastic methods in game theory (“Many decision problems involve elements of uncertainty and of strategy. Most often the two elements cannot be easily disentangled. The aim of this workshop is to examine several aspects of the interaction between strategy and stochastics. Various game theoretic models will be presented, where stochastic elements are particularly relevant either in the formulation of the model itself or in the computation of its solutions.” Example topics include: Large games and stochastic and dynamic games) https://sites.google.com/site/ericegametheory2017/home Sicily, Italy: 5-13 MAY 2017

13th European Meeting on Game Theory (SING13) (topics include: cooperative games and their applications, dynamic games, stochastic games, learning and experimentation in games, computational game theory, game theory applications in fields such as management). http://www.lamsade.dauphine.fr/sing13/ Paris, France: 5-7 JUL 2017 28 FEB abstract submission deadline

References consulted and annotated (Mendeley)

Adami, C., Schossau, J., & Hintze, A. (2016). Evolutionary game theory using agent-based methods. Physics of Life Reviews, 19, 1–26. https://doi.org/10.1016/j.plrev.2016.08.015

An, L. (2012). Modeling human decisions in coupled human and natural systems: Review of agent-based models. Ecological Modelling, 229, 25–36. https://doi.org/10.1016/j.ecolmodel.2011.07.010

Ascough, J. C., Maier, H. R., Ravalico, J. K., & Strudley, M. W. (2008). Future research challenges for incorporation of uncertainty in environmental and ecological decision-making. Ecological Modelling, 219(3–4), 383–399. https://doi.org/10.1016/j.ecolmodel.2008.07.015

Bautista, C., Naves, J., Revilla, E., Fernández, N., Albrecht, J., Scharf, A. K., … Selva, N. (2016). Patterns and correlates of claims for brown bear damage on a continental scale. Journal of Applied Ecology. http://doi.org/10.1111/1365-2664.12708

Bennett, E. M. (2017). Changing the agriculture and environment conversation. Nature Ecology and Evolution, 1(January), 1–2. https://doi.org/10.1038/s41559-016-0018

Bischof, R., Nilsen, E. B., Brøseth, H., Männil, P., Ozoliņš, J., & Linnell, J. D. C. (2012). Implementation uncertainty when using recreational hunting to manage carnivores. Journal of Applied Ecology, 49(4), 824–832. https://doi.org/10.1111/j.1365-2664.2012.02167.x

Bjerketvedt, D. K., Reimers, E., Parker, H., & Borgstrøm, R. (2014). The Hardangervidda wild reindeer herd: a problematic management history. Rangifer, 34(1), 57–72.

Bonabeau, E. (2002). Agent-based modeling: methods and techniques for simulating human systems. Proceedings of the National Academy of Sciences, 99, 7280–7287. https://doi.org/10.1073/pnas.082080899

Bunnefeld, N., & Keane, A. (2014). Managing wildlife for ecological, socioeconomic, and evolutionary sustainability. Proceedings of the National Academy of Sciences, 111(36), 12964–12965. http://doi.org/10.1073/pnas.1413571111

Bunnefeld, N., Hoshino, E., & Milner-Gulland, E. J. (2011). Management strategy evaluation: A powerful tool for conservation? Trends in Ecology and Evolution, 26(9), 441–447. http://doi.org/10.1016/j.tree.2011.05.003

Chollett, I., Garavelli, L., O’Farrell, S., Cherubin, L., Matthews, T. R., Mumby, P. J., & Box, S. J. (2016). A Genuine Win-Win: Resolving the “Conserve or Catch” Conflict in Marine Reserve Network Design. Conservation Letters, 0(0), 1–9. https://doi.org/10.1111/conl.12318

Cobano, J. A., Conde, R., Alejo, D., & Ollero, A. (2011). Path planning method based on Genetic Algorithms and the Monte-Carlo method to avoid aerial vehicle collisions under uncertainties. In Proceedings of the IEEE International Conference on Robotics and Automation (pp. 4429–4434). https://doi.org/10.1109/ICRA.2011.5980246

Colyvan, M., Justus, J., & Regan, H. M. (2011). The conservation game. Biological Conservation, 144(4), 1246–1253. http://doi.org/10.1016/j.biocon.2010.10.028

Duffy, R., St John, F. A. V, Büscher, B., & Brockington, D. (2016). Toward a new understanding of the links between poverty and illegal wildlife hunting. Conservation Biology, 30(1), 14–22. https://doi.org/10.1111/cobi.12622

Elston, D. A., Spezia, L., Baines, D., & Redpath, S. M. (2014). Working with stakeholders to reduce conflict-modelling the impact of varying hen harrier Circus cyaneus densities on red grouse Lagopus lagopus populations. Journal of Applied Ecology, 51(5), 1236–1245. http://doi.org/10.1111/1365-2664.12315

Eythórsson, E., Tombre, I. M., & Madsen, J. (2017). Goose management schemes to resolve conflicts with agriculture: Theory, practice and effects. Ambio, 46(S2), 231–240. https://doi.org/10.1007/s13280-016-0884-4

Farmer, J. D., & Foley, D. (2009). The economy needs agent-based modelling. Nature, 460(August), 685–686. https://doi.org/10.1038/460685a

Franco, C., Hepburn, L. A., Smith, D. J., Nimrod, S., & Tucker, A. (2016). A Bayesian Belief Network to assess rate of changes in coral reef ecosystems. Environmental Modelling and Software, 80, 132–142. https://doi.org/10.1016/j.envsoft.2016.02.029

Hake, M., Mansson, J., & Wiberg, A. (2010). A working model for preventing crop damage caused by increasing goose populations in Sweden. Ornis Svecica, 20(3-4), 225–233.

Hamblin, S. (2013). On the practical usage of genetic algorithms in ecology and evolution. Methods in Ecology and Evolution, 4(2), 184–194. https://doi.org/10.1111/2041-210X.12000

Heinonen, J. P. M., Palmer, S. C. F., Redpath, S. M., & Travis, J. M. J. (2014). Modelling hen harrier dynamics to inform human-wildlife conflict resolution: A spatially-realistic, individual-based approach. PLoS ONE, 9(11). http://doi.org/10.1371/journal.pone.0112492

Hindar, K., Fleming, I. A., McGinnity, P., & Diserud, O. (2006). Genetic and ecological effects of salmon farming on wild salmon: modelling from experimental results. ICES Journal of Marine Science, 63(7), 1234–1247. https://doi.org/10.1016/j.icesjms.2006.04.025

Janssen, M. A., Holahan, R., Lee, A., & Ostrom, E. (2010). Lab experiments for the study of socio-ecological systems. Science, 328, 613–618. http://doi.org/10.1126/science.1229223

Karlsson, S., Diserud, O. H., Fiske, P., & Hindar, K. (2016). Widespread genetic introgression of escaped farmed Atlantic salmon in wild salmon populations. ICES Journal of Marine Science, 0, fsw121. https://doi.org/10.1093/icesjms/fsw121

Liu, Y., Diserud, O. H., Hindar, K., & Skonhoft, A. (2013). An ecological-economic model on the effects of interactions between escaped farmed and wild salmon (Salmo salar). Fish and Fisheries, 14(2), 158–173. http://doi.org/10.1111/j.1467-2979.2012.00457.x

Luo, X., Yang, W., Kwong, C., Tang, J., & Tang, J. (2014). Linear programming embedded genetic algorithm for product family design optimization with maximizing imprecise part-worth utility function. Concurrent Engineering, 22(4), 309–319. https://doi.org/10.1177/1063293X14553068

Man, M., Zhang, Y., Ma, G., Friston, K., & Liu, S. (2016). Quantification of degeneracy in Hodgkin-Huxley neurons on Newman-Watts small world network. Journal of Theoretical Biology, 402, 62–74. http://doi.org/10.1016/j.jtbi.2016.05.004

Manfredo, M. J., Bruskotter, J. T., Teel, T. L., Fulton, D., Schwartz, S. H., Arlinghaus, R., … Sullivan, L. (2016). Why social values cannot be changed for the sake of conservation. Conservation Biology. Accepted. https://doi.org/10.1111/cobi.12855

Mansson, J., Nilsson, L., & Hake, M. (2013). Territory size and habitat selection of breeding Common Cranes (Grus grus) in a boreal landscape. Ornis Fennica, 90(2), 65–72.

Marks, R. E. (1992). Breeding hybrid strategies: optimal behaviour for oligopolists. Journal of Evolutionary Economics, 2(1), 17–38. https://doi.org/10.1007/BF01196459

McAvoy, A., & Hauert, C. (2015). Asymmetric evolutionary games. PLoS Computational Biology, 11(8), e1004349. https://doi.org/10.1371/journal.pcbi.1004349

McCann, R. K., Marcot, B. G., & Ellis, R. (2006). Bayesian belief networks: applications in ecology and natural resource management. Canadian Journal of Forest Research, 36, 3053–3062.

Miyasaka, T., Le, Q. B., Okuro, T., Zhao, X., & Takeuchi, K. (2017). Agent-based modeling of complex social–ecological feedback loops to assess multi-dimensional trade-offs in dryland ecosystem services. Landscape Ecology. https://doi.org/10.1007/s10980-017-0495-x

Nellemann, C., Jordhoy, P., Stoen, O. G., & Strand, O. (2000). Cumulative impacts of tourist resorts on wild reindeer (Rangifer tarandus tarandus) during winter. Arctic, 53(1), 9–17. https://doi.org/10.14430/arctic829

Nellemann, C., Vistnes, I., Jordhoy, P., Strand, O., & Newton, A. (2003). Progressive impact of piecemeal infrastructure development on wild reindeer. Biological Conservation, 113(2), 307–317. https://doi.org/10.1016/S0006-3207(03)00048-X

Olaussen, J. O., & Skonhoft, A. (2008). On the economics of biological invasion: An application to recreational fishing. Natural Resource Modeling, 21(4), 625–653. https://doi.org/10.1111/j.1939-7445.2008.00026.x

Rumpff, L., Duncan, D. H., Vesk, P. A., Keith, D. A., & Wintle, B. A. (2011). State-and-transition modelling for Adaptive Management of native woodlands. Biological Conservation, 144(4), 1224–1236. http://doi.org/10.1016/j.biocon.2010.10.026

Strand, O., Nilsen, E. B., Solberg, E. J., & Linnell, J. D. C. (2012). Can management regulate the population size of wild reindeer (Rangifer tarandus) through harvest? Canadian Journal of Zoology, 90, 163–171. http://doi.org/10.1139/Z11-123

Tilman, A. R., Watson, J. R., & Levin, S. (2016). Maintaining cooperation in social-ecological systems. Theoretical Ecology. https://doi.org/10.1007/s12080-016-0318-8

Tu, M. T., Wolff, E., & Lamersdorf, W. (2000). Genetic algorithms for automated negotiations: a FSM-based application approach. Proceedings 11th International Workshop on Database and Expert Systems Applications, 1029–1033. https://doi.org/10.1109/DEXA.2000.875153

Wam, H. K., Bunnefeld, N., Clarke, N., & Hofstad, O. (2016). Conflicting interests of ecosystem services: Multi-criteria modelling and indirect evaluation to trade off monetary and non-monetary measures. Ecosystem Services.

Wang, P., Poe, G. L., & Wolf, S. A. (2017). Payments for ecosystem services and wealth distribution. Ecological Economics, 132, 63–68. https://doi.org/10.1016/j.ecolecon.2016.10.009

Wright, G. D., Andersson, K. P., Gibson, C. C., & Evans, T. P. (2016). Decentralization can help reduce deforestation when user groups engage with local government. Proceedings of the National Academy of Sciences, 201610650. https://doi.org/10.1073/pnas.1610650114

References cited

Adami, C., and A. Hintze. 2013. Evolutionary instability of zero-determinant strategies demonstrates that winning is not everything. Nature Communications 4:2193. Nature Publishing Group.

Adami, C., J. Schossau, and A. Hintze. 2016. Evolutionary game theory using agent-based methods. Physics of Life Reviews 19:1–26. Elsevier B.V.

An, L. 2012. Modeling human decisions in coupled human and natural systems: Review of agent-based models. Ecological Modelling 229:25–36.

Ascough, J. C., H. R. Maier, J. K. Ravalico, and M. W. Strudley. 2008. Future research challenges for incorporation of uncertainty in environmental and ecological decision-making. Ecological Modelling 219:383–399.

Avis, D., G. D. Rosenberg, R. Savani, and B. von Stengel. 2009. Enumeration of Nash equilibria for two-player games. Economic Theory 42:9–37.

Balmann, A., and K. Happe. 2000. Applying parallel genetic algorithms to economic problems: The case of agricultural land markets. In IIFET Conference “Microbehavior and Macroresults” Proceedings.

Bocedi, G., S. C. F. Palmer, G. Pe’er, R. K. Heikkinen, Y. G. Matsinos, K. Watts, and J. M. J. Travis. 2014. RangeShifter: a platform for modelling spatial eco-evolutionary dynamics and species’ responses to environmental changes. Methods in Ecology and Evolution 5:388–396.

Bonabeau, E. 2002. Agent-based modeling: methods and techniques for simulating human systems. Proceedings of the National Academy of Sciences 99:7280–7287.

Bronstein, J. L., W. G. Wilson, and W. F. Morris. 2003. Ecological dynamics of mutualist/antagonist communities. American Naturalist 162:S24–39.

Bunnefeld, N., E. Hoshino, and E. J. Milner-Gulland. 2011. Management strategy evaluation: A powerful tool for conservation? Trends in Ecology and Evolution 26:441–447.

Carter, G. G., and G. S. Wilkinson. 2013. Food sharing in vampire bats: reciprocal help predicts donations more than relatedness or harassment. Proceedings of The Royal Society B 280:20122573.

Carter, G. G., and G. S. Wilkinson. 2015. Social benefits of non-kin food sharing by female vampire bats. Proceedings of The Royal Society B 282:20152524.

Colyvan, M., J. Justus, and H. M. Regan. 2011. The conservation game. Biological Conservation 144:1246–1253. Elsevier Ltd.

Correia, L. 2010. Computational evolution: Taking liberties. Theory in Biosciences 129:183–191.

Darwen, P. J., and X. Yao. 1995. On evolving robust strategies for iterated prisoner’s dilemma. Pp. 276–292 in Progress in evolutionary computation.

Dawkins, R. 1976. The Selfish Gene. Oxford University Press, Oxford.

Débarre, F., C. Hauert, and M. Doebeli. 2014. Social evolution in structured populations. Nature Communications 5:3409.

Duthie, A. B., and M. R. Falcy. 2013. The influence of habitat autocorrelation on plants and their seed-eating pollinators. Ecological Modelling 251:260–270.

Fawcett, T. W., B. Fallenstein, A. D. Higginson, A. I. Houston, D. E. W. Mallpress, P. C. Trimmer, and J. M. McNamara. 2014. The evolution of decision rules in complex environments. Trends in Cognitive Sciences 18:153–161. Elsevier Ltd.

Fonseca, C. M., and P. J. Fleming. 1993. Genetic algorithms for multiobjective optimization: formulation, discussion and generalization. Pp. 416–423 in ICGA.

Fonseca, C. M., and P. J. Fleming. 1998. Multiobjective optimization and multiple constraint handling with evolutionary algorithms - Part I: A unified formulation. IEEE Transactions on Systems, Man, and Cybernetics Part A:Systems and Humans. 28:26–37.

Franco, C., L. A. Hepburn, D. J. Smith, S. Nimrod, and A. Tucker. 2016. A Bayesian Belief Network to assess rate of changes in coral reef ecosystems. Environmental Modelling and Software 80:132–142. Elsevier Ltd.

Hamblin, S. 2013. On the practical usage of genetic algorithms in ecology and evolution. Methods in Ecology and Evolution 4:184–194.

Horn, J., N. Nafpliotis, and D. E. Goldberg. 1993. Multiobjective optimization using the niched pareto genetic algorithm.

Jaszkiewicz, A. 2002. Genetic local search for multi-objective combinatorial optimization. European Journal of Operational Research 137:50–71.

Kark, S., A. Tulloch, A. Gordon, T. Mazor, N. Bunnefeld, and N. Levin. 2015. Cross-boundary collaboration: Key to the conservation puzzle. Current Opinion in Environmental Sustainability 12:12–24. Elsevier B.V.

Kenkel, B., and C. S. Signorino. 2014. Estimating Extensive Form Games in R. Journal of Statistical Software 56:1–27.

Kumar, A. 2013. Encoding schemes in genetic algorithm. International Journal of Advanced Research in IT and Engineering 2:1–7.

Lee, C. S. 2012. Multi-objective game-theory models for conflict analysis in reservoir watershed management. Chemosphere 87:608–613. Elsevier Ltd.

Leombruni, R., and M. Richiardi. 2005. Why are economists sceptical about agent-based simulations? Physica A 355:103–109.

Luke, S. 2015. Essentials of Metaheuristics.

Luo, X., W. Yang, C. Kwong, J. Tang, and J. Tang. 2014. Linear programming embedded genetic algorithm for product family design optimization with maximizing imprecise part-worth utility function. Concurrent Engineering 22:309–319.

Man, M., Y. Zhang, G. Ma, K. Friston, and S. Liu. 2016. Quantification of degeneracy in Hodgkin-Huxley neurons on Newman-Watts small world network. Journal of Theoretical Biology 402:62–74. Elsevier.

Marks, R. E. 1992. Breeding hybrid strategies: optimal behaviour for oligopolists. Journal of Evolutionary Economics 2:17–38.

Maynard Smith, J., and G. A. Parker. 1976. The logic of asymmetric contests. Animal Behaviour 24:159–175.

McAvoy, A., and C. Hauert. 2015. Asymmetric evolutionary games. PLoS Computational Biology 11:e1004349.

McFadden, D. 1973. Conditional logit analysis of qualitative choice behavior. Pp. 105–142 in P. Zarembka, ed. Frontiers in econometrics. Academic Press Inc, New York.

McNamara, J. M., P. C. Trimmer, and A. I. Houston. 2014. Natural selection can favour “irrational” behaviour. Biology Letters 10:20130935.

Milner-Gulland, E. J. 2011. Integrating fisheries approaches and household utility models for improved resource management. Proceedings of the National Academy of Sciences 108:1741–1746.

Miltersen, P. B., and T. B. Sørensen. 2009. Computing a quasi-perfect equilibrium of a two-player game. Economic Theory 42:175–192.

Miyasaka, T., Q. B. Le, T. Okuro, X. Zhao, and K. Takeuchi. 2017. Agent-based modeling of complex social–ecological feedback loops to assess multi-dimensional trade-offs in dryland ecosystem services. Landscape Ecology, doi: 10.1007/s10980-017-0495-x. Springer Netherlands.

Naivinit, W., C. Le Page, G. Trébuil, and N. Gajaseni. 2010. Participatory agent-based modeling and simulation of rice production and labor migrations in Northeast Thailand. Environmental Modelling and Software 25:1345–1358. Elsevier Ltd.

Nautiyal, S., and H. Kaechele. 2009. Natural resource management in a protected area of the Indian Himalayas: A modeling approach for anthropogenic interactions on ecosystem. Environmental Monitoring and Assessment 153:253–271.

Nellemann, C., P. Jordhøy, O. G. Støen, and O. Strand. 2000. Cumulative impacts of tourist resorts on wild reindeer (Rangifer tarandus tarandus) during winter. Arctic 53:9–17.

Nowak, M. A., K. Sigmund, and E. El-Sedy. 1995. Automata, repeated games and noise. Journal of Mathematical Biology 33:703–722.

Nuno, A., N. Bunnefeld, and E. J. Milner-Gulland. 2013. Matching observations and reality: Using simulation models to improve monitoring under uncertainty in the Serengeti. Journal of Applied Ecology 50:488–498.

Phan, D. 2003. From agent-based computational economics toward cognitive economics. Pp. 369–396 in P. Bourgine and J.-P. Nadal, eds. Cognitive economics: An interdisciplinary approach. Springer, London.

Pollock, K. H., J. D. Nichols, C. Brownie, and J. E. Hines. 1990. Statistical inference for capture-recapture experiments. Wildlife Monographs 27:938–942.

Salomon, R. 1996. The influence of different coding schemes on the computational complexity of genetic algorithms in function optimization. Pp. 227–235 in Parallel problem solving from nature. Springer Berlin Heidelberg.

Savani, R., and B. von Stengel. 2014. Game theory explorer: software for the applied game theorist. Computational Management Science 5–33.

Tesfatsion, L., C. R. Rehmann, D. S. Cardoso, Y. Jie, and W. J. Gutowski. 2017. An agent-based platform for the study of watersheds as coupled natural and human systems. Environmental Modelling and Software 89:40–60. Elsevier Ltd.

Tilman, A. R., J. R. Watson, and S. Levin. 2016. Maintaining cooperation in social-ecological systems. Theoretical Ecology, doi: 10.1007/s12080-016-0318-8.

Trivers, R. 1985. Social Evolution. The Benjamin/Cummings Publishing Company, Inc., Menlo Park, California.

Tu, M. T., E. Wolff, and W. Lamersdorf. 2000. Genetic algorithms for automated negotiations: a FSM-based application approach. Proceedings 11th International Workshop on Database and Expert Systems Applications 1029–1033.

Watson, R. A., and E. Szathmáry. 2016. How Can Evolution Learn? Trends in Ecology and Evolution 31:147–157. Elsevier Ltd.

Wilkinson, G. S. 1990. Food sharing in vampire bats. Scientific American 262:64–70.

Zeeman, E. C. 1980. Population dynamics from game theory. Pp. 471–497 in Z. Nitecki and C. Robinson, eds. Global theory of dynamical systems. Springer-Verlag, Berlin.