Title: | User-Friendly 'shiny' App for Bayesian Species Distribution Models |
---|---|
Description: | A user-friendly 'shiny' application for Bayesian machine learning analysis of marine species distributions. GLOSSA (Global Species Spatiotemporal Analysis) uses Bayesian Additive Regression Trees (BART; Chipman, George, and McCulloch (2010) <doi:10.1214/09-AOAS285>) to model species distributions with intuitive workflows for data upload, processing, model fitting, and result visualization. It supports presence-absence and presence-only data (with pseudo-absence generation), spatial thinning, cross-validation, and scenario-based projections. GLOSSA is designed to facilitate ecological research by providing easy-to-use tools for analyzing and visualizing marine species distributions across different spatial and temporal scales. |
Authors: | Jorge Mestre-Tomás [aut, cre], Alba Fuster-Alonso [aut] |
Maintainer: | Jorge Mestre-Tomás <[email protected]> |
License: | GPL-3 |
Version: | 1.0.0 |
Built: | 2024-11-16 06:08:41 UTC |
Source: | https://github.com/imares-group/glossa |
This function enlarges a polygon by applying a buffer.
buffer_polygon(polygon, buffer_distance)
polygon |
An sf object representing the polygon to be buffered. |
buffer_distance |
Numeric. The buffer distance in decimal degrees (arc degrees). |
An sf object representing the buffered polygon.
This function cleans coordinates of presence/absence data by removing NA coordinates, rounding coordinates if specified, removing duplicated points, and removing points outside specified spatial polygon boundaries.
clean_coordinates( df, study_area, overlapping = FALSE, decimal_digits = NULL, coords = c("decimalLongitude", "decimalLatitude"), by_timestamp = TRUE, seed = NULL )
df |
A dataframe object with rows representing points. Coordinates are in WGS84 (EPSG:4326) coordinate system. |
study_area |
A spatial polygon in WGS84 (EPSG:4326) representing the boundaries within which coordinates should be kept. |
overlapping |
Logical indicating whether points overlapping the polygon should be removed (TRUE) or kept (FALSE). |
decimal_digits |
An integer specifying the number of decimal places to which coordinates should be rounded. |
coords |
Character vector specifying the column names for longitude and latitude. |
by_timestamp |
If TRUE, clean coordinates taking into account different time periods defined in the column 'timestamp'. |
seed |
Optional; an integer seed for reproducibility of results. |
This function takes a data frame containing presence/absence data with longitude and latitude coordinates, a spatial polygon representing boundaries within which to keep points, and parameters for rounding coordinates and handling duplicated points. It returns a cleaned data frame with valid coordinates within the specified boundaries.
A cleaned data frame containing presence/absence data with valid coordinates.
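The non-spatial cleaning steps described above can be sketched in base R. This is a minimal illustration, not the package's implementation: `clean_coords_sketch` is a hypothetical helper, the toy data are invented, and the polygon filter is left out because it needs a spatial library.

```r
# Hypothetical helper: base-R sketch of the non-spatial cleaning steps.
# The study-area (polygon) filter is deliberately omitted.
clean_coords_sketch <- function(df,
                                coords = c("decimalLongitude", "decimalLatitude"),
                                decimal_digits = NULL) {
  # 1. Drop rows with a missing longitude or latitude.
  df <- df[stats::complete.cases(df[, coords]), , drop = FALSE]
  # 2. Optionally round coordinates to the requested number of decimals.
  if (!is.null(decimal_digits)) {
    df[, coords] <- round(df[, coords], decimal_digits)
  }
  # 3. Keep only the first record at each coordinate pair.
  df <- df[!duplicated(df[, coords]), , drop = FALSE]
  rownames(df) <- NULL
  df
}

# Invented toy occurrences: two near-identical points and one missing longitude.
occ <- data.frame(
  decimalLongitude = c(1.23456, 1.23461, NA, 2.5),
  decimalLatitude  = c(40.1111, 40.1112, 41.0, 39.9),
  pa = c(1, 1, 0, 0)
)
cleaned <- clean_coords_sketch(occ, decimal_digits = 3)
print(cleaned)
```

Rounding before deduplication matters in this sketch: the first two toy points only become duplicates once they are rounded to three decimals.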
Generates raster layers for longitude and latitude from given raster data, applies optional scaling, and restricts the output to a specified spatial mask.
create_coords_layer(layers, study_area = NULL, scale_layers = FALSE)
layers |
Raster or stack of raster layers to derive geographic extent and resolution. |
study_area |
Spatial object for masking output layers. |
scale_layers |
Logical indicating if scaling is applied. Default is FALSE. |
Raster stack with layers lon and lat.
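To make the idea of longitude and latitude layers concrete, the sketch below builds two plain matrices of cell-centre coordinates for a regular grid. It is an illustration under stated assumptions: `coords_grid_sketch` is a hypothetical helper, matrices stand in for raster layers, masking is omitted, and the scaling (centre and divide by the standard deviation) is assumed rather than taken from the package.

```r
# Hypothetical helper: lon/lat matrices for a regular grid (no raster library).
coords_grid_sketch <- function(xmin, xmax, ymin, ymax, res, scale_layers = FALSE) {
  # Cell-centre coordinates along each axis.
  lon_centres <- seq(xmin + res / 2, xmax - res / 2, by = res)
  lat_centres <- seq(ymax - res / 2, ymin + res / 2, by = -res)  # north to south
  n_row <- length(lat_centres)
  n_col <- length(lon_centres)
  # Rows run north to south, columns west to east.
  lon <- matrix(lon_centres, nrow = n_row, ncol = n_col, byrow = TRUE)
  lat <- matrix(lat_centres, nrow = n_row, ncol = n_col)
  if (scale_layers) {
    # Assumed scaling: mean 0, standard deviation 1.
    lon[] <- (lon - mean(lon)) / stats::sd(lon)
    lat[] <- (lat - mean(lat)) / stats::sd(lat)
  }
  list(lon = lon, lat = lat)
}

grid <- coords_grid_sketch(xmin = 0, xmax = 4, ymin = 40, ymax = 42, res = 1)
print(grid$lon)
print(grid$lat)
```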
This function performs k-fold cross-validation for a Bayesian Additive Regression Trees (BART) model using presence-absence data and environmental covariate layers. It calculates various performance metrics for model evaluation.
cv_bart(data, k = 10, seed = NULL)
data |
Data frame with a column (named 'pa') indicating presence (1) or absence (0) and columns for the predictor variables. |
k |
Integer; number of folds for cross-validation (default is 10). |
seed |
Optional; random seed. |
A data frame containing the true positives (TP), false positives (FP), false negatives (FN), true negatives (TN), and various performance metrics including precision (PREC), sensitivity (SEN), specificity (SPC), false discovery rate (FDR), negative predictive value (NPV), false negative rate (FNR), false positive rate (FPR), F-score, accuracy (ACC), balanced accuracy (BA), and true skill statistic (TSS) for each fold.
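The metrics listed above all derive from the four confusion-matrix counts of a fold. The sketch below uses the standard textbook definitions; `pa_metrics_sketch` is a hypothetical helper and the counts are invented, so treat it as a reading aid rather than the package's code.

```r
# Hypothetical helper: standard confusion-matrix metrics for a single fold.
pa_metrics_sketch <- function(TP, FP, FN, TN) {
  PREC <- TP / (TP + FP)  # precision
  SEN  <- TP / (TP + FN)  # sensitivity (true positive rate)
  SPC  <- TN / (TN + FP)  # specificity (true negative rate)
  data.frame(
    TP = TP, FP = FP, FN = FN, TN = TN,
    PREC = PREC, SEN = SEN, SPC = SPC,
    FDR = FP / (FP + TP),                     # false discovery rate
    NPV = TN / (TN + FN),                     # negative predictive value
    FNR = FN / (FN + TP),                     # false negative rate
    FPR = FP / (FP + TN),                     # false positive rate
    Fscore = 2 * PREC * SEN / (PREC + SEN),   # harmonic mean of PREC and SEN
    ACC = (TP + TN) / (TP + FP + FN + TN),    # accuracy
    BA  = (SEN + SPC) / 2,                    # balanced accuracy
    TSS = SEN + SPC - 1                       # true skill statistic
  )
}

# Invented counts for one fold.
m <- pa_metrics_sketch(TP = 40, FP = 10, FN = 5, TN = 45)
print(m)
```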
This function extracts covariate values for species occurrences, excluding NA values.
extract_noNA_cov_values(data, covariate_layers, predictor_variables)
data |
A data frame containing species occurrence data with columns x/long (first column) and y/lat (second column). |
covariate_layers |
A list of raster layers representing covariates. |
predictor_variables |
Variables to select from all the layers. |
This function extracts covariate values for each species occurrence location from the provided covariate layers. It returns a data frame containing species occurrence data with covariate values, excluding any NA values.
A data frame containing species occurrence data with covariate values, excluding NA values.
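The NA-exclusion step can be illustrated without any raster library. In this sketch a toy table stands in for occurrences already joined to extracted covariate values; the column names (`sst`, `chl`, `depth`) and values are invented for illustration.

```r
# Toy table standing in for occurrences joined to extracted covariate values.
# Longitude and latitude are the first two columns, as the function expects.
occ_cov <- data.frame(
  decimalLongitude = c(1.2, 1.8, 2.4, 3.0),
  decimalLatitude  = c(40.1, 40.6, 41.0, 41.3),
  pa    = c(1, 0, 1, 0),
  sst   = c(18.2, NA, 17.5, 16.9),
  chl   = c(0.31, 0.28, 0.40, NA),
  depth = c(NA, 120, 85, 60)
)

# Keep only the selected predictors, then drop rows with NA in any of them.
predictor_variables <- c("sst", "chl")
keep_cols <- c("decimalLongitude", "decimalLatitude", "pa", predictor_variables)
selected <- occ_cov[, keep_cols]
no_na <- selected[stats::complete.cases(selected[, predictor_variables]), ]
print(no_na)
```

Because `depth` is not among the selected predictors, its missing value in the first row does not cause that row to be dropped.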
This function fits a Bayesian Additive Regression Trees (BART) model using presence/absence data and environmental covariate layers.
fit_bart_model(y, x, seed = NULL)
y |
A numeric vector indicating presence (1) or absence (0). |
x |
A data frame with the same number of rows as the length of the vector 'y', containing the covariate values. |
seed |
An optional integer value for setting the random seed for reproducibility. |
A BART model object.
This function generates pseudo-absence points within the study area.
generate_pseudo_absences( presences, study_area, raster_stack, predictor_variables, coords = c("decimalLongitude", "decimalLatitude"), decimal_digits = NULL, attempts = 100 )
presences |
Data frame containing presence points. |
study_area |
Spatial polygon defining the study area ('sf' object). |
raster_stack |
'SpatRaster' object containing covariate data. |
predictor_variables |
Character vector of the predictor variables selected for this species. |
coords |
Character vector specifying the column names for longitude and latitude. Defaults to 'c("decimalLongitude", "decimalLatitude")'. |
decimal_digits |
An integer specifying the number of decimal places to which coordinates should be rounded. |
attempts |
Integer specifying the number of attempts to generate exact pseudo-absences. Defaults to 100. |
Data frame containing both presence and pseudo-absence points.
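One way to picture the role of 'attempts' is a rejection-sampling loop that keeps drawing random points until enough valid pseudo-absences exist. The sketch below is a simplified illustration under several assumptions that do not come from the package: `pseudo_absences_sketch` is a hypothetical helper, a bounding box stands in for the study-area polygon, the covariate check against the raster stack is omitted, and one pseudo-absence per presence is assumed as the target.

```r
# Hypothetical helper: uniform random pseudo-absences inside a bounding box.
pseudo_absences_sketch <- function(presences, bbox,
                                   coords = c("decimalLongitude", "decimalLatitude"),
                                   decimal_digits = NULL, attempts = 100) {
  n_target <- nrow(presences)  # assumption: one pseudo-absence per presence
  pres_key <- paste(presences[[coords[1]]], presences[[coords[2]]])
  absences <- presences[0, coords, drop = FALSE]
  for (i in seq_len(attempts)) {
    n_missing <- n_target - nrow(absences)
    if (n_missing <= 0) break
    # Draw candidate points uniformly within the bounding box.
    cand <- data.frame(
      stats::runif(n_missing, bbox[["xmin"]], bbox[["xmax"]]),
      stats::runif(n_missing, bbox[["ymin"]], bbox[["ymax"]])
    )
    names(cand) <- coords
    if (!is.null(decimal_digits)) cand <- round(cand, decimal_digits)
    # Reject candidates that coincide with a presence, then drop duplicates.
    cand_key <- paste(cand[[1]], cand[[2]])
    cand <- cand[!cand_key %in% pres_key, , drop = FALSE]
    absences <- unique(rbind(absences, cand))
  }
  presences$pa <- 1
  absences$pa <- rep(0, nrow(absences))
  rbind(presences[, c(coords, "pa")], absences)
}

set.seed(1)
pres <- data.frame(
  decimalLongitude = c(1.5, 2.5, 3.5, 4.5, 5.5),
  decimalLatitude  = c(41, 42, 43, 44, 41.5)
)
bbox <- c(xmin = 0, xmax = 10, ymin = 40, ymax = 45)
pa_points <- pseudo_absences_sketch(pres, bbox, decimal_digits = 2)
print(pa_points)
```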
This function wraps all the analyses that the GLOSSA package performs. It processes presence-absence data and environmental covariates, then performs species distribution modeling and projections under past and future scenarios.
glossa_analysis( pa_data = NULL, fit_layers = NULL, proj_files = NULL, study_area_poly = NULL, predictor_variables = NULL, decimal_digits = NULL, scale_layers = FALSE, buffer = NULL, native_range = NULL, suitable_habitat = NULL, other_analysis = NULL, seed = NA, waiter = NULL )
pa_data |
A list of data frames containing presence-absence data. |
fit_layers |
A SpatRaster stack containing model fitting environmental layers. |
proj_files |
A list of file paths containing environmental layers for projection scenarios. |
study_area_poly |
A spatial polygon defining the study area. |
predictor_variables |
A list of predictor variables to be used in the analysis. |
decimal_digits |
An integer specifying the number of decimal places to which coordinates should be rounded. |
scale_layers |
Logical; if TRUE, covariate layers will be scaled based on fit layers. |
buffer |
Buffer value or distance in decimal degrees (arc degrees). |
native_range |
A vector of scenarios ('fit_layers', 'projections') where native range modeling should be performed. |
suitable_habitat |
A vector of scenarios ('fit_layers', 'projections') where habitat suitability modeling should be performed. |
other_analysis |
A vector of additional analyses to perform (e.g., 'variable_importance', 'functional_responses', 'cross_validation'). |
seed |
Optional; an integer seed for reproducibility of results. |
waiter |
Optional; a waiter instance to update progress in a Shiny application. |
A list containing structured outputs from each major section of the analysis, including model data, projections, variable importance scores, and habitat suitability assessments.
This function inverts a polygon by calculating the difference between the bounding box and the polygon.
invert_polygon(polygon, bbox = NULL)
polygon |
An sf object representing the polygon to be inverted. |
bbox |
Optional. An sf or bbox object representing the bounding box. If NULL, the bounding box of the input polygon is used. |
An sf object representing the inverted polygon.
This function crops and extends raster layers to the study area extent (bounding box) defined by longitude and latitude, then applies a mask based on the provided spatial polygon to remove areas outside the polygon.
layer_mask(layers, study_area)
layers |
A stack of raster layers ('SpatRaster' object) to be processed. |
study_area |
A spatial polygon ('sf' object) used to mask the raster layers. |
A 'SpatRaster' object representing the masked raster layers.
This function calculates the optimal cutoff for presence-absence prediction using a BART model.
pa_optimal_cutoff(y, x, model, seed = NULL)
y |
Vector indicating presence (1) or absence (0). |
x |
A data frame of covariate values with the same number of rows as the length of the vector 'y'. |
model |
A BART model object. |
seed |
Random seed for reproducibility. |
The optimal cutoff value for presence-absence prediction.
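The selection criterion is not stated here, so the following sketch assumes one common choice: scanning candidate thresholds and keeping the one that maximizes the true skill statistic (TSS). `optimal_cutoff_sketch` is a hypothetical helper, and the probabilities are invented stand-ins for model predictions.

```r
# Hypothetical helper: choose the threshold that maximises TSS (assumed criterion).
optimal_cutoff_sketch <- function(y, prob, cutoffs = seq(0.01, 0.99, by = 0.01)) {
  tss <- vapply(cutoffs, function(ct) {
    pred <- as.integer(prob >= ct)
    sen <- sum(pred == 1 & y == 1) / sum(y == 1)  # sensitivity
    spc <- sum(pred == 0 & y == 0) / sum(y == 0)  # specificity
    sen + spc - 1
  }, numeric(1))
  # which.max returns the first maximum, so ties resolve to the lowest cutoff.
  cutoffs[which.max(tss)]
}

# Invented observations and predicted probabilities.
y    <- c(0, 0, 0, 0, 1, 1, 1, 1)
prob <- c(0.052, 0.103, 0.204, 0.355, 0.306, 0.607, 0.808, 0.909)
cutoff <- optimal_cutoff_sketch(y, prob)
print(cutoff)
```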
This function makes predictions using a Bayesian Additive Regression Trees (BART) model on a stack of environmental covariates ('SpatRaster').
predict_bart(bart_model, layers, cutoff = NULL)
bart_model |
A BART model object obtained from fitting BART using the 'dbarts' package. |
layers |
A SpatRaster object containing environmental covariates for prediction. |
cutoff |
An optional numeric cutoff value for determining potential presences. If NULL, potential presences and absences will not be computed. |
A SpatRaster containing the mean, median, standard deviation, and quantiles of the posterior predictive distribution, as well as a potential presences layer if cutoff is provided.
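The per-cell summaries described above can be pictured as column-wise statistics over a matrix of posterior draws. The sketch below simulates such draws instead of fitting a model; the 2.5% and 97.5% quantiles and the rule "mean at or above the cutoff counts as a potential presence" are assumptions made for illustration.

```r
set.seed(42)
n_draws <- 200
n_cells <- 6
# Simulated posterior draws of presence probability:
# one row per draw, one column per raster cell.
draws <- matrix(stats::rbeta(n_draws * n_cells, 2, 2),
                nrow = n_draws, ncol = n_cells)

# Summarise the draws for each cell.
summary_sketch <- data.frame(
  cell   = seq_len(n_cells),
  mean   = colMeans(draws),
  median = apply(draws, 2, stats::median),
  sd     = apply(draws, 2, stats::sd),
  q0.025 = apply(draws, 2, stats::quantile, probs = 0.025),
  q0.975 = apply(draws, 2, stats::quantile, probs = 0.975)
)

# Assumed rule: a cell is a potential presence when its mean reaches the cutoff.
cutoff <- 0.5
summary_sketch$potential_presence <- as.integer(summary_sketch$mean >= cutoff)
print(summary_sketch)
```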
This function removes duplicated points from a dataframe based on specified coordinate columns.
remove_duplicate_points(df, coords = c("decimalLongitude", "decimalLatitude"))
df |
A dataframe object with each row representing one point. |
coords |
A character vector specifying the names of the coordinate columns used for identifying duplicate points. Default is c("decimalLongitude", "decimalLatitude"). |
A dataframe without duplicated points.
This function removes points from a dataframe based on their location relative to a specified polygon.
remove_points_polygon( df, polygon, overlapping = FALSE, coords = c("decimalLongitude", "decimalLatitude") )
df |
A dataframe object with rows representing points. |
polygon |
An sf polygon object defining the region for point removal. |
overlapping |
Logical indicating whether points overlapping the polygon should be removed (TRUE) or kept (FALSE). |
coords |
Character vector specifying the column names for longitude and latitude. Default is c("decimalLongitude", "decimalLatitude"). |
A dataframe containing the filtered points.
This function calculates the response curves (functional responses) of the predictor variables using a Bayesian Additive Regression Trees (BART) model.
response_curve_bart(bart_model, data, predictor_names)
bart_model |
A BART model object obtained from fitting BART ('dbarts::bart'). |
data |
A data frame containing the predictor variables (the design matrix) used in the BART model. |
predictor_names |
A character vector containing the names of the predictor variables. |
A list containing a data frame for each independent variable with mean, 2.5th percentile, 97.5th percentile, and corresponding values of the variables.
This function launches the GLOSSA Shiny web application.
run_glossa( request_size_mb = 2000, launch.browser = TRUE, port = getOption("shiny.port") )
request_size_mb |
Maximum request size for file uploads, in megabytes. Default is 2000 MB. |
launch.browser |
Logical indicating whether to launch the app in the browser (default is TRUE). |
port |
Port number for the Shiny app. Uses the port specified by 'getOption("shiny.port")' by default. |
The GLOSSA Shiny app provides an interactive interface for users to access GLOSSA functionalities.
No return value, called to launch the GLOSSA app.
if(interactive()) { run_glossa() }
This function computes the variable importance scores for a fitted BART (Bayesian Additive Regression Trees) model using a permutation-based approach. It measures the impact of each predictor variable on the model's performance by permuting the values of that variable and evaluating the change in performance (F-score is the performance metric).
variable_importance(bart_model, cutoff = 0, n_repeats = 10, seed = NULL)
bart_model |
A BART model object. |
cutoff |
A numeric threshold for converting predicted probabilities into presence-absence. |
n_repeats |
An integer indicating the number of times to repeat the permutation for each variable. |
seed |
An optional seed for random number generation. |
A data frame where each column corresponds to a predictor variable, and each row contains the variable importance scores across permutations.
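The permutation approach can be sketched with any classifier. In the example below a logistic regression (`stats::glm`) stands in for the BART model so that the code runs with base R alone; `f_score` and `permutation_importance_sketch` are hypothetical helpers, the data are simulated, and the cutoff of 0.5 is chosen for the toy model rather than taken from the function's default.

```r
# Hypothetical helper: F-score from observed and predicted presence-absence.
f_score <- function(y, pred) {
  tp <- sum(pred == 1 & y == 1)
  fp <- sum(pred == 1 & y == 0)
  fn <- sum(pred == 0 & y == 1)
  2 * tp / (2 * tp + fp + fn)
}

# Hypothetical helper: drop in F-score after shuffling each predictor.
permutation_importance_sketch <- function(model, data, y, cutoff = 0.5,
                                          n_repeats = 10, seed = NULL) {
  if (!is.null(seed)) set.seed(seed)
  predictors <- setdiff(names(data), "pa")
  score <- function(d) {
    prob <- stats::predict(model, newdata = d, type = "response")
    f_score(y, as.integer(prob >= cutoff))
  }
  baseline <- score(data)
  out <- sapply(predictors, function(v) {
    replicate(n_repeats, {
      shuffled <- data
      shuffled[[v]] <- sample(shuffled[[v]])  # break the link between v and y
      baseline - score(shuffled)              # larger drop = more important
    })
  })
  as.data.frame(out)  # one column per predictor, one row per repeat
}

# Simulated data: 'temp' drives presence, 'noise' is unrelated to it.
set.seed(1)
n <- 500
toy <- data.frame(temp = stats::rnorm(n), noise = stats::rnorm(n))
toy$pa <- stats::rbinom(n, 1, stats::plogis(3 * toy$temp))
fit <- stats::glm(pa ~ temp + noise, data = toy, family = stats::binomial())

vi <- permutation_importance_sketch(fit, toy, toy$pa, n_repeats = 10, seed = 2)
print(colMeans(vi))
```

The output has the same shape as the return value described above: predictors in columns and repeated permutations in rows.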