| Title: | User-Friendly 'shiny' App for Bayesian Species Distribution Models |
|---|---|
| Description: | A user-friendly 'shiny' application for Bayesian machine learning analysis of marine species distributions. GLOSSA (Global Ocean Species Spatio-temporal Analysis) uses Bayesian Additive Regression Trees (BART; Chipman, George, and McCulloch (2010) <doi:10.1214/09-AOAS285>) to model species distributions with intuitive workflows for data upload, processing, model fitting, and result visualization. It supports presence-absence and presence-only data (with pseudo-absence generation), spatial thinning, cross-validation, and scenario-based projections. GLOSSA is designed to facilitate ecological research by providing easy-to-use tools for analyzing and visualizing marine species distributions across different spatial and temporal scales. Optionally, pseudo-absences can be generated within the environmental space using the external package 'flexsdm' (not on CRAN), which can be downloaded from <https://github.com/sjevelazco/flexsdm>; this functionality is used conditionally when available and all core features work without it. |
| Authors: | Jorge Mestre-Tomás [aut, cre] (ORCID: <https://orcid.org/0000-0002-8983-3417>), Alba Fuster-Alonso [aut] (ORCID: <https://orcid.org/0000-0002-7283-291X>) |
| Maintainer: | Jorge Mestre-Tomás <[email protected]> |
| License: | GPL-3 |
| Version: | 1.2.4 |
| Built: | 2026-05-17 09:33:56 UTC |
| Source: | https://github.com/imares-group/glossa |
This function enlarges a polygon by applying a buffer.
buffer_polygon(polygon, buffer_distance)
| Argument | Description |
|---|---|
| polygon | An sf object representing the polygon to be buffered. |
| buffer_distance | Numeric. The buffer distance in decimal degrees (arc degrees). |
An sf object representing the buffered polygon.
This function cleans coordinates of presence/absence data by removing NA coordinates, rounding coordinates if specified, removing duplicated points, and removing points outside specified spatial polygon boundaries.
clean_coordinates( df, study_area, overlapping = FALSE, thinning_method = NULL, thinning_value = NULL, coords = c("decimalLongitude", "decimalLatitude"), by_timestamp = TRUE, seed = NULL )
| Argument | Description |
|---|---|
| df | A data frame with one row per point. Coordinates are in the WGS84 (EPSG:4326) coordinate reference system. |
| study_area | A spatial polygon in WGS84 (EPSG:4326) representing the boundaries within which coordinates should be kept. |
| overlapping | Logical indicating whether points overlapping the polygon should be removed (TRUE) or kept (FALSE). |
| thinning_method | Character; spatial thinning method to apply to occurrence data. Options are c("None", "Distance", "Grid", "Precision"). See the 'GeoThinneR' package for details. |
| thinning_value | Numeric; value used for thinning, depending on the selected method: distance in meters ('Distance'), grid resolution in degrees ('Grid'), or decimal precision ('Precision'). |
| coords | Character vector specifying the column names for longitude and latitude. |
| by_timestamp | Logical. If TRUE, coordinates are cleaned separately for each time period defined in the 'timestamp' column. |
| seed | Optional; an integer seed for reproducibility of results. |
This function takes a data frame containing presence/absence data with longitude and latitude coordinates, a spatial polygon representing boundaries within which to keep points, and parameters for rounding coordinates and handling duplicated points. It returns a cleaned data frame with valid coordinates within the specified boundaries.
A cleaned data frame containing presence/absence data with valid coordinates.
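The sketch below covers only the non-spatial cleaning steps, in base R. It assumes a data frame with 'decimalLongitude', 'decimalLatitude' and 'timestamp' columns. It removes NA coordinates, approximates 'Precision' thinning by rounding, and drops duplicates within each timestamp. The polygon filter and the 'Distance'/'Grid' thinning methods, which need 'sf' and 'GeoThinneR', are left out. clean_coords_sketch() is a hypothetical helper, not GLOSSA's implementation.

```r
# Hypothetical helper illustrating the non-spatial cleaning steps (not part of GLOSSA).
clean_coords_sketch <- function(df,
                                coords = c("decimalLongitude", "decimalLatitude"),
                                precision = NULL,
                                by_timestamp = TRUE) {
  # 1. Drop rows with a missing longitude or latitude
  df <- df[stats::complete.cases(df[, coords]), , drop = FALSE]

  # 2. Optional precision-style thinning: round coordinates to 'precision' decimals
  if (!is.null(precision)) {
    df[, coords] <- lapply(df[, coords], round, digits = precision)
  }

  # 3. Remove duplicated points, optionally within each timestamp
  key_cols <- if (by_timestamp) c(coords, "timestamp") else coords
  df <- df[!duplicated(df[, key_cols]), , drop = FALSE]
  rownames(df) <- NULL
  df
}

occ <- data.frame(
  decimalLongitude = c(2.111, 2.114, NA, 2.111, 3.5),
  decimalLatitude  = c(39.51, 39.51, 40.0, 39.51, 41.2),
  timestamp        = c(1, 1, 1, 2, 1),
  pa               = 1
)
cleaned <- clean_coords_sketch(occ, precision = 2)
```

With by_timestamp = TRUE, the point repeated in timestamp 2 is kept. Only the two rows that collapse to the same rounded location within timestamp 1 are merged.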
This function is a copy of the 'contBoyce()' function from the 'enmSdm' R package.
This function calculates the continuous Boyce index (CBI), a measure of model accuracy for presence-only test data. This version uses multiple, overlapping windows, in contrast to 'contBoyce2x()' in 'enmSdm', which covers each point by at most two windows.
contBoyce( pres, contrast, presWeight = rep(1, length(pres)), contrastWeight = rep(1, length(contrast)), numBins = 101, binWidth = 0.1, autoWindow = TRUE, method = "spearman", dropZeros = TRUE, na.rm = FALSE, ... )
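| Argument | Description |
|---|---|
| pres | Numeric vector. Predicted values at presence sites. |
| contrast | Numeric vector. Predicted values at background sites. |
| presWeight | Numeric vector of the same length as pres. Relative weights of presence sites (default: 1 for each site). |
| contrastWeight | Numeric vector of the same length as contrast. Relative weights of background sites (default: 1 for each site). |
| numBins | Positive integer. Number of (overlapping) bins into which to divide predictions. |
| binWidth | Positive numeric value < 1. Size of a bin. Each bin will be binWidth * (max - min) wide, where min and max bound the range of predictions considered. |
| autoWindow | Logical. If TRUE (default), bin boundaries span the minimum to the maximum predicted value; if FALSE, they span 0 to 1. |
| method | Character. Type of correlation to calculate. The default is "spearman" (the traditional CBI); "pearson" or "kendall" can be used instead. |
| dropZeros | Logical. If TRUE (default), drop all bins in which the frequency of presences is 0. |
| na.rm | Logical. If TRUE, remove NA predictions (and their weights) before calculation. Default is FALSE. |
| ... | Other arguments (not used). |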
CBI is the Spearman rank correlation coefficient between the proportion of sites in each prediction class and the expected proportion of predictions in each prediction class based on the proportion of the landscape that is in that class. The index ranges from -1 to 1. Values >0 indicate the model's output is positively correlated with the true probability of presence. Values <0 indicate it is negatively correlated with the true probability of presence.
Numeric value.
This function is directly copied from the 'enmSdm' package.
Boyce, M.S., Vernier, P.R., Nielsen, S.E., and Schmiegelow, F.K.A. 2002. Evaluating resource selection functions. Ecological Modelling 157:281-300. doi:10.1016/S0304-3800(02)00200-4
Hirzel, A.H., Le Lay, G., Helfer, V., Randon, C., and Guisan, A. 2006. Evaluating the ability of habitat suitability models to predict species presences. Ecological Modelling 199:142-152. doi:10.1016/j.ecolmodel.2006.05.017
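The moving-window logic behind the CBI can be sketched in base R. This is an unweighted simplification written for illustration; cbi_sketch() is hypothetical, not the 'enmSdm' code. In this sketch:

- Windows span the observed prediction range, as with autoWindow = TRUE.
- Windows with no presences or no background points are dropped.
- The index is the correlation between the predicted-to-expected (P/E) ratio and the window midpoint.

```r
# Hypothetical, unweighted sketch of the continuous Boyce index (not the enmSdm code).
cbi_sketch <- function(pres, contrast, numBins = 101, binWidth = 0.1,
                       method = "spearman") {
  # Windows cover the observed prediction range (as with autoWindow = TRUE);
  # the tiny offset keeps the maximum value inside the last window
  lowest <- min(c(pres, contrast))
  highest <- max(c(pres, contrast)) + .Machine$double.eps
  width <- binWidth * (highest - lowest)

  # Overlapping moving windows with evenly spaced lower edges
  lows <- seq(lowest, highest - width, length.out = numBins)
  highs <- lows + width

  # Proportion of presences (P) and of background sites (E) in each window
  p <- sapply(seq_len(numBins), function(i) mean(pres >= lows[i] & pres < highs[i]))
  e <- sapply(seq_len(numBins), function(i) mean(contrast >= lows[i] & contrast < highs[i]))

  # Drop windows without presences (cf. dropZeros) or without background
  keep <- p > 0 & e > 0
  ratio <- p[keep] / e[keep]

  # CBI: correlation between the P/E ratio and the window midpoint
  stats::cor(ratio, ((lows + highs) / 2)[keep], method = method)
}

set.seed(1)
background <- runif(1000)
# A "good" model: presences are drawn with probability proportional to the prediction
presences <- sample(background, 200, replace = TRUE, prob = background)
cbi_good <- cbi_sketch(presences, background)
```

Drawing presences with probability proportional to 1 - prediction instead simulates a model that is negatively related to presence, which yields a negative index.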
Generates raster layers for longitude and latitude from given raster data, applies optional scaling, and restricts the output to a specified spatial mask.
create_coords_layer(layers, study_area = NULL, scale_layers = FALSE)
| Argument | Description |
|---|---|
| layers | Raster or stack of raster layers from which the geographic extent and resolution are derived. |
| study_area | Spatial object used to mask the output layers. |
| scale_layers | Logical indicating whether scaling is applied. Default is FALSE. |
Raster stack with layers lon and lat.
This function performs cross-validation for a Bayesian Additive Regression Trees (BART) model using presence-absence data and environmental covariate layers. It calculates various performance metrics for model evaluation.
cross_validate_model(data, folds, predictor_cols = NULL, seed = NULL)
| Argument | Description |
|---|---|
| data | Data frame with a column named 'pa' indicating presence (1) or absence (0) and columns for the predictor variables. |
| folds | A vector of fold assignments, one per row of 'data'. |
| predictor_cols | Optional; a character vector of column names to be used as predictors. If NULL, all columns except 'pa' are used. |
| seed | Optional; random seed. |
A list with:

- A data frame containing the true positives (TP), false positives (FP), false negatives (FN), true negatives (TN), and various performance metrics including precision (PREC), sensitivity (SEN), specificity (SPC), false discovery rate (FDR), negative predictive value (NPV), false negative rate (FNR), false positive rate (FPR), F-score, accuracy (ACC), balanced accuracy (BA), and true skill statistic (TSS) for each fold.
- A data frame with the observed value, predicted class, predicted probability, and fold assignment for each test instance.
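Here is a sketch of the fold-wise loop behind this kind of cross-validation. It rests on three assumptions:

- A logistic GLM stands in for the BART model, so the example runs with base R only.
- Every column except 'pa' is a predictor.
- A fixed 0.5 cutoff turns probabilities into classes.

cv_sketch() is hypothetical. Its output uses the same observed/predicted/probability/fold layout as the per-instance data frame described above.

```r
# Hypothetical sketch of fold-wise cross-validation; a logistic GLM stands in for BART.
cv_sketch <- function(data, folds, cutoff = 0.5) {
  predictors <- setdiff(names(data), "pa")
  fml <- stats::reformulate(predictors, response = "pa")
  out <- lapply(sort(unique(folds)), function(f) {
    train <- data[folds != f, , drop = FALSE]  # fit on all other folds
    test  <- data[folds == f, , drop = FALSE]  # evaluate on the held-out fold
    fit <- stats::glm(fml, family = stats::binomial(), data = train)
    prob <- stats::predict(fit, newdata = test, type = "response")
    data.frame(observed = test$pa,
               predicted = as.integer(prob >= cutoff),
               probability = prob,
               fold = f)
  })
  do.call(rbind, out)
}

set.seed(42)
n <- 300
sim <- data.frame(temp = rnorm(n), depth = rnorm(n))
sim$pa <- rbinom(n, 1, stats::plogis(2 * sim$temp))
folds <- sample(rep(1:5, length.out = n))
test_preds <- cv_sketch(sim, folds)
```

Each row is predicted exactly once, by a model that never saw it during fitting.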
Computes a set of performance metrics (e.g., AUC, TSS, CBI) based on observed and predicted values.
evaluation_metrics(df, na.rm = TRUE, method = "spearman")
| Argument | Description |
|---|---|
| df | A data.frame with columns 'observed' (0/1), 'predicted' (0/1), and 'probability' (numeric). |
| na.rm | Logical. Whether to remove rows with NA values. |
| method | Correlation method for CBI ("spearman", "pearson", or "kendall"). |
A named list or data.frame with evaluation metrics.
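The threshold-based metrics listed for cross_validate_model() follow directly from the confusion matrix. Below is a minimal sketch; metrics_sketch() is hypothetical. AUC is computed here with the rank-based Mann-Whitney formula, a standard equivalent that may differ from how GLOSSA computes it. CBI is omitted.

```r
# Hypothetical helper: threshold-based metrics from 0/1 observations and predictions,
# plus a rank-based AUC from predicted probabilities.
metrics_sketch <- function(observed, predicted, probability) {
  TP <- sum(observed == 1 & predicted == 1)
  FP <- sum(observed == 0 & predicted == 1)
  FN <- sum(observed == 1 & predicted == 0)
  TN <- sum(observed == 0 & predicted == 0)
  PREC <- TP / (TP + FP)  # precision
  SEN <- TP / (TP + FN)   # sensitivity (true positive rate)
  SPC <- TN / (TN + FP)   # specificity (true negative rate)

  # AUC via the Mann-Whitney statistic on the ranks of the probabilities
  n1 <- sum(observed == 1)
  n0 <- sum(observed == 0)
  r <- rank(probability)
  AUC <- (sum(r[observed == 1]) - n1 * (n1 + 1) / 2) / (n1 * n0)

  c(TP = TP, FP = FP, FN = FN, TN = TN,
    PREC = PREC, SEN = SEN, SPC = SPC,
    FDR = FP / (TP + FP), NPV = TN / (TN + FN),
    FNR = FN / (TP + FN), FPR = FP / (TN + FP),
    F_score = 2 * PREC * SEN / (PREC + SEN),
    ACC = (TP + TN) / (TP + FP + FN + TN),
    BA = (SEN + SPC) / 2,
    TSS = SEN + SPC - 1,
    AUC = AUC)
}

df <- data.frame(
  observed    = c(1, 1, 1, 0, 0, 0, 0, 1),
  probability = c(0.9, 0.8, 0.4, 0.1, 0.2, 0.6, 0.3, 0.7)
)
df$predicted <- as.integer(df$probability >= 0.5)
m <- metrics_sketch(df$observed, df$predicted, df$probability)
```

Tied probabilities receive average ranks, so a tied presence/absence pair counts as one half in the AUC.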
This function extracts covariate values for species occurrences, excluding NA values.
extract_noNA_cov_values(data, covariate_layers, predictor_variables)
| Argument | Description |
|---|---|
| data | A data frame containing species occurrence data with columns x/long (first column) and y/lat (second column). |
| covariate_layers | A list of raster layers representing covariates. |
| predictor_variables | Character vector of the variables to select from the covariate layers. |
This function extracts covariate values for each species occurrence location from the provided covariate layers. It returns a data frame containing species occurrence data with covariate values, excluding any NA values.
A data frame containing species occurrence data with covariate values, excluding NA values.
This function fits a Bayesian Additive Regression Trees (BART) model using presence/absence data and environmental covariate layers.
fit_bart_model(y, x, seed = NULL, ...)
| Argument | Description |
|---|---|
| y | A numeric vector indicating presence (1) or absence (0). |
| x | A data frame with the same number of rows as the length of the vector 'y', containing the covariate values. |
| seed | An optional integer value for setting the random seed for reproducibility. |
| ... | Additional arguments passed to 'dbarts::bart()'. |
A BART model object.
Creates cross-validation fold assignments for presence-absence or presence-only data, supporting three strategies: k-fold, spatial blocks (via the 'blockCV' R package), and temporal blocks.
generate_cv_folds( data, method = "k-fold", block_method = "predictors_autocorrelation", block_size = NULL, k = 10, predictor_raster = NULL, model_residuals = NULL, coords = c("decimalLongitude", "decimalLatitude") )
| Argument | Description |
|---|---|
| data | A 'data.frame' with at least presence-absence data ('pa'), coordinates, and optionally a 'timestamp'. |
| method | The cross-validation strategy. One of "k-fold", "spatial_blocks", "temporal_blocks". |
| block_method | For spatial blocks, how to determine the block size. One of "residuals_autocorrelation", "predictors_autocorrelation", "manual". |
| block_size | Numeric. Manual block size in meters (used if block_method = "manual"). |
| k | Integer. Number of folds to generate. |
| predictor_raster | A 'terra::SpatRaster' used for estimating spatial autocorrelation (only needed if block_method = "predictors_autocorrelation"). |
| model_residuals | A 'data.frame' with residuals and coordinates (only needed if block_method = "residuals_autocorrelation"). |
| coords | A character vector of length 2 indicating the longitude and latitude column names. |
A list with the following elements:

- A vector of fold assignments (one per row in 'data').
- The CV method used.
- The spatial block size method (if applicable).
- The estimated or manual block size (in meters), if spatial blocks were used.
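This base-R sketch covers two of the strategies and assumes a 'timestamp' column for temporal blocks. k-fold assigns rows to balanced random folds. For temporal blocks, the sketch groups the sorted distinct timestamps into at most k contiguous blocks. That grouping rule is an assumption for illustration, not the package's documented rule. Spatial blocks are omitted because they rely on 'blockCV'. cv_folds_sketch() and its element names are hypothetical.

```r
# Hypothetical sketch of k-fold and temporal-block fold assignment (not GLOSSA's code).
cv_folds_sketch <- function(data, method = c("k-fold", "temporal_blocks"), k = 10) {
  method <- match.arg(method)
  if (method == "k-fold") {
    # Random, balanced assignment of rows to k folds
    folds <- sample(rep(seq_len(k), length.out = nrow(data)))
  } else {
    # Assumption: split the sorted distinct timestamps into (at most) k contiguous groups
    ts <- sort(unique(data$timestamp))
    groups <- cut(seq_along(ts), breaks = min(k, length(ts)), labels = FALSE)
    folds <- groups[match(data$timestamp, ts)]
  }
  list(folds = folds, method = method)
}

set.seed(7)
d <- data.frame(pa = rbinom(60, 1, 0.5), timestamp = rep(2001:2006, each = 10))
kf <- cv_folds_sketch(d, "k-fold", k = 5)
tb <- cv_folds_sketch(d, "temporal_blocks", k = 3)
```

With temporal blocks, all records from one time period always fall in the same fold. This keeps the test periods unseen during fitting.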
This function generates pseudo-absences outside a buffer around presence points but within the convex hull of those points. This prevents spatial overlap while preserving geographic realism.
generate_pa_buffer_out( presences, raster_stack, predictor_variables, coords = c("decimalLongitude", "decimalLatitude"), pa_buffer_distance = 0.5, ratio = 1, attempts = 100, seed = NULL )
| Argument | Description |
|---|---|
| presences | Data frame containing presence points. |
| raster_stack | 'SpatRaster' object containing covariate data. |
| predictor_variables | Character vector of the predictor variables selected for this species. |
| coords | Character vector specifying the column names for longitude and latitude. Defaults to c("decimalLongitude", "decimalLatitude"). |
| pa_buffer_distance | Numeric; buffer radius in degrees around each presence. Default is 0.5. |
| ratio | Ratio of pseudo-absences to presences (default 1 = balanced). |
| attempts | Integer specifying the maximum number of attempts to obtain the exact number of pseudo-absences. Defaults to 100. |
| seed | Optional seed for reproducibility. |
A data frame of pseudo-absences with coordinates, timestamp, 'pa = 0', and covariate values.
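The buffer-out idea can be sketched with rejection sampling in base R:

- Draw candidate points in the bounding box of the presences' convex hull.
- Keep candidates inside the hull and farther than the buffer from every presence.
- Repeat for up to 'attempts' rounds.

The sketch assumes coordinates in plain columns x and y and Euclidean distances in degrees. It skips covariate extraction, and all helper names are hypothetical.

```r
# Ray-casting test (hypothetical helper): is point (x, y) inside polygon (px, py)?
point_in_poly <- function(x, y, px, py) {
  inside <- FALSE
  j <- length(px)
  for (i in seq_along(px)) {
    # Count crossings of a horizontal ray from (x, y) with edge (j, i)
    if (((py[i] > y) != (py[j] > y)) &&
        (x < (px[j] - px[i]) * (y - py[i]) / (py[j] - py[i]) + px[i])) {
      inside <- !inside
    }
    j <- i
  }
  inside
}

# Hypothetical sketch: pseudo-absences inside the convex hull but outside the buffers
pa_buffer_out_sketch <- function(pres, buffer = 0.5, ratio = 1, attempts = 100) {
  hull <- pres[grDevices::chull(pres$x, pres$y), ]
  n_target <- ceiling(ratio * nrow(pres))
  out <- data.frame(x = numeric(0), y = numeric(0))
  for (a in seq_len(attempts)) {
    cand_x <- runif(n_target, min(hull$x), max(hull$x))
    cand_y <- runif(n_target, min(hull$y), max(hull$y))
    in_hull <- mapply(point_in_poly, cand_x, cand_y,
                      MoreArgs = list(px = hull$x, py = hull$y))
    # Minimum Euclidean distance (in degrees) from each candidate to any presence
    min_d <- mapply(function(cx, cy) min(sqrt((pres$x - cx)^2 + (pres$y - cy)^2)),
                    cand_x, cand_y)
    ok <- in_hull & min_d > buffer
    out <- rbind(out, data.frame(x = cand_x[ok], y = cand_y[ok]))
    if (nrow(out) >= n_target) break
  }
  out <- head(out, n_target)
  out$pa <- rep(0L, nrow(out))
  out
}

set.seed(3)
pres <- data.frame(x = runif(40, 0, 10), y = runif(40, 0, 10))
pa <- pa_buffer_out_sketch(pres, buffer = 0.5)
```

If the hull leaves too little free space, the loop stops after 'attempts' rounds and returns fewer points than requested.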
Uses flexsdm::sample_pseudoabs(method = c("env_const", env = predictors)) within each timestamp stratum so that pseudo-absences match the temporal distribution of presences. Extracts covariate values for the sampled coordinates and returns a data.frame with the same predictor columns.
generate_pa_env_space_flexsdm( presences, raster_stack, predictor_variables, coords = c("decimalLongitude", "decimalLatitude"), ratio = 1, attempts = 100, seed = NULL )
| Argument | Description |
|---|---|
| presences | Data frame containing presence points. |
| raster_stack | 'SpatRaster' object containing covariate data. Only continuous predictors are used, and sampling is done per timestamp. |
| predictor_variables | Character vector of the predictor variables selected for this species. |
| coords | Character vector specifying the column names for longitude and latitude. Defaults to c("decimalLongitude", "decimalLatitude"). |
| ratio | Ratio of pseudo-absences to presences (default 1 = balanced). |
| attempts | Integer specifying the maximum number of attempts to obtain the exact number of pseudo-absences. Defaults to 100. |
| seed | Optional seed for reproducibility. |
A data frame of pseudo-absences with coordinates, timestamp, 'pa = 0', and covariate values.
This function generates pseudo-absence points randomly across the study area (random background), optionally applying spatial thinning to match the presence filtering strategy.
generate_pa_random( presences, study_area, raster_stack, predictor_variables, coords = c("decimalLongitude", "decimalLatitude"), ratio = 1, attempts = 100, seed = NULL )
| Argument | Description |
|---|---|
| presences | Data frame containing presence points. |
| study_area | Spatial polygon defining the study area ('sf' object). |
| raster_stack | 'SpatRaster' object containing covariate data. |
| predictor_variables | Character vector of the predictor variables selected for this species. |
| coords | Character vector specifying the column names for longitude and latitude. Defaults to c("decimalLongitude", "decimalLatitude"). |
| ratio | Ratio of pseudo-absences to presences (default 1 = balanced). |
| attempts | Integer specifying the maximum number of attempts to obtain the exact number of pseudo-absences. Defaults to 100. |
| seed | Optional random seed. |
Data frame containing pseudo-absence points with coordinates, timestamp, pa = 0, and covariates.
Generate Pseudo-Absences Using Target-Group Background
generate_pa_target_group( presences, target_group_points, study_area, raster_stack, predictor_variables, coords = c("decimalLongitude", "decimalLatitude"), ratio = 1, attempts = 100, seed = NULL )
| Argument | Description |
|---|---|
| presences | Data frame containing presence points. |
| target_group_points | Data frame of all sampling locations (target group). |
| study_area | Spatial polygon defining the study area ('sf' object). |
| raster_stack | 'SpatRaster' object containing covariate data. |
| predictor_variables | Character vector of the predictor variables selected for this species. |
| coords | Character vector specifying the column names for longitude and latitude. Defaults to c("decimalLongitude", "decimalLatitude"). |
| ratio | Ratio of pseudo-absences to presences (default 1 = balanced). |
| attempts | Integer specifying the maximum number of attempts to obtain the exact number of pseudo-absences. Defaults to 100. |
| seed | Optional random seed. |
Data frame containing pseudo-absence points with coordinates, timestamp, pa = 0, and covariates.
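Here is a minimal sketch of target-group background sampling. It assumes the target-group locations share the presences' coordinate columns. Sampling locations where the focal species was not recorded become candidate background. From these, ratio times the number of presences are sampled without replacement. Three choices are assumptions of this sketch:

- Excluding exact presence coordinates.
- Capping the sample at the number of candidates.
- Skipping study-area clipping and covariate extraction.

pa_target_group_sketch() is hypothetical.

```r
# Hypothetical sketch of target-group background sampling (not GLOSSA's code).
pa_target_group_sketch <- function(pres, target, ratio = 1,
                                   coords = c("decimalLongitude", "decimalLatitude")) {
  # Candidate background: sampling locations that are not presence locations
  key_t <- paste(target[[coords[1]]], target[[coords[2]]])
  key_p <- paste(pres[[coords[1]]], pres[[coords[2]]])
  candidates <- target[!key_t %in% key_p, coords, drop = FALSE]

  # Sample without replacement, capped at the number of available candidates
  n <- min(nrow(candidates), ceiling(ratio * nrow(pres)))
  pa <- candidates[sample.int(nrow(candidates), n), , drop = FALSE]
  pa$pa <- rep(0L, n)
  rownames(pa) <- NULL
  pa
}

set.seed(5)
target <- data.frame(decimalLongitude = 1:20, decimalLatitude = 1:20)
pres <- target[1:5, ]
bg <- pa_target_group_sketch(pres, target, ratio = 2)
```

Because the background comes from the same surveys as the presences, it shares their sampling bias. That shared bias is the usual motivation for target-group background.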
Wrapper function for the pseudo-absence generation methods: random background points, target-group background, sampling outside a buffer around presences, and sampling in environmental space with 'flexsdm'.
generate_pseudo_absences( method = c("random", "target_group", "buffer_out", "env_space_flexsdm"), presences, raster_stack, predictor_variables, study_area = NULL, target_group_points = NULL, coords = c("decimalLongitude", "decimalLatitude"), pa_buffer_distance = 0.5, ratio = 1, attempts = 100, seed = NULL )
| Argument | Description |
|---|---|
| method | Character; one of "random", "target_group", "buffer_out", or "env_space_flexsdm". |
| presences | Data frame of presence points with coordinates and timestamp. |
| raster_stack | SpatRaster of covariates. |
| predictor_variables | Character vector of selected predictors. |
| study_area | Optional sf polygon (used for clipping). |
| target_group_points | Optional data frame of sampling points (for the target-group method). |
| coords | Character vector of coordinate column names (longitude, latitude). |
| pa_buffer_distance | Numeric; buffer radius in degrees around each presence. Default is 0.5. |
| ratio | Ratio of pseudo-absences to presences. |
| attempts | Maximum number of attempts to reach the requested sample size. |
| seed | Optional seed for reproducibility. |
A data frame of pseudo-absence points (pa = 0) with covariates.
This function wraps the complete GLOSSA analysis workflow. It processes presence-absence data and environmental covariates, and performs species distribution modeling and projections under past and future scenarios.
glossa_analysis( pa_data = NULL, fit_layers = NULL, proj_files = NULL, study_area_poly = NULL, predictor_variables = NULL, thinning_method = NULL, thinning_value = NULL, scale_layers = FALSE, buffer = NULL, native_range = NULL, suitable_habitat = NULL, other_analysis = NULL, model_args = list(), cv_methods = NULL, cv_folds = 5, cv_block_source = "residuals_autocorrelation", cv_block_size = NULL, pseudoabsence_method = "random", pa_ratio = 1, target_group_points = NULL, pa_buffer_distance = NULL, seed = NA, waiter = NULL )
| Argument | Description |
|---|---|
| pa_data | A list of data frames containing presence-absence data, including 'decimalLongitude', 'decimalLatitude', 'timestamp', and 'pa' columns. |
| fit_layers | A ZIP file with the raster files containing the environmental layers for model fitting, formatted as explained in the website documentation. |
| proj_files | A list of ZIP file paths containing environmental layers for projection scenarios. |
| study_area_poly | A spatial polygon defining the study area. |
| predictor_variables | A list of the predictor variables to be used in the analysis for each occurrence dataset. |
| thinning_method | A character string specifying the spatial thinning method to apply to occurrence data. Options are c("none", "distance", "grid", "precision"). See the 'GeoThinneR' package for details. |
| thinning_value | A numeric value used for thinning, depending on the selected method: distance in meters ('distance'), grid resolution in degrees ('grid'), or decimal precision ('precision'). |
| scale_layers | Logical; if TRUE, covariate layers are standardized (z-score) based on the fit layers. |
| buffer | Buffer distance in decimal degrees (arc degrees) for buffering the study area polygon. |
| native_range | A vector of scenarios, c("fit_layers", "projections"), in which native range modeling should be performed. |
| suitable_habitat | A vector of scenarios, c("fit_layers", "projections"), in which habitat suitability modeling should be performed. |
| other_analysis | A vector of additional analyses to perform (e.g., c("variable_importance", "functional_responses", "cross_validation")). |
| model_args | A named list of additional arguments passed to the modeling function (e.g., 'dbarts::bart'). This allows users to fine-tune model parameters such as 'ntree' or 'k'. These are passed internally via '...' and must match the arguments of the selected model function. |
| cv_methods | A vector of the cross-validation strategies to perform: one or more of "k-fold", "spatial_blocks", "temporal_blocks". |
| cv_folds | Integer indicating the number of folds to generate. |
| cv_block_source | For spatial blocks, how to determine the block size. One of "residuals_autocorrelation", "predictors_autocorrelation", "manual". |
| cv_block_size | Numeric block size in meters (used if cv_block_source = "manual"). |
| pseudoabsence_method | Method for generating pseudo-absences. One of "random", "target_group", "buffer_out", or "env_space_flexsdm". |
| pa_ratio | Ratio of pseudo-absences to presences (pseudo-absences:presences). |
| target_group_points | Optional data frame of sampling points for the target-group method. |
| pa_buffer_distance | Numeric buffer radius in degrees around each presence. Default is NULL. |
| seed | Optional; an integer seed for reproducibility of results. |
| waiter | Optional; a waiter instance to update progress in a Shiny application. |
A list containing structured outputs from each major section of the analysis, including model data, projections, variable importance scores, and habitat suitability assessments.
This function inverts a polygon by calculating the difference between the bounding box and the polygon.
invert_polygon(polygon, bbox = NULL)
| Argument | Description |
|---|---|
| polygon | An sf object representing the polygon to be inverted. |
| bbox | Optional. An sf or bbox object representing the bounding box. If NULL, the bounding box of the input polygon is used. |
An sf object representing the inverted polygon.
This function crops and extends raster layers to the extent (bounding box) of the study area, defined by longitude and latitude, and then applies a mask based on the provided spatial polygon to remove areas outside it.
layer_mask(layers, study_area)
| Argument | Description |
|---|---|
| layers | A stack of raster layers ('SpatRaster' object) to be processed. |
| study_area | A spatial polygon ('sf' object) used to mask the raster layers. |
A 'SpatRaster' object representing the masked raster layers.
This function calculates the optimal cutoff for presence-absence prediction using a BART model.
pa_optimal_cutoff(y, x, model, seed = NULL)
| Argument | Description |
|---|---|
| y | Vector indicating presence (1) or absence (0). |
| x | Data frame with the same number of rows as the length of 'y', containing the covariate values. |
| model | A BART model object. |
| seed | Random seed for reproducibility. |
The optimal cutoff value for presence-absence prediction.
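This section does not state the criterion used to pick the cutoff. As an illustration only, the sketch below assumes the cutoff maximizes the true skill statistic (TSS = sensitivity + specificity - 1). It searches a grid of thresholds applied to predicted probabilities. optimal_cutoff_sketch() is hypothetical.

```r
# Hypothetical sketch: choose the threshold that maximizes TSS (an assumed criterion).
optimal_cutoff_sketch <- function(observed, probability,
                                  thresholds = seq(0.01, 0.99, by = 0.01)) {
  tss <- sapply(thresholds, function(t) {
    pred <- as.integer(probability >= t)
    sen <- sum(pred == 1 & observed == 1) / sum(observed == 1)  # sensitivity
    spc <- sum(pred == 0 & observed == 0) / sum(observed == 0)  # specificity
    sen + spc - 1
  })
  thresholds[which.max(tss)]  # first threshold reaching the maximum TSS
}

obs  <- c(0, 0, 0, 0, 1, 1, 1, 1)
prob <- c(0.05, 0.10, 0.20, 0.315, 0.475, 0.60, 0.80, 0.90)
cutoff <- optimal_cutoff_sketch(obs, prob)
```

Here every threshold between 0.32 and 0.47 separates the classes perfectly, and which.max() returns the first of them.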
Plot cross-validation fold assignments
plot_cv_folds_points(data, polygon = NULL)
| Argument | Description |
|---|---|
| data | Data frame with columns 'decimalLongitude', 'decimalLatitude', 'pa' and 'fold'. |
| polygon | An sf object representing the inverted study area. |
A ggplot object showing points color-coded by CV fold and shaped by presence/absence.
This function makes predictions using a Bayesian Additive Regression Trees (BART) model on a stack of environmental covariates ('SpatRaster').
predict_bart(bart_model, layers, cutoff = NULL)
| Argument | Description |
|---|---|
| bart_model | A BART model object obtained from fitting BART using the 'dbarts' package. |
| layers | A SpatRaster object containing environmental covariates for prediction. |
| cutoff | An optional numeric cutoff value for determining potential presences. If NULL, potential presences and absences will not be computed. |
A SpatRaster containing the mean, median, standard deviation, and quantiles of the posterior predictive distribution, as well as a potential presences layer if cutoff is provided.
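The summary layers can be illustrated with a plain matrix of posterior draws standing in for the SpatRaster workflow (rows = draws, columns = cells). The sketch computes the mean, median, standard deviation and quantiles for each cell. The choice of 2.5%/97.5% quantiles is an assumption, since the exact quantiles are not listed here. Also as an assumption, a cell is flagged as a potential presence when its posterior mean reaches the cutoff. summarise_posterior() is hypothetical.

```r
# Hypothetical sketch: per-cell summaries of posterior draws (rows = draws, cols = cells).
summarise_posterior <- function(draws, cutoff = NULL) {
  out <- data.frame(
    mean   = colMeans(draws),
    median = apply(draws, 2, stats::median),
    sd     = apply(draws, 2, stats::sd),
    q0.025 = apply(draws, 2, stats::quantile, probs = 0.025, names = FALSE),
    q0.975 = apply(draws, 2, stats::quantile, probs = 0.975, names = FALSE)
  )
  if (!is.null(cutoff)) {
    # Assumed rule: potential presence where the posterior mean reaches the cutoff
    out$potential_presences <- as.integer(out$mean >= cutoff)
  }
  out
}

set.seed(11)
# Two toy "cells": one with high and one with low posterior probability of presence
draws <- cbind(stats::rbeta(500, 8, 2), stats::rbeta(500, 2, 8))
s <- summarise_posterior(draws, cutoff = 0.5)
```

In a raster setting, the same per-cell summaries would be computed over every non-NA cell and written back as layers.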
This function removes duplicated points from a dataframe based on specified coordinate columns.
remove_duplicate_points(df, coords = c("decimalLongitude", "decimalLatitude"))
| Argument | Description |
|---|---|
| df | A data frame with each row representing one point. |
| coords | A character vector specifying the names of the coordinate columns used for identifying duplicate points. Default is c("decimalLongitude", "decimalLatitude"). |
A dataframe without duplicated points.
This function removes points from a dataframe based on their location relative to a specified polygon.
remove_points_polygon( df, polygon, overlapping = FALSE, coords = c("decimalLongitude", "decimalLatitude") )
| Argument | Description |
|---|---|
| df | A data frame with rows representing points. |
| polygon | An sf polygon object defining the region for point removal. |
| overlapping | Logical indicating whether points overlapping the polygon should be removed (TRUE) or kept (FALSE). |
| coords | Character vector specifying the column names for longitude and latitude. Default is c("decimalLongitude", "decimalLatitude"). |
A dataframe containing the filtered points.
This function calculates the response curve (functional responses) using a Bayesian Additive Regression Trees (BART) model.
response_curve_bart(bart_model, data, predictor_names)
| Argument | Description |
|---|---|
| bart_model | A BART model object obtained from fitting BART ('dbarts::bart'). |
| data | A data frame containing the predictor variables (the design matrix) used in the BART model. |
| predictor_names | A character vector containing the names of the predictor variables. |
A list containing a data frame for each independent variable with mean, 2.5th percentile, 97.5th percentile, and corresponding values of the variables.
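Here is a partial-dependence style sketch of a response curve. It assumes predict_fun() returns a matrix of posterior draws (rows = draws, columns = observations). For each value on a grid:

- The focal variable is fixed to that value in every row.
- Predictions are averaged over rows for each draw.
- The mean and the 2.5%/97.5% quantiles are taken across draws.

All names are hypothetical. The toy "posterior" in the usage part is simulated, not fitted.

```r
# Hypothetical partial-dependence sketch for one predictor.
partial_dependence_sketch <- function(predict_fun, data, var, n_points = 20) {
  grid <- seq(min(data[[var]]), max(data[[var]]), length.out = n_points)
  res <- t(sapply(grid, function(v) {
    newdata <- data
    newdata[[var]] <- v                          # fix the focal variable for every row
    pd_draws <- rowMeans(predict_fun(newdata))   # average over rows, one value per draw
    c(value = v,
      mean  = mean(pd_draws),
      lower = unname(stats::quantile(pd_draws, 0.025)),
      upper = unname(stats::quantile(pd_draws, 0.975)))
  }))
  as.data.frame(res)
}

set.seed(2)
sim <- data.frame(temp = runif(100, 0, 30), sal = runif(100, 30, 40))
beta_draws <- rnorm(200, 0.2, 0.02)  # simulated posterior draws of a slope (toy example)
fake_predict <- function(newdata) {
  eta <- outer(beta_draws, newdata$temp - 15)   # rows = draws, columns = observations
  matrix(stats::plogis(eta), nrow = nrow(eta))
}
rc <- partial_dependence_sketch(fake_predict, sim, "temp")
```

Because every simulated slope is positive, the resulting curve increases with temperature, and the band between 'lower' and 'upper' reflects the spread of the draws.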
This function launches the GLOSSA Shiny web application.
run_glossa( request_size_mb = 2000, launch.browser = TRUE, port = getOption("shiny.port"), clear_global_env = FALSE )
| Argument | Description |
|---|---|
| request_size_mb | Maximum request size for file uploads, in megabytes. Default is 2000 MB. |
| launch.browser | Logical indicating whether to launch the app in the browser (default is TRUE). |
| port | Port number for the Shiny app. Uses the port specified by getOption("shiny.port") by default. |
| clear_global_env | Logical. If TRUE, clears the global environment after the app exits. |
The GLOSSA Shiny app provides an interactive interface for users to access GLOSSA functionalities.
No return value, called to launch the GLOSSA app.
Use 'clear_global_env = TRUE' cautiously, as it removes all objects from your R environment after the app exits.
    if (interactive()) {
      run_glossa()
      run_glossa(clear_global_env = TRUE) # clears all global objects
    }
This function computes variable importance scores for a fitted BART (Bayesian Additive Regression Trees) model using a permutation-based approach. It measures the impact of each predictor on model performance by permuting that variable's values and evaluating the resulting change in performance, using the F-score as the performance metric.
variable_importance(bart_model, y, x, cutoff = 0, n_repeats = 10, seed = NULL)
| Argument | Description |
|---|---|
| bart_model | A BART model object. |
| y | Vector indicating presence (1) or absence (0). |
| x | Data frame with the same number of rows as the length of 'y', containing the covariate values. |
| cutoff | A numeric threshold for converting predicted probabilities into presence-absence. |
| n_repeats | An integer indicating the number of times to repeat the permutation for each variable. |
| seed | An optional seed for random number generation. |
A data frame where each column corresponds to a predictor variable, and each row contains the variable importance scores across permutations.
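Here is a base-R sketch of permutation importance measured with the F-score. It assumes a predict_fun() that returns presence probabilities. A logistic GLM stands in for BART, and the 0.5 cutoff applies to that probability scale rather than matching the function's default cutoff = 0. Each column of the result holds the drop in F-score for one predictor across n_repeats permutations, matching the layout described above. perm_importance_sketch() is hypothetical.

```r
# Hypothetical sketch of permutation-based variable importance using the F-score.
perm_importance_sketch <- function(predict_fun, y, x, cutoff = 0.5, n_repeats = 10) {
  f_score <- function(obs, pred) {
    tp <- sum(obs == 1 & pred == 1)
    prec <- tp / sum(pred == 1)
    sen <- tp / sum(obs == 1)
    2 * prec * sen / (prec + sen)
  }
  baseline <- f_score(y, as.integer(predict_fun(x) >= cutoff))
  imp <- sapply(names(x), function(v) {
    replicate(n_repeats, {
      x_perm <- x
      x_perm[[v]] <- sample(x_perm[[v]])  # break the link between v and the response
      baseline - f_score(y, as.integer(predict_fun(x_perm) >= cutoff))
    })
  })
  as.data.frame(imp)  # rows = permutations, columns = predictors
}

set.seed(9)
n <- 400
x <- data.frame(signal = rnorm(n), noise = rnorm(n))
y <- rbinom(n, 1, stats::plogis(3 * x$signal))
fit <- stats::glm(y ~ signal + noise, family = stats::binomial(), data = cbind(x, y = y))
pf <- function(newx) stats::predict(fit, newdata = newx, type = "response")
vi <- perm_importance_sketch(pf, y, x)
```

Permuting the informative predictor causes a large drop in F-score, while permuting the noise predictor leaves it roughly unchanged.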