Package 'rsprite2' reference manual

Title:	Identify Distributions that Match Reported Sample Parameters (SPRITE)
Description:	The SPRITE algorithm creates possible distributions of discrete responses based on reported sample parameters, such as mean, standard deviation and range (Heathers et al., 2018, <doi:10.7287/peerj.preprints.26968v1>). This package implements it, drawing heavily on the code for Nick Brown's 'rSPRITE' Shiny app <https://shiny.ieis.tue.nl/sprite/>. In addition, it supports the modeling of distributions based on multi-item (Likert-type) scales and the use of restrictions on the frequency of particular responses.
Authors:	Lukas Wallrich [aut, cre] , Aurélien Allard [ctb], Lukas Jung [ctb]
Maintainer:	Lukas Wallrich <[email protected]>
License:	MIT + file LICENSE
Version:	0.2.1
Built:	2025-03-05 04:02:14 UTC
Source:	https://github.com/lukaswallrich/rsprite2

Find a possible distribution.

Description

This function aims to find a possible distribution that would give rise to the observed sample parameters. For that, you need to pass a list of parameters, best created with set_parameters

Usage

find_possible_distribution(parameters, seed = NULL, values_only = FALSE)
find_possible_distribution(parameters, seed = NULL, values_only = FALSE)

Arguments

`parameters`	List of parameters, see `set_parameters`
`seed`	An integer to use as the seed for random number generation. Set this in scripts to ensure reproducibility.
`values_only`	Should only values or a more informative list be returned. See Value section.

Value

Unless values_only = TRUE, a list with:

`outcome`	success or failure - character
`distribution`	The distribution that was found (if success) / that had the closest variance (if failure) - numeric
`mean`	The exact mean of the distribution - numeric
`sd`	The SD of the distribution that was found (success) / that came closest (failure) - numeric
`iterations`	The number of iterations required to achieve the specified SD - numeric

If values_only = TRUE, then the distribution is returned if one was found, and NULL if it failed.

Examples

sprite_parameters <- set_parameters(mean = 2.2, sd = 1.3, n_obs = 20,
                                    min_val = 1, max_val = 5)
find_possible_distribution(sprite_parameters)

sprite_parameters <- set_parameters(mean = 2.2, sd = 1.3, n_obs = 20,
                                    min_val = 1, max_val = 5)
find_possible_distribution(sprite_parameters)

Find several possible distributions.

Description

This function aims to find several possible distribution that would give rise to the observed sample parameters. For that, you need to pass a list of parameters, created with set_parameters

Usage

find_possible_distributions(
  parameters,
  n_distributions = 10,
  seed = NULL,
  return_tibble = TRUE,
  return_failures = FALSE
)
find_possible_distributions(
  parameters,
  n_distributions = 10,
  seed = NULL,
  return_tibble = TRUE,
  return_failures = FALSE
)

Arguments

`parameters`	List of parameters, see `set_parameters`
`n_distributions`	The target number of distributions to return.
`seed`	An integer to use as the seed for random number generation. Set this in scripts to ensure reproducibility.
`return_tibble`	Should a tibble, rather than a list, be returned? Requires the `tibble`-package, ignored if that package is not available.
`return_failures`	Should distributions that failed to produce the desired SD be returned? Defaults to false

Value

A tibble or list (depending on the return_tibble argument) with:

`outcome`	success or failure - character
`distribution`	The distribution that was found (if success) / that had the closest variance (if failure) - numeric
`mean`	The exact mean of the distribution - numeric
`sd`	The SD of the distribution that was found (success) / that came closest (failure) - numeric
`iterations`	The number of iterations required to achieve the specified SD - numeric - the first time this distribution was found

Examples


sprite_parameters <- set_parameters(mean = 2.2, sd = 1.3, n_obs = 20,
                                    min_val = 1, max_val = 5)

find_possible_distributions(sprite_parameters, 5, seed = 1234)

sprite_parameters <- set_parameters(mean = 2.2, sd = 1.3, n_obs = 20,
                                    min_val = 1, max_val = 5)

find_possible_distributions(sprite_parameters, 5, seed = 1234)

This function tests whether a given mean (with a specific precision) can result from a sample of a given size based on integer responses to one or more items. The test is based on Brown & Heathers (2017). If return_values = TRUE and if there is more than one precise mean compatible with the given parameters, all possible means are returned. In that case, if the given mean is not consistent, the closest consistent mean is returned with a warning.

Usage

GRIM_test(mean, n_obs, m_prec = NULL, n_items = 1, return_values = FALSE)
GRIM_test(mean, n_obs, m_prec = NULL, n_items = 1, return_values = FALSE)

Arguments

`mean`	The mean of the distribution
`n_obs`	The number of observations (sample size)
`m_prec`	The precision of the mean, as number of digits after the decimal point. If not provided, taken based on the significant digits of `mean` - so only needed if reported mean ends in 0
`n_items`	Number of items in scale, if distribution represents scale averages. Defaults to 1, which represents any single-item measure.
`return_values`	Should all means consistent with the given parameters be returned?

Value

Either TRUE/FALSE, or all possible means (if test passes)/closest consistent mean (if test fails)

References

Brown NJ, Heathers JA (2017). “The GRIM test: A simple technique detects numerous anomalies in the reporting of results in psychology.” Social Psychological and Personality Science, 8(4), 363–369.

Examples

# A sample of 28 integers cannot result in a mean of 5.19. This is shown by
GRIM_test(5.19, 28)

# To find the closest possible mean, set return_values to TRUE
GRIM_test(5.19, 28, return_values = TRUE)

# A sample of 28 integers cannot result in a mean of 5.19. This is shown by
GRIM_test(5.19, 28)

# To find the closest possible mean, set return_values to TRUE
GRIM_test(5.19, 28, return_values = TRUE)

GRIMMER test for standard deviation

Description

This function tests whether a given standard deviation (with a specific precision) can result from a sample of a given size based on integer responses to one or more items. The test was first proposed by Anaya (2016); here, the algorithm developed by Allard (2018) is used, extended by Aurélien Allard to support multi-item scales.

Usage

GRIMMER_test(
  mean,
  sd,
  n_obs,
  m_prec = NULL,
  sd_prec = NULL,
  n_items = 1,
  min_val = NULL,
  max_val = NULL
)
GRIMMER_test(
  mean,
  sd,
  n_obs,
  m_prec = NULL,
  sd_prec = NULL,
  n_items = 1,
  min_val = NULL,
  max_val = NULL
)

Arguments

`mean`	The mean of the distribution
`sd`	The standard deviation of the distribution
`n_obs`	The number of observations (sample size)
`m_prec`	The precision of the mean, as number of digits after the decimal point. If not provided, taken based on the significant digits of `mean` - so only needed if reported mean ends in 0
`sd_prec`	The precision of the standard deviation, again only needed if reported standard deviation ends in 0.
`n_items`	Number of items in scale, if distribution represents scale averages. Defaults to 1, which represents any single-item measure.
`min_val`	(Optional) Scale minimum. If provided alongside max_val, the function checks whether the SD is consistent with that range.
`max_val`	(Optional) Scale maximum.

Value

Logical TRUE/FALSE indicating whether given standard deviation is possible, given the other parameters

References

Anaya J (2016). “The GRIMMER test: A method for testing the validity of reported measures of variability.” PeerJ Preprints, 4, e2400v1.

Examples

# A sample of 18 integers with mean 3.44 cannot have an SD of 2.47. This is shown by
GRIMMER_test(mean = 3.44, sd = 2.47, n_obs = 18)


# A sample of 18 integers with mean 3.44 cannot have an SD of 2.47. This is shown by
GRIMMER_test(mean = 3.44, sd = 2.47, n_obs = 18)

Plot distributions

Description

This plots distributions identified by find_possible_distributions using ggplot2. They can be shown as histograms or as cumulative distributions (ECDF) plots. The latter give more information, yet not all audiences are familiar with them.

Usage

plot_distributions(
  distributions,
  plot_type = c("auto", "histogram", "ecdf", "density"),
  max_plots = 100,
  show_ids = FALSE,
  facets = NULL
)
plot_distributions(
  distributions,
  plot_type = c("auto", "histogram", "ecdf", "density"),
  max_plots = 100,
  show_ids = FALSE,
  facets = NULL
)

Arguments

`distributions`	Tibble with a column `distribution` and an identifier (`id`), typically as returned from `find_possible_distributions`.
`plot_type`	Plot multiple histograms, or overlapping cumulative distribution plots, or density plots? "auto" is to plot histograms if up to 9 distributions are passed, or if there are fewer than 10 discrete values, and empirical cumulative distribution plots otherwise
`max_plots`	How many distributions should at most be plotted? If more are passed, this number is randomly selected.
`show_ids`	Should ids of the distributions be shown with ecdf and density charts? Defaults to no, since the default ids are not meaningful.
`facets`	Should distributions be shown in one chart or in multiple small charts? Only considered for ecdf and density charts, histograms are always shown in facets

Value

A ggplot2 object that can be styled with functions such as labs or theme_linedraw

Examples

sprite_parameters <- set_parameters(mean = 2.2, sd = 1.3, n_obs = 20,
                                    min_val = 1, max_val = 5)

poss <- find_possible_distributions(sprite_parameters, 5, seed = 1234)

# All distributions in same plot
plot_distributions(poss, plot_type = "ecdf")

# Separate plot for each distribution
plot_distributions(poss, plot_type = "ecdf", facets = TRUE)

sprite_parameters <- set_parameters(mean = 2.2, sd = 1.3, n_obs = 20,
                                    min_val = 1, max_val = 5)

poss <- find_possible_distributions(sprite_parameters, 5, seed = 1234)

# All distributions in same plot
plot_distributions(poss, plot_type = "ecdf")

# Separate plot for each distribution
plot_distributions(poss, plot_type = "ecdf", facets = TRUE)

Define parameters for SPRITE algorithm

Description

The SPRITE algorithm aims to construct possible distributions that conform to observed/reported parameters. This function performs some checks and returns a list of these parameters that can then be passed to the functions that actually generate the distributions (e.g. find_possible_distribution)

Usage

set_parameters(
  mean,
  sd,
  n_obs,
  min_val,
  max_val,
  m_prec = NULL,
  sd_prec = NULL,
  n_items = 1,
  restrictions_exact = NULL,
  restrictions_minimum = NULL,
  dont_test = FALSE
)
set_parameters(
  mean,
  sd,
  n_obs,
  min_val,
  max_val,
  m_prec = NULL,
  sd_prec = NULL,
  n_items = 1,
  restrictions_exact = NULL,
  restrictions_minimum = NULL,
  dont_test = FALSE
)

Arguments

`mean`	The mean of the distribution
`sd`	The standard deviation of the distribution
`n_obs`	The number of observations (sample size)
`min_val`	The minimum value
`max_val`	The maximum value
`m_prec`	The precision of the mean, as number of digits after the decimal point. If not provided, taken based on the significant digits of `mean` - so only needed if reported mean ends in 0
`sd_prec`	The precision of the standard deviation, again only needed if reported standard deviation ends in 0.
`n_items`	Number of items in scale, if distribution represents scale averages. Defaults to 1, which represents any single-item measure.
`restrictions_exact`	Restrictions on the exact frequency of specific responses, see Details
`restrictions_minimum`	Restrictions on the minimum frequency of specific responses, see Details
`dont_test`	By default, this function tests whether the mean is possible, given the sample size (GRIM-test) and whether the standard deviation is possible, given mean and sample size (GRIMMER test), and fails otherwise. If you want to override this, and run SPRITE anyway, you can set this to TRUE.

Details

Restrictions can be used to define how often a specific value should appear in the sample. They need to be passed as a list in the form value = frequency. Thus, to specify that there should be no 3s and five 4s in the distribution, you would pass restrictions_exact = list("3" = 0, "4" = 5). To specify that there should be at least one 1 and one 7, you would pass restrictions_minimum = list("1" = 1, "7" = 1). If you just want to specify that the minimum and maximum values appear at least once (for instance when they are the reported rather than possible range), you can use the shortcut restrictions_minimum = "range". Finally, if you work with multi-item scales that result in decimal responses, round those names to two decimal points, e.g., when n_items = 3 you could specify list("1.67" = 0).

Value

A named list of parameters, pre-processed for further rsprite2 functions.

Examples


set.seed(1234) #To get reproducible results

# Simple case
sprite_parameters <- set_parameters(mean = 2.2, sd = 1.3, n_obs = 20, min_val = 1, max_val = 5)
find_possible_distribution(sprite_parameters)

# With restrictions
sprite_parameters <- set_parameters(mean = 1.95, sd = 1.55, n_obs = 20,
                                    min_val = 1, max_val = 5, n_items = 3,
                                    restrictions_exact = list("3"=0, "3.67" = 2),
                                    restrictions_minimum = "range")
find_possible_distribution(sprite_parameters)

set.seed(1234) #To get reproducible results

# Simple case
sprite_parameters <- set_parameters(mean = 2.2, sd = 1.3, n_obs = 20, min_val = 1, max_val = 5)
find_possible_distribution(sprite_parameters)

# With restrictions
sprite_parameters <- set_parameters(mean = 1.95, sd = 1.55, n_obs = 20,
                                    min_val = 1, max_val = 5, n_items = 3,
                                    restrictions_exact = list("3"=0, "3.67" = 2),
                                    restrictions_minimum = "range")
find_possible_distribution(sprite_parameters)

Package 'rsprite2'

Help Index

Find a possible distribution.

Description

Usage

Arguments

Value

Examples

Find several possible distributions.

Description

Usage

Arguments

Value

Examples

GRIM test for mean

Description

Usage

Arguments

Value

References

Examples

GRIMMER test for standard deviation

Description

Usage

Arguments

Value

References

Examples

Plot distributions

Description

Usage

Arguments

Value

Examples

Define parameters for SPRITE algorithm

Description

Usage

Arguments

Details

Value

Examples