Find the element(s) in a vector whose data type potentially deviates from the majority data type or from a specified one.
Usage
find_culprit(
vec,
should_be = c("majority", "numeric", "double", "integer", "character", "logical"),
return_index = FALSE,
na.rm = TRUE,
verbose = TRUE,
...
)
Arguments
- vec
Data vector to find the not-fitting element in.
- should_be
Expected data type of vec. "numeric" is indifferent to whether double or integer is provided. If set to "majority", expect the data type of most of the elements.
- return_index
Return index of culprit elements rather than their value.
- na.rm
Do not treat NA values as culprits, i.e. discard them from the culprit return.
- verbose
Whether the guessed data type should be printed if should_be is "majority".
- ...
Arguments passed on to readr::guess_parser
  - na
  Character vector of strings to interpret as missing values. Set this option to character() to indicate no missing values.
  - locale
  The locale controls defaults that vary from place to place. The default locale is US-centric (like R), but you can use locale() to create your own locale that controls things like the default time zone, encoding, decimal mark, big mark, and day/month names.
Value
A named vector containing either the culprit element values (default) or their indices (if return_index = TRUE), with the guessed data types as names.
Details
This function can be used to find the culprit element that prevents clean data type conversion.
It uses readr::guess_parser() to guess the type of every vector element.
The user can specify which data type the vector should have (should_be), or simply rely on the majority guess.
Ties in the majority guess are resolved lexicographically by data type name, i.e. character > double > integer > logical.
If should_be is numeric, the data type is interpreted as double for lexicographic sorting.
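The guessing and tie-breaking described above can be sketched roughly as follows. This is an illustration only, not the package's actual implementation: majority_type() is a hypothetical helper, and it assumes that the ">" ordering above means the alphabetically first type name wins a tie. The readr part only runs if readr is installed.

```r
# Hypothetical helper (not part of the package): pick the most frequent type.
# table() orders its names alphabetically and which.max() returns the first
# maximum, so a tie falls to the alphabetically first type name.
majority_type <- function(guesses) {
  counts <- table(guesses)
  names(counts)[which.max(counts)]
}

majority_type(c("double", "double", "number", "double"))
#> [1] "double"
majority_type(c("double", "character"))  # tie between two types
#> [1] "character"

# Per-element guessing with readr::guess_parser(), if readr is available.
if (requireNamespace("readr", quietly = TRUE)) {
  long_data <- c(1, 2.2, "4,", 5)
  guesses <- vapply(long_data, readr::guess_parser, character(1),
                    USE.NAMES = FALSE)
  # Elements whose guessed type differs from the majority are the culprits.
  is_culprit <- guesses != majority_type(guesses)
  print(stats::setNames(long_data[is_culprit], guesses[is_culprit]))
}
```

Guessing each element separately, rather than the vector as a whole, is what makes it possible to point at the individual offending value.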
Examples
long_data <- c(1, 2.2, "4,", 5)
# standard type conversion would convert "4," to NA
if (FALSE) { # \dontrun{
as.numeric(long_data)
} # }
# better for maximizing information content: find culprit element and treat it appropriately
find_culprit(long_data, should_be = "numeric")
#> number
#> "4,"
as.numeric(gsub(",", "", long_data))
#> [1] 1.0 2.2 4.0 5.0
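For comparison, a base R cross-check of the same example, which may help show what gets lost during plain conversion. It is only a sketch using as.numeric() and does not rely on find_culprit(); the variable names converted and lost are chosen here for illustration.

```r
long_data <- c(1, 2.2, "4,", 5)

# as.numeric() warns and yields NA for anything it cannot parse
converted <- suppressWarnings(as.numeric(long_data))

# positions that were not NA before conversion but are NA afterwards
lost <- which(is.na(converted) & !is.na(long_data))
lost
#> [1] 3
long_data[lost]
#> [1] "4,"
```

Unlike find_culprit(), this only reports that a value failed to convert, not which data type it was guessed to be.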