
How to detect data type in R and change it if necessary

Here is how to detect data types in R and change them when necessary. Many data manipulations depend on having the right data types. For example, if you want to join two data frames, the data types of the key columns must match.
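A minimal sketch of that situation, assuming two small hypothetical data frames (orders and lookup) where the key id is integer in one and character in the other, as can happen when one table is read from a text file. Some join functions, such as dplyr's joins, typically refuse integer vs character keys, so the types are aligned first and then joined with base R merge.

```r
# Hypothetical example data: the key "id" is integer in one table
# and character in the other
orders <- data.frame(id = 1:3, amount = c(10, 20, 30))
lookup <- data.frame(id = c("1", "2", "3"), label = c("a", "b", "c"),
                     stringsAsFactors = FALSE)

typeof(orders$id)  # "integer"
typeof(lookup$id)  # "character"

# Align the key types before joining
lookup$id <- as.integer(lookup$id)
stopifnot(identical(typeof(orders$id), typeof(lookup$id)))

# merge() matches rows on the now identically typed key
joined <- merge(orders, lookup, by = "id")
str(joined)
```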


Before you dig into the various ways of detecting data types, here is a possible quick fix: if you want to auto-detect and change data types, look at the post on how to do that.
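One base R option for auto-detection is utils::type.convert, which guesses a suitable type for each column. This is a sketch of that function, not necessarily the approach from the post mentioned above, and the data frame raw is a hypothetical example where every column was imported as text.

```r
# Hypothetical data frame in which every column was imported as text
raw <- data.frame(
  id    = c("1", "2", "3"),
  score = c("1.5", "2.5", "3.5"),
  group = c("a", "b", "c"),
  stringsAsFactors = FALSE
)

# type.convert() guesses the most suitable type for each column;
# as.is = TRUE keeps text columns as character instead of factor
converted <- utils::type.convert(raw, as.is = TRUE)

# id becomes integer, score numeric, group stays character
sapply(converted, class)
```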


Here is the data that I will use in this post.

data("chickwts")

head(chickwts)

#  weight      feed
#1    179 horsebean
#2    160 horsebean
#3    136 horsebean
#4    227 horsebean
#5    217 horsebean
#6    168 horsebean


data("swiss")

head(swiss)

#             Fertility Agriculture Examination Education Catholic Infant.Mortality
#Courtelary        80.2        17.0          15        12     9.96             22.2
#Delemont          83.1        45.1           6         9    84.84             22.2
#Franches-Mnt      92.5        39.7           5         5    93.40             20.2
#Moutier           85.8        36.5          12         7    33.77             20.3
#Neuveville        76.9        43.5          17        15     5.16             20.6
#Porrentruy        76.1        35.3           9         7    90.57             26.6


y1 <- 1:3

y2 <- c("a", "b", "c")


Check how the data is stored using the environment pane in RStudio

If you use RStudio, the Environment pane is an easy way to look at how data is stored and get an idea of the data types. Sometimes it is possible to keep things simple. If there is a blue icon to the left of an object's name, click it to expand the data frame and examine its metadata.

[Screenshot: checking data types and metadata in the RStudio Environment pane]
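If you work outside RStudio, a rough console counterpart to the Environment pane is the base R function ls.str, which prints the str summary of each object in an environment. A minimal sketch using the objects created above:

```r
y1 <- 1:3
y2 <- c("a", "b", "c")
data("chickwts")

# ls.str() prints a str()-style summary of every object in the
# current environment, similar to what the Environment pane shows
ls.str()

# Restrict the listing to objects whose names match a pattern
ls.str(pattern = "^y")
```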


Detect data types in R for multiple data frame columns at once

If you have to check the data types of more than one column in R, use a function that can check all of them at once. One of the best choices is the base R function str, which compactly displays the structure of an R object, including the type of each data frame column.

str(swiss)

#'data.frame':	47 obs. of  6 variables:
#$ Fertility       : num  80.2 83.1 92.5 85.8 76.9 76.1 83.8 92.4 82.4 82.9 ...
#$ Agriculture     : num  17 45.1 39.7 36.5 43.5 35.3 70.2 67.8 53.3 45.2 ...
#$ Examination     : int  15 6 5 12 17 9 16 14 12 16 ...
#$ Education       : int  12 9 5 7 15 7 7 8 7 13 ...
#$ Catholic        : num  9.96 84.84 93.4 33.77 5.16 ...
#$ Infant.Mortality: num  22.2 22.2 20.2 20.3 20.6 26.6 23.6 24.9 21 24.4 ...

An alternative to str is the function glimpse from the tibble package, which gives similar results but shows a little more of the data. Here you can find out more about the data types returned by glimpse.

tibble::glimpse(swiss)

#Rows: 47
#Columns: 6
#$ Fertility        <dbl> 80.2, 83.1, 92.5, 85.8, 76.9, 76.1, 83.8, 92.4, 82.4, 82.9, 87.1, 64.~
#$ Agriculture      <dbl> 17.0, 45.1, 39.7, 36.5, 43.5, 35.3, 70.2, 67.8, 53.3, 45.2, 64.5, 62.~
#$ Examination      <int> 15, 6, 5, 12, 17, 9, 16, 14, 12, 16, 14, 21, 14, 19, 22, 18, 17, 26, ~
#$ Education        <int> 12, 9, 5, 7, 15, 7, 7, 8, 7, 13, 6, 12, 7, 12, 5, 2, 8, 28, 20, 9, 10~
#$ Catholic         <dbl> 9.96, 84.84, 93.40, 33.77, 5.16, 90.57, 92.85, 97.16, 97.67, 91.38, 9~
#$ Infant.Mortality <dbl> 22.2, 22.2, 20.2, 20.3, 20.6, 26.6, 23.6, 24.9, 21.0, 24.4, 24.5, 16.~


Try these functions to detect data types in R

One of the first functions that you might intuitively use in R to check data types is the base R function typeof. It returns the internal type that R uses to store an object.

y1 <- 1:3

typeof(y1)

#[1] "integer"

y2 <- c("a", "b", "c")

typeof(y2)

#[1] "character"

You can combine the typeof function with sapply to detect the data types of all data frame columns. Here it is used to detect the data type of one column.

typeof(swiss$Education)

#[1] "integer"

Here it is used to detect data types for all of the columns.

data.frame("data types" = sapply(swiss, typeof))

#                 data.types
#Fertility            double
#Agriculture          double
#Examination         integer
#Education           integer
#Catholic             double
#Infant.Mortality     double


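To better understand your data, you can also use the function class. It returns the class of an object, which determines how R functions treat it and can differ from the storage type returned by typeof.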

class(y1)

#[1] "integer"

class(swiss)

#[1] "data.frame"

class(swiss$Education)

#[1] "integer"

data.frame("data types" = sapply(swiss, class))

#                 data.types
#Fertility           numeric
#Agriculture         numeric
#Examination         integer
#Education           integer
#Catholic            numeric
#Infant.Mortality    numeric
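One caveat with sapply(df, class): some columns have more than one class. A date-time column created with as.POSIXct has the classes "POSIXct" and "POSIXt", so sapply can no longer simplify the result into a vector and returns a list. A minimal sketch with a hypothetical events data frame, using vapply to force one string per column (keeping only the first class is a choice made for this example):

```r
# Hypothetical data frame with a date-time column
events <- data.frame(
  id   = 1:2,
  time = as.POSIXct(c("2021-01-01 10:00", "2021-01-02 10:00"), tz = "UTC")
)

# The time column has two classes, so sapply() returns a list here
sapply(events, class)

# vapply() with a fixed output template returns one string per column;
# only the first class of each column is kept
vapply(events, function(col) class(col)[1], character(1))
```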

There are also the functions mode and storage.mode, which are derived from the output of typeof. mode groups integer and double values together as "numeric", while storage.mode shows how the object is actually stored.

mode(y1)

#[1] "numeric"

storage.mode(y1)

#[1] "integer"
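To see how typeof, mode, storage.mode, and class relate across different kinds of objects, here is a small comparison sketch. The list objs is a hypothetical selection of objects chosen for illustration.

```r
objs <- list(int = 1:3, dbl = c(1.5, 2), chr = "a", lgl = TRUE, fun = mean)

# One row per object, one column per function
comparison <- t(sapply(objs, function(x) c(
  typeof       = typeof(x),
  mode         = mode(x),
  storage.mode = storage.mode(x),
  class        = class(x)[1]
)))
comparison
```

For example, integer and double values share the mode "numeric", and a function such as mean has the type "closure" but the mode "function".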

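Here is how the results of these functions differ for factors. A factor is stored as integer codes with a levels attribute, so typeof and mode describe the codes, while class reports "factor".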

typeof(chickwts$feed)

#[1] "integer"

mode(chickwts$feed)

#[1] "numeric"

class(chickwts$feed)

#[1] "factor"
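A short sketch of what is stored inside the feed factor, using only base R functions: the levels hold the labels, and the integer codes index into them.

```r
data("chickwts")
feed <- chickwts$feed

# The labels the factor can take
levels(feed)

# The integer codes that are actually stored
head(as.integer(feed))

# Each code is an index into levels(), which is how the labels are recovered
head(levels(feed)[as.integer(feed)])
```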

Take a look at this example if you want to see how to use factors to sort text as numbers in R.


Detect data type by using dplyr

The function type_sum (originally from the pillar package and available once dplyr is loaded) returns a short summary of an object's data type.

library(dplyr)

swiss %>% lapply(type_sum) %>% unlist %>% as.data.frame()

#                   .
#Fertility        dbl
#Agriculture      dbl
#Examination      int
#Education        int
#Catholic         dbl
#Infant.Mortality dbl

You can similarly use any of the previously mentioned functions.

swiss %>% lapply(typeof) %>% unlist %>% as.data.frame()

#                       .
#Fertility         double
#Agriculture       double
#Examination      integer
#Education        integer
#Catholic          double
#Infant.Mortality  double
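dplyr's across and where helpers (available in dplyr 1.0 and later) offer two more options: summarise can return a one-row data frame with the type of each column, and select can keep only the columns of a given type. A minimal sketch:

```r
library(dplyr)
data("swiss")

# One-row data frame with the type of every column
types <- swiss %>% summarise(across(everything(), typeof))
types

# Select columns by type, here only the integer columns
int_cols <- swiss %>% select(where(is.integer)) %>% names()
int_cols
```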


Use R functions that start with “is”

If you want to know whether an object has the type you expect, use functions such as is.numeric, is.character, is.factor, and so on. They return TRUE or FALSE.

is.factor(chickwts$feed)

#[1] TRUE

is.character(chickwts$feed)

#[1] FALSE
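The is.* functions also work across all columns with sapply, and they make a simple guard at the top of a script. A short sketch; note that is.numeric returns FALSE for a factor even though its typeof is "integer".

```r
data("swiss")
data("chickwts")

# Logical vector: is each column numeric?
numeric_cols <- sapply(swiss, is.numeric)
numeric_cols

# Stop early if the data is not what you expect
stopifnot(all(numeric_cols))

# FALSE: factors are not treated as numeric, despite their integer storage
is.numeric(chickwts$feed)
```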


Detect timezone for POSIX dates in R

Detecting that something is a POSIX date-time is not enough. Check its time zone as well to avoid time shift problems after joining data frames. Here is more about detecting and changing the time zone in R.
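A minimal base R sketch, assuming a hypothetical timestamp: inherits confirms the class, and the tzone attribute holds the time zone. Changing that attribute changes only how the same instant is displayed. The example assumes the system time zone database includes "Europe/Berlin".

```r
x <- as.POSIXct("2021-03-01 12:00:00", tz = "UTC")

inherits(x, "POSIXct")  # TRUE, but this says nothing about the time zone
attr(x, "tzone")        # "UTC"

# Same instant, displayed in another time zone
y <- x
attr(y, "tzone") <- "Europe/Berlin"
format(y, "%Y-%m-%d %H:%M %Z")

x == y  # TRUE: the underlying instant has not changed
```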


Change data types in R

Here is how to automatically detect and change data types in a data frame, but you can also change them with individual R functions. Look for the ones whose names start with “as”, such as as.character, as.numeric, and as.integer. For example, here is how to change the data type of a numeric column (stored as integer) to character.

typeof(swiss$Education)

#[1] "integer"

head(as.character(swiss$Education))

#[1] "12" "9"  "5"  "7"  "15" "7"
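Converting in the other direction needs more care, because as.numeric turns values it cannot parse into NA and only warns about it. A short sketch with a hypothetical character vector, showing how to find the values that failed to convert:

```r
x <- c("12", "9", "five")

# as.numeric() turns unparseable values into NA (with a warning)
num <- suppressWarnings(as.numeric(x))
num

# Values that failed to convert (excluding values that were NA already)
x[is.na(num) & !is.na(x)]
```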

Here are two additional examples: how to convert a string to a date, and the tricky situation where a number is stored as a factor and you want to convert it to numeric.
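A compact sketch of both cases with hypothetical values. Calling as.numeric directly on a factor returns the level codes, not the numbers, so the factor has to go through its labels first; the levels-based variant is the one suggested in R FAQ 7.10. For dates, the format string must describe the input.

```r
# A number stored as a factor
f <- factor(c("10", "5", "20"))

as.numeric(f)                # 1 3 2: the level codes, not the values
as.numeric(as.character(f))  # 10 5 20
as.numeric(levels(f))[f]     # 10 5 20

# Strings to dates
as.Date("2021-03-01")
as.Date("01.03.2021", format = "%d.%m.%Y")
```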

To change data types for a range of columns in R, try the approach from my favorite dplyr tips and tricks.

library(dplyr)

swiss %>% str()

#'data.frame':	47 obs. of  6 variables:
#$ Fertility       : num  80.2 83.1 92.5 85.8 76.9 76.1 83.8 92.4 82.4 82.9 ...
#$ Agriculture     : num  17 45.1 39.7 36.5 43.5 35.3 70.2 67.8 53.3 45.2 ...
#$ Examination     : int  15 6 5 12 17 9 16 14 12 16 ...
#$ Education       : int  12 9 5 7 15 7 7 8 7 13 ...
#$ Catholic        : num  9.96 84.84 93.4 33.77 5.16 ...
#$ Infant.Mortality: num  22.2 22.2 20.2 20.3 20.6 26.6 23.6 24.9 21 24.4 ...

# mutate() keeps every row; summarise() is meant to return one row per group
swiss %>% mutate(across(where(is.numeric), as.character)) %>% str()

#'data.frame':	47 obs. of  6 variables:
#$ Fertility       : chr  "80.2" "83.1" "92.5" "85.8" ...
#$ Agriculture     : chr  "17" "45.1" "39.7" "36.5" ...
#$ Examination     : chr  "15" "6" "5" "12" ...
#$ Education       : chr  "12" "9" "5" "7" ...
#$ Catholic        : chr  "9.96" "84.84" "93.4" "33.77" ...
#$ Infant.Mortality: chr  "22.2" "22.2" "20.2" "20.3" ...

Here is another example with function ifelse across multiple columns.
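As a quick sketch of the idea (not necessarily the example from that post), ifelse inside across can recode several columns at once, and the column type changes as a result. The threshold of 10 and the labels "high" and "low" are hypothetical.

```r
library(dplyr)
data("swiss")

# Hypothetical rule: label values above 10 as "high", others as "low"
swiss_flags <- swiss %>%
  mutate(across(c(Examination, Education), ~ ifelse(.x > 10, "high", "low")))

# ifelse() returns character here, so both columns change type
sapply(swiss_flags[c("Examination", "Education")], typeof)
```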