split column by delimiter in R

How to split one column into multiple columns in R

If you want to split one data frame column into multiple in R, then here is how to do that in 3 different ways.

Here is the data frame that I created from the mtcars dataset.

df <- data.frame("mytext" = as.character(row.names(mtcars)))

head(df)

#             mytext
#1         Mazda RX4
#2     Mazda RX4 Wag
#3        Datsun 710
#4    Hornet 4 Drive
#5 Hornet Sportabout
#6           Valiant

I will split this column using space as a delimiter, and to do that, I will create two calculations. The first calculation is the count of space characters in a string. That will help to calculate the number of columns that I will get after splitting. The second calculation will generate column names depending on the count of necessary columns.

#number of columns after splitting by space
ncols <- max(stringr::str_count(df$mytext, " ")) + 1

ncols
#[1] 3
#generate necessary column names
colmn <- paste("col", 1:ncols)

colmn
#[1] "col 1" "col 2" "col 3"

Split column by a delimiter in R

 

tidyr::separate

One of the best solutions to split column by delimiter is function separate from the tidyr. In the description of function separate are a couple of examples. There is a lot of parameters that can be useful to do some configurations for the desired result.

By the default function separate will remove the original column. If you want to keep it, then try to use the remove parameter. Set this parameter to FALSE if you want to keep the original.

df <-
  tidyr::separate(
    data = df,
    col = mytext,
    sep = " ",
    into = colmn,
    remove = FALSE
  )

require(dplyr) 
head(df) %>% knitr::kable()

#|mytext            |col 1   |col 2      |col 3 |
#|:-----------------|:-------|:----------|:-----|
#|Mazda RX4         |Mazda   |RX4        |NA    |
#|Mazda RX4 Wag     |Mazda   |RX4        |Wag   |
#|Datsun 710        |Datsun  |710        |NA    |
#|Hornet 4 Drive    |Hornet  |4          |Drive |
#|Hornet Sportabout |Hornet  |Sportabout |NA    |
#|Valiant           |Valiant |NA         |NA    |

If you want to split by delimiter by using dplyr pipe, here is how to do that.

require(dplyr)
require(tidyr)

df <- df %>% separate(mytext, sep = " ", into = colmn, remove = FALSE)

 

reshape2::colsplit

Here is another approach. To keep the original data frame column, I will use a cbind to combine that with the results.

df <- cbind(df, reshape2::colsplit(df$mytext, " ", colmn))

head(df)
#             mytext   col 1      col 2 col 3
#1         Mazda RX4   Mazda        RX4      
#2     Mazda RX4 Wag   Mazda        RX4   Wag
#3        Datsun 710  Datsun        710      
#4    Hornet 4 Drive  Hornet          4 Drive
#5 Hornet Sportabout  Hornet Sportabout      
#6           Valiant Valiant

In both previous examples, you can limit the number of columns that you get after splitting. Generate fewer new column names than needed.

 

stringi::stri_split_coll()

This function is different from the other two functions because you don’t have to create new column names. I will also bind together the original with the results.

df <- cbind(df, stringr::str_split_fixed(df$mytext, " ", ncols))

head(df)
#             mytext       1          2     3
#1         Mazda RX4   Mazda        RX4      
#2     Mazda RX4 Wag   Mazda        RX4   Wag
#3        Datsun 710  Datsun        710      
#4    Hornet 4 Drive  Hornet          4 Drive
#5 Hornet Sportabout  Hornet Sportabout      
#6           Valiant Valiant

As you can see, you don’t have to create new column names, and they will be generated for you.

 

Split column by fixed position in R

If the sep parameter is numeric, then it will split column by vector of fixed positions.

df <-
  data.frame(mytext = as.character(c("210101", "210102", "210103")))

df <-
  tidyr::separate(
    data = df,
    col = mytext,
    sep = c(2, 4),
    into = c("y", "m", "d"),
    remove = FALSE
  ) 

require(dplyr) 
head(df) %>% knitr::kable()
#|mytext |y  |m  |d  |
#|:------|:--|:--|:--|
#|210101 |21 |01 |01 |
#|210102 |21 |01 |02 |
#|210103 |21 |01 |03 |

There is also possible to split column peace by peace with the function substr that can extract characters by given positions.

 

Here is a couple of interesting things to do with the text: start every string with a capital letter, extract text based on a match of multiple possible strings.





Posted

in

Comments

2 responses to “How to split one column into multiple columns in R”

  1. Hunter Ries

    There is an error in the “separate” code.

    The text reads:
    df <-
    tidyr::separate(
    data = df,
    col = mytext,
    sep = " ",
    into = colnm,
    remove = FALSE
    )

    But it should read:
    df <-
    tidyr::separate(
    data = df,
    col = mytext,
    sep = " ",
    into = colmn,
    remove = FALSE
    )

    It says "colnm" in the current text, but needs to be changed to "colmn" for this to work.

    Thanks for the code tips. I found this all very useful!

    1. Janis Sturis

      Thank you, Hunter, for your help! I fixed it.

Leave a Reply

Your email address will not be published. Required fields are marked *