If you want to split one data frame column into multiple in R, then here is how to do that in 3 different ways.
Here is the data frame that I created from the mtcars dataset.
df <- data.frame("mytext" = as.character(row.names(mtcars))) head(df) # mytext #1 Mazda RX4 #2 Mazda RX4 Wag #3 Datsun 710 #4 Hornet 4 Drive #5 Hornet Sportabout #6 Valiant
I will split this column using space as a delimiter, and to do that, I will create two calculations. The first calculation is the count of space characters in a string. That will help to calculate the number of columns that I will get after splitting. The second calculation will generate column names depending on the count of necessary columns.
#number of columns after splitting by space ncols <- max(stringr::str_count(df$mytext, " ")) + 1 ncols # 3
#generate necessary column names colmn <- paste("col", 1:ncols) colmn # "col 1" "col 2" "col 3"
Split column by a delimiter in R
One of the best solutions to split column by delimiter is function separate from the tidyr. In the description of function separate are a couple of examples. There is a lot of parameters that can be useful to do some configurations for the desired result.
By the default function separate will remove the original column. If you want to keep it, then try to use the remove parameter. Set this parameter to FALSE if you want to keep the original.
df <- tidyr::separate( data = df, col = mytext, sep = " ", into = colnm, remove = FALSE ) require(dplyr) head(df) %>% knitr::kable() #|mytext |col 1 |col 2 |col 3 | #|:-----------------|:-------|:----------|:-----| #|Mazda RX4 |Mazda |RX4 |NA | #|Mazda RX4 Wag |Mazda |RX4 |Wag | #|Datsun 710 |Datsun |710 |NA | #|Hornet 4 Drive |Hornet |4 |Drive | #|Hornet Sportabout |Hornet |Sportabout |NA | #|Valiant |Valiant |NA |NA |
If you want to split by delimiter by using dplyr pipe, here is how to do that.
require(dplyr) require(tidyr) df <- df %>% separate(mytext, sep = " ", into = colnm, remove = FALSE)
Here is another approach. To keep the original data frame column, I will use a cbind to combine that with the results.
df <- cbind(df, reshape2::colsplit(df$mytext, " ", colnm)) head(df) # mytext col 1 col 2 col 3 #1 Mazda RX4 Mazda RX4 #2 Mazda RX4 Wag Mazda RX4 Wag #3 Datsun 710 Datsun 710 #4 Hornet 4 Drive Hornet 4 Drive #5 Hornet Sportabout Hornet Sportabout #6 Valiant Valiant
In both previous examples, you can limit the number of columns that you get after splitting. Generate fewer new column names than needed.
This function is different from the other two functions because you don’t have to create new column names. I will also bind together the original with the results.
df <- cbind(df, stringr::str_split_fixed(df$mytext, " ", ncols)) head(df) # mytext 1 2 3 #1 Mazda RX4 Mazda RX4 #2 Mazda RX4 Wag Mazda RX4 Wag #3 Datsun 710 Datsun 710 #4 Hornet 4 Drive Hornet 4 Drive #5 Hornet Sportabout Hornet Sportabout #6 Valiant Valiant
As you can see, you don’t have to create new column names, and they will be generated for you.
Split column by fixed position in R
If the sep parameter is numeric, then it will split column by vector of fixed positions.
df <- data.frame(mytext = as.character(c("210101", "210102", "210103"))) df <- tidyr::separate( data = df, col = mytext, sep = c(2, 4), into = c("y", "m", "d"), remove = FALSE ) require(dplyr) head(df) %>% knitr::kable() #|mytext |y |m |d | #|:------|:--|:--|:--| #|210101 |21 |01 |01 | #|210102 |21 |01 |02 | #|210103 |21 |01 |03 |
There is also possible to split column peace by peace with the function substr that can extract characters by given positions.
Here is a couple of interesting things to do with the text: start every string with a capital letter, extract text based on a match of multiple possible strings.