Group index that restarts in R

Here is how to generate a group index in R that restarts every time there is a change in the variables.

A similar but simpler example shows how to generate a group index in R that never restarts.

Below is my data frame, which contains client activity. It was recreated from an existing data frame with the dput function.

df <-
  structure(
    list(
      customer_id = c(
        933671L,
        933671L,
        933671L,
        933671L,
        933671L,
        933671L,
        871209L,
        871209L,
        871209L,
        871209L,
        871209L,
        871209L,
        871209L,
        661777L,
        661777L,
        661777L,
        661777L,
        661777L,
        661777L
      ),
      result = c(2L, 1L, 1L, 1L, 2L, 1L, 1L, 1L, 2L, 1L, 2L, 2L, 1L, 2L, 1L, 1L, 1L, 2L, 2L),
      result_description = structure(
        c(1L, 2L, 2L, 2L, 1L, 2L, 2L, 2L, 1L, 2L, 1L, 1L, 2L, 1L, 2L, 2L, 2L, 1L, 1L),
        .Label = c("dissatisfied", "satisfied"),
        class = "factor"
      )
    ),
    class = "data.frame",
    row.names = c(NA, -19L)
  )

head(df)
#  customer_id result result_description
#1      933671      2       dissatisfied
#2      933671      1          satisfied
#3      933671      1          satisfied
#4      933671      1          satisfied
#5      933671      2       dissatisfied
#6      933671      1          satisfied

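For each client, the variable result_description takes one of two values, “satisfied” or “dissatisfied”, and the value may change irregularly from row to row. The result column holds the same information as a numeric code (1 for satisfied, 2 for dissatisfied).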

Is it possible to index a series of variable repetitions within the group? Yes, it is.

That can be done in two steps. The first uses the rleid function from the data.table package to create a run id for the grouping that follows. The id increases by one whenever the combination of customer_id and result_description differs from the previous row, so each uninterrupted thread of identical values gets its own id.

df$thread_index <- data.table::rleid(df$customer_id, df$result_description)

head(df)
#  customer_id result result_description thread_index
#1      933671      2       dissatisfied            1
#2      933671      1          satisfied            2
#3      933671      1          satisfied            2
#4      933671      1          satisfied            2
#5      933671      2       dissatisfied            3
#6      933671      1          satisfied            4
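
To see why customer_id is passed to rleid together with the status, here is a minimal sketch on made-up toy vectors (the names customer and status are hypothetical and not part of the data above). It assumes the data.table package is installed.

```r
library(data.table)

# Toy vectors: customer 1 ends on "s" and customer 2 starts on "s"
customer <- c(1L, 1L, 1L, 1L, 2L, 2L, 2L)
status   <- c("s", "s", "d", "s", "s", "d", "d")

# Run id on status alone: rows 4 and 5 fall into the same run,
# even though they belong to different customers
run_status_only <- rleid(status)
run_status_only
# 1 1 2 3 3 4 4

# Run id on both vectors: a new run also starts when the customer changes
run_both <- rleid(customer, status)
run_both
# 1 1 2 3 4 5 5
```

Without the customer column, a thread that happens to continue with the same value across two clients would be counted as one run.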

With the run ids attached, I can use rowid from the data.table package to generate a group index that restarts every time the next thread of the variable begins. Note that this overwrites the thread_index column created in the first step.

df$thread_index <- data.table::rowid(df$thread_index)

head(df)
#  customer_id result result_description thread_index
#1      933671      2       dissatisfied            1
#2      933671      1          satisfied            1
#3      933671      1          satisfied            2
#4      933671      1          satisfied            3
#5      933671      2       dissatisfied            1
#6      933671      1          satisfied            1
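
The two steps can also be nested into a single call. The sketch below is one possible way to do it, not the only one: it uses a small made-up data.table (toy is a hypothetical name, with the same column names as above) and cross-checks the result against a base R version built from rle and sequence. It assumes the data.table package is installed.

```r
library(data.table)

# Hypothetical toy data with the same column names as the data frame above
toy <- data.table(
  customer_id        = c(1L, 1L, 1L, 1L, 2L, 2L, 2L),
  result_description = c("satisfied", "satisfied", "dissatisfied", "satisfied",
                         "satisfied", "dissatisfied", "dissatisfied")
)

# rleid() labels each run, rowid() counts rows within each run
toy[, thread_index := rowid(rleid(customer_id, result_description))]
toy$thread_index
# 1 2 1 1 1 1 2

# Base R cross-check: paste the columns into one key,
# take the run lengths and count 1..n inside each run
key <- paste(toy$customer_id, toy$result_description)
base_index <- sequence(rle(key)$lengths)
all(toy$thread_index == base_index)
# TRUE
```

The nested version avoids storing the intermediate run id, while the base R version may be handy when data.table is not available.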

What’s next?

Check out my favorite RStudio tips and tricks. For example, how to comment out multiple lines of R script at once.

 



