Calculate a conditional running sum in R for every row in data frame -
i create column equal running sum of data$rating, given 2 conditions true in column 3 , 4, data$year< current year , data$id equal current id.
in words should calculate cumulative sum of ratings each id until previous year. , should each row in data frame (about 50,000 rows). given size of data frame, i'd prefer not loop, if @ possible.
i've provided short example of how below...
> head(data[,c(3,4,13)]) year id rating cumsum 1 2010 13578 2 0 2 2010 13579 1 0 3 2010 13575 3 0 4 2011 13575 4 3 5 2012 13578 3 2 6 2012 13579 2 1 7 2012 13579 4 1
i'm coming spreadsheet background, still thinking in terms of sumifs, etc. (which nicely solve problem in excel), apologies if language isn't precise.
data <- data.frame(year = c( rep(2010, 3), 2011, rep(2012, 3) ), id = c(13578, 13579, 13575, 13575, 13578, 13579, 13579), rating = c(2, 1, 3, 4, 3, 2, 4)) data # year id rating # 1 2010 13578 2 # 2 2010 13579 1 # 3 2010 13575 3 # 4 2011 13575 4 # 5 2012 13578 3 # 6 2012 13579 2 # 7 2012 13579 4
- create column equal running sum of
data$rating
data$year < year
data$id == id
- this should calculate cumulative sum of ratings each
id
until previous year
the desired output be
data # year id rating cumsum # 1 2010 13578 2 2 # 2 2010 13579 1 1 # 3 2010 13575 3 3 # 4 2011 13575 4 7 # 5 2012 13578 3 5 # 6 2012 13579 2 3 # 7 2012 13579 4 7
this done so
year <- 2014 # maximum year include in cumsum id.values <- names(table(data$id)) # unique values of data$id, sorted # cumsum 13575 rows, followed cumsum 13578 rows, ... rating.cumsum <- unlist(sapply(id.values, function(x) cumsum(data$rating[data$id == x]))) # assign cumsum output appropriate rows data$cumsum[with(data, order(id))] <- rating.cumsum
Comments
Post a Comment