How can we do some calculations using last row within a group in data.table in R? -
i have data.table:
sample:
id cond date 1 a1 2012-11-19 1 a1 2013-05-09 1 a2 2014-09-05 2 b1 2015-03-05 2 b1 2015-07-06 3 a1 2015-02-05 4 b1 2012-09-26 4 b1 2015-02-05 5 b1 2012-09-26
i want calculate overdue days today's date within each group of 'id' , 'cond', trying difference of days between last date in each group , sys.date. desired output ;
id cond date overdue 1 a1 2012-11-19 na 1 a1 2013-05-09 832 1 a2 2014-09-05 348 2 b1 2015-03-05 na 2 b1 2015-07-06 44 3 a1 2015-02-05 195 4 b1 2012-09-26 na 4 b1 2015-02-05 195 5 b1 2012-09-26 1057
i tried achieve following code:
sample <- sample[ , overdue := sys.date() - date[.n], = c('id','cond')]
but getting following output, value recycling:
id cond date overdue 1 a1 2012-11-19 832 1 a1 2013-05-09 832 1 a2 2014-09-05 348 2 b1 2015-03-05 44 2 b1 2015-07-06 44 3 a1 2015-02-05 195 4 b1 2012-09-26 195 4 b1 2015-02-05 195 5 b1 2012-09-26 1057
i not sure, how can restrict code calculations last row , not recycle. sure there ways this, appreciated.
you make table of overdue values , rows belong in:
bycols = c("id","cond") newcoldt2 = dt[, sys.date() - date[.n], = bycols] dt[newcoldt2, overdue := v1, on = bycols, mult = "last"] # id cond date overdue # 1: 1 a1 2012-11-19 na days # 2: 1 a1 2013-05-09 832 days # 3: 1 a2 2014-09-05 348 days # 4: 2 b1 2015-03-05 na days # 5: 2 b1 2015-07-06 44 days # 6: 3 a1 2015-02-05 195 days # 7: 4 b1 2012-09-26 na days # 8: 4 b1 2015-02-05 195 days # 9: 5 b1 2012-09-26 1057 days
this (arguably uglier) one-liner version:
dt[j(unique(dt[, bycols, with=false])), overdue := sys.date() - date, on = bycols, mult = "last"]
data:
dt <- data.table(read.table(header=true,text="id cond date 1 a1 2012-11-19 1 a1 2013-05-09 1 a2 2014-09-05 2 b1 2015-03-05 2 b1 2015-07-06 3 a1 2015-02-05 4 b1 2012-09-26 4 b1 2015-02-05 5 b1 2012-09-26"))[, date := as.idate(date)] # know how fread()?
Comments
Post a Comment