stata - How to collapse numbers with same identifier but different date, but preserve the date of first observation for each identifier -
i have dataset can simplified in following format:
clear input str9 date id vara varb "12jan2010" 5 21 42 "12jan2010" 6 47 21 "15jan2010" 10 7 68 "17jan2010" 6 -5 -3 "19jan2010" 6 -1 -1 end
in dataset, there date, id, vara, , varb. each id represents unique set of transactions. want collapse (sum) vara varb, by(date)
in stata. however, want keep date of first observation each id number.
essentially, want above dataset become following:
+--------------------------------+ | date id var1 var2 | |--------------------------------| | 12jan2010 5 21 42 | | 12jan2010 6 41 17 | | 15jan2010 10 7 68 | +--------------------------------+
12jan2010 17jan2010 , 19jan2010 have same id, want collapse (sum) var1 var2
these 3 observations. want keep date 12jan2010 because date first observation. other 2 observations dropped.
i know might possible collapse id first , merge original dataset , subset. wondering if there easier way make work. thanks!
collapse
allows calculate variety of statistics, can convert string date numerical date, take minimum of numerical date first occurrence.
clear input str9 date id vara varb "12jan2010" 5 21 42 "12jan2010" 6 47 21 "15jan2010" 10 7 68 "17jan2010" 6 -5 -3 "19jan2010" 6 -1 -1 end gen date2 = date(date, "dmy") format date2 %td collapse (sum) vara varb (min) date2 , by(id) order date2, first li
yielding
+------------------------------+ | date2 id vara varb | |------------------------------| 1. | 12jan2010 5 21 42 | 2. | 12jan2010 6 41 17 | 3. | 15jan2010 10 7 68 | +------------------------------+
in response comment: can generate formatted date observations vara
> 0 (and not missing). (assuming that, per comment, vara
& varb
have same sign.)
// assume id 6 has earliest date of 17jan2005 (obs.4) // want return 'first date' // first date vara & varb both positive clear input str9 date id vara varb "12jan2010" 5 21 42 "12jan2010" 6 47 21 "15jan2010" 10 7 68 "17jan2005" 6 -5 -3 "19jan2010" 6 -1 -1 end gen date2 = date(date, "dmy") if vara > 0 & !missing(vara) format date2 %td collapse (sum) vara varb (min) date2 , by(id) order date2, first li
yielding
+------------------------------+ | date2 id vara varb | |------------------------------| 1. | 12jan2010 5 21 42 | 2. | 12jan2010 6 41 17 | 3. | 15jan2010 10 7 68 | +------------------------------+
Comments
Post a Comment