stata - How to collapse numbers with same identifier but different date, but preserve the date of first observation for each identifier -


i have dataset can simplified in following format:

clear input str9 date id vara varb "12jan2010" 5 21 42 "12jan2010" 6 47 21 "15jan2010" 10 7 68 "17jan2010" 6 -5 -3 "19jan2010" 6 -1 -1 end 

in dataset, there date, id, vara, , varb. each id represents unique set of transactions. want collapse (sum) vara varb, by(date) in stata. however, want keep date of first observation each id number.

essentially, want above dataset become following:

+--------------------------------+ |    date     id   var1   var2   | |--------------------------------| | 12jan2010    5     21     42   | | 12jan2010    6     41     17   | | 15jan2010    10     7     68   | +--------------------------------+ 

12jan2010 17jan2010 , 19jan2010 have same id, want collapse (sum) var1 var2 these 3 observations. want keep date 12jan2010 because date first observation. other 2 observations dropped.

i know might possible collapse id first , merge original dataset , subset. wondering if there easier way make work. thanks!

collapse allows calculate variety of statistics, can convert string date numerical date, take minimum of numerical date first occurrence.

clear input str9 date id vara varb "12jan2010" 5 21 42 "12jan2010" 6 47 21 "15jan2010" 10 7 68 "17jan2010" 6 -5 -3 "19jan2010" 6 -1 -1 end  gen date2 = date(date, "dmy") format date2 %td collapse (sum) vara varb (min) date2 , by(id) order date2, first li 

yielding

     +------------------------------+      |     date2   id   vara   varb |      |------------------------------|   1. | 12jan2010    5     21     42 |   2. | 12jan2010    6     41     17 |   3. | 15jan2010   10      7     68 |      +------------------------------+ 

in response comment: can generate formatted date observations vara > 0 (and not missing). (assuming that, per comment, vara & varb have same sign.)

// assume id 6 has earliest date of 17jan2005 (obs.4) // want return 'first date'  // first date vara & varb both positive clear input str9 date id vara varb "12jan2010" 5 21 42 "12jan2010" 6 47 21 "15jan2010" 10 7 68 "17jan2005" 6 -5 -3 "19jan2010" 6 -1 -1 end  gen date2 = date(date, "dmy") if vara > 0 & !missing(vara) format date2 %td collapse (sum) vara varb (min) date2 , by(id) order date2, first li 

yielding

     +------------------------------+      |     date2   id   vara   varb |      |------------------------------|   1. | 12jan2010    5     21     42 |   2. | 12jan2010    6     41     17 |   3. | 15jan2010   10      7     68 |      +------------------------------+ 

Comments

Popular posts from this blog

html - Firefox flex bug applied to buttons? -

html - Missing border-right in select on Firefox -

python - build a suggestions list using fuzzywuzzy -