C++ Tokenizer Complexity vs strtok_r -


i'm making question because moved tokenizer strtok_r equivalent version in c++. have use strtok_r in place of strtok, because have 2 nested tokenizations perform of time.

the strtok_r algorithm this:

char *end_token, *token, *word ; // fill 'word' token = strtok_r (word, " ", &end_token) ; while (token != null) {   //   token = strtok_r (null, " ", &end_token) ; } 

and c++ version (taken post here):

string mystring, token ; size_t next_token ; // fill 'mystring' while (token != mystring) {     next_token = mystring.find_first_of (" ") ;     token = mystring.substr (0, next_token) ;     mystring = mystring.substr (next_token + 1) ;     // } 

now question: why c++ version heavy respect c version? long strings have wait 10 seconds c++ version, while c version instantaneous same strings. so, seems c++ version has higher complexity... think about?

strtok() modifies string, replacing token delimiter null terminator. if long string has n tokens, function iterates through string changing n chars null, extremely fast.

in c++ alternative making 2*n string copies, means potentially 2*n allocation operations, plus sheer copy of (very long) remaining string, heavier first alternative. difference you're not obliged change original string.

you improve keeping string you're iterating trough unchanged , example use offsets search:

string mystring, token ; size_t cur_token=0, next_token ; // fill 'mystring' {     next_token = mystring.find_first_of (" ", cur_token) ;     token = mystring.substr (cur_token, next_token-cur_token);  // next_token-(nex_token==string::npos ? 0:cur_token) cleaner     if (next_token!=string::npos)          cur_token = next_token+1;      // token; } while (next_token!=string::npos); 

live demo


Comments

Popular posts from this blog

html - Firefox flex bug applied to buttons? -

html - Missing border-right in select on Firefox -

python - build a suggestions list using fuzzywuzzy -