C++ Tokenizer Complexity vs strtok_r
I'm asking this question because I moved a tokenizer from strtok_r to an equivalent C++ version. I have to use strtok_r in place of strtok because, most of the time, I have two nested tokenizations to perform.
The strtok_r algorithm is this:

```c
char *end_token, *token, *word;
// fill 'word'
token = strtok_r(word, " ", &end_token);
while (token != NULL) {
    // process token
    token = strtok_r(NULL, " ", &end_token);
}
```
And this is the C++ version (taken from a post here):
```cpp
string mystring, token;
size_t next_token;
// fill 'mystring'
while (token != mystring) {
    next_token = mystring.find_first_of(" ");
    token = mystring.substr(0, next_token);
    mystring = mystring.substr(next_token + 1);
    // process token
}
```
Now the question: why is the C++ version so heavy compared to the C version? For long strings I have to wait 10 seconds with the C++ version, while the C version is instantaneous on the same strings. So it seems the C++ version has a higher complexity... What do you think about it?
strtok() modifies the string, replacing each token delimiter with a null terminator. If your long string has n tokens, the function iterates through the string changing n chars to null, which is extremely fast.
Your C++ alternative is making 2*n string copies, which means potentially 2*n allocation operations, plus the sheer cost of copying the (very long) remaining string on every iteration, which is much heavier than the first alternative. The difference is that you're not obliged to change the original string.
You can improve this by keeping the string you're iterating through unchanged, and using offsets for the search instead, for example:
```cpp
string mystring, token;
size_t cur_token = 0, next_token;
// fill 'mystring'
do {
    next_token = mystring.find_first_of(" ", cur_token);
    token = mystring.substr(cur_token, next_token - cur_token);
    // next_token - (next_token == string::npos ? 0 : cur_token) would be cleaner
    if (next_token != string::npos)
        cur_token = next_token + 1;
    // process token
} while (next_token != string::npos);
```