Sometimes we have to compare strings in a case-insensitive manner. For example, you might want 'abc' and 'ABC' to be considered equal. This is a well-defined problem for ASCII strings. In C/C++, there are basically two common approaches. You can do whole-string comparisons:

```c
#include <strings.h> // POSIX header that declares strncasecmp
bool isequal = (strncasecmp(string1, string2, N) == 0);
```

Or you can do character-by-character comparisons, mapping each and every character to its lower-case version and comparing that:

```c
#include <ctype.h>
bool is_the_same = true;
for (size_t i = 0; i < N; i++) {
  // tolower expects an unsigned char value (or EOF), hence the casts
  if (tolower((unsigned char)string1[i]) != tolower((unsigned char)string2[i])) {
    is_the_same = false;
    break;
  }
}
```

Intuitively, the second version is worse because it requires more code. You might also expect it to be slower. Just how much slower? I wrote a quick benchmark to check:

| function | Linux/GNU GCC | macOS/LLVM |
|---|---|---|
| strncasecmp | 0.15 ns/byte | 1 ns/byte |
| tolower | 4.5 ns/byte | 4.0 ns/byte |

I got these results with GNU GCC under Linux, and with LLVM on a different machine running macOS.
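The author's benchmark code is not reproduced here. The following is only a rough sketch of how such an ns/byte measurement could be set up, assuming a POSIX system where `strncasecmp` and `clock_gettime` are available. The names `equal_tolower`, `equal_strncasecmp` and `ns_per_byte` are hypothetical. A careful benchmark would also repeat runs, warm up caches, and check that the compiler has not hoisted the calls out of the loop.

```c
#define _POSIX_C_SOURCE 200809L
#include <ctype.h>
#include <stdbool.h>
#include <stddef.h>
#include <strings.h>
#include <time.h>

typedef bool (*equal_fn)(const char *, const char *, size_t);

/* Character-by-character comparison through tolower (hypothetical helper). */
static bool equal_tolower(const char *a, const char *b, size_t n) {
  for (size_t i = 0; i < n; i++) {
    if (tolower((unsigned char)a[i]) != tolower((unsigned char)b[i])) return false;
  }
  return true;
}

/* Whole-string comparison delegated to the C library (hypothetical helper). */
static bool equal_strncasecmp(const char *a, const char *b, size_t n) {
  return strncasecmp(a, b, n) == 0;
}

/* Average nanoseconds per byte over `reps` calls of eq(a, b, n); n must be > 0. */
static double ns_per_byte(equal_fn eq, const char *a, const char *b, size_t n, int reps) {
  struct timespec start, end;
  volatile bool sink = false; /* volatile store discourages discarding the calls */
  clock_gettime(CLOCK_MONOTONIC, &start);
  for (int r = 0; r < reps; r++) sink = eq(a, b, n);
  clock_gettime(CLOCK_MONOTONIC, &end);
  (void)sink;
  double elapsed = (double)(end.tv_sec - start.tv_sec) * 1e9
                 + (double)(end.tv_nsec - start.tv_nsec);
  return elapsed / ((double)n * (double)reps);
}
```

To use it, fill two large buffers with the same letters in different cases (for instance "abc…" and "ABC…"). Then compare `ns_per_byte(equal_strncasecmp, a, b, n, 100)` with `ns_per_byte(equal_tolower, a, b, n, 100)`. The absolute numbers depend on the machine, compiler flags, and C library.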

So for sizeable strings, the character-by-character approach might be 4 to 40 times slower! Results will vary depending on your standard library and even on the time of day. However, in all my tests, strncasecmp is always substantially faster. (Note: Microsoft provides similar functions under different names; see _strnicmp, for example.)
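If the same code has to build with both a POSIX C library and Microsoft's CRT, a small shim can pick the right function at compile time. This is a minimal sketch that assumes `_MSC_VER` is a good enough test for "Microsoft toolchain". `portable_strncasecmp` and `equal_ignoring_case` are hypothetical names, not library APIs. `_strnicmp` takes the same `(s1, s2, count)` arguments as `strncasecmp`.

```c
#include <stdbool.h>
#include <stddef.h>

#ifdef _MSC_VER
#include <string.h>   /* Microsoft's CRT declares _strnicmp here */
#define portable_strncasecmp _strnicmp
#else
#include <strings.h>  /* POSIX declares strncasecmp here */
#define portable_strncasecmp strncasecmp
#endif

/* Returns true when the first n characters of a and b match, ignoring case. */
static bool equal_ignoring_case(const char *a, const char *b, size_t n) {
  return portable_strncasecmp(a, b, n) == 0;
}
```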

Could you go faster? I tried rolling my own, and it runs at about 0.3 ns/byte. So it is much faster than the standard library under macOS, but slower under Linux. I suspect that the standard library under Linux relies on a vectorized implementation, which might explain how it beats me by a factor of two.
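The author's own routine is not shown in this post. As an illustration only, here is one common way to hand-roll an ASCII-only comparison: replace `tolower` (which may consult the locale) with a branch-free test for 'A'..'Z' that sets the case bit 0x20. The names `ascii_lower` and `ascii_equal_ignoring_case` are hypothetical, and the code has not been benchmarked. Unlike `strncasecmp`, it does not stop at a NUL byte, so both buffers must hold at least n bytes.

```c
#include <stdbool.h>
#include <stddef.h>

/* Maps 'A'..'Z' to 'a'..'z' and leaves every other byte unchanged (ASCII only). */
static inline unsigned char ascii_lower(unsigned char c) {
  /* (unsigned)(c - 'A') < 26 holds exactly for 'A'..'Z'; shifting 1 by 5 gives 0x20. */
  return (unsigned char)(c | (((unsigned)(c - 'A') < 26u) << 5));
}

/* Returns true when the first n bytes of a and b are equal, ignoring ASCII case. */
static bool ascii_equal_ignoring_case(const char *a, const char *b, size_t n) {
  for (size_t i = 0; i < n; i++) {
    if (ascii_lower((unsigned char)a[i]) != ascii_lower((unsigned char)b[i])) {
      return false;
    }
  }
  return true;
}
```

The range check matters. Simply OR-ing every byte with 0x20 would wrongly treat '@' and '`' as equal.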

I bet that if we use vectorization, we can beat the standard libraries.

My code is available.
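To give a flavor of what "processing several bytes at once" means, here is a portable SWAR ("SIMD within a register") sketch that lower-cases and compares 8 bytes per step in a `uint64_t`. This is a stand-in for real vector instructions such as SSE2, AVX2 or NEON, which would handle 16 or more bytes per step. It is not the author's code, it has not been benchmarked, and there is no claim that it beats the standard library. The names `swar_lower` and `swar_equal_ignoring_case` are hypothetical. Like the scalar loop, it reads exactly n bytes and ignores NUL terminators.

```c
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* Lower-cases every ASCII 'A'..'Z' byte in x; all other bytes are unchanged. */
static uint64_t swar_lower(uint64_t x) {
  const uint64_t ones = UINT64_C(0x0101010101010101);
  const uint64_t high = UINT64_C(0x8080808080808080);
  uint64_t low7 = x & ~high;                    /* low 7 bits of each byte */
  uint64_t gt_Z = low7 + (0x7F - 'Z') * ones;   /* high bit set where low7 > 'Z' */
  uint64_t ge_A = low7 + (0x80 - 'A') * ones;   /* high bit set where low7 >= 'A' */
  uint64_t upper = (ge_A ^ gt_Z) & ~x & high;   /* in 'A'..'Z' and original byte < 0x80 */
  return x | (upper >> 2);                      /* 0x80 >> 2 == 0x20, the case bit */
}

/* Compares n bytes ignoring ASCII case, 8 bytes at a time plus a scalar tail. */
static bool swar_equal_ignoring_case(const char *a, const char *b, size_t n) {
  size_t i = 0;
  for (; i + 8 <= n; i += 8) {
    uint64_t wa, wb;
    memcpy(&wa, a + i, 8); /* memcpy avoids unaligned and aliasing issues */
    memcpy(&wb, b + i, 8);
    if (swar_lower(wa) != swar_lower(wb)) return false;
  }
  for (; i < n; i++) { /* remaining 0..7 bytes, one at a time */
    uint64_t ca = (unsigned char)a[i], cb = (unsigned char)b[i];
    if (swar_lower(ca) != swar_lower(cb)) return false;
  }
  return true;
}
```

Byte order does not matter here, because the lower-casing is done per byte and the two words are only compared for equality.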

seebs says:

Long ago, in the hot path of a performance-critical application, I found a case-insensitive string compare that worked by calling strdup on both strings, then case-smashing them, then comparing the results, then freeing the strings. The usual use case was iterating through a big list of strings, comparing a reference string to each of them as a table lookup.

At the time, I thought per-character comparison would be better. (There wasn't yet a library function for this in that environment.) I think it might have been, given the allocation overhead. These days, I'd have the objects store a smashed-case version of their strings and compare those.
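The last idea in this comment (store a case-smashed copy once, then compare with a plain byte comparison) can be sketched as follows. The `struct entry` layout and the names `fold_case` and `find_entry` are hypothetical, chosen only for illustration. The sketch uses `tolower`, so results depend on the current locale for non-ASCII bytes. It still allocates once per lookup to fold the query; a caller-supplied buffer would avoid that.

```c
#include <ctype.h>
#include <stdbool.h>
#include <stdlib.h>
#include <string.h>

/* Hypothetical table entry that keeps a lower-cased copy next to the original. */
struct entry {
  const char *name;  /* original spelling, e.g. for display */
  char *folded;      /* lower-cased copy, computed once when the entry is built */
};

/* Returns a newly allocated lower-cased copy of s, or NULL if malloc fails. */
static char *fold_case(const char *s) {
  size_t len = strlen(s);
  char *out = malloc(len + 1);
  if (out == NULL) return NULL;
  for (size_t i = 0; i <= len; i++) { /* <= also copies the terminating NUL */
    out[i] = (char)tolower((unsigned char)s[i]);
  }
  return out;
}

/* Linear lookup: fold the query once, then use plain strcmp against each entry. */
static const struct entry *find_entry(const struct entry *table, size_t count,
                                      const char *query) {
  char *q = fold_case(query);
  if (q == NULL) return NULL;
  const struct entry *found = NULL;
  for (size_t i = 0; i < count; i++) {
    if (strcmp(table[i].folded, q) == 0) {
      found = &table[i];
      break;
    }
  }
  free(q);
  return found;
}
```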