Issues cleaning a string only to include alphabet characters

I’m having some issues with a part of a C++ program I’m writing to encode and decode a Playfair cipher. The part of the program I’m having issues with is the first step (unfortunate, I know), which involves removing all whitespace, punctuation, and non-alphabet characters. This is what I have:

std::string step1(std::string plaintext)
{
    plaintext.erase(remove_if(plaintext.begin(), plaintext.end(),
        std::not1(std::ptr_fun(std::isalpha))));
    plaintext.erase(remove_if(plaintext.begin(), plaintext.end(),
        std::ptr_fun(std::ispunct)));
    return plaintext;
}

The output is good, except it adds characters at the end of the string once cleaned. Usually, the extra characters are copies of characters found at the end of the cleaned input. I can’t, for the life of me, figure out why this is happening. Any ideas?

Answer

std::remove/_if() does not actually remove anything, it just moves matching items to the end of the container, and then returns an iterator to the beginning of that range of items. The caller can then use that iterator to actually remove the items from the container.

You are passing that iterator to the overload of std::string::erase() that takes 1 iterator as input. Thus, it will erase only 1 character at most (if std::remove_if() did not find anything, it would return the string’s end() iterator, and calling erase() on that iterator is undefined behavior). If more than 1 character were “removed” to the end of the string, the remaining un-erased characters would be left behind. That is what you are seeing in your output.

To erase all of the “removed” characters, you need to use the overload of std::string::erase() that takes 2 iterators denoting a range, eg:

std::string step1(std::string plaintext)
{
    plaintext.erase(
        std::remove_if(plaintext.begin(), plaintext.end(),
            std::not1(std::ptr_fun(std::isalpha))
        ),
        plaintext.end()
    );
    plaintext.erase(
        std::remove_if(plaintext.begin(), plaintext.end(),
            std::ptr_fun(std::ispunct)
        ),
        plaintext.end()
    );
    return plaintext;
}

Note that removing all non-alpha characters using std:::isalpha() will include all whitespace and punctuation characters, thus your 2nd search using std::ispunct() will not find anything to remove, so you can just drop that search entirely, eg:

std::string step1(std::string plaintext)
{
    plaintext.erase(
        std::remove_if(plaintext.begin(), plaintext.end(),
            std::not1(std::ptr_fun(std::isalpha))
        ),
        plaintext.end()
    );
    return plaintext;
}

That being said, as noted in comments, std::not1() and std::ptr_fun are deprecated in modern C++. In C++11 and later, you can use a lambda instead:

std::string step1(std::string plaintext)
{
    plaintext.erase(
        std::remove_if(plaintext.begin(), plaintext.end(),
            [](unsigned char ch){ return !std::isalpha(ch); }
        ),
        plaintext.end()
    );
    return plaintext;
}

In C++20 and later, you can use std::erase_if():

std::string step1(std::string plaintext)
{
    std::erase_if(plaintext,
        [](unsigned char ch){ return !std::isalpha(ch); }
    );
    return plaintext;
}