How to read and display extended ASCII symbols with Windows.h

I’m working on a console game, which uses ASCII symbols as pixels. The map for this game is stored in a .txt file:

████████████████
█              █
█              █
█              █
█              █
█              █
█              █
█              █
█              █
█              █
█              █
█              █
█              █
█              █
█              █
████████████████

To display the map I’m reading it from a file demo.txt line by line and writing each character to CHAR_INFO *screen:

void setScreen(const char* layoutFile, const char* levelDataFile) {
        std::ifstream levelData(levelDataFile);
        levelData >> width >> height;
        field = {0, 0, (SHORT)width, (SHORT)height};
        screen = new CHAR_INFO[width * height];
        levelData.close();

        std::ifstream layout(layoutFile);                            //reading from a file `demo.txt`
        std::string line;

        for (int j = 0; j < height; j++) {
            getline(layout, line);
            for(int i = 0; i < width; i++) {
                screen[j * width + i].Char.AsciiChar = line[i];      //writing each character of a line to screen
                screen[j * width + i].Attributes = BACKGROUND_GREEN;
            }
        }
        layout.close();
    }

After that I’m displaying the map using the following function (map.getScreen() returns a pointer to the screen array):

WriteConsoleOutputA(
            console.getHOut(),
            map.getScreen(),
            { (SHORT)map.getWidth(), (SHORT)map.getHeight() },
            { 0,0 },
            &map.getField()
        );

But the problem is that the █ characters are displayed as �, and the output looks like this:

����������������
���
���
���
���
���
���
���
���
���
���
���
���
���
���
����������������

Some things I tried:

  • SetConsoleOutputCP(CP_UTF8); SetConsoleCP(CP_UTF8);
  • SetConsoleOutputCP(1251);
  • setlocale(LC_ALL, "");
  • std::locale::global(std::locale(std::locale::empty(), new std::codecvt_utf8<wchar_t>));

Answer

From a comment:

File is encoded in UTF-8.

That makes a huge difference. Not only are you not dealing with ASCII characters (values up to 127), but you are not even dealing with extended ASCII characters (values up to 255). You are dealing with Unicode, in particular character number 9608 (a.k.a. U+2588). That goes well beyond what a single char can represent. And yet, you are storing a single char when you assign from line[i].

The UTF-8 representation for '█' consists of three bytes: 0xE2, 0x96, and 0x88. This is why your output shows three “unknown character” symbols on the left side of your board and none on the right. Those “unknown character” symbols come from the three bytes of one UTF-8 character. Then you would have width-2 spaces followed by three more “unknown characters”, except you stopped copying characters after width-3 spaces. So you never encounter the “true” right border of your board. (Check the length of line and compare it to width – for the middle rows, you should see that line.size() is width+4. For the first and last rows, you should see that line.size() is 3*width.)
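The byte counts above are easy to check. The following is a minimal sketch, not code from the question: makeBorderRow and makeMiddleRow are hypothetical helpers that build rows the way the UTF-8 file stores them, with the full block written out as its three bytes.

```cpp
#include <cassert>
#include <cstddef>
#include <string>

// UTF-8 encoding of U+2588 (full block): three bytes.
const std::string kFullBlock = "\xE2\x96\x88";

// Hypothetical helper: a top/bottom row of `width` full blocks,
// as it would be stored in a UTF-8 encoded file.
std::string makeBorderRow(std::size_t width) {
    std::string row;
    for (std::size_t i = 0; i < width; ++i) {
        row += kFullBlock;
    }
    return row;
}

// Hypothetical helper: one block, width-2 spaces, one block.
std::string makeMiddleRow(std::size_t width) {
    assert(width >= 2);  // a row needs room for both borders
    return kFullBlock + std::string(width - 2, ' ') + kFullBlock;
}
```

For the 16-column map in the question, makeBorderRow(16).size() is 48 and makeMiddleRow(16).size() is 20, so a loop that copies line[i] for i < 16 only ever sees the first 16 bytes of each row.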

Part of the solution is to use Char.UnicodeChar instead of Char.AsciiChar. However, UnicodeChar is a WCHAR, which is only two bytes on Windows, so it cannot hold the three-byte UTF-8 encoding. It expects a UTF-16 code unit, so you have to convert to UTF-16. If you need only a few characters, a lookup table might serve as well as a general solution. Change the characters in your file to true ASCII characters by dictating equivalences. For example, maybe you can say that '#' represents a full block. This has the advantage of being a single byte, so your logic mostly works. All you would need to add is a translation function, something like

WCHAR convert(char c)
{
    switch ( c ) {
        case '#': return L'\x2588';  // Full block (█, U+2588)
        // Etc.
    }
    return c; // If no translation is needed
}

Then when storing your map data, you would call this translation, as in

screen[j * width + i].Char.UnicodeChar = convert(line[i]);

The last step is to make sure the console receives UTF-16: use WriteConsoleOutputW() instead of WriteConsoleOutputA(). The W version reads Char.UnicodeChar from each CHAR_INFO, while the A version reads Char.AsciiChar.
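If you would rather keep the UTF-8 file as it is, the general UTF-8 to UTF-16 conversion could look roughly like the sketch below. The function name utf8ToUtf16 is made up for illustration, and the decoder is deliberately simple: malformed input becomes U+FFFD, and it does not reject overlong encodings or encoded surrogates.

```cpp
#include <cstddef>
#include <string>

// Decode UTF-8 bytes into UTF-16 code units (hypothetical helper).
// Sketch only: malformed bytes are replaced by U+FFFD.
std::u16string utf8ToUtf16(const std::string& in) {
    const char16_t kReplacement = 0xFFFD;
    std::u16string out;
    std::size_t i = 0;
    while (i < in.size()) {
        const unsigned char lead = static_cast<unsigned char>(in[i]);
        char32_t cp = 0;
        std::size_t extra = 0;  // number of continuation bytes expected
        if (lead < 0x80)                { cp = lead;        extra = 0; }
        else if ((lead & 0xE0) == 0xC0) { cp = lead & 0x1F; extra = 1; }
        else if ((lead & 0xF0) == 0xE0) { cp = lead & 0x0F; extra = 2; }
        else if ((lead & 0xF8) == 0xF0) { cp = lead & 0x07; extra = 3; }
        else { out.push_back(kReplacement); ++i; continue; }

        if (i + extra >= in.size()) {  // sequence is cut off at the end
            out.push_back(kReplacement);
            break;
        }
        bool ok = true;
        for (std::size_t k = 1; k <= extra; ++k) {
            const unsigned char c = static_cast<unsigned char>(in[i + k]);
            if ((c & 0xC0) != 0x80) { ok = false; break; }
            cp = (cp << 6) | (c & 0x3F);
        }
        if (!ok) { out.push_back(kReplacement); ++i; continue; }
        i += extra + 1;

        if (cp < 0x10000) {
            out.push_back(static_cast<char16_t>(cp));
        } else if (cp <= 0x10FFFF) {
            cp -= 0x10000;  // encode as a surrogate pair
            out.push_back(static_cast<char16_t>(0xD800 + (cp >> 10)));
            out.push_back(static_cast<char16_t>(0xDC00 + (cp & 0x3FF)));
        } else {
            out.push_back(kReplacement);
        }
    }
    return out;
}
```

After converting each line this way, element i of the result corresponds to column i of the map, so the existing loop can index the converted string instead of the raw line and assign each element to Char.UnicodeChar. On Windows, the MultiByteToWideChar() API called with CP_UTF8 performs the same conversion and is probably the better choice in real code.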
