Russian in the console

I am learning C++ from Stroustrup's book, and Russian characters are not displayed correctly in the console.
Here is the code:

#include <iostream>
#include <string>
using namespace std;
int main()
{
  setlocale(LC_ALL, "Russian");
  string previous = "";
  string current;
  while (cin >> current)
  {
    if (previous == current)
    {
      cout << "Duplicate word:" << current << endl;
    }
    previous = current;
  }
  cin.get();
  return 0;
}

The "Duplicate word:" label is displayed fine thanks to setlocale.
What follows it is gibberish (mojibake), although the repeated words are detected correctly.
I tried different setlocale arguments: (0, ""), "", "Rus", and so on.

In Code::Blocks everything works without any gibberish, even without setlocale.


Answer 1, authority 100%

There are many solutions to this problem. If you need a quick fix that does not have to be universal and does not take much effort, scroll down to "Less correct, but suitable solutions".

Correct but difficult solution

To begin with, the problem with the Windows console is that its default fonts cannot display all characters. Change the console font to a Unicode one; this works even on English Windows. To change the font only for your program, click the icon in the upper-left corner of its console window → Properties → Font. To change it for all future programs, do the same but choose Defaults instead of Properties.

Lucida Console and Consolas cope with everything except East Asian (CJK) characters. If your console font allows it, you can output those as well; if not, only the characters the font supports will be displayed.

The rest of this answer applies only to Microsoft Visual Studio. If you have a different compiler, use the suggested methods at your own risk; there is no guarantee they will work.

Now, the encoding of the compiler's input files. The Microsoft Visual Studio compiler (at least the 2012 and 2013 versions) treats source files in single-byte encodings as if they were in the ANSI encoding, which on a Russian system is CP1251. This means that saving sources in CP866 is incorrect. (This matters if you use L"..." strings.) On the other hand, if you store the sources in CP1251, the same sources will not build correctly on non-Russian Windows. It is therefore worth keeping the sources in Unicode (for example, UTF-8).
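As a side note to the point about source encodings, here is a minimal sketch of one way to make a wide literal independent of how the file is saved: spell the non-ASCII characters as universal character names (\uXXXX). The answer itself does not prescribe this, and the names kHello and hello_length are purely illustrative.

```cpp
#include <cstddef>
#include <cwchar>

// The Russian word for "hello", written with universal character names.
// The literal consists of ASCII characters only, so the encoding of the
// source file no longer affects it.
// (kHello and hello_length are hypothetical names used for illustration.)
const wchar_t kHello[] = L"\u041F\u0440\u0438\u0432\u0435\u0442";

// Number of wide characters in the literal, excluding the terminator.
std::size_t hello_length()
{
    return std::wcslen(kHello);
}
```

The trade-off is readability: such literals are robust but hard to read, so this is mostly useful for a handful of strings rather than for a whole program.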

Having set up the environment, let’s move on to solving the actual problem.

The correct solution is to get away from single-byte encodings and use Unicode in the program. You then get correct output not only of Cyrillic but of all languages (characters missing from the font will not be drawn, but you will still be able to work with them). On Windows this means moving from narrow strings (char*, std::string) to wide ones (wchar_t*, std::wstring) and using the UTF-16 encoding for strings.

(Wide strings solve another problem as well: at compile time, narrow strings are converted to a single-byte encoding using the current system code page, that is, the ANSI encoding. If you compile your program on English Windows, this leads to obvious problems.)

To switch the console mode you need _setmode(_fileno(...), _O_U16TEXT);

#include <cstdio>
#include <iostream>
#include <io.h>
#include <fcntl.h>
int wmain(int argc, wchar_t* argv[])
{
  // switch all three standard streams to UTF-16 text mode
  _setmode(_fileno(stdout), _O_U16TEXT);
  _setmode(_fileno(stdin), _O_U16TEXT);
  _setmode(_fileno(stderr), _O_U16TEXT);
  std::wcout << L"Unicode - English - Русский - Ελληνικά - Español." << std::endl;
  // or
  wprintf(L"%s", L"Unicode - English - Русский - Ελληνικά - Español.\n");
  return 0;
}

This should work correctly with input and output, filenames, and stream redirection.

Important note: I/O streams are either "wide" or "narrow", that is, they output either only char* or only wchar_t*. Switching after the first output is not always possible. Therefore, this code:

cout << 5;        // or printf("%d", 5);
wcout << L"hello"; // or wprintf(L"%s", L"hello");

may well not work. Use only wprintf/wcout.
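The rule behind this note is what the C standard calls stream orientation: the first I/O operation on a stream fixes it as wide or narrow. The sketch below illustrates that rule with the standard function std::fwide on temporary files. The helper names are hypothetical, and as far as I know Microsoft's runtime documents its fwide as unimplemented, so treat this as an illustration of the standard behaviour rather than a diagnostic for Visual Studio.

```cpp
#include <cstdio>
#include <cwchar>

// Reports the orientation of a C stream without changing it:
// > 0 means wide, < 0 means narrow (byte), 0 means not yet decided.
int orientation(std::FILE* stream)
{
    return std::fwide(stream, 0);
}

// Performs a wide write as the first operation on a fresh temporary
// stream and returns the orientation the stream ends up with.
int orientation_after_wide_write()
{
    std::FILE* f = std::tmpfile();
    if (!f) return 0;               // could not create a temporary file
    std::fputws(L"hello", f);       // first operation is wide
    int result = orientation(f);
    std::fclose(f);
    return result;
}

// Same as above, but the first operation is a narrow write.
int orientation_after_narrow_write()
{
    std::FILE* f = std::tmpfile();
    if (!f) return 0;               // could not create a temporary file
    std::fputs("5", f);             // first operation is narrow
    int result = orientation(f);
    std::fclose(f);
    return result;
}
```

On a conforming implementation the first helper returns a positive value and the second a negative one, which is why mixing cout and wcout on the same underlying stream is unreliable.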


If you really do not want to switch to Unicode and prefer a single-byte encoding, expect problems. To begin with, characters outside the selected encoding will not work (CP1251, for example, covers only basic Latin and Cyrillic): gibberish will be read and displayed in their place. In addition, narrow string constants are ANSI-encoded, which means that Cyrillic string literals will not work on a non-Russian system (they turn into gibberish that depends on the system locale). With these issues in mind, we move on to the next series of solutions.

Less correct, but suitable solutions

In any case, set a Unicode font in the console. (This is the first step of the "difficult" solution.)

Make sure your sources are in the CP1251 encoding (this is not a given, especially if your Windows locale is not Russian). If Visual Studio complains, when you add Russian letters and save, that it cannot save the characters in the current encoding, choose CP1251.

(1) If the computer is yours, you can change the code page used by console programs on your system. To do this:

  1. Start Regedit.
  2. To be on the safe side, export the registry somewhere (for some reason everyone skips this step, so when everything breaks, remember that we warned you).
  3. Under the HKEY_CURRENT_USER\Console key, find the CodePage value (if it does not exist, create a DWORD value with this name).
  4. Open the value for editing (Modify, Base = Decimal) and set it to 1251.
  5. Do not forget to reboot after changing the registry.

Advantages of this method: examples from books start working out of the box. Disadvantages: editing the registry can cause problems, and the console encoding changes globally and permanently, which can break other programs. In addition, the effect is limited to your computer (and others that happen to have the same console encoding), and the problems common to all non-Unicode methods still apply.

Note: setting the global console code page via the HKEY_CURRENT_USER\Console\CodePage registry value does not work on Windows 10; the OEM code page is used instead, presumably because of a bug in conhost. However, setting the console code page at the application-specific level (HKEY_CURRENT_USER\Console\(application path)\CodePage) works.

(2) You can change the encoding for your program only. To do this, change the console encoding programmatically. As a courtesy to other programs, do not forget to restore the original encoding afterwards!

This is done either by calling the functions

SetConsoleCP(1251);
SetConsoleOutputCP(1251);

at the beginning of the program, or by calling an external utility

system("chcp 1251");

(That is, you should end up with something like

#include <cstdlib>
int main(int argc, char* argv[])
{
  std::system("chcp 1251");
  ...

or

#include <Windows.h>
int main(int argc, char* argv[])
{
  SetConsoleCP(1251);
  SetConsoleOutputCP(1251);
  ...

and then the ordinary program code.)

You can wrap these calls in a class to take advantage of the automatic lifetime management of C++ objects.

Example:

#include <cstdlib>
#include <string>
int chcp(unsigned codepage)
{
  // compose the command from pieces (note the space after "chcp")
  std::string command = "chcp " + std::to_string(codepage);
  // run the command; std::system returns 0 on success, so negate it
  return !std::system(command.c_str());
}
// this code will run before main
static int codepage_is_set = chcp(1251);

(If you are doing an exercise from Stroustrup's book, you can put this at the end of the std_lib_facilities.h header file.)

Or like this:

#include <iostream>
#include <windows.h>
class ConsoleCP
{
  UINT oldin;
  UINT oldout;
public:
  explicit ConsoleCP(UINT cp)
  {
    // remember the current code pages so they can be restored later
    oldin = GetConsoleCP();
    oldout = GetConsoleOutputCP();
    SetConsoleCP(cp);
    SetConsoleOutputCP(cp);
  }
  // since we changed the properties of an external object (the console), we
  // must put everything back as it was (if the program crashes, the user is out of luck)
  ~ConsoleCP()
  {
    SetConsoleCP(oldin);
    SetConsoleOutputCP(oldout);
  }
};
// and in the program:
int main(int argc, char* argv[])
{
  ConsoleCP cp(1251);
  std::cout << "Russian text" << std::endl;
  return 0;
}

If you need some other language instead of Russian, just replace 1251 with the identifier of the required code page (see Code Page Identifiers in the sources below), but of course there is no guarantee that it will work.

A few other methods are also common; we list them for completeness.

Techniques that don’t work well (but might help you)

A commonly recommended method is to use the setlocale(LC_ALL, "Russian"); call. This option (at least in Visual Studio 2012) has a mountain of problems. First, there is a problem with entering Russian text: the entered text is passed to the program incorrectly! Non-Russian text (for example, Greek) cannot be entered from the console at all. It also has the problems common to all non-Unicode solutions.

Another non-Unicode method is to use the CharToOem and OemToChar functions. This method requires re-encoding every string that is output and (it seems) is hard to automate. It also suffers from the disadvantages common to non-Unicode solutions. In addition, it will not work on non-Russian Windows (not only with constants, but with runtime strings too!), since the OEM encoding there is not CP866. Finally, these functions are not shipped with all versions of Visual Studio; some versions of VS Express, for example, simply do not have them.
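To make the per-string re-encoding concrete, here is a minimal hand-written sketch of the CP1251 to CP866 mapping for Russian letters, which is the kind of conversion CharToOem performs on a Russian system. It is not the Windows API: the function names are hypothetical, and passing every other byte through unchanged is an assumption of the sketch.

```cpp
#include <string>

// Re-encodes one byte from CP1251 to CP866.
// Only ASCII and the Russian letters are handled; any other byte is
// returned unchanged (an assumption of this sketch, not API behaviour).
unsigned char cp1251_to_cp866_byte(unsigned char c)
{
    if (c >= 0xC0 && c <= 0xEF)                    // А..Я and а..п
        return static_cast<unsigned char>(c - 0x40);
    if (c >= 0xF0)                                 // р..я
        return static_cast<unsigned char>(c - 0x10);
    if (c == 0xA8) return 0xF0;                    // Ё
    if (c == 0xB8) return 0xF1;                    // ё
    return c;
}

// Re-encodes a whole CP1251 string; this has to be done for every
// string before it is written to a CP866 console.
std::string cp1251_to_cp866(const std::string& text)
{
    std::string out = text;
    for (char& ch : out)
        ch = static_cast<char>(
            cp1251_to_cp866_byte(static_cast<unsigned char>(ch)));
    return out;
}
```

The same table also explains the gibberish in the question: a console that expects CP866 interprets unconverted CP1251 bytes as different characters.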


Sources:

  1. How to display and enter data like wchar_t[]?
    • unfortunately, the author of that question used the MinGW compiler for Cygwin on WinXP, which makes most of the modern solutions inapplicable.
  2. Output unicode strings in Windows console app
  3. Conventional wisdom is retarded, aka What the @#%&* is _O_U16TEXT?
  4. What's the difference between printf("%s"), printf("%ls"), wprintf("%s"), and wprintf("%ls")?
  5. Russian language in source code in Dev C++
  6. Code Page Identifiers

Answer 2, authority 5%

Therefore, it is worth keeping the sources in Unicode (for example, UTF-8).

And they should be saved with a signature (BOM):

The situation is partially saved by re-saving the sources in UTF-8 with a mandatory BOM, without which Visual Studio begins to interpret "wide" strings containing Cyrillic in a very peculiar way. By specifying the UTF-8 BOM (Byte Order Mark), a character encoded as the three bytes 0xEF, 0xBB and 0xBF, we get the UTF-8 encoding recognized on any system.

(Quoted from: Writing in Russian in code)


Answer 3, authority 5%

Something worth clarifying for those looking for the correct answer about the setlocale function:

A commonly recommended method is to use the setlocale(LC_ALL, "Russian"); call. This option (at least in Visual Studio 2012) has a mountain of problems. First, there is a problem with entering Russian text: the entered text is passed to the program incorrectly! Non-Russian text (for example, Greek) cannot be entered from the console at all. It also has the problems common to all non-Unicode solutions.

I will add more information on this method: the way it is usually recommended is itself incorrect!

First: as its second parameter the function takes not the name of a country or language (although that works in some cases) but a locale identifier built from an ISO 639 language code and an ISO 3166-1 country code. The correct value is therefore "ru-RU".
Second: the documentation for this function states in black and white: "If execution is allowed to continue, the function sets errno to EINVAL and returns NULL." In other words, when an error occurs, the function sets errno to EINVAL and returns NULL.

In the event of an error errno will always be EINVAL, which means "invalid argument". There is therefore no point in checking errno, but the return value of the function must be checked. The correct call to setlocale looks like this:

if (setlocale(LC_ALL, "ru-RU") == NULL) {
    cout << "Failed to set locale ru-RU." << endl;
    return -1;
    // or force code page 1251 via SetConsoleCP (there is an example above),
    // and do not forget to check the result of SetConsoleCP:
    // if it fails, look at the error code via GetLastError.
}

Do not forget that setlocale selects the code table for the ANSI encoding only, so Greek, Spanish, Chinese, or Japanese characters will not be displayed.
For Russian this is code page 1251.

Importantly, this function is more reliable than setting the code page directly via SetConsoleCP because it switches all the locale-dependent settings for the language, from the date format to the separator characters.

You should also not specify the language as just "ru": depending on the build of the OS and the installed language packs, this may select ru-BY, ru-UA, ru-MO, or other variants that differ significantly from ru-RU. And you absolutely must not specify "Russia", "Russian", or "Russian Federation" (yes, I have seen such abominations a couple of times). Although the function also looks up the region by name, the name is not always present in the localization table, or it may be recorded there as "Russia" or "Russian" in Cyrillic. This is the main mistake that causes setlocale to fail.

Finally, an application running in Unicode mode should use the _wsetlocale function. It is otherwise identical and sets the same basic localization settings. Moreover, if the application project in Visual Studio is configured for Unicode, only _wsetlocale will work, since setlocale, according to the documentation, is not adapted to work with Unicode at all.

UPD.

I completely forgot to mention that setlocale and _wsetlocale return the locale identifier on success. In our case that is the string "ru_RU\0".
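The return value can be wrapped in two small helpers, sketched below with the standard std::setlocale only. The names try_set_locale and current_locale are hypothetical, and which locale names are accepted (and exactly what string comes back) depends on the system and runtime, so the only name that can be relied on everywhere is "C".

```cpp
#include <clocale>
#include <string>

// Tries to switch the whole C locale to `name`. Returns the identifier
// reported by setlocale, or an empty string if the locale is unavailable
// (setlocale returned a null pointer and left the locale unchanged).
std::string try_set_locale(const char* name)
{
    const char* result = std::setlocale(LC_ALL, name);
    return result ? std::string(result) : std::string();
}

// Returns the current locale identifier without changing anything:
// a null second argument turns setlocale into a pure query.
std::string current_locale()
{
    const char* result = std::setlocale(LC_ALL, nullptr);
    return result ? std::string(result) : std::string();
}
```

A program could call try_set_locale("ru-RU"), treat an empty result as failure, and log the returned identifier otherwise, which makes it easy to see which locale was actually selected.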


Answer 4, authority 4%

For CLion:

Cygwin: during installation, at the package selection step, find and select cmake, GDB, and the other packages that are recommended for installation.

CLion: File → Settings → Editor → File Encodings: set IDE Encoding, Project Encoding, and the encoding of main.cpp (your program's file) to UTF-8, and set Default encoding for properties files to IBM866.

At the bottom of the editor window, select UTF-8.

Include the Windows.h header file in the project and call

SetConsoleCP(866); SetConsoleOutputCP(866);

Answer 5

You can also use this method:

system("chcp 1251");
printf("The beginning of the program.\n");

The source file just has to be saved in the CP1251 encoding.
