Home c++ Cyrillic formatting in C++ when outputting to the terminal via cout

Cyrillic formatting in C++ when outputting to the terminal via cout

Author

Date

Category

In the book by R. Laforêt “OOP in C++” there is an example of a code in which the Cyrillic alphabet is present:

# include & lt; iostream & gt;
#include & lt; iomanip & gt;
using namespace std;
int main ()
{
  long pop1 = 8425785, pop2 = 47, pop3 = 9761;
  cout & lt; & lt; setw (9) & lt; & lt; "City" & lt; & lt; setw (12) & lt; & lt; "Population" & lt; & lt; endl
      & lt; & lt; setw (9) & lt; & lt; "Moscow" & lt; & lt; setw (12) & lt; & lt; pop1 & lt; & lt; endl
      & lt; & lt; setw (9) & lt; & lt; "Kirov" & lt; & lt; setw (12) & lt; & lt; pop2 & lt; & lt; endl
      & lt; & lt; setw (9) & lt; & lt; "Sullen" & lt; & lt; setw (12) & lt; & lt; pop3 & lt; & lt; endl;
  return 0;
}

The essence of the task is to display and format text in the form of a table. Since the example is written for using the MVS IDE, there should be no problems with output and formatting there (I think so, since it is indicated in the book).

The problem is that Cyrillic formatting does not work in Linux environment. Since the Cyrillic alphabet is already a multibyte encoding, it is not possible to format the text correctly using setw () . Is there any example or solution to this issue – Cyrillic formatting?

My result in terminal:


Answer 1, authority 100%

Unfortunately, there is no easy, portable way to do this.

The only slightly malformed option is to use wide characters:

int main () {
 setlocale (LC_ALL, ""); // For correct work, you need to initialize the locale
 wcout & lt; & lt; setw (9) & lt; & lt; L "City" & lt; & lt; setw (12) & lt; & lt; L "Population" & lt; & lt; endl;
 // ...
}

But, this brings with it a separate heap of problems:

  • Wide characters will have to be used almost everywhere.
  • There is still no normal portable way to convert wide characters to regular characters in C++.
  • All * nix system calls use regular strings.

Answer 2

A workaround! UTF-8 is designed so that you can easily calculate the number of characters (not bytes) in a string. We look at the two most significant bits of the character:

int utf8len (const char * s) {
  int len ​​= 0;
  for (; * s! = '\ 0'; ++ s) {
    len + = (* s & amp; 0xC0)! = 0x80;
  }
  return len;
}

Knowing the length of the string in characters, we write a pseudomanipulator. “Pseudo” because it only works correctly in one single case. Namely, when this manipulator hits the & lt; & lt; operation, it returns a proxy object, not a stream. This proxy object also knows how to & lt; & lt; and returns a stream restoring the normal state of affairs:

... & lt; & lt; utf8setw (8) & lt; & lt; "City" & lt; & lt; ...
  ^ ^ ^
  | | handles ostream
  | handles Utf8SetWProxy
  handles ostream

There are many limitations: the new manipulator does not work with other standard manipulators, and output operations cannot be interrupted between the manipulator and the string. But it looks just like the real thing.

// manipulator
// prints right aligned line
class utf8setw {
public:
  utf8setw (int w): w (w) {}
  void print (ostream & amp; os, const char * s) const {
    int pad = w - utf8len (s);
    for (int i = 0; i & lt; pad; ++ i) {
      os & lt; & lt; '';
    }
    os & lt; & lt; s;
  }
private:
  int w;
};
// proxy
// saves the manipulator, the output stream and waits for the next string argument
// when the string arrives, passes it and the stream to the manipulator
class Utf8SetWProxy {
public: 
UTF8SetWProxy (Ostream & AMP; OS, Const UTF8Setw & Amp; Manip): OS (OS), Manip (Manip) {}
  Friend Ostream & Amp; Operator & LT; & LT; (Const UTF8SetWProxy & Amp; P, Const Char * S) {
    p.manip.print (P.OS, S);
    Return P.OS;
  }
Private:
  Ostream & amp; OS;
  Const utf8setw & amp; manip;
};
// trick that runs the process
// Pay attention to the "wrong" type of result
// next operator `& lt; & lt;` will be processed not by the output
UTF8SetWProxy Operator & LT; & LT; (Ostream & amp; OS, Const UTF8Setw & Amp; Manip) {
  RETURN UTF8SetWProxy (OS, Manip);
}

Example of use. At first, the standard option is printed, which does not align Cyrillic strings, as it implies that the number of bytes and the number of characters in the string is the same. Then the new version. The code looks almost the same …

int main ()
{
  Long Pop1 = 8425785, POP2 = 47, POP3 = 9761;
  COUT & LT; & LT; SETW (9) & lt; & lt; "City" & lt; & lt; SETW (12) & LT; & LT; "Population" & lt; & lt; Endl
     & lt; & lt; SETW (9) & lt; & lt; "Moscow" & lt; & lt; SETW (12) & LT; & LT; POP1 & LT; & LT; Endl
     & lt; & lt; SETW (9) & lt; & lt; "Kirov" & lt; & lt; SETW (12) & LT; & LT; POP2 & LT; & LT; Endl
     & lt; & lt; SETW (9) & lt; & lt; "Umummumka" & lt; & lt; SETW (12) & LT; & LT; POP3 & LT; & LT; Endl;
  COUT & LT; & LT; Endl;
  COUT & LT; & LT; UTF8Setw (9) & lt; & lt; "City" & lt; & lt; UTF8Setw (12) & lt; & lt; "Population" & lt; & lt; Endl
     & lt; & lt; UTF8Setw (9) & lt; & lt; "Moscow" & lt; & lt; SETW (12) & LT; & LT; POP1 & LT; & LT; Endl
     & lt; & lt; UTF8Setw (9) & lt; & lt; "Kirov" & lt; & lt; SETW (12) & LT; & LT; POP2 & LT; & LT; Endl
     & lt; & lt; UTF8Setw (9) & lt; & lt; "Umummumka" & lt; & lt; SETW (12) & LT; & LT; POP3 & LT; & LT; Endl;
  Return 0;
}

… but lines the lines much better:

city settlement
Moscow 8425785.
Kirov 47.
Sullenka 9761.
  City population
  Moscow 8425785.
  Kirov 47.
Sullenka 9761.

Programmers, Start Your Engines!

Why spend time searching for the correct question and then entering your answer when you can find it in a second? That's what CompuTicket is all about! Here you'll find thousands of questions and answers from hundreds of computer languages.

Recent questions