Cyrillic problem in PHP

The project is developed in UTF-8. I ran into a problem processing strings that contain Cyrillic characters.
The code

<?php
  header('Content-Type: text/html; charset=UTF-8');
  $str = "Дополнительное оборудование";
  echo substr($str, 0, 7);
  echo "<br>";
  $str = "Dopolnitelnoe oborudovanie";
  echo substr($str, 0, 7);
?>

gives the result

Доп�
Dopolni

i.e. the function does not work correctly for Cyrillic.

What needs to be done to get "Дополни" instead of "Доп"?


Answer 1, Authority 100%

Read http://php.net/manual/en/ref.mbstring.php

In short: for multibyte encodings, use mb_substr()

mb_substr($str, 0, 7, "utf-8");
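
As a minimal sketch of the difference (assuming the mbstring extension is loaded and the source file itself is saved as UTF-8), substr() counts bytes while mb_substr() counts characters. The variable names $bytes and $chars are only illustrative:

```php
<?php
// Assumes the mbstring extension is available and this file is saved as UTF-8.
$str = "Дополнительное оборудование";

// substr() cuts after 7 bytes: three 2-byte letters plus half of the fourth.
$bytes = substr($str, 0, 7);

// mb_substr() cuts after 7 characters.
$chars = mb_substr($str, 0, 7, 'UTF-8');

echo strlen($bytes), "\n";                     // 7 (bytes)
echo $chars, "\n";                             // Дополни
var_dump(mb_check_encoding($bytes, 'UTF-8'));  // bool(false): the cut left a broken sequence
```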

Answer 2, Authority 60%

If the site is developed in UTF-8, as you write, then commit to UTF-8 throughout. So that all the multibyte functions work with that encoding by default, set it at the very beginning (and preferably check whether the call succeeded):

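const PAGE_ENCODING = 'utf-8';
// When called with an argument, mb_internal_encoding() returns true on success,
// not the encoding name, so test the boolean result.
if (!mb_internal_encoding(PAGE_ENCODING))
    throw new RuntimeException('Encoding is not supported: ' . PAGE_ENCODING);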

If everything is OK, you can call all the mb_* functions without passing the encoding.

I forgot to say why this matters if you are developing in UTF-8: when you pass the name of a function as a callback, there is nowhere to specify the encoding.
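
A small sketch of the callback case, assuming mbstring is loaded and the file is saved as UTF-8 (the word list is made up for illustration). A function passed by name receives only the array element, so the encoding has to come from the internal default:

```php
<?php
// Assumes the mbstring extension is available; $words is illustrative data.
mb_internal_encoding('UTF-8');

$words = ['дом', 'лес', 'река'];

// array_map() calls each function with a single argument,
// so there is no place to pass 'UTF-8' explicitly.
$upper   = array_map('mb_strtoupper', $words);
$lengths = array_map('mb_strlen', $words);

print_r($upper);   // ДОМ, ЛЕС, РЕКА
print_r($lengths); // 3, 3, 4
```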


Answer 3, Authority 20%

Use mb_substr() instead of substr()


Answer 4

To work with Cyrillic strings properly in PHP (including extracting substrings and so on), you need to use special functions:
http://php.net/manual/ru/ref.mbstring.php.

This is because in UTF-8 one Latin character = 1 byte, so:

$string = 'xyz';
echo $string[0]; // will be x

But Cyrillic characters occupy 2 bytes, so:

$string = 'эюя';
echo $string[0]; // will be equal to �
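
To make the byte counts visible, here is a short sketch (assuming a UTF-8 source file and the mbstring extension) that compares strlen(), which counts bytes, with mb_strlen(), which counts characters, and dumps the raw bytes of one Cyrillic letter:

```php
<?php
// Assumes this file is saved as UTF-8 and mbstring is loaded.
$latin    = 'xyz';
$cyrillic = 'эюя';

echo strlen($latin), ' ', mb_strlen($latin, 'UTF-8'), "\n";       // 3 3
echo strlen($cyrillic), ' ', mb_strlen($cyrillic, 'UTF-8'), "\n"; // 6 3

// bin2hex() shows the two bytes that together encode the first letter.
$firstLetterHex = bin2hex($cyrillic[0] . $cyrillic[1]);
echo $firstLetterHex, "\n"; // d18d
```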

You can take this into account and work around it like so:

$string = 'эюя';
echo $string[0] . $string[1]; // Output: э

or split the string with str_split(), passing split_length = 2:

$string = 'эюя';
$arrStr = str_split($string, 2); // = ['э', 'ю', 'я']

but it is better not to do this, because it stops working as soon as the string also contains Latin letters or other characters:

$string = 'эюя. xyzab';
$strArr = str_split($string, 2); // = ['э', 'ю', 'я', '. ', 'xy', 'za', 'b']

By the way, to properly split a string containing Russian characters into an array of characters, it is best to do this:

$string = 'эюя. xyzab';
$strArr = preg_split('//u', $string, -1, PREG_SPLIT_NO_EMPTY); // = ['э', 'ю', 'я', '.', ' ', 'x', 'y', 'z', 'a', 'b']
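
As an alternative sketch, not part of the original answer: on PHP 7.4 or newer with the mbstring extension, mb_str_split() performs the same character-wise split on a mixed Cyrillic and Latin string. The sample string here is chosen for illustration:

```php
<?php
// Assumes PHP 7.4+ with mbstring and a UTF-8 source file.
$string = 'эюя. xyzab';

// Split into single characters, regardless of how many bytes each one takes.
$chars = mb_str_split($string, 1, 'UTF-8');

// The regex approach gives the same result.
$viaRegex = preg_split('//u', $string, -1, PREG_SPLIT_NO_EMPTY);

print_r($chars);
var_dump($chars === $viaRegex); // bool(true)
```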
