The project is developed in UTF-8. I ran into a problem processing strings that contain Cyrillic characters.
Code
<?php
header('Content-Type: text/html; charset=UTF-8');
$str = "Дополнительное оборудование";
echo substr($str, 0, 7);
echo "<br>";
$str = "Dopolnitelnoe Oborudovanie";
echo substr($str, 0, 7);
?>
gives the result
Доп�
Dopolni
i.e. the function does not work for Cyrillic.
What needs to be done to get "Дополни" instead of "Доп�"?
Answer 1, Authority 100%
Read http://php.net/manual/en/ref.mbstring.php
In short: for multibyte encodings, use mb_substr()
mb_substr($str, 0, 7, "utf-8");
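To make the difference concrete, here is a minimal sketch comparing the two functions. It assumes the question's string is the Cyrillic phrase that "Dopolnitelnoe Oborudovanie" transliterates; the variable names $byteCut and $charCut are only illustrative.

```php
<?php
// Sketch: byte-based substr() versus character-based mb_substr() on UTF-8 text.
$str = "Дополнительное оборудование";

// substr() counts bytes. Each Cyrillic letter takes 2 bytes in UTF-8,
// so 7 bytes cover three whole letters plus half of the fourth.
$byteCut = substr($str, 0, 7);

// mb_substr() counts characters once it knows the encoding.
$charCut = mb_substr($str, 0, 7, 'UTF-8');

var_dump(mb_check_encoding($byteCut, 'UTF-8')); // bool(false): cut mid-character
echo $charCut, "\n";                            // Дополни
```

The dangling byte at the end of $byteCut is what the browser renders as the replacement character.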
Answer 2, Authority 60%
If the site is, as you write, developed in UTF-8, then work in UTF-8 consistently. So that all multibyte functions use that encoding by default, set it at the very beginning (and preferably check that the call succeeded):
const PAGE_ENCODING = 'UTF-8';
// When setting the encoding, mb_internal_encoding() returns true on success.
if (!mb_internal_encoding(PAGE_ENCODING)) {
    throw new RuntimeException('Encoding is not supported: ' . PAGE_ENCODING);
}
If everything is OK, you can call all mb_* functions without passing the encoding.
I forgot to mention why this matters when you develop in UTF-8: when a function name is used as a callback, there is nowhere to pass the encoding.
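The callback case can be sketched as follows. The sample words and the variable names are assumptions made for illustration; mb_strtoupper() is just one convenient mb_* function to pass by name.

```php
<?php
// Set the default once so mb_* functions used as callbacks need no encoding argument.
mb_internal_encoding('UTF-8');

$words = ['дополнительное', 'оборудование'];

// array_map() passes only the array element to the callback,
// so there is no place to supply an encoding parameter here.
$upper = array_map('mb_strtoupper', $words);

print_r($upper);
```

Without a correct internal encoding, the same call would treat the input as a different charset and could leave the Cyrillic letters unchanged or damaged.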
Answer 3, Authority 20%
Use mb_substr() instead of substr().
Answer 4
To work properly with Cyrillic strings in PHP (including extracting substrings and so on), you need to use the special functions:
http://php.net/manual/ru/ref.mbstring.php
This is because in UTF-8 a Latin character occupies 1 byte, so:
$string = 'xyz';
echo $string[0]; // prints x
But Cyrillic characters occupy 2 bytes, so:
$string = 'эюя';
echo $string[0]; // prints �
You can take this into account and work like this:
$string = 'эюя';
echo $string[0] . $string[1]; // prints э
or split the string with str_split, specifying a length of 2:
$string = 'эюя';
$arrStr = str_split($string, 2); // = ['э', 'ю', 'я']
but it is better not to do that, because this approach breaks as soon as the string also contains Latin letters or other single-byte characters:
$string = 'Эюя. Xyzab';
$strArr = str_split($string, 2); // = ['Э', 'ю', 'я', '. ', 'Xy', 'za', 'b']
By the way, to correctly split a string with Russian characters into an array of characters, it is best to do this:
$string = 'Эюя. Xyzab';
$strArr = preg_split('//u', $string, -1, PREG_SPLIT_NO_EMPTY); // = ['Э', 'ю', 'я', '.', ' ', 'X', 'y', 'z', 'a', 'b']
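As a possible alternative, PHP 7.4 and later ship mb_str_split(), which splits by characters directly. The sketch below assumes PHP 7.4+ and a mixed Cyrillic and Latin sample string; it also shows the byte count versus the character count that explains the behaviour above.

```php
<?php
$string = 'Эюя. Xyzab';

// Byte length versus character length of the same UTF-8 string.
$bytes = strlen($string);             // 13: three 2-byte letters + 7 single-byte characters
$chars = mb_strlen($string, 'UTF-8'); // 10

// Character-wise split with the mbstring function (PHP 7.4+).
$viaMb = mb_str_split($string, 1, 'UTF-8');

// The same split with the regex approach, for comparison.
$viaRegex = preg_split('//u', $string, -1, PREG_SPLIT_NO_EMPTY);

var_dump($viaMb === $viaRegex); // bool(true)
```

On older PHP versions the preg_split() form remains the option to use.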