Python: problems with Cyrillic in lists and tuples

Here is the code:

#!/usr/bin/python
# -*- coding: utf-8 -*-
hw = 'мир'
print hw
string = []
string.append(hw)
print string

When run, it prints:

мир
['\xd0\xbc\xd0\xb8\xd1\x80']

The same happens with tuples. How can I fix this?


Answer 1, Authority 100%

In general, in Python 2 there is no way.

You are printing the string representation of the list (in your case this is equivalent to calling repr()). The trouble is that repr() in Python 2 returns a str object (in fact a byte string), and for a list it is built by calling repr() on every element, which keeps only printable ASCII characters and escapes everything else. The UTF-8 bytes of the Cyrillic text are therefore shown as \x escapes.
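The mechanism can be seen in isolation with a small sketch. It assumes Python 3, and `Word` is a hypothetical class introduced only to make the difference between str() and repr() visible; it is not part of the question's code.

```python
class Word(object):
    """Hypothetical wrapper used only to tell str() and repr() apart."""

    def __init__(self, text):
        self.text = text

    def __str__(self):
        # Human-readable form, used when the object itself is printed.
        return self.text

    def __repr__(self):
        # Unambiguous form, used when the object sits inside a container.
        return 'Word(%r)' % (self.text,)


word = Word('мир')
single = str(word)        # goes through __str__
in_list = str([word])     # the list calls __repr__ on each element
in_tuple = str((word,))   # tuples behave the same way

print(single)
print(in_list)
print(in_tuple)
```

Printing the object directly uses `__str__`, while printing the list or tuple that contains it always goes through `__repr__` of the element, which is why the escaping shows up only for containers.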

You can try printing it like this:

print u'[%s]' % u','.join(unicode(x) for x in [u'привет', u'мир'])

In Python 3 this problem does not arise, because str is now Unicode text, source files are UTF-8 by default, and repr() leaves printable non-ASCII characters as they are. So everything works as you expect:

$ python3
Python 3.5.1+ (default, Mar 30 2016, 22:46:26)
>>> print(['привет', 'мир'])
['привет', 'мир']
>>> repr(['привет', 'мир'])
"['привет', 'мир']"
>>> # equivalent to
... ['привет', 'мир'].__str__()
"['привет', 'мир']"
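To see the contrast from inside Python 3, the built-in ascii() can stand in for the Python 2 escaping behaviour. This is only a sketch under the assumption of Python 3; the variable names are illustrative.

```python
words = ['мир']

# Python 3 repr(): printable non-ASCII characters are kept as they are.
as_text = repr(words)

# ascii(): like repr(), but escapes every non-ASCII character,
# which resembles what Python 2 shows for a list of unicode strings.
as_ascii = ascii(words)

# A list of UTF-8 encoded byte strings is shown with \x escapes
# and an explicit b'' prefix.
as_bytes = repr([w.encode('utf-8') for w in words])

print(as_text)
print(as_ascii)
print(as_bytes)
```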

Answer 2, Authority 100%

  1. Use Unicode rather than bytes to work with text in Python. For example, add from __future__ import unicode_literals so that string literals create unicode objects even without the explicit u'' prefix. When reading text from a file, use io.open() to get Unicode. When receiving data from the network, decode the bytes to Unicode according to the protocol, for example when the encoding is specified in the Content-Type HTTP header:

    text = data.decode(response.headers.getparam('charset'))

    See the answer describing how to get text when the data is returned by an external process.

  2. Print lists/tuples only for debugging, because in that case repr() is called for each item, and its job is to produce an unambiguous representation of the object. For example, ['\xd0\xbc\xd0\xb8\xd1\x80'] is the text representation of a list containing a byte string. In Python 3 you would get [b'\xd0\xbc\xd0\xb8\xd1\x80'] (with the explicit b'' prefix for bytes literals). See: What is the difference between __repr__ and __str__?

Format lists/tuples/other collections explicitly for output:

>>> print ','.join([u'мир'])
мир
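The same idea can be wrapped in a small helper. This is a minimal sketch rather than a standard function: the name `format_items` and its `sep` parameter are made up for illustration, and the u'' literals are used so that the code reads the same on Python 2 and on Python 3.3+.

```python
# -*- coding: utf-8 -*-

def format_items(items, sep=u', '):
    """Format a collection as '[a, b]' without calling repr() on the items."""
    # u'%s' % (item,) converts each item to text via its string form,
    # so Cyrillic stays readable instead of being escaped.
    return u'[%s]' % sep.join(u'%s' % (item,) for item in items)


print(format_items([u'привет', u'мир']))
print(format_items((1, u'мир')))
```

Note that the result is meant for people to read: unlike repr(), it does not quote the items, so it is not an unambiguous representation.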

In Python 2, repr() leaves unchanged only "printable characters" (regardless of locale, these are the printable ASCII characters), that is, those for which isprint() returns non-zero (such characters are their own text representation). All other characters are escaped:

>>> print([u'мир'])
[u'\u043c\u0438\u0440']

In Python 3, str(some_list) also calls repr() on the items of some_list, but characters that are printable in the current environment can be displayed as they are (мир) instead of being escaped ('\u043c\u0438\u0440').
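The decoding advice from point 1 might look like this in Python 3. It is a sketch under assumptions: `decode_body` and `read_text` are hypothetical helper names, the UTF-8 fallback is an arbitrary choice, and the header is parsed with the standard `email.message.Message` class (whose `get_content_charset()` plays the role of `getparam('charset')` above) so that the example runs without a network connection.

```python
import io
from email.message import Message


def decode_body(data, content_type, fallback='utf-8'):
    """Decode raw bytes using the charset named in a Content-Type value."""
    headers = Message()
    headers['Content-Type'] = content_type
    # get_content_charset() returns None when no charset parameter is given.
    charset = headers.get_content_charset() or fallback
    return data.decode(charset)


def read_text(path, encoding='utf-8'):
    """Read a file as Unicode text instead of raw bytes."""
    with io.open(path, encoding=encoding) as f:
        return f.read()


# Bytes as they might arrive from a server that uses a legacy encoding.
raw = u'мир'.encode('cp1251')
text = decode_body(raw, 'text/plain; charset=windows-1251')
print(text)
```

Decoding once at the boundary (file or network) and working with Unicode everywhere else keeps byte strings out of the lists that later get printed.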
