I am trying to read the ports file from IANA. It is stored in UTF-8 without a BOM.
But on one of the lines, the readline() function fails with this error:
'charmap' codec can't decode byte 0x98 in position 7938: character maps to <undefined>
The line in the file looks like this:
# Jim Harlan <jimh&infowest.com>
What workaround could I use for this? Or is there a direct solution?
A workaround such as deleting this line would do (for some reason it is the only line that causes trouble), but only while debugging, because otherwise my partners will tear my hair out later. Here is the code I use for this operation:
    try:
        file = open(path, 'r')
        while True:
            line = file.readline()
            if not line:
                break
            print(line)
    finally:
        file.close()
Answer 1, authority 100%
Try using the codecs module from the standard library:
    import codecs

    fileObj = codecs.open("someFilePath", "r", "utf_8_sig")
    text = fileObj.read()  # or read line by line
    fileObj.close()
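For context on what the utf_8_sig codec changes, here is a small sketch. It decodes the same text with and without a leading BOM. The sample bytes are made up for illustration and are not taken from the IANA file.

```python
# Sample data (hypothetical): the same text with and without a UTF-8 BOM.
with_bom = b"\xef\xbb\xbf# Service names\n"
without_bom = b"# Service names\n"

# Plain utf-8 keeps the BOM as a U+FEFF character at the start of the text.
plain = with_bom.decode("utf-8")

# utf_8_sig drops a leading BOM if there is one, and otherwise behaves like utf-8.
stripped = with_bom.decode("utf_8_sig")
unchanged = without_bom.decode("utf_8_sig")

print(repr(plain))
print(repr(stripped))
print(stripped == unchanged)
```

So for a file without a BOM, as in the question, utf_8_sig and utf-8 should decode to the same text.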
Answer 2, authority 100%
To read a UTF-8 encoded text file in Python, you can use the io.open() function, which is available as the built-in open() in Python 3:
    #!/usr/bin/env python
    import io

    with io.open(path, encoding='utf-8') as file:
        for line in file:
            process(line)
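A minimal sketch of why naming the encoding helps. It assumes the failing machine defaulted to cp1251, which the question does not state; the 'charmap' wording and byte 0x98 are merely consistent with that code page. The sample line is invented.

```python
import io
import os
import tempfile

# Hypothetical sample: a left single quotation mark (U+2018) is encoded in UTF-8
# as the bytes E2 80 98, so it contains the byte 0x98 from the error message.
sample = "# \u2018quoted\u2019 comment\n"

fd, path = tempfile.mkstemp()
os.close(fd)
try:
    with io.open(path, "w", encoding="utf-8") as f:
        f.write(sample)

    # Assumed locale default: cp1251 has no character assigned to byte 0x98.
    try:
        with io.open(path, encoding="cp1251") as f:
            f.read()
        failed = False
    except UnicodeDecodeError as exc:
        failed = True
        print(exc)

    # With the real encoding named explicitly, the text reads back intact.
    with io.open(path, encoding="utf-8") as f:
        text = f.read()
finally:
    os.remove(path)

print(repr(text))
```

When no encoding is given, open() falls back to a platform-dependent default, which is why the same script can work on one machine and fail on another.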
If the encoding itself is correct but the file contains a few invalid bytes, you can pass an error handler such as errors='ignore' (or another value as appropriate).
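As a rough illustration of the error handlers, the sketch below decodes a byte string that is valid UTF-8 except for one stray byte. The data is invented; the same errors values can be passed to open().

```python
# Hypothetical data: valid UTF-8 text with one stray byte (0xFF) in the middle.
damaged = b"port\xff 80"

# 'strict' is the default and raises on the first undecodable byte.
try:
    damaged.decode("utf-8", errors="strict")
    strict_failed = False
except UnicodeDecodeError:
    strict_failed = True

# 'ignore' silently drops the bad byte; 'replace' marks it with U+FFFD.
ignored = damaged.decode("utf-8", errors="ignore")
replaced = damaged.decode("utf-8", errors="replace")

print(repr(ignored))
print(repr(replaced))
```

Since 'ignore' discards data without any trace, 'replace' can be the easier choice when you want to spot damaged places afterwards.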
Do not use codecs, which may not work correctly with universal newlines mode.
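One concrete difference, sketched below: codecs.open() opens the underlying file in binary mode, so Windows-style \r\n line endings are not translated, whereas io.open() in text mode translates them to \n by default. The file contents are made up for the example.

```python
import codecs
import io
import os
import tempfile

fd, path = tempfile.mkstemp()
try:
    # Write raw bytes with Windows-style line endings.
    with os.fdopen(fd, "wb") as f:
        f.write(b"first\r\nsecond\r\n")

    # codecs.open() decodes the bytes but leaves the line endings untouched.
    with codecs.open(path, "r", "utf-8") as f:
        via_codecs = f.read()

    # io.open() uses universal newlines by default and returns "\n" only.
    with io.open(path, encoding="utf-8") as f:
        via_io = f.read()
finally:
    os.remove(path)

print(repr(via_codecs))
print(repr(via_io))
```

Code that compares lines or strips only "\n" can therefore behave differently depending on which of the two functions opened the file.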
You don't need to change the console code page to cp65001 to read a UTF-8 file.
If you want to print Unicode to the Windows console, see "How to output a Unicode string from Python to the Windows console?"
Answer 3, authority 25%
I kept running into this error, over and over again. This is the solution I found:
    import codecs

    file = codecs.open("yourFile", "r", "utf-8")
    data = file.read()
    file.close()
Then run chcp 65001 on the command line.
These simple steps solved the problem.
Answer 4, authority 12%
    import codecs

    file = codecs.open(path, encoding='utf-8', mode='r')