
Python encoding error: readline() fails when reading a UTF-8 file: 'charmap' codec can't decode byte




I am trying to read the ports file from IANA. It is stored in UTF-8 without a BOM.
But on one of the lines, readline() fails with this error:

'charmap' codec can't decode byte 0x98 in position 7938: character maps to <undefined>

The line in the file looks like this:

# Jim Harlan <jimh&infowest.com>

What workaround can I use for this? Or is there a direct solution?


Deleting this line would do as a workaround (and, for some reason, it is the only such line), but only while debugging, because otherwise my partners will tear my hair out. Here is the code I use for this operation:

  file = open(path, 'r')
  while True:
    line = file.readline()
    if not line:
      break  # an empty string means end of file
    print(line)
  file.close()

Answer 1, authority 100%

Try using the standard codecs module:

import codecs
fileObj = codecs.open("someFilePath", "r", "utf_8_sig")
text = fileObj.read()  # or read line by line
fileObj.close()
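
As a rough illustration of why utf_8_sig is a safe choice even though the file in the question has no BOM: this codec strips a leading BOM when one is present and otherwise behaves like plain UTF-8. The sketch below uses the built-in open() rather than codecs.open(); the helper read_text and the sample line are made up for the demonstration.

```python
import os
import tempfile


def read_text(data: bytes, encoding: str) -> str:
    """Hypothetical helper: write raw bytes to a temp file, read them back as text."""
    fd, tmp_path = tempfile.mkstemp()
    try:
        with os.fdopen(fd, "wb") as raw:
            raw.write(data)
        with open(tmp_path, encoding=encoding) as text_file:
            return text_file.read()
    finally:
        os.remove(tmp_path)


payload = "# Jim Harlan\n".encode("utf-8")  # sample content, not the real file
bom = b"\xef\xbb\xbf"                       # UTF-8 byte order mark

with_bom_sig = read_text(bom + payload, "utf-8-sig")   # BOM is stripped
without_bom_sig = read_text(payload, "utf-8-sig")      # nothing to strip
with_bom_plain = read_text(bom + payload, "utf-8")     # BOM survives as '\ufeff'
```

With plain utf-8 the BOM stays in the text as the first character, which is usually what trips people up later when comparing strings.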

Answer 2, authority 100%

To read a UTF-8 encoded text file in Python, you can use the io.open() function, which is available as the built-in open() in Python 3:

#!/usr/bin/env python
import io
with io.open(path, encoding='utf-8') as file:
  for line in file:
    process(line)

If the encoding itself is correct but the file contains minor encoding errors, you can pass an error handler such as errors='ignore' (or another value as appropriate).
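
A small sketch of how the error handlers differ, assuming a file that is valid UTF-8 apart from one stray byte. The sample content and the temporary file are invented for the demonstration; errors='replace' is shown alongside 'ignore' because it leaves a visible marker where data was lost.

```python
import os
import tempfile

# Mostly valid UTF-8 with one stray 0x98 byte that cannot be decoded on its own.
data = "port 80 ".encode("utf-8") + b"\x98" + " http\n".encode("utf-8")

fd, tmp_path = tempfile.mkstemp()
try:
    with os.fdopen(fd, "wb") as raw:
        raw.write(data)

    # The default handler is 'strict': the bad byte raises UnicodeDecodeError.
    try:
        with open(tmp_path, encoding="utf-8") as f:
            f.read()
        strict_failed = False
    except UnicodeDecodeError:
        strict_failed = True

    # 'ignore' silently drops the undecodable byte.
    with open(tmp_path, encoding="utf-8", errors="ignore") as f:
        ignored = f.read()

    # 'replace' substitutes U+FFFD so the damage stays visible.
    with open(tmp_path, encoding="utf-8", errors="replace") as f:
        replaced = f.read()
finally:
    os.remove(tmp_path)
```

Note that 'ignore' loses data without any trace, so it is probably best kept for cases where the dropped bytes really do not matter.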

Do not use codecs, which may not work correctly with universal newlines mode.
You don't need to change your code page to cp65001 to read a UTF-8 file.
If you want to print Unicode to the Windows console, see How to output a Unicode string from Python to the Windows console?
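
To see why the console code page is not what matters for reading: when open() is called without an encoding, it decodes with the locale's preferred encoding, and the 'charmap' error comes from that decoder. The sketch below assumes the default was cp1251 (common on Russian-language Windows; the question does not say which encoding was actually in effect) and uses an arbitrary Cyrillic letter whose UTF-8 form contains the byte 0x98.

```python
import locale

# What open() falls back to when no encoding is given (platform dependent).
default_encoding = locale.getpreferredencoding(False)

# Assumption: the default was cp1251; the question does not confirm this.
utf8_bytes = "И".encode("utf-8")  # two bytes: 0xD0 0x98

try:
    utf8_bytes.decode("cp1251")   # 0x98 has no mapping in cp1251
    message = None
except UnicodeDecodeError as exc:
    message = str(exc)            # mentions the 'charmap' codec and byte 0x98

# Decoding with the encoding the data was written in works as expected.
decoded = utf8_bytes.decode("utf-8")
```

Passing encoding='utf-8' to open() explicitly sidesteps the locale default entirely, regardless of what chcp reports.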

Answer 3, authority 25%

I kept running into this error over and over again. I found the solution here.

import codecs
file = codecs.open("yourFile", "r", "utf-8")
data = file.read()
file.close()
  • chcp 65001 on the command line

These simple steps solved the problem.

Answer 4, authority 12%

import codecs
file = codecs.open(path, encoding='utf-8', mode='r')
