
Find out file encoding


I have a file in an unknown encoding and need to determine what that encoding is. I wrote the approach below, but I'm sure there is a much simpler way to do it. Can you suggest one?

# Simulate a file downloaded from the Internet in an unknown encoding.
with open('test.txt', 'w', encoding='cp500') as f:
    f.write('Hello\n')

# Candidate encodings to try, in order; any known encoding can be added here.
encodings = [
    'utf-8',
    'cp500',
    'utf-16',
    'GBK',
    'windows-1251',
    'ASCII',
    'US-ASCII',
    'Big5',
]

correct_encoding = ''
for enc in encodings:
    try:
        with open('test.txt', encoding=enc) as f:
            f.read()
    except (UnicodeDecodeError, LookupError):
        # Decoding failed or the codec name is unknown: try the next one.
        pass
    else:
        correct_encoding = enc
        print('Done!')
        break

print(correct_encoding)
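
One caveat with trial decoding like this: a decode that raises no error only shows that the bytes are valid in that encoding, not that the encoding is the right one. Here is a minimal sketch, where the helper name decodable_with and the candidate list are purely illustrative assumptions, that collects every encoding that decodes the same bytes without error:

```python
def decodable_with(data: bytes, encodings):
    """Return every encoding in `encodings` that decodes `data` without error."""
    matches = []
    for enc in encodings:
        try:
            data.decode(enc)
        except (UnicodeDecodeError, LookupError):
            # Invalid bytes for this codec, or unknown codec name.
            continue
        matches.append(enc)
    return matches


# The same bytes the test file above would contain.
raw = "Hello\n".encode("cp500")

# Illustrative candidate list; latin-1 maps every byte value to a character,
# so it never raises a decode error.
candidates = ["utf-8", "ascii", "latin-1", "cp500"]

matches = decodable_with(raw, candidates)
print(matches)
```

More than one candidate survives, so the result of a "first encoding that works" loop depends on the order of the list.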

Answer 1, authority 100%

You can use the chardet library:

from chardet.universaldetector import UniversalDetector

detector = UniversalDetector()
with open('test.txt', 'rb') as fh:
    for line in fh:
        detector.feed(line)
        # Stop reading once the detector has reached a conclusion.
        if detector.done:
            break
    detector.close()
print(detector.result)

Answer 2

It’s very simple.

with open(file_path, "r") as f:
    print(f)

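The output is the repr of the file object, which includes its encoding attribute. Note that this is the encoding Python used to open the file (the locale default when none is passed), not one detected from the file's contents.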
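
A minimal sketch to check what that attribute reports, assuming a throwaway scratch file (the path is hypothetical): the same bytes are opened twice with different encoding arguments, and f.encoding is recorded each time.

```python
import os
import tempfile

# Hypothetical scratch file; the bytes on disk are EBCDIC (cp500).
path = os.path.join(tempfile.mkdtemp(), "test.txt")
with open(path, "wb") as f:
    f.write("Hello\n".encode("cp500"))

reported = []
for enc in ("latin-1", "cp500"):
    with open(path, "r", encoding=enc) as f:
        # f.encoding holds whatever open() was told to use; the file's
        # bytes are not inspected to produce it.
        reported.append(f.encoding)

print(reported)
```

The reported value follows the argument passed to open() in both cases, so it cannot be used to discover the encoding of an unknown file.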
