I have a file in an unknown encoding and need to determine what that encoding is. I wrote the approach below, but I'm sure there is a much simpler way to do it. Can you suggest one?
# some file downloaded from the Internet in an unknown encoding.
with open('test.txt', 'w', encoding='cp500') as fh:
    fh.write('Hello\n')
# all known encodings can be inserted here.
encoding = [
    'utf-8',
    'cp500',
    'utf-16',
    'GBK',
    'windows-1251',
    'ASCII',
    'US-ASCII',
    'Big5',
]
correct_encoding = ''
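for enc in encoding:
    try:
        # try to decode the whole file with this candidate
        with open('test.txt', encoding=enc) as fh:
            fh.read()
    except (UnicodeDecodeError, LookupError):
        pass
    else:
        # first candidate that decodes without an error
        correct_encoding = enc
        print('Done!')
        break
print(correct_encoding)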
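One thing worth checking about the loop above: a decode that succeeds only shows that the bytes are valid in that encoding, not that it is the right one. The sketch below is a rough illustration, assuming the same cp500 sample text; the helper name `decodable_with` is made up for this example. It collects every candidate that decodes the bytes without raising instead of stopping at the first.

```python
# Sample bytes: the question's test string encoded as cp500.
data = "Hello\n".encode("cp500")

# Same candidate list as in the question.
candidates = [
    "utf-8", "cp500", "utf-16", "GBK",
    "windows-1251", "ASCII", "US-ASCII", "Big5",
]


def decodable_with(raw, encodings):
    """Return every encoding in `encodings` that decodes `raw` without error.

    Hypothetical helper for illustration only.
    """
    ok = []
    for enc in encodings:
        try:
            raw.decode(enc)
        except (UnicodeDecodeError, LookupError):
            continue
        ok.append(enc)
    return ok


matches = decodable_with(data, candidates)
print(matches)
```

For this sample, more than one candidate is accepted (cp500 and windows-1251 at least), so the result of the original loop depends on the order of the list.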
Answer 1, authority 100%
You can use the third-party chardet library:
from chardet.universaldetector import UniversalDetector

detector = UniversalDetector()
with open('test.txt', 'rb') as fh:
    for line in fh:
        detector.feed(line)
        # stop early once the detector is confident enough
        if detector.done:
            break
detector.close()
print(detector.result)
Answer 2
It’s very simple.
with open(file_path, "r") as f:
    print(f)
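The output is the representation of the file object, which includes an encoding. Note that this is the encoding Python used to open the file (the platform default when none is passed), not one detected from the file's contents.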
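To see what the encoding shown by the file object actually reflects, here is a small sketch. The temporary file and its contents are made up for illustration: the bytes are written as cp500, and the file is then opened with a deliberately different encoding.

```python
import os
import tempfile

# Hypothetical sample file: bytes written as cp500, as in the question.
with tempfile.NamedTemporaryFile(delete=False, suffix=".txt") as tmp:
    tmp.write("Hello\n".encode("cp500"))
    path = tmp.name

# Open the file claiming a different encoding; nothing is read or checked.
with open(path, "r", encoding="latin-1") as f:
    reported = f.encoding  # echoes the argument given to open()
    print(f)

os.remove(path)
print(reported)
```

The reported value follows the `encoding` argument (or the default when it is omitted), so it says nothing about how the bytes on disk were really encoded.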