The problem is simple: I cannot fetch the contents of a page whose title contains Cyrillic characters, for example from Russian Wikipedia.
I tried urllib, but I keep running into an exception with this code:
from urllib.request import urlopen
from urllib.parse import quote

def get_content(name):
    print(urlopen('http://ru.wikipedia.org/wiki/' + quote(name)).read()
          .decode('utf-8'))

get_content('Forest')
The exception looks like this:
UnicodeEncodeError: 'charmap' codec can't encode character '\xb2' in position 14187: character maps to <undefined>
I have read similar questions in other threads, but no matter what I do with quote, the result is the same.
Perhaps I am doing something silly, but so far I simply cannot fetch a page from the wiki.
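The message itself hints at where things go wrong: it is a UnicodeEncodeError, so it appears to be raised when print() encodes the already decoded text for the console, not while the page is downloaded or decoded. The sketch below only illustrates that idea under assumptions: cp1251 stands in for a typical Windows console code page, and page_text and try_encode are hypothetical names that are not part of the original code.

```python
# Sample text as it might look after .decode('utf-8'):
# Cyrillic letters plus U+00B2 (superscript two), the character from the traceback.
page_text = "Площадь: 38 млн км\xb2"  # hypothetical sample, not real page content


def try_encode(text, encoding):
    """Return None if text encodes cleanly, else the UnicodeEncodeError raised."""
    try:
        text.encode(encoding)
    except UnicodeEncodeError as exc:
        return exc
    return None


# cp1251 (assumed console code page) covers Cyrillic but has no U+00B2.
error = try_encode(page_text, "cp1251")
print(type(error).__name__, hex(ord(error.object[error.start])))

# UTF-8 can represent every character, so nothing is raised.
print(try_encode(page_text, "utf-8"))
```

If this reading is right, quote() is not involved at all: the URL is built correctly, and the failure depends only on which encoding the output stream uses.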
Answer 1, Authority 100%
You just need to add an encoding declaration at the top of the file:
# coding=utf-8
# Python 2: urlopen and quote live directly in the urllib module
from urllib import urlopen, quote

def get_content(name):
    return urlopen('http://ru.wikipedia.org/wiki/' + quote(name)).read()

print get_content('Forest')
Answer 2
Maybe this will help:
#!/usr/bin/env python
# -*- coding: utf-8 -*-
print(urlopen(u'http://ru.wikipedia.org/wiki/' + quote(name)).read()
      .decode('utf-8'))
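Since the error seems to come from encoding the text for output rather than from the request, one possible workaround is to escape whatever the output stream cannot represent. This is a minimal Python 3 sketch, not a definitive fix: build_url, fetch_text and print_safely are hypothetical helper names, the User-Agent string is a placeholder, and fetch_text performs a real network request when called.

```python
import sys
from urllib.parse import quote
from urllib.request import Request, urlopen

BASE_URL = 'http://ru.wikipedia.org/wiki/'  # URL prefix used in the question


def build_url(name):
    """Percent-encode the article title (as UTF-8) and append it to BASE_URL."""
    return BASE_URL + quote(name)


def fetch_text(name):
    """Download the page and decode the body as UTF-8 (performs a network request)."""
    # The User-Agent value is a placeholder assumption, not a required string.
    request = Request(build_url(name), headers={'User-Agent': 'example-script/0.1'})
    with urlopen(request) as response:
        return response.read().decode('utf-8')


def print_safely(text, stream=None):
    """Write text to stream, escaping characters its encoding cannot represent."""
    stream = sys.stdout if stream is None else stream
    encoding = getattr(stream, 'encoding', None) or 'utf-8'
    # backslashreplace turns e.g. U+00B2 into the literal text \xb2 instead of raising.
    data = text.encode(encoding, errors='backslashreplace')
    stream.write(data.decode(encoding) + '\n')
```

With these helpers, print_safely(fetch_text('Лес')) should print the page even on a console limited to a legacy code page, with unsupported characters shown as escapes. On Python 3.7 or later, sys.stdout.reconfigure(errors='backslashreplace') is another option that avoids the helper entirely.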