
Why is the HTML code in the browser different from the HTML in Beautiful Soup?

In the browser, the developer tools panel shows this tag:

<input type="hidden" id="g-recaptcha-response" name="g-recaptcha-response" value="03AGdBq25WAkToUZeuT7g4nM7immntzoP3yfZbJbkCOnrVgaNyLHjXhI2z-yZCOI3ZJn1_bUSyfoqfvyhURkuAD-mY1YQ7k3IHBxl1641M4vnbGstqwbpYplZ8F4MQ2xlxAOjUS0cKvmVvcPXwGdiMpIjEq3osk0ItwAKGmKgtn5fT6-Dlos7mU7X7GtNrXrk2nTUIrN9G-W944VubpLXWptMfKl2m5J5boT1eM_59HDRduOOUzPiX2zbctQSTRDs_ieyBkDJG29hFe3g2Na7EHWw8JSCxKrI1QFMmVvQh7-ppV0eiQqLNtoxy8EcW-6qHxG16cV9uqktKQdllpq_qU9EwbriAKvnuLV2ykBZEGu2d2r0kA9DB_AV__VdlUr_qsncPQ1Pi3jSE5FEDfMKWGi7US8jURtkJtwfRqGJTZ8h2gJh8bADkv5EG5XvbxKcoq3-bbTj1oM8H">

When parsing the page with Beautiful Soup, the same tag looks like this:

<input id="g-recaptcha-response" name="g-recaptcha-response" type="hidden"/>

How can I get the contents of the value attribute?


Answer 1, Authority 100%

There are two likely reasons for this:

  1. You may not be logged in to the site, and the tag you want appears in the page's HTML only for authorized users.
  2. The tag, or its value attribute, may be filled in dynamically by a JavaScript script. The GET request that fetches the page's HTML runs only once and does not execute scripts, so it cannot see changes that happen on the page afterwards. Most likely, you should use Selenium to solve the task.
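
To illustrate the second point, here is a minimal sketch. The two HTML strings are made-up stand-ins for what the server sends and what the browser shows after scripts have run, and the helper names extract_value and fetch_rendered_html are hypothetical. The Selenium part assumes Chrome is installed locally and is not called here; treat it as one possible approach rather than a verified solution for this particular site.

```python
from bs4 import BeautifulSoup

# Hypothetical stand-ins: the tag as the server sends it, and as the browser
# shows it after JavaScript has added the value attribute.
RAW_HTML = '<input type="hidden" id="g-recaptcha-response" name="g-recaptcha-response">'
RENDERED_HTML = (
    '<input type="hidden" id="g-recaptcha-response" '
    'name="g-recaptcha-response" value="example-token">'
)


def extract_value(html, element_id="g-recaptcha-response"):
    """Return the input's value attribute, or None if the tag or attribute is missing."""
    soup = BeautifulSoup(html, "html.parser")
    tag = soup.find("input", id=element_id)
    return tag.get("value") if tag is not None else None


def fetch_rendered_html(url, element_id="g-recaptcha-response", timeout=10):
    """Open the page in a real browser and return its HTML after scripts have run."""
    # Imported here so extract_value() can be used without Selenium installed.
    from selenium import webdriver
    from selenium.webdriver.common.by import By
    from selenium.webdriver.support import expected_conditions as EC
    from selenium.webdriver.support.ui import WebDriverWait

    driver = webdriver.Chrome()  # assumption: Chrome is available on this machine
    try:
        driver.get(url)
        # Wait until the element is present in the DOM before reading the page.
        WebDriverWait(driver, timeout).until(
            EC.presence_of_element_located((By.ID, element_id))
        )
        return driver.page_source
    finally:
        driver.quit()


print(extract_value(RAW_HTML))       # None
print(extract_value(RENDERED_HTML))  # example-token
```

With a real page you would pass the result of fetch_rendered_html(url) to extract_value. If the script sets the value as a DOM property rather than as an attribute, it may not show up in page_source; in that case reading it from the live element with get_attribute("value") inside the Selenium session is the safer route.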

Answer 2, Authority 50%

I had the same problem. One solution is the requests_html library, which can execute the page's JS scripts.

from requests_html import HTMLSession

session = HTMLSession()
r = session.get('https://site.com')
r.html.render()  # execute the page's JavaScript
r.html.html  # HTML of the page after the JS has run
