python - Avito Parsing Python

The problem is with parsing the subsequent pages. With the approach I have arrived at, I can only pull the first page. If I use pagination = soup.find('div', class_='pagination-root-2ocjz'), everything seems fine and the whole list of pages is shown in the HTML, but I don't understand how to pull the pages out of it…
The page itself: https://www.avito.ru/murmanskaya_oblast/avtomobili/mitsubishi-asgbagicautgtg3ymcg?cd=1

import requests
from bs4 import BeautifulSoup

# URL, get_html() and get_content() are defined elsewhere in the original script


def get_pages_count(html):
    soup = BeautifulSoup(html, 'html.parser')
    pagination = soup.find('span', class_='pagination-item-1WYVP').find_next('span')['data-marker']
    if pagination:
        return int(pagination[-2])
    else:
        return 1


def parse():
    html = get_html(URL)
    if html.status_code == 200:
        cars = []
        pages_count = get_pages_count(html.text)
        for page in range(1, pages_count + 1):
            print(f'Parsing page {page} of {pages_count}...')
            html = get_html(URL, params={'p': page})
            cars.extend(get_content(html.text))
        print(cars)
    else:
        print('Error')


parse()
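The [-2] index in get_pages_count() suggests that the data-marker attribute of a pagination span looks like page(N); that format is an assumption inferred from the code, not confirmed against Avito's current markup. Under that assumption, taking a single character breaks as soon as there are 10 or more pages, and find_next('span') returns only the span right after the first one, not the last page. A minimal sketch (the function name max_page_from_markers is hypothetical) that collects all markers and keeps the largest page number:

```python
import re

# Assumed marker format: "page(N)"; Avito's real markup may differ or change
PAGE_MARKER = re.compile(r"^page\((\d+)\)$")


def max_page_from_markers(markers):
    """Return the highest N among markers like 'page(N)', or 1 if none match."""
    pages = []
    for marker in markers:
        match = PAGE_MARKER.match(marker or "")
        if match:
            pages.append(int(match.group(1)))
    # No page markers at all means there is only one page
    return max(pages, default=1)


if __name__ == "__main__":
    # With BeautifulSoup the list could come from something like:
    # [s["data-marker"] for s in soup.find_all("span", attrs={"data-marker": True})]
    print(max_page_from_markers(["page(1)", "page(2)", "page(12)", "pagination-button/next"]))
```

Taking the maximum rather than the last match means the order of spans on the page does not matter.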

Answer 1

url = 'https://www.avito.ru/murmanskaya_oblast/avtomobili/mitsubishi-arsgbagicautgtg3ymcg?cd=1&p='
# This is what your link looks like;
# on every step of the loop, change the URL by appending the page number
for page in range(1, pages_count + 1):
    html = get_html(url + str(page))
    # ...and then your own logic
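String concatenation works as long as the base URL ends in &p=. As an alternative, here is a standard-library sketch (the function name page_url is hypothetical) that sets or replaces the p query parameter regardless of what the base URL already contains:

```python
from urllib.parse import parse_qsl, urlencode, urlsplit, urlunsplit


def page_url(base_url, page):
    """Return base_url with its 'p' query parameter set to the given page number."""
    parts = urlsplit(base_url)
    # Keep every existing parameter except a previous 'p' (even an empty one)
    query = [(k, v) for k, v in parse_qsl(parts.query, keep_blank_values=True) if k != "p"]
    query.append(("p", str(page)))
    return urlunsplit(parts._replace(query=urlencode(query)))


if __name__ == "__main__":
    base = "https://www.avito.ru/murmanskaya_oblast/avtomobili/mitsubishi-arsgbagicautgtg3ymcg?cd=1"
    for page in range(1, 4):
        print(page_url(base, page))
```

For page 2 this yields the same ...?cd=1&p=2 address as the concatenation above, but it also works if the base URL has no query string or already carries a p value.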

Answer 2

In general, I counted the pages like this:

from bs4 import BeautifulSoup


def get_pages_count(html):
    soup = BeautifulSoup(html, 'html.parser')
    pagination = soup.find('div', class_='pagination-root-2ocjz')
    line = pagination.text
    # Picks the single character 8 positions from the end of the pagination text
    p_count = line[-8]
    print(p_count)

Naturally, replace print with return. This applies to Avito as of the time of writing. There may be a problem when there is only one page, because in that case the bottom of the page does not show any page numbers at all.
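Because line[-8] picks a single character, it only works while the last page number has one digit and the text after it keeps the same length. A minimal sketch (the function name pages_from_pagination_text is hypothetical) that takes every number in the pagination text and returns the largest, falling back to 1 when the block is missing or has no numbers, which covers the one-page case:

```python
import re


def pages_from_pagination_text(text):
    """Return the largest number found in the pagination text, or 1 if there is none."""
    if not text:
        # No pagination block at all: assume a single page of results
        return 1
    numbers = [int(n) for n in re.findall(r"\d+", text)]
    return max(numbers, default=1)


if __name__ == "__main__":
    # In the parser, something like:
    # pagination = soup.find("div", class_="pagination-root-2ocjz")
    # text = pagination.get_text(" ") if pagination else None
    print(pages_from_pagination_text("Prev 1 2 3 ... 12 Next"))  # illustrative text
    print(pages_from_pagination_text(None))
```

Passing a separator to get_text(" ") matters here: plain .text joins the strings of adjacent elements with nothing between them, so separate page numbers could be glued into one larger number.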

Answer 3

You can try the link below to get a list of ads, and then simply count the number of items in the JSON. Don't forget to set headers and cookies so that the site doesn't ban you.

Link: https://m.avito.ru/api/9/items?key=af0deccbgcgidddjgnvljitntccdduijhdinfgjgfjir&categoryId=9&params[1283]=14756&locationId=640000&params[110000]=329273&withImagesOnly=1&page=1&lastStamp=1611316560&display=list&limit=30
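A minimal standard-library sketch of this approach. The query parameters are copied from the link above, but the key and lastStamp values may no longer be valid. The response layout (ads listed under result, then items) is an assumption, not documented Avito behaviour, so check it against a real response first. The User-Agent header is a placeholder; add cookies from your own browser session as the answer suggests.

```python
import json
from urllib.parse import urlencode
from urllib.request import Request, urlopen

API_URL = "https://m.avito.ru/api/9/items"

# Parameters copied from the link in the answer; the values may have expired
PARAMS = {
    "key": "af0deccbgcgidddjgnvljitntccdduijhdinfgjgfjir",
    "categoryId": 9,
    "params[1283]": 14756,
    "locationId": 640000,
    "params[110000]": 329273,
    "withImagesOnly": 1,
    "lastStamp": 1611316560,
    "display": "list",
    "limit": 30,
}


def build_items_url(page):
    """Return the API URL for the given page of results."""
    return API_URL + "?" + urlencode({**PARAMS, "page": page})


def count_items(payload):
    """Count ads in a decoded response; ASSUMES they are listed under result -> items."""
    return len(payload.get("result", {}).get("items", []))


def fetch_items(page):
    """Download and decode one page (this makes a real network request)."""
    # Placeholder header; a real session will usually also need cookies
    request = Request(build_items_url(page), headers={"User-Agent": "Mozilla/5.0"})
    with urlopen(request, timeout=10) as response:
        return json.load(response)


if __name__ == "__main__":
    sample = {"result": {"items": [{"id": 1}, {"id": 2}]}}  # illustrative, not real data
    print(count_items(sample))
```

urlencode percent-encodes the square brackets (params%5B1283%5D), which servers normally decode back to params[1283].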

There is an article on how to parse Avito without getting banned. It is really only about phone numbers, but there is still something useful in it.

In general, parsing Avito through HTML is a thankless job.
