Avito Parsing Python

I have a problem parsing the subsequent pages. With the approach I have settled on, I can only pull the first page. If I use pagination = soup.find('div', class_='pagination-root-2ocjz') instead, everything seems fine: the entire list of pages shows up in the HTML, but I do not understand how to extract the page numbers from it.
The page itself: https://www.avito.ru/murmanskaya_oblast/avtomobili/mitsubishi-asgbagicautgtg3ymcg?cd=1

import requests
from bs4 import BeautifulSoup

# URL, get_html() and get_content() come from the rest of my script (not shown here).

def get_pages_count(html):
    soup = BeautifulSoup(html, 'html.parser')
    pagination = soup.find('span', class_='pagination-item-1WYVP').find_next('span')['data-marker']
    if pagination:
        return int(pagination[-2])
    else:
        return 1

def parse():
    html = get_html(URL)
    if html.status_code == 200:
        cars = []
        pages_count = get_pages_count(html.text)
        for page in range(1, pages_count + 1):
            print(f'Parsing page {page} of {pages_count}...')
            html = get_html(URL, params={'p': page})
            cars.extend(get_content(html.text))
        print(cars)
    else:
        print('error')

parse()

Answer 1

url = "https://www.avito.ru/murmanskaya_oblast/avtomobili/mitsubishi-arsgbagicautgtg3ymcg?cd=1&p="
# This is what your link looks like
# On every iteration of the loop, change the URL
for page in range(1, pages_count + 1):
    html = get_html(url + str(page))
    # and then your logic
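
A variation on the same idea, as a minimal sketch: rather than appending the page number to the string, set the p query parameter explicitly, so it also works when the link already contains a p value. The helper name page_url is hypothetical; only the standard library is used, so nothing here depends on how get_html is written.

```python
from urllib.parse import parse_qsl, urlencode, urlsplit, urlunsplit


def page_url(base_url, page):
    """Return base_url with its 'p' query parameter set to the given page."""
    parts = urlsplit(base_url)
    # Keep the existing parameters (for example cd=1) and overwrite only 'p'.
    query = dict(parse_qsl(parts.query, keep_blank_values=True))
    query["p"] = str(page)
    return urlunsplit(parts._replace(query=urlencode(query)))


base = "https://www.avito.ru/murmanskaya_oblast/avtomobili/mitsubishi-asgbagicautgtg3ymcg?cd=1"
for page in range(1, 4):
    print(page_url(base, page))
```

Inside the loop from the answer above you would then call get_html(page_url(url, page)) instead of get_html(url + str(page)).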

Answer 2

In the end, I counted the pages like this:

from bs4 import BeautifulSoup

def get_pages_count(html):
    soup = BeautifulSoup(html, 'html.parser')
    pagination = soup.find('div', class_='pagination-root-2ocjz')
    line = pagination.text
    # The page count sits 8 characters from the end of the pagination text
    p_count = line[-8]
    print(p_count)

Naturally, put return in place of print. This works for Avito as it is at the time of writing. There may be a problem when there is only one page, since in that case the page numbers are not shown at the bottom of the page at all.
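
To cover the single-page case and page counts with more than one digit, one possible sketch is to collect the labels of the pagination items and take the largest number, falling back to 1 when there are none. The function name pages_count_from_labels and the sample labels are hypothetical, and I am assuming the pagination block shows each page number as its own span, which may not match Avito's actual markup.

```python
def pages_count_from_labels(labels):
    """Return the largest page number among the labels, or 1 if there is none."""
    numbers = []
    for label in labels:
        text = label.strip()
        # Skip arrows, ellipses and other non-numeric items.
        if text.isdecimal():
            numbers.append(int(text))
    return max(numbers, default=1)


# Hypothetical labels, as they might be collected with something like
# [span.get_text() for span in pagination.find_all('span')]
print(pages_count_from_labels(["←", "1", "2", "3", "...", "12", "→"]))  # 12
print(pages_count_from_labels([]))  # 1, no pagination block on the page
```

If soup.find returns None because the pagination block is absent, pass an empty list and the function returns 1.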


Answer 3

You can try fetching the list of ads from the link below and then simply counting the number of items in the JSON. Do not forget to set headers and cookies, so that the site does not block you.

Link: https://m.avito.ru/api/9/items?key=af0deccbgcgidddjgnvljitntccdduijhdinfgjgfjir&categoryId=9&params[1283]=14756&locationId=640000&params[110000]=329273&withImagesOnly=1&page=1&lastStamp=1611316560&display=list&limit=30
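
Counting the items could look roughly like the sketch below. The response layout is an assumption on my part: I am guessing the ads sit in a list under result → items, so check the real JSON and adjust the keys. The function name count_items and the sample payload are made up for illustration.

```python
import json


def count_items(payload):
    """Count the ads in a decoded JSON response.

    Assumes (unverified) that the ads are a list under payload["result"]["items"].
    """
    items = payload.get("result", {}).get("items", [])
    return len(items)


# Made-up sample in place of a real response. With requests you would get the
# decoded payload from requests.get(url, headers=..., cookies=...).json()
sample = json.loads('{"status": "ok", "result": {"items": [{"id": 1}, {"id": 2}]}}')
print(count_items(sample))  # 2
```

Using .get with defaults means a response without those keys counts as zero ads instead of raising a KeyError.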

There is also an article on how to parse Avito without getting banned. Admittedly, it only covers phone numbers, but there is still something useful in it.

In general, parsing Avito through HTML is a thankless job.
