
Python, BeautifulSoup: can't get all the links from the page


I can't pull out all the links to the residential-complex (ЖК) cards; using find I only manage to get one link.

The markup on the site looks like this:

<div data-name="Container" data-mark="GKCardTitle" class="_0fce717cdb--container--1gxqr _0fce717cdb--container-background_color--transparent--3pvxk _0fce717cdb--container-display--inline-block--3beb" style="padding: 4px 12px 0 0"><a target="_blank" title="" data-mark="Link" href="https://zhk-evropeyskiy-krasnodar.cian.ru/" class="_0fce717cdb--element--2vdm4">
<span data-name="Text" data-mark="Text" class="_0fce717cdb--element--1da0y _0fce717cdb--element-color--blue--qwziq _0fce717cdb--element-display--inline--1fkwo _0fce717cdb--element-font_weight--bold--1l-ao _0fce717cdb--element-word_wrap--normal--3wgce _0fce717cdb--element-white_space--normal--3wkkf _0fce717cdb--element-font_size--18--b8elb _0fce717cdb--element-line_height--22--38o8y _0fce717cdb--element-color_hovered--red--ops-l">ЖК «Европейский»</span></a></div>

Link to the page itself: https://krasnodar.cian.ru/novostroyki

The find method with the class specified

soup.find('div', class_='_0fce717cdb--container--1gxqr _0fce717cdb--container-background_color--transparent--3pvxk _0fce717cdb--container-display--inline-block--3beb').find('a').get('href')

This works and finds the required link.

But I can't achieve the desired result with the find_all method.
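One thing worth checking (a guess, since the failing find_all call is not shown in the question): find returns a single Tag, while find_all returns a ResultSet, i.e. a list of tags, so .find('a').get('href') has to be applied to each element rather than to the list itself. A minimal sketch, assuming the same soup object and the same class filter as above:

divs = soup.find_all('div', class_='_0fce717cdb--container--1gxqr _0fce717cdb--container-background_color--transparent--3pvxk _0fce717cdb--container-display--inline-block--3beb')
links = [div.find('a').get('href') for div in divs]  # one href per matched div
print(links)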


Answer 1, Authority 100%

import requests
from bs4 import BeautifulSoup as Soup
from bs4.element import Tag

response = requests.get('https://krasnodar.cian.ru/novostroyki')
soup = Soup(response.content, 'html.parser')

def link_from_header(header: Tag):
    # each card title container holds a single <a> with the card link
    a = header.find('a')
    return a.get('href')

link_list = [*map(link_from_header, soup.find_all('div', {'data-mark': 'GKCardTitle'}))]
print(*link_list, sep='\n')
# https://zhk-evropeyskiy-krasnodar.cian.ru/
# https://krasnodar.cian.ru/zhiloy-kompleks-samolet-krasnodar-353100/
# https://zhk-skazka-grad-krasnodar.cian.ru/
# https://krasnodar.cian.ru/zhiloy-kompleks-park-pobedy-krasnodar-1686651/
# https://zhk-strizhi-krasnodar.cian.ru/
# https://krasnodar.cian.ru/zhiloy-kompleks-abrikosovo-krasnodar-16066/
# https://krasnodar.cian.ru/zhiloy-kompleks-elegant-krasnodar-8304/
# https://krasnodar.cian.ru/zhiloy-kompleks-dostoyanie-krasnodar-1789905/
# https://zhk-sportivnaya-derevnya-krasnodar.cian.ru/
# https://krasnodar.cian.ru/zhiloy-kompleks-otkrytie-krasnodar-50168/
# https://zhk-gubernskiy-krasnodar.cian.ru/
# https://krasnodar.cian.ru/zhiloy-kompleks-rakurs-krasnodar-1659959/
# https://krasnodar.cian.ru/zhiloy-kompleks-dyhanie-krasnodar-39245/
# https://krasnodar.cian.ru/zhiloy-kompleks-development-plaza-krasnodar-144280/
# https://zhk-sportivnyy-park-krasnodar.cian.ru/
# https://zhk-melodiya-krasnodar.cian.ru/
# https://zhk-solnechnyy-gorod-krasnodar.cian.ru/
# https://krasnodar.cian.ru/zhiloy-kompleks-novella-krasnodar-1706586/
# https://krasnodar.cian.ru/zhiloy-kompleks-svoboda-krasnodar-33666/
# https://krasnodar.cian.ru/zhiloy-kompleks-lime-krasnodar-1276272/
# https://krasnodar.cian.ru/zhiloy-kompleks-serdce-shkolnyy-mkr-47259/
# https://krasnodar.cian.ru/zhiloy-kompleks-grani-krasnodar-48852/
# https://krasnodar.cian.ru/zhiloy-kompleks-rezidenciya-kozhzavod-mkr-7135/
# https://krasnodar.cian.ru/zhiloy-kompleks-sedmoy-kontinent-krasnodar-8390/
# https://krasnodar.cian.ru/zhiloy-kompleks-yuzhane-krasnodar-23921/
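
For reference, the same list can also be built with a CSS attribute selector. A minimal sketch, reusing the soup object from the code above and assuming the same data-mark value:

link_list = [a.get('href') for a in soup.select('div[data-mark="GKCardTitle"] a')]
print(*link_list, sep='\n')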

Answer 2, Authority 200%

import requests
from bs4 import BeautifulSoup

url = 'https://krasnodar.cian.ru/novostroyki/'
r = requests.get(url)
soup = BeautifulSoup(r.content, 'html.parser')

cards = soup.find_all('div', {'data-mark': 'GKCard'})
for card in cards:
    # the name and the link sit in tags marked Text and Link inside each card
    title = card.find('span', {'data-mark': 'Text'}).text
    link = card.find('a', {'data-mark': 'Link'})['href']
    print(f'{title} {link}')

It will print:

ЖК «Европейский» https://zhk-evropeyskiy-krasnodar.cian.ru/#map
ЖК «Самолет» https://krasnodar.cian.ru/zhiloy-kompleks-samolet-krasnodar-stroitelstva/
ЖК «Сказка Град» https://zhk-skazka-grad-krasnodar.cian.ru/hod-stroitelstva/
...
ЖК «Резиденция» https://krasnodar.cian.ru/zhiloy-kompleks-rezidenciya-kozhzavod-mkr-7135/hod-stroitelstva/
ЖК «Седьмой континент» https://krasnodar.cian.ru/zhiloy-kompleks-sedmoy-kontinent-krasnodar-8390/otzyvy/
ЖК «Южане» https://krasnodar.cian.ru/zhiloy-kompleks-yuzhane-krasnodar-23921/hod-stroitelstva/

Answer 3, Authority 50%

You can collect all the div tags with data-mark="GKCardTitle" and pull the values out of them. Using regular expressions you can then get all the required names:

import re
import requests
from bs4 import BeautifulSoup

page = requests.get("https://krasnodar.cian.ru/novostroyki/")
soup = BeautifulSoup(page.content, 'html.parser')

divs = soup.find_all('div', {"data-mark": "GKCardTitle"})
# take everything between the end of the opening span tag and </span>
matches = re.findall(r'(?<=data-name="Text">)(.*?)(?=</span>)', str(divs))
print(matches)
# ['LCD "European"', 'LCD "Airplane",' LCD "Fairy Tale Hrad", 'LCD "Park Victory"', 'LCD "Strey"', 'LCD "Apricosovo",' LCD Elegant »',' LCD" Treasure "',' LCD" Sports Village "',' LCD" Opening ", 'LCD" Gubernsky "', 'LCD" Rakurs ",' LCD" Breath ", 'LCD" Development Plaza (Plaza Development) »',' LCD" Sports Park "',' LCD" Melody "',' LCD Zelenodar ',' NOVELLA (NOVEL)», 'LCD "Freedom",' LCD " Lime (Lime) », 'LCD" Heart ",' LCD" Rib "',' LCD" Residence ", 'LCD" Seventh Continent "', 'LCD Yazhne]
