I can't extract all the links to the residential complex (ZhK) cards; with find I only manage to get one link.
On the site, the markup looks like this:
<div data-name="container" data-mark="gkcardtitle" class="_0fce717cdb--container--1gxqr _0fce717cdb--container-background_color--transparent--3pvxk _0fce717cdb--container-display--inline-block--3beb" style="padding: 4px 12px 0 0"><a target="_blank" title="" data-mark="Link" href="https://zhk-evropeyskiy-krasnodar.cian.ru/" class="_0fce717cdb--element--2VDM4">
<span data-name="text" data-mark="text" class="_0fce717cdb--element--1DA0Y _0fce717cdb--element-color--blue--QWZIQ _0fce717cdb--element-display--inline--1FKWO _0fce717cdb--element-font_weight--bold--1L-AO _0fce717cdb--element-word_wrap--normal--3WGCE _0fce717cdb--element-white_space--normal--3WKKF _0fce717cdb--element-font_size--18--B8ELB _0fce717cdb--element-line_height--22--38O8Y _0fce717cdb--element-color_hovered--red--Ops-L">ZhK "European"</span></a></div>
Link to the page itself: https://krasnodar.cian.ru/novostroyki
The find method with a more specific class filter:
soup.find('div', class_='_0fce717cdb--container--1gxqr _0fce717cdb--container-background_color--transparent--3pvxk _0fce717cdb--container-display--inline-block--3beb').find('a').get('href')
works and finds the required link.
But with the find_all method I can't get the desired result.
Answer 1, Authority 100%
import requests
from bs4 import BeautifulSoup as Soup
from bs4.element import Tag

response = requests.get('https://krasnodar.cian.ru/novostroyki')
soup = Soup(response.content, 'html.parser')


def link_from_header(header: Tag):
    # Each card title div wraps a single <a> that holds the link
    a = header.find('a')
    return a.get('href')


link_list = [*map(link_from_header, soup.find_all('div', {'data-mark': 'gkcardtitle'}))]
print(*link_list, sep='\n')
# https://zhk-evropeyskiy-krasnodar.cian.ru/
# https://krasnodar.cian.ru/zhiloy-kompletks-samolet-krasnodar-353100/
# https://zhk-skazka-grad-krasnodar.cian.ru/
# https://krasnodar.cian.ru/zhiloy-kompleks-park-pobedy-krasnodar-1686651/
# https://zhk-strizhi-krasnodar.cian.ru/
# https://krasnodar.cian.ru/zhiloy-kompletks-abrikosovo-krasnodar-16066/
# https://krasnodar.cian.ru/zhiloy-kompleks-elegant-krasnodar-8304/
# https://krasnodar.cian.ru/zhiloy-kompleks-dostoyanie-krasnodar-1789905/
# https://zhk-sportivnaya-derevnya-krasnodar.cian.ru/
# https://krasnodar.cian.ru/zhiloy-kompleks-otkrytie-krasnodar-50168/
# https://zhk-gubernskiy-krasnodar.cian.ru/
# https://krasnodar.cian.ru/zhiloy-kompleks-rakurs-krasnodar-1659959/
# https://krasnodar.cian.ru/zhiloy-kompleks-dyhanie-krasnodar-39245/
# https://krasnodar.cian.ru/zhiloy-kompleks-Development-Plaza-Krasnodar-144280/
# https://zhk-sportivnyy-park-krasnodar.cian.ru/
# https://zhk-melodiya-krasnodar.cian.ru/
# https://zhk-solnechnyy-gorod-krasnodar.cian.ru/
# https://krasnodar.cian.ru/zhiloy-kompleks-novella-krasnodar-1706586/
# https://krasnodar.cian.ru/zhiloy-kompleks-svoboda-krasnodar-33666/
# https://krasnodar.cian.ru/zhiloy-kompleks-lime-krasnodar-1276272/
# https://krasnodar.cian.ru/zhiloy-kompletks-serdce-shkolnyy-mkr-47259/
# https://krasnodar.cian.ru/zhiloy-kompleks-grani-krasnodar-48852/
# https://krasnodar.cian.ru/zhiloy-kompleks-Rezidenciya-kozHzavod-mkr-7135/
# https://krasnodar.cian.ru/zhiloy-kompleks-sedmoy-kontinent-krasnodar-8390/
# https://krasnodar.cian.ru/zhiloy-kompleks-yuzhane-krasnodar-23921/
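The chained call from the question stops working with find_all because find_all returns a list-like ResultSet rather than a single Tag, so each element has to be processed in turn. Here is a minimal offline sketch of that difference; the sample markup and the example.com URLs are made up for illustration and are not taken from the site:

```python
from bs4 import BeautifulSoup

# Hypothetical sample markup that mimics two card titles from the page.
SAMPLE_HTML = """
<div data-mark="gkcardtitle"><a href="https://example.com/first/">First</a></div>
<div data-mark="gkcardtitle"><a href="https://example.com/second/">Second</a></div>
"""

soup = BeautifulSoup(SAMPLE_HTML, "html.parser")

# find() returns a single Tag (or None), so further calls can be chained.
first_link = soup.find("div", {"data-mark": "gkcardtitle"}).find("a").get("href")

# find_all() returns a list-like ResultSet, so each element is handled in turn.
headers = soup.find_all("div", {"data-mark": "gkcardtitle"})
all_links = [header.find("a").get("href") for header in headers]

print(first_link)
print(all_links)
```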
Answer 2, Authority 200%
import requests
from bs4 import BeautifulSoup

url = 'https://krasnodar.cian.ru/novostroyki/'
r = requests.get(url)
soup = BeautifulSoup(r.content, 'html.parser')

# Each card is a div marked with data-mark="gkcard"
cards = soup.find_all('div', {'data-mark': 'gkcard'})
for card in cards:
    title = card.find('span', {'data-mark': 'text'}).text
    link = card.find('a', {'data-mark': 'Link'})['href']
    print(f'{title} {link}')
This prints:
ZhK "European" https://zhk-evropeyskiy-krasnodar.cian.ru/#map
ZhK "Airplane" https://krasnodar.cian.ru/zhiloy-kompleks-samolet-krasnodar-stroitelstva/
ZhK "Fairy Tale Grad" https://zhk-skazka-grad-krasnodar.cian.ru/hod-stroitelstva/
...
ZhK "Residence" https://krasnodar.cian.ru/zhiloy-kompleks-Rezidenciya-kozHzavod-mkr-7135/hod-stroitelstva/
ZhK "Seventh Continent" https://krasnodar.cian.ru/zhiloy-kompleks-Sedmoy-kontinent-Krasnodar-8390/otzyvy/
ZhK "Yazhne" https://krasnodar.cian.ru/zhiloy-kompleks-yuzhane-krasnodar-23921/hod-stroitelstva/
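If a card happens to lack the title span or the link, find returns None and the loop above would raise an exception. A possible defensive variant is sketched below on made-up markup; the helper name extract_cards, the sample HTML, and the example.com URL are assumptions for illustration only:

```python
from bs4 import BeautifulSoup

# Hypothetical sample: the second card has a title but no link.
SAMPLE_HTML = """
<div data-mark="gkcard">
  <a data-mark="Link" href="https://example.com/one/"><span data-mark="text">One</span></a>
</div>
<div data-mark="gkcard">
  <span data-mark="text">No link here</span>
</div>
"""


def extract_cards(html):
    """Return (title, link) pairs, skipping cards that lack either element."""
    soup = BeautifulSoup(html, "html.parser")
    results = []
    for card in soup.find_all("div", {"data-mark": "gkcard"}):
        title_tag = card.find("span", {"data-mark": "text"})
        link_tag = card.find("a", {"data-mark": "Link"})
        if title_tag is None or link_tag is None:
            continue  # incomplete card, nothing to report
        results.append((title_tag.get_text(strip=True), link_tag.get("href")))
    return results


pairs = extract_cards(SAMPLE_HTML)
print(pairs)
```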
Answer 3, Authority 50%
You can collect all the 'div' elements that have the attribute data-mark="gkcardtitle" and extract the values from them. Using regular expressions, you can then get all the names:
import re

import requests
from bs4 import BeautifulSoup

page = requests.get("https://krasnodar.cian.ru/novostroyki/")
soup = BeautifulSoup(page.content, 'html.parser')
divs = soup.find_all('div', {"data-mark": "gkcardtitle"})
# Capture the text between data-name="text"> and the closing </span>
matches = re.findall(r'(?<=data\-name\=\"text\"\>)(.*?)(?=\<\/span\>)', str(divs))
print(matches)
# ['ZhK "European"', 'ZhK "Airplane"', 'ZhK "Fairy Tale Hrad"', 'ZhK "Park Victory"', 'ZhK "Strey"', 'ZhK "Apricosovo"', 'ZhK "Elegant"', 'ZhK "Treasure"', 'ZhK "Sports Village"', 'ZhK "Opening"', 'ZhK "Gubernsky"', 'ZhK "Rakurs"', 'ZhK "Breath"', 'ZhK "Development Plaza (Plaza Development)"', 'ZhK "Sports Park"', 'ZhK "Melody"', 'ZhK "Zelenodar"', 'ZhK "NOVELLA (NOVEL)"', 'ZhK "Freedom"', 'ZhK "Lime (Lime)"', 'ZhK "Heart"', 'ZhK "Rib"', 'ZhK "Residence"', 'ZhK "Seventh Continent"', 'ZhK "Yazhne"']
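To try the lookbehind/lookahead pattern without requesting the page, it can be run against a hand-written string. The sample below is invented for illustration. Note that the pattern only matches when data-name="text" is immediately followed by >, so it depends on the attribute order in the serialized HTML:

```python
import re

# Hypothetical serialized markup with data-name="text" as the last attribute.
SAMPLE = (
    '<span data-name="text">First</span>'
    '<span data-name="text">Second</span>'
)

# Lookbehind anchors on the opening tag's tail, lookahead on the closing tag;
# only the lazily matched text in between is captured.
PATTERN = r'(?<=data-name="text">)(.*?)(?=</span>)'

names = re.findall(PATTERN, SAMPLE)
print(names)
```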