Parsing XML in Python

I need to parse an XML file and pull out certain tags one by one, for example the name tag.

The following code does not work:

import xml.etree.ElementTree as ET
root = ET.parse('fayl.xml').getroot()
for type_tag in root.findall('shop/offers/offer'):
    value = type_tag.get('name')
    print(value)

The file is here: https://sotiknadom.ru/snprice.xml

Please help!


Answer 1, Authority 100%

An option using BeautifulSoup:

# pip install bs4 lxml (if not installed)
from bs4 import BeautifulSoup as Soup
if __name__ == '__main__':
    with open('snprice.xml', 'r', encoding='utf-8') as xml:
        # 'lxml' is the HTML parser; 'xml' would select lxml's XML parser instead
        soup = Soup(xml.read(), 'lxml')
    names = [offer.find('name').text for offer in soup.find_all('offer')]
    # unpack the list so that sep is applied between the names
    print(*names, sep='\n')

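And here is a working version of your implementation. The original called type_tag.get('name'), which reads an attribute of offer; name is a child element, so find('name').text is needed: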

import xml.etree.ElementTree as ET
xml_file = ET.parse('snprice.xml')
for type_tag in xml_file.findall('shop/offers/offer'):
    value = type_tag.find('name').text
    print(value)
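
The difference between the two calls can be seen on a small inline sample. This is only a minimal sketch: the sample document, including its catalog root element and the offer values, is an assumption modeled on the paths used above, not the contents of the real file.

```python
import xml.etree.ElementTree as ET

# Hypothetical sample shaped like the path shop/offers/offer/name used above;
# the real snprice.xml may look different.
SAMPLE = """<catalog>
  <shop>
    <offers>
      <offer id="1"><name>Phone A</name></offer>
      <offer id="2"><name>Phone B</name></offer>
    </offers>
  </shop>
</catalog>"""

root = ET.fromstring(SAMPLE)
offers = root.findall('shop/offers/offer')

# get() looks up an attribute of <offer>; there is no "name" attribute here,
# so it returns None for every offer.
attribute_values = [offer.get('name') for offer in offers]

# find() locates the child element <name>; .text is its text content.
element_values = [offer.find('name').text for offer in offers]

print(attribute_values)
print(element_values)
```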

Or you can shorten the code slightly:

from xml.etree.ElementTree import parse
print(
    *(
        type_tag.text for type_tag in parse('snprice.xml').findall('shop/offers/offer/name')
    ),
    sep='\n'
)

This shorter form is not recommended unless you know the data source and do not need any checks along the way.
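
If checks are needed, one possible approach is to skip offers that lack a name rather than let .text fail on None. The following is a minimal sketch assuming the same shop/offers/offer layout; the helper name offer_names and the sample data are hypothetical.

```python
import xml.etree.ElementTree as ET


def offer_names(root):
    """Return the text of each <name> under shop/offers/offer, skipping missing or empty ones."""
    names = []
    for offer in root.findall('shop/offers/offer'):
        # findtext() returns the default when <name> is absent,
        # and an empty string when <name> exists but has no text.
        name = offer.findtext('name', default='').strip()
        if name:
            names.append(name)
    return names


# Hypothetical sample: the second offer has no <name>, the third has an empty one.
SAMPLE = """<catalog>
  <shop>
    <offers>
      <offer id="1"><name>Phone A</name></offer>
      <offer id="2"><price>100</price></offer>
      <offer id="3"><name/></offer>
      <offer id="4"><name> Phone C </name></offer>
    </offers>
  </shop>
</catalog>"""

root = ET.fromstring(SAMPLE)
print(*offer_names(root), sep='\n')
```

With a real file, the root would come from ET.parse('snprice.xml').getroot() instead of the inline string.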


Answer 2

For anyone who runs into an XML structure like this:

<?xml version="1.0"?>
<feed xmlns="http://www.w3.org/2005/Atom" xmlns:g="http://base.google.com/ns/1.0">
<title>SotikNadom</title>
<link rel="alternate" type="text/html">https://sotiknadom.ru/</link>
<updated>2020-08-01T06:03:43Z</updated>
<entry>
<g:title>Monitor Acer K242HLbd (black)</g:title>
<g:link>https://sotiknadom.ru/monitory/monitor-acer-k242hlbd-chernyj</g:link>
<g:price>8800.00 RUB</g:price>
<g:id>51</g:id>
<g:availability>in stock</g:availability>
<g:condition>new</g:condition>

Do it like this:

import xml.etree.ElementTree as ET
xml_file = ET.parse('sngoogle2.xml')
for type_tag in xml_file.findall('{http://www.w3.org/2005/Atom}entry'):
    value = type_tag.find('{http://base.google.com/ns/1.0}price').text
    print(value)

Documentation: https://docs.python.org/3/library/xml.etree.elementtree.html#parsing-xml-with-namespaces
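
The linked documentation also describes passing a prefix-to-URI dictionary, which avoids repeating the full namespace in braces. A minimal sketch of that variant follows; the prefix atom and the dictionary name NS are arbitrary choices made here, and the inline sample is a shortened copy of the feed above with closing tags added.

```python
import xml.etree.ElementTree as ET

# Prefixes are local to this script; only the URIs have to match the document.
NS = {
    'atom': 'http://www.w3.org/2005/Atom',
    'g': 'http://base.google.com/ns/1.0',
}

SAMPLE = """<?xml version="1.0"?>
<feed xmlns="http://www.w3.org/2005/Atom" xmlns:g="http://base.google.com/ns/1.0">
<title>SotikNadom</title>
<entry>
<g:title>Monitor Acer K242HLbd (black)</g:title>
<g:price>8800.00 RUB</g:price>
</entry>
</feed>"""

root = ET.fromstring(SAMPLE)

# 'atom:entry' is expanded to '{http://www.w3.org/2005/Atom}entry' using NS.
prices = [
    entry.findtext('g:price', namespaces=NS)
    for entry in root.findall('atom:entry', NS)
]
print(prices)
```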
