java - How to parse HTML in Java?

Answer 1, authority 100%

What’s so difficult?

They only provide ordinary JavaDoc documentation, but even there you can find almost everything you need.
For example:

Typical usage of the parser is:

Parser parser = new Parser("http://whatever");
NodeList list = parser.parse(null);
// do something with your list of nodes.

Then look a little further:

NodeList parse(NodeFilter filter)

NodeFilter -> here

In my opinion, it could hardly be simpler.

Never mind

bin/parser http://website_url [tag_name]
where tag_name is an optional tag name to be used as a filter, i.e.
A - Show only the link tags extracted from the document
IMG - Show only the image tags extracted from the document
TITLE - Extract the title from the document
NOTE: this is also the default program for the htmlparser.jar, so the
above could be:
java -jar lib/htmlparser.jar http://website_url [tag_name]

UPD:

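import org.htmlparser.Node;
import org.htmlparser.NodeFilter;
import org.htmlparser.Parser;
import org.htmlparser.filters.TagNameFilter;
import org.htmlparser.util.NodeList;
import org.htmlparser.util.ParserException;

public static void main(String[] args) {
  try {
    Parser parser = new Parser("http://www.alliance-bags.ru/catalog.php?tov=576");
    // The page is served in a Cyrillic encoding, so set it explicitly.
    parser.setEncoding("windows-1251");
    // Keep only <img> tags.
    NodeFilter imgFilter = new TagNameFilter("IMG");
    NodeList nodeList = parser.parse(imgFilter);
    for (int i = 0; i < nodeList.size(); i++) {
      Node node = nodeList.elementAt(i);
      // Print the raw HTML of each matching tag.
      System.out.println(node.toHtml());
    }
  } catch (ParserException e) {
    e.printStackTrace();
  }
}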

Answer 2, authority 27%

Answer 3, authority 18%

Answer 4, authority 18%

jsoup: Java HTML Parser:

Document doc = Jsoup.connect("http://en.wikipedia.org/").get();
Elements newsHeadlines = doc.select("#mp-itn b a");

Answer 5

Standard Java tools can be used. Why pull in an additional library just to retrieve the path to a picture?

If you only need to do it once, you can use DOM and XPath.
If you need to process a lot of large documents, it is better to use SAX. Once you have spent the time to learn these approaches, you will have no further trouble parsing not only HTML but any XML document.
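
As a rough illustration of both approaches, here is a minimal sketch that pulls the src attribute of every img tag using only the JDK. The class name ImageSources, the method names and the SAMPLE page are made up for this example. It also assumes the input is well-formed XML (XHTML); the standard parsers reject the unclosed or misnested tags that many real pages contain, so treat this as a sketch under that assumption rather than a general HTML solution.

```java
import java.io.StringReader;
import java.util.ArrayList;
import java.util.List;
import javax.xml.parsers.DocumentBuilderFactory;
import javax.xml.parsers.SAXParserFactory;
import javax.xml.xpath.XPathConstants;
import javax.xml.xpath.XPathFactory;
import org.w3c.dom.Document;
import org.w3c.dom.NodeList;
import org.xml.sax.Attributes;
import org.xml.sax.InputSource;
import org.xml.sax.helpers.DefaultHandler;

class ImageSources {
    // Hypothetical sample page; it must be well-formed XML for these parsers.
    static final String SAMPLE =
        "<html><body><p><img src=\"a.png\"/></p><img src=\"b.jpg\"/></body></html>";

    // DOM + XPath: load the whole document into memory, then query it.
    static List<String> viaXPath(String xhtml) throws Exception {
        Document doc = DocumentBuilderFactory.newInstance()
            .newDocumentBuilder()
            .parse(new InputSource(new StringReader(xhtml)));
        NodeList attrs = (NodeList) XPathFactory.newInstance().newXPath()
            .evaluate("//img/@src", doc, XPathConstants.NODESET);
        List<String> result = new ArrayList<>();
        for (int i = 0; i < attrs.getLength(); i++) {
            result.add(attrs.item(i).getNodeValue());
        }
        return result;
    }

    // SAX: stream the document and react to each start tag as it arrives.
    static List<String> viaSax(String xhtml) throws Exception {
        List<String> result = new ArrayList<>();
        DefaultHandler handler = new DefaultHandler() {
            @Override
            public void startElement(String uri, String localName,
                                     String qName, Attributes atts) {
                if ("img".equalsIgnoreCase(qName)) {
                    String src = atts.getValue("src");
                    if (src != null) {
                        result.add(src);
                    }
                }
            }
        };
        SAXParserFactory.newInstance().newSAXParser()
            .parse(new InputSource(new StringReader(xhtml)), handler);
        return result;
    }

    public static void main(String[] args) throws Exception {
        System.out.println(viaXPath(SAMPLE));
        System.out.println(viaSax(SAMPLE));
    }
}
```

The DOM version is shorter to write but holds the whole tree in memory, while the SAX version keeps only the values it collects, which is why it suits large documents better.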

Answer 6

Take a look at this. It works on a fairly simple principle and supports invalid pages. There is a collection of objects mapped to tags. Very convenient.
