How to parse HTML files instead of websites with Jsoup?

I have a directory of downloaded HTML files that I want to parse, but I cannot figure out how to make Jsoup read these files (the HTML files are scattered across folders) for further parsing.
The code below throws an error; I do not know how to specify the path (ideally I need all the HTML files in a directory).

import org.jsoup.*;
import org.jsoup.nodes.Document;
import org.jsoup.nodes.Element;
import org.jsoup.select.Elements;
import java.io.IOException;
public class ParsingGcTest {
  public static void main(String[] args) throws IOException {
    Document document = Jsoup.connect("C:\\Users\\71376160\\Desktop\\docs\\specs\\security\\standard-names.html").get();
    Elements elements = document.select("code");
    for (Element element : elements) {
      System.out.println(element);
    }
  }
}
Exception in thread "main" java.lang.IllegalArgumentException: Malformed URL: C:\Users\71376160\Desktop\docs\specs\security\standard-names.html
  at org.jsoup.helper.HttpConnection.url(HttpConnection.java:131)
  at org.jsoup.helper.HttpConnection.connect(HttpConnection.java:70)
  at org.jsoup.Jsoup.connect(Jsoup.java:73)
  at ParsingGcTest.main(ParsingGcTest.java:10)
Caused by: java.net.MalformedURLException: unknown protocol: c
  at java.base/java.net.URL.<init>(URL.java:679)
  at java.base/java.net.URL.<init>(URL.java:568)
  at java.base/java.net.URL.<init>(URL.java:515)
  at org.jsoup.helper.HttpConnection.url(HttpConnection.java:129)
  ... 3 more

Answer 1, Authority 100%

The error occurs because connect expects the URL of a website, so it took the C before the colon for a protocol name (after all, C: and http: do look alike :))
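To see this in isolation, here is a minimal sketch that uses only java.net.URL, which according to the stack trace in the question is what Jsoup.connect ends up calling. The class name UrlVsPath and the helper urlError are made up for illustration and are not part of Jsoup.

```java
import java.net.MalformedURLException;
import java.net.URL;

public class UrlVsPath {

    // Hypothetical helper: returns the parser's error message,
    // or null if the string is accepted as a URL.
    static String urlError(String spec) {
        try {
            new URL(spec);
            return null;
        } catch (MalformedURLException e) {
            return e.getMessage();
        }
    }

    public static void main(String[] args) {
        String path = "C:\\Users\\71376160\\Desktop\\docs\\specs\\security\\standard-names.html";
        // The text before the first colon is read as a protocol, and no such protocol is known.
        System.out.println(urlError(path));
        // A real web address is accepted, so no error message is returned.
        System.out.println(urlError("http://example.com/"));
    }
}
```

The first call should report the same kind of "unknown protocol" failure as the trace in the question, while the second returns null.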

Try parsing the downloaded file instead:

String filename = "C:\\Users\\71376160\\Desktop\\docs\\specs\\security\\standard-names.html";
File input = new File(filename);
Document document = Jsoup.parse(input, "UTF-8", "http://example.com/");
...

The third parameter of parse is baseUri; it is used to resolve relative links in the document into absolute URLs. If you do not need that, you can pass an empty string.
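The question also asks for every HTML file in a directory. One possible approach, sketched here with only the standard java.nio.file API, is to walk the folder tree and collect the matching paths first. The class name HtmlFileFinder, the method findHtmlFiles, and the choice to match both .html and .htm are assumptions made for this example.

```java
import java.io.IOException;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.Paths;
import java.util.List;
import java.util.Locale;
import java.util.stream.Collectors;
import java.util.stream.Stream;

public class HtmlFileFinder {

    // Recursively collects every regular file under root whose name ends in .html or .htm.
    static List<Path> findHtmlFiles(Path root) throws IOException {
        try (Stream<Path> paths = Files.walk(root)) {
            return paths
                    .filter(Files::isRegularFile)
                    .filter(p -> {
                        String name = p.getFileName().toString().toLowerCase(Locale.ROOT);
                        return name.endsWith(".html") || name.endsWith(".htm");
                    })
                    .sorted()
                    .collect(Collectors.toList());
        }
    }

    public static void main(String[] args) throws IOException {
        // Assumption: the folder to scan is passed as the first argument,
        // falling back to the current directory.
        Path root = Paths.get(args.length > 0 ? args[0] : ".");
        for (Path file : findHtmlFiles(root)) {
            // Each file can then be handed to Jsoup as in the answer, for example:
            // Document document = Jsoup.parse(file.toFile(), "UTF-8", "");
            System.out.println(file);
        }
    }
}
```

Keeping the file search separate from the parsing means the list of paths can be checked on its own before any document is loaded.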
