Site parsing in Java using jsoup

I decided to write a simple TV schedule program using jsoup. The task itself is rather boring, but it is completely new to me: I have never worked with a parser before. I am not asking for help writing the code, but for guidance on how to work with a parser. I have read articles, but they only cover extracting headings and the like. How, for example, do I pull the schedule for Channel One out of a page? And if you think jsoup is a poor choice, please recommend another library.


Answer 1, Authority 100%

Well, what’s the problem?

  1. Download the document.
  2. Navigate to the element that contains the schedule.
  3. In the same way, find the data elements you need.

Which HTML elements and attributes you need is determined by carefully reading the source of the site you are going to scrape.

The jsoup site even has a very simple example (no navigation, it just lists all links).
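
The three steps can be sketched roughly as follows. This is a minimal illustration under stated assumptions: the `schedule` id, the `item`, `time`, and `title` class names, and the sample HTML are all hypothetical and not taken from any real site. The sketch parses an inline string so that it runs offline; for a live page the document would typically come from `Jsoup.connect(url).get()` instead.

```java
import java.util.ArrayList;
import java.util.List;

import org.jsoup.Jsoup;
import org.jsoup.nodes.Document;
import org.jsoup.nodes.Element;

public class ScheduleSketch {
    // Hypothetical markup standing in for a real TV-guide page.
    static final String HTML =
        "<html><body><div id=\"schedule\">"
        + "<div class=\"item\"><span class=\"time\">06:00</span>"
        + "<span class=\"title\">News</span></div>"
        + "<div class=\"item\"><span class=\"time\">07:30</span>"
        + "<span class=\"title\">Morning show</span></div>"
        + "</div></body></html>";

    static List<String> extract(Document doc) {
        List<String> result = new ArrayList<>();
        // Step 2: navigate to the element that contains the schedule.
        Element schedule = doc.getElementById("schedule");
        if (schedule == null) {
            return result; // the page has no such container
        }
        // Step 3: find the data elements inside it in the same way.
        for (Element item : schedule.select("div.item")) {
            String time = item.select("span.time").text();
            String title = item.select("span.title").text();
            result.add(time + " " + title);
        }
        return result;
    }

    public static void main(String[] args) {
        // Step 1: obtain the document (parsed from a string here).
        Document doc = Jsoup.parse(HTML);
        for (String line : extract(doc)) {
            System.out.println(line);
        }
    }
}
```

On a real site, the selectors have to be replaced with whatever ids and classes the page source actually uses.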


Answer 2, Authority 112%

There are two types of HTML/XML parsers:

  1. SAX parser: parses in streaming mode. HTML/XML is fed to the input, and at certain points so-called handlers are triggered, that is, interceptors that say "the parser has just encountered such-and-such an element." The programmer puts their own code into the handler to do whatever is needed.
  2. DOM parser: the whole source is fed in, and a tree comes out, sometimes quite a complicated one.
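
To make the difference concrete, here is a small sketch that reads the same data both ways using the XML parsers bundled with the JDK (not jsoup). The `schedule`/`show` element names and the `time` attribute are made up for the illustration.

```java
import java.io.ByteArrayInputStream;
import java.io.InputStream;
import java.nio.charset.StandardCharsets;
import java.util.ArrayList;
import java.util.List;

import javax.xml.parsers.DocumentBuilderFactory;
import javax.xml.parsers.SAXParserFactory;

import org.w3c.dom.Document;
import org.w3c.dom.NodeList;
import org.xml.sax.Attributes;
import org.xml.sax.helpers.DefaultHandler;

public class SaxVsDom {
    // Hypothetical input document.
    static final String XML = "<schedule>"
        + "<show time=\"06:00\">News</show>"
        + "<show time=\"07:30\">Morning show</show>"
        + "</schedule>";

    private static InputStream input() {
        return new ByteArrayInputStream(XML.getBytes(StandardCharsets.UTF_8));
    }

    // SAX: the parser streams through the input and calls the handler
    // each time it encounters an opening tag.
    static List<String> timesWithSax() {
        final List<String> times = new ArrayList<>();
        DefaultHandler handler = new DefaultHandler() {
            @Override
            public void startElement(String uri, String localName,
                                     String qName, Attributes attrs) {
                if ("show".equals(qName)) {
                    times.add(attrs.getValue("time"));
                }
            }
        };
        try {
            SAXParserFactory.newInstance().newSAXParser().parse(input(), handler);
        } catch (Exception e) {
            throw new IllegalStateException("SAX parsing failed", e);
        }
        return times;
    }

    // DOM: the whole input is turned into a tree first, then the tree is queried.
    static List<String> timesWithDom() {
        List<String> times = new ArrayList<>();
        try {
            Document doc = DocumentBuilderFactory.newInstance()
                .newDocumentBuilder().parse(input());
            NodeList shows = doc.getElementsByTagName("show");
            for (int i = 0; i < shows.getLength(); i++) {
                times.add(shows.item(i).getAttributes()
                    .getNamedItem("time").getNodeValue());
            }
        } catch (Exception e) {
            throw new IllegalStateException("DOM parsing failed", e);
        }
        return times;
    }

    public static void main(String[] args) {
        System.out.println(timesWithSax());
        System.out.println(timesWithDom());
    }
}
```

Both methods return the same list; the difference is only in when your code runs: during parsing (SAX) or after the tree has been built (DOM).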

jsoup is a DOM-type parser, so the whole question is how to navigate the tree you get after parsing, that is, the DOM expressed as nodes. This is described in the jsoup API documentation for the org.jsoup.nodes package.

Along the way, it is worth reading about the DOM itself; it will immediately point your thinking in the right direction.
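
As a rough illustration of moving around the parsed tree with classes from org.jsoup.nodes, the sketch below walks an element's children recursively and then steps to a parent and a sibling. The HTML fragment and the `shows` id are invented for the example.

```java
import org.jsoup.Jsoup;
import org.jsoup.nodes.Document;
import org.jsoup.nodes.Element;

public class TreeWalk {
    // Hypothetical fragment; jsoup wraps it in html/body when parsing.
    static final String HTML =
        "<ul id=\"shows\"><li class=\"first\">News</li><li>Film</li></ul>";

    // Render the element tree one tag per line, indented by depth.
    static String outline(Element element, int depth) {
        StringBuilder sb = new StringBuilder();
        for (int i = 0; i < depth; i++) {
            sb.append("  ");
        }
        sb.append(element.tagName());
        String own = element.ownText(); // text directly inside this element
        if (!own.isEmpty()) {
            sb.append(": ").append(own);
        }
        sb.append('\n');
        for (Element child : element.children()) {
            sb.append(outline(child, depth + 1));
        }
        return sb.toString();
    }

    public static void main(String[] args) {
        Document doc = Jsoup.parse(HTML);
        Element list = doc.getElementById("shows");
        System.out.print(outline(list, 0));

        Element first = list.child(0);
        System.out.println(first.parent().id());               // up to the <ul>
        System.out.println(first.nextElementSibling().text()); // across to the next <li>
    }
}
```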

Good luck.


Answer 3, Authority 62%

This is how I do it. It is my first time too 🙂
I have not yet figured out how to display what I parsed nicely :(

package sm.play.sportlife.ua;

import java.io.IOException;
import java.util.ArrayList;
import java.util.Calendar;

import org.jsoup.Jsoup;
import org.jsoup.nodes.Document;
import org.jsoup.nodes.Element;
import org.jsoup.select.Elements;

import android.app.Activity;
import android.app.ProgressDialog;
import android.graphics.Color;
import android.os.AsyncTask;
import android.os.Bundle;
import android.support.v4.app.Fragment;
import android.util.Log;
import android.view.LayoutInflater;
import android.view.Menu;
import android.view.MenuItem;
import android.view.View;
import android.view.ViewGroup;
import android.widget.TableLayout;
import android.widget.TableRow;
import android.widget.TextView;

public class MainActivity extends Activity {
  private static final String TAG = MainActivity.class.getSimpleName();
  public static int dayOfTheWeek = 0;
  // jsoup element collections that the page is broken into
  public Elements time, currentDay;
  // lists in which the extracted data is stored
  public ArrayList<String> timeList = new ArrayList<String>();
  public ArrayList<String> dayEventList = new ArrayList<String>();

  @Override
  protected void onCreate(Bundle savedInstanceState) {
    super.onCreate(savedInstanceState);
    setContentView(R.layout.activity_main);
    // Calendar counts from SUNDAY = 1; convert to Monday = 1 ... Sunday = 7.
    // This is computed before the task starts, because the task reads it.
    dayOfTheWeek = Calendar.getInstance().get(Calendar.DAY_OF_WEEK) - 1;
    if (dayOfTheWeek == 0) {
      dayOfTheWeek = 7;
    }
    Log.d(TAG, "Day of the week: " + dayOfTheWeek);
    /** Start the separate thread that fetches the data */
    new GetDataThread().execute();
  }

  /**
   * Inner class that performs the request in a separate thread
   */
  public class GetDataThread extends AsyncTask<String, Void, String> {
    // private ProgressDialog prog;

    /**
     * Performs the request in the background. On Android versions above 4,
     * network requests cannot be made on the main thread, so everything
     * has to be done in a separate thread.
     */
    @Override
    protected String doInBackground(String... arg) {
      String myUrl = "http://www.sportlife.ua/ru/Services/Schedule/14875";
      // the class that holds the downloaded page
      Document doc;
      try {
        // specify where the data is downloaded from
        doc = Jsoup
            .connect(myUrl)
            .userAgent(
                "Mozilla/5.0 (Windows NT 6.1) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/33.0.1750.154 Safari/537.36")
            .get();
        /**
         * Select the schedule cells by column index:
         * td:eq(index)
         * The id and class names must match the page markup exactly.
         */
        // class times (column 0)
        int k = 0;
        String link = "#shedule-content tr:gt(0) " + "td:eq(" + k + ")";
        time = doc.select(link);
        // events of the current day (columns 1..7)
        link = "#shedule-content tr:gt(0) " + "td:eq(" + dayOfTheWeek + ")";
        currentDay = doc.select(link);
        /**
         * Clear the lists so they can be refilled in the loops below
         * with all the data found on the page
         */
        timeList.clear();
        dayEventList.clear();
        for (Element times : time) {
          if (times.className().equals("time-col")) {
            timeList.add(times.text()); // store the class time
          }
        }
        /**
         * For each cell of the current day, add its events to the list
         */
        for (Element event : currentDay) {
          if (event.hasText()) {
            Elements mEvents = event.select(".event-item-body");
            /** There may be several classes at the same time */
            StringBuilder eventNames = new StringBuilder();
            for (Element textEvent : mEvents) {
              eventNames.append(textEvent.text()).append("\n");
            }
            /** Add the events to the list */
            dayEventList.add(eventNames.toString());
          } else {
            dayEventList.add("");
          }
        }
      } catch (IOException e) {
        e.printStackTrace();
      }
      return null;
    }

    @Override
    protected void onPreExecute() {
      // prog = new ProgressDialog(MainActivity.this);
      // prog.setMessage("Connect ...");
      // prog.show();
    }

    @Override
    protected void onPostExecute(String result) {
      /** Build the table */
      TableLayout table = (TableLayout) MainActivity.this
          .findViewById(R.id.mytable);
      int rows = Math.max(timeList.size(), dayEventList.size());
      for (int i = 0; i < rows; i++) {
        TableRow row = new TableRow(MainActivity.this);
        // first column: class time
        TextView txtCol1 = new TextView(MainActivity.this);
        if (i < timeList.size()) {
          txtCol1.setText(timeList.get(i));
          txtCol1.setBackgroundResource(R.drawable.shape_rec);
          // txtCol1.setTextColor(Color.rgb(245, 245, 220));
          // txtCol1.setBackgroundColor(Color.rgb(0, 0, 0));
        } else {
          txtCol1.setText("");
        }
        row.addView(txtCol1);
        // second column: events of the current day
        TextView txtCol2 = new TextView(MainActivity.this);
        if (i < dayEventList.size()) {
          txtCol2.setText(dayEventList.get(i));
          txtCol2.setBackgroundResource(R.drawable.shape_rec);
          // txtCol2.setMaxLines(20);
        } else {
          txtCol2.setText("");
        }
        row.addView(txtCol2);
        table.addView(row);
      }
      /** End of table construction */
      // prog.dismiss();
    }
  }

  @Override
  public boolean onCreateOptionsMenu(Menu menu) {
    // Inflate the menu; this adds items to the action bar if it is present.
    getMenuInflater().inflate(R.menu.main, menu);
    return true;
  }

  @Override
  public boolean onOptionsItemSelected(MenuItem item) {
    // Handle action bar item clicks here. The action bar will
    // automatically handle clicks on the Home/Up button, so long
    // as you specify a parent activity in AndroidManifest.xml.
    int id = item.getItemId();
    if (id == R.id.action_settings) {
      return true;
    }
    return super.onOptionsItemSelected(item);
  }

  /**
   * A placeholder fragment containing a simple view.
   */
  public static class PlaceholderFragment extends Fragment {
    public PlaceholderFragment() {
    }

    @Override
    public View onCreateView(LayoutInflater inflater, ViewGroup container,
        Bundle savedInstanceState) {
      View rootView = inflater.inflate(R.layout.fragment_main, container,
          false);
      return rootView;
    }
  }
}
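
The listing picks the schedule column from the day of the week. A stand-alone sketch of just that step is shown below, using only the JDK. The helper names `columnForDay` and `selectorForColumn` are made up for illustration, and the sketch assumes, as the listing appears to, that column 0 holds the times, that columns 1 to 7 run Monday to Sunday, and that the selector ids are lower case.

```java
import java.util.Calendar;

public class ScheduleColumn {
    // java.util.Calendar numbers days SUNDAY = 1 ... SATURDAY = 7.
    // The table is assumed to use Monday = 1 ... Sunday = 7.
    static int columnForDay(int calendarDay) {
        int column = calendarDay - 1;
        return column == 0 ? 7 : column;
    }

    // Build the CSS query for one column, skipping the header row (tr:gt(0)).
    static String selectorForColumn(int column) {
        return "#shedule-content tr:gt(0) td:eq(" + column + ")";
    }

    public static void main(String[] args) {
        int today = Calendar.getInstance().get(Calendar.DAY_OF_WEEK);
        System.out.println(selectorForColumn(0));                   // times column
        System.out.println(selectorForColumn(columnForDay(today))); // today's column
    }
}
```

Keeping the mapping in a small pure function makes it easy to check separately from the network and UI code.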
