Saturday, 12 December 2015

XML: Jsoup, get div content from divs with the same class

I want to parse a web page that has two divs with the same class.

The following is the part of the web page I'm trying to parse:

    <div class="bid-row rgray bmatch" id="m590574">
      <div class="mtime">12:00</div>
      <div class="mteams w240" data-original-title="" title="">
        <div class="team">Rayo Vallecano</div>
        <div class="team">Malaga CF</div>
      </div>
      <div class="modds w160">
        <div class="clear">
          <div class="blank"></div>
          <input class="bet" id="q43909084" type="button" value="2.35">
          <input class="bet" id="q43909085" type="button" value="3.30">
          <input class="bet" id="q43909086" type="button" value="3.15">
        </div>
      </div>
      <div class="minfo">
        <div class="stats" data-brid="7610448_1"></div>
        <div data-tvinfo="Sky Sports 4, Sport1 HU, LiG TV 3, Canal+ Liga, NTV Plus Futbol 2, TK Futbol 1 UA, Digi Sport 2 RO, CANAL9 DK, Sport Klub 1 SRB, SKY Sport Plus IT HD, Eleven HD" class="fleft tv"></div>
        <div class="mlive"></div>
        <div class="slider" data-mode="1" data-tid="36" data-cid="32">+50<span class="glyphicon glyphicon-chevron-right"></span></div>
      </div>

I am using Jsoup to parse it. Here is what my code looks like right now:

        Elements hrefElements = doc.select("div.bmatch");
        DocumentBuilderFactory docFactory = DocumentBuilderFactory.newInstance();
        DocumentBuilder docBuilder = docFactory.newDocumentBuilder();

        // root elements
        org.w3c.dom.Document doc1 = docBuilder.newDocument();
        org.w3c.dom.Element rootElement = doc1.createElement("company");
        doc1.appendChild(rootElement);

        String[] mtime = new String[hrefElements.size()];
        String[] team = new String[hrefElements.size()];
        String[] tvinfo = new String[hrefElements.size()];

        for (int i = 0; i < hrefElements.size(); i++) {
            mtime[i] = hrefElements.get(i).getElementsByClass("mtime").text();
            team[i] = hrefElements.get(i).getElementsByClass("team").text();
            tvinfo[i] = hrefElements.get(i).getElementsByTag("div").attr("data-tvinfo");
        }

        for (int j = 0; j < hrefElements.size(); j++) {
            // staff elements
            org.w3c.dom.Element staff = doc1.createElement("Event");
            rootElement.appendChild(staff);

            // set attribute to staff element
            Attr attr = doc1.createAttribute("id");
            attr.setValue("1");
            staff.setAttributeNode(attr);

            org.w3c.dom.Element firstname = doc1.createElement("Time");
            firstname.appendChild(doc1.createTextNode(mtime[j]));
            staff.appendChild(firstname);

            // lastname elements
            org.w3c.dom.Element lastname = doc1.createElement("Teams");
            lastname.appendChild(doc1.createTextNode(team[j]));
            staff.appendChild(lastname);

            // nickname elements
            org.w3c.dom.Element nickname = doc1.createElement("TV");
            nickname.appendChild(doc1.createTextNode(tvinfo[j]));
            staff.appendChild(nickname);

            System.out.println("Time: " + mtime[j]);
            System.out.println("Event: " + team[j]);
            System.out.println("TvInfo: " + tvinfo[j]);
        }

        TransformerFactory transformerFactory = TransformerFactory.newInstance();
        Transformer transformer = transformerFactory.newTransformer();
        DOMSource source = new DOMSource(doc1);
        String nameGame = jTextField3.getText();
        StreamResult result = new StreamResult(new File("test.xml"));
        //StreamResult result = new StreamResult(System.out);
        transformer.transform(source, result);

        // Output to console for testing
        // StreamResult result = new StreamResult(System.out);
        transformer.transform(source, result);

        System.out.println("File saved!");
    }

However, the output I get for that part of the HTML is the following:

    <Event id="1">
        <Time>Today12:00</Time>
        <Teams>Rayo Vallecano Malaga CF</Teams>
        <TV>Sky Sports 4, Sport1 HU, LiG TV 3, Canal+ Liga, NTV Plus Futbol 2, TK Futbol 1 UA, Digi Sport 2 RO, CANAL9 DK, Sport Klub 1 SRB, SKY Sport Plus IT HD, Eleven HD</TV>
    </Event>

The final XML I'm trying to achieve should look something like this, with each team in its own element:

    <Event id="1">
        <Time>Today12:00</Time>
        <Team1>Rayo Vallecano</Team1>
        <Team2>Malaga CF</Team2>
        <TV>Sky Sports 4, Sport1 HU, LiG TV 3, Canal+ Liga, NTV Plus Futbol 2, TK Futbol 1 UA, Digi Sport 2 RO, CANAL9 DK, Sport Klub 1 SRB, SKY Sport Plus IT HD, Eleven HD</TV>
    </Event>
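
For reference, the teams end up merged because calling text() on the Elements collection returned by getElementsByClass("team") gives the combined text of every matched element. One possible direction is to loop over that collection, call text() on each individual Element, and collect the names into a list. The sketch below covers only the XML side under that assumption: it takes an already collected list of team names and writes one numbered element per name. The class name TeamElementsSketch and the helpers appendText and buildEventXml are hypothetical names made up for this illustration, and it uses only the standard javax.xml classes, not Jsoup.

```java
import java.io.StringWriter;
import java.util.Arrays;
import java.util.List;
import javax.xml.parsers.DocumentBuilderFactory;
import javax.xml.transform.OutputKeys;
import javax.xml.transform.Transformer;
import javax.xml.transform.TransformerFactory;
import javax.xml.transform.dom.DOMSource;
import javax.xml.transform.stream.StreamResult;
import org.w3c.dom.Document;
import org.w3c.dom.Element;

// Hypothetical helper class, not part of the original code.
class TeamElementsSketch {

    // Appends <name>text</name> as a child of parent.
    static void appendText(Document doc, Element parent, String name, String text) {
        Element child = doc.createElement(name);
        child.appendChild(doc.createTextNode(text));
        parent.appendChild(child);
    }

    // Builds a single <Event> element with one <TeamN> child per team name
    // and returns it serialized as a string (no XML declaration).
    // Assumption: 'teams' was filled by reading each "team" div separately.
    static String buildEventXml(int id, String time, List<String> teams, String tv)
            throws Exception {
        Document doc = DocumentBuilderFactory.newInstance()
                .newDocumentBuilder().newDocument();
        Element event = doc.createElement("Event");
        event.setAttribute("id", String.valueOf(id));
        doc.appendChild(event);

        appendText(doc, event, "Time", time);
        for (int k = 0; k < teams.size(); k++) {
            // Team1, Team2, ... in document order
            appendText(doc, event, "Team" + (k + 1), teams.get(k));
        }
        appendText(doc, event, "TV", tv);

        Transformer transformer = TransformerFactory.newInstance().newTransformer();
        transformer.setOutputProperty(OutputKeys.OMIT_XML_DECLARATION, "yes");
        StringWriter out = new StringWriter();
        transformer.transform(new DOMSource(doc), new StreamResult(out));
        return out.toString();
    }

    public static void main(String[] args) throws Exception {
        List<String> teams = Arrays.asList("Rayo Vallecano", "Malaga CF");
        System.out.println(buildEventXml(1, "12:00", teams, "Sky Sports 4"));
    }
}
```

In the original loop, the list for each match would presumably be built from hrefElements.get(i).getElementsByClass("team"), and the id could be passed in per match instead of the fixed "1".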
