I want to parse a webpage that has two divs with the same class.
The following is the part of the webpage I'm trying to parse:
<div class="bid-row rgray bmatch" id="m590574">
  <div class="mtime">12:00</div>
  <div class="mteams w240" data-original-title="" title="">
    <div class="team">Rayo Vallecano</div>
    <div class="team">Malaga CF</div>
  </div>
  <div class="modds w160">
    <div class="clear">
      <div class="blank"></div>
      <input class="bet" id="q43909084" type="button" value="2.35">
      <input class="bet" id="q43909085" type="button" value="3.30">
      <input class="bet" id="q43909086" type="button" value="3.15">
    </div>
  </div>
  <div class="minfo">
    <div class="stats" data-brid="7610448_1"></div>
    <div data-tvinfo="Sky Sports 4, Sport1 HU, LiG TV 3, Canal+ Liga, NTV Plus Futbol 2, TK Futbol 1 UA, Digi Sport 2 RO, CANAL9 DK, Sport Klub 1 SRB, SKY Sport Plus IT HD, Eleven HD" class="fleft tv"></div>
    <div class="mlive"></div>
    <div class="slider" data-mode="1" data-tid="36" data-cid="32">+50<span class="glyphicon glyphicon-chevron-right"></span></div>
  </div>
I am using jsoup to parse it. Here is what my code looks like right now:
Elements hrefElements = doc.select("div.bmatch");

DocumentBuilderFactory docFactory = DocumentBuilderFactory.newInstance();
DocumentBuilder docBuilder = docFactory.newDocumentBuilder();

// root elements
org.w3c.dom.Document doc1 = docBuilder.newDocument();
org.w3c.dom.Element rootElement = doc1.createElement("company");
doc1.appendChild(rootElement);

String[] mtime = new String[hrefElements.size()];
String[] team = new String[hrefElements.size()];
String[] tvinfo = new String[hrefElements.size()];

for (int i = 0; i < hrefElements.size(); i++) {
    mtime[i] = hrefElements.get(i).getElementsByClass("mtime").text();
    team[i] = hrefElements.get(i).getElementsByClass("team").text();
    tvinfo[i] = hrefElements.get(i).getElementsByTag("div").attr("data-tvinfo");
}

for (int j = 0; j < hrefElements.size(); j++) {
    // staff elements
    org.w3c.dom.Element staff = doc1.createElement("Event");
    rootElement.appendChild(staff);

    // set attribute to staff element
    Attr attr = doc1.createAttribute("id");
    attr.setValue("1");
    staff.setAttributeNode(attr);

    org.w3c.dom.Element firstname = doc1.createElement("Time");
    firstname.appendChild(doc1.createTextNode(mtime[j]));
    staff.appendChild(firstname);

    // lastname elements
    org.w3c.dom.Element lastname = doc1.createElement("Teams");
    lastname.appendChild(doc1.createTextNode(team[j]));
    staff.appendChild(lastname);

    // nickname elements
    org.w3c.dom.Element nickname = doc1.createElement("TV");
    nickname.appendChild(doc1.createTextNode(tvinfo[j]));
    staff.appendChild(nickname);

    System.out.println("Time: " + mtime[j]);
    System.out.println("Event: " + team[j]);
    System.out.println("TvInfo: " + tvinfo[j]);
}

TransformerFactory transformerFactory = TransformerFactory.newInstance();
Transformer transformer = transformerFactory.newTransformer();
DOMSource source = new DOMSource(doc1);
String nameGame = jTextField3.getText();
StreamResult result = new StreamResult(new File("test.xml"));
//StreamResult result = new StreamResult(System.out);
transformer.transform(source, result);
// Output to console for testing
// StreamResult result = new StreamResult(System.out);
transformer.transform(source, result);
System.out.println("File saved!");
}
However, the output I get for that part of the HTML is the following:
<Event id="1">
  <Time>Today12:00</Time>
  <Teams>Rayo Vallecano Malaga CF</Teams>
  <TV>Sky Sports 4, Sport1 HU, LiG TV 3, Canal+ Liga, NTV Plus Futbol 2, TK Futbol 1 UA, Digi Sport 2 RO, CANAL9 DK, Sport Klub 1 SRB, SKY Sport Plus IT HD, Eleven HD</TV>
</Event>
The final XML I'm trying to achieve should look something like this:
<Event id="1">
  <Time>Today12:00</Time>
  <Team1>Rayo Vallecano</Team1>
  <Team2>Malaga CF</Team2>
  <TV>Sky Sports 4, Sport1 HU, LiG TV 3, Canal+ Liga, NTV Plus Futbol 2, TK Futbol 1 UA, Digi Sport 2 RO, CANAL9 DK, Sport Klub 1 SRB, SKY Sport Plus IT HD, Eleven HD</TV>
</Event>
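A likely cause of the combined value is that calling text() on the jsoup Elements collection returned by getElementsByClass("team") gives the joined text of all matched elements. One possible way around it, sketched here under assumptions, is to iterate that collection, call text() on each individual Element, and then create one numbered XML element per name. The sketch below only covers the XML side and uses the standard org.w3c.dom API so it runs without jsoup; the class name TeamElements and the helpers appendTeams and toXml are hypothetical, and the hard-coded list stands in for the names collected from jsoup.

```java
import java.io.StringWriter;
import java.util.List;
import javax.xml.parsers.DocumentBuilderFactory;
import javax.xml.transform.OutputKeys;
import javax.xml.transform.Transformer;
import javax.xml.transform.TransformerFactory;
import javax.xml.transform.dom.DOMSource;
import javax.xml.transform.stream.StreamResult;
import org.w3c.dom.Document;
import org.w3c.dom.Element;
import org.w3c.dom.Node;

public class TeamElements {

    // Hypothetical helper: appends <Team1>, <Team2>, ... to the parent,
    // one element per name, instead of a single combined <Teams> element.
    static void appendTeams(Document doc, Element parent, List<String> names) {
        for (int k = 0; k < names.size(); k++) {
            Element team = doc.createElement("Team" + (k + 1));
            team.appendChild(doc.createTextNode(names.get(k)));
            parent.appendChild(team);
        }
    }

    // Hypothetical helper: serializes a node to a string, omitting the XML declaration.
    static String toXml(Node node) throws Exception {
        Transformer transformer = TransformerFactory.newInstance().newTransformer();
        transformer.setOutputProperty(OutputKeys.OMIT_XML_DECLARATION, "yes");
        StringWriter out = new StringWriter();
        transformer.transform(new DOMSource(node), new StreamResult(out));
        return out.toString();
    }

    public static void main(String[] args) throws Exception {
        Document doc = DocumentBuilderFactory.newInstance().newDocumentBuilder().newDocument();
        Element event = doc.createElement("Event");
        doc.appendChild(event);

        // Assumption: with jsoup, these names would be collected by looping over
        // hrefElements.get(i).getElementsByClass("team") and calling text() on each Element.
        List<String> names = List.of("Rayo Vallecano", "Malaga CF");

        appendTeams(doc, event, names);
        System.out.println(toXml(doc));
    }
}
```

In the original loop, this would mean storing a list of names per match rather than a single String in team[i], and calling something like appendTeams(doc1, staff, names) where the Teams element is currently created.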