XML : HTML Data Extraction with Jsoup parser

What is the best way to extract the data from the following HTML in the format shown below?

  <table class="item over  spicy_logo item_border" item_id="3464864" id="item_3464864" ua-action="Item" ua-label="Item">
    <tbody>
      <tr itemscope itemtype="http://schema.org/MenuItem">
        <td class="item_img_box" item_id="3464864" title="How is it?">
          <table>
            <tbody>
              <tr>
                <td>
                  <div>
                    <img id='img3464864' src="/yelp_images/s3-media4.fl.yelpcdn.com/bphoto/1P50jjYUA4ofx5hF85wm5Q/ms.jpg" align="left" class="item_img" border="0" alt="How is it?"/>
                  </div>
                </td>
              </tr>
            </tbody>
          </table>
        </td>
        <td class="item_name ">
          <div>
            <a class="cpa" href="http://miami-beach.eat24hours.com/carrot-express/26721?item_id=3464864" itemprop="name">Teeka Salad</a>
            <div class="item_desc" itemprop="description">Kale, sunflower sprouts, quinoa, avocado, grape tomato, alfalfa bean sprouts, carrots and cucumber with a choice of dressing.</div>
          </div>
        </td>
        <td class="item_price">
          <div >$<span itemprop="price">9.95</span></div>
        </td>
      </tr>
    </tbody>
  </table>

Expected Output:

ITEM_NAME : Teeka Salad

ITEM_DESCRIPTION : Kale, sunflower sprouts, quinoa, avocado, grape tomato, alfalfa bean sprouts, carrots and cucumber with a choice of dressing.

ITEM_PRICE : $9.95

ITEM_IMG : /yelp_images/s3-media4.fl.yelpcdn.com/bphoto/1P50jjYUA4ofx5hF85wm5Q/ms.jpg

I've tried various approaches using Jsoup and Jaunt, but I still can't figure it out.
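
One possible approach with Jsoup is to select by the class and itemprop attributes already present in the markup, rather than walking the nested tables. The sketch below is only an illustration under a few assumptions: the class name MenuItemExtractor and the helpers firstText and extract are made up for this example, the sample string in main is a shortened copy of the HTML above, and selectFirst needs Jsoup 1.11.1 or newer. It also assumes every item on the real page carries the same item, item_img and item_price classes as this one.

```java
import java.util.LinkedHashMap;
import java.util.Map;

import org.jsoup.Jsoup;
import org.jsoup.nodes.Document;
import org.jsoup.nodes.Element;

// Hypothetical helper class, not part of Jsoup.
public class MenuItemExtractor {

    // Text of the first element matching the CSS query, or "" if none matches.
    static String firstText(Element root, String cssQuery) {
        Element match = root.selectFirst(cssQuery);
        return match == null ? "" : match.text();
    }

    // Pulls the four requested fields out of the first table.item in the markup.
    static Map<String, String> extract(String html) {
        Map<String, String> fields = new LinkedHashMap<>();
        Document doc = Jsoup.parse(html);

        Element item = doc.selectFirst("table.item");
        if (item == null) {
            return fields; // no menu item found in this markup
        }

        fields.put("ITEM_NAME", firstText(item, "[itemprop=name]"));
        fields.put("ITEM_DESCRIPTION", firstText(item, "[itemprop=description]"));
        // Taking the whole price cell keeps the "$" that sits outside the span.
        fields.put("ITEM_PRICE", firstText(item, "td.item_price"));

        Element img = item.selectFirst("img.item_img");
        fields.put("ITEM_IMG", img == null ? "" : img.attr("src"));
        return fields;
    }

    public static void main(String[] args) {
        // Shortened copy of the markup from the question (assumption: same classes).
        String html =
              "<table class=\"item over spicy_logo item_border\" id=\"item_3464864\"><tbody><tr>"
            + "<td class=\"item_img_box\"><img class=\"item_img\" "
            + "src=\"/yelp_images/s3-media4.fl.yelpcdn.com/bphoto/1P50jjYUA4ofx5hF85wm5Q/ms.jpg\"/></td>"
            + "<td class=\"item_name\"><div><a class=\"cpa\" itemprop=\"name\">Teeka Salad</a>"
            + "<div class=\"item_desc\" itemprop=\"description\">Kale, sunflower sprouts</div></div></td>"
            + "<td class=\"item_price\"><div>$<span itemprop=\"price\">9.95</span></div></td>"
            + "</tr></tbody></table>";

        // Print each field as "KEY : value", one per line.
        for (Map.Entry<String, String> e : extract(html).entrySet()) {
            System.out.println(e.getKey() + " : " + e.getValue());
        }
    }
}
```

To run this against the live page instead of a string, Jsoup.connect(url).get() returns a Document that can be queried the same way. If the page lists several items, doc.select("table.item") returns all of them, and the same four lookups can be applied to each element in a loop.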
