C++: recursively/iteratively searching an HTML file (using Boost C++)



I'm working on an application that needs to fetch an HTML file from the web and extract a piece of information from it by searching for a string.


I reckon it is easier and more effective to treat the HTML file as an XML file, iterate over its tags, and match their content against a string.


Here is the HTML table I'm interested in:



<table width='100%' class='datatable' cellspacing='0' cellpadding='0'>
<tr>
<td>
</td>
<td width='30px'>
</td>
<td width='220px'>
</td>
<td width='50px'>
</td>
</tr>
<tr>
<td height='7' colspan='4'>
<img src='/images/spacer.gif' width='1' height='7' border='0' alt=''>
</td>
</tr>
<tr>
<td width='170'>
Aktiv tid: <!--This is a string I will search for.-->
</td>
<td colspan='3'>
1 dag, 17:03:46 <!--This is a piece of information I need to obtain.-->
</td>
</tr>
<tr>
<td height='7' colspan='4'>
<img src='/images/spacer.gif' width='1' height='7' border='0' alt=''>
</td>
</tr>
<tr>
<td width='170'>
Bandbredd (upp/ned) [kbps/kbps]:
</td>
<td colspan='3'>
1.058 / 21.373
</td>
</tr>
<tr>
<td height='7' colspan='4'>
<img src='/images/spacer.gif' width='1' height='7' border='0' alt=''>
</td>
</tr>
<tr>
<td width='170'>
Överförda data (skickade/mottagna) [GB/GB]: <!--This is another string I will search for.-->
</td>
<td colspan='3'>
1,67 / 42,95 <!--This is another piece of information I need to obtain.-->
</td>
</tr>
</table>


So I will search for the <td> tags containing either of the following strings:



  • Aktiv tid:

  • Överförda data (skickade/mottagna) [GB/GB]:


After that I need to select the next <td> tag in the same <tr>, which contains the piece of information I want.
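To make the lookup concrete, here is a rough sketch of what I have in mind. It is not an XML parser and does not use Boost: it scans the page with plain std::string searches, because the <img> tags in the table are never closed and a strict XML parser would likely reject the page. The function names (strip_comments, trim, value_after_label) are made up for illustration. It assumes lowercase tags as in the table above, that the label string is in the same encoding as the page, and that the value cell contains no nested tags.

```cpp
#include <cstddef>
#include <string>

// Remove HTML comments (<!-- ... -->) from a fragment.
std::string strip_comments(const std::string& text) {
    std::string out;
    std::size_t pos = 0;
    while (pos < text.size()) {
        std::size_t open = text.find("<!--", pos);
        if (open == std::string::npos) {
            out.append(text, pos, std::string::npos);  // no more comments
            break;
        }
        out.append(text, pos, open - pos);             // keep text before the comment
        std::size_t close = text.find("-->", open + 4);
        if (close == std::string::npos) break;         // unterminated comment: drop the rest
        pos = close + 3;
    }
    return out;
}

// Trim leading and trailing ASCII whitespace.
std::string trim(const std::string& s) {
    const char* ws = " \t\r\n";
    std::size_t first = s.find_first_not_of(ws);
    if (first == std::string::npos) return std::string();
    std::size_t last = s.find_last_not_of(ws);
    return s.substr(first, last - first + 1);
}

// Find the cell containing `label`, then store the text of the next <td>
// in the same <tr> into `value`. Returns false if nothing suitable is found.
bool value_after_label(const std::string& html, const std::string& label,
                       std::string& value) {
    const std::size_t npos = std::string::npos;

    std::size_t at = html.find(label);
    if (at == npos) return false;

    std::size_t label_end = html.find("</td>", at);    // end of the label cell
    if (label_end == npos) return false;

    std::size_t row_end = html.find("</tr>", label_end);
    std::size_t cell = html.find("<td", label_end + 5);
    // The next cell must start before the current row ends.
    if (cell == npos || (row_end != npos && cell > row_end)) return false;

    std::size_t start = html.find('>', cell);          // end of the opening <td ...> tag
    if (start == npos) return false;
    ++start;

    std::size_t end = html.find("</td>", start);
    if (end == npos) return false;

    value = trim(strip_comments(html.substr(start, end - start)));
    return true;
}
```

With the page contents from cURL in a std::string called html, I would call value_after_label(html, "Aktiv tid:", value) and, on success, expect value to hold the text of the neighbouring cell. A proper HTML-aware parser would be more robust than this if the page layout changes.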


I have successfully fetched the HTML file using cURL, but I need a little help with the XML search algorithm.


Thank you in advance!

