BeautifulSoup counting tags without parsing deep inside them



I thought about the following while writing an answer to this question.


Suppose I have a deeply nested XML file like this (but much longer and nested much more deeply):



<section name="1">
<subsection name="foo">
<subsubsection name="bar">
<deeper name="hey">
<much_deeper name="yo">
<li>Some content</li>
</much_deeper>
</deeper>
</subsubsection>
</subsection>
</section>
<section name="2">
... and so forth
</section>


The problem with len(soup.find_all("section")) is that Beautiful Soup returns a full Tag object for every match, and each of those objects "carries" its whole subtree, even though all I need is a count.


So, two questions:



  1. Is there a way to make Beautiful Soup count the number of section tags without returning their whole content?

  2. Would that be more efficient, or does it go through the same internal process?
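
To make the first question concrete, here is a minimal sketch of the kind of counting I have in mind. It does not use Beautiful Soup at all: it swaps in the standard library's html.parser.HTMLParser, which only emits events and never builds a tree. The names TagCounter and count_tags are hypothetical, made up for this illustration, and the sample markup is a shortened version of the file above. I am assuming that a lenient, HTML-style tokenizer is acceptable for this input; note that it lowercases tag names.

```python
from html.parser import HTMLParser


class TagCounter(HTMLParser):
    """Count start tags with a given name without building a tree.

    Hypothetical helper for illustration; not part of Beautiful Soup.
    """

    def __init__(self, tag_name):
        super().__init__()
        # HTMLParser reports tag names in lowercase, so normalise the target.
        self.tag_name = tag_name.lower()
        self.count = 0

    def handle_starttag(self, tag, attrs):
        # Called once per opening tag; nothing is stored except the counter.
        if tag == self.tag_name:
            self.count += 1


def count_tags(markup, tag_name):
    """Return how many times <tag_name> opens in the markup string."""
    counter = TagCounter(tag_name)
    counter.feed(markup)
    counter.close()
    return counter.count


sample = """
<section name="1">
  <subsection name="foo">
    <li>Some content</li>
  </subsection>
</section>
<section name="2"></section>
"""

print(count_tags(sample, "section"))  # prints 2
```

This only matches on the exact tag name, so subsection is not counted as section. Whether it is actually faster or lighter than len(soup.find_all("section")) is something I have not measured; it simply avoids creating a Tag object per element.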

