How to access the content of an HTML tag in Bash using XMLStarlet



I'm trying to learn how to access the content of HTML tags in Bash using XMLStarlet. As an example, I'm attempting to extract some text from the page http://ift.tt/1mLAo8X. I'm having difficulty specifying the XPath "address" of that content for XMLStarlet and would appreciate some help. My attempt is below:



URL="http://ift.tt/1mLAo8X"
webPage="$(curl -s "${URL}")"
echo "${webPage}" | xmlstarlet sel -T -t -c "//html/body//table/tr/td[@id='quote']/header/h2/"


This produces the following output:



-:29.12: Opening and ending tag mismatch: meta line 5 and head
</head>
^
-:35.100: Entity 'nbsp' not defined
te"><header><h2>&quot;Emotional intelligence is beyond total reality&quot;&nbsp;
^
-:35.106: Entity 'nbsp' not defined
eader><h2>&quot;Emotional intelligence is beyond total reality&quot;&nbsp;&nbsp;
^
-:41.119: EntityRef: expecting ';'
witter.com/intent/tweet?original_referer=http%3A%2F%2Fwww.wisdomofchopra.com&via
^
-:41.139: EntityRef: expecting ';'
eet?original_referer=http%3A%2F%2Fwww.wisdomofchopra.com&via=WisdomOfChopra&text
^
-:41.196: EntityRef: expecting ';'
via=WisdomOfChopra&text=%27Emotional+intelligence+is+beyond+total+reality%27&url
^
-:52.169: EntityRef: expecting ';'
));document.write(' src="http://ift.tt/XWxq5p
^
-:52.186: EntityRef: expecting ';'
(' src="http://ift.tt/1mLAope
^
-:52.209: EntityRef: expecting ';'
http://ift.tt/1mLAnBX'+AdBrite_Iframe+'&ref
^
-:53.99: EntityRef: expecting ';'
p" href="http://ift.tt/XWxqlJ
^
-:57.9: Opening and ending tag mismatch: head line 3 and html
</html>
^
-:58.1: Premature end of data in tag html line 2
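
The errors above are what an XML parser reports when it is given ordinary HTML: unclosed tags such as meta, the undefined entity nbsp, and bare & characters in URLs. The XPath in the attempt also ends with a trailing slash, which is not valid XPath. One possible workaround, offered as a minimal sketch rather than a definitive fix, is to let XMLStarlet's fo (format) subcommand parse the input as HTML first and then run sel on the cleaned-up result. The function name extract_quote is hypothetical, the XPath is deliberately relaxed to //td[@id='quote']//h2, and the sketch assumes the installed xmlstarlet build includes HTML support and that the page does not declare an XHTML namespace.

```shell
#!/usr/bin/env bash

# Hypothetical helper: reads HTML on stdin and prints the text of the first
# <h2> found inside <td id="quote">.
extract_quote() {
    # fo -H parses the input as HTML, -R asks the parser to recover from
    # errors, -D drops the DOCTYPE. Parser warnings are discarded.
    xmlstarlet fo -H -R -D 2>/dev/null |
        # The formatted output is well-formed XML, so sel can query it.
        xmlstarlet sel -T -t -v "normalize-space(//td[@id='quote']//h2)" -n
}
```

With the variables from the attempt above, this could be called as curl -s "${URL}" | extract_quote. Note that normalize-space only trims ordinary whitespace, so the non-breaking spaces produced by the nbsp entities may remain at the end of the result.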
