I want to scrape the White House College Scorecard page on ed.gov: http://ift.tt/1pSiFHI (details below).
Some of the data can be downloaded, but not all of it. For example, you can download this file: http://ift.tt/WViNzC
From there you can get all the college ID numbers. You can then paste an ID into the link below to get that college's page, as I did for Methodist University:
http://ift.tt/1pSiImR
All the examples below are from this one college, but I actually want each of these data points for every college with a UnitID number in the scorecard.xls file above.
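A rough sketch of the per-college loop described above, in Python. The real page address is hidden behind the shortened ift.tt link, so `URL_TEMPLATE` is only a placeholder, and the IDs in the example are made up rather than real UnitID values. `build_urls` is an illustrative helper name, not part of any library.

```python
# Hypothetical template: the real scorecard URL is hidden behind the
# shortened ift.tt link, so substitute the expanded address here.
URL_TEMPLATE = "http://example.invalid/college-scorecard?id={unit_id}"


def build_urls(unit_ids, template=URL_TEMPLATE):
    """Return one page URL per UnitID, skipping blanks and duplicates."""
    seen = set()
    urls = []
    for raw in unit_ids:
        unit_id = str(raw).strip()
        if not unit_id or unit_id in seen:
            continue
        seen.add(unit_id)
        urls.append(template.format(unit_id=unit_id))
    return urls


if __name__ == "__main__":
    # Placeholder IDs, not real UnitID values.
    for url in build_urls(["100001", " 100002", "100001", ""]):
        print(url)
```

Each URL could then be fetched and parsed in turn. If the gauges are drawn by JavaScript after the page loads, a plain download may not contain the SVG at all, in which case the rendered page source would be needed instead.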
There is data in the HTML/CSS that I want to use for my research, all related to the visualizations for low, medium, and high levels. BTW: if the XPath info below is wrong, I am sorry. This is my first time using Google Chrome's Inspect Element > Copy XPath option.
FOR EXAMPLE: for the cost dial (first viz) I want:
- radius angle (258.12)
XPATH info from Google Chrome is: //*[@id="costGasGaugeHolder"]/svg/circle
html : path fill="none" stroke="#58585a" d="M81.5,96.5L81.5,141.5" stroke-width="4" style="stroke-width: 4px;" transform="rotate(258.12 81.5 81)">
- amount $30,092
XPATH //*[@id="costGasGauge"]/div[2]/center
html: style="width:163px;color:#cc6600;font-size:17px;margin-top:5px;font-weight:bold;" class="general_font">$30,092 / yr
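Once the attribute value and the text are in hand as strings, the two cost-dial numbers can be pulled out with regular expressions. This is a minimal sketch, assuming the angle is always the first argument of `rotate(...)` and the amount is always written with a leading `$`; the function names are illustrative only.

```python
import re

# Matches the first number inside rotate(...), e.g. "rotate(258.12 81.5 81)".
ROTATE_RE = re.compile(r"rotate\(\s*(-?\d+(?:\.\d+)?)")
# Matches a dollar amount such as "$30,092" or "$186.02".
DOLLAR_RE = re.compile(r"\$\s*([\d,]+(?:\.\d+)?)")


def rotation_angle(transform):
    """Return the angle from an SVG transform attribute, or None."""
    match = ROTATE_RE.search(transform)
    return float(match.group(1)) if match else None


def dollar_amount(text):
    """Return the first dollar amount in text as a float, or None."""
    match = DOLLAR_RE.search(text)
    return float(match.group(1).replace(",", "")) if match else None


if __name__ == "__main__":
    print(rotation_angle("rotate(258.12 81.5 81)"))
    print(dollar_amount("$30,092 / yr"))
```

The same two helpers should also cover a dial with the same shape of markup, such as a `rotate(215.00 81.5 81)` transform or a `$186.02` label.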
For the graduation rate, I want:
- margin-left value (39.5px)
XPATH: //*[@id="grHBarChart"]/div[1]
html: class="hbar_pointer" style="height: 37px; width: 57px; margin-left: 39.5px;"
- percentage (39.1%)
XPATH: //*[@id="grHBarChart"]/div[1]/div[1]/div[2]/table/tbody/tr/td/div
html : class="hbar_bubble_value_label">39.1%
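For values that live inside an inline `style` attribute, one option is to split the attribute into property/value pairs and read the one that is needed. A small sketch, assuming simple declarations separated by `;` (a value that itself contains a semicolon, such as a data URI, would need a real CSS parser). `parse_style`, `px_value`, and `percent_value` are hypothetical helper names.

```python
def parse_style(style):
    """Split an inline style attribute into a {property: value} dict."""
    props = {}
    for declaration in style.split(";"):
        name, sep, value = declaration.partition(":")
        if sep:
            props[name.strip().lower()] = value.strip()
    return props


def px_value(style, prop):
    """Return a pixel length such as '39.5px' as a float, or None."""
    value = parse_style(style).get(prop)
    if value is None or not value.endswith("px"):
        return None
    return float(value[:-2])


def percent_value(label):
    """Turn label text such as '39.1%' into a float."""
    return float(label.strip().rstrip("%"))


if __name__ == "__main__":
    style = "height: 37px; width: 57px; margin-left: 39.5px;"
    print(px_value(style, "margin-left"))
    print(percent_value("39.1%"))
```

Non-pixel values such as `margin-left: auto` come back as `None` rather than raising an error, so they can be skipped or logged per college.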
For median borrowing, I want:
- radius angle (215.00)
XPATH : //*[@id="debtGasGaugeHolder"]/svg/path
html : path fill="none" stroke="#58585a" d="M81.5,96.5L81.5,141.5" stroke-width="4" style="stroke-width: 4px;" transform="rotate(215.00 81.5 81)"
- month payment ($186.02)
XPATH //*[@id="debtGasGauge"]/div[2]/center
html: center>$186.02
For the loan default rate, I need the left bar info for each college:
LEFT
- bar height (38.3px)
XPATH //*[@id="cdrVBarChart"]/div/div[1]/div[3]
html: class="left_1_bar" style="width: 67px; margin-left: auto; margin-right: auto; height: 38.3px; background-image: .........">
- loan default rate text (12.5%)
XPATH //*[@id="cdrVBarChart"]/div/div[1]/div[1]/div[1]/div[2]/table/tbody/tr/td/div
html: class="vbar_bubble_value_label">12.5%
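Positional paths copied from the browser describe the rendered DOM, which can differ from the downloaded source (browsers insert `tbody` into tables, for example), so looking elements up by `id` and `class` may be more robust. Below is a sketch using Python's standard `xml.etree.ElementTree`, which supports only a limited XPath subset and needs well-formed markup. `SAMPLE` is a made-up, simplified stand-in for the chart, not the real page source, and `left_bar` is an illustrative name.

```python
import re
import xml.etree.ElementTree as ET

# Made-up, simplified stand-in for the default-rate chart markup.
SAMPLE = """<body>
<div id="cdrVBarChart"><div><div>
  <div><div class="vbar_bubble_value_label">12.5%</div></div>
  <div></div>
  <div class="left_1_bar" style="width: 67px; height: 38.3px;"></div>
</div></div></div>
</body>"""

# "height" on its own, not as the tail of e.g. "line-height".
HEIGHT_RE = re.compile(r"(?<![-\w])height:\s*([\d.]+)px")


def left_bar(root):
    """Return (bar height in px, default rate in %), or None if missing."""
    chart = root.find(".//*[@id='cdrVBarChart']")
    if chart is None:
        return None
    bar = chart.find(".//*[@class='left_1_bar']")
    label = chart.find(".//*[@class='vbar_bubble_value_label']")
    if bar is None or label is None:
        return None
    match = HEIGHT_RE.search(bar.get("style", ""))
    if match is None:
        return None
    return float(match.group(1)), float(label.text.strip().rstrip("%"))


if __name__ == "__main__":
    print(left_bar(ET.fromstring(SAMPLE)))
```

Real pages are rarely well-formed XML, so an HTML-tolerant parser would probably be needed in practice; the lookup logic would stay the same.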