XML : HTML Rendering of EDGAR .txt Filings

I'm working on a project with two PHP scripts. The first grabs an index file from ftp://ftp.sec.gov and loads all the company information into a database. The second grabs each filing's raw text file from the SEC and saves it locally for processing.
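To make the first step concrete, here is a minimal sketch of parsing one index row. It is written in Python for brevity rather than PHP, and it assumes the index is the pipe-delimited kind (CIK|Company Name|Form Type|Date Filed|Filename); a fixed-width index would need different parsing. The names IndexEntry, parse_index_line and accession_number are hypothetical helpers, not part of any SEC tooling.

```python
from typing import NamedTuple, Optional


class IndexEntry(NamedTuple):
    cik: int
    company: str
    form_type: str
    date_filed: str
    filename: str


def parse_index_line(line: str) -> Optional[IndexEntry]:
    """Parse one pipe-delimited index row; return None for header or separator lines."""
    parts = line.rstrip("\n").split("|")
    # Data rows have exactly five fields and start with a numeric CIK (assumed layout).
    if len(parts) != 5 or not parts[0].strip().isdigit():
        return None
    cik, company, form_type, date_filed, filename = (p.strip() for p in parts)
    return IndexEntry(int(cik), company, form_type, date_filed, filename)


def accession_number(filename: str) -> str:
    """Derive the accession number from a path like edgar/data/<cik>/<accession>.txt."""
    base = filename.rsplit("/", 1)[-1]
    return base[:-4] if base.endswith(".txt") else base
```

Each parsed row gives the database record, and the filename field is the path the second script would download.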

An example of the raw text file can be found here -

ftp://ftp.sec.gov/edgar/data/2488/0000002488-15-000028.txt

An example of what the final result should look like can be found here - www.sec.gov/Archives/edgar/data/1084869/000143774915020024/flws20150927_10q.htm

The goal is to present each filing in a formatted way, as many companies' sites do, but I can't figure out how to do this reliably for every filing. Some filings seem to contain XML, while others seem to contain HTML.
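As far as I can tell, the mix comes from the raw .txt being a container: each attached document sits in its own <DOCUMENT> ... </DOCUMENT> block, with unclosed header tags such as <TYPE> and <FILENAME> followed by a <TEXT> body. Here is a minimal sketch (Python rather than PHP, for brevity) that splits a submission into those blocks and guesses whether each body is HTML, XML or plain text. The function names and the classification heuristic are my own assumptions, and bodies that carry encoded binary attachments are not handled.

```python
import re
from typing import Dict, List

DOCUMENT_RE = re.compile(r"<DOCUMENT>(.*?)</DOCUMENT>", re.DOTALL | re.IGNORECASE)
TEXT_RE = re.compile(r"<TEXT>(.*?)</TEXT>", re.DOTALL | re.IGNORECASE)


def header_field(header: str, tag: str) -> str:
    """Return the value after an unclosed SGML tag such as <TYPE>, or '' if absent."""
    match = re.search(r"<%s>([^\r\n<]*)" % re.escape(tag), header, re.IGNORECASE)
    return match.group(1).strip() if match else ""


def classify(body: str) -> str:
    """Rough heuristic (an assumption): inspect the start of the <TEXT> body."""
    head = body.lstrip()[:500].lower()
    if "<html" in head:
        return "html"
    if head.startswith(("<xbrl", "<xml", "<?xml")):
        return "xml"
    return "text"


def split_submission(raw: str) -> List[Dict[str, str]]:
    """Split a full-submission .txt into its embedded documents."""
    docs = []
    for block in DOCUMENT_RE.findall(raw):
        text = TEXT_RE.search(block)
        body = text.group(1) if text else ""
        # Only look for header tags before <TEXT>, so tags in the body are ignored.
        header = block[:text.start()] if text else block
        docs.append({
            "type": header_field(header, "TYPE"),
            "filename": header_field(header, "FILENAME"),
            "kind": classify(body),
            "body": body,
        })
    return docs
```

With the blocks separated, an "html" body could be served more or less as-is, a "text" body wrapped in a <pre> element, and "xml" bodies handled separately, though whether that covers every filing is exactly what I'm unsure about.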

How can I reliably produce a formatted version from the raw text files?
