How to process styles from imported DOCX XML with this PHP script



This is my first post on Stack Overflow, so hello everybody!


My question: how do I make sure that basic headings, bold and italic styles, and list items are processed by this script? I want to import the document into a database, so I need the basic styling elements preserved.


I've looked at various other scripts on the web, and something like PHPDocx is not what I'm looking for. I found the basic "plain text" conversion and edited it a bit, so it already unzips the file and prints all the plain text.


Now I need to know how to get the basic HTML styles, such as "h1", "p", bold, italic, and "li", out of the document. So far I have not succeeded.


Please help!



<?php

$filename = 'worddoc.docx';

// Read word/document.xml from a .docx file (which is a ZIP archive) and
// return its plain text, or false if the file cannot be read.
function read_file_docx($filename) {

    if (!$filename || !file_exists($filename)) return false;

    // ZipArchive is used here because zip_open()/zip_read() are deprecated as of PHP 8.0.
    $zip = new ZipArchive();
    if ($zip->open($filename) !== true) return false;

    // The main document body is stored in word/document.xml.
    $content = $zip->getFromName('word/document.xml');
    $zip->close();

    if ($content === false) return false;

    // Separate table cells with a space and paragraphs with a line break,
    // then drop all remaining XML tags.
    $content = str_replace('</w:r></w:p></w:tc><w:tc>', " ", $content);
    $content = str_replace('</w:r></w:p>', "\r\n", $content);
    $stripped_content = strip_tags($content);

    return $stripped_content;
}

$content = read_file_docx($filename);
if ($content !== false) {
    echo nl2br($content);
}
else {
    echo '';
}

?>
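
One possible direction, as a minimal sketch and not a complete converter: instead of stripping the tags, the raw XML could be walked with DOMDocument and DOMXPath, mapping each paragraph to "h1".."h6", "li", or "p", and each bold or italic run to "strong" or "em". The function names docx_xml_to_html() and run_has_toggle() are hypothetical. The sketch assumes heading paragraphs use the style IDs Heading1 to Heading6 (localized or custom templates may use different IDs), treats any paragraph with a w:numPr element as an unordered list item, and only looks at formatting set directly on a run, so bold or italic inherited from a style is not detected. Paragraphs inside tables are skipped.

```php
<?php

// Namespace URI behind the "w:" prefix in word/document.xml.
const W_NS = 'http://schemas.openxmlformats.org/wordprocessingml/2006/main';

// Hypothetical helper: true when a toggle such as w:b or w:i is set
// directly on the run and not switched off through its w:val attribute.
function run_has_toggle(DOMXPath $xp, DOMNode $run, string $tag): bool
{
    $node = $xp->query("w:rPr/w:$tag", $run)->item(0);
    if (!$node instanceof DOMElement) {
        return false;
    }
    $val = $node->getAttributeNS(W_NS, 'val');
    return !in_array($val, ['0', 'false', 'off'], true);
}

// Hypothetical converter: raw document.xml in, simple HTML out.
function docx_xml_to_html(string $xml): string
{
    $doc = new DOMDocument();
    if ($xml === '' || !@$doc->loadXML($xml)) {
        return '';
    }
    $xp = new DOMXPath($doc);
    $xp->registerNamespace('w', W_NS);

    $html = '';
    $inList = false;

    // Only top-level body paragraphs; table content is not handled here.
    foreach ($xp->query('//w:body/w:p') as $p) {
        // Build the inline content from the paragraph's text runs.
        $inner = '';
        foreach ($xp->query('.//w:r', $p) as $r) {
            $text = '';
            foreach ($xp->query('w:t', $r) as $t) {
                $text .= $t->textContent;
            }
            if ($text === '') {
                continue;
            }
            $text = htmlspecialchars($text, ENT_QUOTES, 'UTF-8');
            if (run_has_toggle($xp, $r, 'i')) {
                $text = "<em>$text</em>";
            }
            if (run_has_toggle($xp, $r, 'b')) {
                $text = "<strong>$text</strong>";
            }
            $inner .= $text;
        }

        // Assumption: default style IDs "Heading1".."Heading6".
        $style = $xp->evaluate('string(w:pPr/w:pStyle/@w:val)', $p);
        // Assumption: any numbered/bulleted paragraph becomes an <li>.
        $isListItem = $xp->query('w:pPr/w:numPr', $p)->length > 0;

        // Open or close the surrounding <ul> when list state changes.
        if ($isListItem && !$inList) {
            $html .= "<ul>\n";
            $inList = true;
        } elseif (!$isListItem && $inList) {
            $html .= "</ul>\n";
            $inList = false;
        }

        if ($isListItem) {
            $html .= "<li>$inner</li>\n";
        } elseif (preg_match('/^Heading([1-6])$/i', $style, $m)) {
            $html .= "<h{$m[1]}>$inner</h{$m[1]}>\n";
        } elseif ($inner !== '') {
            $html .= "<p>$inner</p>\n";
        }
    }

    if ($inList) {
        $html .= "</ul>\n";
    }
    return $html;
}
```

To try it, docx_xml_to_html() would need the raw contents of word/document.xml, that is, the string read from the archive before the str_replace() and strip_tags() calls in read_file_docx(). The resulting HTML string could then be stored in the database as-is.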
