I have a requirement where I need to extract the text from a PowerPoint file. Everything works in the sense that the Perl script reads the <a:t></a:t> tags, but it does not put a space after each piece of text. Here is a more detailed explanation, using this excerpt from file.xml:
<a:t>Stack</a:t>
<a:t>Overflow</a:t>
The script prints StackOverflow, but what I want is Stack Overflow.
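Below is a minimal, self-contained reproduction of this behavior. It assumes the two <a:t> elements can be wrapped in a hypothetical <root> element, with the DrawingML namespace declared, so that the fragment is well-formed. Printing each run as soon as it is seen is equivalent to joining the runs with an empty string. Collecting the runs first and joining them with a space gives the wanted output.

```perl
#!/usr/bin/perl
use strict;
use warnings;
use XML::Twig;

# The two runs from file.xml, wrapped in a hypothetical <root> element
# (with the DrawingML namespace declared) so the fragment is well-formed.
my $xml = <<'XML';
<root xmlns:a="http://schemas.openxmlformats.org/drawingml/2006/main">
  <a:t>Stack</a:t>
  <a:t>Overflow</a:t>
</root>
XML

my @runs;    # text of each <a:t> element, in document order
my $twig = XML::Twig->new(
    twig_handlers => {
        'a:t' => sub { my ( $t, $elt ) = @_; push @runs, $elt->text(); },
    },
);
$twig->parse($xml);

# Printing each run as soon as it is seen is the same as joining with '' ...
my $direct = join '', @runs;
# ... whereas joining the collected runs with a space gives the wanted output.
my $spaced = join ' ', @runs;

print "$direct\n";    # StackOverflow
print "$spaced\n";    # Stack Overflow
```

In the posted script, the same idea would mean pushing $ppttext->text() onto @text inside topicref_processing, then printing join(' ', @text) and emptying @text after each $twig->parse.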
Below is the code I am using:
#!/usr/bin/perl
use strict;
use warnings;
use Archive::Zip qw( :ERROR_CODES );
use XML::Twig;

my @text;
my $file = "test_server-1.pptx";
my $zip = Archive::Zip->new();
$zip->read( $file ) == AZ_OK or die "Unable to open Office file\n";

# qr{} keeps the "\." escape; inside a double-quoted string "\." collapses to "." (any character).
my @slides = $zip->membersMatching( qr{^ppt/slides/slide\d+\.xml$} );
#print @slides;

# Loop by number so the slides are processed in order: slide1.xml, slide2.xml, ...
for my $i ( 1 .. scalar @slides )
{
    my $content = $zip->contents( "ppt/slides/slide${i}.xml" );
    my $twig = XML::Twig->new( keep_encoding => 1,
                               twig_handlers => { 'a:t' => \&topicref_processing },
                             );
    $twig->parse( $content );
}

# Called for every <a:t> element; prints its text with no separator.
sub topicref_processing
{
    my ( $twig, $ppttext ) = @_;
    print $ppttext->text();
}
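A possible alternative to adding a space after every <a:t> is to handle paragraphs instead. The sketch below assumes that slide text sits in DrawingML paragraphs (<a:p>) made up of runs (<a:r>), each holding its text in an <a:t> element. PowerPoint can split a single word across several runs when part of it has different formatting, so a space after every <a:t> could break such words apart. Here the runs inside one paragraph are concatenated and the paragraphs are separated by a space. The XML fragment and the <root> wrapper are hypothetical.

```perl
#!/usr/bin/perl
use strict;
use warnings;
use XML::Twig;

# Hypothetical slide fragment: two paragraphs; the second word is split across
# two runs, as can happen when part of a word has different formatting.
my $xml = <<'XML';
<root xmlns:a="http://schemas.openxmlformats.org/drawingml/2006/main">
  <a:p><a:r><a:t>Stack</a:t></a:r></a:p>
  <a:p><a:r><a:t>Over</a:t></a:r><a:r><a:t>flow</a:t></a:r></a:p>
</root>
XML

my @paragraphs;    # one string per <a:p> element
my $twig = XML::Twig->new(
    twig_handlers => {
        # Runs in the same paragraph are concatenated with no separator.
        'a:p' => sub {
            my ( $t, $para ) = @_;
            push @paragraphs, join '', map { $_->text() } $para->descendants('a:t');
        },
    },
);
$twig->parse($xml);

# Paragraphs are separated by a space.
my $slide_text = join ' ', @paragraphs;
print "$slide_text\n";    # Stack Overflow
```

In the posted script, this would mean registering the handler under 'a:p' rather than 'a:t' in twig_handlers, and printing the joined paragraphs after each $twig->parse.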
Any help would be highly appreciated.