Saturday, 18 October 2014

How to parse XML with multiple versions of tracks with REXML in Rails 4



I am new to Ruby on Rails and am trying to parse the XML file generated by media info software. I have spent many days searching and reading to figure out what to do. I am using REXML, the parser that comes with Ruby, and I do not want to install the Nokogiri gem at this time. I have been able to write code that pulls the top-level values from the general and video tracks and saves them to variables.


Here is how I am doing it:



@gen_format = REXML::XPath.each(media_parse_doc, "*//Format/text()") { |element| element }
@video_duration = REXML::XPath.each(media_parse_doc, "//track[@type='Video']/Duration/text()") { |element| element }
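
Note that REXML::XPath.each hands back the whole list of matching text nodes rather than a single string. A minimal sketch of pulling one trimmed string per value is shown below. The <File> wrapper, the type='General' attribute, and the "MPEG-PS" value are assumptions made up for illustration; only the Video/Duration path comes from the code above.

```ruby
require 'rexml/document'

# Hypothetical, trimmed-down input. The <File> root, the 'General' track
# and the "MPEG-PS" value are assumptions, not taken from the real file.
xml = <<~XML
  <File>
    <track type='General'>
      <Format>
        MPEG-PS
      </Format>
    </track>
    <track type='Video'>
      <Duration>
        30s 16ms
      </Duration>
    </track>
  </File>
XML

media_parse_doc = REXML::Document.new(xml)

# XPath.first returns the first matching element, or nil if nothing matches.
format_node   = REXML::XPath.first(media_parse_doc, "//track[@type='General']/Format")
duration_node = REXML::XPath.first(media_parse_doc, "//track[@type='Video']/Duration")

# Element#text gives the text content; strip drops the surrounding newlines.
gen_format     = format_node && format_node.text.strip
video_duration = duration_node && duration_node.text.strip
```

In a controller these two locals would be assigned to @gen_format and @video_duration instead.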


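My problem is that I need to extract the audio attributes, and the audio tag sometimes looks like this: <track type='Audio'> and other times like this: <track type='Audio' streamid='1'>, <track type='Audio' streamid='2'>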


I am not sure how to deal with the variation in the tag. How do I determine the total number of audio tracks when there are several with streamid=, and how do I parse the data for each streamid? I built a loop and tried to extract the data, but I have not been able to extract it successfully yet.


Here are some examples of what I tried so far to parse the audio with streamid:



@audio_track = REXML::XPath.each(media_parse_doc, "//track[@type= 'Audio']/track/text()") { |element| element }
@track_array = REXML::XPath.each(media_parse_doc,(@audio_track)) do {|track| track.elements["Bit_rate"].text }
@bitrate = track.elements["Bit_rate"].text
end
@audio_tracknum = REXML::XPath.each(media_parse_doc, "//track[@type='Audio']/track streamid=/text()") { |element| element }
@audio_format = REXML::XPath.each(media_parse_doc, "//track[@type='Audio']/Format/text()") { |element| element }


Here is a sample of the XML for audio with streamid:



<track type='Audio' streamid='1'>
<ID>
189 (0xBD)-128 (0x80)
</ID>
<Format>
AC-3
</Format>
<Format_Info>
Audio Coding 3
</Format_Info>
<Mode_extension>
CM (complete main)
</Mode_extension>
<Format_settings__Endianness>
Big
</Format_settings__Endianness>
<Muxing_mode>
DVD-Video
</Muxing_mode>
<Duration>
30s 16ms
</Duration>
<Bit_rate_mode>
Constant
</Bit_rate_mode>
<Bit_rate>
320 Kbps
</Bit_rate>
<Channel_s_>
6 channels
</Channel_s_>
<Channel_positions>
Front: L C R, Side: L R, LFE
</Channel_positions>
<Sampling_rate>
48.0 KHz
</Sampling_rate>
<Bit_depth>
16 bits
</Bit_depth>
<Compression_mode>
Lossy
</Compression_mode>
<Stream_size>
1.15 MiB (1%)
</Stream_size>
</track>
<track type='Audio' streamid='2'>
<ID>
189 (0xBD)-129 (0x81)
</ID>
<Format>
AC-3
</Format>
<Format_Info>
Audio Coding 3
</Format_Info>
<Mode_extension>
CM (complete main)
</Mode_extension>
<Format_settings__Endianness>
Big
</Format_settings__Endianness>
<Muxing_mode>
DVD-Video
</Muxing_mode>
<Duration>
30s 16ms
</Duration>
<Bit_rate_mode>
Constant
</Bit_rate_mode>
<Bit_rate>
192 Kbps
</Bit_rate>
<Channel_s_>
2 channels
</Channel_s_>
<Channel_positions>
Front: L R
</Channel_positions>
<Sampling_rate>
48.0 KHz
</Sampling_rate>
<Bit_depth>
16 bits
</Bit_depth>
<Compression_mode>
Lossy
</Compression_mode>
<Stream_size>
704 KiB (0%)
</Stream_size>
</track>
<track type='Text' streamid='1'>
<ID>
224 (0xE0)-CC1
</ID>
<Format>
EIA-608
</Format>
<Muxing_mode>
A/53 / DTVCC Transport
</Muxing_mode>
<Muxing_mode__more_info>
Muxed in Video #1
</Muxing_mode__more_info>
<Bit_rate_mode>
Constant
</Bit_rate_mode>
<Stream_size>
0.00 Byte (0%)
</Stream_size>
</track>
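
For a sample shaped like this one, one possible approach is to select every audio track element first and then read the child tags relative to each track. This is only a sketch: the XML is a trimmed copy of the sample, the <File> root is an assumption about the enclosing document, and audio_tracks and audio_count are made-up names.

```ruby
require 'rexml/document'

# Trimmed copy of the sample above. The <File> root is an assumption.
xml = <<~XML
  <File>
    <track type='Audio' streamid='1'>
      <Format>
        AC-3
      </Format>
      <Bit_rate>
        320 Kbps
      </Bit_rate>
    </track>
    <track type='Audio' streamid='2'>
      <Format>
        AC-3
      </Format>
      <Bit_rate>
        192 Kbps
      </Bit_rate>
    </track>
    <track type='Text' streamid='1'>
      <Format>
        EIA-608
      </Format>
    </track>
  </File>
XML

doc = REXML::Document.new(xml)

# XPath.match returns an Array with every matching element,
# so the Text track is left out and only the audio tracks remain.
audio_nodes = REXML::XPath.match(doc, "//track[@type='Audio']")

# Build one hash per audio track; &. keeps a missing child tag from raising.
audio_tracks = audio_nodes.map do |track|
  {
    streamid: track.attributes['streamid'],
    format:   track.elements['Format']&.text&.strip,
    bit_rate: track.elements['Bit_rate']&.text&.strip
  }
end

# The number of audio tracks is simply the size of that array.
audio_count = audio_tracks.size
```

Keeping the tracks in an array of hashes avoids needing a separate variable per streamid, and the count falls out of the array size.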


Here is a sample of the XML for audio without streamid. I expect the way I did the video parsing would work here, but how do I handle the two varying cases within the code?



<track type='Audio'>
<ID>
482 (0x1E2)
</ID>
<Menu_ID>
1 (0x1)
</Menu_ID>
<Format>
AC-3
</Format>
<Format_Info>
Audio Coding 3
</Format_Info>
<Mode_extension>
CM (complete main)
</Mode_extension>
<Format_settings__Endianness>
Big
</Format_settings__Endianness>
<Codec_ID>
129
</Codec_ID>
<Duration>
1h 45mn
</Duration>
<Bit_rate_mode>
Constant
</Bit_rate_mode>
<Bit_rate>
384 Kbps
</Bit_rate>
<Channel_s_>
6 channels
</Channel_s_>
<Channel_positions>
Front: L C R, Side: L R, LFE
</Channel_positions>
<Sampling_rate>
48.0 KHz
</Sampling_rate>
<Bit_depth>
16 bits
</Bit_depth>
<Compression_mode>
Lossy
</Compression_mode>
<Delay_relative_to_video>
-67ms
</Delay_relative_to_video>
<Stream_size>
291 MiB (1%)
</Stream_size>
</track>
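
Since the XPath //track[@type='Audio'] does not mention streamid, it should match both forms of the tag, which suggests a single code path could cover both cases. The sketch below assumes that; audio_track_info is a hypothetical helper, the <File> root is an assumption, and numbering a track by its position when streamid is absent is a design choice of this example rather than anything the media info output defines.

```ruby
require 'rexml/document'

# Hypothetical helper: returns one hash per audio track in the document.
def audio_track_info(doc)
  tracks = REXML::XPath.match(doc, "//track[@type='Audio']")
  tracks.each_with_index.map do |track, index|
    {
      # attributes['streamid'] is nil when the attribute is absent,
      # so fall back to the 1-based position of the track.
      stream:   (track.attributes['streamid'] || index + 1).to_i,
      bit_rate: track.elements['Bit_rate']&.text&.strip
    }
  end
end

# Single audio track without streamid (the <File> root is an assumption).
single = REXML::Document.new(<<~XML)
  <File>
    <track type='Audio'>
      <Bit_rate>
        384 Kbps
      </Bit_rate>
    </track>
  </File>
XML

# Several audio tracks, each carrying a streamid attribute.
multi = REXML::Document.new(<<~XML)
  <File>
    <track type='Audio' streamid='1'>
      <Bit_rate>
        320 Kbps
      </Bit_rate>
    </track>
    <track type='Audio' streamid='2'>
      <Bit_rate>
        192 Kbps
      </Bit_rate>
    </track>
  </File>
XML

single_info = audio_track_info(single)
multi_info  = audio_track_info(multi)
```

Both calls return an array, so the calling code can treat a file with one audio track the same way as a file with many.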


Can someone please help me figure out how to handle the two kinds of audio tags and how to parse the data from multiple streamid records into variables? Also, how do I determine how many audio streamid records there are so that I can store the total?


Thank you for your help!

