Inserting CDATA section markers at random locations in a VARCHAR(MAX) column



I have a VARCHAR(Max) column, OrderDetails, that is hosting some XML-like data that's been around quite a bit longer than the XML column has been.


What I want to do is to clean up the existing data in OrderDetails to be XML compliant in such a way as to be easily visually interpreted by humans.


One of the steps that's needed to do this transformation is to wrap an element containing SQL code (<SQLCODE>) in a CDATA section so the SQL comparison operators don't crash the XML parser. In the examples below, the "<>" and "&" characters crash the parser.


The problem comes in when I want to insert the closing CDATA section marker. In addition to the varying quantity of text that precedes the SQLCODE element, the number of leading spaces and/or tabs, if any, that precedes the </SQLCODE> on the same line can vary with each row in the table. The format variations are due to changes in coding format standards over the years.


For the most part, I can rely on the </SQLCODE> closing tag to be on a line with only the leading whitespace and a trailing CRLF. Typically, 1 tab or 3 spaces are used for one level of indentation,


Once the data transition is complete, the OrderDetails column will be ported to a new XML column in the tOrders table.




Example 1, tabs as leading chars

<CABINET CabID="1941" Technician="Fred" TechID="831" />
<MODEL>Aspen</MODEL>
<SIZE>W18</SIZE>
<!-- Get Tech's availability -->
<SQLCODE>
SELECT sum(AvailableHours)
FROM tSchedule
WHERE Booked <> 1
AND TechID = 831
AND Source <> 'T&G'
</SQLCODE>
</CABINET>


Example 2, spaces as leading chars



<CABINET CabID="2977" Technician="Juan" TechID="740" />
<MODEL>
Ticonderoga
</MODEL>
<SIZE>B24-2</SIZE>
<!-- Get Tech's availability -->
<SQLCODE>
SELECT sum(AvailableHours)
FROM tSchedule
WHERE Booked <> 1
AND TechID = 740
</SQLCODE>
</CABINET>


Example 3, mixed tabs and spaces as leading chars



<CABINET CabID="3291" Technician="Thomas" TechID="342" />
<MODEL>SpecialOrder</MODEL>
<SIZE>B48-4.2</SIZE>
<CUTOUTS>3</CUTOUTS>
<DRAWING>Cust002217\Job03291.dwg</DRAWING>
<!-- Get Tech's availability -->
<SQLCODE>
SELECT sum(AvailableHours)
FROM tSchedule
WHERE Booked <> 1
AND TechID = 342
AND TimeBlock > 24
</SQLCODE>
</CABINET>








For Example 2, say, what I can get now is

<SQLCODE>
--<![CDATA[
SELECT sum(AvailableHours)
FROM tSchedule
WHERE Booked <> 1
AND TechID = 740
--]]>
</SQLCODE>






This is functional, but not formatted properly. What I'd like to get is



<SQLCODE>
--<![CDATA[
SELECT sum(AvailableHours)
FROM tSchedule
WHERE Booked <> 1
AND TechID = 740
--]]>
</SQLCODE>


Or better yet,



<SQLCODE>
--<![CDATA[
SELECT sum(AvailableHours)
FROM tSchedule
WHERE Booked <> 1
AND TechID = 740
--]]>
</SQLCODE>






Currently I'm using this:



-- 1. Insert the CDATA section start marker
UPDATE tOrders
SET OrderDetails = REPLACE(OrderDetails,
'<SQLCODE>',
'<SQLCODE>
--<![CDATA[' )
where OrderID = 1604
and OrderDetails not like '%<![CDATA[%'

-- 2. Insert the CDATA section end marker
UPDATE tOrders
SET OrderDetails = REPLACE(OrderDetails,
'</SQLCODE>',
'--]]>
</SQLCODE>')
where OrderID = 1604
and OrderDetails not like '%<![CDATA[%'




Can you suggest changes to this code (or another method) to get to one of the desired outputs?


No comments:

Post a Comment