There are many businesses where You got data from someone outside of Your company, change it and send it somewehere else. You cannot trust to Your "data supplier", but You are responsible for data You provide to. One of complicated cases I spotted is building XML from strings delivered from outside. As You maybe know, there are many illegal XMl characters. Some of them could be visually detected - examples are: &, >, < symbols. It can be removed with minimum effort. Unfortunately, there are more of problematic characters.Moreover, it can not be detected visually. e.g. control chars "Start of Header" CHAR(1) OR "Backspace" CHAR(8).
Let us try to analyze such invalid source.
DECLARE @Definition NVARCHAR(MAX)='some '+CHAR(1)+'t'+CHAR(8)+'ext';If we try to build this value from original we would get an error:
SELECT CASt(@Definition as xml)
'Msg 9420, Level 16, State 1, Line 3 XML parsing: line 1, character 6, illegal xml character'
You could say we have a hint here: the char #6 is illegal. But we have more than one there.How we can detect all the symbols?
The easiest way to do this is the split source string into separate chars and trying to parse every char as XML. If You got an exception - here is Your illegal character.
Let us create the table for storing the result:
DECLARE @T Table ( [Char] CHAR(1) NOT NULL, Code int, Position int, N NVARCHAR(4000), IsError BIT )
For string division let us use recursivŠµ CTE.
;WITH H AS ( SELECT SUBSTRING(@Definition, 1, 1) AS [Char] , ASCII(SUBSTRING(@Definition, 1, 1)) AS Code , 1 AS Position UNION ALL SELECT SUBSTRING(@Definition, H2.Position + 1, 1) AS [Char] , ASCII(SUBSTRING(@Definition, H2.Position + 1, 1)) AS Code , H2.Position + 1 AS Position FROM H H2 WHERE H2.Position < LEN(@Definition) ) , H2 AS ( SELECT H.[Char] , H.Code , H.Position , ''+H.Char+' ' N FROM H ) INSERT INTO @T ( [Char], Code, Position, N ) SELECT [Char] , Code , Position , N FROM H2 H OPTION ( MAXRECURSION 0 )
Now our sub-XMls are ready for being parsed.
Let us use cursor for such purpose.
WHILE (1=1) BEGIN SET @I=(SELECT top(1) t.Position FROM @T T WHERE T.Position>@I ORDER BY T.Position ASC) IF (@I IS NULL) BREAK; BEGIN TRY SELECT @curStr=t.N FROM @T T WHERE T.Position=@I SET @curXML=CAST(@curStr as xml) END TRY BEGIN CATCH UPDATE T SET T.IsError = 1 FROM @T T WHERE T.Position=@I END CATCH END