Translate

Sunday 27 March 2022

How to remove invalid characters from XML (Msg 9420 illegal xml character)

There are many businesses where You got data from someone outside of Your company, change it and send it somewehere else. You cannot trust to Your "data supplier", but You are responsible for data You provide to. One of complicated cases I spotted is building XML from strings delivered from outside. As You maybe know, there are many illegal XMl characters. Some of them could be visually detected - examples are: &, >, < symbols. It can be removed with minimum effort. Unfortunately, there are more of problematic characters.Moreover, it can not be detected visually. e.g. control chars "Start of Header" CHAR(1) OR "Backspace" CHAR(8). Let us try to analyze such invalid source.
DECLARE @Definition NVARCHAR(MAX)='some '+CHAR(1)+'t'+CHAR(8)+'ext';
If we try to build this value from original we would get an error:
SELECT CASt(@Definition as xml)
'Msg 9420, Level 16, State 1, Line 3
XML parsing: line 1, character 6, illegal xml character'
You could say we have a hint here: the char #6 is illegal. But we have more than one there.How we can detect all the symbols? The easiest way to do this is the split source string into separate chars and trying to parse every char as XML. If You got an exception - here is Your illegal character. Let us create the table for storing the result:
DECLARE @T Table (
  [Char]   CHAR(1) NOT NULL,
  Code     int,
  Position int,
  N        NVARCHAR(4000),
  IsError  BIT )
For string division let us use recursivŠµ CTE.
;WITH H
AS
(
 SELECT
    SUBSTRING(@Definition, 1, 1) AS [Char]
  , ASCII(SUBSTRING(@Definition, 1, 1)) AS Code
  , 1 AS Position
 UNION ALL
 SELECT
    SUBSTRING(@Definition, H2.Position + 1, 1) AS [Char]
  , ASCII(SUBSTRING(@Definition, H2.Position + 1, 1)) AS Code
  , H2.Position + 1 AS Position
 FROM H H2
  WHERE
   H2.Position < LEN(@Definition)
)
,    H2
AS
(
 SELECT
    H.[Char]
  , H.Code
  , H.Position
  , ''+H.Char+'' N
 FROM H
)
INSERT INTO @T ( [Char], Code, Position, N )
SELECT
   [Char]
 , Code
 , Position
 , N
FROM H2 H
OPTION ( MAXRECURSION 0 )
Now our sub-XMls are ready for being parsed. Let us use cursor for such purpose.
WHILE (1=1)
BEGIN
 SET @I=(SELECT top(1)
  t.Position
 FROM @T T
  WHERE
   T.Position>@I
 ORDER BY T.Position ASC)
 IF (@I IS NULL)
  BREAK;
 BEGIN TRY
 SELECT
  @curStr=t.N
 FROM @T T
  WHERE
   T.Position=@I
 SET @curXML=CAST(@curStr as xml)
 END TRY
 BEGIN CATCH
 UPDATE T
 SET T.IsError = 1
 FROM @T T
  WHERE
   T.Position=@I
 END CATCH
END

Thursday 10 May 2012

How to see all string characters with their ASCII codes

When i analyzes some databases, i can see differences in stored procedures. SSMS 2012 suggests me to change CR/LF new line type. What would it can be?
To analyze differences, i wrote very simple code to see all SQL module definition characters together with their ASCII code values.
DECLARE @Object_Name VARCHAR(256)= 'Your object'
DECLARE @Definition NVARCHAR(MAX)
SELECT  @Definition = m.definition
       FROM    sys.all_sql_modules M
       WHERE   OBJECT_NAME (M.object_id) = @Object_Name;
;
WITH    H AS ( SELECT   SUBSTRING(@Definition, 1, 1) AS Char ,
                        ASCII(SUBSTRING(@Definition, 1, 1)) AS Code ,
                        1 AS Position
               UNION ALL
               SELECT   SUBSTRING(@Definition, H2.Position + 1, 1) AS Char ,
                        ASCII(SUBSTRING(@Definition, H2.Position + 1, 1)) AS Code ,
                        H2.Position + 1 AS Position
               FROM     H H2
               WHERE    H2.Position < LEN(@Definition)
             )
    SELECT  *
    FROM    H
OPTION  ( MAXRECURSION 0 )
The result should look like this:
You can set @Definition variable as your custom string and see all characters and ASCII codes if you need.

Wednesday 29 February 2012

Keyword or statement option 'XMLNAMESPACES' is not supported in this version of SQL Server.

When you will prepare database migration from SQL Server standalone edition to SQL Azure, there will be many compatibility issues. One of them is unsupported namespaces inside of XMLs.

You can find a list of unsupported Transact-SQL statements here.

As long as Microsoft does not planning to include namespace support in SQL Azure, we still need to manage this issue.
So assume we have an Excel XML looking like this:
DECLARE @XML xml='


 
       1     SomeText1     SomeText2     SomeText4     SomeText5    
'
One of possible ways is to convert XML to NVARCHAR(MAX) type, then remove namespaces and prefixes using REPLACE function, and then convert it back to XML type.
In this case your code looks like this:
DECLARE @XML_WithNoNamespace_Char XML = 

REPLACE(REPLACE(REPLACE(REPLACE(REPLACE(REPLACE(REPLACE(REPLACE(REPLACE(CAST(@XML AS NVARCHAR(MAX)),
                                                              'xmlns:ss="urn:schemas-microsoft-com:office:spreadsheet"',
                                                              ''), 'ss:', ''),
                                                              ' xmlns:o="urn:schemas-microsoft-com:office:office"',
                                                              ''), 'o:', ''),
                                                              'xmlns:html="http://www.w3.org/TR/REC-html40"',
                                                              ''), 'html:', ''),
                                                           'xmlns="urn:schemas-microsoft-com:office:spreadsheet"',
                                                           ''),
                                                   'xmlns:x="urn:schemas-microsoft-com:office:excel"',
                                                   ''), 'x:', '')
Let us see test result:
SELECT 
       @XML.exist('/Workbook/Worksheet/Table/Row/Cell/Data[text()="SomeText1"]') AS [Is_Found_In_Old_XML],
       @XML_WithNoNamespace_Char.exist('/Workbook/Worksheet/Table/Row/Cell/Data[text()="SomeText1"]') AS [Is_Found_In_New_XML]
As expected, in first case row was not found, but in second case required row was found.

Another way is to extract data using XQuery namespace declaration predicate. So we can get all required data - Row number,Cell number and Cell value using numbers table which should be populated from 1 to XML`s row count. I used cross joined sys.objects table for this but definitely you can find some other sequence generating algorithms (or if you have SQL Server 2012 version, you can use built-in sequence generator).

Code looks like this:
DECLARE @NumbersOnTheFly TABLE (Number INT PRIMARY KEY);

WITH Numbers
AS (
 SELECT ROW_NUMBER() OVER (
   ORDER BY O1.[object_id]
    , o2.[object_id]
   ) AS Number
 FROM sys.[objects] O1
 CROSS JOIN sys.[objects] O2
 )
INSERT INTO @NumbersOnTheFly
SELECT Number
FROM Numbers
WHERE Number <= @XMl.query('
  declare namespace ss="urn:schemas-microsoft-com:office:spreadsheet";
   count(/ss:Workbook/ss:Worksheet/ss:Table/ss:Row)').value('.', 'bigint');

WITH C
AS (
 SELECT N.Number
  , @XMl.query('
     declare namespace ss="urn:schemas-microsoft-com:office:spreadsheet";
      for $row in /ss:Workbook/ss:Worksheet/ss:Table/ss:Row[position()=sql:column("N.Number")]/ss:Cell
      return  {$row/ss:Data[1]/text()} 
      ') Cell
 FROM @NumbersOnTheFly N

 )
SELECT C.Number AS RowNumber
 , ROW_NUMBER() OVER (
  PARTITION BY C.Number ORDER BY a.b
  ) AS CellNumber
 , b.value('.', 'NVARCHAR(1000)') AS CellValue
FROM C
CROSS APPLY C.Cell.nodes('Cell') a(b)

Which way to choose to process XML with namespaces on SQL Azure? You should check performance to see which one is best for your XMLs.
Anyway, we are still waiting for XML namespaces support in SQL Azure.

Thursday 28 April 2011

How to preview large string

Often you need to debug dynamic SQL statement. Simplest way to do this is to print statement. Because of 8000 char limit (400 for unicode), it is hard to do in "standard way".

here is an example:
DECLARE @Very_big_string NVARCHAR(MAX)= '' ;
WITH    c AS ( SELECT   ROW_NUMBER() OVER ( ORDER BY o1.object_id, o2.object_id ) RN
               FROM     sys.objects o1
                        CROSS JOIN sys.objects o2
             )
    SELECT TOP 9999
            @Very_big_string = @Very_big_string + 
    REPLICATE(0,4 - LEN(CAST(rn AS NVARCHAR)))
            + CAST(rn AS NVARCHAR) + CHAR(13)
    FROM    c    

PRINT @Very_big_string 
Last result will be 0800 - 4 chars  for digits and one for end line - char(13) (4+1)*5=4000

There are some methods to print string by UDF etc. I propose simpler method: print XML which will contain your string (xml is not limited by char, by size in SSMS settings only - default is 2MB
So here is an example for this string:
DECLARE @Very_big_string NVARCHAR(MAX)= '' ;
WITH    c AS ( SELECT   ROW_NUMBER() OVER ( ORDER BY o1.object_id, o2.object_id ) RN
               FROM     sys.objects o1
                        CROSS JOIN sys.objects o2
             )
    SELECT TOP 9999
            @Very_big_string = @Very_big_string + 
    REPLICATE(0,4 - LEN(CAST(rn AS NVARCHAR)))
            + CAST(rn AS NVARCHAR) + CHAR(13)
    FROM    c    

SELECT CAST(''+@Very_big_string+'' as XML)
Here you can preview your string in XML viewer.

Disadvantage of this method is that all standard XML symbols like <,>,! etc are changed to ampersand sequences (eg > is changed to &dt;)

Wednesday 27 April 2011

How to extract substrings limited with delimiters.

If you need extract from really big string small strings which are limited by delimiters, so for you can be useful this example:

DECLARE @text AS NVARCHAR (MAX) = '***[ddd]***
dfdf
fdfdfdfdfdf
***[fff]***
4545445
45454
***[ahaASSDAD]***

DFDFDF
***[SOME   TEXT
';

DECLARE @PatBegin AS VARCHAR (1000) = '***[';

DECLARE @PatEnd AS VARCHAR (1000) = ']***';

WITH   C1
AS     (SELECT 0 AS Num
        UNION ALL
        SELECT 0),
       C2
AS     (SELECT C1.Num
        FROM   C1 CROSS JOIN C1 AS C2),
       C3
AS     (SELECT C2.Num
        FROM   C2 CROSS JOIN C2 AS C3),
       C4
AS     (SELECT C3.Num
        FROM   C3 CROSS JOIN C3 AS C4),
       C5
AS     (SELECT C4.Num
        FROM   C4 CROSS JOIN C4 AS C5),
       numbers
AS     (SELECT ROW_NUMBER() OVER ( ORDER BY o.Num) AS Number
        FROM   C5 AS o),
       counts
AS     (SELECT COUNT(*) AS CNT
        FROM   numbers
        WHERE  PATINDEX(@PatEnd, @TExt) = 1),
       c
AS     (SELECT N.Number AS CHARBegin
        FROM   numbers AS N
        --CROSS JOIN numbers N2
        WHERE  SUBSTRING(@text, N.Number, LEN(@PatBegin)) = @PatBegin),
       d
AS     (SELECT c.CHARBegin,
               T.CHARBegin AS CharEnd,
               ISNULL(c.CHARBegin + LEN(@PatBegin) + NULLIF (CHARINDEX(@PatEnd, SUBSTRING(@text, c.CHARBegin + 1, ISNULL(T.CHARBegin, 2E7) - C.CHARBegin)), 0), 2e7) AS FactEnd
        FROM   c OUTER APPLY (SELECT TOP 1 C2.CHARBegin
                              FROM   C AS C2
                              WHERE  C2.CHARBegin > C.CharBegin) AS T)
SELECT SUBSTRING(@text, d.CHARBegin + LEN(@PatBegin), d.FactEnd - [d].[CHARBegin] - LEN(@PatBegin) - LEN(@PatEnd)) AS String_OUT,
       [d].CHARBegin
FROM   d;

here you have 2 delimiter - @PatBegin and @PatEnd - begin and end delimiters.
Result here will be:










UPDATE (09-01-2012)
Fixed logic to handle lack of last closing delimiter

Dynamic repeatable character replacement

Sometime you need to eliminate duplicated char in string. Static replacement like
SET @Output_String=REPLACE(@teststring,'__','++')
is good for not big and predictable values. But what if you need to change replaced chars dynamically or its quantity is big enough?
In that case you can use dynamic replacement. Here is an example:


SET @Output_String=REPLACE(@teststring,'__','++')

DECLARE @Duplicate NVARCHAR(100)= '#$'
DECLARE @TestString NVARCHAR(MAX)= 'test_test__f##f2$$g'
DECLARE @Replacement NVARCHAR(MAX)= ''
DECLARE @OutputString NVARCHAR(MAX)= @teststring ;
WITH    numbers
          AS ( SELECT   ROW_NUMBER() OVER ( ORDER BY o.object_id, o2.object_id ) Number
               FROM     sys.objects o
                        CROSS JOIN sys.objects o2
             ),
        chars
          AS ( SELECT   SUBSTRING(@Duplicate, 1, 1) CHAR ,
                        CAST(1 AS INT) [LEVEL]
               UNION ALL
               SELECT   SUBSTRING(@Duplicate, numbers.Number, 1) CHAR ,
                        CAST(numbers.Number AS INT) [LEVEL]
               FROM     numbers
                        JOIN chars ON chars.Level + 1 = numbers.Number
               WHERE    LEN(SUBSTRING(@Duplicate, numbers.Number, 1)) > 0
             ),
        Replicated
          AS ( SELECT   REPLICATE(CHAR, numbers.number) Repl ,
                        numbers.Number
               FROM     chars
                        CROSS JOIN numbers
             )
    SELECT  @OutputString = REPLACE(@OutputString, Repl, @Replacement)
    FROM    replicated
    WHERE   number <= LEN(@TestString)

SELECT  @OutputString
You can modify @Duplicate string which is storage for replaced characters. Also @Replacements is a string which will be inserted instead of duplicated chars

Tuesday 19 April 2011

Microsoft releases cumulative update #7 for SQL Server 2008 R2

Microsoft releases cumulative update #7 for SQL Server 2008 R2. After this update, your SQL will have version 10.50.1777.0
You can check this by running query:
SELECT  SERVERPROPERTY('ProductVersion') as ServerVersion

You can find article related with this Update on Microsoft page: http://support.microsoft.com/kb/2507770.
Most critical fixes:

Database Engine :


  • Error message when you try to bulk insert data into a table that contains a DATE type column that uses a US date format in SQL Server 2008 or in SQL Server 2008 R2: "Bulk load data conversion error
  • "CPerIndexMetaQS::ErrorAbort - Index corruption" error when you try to update a partitioned view in SQL Server 2008 R2
  • A common table expression-based query that uses the CHANGETABLE function runs very slowly in SQL Server 2008 Service Pack 1 or SQL Server 2008 R2
  • A query that uses the DATEDIFF function may run slowly in SQL Server 2005 or in SQL Server 2008 or in SQL Server 2008 R2
  • SQL Server Agent job runs incorrectly if it is scheduled to run every hour on the last day of every week or month in SQL Server 2005 or in SQL Server 2008 or in SQL Server 2008 R2
  • Error 7359 when you run a query against a user-defined function or a view that uses a synonym on a linked server in SQL Server 2005, SQL Server 2008 or SQL Server 2008 R2
  • Assertion error when you try to recover a database in SQL Server 2008 R2
  • "There is no function '{urn:schemas-microsoft-com:xml-sql}:variable()" error for an XQuery query that updates the value of an XML column in SQL Server 2005 or in SQL Server 2008 R2
  • PowerPivot window does not open when you open an .xla or .xlam file in Excel 2010 if PowerPivot for Excel 2010 is installed
  • SQL Server Browser service periodically does not respond to incoming requests
  • Application performance issue when a query references temporary tables that are created in a session in SQL Server 2008 R2
  • The Intellisense feature in SQL Server Management Studio for SQL Server 2008 R2 may stop working after you install Visual Studio 2010 Service Pack 1


Reporting services:


  • Performance decreases after you move a large report to SQL Server 2008 R2 Reporting Services
  • Report builder 3.0 stops responding after you run a query more than two times against a report model in SSRS 2008 R2
  • Performance decreases after you move a report that contains a large multi-select drop-down parameter list to SQL Server 2008 R2 Reporting Services


SSIS:


  • You can now use Package Installation Wizard (Dtsinstall.exe) to run the deployment manifest file of an SSIS package when SSIS 2008 or SSIS 2008 R2 is not installed
  • Column of a Flat File source is not imported if the next column data is one byte longer than the output column width in SSIS 2005, SSIS 2008 or in SSIS 2008 R2