Monday, December 20, 2010

Using T-SQL to remove non-printable characters

 We frequently have a need to remove non-printable characters from text fields for export or printing.  Most often, this is the chars 9,10,or 13, but can frequently consist of other unicode characters.

Before I go on, let me say that I understand the whole idea of "printable" is dependant on what you mean by "print".  For simplicity, I am defining "printable" as anything in the base ASCII set (<128) that will actually display in the standard SQL Management Studio query results.  Therefore, everything else is "non-printable".  I realize there may be an exception or two, so I made sure to write code that was easily modifiable to include exceptions.

I first tried PATINDEX, but the pseudo-regex patterns can return somE wacky results, based on which default collation is being used.  It would also make the code harder to read and modify for someone not familiar with the PATINDEX flavor of regex.

So, here it is.  It is not the most efficient way, but certainly adequete, while being easy to modify.


DECLARE @I INT,
        @TEST VARCHAR(100)

SELECT @TEST='123' + CHAR(13) + '412' + CHAR(200) + '341', --testing string
       @I=0 --start at zero


WHILE @I<256 --check entire extended ascii set
BEGIN
   IF @I=33 SELECT @I=127 --this jumps over the range that I want to keep
   SELECT @TEST=REPLACE(@TEST, CHAR(@I), ' ') --this replaces the current char with a space
   SELECT @I=@I+1
END

SELECT @TEST


Another way to do this which allows better granular control over which characters you allow is this:


DECLARE @GOOD VARCHAR(256),
        @TEST VARCHAR(100),
@I INT

SELECT @GOOD='ABCDEFGHIJKLMNOPQRSTUVWXYZabcdefghijklmnopqrstuvwxyz1234567890!@#$%^&*()-_=+/?.>,<;:|~ ' + char(13) + char(10) + char(9)

SELECT @TEST='123' + CHAR(13) + '412' + CHAR(200) + '341', --testing string
@I=0
SELECT @TEST


WHILE @IBEGIN
IF PATINDEX('%' + SUBSTRING(@TEST,@I,1) + '%',@GOOD)=0
BEGIN
SELECT @TEST=REPLACE(@TEST,SUBSTRING(@TEST,@I,1),' ')
END

   SELECT @I=@I+1
END

SELECT @TEST

Did this help you?  If so, please leave a comment!

Monday, December 13, 2010

More adventures in T-SQL

I often see overly complicated where clauses in procedures or views that could have been simplified with a little trick I have been using for years.

Here is an example of a where clause which is hard to read, and therefore hard to debug and modify:

WHERE (SV00700.Service_Call_ID = @Service_Call_ID OR
@DocumentNumber <> '' OR
@BatchNum <> '') AND (SV00700.Call_Invoice_Number = @Call_Invoice_Number OR
@DocumentNumber <> '' OR
@BatchNum <> '') AND (@DocumentNumber = '' OR
@DocumentNumber = rtrim(SV00700.Call_Invoice_Number)) AND (@BatchNum = '' OR
SV00700.BACHNUMB = @BatchNum)

The intent here was to filter based on a combination of parameters.
The trick I like to use is simple:  Assume all parameters are empty and temporarily replace them with the values.  If the statement still looks complicated, you need to refactor.

In this case:
WHERE (SV00700.Service_Call_ID = '' OR

'' <> '' OR
'' <> '') AND (SV00700.Call_Invoice_Number = '' OR
'' <> '' OR
'' <> '') AND '' = '' OR
'' = rtrim(SV00700.Call_Invoice_Number)) AND '' = '' OR
SV00700.BACHNUMB = '' ) AND remit2.INTERID = rtrim(db_name())

Notice all those instances of '' <> ''.  That is what SQL is actually seeing at run-time if the parameter is an empty string. 

We can make this much easier to read by figuring out how to evaluate each parameter individually, rather than trying to evaluate all possible combinations.

Consider this:
WHERE SV00700.Service_Call_ID LIKE CASE RTRIM(@Service_Call_ID) WHEN '' THEN '%' ELSE RTRIM(@Service_Call_ID) END
  AND SV00700.Call_Invoice_Number LIKE CASE RTRIM(@Call_Invoice_Number) WHEN '' THEN '%' ELSE RTRIM(@Call_Invoice_Number) END
  AND SV00700.Call_Invoice_Number LIKE CASE RTRIM(@DocumentNumber) WHEN '' THEN '%' ELSE RTRIM(@DocumentNumber) END
  AND SV00700.BACHNUMB LIKE CASE RTRIM(@BatchNum) WHEN '' THEN '%' ELSE RTRIM(@BatchNum) END
It accomplishes the same thing, but is much easier to read.  Further, each parameter can be easily removed without reworking the entire statement.
 
When we apply the trick to it:
WHERE SV00700.Service_Call_ID LIKE CASE RTRIM('') WHEN '' THEN '%' ELSE RTRIM('') END

AND SV00700.Call_Invoice_Number LIKE CASE RTRIM('') WHEN '' THEN '%' ELSE RTRIM('') END
AND SV00700.Call_Invoice_Number LIKE CASE RTRIM('') WHEN '' THEN '%' ELSE RTRIM('') END
AND SV00700.BACHNUMB LIKE CASE RTRIM('') WHEN '' THEN '%' ELSE RTRIM('') END

It is still easy to read. 

I hopes this helps someone else to go forth and simplify.



Did this help you?  If so, please leave a comment!

SQL 2022 TSQL snapshot backups!

  SQL 2022 now actually supports snapshot backups!  More specifically, T-SQL snapshot backups. Of course this is hardware-dependent. Here ...