The ability to clean data is one of the most valuable skills for any data analyst. Below are 3 ways how to clean any special character from a string.

## Using a Formula

=TEXTJOIN(“”,TRUE,
IF(
((ABS(CODE(MID(formula!\$B3,SEQUENCE(LEN(formula!\$B3)),1))-109.5)<=12.5)+
(ABS(CODE(MID(formula!\$B3,SEQUENCE(LEN(formula!\$B3)),1))-77.5)<=12.5)+
(ABS(CODE(MID(formula!\$B3,SEQUENCE(LEN(formula!\$B3)),1))-52.5)<=4.5)),
CHAR(CODE(MID(formula!\$B3,SEQUENCE(LEN(formula!\$B3)),1))),
“”))

if you do not have office365, replace the SEQUENCE function with ROW & INDIRECT functions as shown below

=TEXTJOIN(“”,TRUE,
IF(
((ABS(CODE(MID(formula!\$B3,ROW(INDIRECT(“1:”&LEN(formula!\$B3))),1))-109.5)<=12.5)+ *check for lowercase*
(ABS(CODE(MID(formula!\$B3,ROW(INDIRECT(“1:”&LEN(formula!\$B3))),1))-77.5)<=12.5)+ *check for UPPERCASE
(ABS(CODE(MID(formula!\$B3,ROW(INDIRECT(“1:”&LEN(formula!\$B3))),1))-52.5)<=4.5)), *check for number*
CHAR(CODE(MID(formula!\$B3,ROW(INDIRECT(“1:”&LEN(formula!\$B3))),1))), *return the code and convert to char*
“”))

How the Formula Works

NB: If you ever want to check if a string does not contain a special character use the below formula

=SUM(
(ABS(CODE(MID(C3,SEQUENCE(LEN(C3)),1))-77.5)<=12.5)+
(ABS(CODE(MID(C3,SEQUENCE(LEN(C3)),1))-109.5)<=12.5)+
(ABS(CODE(MID(C3,SEQUENCE(LEN(C3)),1))-52.5)<=4.5)
)=LEN(C3)

Thanks to a user called Xlambda for Mr. Excel Forum, we can now check if a character is a letter using the EXACT function

A letter is a character that has different values between its lower- and upper-case values.

Therefore, =EXACT( LOWER(string) , UPPER(string) ) will always return FALSE for all letters

=TEXTJOIN(“”,TRUE,
IF(
EXACT(UPPER(MID(B3,ROW(INDIRECT(“1:”&LEN(B3))),1)),LOWER(MID(B3,ROW(INDIRECT(“1:”&LEN(B3))),1)))
*
NOT(ISNUMBER(MID(B3,ROW(INDIRECT(“1:”&LEN(B3))),1)+0)),
“”,
MID(B3,ROW(INDIRECT(“1:”&LEN(B3))),1)
)
)

How the Formula Works

EXACT(UPPER(MID(B3,ROW(INDIRECT(“1:”&LEN(B3))),1)),LOWER(MID(B3,ROW(INDIRECT(“1:”&LEN(B3))),1))) checks if character is not a letter

NOT(ISNUMBER(MID(B3,ROW(INDIRECT(“1:”&LEN(B3))),1)+0)) checks if character is not a number

Now with the new knowledge, to check if a string contains a special character,

= IF(
SUM(EXACT(UPPER(MID(B4,ROW(INDIRECT(“1:”&LEN(B4))),1)),LOWER(MID(B4,ROW(INDIRECT(“1:”&LEN(B4))),1)))*
NOT(ISNUMBER(MID(B4,ROW(INDIRECT(“1:”&LEN(B4))),1)+0))),
1,
0)

You can replace the IF function with the FILTER function

=TEXTJOIN(“”,TRUE,
FILTER( MID(E3,SEQUENCE(LEN(E3)),1),
NOT(EXACT(UPPER(MID(E3,SEQUENCE(LEN(E3)),1)),LOWER(MID(E3,SEQUENCE(LEN(E3)),1))))+
ISNUMBER(MID(E3,SEQUENCE(LEN(E3)),1)+0)
)
)

## Use Power Query to Clean Any Special Character

Source, “Clean Data”, each
Text.Select([DIRTY DATA],{“a”..”z”,”A”..”Z”,”0″..”9″})
)

## Using a Recursive LAMBDA to Clean Any Special Character

=LAMBDA(ddata, start,
IF(start<=LEN(ddata),
CLEANDATA(
LET(
char, MID(ddata,start,1),
charcode, CODE(char),
IF((ABS(charcode-77.5)<=12.5)+(ABS(charcode-109.5)<=12.5)+(ABS(charcode-52.5)<=4.5), ddata , REPLACE(ddata,start,1,”-“))
),
start+1
),
SUBSTITUTE(ddata, “-“, “”)
)
)

How it works:

start<=LEN(ddata) always check if we have looped through the whole string. If not yet, it calls the function CLEANDATA

MID(ddata,start,1) Retrieves every character in the string one at a time which is later converted into a code CODE(char)

(ABS(charcode-77.5)<=12.5)+(ABS(charcode-109.5)<=12.5)+(ABS(charcode-52.5)<=4.5) Checks if the extracted character code is either lowercase, UPPERCASE or number, if true, returns the character

REPLACE(ddata,start,1,”-“) if a special character is found it is replaced with a hyphen “-“

SUBSTITUTE(ddata, “-“, “”) When we have looped through the whole string, substitute “-” with nothing

See more recursive LAMBDAs on Mr. Excel Forum

With the new knowledge on how to check for any letter in a string, you can re-write the recursive LAMBDA below

=LAMBDA(ddata, start,
IF(start<=LEN(ddata),
CLEANDATA(
LET(
char, MID(ddata,start,1),
IF(EXACT(LOWER(char),UPPER(char))*NOT(ISNUMBER(char+0)),
REPLACE(ddata,start,1,”-“), ddata)
)
,start+1
),
SUBSTITUTE(ddata, “-“, “”)
)
)

## Using a NON-Recursive LAMBDA to Clean Any Special Character

=LAMBDA(ddata,
TEXTJOIN(“”,TRUE,
LET(
char, MID(ddata,SEQUENCE(LEN(ddata)),1),
charcode, CODE(char),
notSpecial, (ABS(charcode-77.5)<=12.5)+(ABS(charcode-109.5)<=12.5)+(ABS(charcode-52.5)<=4.5),
IF(notSpecial , char , “”)
)
)
)

With the new knowledge on how to check for any letter in a string, you can re-write the non-recursive LAMBDA below

=LAMBDA(ddata,
TEXTJOIN(“”,TRUE,
LET(
char, MID(ddata,SEQUENCE(LEN(ddata)),1),
special, EXACT(LOWER(char),UPPER(char))*NOT(ISNUMBER(char+0)),
IF(special, “” , char)
)
)
)

## Seeking Microsoft Certification?

No Huff! No Fluff! No Spam!