The ability to clean data is one of the most valuable skills for any data analyst. Below are 3 ways how to clean any special character from a string.
Using a Formula

=TEXTJOIN(“”,TRUE,
IF(
((ABS(CODE(MID(formula!$B3,SEQUENCE(LEN(formula!$B3)),1))-109.5)<=12.5)+
(ABS(CODE(MID(formula!$B3,SEQUENCE(LEN(formula!$B3)),1))-77.5)<=12.5)+
(ABS(CODE(MID(formula!$B3,SEQUENCE(LEN(formula!$B3)),1))-52.5)<=4.5)),
CHAR(CODE(MID(formula!$B3,SEQUENCE(LEN(formula!$B3)),1))),
“”))
if you do not have office365, replace the SEQUENCE function with ROW & INDIRECT functions as shown below
=TEXTJOIN(“”,TRUE,
IF(
((ABS(CODE(MID(formula!$B3,ROW(INDIRECT(“1:”&LEN(formula!$B3))),1))-109.5)<=12.5)+ *check for lowercase*
(ABS(CODE(MID(formula!$B3,ROW(INDIRECT(“1:”&LEN(formula!$B3))),1))-77.5)<=12.5)+ *check for UPPERCASE
(ABS(CODE(MID(formula!$B3,ROW(INDIRECT(“1:”&LEN(formula!$B3))),1))-52.5)<=4.5)), *check for number*
CHAR(CODE(MID(formula!$B3,ROW(INDIRECT(“1:”&LEN(formula!$B3))),1))), *return the code and convert to char*
“”))
How the Formula Works
NB: If you ever want to check if a string does not contain a special character use the below formula
=SUM(
(ABS(CODE(MID(C3,SEQUENCE(LEN(C3)),1))-77.5)<=12.5)+
(ABS(CODE(MID(C3,SEQUENCE(LEN(C3)),1))-109.5)<=12.5)+
(ABS(CODE(MID(C3,SEQUENCE(LEN(C3)),1))-52.5)<=4.5)
)=LEN(C3)
Thanks to a user called Xlambda for Mr. Excel Forum, we can now check if a character is a letter using the EXACT function
A letter is a character that has different values between its lower- and upper-case values.
Therefore, =EXACT( LOWER(string) , UPPER(string) ) will always return FALSE for all letters
=TEXTJOIN(“”,TRUE,
IF(
EXACT(UPPER(MID(B3,ROW(INDIRECT(“1:”&LEN(B3))),1)),LOWER(MID(B3,ROW(INDIRECT(“1:”&LEN(B3))),1)))
*
NOT(ISNUMBER(MID(B3,ROW(INDIRECT(“1:”&LEN(B3))),1)+0)),
“”,
MID(B3,ROW(INDIRECT(“1:”&LEN(B3))),1)
)
)
How the Formula Works
EXACT(UPPER(MID(B3,ROW(INDIRECT(“1:”&LEN(B3))),1)),LOWER(MID(B3,ROW(INDIRECT(“1:”&LEN(B3))),1))) checks if character is not a letter
NOT(ISNUMBER(MID(B3,ROW(INDIRECT(“1:”&LEN(B3))),1)+0)) checks if character is not a number
Now with the new knowledge, to check if a string contains a special character,
= IF(
SUM(EXACT(UPPER(MID(B4,ROW(INDIRECT(“1:”&LEN(B4))),1)),LOWER(MID(B4,ROW(INDIRECT(“1:”&LEN(B4))),1)))*
NOT(ISNUMBER(MID(B4,ROW(INDIRECT(“1:”&LEN(B4))),1)+0))),
1,
0)
You can replace the IF function with the FILTER function
=TEXTJOIN(“”,TRUE,
FILTER( MID(E3,SEQUENCE(LEN(E3)),1),
NOT(EXACT(UPPER(MID(E3,SEQUENCE(LEN(E3)),1)),LOWER(MID(E3,SEQUENCE(LEN(E3)),1))))+
ISNUMBER(MID(E3,SEQUENCE(LEN(E3)),1)+0)
)
)
Use Power Query to Clean Any Special Character
= Table.AddColumn(
Source, “Clean Data”, each
Text.Select([DIRTY DATA],{“a”..”z”,”A”..”Z”,”0″..”9″})
)
Using a Recursive LAMBDA to Clean Any Special Character
=LAMBDA(ddata, start,
IF(start<=LEN(ddata),
CLEANDATA(
LET(
char, MID(ddata,start,1),
charcode, CODE(char),
IF((ABS(charcode-77.5)<=12.5)+(ABS(charcode-109.5)<=12.5)+(ABS(charcode-52.5)<=4.5), ddata , REPLACE(ddata,start,1,”-“))
),
start+1
),
SUBSTITUTE(ddata, “-“, “”)
)
)

How it works:
start<=LEN(ddata) always check if we have looped through the whole string. If not yet, it calls the function CLEANDATA
MID(ddata,start,1) Retrieves every character in the string one at a time which is later converted into a code CODE(char)
(ABS(charcode-77.5)<=12.5)+(ABS(charcode-109.5)<=12.5)+(ABS(charcode-52.5)<=4.5) Checks if the extracted character code is either lowercase, UPPERCASE or number, if true, returns the character
REPLACE(ddata,start,1,”-“) if a special character is found it is replaced with a hyphen “-“
SUBSTITUTE(ddata, “-“, “”) When we have looped through the whole string, substitute “-” with nothing
See more recursive LAMBDAs on Mr. Excel Forum
With the new knowledge on how to check for any letter in a string, you can re-write the recursive LAMBDA below
=LAMBDA(ddata, start,
IF(start<=LEN(ddata),
CLEANDATA(
LET(
char, MID(ddata,start,1),
IF(EXACT(LOWER(char),UPPER(char))*NOT(ISNUMBER(char+0)),
REPLACE(ddata,start,1,”-“), ddata)
)
,start+1
),
SUBSTITUTE(ddata, “-“, “”)
)
)
Using a NON-Recursive LAMBDA to Clean Any Special Character
=LAMBDA(ddata,
TEXTJOIN(“”,TRUE,
LET(
char, MID(ddata,SEQUENCE(LEN(ddata)),1),
charcode, CODE(char),
notSpecial, (ABS(charcode-77.5)<=12.5)+(ABS(charcode-109.5)<=12.5)+(ABS(charcode-52.5)<=4.5),
IF(notSpecial , char , “”)
)
)
)

With the new knowledge on how to check for any letter in a string, you can re-write the non-recursive LAMBDA below
=LAMBDA(ddata,
TEXTJOIN(“”,TRUE,
LET(
char, MID(ddata,SEQUENCE(LEN(ddata)),1),
special, EXACT(LOWER(char),UPPER(char))*NOT(ISNUMBER(char+0)),
IF(special, “” , char)
)
)
)
Recent Comments