The ability to clean data is one of the most valuable skills for any data analyst. Below are 3 ways how to clean any special character from a string.

Using a Formula

=TEXTJOIN(“”,TRUE,
IF(
((ABS(CODE(MID(formula!$B3,SEQUENCE(LEN(formula!$B3)),1))-109.5)<=12.5)+
(ABS(CODE(MID(formula!$B3,SEQUENCE(LEN(formula!$B3)),1))-77.5)<=12.5)+
(ABS(CODE(MID(formula!$B3,SEQUENCE(LEN(formula!$B3)),1))-52.5)<=4.5)),
CHAR(CODE(MID(formula!$B3,SEQUENCE(LEN(formula!$B3)),1))),
“”))

if you do not have office365, replace the SEQUENCE function with ROW & INDIRECT functions as shown below

=TEXTJOIN(“”,TRUE,
IF(
((ABS(CODE(MID(formula!$B3,ROW(INDIRECT(“1:”&LEN(formula!$B3))),1))-109.5)<=12.5)+ *check for lowercase*
(ABS(CODE(MID(formula!$B3,ROW(INDIRECT(“1:”&LEN(formula!$B3))),1))-77.5)<=12.5)+ *check for UPPERCASE
(ABS(CODE(MID(formula!$B3,ROW(INDIRECT(“1:”&LEN(formula!$B3))),1))-52.5)<=4.5)), *check for number*
CHAR(CODE(MID(formula!$B3,ROW(INDIRECT(“1:”&LEN(formula!$B3))),1))), *return the code and convert to char*
“”))

How the Formula Works

NB: If you ever want to check if a string does not contain a special character use the below formula

=SUM(
(ABS(CODE(MID(C3,SEQUENCE(LEN(C3)),1))-77.5)<=12.5)+
(ABS(CODE(MID(C3,SEQUENCE(LEN(C3)),1))-109.5)<=12.5)+
(ABS(CODE(MID(C3,SEQUENCE(LEN(C3)),1))-52.5)<=4.5)
)=LEN(C3)

Thanks to a user called Xlambda for Mr. Excel Forum, we can now check if a character is a letter using the EXACT function

A letter is a character that has different values between its lower- and upper-case values.

Therefore, =EXACT( LOWER(string) , UPPER(string) ) will always return FALSE for all letters

=TEXTJOIN(“”,TRUE,
IF(
EXACT(UPPER(MID(B3,ROW(INDIRECT(“1:”&LEN(B3))),1)),LOWER(MID(B3,ROW(INDIRECT(“1:”&LEN(B3))),1)))
*
NOT(ISNUMBER(MID(B3,ROW(INDIRECT(“1:”&LEN(B3))),1)+0)),
“”,
MID(B3,ROW(INDIRECT(“1:”&LEN(B3))),1)
)
)

How the Formula Works

EXACT(UPPER(MID(B3,ROW(INDIRECT(“1:”&LEN(B3))),1)),LOWER(MID(B3,ROW(INDIRECT(“1:”&LEN(B3))),1))) checks if character is not a letter

NOT(ISNUMBER(MID(B3,ROW(INDIRECT(“1:”&LEN(B3))),1)+0)) checks if character is not a number

Now with the new knowledge, to check if a string contains a special character,

= IF(
SUM(EXACT(UPPER(MID(B4,ROW(INDIRECT(“1:”&LEN(B4))),1)),LOWER(MID(B4,ROW(INDIRECT(“1:”&LEN(B4))),1)))*
NOT(ISNUMBER(MID(B4,ROW(INDIRECT(“1:”&LEN(B4))),1)+0))),
1,
0)

You can replace the IF function with the FILTER function

=TEXTJOIN(“”,TRUE,
FILTER( MID(E3,SEQUENCE(LEN(E3)),1),
NOT(EXACT(UPPER(MID(E3,SEQUENCE(LEN(E3)),1)),LOWER(MID(E3,SEQUENCE(LEN(E3)),1))))+
ISNUMBER(MID(E3,SEQUENCE(LEN(E3)),1)+0)
)
)

Use Power Query to Clean Any Special Character

= Table.AddColumn(
Source, “Clean Data”, each
Text.Select([DIRTY DATA],{“a”..”z”,”A”..”Z”,”0″..”9″})
)

Using a Recursive LAMBDA to Clean Any Special Character

=LAMBDA(ddata, start,
IF(start<=LEN(ddata),
CLEANDATA(
LET(
char, MID(ddata,start,1),
charcode, CODE(char),
IF((ABS(charcode-77.5)<=12.5)+(ABS(charcode-109.5)<=12.5)+(ABS(charcode-52.5)<=4.5), ddata , REPLACE(ddata,start,1,”-“))
),
start+1
),
SUBSTITUTE(ddata, “-“, “”)
)
)

How it works:

start<=LEN(ddata) always check if we have looped through the whole string. If not yet, it calls the function CLEANDATA

MID(ddata,start,1) Retrieves every character in the string one at a time which is later converted into a code CODE(char)

(ABS(charcode-77.5)<=12.5)+(ABS(charcode-109.5)<=12.5)+(ABS(charcode-52.5)<=4.5) Checks if the extracted character code is either lowercase, UPPERCASE or number, if true, returns the character

REPLACE(ddata,start,1,”-“) if a special character is found it is replaced with a hyphen “-“

SUBSTITUTE(ddata, “-“, “”) When we have looped through the whole string, substitute “-” with nothing

See more recursive LAMBDAs on Mr. Excel Forum

With the new knowledge on how to check for any letter in a string, you can re-write the recursive LAMBDA below

=LAMBDA(ddata, start,
IF(start<=LEN(ddata),
CLEANDATA(
LET(
char, MID(ddata,start,1),
IF(EXACT(LOWER(char),UPPER(char))*NOT(ISNUMBER(char+0)),
REPLACE(ddata,start,1,”-“), ddata)
)
,start+1
),
SUBSTITUTE(ddata, “-“, “”)
)
)

Using a NON-Recursive LAMBDA to Clean Any Special Character

=LAMBDA(ddata,
TEXTJOIN(“”,TRUE,
LET(
char, MID(ddata,SEQUENCE(LEN(ddata)),1),
charcode, CODE(char),
notSpecial, (ABS(charcode-77.5)<=12.5)+(ABS(charcode-109.5)<=12.5)+(ABS(charcode-52.5)<=4.5),
IF(notSpecial , char , “”)
)
)
)

With the new knowledge on how to check for any letter in a string, you can re-write the non-recursive LAMBDA below

=LAMBDA(ddata,
TEXTJOIN(“”,TRUE,
LET(
char, MID(ddata,SEQUENCE(LEN(ddata)),1),
special, EXACT(LOWER(char),UPPER(char))*NOT(ISNUMBER(char+0)),
IF(special, “” , char)
)
)
)

DOWNLOAD WORKSHEET

Print Friendly, PDF & Email

Do you want to excel in Excel?

Or

Seeking Microsoft Certification?

Every Week Receive Resources To Get you Closer to Your Goal!

No Huff! No Fluff! No Spam!

You are on your way to Excelling in Excel and Becoming Microsoft Certified!

Share This

Spread the Good News!

Do me favour, please? Share this with your friends!

Thanks For Commenting!

Awesome for you to comment! Please share the post with your friends.