Stripping Non-ASCII Characters Using Regex
Written by Dicky   
Monday, 07 October 2013 14:30

I was looking for the best way to strip non-ASCII characters so they would not cause issues during a database export to a flat file.

Users often type on non-US keyboards and end up entering addresses such as 4900 Union�Road. This causes problems because we only support ASCII characters. To fix the issue, here's a simple regex that finds every non-ASCII character and replaces it with a space. The character class [\u0000-\u007F] covers the 128 ASCII characters (code points 0 to 127, which are also the first 128 code points of Unicode), and the ^ at the start of the class negates it, so the pattern matches only characters outside the ASCII range. Each match is then replaced with a space. Here is the code:

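using System.Text.RegularExpressions;
string s = "søme string";
// Replace every character outside U+0000 to U+007F (ASCII) with a space; use string.Empty to drop it instead.
s = Regex.Replace(s, @"[^\u0000-\u007F]", " ");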
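If you need this for every column the export writes, it can be wrapped in a small helper so the pattern is parsed only once. This is a minimal sketch, not the exact code from our export: the `AsciiSanitizer` class and its `ToAscii` method are hypothetical names, and the sample address assumes the garbled character was a no-break space (U+00A0). It also shows why a space is usually a better replacement than `string.Empty` for addresses.

```csharp
using System;
using System.Text.RegularExpressions;

// Hypothetical helper; the names are illustrative only.
public static class AsciiSanitizer
{
    // Matches any UTF-16 code unit outside U+0000..U+007F (the 128 ASCII characters).
    // Built once and reused, so the pattern is not re-parsed for every value exported.
    private static readonly Regex NonAscii =
        new Regex(@"[^\u0000-\u007F]", RegexOptions.Compiled);

    // Replaces every non-ASCII code unit in input with the given replacement string.
    public static string ToAscii(string input, string replacement)
    {
        if (input == null) throw new ArgumentNullException(nameof(input));
        return NonAscii.Replace(input, replacement);
    }
}

public static class Program
{
    public static void Main()
    {
        // Assumption: the character users typed between "Union" and "Road" was a no-break space.
        string address = "4900 Union\u00A0Road";

        Console.WriteLine(AsciiSanitizer.ToAscii(address, " "));          // 4900 Union Road
        Console.WriteLine(AsciiSanitizer.ToAscii(address, string.Empty)); // 4900 UnionRoad
        Console.WriteLine(AsciiSanitizer.ToAscii("søme string", " "));    // s me string

        // U+1F600 is stored as two UTF-16 code units (a surrogate pair);
        // each one is matched separately, so it turns into two spaces.
        Console.WriteLine("[" + AsciiSanitizer.ToAscii("a\U0001F600b", " ") + "]"); // [a  b]
    }
}
```

Note that .NET regexes work on UTF-16 code units rather than whole characters, so anything outside the Basic Multilingual Plane, such as an emoji, gets replaced twice.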

Hope this helps!