1. FAIL: Your name is invalid. (Ruby UTF-8 regex)

    Oftentimes you’ll want to validate users’ names (or nicknames) in your web applications.  Although I’m not fundamentally opposed to accepting all the wonderful Unicode characters under the sun, an app is easier to use and understand when user names are at least recognizable to you.  If an app is global (like Twitter) this may not be the case.  But you get there organically, and it doesn’t make sense to open the floodgates right now.  So we started with

    validates_format_of :nickname, :with => /\A[a-zA-Z0-9_\.\-]+\z/
    

    This allows lowercase and uppercase letters, digits, dot, hyphen and underscore.  A pretty standard start.  But we saw some validations fail when user names copied from third-party services (Twitter, Facebook, Tumblr) included letters with accents over them.
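
    To see the failure outside of Rails, here is a minimal sketch that applies the same character class to plain strings.  The constant `ASCII_NICKNAME` and the helper `ascii_nickname?` are names made up for this illustration, not part of the app.  It anchors the end with `\z`, which unlike `\Z` also rejects a trailing newline.

```ruby
# encoding: UTF-8

# Hypothetical constant: the ASCII-only character class from the validation.
ASCII_NICKNAME = /\A[a-zA-Z0-9_\.\-]+\z/

# Hypothetical helper: true when the whole string matches the pattern.
def ascii_nickname?(name)
  !(name =~ ASCII_NICKNAME).nil?
end

puts ascii_nickname?("john_doe-1.0")  # only ASCII letters, digits, _ . -
puts ascii_nickname?("José")          # contains é (U+00E9)
```

    The first call prints `true` and the second prints `false`, because é is not in any of the ASCII ranges.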

    Fortunately we are using Ruby 1.9 (with Rails 3.1), and validating with Unicode is straightforward.  We added support for most of the Latin extensions (the accented Latin-1 letters, Latin Extended-A and Latin Extended-B) with just two tweaks.  First, we need this at the top of the file with the regular expression so Ruby interprets it correctly

    # encoding: UTF-8
    

    Then we add the code points from U+00C0 to U+02AE, changing our regex to

    validates_format_of :nickname, :with => /\A[\u00c0-\u02aea-zA-Z0-9_\.\-]+\z/
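
    A quick way to check what the extended range accepts is to run it against a few sample names.  This is only a sketch: `LATIN_NICKNAME` and `latin_nickname?` are hypothetical names, and the end anchor is `\z` so that a trailing newline is rejected.

```ruby
# encoding: UTF-8

# Hypothetical constant: the ASCII class plus code points U+00C0 to U+02AE.
LATIN_NICKNAME = /\A[\u00c0-\u02aea-zA-Z0-9_\.\-]+\z/

# Hypothetical helper: true when the whole string matches the pattern.
def latin_nickname?(name)
  !(name =~ LATIN_NICKNAME).nil?
end

puts latin_nickname?("José")     # é is U+00E9, inside the range
puts latin_nickname?("Łukasz")   # Ł is U+0141, inside the range
puts latin_nickname?("Дмитрий")  # Cyrillic starts at U+0400, outside the range
puts latin_nickname?("a×b")      # × is U+00D7, which the range also covers
```

    This prints `true`, `true`, `false`, `true`.  The last case is worth knowing about: the range is contiguous, so it also lets through the two non-letter symbols × (U+00D7) and ÷ (U+00F7).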
    

    Once we get bigger in countries that speak other languages, I’ll be adding some more character sets to that regex.

     

    tags:  utf-8  unicode  ruby  regex  fail 
