While we're on the topic, e-mail validation is a tricky beast and something I have researched to death. There are "good enough" e-mail RegEx checks out there that I've been happy with in the past ( http://www.regular-expressions.info/email.html ) when paired with a 254-char length validation. There is also a complete e-mail spec RegEx out there as a reference ( https://code.iamcal.com/php/rfc822/full_regexp.txt from https://www.iamcal.com/publish/articles ... sing_email ), but it is 22,174 characters long. While it is kept up to date, it is a beast, and I much prefer things I understand and can debug myself. I've never tried to use it.
That said, I eventually found some issues with even the "good enough" RegEx I was using. For example, the e-mail spec allows spaces and @ symbols in the local part when they are inside double quotes. The e-mail spec actually says you should accept pretty much anything in the local part. So, to accept double quotes, spaces, and @ symbols in the local part, I loosened up the e-mail validation on all of my web apps and ended up with something I am even happier with.
I think the key here is to not try to have the RegEx do everything. For example, the true length limit of an e-mail address is 254 characters. There is no reason to have the RegEx deal with that as well; just check that independently. I also split the task of handling whitespace out into a separate validation. So I check three things: length, whitespace in the local part, and then the overall structure with RegEx.
I think my [Ruby] code says it best:
# Original e-mail format RegEx cribbed from: http://www.regular-expressions.info/email.html
# Local part made much more permissive based on advice from: https://haacked.com/archive/2007/08/21/i-knew-how-to-validate-an-email-address-until-i.aspx
# More fun reading: https://www.iamcal.com/publish/articles/php/parsing_email
#
# Don't attempt to make this RegEx self-commenting unless you know exactly what you're doing.
# 1) Placing the RegEx on multiple lines allows for line comments only if you ignore whitespace in the RegEx, and we
#    have a space character in there that we want to preserve.
# 2) Building regular expressions from individual strings in Ruby is more difficult than you'd think. Interpolating via
#    double-quoted strings or even into RegEx literals (it can be done!) causes unexpected character escaping or other
#    nonsense.
# For these reasons the RegEx is assigned as one long line but described individually below:
#
# Description          RegEx                                      Explanation
# --------------------------------------------------------------------------------------------------------------------
# Begin-string anchor: \A                                         Match the beginning of the string. The JavaScript version requires the less-ideal begin-line anchor `^`.
# Local part:          [A-Z0-9 .!#$%&\'"*+\/\\=?^_`{|}~@-]{1,64}  1 to 64 characters inclusive of alphanumeric, a literal space, and .!#$%&'"*+/\=?^_`{|}~@- some of which are escaped with a backslash.
# Delimiter:           @                                          1 required literal @ symbol. The e-mail delimiter we all know and love. (Note the local part can also include this character.)
# Domain part:         (?:[A-Z0-9-]{1,63}\.){1,125}[A-Z]{2,63}    See: http://www.regular-expressions.info/email.html
# End-string anchor:   \z                                         Match the end of the string. The JavaScript version requires the less-ideal end-line anchor `$`.
#
# The final resulting RegEx is designed to be used while ignoring case.
EMAIL_FORMAT_REGEX = /\A[A-Z0-9 .!#$%&\'"*+\/\\=?^_`{|}~@-]{1,64}@(?:[A-Z0-9-]{1,63}\.){1,125}[A-Z]{2,63}\z/i.freeze
EMAIL_FORMAT_REGEX_FOR_JS = /^[A-Z0-9 .!#$%&\'"*+\/\\=?^_`{|}~@-]{1,64}@(?:[A-Z0-9-]{1,63}\.){1,125}[A-Z]{2,63}$/i.freeze
That's the main RegEx.
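To make the behaviour a bit more concrete, here is a minimal sketch of what the two patterns accept when run in Ruby. The `FormatDemo` module, its constant names, and the `accepts?` helper are made up for illustration; only the two patterns are copied from above. Note how the RegEx on its own lets an unquoted space through (which is why the whitespace check is separate), and how the `^`/`$` variant matches a single line of a multi-line string when it is used in Ruby.

```ruby
# Hypothetical demo wrapper; only the two patterns come from the post.
module FormatDemo
  STRING_ANCHORED = /\A[A-Z0-9 .!#$%&\'"*+\/\\=?^_`{|}~@-]{1,64}@(?:[A-Z0-9-]{1,63}\.){1,125}[A-Z]{2,63}\z/i.freeze
  LINE_ANCHORED   = /^[A-Z0-9 .!#$%&\'"*+\/\\=?^_`{|}~@-]{1,64}@(?:[A-Z0-9-]{1,63}\.){1,125}[A-Z]{2,63}$/i.freeze

  # True when the address matches the given pattern.
  def self.accepts?(address, regex = STRING_ANCHORED)
    regex.match?(address)
  end
end

FormatDemo.accepts?('user@example.com')        # => true
FormatDemo.accepts?('"john doe"@example.com')  # => true  (quoted space in the local part)
FormatDemo.accepts?('a@b@example.com')         # => true  (extra @ in the local part)
FormatDemo.accepts?('john doe@example.com')    # => true  (unquoted space; left to the whitespace check)
FormatDemo.accepts?('user@localhost')          # => false (domain needs at least one dot)

# In Ruby, ^ and $ anchor to lines, so a trailing second line slips through.
multi_line = "user@example.com\nsecond line"
FormatDemo.accepts?(multi_line)                             # => false
FormatDemo.accepts?(multi_line, FormatDemo::LINE_ANCHORED)  # => true
```

In JavaScript itself, `^` and `$` only match at the start and end of the whole input unless the `m` flag is set, so the line-anchored version is really only a concern if it ends up being used from Ruby.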
There is then also an overall length validation to make sure the address is 254 chars or fewer.
And then here is my whitespace validation:
#
# This validator is used alongside normal length and RegEx format validations to ensure e-mail addresses are valid.
# This specific validator makes sure that if spaces exist in the local portion of the e-mail address that they are only
# included inside a double-quoted section of said local portion.
# Blanks are allowed; use a separate presence validation to enforce that if desired.
#
# Valid examples:
# ''
# 'normal.valid@email.address'
# 'missing.domain.delimiter'
# '"This e-mail address has quoted spaces in the local part"@valid.address'
# '"This e-mail address is otherwise invalid"@'
# '"Two or more of these"quoted.sections"are also completely valid"@email.address'
#
# Invalid examples:
# 'Spaces without quotes@domain.tld'
# 'Spaces but some are "not inside" quotes@domain.tld'
# '"Two of these" quoted sections "but some spaces outside"@domain.tld'
#
class EmailAddressLocalWhitespaceValidator < ActiveModel::EachValidator
  ###################################################################
  #
  # #validate_each()
  #
  ###################################################################
  def validate_each(record, attribute, value)
    # Blanks are allowed.
    return if value.blank?

    attribute_name = attribute.to_s.titleize
    message = ''
    unless self.class.whitespace_valid?(value)
      message = "The provided #{attribute_name} contains invalid spaces. If the e-mail address truly contains spaces they should be wrapped inside double quotes."
    end
    record.errors.add(attribute, message) if message.present?
  end

  ###################################################################
  #
  # Public Class Methods
  #
  ###################################################################
  def self.whitespace_valid?(email_address)
    return true if email_address.blank? # Entire e-mail address was blank, so local part whitespace is valid.

    local = get_local_part(email_address)
    # Local part is empty (the address starts with '@'), so there is nothing to check.
    # empty? is used instead of blank? so that a local part made up only of spaces is still checked.
    return true if local.empty?
    return true if !local.include?(' ') # No spaces in local part, so local part whitespace is valid.
    return false if local.count('"').odd? # Spaces exist but with the wrong number of double-quotes, so local part whitespace is invalid.

    # The way split works, the even indexes always hold the unquoted parts regardless of where the quotes are.
    # We check the even indexes for [invalid] spaces.
    # For example:
    #   even even even "odd odd odd" even even "odd odd"@domain.tld
    #   "odd odd odd" even even even "odd odd" even even@domain.tld
    parts = local.split('"')
    parts.each_with_index do |part, index|
      return false if index.even? && part.include?(' ')
    end
    true
  end

  ###################################################################
  #
  # Private Class Methods
  #
  ###################################################################
  def self.get_local_part(email_address)
    return '' if email_address.blank?

    # Split address by '@' so we can remove the domain part. The -1 limit keeps a trailing empty
    # string, so an address ending in '@' (empty domain) still keeps its full local part.
    parts = email_address.split('@', -1)
    parts.pop if parts.size > 1 # Remove the domain part from the array (in place) if it exists.
    parts.join('@') # Join the local part back together.
  end
  private_class_method :get_local_part
end
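For anyone who wants to try the three checks together outside of Rails, here is a rough plain-Ruby sketch. The `EmailChecks` module and its method names are hypothetical, and the whitespace check is a compact re-implementation of the idea above rather than the validator class itself, so treat it as an illustration under those assumptions.

```ruby
# Hypothetical stand-alone sketch (no Rails); names are made up for illustration.
module EmailChecks
  MAX_LENGTH = 254
  # Same pattern as EMAIL_FORMAT_REGEX above.
  FORMAT = /\A[A-Z0-9 .!#$%&\'"*+\/\\=?^_`{|}~@-]{1,64}@(?:[A-Z0-9-]{1,63}\.){1,125}[A-Z]{2,63}\z/i.freeze

  # Everything before the last '@'; the whole string if there is no '@'.
  def self.local_part(address)
    at = address.rindex('@')
    at ? address[0...at] : address
  end

  # Spaces in the local part must sit inside double-quoted sections.
  def self.whitespace_valid?(address)
    local = local_part(address)
    return true unless local.include?(' ')
    return false if local.count('"').odd?

    # After splitting on '"', the even indexes hold the unquoted segments.
    local.split('"').each_with_index.none? { |part, index| index.even? && part.include?(' ') }
  end

  # Check 1: length. Check 2: local-part whitespace. Check 3: overall structure.
  def self.valid?(address)
    address.length <= MAX_LENGTH && whitespace_valid?(address) && FORMAT.match?(address)
  end
end

EmailChecks.valid?('user@example.com')        # => true
EmailChecks.valid?('"john doe"@example.com')  # => true
EmailChecks.valid?('john doe@example.com')    # => false (unquoted space)

# 310 characters: the RegEx alone is satisfied, but the length check rejects it.
too_long = 'a@' + ('b' * 60 + '.') * 5 + 'com'
EmailChecks::FORMAT.match?(too_long)  # => true
EmailChecks.valid?(too_long)          # => false
```

The last example is the reason for keeping the length check separate: each domain label is within its own limit, so the pattern is happy even though the address as a whole is too long.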
Of course this all relies on servers down the line you don't control also adhering to the spec and not blowing up on spec-legal but otherwise uncommon things like having two @ symbols in an address or whitespace in the local part. That should be monitored and taken into account. I've not run into any issues so far and it's been a few years in production.