You are reading O'Reilly XForms Essentials by Micah Dubinko.

An Email Datatype for XForms

One of the great disappointments in the XForms specification is the lack of a defined datatype for email, the one datatype common to nearly every Web form. Even though the specification doesn't define an email datatype, form designers still can. Getting all the details right is a little tricky, though. Regular expressions aren't a programming language, so there's no way to define a recurring segment once and reuse it, and the expression gets a little repetitive. Taken one step at a time, however, it makes perfect sense. The datatype definition conforming to RFC 2822 is:

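<xs:simpleType name="email">
  <xs:restriction base="xs:string">
    <xs:pattern value="[A-Za-z0-9!#-'\*\+\-/=\?\^_`\{-~]+(\.[A-Za-z0-9!#-'\*\+\-/=\?\^_`\{-~]+)*@[A-Za-z0-9!#-'\*\+\-/=\?\^_`\{-~]+(\.[A-Za-z0-9!#-'\*\+\-/=\?\^_`\{-~]+)*"/>
  </xs:restriction>
</xs:simpleType>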

The main achievement in this lengthy statement is the definition of what the email address specification calls atext, which is defined as an alphabetic character, a digit, or one of the following characters:

"!" "#" "$" "%" "&" "'" "*" "+" "-" "/" "=" "?" "^" "_" "`" "{" "|" "}" "~"

In regular expression syntax, the definition for a single character of atext looks like this:

[A-Za-z0-9!#-'\*\+\-/=\?\^_`\{-~]
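
One way to gain confidence in the atext character class is to check it mechanically against the list of permitted characters. The following is only a sketch: it uses Python's re module as a stand-in for the XML Schema regular expression engine (an assumption, though this particular class uses only syntax the two dialects share), and the names SPECIALS, ATEXT_CLASS, and class_members are made up for illustration.

```python
import re
import string

# The 19 special characters that atext allows, in the order listed above.
SPECIALS = "!#$%&'*+-/=?^_`{|}~"

# Every character atext permits: ASCII letters, digits, and the specials.
ATEXT_CHARS = set(string.ascii_letters + string.digits + SPECIALS)

# The character class as it appears in the datatype pattern.
ATEXT_CLASS = r"[A-Za-z0-9!#-'\*\+\-/=\?\^_`\{-~]"


def class_members(char_class):
    """Return the set of ASCII characters that the character class matches."""
    compiled = re.compile(char_class)
    return {chr(code) for code in range(128) if compiled.fullmatch(chr(code))}


matched = class_members(ATEXT_CLASS)
print(matched == ATEXT_CHARS)  # True
print(len(matched))            # 81 (52 letters + 10 digits + 19 specials)
```

The two ranges do the heavy lifting: #-' covers # $ % & ', and \{-~ covers { | } ~, which is why those characters never appear individually in the class.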


Note that the character ranges in this expression keep it from being even bulkier, and that a number of the characters need to be escaped. If you compare this with the entire regular expression given earlier, you will see that this definition appears four times overall. If regular expressions had a way to define a commonly recurring fragment, the regular expression might look like this (with spaces added for readability):

atext+ (\. atext+)* @ atext+ (\. atext+)*

But alas, the actual regular expression needs to repeat the full definition of atext four times, yielding the full definition of the email datatype.
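
Outside a schema, a host language can supply the missing "define once, reuse" step. The sketch below assembles the pattern from a single atext definition in Python and then tries a few addresses. It is an illustration under assumptions, not part of XForms: Python's re module stands in for the XML Schema regular expression engine, and the names ATEXT, DOT_ATOM, EMAIL_PATTERN, and is_email are hypothetical.

```python
import re

# A single character of atext, defined exactly once.
ATEXT = r"[A-Za-z0-9!#-'\*\+\-/=\?\^_`\{-~]"

# atext+ (\. atext+)* : runs of atext separated by single dots.
DOT_ATOM = rf"{ATEXT}+(\.{ATEXT}+)*"

# The same shape on both sides of the @ sign.
EMAIL_PATTERN = f"{DOT_ATOM}@{DOT_ATOM}"


def is_email(value):
    """Return True if the whole value matches the assembled pattern."""
    # XML Schema patterns must match the entire value, hence fullmatch.
    return re.fullmatch(EMAIL_PATTERN, value) is not None


print(EMAIL_PATTERN.count(ATEXT))           # 4
print(is_email("user.name@example.com"))    # True
print(is_email("user..name@example.com"))   # False: consecutive dots
print(is_email(".user@example.com"))        # False: leading dot
print(is_email("a@b"))                      # True: no dot is required after the @
```

The string this produces is the one that would go into the value attribute of xs:pattern; the schema itself still has to carry all four copies of the atext definition.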