You are reading O'Reilly XForms Essentials by Micah Dubinko.

Chapter 4. XML Schema in XForms

"Knowledge is of two kinds. We know a subject ourselves, or we know where we can find information on it."

Samuel Johnson

Forms and datatypes always seem to be mentioned together. It's natural to think of data entry in terms of specific types, such as date or phone number. Despite a feint in the opposite direction taken by earlier drafts, XForms incorporates the datatypes defined in W3C XML Schema. This chapter discusses these datatypes and presents the general framework for describing and defining custom datatypes.

In describing a datatype, XML Schema distinguishes between a lexical space, or the data as it appears in XML, and a value space, or the data as it exists on an abstract level. In practice, many datatypes have a one-to-one mapping between the lexical space and the value space, so the distinction can seem a little academic. It is important, however, when there are equivalent representations for some value. For instance, the boolean datatype can represent true as either 1 or true, and false as either 0 or false. Even though each value has two possible representations, both map to the same underlying concept: truth in one case, falsehood in the other. This matters when comparing values, because the value space is used as the basis for comparison.
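As a minimal sketch of the distinction, the following model declares two instance nodes as xs:boolean. The element names data, subscribe, and remember are made up for illustration, and the xforms: and xs: prefixes are assumed to be bound to the XForms and XML Schema namespaces. The two nodes differ in the lexical space but hold the same value, true, in the value space.

```xml
<xforms:model>
  <xforms:instance>
    <data xmlns="">
      <!-- Two different lexical forms of the same boolean value -->
      <subscribe>1</subscribe>
      <remember>true</remember>
    </data>
  </xforms:instance>
  <!-- Both nodes are declared to be of type xs:boolean -->
  <xforms:bind nodeset="subscribe" type="xs:boolean"/>
  <xforms:bind nodeset="remember" type="xs:boolean"/>
</xforms:model>
```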

Many observers have pointed out that the lexical representations of some XML Schema datatypes aren't very user friendly. As an example, the duration of a day and an hour is P1DT1H. From the perspective of the person filling out a form, this is complete gibberish. To work around this, XForms gives responsibility to individual form controls to present data to the user in a manner that's convenient to the intended audience. Thus, XForms introduces (but doesn't specifically name) a third space, the user space. For the benefit of users, this might not be a straightforward mapping: the form control has great latitude in rearranging things, for example by offering a graphical calendar control for entering durations and dates.

XML Schema uses a divide-and-conquer technique to define datatypes. Each datatype can be broken down into a number of facets, each of which constrains some particular part of the allowed value space for that datatype. (One important exception is the pattern facet, which works on the lexical space.)

It's possible to take an existing datatype and trim it down to exactly meet your needs. This is called derivation by restriction, and entails changing one or more facets in the datatype. For example, the following XML Schema fragment limits the length of a string to 50 characters:

<xs:simpleType name="myString50">
  <xs:restriction base="xs:string">
    <xs:maxLength value="50"/>
  </xs:restriction>
</xs:simpleType>

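This creates a new datatype named myString50, which can then be used in a form to limit the number of characters that can be entered. Other facets can similarly be restricted, as shown in the examples later in this chapter. The facets defined by XML Schema are length, minLength, maxLength, pattern, enumeration, whiteSpace, maxInclusive, maxExclusive, minInclusive, minExclusive, totalDigits, and fractionDigits.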
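Numeric range facets are restricted in the same way as maxLength. The following fragment is a sketch only; the type name rating1to10 is hypothetical and the xs: prefix is assumed to be bound to the XML Schema namespace. It limits an integer to the values 1 through 10.

```xml
<xs:simpleType name="rating1to10">
  <xs:restriction base="xs:integer">
    <!-- Lowest and highest permitted values, both inclusive -->
    <xs:minInclusive value="1"/>
    <xs:maxInclusive value="10"/>
  </xs:restriction>
</xs:simpleType>
```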

Another kind of derivation is by list. This simply takes a simple datatype and produces a whitespace-separated list datatype. XForms includes a ready-made list datatype called listItems. Another variation is derivation by union, which can combine the value spaces of two or more separate datatypes. One final variation on derivation is by extension, which is used only in complexTypes, which are discussed later in this chapter.
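Derivation by list and by union might look like the following sketch. The type names integerList and dateOrDateTime are invented for illustration, and the xs: prefix is assumed to be bound to the XML Schema namespace.

```xml
<!-- Derivation by list: a whitespace-separated list of integers,
     such as "1 2 3" -->
<xs:simpleType name="integerList">
  <xs:list itemType="xs:integer"/>
</xs:simpleType>

<!-- Derivation by union: a value is valid if it is valid against
     either member type, such as "2003-08-01" or "2003-08-01T09:30:00" -->
<xs:simpleType name="dateOrDateTime">
  <xs:union memberTypes="xs:date xs:dateTime"/>
</xs:simpleType>
```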

One of the most useful facet-based restrictions in forms is pattern, which takes a regular expression written in a syntax adjusted for Unicode compatibility. Entire books have been written on regular expressions, so this section only covers the basics. For further information, a good source is Chapter 6 of Eric van der Vlist's XML Schema (O'Reilly).

When a regular expression contains letters or digits, those exact characters must appear in the entered data, as shown in Table 4.1, “Simple regular expressions”.

Often, you might know the format of a string but not the exact contents. For instance, a telephone number might be of the format 123-4567. To handle this, you can use escape sequences, which represent certain character types. Regular expressions support the escape sequences shown in Table 4.2, “Escape sequences (case matters)”.
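For the telephone number format, a pattern facet built from the \d escape (which matches any single digit) could be sketched as follows. The type name localPhone is hypothetical, and the xs: prefix is assumed to be bound to the XML Schema namespace.

```xml
<xs:simpleType name="localPhone">
  <xs:restriction base="xs:string">
    <!-- Three digits, a literal hyphen, then four digits -->
    <xs:pattern value="\d\d\d-\d\d\d\d"/>
  </xs:restriction>
</xs:simpleType>
```

A pattern must match the entire value, so 123-4567 is accepted while 123-45678 is not.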

Regular expressions can also make use of the character classes shown in Table 4.3, “Character classes”.

Using these, more complicated patterns are possible:
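As one illustration, a bracketed character class matches any single character from the set or range it names. The sketch below assumes a made-up part number format of two uppercase letters, a hyphen, and four digits; the type name partNumber is hypothetical, and the xs: prefix is assumed to be bound to the XML Schema namespace.

```xml
<xs:simpleType name="partNumber">
  <xs:restriction base="xs:string">
    <!-- Matches values such as AB-1234 -->
    <xs:pattern value="[A-Z][A-Z]-[0-9][0-9][0-9][0-9]"/>
  </xs:restriction>
</xs:simpleType>
```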

Since it quickly becomes tedious to repeat an escape sequence (e.g., representing a telephone number with \d\d\d-\d\d\d\d), regular expressions allow for partial matches, sequences, and repeat counts, as shown in Table 4.4, “Quantifiers”.
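With repeat counts, the same telephone number format can be written more compactly. This is a sketch under the same assumptions as before: the type name localPhoneShort is hypothetical, and the xs: prefix is assumed to be bound to the XML Schema namespace.

```xml
<xs:simpleType name="localPhoneShort">
  <xs:restriction base="xs:string">
    <!-- {3} and {4} are repeat counts applied to the preceding \d -->
    <xs:pattern value="\d{3}-\d{4}"/>
  </xs:restriction>
</xs:simpleType>
```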

Using quantifiers, more complicated types of expressions are possible. Also, parentheses can be used for grouping, and the vertical bar (|) to express two possible branches, either one of which can satisfy the expression, as shown in Table 4.5, “Regular expressions with quantifiers”.
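Grouping and branching can be combined in a single pattern. The following sketch assumes a form that accepts a telephone number with an optional three-digit area code, or the literal word unlisted. The type name phoneOrUnlisted is hypothetical, and the xs: prefix is assumed to be bound to the XML Schema namespace.

```xml
<xs:simpleType name="phoneOrUnlisted">
  <xs:restriction base="xs:string">
    <!-- Branch 1: optional group "(\d{3}-)?" followed by \d{3}-\d{4}
         Branch 2: the literal text "unlisted" -->
    <xs:pattern value="(\d{3}-)?\d{3}-\d{4}|unlisted"/>
  </xs:restriction>
</xs:simpleType>
```

This accepts 555-123-4567, 123-4567, and unlisted.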

The final thing to remember is that characters otherwise used for something else need to be escaped when used literally. These characters, in their escaped form, are \\; \|; \.; \-; \^; \?; \*; \+; \{; \}; \(; \); \[; and \].
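A common case is a literal period, which unescaped would match almost any character. The sketch below assumes a simple price format with exactly two decimal places; the type name price is hypothetical, and the xs: prefix is assumed to be bound to the XML Schema namespace.

```xml
<xs:simpleType name="price">
  <xs:restriction base="xs:string">
    <!-- One or more digits, a literal period (escaped), two digits -->
    <xs:pattern value="\d+\.\d{2}"/>
  </xs:restriction>
</xs:simpleType>
```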

Table 4.6, “Regular expressions: complete examples” provides a few ready-to-use regular expressions, suitable for copy-and-paste.