You are reading O'Reilly XForms Essentials by Micah Dubinko.

Where and How to Submit

The questions of where and how are closely related, because the target of submission is a URI. The first part of a URI, called the scheme, indicates the general approach for the submit transaction, as in "http," "file," or "mailto." The remainder of the URI gives more specific information about the destination of the data.

There also need to be rules for how the in-memory instance data gets written down as a pattern of bytes on the wire. Besides XML, XForms includes several backward-compatible formats, which are described in the following sections.

URI schemes, included as part of the action attribute on submission, are the broadest selector of where and how form data gets submitted. A more fine-grained distinction is the request method (often simply called "method"), which defines details about the relationship between a URI and the representation of whatever resides at that URI.

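The most common request method is GET, which is used for requesting most web pages, images, sound, and video through a web browser. GET is commonly used with forms, too, especially shorter ones. The second most common method is POST, which the definition of HTTP/1.1 in RFC 2616 describes as a uniform way to annotate existing resources, post a message to a bulletin board, newsgroup, or mailing list, provide a block of data (such as the result of submitting a form) to a data-handling process, or extend a database through an append operation.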

In any case, the actual function performed by the POST method is determined by the server and is usually dependent on the URI that is part of the operation.

A third request method is PUT—little used on the Web today, but hopefully something that XForms can help change. A PUT is also a write operation but, unlike POST, it implies that an existing resource indicated by the URI is getting replaced, rather than annotated or appended to. If there is no preexisting resource, then the PUT method has the effect of creating a new resource.
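As a rough illustration of the three request methods, the sketch below builds (but never sends) requests with Python's urllib.request. The build_request helper, the example.org URL, and the application/xml body are assumptions made for this example; none of them come from XForms itself.

```python
from urllib.request import Request


def build_request(action, method, body=b""):
    """Build an unsent HTTP request for a hypothetical form submission."""
    method = method.upper()
    if method == "GET":
        # GET carries no body; any form data would travel in the URI itself.
        return Request(action, method="GET")
    # POST hands the body to whatever lives at the URI; PUT asks the server
    # to store the body as the new representation of the URI.
    return Request(
        action,
        data=body,
        method=method,
        headers={"Content-Type": "application/xml"},
    )


put = build_request("http://example.org/forms/myfile.xml", "put", b"<data/>")
print(put.get_method(), put.full_url)
```

Nothing goes over the network here; the request object only records which method and body would be used.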

In XForms terms, the attribute method on submission indicates the author's selection of request method. The combination of URI scheme and method defines the overall processing that will happen during submit. Note, however, that nonsensical combinations are possible—such as "mailto:" with PUT or "file:" with POST.

The file scheme represents access to the local filesystem. On the Windows platform, networked file shares, which are treated in a similar fashion to the local file system, can also be accessed through the file URI scheme.

Only PUT makes sense for form data sent via the file scheme. Since not every XForms processor is guaranteed to have a filesystem, this scheme isn't guaranteed to be supported.
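The idea of sensible and nonsensical scheme/method pairs can be sketched as a lookup table. The table below is an assumption for illustration only: it encodes the pairs the text calls out (file with PUT is fine; file with POST and mailto with PUT are not) and fills in the rest, including https and mailto with POST, by guesswork. The is_sensible helper is hypothetical and is not part of any XForms processor.

```python
from urllib.parse import urlsplit

# Assumed table: scheme -> request methods that make sense for it.
SENSIBLE_METHODS = {
    "http": {"get", "post", "put"},
    "https": {"get", "post", "put"},
    "file": {"put"},
    "mailto": {"post"},
}


def is_sensible(action, method, base_scheme="http"):
    """Return True if the method is a reasonable match for the action's scheme.

    A relative action has no scheme of its own, so it inherits the scheme
    of the document that contains the form (base_scheme).
    """
    scheme = urlsplit(action).scheme or base_scheme
    return method.lower() in SENSIBLE_METHODS.get(scheme, set())


print(is_sensible("file:///C:/dir/file.xml", "put"))
print(is_sensible("mailto:someone@example.org", "put"))
```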

The file scheme can be useful when used indirectly with relative paths. For example, the following declaration

<submission method="put" action="myfile.xml"/>

specifies that the action URI is relative. When the containing document is loaded from a file scheme, then the submission also goes to the file myfile.xml in the current directory. When the containing document is loaded from an http scheme, however, the submission gets PUT to a URI in the same directory as the current document, except ending in myfile.xml.
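The relative resolution described above can be tried out with Python's urllib.parse.urljoin. This is only a sketch: the resolve_action helper and both document URIs are made up for the example.

```python
from urllib.parse import urljoin


def resolve_action(document_uri, action="myfile.xml"):
    """Resolve a submission's action against the URI the form was loaded from."""
    return urljoin(document_uri, action)


# The same relative action lands in different places depending on
# how the containing document was loaded.
print(resolve_action("file:///C:/forms/order.xhtml"))
print(resolve_action("http://example.org/forms/order.xhtml"))
```

In both cases the last path segment of the document URI is replaced by myfile.xml, and the scheme of the containing document carries over to the submission.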

File URIs for a Windows path turn up in several spellings, including:

file:/C:/dir/file.xml
file://C:/dir/file.xml
file:///C:/dir/file.xml
file:///C|/dir/file.xml
file:///C%3A/dir/file.xml
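Assuming the five URIs listed above are meant as alternative spellings of one Windows path, a tolerant reader could fold them into a single form. The sketch below is one possible approach, not a standard algorithm; the windows_path_from_file_uri helper is hypothetical.

```python
import re
from urllib.parse import unquote, urlsplit


def windows_path_from_file_uri(uri):
    """Reduce common file URI spellings to a drive-letter path like C:/dir/file.xml."""
    parts = urlsplit(uri)
    if parts.scheme != "file":
        raise ValueError("not a file URI: " + uri)
    path = unquote(parts.path)  # turns %3A back into ":"
    # In "file://C:/dir/..." the drive letter is parsed as the authority,
    # so move it back to the front of the path.
    if re.fullmatch(r"[A-Za-z][:|]", parts.netloc):
        path = "/" + parts.netloc + path
    # Legacy spelling: "C|" in place of "C:".
    path = re.sub(r"^/([A-Za-z])\|", r"/\1:", path)
    return path.lstrip("/")


for uri in (
    "file:/C:/dir/file.xml",
    "file://C:/dir/file.xml",
    "file:///C:/dir/file.xml",
    "file:///C|/dir/file.xml",
    "file:///C%3A/dir/file.xml",
):
    print(windows_path_from_file_uri(uri))
```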

At some point the in-memory instance data gets converted, or serialized, into a stream of bytes suitable for sending over the wire. The following sections describe the serialization formats defined in XForms 1.0.

XML for form data submission is one of the main motivations behind XForms. This is the most straightforward serialization format; after all, instance data is based on the XPath data model, which was specifically designed to model XML.

XForms borrows from XSLT several attributes that fine-tune the serialization process: indent, encoding, omit-xml-declaration, standalone, and cdata-section-elements. Note that these attributes maintain the original spelling, including dashes, as in XSLT. The following section describing the submission element contains all the details on what each attribute does. The important thing to note is that all of these attributes taken from XSLT are advisory only, and that an XForms processor is free to ignore any that are inconvenient for the implementer.

The media type of the submitted XML will be application/xml by default, though this can be overridden with the mediatype attribute.

Another attribute, includenamespaceprefixes, deals with a detail of how namespaces are handled in XML-based specifications. The XPath data model contains, for each element node, one namespace node per in-scope namespace. As a result, inline instance data will have additional, generally unwanted, namespace nodes that get serialized. Example 8.2, “Serialization of namespace nodes” shows code that will give this result.

In Example 8.2, “Serialization of namespace nodes”, the XForms namespace is in scope, bound to the prefix xforms. Correspondingly, the XPath data model will contain a namespace node for the XForms namespace and, without taking any special action, the serialized XML will look like this:

<my:data xmlns:xforms="http://www.w3.org/2002/xforms"/>

In other words, the end result includes an unnecessary namespace declaration. Because of the widespread use of namespace prefixes in attribute values and text, it's not always safe to throw away unused prefixes. The solution is to specify the includenamespaceprefixes attribute, which will cause any prefixes that are not visibly used (for element or attribute names) to be suppressed, unless they are included in a space-separated list. A special value, #default, applies to the default namespace. So, to prevent the unwanted xforms namespace declaration seen earlier, a simple:

includenamespaceprefixes=""

on the submission element would do the trick.

The algorithm for urlencoding is quite simple, but nevertheless has caused many problems in the past. The reason for this is that the algorithm specification, as defined in HTML, didn't say what to do with characters outside the range of ASCII. As a result, numerous variations sprang into existence, with no way to tell which was which.

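XForms fixes this by mandating UTF-8 as the one true basis for urlencoding. In UTF-8, a single character is represented by a single byte for characters in the ASCII range, and by two to four bytes for other Unicode characters. Overall, the urlencoding algorithm for a given string boils down to converting the string to UTF-8, leaving unreserved ASCII characters such as letters and digits as they are, replacing each space with a plus sign, and writing every remaining byte as a percent sign followed by two hexadecimal digits.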

For example, the string "Ünited Stätes", after urlencoding, would be "%C3%9Cnited+St%C3%A4tes".[5]
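The example string can be reproduced with a few lines of Python. This is a sketch under one stated assumption: the set of ASCII characters left unescaped (letters, digits, and "-", "_", ".", "~") is chosen to match Python's own urllib.parse.quote_plus and may differ at the edges from what a given XForms processor does. The urlencode_utf8 helper is hypothetical.

```python
from urllib.parse import quote_plus

# Assumed set of ASCII bytes that pass through unescaped.
UNRESERVED = set(
    b"ABCDEFGHIJKLMNOPQRSTUVWXYZ"
    b"abcdefghijklmnopqrstuvwxyz"
    b"0123456789-_.~"
)


def urlencode_utf8(text):
    """Urlencode a string byte by byte, using UTF-8 as the character encoding."""
    out = []
    for byte in text.encode("utf-8"):
        if byte in UNRESERVED:
            out.append(chr(byte))       # safe ASCII stays as-is
        elif byte == 0x20:
            out.append("+")             # space becomes a plus sign
        else:
            out.append(f"%{byte:02X}")  # everything else becomes %HH
    return "".join(out)


print(urlencode_utf8("Ünited Stätes"))
# The standard library gives the same answer for this input:
print(quote_plus("Ünited Stätes", encoding="utf-8"))
```

Each accented letter occupies two bytes in UTF-8, which is why each one turns into two %HH escapes.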

A bigger hurdle is representing structured XML as a flat list of name/value pairs. XForms doesn't attempt to model an entire tree as a flat structure. Instead, only the leaf element nodes (those that contain one and only one text child node) are included in the serialization. That's right: no attributes, no namespace information, and no elements that aren't leaf nodes. When such XML features are needed, application/xml is the appropriate serialization format.

The overall serialization follows the document order of the instance data, and is formatted as:

{element local name}={value of text node}{separator}

Here, the element local name and the value of the text node are urlencoded and separated by a literal equals character. Between each pair is a separator character, a semicolon by default. For compatibility with older systems, this character can be changed to an ampersand through the separator attribute on submission (the ampersand is no longer favored, because it needs to be specially escaped as &amp; when represented in XML).
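The flattening rule can be sketched in Python with the standard xml.etree.ElementTree module. The serialize_urlencoded helper and the sample instance (its element names and the example.org namespace) are invented for illustration. Python's quote_plus stands in for the urlencoding step, on the assumption that its choice of unescaped characters is close enough for the example.

```python
import xml.etree.ElementTree as ET
from urllib.parse import quote_plus

INSTANCE = """<my:order xmlns:my="http://example.org/ns" id="42">
  <my:country>Ünited Stätes</my:country>
  <my:item><my:qty>2</my:qty></my:item>
  <my:note/>
</my:order>"""


def serialize_urlencoded(instance_xml, separator=";"):
    """Flatten instance data into name=value pairs, leaf elements only."""
    root = ET.fromstring(instance_xml)
    pairs = []
    for elem in root.iter():  # iter() walks the tree in document order
        # Keep only elements with no child elements and some text content.
        if len(elem) == 0 and elem.text:
            # ElementTree spells qualified names as "{namespace}local".
            local_name = elem.tag.rsplit("}", 1)[-1]
            pairs.append(f"{quote_plus(local_name)}={quote_plus(elem.text)}")
    return separator.join(pairs)


print(serialize_urlencoded(INSTANCE))
print(serialize_urlencoded(INSTANCE, separator="&"))
```

The id attribute, the namespace, the empty note element, and the non-leaf order and item elements all drop out of the result, which is the loss of information the text warns about.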

One drawback of application/xml, and especially of urlencoded data, is that binary content can't be represented efficiently. The answer to this dilemma is a media type that allows binary content to be packaged separately from XML. A number of MIME types that start with multipart/, though originating as part of the global email system, have come into use as a way to package binary data along with XML.

All of the multipart formats break a message into smaller pieces, simply called parts. In multipart/related, the first part contains XML serialized just as in the application/xml serialization method. Subsequent parts contain binary resources that the user selected through <upload> form controls, which must be bound to instance data nodes of the XML Schema datatype anyURI.

For example, a simple form might capture an employee name as a string and a photo as an anyURI, like this:

<xforms:input ref="name">
  <xforms:label>Name</xforms:label>
</xforms:input>
<xforms:upload ref="picture" mediatype="image/*">
  <xforms:label>Photo</xforms:label>
</xforms:upload>

Serialized as multipart/related, the result would be:

Content-Type: multipart/related; boundary=a42113842b; type=application/xml; start="<000000@dubinko.info>"
Content-Length: 65232
--a42113842b
Content-Type: application/xml; charset=UTF-8
Content-ID: <000000@dubinko.info>
<?xml version="1.0"?>
<root_element>
  <name>Cordova Cassanova</name>
  <picture>cid:000001@dubinko.info</picture>
</root_element>
--a42113842b
Content-Type: image/jpg
Content-Transfer-Encoding: binary
Content-ID: <000001@dubinko.info>
...binary image data...
--a42113842b--

Notice that the URI of the picture has been dereferenced, and the actual data now appears in the submitted data stream.
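To make the packaging concrete, here is a sketch that assembles a similar two-part message with Python's standard email package. The build_multipart_related helper, the example.org domain in the Content-IDs, and the fake image bytes are assumptions for the example. One difference from the listing above: these library classes base64-encode each part by default instead of sending it as raw binary.

```python
from email.mime.application import MIMEApplication
from email.mime.image import MIMEImage
from email.mime.multipart import MIMEMultipart
from xml.sax.saxutils import escape


def build_multipart_related(name, photo_bytes, domain="example.org"):
    """Package XML instance data plus one uploaded image as multipart/related."""
    root_cid = f"000000@{domain}"
    photo_cid = f"000001@{domain}"
    # The picture node holds a cid: URI that points at the image part.
    xml = (
        '<?xml version="1.0"?>\n'
        "<root_element>\n"
        f"  <name>{escape(name)}</name>\n"
        f"  <picture>cid:{photo_cid}</picture>\n"
        "</root_element>\n"
    ).encode("utf-8")

    # "start" names the Content-ID of the root (XML) part.
    message = MIMEMultipart("related", type="application/xml", start=f"<{root_cid}>")

    root_part = MIMEApplication(xml, "xml", charset="UTF-8")
    root_part["Content-ID"] = f"<{root_cid}>"
    message.attach(root_part)

    photo_part = MIMEImage(photo_bytes, "jpeg")
    photo_part["Content-ID"] = f"<{photo_cid}>"
    message.attach(photo_part)
    return message


msg = build_multipart_related("Cordova Cassanova", b"\xff\xd8\xff\xe0 fake image")
print(msg.as_string())
```

The link between the two parts is the same as in the listing: the cid: URI in the picture element matches the Content-ID header of the image part.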



[5] According to The Onion, 29 April 1997, the U.S. Congress planned to toughen the image of the country by adding umlauts to the name.