You are reading O'Reilly XForms Essentials by Micah Dubinko. (What is this?) - Buy XForms Essentials Online

A Brief Review of HTML Forms

The introduction of the forms chapter in HTML 4.01 reads: "An HTML form is a section of a document containing normal content, markup, special elements called controls (checkboxes, radio buttons, menus, etc.), and labels on those controls. Users generally 'complete' a form by modifying its controls (entering text, selecting menu items, etc.), before submitting the form to an agent for processing (e.g., to a web server, to a mail server, etc.)."

The defining element for HTML forms is named, not too surprisingly, form. This element describes some important aspects of the form, including where and how to submit data. The content of this element consists of regular HTML markup, as well as controls.

Forms represent a structured exchange of data. In HTML forms, the structure of the collected data, called a form data set, is a set of name/value pairs. The names and values that are included in this set are solely determined by the controls present within the form, so that adding a new control element, as well as adding to the user interface, also adds a new name/value pair to the data set. Many authors take for granted this basic violation of the separation between the data layer and the user interface layer—a problem that XForms has gone to considerable lengths to alleviate.

Which control types are available in HTML forms? The following sections will answer this question.

Printed forms make extensive use of labels as directions for filling out the document, which is good, since most people don't read the regular instructions, anyway. HTML forms are no different. A label element can be associated with any control, either by wrapping the label around the control, or by referencing an ID unique to the form control. When connected this way, the label becomes an extension of the control, which helps make forms more usable. For example, a radio button label is a much easier target to click on than the tiny circular control itself. When the label is properly connected, clicking it has the same effect as clicking the related control.

Nobody is sure exactly why, but the simple practice of using label elements has failed to catch on with authors. As a result, many HTML forms still use tables and other inaccessible techniques where text associated with a form control might visually appear nearby the control, but is actually defined in some unrelated markup structure, such as a different table cell. That kind of document is a major obstacle for non-visual users to figure out, since the visual proximity of items is the only connection between form controls and labels.

Groups of radio buttons pose another problem for labeling. Each radio button can have an individual label, but what about labeling the overall group? For this purpose, HTML forms include a general-purpose grouping element called fieldset, the first child of which may be legend, which is another kind of label. Example 1.12, “XHTML code for a fieldset ” shows the XHTML code for a fieldset, and Figure 1.11, “Rendering of a fieldset” shows the result.

Usually, the primary purpose of a form is to submit data. The original, and still most popular, encoding for this is called urlencoded, and is represented by the Internet media type application/x-www-form-urlencoded. In this encoding, spaces become plus signs, and any other reserved characters become encoded as a percent sign and hexadecimal digits, as defined in RFC 1738. One unfortunate aspect of this definition is that it doesn't describe how to encode anything beyond simple ASCII characters. Some implementations have used the document encoding to control this process, but interoperability has remained elusive.

A second encoding became necessary with the introduction of the file upload control and the binary data this introduced into the form data set. This is called multipart/form-data, and is based on the MIME format defined in RFC 2388. This format allows for much more efficient representation of binary and non-ASCII data.

One final consideration in form submission is how the data gets submitted. The HTML specification defines submission through the HTTP methods GET and POST and also includes an example of email, through the mailto: URI scheme. The HTTP specification gives some specific advice on when to use GET versus POST, which we will consider later.

Example 1.13, “XHTML code for a typical XHTML form ” shows a simple, but typical, HTML form. Figure 1.12, “Rendering of a typical XHTML form” shows how this form is rendered.