The introduction of the forms chapter in HTML 4.01 reads: "An HTML form is a section of a document containing normal content, markup, special elements called controls (checkboxes, radio buttons, menus, etc.), and labels on those controls. Users generally 'complete' a form by modifying its controls (entering text, selecting menu items, etc.), before submitting the form to an agent for processing (e.g., to a web server, to a mail server, etc.)."
The defining element for HTML forms is named, not too surprisingly, form. This element describes some important aspects of the form, including where and how to submit data. The content of this element consists of regular HTML markup, as well as controls.
Forms represent a structured exchange of data. In HTML forms, the structure of the collected data, called a form data set, is a set of name/value pairs. The names and values that are included in this set are solely determined by the controls present within the form, so that adding a new control element, as well as adding to the user interface, also adds a new name/value pair to the data set. Many authors take for granted this basic violation of the separation between the data layer and the user interface layer—a problem that XForms has gone to considerable lengths to alleviate.
Which control types are available in HTML forms? The following sections will answer this question.
The workhorse of HTML forms, the single-line text input control permits the entry of any character data. Text input controls accept a string value and contribute it to the form data set. Example 1.1, “XHTML code for a single-line text control” shows the XHTML code needed to produce a basic single-line text control, and Figure 1.1, “Rendering of a single-line text input” shows the result.
Example 1.1. XHTML code for a single-line text control
<input type="text" name="name" value="Dubinko, Micah"/>
A more complex variation of text entry arises when multiple lines of text need to be entered. For this purpose, HTML forms include a separate form control that is typically larger than standard text input controls and offers special handling of multiple-line text. Multi-line text input controls contribute to the form data set exactly as do single-line text input controls. Example 1.2, “XHTML code for a multi-line text control” shows the XHTML code for a multi-line text control, and Figure 1.2, “Rendering of a multi-line text control” shows the result.
Example 1.2. XHTML code for a multi-line text control
<textarea name="blogentry">&lt;strong&gt;The Joy of Named ...</textarea>
Another variation of text entry is for sensitive data, such as a password, that could be harmful to display on the screen, where someone could "shoulder surf," or covertly observe, and thus compromise security measures. It is important to note that a password control provides only a casual level of security in the presentation: it does not, for example, provide any data encryption. Password text input controls contribute to the form data set exactly as do text input controls. Example 1.3, “XHTML code for a password control” shows the XHTML code needed for a password control, and Figure 1.3, “Rendering of a password control” shows the result.
Submit and reset controls are similar to buttons, but when activated they trigger built-in processing (submitting or resetting the form, respectively). Reset controls aren't supposed to contribute to the form data set, but up to one submit button can. This can be useful, when there are multiple submit buttons, in determining which one initiated the submission process. Example 1.4, “XHTML code for submit and reset controls” shows the XHTML code needed for submit and reset controls, and Figure 1.4, “Rendering of submit and reset controls” shows the result.
Example 1.4. XHTML code for submit and reset controls
<input type="submit" value="Continue"/> <input type="reset" value="Clear Order Form"/>
The effect of activating a button is to invoke a call in a scripting language. A button can be specified in two slightly different ways, as an input element or as a button element, with the button element syntax being slightly more expressive. If a value is assigned to the button, it will be contributed unchanged to the form data set (not the most useful functionality, but there if you need it). Example 1.5, “XHTML code for a button control” shows the XHTML code for a button control, and Figure 1.5, “Rendering of a button control” shows the result.
Named after the mechanical controls on old radios, the radio button requires that a single option always be selected, and thus is almost always used as a group of controls with the same name. The HTML specification encourages authors to ensure that a particular choice is initially selected, but in practice authors usually don't select a particular choice, resulting in "undefined" behavior. (One common implementation choice is to provide a temporary exception to the one-thing-must-always-be-selected rule, but it isn't safe to rely on this behavior.) A group of radio buttons provides a single value representing the current selection to the form data set. Example 1.6, “XHTML code for a radio button group” shows the XHTML code for a radio button group, and Figure 1.6, “Rendering of a radio button group” shows the result.
Example 1.6. XHTML code for a radio button group
<input type="radio" name="car" value="0"/> None<br/>
<input type="radio" name="car" value="1"/> 1 car<br/>
<input type="radio" name="car" value="2"/> 2 cars<br/>
<input type="radio" name="car" value="3"/> 3 cars<br/>
<input type="radio" name="car" value="4"/> 4 cars<br/>
<input type="radio" name="car" value="many"/> 5 or more<br/>
The checkbox, a simple on/off control, has become familiar to computer users everywhere. Often, checkboxes are used in a group that shares the same name, which allows for a select-zero-or-more behavior, though solo checkboxes are common as well. Only checkboxes that are checked contribute to the form data set. In cases where multiple checkboxes share the same name and are checked, the form data set will contain one entry per checked box, each with the same name and its own value. Example 1.7, “XHTML code for a checkbox group” shows the XHTML code for a checkbox group, and Figure 1.7, “Rendering of a checkbox group” shows the result.
Example 1.7. XHTML code for a checkbox group
<input type="checkbox" name="referBy" value="td"/> Test driven a vehicle<br/>
<input type="checkbox" name="referBy" value="dlr"/> Visited an automotive dealer<br/>
<input type="checkbox" name="referBy" value="veh"/> Purchased/Leased a vehicle<br/>
<input type="checkbox" name="referBy" value="ins"/> Purchased automobile insurance<br/>
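How repeated names show up in the form data set can be sketched in a few lines of Python. This is only an illustration, not part of the original example: it assumes that the first and third checkboxes from Example 1.7 are checked, and it uses the standard urllib.parse module to serialize the pairs and read them back.

```python
from urllib.parse import parse_qs, urlencode

# The form data set, modeled as an ordered list of (name, value) pairs.
# Assumption: the "td" and "veh" checkboxes from Example 1.7 are checked;
# unchecked checkboxes contribute nothing.
form_data_set = [("referBy", "td"), ("referBy", "veh")]

# Each pair is serialized separately, so the shared name appears twice.
encoded = urlencode(form_data_set)

# A receiver that groups entries by name sees one name with a list of values.
decoded = parse_qs(encoded)

print(encoded)
print(decoded)
```

A receiver that keeps only the last value for each name would silently lose selections, which is why the server side has to expect a list here.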
Commonly called a listbox or drop-down menu, the single-select menu enforces a single selection out of several options. In effect, it provides another way to achieve the same function as radio buttons, but with a different visual presentation. As is the case with radio buttons, an initial state that doesn't explicitly select some initial choice is "undefined," though existing implementations usually allow an initial nothing-selected state. Single-select menus use one option child element for each option, which can include both a display value and a storage value. The storage value representing the current selection is provided to the form data set. Example 1.8, “XHTML code for a single-select control” shows the XHTML code for a single-select control, and Figure 1.8, “Rendering of a single-select control” shows the result.
Example 1.8. XHTML code for a single-select control
<select name="searchtype">
  <option selected="selected" value="all">all words</option>
  <option value="any">any words</option>
</select>
Adding the multiple attribute to the select element enables the control to accept multiple selections, or even to select nothing at all. In this configuration, the control can achieve the same function as a group of checkbox controls, but with a different presentation. As with checkboxes, if any options are selected, this control provides the storage value of each selection to the form data set. Example 1.9, “XHTML code for a multiple-select control” shows the XHTML code for a multiple-select control, and Figure 1.9, “Rendering of a multiple-select control” shows the result.
Example 1.9. XHTML code for a multiple-select control
<select multiple="multiple">
  <option value="0">UNCONFIRMED</option>
  <option selected="selected" value="1">NEW</option>
  <option selected="selected" value="2">ASSIGNED</option>
  <option selected="selected" value="3">REOPENED</option>
  <option value="4">RESOLVED</option>
  <option value="5">VERIFIED</option>
  <option value="6">CLOSED</option>
</select>
A more recent addition to HTML was the ability to select a local file to submit along with the rest of the form data. The file select control contributes binary data to the form data set, which has implications for the wire format used to submit data, as discussed later. The selected filename is also included, in a secondary way, in the submitted data. Example 1.10, “XHTML code for a file select control” shows the XHTML code for a file select control, and Figure 1.10, “Rendering of a file select control” shows the result.
Often, a form needs to hold more data than what is visible, in order to track state or earlier interactions. A hidden control has no user interface effect, but contributes to the form data set. Example 1.11, “XHTML code for a hidden control” shows the XHTML code for a hidden control.
Finally, the HTML specification defines a way for additional controls, such as plug-ins or Java applets, to participate in forms. This approach, however, never gained popularity, although clever programmers have used scripting and dynamic HTML to accomplish many of the same goals.
Printed forms make extensive use of labels as directions for filling out the document, which is good, since most people don't read the regular instructions, anyway. HTML forms are no different. A label element can be associated with any control, either by wrapping the label around the control, or by referencing an ID unique to the form control. When connected this way, the label becomes an extension of the control, which helps make forms more usable. For example, a radio button label is a much easier target to click on than the tiny circular control itself. When the label is properly connected, clicking it has the same effect as clicking the related control.
Nobody is sure exactly why, but the simple practice of using label elements has failed to catch on with authors. As a result, many HTML forms still use tables and other inaccessible techniques in which the text associated with a form control appears visually near the control but is actually defined in some unrelated markup structure, such as a different table cell. Such a document is a major obstacle for non-visual users, since visual proximity is the only connection between form controls and their labels.
Groups of radio buttons pose another problem for labeling. Each radio button can have an individual label, but what about labeling the overall group? For this purpose, HTML forms include a general-purpose grouping element called fieldset, the first child of which may be legend, which is another kind of label. Example 1.12, “XHTML code for a fieldset” shows the XHTML code for a fieldset, and Figure 1.11, “Rendering of a fieldset” shows the result.
Example 1.12. XHTML code for a fieldset
<fieldset>
  <legend>Personal Information</legend>
  <input type="radio" name="mstatus" value="M"/> Married<br/>
  <input type="radio" name="mstatus" value="S"/> Single<br/>
  <input type="radio" name="mstatus" value="X"/> Decline to state<br/>
</fieldset>
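Using a keyboard to get around in a form is not only an accessibility feature, but also a convenience for people who need to fill out large numbers of forms or lengthy forms. Most controls accept two attributes to help define a keyboard interface: accesskey, which assigns a shortcut key to a control, and tabindex, which sets the control's position in the tabbing order.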
Often it is necessary in an electronic form to have a control that displays, but doesn't allow changes to, a piece of data. This can be accomplished through an attribute called readonly, which unfortunately only applies to text input controls. When a control is read-only, it is still possible to navigate to it, and any data present will still be submitted.
The disabled attribute enforces a stronger prohibition. Any control, even lists, radio buttons, or checkboxes, can be disabled, in which case the browser gives the control a distinctive "grayed out" appearance, indicating its unavailability. It is not possible to navigate to a disabled control, nor will it participate in data submission. Effectively, the control is not part of the form anymore (although it is still available to scripting).
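The difference between the two attributes at submission time can be sketched as follows. The Control type and the build_form_data_set function are hypothetical simplifications made up for this illustration, as are the sample control names; a real browser applies further rules (unchecked checkboxes, unnamed controls, and so on) that are left out here.

```python
from typing import NamedTuple


class Control(NamedTuple):
    """Hypothetical, simplified model of a form control."""
    name: str
    value: str
    readonly: bool = False
    disabled: bool = False


def build_form_data_set(controls):
    """Return (name, value) pairs for the controls that take part in submission."""
    pairs = []
    for control in controls:
        if control.disabled:
            # Disabled controls are skipped entirely.
            continue
        # Read-only controls are still submitted; only editing is blocked.
        pairs.append((control.name, control.value))
    return pairs


controls = [
    Control("orderId", "A-1001", readonly=True),
    Control("coupon", "SPRING", disabled=True),
    Control("quantity", "2"),
]
result = build_form_data_set(controls)
print(result)
```

The read-only orderId value arrives at the server, while the disabled coupon value does not.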
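Except for the file upload control, it's possible to provide initial data for all form controls, but keeping track of the differing form control types is complicated. Text, password, and hidden controls take their initial data from the value attribute; a multi-line text control takes it from the content of the textarea element; radio buttons and checkboxes are initially selected through the checked attribute; and menus are initially selected through the selected attribute on individual option elements.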
Inserting initial data is a major bottleneck in large-scale projects involving forms, both in terms of processing time and in opportunities for bugs to appear. The typical approach is to have a template language that is processed by an application server, effectively doing a large search-and-replace operation before delivering every page containing forms. Workflow and routing scenarios, where submitted data is sent from one user's desktop to another, are similarly burdened with large amounts of templating and tricks to populate forms in advance.
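The search-and-replace approach can be sketched with Python's standard string.Template class. This is a minimal illustration rather than any particular application server's template language: FORM_TEMPLATE and render_form are hypothetical names, and the markup reuses the controls from Examples 1.1 and 1.2.

```python
from html import escape
from string import Template

# Hypothetical page template; $name and $entry mark where initial data goes.
FORM_TEMPLATE = Template(
    '<input type="text" name="name" value="$name"/>\n'
    '<textarea name="blogentry">$entry</textarea>'
)


def render_form(initial):
    """Fill the template, escaping values so they cannot break the markup."""
    escaped = {key: escape(value, quote=True) for key, value in initial.items()}
    return FORM_TEMPLATE.substitute(escaped)


page = render_form({"name": 'Dubinko, "Micah"', "entry": "<strong>The Joy"})
print(page)
```

Note that the two controls take their initial data in different places (an attribute versus element content), and that forgetting the escaping step is one of the opportunities for bugs mentioned above.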
Usually, the primary purpose of a form is to submit data. The original, and still most popular, encoding for this is called urlencoded, and is represented by the Internet media type application/x-www-form-urlencoded. In this encoding, spaces become plus signs, and any other reserved characters are encoded as a percent sign followed by two hexadecimal digits, as defined in RFC 1738. One unfortunate aspect of this definition is that it doesn't describe how to encode anything beyond simple ASCII characters. Some implementations have used the document encoding to control this process, but interoperability has remained elusive.
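As an illustration, Python's standard urllib.parse.urlencode function produces this encoding. The name field comes from Example 1.1; the q and city fields are hypothetical values chosen to show reserved and non-ASCII characters.

```python
from urllib.parse import urlencode

# "name" is taken from Example 1.1; "q" is a hypothetical field.
form_data_set = [("name", "Dubinko, Micah"), ("q", "to be & not to be?")]

# Spaces become "+", and reserved characters become "%" plus two hex digits.
encoded = urlencode(form_data_set)
print(encoded)

# The same non-ASCII value yields different bytes on the wire depending on
# which character encoding the sender assumes.
as_utf8 = urlencode([("city", "Zürich")], encoding="utf-8")
as_latin1 = urlencode([("city", "Zürich")], encoding="latin-1")
print(as_utf8)
print(as_latin1)
```

The last two lines show the interoperability problem in miniature: nothing in the encoded string tells the receiver which character encoding was used.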
A second encoding became necessary with the introduction of the file upload control and the binary data it introduced into the form data set. This encoding is called multipart/form-data; it is defined in RFC 2388 and builds on the MIME multipart format. It allows for a much more efficient representation of binary and non-ASCII data.
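The general shape of a multipart/form-data body can be sketched by hand. The encode_multipart function below is a hypothetical, simplified helper written for this illustration: it assumes the boundary string never occurs in the data and that names and filenames contain no quote characters, and the field names and file content are made up.

```python
def encode_multipart(fields, files, boundary):
    """Build a multipart/form-data body as bytes.

    fields:   list of (name, text value) pairs
    files:    list of (name, filename, content type, raw bytes) tuples
    boundary: ASCII string assumed not to occur anywhere in the data
    """
    delimiter = b"--" + boundary.encode("ascii")
    lines = []
    for name, value in fields:
        header = f'Content-Disposition: form-data; name="{name}"'
        lines += [delimiter, header.encode("utf-8"), b"", value.encode("utf-8")]
    for name, filename, content_type, data in files:
        header = (
            f'Content-Disposition: form-data; name="{name}"; '
            f'filename="{filename}"'
        )
        lines += [
            delimiter,
            header.encode("utf-8"),
            f"Content-Type: {content_type}".encode("ascii"),
            b"",
            data,  # file bytes are carried as-is, with no percent-encoding
        ]
    lines += [delimiter + b"--", b""]  # closing delimiter, then a final CRLF
    return b"\r\n".join(lines)


body = encode_multipart(
    fields=[("caption", "Test upload")],
    files=[("upload", "pixel.bin", "application/octet-stream", b"\x00\xff")],
    boundary="XBOUNDARYX",
)
print(body)
```

Each part carries its own headers, which is where the filename travels, and the two file bytes occupy exactly two bytes in the body. In urlencoded form the same two bytes would take six characters (%00%FF).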
One final consideration in form submission is how the data gets submitted. The HTML specification defines submission through the HTTP methods GET and POST and also includes an example of email, through the mailto: URI scheme. The HTTP specification gives some specific advice on when to use GET versus POST, which we will consider later.
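The practical difference between the two HTTP methods is where the encoded form data set travels. The sketch below only constructs the requests with Python's standard urllib.request.Request class and sends nothing over the network; the action URL and control names are borrowed from Example 1.13, and the chosen values are assumptions.

```python
from urllib.parse import urlencode
from urllib.request import Request

ACTION = "http://example.com/cgi-bin/submit-here"  # action URL from Example 1.13

# Assumption: the user picked "To Be" and left the text field empty.
query = urlencode([("thequestion", "b"), ("othersel", "")])

# GET: the encoded form data set is appended to the action URL.
get_request = Request(ACTION + "?" + query)

# POST: the same encoded data travels in the request body instead.
post_request = Request(
    ACTION,
    data=query.encode("ascii"),
    headers={"Content-Type": "application/x-www-form-urlencoded"},
)

print(get_request.get_method(), get_request.full_url)
print(post_request.get_method(), post_request.data)
```

With GET the submitted values become part of the URL itself, which is one reason the choice of method matters.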
Example 1.13, “XHTML code for a typical XHTML form” shows a simple, but typical, HTML form. Figure 1.12, “Rendering of a typical XHTML form” shows how this form is rendered.
Example 1.13. XHTML code for a typical XHTML form
<form action="http://example.com/cgi-bin/submit-here" name="shake-poll">
  <p>Poll: to be or not to be?</p>
  <input type="radio" name="thequestion" id="radio1" value="b"/>
  <label for="radio1">To Be</label><br/>
  <input type="radio" name="thequestion" id="radio2" value="n"/>
  <label for="radio2">Not To Be</label><br/>
  <input type="radio" name="thequestion" id="radio3"/>
  <label for="radio3">Other (please specify)</label><br/>
  <input type="text" name="othersel"/>
</form>