(2.1.p.1) Mechanisms for entering text

Background

A central design criterion for a PDA is size: it should be small enough to fit in its user’s pocket. As with screen size, this conflicts with certain usability criteria. For text entry, the small size of the device is in direct conflict with having a normal-size keyboard available. This makes entering text at best different from, and usually much more difficult than, using a keyboard on a stationary PC.

The main mechanisms for entering text on PDAs can be classified along two axes, with two variants on each. Along one axis, there is a distinction between keyboards and strokes-based input. Along the other, the mechanism is either located on the screen or in hardware outside it. Keyboards are either SW keyboards or small physical keyboards. Strokes-based input is usually performed in a designated area, either on the screen (similar to SW keyboards) or on a pressure-sensitive area outside the screen. Some strokes-based input mechanisms (like the Transcriber on the PocketPC platform) can be used anywhere on the screen.

All built-in physical keyboards are so small that the user must enter text with one or two fingers. SW keyboards and all strokes-based input mechanisms are difficult to operate without a stylus. Entering national characters, like the special Nordic letters (often referred to as accented characters), often requires special effort from the user. This is particularly the case for physical keyboards and strokes-based mechanisms.

Problem

When designing a PDA application where the user must enter some text, there are two related problems to solve: how to avoid requiring the user to enter text at all, and how to make it easier for the user to enter text when it cannot be avoided. These two approaches are so interconnected that it is difficult to draw a clear line between them, especially when considering solutions (see below). The main goal in most cases is to avoid, as far as possible, forcing the user to use the generic text entry mechanisms (keyboard/strokes-based input).

Solution(s)

An obvious solution for making it easier to enter text is to use auto complete. This is a mechanism that tries to guess what the user is about to write and suggests the rest of the text ahead of what the user has written. On the PocketPC platform, one form of auto complete is included in all the generic input mechanisms: a pop-up list of possible completions of the word being written. To apply a suggestion, the user must actively choose it. This mechanism is also adaptive, in that it remembers words written earlier in the same session with the given document. Apart from this, it only suggests words from a dictionary in the language of the operating system. This means that when writing Norwegian on an English version of PocketPC, the mechanism is of modest help. No such generic mechanism is available in Palm OS and Symbian.

Some applications (regardless of platform) implement auto complete in certain fields. For example, this is common when writing a URL in most web browsers on most platforms. It is also common when writing names in an email client. Such solutions are always adaptive, i.e. they use the history of values entered earlier to suggest new ones. This type of mechanism may be very helpful, especially when long and/or not very intuitive values must be entered.
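
The combination of dictionary-based and adaptive completion described above can be sketched in a few lines. This is only a minimal illustration, not the PocketPC implementation: the class name `AutoCompleter`, its methods, and the ranking rule (previously used words first) are assumptions made for the example.

```python
from collections import Counter


class AutoCompleter:
    """Suggest completions from a fixed dictionary plus words typed earlier."""

    def __init__(self, dictionary_words, max_suggestions=3):
        self.dictionary = {w.lower() for w in dictionary_words}
        self.history = Counter()          # words the user has actually entered
        self.max_suggestions = max_suggestions

    def remember(self, word):
        """Record a word the user finished typing (the adaptive part)."""
        self.history[word.lower()] += 1

    def suggest(self, prefix):
        """Return candidate completions; the user must still pick one."""
        prefix = prefix.lower()
        if not prefix:
            return []
        candidates = {w for w in self.dictionary | set(self.history)
                      if w.startswith(prefix) and w != prefix}
        # Words used before rank first (most frequent first), then alphabetical.
        ranked = sorted(candidates, key=lambda w: (-self.history[w], w))
        return ranked[:self.max_suggestions]


completer = AutoCompleter(["keyboard", "key", "keypad"])
print(completer.suggest("ke"))   # ['key', 'keyboard', 'keypad']
completer.remember("Kelvin")     # a word that is not in the dictionary
print(completer.suggest("ke"))   # ['kelvin', 'key', 'keyboard']
```

Keeping the history separate from the dictionary is one way to let the mechanism help even when the dictionary language does not match the language the user writes in.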

A solution related to auto complete is to use predefined values. By this we mean offering a list of all (or, more usually, the most common) texts to enter in a field. The values may be accessed from a menu (especially if the values may be used in different fields), or from a combo box (a combination of a dropdown list box and a text field). If the number of values is small and predefined, using radio buttons to choose the value is also an option. If the number of possible values is large and not restricted to predefined values, only the most common ones should be presented. In this case, the list should not be longer than what fits on the screen (to avoid scrolling) and may well adapt to the actual values entered or chosen by the user.
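
As a rough sketch of the last point, the function below picks which values to show when the list must fit on the screen and should adapt to what the user actually enters. The function name, its parameters, and the example values are hypothetical; the number of visible rows would depend on the device and the control used.

```python
from collections import Counter


def build_shortlist(predefined, usage_counts, visible_rows):
    """Pick the values to show in a list that must fit without scrolling.

    predefined    -- values shipped with the application, in display order
    usage_counts  -- mapping from value to how often the user entered/chose it
    visible_rows  -- how many rows fit on the screen (device-specific)
    """
    # Candidates: everything predefined plus anything the user typed freely.
    candidates = list(dict.fromkeys(list(predefined) + list(usage_counts)))
    # Most-used values first; ties keep their original order (sort is stable).
    candidates.sort(key=lambda value: -usage_counts.get(value, 0))
    return candidates[:visible_rows]


counts = Counter({"Leak": 3, "Corrosion": 5, "Loose bolt": 1})
print(build_shortlist(["Worn", "Leak", "Broken", "Corrosion"], counts, 3))
# ['Corrosion', 'Leak', 'Loose bolt']
```

Before any usage has been recorded, the shortlist simply falls back to the first predefined values in their original order.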

Presenting predefined values as a set of radio buttons is really an example of a different solution, namely alternative input mechanisms. By this we mean using UI controls that do not require a keyboard or keyboard replacement. In addition to radio buttons, the most common controls for entering values without having to type are (dropdown) list boxes, check boxes, spinners (an input field with up/down arrows for browsing through values, called UpDown in the .NET Compact Framework), sliders (called TrackBar in the .NET Compact Framework), and menus. In some cases, buttons may also be used. Most of these mechanisms require some sort of restriction on the domain of the attributes to be entered through the mechanism.

The alternative input mechanisms just discussed are generic mechanisms that can be used to avoid having to use a keyboard. It is also possible to make specialized input mechanisms. By this we do not primarily mean self-made UI controls, but rather using (a combination of) existing controls in a new way to implement a creative solution. An example of this approach is the mechanism used in an application for service technicians implemented by IT Liberator, where the user may write common fault descriptions in a natural-language-like syntax by choosing from four dropdown lists with commonly used nouns, verbs and prepositional expressions. Even with a restricted number of values in each dropdown list, this makes it possible to enter a very large number of sentences in a simple way. Such mechanisms are usually specific to the application domain – at least the values to choose from.

Most of the solutions presented for this problem exploit a general principle that may be considered a solution in itself, namely exploiting domain knowledge. Taking advantage of knowledge about the domain is often a question of restricting the set of possible values to enter. As soon as this number is low enough, some of the solutions above may be chosen. In cases where it is not possible to restrict the set of legal values, identifying the most common ones (i.e. the ones that the user is most likely to enter) may make it possible to use the same or similar mechanisms.
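
To illustrate why a few short dropdown lists can cover many sentences, the sketch below composes a fault description from one choice per list and counts the possible combinations. The word lists are invented for the example and are not taken from the IT Liberator application.

```python
from math import prod

# Hypothetical word lists; a real application would take these from the domain.
DROPDOWNS = [
    ["Pump", "Valve", "Motor"],                  # noun
    ["leaks", "vibrates", "overheats"],          # verb
    ["near", "under", "behind"],                 # preposition
    ["the inlet", "the housing", "the panel"],   # noun phrase
]


def sentence_count(dropdowns):
    """Number of distinct sentences the dropdown combination can express."""
    return prod(len(options) for options in dropdowns)


def compose(dropdowns, selected_indexes):
    """Build the sentence for one selection per dropdown."""
    words = [options[i] for options, i in zip(dropdowns, selected_indexes)]
    return " ".join(words) + "."


print(sentence_count(DROPDOWNS))          # 81
print(compose(DROPDOWNS, [0, 0, 1, 1]))   # Pump leaks under the housing.
```

With only three entries per list, four selections already cover 81 sentences; with ten entries per list the same four selections would cover 10,000.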

A solution in the same category as exploiting domain knowledge is to have dynamic behaviour based on actual use. By this we mean that the application tracks how it is used (in this context, mainly which values are entered) and adapts the values presented in a given input mechanism accordingly. To make solutions based on this principle more user-friendly and predictable, they may be combined with special functionality in the application that lets the user adjust which values should be used (like the “edit my texts” functionality in the messaging application on the PocketPC platform).
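
One possible way to combine tracked usage with user control is sketched below. The class `QuickTexts` and its methods are hypothetical and are not modelled on the actual PocketPC functionality: values the user has pinned always come first, values the user has hidden never appear, and the remaining slots are filled with the most frequently used values.

```python
from collections import Counter


class QuickTexts:
    """Values offered in an input control: user-pinned first, then most used."""

    def __init__(self, max_items=5):
        self.max_items = max_items
        self.usage = Counter()   # updated automatically as the user enters values
        self.pinned = []         # values the user explicitly asked to keep
        self.hidden = set()      # values the user explicitly removed

    def record(self, value):
        """Called by the application whenever a value is entered or chosen."""
        self.usage[value] += 1

    def pin(self, value):
        if value not in self.pinned:
            self.pinned.append(value)
        self.hidden.discard(value)

    def hide(self, value):
        self.hidden.add(value)
        if value in self.pinned:
            self.pinned.remove(value)

    def values(self):
        learned = [v for v, _ in self.usage.most_common()
                   if v not in self.pinned and v not in self.hidden]
        return (self.pinned + learned)[:self.max_items]


texts = QuickTexts(max_items=3)
for message in ["On my way", "Call me", "On my way", "Running late"]:
    texts.record(message)
texts.pin("In a meeting")
texts.hide("Call me")
print(texts.values())   # ['In a meeting', 'On my way', 'Running late']
```

Letting explicit user choices override the learned ranking is what keeps the list predictable even though its contents change over time.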

The solutions presented so far have all focused on making text entry from a keyboard easier or replacing it with various on-screen mechanisms. An alternative is to try to collect the data from some source other than user interaction, usually by exploiting contextual data. This principle is based on the assumption that the context in which a mobile user operates changes more rapidly than it does for a stationary user. The idea is that if an application is able to obtain knowledge about the context, this knowledge may be used to make the application more user-friendly.

With regard to entering text, being more user-friendly means that the application itself obtains data that the user would otherwise have had to enter. A very simple example (which also applies to a stationary user) is that the date and time of a given event are entered automatically from the system clock instead of being typed by the user. Another example is using an RFID sensor to identify a piece of equipment that is to be inspected, so that the user is relieved from entering a long and cryptic equipment ID. This topic is discussed more exhaustively in section 5.2 (Exploiting that the user is mobile).
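
The two examples above can be sketched as a record that is filled in from context wherever possible, with manual entry only as a fallback. The names `InspectionRecord`, `new_inspection`, `read_tag` and `ask_user` are hypothetical; in particular, `read_tag` stands in for whatever sensor API a given device offers and is assumed to return `None` when no tag is in range.

```python
from dataclasses import dataclass
from datetime import datetime
from typing import Callable, Optional


@dataclass
class InspectionRecord:
    equipment_id: str
    inspected_at: datetime
    id_source: str       # "sensor" or "typed", useful when auditing the data


def new_inspection(read_tag: Callable[[], Optional[str]],
                   ask_user: Callable[[], str],
                   now: Callable[[], datetime] = datetime.now) -> InspectionRecord:
    """Fill in as much as possible from context before asking the user."""
    tag = read_tag()                 # hypothetical sensor hook; None if no tag found
    if tag:
        equipment_id, source = tag, "sensor"
    else:
        equipment_id, source = ask_user(), "typed"   # fall back to manual entry
    # The timestamp always comes from the system clock, never from the user.
    return InspectionRecord(equipment_id, now(), source)


record = new_inspection(read_tag=lambda: "EQ-00417-XZ", ask_user=lambda: "")
print(record.equipment_id, record.id_source)   # EQ-00417-XZ sensor
```

Passing the sensor, the prompt and the clock in as functions keeps the sketch independent of any particular platform and makes the fallback path easy to exercise.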

An alternative related to exploiting contextual data is to use multimodal input. Although this by definition involves user interaction, the principle is related in the sense that the goal is to avoid both keyboard and screen-based interaction. It is also related because mechanisms for multimodal input are sometimes used as a means of obtaining contextual data. For example, an RFID sensor is primarily a mechanism for obtaining contextual data, while a bar code reader is primarily an input mechanism. This is despite the fact that what really happens at a fairly low level of abstraction is identical. By multimodal input we partly mean using input mechanisms other than keyboard, hardware buttons and screen interaction (e.g. bar code reader and voice input), and partly using more than one input mechanism at the same time. The former is easier to exploit in an application, as it is usually possible to rely on mechanisms outside the application for obtaining the input. The latter often requires some kind of interpretation by the application. Multimodal input is discussed more exhaustively in Problem 2.1.p.4 – Multi modal interaction – stylus, scanner, RF-ID, different types of keyboards, voice control.

An alternative for situations where large amounts of data must be entered is to use a voice recorder and transcribe the recording manually later in a stationary context. In some cases, this may be combined with speech-to-text conversion on a stationary computer, especially if the text concerns a limited application domain.

A solution not discussed so far is to include external (nearly) full-size keyboards. Although such keyboards may solve the problem, they are of limited use. The reason is that once the user must connect an external keyboard, he must also sit down (or at least find a fairly horizontal surface on which to place the keyboard). This severely limits the mobility of the user.

In earlier versions of operating systems for PDAs, only strokes and SW keyboards with QWERTY layout were available as generic mechanisms. In addition, some applications offered tailored keyboards, like a keypad for phone number or PIN code entry. A general trend has been that the number of available mechanisms has increased: on the one hand by offering HW keyboards on an increasing number of devices, and on the other hand by offering different SW solutions for entering text. In addition to strokes and QWERTY keyboards, Transcriber (writing whole words anywhere on the screen) and T9 have become standard. Furthermore, some applications offer alternative SW keyboards, like alphabetic ones and T9, even on operating systems that do not support them as generic mechanisms. An important reason for this development is that it offers the user a choice of how to type. This is important, as different users prefer different solutions, based on both personal preferences and prior experience.
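
For readers unfamiliar with T9-style input, the idea is that each letter costs one key press on a numeric keypad and a dictionary resolves the ambiguity. The following is a simplified sketch of that dictionary lookup only, not the actual T9 product; the function names and the word list are invented for the example.

```python
from collections import defaultdict

# Standard letter groups on a 12-key phone keypad.
KEYPAD = {"2": "abc", "3": "def", "4": "ghi", "5": "jkl",
          "6": "mno", "7": "pqrs", "8": "tuv", "9": "wxyz"}
LETTER_TO_DIGIT = {letter: digit for digit, letters in KEYPAD.items()
                   for letter in letters}


def to_digits(word):
    """Key sequence for a word, one key press per letter."""
    return "".join(LETTER_TO_DIGIT[ch] for ch in word.lower())


def build_index(words):
    """Map each key sequence to the dictionary words it could mean."""
    index = defaultdict(list)
    for word in words:
        index[to_digits(word)].append(word)
    return index


def candidates(index, digits):
    """Words matching the keys pressed; the user picks among them."""
    return index.get(digits, [])


index = build_index(["good", "home", "gone", "hello"])
print(candidates(index, "4663"))    # ['good', 'home', 'gone']
print(candidates(index, "43556"))   # ['hello']
```

The sketch only handles the letters a to z; national characters such as the Nordic letters would need an extended keypad mapping and a dictionary in the relevant language.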

An example of extreme flexibility is PSI's dynamic keyboard, which is configured using an XML file. In principle, this makes it possible for each user to design his own keyboard.

An interesting development that may become a standard keyboard later is the SHARK keyboard from IBM. It offers strokes on a SW keyboard. By combining this with a keyboard layout that is optimized to make writing common words easier (which of course is language dependent), writing speed may be considerably increased.
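
To give an impression of what an XML-configured keyboard, as mentioned above, could look like, the sketch below reads a layout from XML into rows of keys. The XML schema (`keyboard`, `row`, `key`, `label`, `action`) is invented for this illustration; the actual format used by PSI's keyboard is not described here.

```python
import xml.etree.ElementTree as ET

# Hypothetical layout file for a small numeric keyboard.
LAYOUT_XML = """
<keyboard name="numeric">
  <row><key label="1"/><key label="2"/><key label="3"/></row>
  <row><key label="4"/><key label="5"/><key label="6"/></row>
  <row><key label="7"/><key label="8"/><key label="9"/></row>
  <row><key label="Del" action="backspace"/><key label="0"/><key label="OK" action="enter"/></row>
</keyboard>
"""


def load_layout(xml_text):
    """Parse the layout into its name and rows of (label, action) pairs."""
    root = ET.fromstring(xml_text)
    rows = []
    for row in root.findall("row"):
        # A key without an explicit action simply types its label.
        rows.append([(key.get("label"), key.get("action", "type"))
                     for key in row.findall("key")])
    return root.get("name"), rows


name, rows = load_layout(LAYOUT_XML)
print(name, len(rows))   # numeric 4
print(rows[3])           # [('Del', 'backspace'), ('0', 'type'), ('OK', 'enter')]
```

Because the layout is plain data, a different keyboard only requires a different file, which is what would make user-designed keyboards feasible in principle.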

Published October 7, 2008

Main problem area:
Interaction mechanisms

Problem area:
Handling input

Main source for problem:
Pilot