Ben Shneiderman argues that visual interfaces, rather than auditory ones, will predominate in the future. I realize that the newspaper interview doesn't do justice to Mr. Shneiderman's ideas, but let's examine the question posed by the journalist.
As the article points out, speaking voice commands into a personal device or PC is slow and laborious, and it tends to prevent the individual from multitasking. A clear advantage of the visual medium is that people can touch screens or click things while talking or doing other things. The same is not true when we are listening and issuing voice commands. As far as locating and manipulating available information goes, Shneiderman's claim is probably correct. But a lot depends on the form factor. A small cellphone or PDA requires an on-screen keyboard for text processing, and that is notoriously inefficient. And although touch-screen menus can present a lot of information along several different dimensions and navigation methods, they force the user to select among choices rather than input data freely. Perhaps "free input" is not vital to an application, but just think about the number of times you need to type data or cut and paste input while web surfing. Sure, hyperlinks let you move around with clicks, but at some point you will need to type an email address, fill in a form, or correct an email address.
Voice recognition technologies haven't made much progress either, and they may be limited by the form factor as well, but they might work in situations where the user needs to give information quickly. Which takes longer: 1) talking to someone at the next desk, 2) sending someone an email, or 3) making a web page of what you want to say to a person? Obviously talking is the speediest method, although there is no permanent record of the speech unless it has been recorded somewhere. The question is perhaps unfair. If one needs to wait on hold for telephone technical support for 30 minutes, then perhaps it would have been easier to email a question or use written chat (although email replies aren't speedy either). If there is a need to store speech or user actions, the visual interface seems preferable. Still, with the advent of VoiceXML and TellMe applications, it seems that it will be just as easy to store translations of auditory information in databases.
The matter of "attention span" is an important one. Yes, it is true that auditory feedback captures users' attention: it might be easier to concentrate on writing a letter in a room covered with billboard advertisements than with the radio on. On the other hand, if a person's visual surroundings are unchanging, then the auditory method seems effective. People can talk on cellphones while driving cars (well, assuming they are on a long stretch of highway). I find that I can listen to books on tape effortlessly with headphones on the city bus without being distracted by the people or the bus stops (I could perhaps read the book even faster, but the buses are bumpy, and sometimes I am standing up). Context is important here. Riding the subway and driving a car occupy a significant portion of our lives, and PDAs are often used during such "idle times."
The problem with visual interfaces is field of vision. When the view is too densely filled, it's easy to miss an alert message. On the other hand, nonlinear visual interfaces might allow simplified "views" and the ability to focus on one particular aspect of the interface. Despite the problem of overlooked data, a visual interface is ideally suited for monitoring different categories of data. Still, there may be situations where you want to command a user's full attention. In a car, a person might be more likely to use seat belts or drive within the speed limit if failure to do so caused the car to make a loud buzzing sound. The problem with auditory interfaces is that multiple buzzing noises might drown each other out. The programmer might even need to make sure that two auditory messages don't occur simultaneously. Also, there is the annoyance factor. People are fairly tolerant when a red warning light mistakenly goes off on their dashboard. They are less tolerant when a malfunctioning car alarm goes off over and over.
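The serialization problem above can be sketched in code. This is a minimal illustration, not any real car's system: the `play_sound` function, the priority numbers, and the alert messages are all hypothetical stand-ins for whatever audio API a device would actually provide. A priority queue feeding a single playback worker guarantees that two alerts never sound at once, and that the more urgent message plays first.

```python
import queue
import threading
import time

played = []  # record of alerts in the order they were "played"

def play_sound(message: str, duration: float) -> None:
    """Hypothetical stand-in for a device's audio API."""
    played.append(message)
    time.sleep(duration)  # pretend the sound takes this long

# Lower number = higher priority; the queue serializes playback.
alerts = queue.PriorityQueue()
alerts.put((2, "You are over the speed limit", 0.01))
alerts.put((1, "Please fasten your seat belt", 0.01))
alerts.put((9, None, 0))  # sentinel: tells the worker to stop

def alert_worker() -> None:
    while True:
        priority, message, duration = alerts.get()
        if message is None:
            break
        play_sound(message, duration)  # one alert at a time, never overlapping

worker = threading.Thread(target=alert_worker)
worker.start()
worker.join()

print(played)
# ['Please fasten your seat belt', 'You are over the speed limit']
```

A single worker thread is the whole trick: however many alerts fire "simultaneously," only one can be drawn from the queue and played at a time, so nothing drowns anything else out.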
My point is not to suggest that auditory interfaces are the thing of the future. Auditory interfaces have advantages in certain contexts and disadvantages in others. A nice noncontroversial conclusion, right? It depends not only on the context and task, but also on the form factor and technological feasibility. Voice recognition quality, latency, and backlit screens seem to be among the many technological hurdles facing engineers and programmers today. But five years from now, these hurdles may disappear and other hurdles may arise.
Not that I am against using one's eyes more. Rudolf Arnheim, in his famous book "Visual Thinking," seemed to think that the visual metaphor was more apt than the auditory one. Helen Keller, in her famous essay "Three Days to See," mentions her desire to teach a course on "How to really use your eyes," because "the seeing see little."