Dan Sperber (2002) The future of writing. Virtual symposium “text-e”

Text written in English, French and Italian for the virtual symposium text-e, organised by the Association Euro-Edu, the Bibliothèque Publique d’Information du Centre Pompidou and the Société GiantChair on the impact of the Web on reading, writing and the diffusion of knowledge. The symposium took place from October 15th 2001 until the end of March 2002. The debates can be read on the site of the symposium.

The future of writing

Dan Sperber

If you are reading this text, chances are that you use spoken and written language with the same ease. You and I live in environments where language is omnipresent in the form of either acoustic or visual stimuli, and every day we are likely to be processing more written text than spoken text. We tend to value the ability to read and write as much as we value our more basic perceptual and motor abilities. We see literacy as essential to self-realization. We easily forget that writing is a recent invention in the history of Homo sapiens, that universal literacy became a goal only a few generations ago, and that this goal is far from being achieved. Even when we do remember that writing is recent and that widespread literacy is new, we take it for granted that they are here to stay. Well, are they?

The most controversial thesis I will defend here is that the revolution in information and communication technology may soon turn writing into a relic of the past: it will be replaced by the automatic transcription of speech – whereas reading is here to stay. My aim, however, is not to prophesy, but to reflect on the future with the help of tools developed within the cognitive and social sciences.

However controversial this view, I should point out that an even more radical claim has been made elsewhere: it is that both writing and reading will soon be things of the past, a cumbersome pair of prosthetic practices that, in retrospect, will come to be regarded as a mere parenthesis in human history. This has been argued in particular by William Crossman: “By enabling us to access stored information orally-aurally, talking computers will finally make it possible for us to replace all written language with spoken language. We will be able to store and retrieve information simply by talking, listening, and looking at graphics, not at text. With this giant step forward into the past, we’re about to recreate oral culture on a more efficient and reliable technological foundation” (“The Coming Age of Talking Computers”, The Futurist, Dec. 1999). I argue, however, that there is a relevant asymmetry between writing and reading that should ensure the survival of the latter.

The past and the present

Before taking a peek into the future, let us look at the past and present. In most of the human societies that have ever existed, children became competent adults without the help of any formal teaching. They acquired language, knowledge of their natural and social environment, techniques, rules of etiquette, tales, songs, and other cultural competencies without any formal training or schooling. They may have been helped by advice and corrections from adults and other children, but such pedagogical assistance comes in support of a spontaneous process of acquisition, and is very different from institutional teaching. Institutional teaching typically serves to transmit knowledge and competencies that would hardly ever be acquired spontaneously, and that therefore, if they were not systematically taught, would be unlikely to emerge and stabilize as elements of culture.

Particularly striking is the contrast between the acquisition of language and that of writing. In ordinary conditions, language acquisition occurs spontaneously in every child at a very early age. Pedagogical assistance (which is virtually absent in some societies) plays at most a marginal role. By contrast, writing and reading are acquired, if at all, through a lengthy and intensive process of deliberate training in interaction with a teacher. Is it that writing systems are more complex than languages? Quite the opposite. A language such as English, Amharic, or Chinese is a much more complex object than an alphabetic, syllabic or even a logographic writing system. In fact, linguists have not yet succeeded in providing a fully explicit grammar of any language, whereas writing systems are based on fully explicit rules. The remarkable difference in the patterns of acquisition of language on the one hand and writing on the other hand has to do with psychological predispositions: humans are predisposed to spontaneously acquire the language of their community. They have no such predisposition for the acquisition of writing. It is writing systems, rather, that had to adapt to human perceptual and motor predispositions that had emerged well before the invention of writing. How is it, given these conditions, that writing systems have emerged, spread, and stabilized at all?

Writing did not emerge as the commonly shared ingredient of culture that it has now become in modern societies, but as a specialized skill, practiced by professional scribes in the service of the State. Specialized skills emerge because the demand for the products of these skills is sufficient to cause (either through economic motivation or through coercion by the end-users) a minority of people to become specialists. The cognitive difficulty involved in the acquisition of such professional skills is overcome, within the small group of specialists, by a heavy investment in training apprentices. Teaching the skill typically becomes the subject-matter of a second-order, didactic skill.

The development of writing resulted in the accumulation and diversification of written texts. This, together with correlated economic and political transformations, caused the costs involved in acquiring literacy to become lower than the benefits of literacy for an ever-increasing proportion of the population. In modern society the benefits are greater than the cost for the majority of the population, and illiteracy has become a stigma and therefore a cost in its own right.

There is another important factor that helps explain the generalization of literacy. Once the skill is properly acquired, writing becomes a kind of automatism: one can write without paying any conscious attention to the hand movements involved (and this is true of typing as well). Similarly, for the proficient reader, reading is just another form of automatic visual pattern recognition. Most earlier forms of writing, such as Sumerian cuneiform with its relatively cumbersome materials and tools, did not lend themselves to a similar kind of fluency.

Thus two facts explain the spread of literacy: the fact that the benefits became, for more and more people, greater than the costs; and the fact that, once the initial costs of acquisition of the skills are paid, the costs of using these skills are comparatively negligible. These two facts are linked. If the distribution of costs and benefits over the life span of individuals were more even, or, in other terms, if the marginal cost of writing and reading did not dramatically drop with proficiency, then people would read and write much less. (This, incidentally, was the situation when writing had to be done in stone or in clay.) With a less frequent use of writing and reading, there would be fewer written texts to read, and fewer people disposed to read them. As a consequence, the benefits of writing and reading would be smaller, and might not compare favorably with the costs, except for a small group of professional scribes. Actually, once learnt, writing and reading are easy and generally profitable. The greater the number of people who read and write, the greater the benefits involved in being able to do so oneself, and the greater the motivation to have one’s children acquire the skills. How then could the future of writing and reading be in doubt? The short answer is that writing is not the only way to produce written texts.

Until recently, many rich or powerful people would dictate to a secretary rather than write themselves. Some literary and historical works, such as Milton’s Paradise Lost or Napoleon’s and Las Cases’ Mémorial de Sainte-Hélène, were dictated. Dictating may be advantageous for reasons of speed, or it may be a matter of necessity, as in the case of the older Milton, who had lost his sight. Still, if given the choice, most of us would rather write than dictate. The main reason, I presume, is that when you dictate you have much less control over your text than when you write. In any case, traditional dictation was a form of division of scriptural labor, not a way of rendering writing altogether obsolete. Now, however, the new information technologies are about to provide a novel form of dictation without the shortcomings of the old, and in such a way that the division of labor will not be between employer and employee but between people and machines.

Speech recognition software that provides speech-to-text conversion has been rapidly improving over the past few years, allowing one to talk in natural continuous speech at a conversational pace and see one’s words appear on the screen. At present, the rate of error is still too high, the program requires initial training, and many users who don’t really need such a program get discouraged. I take it for granted, however, that these shortcomings will be overcome and that, in a matter of years, it will be possible to speak normally and have the machine transcribe one’s speech with very few errors while distinguishing, in the flow of speech, instructions (e.g. “underline!”) to be obeyed from data to be transcribed. It will become easier to dictate to a machine than it ever was to dictate to a secretary. More generally, it will be easier to give instructions to the computer (and to all kinds of appliances, vehicles and other machines) orally than through keyboards, mice, and other manual devices. Machines will be able to provide information orally rather than through screens. Thanks to progress in text-to-speech technology, machines will be able to read written texts aloud in a quite natural-sounding voice. Natural language oral interactions with machines will become the norm rather than the exception.

However imperfect at present, these speech-to-text and text-to-speech technologies are already transforming the lives of people who, because of visual, hearing, or motor impairments, or because of dyslexia, have difficulties writing or reading. The obvious reason why millions of illiterate people around the world don’t also take advantage of these technologies is just poverty – which explains why they are illiterate in the first place.

Soon, then, the costs and benefits of writing and reading will be compared not just with those of illiteracy but also with those of alternative ways of creating and accessing texts, provided by new technologies. How may this affect the future of writing and reading?

Individual choices

Whereas speech is an event unfolding in time, a written text is an object with a greater or lesser permanence in space (greater when engraved in stone, lesser when chalked on the blackboard). Because of this difference in their temporal and spatial mode of existence, speech and writing are suited for different uses. The development of writing has not resulted in the decline of speech. I know of no evidence showing that speech is less used, or less well used, in literate than in illiterate societies. If anything, the opposite seems to be the case. The development of writing has resulted rather in the emergence of new uses of language, in larger and denser social networks, and therefore, also, in new opportunities to use speech and greater sophistication in the art of speech.

What will happen if mechanical speech-to-text and text-to-speech conversions now become ordinary tools in the process of communication? Will they cause the emergence of additional uses of language, as was the case with writing, or will they displace writing as a tool, and, if so, with what consequences? It is important here to distinguish the activity of writing (handwriting or typing), the written text, and the activity of reading. If speech-to-text conversion were used systematically and text-to-speech conversion only occasionally (a plausible scenario), this would be the end, or at least the marginalization, of the activity of writing but neither of the written text nor of the activity of reading. If text-to-speech conversion were used systematically and speech-to-text conversion only occasionally (a much less plausible scenario), this would be the end, or at least the marginalization, of the activity of reading but neither of the written text nor of the activity of writing. If both speech-to-text and text-to-speech were used systematically (Crossman’s prediction), this would be the end of both the activity of writing and that of reading and therefore of the written text: machines would use machine language to encode appropriate information for conversion from and into speech; these machine language encodings would not look like our written texts, and, anyhow, would not be seen, let alone read, by anyone.

Whether or not societies end up replacing writing and reading with conversion technologies will not directly result from a collective decision based on a vision of the societal consequences, but from the accumulation of individual decisions. To what extent, then, are individuals likely to adopt these new technologies?

Speech-to-text conversion

How well could individuals achieve, by means of speech-to-text conversion, the various goals they pursue in writing texts? At first blush and in general, what can be done with a written text does not depend on whether initially it has been handwritten, typed, dictated to a secretary, or transcribed by a machine. The few exceptions – holograph wills and scented love letters for instance – are no more obstacles to the generalization of speech-to-text conversion than they have been to the generalization of word-processing.

For an individual, choosing to produce a written text by means of speech-to-text conversion rather than by one or another form of writing is not going to be such a momentous decision and will be determined by considerations of practicality and taste. Speech-to-text conversion has one obvious and truly major practical advantage over writing: speech is several times faster than handwriting or even typing. It has one obvious practical disadvantage: speech is noisy and could not comfortably be used as a method for composing texts in most work, classroom, or even home environments of today. However, if speaking turned out to be a much more effective mode of producing written texts, working space could be reorganized (or maybe noise could be selectively controlled by means of yet other new technologies).

The main argument that may come to mind in favor of keeping up the activity of writing is not so much practical as it is intellectual. Writing allows one to express one’s thought in a richer, subtler and more controlled way than speech. Writers can write, correct, rewrite and, in the end, produce a text free of the hesitations and repairs of oral utterances. The stylistic richness and specificity of the written text come from the exploitation of these possibilities. Note however that these possibilities result not from the activity of writing per se, but from the fact that writers can read what they write as they write it. Imagine that, as you write, you could only see the words you were writing, and that once written they would become invisible and unerasable: then all the stylistic advantages of writing over speech would be lost (and worse: since writing is slower than speech, the amount of text that the writer could hold in short-term memory would be smaller, so that writing would produce shorter and simpler sentences than those produced in speech). Imagine on the other hand that what you dictated to a machine could immediately be read on the screen, and that, moreover, it could be easily corrected by means of oral commands (and maybe of some manual commands too): this essentially oral interaction with a machine would offer opportunities of stylistic elaboration identical to those of writing. The creative potential of writing does not come from the movements of the hand but from those of the eye. In other words, what makes the process of writing uniquely valuable is the simultaneous reading of what you write.

There is an important aesthetic reason to prefer speech-to-text conversion to writing. However used we may be to moving a pen over paper or to pressing keys, speech is much more natural. At first it will seem awkward to dictate to a machine, but once the awkwardness is overcome, it may become an extraordinary relief to be free from the artificiality, the muscular tension, the fidgetiness of writing and to hear the sound of one’s own voice as one expresses oneself through language. Once it becomes possible to bypass writing, many people may come to realize what a source of discomfort it always was to them.

If speech-to-text technology proves effective and congenial, people may end up giving up writing altogether without ever deciding to do so or even noticing that they have done so (just as many of us have, in fact, ceased to write by hand). The cumulative effect of such individual decisions at a cultural level is hard to predict, but it is likely to be considerable.

Text-to-speech conversion

Text-to-speech conversion is a way to have a written text read to one instead of reading it. Just as some wealthy or powerful people have used secretaries in order to dictate and not to write themselves, so they have had texts read to them by hired readers. What you get from a text read to you is different from what you get by reading it yourself. The tone of voice of the reader contributes to the way in which you interpret the text. In some cases – an actor reading a poem, a mother reading a story to her child – being read to may be wonderful. But in general, we would rather interpret what we read in our own silent voice. Moreover, we may never grow fond of the tone of voice of a computer and we may remain justly reluctant to be influenced by it in our interpretation of a text.

Whereas it may be pleasurable and even illuminating to hear a narrative or a poem read aloud, there are other kinds of texts that are much better comprehended when read alone. Such texts are typically written to be silently read, and are hard or impossible to follow when they are listened to. Anybody who has been bored to death by a scholar reading aloud a written lecture knows what I am talking about. To understand why this is so, consider the role of short-term memory in comprehension. In the process of listening to speech (whether spontaneous speech or the reading aloud of a written text), the information given by every spoken sound must be attended to and retained in short-term memory long enough to allow linguistic decoding, or it is lost (although some of it can be reconstructed from the context). Not so with reading: the written text provides an effective external short-term memory store that can be scanned back and forth. This allows readers to follow the text at their own pace as opposed to listening at the pace of the speaker. Readers can first skim the text and then peruse it. They can choose to go back to some earlier passage if they retrospectively become aware of its relevance, or in order to check the consistency of the text. When you read, you lose the extra input provided by tone of voice and gestures, but you gain in the range and depth of what you are able to comprehend and extract from a text.

The fact that readers can see a whole page and readily access any other part of the text provides writers with opportunities not shared by speakers. Writers can use more complex sentences. They can highlight the organization of their text with paragraphs, titles, and subtitles. They can depart from a strict linear organization of the text by adding footnotes, cross-references, or appendixes. They can produce new kinds of objects that are at once linguistic and graphic, such as structured lists and tables. Even in oral presentations, most teachers and lecturers have found it useful or even necessary to provide written text and other graphic documents for the audience to read or examine, in the form of writing on the blackboard, handouts, or, by now, screen projections. Many of the current forms and functions of writing take advantage of the short-term memory effects of a visual presentation. Possibly, some of these functions could be fulfilled by talking machines, but not all of them. For instance, it might be easier just to ask the machine to read a short dictionary entry than to look it up using the alphabetic order. On the other hand, browsing is, and is likely to remain, more effective when done visually than acoustically.

From a practical point of view, listening to a text is much slower than reading. It is also noisier (but this can be easily corrected with headphones). Possibly the strongest obstacle to the abandonment of reading is the role it plays, not in accessing texts, but in producing them. As I pointed out, what we rightly value most in the activity of writing is not the hand movements (or else typing would not have replaced handwriting to this extent) but the fact that we can read what we write as we write it.

All this considered, it is quite implausible that the cumulative effect of individual decisions to use text-to-speech conversion will result in the replacement, at a societal level, of the activity of reading by the systematic use of text-to-speech technology.

Cultural implications

I have attempted so far to develop the following argument: practically all the benefits that seem to come with writing and to justify investing so many resources in teaching the skill are, in fact, benefits derived from reading. Even the apparent expressive advantages of writing over speech come from the fact that, as you write, you read what you are writing. Writing is essentially a cost paid in order to be able to profit from reading. This cost was unavoidable, and it still is – but not for very long now. As soon as technology makes it possible to see one’s speech properly transcribed as it unfolds, and to modify the transcription by means of oral instructions (and also, probably, of pointing and highlighting hand movements), writing will present no advantage that is sufficient to justify its cost. In contrast, having a machine read aloud is in most cases less appealing than reading on one’s own.

The cumulative effect of individual decisions to use these new technologies will soon bring about, at the societal level, the near disappearance of writing, whereas people will go on reading. The individual decisions I am talking about will be made by people who will already have paid the main cost involved in writing and reading, that is, not the cost of using these skills, but the cost of acquiring them. Even with this cost paid, it will become preferable to move to the oral production of written texts, just as having learnt to write by hand does not stop most of us from writing almost exclusively with a keyboard.

Once writing is no longer practiced (except by calligraphers), what will happen to its teaching?

Whatever the tongue and the writing system, the teaching of writing always involves an extra cost when compared to the teaching of reading. Reading can be taught on its own, whereas the teaching of writing presupposes that of reading. Since the teaching of writing and that of reading have been systematically linked, we have no controlled comparison that would allow us to estimate the extra cost involved in the teaching of writing proper. Moreover, even a controlled comparison would not really allow us to estimate the economy of effort that would result from teaching children just to read, for all past and present pedagogies (with very few exceptions, such as cases of students with specific disabilities) aim at jointly teaching both skills. If reading were to be taught on its own, the pedagogy would have to be rethought, and particular attention would have to be paid to the role that computers could play in it. It is quite conceivable that, using the new technologies, reading could be taught on its own in a much more intuitive and easy way than the reading-writing pair.

Does all this mean that, once writing has been replaced by transcription, only reading will be taught, and that the resources thus freed (children’s, teachers’, and parents’ time) can be used otherwise? Certainly not. Such a cultural transition is a complex process and meets various factors of inertia.

In developed countries, the people who might have the greatest interest in the demise of writing, that is children, are not in a position to judge, and anyhow won’t be asked. The first generations of adults who will move to dictation after years of writing will already have paid the price of learning. The fact of having paid the price, the familiarity with the practice, the absence of distinction between the teaching of writing and that of reading, the contempt or the compassion for illiterate people, all these will converge and make these adults fervent defenders of the teaching of writing. Teachers trained to teach writing, and who often do it with outstanding dedication and patience, will be reluctant to admit that all this knowledge might be outdated. One easily anticipates passionate pleas and diatribes from defenders of writing, who, even though they will no longer be practicing writing themselves, will feel that they are defending culture itself against something worse than illiterates: the henchmen of illiteracy. One may assume that the teaching of writing will long outlive its obsolescence.

This scenario, where writing remains among us as a compulsory scholastic activity, is not the only plausible one. It ignores various factors that could tilt things another way. Teaching in general is likely to undergo radical changes as a result of the development of the new technologies. The acquisition of reading skills might take place earlier and more spontaneously thanks to the interaction with machines, resulting in a de facto dissociation between the teaching of reading and that of writing. Writing might end up playing a major role only in writing classes and being less and less used in the teaching of other subject-matters. In such conditions, the teaching of writing would rapidly lose much of its significance. New generations of adults could be tempted to grant it fewer resources and to render it optional.

Even this modified scenario does not take into account the diversity of situations across countries. In many countries, most of the resources for education are invested in the teaching of literacy, and the illiteracy, at least partial, of a great part of the population is a major obstacle to economic development. In such countries, the use of speech-to-text and also text-to-speech conversion technologies, if their cost were sufficiently lowered, might turn out to be an outstanding way of accelerating both the social promotion of individuals and collective economic development. If so, in these countries, education will have to be rethought on a new basis: while, at present, writing skills occupy center stage, they might in the future be made almost redundant.

Even if it resulted from the accumulation of modest and sensible individual decisions, the marginalization of writing and of its teaching might well have major cultural effects. These effects are hard to foresee at present. It is all too easy to speak of a return to orality. The most profound effect that writing has had on human civilizations has been to allow them to become truly cumulative instead of evolving forever within the limits of human long-term memory. Far from reversing these effects, the new technologies allow new forms of cultural accumulation as well as new ways of mining the accumulated information.

Still, the generalization of the oral production of written texts is likely to have significant effects on the texts themselves. These effects might be on the subtle rather than on the dramatic side, and be therefore comparable to the effects of the progressive replacement of handwriting by typing, and then of simple typing by word processing. This move has favored the emergence or the development of new styles and new genres in a way that has not yet been systematically studied. The composition of written texts by means of the voice might have deeper effects. Various forms of writing have resulted in some degree of divergence (varying from tongue to tongue) between oral and written dialects. Will a return to the natural organ of linguistic expression put an end to this divergence, or will it cause the emergence of new dialects?

The very symbols used in the different writing systems result from a compromise between the needs of the hand and those of the eye. Printing, and now the computer, have made possible the development of new characters which, however, must still remain similar enough to handwritten ones. This constraint could altogether disappear; a new evolution of writing systems could emerge, exclusively guided by considerations of visual ergonomics and esthetics.

One can imagine anything. On the other hand, to speculate in a manner that is both informed and reasoned is difficult. Difficult but not altogether impossible, I hope.