
Thought Recognition and BCIs

The Economist kicked off their 2018 year with a bold prediction: “Brain-computer interfaces may change what it means to be human.”

In their lead article, they suggest that BCIs (brain-computer interfaces) like the BrainGate system are leading the way into a new world: one where mind control works.

I feel like I did in 1979 when I first heard about the Apple II. The whole world was mainframe computing and time-sharing on those monsters, and yet two guys in a garage blew a massive hole through this paradigm, turned it on its head, and invented personal computing.

Think about it: personal computing has been evolving and constantly improving for almost forty years now!

Back then, I could see the future vaguely, in very partial outlines, without much practical effect, but with intense curiosity.

Another example is voice recognition. I still remember being introduced to the subject, way back in …. 1970? I got all excited about it, until I realized …. it sucked! And it wasn’t going to get much better anytime soon. But I remember saying to myself: I can’t be fooled by the first versions of voice recognition. I can’t lull myself to sleep. I need to watch this space because it will evolve and improve over time.

If you think about it, versions 1 through 10 of any technology always suck. The history of speech recognition in the 1950s and 1960s is, well, pathetic.

IBM’s Shoebox was introduced at the 1962 World’s Fair.
DARPA got involved in the early 1970s, and then partnered with Carnegie Mellon on Harpy – a major advance.
Threshold Technology was formed in the same decade to commercialize primitive speech recognition.
And now we have Siri.

And sure enough, after almost 40 years of trying, voice recognition is getting really, really good. Can we see a time within the next 10 years when voice recognition replaces most keyboard applications?

I think so.

And so it is with this subject. We are at the very, very beginning, when it all sounds vague, with partial outlines, without much practical effect, and yet ….. it fills me with intense curiosity.

What could the next fifty years bring?

Is it possible that we will be able to think something, and have that something (a thought? a prescribed action? an essay?) become physical?

Read on…..


CREDIT: Economist Article on The Next Frontier

TECHNOLOGIES are often billed as transformative. For William Kochevar, the term is justified. Mr Kochevar is paralysed below the shoulders after a cycling accident, yet has managed to feed himself by his own hand. This remarkable feat is partly thanks to electrodes, implanted in his right arm, which stimulate muscles. But the real magic lies higher up. Mr Kochevar can control his arm using the power of thought. His intention to move is reflected in neural activity in his motor cortex; these signals are detected by implants in his brain and processed into commands to activate the electrodes in his arms.

An ability to decode thought in this way may sound like science fiction. But brain-computer interfaces (BCIs) like the BrainGate system used by Mr Kochevar provide evidence that mind-control can work. Researchers are able to tell what words and images people have heard and seen from neural activity alone. Information can also be encoded and used to stimulate the brain. Over 300,000 people have cochlear implants, which help them to hear by converting sound into electrical signals and sending them into the brain. Scientists have “injected” data into monkeys’ heads, instructing them to perform actions via electrical pulses.

As our Technology Quarterly in this issue explains, the pace of research into BCIs and the scale of its ambition are increasing. Both America’s armed forces and Silicon Valley are starting to focus on the brain. Facebook dreams of thought-to-text typing. Kernel, a startup, has $100m to spend on neurotechnology. Elon Musk has formed a firm called Neuralink; he thinks that, if humanity is to survive the advent of artificial intelligence, it needs an upgrade. Entrepreneurs envisage a world in which people can communicate telepathically, with each other and with machines, or acquire superhuman abilities, such as hearing at very high frequencies.

These powers, if they ever materialise, are decades away. But well before then, BCIs could open the door to remarkable new applications. Imagine stimulating the visual cortex to help the blind, forging new neural connections in stroke victims or monitoring the brain for signs of depression. By turning the firing of neurons into a resource to be harnessed, BCIs may change the idea of what it means to be human.

That thinking feeling
Sceptics scoff. Taking medical BCIs out of the lab into clinical practice has proved very difficult. The BrainGate system used by Mr Kochevar was developed more than ten years ago, but only a handful of people have tried it out. Turning implants into consumer products is even harder to imagine. The path to the mainstream is blocked by three formidable barriers—technological, scientific and commercial.

Start with technology. Non-invasive techniques like an electroencephalogram (EEG) struggle to pick up high-resolution brain signals through intervening layers of skin, bone and membrane. Some advances are being made—on EEG caps that can be used to play virtual-reality games or control industrial robots using thought alone. But for the time being at least, the most ambitious applications require implants that can interact directly with neurons. And existing devices have lots of drawbacks. They involve wires that pass through the skull; they provoke immune responses; they communicate with only a few hundred of the 85bn neurons in the human brain. But that could soon change. Helped by advances in miniaturisation and increased computing power, efforts are under way to make safe, wireless implants that can communicate with hundreds of thousands of neurons. Some of these interpret the brain’s electrical signals; others experiment with light, magnetism and ultrasound.

Clear the technological barrier, and another one looms. The brain is still a foreign country. Scientists know little about how exactly it works, especially when it comes to complex functions like memory formation. Research is more advanced in animals, but experiments on humans are hard. Yet, even today, some parts of the brain, like the motor cortex, are better understood. Nor is complete knowledge always needed. Machine learning can recognise patterns of neural activity; the brain itself gets the hang of controlling BCIs with extraordinary ease. And neurotechnology will reveal more of the brain’s secrets.

Like a hole in the head
The third obstacle comprises the practical barriers to commercialisation. It takes time, money and expertise to get medical devices approved. And consumer applications will take off only if they perform a function people find useful. Some of the applications for brain-computer interfaces are unnecessary—a good voice-assistant is a simpler way to type without fingers than a brain implant, for example. The idea of consumers clamouring for craniotomies also seems far-fetched. Yet brain implants are already an established treatment for some conditions. Around 150,000 people receive deep-brain stimulation via electrodes to help them control Parkinson’s disease. Elective surgery can become routine, as laser-eye procedures show.

All of which suggests that a route to the future imagined by the neurotech pioneers is arduous but achievable. When human ingenuity is applied to a problem, however hard, it is unwise to bet against it. Within a few years, improved technologies may be opening up new channels of communications with the brain. Many of the first applications hold out unambiguous promise—of movement and senses restored. But as uses move to the augmentation of abilities, whether for military purposes or among consumers, a host of concerns will arise. Privacy is an obvious one: the refuge of an inner voice may disappear. Security is another: if a brain can be reached on the internet, it can also be hacked. Inequality is a third: access to superhuman cognitive abilities could be beyond all except a self-perpetuating elite. Ethicists are already starting to grapple with questions of identity and agency that arise when a machine is in the neural loop.

These questions are not urgent. But the bigger story is that neither are they the realm of pure fantasy. Technology changes the way people live. Beneath the skull lies the next frontier.

This article appeared in the Leaders section of the print edition under the headline “The next frontier”

================== REFERENCE: History of Speech Recognition =====



Speech Recognition Through the Decades: How We Ended Up With Siri

By Melanie Pinola
PCWorld | NOV 2, 2011 6:00 PM PT

Looking back on the development of speech recognition technology is like watching a child grow up, progressing from the baby-talk level of recognizing single syllables, to building a vocabulary of thousands of words, to answering questions with quick, witty replies, as Apple’s supersmart virtual assistant Siri does.

Listening to Siri, with its slightly snarky sense of humor, made us wonder how far speech recognition has come over the years. Here’s a look at the developments in past decades that have made it possible for people to control devices using only their voice.

1950s and 1960s: Baby Talk
The first speech recognition systems could understand only digits. (Given the complexity of human language, it makes sense that inventors and engineers first focused on numbers.) In 1952, Bell Laboratories designed the “Audrey” system, which recognized digits spoken by a single voice. Ten years later, at the 1962 World’s Fair, IBM demonstrated its “Shoebox” machine, which could understand 16 words spoken in English.

Labs in the United States, Japan, England, and the Soviet Union developed other hardware dedicated to recognizing spoken sounds, expanding speech recognition technology to support four vowels and nine consonants.
They may not sound like much, but these first efforts were an impressive start, especially when you consider how primitive computers themselves were at the time.

1970s: Speech Recognition Takes Off

Speech recognition technology made major strides in the 1970s, thanks to interest and funding from the U.S. Department of Defense. The DoD’s DARPA Speech Understanding Research (SUR) program, from 1971 to 1976, was one of the largest of its kind in the history of speech recognition, and among other things it was responsible for Carnegie Mellon’s “Harpy” speech-understanding system. Harpy could understand 1011 words, approximately the vocabulary of an average three-year-old.

Harpy was significant because it introduced a more efficient search approach, called beam search, to “prove the finite-state network of possible sentences,” according to Readings in Speech Recognition by Alex Waibel and Kai-Fu Lee. (The story of speech recognition is very much tied to advances in search methodology and technology, as Google’s entrance into speech recognition on mobile devices proved just a few years ago.)

The ’70s also marked a few other important milestones in speech recognition technology, including the founding of the first commercial speech recognition company, Threshold Technology, as well as Bell Laboratories’ introduction of a system that could interpret multiple people’s voices.

1980s: Speech Recognition Turns Toward Prediction
Over the next decade, thanks to new approaches to understanding what people say, speech recognition vocabulary jumped from a few hundred words to several thousand words, and had the potential to recognize an unlimited number of words. One major reason was a new statistical method known as the hidden Markov model.
Rather than simply using templates for words and looking for sound patterns, HMM considered the probability of unknown sounds’ being words. This foundation would be in place for the next two decades (see Automatic Speech Recognition—A Brief History of the Technology Development by B.H. Juang and Lawrence R. Rabiner).
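
To make the guesswork concrete, here is a minimal Python sketch of the scoring an HMM-based recogniser performs. Every number in it is invented for illustration (the two-state word models, the transition and emission probabilities, the quantised "acoustic frames"); a real system learns these values from training data, but the forward-algorithm arithmetic is the same in spirit.

    import numpy as np

    def forward(init, trans, emit, observations):
        """Probability that a word's HMM produced the observed acoustic frames."""
        alpha = init * emit[:, observations[0]]
        for obs in observations[1:]:
            alpha = (alpha @ trans) * emit[:, obs]
        return alpha.sum()

    # Hypothetical two-state models for the words "pat" and "bat"; the
    # observations are indices of quantised acoustic frame types.
    init  = np.array([1.0, 0.0])
    trans = np.array([[0.6, 0.4],
                      [0.0, 1.0]])
    emit_pat = np.array([[0.7, 0.2, 0.1],   # state 0 favours frame type 0 ("p")
                         [0.1, 0.2, 0.7]])  # state 1 favours frame type 2 ("-at")
    emit_bat = np.array([[0.2, 0.7, 0.1],   # state 0 favours frame type 1 ("b")
                         [0.1, 0.2, 0.7]])

    frames = [0, 2, 2]                              # what the microphone "heard"
    print(forward(init, trans, emit_pat, frames))   # higher score: guess "pat"
    print(forward(init, trans, emit_bat, frames))   # lower score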

Equipped with this expanded vocabulary, speech recognition started to work its way into commercial applications for business and specialized industry (for instance, medical use). It even entered the home, in the form of Worlds of Wonder’s Julie doll (1987), which children could train to respond to their voice. (“Finally, the doll that understands you.”)

However, whether speech recognition software at the time could recognize 1000 words, as the 1985 Kurzweil text-to-speech program did, or whether it could support a 5000-word vocabulary, as IBM’s system did, a significant hurdle remained: These programs took discrete dictation, so you had … to … pause … after … each … and … every … word.


Voice Recognition – Update from the Economist

Excellent comment on the state of the art of voice recognition in the Economist. The entire article is below.

I think it’s fair to say, as the article does, that “we’re in 1994 for voice.” In other words, the core technology is in place, just as the internet’s was in 1994, but no one really has a clue what it will ultimately mean to society.

My guess is …. it will be a game-changer of the first order.

ECHO, SIRI, CORTANA – the beginning of a new era!

Just as the GUI, the mouse, and Windows allowed computers to go mainstream, my instinct is that removing the keyboard as a requirement will take the computer from a daily tool to a second-by-second tool. The Apple Watch, which looks rather benign right now, could easily become the central means of communication. And the hard-to-use keyboard on the iPhone will increasingly become a white elephant – rarely used and quaint.


CREDIT: http://www.economist.com/technology-quarterly/2017-05-01/language


Language: Finding a voice
Computers have got much better at translation, voice recognition and speech synthesis, says Lane Greene. But they still don’t understand the meaning of language.

“I’M SORRY, Dave. I’m afraid I can’t do that.” With chilling calm, HAL 9000, the on-board computer in “2001: A Space Odyssey”, refuses to open the doors to Dave Bowman, an astronaut who had ventured outside the ship. HAL’s decision to turn on his human companion reflected a wave of fear about intelligent computers.

When the film came out in 1968, computers that could have proper conversations with humans seemed nearly as far away as manned flight to Jupiter. Since then, humankind has progressed quite a lot farther with building machines that it can talk to, and that can respond with something resembling natural speech. Even so, communication remains difficult. If “2001” had been made to reflect the state of today’s language technology, the conversation might have gone something like this: “Open the pod bay doors, Hal.” “I’m sorry, Dave. I didn’t understand the question.” “Open the pod bay doors, Hal.” “I have a list of eBay results about pod doors, Dave.”

Creative and truly conversational computers able to handle the unexpected are still far off. Artificial-intelligence (AI) researchers can only laugh when asked about the prospect of an intelligent HAL, Terminator or Rosie (the sassy robot housekeeper in “The Jetsons”). Yet although language technologies are nowhere near ready to replace human beings, except in a few highly routine tasks, they are at last about to become good enough to be taken seriously. They can help people spend more time doing interesting things that only humans can do. After six decades of work, much of it with disappointing outcomes, the past few years have produced results much closer to what early pioneers had hoped for.

Speech recognition has made remarkable advances. Machine translation, too, has gone from terrible to usable for getting the gist of a text, and may soon be good enough to require only modest editing by humans. Computerised personal assistants, such as Apple’s Siri, Amazon’s Alexa, Google Now and Microsoft’s Cortana, can now take a wide variety of questions, structured in many different ways, and return accurate and useful answers in a natural-sounding voice. Alexa can even respond to a request to “tell me a joke”, but only by calling upon a database of corny quips. Computers lack a sense of humour.

When Apple introduced Siri in 2011 it was frustrating to use, so many people gave up. Only around a third of smartphone owners use their personal assistants regularly, even though 95% have tried them at some point, according to Creative Strategies, a consultancy. Many of those discouraged users may not realise how much they have improved.
In 1966 John Pierce was working at Bell Labs, the research arm of America’s telephone monopoly. Having overseen the team that had built the first transistor and the first communications satellite, he enjoyed a sterling reputation, so he was asked to take charge of a report on the state of automatic language processing for the National Academy of Sciences. In the period leading up to this, scholars had been promising automatic translation between languages within a few years.

But the report was scathing. Reviewing almost a decade of work on machine translation and automatic speech recognition, it concluded that the time had come to spend money “hard-headedly toward important, realistic and relatively short-range goals”—another way of saying that language-technology research had overpromised and underdelivered. In 1969 Pierce wrote that both the funders and eager researchers had often fooled themselves, and that “no simple, clear, sure knowledge is gained.” After that, America’s government largely closed the money tap, and research on language technology went into hibernation for two decades.

The story of how it emerged from that hibernation is both salutary and surprisingly workaday, says Mark Liberman. As professor of linguistics at the University of Pennsylvania and head of the Linguistic Data Consortium, a huge trove of texts and recordings of human language, he knows a thing or two about the history of language technology. In the bad old days researchers kept their methods in the dark and described their results in ways that were hard to evaluate. But beginning in the 1980s, Charles Wayne, then at America’s Defence Advanced Research Projects Agency, encouraged them to try another approach: the “common task”.


Step by step
Researchers would agree on a common set of practices, whether they were trying to teach computers speech recognition, speaker identification, sentiment analysis of texts, grammatical breakdown, language identification, handwriting recognition or anything else. They would set out the metrics they were aiming to improve on, share the data sets used to train their software and allow their results to be tested by neutral outsiders. That made the process far more transparent. Funding started up again and language technologies began to improve, though very slowly.

Many early approaches to language technology—and particularly translation—got stuck in a conceptual cul-de-sac: the rules-based approach. In translation, this meant trying to write rules to analyse the text of a sentence in the language of origin, breaking it down into a sort of abstract “interlanguage” and rebuilding it according to the rules of the target language. These approaches showed early promise. But language is riddled with ambiguities and exceptions, so such systems were hugely complicated and easily broke down when tested on sentences beyond the simple set they had been designed for.

Nearly all language technologies began to get a lot better with the application of statistical methods, often called a “brute force” approach. This relies on software scouring vast amounts of data, looking for patterns and learning from precedent. For example, in parsing language (breaking it down into its grammatical components), the software learns from large bodies of text that have already been parsed by humans. It uses what it has learned to make its best guess about a previously unseen text. In machine translation, the software scans millions of words already translated by humans, again looking for patterns. In speech recognition, the software learns from a body of recordings and the transcriptions made by humans. Thanks to the growing power of processors, falling prices for data storage and, most crucially, the explosion in available data, this approach eventually bore fruit. Mathematical techniques that had been known for decades came into their own, and big companies with access to enormous amounts of data were poised to benefit.

People who had been put off by the hilariously inappropriate translations offered by online tools like BabelFish began to have more faith in Google Translate. Apple persuaded millions of iPhone users to talk not only on their phones but to them.

The final advance, which began only about five years ago, came with the advent of deep learning through digital neural networks (DNNs). These are often touted as having qualities similar to those of the human brain: “neurons” are connected in software, and connections can become stronger or weaker in the process of learning.

But Nils Lenke, head of research for Nuance, a language-technology company, explains matter-of-factly that “DNNs are just another kind of mathematical model,” the basis of which had been well understood for decades. What changed was the hardware being used. Almost by chance, DNN researchers discovered that the graphical processing units (GPUs) used to render graphics fluidly in applications like video games were also brilliant at handling neural networks. In computer graphics, basic small shapes move according to fairly simple rules, but there are lots of shapes and many rules, requiring vast numbers of simple calculations. The same GPUs are used to fine-tune the weights assigned to “neurons” in DNNs as they scour data to learn.

The technique has already produced big leaps in quality for all kinds of deep learning, including deciphering handwriting, recognising faces and classifying images. Now they are helping to improve all manner of language technologies, often bringing enhancements of up to 30%. That has shifted language technology from usable at a pinch to really rather good. But so far no one has quite worked out what will move it on from merely good to reliably great.

Speech recognition: I hear you
Computers have made huge strides in understanding human speech

WHEN a person speaks, air is forced out through the lungs, making the vocal cords vibrate, which sends out characteristic wave patterns through the air. The features of the sounds depend on the arrangement of the vocal organs, especially the tongue and the lips, and the characteristic nature of the sounds comes from peaks of energy in certain frequencies. The vowels have frequencies called “formants”, two of which are usually enough to differentiate one vowel from another. For example, the vowel in the English word “fleece” has its first two formants at around 300Hz and 3,000Hz. Consonants have their own characteristic features.
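
As a toy illustration of those formant peaks, the Python sketch below builds a crude stand-in for the vowel in "fleece" as the sum of two sine waves at roughly the frequencies quoted above, then recovers the two peaks from its spectrum. Real formant estimation works on recorded speech and needs more careful signal processing, so treat this only as a picture of the idea.

    import numpy as np
    from scipy.signal import find_peaks

    sr = 16000                                   # samples per second
    t = np.arange(sr) / sr                       # one second of "speech"
    # Energy peaks near 300Hz and 3,000Hz, like the vowel in "fleece".
    signal = np.sin(2 * np.pi * 300 * t) + 0.5 * np.sin(2 * np.pi * 3000 * t)

    spectrum = np.abs(np.fft.rfft(signal))       # magnitude spectrum
    freqs = np.fft.rfftfreq(len(signal), d=1 / sr)
    peaks, _ = find_peaks(spectrum, height=spectrum.max() * 0.2)
    print(freqs[peaks])                          # approximately [300., 3000.]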

In principle, it should be easy to turn this stream of sound into transcribed speech. As in other language technologies, machines that recognise speech are trained on data gathered earlier. In this instance, the training data are sound recordings transcribed to text by humans, so that the software has both a sound and a text input. All it has to do is match the two. It gets better and better at working out how to transcribe a given chunk of sound in the same way as humans did in the training data. The traditional matching approach was a statistical technique called a hidden Markov model (HMM), making guesses based on what was done before. More recently speech recognition has also gained from deep learning.

English has about 44 “phonemes”, the units that make up the sound system of a language. P and b are different phonemes, because they distinguish words like pat and bat. But in English p with a puff of air, as in “party”, and p without a puff of air, as in “spin”, are not different phonemes, though they are in other languages. If a computer hears the phonemes s, p, i and n back to back, it should be able to recognise the word “spin”.
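
That lookup from phonemes to words can be sketched with the CMU Pronouncing Dictionary, which ships with NLTK. The snippet assumes the nltk package and its "cmudict" data are installed, and the reverse lookup is deliberately naive; a real recogniser searches a pronunciation lexicon far more efficiently.

    import nltk
    from nltk.corpus import cmudict

    nltk.download("cmudict", quiet=True)
    pron = cmudict.dict()                 # word -> list of phoneme sequences

    def strip_stress(phones):
        """Drop stress digits: 'IH1' becomes 'IH'."""
        return [p.rstrip("012") for p in phones]

    print(strip_stress(pron["spin"][0]))  # ['S', 'P', 'IH', 'N']

    # Reverse lookup: which words match the phoneme sequence S-P-IH-N?
    target = ["S", "P", "IH", "N"]
    matches = [w for w, entries in pron.items()
               if any(strip_stress(e) == target for e in entries)]
    print(matches)                        # ['spin'] plus any homophones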

But the nature of live speech makes this difficult for machines. Sounds are not pronounced individually, one phoneme after the other; they mostly come in a constant stream, and finding the boundaries is not easy. Phonemes also differ according to the context. (Compare the l sound at the beginning of “light” with that at the end of “full”.)

Speakers differ in timbre and pitch of voice, and in accent. Conversation is far less clear than careful dictation. People stop and restart much more often than they realise.
All the same, technology has gradually mitigated many of these problems, so error rates in speech-recognition software have fallen steadily over the years—and then sharply with the introduction of deep learning. Microphones have got better and cheaper. With ubiquitous wireless internet, speech recordings can easily be beamed to computers in the cloud for analysis, and even smartphones now often have computers powerful enough to carry out this task.

Bear arms or bare arms?
Perhaps the most important feature of a speech-recognition system is its set of expectations about what someone is likely to say, or its “language model”. Like other training data, the language models are based on large amounts of real human speech, transcribed into text. When a speech-recognition system “hears” a stream of sound, it makes a number of guesses about what has been said, then calculates the odds that it has found the right one, based on the kinds of words, phrases and clauses it has seen earlier in the training text.

At the level of phonemes, each language has strings that are permitted (in English, a word may begin with str-, for example) or banned (an English word cannot start with tsr-). The same goes for words. Some strings of words are more common than others. For example, “the” is far more likely to be followed by a noun or an adjective than by a verb or an adverb. In making guesses about homophones, the computer will have remembered that in its training data the phrase “the right to bear arms” came up much more often than “the right to bare arms”, and will thus have made the right guess.
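
A minimal sketch of that homophone decision in Python: the "corpus" here is a couple of invented sentences rather than the vast bodies of text a real recogniser trains on, but the mechanics (count bigrams, score each candidate transcription, keep the likelier one) are the same.

    import math
    from collections import Counter

    corpus = ("the right to bear arms shall not be infringed . "
              "he likes to bare his arms in summer .").split()

    bigrams  = Counter(zip(corpus, corpus[1:]))
    unigrams = Counter(corpus)

    def score(sentence, alpha=1.0):
        """Add-one-smoothed bigram log-probability of a word sequence."""
        vocab = len(unigrams)
        words = sentence.split()
        logp = 0.0
        for prev, word in zip(words, words[1:]):
            logp += math.log((bigrams[(prev, word)] + alpha) /
                             (unigrams[prev] + alpha * vocab))
        return logp

    print(score("the right to bear arms"))  # higher: the recogniser picks "bear"
    print(score("the right to bare arms"))  # lower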

Training on a specific speaker greatly cuts down on the software’s guesswork. Just a few minutes of reading training text into software like Dragon Dictate, made by Nuance, produces a big jump in accuracy. For those willing to train the software for longer, the improvement continues to something close to 99% accuracy (meaning that of each hundred words of text, not more than one is wrongly added, omitted or changed). A good microphone and a quiet room help.

Advance knowledge of what kinds of things the speaker might be talking about also increases accuracy. Words like “phlebitis” and “gastrointestinal” are not common in general discourse, and uncommon words are ranked lower in the probability tables the software uses to guess what it has heard. But these words are common in medicine, so creating software trained to look out for such words considerably improves the result. This can be done by feeding the system a large number of documents written by the speaker whose voice is to be recognised; common words and phrases can be extracted to improve the system’s guesses.

As with all other areas of language technology, deep learning has sharply brought down error rates. In October Microsoft announced that its latest speech-recognition system had achieved parity with human transcribers in recognising the speech in the Switchboard Corpus, a collection of thousands of recorded conversations in which participants are talking with a stranger about a randomly chosen subject.

Error rates on the Switchboard Corpus are a widely used benchmark, so claims of quality improvements can be easily compared. Fifteen years ago quality had stalled, with word-error rates of 20-30%. Microsoft’s latest system, which has six neural networks running in parallel, has reached 5.9%, the same as a human transcriber’s. Xuedong Huang, Microsoft’s chief speech scientist, says that he expected it to take two or three years to reach parity with humans. It got there in less than one.

The improvements in the lab are now being applied to products in the real world. More and more cars are being fitted with voice-activated controls of various kinds; the vocabulary involved is limited (there are only so many things you might want to say to your car), which ensures high accuracy. Microphones—or often arrays of microphones with narrow fields of pick-up—are getting better at identifying the relevant speaker among a group.

Some problems remain. Children and elderly speakers, as well as people moving around in a room, are harder to understand. Background noise remains a big concern; if it is different from that in the training data, the software finds it harder to generalise from what it has learned. So Microsoft, for example, offers businesses a product called CRIS that lets users customise speech-recognition systems for the background noise, special vocabulary and other idiosyncrasies they will encounter in that particular environment. That could be useful anywhere from a noisy factory floor to a care home for the elderly.

But for a computer to know what a human has said is only a beginning. Proper interaction between the two, of the kind that comes up in almost every science-fiction story, calls for machines that can speak back.

Hasta la vista, robot voice
Machines are starting to sound more like humans
“I’LL be back.” “Hasta la vista, baby.” Arnold Schwarzenegger’s Teutonic drone in the “Terminator” films is world-famous. But in this instance film-makers looking into the future were overly pessimistic. Some applications do still feature a monotonous “robot voice”, but that is changing fast.

(The original post embedded audio samples at this point: a basic and an advanced sample from the OS X speech synthesiser, and a sample from Amazon’s “Polly” synthesiser.)

Creating speech is roughly the inverse of understanding it. Again, it requires a basic model of the structure of speech. What are the sounds in a language, and how do they combine? What words does it have, and how do they combine in sentences? These are well-understood questions, and most systems can now generate sound waves that are a fair approximation of human speech, at least in short bursts.
Heteronyms require special care. How should a computer pronounce a word like “lead”, which can be a present-tense verb or a noun for a heavy metal, pronounced quite differently? Once again a language model can make accurate guesses: “Lead us not into temptation” can be parsed for its syntax, and once the software has worked out that the first word is almost certainly a verb, it can cause it to be pronounced to rhyme with “reed”, not “red”.
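
A rough Python sketch of that guess, using NLTK's off-the-shelf part-of-speech tagger as a stand-in for a full syntactic analysis. The tag-to-pronunciation table and the phoneme strings are invented for illustration, and a production text-to-speech front end does this far more carefully, so the tagger's verdict should be taken as a best guess rather than a certainty.

    import nltk
    # May require: nltk.download("punkt"); nltk.download("averaged_perceptron_tagger")

    # "L IY D" rhymes with "reed" (the verb); "L EH D" rhymes with "red" (the metal).
    PRONUNCIATION = {"VB": "L IY D", "VBP": "L IY D", "NN": "L EH D"}

    def say_lead(sentence):
        """Pick a pronunciation for the written word 'lead' from its POS tag."""
        for word, tag in nltk.pos_tag(nltk.word_tokenize(sentence)):
            if word.lower() == "lead":
                return PRONUNCIATION.get(tag, "L IY D")  # default to the verb reading

    print(say_lead("Lead us not into temptation"))  # expect the "reed" reading
    print(say_lead("The pipe was made of lead"))    # expect the "red" reading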

Traditionally, text-to-speech models have been “concatenative”, consisting of very short segments recorded by a human and then strung together as in the acoustic model described above. More recently, “parametric” models have been generating raw audio without the need to record a human voice, which makes these systems more flexible but less natural-sounding.

DeepMind, an artificial-intelligence company bought by Google in 2014, has announced a new way of synthesising speech, again using deep neural networks. The network is trained on recordings of people talking, and on the texts that match what they say. Given a text to reproduce as speech, it churns out a far more fluent and natural-sounding voice than the best concatenative and parametric approaches.

The last step in generating speech is giving it prosody—generally, the modulation of speed, pitch and volume to convey an extra (and critical) channel of meaning. In English, “a German teacher”, with the stress on “teacher”, can teach anything but must be German. But “a German teacher” with the emphasis on “German” is usually a teacher of German (and need not be German). Words like prepositions and conjunctions are not usually stressed. Getting machines to put the stresses in the correct places is about 50% solved, says Mark Liberman of the University of Pennsylvania.

Many applications do not require perfect prosody. A satellite-navigation system giving instructions on where to turn uses just a small number of sentence patterns, and prosody is not important. The same goes for most single-sentence responses given by a virtual assistant on a smartphone.

But prosody matters when someone is telling a story. Pitch, speed and volume can be used to pass quickly over things that are already known, or to build interest and tension for new information. Myriad tiny clues communicate the speaker’s attitude to his subject. The phrase “a German teacher”, with stress on the word “German”, may, in the context of a story, not be a teacher of German, but a teacher being explicitly contrasted with a teacher who happens to be French or British.

Text-to-speech engines are not much good at using context to provide such accentuation, and where they do, it rarely extends beyond a single sentence. When Alexa, the assistant in Amazon’s Echo device, reads a news story, her prosody is jarringly un-humanlike. Talking computers have yet to learn how to make humans want to listen.

Machine translation: Beyond Babel
Computer translations have got strikingly better, but still need human input
IN “STAR TREK” it was a hand-held Universal Translator; in “The Hitchhiker’s Guide to the Galaxy” it was the Babel Fish popped conveniently into the ear. In science fiction, the meeting of distant civilisations generally requires some kind of device to allow them to talk. High-quality automated translation seems even more magical than other kinds of language technology because many humans struggle to speak more than one language, let alone translate from one to another.

The idea has been around since the 1950s, and computerised translation is still known by the quaint moniker “machine translation” (MT). It goes back to the early days of the cold war, when American scientists were trying to get computers to translate from Russian. They were inspired by the code-breaking successes of the second world war, which had led to the development of computers in the first place. To them, a scramble of Cyrillic letters on a page of Russian text was just a coded version of English, and turning it into English was just a question of breaking the code.

Scientists at IBM and Georgetown University were among those who thought that the problem would be cracked quickly. Having programmed just six rules and a vocabulary of 250 words into a computer, they gave a demonstration in New York on January 7th 1954 and proudly produced 60 automated translations, including that of “Mi pyeryedayem mislyi posryedstvom ryechyi,” which came out correctly as “We transmit thoughts by means of speech.” Leon Dostert of Georgetown, the lead scientist, breezily predicted that fully realised MT would be “an accomplished fact” in three to five years.

Instead, after more than a decade of work, the report in 1966 by a committee chaired by John Pierce, mentioned in the introduction to this report, recorded bitter disappointment with the results and urged researchers to focus on narrow, achievable goals such as automated dictionaries. Government-sponsored work on MT went into near-hibernation for two decades. What little was done was carried out by private companies. The most notable of them was Systran, which provided rough translations, mostly to America’s armed forces.
La plume de mon ordinateur
The scientists got bogged down by their rules-based approach. Having done relatively well with their six-rule system, they came to believe that if they programmed in more rules, the system would become more sophisticated and subtle. Instead, it became more likely to produce nonsense. Adding extra rules, in the modern parlance of software developers, did not “scale”.

Besides the difficulty of programming grammar’s many rules and exceptions, some early observers noted a conceptual problem. The meaning of a word often depends not just on its dictionary definition and the grammatical context but the meaning of the rest of the sentence. Yehoshua Bar-Hillel, an Israeli MT pioneer, realised that “the pen is in the box” and “the box is in the pen” would require different translations for “pen”: any pen big enough to hold a box would have to be an animal enclosure, not a writing instrument.

How could machines be taught enough rules to make this kind of distinction? They would have to be provided with some knowledge of the real world, a task far beyond the machines or their programmers at the time. Two decades later, IBM stumbled on an approach that would revive optimism about MT. Its Candide system was the first serious attempt to use statistical probabilities rather than rules devised by humans for translation. Statistical, “phrase-based” machine translation, like speech recognition, needed training data to learn from. Candide used Canada’s Hansard, which publishes that country’s parliamentary debates in French and English, providing a huge amount of data for that time. The phrase-based approach would ensure that the translation of a word would take the surrounding words properly into account.

But quality did not take a leap until Google, which had set itself the goal of indexing the entire internet, decided to use those data to train its translation engines; in 2007 it switched from a rules-based engine (provided by Systran) to its own statistics-based system. To build it, Google trawled about a trillion web pages, looking for any text that seemed to be a translation of another—for example, pages designed identically but with different words, and perhaps a hint such as the address of one page ending in /en and the other ending in /fr. According to Macduff Hughes, chief engineer on Google Translate, a simple approach using vast amounts of data seemed more promising than a clever one with fewer data.

Training on parallel texts (which linguists call corpora, the plural of corpus) creates a “translation model” that generates not one but a series of possible translations in the target language. The next step is running these possibilities through a monolingual language model in the target language. This is, in effect, a set of expectations about what a well-formed and typical sentence in the target language is likely to be. Single-language models are not too hard to build. (Parallel human-translated corpora are hard to come by; large amounts of monolingual training data are not.) As with the translation model, the language model uses a brute-force statistical approach to learn from the training data, then ranks the outputs from the translation model in order of plausibility.
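
In schematic Python, that ranking step looks something like the snippet below. Every score here is invented: a real translation model and language model are each trained on the corpora just described, but the point is simply that the two scores are combined and the most plausible candidate wins.

    # Candidate translations proposed by the translation model, with
    # hypothetical log-probabilities (closer to zero is better).
    translation_scores = {
        "the spirit is willing but the flesh is weak": -4.1,
        "the vodka is good but the meat is rotten":    -3.9,
    }

    def language_model_score(sentence):
        """Stand-in for a monolingual English model trained on huge amounts of text."""
        hypothetical = {
            "the spirit is willing but the flesh is weak": -6.0,
            "the vodka is good but the meat is rotten":    -11.5,
        }
        return hypothetical[sentence]

    best = max(translation_scores,
               key=lambda s: translation_scores[s] + language_model_score(s))
    print(best)   # the fluent, idiomatic candidate wins the combined ranking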

Statistical machine translation rekindled optimism in the field. Internet users quickly discovered that Google Translate was far better than the rules-based online engines they had used before, such as BabelFish. Such systems still make mistakes—sometimes minor, sometimes hilarious, sometimes so serious or so many as to make nonsense of the result. And language pairs like Chinese-English, which are unrelated and structurally quite different, make accurate translation harder than pairs of related languages like English and German. But more often than not, Google Translate and its free online competitors, such as Microsoft’s Bing Translator, offer a usable approximation.

Such systems are set to get better, again with the help of deep learning from digital neural networks. The Association for Computational Linguistics has been holding workshops on MT every summer since 2006. One of the events is a competition between MT engines turned loose on a collection of news text. In August 2016, in Berlin, neural-net-based MT systems were the top performers (out of 102), a first.
Now Google has released its own neural-net-based engine for eight language pairs, closing much of the quality gap between its old system and a human translator.
This is especially true for closely related languages (like the big European ones) with lots of available training data. The results are still distinctly imperfect, but far smoother and more accurate than before. Translations between English and (say) Chinese and Korean are not as good yet, but the neural system has brought a clear improvement here too.

The Coca-Cola factor

Neural-network-based translation actually uses two networks. One is an encoder. Each word of an input sentence is converted into a multidimensional vector (a series of numerical values), and the encoding of each new word takes into account what has happened earlier in the sentence. Marcello Federico of Italy’s Fondazione Bruno Kessler, a private research organisation, uses an intriguing analogy to compare neural-net translation with the phrase-based kind. The latter, he says, is like describing Coca-Cola in terms of sugar, water, caffeine and other ingredients. By contrast, the former encodes features such as liquidness, darkness, sweetness and fizziness.
Once the source sentence is encoded, a decoder network generates a word-for-word translation, once again taking account of the immediately preceding word. This can cause problems when the meaning of words such as pronouns depends on words mentioned much earlier in a long sentence. This problem is mitigated by an “attention model”, which helps maintain focus on other words in the sentence outside the immediate context.
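
The numerical heart of that attention step is small enough to sketch directly. The vectors below are random stand-ins rather than learned encodings, but the mechanics (score each source position against the decoder state, softmax the scores, take the weighted mix) are the standard dot-product form of the idea.

    import numpy as np

    np.random.seed(0)
    encoder_states = np.random.randn(6, 4)   # 6 source words, 4-dimensional encodings
    decoder_state  = np.random.randn(4)      # current decoder hidden state

    scores  = encoder_states @ decoder_state          # relevance of each source word
    weights = np.exp(scores) / np.exp(scores).sum()   # softmax: the attention weights
    context = weights @ encoder_states                # weighted mix of encodings

    print(np.round(weights, 2))  # how strongly each source position is attended to
    print(context)               # combined with the decoder state to choose the next word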

Neural-network translation requires heavy-duty computing power, both for the original training of the system and in use. The heart of such a system can be the GPUs that made the deep-learning revolution possible, or specialised hardware like Google’s Tensor Processing Units (TPUs). Smaller translation companies and researchers usually rent this kind of processing power in the cloud. But the data sets used in neural-network training do not need to be as extensive as those for phrase-based systems, which should give smaller outfits a chance to compete with giants like Google.
Fully automated, high-quality machine translation is still a long way off. For now, several problems remain. All current machine translations proceed sentence by sentence. If the translation of such a sentence depends on the meaning of earlier ones, automated systems will make mistakes. Long sentences, despite tricks like the attention model, can be hard to translate. And neural-net-based systems in particular struggle with rare words.

Training data, too, are scarce for many language pairs. They are plentiful between European languages, since the European Union’s institutions churn out vast amounts of material translated by humans between the EU’s 24 official languages. But for smaller languages such resources are thin on the ground. For example, there are few Greek-Urdu parallel texts available on which to train a translation engine. So a system that claims to offer such translation is in fact usually running it through a bridging language, nearly always English. That involves two translations rather than one, multiplying the chance of errors.
Even if machine translation is not yet perfect, technology can already help humans translate much more quickly and accurately. “Translation memories”, software that stores already translated words and segments, first came into use as early as the 1980s. For someone who frequently translates the same kind of material (such as instruction manuals), they serve up the bits that have already been translated, saving lots of duplication and time.
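
A translation memory can be sketched in a few lines of Python. The stored segments and the fuzzy-match threshold below are invented, and commercial tools score similarity far more carefully, but the principle (find the closest previously translated segment and reuse its translation) is the same.

    import difflib

    memory = {
        "Press the power button to start the device.":
            "Drücken Sie die Ein/Aus-Taste, um das Gerät zu starten.",
        "Do not expose the device to water.":
            "Setzen Sie das Gerät keinem Wasser aus.",
    }

    def suggest(source, cutoff=0.7):
        """Return the stored translation of the most similar past segment, if any."""
        hits = difflib.get_close_matches(source, memory.keys(), n=1, cutoff=cutoff)
        return memory[hits[0]] if hits else None

    print(suggest("Press the power button to start the printer."))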

A similar trick is to train MT engines on text dealing with a narrow real-world domain, such as medicine or the law. As software techniques are refined and computers get faster, training becomes easier and quicker. Free software such as Moses, developed with the support of the EU and used by some of its in-house translators, can be trained by anyone with parallel corpora to hand. A specialist in medical translation, for instance, can train the system on medical translations only, which makes them far more accurate.
At the other end of linguistic sophistication, an MT engine can be optimised for the shorter and simpler language people use in speech to spew out rough but near-instantaneous speech-to-speech translations. This is what Microsoft’s Skype Translator does. Its quality is improved by being trained on speech (things like film subtitles and common spoken phrases) rather than the kind of parallel text produced by the European Parliament.

Translation management has also benefited from innovation, with clever software allowing companies quickly to combine the best of MT, translation memory, customisation by the individual translator and so on. Translation-management software aims to cut out the agencies that have been acting as middlemen between clients and an army of freelance translators. Jack Welde, the founder of Smartling, an industry favourite, says that in future translation customers will choose how much human intervention is needed for a translation. A quick automated one will do for low-stakes content with a short life, but the most important content will still require a fully hand-crafted and edited version. Noting that MT has both determined boosters and committed detractors, Mr Welde says he is neither: “If you take a dogmatic stance, you’re not optimised for the needs of the customer.”

Translation software will go on getting better. Not only will engineers keep tweaking their statistical models and neural networks, but users themselves will make improvements to their own systems. For example, a small but much-admired startup, Lilt, uses phrase-based MT as the basis for a translation, but an easy-to-use interface allows the translator to correct and improve the MT system’s output. Every time this is done, the corrections are fed back into the translation engine, which learns and improves in real time. Users can build several different memories—a medical one, a financial one and so on—which will help with future translations in that specialist field.

TAUS, an industry group, recently issued a report on the state of the translation industry saying that “in the past few years the translation industry has burst with new tools, platforms and solutions.” Last year Jaap van der Meer, TAUS’s founder and director, wrote a provocative blogpost entitled “The Future Does Not Need Translators”, arguing that the quality of MT will keep improving, and that for many applications less-than-perfect translation will be good enough.

The “translator” of the future is likely to be more like a quality-control expert, deciding which texts need the most attention to detail and editing the output of MT software. That may be necessary because computers, no matter how sophisticated they have become, cannot yet truly grasp what a text means.

Meaning and machine intelligence: What are you talking about?
Machines cannot conduct proper conversations with humans because they do not understand the world

IN “BLACK MIRROR”, a British science-fiction satire series set in a dystopian near future, a young woman loses her boyfriend in a car accident. A friend offers to help her deal with her grief. The dead man was a keen social-media user, and his archived accounts can be used to recreate his personality. Before long she is messaging with a facsimile, then speaking to one. As the system learns to mimic him ever better, he becomes increasingly real.

This is not quite as bizarre as it sounds. Computers today can already produce an eerie echo of human language if fed with the appropriate material. What they cannot yet do is have true conversations. Truly robust interaction between man and machine would require a broad understanding of the world. In the absence of that, computers are not able to talk about a wide range of topics, follow long conversations or handle surprises.

Machines trained to do a narrow range of tasks, though, can perform surprisingly well. The most obvious examples are the digital assistants created by the technology giants. Users can ask them questions in a variety of natural ways: “What’s the temperature in London?” “How’s the weather outside?” “Is it going to be cold today?” The assistants know a few things about users, such as where they live and who their family are, so they can be personal, too: “How’s my commute looking?” “Text my wife I’ll be home in 15 minutes.”
And they get better with time. Apple’s Siri receives 2bn requests per week, which (after being anonymised) are used for further teaching. For example, Apple says Siri knows every possible way that users ask about a sports score. She also has a delightful answer for children who ask about Father Christmas. Microsoft learned from some of its previous natural-language platforms that about 10% of human interactions were “chitchat”, from “tell me a joke” to “who’s your daddy?”, and used such chat to teach its digital assistant, Cortana.

The writing team for Cortana includes two playwrights, a poet, a screenwriter and a novelist. Google hired writers from Pixar, an animated-film studio, and The Onion, a satirical newspaper, to make its new Google Assistant funnier. No wonder people often thank their digital helpers for a job well done. The assistants’ replies range from “My pleasure, as always” to “You don’t need to thank me.”
Good at grammar

How do natural-language platforms know what people want? They not only recognise the words a person uses, but break down speech for both grammar and meaning. Grammar parsing is relatively advanced; it is the domain of the well-established field of “natural-language processing”. But meaning comes under the heading of “natural-language understanding”, which is far harder.

First, parsing. Most people are not very good at analysing the syntax of sentences, but computers have become quite adept at it, even though most sentences are ambiguous in ways humans are rarely aware of. Take a sign on a public fountain that says, “This is not drinking water.” Humans understand it to mean that the water (“this”) is not a certain kind of water (“drinking water”). But a computer might just as easily parse it to say that “this” (the fountain) is not at present doing something (“drinking water”).

As sentences get longer, the number of grammatically possible but nonsensical options multiplies exponentially. How can a machine parser know which is the right one? It helps for it to know that some combinations of words are more common than others: the phrase “drinking water” is widely used, so parsers trained on large volumes of English will rate those two words as likely to be joined in a noun phrase. And some structures are more common than others: “noun verb noun noun” may be much more common than “noun noun verb noun”. A machine parser can compute the overall probability of all combinations and pick the likeliest.
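
NLTK makes it easy to see this ranking in action. The toy grammar below covers only the fountain sign, and its rule probabilities are invented (a real parser estimates them from a large human-parsed treebank), but the Viterbi parser genuinely returns the reading with the highest overall probability, which here is the sensible noun-phrase reading of "drinking water".

    import nltk

    # Rule probabilities are invented; they must sum to 1 for each left-hand side.
    grammar = nltk.PCFG.fromstring("""
        S    -> NP VP        [1.0]
        NP   -> Pron         [0.4]
        NP   -> Ger N        [0.5]
        NP   -> N            [0.1]
        VP   -> Cop Neg NP   [0.6]
        VP   -> Cop Neg VG   [0.4]
        VG   -> Ger NP       [1.0]
        Pron -> 'this'       [1.0]
        Cop  -> 'is'         [1.0]
        Neg  -> 'not'        [1.0]
        Ger  -> 'drinking'   [1.0]
        N    -> 'water'      [1.0]
    """)

    parser = nltk.ViterbiParser(grammar)
    for tree in parser.parse("this is not drinking water".split()):
        print(tree.prob())    # probability of the winning parse
        tree.pretty_print()   # the "drinking water" noun-phrase reading wins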

A “lexicalised” parser might do even better. Take the Groucho Marx joke, “One morning I shot an elephant in my pyjamas. How he got in my pyjamas, I’ll never know.” The first sentence is ambiguous (which makes the joke)—grammatically both “I” and “an elephant” can attach to the prepositional phrase “in my pyjamas”. But a lexicalised parser would recognise that “I [verb phrase] in my pyjamas” is far more common than “elephant in my pyjamas”, and so assign that parse a higher probability.

But meaning is harder to pin down than syntax. “The boy kicked the ball” and “The ball was kicked by the boy” have the same meaning but a different structure. “Time flies like an arrow” can mean either that time flies in the way that an arrow flies, or that insects called “time flies” are fond of an arrow.

“Who plays Thor in ‘Thor’?” Your correspondent could not remember the beefy Australian who played the eponymous Norse god in the Marvel superhero film. But when he asked his iPhone, Siri came up with an unexpected reply: “I don’t see any movies matching ‘Thor’ playing in Thor, IA, US, today.” Thor, Iowa, with a population of 184, was thousands of miles away, and “Thor”, the film, has been out of cinemas for years. Siri parsed the question perfectly properly, but the reply was absurd, violating the rules of what linguists call pragmatics: the shared knowledge and understanding that people use to make sense of the often messy human language they hear. “Can you reach the salt?” is not a request for information but for salt. Natural-language systems have to be manually programmed to handle such requests as humans expect them, and not literally.

Multiple choice
Shared information is also built up over the course of a conversation, which is why digital assistants can struggle with twists and turns in conversations. Tell an assistant, “I’d like to go to an Italian restaurant with my wife,” and it might suggest a restaurant. But then ask, “is it close to her office?”, and the assistant must grasp the meanings of “it” (the restaurant) and “her” (the wife), which it will find surprisingly tricky. Nuance, the language-technology firm, which provides natural-language platforms to many other companies, is working on a “concierge” that can handle this type of challenge, but it is still a prototype.
Such a concierge must also offer only restaurants that are open. Linking requests to common sense (knowing that no one wants to be sent to a closed restaurant), as well as a knowledge of the real world (knowing which restaurants are closed), is one of the most difficult challenges for language technologies.

Common sense, an old observation goes, is uncommon enough in humans. Programming it into computers is harder still. Fernando Pereira of Google points out why. Automated speech recognition and machine translation have something in common: there are huge stores of data (recordings and transcripts for speech recognition, parallel corpora for translation) that can be used to train machines. But there are no training data for common sense.

Brain scan: Terry Winograd
The Winograd Schema tests computers’ “understanding” of the real world

THE Turing Test was conceived as a way to judge whether true artificial intelligence has been achieved. If a computer can fool humans into thinking it is human, there is no reason, say its fans, to say the machine is not truly intelligent.
Few giants in computing stand with Turing in fame, but one has given his name to a similar challenge: Terry Winograd, a computer scientist at Stanford. In his doctoral dissertation Mr Winograd posed a riddle for computers: “The city councilmen refused the demonstrators a permit because they feared violence. Who feared violence?”

It is a perfect illustration of a well-recognised point: many things that are easy for humans are crushingly difficult for computers. Mr Winograd went into AI research in the 1960s and 1970s and developed an early natural-language program called SHRDLU that could take commands and answer questions about a group of shapes it could manipulate: “Find a block which is taller than the one you are holding and put it into the box.” This work brought a jolt of optimism to the AI crowd, but Mr Winograd later fell out with them, devoting himself not to making machines intelligent but to making them better at helping human beings. (These camps are sharply divided by philosophy and academic pride.) He taught Larry Page at Stanford, and after Mr Page went on to co-found Google, Mr Winograd became a guest researcher at the company, helping to build Gmail.

In 2011 Hector Levesque of the University of Toronto became annoyed by systems that “passed” the Turing Test by joking and avoiding direct answers. He later asked to borrow Mr Winograd’s name and the format of his dissertation’s puzzle to pose a more genuine test of machine “understanding”: the Winograd Schema. The answers to its battery of questions were obvious to humans but would require computers to have some reasoning ability and some knowledge of the real world. The first official Winograd Schema Challenge was held this year, with a $25,000 prize offered by Nuance, the language-software company, for a program that could answer more than 90% of the questions correctly. The best of them got just 58% right.
Though officially retired, Mr Winograd continues writing and researching. One of his students is working on an application for Google Glass, a computer with a display mounted on eyeglasses. The app would help people with autism by reading the facial expressions of conversation partners and giving the wearer information about their emotional state. It would allow him to integrate linguistic and non-linguistic information in a way that people with autism find difficult, as do computers.

Asked to trick some of the latest digital assistants, like Siri and Alexa, he asks them things like “Where can I find a nightclub my Methodist uncle would like?”, which requires knowledge about both nightclubs (which such systems have) and Methodist uncles (which they don’t). When he tried “Where did I leave my glasses?”, one of them came up with a link to a book of that name. None offered the obvious answer: “How would I know?”

Knowledge of the real world is another matter. AI has helped data-rich companies such as America’s West-Coast tech giants organise much of the world’s information into interactive databases such as Google’s Knowledge Graph. Some of the content of that appears in a box to the right of a Google page of search results for a famous figure or thing. It knows that Jacob Bernoulli studied at the University of Basel (as did other people, linked to Bernoulli through this node in the Graph) and wrote “On the Law of Large Numbers” (which it knows is a book).

Organising information this way is not difficult for a company with lots of data and good AI capabilities, but linking information to language is hard. Google touts its assistant’s ability to answer questions like “Who was president when the Rangers won the World Series?” But Mr Pereira concedes that this was the result of explicit training. Another such complex query—“What was the population of London when Samuel Johnson wrote his dictionary?”—would flummox the assistant, even though the Graph knows about things like the historical population of London and the date of Johnson’s dictionary. IBM’s Watson system, which in 2011 beat two human champions at the quiz show “Jeopardy!”, succeeded mainly by generating huge numbers of candidate answers from key words and ranking them by probability, not by a human-like understanding of the question.
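To see why such compound questions need composition rather than a single lookup, consider this toy sketch: the answer requires chaining two facts, first resolving the event to a year and then the year to an office-holder. The miniature "knowledge graph" below is hand-built, and the Rangers entry is hypothetical; the point is only the two-step structure.

```python
# Toy illustration of answering a compound question by chaining two
# lookups over a tiny, hand-made "knowledge graph". The event entry is
# hypothetical and hard-coded purely for illustration.

events = {"Rangers win World Series": 2011}          # event -> year (hypothetical)
presidents = [("Barack Obama", 2009, 2017),
              ("George W. Bush", 2001, 2009)]        # (name, start, end)

def president_in(year):
    for name, start, end in presidents:
        if start <= year < end:
            return name
    return None

def answer(event):
    year = events.get(event)          # step 1: resolve the event to a year
    if year is None:
        return "I don't know."
    return president_in(year)         # step 2: resolve the year to a person

print(answer("Rangers win World Series"))
```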

Making real-world information computable is challenging, but it has inspired some creative approaches. Cortical.io, a Vienna-based startup, took hundreds of Wikipedia articles, cut them into thousands of small snippets of information and ran an “unsupervised” machine-learning algorithm over them, requiring the computer not to look for anything in particular but simply to find patterns. These patterns were then represented as a visual “semantic fingerprint” on a grid of 128×128 pixels, with clumps of pixels in similar places representing semantic similarity. This method can be used to disambiguate words with multiple meanings: the fingerprint of “organ” shares features with both “liver” and “piano” (because the word occurs with both in different parts of the training data). This might allow a natural-language system to distinguish between pianos and church organs on the one hand, and livers and other internal organs on the other.
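A rough sketch of the fingerprint idea (illustrative only; in the real system the grid positions are learned from the Wikipedia snippets, not assigned by hand as here): represent each term as a sparse set of active cells on the 128×128 grid and measure semantic similarity as the overlap of those sets.

```python
# Illustrative sketch of "semantic fingerprints": each term is a sparse
# set of active cells on a 128x128 grid, and semantic similarity is the
# size of the overlap. The specific cell positions below are made up;
# in the real system they emerge from unsupervised training on text.

fingerprints = {
    "piano":  {101, 102, 230, 231, 232, 900},
    "liver":  {4000, 4001, 4002, 7100, 7101, 900},
    # "organ" is ambiguous, so its fingerprint shares cells with both senses.
    "organ":  {101, 230, 4000, 4001, 7100, 900},
}

def overlap(a, b):
    return len(fingerprints[a] & fingerprints[b])

print(overlap("organ", "piano"))   # cells shared with the musical sense
print(overlap("organ", "liver"))   # cells shared with the anatomical sense
# Given a context word ("church" vs. "transplant"), a system could pick
# the sense whose cells overlap the context's fingerprint most strongly.
```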
Proper conversation between humans and machines can be seen as a series of linked challenges: speech recognition, speech synthesis, syntactic analysis, semantic analysis, pragmatic understanding, dialogue, common sense and real-world knowledge. Because all the technologies have to work together, the chain as a whole is only as strong as its weakest link, and the first few of these are far better developed than the last few.
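The weakest-link point can be put numerically: if the stages succeed or fail roughly independently, the chance that a whole exchange goes right is close to the product of the per-stage success rates, so the immature stages at the end of the chain dominate. The percentages in this back-of-the-envelope sketch are invented for illustration.

```python
# Back-of-the-envelope illustration of the "weakest link" effect.
# The per-stage success rates below are invented for illustration only.
stages = {
    "speech recognition": 0.95,
    "syntactic analysis": 0.95,
    "semantic analysis": 0.90,
    "pragmatics and dialogue": 0.80,
    "common sense and world knowledge": 0.60,
}

end_to_end = 1.0
for name, p in stages.items():
    end_to_end *= p

print(f"End-to-end success (if stages fail independently): {end_to_end:.0%}")
# ~39% -- dominated by the weakest links at the end of the chain.
```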

The hardest part is linking them together. Scientists do not know how the human brain draws on so many different kinds of knowledge at the same time. Programming a machine to replicate that feat is very much a work in progress.

Looking ahead: For my next trick
Talking machines are the new must-haves
IN “WALL-E”, an animated children’s film set in the future, all humankind lives on a spaceship after the Earth’s environment has been trashed. The humans are whisked around in intelligent hovering chairs; machines take care of their every need, so they are all morbidly obese. Even the ship’s captain is not really in charge; the actual pilot is an intelligent and malevolent talking robot, Auto, and like so many talking machines in science fiction, he eventually makes a grab for power.

Speech is quintessentially human, so it is hard to imagine machines that can truly speak conversationally as humans do without also imagining them to be superintelligent. And if they are superintelligent, with none of humans’ flaws, it is hard to imagine them not wanting to take over, not only for their good but for that of humanity. Even in a fairly benevolent future like “WALL-E’s”, where the machines are doing all the work, it is easy to see that the lack of anything challenging to do would be harmful to people.
Fortunately, the tasks that talking machines can take off humans’ to-do lists are the sort that many would happily give up. Machines are increasingly able to handle difficult but well-defined jobs. Soon all that their users will have to do is pipe up and ask them, using a naturally phrased voice command. Once upon a time, just one tinkerer in a given family knew how to work the computer or the video recorder. Then graphical interfaces (icons and a mouse) and touchscreens made such technology accessible to everyone. Frank Chen of Andreessen Horowitz, a venture-capital firm, sees natural-language interfaces between humans and machines as just another step in making information and services available to all. Silicon Valley, he says, is enjoying a golden age of AI technologies. Just as in the early 1990s companies were piling online and building websites without quite knowing why, now everyone is going for natural language. Yet, he adds, “we’re in 1994 for voice.”
1995 will soon come. This does not mean that people will communicate with their computers exclusively by talking to them. Websites did not make the telephone obsolete, and mobile devices did not make desktop computers obsolete. In the same way, people will continue to have a choice between voice and text when interacting with their machines.
Not all will choose voice. For example, in Japan yammering into a phone is not done in public, whether the interlocutor is a human or a digital assistant, so usage of Siri is low during business hours but high in the evening and at the weekend. For others, voice-enabled technology is an obvious boon. It allows dyslexic people to write without typing, and the very elderly may find it easier to talk than to type on a tiny keyboard. The very young, some of whom today learn to type before they can write, may soon learn to talk to machines before they can type.
Those with injuries or disabilities that make it hard for them to write will also benefit. Microsoft is justifiably proud of a new device that will allow people with amyotrophic lateral sclerosis (ALS), which immobilises nearly all of the body but leaves the mind working, to speak by using their eyes to pick letters on a screen. The critical part is predictive text, which improves as it gets used to a particular individual. An experienced user will be able to “speak” at around 15 words per minute.
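The predictive-text component can be sketched very simply: rank completions of a typed (or gazed-at) prefix by how often this particular user has produced them before. The sample history below is invented, and real systems use full language models rather than raw counts, but the personalisation principle is the same.

```python
# Minimal sketch of personalised word prediction: rank completions of a
# typed prefix by how often this particular user has written them.
# The sample history is invented; real systems use full language models.
from collections import Counter

user_history = ("meeting at noon please reschedule the meeting "
                "medication reminder please call the nurse").split()

counts = Counter(user_history)

def predict(prefix, k=3):
    candidates = [w for w in counts if w.startswith(prefix)]
    return sorted(candidates, key=counts.get, reverse=True)[:k]

print(predict("me"))   # ['meeting', 'medication']
print(predict("pl"))   # ['please']
```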
People may even turn to machines for company. Microsoft’s Xiaoice, a chatbot launched in China, learns to come up with the responses that will keep a conversation going longest. Nobody would think it was human, but it does make users open up in surprising ways. Jibo, a new “social robot”, is intended to tell children stories, help far-flung relatives stay in touch and the like.

Another group that may benefit from technology is smaller language communities. Networked computers can encourage a winner-take-all effect: if there is a lot of good software and content in English and Chinese, smaller languages become less valuable online. If they are really tiny, their very survival may be at stake. But Ross Perlin of the Endangered Languages Alliance notes that new software allows researchers to document small languages more quickly than ever. With enough data comes the possibility of developing resources—from speech recognition to interfaces with software—for smaller and smaller languages. The Silicon Valley giants already localise their services in dozens of languages; neural networks and other software allow new versions to be generated faster and more efficiently than ever.

There are two big downsides to the rise in natural-language technologies: the implications for privacy, and the disruption it will bring to many jobs.

Increasingly, devices are always listening. Digital assistants like Alexa, Cortana, Siri and Google Assistant are programmed to wait for a prompt, such as “Hey, Siri” or “OK, Google”, to activate them. But allowing always-on microphones into people’s pockets and homes amounts to a further erosion of traditional expectations of privacy. The same might be said for all the ways in which language software improves by training on a single user’s voice, vocabulary, written documents and habits.
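The privacy trade-off follows from the gating mechanism itself: to notice the wake word at all, the device has to listen continuously, and only utterances that begin with the prompt are meant to be processed further. A highly simplified sketch of that control flow, with text standing in for audio and invented function and variable names:

```python
# Highly simplified sketch of wake-word gating, using text in place of
# audio. Real devices run a small on-device detector continuously and
# only then stream audio onward for full recognition.

WAKE_WORDS = ("hey siri", "ok google", "alexa")

def handle_utterances(stream):
    for utterance in stream:                      # the microphone never stops
        lowered = utterance.lower()
        for wake in WAKE_WORDS:
            if lowered.startswith(wake):
                command = lowered[len(wake):].strip(" ,")
                print(f"-> sending to full recogniser: {command!r}")
                break
        # Everything else is (in principle) discarded on the device.

handle_utterances([
    "what a nice day",
    "Alexa, set a timer for 12 minutes",
    "OK Google, where did I leave my glasses",
])
```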

All the big companies’ location-based services—even the accelerometers in phones that detect small movements—are making ever-improving guesses about users’ wants and needs. The moment when a digital assistant surprises a user with “The chemist is nearby—do you want to buy more haemorrhoid cream, Steve?” could be the moment when many choose to reassess the trade-off between amazing new services and old-fashioned privacy. The tech companies can help by giving users more choice; the latest iPhone will not be activated when it is laid face down on a table. But hackers will inevitably find ways to get at some of these data.

The other big concern is for jobs. To the extent that they are routine, they face being automated away. A good example is customer support. When people contact a company for help, the initial encounter is usually highly scripted. A company employee will verify a customer’s identity and follow a decision-tree. Language technology is now mature enough to take on many of these tasks.

For a long transition period humans will still be needed, but the work they do will become less routine. Nuance, which sells lots of automated online and phone-based help systems, is bullish on voice biometrics (customers identifying themselves by saying “my voice is my password”). Using around 200 parameters for identifying a speaker, it is probably more secure than a fingerprint, says Brett Beranek, a senior manager at the company. It will also eliminate the tedium, for both customers and support workers, of going through multi-step identification procedures with PINs, passwords and security questions. When Barclays, a British bank, offered it to frequent users of customer-support services, 84% signed up within five months.
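The underlying idea of voice biometrics can be sketched as follows (illustrative only; the eight-dimensional vectors and the threshold are invented, whereas Nuance's system reportedly uses around 200 parameters extracted from the audio): reduce each utterance to a numeric "voiceprint" and accept the caller if the new print is close enough to the one stored at enrolment.

```python
# Illustrative sketch of voice biometrics: compare a caller's voiceprint
# vector against the one stored at enrolment. The vectors and threshold
# below are invented for illustration.
import math

def cosine(a, b):
    dot = sum(x * y for x, y in zip(a, b))
    return dot / (math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b)))

enrolled = [0.12, 0.80, 0.33, 0.05, 0.41, 0.27, 0.66, 0.09]   # stored voiceprint
attempt  = [0.10, 0.78, 0.35, 0.07, 0.40, 0.25, 0.70, 0.08]   # new call

THRESHOLD = 0.98
score = cosine(enrolled, attempt)
print(f"similarity {score:.3f} ->", "accept" if score >= THRESHOLD else "reject")
```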

Digital assistants on personal smartphones can get away with mistakes, but for some business applications the tolerance for error is close to zero, notes Nikita Ivanov. His company, Datalingvo, a Silicon Valley startup, answers questions phrased in natural language about a company’s business data. If a user wants to know which online ads resulted in the most sales in California last month, the software automatically translates his typed question into a database query. But behind the scenes a human working for Datalingvo vets the query to make sure it is correct. This is because the stakes are high: the technology is bound to make mistakes in its early days, and users could make decisions based on bad data.
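In spirit, though not in Datalingvo's actual code, the flow looks like this sketch: the question is machine-translated into a database query, and a human analyst vets the query before it touches the warehouse. The table and column names and the toy pattern-matcher are invented for illustration.

```python
# Sketch of natural-language-to-SQL with a human in the loop.
# Table/column names and the toy pattern-matcher are invented; a
# production system would use a trained semantic parser.

def draft_query(question):
    q = question.lower()
    if "ads" in q and "sales" in q and "california" in q:
        return ("SELECT ad_id, SUM(sales) AS total_sales "
                "FROM ad_performance "
                "WHERE state = 'CA' AND month = 'last_month' "
                "GROUP BY ad_id ORDER BY total_sales DESC;")
    return None

def vet_and_run(question, analyst_approves):
    sql = draft_query(question)
    if sql is None:
        print("Could not parse the question; route to a human analyst.")
        return
    print("Proposed SQL:\n ", sql)
    if analyst_approves:   # the human check before anything touches the warehouse
        print("-> approved; would now execute against the data warehouse")
    else:
        print("-> rejected; returned to the parser for correction")

vet_and_run("Which online ads resulted in the most sales in California last month?",
            analyst_approves=True)
```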
This process can work the other way round, too: rather than natural-language input producing data, data can produce language. Arria, a company based in London, makes software into which a spreadsheet full of data can be dragged and dropped, to be turned automatically into a written description of the contents, complete with trends. Matt Gould, the company’s chief strategy officer, likes to think that this will free chief financial officers from having to write up the same old routine analyses for the board, giving them time to develop more creative approaches.
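The reverse direction can be sketched just as simply (this is not Arria's engine; the thresholds and wording are invented): take a column of figures and emit a sentence describing the level and the trend.

```python
# Toy data-to-text sketch: turn a series of figures into a one-sentence
# narrative. Thresholds and phrasing are invented; a real NLG system
# handles many metrics, comparisons and whole-document structure.

def describe(metric, values):
    change = (values[-1] - values[0]) / values[0]
    if change > 0.05:
        trend = "rose"
    elif change < -0.05:
        trend = "fell"
    else:
        trend = "was broadly flat"
    return (f"{metric} {trend} over the period, from {values[0]:,} "
            f"to {values[-1]:,} ({change:+.1%}).")

print(describe("Quarterly revenue", [1_200_000, 1_250_000, 1_310_000, 1_400_000]))
# Quarterly revenue rose over the period, from 1,200,000 to 1,400,000 (+16.7%).
```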

Carl Benedikt Frey, an economist at Oxford University, has researched the likely effect of artificial intelligence on the labour market and concluded that the jobs most likely to remain immune include those requiring creativity and skill at complex social interactions. But not every human has those traits. Call centres may need fewer people as more routine work is handled by automated systems, but the trickier inquiries will still go to humans.

Much of this seems familiar. When Google search first became available, it turned up documents in seconds that would have taken a human operator hours, days or years to find. This removed much of the drudgery from being a researcher, librarian or journalist. More recently, young lawyers and paralegals have taken to using e-discovery. These innovations have not destroyed the professions concerned but merely reshaped them.

Machines that relieve drudgery and allow people to do more interesting jobs are a fine thing. In net terms they may even create extra jobs. But any big adjustment is most painful for those least able to adapt. Upheavals brought about by social changes—like the emancipation of women or the globalisation of labour markets—are already hard for some people to bear. When those changes are wrought by machines, they become even harder, and all the more so when those machines seem to behave more and more like humans. People already treat inanimate objects as if they were alive: who has never shouted at a computer in frustration? The more that machines talk, and the more that they seem to understand people, the more their users will be tempted to attribute human traits to them.

That raises questions about what it means to be human. Language is widely seen as humankind’s most distinguishing trait. AI researchers insist that their machines do not think like people, but if they can listen and talk like humans, what does that make them? As humans teach ever more capable machines to use language, the once-obvious line between them will blur.

Amazon’s Echo

See NYT article below on Amazon’s Echo (and note comparisons to other voice-command systems, such as Siri, Google Now, and Cortana):

“If it moves nimbly, keeping ahead of Apple and Google, Amazon could transform the Echo into something like a residential hub, the one device to control pretty much everything attached to your home.”

Functionality at the moment is:

– telling you the weather
– playing music you ask for
– adding stuff to your shopping list
– reordering items you frequently buy from Amazon
– giving you a heads-up about your nearing calendar appointments
– setting a kitchen timer
– answering the most basic of search queries

Amazon Echo, a.k.a. Alexa, Is a Personal Aide in Need of Schooling

The Amazon Echo, a wireless speaker and artificially intelligent personal assistant, can tell you the weather, play music and reorder items you frequently buy from Amazon, among other things.

THIS week, I asked a friend for help: “Alexa, can you write this review for me?”
“What’s your question?” Alexa responded.
“Can you write this review for me?”
“Review is spelled R-E-V-I-E-W.”
“Thanks,” I said. “That about sums it up.”

O.K., so Alexa isn’t perfect; far from it, in fact. If there is one glaring flaw in the Amazon Echo — the tiny wireless speaker and artificially intelligent personal assistant, a machine that one always addresses with the honorific “Alexa,” as if she’s some kind of digital monarch — it is that she is quite stupid.

If Alexa were a human assistant, you’d fire her, if not have her committed. “Sorry, I didn’t understand the question I heard” is her favorite response, though honestly she really doesn’t sound very sorry. She’ll resort to that line whether you ask her questions answered by a simple Google search (“How much does a cup of flour weigh?”) or something more complicated (“Alexa, what was that Martin Scorsese movie with Joe Pesci and Robert De Niro?”).

Other times, she is mind-numbingly literal. One night during the N.B.A. playoffs, I asked, “Alexa, what’s the score of the basketball game?” She proceeded to give me a two-minute, 18-part definition of the word “score” that included “a seduction culminating in sexual intercourse.” Not exactly what I was going for.

And yet, after spending three weeks testing the Echo, I really kind of love Alexa. She is just smart enough to be useful. And she keeps getting smarter. This week, after a long invitation-only preview period, Amazon began selling the Echo to the public. At $179.99, Alexa is more expensive than I’d like. (Subscribers to Amazon’s $99-a-year Prime subscription service could buy the Echo for only $100 during the preview.) But if you’re the type who enjoys taking chances on early, halfway useful tech novelties, the Echo is a fun thing to try.

And if you’re anything like me, after a week with the Echo, you may feel the device begin to change how you think about home tech. It will not seem far-fetched to expect that one day soon, you’ll have an all-knowing, all-seeing talking assistant to control your lights, thermostat, entertainment system and just about anything else at home. In Alexa, Amazon has created the perfect interface for controlling your home; if Amazon adds more intelligence, the Echo could become quite handy.

The Echo is a stout, plain-looking cylinder, about the height of a toaster, that you can park just about anywhere you have Wi-Fi access, though it seems most useful in the kitchen. It comes with a remote control that you don’t really need, because after a quick initial setup using your smartphone, you can control pretty much everything the Echo does with your voice. (The remote does have a microphone that allows you to speak to the Echo from far away.) From there, the Echo is terrifically easy to use — say “Alexa” and ask your question.

At the moment, there are only a handful of uses for the Echo. She’s great at telling you the weather, adding stuff to your shopping list, reordering items you frequently buy from Amazon, giving you a heads-up about your nearing calendar appointments, and answering the most basic of search queries.

She is pretty good at playing music, though her main source is Amazon Prime Music, a streaming service that is included with a Prime membership. Prime Music’s selection is dreadfully limited, though, and at the moment, the Echo can’t connect to many other streaming services. Thankfully, with a few quick voice commands, Alexa can connect to your phone like any other Bluetooth speaker. That way, she can take control of music you play from most apps, including streaming apps like Spotify. You can’t call out for specific songs this way, but you can say “Alexa, pause” or “Alexa, next” and she’ll control the tunes playing from your phone.

The Echo is also a very good kitchen timer. Put your cookies in the oven; yell out, “Alexa, set timer for 12 minutes”; and she’s off. It’s far easier than fumbling with buttons on the microwave, especially when you have your hands full.

But wait a minute — can’t you do pretty much all this on your phone, your smartwatch or many other devices? Yes, you can, but Alexa is right there. She’s always plugged in. She’s always listening, and she’s fast. It’s surprising how much of a difference a few milliseconds make in maintaining the illusion of intelligence in our machines. Because Alexa is far quicker to spring into action than Siri, Apple’s digital personal assistant, especially Siri on the Apple Watch, I found her to be much more pleasant to use, even if she is frequently wrong.

Amazon says that it plans to constantly improve the Echo. During the preview period, it added a host of new features, including the ability to control some smart-home devices, built-in integration with the Pandora streaming service, and traffic information for your morning commute. I’m hoping Amazon creates an open system — what developers call an API — for the Echo, which will allow a wide variety of online services and apps to connect to the device. If it moves nimbly, keeping ahead of Apple and Google, Amazon could transform the Echo into something like a residential hub, the one device to control pretty much everything attached to your home.

At the moment, that dream is far off. But dumb as she sometimes sounds, Alexa may be just smart enough to make it happen.

Voice Recognition 3.0

Can we just pass right over voice recognition 2.0? I am getting really tired and bored after 25+ years of incrementalism. For sure, the incrementalism has paid off – just look at cars, iPhones, Siri and a host of call-center applications (which are equally annoying and progressive).

So here is my version of voice recognition 3.0. It would have these characteristics:

Goodbye typing: it is so good that typing basically becomes a secondary application. Most emails, text messages and the like would be generated by voice, not by typing.
Grammar, stutter, and “uh” checks: it is so good that it comes to recognize my “uh”s, simply confirms that I want to delete them, and then does so automatically. Ditto bad grammar.
Personal vocabularies – it is so good that it builds a personal vocabulary for me over time and remembers it. For example, why can’t I expect my voice recognition system to learn that I live in Serenbe, and that I will use that word in my communications a lot? Why does the best system today choke and think I am saying “Saran Be” or “Seren Bee”? But that is the easy part. The tougher part is that it should know that my brother lives in West Townsend, and that the movie I saw last Saturday was called The Hunger Games – and all I should need to say is “the movie I saw last Saturday night” or “the people I met at lunch today”. In other words, it should draw regularly and liberally from my calendar and emails, as I allow it to. A minimal sketch of this idea follows this list.
Search: I am tired of talking about this one. We should be able to search using natural language – just by saying what is on my mind. “I vaguely recall a great article in one of the national magazines, I guess it was published in the last 24 months, and it spoke about genomic mapping getting cheaper and better. Can you find me that article?”
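As promised above, here is a minimal sketch of the personal-vocabulary idea: keep a user lexicon of proper nouns such as “Serenbe” and snap near-miss transcriptions back onto it. The lexicon, the sample transcript and the fuzzy-matching cutoff are invented for illustration; a real recogniser would bias its language model directly rather than post-correct the text.

```python
# Sketch of a personal vocabulary layer on top of a speech recogniser:
# proper nouns the recogniser keeps mangling ("Saran Be", "Seren Bee")
# are snapped back to entries in the user's own lexicon.
# The lexicon and example transcript are invented for illustration.
import difflib

personal_lexicon = ["Serenbe", "West Townsend", "The Hunger Games"]

def correct(transcript):
    words = transcript.split()
    out, i = [], 0
    while i < len(words):
        fixed = False
        for span in (3, 2, 1):                      # try multi-word spans first
            chunk = " ".join(words[i:i + span])
            match = difflib.get_close_matches(chunk, personal_lexicon, n=1, cutoff=0.7)
            if match:
                out.append(match[0])
                i += span
                fixed = True
                break
        if not fixed:
            out.append(words[i])
            i += 1
    return " ".join(out)

print(correct("I live in Seren Bee near Atlanta"))
# I live in Serenbe near Atlanta
```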

History of Computing


See also the 2001 and 2007 posts (this is a more extensive, more current update):


History of Computing


Eckert and Mauchley develop UNIVAC, the first commercially marketed computer. It is used to compile the results of the U.S. census, marking the first time this census is handled by a programmable computer.
In his paper “Computing Machinery and Intelligence,” Alan Turing presents the Turing Test, a means for determining whether a machine is intelligent.
Commercial color television is first broadcast in the United States, and transcontinental black-and-white television is available within the next year.
Claude Elwood Shannon writes “Programming a Computer for Playing Chess,” published in Philosophical Magazine.
Eckert and Mauchley build EDVAC, which is the first computer to use the stored-program concept. The work takes place at the Moore School at the University of Pennsylvania.
Paris is the host to a Cybernetics Congress.
UNIVAC, used by the Columbia Broadcasting System (CBS) television network, successfully predicts the election of Dwight D. Eisenhower as president of the United States.
Pocket-sized transistor radios are introduced.
Nathaniel Rochester designs the 701, IBM’s first production-line electronic digital computer. It is marketed for scientific use.
The chemical structure of the DNA molecule is discovered by James D. Watson and Francis H. C. Crick.
Philosophical Investigations by Ludwig Wittgenstein and Waiting for Godot, a play by Samuel Beckett, are published. Both documents are considered of major importance to modern existentialism.
Marvin Minsky and John McCarthy get summer jobs at Bell Laboratories.
William Shockley’s Semiconductor Laboratory is founded, thereby starting Silicon Valley.
The Remington Rand Corporation and Sperry Gyroscope join forces and become the Sperry-Rand Corporation. For a time, it presents serious competition to IBM.
IBM introduces its first transistor calculator. It uses 2,200 transistors instead of the 1,200 vacuum tubes that would otherwise be required for equivalent computing power.
A U.S. company develops the first design for a robotlike machine to be used in industry.
IPL-II, the first artificial intelligence language, is created by Allen Newell, J. C. Shaw, and Herbert Simon.
The new space program and the U.S. military recognize the importance of having computers with enough power to launch rockets to the moon and missiles through the stratosphere. Both organizations supply major funding for research.
The Logic Theorist, which uses recursive search techniques to solve mathematical problems, is developed by Allen Newell, J. C. Shaw, and Herbert Simon.
John Backus and a team at IBM invent FORTRAN, the first scientific computer-programming language.
Stanislaw Ulam develops MANIAC I, the first computer program to beat a human being in a chess game.
The first commercial watch to run on electric batteries is presented by the Lip company of France.
The term Artificial Intelligence is coined at a computer conference at Dartmouth College.
Kenneth H. Olsen founds Digital Equipment Corporation.
The General Problem Solver, which uses recursive search to solve problems, is developed by Allen Newell, J. C. Shaw, and Herbert Simon.
Noam Chomsky writes Syntactic Structures, in which he seriously considers the computation required for natural-language understanding. This is the first of the many important works that will earn him the title Father of Modern Linguistics.
An integrated circuit is created by Texas Instruments’ Jack St. Clair Kilby.
The Artificial Intelligence Laboratory at the Massachusetts Institute of Technology is founded by John McCarthy and Marvin Minsky.
Allen Newell and Herbert Simon make the prediction that a digital computer will be the world’s chess champion within ten years.
LISP, an early AI language, is developed by John McCarthy.
The Defense Advanced Research Projects Agency, which will fund important computer-science research for years in the future, is established.
Seymour Cray builds the Control Data Corporation 1604, the first fully transistorized supercomputer.
Jack Kilby and Robert Noyce each develop the computer chip independently. The computer chip leads to the development of much cheaper and smaller computers.
Arthur Samuel completes his study in machine learning. The project, a checkers-playing program, performs as well as some of the best players of the time.
Electronic document preparation increases the consumption of paper in the United States. This year, the nation will consume 7 million tons of paper. In 1986, 22 million tons will be used. American businesses alone will use 850 billion pages in 1981, 2.5 trillion pages in 1986, and 4 trillion in 1990.
COBOL, a computer language designed for business use, is developed by Grace Murray Hopper, who was also one of the first programmers of the Mark I.
Xerox introduces the first commercial copier.
Theodore Harold Maiman develops the first laser. It uses a ruby cylinder.
The recently established Defense Department’s Advanced Research Projects Agency substantially increases its funding for computer research.
There are now about six thousand computers in operation in the United States.
Neural-net machines are quite simple and incorporate a small number of neurons organized in only one or two layers. These models are shown to be limited in their capabilities.
The first time-sharing computer is developed at MIT.
President John F. Kennedy provides the support for space project Apollo and inspiration for important research in computer science when he addresses a joint session of Congress, saying, “I believe we should go to the moon.”
The world’s first industrial robots are marketed by a U.S. company.
Frank Rosenblatt defines the Perceptron in his Principles of Neurodynamics. Rosenblatt first introduced the Perceptron, a simple processing element for neural networks, at a conference in 1959.
The Artificial Intelligence Laboratory at Stanford University is founded by John McCarthy.
The influential Steps Toward Artificial Intelligence by Marvin Minsky is published.
Digital Equipment Corporation announces the PDP-8, which is the first successful minicomputer.
IBM introduces its 360 series, thereby further strengthening its leadership in the computer industry.
Thomas E. Kurtz and John G. Kemeny of Dartmouth College invent BASIC (Beginner’s All-purpose Symbolic Instruction Code).
Daniel Bobrow completes his doctoral work on Student, a natural-language program that can solve high-school-level word problems in algebra.
Gordon Moore’s prediction, made this year, says integrated circuits will double in complexity each year. This will become known as Moore’s Law and prove true (with later revisions) for decades to come.
Marshall McLuhan, via his Understanding Media, foresees the potential for electronic media, especially television, to create a “global village” in which “the medium is the message.”
The Robotics Institute at Carnegie Mellon University, which will become a leading research center for AI, is founded by Raj Reddy.
Hubert Dreyfus presents a set of philosophical arguments against the possibility of artificial intelligence in a RAND corporate memo entitled “Alchemy and Artificial Intelligence.”
Herbert Simon predicts that by 1985 “machines will be capable of doing any work a man can do.”
The Amateur Computer Society, possibly the first personal computer club, is founded by Stephen B. Gray. The Amateur Computer Society Newsletter is one of the first magazines about computers.
The first internal pacemaker is developed by Medtronics. It uses integrated circuits.
Gordon Moore and Robert Noyce found Intel (Integrated Electronics) Corporation.
The idea of a computer that can see, speak, hear, and think sparks imaginations when HAL is presented in the film 2001: A Space Odyssey, by Arthur C. Clarke and Stanley Kubrick.
Marvin Minsky and Seymour Papert present the limitation of single-layer neural nets in their book Perceptrons. The book’s pivotal theorem shows that a Perceptron is unable to determine if a line drawing is fully connected. The book essentially halts funding for neural-net research.
The GNP, on a per capita basis and in constant 1958 dollars, is $3,500, or more than six times as much as a century before.
The floppy disc is introduced for storing data in computers.
c. 1970: Researchers at the Xerox Palo Alto Research Center (PARC) develop the first personal computer, called Alto. PARC’s Alto pioneers the use of bit-mapped graphics, windows, icons, and mouse pointing devices.
Terry Winograd completes his landmark thesis on SHRDLU, a natural-language system that exhibits diverse intelligent behavior in the small world of children’s blocks. SHRDLU is criticized, however, for its lack of generality.
The Intel 4004, the first microprocessor, is introduced by Intel.
The first pocket calculator is introduced. It can add, subtract, multiply, and divide.
Continuing his criticism of the capabilities of AI, Hubert Dreyfus publishes What Computers Can’t Do, in which he argues that symbol manipulation cannot be the basis of human intelligence.
Stanley H. Cohen and Herbert W. Boyer show that DNA strands can be cut, joined, and then reproduced by inserting them into the bacterium Escherichia coli. This work creates the foundation for genetic engineering.
Creative Computing starts publication. It is the first magazine for home computer hobbyists.
The 8-bit 8080, which is the first general-purpose microprocessor, is announced by Intel.
Sales of microcomputers in the United States reach more than five thousand, and the first personal computer, the Altair 8800, is introduced. It has 256 bytes of memory.
BYTE, the first widely distributed computer magazine, is published.
Gordon Moore revises his observation on the doubling rate of transistors on an integrated circuit from twelve months to twenty-four months.
Kurzweil Computer Products introduces the Kurzweil Reading Machine (KRM), the first print-to-speech reading machine for the blind. Based on the first omni-font (any font) optical character recognition (OCR) technology, the KRM scans and reads aloud any printed materials (books, magazines, typed documents).
Stephen G. Wozniak and Steven P. Jobs found Apple Computer Corporation.
The concept of true-to-life robots with convincing human emotions is imaginatively portrayed in the film Star Wars.
For the first time, a telephone company conducts large-scale experiments with fiber optics in a telephone system.
The Apple II, the first personal computer to be sold in assembled form and the first with color graphics capability, is introduced and successfully marketed. (JCR buys first Apple II at KO in 1978.)
Speak & Spell, a computerized learning aid for young children, is introduced by Texas Instruments. This is the first product that electronically duplicates the human vocal tract on a chip.
In a landmark study by nine researchers published in the Journal of the American Medical Association, the performance of the computer program MYCIN is compared with that of doctors in diagnosing ten test cases of meningitis. MYCIN does at least as well as the medical experts. The potential of expert systems in medicine becomes widely recognized.
Dan Bricklin and Bob Frankston establish the personal computer as a serious business tool when they develop VisiCalc, the first electronic spreadsheet.
AI industry revenue is a few million dollars this year.
As neuron models are becoming potentially more sophisticated, the neural network paradigm begins to make a comeback, and networks with multiple layers are commonly used.
Xerox introduces the Star Computer, thus launching the concept of Desktop Publishing. Apple’s Laserwriter, available in 1985, will further increase the viability of this inexpensive and efficient way for writers and artists to create their own finished documents.
IBM introduces its Personal Computer (PC).
The prototype of the Bubble Jet printer is presented by Canon.
Compact disc players are marketed for the first time.
Mitch Kapor presents Lotus 1-2-3, an enormously popular spreadsheet program.
Fax machines are fast becoming a necessity in the business world.
The Musical Instrument Digital Interface (MIDI) is presented in Los Angeles at the first North American Music Manufacturers show.
Six million personal computers are sold in the United States.
The Apple Macintosh introduces the “desktop metaphor,” pioneered at Xerox, including bit-mapped graphics, icons, and the mouse.
William Gibson uses the term cyberspace in his book Neuromancer.
The Kurzweil 250 (K250) synthesizer, considered to be the first electronic instrument to successfully emulate the sounds of acoustic instruments, is introduced to the market.
Marvin Minsky publishes The Society of Mind, in which he presents a theory of the mind where intelligence is seen to be the result of proper organization of a hierarchy of minds with simple mechanisms at the lowest level of the hierarchy.
MIT’s Media Laboratory is founded by Jerome Wiesner and Nicholas Negroponte. The lab is dedicated to researching possible applications and interactions of computer science, sociology, and artificial intelligence in the context of media technology.
There are 116 million jobs in the United States, compared to 12 million in 1870. In the same period, the number of those employed has grown from 31 percent to 48 percent, and the per capita GNP in constant dollars has increased by 600 percent. These trends show no signs of abating.
Electronic keyboards account for 55.2 percent of the American musical keyboard market, up from 9.5 percent in 1980.
Life expectancy is about 74 years in the United States. Only 3 percent of the American workforce is involved in the production of food. Fully 76 percent of American adults have high-school diplomas, and 7.3 million U.S. students are enrolled in college.
NYSE stocks have their greatest single-day loss due, in part, to computerized trading.
Current speech systems can provide any one of the following: a large vocabulary, continuous speech recognition, or speaker independence.
Robotic-vision systems are now a $300 million industry and will grow to $800 million by 1990.
Computer memory today costs only one hundred millionth of what it did in 1950.
Marvin Minsky and Seymour Papert publish a revised edition of Perceptrons in which they discuss recent developments in neural network machinery for intelligence.
In the United States, 4,700,000 microcomputers, 120,000 minicomputers, and 11,500 mainframes are sold this year.
W. Daniel Hillis’s Connection Machine is capable of 65,536 computations at the same time.
Notebook computers are replacing the bigger laptops in popularity.
Intel introduces the 16-megahertz (MHz) 80386SX, 2.5 MIPS microprocessor.
Nautilus, the first CD-ROM magazine, is published.
The development of HyperText Markup Language by researcher Tim Berners-Lee and its release by CERN, the high-energy physics laboratory in Geneva, Switzerland, leads to the conception of the World Wide Web.
Cell phones and e-mail are increasing in popularity as business and personal communication tools.
The first double-speed CD-ROM drive becomes available from NEC.
The first personal digital assistant (PDA), a hand-held computer, is introduced at the Consumer Electronics Show in Chicago. The developer is Apple Computer.
The Pentium 32-bit microprocessor is launched by Intel. This chip has 3.1 million transistors.
The World Wide Web emerges.
America Online now has more than 1 million subscribers.
Scanners and CD-ROMs are becoming widely used.
Digital Equipment Corporation introduces a 300-MHz version of the Alpha AXP processor that executes 1 billion instructions per second.
Compaq Computer and NEC Computer Systems ship hand-held computers running Windows CE.
NEC Electronics ships the R4101 processor for personal digital assistants. It includes a touch-screen interface.
Deep Blue defeats Garry Kasparov, the world chess champion, in a regulation tournament.
Dragon Systems introduces Naturally Speaking, the first continuous-speech dictation software product.
Video phones are being used in business settings.
Face-recognition systems are beginning to be used in payroll check-cashing machines.
The Dictation Division of Lernout & Hauspie Speech Products (formerly Kurzweil Applied Intelligence) introduces Voice Xpress Plus, the first continuous-speech-recognition program with the ability to understand natural-language commands.
Routine business transactions over the phone are beginning to be conducted between a human customer and an automated system that engages in a verbal dialogue with the customer (e.g., United Airlines reservations).
Investment funds are emerging that use evolutionary algorithms and neural nets to make investment decisions (e.g., Advanced Investment Technologies).
The World Wide Web is ubiquitous. It is routine for high-school students and local grocery stores to have web sites.
Automated personalities, which appear as animated faces that speak with realistic mouth movements and facial expressions, are working in laboratories. These personalities respond to the spoken statements and facial expressions of their human users. They are being developed to be used in future user interfaces for products and services, as personalized research and business assistants, and to conduct transactions.
Microvision’s Virtual Retina Display (VRD) projects images directly onto the user’s retinas. Although expensive, consumer versions are projected for 1999.
“Bluetooth” technology is being developed for “body” local area networks (LANs) and for wireless communication between personal computers and associated peripherals. Wireless communication is being developed for high-bandwidth connection to the Web.
Ray Kurzweil’s The Age of Spiritual Machines: When Computers Exceed Human Intelligence is published, available at your local bookstore!


A $1,000 personal computer can perform about a trillion calculations per second.
Personal computers with high-resolution visual displays come in a range of sizes, from those small enough to be embedded in clothing and jewelry up to the size of a thin book.
Cables are disappearing. Communication between components uses short-distance wireless technology. High-speed wireless communication provides access to the Web.
The majority of text is created using continuous speech recognition. Also ubiquitous are language user interfaces (LUIs).
Most routine business transactions (purchases, travel, reservations) take place between a human and a virtual personality. Often, the virtual personality includes an animated visual presence that looks like a human face.
Although traditional classroom organization is still common, intelligent courseware has emerged as a common means of learning.
Pocket-sized reading machines for the blind and visually impaired, “listening machines” (speech-to-text conversion) for the deaf, and computer-controlled orthotic devices for paraplegic individuals result in a growing perception that primary disabilities do not necessarily impart handicaps.
Translating telephones (speech-to-speech language translation) are commonly used for many language pairs.
Accelerating returns from the advance of computer technology have resulted in continued economic expansion. Price deflation, which had been a reality in the computer field during the twentieth century, is now occurring outside the computer field. The reason for this is that virtually all economic sectors are deeply affected by the accelerating improvement in the price performance of computing.
Human musicians routinely jam with cybernetic musicians.
Bioengineered treatments for cancer and heart disease have greatly reduced the mortality from these diseases.
The neo-Luddite movement is growing.

2019
A $1,000 computing device (in 1999 dollars) is now approximately equal to the computational ability of the human brain.
Computers are now largely invisible and are embedded everywhere -- in walls, tables, chairs, desks, clothing, jewelry, and bodies.
Three-dimensional virtual reality displays, embedded in glasses and contact lenses, as well as auditory "lenses," are used routinely as primary interfaces for communication with other persons, computers, the Web, and virtual reality.
Most interaction with computing is through gestures and two-way natural-language spoken communication.
Nanoengineered machines are beginning to be applied to manufacturing and process-control applications.
High-resolution, three-dimensional visual and auditory virtual reality and realistic all-encompassing tactile environments enable people to do virtually anything with anybody, regardless of physical proximity.
Paper books or documents are rarely used and most learning is conducted through intelligent, simulated software-based teachers.
Blind persons routinely use eyeglass-mounted reading-navigation systems. Deaf persons read what other people are saying through their lens displays. Paraplegic and some quadriplegic persons routinely walk and climb stairs through a combination of computer-controlled nerve stimulation and exoskeletal robotic devices.
The vast majority of transactions include a simulated person.
Automated driving systems are now installed in most roads.
People are beginning to have relationships with automated personalities and use them as companions, teachers, caretakers, and lovers.
Virtual artists, with their own reputations, are emerging in all of the arts.
There are widespread reports of computers passing the Turing Test, although these tests do not meet the criteria established by knowledgeable observers.

2029
A $1,000 (in 1999 dollars) unit of computation has the computing capacity of approximately 1,000 human brains.
Permanent or removable implants (similar to contact lenses) for the eyes as well as cochlear implants are now used to provide input and output between the human user and the worldwide computing network.
Direct neural pathways have been perfected for high-bandwidth connection to the human brain. A range of neural implants is becoming available to enhance visual and auditory perception and interpretation, memory, and reasoning.
Automated agents are now learning on their own, and significant knowledge is being created by machines with little or no human intervention.
Computers have read all available human- and machine-generated literature and multimedia material.
There is widespread use of all-encompassing visual, auditory, and tactile communication using direct neural connections, allowing virtual reality to take place without having to be in a "total touch enclosure."
The majority of communication does not involve a human. The majority of communication involving a human is between a human and a machine.
There is almost no human employment in production, agriculture, or transportation. Basic life needs are available for the vast majority of the human race.
There is a growing discussion about the legal rights of computers and what constitutes being "human."
Although computers routinely pass apparently valid forms of the Turing Test, controversy persists about whether or not machine intelligence equals human intelligence in all of its diversity.
Machines claim to be conscious. These claims are largely accepted.

2049
The common use of nanoproduced food, which has the correct nutritional composition and the same taste and texture of organically produced food, means that the availability of food is no longer affected by limited resources, bad crop weather, or spoilage.
Nanobot swarm projections are used to create visual-auditory-tactile projections of people and objects in real reality.

2072
Picoengineering (developing technology at the scale of picometers or trillionths of a meter) becomes practical.

By the year 2099
There is a strong trend toward a merger of human thinking with the world of machine intelligence that the human species initially created. There is no longer any clear distinction between humans and computers.
Most conscious entities do not have a permanent physical presence.
Machine-based intelligences derived from extended models of human intelligence claim to be human, although their brains are not based on carbon-based cellular processes, but rather electronic and photonic equivalents. Most of these intelligences are not tied to a specific computational processing unit. The number of software-based humans vastly exceeds those still using native neuron-cell-based computation.
Even among those human intelligences still using carbon-based neurons, there is ubiquitous use of neural-implant technology, which provides enormous augmentation of human perceptual and cognitive abilities. Humans who do not utilize such implants are unable to meaningfully participate in dialogues with those who do.
Because most information is published using standard assimilated knowledge protocols, information can be instantly understood. The goal of education, and of intelligent beings, is discovering new knowledge to learn.
Femtoengineering (engineering at the scale of femtometers or one thousandth of a trillionth of a meter) proposals are controversial.
Life expectancy is no longer a viable term in relation to intelligent beings.

Some many millenniums hence . . .
Intelligent beings consider the fate of the Universe.