The buzz around generative AI has intensified in the wake of ChatGPT, which grew to over 100 million users within two months of its launch. But ChatGPT is just the tip of the generative AI iceberg.
Generative AI's history actually goes back nine decades. Though innovations and developments abound, its commercial progress has been relatively slow — until recently. The combination of larger labeled data sets, faster computers and new ways of automatically encoding unlabeled data has hastened generative AI's development over the past five years. The last year alone has seen strikingly humanlike chatbots, dozens of new services for generating images from descriptions, and the adaptation of large language models (LLMs) to virtually every aspect of business.
Many of today's developments build on earlier advances in computational linguistics and natural language processing. Likewise, early work on procedural content generation led to content generation in games, and parametric design work set the stage for generative industrial design.
90 years and counting

Many key milestones dot the landscape of generative AI's development and innovation.
1957 Linguist Noam Chomsky published Syntactic Structures, which described grammatical rules for parsing and generating natural language sentences. The rule-based approach it laid out later underpinned techniques such as syntactic parsing and grammar checking.
1963 Computer science professor Ivan Sutherland introduced Sketchpad, an interactive graphics program that let users create and manipulate drawings directly on screen and that is widely regarded as a forerunner of computer-aided design. In 1968, Sutherland and fellow professor David Evans started Evans & Sutherland. Students of theirs went on to help start Pixar, Adobe and Silicon Graphics.
1964 Mathematician and architect Christopher Alexander published Notes on the Synthesis of Form, which spelled out principles for automating design that later influenced the parametric and generative design of products. In 1977, he authored A Pattern Language, which was influential in architecture and inspired new approaches to software development.
1966 MIT professor Joseph Weizenbaum created the first chatbot, Eliza, which simulated conversations with a psychotherapist. MIT News reported in 2008 that Weizenbaum "grew skeptical of AI" and "was shocked to discover that many users were taking his program seriously and were opening their hearts to it."

The Automatic Language Processing Advisory Committee (ALPAC) issued a report concluding that machine translation and computational linguistics were not living up to their promise, which led to cuts in research funding for both technologies over the next 20 years.

Mathematician Leonard E. Baum introduced probabilistic hidden Markov models, which were later used in speech recognition, analyzing proteins and generating responses.
1968 Computer science professor Terry Winograd created SHRDLU, an early AI program that could understand natural language instructions and use them to manipulate and reason about a simulated world of blocks.
1969 William A. Woods introduced the augmented transition network, a graph-theoretic structure for parsing natural language into a form that computers could process. He used the approach to build LUNAR, one of the first natural language question answering systems, which answered questions about the Apollo 11 moon rocks for NASA's Manned Spacecraft Center.
1970 Yale computer science and psychology professor Roger Schank, co-founder of the Cognitive Science Society, developed conceptual dependency theory to mathematically describe the processes involved in natural language understanding and reasoning.
1978 Don Worth created the dungeon crawler Beneath Apple Manor for the Apple II while working as a programmer at UCLA. An early precursor of roguelike games, it used procedural content generation to programmatically create a rich game world that could run on the limited computer hardware of the time.
1980 Michael Toy and Glenn Wichman developed the Unix-based game Rogue, which used procedural content generation to dynamically generate new game levels. Toy co-founded the company A.I. Design to port the game to the PC a few years later. The game inspired subsequent interest in using procedural content generation in the gaming industry to generate levels, characters, textures and other elements.
1985 Computer scientist and philosopher Judea Pearl introduced Bayesian networks, which provided statistical techniques for representing uncertainty and reasoning about cause and effect. The approach later informed methods for generating content in a specific style, tone or length.
1986 Michael Irwin Jordan laid the foundation for the modern use of recurrent neural networks (RNNs) with the publication of “Serial order: a parallel distributed processing approach.” His innovation used backpropagation to reduce error, which opened the door for further research and the widespread adoption of RNNs for processing language a few years later.
1988 Software provider PTC launched Pro/Engineer, the first application that allowed designers to quickly generate new designs by adjusting parameters and constraints in a controlled fashion. The solid modeling software, now called PTC Creo, helps companies like Caterpillar and John Deere develop industrial equipment faster.
1989 Yann LeCun and colleagues at Bell Labs demonstrated how convolutional neural networks (CNNs) trained with backpropagation could recognize images such as handwritten digits. Later work with Léon Bottou, Yoshua Bengio and Patrick Haffner produced LeNet-5, an early implementation of the technique for accurately identifying handwritten numbers. Although it took time, improvements in computer hardware and labeled data sets eventually allowed the approach to scale up, most notably with the ImageNet database and the AlexNet CNN architecture in 2012.
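To make the approach concrete, here is a minimal sketch of a small convolutional network for classifying handwritten digits, written in PyTorch. It is not LeCun's original LeNet; the layer sizes and the dummy training batch are illustrative assumptions.

```python
import torch
import torch.nn as nn

class SmallConvNet(nn.Module):
    """A tiny CNN in the spirit of LeNet: convolution and pooling layers extract
    local image features, then fully connected layers classify them."""
    def __init__(self, num_classes: int = 10):
        super().__init__()
        self.features = nn.Sequential(
            nn.Conv2d(1, 6, kernel_size=5),   # 28x28 grayscale digit -> 6 feature maps of 24x24
            nn.ReLU(),
            nn.MaxPool2d(2),                  # downsample to 12x12
            nn.Conv2d(6, 16, kernel_size=5),  # -> 16 feature maps of 8x8
            nn.ReLU(),
            nn.MaxPool2d(2),                  # downsample to 4x4
        )
        self.classifier = nn.Sequential(
            nn.Flatten(),
            nn.Linear(16 * 4 * 4, 120),
            nn.ReLU(),
            nn.Linear(120, num_classes),
        )

    def forward(self, x):
        return self.classifier(self.features(x))

# One training step on a dummy batch of 28x28 "digit" images.
model = SmallConvNet()
optimizer = torch.optim.SGD(model.parameters(), lr=0.01)
images, labels = torch.randn(32, 1, 28, 28), torch.randint(0, 10, (32,))
loss = nn.functional.cross_entropy(model(images), labels)
loss.backward()
optimizer.step()
```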
1990 A team of researchers at Bell Communications Research, the University of Chicago and the University of Western Ontario published the paper "Indexing by Latent Semantic Analysis." The technique provided a method to identify the semantic relationships between words in a sample of training text, paving the way for later techniques for representing word meaning, such as word2vec and BERT (Bidirectional Encoder Representations from Transformers).

1997 Sepp Hochreiter and Jürgen Schmidhuber introduced the Long Short-Term Memory (LSTM) architecture, which helped overcome some of the problems with recurrent neural networks (RNNs). LSTM gave RNNs a form of longer-term memory and helped spur research into tools for analyzing longer text sequences.
2000 Yoshua Bengio, Rejean Ducharme, Pascal Vincent and Christian Jauvin at the University of Montreal published "A Neural Probabilistic Language Model," which proposed modeling language with feed-forward neural networks. The paper led to further research into techniques for automatically encoding words into vectors that represent their meaning and context. It also demonstrated how backpropagation could be used to train neural network language models.
2006 Computer scientist Fei-Fei Li began work on the ImageNet database, which laid the foundation for visual object recognition. The database planted the seeds for later advances in recognizing objects with AlexNet and, later still, generating them.

IBM Watson originated with the initial goal of beating a human on the iconic quiz show Jeopardy! In 2011, the question-answering computer system defeated the show's all-time (human) champion Ken Jennings.
2011 Apple released Siri, a voice-powered personal assistant that can generate responses and take actions in response to voice requests.
2012 Alex Krizhevsky designed the AlexNet CNN architecture, pioneering a new way of training neural networks that takes advantage of GPU advances. In that year's ImageNet Large Scale Visual Recognition Challenge, AlexNet recognized images with an error rate more than 10.8 percentage points lower than the runner-up. It inspired research into scaling deep learning algorithms in parallel on GPUs.
2013 Google researcher Tomas Mikolov and colleagues introduced word2vec to identify semantic relationships between words automatically. This technique made it easier to transform raw text into vectors that deep learning algorithms could process.
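As a rough illustration of the technique, here is how word2vec-style embeddings can be trained with the open source Gensim library. The toy corpus and parameter values are illustrative assumptions, not anything from the original research.

```python
from gensim.models import Word2Vec

# A toy corpus: each sentence is a list of tokens. Real models train on billions of words.
sentences = [
    ["king", "rules", "the", "kingdom"],
    ["queen", "rules", "the", "kingdom"],
    ["man", "walks", "in", "the", "city"],
    ["woman", "walks", "in", "the", "city"],
]

# sg=1 selects the skip-gram variant: predict surrounding words from the center word.
model = Word2Vec(sentences, vector_size=50, window=2, min_count=1, sg=1, epochs=200)

# Each word is now a dense vector; words that appear in similar contexts get similar vectors.
print(model.wv["king"][:5])            # first few dimensions of the "king" vector
print(model.wv.most_similar("king"))   # words whose vectors are closest to "king"
```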
2014 Research scientist Ian Goodfellow developed generative adversarial networks (GANs), which pit two neural networks against each other to generate increasingly realistic content: one network generates new content while the other discriminates between real and generated data, and improvements in both lead to better quality content over time. (A simplified code sketch of this adversarial loop follows this entry.)

Diederik Kingma and Max Welling introduced variational autoencoders (VAEs) for generative modeling. VAEs learn a compressed representation of input data and can transform it back into the original format or into new variations; they are used to generate images, videos and text.
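Here is that simplified sketch of the adversarial loop on one-dimensional toy data, using PyTorch. The network sizes, learning rates and toy data distribution are assumptions chosen only for illustration.

```python
import torch
import torch.nn as nn

# Generator: turns random noise into fake samples.
G = nn.Sequential(nn.Linear(8, 32), nn.ReLU(), nn.Linear(32, 1))
# Discriminator: scores how likely a sample is to be real.
D = nn.Sequential(nn.Linear(1, 32), nn.ReLU(), nn.Linear(32, 1), nn.Sigmoid())

opt_g = torch.optim.Adam(G.parameters(), lr=1e-3)
opt_d = torch.optim.Adam(D.parameters(), lr=1e-3)
bce = nn.BCELoss()

for step in range(2000):
    real = torch.randn(64, 1) * 0.5 + 3.0   # "real" data: samples clustered near 3.0
    fake = G(torch.randn(64, 8))             # generated samples from random noise

    # Train the discriminator to label real samples as 1 and generated samples as 0.
    d_loss = bce(D(real), torch.ones(64, 1)) + bce(D(fake.detach()), torch.zeros(64, 1))
    opt_d.zero_grad(); d_loss.backward(); opt_d.step()

    # Train the generator to fool the discriminator into outputting 1.
    g_loss = bce(D(fake), torch.ones(64, 1))
    opt_g.zero_grad(); g_loss.backward(); opt_g.step()

# After training, generated samples should drift toward the real data's distribution.
print(G(torch.randn(5, 8)).detach().squeeze())
```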
2015 Autodesk began publishing research on Project Dreamcatcher, a generative design tool that uses algorithms to create new designs. Users can describe intended properties such as materials, size and weight.

Stanford researchers published work on diffusion models in the paper "Deep Unsupervised Learning using Nonequilibrium Thermodynamics." The technique provides a way to reverse the process of gradually adding noise to an image, and it is now used to synthesize pictures and videos, generate text and model language.
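Roughly speaking, diffusion training adds a known amount of noise to data and teaches a network to predict that noise, so the process can later be run in reverse. Below is a simplified sketch of this training objective in PyTorch; the toy denoiser, data and noise schedule are illustrative assumptions rather than the paper's actual setup.

```python
import torch
import torch.nn as nn

T = 1000                                          # number of diffusion steps
betas = torch.linspace(1e-4, 0.02, T)             # noise schedule
alpha_bar = torch.cumprod(1.0 - betas, dim=0)     # cumulative signal-retention factor

# Toy denoiser: given a noisy sample and its timestep, predict the noise that was added.
model = nn.Sequential(nn.Linear(2 + 1, 64), nn.ReLU(), nn.Linear(64, 2))
optimizer = torch.optim.Adam(model.parameters(), lr=1e-3)

for step in range(1000):
    x0 = torch.randn(128, 2) * 0.3 + 2.0          # "clean" toy data
    t = torch.randint(0, T, (128,))               # random timestep for each sample
    noise = torch.randn_like(x0)

    # Forward process: mix clean data with noise according to the schedule.
    a = alpha_bar[t].unsqueeze(1)
    x_t = a.sqrt() * x0 + (1 - a).sqrt() * noise

    # The model is trained to recover the injected noise from the noisy sample.
    pred = model(torch.cat([x_t, t.float().unsqueeze(1) / T], dim=1))
    loss = nn.functional.mse_loss(pred, noise)

    optimizer.zero_grad(); loss.backward(); optimizer.step()
```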
2016 Microsoft released the chatbot TAY (thinking about you), which responded to questions submitted via Twitter. Users soon began tweeting inflammatory concepts to the chatbot, which quickly generated racist and sexually charged messages in response. Microsoft shut it down after 16 hours.
2017 Google announced work on using AI to help design its TPU (Tensor Processing Unit) chips for deep learning workloads.

Google researchers developed the concept of transformers in the seminal paper "Attention is all you need." The paper inspired subsequent research into training LLMs on vast amounts of unlabeled text. (A brief code sketch of the attention mechanism appears after this entry.)

Siemens partnered with Frustum to integrate generative design capabilities into the Siemens NX product design tools, using AI to generate new design variations. PTC, a Siemens competitor, acquired Frustum the following year for its own generative design offering.

Autodesk debuted a commercial implementation of its Project Dreamcatcher research as Autodesk Generative Design.
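The core of the transformer described in that paper is scaled dot-product attention: each token's query vector is compared against every token's key vector to decide how much of each value vector to blend into its output. Here is a minimal single-head sketch in PyTorch, with no masking and illustrative sizes.

```python
import math
import torch

def scaled_dot_product_attention(Q, K, V):
    """Attention(Q, K, V) = softmax(Q K^T / sqrt(d_k)) V, as in "Attention is all you need"."""
    d_k = Q.size(-1)
    scores = Q @ K.transpose(-2, -1) / math.sqrt(d_k)   # similarity of every query to every key
    weights = torch.softmax(scores, dim=-1)             # each row sums to 1: how much to attend to each token
    return weights @ V                                   # weighted blend of the value vectors

# A toy "sentence" of 4 tokens, each embedded in 8 dimensions.
x = torch.randn(4, 8)
W_q, W_k, W_v = (torch.randn(8, 8) for _ in range(3))
out = scaled_dot_product_attention(x @ W_q, x @ W_k, x @ W_v)
print(out.shape)   # torch.Size([4, 8]): one context-aware vector per token
```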
2018 Google researchers implemented transformers in BERT, which was trained on more than 3.3 billion words and consisted of 110 million parameters. BERT could automatically learn the relationships between words in sentences, paragraphs and even books in order to predict the meaning of text. (Its masked-word training objective is sketched in the example after this entry.)

Google DeepMind researchers developed AlphaFold for predicting protein structures. The innovative technology laid the foundation for generative AI applications in medical research, drug development and chemistry.

OpenAI released GPT (Generative Pre-trained Transformer). With 117 million parameters and trained on a large corpus of books, GPT paved the way for subsequent LLMs in content generation, chatbots and language translation.
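BERT's masked language modeling objective can be tried directly with the open source Hugging Face Transformers library. The short example below is illustrative and uses that library's pipeline API with the publicly released bert-base-uncased weights, not Google's original training code.

```python
from transformers import pipeline

# Load a pretrained BERT model and ask it to fill in a masked word.
fill_mask = pipeline("fill-mask", model="bert-base-uncased")

# The model scores candidate words for the [MASK] position based on the surrounding context.
for prediction in fill_mask("Generative AI can [MASK] new images from text."):
    print(prediction["token_str"], round(prediction["score"], 3))
```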
2019 The Malaria No More charity and soccer star David Beckham used deepfake technology to translate his speech and facial movements into nine languages as part of an urgent appeal to end malaria worldwide.

A U.K.-based energy firm's CEO transferred €220,000 ($243,000) to a Hungarian bank account after fraudsters used an audio deepfake to impersonate his parent company's boss making an urgent request. The incident alerted the world to a new era of social engineering cyber attacks.

OpenAI released GPT-2 with 1.5 billion parameters. Trained on a data set of eight million webpages, GPT-2's objective was to predict the next word, given all the previous words within a text.
2020 OpenAI released GPT-3, then the largest neural network ever built, consisting of 175 billion parameters and requiring 800 gigabytes to store. Within the first nine months, OpenAI reported that more than 300 applications were using GPT-3 and thousands of developers were building on the platform.

Researchers at Google, the University of California, Berkeley and UC San Diego published "NeRF: Representing Scenes as Neural Radiance Fields for View Synthesis." The technique galvanized research and innovation in 3D content generation.

Microsoft researchers developed VIsual VOcabulary (VIVO) pretraining, which trains image captioning algorithms without captioned data. Tests indicated the approach could surpass many human captioners.
2021 Cerebras Systems used AI to help generate the design for the WSE-2, a single chip the size of a complete silicon wafer with more than 850,000 cores and 2.6 trillion transistors.

OpenAI introduced Dall-E, which can generate images from text prompts. The name is a combination of WALL-E, the fictional Pixar robot, and the artist Salvador Dali. Alongside it, OpenAI released CLIP (Contrastive Language-Image Pretraining), a model trained on image-caption pairs from the internet that scores how well a piece of text matches an image and was used to rank Dall-E's generated images.
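As a small sketch of how a CLIP model ranks candidate captions for an image, here is an example using the Hugging Face Transformers distribution of OpenAI's released CLIP weights; the image path and captions are placeholders.

```python
from PIL import Image
import torch
from transformers import CLIPModel, CLIPProcessor

model = CLIPModel.from_pretrained("openai/clip-vit-base-patch32")
processor = CLIPProcessor.from_pretrained("openai/clip-vit-base-patch32")

captions = ["a photo of a dog", "a photo of a cat", "a diagram of a neural network"]
image = Image.open("example.jpg")   # placeholder path to any local image

inputs = processor(text=captions, images=image, return_tensors="pt", padding=True)
with torch.no_grad():
    outputs = model(**inputs)

# logits_per_image holds image-text similarity scores; softmax turns them into a ranking.
probs = outputs.logits_per_image.softmax(dim=-1)[0]
for caption, p in zip(captions, probs):
    print(f"{p:.2f}  {caption}")
```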
2022 OpenAI released Dall-E 2, a smaller and more efficient image generator that uses a diffusion model to produce images and art from descriptions in natural language.

Nvidia released Instant NeRF (instant neural graphics primitives, or NGP) code for quickly transforming pictures into 3D scenes and content.

Google DeepMind released a paper on Gato, a general-purpose multimodal AI that can perform more than 600 tasks, including captioning images, generating robot instructions, playing video games and navigating environments.

Researchers from Runway, Stability AI and the CompVis group at LMU Munich released Stable Diffusion as open source code that can automatically generate image content from a text prompt. The technique combines an autoencoder, which compresses images into a smaller intermediate representation, with a diffusion model that operates on that representation, so images can be generated far more efficiently. (A short usage sketch appears after this entry.)

OpenAI released ChatGPT in November to provide a chat-based interface to its GPT-3.5 LLM. It attracted more than 100 million users within two months, representing the fastest ever consumer adoption of a service.
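For illustration, generating an image from a text prompt with the publicly released Stable Diffusion weights through Hugging Face's Diffusers library might look like the sketch below; the model ID, prompt and GPU assumption are illustrative.

```python
import torch
from diffusers import StableDiffusionPipeline

# Load one of the publicly released Stable Diffusion checkpoints (several versions exist).
pipe = StableDiffusionPipeline.from_pretrained(
    "runwayml/stable-diffusion-v1-5",
    torch_dtype=torch.float16,
)
pipe = pipe.to("cuda")   # assumes a CUDA-capable GPU is available

prompt = "an astronaut riding a horse in the style of a watercolor painting"
image = pipe(prompt).images[0]   # text prompt in, PIL image out
image.save("astronaut.png")
```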
2023 Getty Images sued Stability AI, claiming the company scraped Getty's content without consent to train Stable Diffusion, while a group of artists separately sued Stability AI, Midjourney and DeviantArt, among others, for copyright infringement.

Microsoft integrated a version of ChatGPT into its Bing search engine. Google quickly followed with plans to release the Bard chat service based on its LaMDA engine. And the controversy over detecting AI-generated content heated up.

OpenAI released the multimodal GPT-4 LLM, which can accept both text and image prompts.

A who's who of technology leaders, including Elon Musk, Steve Wozniak and thousands more signatories, called for a pause on the development of advanced AI systems "more powerful than GPT-4."
GPT’s reaction to this article:
As an AI language model, I don’t have personal opinions, but I can provide you with some insights about the article. The article provides a comprehensive overview of the history of generative AI, from its early milestones to the latest developments, such as ChatGPT and GPT-4. It highlights the key contributions of various researchers and their impact on the field of generative AI. Additionally, the article mentions some of the controversies and challenges associated with the development of advanced AI systems. Overall, the article provides a useful summary of the history and current state of generative AI.