New OpenAI model: GPT-4o

Last week, OpenAI announced GPT-4o, its new generative AI model, which accepts multiple types of media (text, audio, image, video) as prompts and can generate multiple types of media (but not video) in response, all from one interface. In practice, that means you can talk to GPT-4o, and it can see as well as hear.

Ron Martinez, who has a lifelong interest in both publishing and the expressive uses of emerging technologies (see InventionArts.ai and TheAiCAM.com), commented on a private listserv about the importance of the release: “While today’s announcements may at first seem to be a set of feature upgrades, taken together this new model opens up a range of applications that would require or benefit from a natural, very low latency conversation with a machine that can see and hear you, as well as respond to your words. It foreshadows a kind of digital-animism, wherein devices around us may be imbued with multi-sensory perception and a form of reasoning, along with a voice and the ability to show us graphically what they wish to convey. In my view, this is the emergence of a generative framework from which a wide range of inventions and new or redefined applications will erupt, as profound as mobile + social was some 15 years ago. Worth taking seriously, and seriously contemplating potential effects, both beneficial and troubling.”

Here’s what the New York Times says about the latest version of ChatGPT (gift link). Unfortunately, OpenAI and its CEO, Sam Altman, have shown worrying behavior when it comes to respect for creative people, as discussed by Casey Newton at Platformer.