
OpenAI launches Realtime API, Vision Fine-Tuning, Prompt Caching, and more

Bind AI: On October 1, 2024, OpenAI hosted its annual DevDay event. The company announced four API updates designed to expand developer capabilities: the Realtime API, vision fine-tuning, prompt caching, and model distillation. These additions improve the functionality and efficiency of OpenAI’s existing suite of tools, giving developers more features and customization options to build more responsive and intelligent systems.

Let’s review these four OpenAI DevDay announcements to see what they’re about and what they mean for you—the developer.

OpenAI Realtime API


One of the most noteworthy announcements was the launch of the Realtime API, an event-based API that communicates over a WebSocket. This feature allows developers to build multimodal applications that interact with users in real time, facilitating a more dynamic and engaging experience. OpenAI also mentioned that its Chat Completions API will soon support audio input and output. This update will cater to use cases that do not require the real-time low-latency benefits of the Realtime API. With this update, developers can flexibly input text or audio into GPT-4o and receive a response in their preferred format, whether it is text, audio, or both.
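To make the event model concrete, here is a minimal sketch of opening a Realtime API session from Python and requesting a text response. The WebSocket endpoint, headers, and event names follow OpenAI’s announcement-day documentation, so verify them against the current docs before relying on this.

```python
# Minimal sketch of a Realtime API session over a raw WebSocket.
import asyncio
import json
import os

import websockets  # pip install websockets

# Endpoint and headers as described at launch; the model name may change.
URL = "wss://api.openai.com/v1/realtime?model=gpt-4o-realtime-preview"
HEADERS = {
    "Authorization": f"Bearer {os.environ['OPENAI_API_KEY']}",
    "OpenAI-Beta": "realtime=v1",
}

async def main() -> None:
    # Newer websockets releases name this parameter additional_headers.
    async with websockets.connect(URL, extra_headers=HEADERS) as ws:
        # Ask the model for a text-only response; every client->server
        # and server->client message is a JSON event.
        await ws.send(json.dumps({
            "type": "response.create",
            "response": {
                "modalities": ["text"],
                "instructions": "Greet the user in one sentence.",
            },
        }))
        # Stream server events until the response finishes.
        async for raw in ws:
            event = json.loads(raw)
            print(event["type"])
            if event["type"] == "response.done":
                break

asyncio.run(main())
```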

Health and fitness apps, for example, can use this API to create personalized coaching experiences. One case highlighted during the event was Healthify, which uses the Realtime API to enable natural conversations between users and its AI coach, Ria, while still letting human dietitians step in when necessary so users receive support tailored to their individual needs.

Vision Fine-Tuning

Another OpenAI DevDay announcement is the addition of vision fine-tuning for the GPT-4o model. This lets developers improve how well the model understands images along with text, boosting its ability to handle visual recognition and analysis tasks. With this fine-tuning, developers can build applications that need advanced image processing skills.

(Example of a speed limit sign tagged correctly by a vision fine-tuned GPT-4o model after being tagged incorrectly by the GPT-4o base model. | Courtesy: OpenAI)
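Training data for vision fine-tuning follows the chat fine-tuning JSONL format, with image URLs mixed into the user message content. Here is a minimal sketch of one training example; the sign image URL and labels are placeholders.

```python
# Minimal sketch of one vision fine-tuning training example. Each line of
# the uploaded JSONL file is one JSON object in this chat format.
import json

example = {
    "messages": [
        {"role": "system", "content": "You tag traffic signs in images."},
        {
            "role": "user",
            "content": [
                {"type": "text", "text": "What does this sign show?"},
                {
                    "type": "image_url",
                    "image_url": {"url": "https://example.com/sign.jpg"},
                },
            ],
        },
        {"role": "assistant", "content": "Speed limit: 50 km/h."},
    ]
}

with open("train.jsonl", "w") as f:
    f.write(json.dumps(example) + "\n")

# The file is then uploaded with purpose="fine-tune" and used to create a
# fine-tuning job on a vision-capable GPT-4o snapshot.
```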

For example, Grab, a leading deliveries and ride-hailing company in Southeast Asia, used vision fine-tuning to enhance its mapping service, GrabMap. Grab trained GPT-4o with street-level imagery collected from its drivers to improve localization of traffic signs and counting of lane dividers, resulting in a 20% increase in lane count accuracy. This advancement not only automates mapping processes but also enhances operational efficiency.

Similarly, companies like Automat have used vision fine-tuning to improve their robotic process automation (RPA) agents. Training the model on screenshots of user interfaces boosted its success rate at identifying UI elements from 16.6% to 61.67%, highlighting the practical impact of this technique.

OpenAI Prompt Caching

OpenAI also announced prompt caching, a feature designed to optimize API usage by reducing costs and latency for developers. Anthropic previously introduced a similar capability for Claude; OpenAI’s version automatically applies discounts to inputs the model has recently processed. When developers reuse prompts or contexts across multiple API calls, as is common in long conversations or iterative coding tasks, they receive a 50% discount on those cached input tokens. Prompt caching is applied automatically to various models, including GPT-4o and its mini version.

(OpenAI prompt caching pricing table. | Courtesy: OpenAI)

The system caches prompts longer than 1,024 tokens, allowing developers to reuse previously computed prompt prefixes at a reduced cost. This feature simplifies operations while promoting more effective resource use in AI applications.
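Because caching keys off the prompt prefix, the practical guideline is to put stable content (instructions, reference material) first and the varying content last. Below is a minimal sketch using the OpenAI Python SDK; the usage field for cached tokens follows the announcement-day docs, so treat it as an assumption to verify.

```python
# Minimal sketch: keep the long, stable context at the front of the
# prompt so repeated calls can reuse the cached prefix (prompts longer
# than 1,024 tokens qualify).
from openai import OpenAI  # pip install openai

client = OpenAI()  # reads OPENAI_API_KEY from the environment

LONG_CONTEXT = "..."  # stand-in for >1,024 tokens of stable instructions

for question in ["What is step 1?", "What is step 2?"]:
    response = client.chat.completions.create(
        model="gpt-4o",
        messages=[
            {"role": "system", "content": LONG_CONTEXT},  # stable prefix
            {"role": "user", "content": question},        # varying suffix
        ],
    )
    # cached_tokens reports how much of the prompt was served from cache.
    details = response.usage.prompt_tokens_details
    print(question, "->", details.cached_tokens, "cached prompt tokens")
```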

Model Distillation

The final major announcement was the introduction of model distillation, which allows developers to create cost-efficient models by fine-tuning smaller versions using outputs from larger models like GPT-4o. This process simplifies what has traditionally been a complex and error-prone task by providing an integrated workflow within OpenAI’s platform. With model distillation, developers can generate datasets from real-world examples using outputs from advanced models. This capability enables them to train smaller models—such as GPT-4o mini—while maintaining high performance at a lower cost. The new workflow includes features like stored completions for capturing input-output pairs and evaluation tools for assessing model performance. 
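Here is a minimal sketch of that workflow in the OpenAI Python SDK: capture the teacher model’s completions with stored completions, then fine-tune the smaller student model on a dataset exported from them. The `store` and `metadata` parameters follow OpenAI’s announcement; the JSONL filename is a placeholder for data you would export from the stored completions dashboard.

```python
# Minimal sketch of the distillation workflow: store the teacher model's
# completions, then fine-tune a smaller student model on them.
from openai import OpenAI

client = OpenAI()

# 1) Capture teacher outputs; metadata tags make them easy to filter
#    later in the stored completions view.
teacher = client.chat.completions.create(
    model="gpt-4o",
    store=True,
    metadata={"use_case": "distillation-demo"},
    messages=[{"role": "user", "content": "Summarize our returns policy."}],
)
print(teacher.choices[0].message.content)

# 2) Upload the exported dataset and fine-tune the smaller model on it.
training_file = client.files.create(
    file=open("distillation_data.jsonl", "rb"),
    purpose="fine-tune",
)
job = client.fine_tuning.jobs.create(
    model="gpt-4o-mini-2024-07-18",
    training_file=training_file.id,
)
print("fine-tuning job:", job.id)
```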

For instance, businesses can now distill their models iteratively by capturing data from their production environments and refining their models based on real-world interactions. This approach not only reduces operational costs but also enhances model accuracy over time.

Final Thoughts

OpenAI’s DevDay showcased a commitment to empowering developers with innovative tools that enhance AI capabilities across various domains. The Realtime API promises more interactive applications; vision fine-tuning opens new avenues for image analysis; prompt caching optimizes resource usage; and model distillation simplifies the creation of cost-effective AI solutions.

These advancements reflect OpenAI’s ongoing efforts to make powerful AI technologies accessible and efficient for developers worldwide. As these features roll out, they should drive significant improvements in how AI is integrated into everyday applications. For other recent news and LLM analysis, check out our Bind AI blog.