Amazon recently unveiled its new Nova foundation models for the Amazon Bedrock platform. The announcement claims that the Nova family, across its variants, delivers frontier intelligence and industry-leading price performance. But how much of a challenge do these models pose to established benchmarks like OpenAI’s GPT-4o and o1 and Anthropic’s Claude 3.5 Sonnet? Let’s examine that.
This blog gives an overview of the Amazon Nova foundation models and compares them with the OpenAI and Claude model families on LLM performance, pricing, and benchmarks.
Overview of Nova Models
Amazon Nova foundation models come in numerous variants, designed to cater to various applications. Here’s a summary:
Amazon Nova Micro
A text-only model optimized for low-latency responses—capable of processing 210 tokens per second. It supports a context length of 128K tokens.
Amazon Nova Lite
A multimodal model that efficiently handles text, images, and videos, designed for rapid processing at low costs.
Amazon Nova Pro
This model serves as a versatile multimodal solution, offering an ideal balance of speed, accuracy, and affordability. It supports a context length of 300K tokens and is particularly effective in instruction-following tasks.
Amazon Nova Premier
Scheduled for a later release.
Here’s a detailed table for the technical assessment of these models:

| Feature | Amazon Nova Pro | Amazon Nova Lite | Amazon Nova Micro |
|---|---|---|---|
| Model ID | amazon.nova-pro-v1:0 | amazon.nova-lite-v1:0 | amazon.nova-micro-v1:0 |
| Inference Profile ID | us.amazon.nova-pro-v1:0 | us.amazon.nova-lite-v1:0 | us.amazon.nova-micro-v1:0 |
| Input Modalities | Text, Image, Video | Text, Image, Video | Text |
| Output Modalities | Text | Text | Text |
| Context Window | 300K tokens | 300K tokens | 128K tokens |
| Max Output Tokens | 5K | 5K | 5K |
| Supported Languages | 200+ | 200+ | 200+ |
| Regions | US East (N. Virginia) | US East (N. Virginia) | US East (N. Virginia) |
| Document Support | pdf, csv, doc, docx, xls, xlsx, html, txt, md | pdf, csv, doc, docx, xls, xlsx, html, txt, md | No |
| Converse API | Yes | Yes | Yes |
| InvokeModel API | Yes | Yes | Yes |
| Streaming | Yes | Yes | Yes |
| Batch Inference | Yes | Yes | Yes |
| Fine-Tuning | Yes | Yes | Yes |
| Provisioned Throughput | Yes | Yes | Yes |
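Since all three models support the Converse API, a minimal invocation sketch looks like the following. This is a hedged example rather than Amazon’s reference code: it assumes `boto3` is installed, AWS credentials are configured, and your account has been granted access to the Nova models in US East (N. Virginia). The inference profile ID is the one from the table above.

```python
# Minimal sketch of calling Nova Pro via the Bedrock Converse API.
NOVA_PRO_ID = "us.amazon.nova-pro-v1:0"

def build_converse_request(prompt: str, max_tokens: int = 512) -> dict:
    """Assemble the keyword arguments for bedrock-runtime's converse() call."""
    return {
        "modelId": NOVA_PRO_ID,
        "messages": [{"role": "user", "content": [{"text": prompt}]}],
        "inferenceConfig": {"maxTokens": max_tokens, "temperature": 0.7},
    }

def ask_nova(prompt: str) -> str:
    """Send a single-turn prompt to Nova Pro and return the reply text."""
    import boto3  # deferred so the request builder has no AWS dependency
    client = boto3.client("bedrock-runtime", region_name="us-east-1")
    response = client.converse(**build_converse_request(prompt))
    return response["output"]["message"]["content"][0]["text"]
```

Swapping `NOVA_PRO_ID` for `us.amazon.nova-lite-v1:0` or `us.amazon.nova-micro-v1:0` targets the other tiers without further changes.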
The Nova family also includes specialized models for creative generation:
- Amazon Nova Canvas: Focused on image generation with high customization capabilities.
- Amazon Nova Reel: Designed to produce high-quality video content from text and image prompts.
All models are integrated into Amazon Bedrock, allowing for seamless experimentation and fine-tuning with proprietary data.
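As an illustration of how the creative models are invoked, here is a sketch of a Nova Canvas text-to-image request. Treat the exact field names (`taskType`, `textToImageParams`, `imageGenerationConfig`) and the model ID as assumptions to verify against the current Bedrock documentation; the surrounding call structure is the standard `InvokeModel` pattern.

```python
import json

def build_canvas_request(prompt: str, width: int = 1024, height: int = 1024) -> dict:
    """Build an InvokeModel body for Nova Canvas text-to-image generation.

    Field names are assumptions based on Amazon's documented image-model
    request format; verify them against the current Bedrock docs.
    """
    return {
        "taskType": "TEXT_IMAGE",
        "textToImageParams": {"text": prompt},
        "imageGenerationConfig": {
            "numberOfImages": 1,
            "width": width,
            "height": height,
        },
    }

def generate_image(prompt: str) -> bytes:
    """Call Nova Canvas and return the decoded image bytes."""
    import base64
    import boto3  # deferred: only needed for the actual API call
    client = boto3.client("bedrock-runtime", region_name="us-east-1")
    response = client.invoke_model(
        modelId="amazon.nova-canvas-v1:0",
        body=json.dumps(build_canvas_request(prompt)),
    )
    payload = json.loads(response["body"].read())
    return base64.b64decode(payload["images"][0])
```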
Benchmark Results
Amazon has conducted extensive benchmarking of the Nova models across various industry-standard metrics. The results illustrate the competitive performance of these models:
- Nova Micro achieved superior performance across all applicable benchmarks compared to Meta’s Llama 3.1 (8B) and Google’s Gemini 1.5 Flash (8B), demonstrating its capability in low-latency applications.
- Nova Pro performed equal to or better than OpenAI’s GPT-4o on 17 out of 20 benchmarks and outperformed Google’s Gemini 1.5 Pro on 16 out of 21 benchmarks. Notably, it excelled in instruction-following tasks as measured by the Comprehensive RAG Benchmark (CRAG) and the Berkeley Function Calling Leaderboard.
- Multimodal Capabilities: The Nova models are designed to handle multimodal inputs effectively, with Nova Pro showing remarkable proficiency in integrating visual intelligence alongside text processing.
Amazon Nova Foundation Models vs OpenAI Models
When comparing Amazon’s flagship Nova Pro with OpenAI’s GPT-4o, 4o-mini, and o1 models, we can analyze several critical metrics. Here’s a detailed table showcasing the differences:
| Metric | GPT-4o | GPT-4o mini | OpenAI o1 | Amazon Nova Pro |
|---|---|---|---|---|
| Input Tokens ($/1M) | $2.50 | $0.15 | $15.00 | $0.80 |
| Output Tokens ($/1M) | $10.00 | $0.60 | $60.00 | $3.20 |
| Benchmark Performance | Baseline for Nova comparison | Higher quality (71) | Higher quality (85) | Equal or better on 17/20 |
| Throughput (tokens/sec) | ~100 | ~99.7 | ~32.3 | ~200 |
| Context Length | 128K tokens | 128K tokens | 128K tokens | 300K tokens |
From the table above, it’s clear that while GPT-4o and GPT-4o mini post impressive results on certain benchmarks, Amazon Nova Pro presents a compelling value proposition. It offers significantly lower costs for both input and output tokens, making it a more cost-effective choice for many applications. Additionally, Nova Pro generates output at nearly double the speed of GPT-4o, enabling faster response times and higher throughput.
OpenAI o1, though more expensive, excels in specific tasks like summarization, as its higher benchmark quality scores indicate. However, its lower throughput and higher cost may limit its suitability for latency- or budget-sensitive use cases.
In a practical evaluation of summarization tasks of GPT-4o and Amazon Nova Pro, here’s what we find:
- GPT-4o: For an input of 489 tokens and an output of 130 tokens, the total cost at list prices comes to approximately $0.00252.
- Amazon Nova Pro: For a similar task with 494 input tokens and 152 output tokens, the cost comes to around $0.00088, roughly a 65% cost reduction compared to GPT-4o.
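These figures follow directly from the per-million-token list prices quoted in the comparison table; a small helper makes the arithmetic explicit (the prices are hard-coded from the table above, not fetched from any API, so update them if list prices change):

```python
def request_cost(input_tokens: int, output_tokens: int,
                 input_price_per_m: float, output_price_per_m: float) -> float:
    """Cost in dollars of one request, given per-million-token prices."""
    return (input_tokens * input_price_per_m
            + output_tokens * output_price_per_m) / 1_000_000

# GPT-4o: $2.50 / $10.00 per million input/output tokens
gpt4o = request_cost(489, 130, 2.50, 10.00)    # ≈ $0.00252
# Nova Pro: $0.80 / $3.20 per million input/output tokens
nova_pro = request_cost(494, 152, 0.80, 3.20)  # ≈ $0.00088
savings = 1 - nova_pro / gpt4o                 # ≈ 65% cheaper
```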
When comparing the lower-tier Nova Lite and Nova Micro with OpenAI’s GPT-4o mini and GPT-3.5 Turbo, here’s a summary of what we found:
- Nova Micro, the text-only model, offers remarkable speed with a throughput of 210 tokens per second, outperforming GPT-4o mini’s throughput of approximately 85.2 tokens per second. This makes Nova Micro particularly suitable for applications requiring rapid responses, such as chatbots or real-time data processing.
- As for pricing, Nova Micro is significantly cheaper than GPT-4o mini. For input tokens, Nova Micro costs about $0.04 per million compared to GPT-4o mini’s $0.15 per million, making it roughly 3.75 times cheaper. Similarly, for output tokens, Nova Micro charges about $0.14 per million versus GPT-4o mini’s $0.60 per million.
- Nova Lite, while slightly more expensive than Nova Micro at $0.06 per million for input tokens and $0.24 per million for output tokens, still presents a cost-effective option compared to GPT-4o mini and GPT-3.5 Turbo.
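As a rough sanity check on what these throughput numbers mean in practice, here is a back-of-the-envelope estimate of generation time for a typical chatbot reply. The throughput figures are the ones quoted above and will vary with region, load, and prompt; network and queueing delays are deliberately ignored.

```python
def generation_seconds(output_tokens: int, tokens_per_second: float) -> float:
    """Rough time to stream a completion, ignoring network and queueing delays."""
    return output_tokens / tokens_per_second

# A 300-token chatbot reply at the throughputs quoted above:
nova_micro = generation_seconds(300, 210.0)  # ≈ 1.4 s
gpt4o_mini = generation_seconds(300, 85.2)   # ≈ 3.5 s
```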
Amazon Nova Foundation Models vs Claude 3.5 Sonnet Models
The comparison extends to Anthropic’s Claude model family, particularly Claude 3.5 Sonnet and Claude 3.5 Haiku. Here’s how they stack up against Amazon Nova Pro:
| Metric | Claude 3.5 Sonnet | Claude 3.5 Haiku | Amazon Nova Pro |
|---|---|---|---|
| Input Tokens ($/1M) | $3.00 | ~$2.40 | $0.80 |
| Output Tokens ($/1M) | $15.00 | ~$12.00 | $3.20 |
| Benchmark Performance | Equal or better on 9/20 | Equal on most benchmarks | Equal or better on 17/20 |
| Throughput (tokens/sec) | ~90 | ~85 | ~200 |
While Claude 3.5 Sonnet and Haiku excel in creative writing tasks such as poetry and storytelling, Amazon Nova Pro offers a compelling blend of cost-effectiveness, performance, and versatility.
- Cost-Effectiveness: Nova Pro stands out with significantly lower costs for both input and output tokens compared to Claude models, making it a more budget-friendly option for a wide range of applications.
- Performance: Nova Pro demonstrates superior throughput, generating output at roughly double the speed of the Claude models. This translates to faster response times and increased throughput, making it ideal for real-time applications. While the Claude models may outperform Nova Pro on specific creative writing benchmarks, Nova Pro’s strong performance across a range of tasks, including coding and reasoning, makes it a versatile choice.
- Real-world Example: To illustrate the cost difference, consider a practical example: at list prices, an input of 500 tokens and an output of 200 tokens on Claude 3.5 Sonnet costs approximately $0.0045. In contrast, Nova Pro costs around $0.0012 for a similar task with 520 input tokens and 250 output tokens, representing a roughly 73% cost reduction.
When comparing Amazon’s Nova Lite and Nova Micro with Anthropic’s Claude 3 Haiku and Claude 3 Opus, several distinctions in performance, cost, and capabilities become apparent. Nova Lite, a multimodal model, excels in processing text, images, and video, supporting a context length of 300K tokens. It has been shown to outperform Claude 3 Haiku in 10 out of 12 benchmarks, making it a strong contender for applications requiring diverse input types. In contrast, Claude 3 Haiku offers a smaller 200K-token context window and no video input, which may limit its effectiveness on larger or video-heavy workloads.
On the cost front, both Nova models offer significant savings compared to Claude’s offerings. Nova Micro is priced at approximately $0.000035 per 1,000 input tokens and $0.00014 per 1,000 output tokens, making it one of the most economical options available. In comparison, Claude 3 Haiku costs around $0.00025 per 1,000 input tokens and $0.00125 per 1,000 output tokens, while Claude 3 Opus is far more expensive at $0.015 per 1,000 input tokens and $0.075 per 1,000 output tokens. This substantial difference in pricing positions Amazon’s Nova models as attractive alternatives for businesses seeking high performance at a lower cost.
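Because the prices in this section are quoted per 1,000 tokens while the earlier tables use per-million rates, it helps to put them on a common footing. The snippet below simply rescales the per-1K figures quoted above and compares a sample workload; no external pricing data is involved.

```python
def per_million(price_per_1k: float) -> float:
    """Convert a per-1K-token price to a per-million-token price."""
    return price_per_1k * 1000

nova_micro_in  = per_million(0.000035)  # ≈ $0.035 per 1M input tokens
nova_micro_out = per_million(0.00014)   # ≈ $0.14 per 1M output tokens
opus_in        = per_million(0.015)     # $15 per 1M input tokens
opus_out       = per_million(0.075)     # $75 per 1M output tokens

# A workload of 1M input + 1M output tokens:
micro_cost = nova_micro_in + nova_micro_out  # ≈ $0.175
opus_cost  = opus_in + opus_out              # ≈ $90.00
```

At these list prices, the same workload costs several hundred times more on Claude 3 Opus than on Nova Micro, though the two models obviously target very different capability tiers.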
Summary
The introduction of Amazon’s Nova foundation models is a game-changer in the LLM landscape. These new models are well-equipped to meet the demands of modern AI applications, with the top-tier Nova Pro positioning itself as one of the stronger models on the market.
As organizations and individuals increasingly seek cost-effective solutions without compromising quality or speed, it’s great to have strong options. But choosing between the Nova, OpenAI, and Claude models will require hands-on testing. You can try Claude 3.5 Sonnet, GPT-4o, GPT-4o mini, Claude 3.5 Haiku, and many other models on Bind AI, which also offers advanced features for coding, web search, and integrations to keep your evaluations consistent.
Go ahead and select your model and give it a try.