Amazon Bedrock Pricing

Pricing overview

Amazon Bedrock is a fully managed service that offers a choice of high-performing foundation models (FMs) through a single API, along with a broad set of capabilities you need to build generative AI applications with security, privacy, and responsible AI.

With Amazon Bedrock, you will be charged for model inference and customization. You have a choice of two pricing plans for inference:

1. On-Demand and Batch: This mode allows you to use FMs on a pay-as-you-go basis without having to make any time-based term commitments.
2. Provisioned Throughput: This mode allows you to provision sufficient throughput to meet your application's performance requirements in exchange for a time-based term commitment.

Pricing Models

On-Demand and Batch

With the On-Demand mode, you only pay for what you use, with no time-based term commitments. For text-generation models, you are charged for every input token processed and every output token generated. For embeddings models, you are charged for every input token processed. A token comprises a few characters and is the basic unit of text that a model uses to interpret user input and prompts. For image-generation models, you are charged for every image generated.
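The token-based charge described above is simple arithmetic, sketched below. This is a minimal illustration rather than an official calculator: the function name `on_demand_cost` is hypothetical, and the sample prices are placeholders in the same per-1,000-token units used in the pricing tables on this page. Substitute the current price for your model and Region.

```python
def on_demand_cost(input_tokens: int, output_tokens: int,
                   price_in_per_1k: float, price_out_per_1k: float) -> float:
    """Estimate the On-Demand charge (USD) for a text-generation request.

    Prices are expressed per 1,000 tokens, as in the pricing tables.
    For an embeddings model, pass output_tokens=0 and price_out_per_1k=0.
    """
    input_cost = (input_tokens / 1000) * price_in_per_1k
    output_cost = (output_tokens / 1000) * price_out_per_1k
    return input_cost + output_cost


# Illustrative request: 2,000 input tokens and 500 output tokens at
# placeholder prices of $0.00025 (input) and $0.00125 (output) per 1,000 tokens.
cost = on_demand_cost(2_000, 500, 0.00025, 0.00125)
print(f"Estimated charge: ${cost:.6f}")
```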

Cross-region inference: On-Demand mode also supports cross-region inference for some models. It enables developers to seamlessly manage traffic bursts by utilizing compute across different AWS Regions and get higher throughput limits and enhanced resilience. There's no additional charge for using cross-region inference, and the price is calculated based on the Region in which you made the request (the source Region).

With Batch mode, you can provide a set of prompts as a single input file and receive responses as a single output file, allowing you to get simultaneous large-scale predictions. The responses are processed and stored in your Amazon S3 bucket so you can access them at a later time. Amazon Bedrock offers select foundation models (FMs) from leading AI providers like Anthropic, Meta, Mistral AI, and Amazon for batch inference at a 50% lower price compared to on-demand inference pricing. Please refer to model list here.
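As a rough sketch of the 50% batch discount mentioned above, the snippet below derives a batch price from an On-Demand price and totals a small batch job. The function names and the sample token counts are hypothetical, and the discount applies only to models that support batch inference.

```python
BATCH_DISCOUNT = 0.50  # from the text: batch is priced 50% below On-Demand


def batch_price(on_demand_price_per_1k: float) -> float:
    """Return the batch price per 1,000 tokens for a given On-Demand price."""
    return on_demand_price_per_1k * (1 - BATCH_DISCOUNT)


def batch_job_cost(records, price_in_per_1k: float, price_out_per_1k: float) -> float:
    """Estimate the cost (USD) of a batch job.

    `records` is a list of (input_tokens, output_tokens) pairs, one per prompt.
    The prices passed in are On-Demand prices; the discount is applied here.
    """
    total_in = sum(tokens_in for tokens_in, _ in records)
    total_out = sum(tokens_out for _, tokens_out in records)
    return ((total_in / 1000) * batch_price(price_in_per_1k)
            + (total_out / 1000) * batch_price(price_out_per_1k))


# Three illustrative prompts at On-Demand prices of $0.003 / $0.015 per 1,000 tokens.
records = [(1_200, 300), (800, 250), (2_000, 450)]
print(f"Estimated batch cost: ${batch_job_cost(records, 0.003, 0.015):.4f}")
```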

Latency Optimized (Public Preview)

Latency-optimized inference for foundation models in Amazon Bedrock delivers faster response times for models and helps improve responsiveness for your generative AI applications. You can use latency-optimized inference for Anthropic's Claude 3.5 Haiku model and Meta's Llama 3.1 405B and 70B models. As verified by Anthropic, with latency-optimized inference on Amazon Bedrock, Claude 3.5 Haiku runs faster on AWS than anywhere else. Additionally, with latency-optimized inference in Bedrock, Llama 3.1 405B and 70B run faster on AWS than on any other major cloud provider. Learn more here.

Provisioned Throughput

With the Provisioned Throughput mode, you can purchase model units for a specific base or custom model. The Provisioned Throughput mode is primarily designed for large, consistent inference workloads that need guaranteed throughput. Custom models can only be accessed using Provisioned Throughput. A model unit provides a certain throughput, which is measured by the maximum number of input or output tokens processed per minute. With Provisioned Throughput pricing, you are charged by the hour, and you have the flexibility to choose between 1-month or 6-month commitment terms.
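Because Provisioned Throughput is charged per model unit per hour, a term's cost can be estimated as shown in this minimal sketch. The function name is hypothetical, the 30-day month is an assumption made only for illustration, and the hourly price is a placeholder to be replaced with the rate for your model, Region, and commitment term.

```python
HOURS_PER_MONTH = 24 * 30  # assumption: a 30-day month, for illustration only


def provisioned_throughput_cost(hourly_price_per_model_unit: float,
                                model_units: int,
                                hours: float) -> float:
    """Estimate the Provisioned Throughput charge (USD).

    The charge accrues for every hour of the term for each model unit,
    regardless of how many tokens are actually processed.
    """
    return hourly_price_per_model_unit * model_units * hours


# Illustrative: 2 model units at a placeholder rate of $39.60 per hour for one month.
monthly = provisioned_throughput_cost(39.60, 2, HOURS_PER_MONTH)
print(f"Estimated monthly charge: ${monthly:,.2f}")
```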

Custom Model Import

Custom Model Import allows you to leverage your prior model customization investments within Amazon Bedrock and consume them in the same fully managed manner as Bedrock’s existing hosted foundation models. You can import custom weights for supported model architectures and serve the custom model using On-Demand mode. There is no charge to import a custom model to Bedrock. Once you import a model, you can access it on demand without needing to perform any control plane action. You are only charged for model inference, based on the number of copies of your custom model required to service your inference volume and the duration each model copy is active, billed in 5-minute windows. A model copy is a single instance of an imported model ready to serve inference requests. The price per model copy per minute depends on factors such as architecture, context length, AWS Region, and compute unit version (hardware generation), and is tiered by model copy size.

Marketplace models

Amazon Bedrock Marketplace allows you to discover, test, and use over 100 popular, emerging, and specialized foundation models in Bedrock. Amazon Bedrock Marketplace models are deployed to endpoints where you can select your desired number of instances and instance types as well as configure your auto-scaling policies to meet the demands of your workload. For proprietary models, you are charged the software price set by the model provider (per hour, billable in per second increments, or per request) and an infrastructure price based on the instance you select. You can see these prices prior to subscribing to the provider model and also from the model listing in AWS Marketplace. For publicly available models, you are charged only the infrastructure price based on the instance you select. Learn more here.

Customization and Optimization

Model Customization

With Amazon Bedrock, you can customize FMs with your data to deliver tailored responses for specific tasks and your business context. You can fine-tune models with labeled data or use continued pretraining with unlabeled data. For customization of a text-generation model, you are charged for the model training based on the total number of tokens processed by the model (number of tokens in the training data corpus x the number of epochs) and for model storage, charged per month per model. An epoch refers to one full pass through your training dataset during fine-tuning or continued pretraining. Inferences using customized models are charged under the Provisioned Throughput plan and require you to purchase Provisioned Throughput. One model unit is made available with no commitment term for inference on a customized model. You will be charged for the number of hours you use the first model unit for custom model inference. If you want to increase your throughput beyond one model unit, then you must purchase a 1-month or 6-month commitment term.

Model Distillation

With Amazon Bedrock Model Distillation you pay for what you use. Synthetic data generation is charged at the on-demand pricing of the selected teacher model. Fine-tuning of the student model is charged at model customization rates. Since a distilled model is a customized model, inferences using it are charged under the Provisioned Throughput plan and require you to purchase Provisioned Throughput.

Prompt Caching

With prompt caching on Amazon Bedrock, you can cache repeated context across API calls to reduce your costs and response latencies. Prompts often contain common context or prefixes such as long, multi-turn conversations, many-shot examples and detailed instructions that refine model behavior. Using existing Amazon Bedrock APIs, you can specify the prompt prefixes that you want to cache for five minutes in an AWS account-specific cache. During that time, any requests with matching prefixes receive a discount of up to 90% on cached tokens and a latency improvement of up to 85%. Prices and performance improvements vary by model and prompt length, but your caches are always isolated to your AWS account.
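To make the effect of caching concrete, here is a minimal sketch comparing input-token cost with and without a cached prefix. It rests on assumptions that are not stated in the text above: the first request writes the prefix at the cache-write price, every later request arrives while the cache entry is still valid and reads the prefix at the cache-read price, and the uncached remainder of each prompt is charged at the normal input price. The function names are hypothetical, and the sample prices are placeholders in the per-1,000-token units used by the pricing tables.

```python
def cached_input_cost(prefix_tokens: int, suffix_tokens: int, requests: int,
                      price_in: float, price_cache_write: float,
                      price_cache_read: float) -> float:
    """Input-token cost (USD) for `requests` calls that share one cached prefix.

    Assumption: request 1 writes the prefix to the cache; requests 2..N all
    hit the cache before it expires. Prices are per 1,000 input tokens.
    """
    if requests < 1:
        return 0.0
    suffix_cost = (suffix_tokens / 1000) * price_in
    first = (prefix_tokens / 1000) * price_cache_write + suffix_cost
    later = (requests - 1) * ((prefix_tokens / 1000) * price_cache_read + suffix_cost)
    return first + later


def uncached_input_cost(prefix_tokens: int, suffix_tokens: int, requests: int,
                        price_in: float) -> float:
    """Input-token cost (USD) when every request pays the full input price."""
    return requests * ((prefix_tokens + suffix_tokens) / 1000) * price_in


# Illustrative: a 10,000-token shared prefix, a 200-token unique suffix, 10 requests.
with_cache = cached_input_cost(10_000, 200, 10, 0.003, 0.00375, 0.0003)
without_cache = uncached_input_cost(10_000, 200, 10, 0.003)
print(f"with cache: ${with_cache:.4f}, without cache: ${without_cache:.4f}")
```

Note that under these assumptions a single request costs slightly more with caching than without, because the cache-write price is higher than the normal input price; the savings come from repeated reads.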

Tools

Guardrails

Amazon Bedrock Guardrails helps you to implement customized safeguards and responsible AI policies for your generative AI applications. It provides additional customizable safety protections on top of the native protections offered by FMs. It is the only responsible AI capability offered by a major cloud provider that helps enable customers to build and customize safety, privacy, and truthfulness protections for their generative AI applications in a single solution, and it works with all FMs in Amazon Bedrock, as well as fine-tuned models. Bedrock Guardrails can also be integrated with Amazon Bedrock Agents and Amazon Bedrock Knowledge Bases to build generative AI applications aligned with your responsible AI policies. Additionally, it offers an ApplyGuardrail API to help evaluate user inputs and model responses generated by any custom or third-party FM outside of Bedrock.

Knowledge Bases and Data Automation

Amazon Bedrock Knowledge Bases is a fully managed Retrieval-Augmented Generation (RAG) workflow that enables customers to create highly accurate, low-latency, secure, and custom generative AI applications by incorporating contextual information from their own data sources. It supports various data sources, including Amazon S3 and, in preview, Confluence, Salesforce, and SharePoint. It also offers document ingestion for streaming data. Bedrock Knowledge Bases converts unstructured data into embeddings, stores them in vector databases, and enables retrieval from diverse data stores. It also integrates with Amazon Kendra for managed retrieval and supports structured data retrieval using natural language to SQL.

Amazon Bedrock Data Automation transforms unstructured, multimodal content into structured data formats for use cases like intelligent document processing, video analysis, and RAG. Bedrock Data Automation can generate Standard Output content using predefined defaults which are modality specific, like scene-by-scene descriptions of videos, audio transcripts or automated document analysis. Customers can additionally create Custom Outputs by specifying their output requirements in Blueprints based on their own data schema that they can then easily load into an existing database or data warehouse. Through an integration with Knowledge Bases, Bedrock Data Automation can also be used to parse content for RAG applications, improving the accuracy and relevancy of results by including information embedded in both images and text.

Agents

Amazon Bedrock Agents offer you the ability to build and configure autonomous agents within your application. These agents securely connect to your company's data sources and augment user requests with the right information to generate accurate responses. You can create single-agent and multi-agent applications with just a few quick steps, accelerating the time it takes to build generative AI applications. These agents support code interpretation to dynamically generate and execute code, as well as return of control, which allows you to define an action schema and get control back whenever the agent invokes the action. Additionally, Amazon Bedrock Agents can retain memory across interactions, offering more personalized and seamless user experiences.

Flows

Amazon Bedrock Flows is a workflow authoring and execution feature of Bedrock for generative AI applications. It accelerates the creation, testing, and deployment of user-defined generative AI workflows through an intuitive visual builder and a set of APIs. It allows you to seamlessly link the latest foundation models, Prompts, Agents, Knowledge Bases, Guardrails, and AWS services (such as Amazon Lex, AWS Lambda, and Amazon S3) along with business logic to build generative AI workflows. You can easily test and version your workflows, and run them in a secure serverless environment through a visual interface or API without having to stand up your own infrastructure.

Evaluations

Model Evaluation: With Amazon Bedrock model evaluation you pay for what you use, with no minimum volume commitments on the number of prompts or responses.

For automatic (programmatic) evaluation, you only pay for the inference from your choice of model in the evaluation. The automatically-generated algorithmic scores are provided at no extra charge.

During the Public Preview for automatic (Model/LLM-as-a-judge) evaluation, you only pay for the inference from your choice of generator model and evaluator model. In an LLM-as-a-judge model evaluation job, the built-in metrics use system judge prompt templates unique to each metric and available judge model. These judge prompts are charged as part of your token usage and are available in the public AWS documentation for transparency.

For human-based evaluation where you bring your own work team, you are charged for the model inference in the evaluation, and a charge of $0.21 per completed human task. A human task is defined as an instance of a human worker submitting an evaluation of a single prompt and its associated inference responses in the human evaluation user interface. The price is the same whether you have one or two models in your evaluation job and also the same regardless of how many evaluation metrics and rating methods you include. The charges for the human tasks will appear under the Amazon SageMaker section in your AWS bill and are the same for all AWS Regions. There is no separate charge for the workforce, as the workforce is supplied by you.

For an evaluation managed by AWS, pricing is customized for your evaluation needs in a private engagement while working with the AWS expert evaluations team.

Amazon Bedrock Knowledge Bases Evaluation (RAG evaluation): With Amazon Bedrock Knowledge Bases Evaluation (RAG evaluation) you pay for what you use, with no minimum volume commitments on the number of prompts or responses. During the Public Preview, you only pay for the inference from your choice of generator model and evaluator model (the evaluation job uses an LLM-as-a-judge), as well as any charges incurred from using the Knowledge Base in the evaluation job according to Amazon Bedrock Knowledge Bases pricing. In a Knowledge Base Evaluation (RAG evaluation) job, the built-in metrics use system judge prompt templates unique to each metric and available judge model that will be charged as part of your token usage, and the judge prompts are available in the public AWS documentation for transparency. Some metrics involve doing judge model inference on retrieved context from your Knowledge Base or your ground truth answers in addition to the input prompt, which affects the costs associated with each metric - more information on each metric can be found in the public AWS documentation for evaluations.

Pricing details

Pricing is dependent on the modality, provider, and model. Please select the model provider to see detailed pricing.

AI21 Labs

On-Demand pricing

AI21 Labs models	Price per 1,000 input tokens	Price per 1,000 output tokens
Jamba 1.5 Large	$0.002	$0.008
Jamba 1.5 Mini	$0.0002	$0.0004
Jurassic-2 Mid	$0.0125	$0.0125
Jurassic-2 Ultra	$0.0188	$0.0188
Jamba-Instruct	$0.0005	$0.0007

Amazon

- Amazon Nova: Pricing for Understanding Models; Pricing for Creative Content Generation models
- Amazon Titan
- Other Amazon

Anthropic

On-Demand and Batch pricing

Region: US East (N. Virginia) and US West (Oregon)

Anthropic models	Price per 1,000 input tokens	Price per 1,000 output tokens	Price per 1,000 input tokens (batch)	Price per 1,000 output tokens (batch)	Price per 1,000 input tokens (cache write)	Price per 1,000 input tokens (cache read)
Claude 3.5 Sonnet**	$0.003	$0.015	$0.0015	$0.0075	$0.00375	$0.0003
Claude 3.5 Haiku	$0.0008	$0.004	$0.0005	$0.0025	$0.001	$0.00008
Claude 3 Opus*	$0.015	$0.075	$0.0075	$0.0375	NA	NA
Claude 3 Haiku	$0.00025	$0.00125	$0.000125	$0.000625	NA	NA
Claude 3 Sonnet	$0.003	$0.015	$0.0015	$0.0075	NA	NA
Claude 2.1	$0.008	$0.024	NA	NA	NA	NA
Claude 2.0	$0.008	$0.024	NA	NA	NA	NA
Claude Instant	$0.0008	$0.0024	NA	NA	NA	NA
*Claude 3 Opus is currently available in the US West (Oregon) Region
**Pricing for Claude 3.5 Sonnet is applicable to each version of Claude 3.5 Sonnet (v1 and v2) - Claude 3.5 Sonnet v2 is currently available in the US West (Oregon) Region

Anthropic models	Price per 1,000 input tokens	Price per 1,000 output tokens	Price per 1,000 input tokens (batch)	Price per 1,000 output tokens (batch)	Price per 1,000 input tokens (cache write)	Price per 1,000 input tokens (cache read)
Claude 3.5 Sonnet**	$0.003	$0.015	$0.0015	$0.0075	$0.00375	$0.0003
Claude 3.5 Haiku	$0.001	$0.005	$0.0005	$0.0025	$0.001	$0.00008
Claude 3 Opus*	$0.015	$0.075	$0.0075	$0.0375
Claude 3 Haiku	$0.00025	$0.00125	$0.000125	$0.000625
Claude 3 Sonnet	$0.003	$0.015	$0.0015	$0.0075
Claude 2.1	$0.008	$0.024	N/A	N/A
Claude 2.0	$0.008	$0.024	N/A	N/A
Claude Instant	$0.0008	$0.0024	N/A	N/A
*Claude 3 Opus is currently available in the US West (Oregon) Region
**Pricing for Claude 3.5 Sonnet is applicable to each version of Claude 3.5 Sonnet (v1 and v2) - Claude 3.5 Sonnet v2 is currently available in the US West (Oregon) Region

Region: Europe (London)

Anthropic models	Price per 1,000 input tokens	Price per 1,000 output tokens	Price per 1,000 input tokens (batch)	Price per 1,000 output tokens (batch)
Claude 3 Sonnet	$0.003	$0.015	$0.0015	$0.0075
Claude 3 Haiku	$0.00025	$0.00125	$0.000125	$0.000625

Region: Europe (Zurich)

Anthropic models	Price per 1,000 input tokens	Price per 1,000 output tokens	Price per 1,000 input tokens (batch)	Price per 1,000 output tokens (batch)
Claude 3.5 Sonnet	$0.003	$0.015	$0.0015	$0.0075
Claude 3 Haiku	$0.00025	$0.00125	$0.000125	$0.000625

Region: South America (São Paulo)

Anthropic models	Price per 1,000 input tokens	Price per 1,000 output tokens	Price per 1,000 input tokens (batch)	Price per 1,000 output tokens (batch)
Claude 3 Sonnet	$0.003	$0.015	$0.0015	$0.0075
Claude 3 Haiku	$0.00025	$0.00125	$0.000125	$0.000625

Region: Canada (Central)

Anthropic models	Price per 1,000 input tokens	Price per 1,000 output tokens	Price per 1,000 input tokens (batch)	Price per 1,000 output tokens (batch)
Claude 3 Sonnet	$0.003	$0.015	$0.0015	$0.0075
Claude 3 Haiku	$0.00025	$0.00125	$0.000125	$0.000625

Region: Asia Pacific (Mumbai)

Anthropic models	Price per 1,000 input tokens	Price per 1,000 output tokens	Price per 1,000 input tokens (batch)	Price per 1,000 output tokens (batch)
Claude 3 Sonnet	$0.003	$0.015	$0.0015	$0.0075
Claude 3 Haiku	$0.00025	$0.00125	$0.000125	$0.000625

Region: Asia Pacific (Sydney)

Anthropic models	Price per 1,000 input tokens	Price per 1,000 output tokens	Price per 1,000 input tokens (batch)	Price per 1,000 output tokens (batch)
Claude 3 Sonnet	$0.003	$0.015	$0.0015	$0.0075
Claude 3 Haiku	$0.00025	$0.00125	$0.000125	$0.000625

Region: Asia Pacific (Tokyo)

Anthropic models	Price per 1,000 input tokens	Price per 1,000 output tokens	Price per 1,000 input tokens (batch)	Price per 1,000 output tokens (batch)
Claude Instant	$0.0008	$0.0024	N/A	N/A
Claude 2.0/2.1	$0.008	$0.024	N/A	N/A
Claude 3 Haiku	$0.00025	$0.00125	$0.000125	$0.000625
Claude 3.5 Sonnet	$0.003	$0.015	$0.0015	$0.0075

Region: Asia Pacific (Singapore)

Anthropic models	Price per 1,000 input tokens	Price per 1,000 output tokens	Price per 1,000 input tokens (batch)	Price per 1,000 output tokens (batch)
Claude Instant	$0.0008	$0.0024	$0.0004	$0.0012
Claude 2.0/2.1	$0.008	$0.024	$0.004	$0.012
Claude 3 Haiku	$0.00025	$0.00125	$0.000125	$0.000625
Claude 3.5 Sonnet	$0.003	$0.015	N/A	N/A

Region: Europe (Paris)

Anthropic models	Price per 1,000 input tokens	Price per 1,000 output tokens	Price per 1,000 input tokens (batch)	Price per 1,000 output tokens (batch)
Claude 3 Haiku	$0.00025	$0.00125	$0.000125	$0.000625
Claude 3 Sonnet	$0.003	$0.015	$0.0015	$0.0075

Region: Europe (Frankfurt)

Anthropic models	Price per 1,000 input tokens	Price per 1,000 output tokens	Price per 1,000 input tokens (batch)	Price per 1,000 output tokens (batch)
Claude Instant	$0.0008	$0.0024	N/A	N/A
Claude 2.0/2.1	$0.008	$0.024	N/A	N/A
Claude 3 Sonnet	$0.003	$0.015	$0.0015	$0.0075
Claude 3.5 Sonnet	$0.003	$0.015	$0.0015	$0.0075
Claude 3 Haiku	$0.00025	$0.00125	$0.000125	$0.000625

Region: Asia Pacific (Seoul)

Anthropic models	Price per 1,000 input tokens	Price per 1,000 output tokens	Price per 1,000 input tokens (batch)	Price per 1,000 output tokens (batch)
Claude 3.5 Sonnet	$0.003	$0.015	N/A	N/A
Claude 3 Haiku	$0.00025	$0.00125	N/A	N/A

Region: US East (Ohio)

Anthropic models	Price per 1,000 input tokens	Price per 1,000 output tokens	Price per 1,000 input tokens (batch)	Price per 1,000 output tokens (batch)
Claude 3.5 Sonnet	$0.003	$0.015	N/A	N/A
Claude 3 Haiku	$0.00025	$0.00125	N/A	N/A

Region: AWS GovCloud (US-West) & AWS GovCloud (US-East)

Anthropic models	Price per 1,000 input tokens	Price per 1,000 output tokens
Claude 3.5 Sonnet	$0.0036	$0.018
Claude 3 Haiku	$0.0003	$0.0015

Latency Optimized Inference

Region: US East (Ohio)

Anthropic models	Price per 1,000 input tokens	Price per 1,000 output tokens
Claude 3.5 Haiku	$0.001	$0.005

Provisioned Throughput pricing

Region: US East (N. Virginia) and US West (Oregon)

Anthropic models	Price per hour per model with no commitment	Price per hour per model unit for 1-month commitment	Price per hour per model unit for 6-month commitment
Claude Instant	$44.00	$39.60	$22.00
Claude 2.0/2.1	$70.00	$63.00	$35.00

Region: Asia Pacific (Tokyo)

Anthropic models	Price per hour per model unit for 1-month commitment	Price per hour per model unit for 6-month commitment
Claude Instant	$53.00	$29.00
Claude 2.0/2.1	$86.00	$48.00

Region: Europe (Frankfurt)

Anthropic models	Price per hour per model unit for 1-month commitment	Price per hour per model unit for 6-month commitment
Claude Instant	$49.00	$27.00
Claude 2.0/2.1	$79.00	$44.00

Please reach out to your AWS account team for more details on model units.

Cohere

On-Demand pricing

Cohere models	Price per 1,000 input tokens	Price per 1,000 output tokens
Command	$0.0015	$0.0020
Command-Light	$0.0003	$0.0006
Command R+	$0.0030	$0.0150
Command R	$0.0005	$0.0015
Embed - English	$0.0001	N/A
Embed - Multilingual	$0.0001	N/A

Cohere models	Price per 1,000 queries**
Rerank 3.5	$2.00
**You are charged for the number of queries, where a query can contain up to 100 document chunks. If the query contains more than 100 document chunks, it is counted as multiple queries. For example, if a request contains 350 documents, it will be treated as 4 queries. Please note that each document can only contain up to 512 tokens (inclusive of the query and document’s total tokens), and if the token length is higher than 512 tokens, it is broken down into multiple documents.
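The query-counting rule in the footnote above can be sketched as follows. The function names are hypothetical, the sketch assumes that a request with 100 or fewer document chunks counts as one query, and it does not model the splitting of documents longer than 512 tokens, which would increase the chunk count before this calculation.

```python
import math

DOCS_PER_QUERY = 100  # from the footnote: a query holds up to 100 document chunks


def billable_queries(document_chunks: int) -> int:
    """Number of queries a single rerank request is counted as."""
    # Assumption: any request counts as at least one query.
    return max(1, math.ceil(document_chunks / DOCS_PER_QUERY))


def rerank_cost(chunks_per_request, price_per_1k_queries: float = 2.00) -> float:
    """Estimate the rerank charge (USD) for a list of requests."""
    queries = sum(billable_queries(n) for n in chunks_per_request)
    return (queries / 1000) * price_per_1k_queries


# The footnote's example: a request with 350 documents is treated as 4 queries.
print(billable_queries(350))
print(f"${rerank_cost([350, 40, 100]):.3f}")
```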

Pricing for customization (fine-tuning)

Cohere models	Price to train 1,000 tokens*	Price to store each custom model per month	Price to infer from a custom model per model unit per hour (with no-commit Provisioned Throughput pricing)
Cohere Command	$0.004	$1.95	$49.50
Cohere Command-Light	$0.001	$1.95	$8.56

*Total tokens trained = number of tokens in training data corpus x number of epochs
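The training-cost formula in the note above can be expressed as a short calculation. This is a sketch under stated assumptions: the function name, corpus size, and epoch count are hypothetical, the prices are placeholders in the same units as the customization table, and inference on the resulting custom model (charged under Provisioned Throughput) is not included.

```python
def customization_cost(corpus_tokens: int, epochs: int,
                       price_per_1k_trained_tokens: float,
                       storage_price_per_month: float,
                       months_stored: int) -> float:
    """Estimate fine-tuning plus storage cost (USD) for one custom model.

    Total tokens trained = tokens in the training corpus x number of epochs.
    """
    total_tokens_trained = corpus_tokens * epochs
    training = (total_tokens_trained / 1000) * price_per_1k_trained_tokens
    storage = storage_price_per_month * months_stored
    return training + storage


# Illustrative: a 1,000,000-token corpus trained for 3 epochs at a placeholder
# $0.004 per 1,000 tokens, with the model stored for 1 month at $1.95.
print(f"${customization_cost(1_000_000, 3, 0.004, 1.95, 1):.2f}")
```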

Provisioned Throughput pricing

Cohere models	Price per hour per model with no commitment	Price per hour per model unit for 1-month commitment	Price per hour per model unit for 6-month commitment
Cohere Command	$49.50	$39.60	$23.77
Cohere Command - Light	$8.56	$6.85	$4.11
Embed - English	$7.12	$6.76	$6.41
Embed - Multilingual	$7.12	$6.76	$6.41

Please reach out to your AWS account or sales team for more details on model units.

Meta Llama

Llama 2

On-Demand pricing

Region: US East (N. Virginia) and US West (Oregon)

Meta models	Price per 1,000 input tokens	Price per 1,000 output tokens
Llama 2 Chat (13B)	$0.00075	$0.001
Llama 2 Chat (70B)	$0.00195	$0.00256

Pricing for model customization (fine-tuning)

Meta models	Price to train 1,000 tokens	Price to store each custom model per month*	Price to infer from a custom model for 1 model unit per hour (with no-commit Provisioned Throughput pricing)
Llama 2 Pretrained (13B)	$0.00149	$1.95	$23.50
Llama 2 Pretrained (70B)	$0.00799	$1.95	$23.50

*Custom model storage = $1.95

Provisioned Throughput pricing

Meta models	Price per hour per model unit for 1-month commitment	Price per hour per model unit for 6-month commitment
Llama 2 Pretrained and Chat (13B)	$21.18	$13.08
Llama 2 Pretrained (70B)	$21.18	$13.08

*Llama 2 Pre-trained models are available only in provisioned throughput after customization.

Please reach out to your AWS account or sales team for more details on model units.

Mistral AI

Stability AI

On-Demand pricing

Stability AI Model	Price per generated image
Stable Diffusion 3.5 Large	$0.08
Stable Image Core	$0.04
Stable Diffusion 3 Large	$0.08
Stable Image Ultra	$0.14

Previous-generation image models offered by Stability AI are priced per image, depending on step count and image resolution.

Stability AI model	Image resolution	Price per image generated for standard quality (<=50 steps)	Price per image generated for premium quality (>50 steps)
SDXL 1.0	Up to 1024 x 1024	$0.04	$0.08

Provisioned Throughput pricing

Stability AI model	Price per hour per model unit for 1-month commitment*	Price per hour per model unit for 6-month commitment*
SDXL 1.0	$49.86	$46.18

*Includes inference for base and custom models

Please reach out to your AWS account or sales team for more details on model units.

Currently, model customization (fine-tuning) is not supported for Stability AI models on Amazon Bedrock.

Custom Model Import

Llama

Regions: US East (N. Virginia) and US West (Oregon)

Custom Model Unit version	v1.0
Price per Custom Model Unit per min*	$0.0785
Monthly storage cost per Custom Model Unit	$1.95

The Custom Model Units needed to host a model depend on a variety of factors, notably the model architecture, model parameter count, and context length. The exact number of Custom Model Units needed will be determined at the time of import. For reference, a Llama 3.1 8B 128K model requires 2 Custom Model Units, and a Llama 3.1 70B 128K model requires 8 Custom Model Units.
*Billed in 5 minute windows

Multimodal Llama

Regions: US East (N. Virginia) and US West (Oregon)

Custom Model Unit version	v1.0
Price per Custom Model Unit per min*	$0.0785
Monthly storage cost per Custom Model Unit	$1.95

The Custom Model Units needed to host a model depend on a variety of factors - notably the model architecture, model parameter count, and context length. The exact number of Custom Model Units needed will be determined at the time of import. For reference, Llama 3.2 11B 128K model requires 4 Custom Model Units.
*Billed in 5 minute windows

Mistral

Regions: US East (N. Virginia) and US West (Oregon)

Custom Model Unit version	v1.0
Price per Custom Model Unit per min*	$0.0785
Monthly storage cost per Custom Model Unit	$1.95

The Custom Model Units needed to host a model depend on a variety of factors - notably the model architecture, model parameter count, and context length. The exact number of Custom Model Units needed will be determined at the time of import. For reference, Mistral 7B 32K model requires 1 Custom Model Unit.
*Billed in 5 minute windows

Mixtral

Regions: US East (N. Virginia) and US West (Oregon)

Custom Model Unit version	v1.0
Price per Custom Model Unit per min*	$0.0785
Monthly storage cost per Custom Model Unit	$1.95

The Custom Model Units needed to host a model depend on a variety of factors - notably the model architecture, model parameter count, and context length. The exact number of Custom Model Units needed will be determined at the time of import. For reference, Mixtral 8x7B 32K model requires 4 Custom Model Units.
*Billed in 5 minute windows

Flan

Regions: US East (N. Virginia) and US West (Oregon)

Custom Model Unit version	v1.0
Price per Custom Model Unit per min*	$0.0785
Monthly storage cost per Custom Model Unit	$1.95

The Custom Model Units needed to host a model depend on a variety of factors - notably the model architecture, model parameter count, and context length. The exact number of Custom Model Units needed will be determined at the time of import. For reference, Flan-T5 XL 512 model requires 1 Custom Model Unit.
*Billed in 5 minute windows

On-Demand Inference Pricing:
You are billed in 5-minute windows for the duration your model copy is active starting from the first successful invocation. The maximum throughput and concurrency limit per model copy depends on factors such as input/output token mix, hardware type, model size, architecture, inference optimizations, and is determined during the model import workflow.

Bedrock automatically scales the number of model copies depending on your usage patterns. If there are no invocations for a 5-minute period, Bedrock will scale down to zero and scale back up when you invoke your model. While scaling back up, you may experience a cold-start duration (in tens of seconds) depending on model size. Bedrock also scales up the number of model copies if your inference volume consistently exceeds the concurrency limits of a single model copy. Note: There is a default maximum of 3 model copies per account per imported model that can be increased through Service Quotas.
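The 5-minute billing windows described above can be illustrated with a short calculation. This is a sketch, not the billing implementation: the function names are hypothetical, and it assumes that a partially used window is billed as a full window and that the charge is the per-minute Custom Model Unit price multiplied by the Custom Model Units per copy and the number of active copies.

```python
import math

WINDOW_MINUTES = 5  # billing granularity stated in the text


def billed_minutes(active_minutes: float) -> int:
    """Round active time up to whole 5-minute windows (assumption)."""
    return math.ceil(active_minutes / WINDOW_MINUTES) * WINDOW_MINUTES


def import_inference_cost(active_minutes: float, custom_model_units: int,
                          model_copies: int,
                          price_per_cmu_per_min: float = 0.0785) -> float:
    """Estimate the inference charge (USD) for an imported model."""
    return (billed_minutes(active_minutes) * custom_model_units
            * model_copies * price_per_cmu_per_min)


# Illustrative: one copy of a model that needs 2 Custom Model Units,
# active for 12 minutes, which is billed as three 5-minute windows.
print(f"${import_inference_cost(12, 2, 1):.3f}")
```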

Pricing tools (details)

Flows
Amazon Bedrock Flows

You are charged based on the number of node transitions required to execute your application. Bedrock Flows counts a node transition each time a node in your workflow is executed. You are charged for the total number of node transitions across all your flows.

All charges are metered daily and billed monthly starting February 1st, 2025.

Price per 1,000 node transitions	$0.035
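As a quick illustration of node-transition pricing, the sketch below estimates a Flows charge. The function name and the workload are hypothetical, and the example assumes every run of the flow executes each of its nodes exactly once; flows with branching or iteration would produce different counts.

```python
def flows_cost(node_transitions: int, price_per_1k_transitions: float = 0.035) -> float:
    """Estimate the Bedrock Flows charge (USD) for a number of node transitions."""
    return (node_transitions / 1000) * price_per_1k_transitions


# Illustrative: a 5-node flow executed 10,000 times, assuming each run
# executes every node once, gives 50,000 node transitions.
transitions = 5 * 10_000
print(f"${flows_cost(transitions):.2f}")
```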

Additional Charges

You may incur additional charges if the execution of your application workflow utilizes other AWS services or transfers data. For example, if your workflow invokes an Amazon Bedrock Guardrail policy, you will be billed for the number of text units processed by the policy.

Knowledge Bases

Structured Data Retrieval (SQL Generation)

Structured Data Retrieval is charged for each request to generate a SQL query. The SQL query generated is used to retrieve the data from structured data stores.

Rerank models

Rerank models are designed to improve the relevance and accuracy of responses in Retrieval Augmented Generation (RAG) applications. They are charged per query.

**You are charged for the number of queries, where a query can contain up to 100 document chunks. If the query contains more than 100 document chunks, it is counted as multiple queries. For example, if a request contains 350 documents, it will be treated as 4 queries. Please note that each document can only contain up to 512 tokens (inclusive of the query and document’s total tokens), and if the token length is higher than 512 tokens, it is broken down into multiple documents. A query is equivalent to a search unit.

Guardrails

Amazon Bedrock Guardrails

On-Demand pricing

Guardrail policy*	Price per 1,000 text units**
Content filters (text content)****	$0.15
Denied topics	$0.15
Contextual grounding check***	$0.10
Sensitive information filters (PII)	$0.10
Sensitive information filters (regular expression)	Free
Word filters	Free

* Each guardrail policy is optional and can be enabled based on your application requirements. Charges will be incurred based on the policy type used in the guardrail. For example, if a guardrail is configured with content filters and denied topics, charges will be incurred for these two policies, while there will be no charges associated with sensitive information filters.

**A text unit can contain up to 1000 characters. If a text input is more than 1000 characters, it is processed as multiple text units, each containing 1000 characters or less. For example, if a text input contains 5600 characters, it will be charged for 6 text units.

***Contextual grounding check uses a reference source and a query to determine if the model response is grounded based on the source and relevant to the query. The total number of text units charged is calculated by combining all the characters in the source, query, and model response.

****Pricing for content filters for detecting and filtering out harmful image content will be announced when the feature is generally available.
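
The text-unit rules in the footnotes above can be expressed as a short Python sketch. The function names are hypothetical, and treating an empty input as zero text units is an assumption rather than something the pricing notes state.

```python
import math

# From the footnote: a text unit holds up to 1000 characters.
CHARS_PER_TEXT_UNIT = 1000


def text_units(num_chars: int) -> int:
    """Number of text units charged for one text input.

    Assumption: an empty input counts as zero text units.
    """
    if num_chars < 0:
        raise ValueError("character count cannot be negative")
    return math.ceil(num_chars / CHARS_PER_TEXT_UNIT)


def grounding_text_units(source_chars: int, query_chars: int, response_chars: int) -> int:
    """Contextual grounding check: characters of the source, query, and
    model response are combined before counting text units."""
    return text_units(source_chars + query_chars + response_chars)


print(text_units(5600))  # 6, matching the footnote example
```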

Model Evaluation

Model evaluation is charged for the inference from your choice of model. Automatically-generated algorithmic scores are provided at no extra charge. For human-based evaluation where you bring your own workstream, you are charged for the model inference in the evaluation, and a charge of $0.21 per completed human task.

Model	Price per 1,000 input tokens	Price per 1,000 output tokens	Price per human task
Model selected for evaluation	Based on model selected	Based on model selected	$0.21

Data Automation
Amazon Bedrock Knowledge Bases offers a Bedrock Data Automation integration to provide more relevant and accurate responses for multimodal data. When setting up a Knowledge Base, you can select Bedrock Data Automation as your parsing method to analyze and extract meaningful insights from images or documents, which can include figures, charts, and diagrams. During processing, Bedrock Data Automation extracts meaningful information from ingested documents and images, which is then used in subsequent Knowledge Base steps for chunking, embedding, and storage. When integrated with Knowledge Bases, Bedrock Data Automation delivers and charges for standardized output.

Pricing examples

AI21 labs

An application developer makes the following API calls to Amazon Bedrock: a request to AI21’s Jurassic-2 Mid model to summarize an input of 10K tokens of input text to an output of 2K tokens.

Total cost incurred = 10K tokens/1000 * $0.0125 + 2K tokens/1000 * $0.0125 = $0.15
Amazon

On-Demand pricing

An application developer makes the following API calls to Amazon Bedrock on an hourly basis: a request to Amazon Titan Text Lite model to summarize an input of 2K tokens of input text to an output of 1K tokens.

Total hourly cost incurred = 2K tokens/1000 * $0.0003 + 1K tokens/1000 * $0.0004 = $0.001

An application developer makes the following API calls to Amazon Bedrock: a request to the Amazon Titan Image Generator base model to generate 1000 images of 1024 x 1024 in size of standard quality.

Total cost incurred = 1000 images * $0.01 per image = $10

Customization (fine-tuning and continued pretraining) pricing

An application developer customizes an Amazon Titan Image Generator model using 1000 image-text pairs. After training, the developer uses custom model provisioned throughput for 1 hour to evaluate the performance of the model. The fine-tuned model is stored for 1 month. After evaluation, the developer uses provisioned throughput (1-month commitment term) to host the customized model.

Monthly cost incurred for fine-tuning = fine-tuning training ($0.005 * 500 * 64, where $0.005 is the price per image seen, 500 is the number of steps, and 64 is the batch size) + custom model storage per month ($1.95) + 1 hour of custom model inference ($21) = $160 + $1.95 + $21 = $182.95
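
As a rough check of this arithmetic, the three cost components can be summed in Python. The function and parameter names below are hypothetical; the prices are the ones quoted in this example, and `Decimal` is used only to avoid floating-point rounding noise.

```python
from decimal import Decimal


def image_fine_tuning_cost(price_per_image_seen, steps, batch_size,
                           storage_per_month, inference_hours,
                           inference_price_per_hour):
    """Sum training, storage, and evaluation-inference charges (hypothetical helper)."""
    # Images seen during training = steps * batch size.
    training = Decimal(price_per_image_seen) * steps * batch_size
    storage = Decimal(storage_per_month)
    inference = Decimal(inference_price_per_hour) * inference_hours
    return training + storage + inference


total = image_fine_tuning_cost("0.005", 500, 64, "1.95", 1, "21")
print(f"${total:.2f}")  # $182.95
```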

Provisioned Throughput pricing

An application developer buys two model units of Amazon Titan Text Express with a 1-month commitment for their text summarization use case.

Total monthly cost incurred = 2 model units * $18.40/hour * 24 hours * 31 days = $27,379.20

An application developer buys one model unit of the base Amazon Titan Image Generator model with a 1-month commitment.

Total cost incurred = 1 model unit * $16.20 * 24 hours * 31 days = $12,052.80
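
Both Provisioned Throughput examples above follow the same pattern: model units times hourly price times hours in the month. A minimal Python sketch, assuming a 31-day month as the examples do (the function name is hypothetical):

```python
from decimal import Decimal

HOURS_PER_DAY = 24


def provisioned_monthly_cost(model_units, hourly_price_per_unit, days_in_month=31):
    """Monthly charge for Provisioned Throughput (hypothetical helper).

    Assumes the model units run for every hour of the month.
    """
    return model_units * Decimal(hourly_price_per_unit) * HOURS_PER_DAY * days_in_month


titan_text_express = provisioned_monthly_cost(2, "18.40")
titan_image_generator = provisioned_monthly_cost(1, "16.20")
print(f"{titan_text_express:.2f}")     # 27379.20
print(f"{titan_image_generator:.2f}")  # 12052.80
```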
Anthropic

On-Demand pricing

An application developer makes the following API calls to Amazon Bedrock in the US West (Oregon) Region: a request to Anthropic’s Claude model to summarize an input of 11K tokens of input text to an output of 4K tokens.

Total cost incurred = 11K tokens/1000 * $0.008 + 4K tokens/1000 * $0.024 = $0.088 + $0.096 = $0.184
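
The same on-demand calculation applies to every text model: input and output tokens are each priced per 1,000 tokens and the two charges are added. A small Python sketch, with a hypothetical function name and the prices quoted in this example:

```python
from decimal import Decimal


def on_demand_cost(input_tokens, output_tokens,
                   input_price_per_1k, output_price_per_1k):
    """On-demand charge for one request (hypothetical helper).

    Prices are passed as strings so Decimal keeps them exact.
    """
    input_cost = Decimal(input_tokens) / 1000 * Decimal(input_price_per_1k)
    output_cost = Decimal(output_tokens) / 1000 * Decimal(output_price_per_1k)
    return input_cost + output_cost


cost = on_demand_cost(11_000, 4_000, "0.008", "0.024")
print(cost)  # 0.184
```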

Provisioned Throughput pricing

An application developer buys one model unit of Anthropic Claude Instant in the US West (Oregon) Region:

Total monthly cost incurred = 1 model unit * $39.60 * 24 hours * 31 days = $29,462.40
Cohere

On-Demand pricing

An application developer makes the following API calls to Amazon Bedrock: a request to Cohere’s Command model to summarize an input of 6K tokens of input text to an output of 2K tokens.

Total cost incurred = 6K tokens/1,000 * $0.0015 + 2K tokens/1,000 * $0.0020 = $0.013

An application developer makes the following API calls to Amazon Bedrock: A request to Cohere’s Command - Light model to summarize an input of 6K tokens of input text to an output of 2K tokens.

Total cost incurred = 6K tokens/1000 * $0.0003 + 2K tokens/1000 * $0.0006 = $0.003

An application developer makes the following API calls to Amazon Bedrock: A request to either Cohere’s Embed English or Embed Multilingual model to generate embeddings for 10K tokens of input.

Total cost incurred = 10K tokens/1000 * $0.0001 = $0.001

Customization (fine-tuning) pricing

An application developer customizes a Cohere Command model using 1000 tokens of data. After training, the developer uses custom model provisioned throughput for 1 hour to evaluate the performance of the model. The fine-tuned model is stored for 1 month. After evaluation, the developer uses provisioned throughput (1-month commitment) to host the customized model.

Monthly cost incurred for fine-tuning = fine-tuning training ($0.004 * 1000) + custom model storage per month ($1.95) + 1 hour of custom model inference ($49.50) = $55.45

Monthly cost incurred for provisioned throughput (1-month commitment) of custom model = $39.60

Provisioned Throughput pricing

An application developer buys one model unit of Cohere Command with a 1-month commitment for their text summarization use case.

Total monthly cost incurred = 1 model unit * $39.60 * 24 hours * 31 days = $29,462.40
Meta Llama

On-Demand pricing

An application developer makes the following API calls to Amazon Bedrock: a request to Meta’s Llama 2 Chat (13B) model to summarize an input of 2K tokens of input text to an output of 500 tokens.

Total cost incurred = 2K tokens/1000 * $0.00075 + 500 tokens/1000 * $0.001 = $0.002

Customization (fine-tuning) pricing

An application developer customizes the Llama 2 Pretrained (70B) model using 1000 tokens of data. After training, the developer uses custom model provisioned throughput for 1 hour to evaluate the performance of the model. The fine-tuned model is stored for 1 month. After evaluation, the developer uses provisioned throughput (1-month commitment) to host the customized model.

Monthly cost incurred for fine-tuning = fine-tuning training ($0.00799 * 1000) + custom model storage per month ($1.95) + 1 hour of custom model inference ($23.50) = $33.44

Monthly cost incurred for provisioned throughput (a 1-month commit) of custom model = $21.18

Provisioned Throughput pricing

An application developer buys one model unit of Meta Llama 2 with a 1-month commitment for their text summarization use case.

Total monthly cost incurred = 1 model unit * $21.18 * 24 hours * 31 days = $15,757.92
Mistral AI

On-Demand pricing

An application developer makes the following API calls to Amazon Bedrock on an hourly basis: a request to Mistral 7B model to summarize an input of 2K tokens of input text to an output of 1K tokens.

Total hourly cost incurred = 2K tokens/1000 * $0.00015 + 1K tokens/1000 * $0.0002 = $0.0005

An application developer makes the following API calls to Amazon Bedrock on an hourly basis: a request to Mixtral 8x7B model to summarize an input of 2K tokens of input text to an output of 1K tokens.

Total hourly cost incurred = 2K tokens/1000 * $0.00045 + 1K tokens/1000 * $0.0007 = $0.0016

An application developer makes the following API calls to Amazon Bedrock on an hourly basis: a request to Mistral Large model to summarize an input of 2K tokens of input text to an output of 1K tokens.

Total hourly cost incurred = 2K tokens/1000 * $0.008 + 1K tokens/1000 * $0.024 = $0.04
Stability AI

On-Demand pricing

An application developer makes the following API calls to Amazon Bedrock: a request to the SDXL model to generate a 512 x 512 image with a step size of 70 (premium quality).

Total cost incurred = 1 image * $0.036 per image = $0.036

An application developer makes the following API calls to Amazon Bedrock: A request to the SDXL 1.0 model to generate a 1024 x 1024 image with a step size of 70 (premium quality).

Total cost incurred = 1 image * $0.08 per image = $0.08

Provisioned Throughput pricing

An application developer buys one model unit of SDXL 1.0 with a 1-month commitment.

Total cost incurred = 1 * $49.86 * 24 hours * 31 days = $37,095.84

Model evaluation

Model evaluation example 1:

On-demand pricing
An application developer submits a dataset for human-based model evaluation using Anthropic Claude 2.1 and Anthropic Claude Instant in the US East (N. Virginia) AWS Region.

The dataset contains 50 prompts, and the developer requires one worker to rate each prompt-response set (configurable in the evaluation job creation as “workers per prompt” parameter).

There will be 50 tasks in this evaluation job (one task for each prompt-response set per each worker). The 50 prompts combine to 5000 input tokens, and the associated responses combine to 15,000 tokens for Anthropic Claude Instant and 20,000 tokens for Anthropic Claude 2.1.

The following charges are incurred for this model evaluation job:

Item	Number of input tokens	Price per 1000 input tokens	Cost of input	Number of output tokens	Price per 1000 output tokens	Cost of output	Number of human tasks	Price per human task	Cost of human tasks	Total
Claude Instant Inference	5000	$0.0008	$0.004	15000	$0.0024	$0.036				$0.04
Claude 2.1 Inference	5000	$0.008	$0.04	20000	$0.024	$0.48				$0.52
Human Tasks							50	$0.21	$10.50	$10.50
Total										$11.06

Model evaluation example 2:

On-demand pricing
An application developer submits a dataset for human-based model evaluation using Anthropic Claude 2.1 and Anthropic Claude Instant in the US East (N. Virginia) AWS Region.

The dataset contains 50 prompts, and the developer requires two workers to rate each prompt-response set (configurable in the evaluation job creation as “workers per prompt” parameter). There will be 100 tasks in this evaluation job (1 task for each prompt-response set per each worker: 2 workers x 50 prompt-response sets = 100 human tasks).

The 50 prompts combine to 5000 input tokens, and the associated responses combine to 15000 tokens for Anthropic Claude Instant and 20000 tokens for Anthropic Claude 2.1.

The following charges are incurred for this model evaluation job:

Item	Number of input tokens	Price per 1000 input tokens	Cost of input	Number of output tokens	Price per 1000 output tokens	Cost of output	Number of human tasks	Price per human task	Cost of human tasks	Total
Claude Instant Inference	5000	$0.0008	$0.0040	15000	$0.0024	$0.036				$0.04
Claude 2.1 Inference	5000	$0.008	$0.0400	20000	$0.024	$0.48				$0.52
Human Tasks							100	$0.21	$21.00	$21.00
Total										$21.56
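
The totals in both model evaluation tables can be reproduced with a short Python sketch. The helper names are hypothetical; token counts and prices are the ones listed in the tables, and the number of human tasks is prompts times workers per prompt, as described in the examples.

```python
from decimal import Decimal

# Price per completed human task, from the Model Evaluation pricing table.
HUMAN_TASK_PRICE = Decimal("0.21")


def inference_cost(input_tokens, output_tokens, input_price_per_1k, output_price_per_1k):
    """Inference charge for one model in the evaluation job."""
    return (Decimal(input_tokens) * Decimal(input_price_per_1k)
            + Decimal(output_tokens) * Decimal(output_price_per_1k)) / 1000


def human_evaluation_cost(model_runs, num_prompts, workers_per_prompt):
    """Total charge: inference for every model plus human tasks (hypothetical helper)."""
    inference = sum((inference_cost(*run) for run in model_runs), Decimal(0))
    human_tasks = num_prompts * workers_per_prompt
    return inference + human_tasks * HUMAN_TASK_PRICE


runs = [
    (5000, 15000, "0.0008", "0.0024"),  # Claude Instant
    (5000, 20000, "0.008", "0.024"),    # Claude 2.1
]
example_1 = human_evaluation_cost(runs, num_prompts=50, workers_per_prompt=1)
example_2 = human_evaluation_cost(runs, num_prompts=50, workers_per_prompt=2)
print(f"{example_1:.2f}")  # 11.06
print(f"{example_2:.2f}")  # 21.56
```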

Amazon Bedrock Guardrails

Example 1: Customer support chatbot
An application developer creates a customer support chatbot and uses content filters to block harmful content and denied topics to filter undesirable queries and responses.

The chatbot serves 1000 user queries per hour. Each user query has an average input length of 200 characters and receives a FM response of 1500 characters.

Each user query of 200 characters corresponds to 1 text unit.

Each FM response of 1,500 characters corresponds to 2 text units.

Text units processed each hour = (1 + 2) * 1000 queries = 3000 text units

Total cost incurred per hour for content filters and denied topics = 3000 * ($0.15 + $0.15) / 1000 = $0.90
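
The chatbot calculation can be sketched in Python as follows. The function names are hypothetical; the sketch assumes each enabled policy is charged on every text unit of both the user query and the FM response, which is how this example counts them.

```python
import math
from decimal import Decimal

CHARS_PER_TEXT_UNIT = 1000


def text_units(num_chars):
    """Text units for one input, at up to 1000 characters per unit."""
    return math.ceil(num_chars / CHARS_PER_TEXT_UNIT)


def guardrail_hourly_cost(queries_per_hour, query_chars, response_chars,
                          policy_prices_per_1k_units):
    """Hourly guardrail charge for a chatbot (hypothetical helper)."""
    units_per_query = text_units(query_chars) + text_units(response_chars)
    total_units = units_per_query * queries_per_hour
    # Each enabled policy is priced per 1,000 text units; add them up.
    combined_price = sum(Decimal(p) for p in policy_prices_per_1k_units)
    return total_units * combined_price / 1000


# Content filters ($0.15) and denied topics ($0.15), 1000 queries per hour.
hourly = guardrail_hourly_cost(1000, 200, 1500, ["0.15", "0.15"])
print(f"{hourly:.2f}")  # 0.90
```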

Example 2: Call center transcript summarization
An application developer creates an application to summarize chat transcripts between users and support agents. It uses a sensitive information filter to redact personally identifiable information (PII) in the generated summaries for 10,000 conversations.

Each generated summary has an average of 3,500 characters that corresponds to 4 text units.

Total cost incurred to summarize 10,000 conversations = 10000 * 4 * ($0.1/1000) = $4

Custom Model Import

Pricing Example: An application developer imports a customized Llama 3.1 model with 8B parameters and a 128K sequence length in the us-east-1 Region and deletes the model after 1 month. This requires 2 Custom Model Units, so the price per minute is $0.1570. The model storage cost for 2 Custom Model Units is $3.90 for the month.

There is no charge to import the model. The first successful invocation is at 8:03 AM, at which time the metering starts. The 5-minute metering windows are from 8:03 AM - 8:07 AM; 8:07 AM - 8:11 AM, and so on. If there is at least one invocation during any 5-minute period, the window will be considered active for billing. If there is no invocation from 8:07 AM - 8:11 AM, the metering will stop at 8:11 AM. In this case, the bill would be calculated as follows: $0.1570 * 5 minutes * 3 five minute windows = $2.355.
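
The final bill in this example is simply the per-minute price times the length of a metering window times the number of billed windows. A minimal Python sketch with a hypothetical function name; it takes the number of billed windows as an input and does not try to derive it from invocation timestamps.

```python
from decimal import Decimal

# Metering window length, from the example above.
WINDOW_MINUTES = 5


def custom_model_import_cost(price_per_minute, billed_windows):
    """Inference charge for an imported model (hypothetical helper).

    price_per_minute is the total for all required Custom Model Units,
    for example $0.1570 for the 2 units in this example.
    """
    return Decimal(price_per_minute) * WINDOW_MINUTES * billed_windows


bill = custom_model_import_cost("0.1570", 3)
print(f"{bill:.3f}")  # 2.355
```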
Amazon Bedrock Knowledge Bases

Pricing Example 1 (Reranking using Amazon Rerank 1.0 model)

In a given month, you make 2 million requests to Rerank API using Amazon Rerank 1.0 model – 1 million requests contain fewer than 100 documents each and hence will be charged for one request each. The remaining 1 million requests contain 120-150 documents, and hence each request will be charged for 2 requests.

Price for one request = $0.001
Total charge = 1,000,000 * $0.001 + 1,000,000 * 2 * $0.001 = $3,000

Pricing Example 2: (Structured data retrieval)

An application developer creates a support chatbot that queries structured data stored in Amazon Redshift. The developer creates a Bedrock Knowledge Base and connects it to Amazon Redshift. The chatbot serves 10,000 user queries per hour. Each user query costs $0.002 per GenerateQuery API call to generate SQL from the user query.

Total cost incurred for generating SQL per hour = $0.002 * 10,000 = $20
Total cost incurred per month = $20 * 24 * 30 = $14,400
Flows

Example: News summarization
An application developer creates a flow to automate news summarization for traders. The flow includes an Input node that takes in an S3 location and an S3 retrieval node that retrieves 10 files containing articles from 10 major news agencies in S3 (2 node transitions). It then uses an iterator node to invoke a model with a prompt node to summarize each file (10 files x 2 node transitions). It then collects all the results using a collector node, writes the results to S3 using an S3 storage node, and completes in an Output node (3 node transitions). The developer runs this flow every half hour of every weekday.

The number of node transitions per flow execution is: 2 + 10*2 + 3 = 25 node transitions/flow execution

The number of flow executions per month is: 24 hours * 2 * 5 days * 4 weeks = 960 flow executions/month.

Total monthly bill is: 25 * 960 * $0.035/1000 = $0.84
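
The monthly bill can be checked with a short Python sketch. The function name is hypothetical; the transition counts come from the flow description above (2 for input and retrieval, 2 per file for 10 files, 3 for collection, storage, and output), and the price is $0.035 per 1,000 node transitions.

```python
from decimal import Decimal

PRICE_PER_1000_TRANSITIONS = Decimal("0.035")


def flow_monthly_cost(transitions_per_execution, executions_per_month):
    """Monthly Flows charge, excluding other AWS services (hypothetical helper)."""
    total_transitions = transitions_per_execution * executions_per_month
    return total_transitions * PRICE_PER_1000_TRANSITIONS / 1000


transitions = 2 + 10 * 2 + 3   # 25 node transitions per execution
executions = 24 * 2 * 5 * 4    # every half hour, 5 days a week, 4 weeks
print(f"{flow_monthly_cost(transitions, executions):.2f}")  # 0.84
```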

Additional charges
The bill will also include additional charges for AWS services used in the workflow execution, including Amazon S3 usage in the retrieval and storage nodes, and Amazon Bedrock foundation model usage in the prompt node.
Data Automation

Pricing example 1:
Let’s say you process a 1,000 page document using BDA Custom Output. All 1,000 pages are processed using blueprint 1 which has 15 fields. The per page price for any blueprint with 30 fields or less is $0.040. The total cost would be $40.

Total pages processed = 1,000
Price per page for blueprints with 30 fields or fewer = $0.040
Total charge = 1,000 * $0.040 = $40

Pricing example 2:
Let’s say you process 2 documents using BDA Custom Output. Document 1 has 40 pages and is processed using blueprint 1 which has 20 fields. Document 2 has 10 pages and is processed using blueprint 2, which has 40 fields. The per page price of blueprint 1 is $0.040 since it contains 30 fields or less. The per page price of blueprint 2 is $0.045. The processing cost for Document 1 using blueprint 1 is $1.60. The processing cost for Document 2 using blueprint 2 is $0.45. The total cost of processing both documents would be $2.05.

Total pages processed = 50
Price per page for Blueprint 1 with 30 fields or fewer = $0.040
Price per page for Blueprint 2 with 40 fields = $0.040 + (# of additional fields above 30 * $0.0005 per field)
Number of additional fields above 30 = 40 - 30 = 10
Price per page for Blueprint 2 with 40 fields = $0.040 + (10 * $0.0005 per field) = $0.045
Charge for Document 1 using Blueprint 1 = 40 pages x $0.040 per page = $1.60
Charge for Document 2 using Blueprint 2 = 10 pages x $0.045 per page = $0.45
Total charge = Charge for Document 1 + Charge for Document 2 = $1.60 + $0.45 = $2.05
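
The per-field surcharge used in this example can be written as a small Python sketch. The function names are hypothetical; the base price is passed in as a parameter ($0.040 per page here), and the 30-field threshold and $0.0005 per additional field are the values quoted in this example.

```python
from decimal import Decimal

INCLUDED_FIELDS = 30                   # fields covered by the base price
EXTRA_FIELD_PRICE = Decimal("0.0005")  # per additional field above 30


def custom_output_unit_price(base_price, num_fields):
    """Price per page for a blueprint (hypothetical helper)."""
    extra_fields = max(0, num_fields - INCLUDED_FIELDS)
    return Decimal(base_price) + extra_fields * EXTRA_FIELD_PRICE


def custom_output_charge(units, base_price, num_fields):
    """Charge for processing `units` pages with one blueprint."""
    return units * custom_output_unit_price(base_price, num_fields)


doc_1 = custom_output_charge(40, "0.040", 20)  # Blueprint 1, 20 fields
doc_2 = custom_output_charge(10, "0.040", 40)  # Blueprint 2, 40 fields
print(f"{doc_1 + doc_2:.2f}")  # 2.05
```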

Pricing example 3:
Let’s say you process a 60 minute video using BDA Standard Output. The per minute price for video standard output is $0.050. The total cost would be $3.00.

Total minutes processed = 60
Price per minute for video standard output = $0.050
Total charge = 60 * $0.050 = $3.00

Pricing example 4:
Let’s say you process 2,000 images using BDA Custom Output. The first 1,000 images are processed using blueprint 1, which has 10 fields. The last 1,000 images are processed using blueprint 2, which has 40 fields. The per image price for blueprint 1 is $0.005, since it contains 30 fields or less. The per image price of blueprint 2 is $0.01. The processing cost for the first 1,000 images using blueprint 1 is $5.00. The processing cost for the second 1,000 images using blueprint 2 is $10.00. The total cost of processing all 2,000 images would be $15.00.

Cost for first 1,000 images = 1,000 images * $0.005 per image = $5.00
Cost for second 1,000 images = 1,000 images * ($0.005 + (# of additional fields above 30 * $0.0005 per field))
= 1,000 * ($0.005 + ((40 - 30) * $0.0005))
= 1,000 * ($0.005 + (10 * $0.0005)) = $10.00
Total cost = $5.00 + $10.00 = $15.00

Pricing example 5:
Let’s assume that you want to use Bedrock Data Automation Standard Output to process 15,000 minutes of meeting audio recordings in your organization. The total cost of processing all 15,000 audio minutes would be $90.

Total minutes processed = 15,000 minutes
Total charge = 15,000 min × $0.006 = $90

Pricing Example 6:
Let’s say you set up Bedrock Knowledge Bases to use Bedrock Data Automation as a parser and then ingest a 1,000 page document. Note that the Bedrock Knowledge Bases and Bedrock Data Automation integration uses standard output. The per page price for standard output is $0.010. The total cost would be $10.

Total pages processed = 1,000
Price per page for standard output = $0.010
Total charge = 1,000 * $0.010 = $10
