Envoy AI Gateway v0.3.x

This release introduces intelligent inference routing with the Endpoint Picker Provider, Google Vertex AI support, expanded provider integrations, and enhanced observability features.

v0.3.0

August 16, 2025
Endpoint Picker Support · Google Vertex AI · Expanded Provider Ecosystem · Architecture Improvements · InferencePool Support · Gateway API Inference Extension · OpenInference Tracing · Model Name Virtualization · Production Ready Providers · Dynamic Load Balancing · Configurable Metrics
Envoy AI Gateway v0.3.0 introduces intelligent inference routing, expanded provider support (including Google Vertex AI and Anthropic), and enhanced observability with OpenInference tracing and configurable metrics. Key features include Endpoint Picker Provider with InferencePool for dynamic load balancing, model name virtualization, and seamless Gateway API Inference Extension integration.

✨ New Features

Endpoint Picker Provider (EPP) Integration

Gateway API Inference Extension Support

Complete integration with Gateway API Inference Extension v0.5.1, enabling intelligent endpoint selection based on real-time AI inference metrics like KV-cache usage, queue depth, and LoRA adapter information.

Dual Integration Approaches

Support for both HTTPRoute + InferencePool and AIGatewayRoute + InferencePool integration patterns, providing flexibility for different use cases from simple to advanced AI routing scenarios.
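As an illustration of the AIGatewayRoute + InferencePool pattern, a minimal sketch (the resource names, Gateway name, and model value below are hypothetical):

```yaml
# Sketch: an AIGatewayRoute sending matching traffic to an InferencePool.
# All names are illustrative; x-ai-eg-model is the gateway's model-routing header.
apiVersion: aigateway.envoyproxy.io/v1alpha1
kind: AIGatewayRoute
metadata:
  name: inference-route
spec:
  parentRefs:
    - name: ai-gateway
      kind: Gateway
      group: gateway.networking.k8s.io
  rules:
    - matches:
        - headers:
            - type: Exact
              name: x-ai-eg-model
              value: meta-llama/Llama-3.1-8B-Instruct
      backendRefs:
        - name: vllm-llama-pool
          kind: InferencePool
          group: inference.networking.x-k8s.io
```

With a pool as the backend reference, the Endpoint Picker selects a concrete endpoint per request rather than round-robining across Service endpoints.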

Dynamic Load Balancing

Intelligent routing that automatically selects the optimal inference endpoint for each request, optimizing resource utilization across your entire inference infrastructure with real-time performance metrics.

Extensible Architecture

Support for custom endpoint picker providers, allowing implementation of domain-specific routing logic tailored to unique AI workload requirements.

Expanded Provider Ecosystem

Google Vertex AI Production Support

Google Vertex AI has moved from work-in-progress to full production support, including complete streaming support for Gemini models with OpenAI API compatibility. View all supported providers →
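As a rough sketch of wiring a Vertex AI backend (the schema name and the Backend reference here are assumptions for illustration; consult the provider documentation for the exact fields):

```yaml
# Sketch: an AIServiceBackend speaking the Vertex AI schema.
# Schema identifier and backing resource are assumptions, not verified fields.
apiVersion: aigateway.envoyproxy.io/v1alpha1
kind: AIServiceBackend
metadata:
  name: vertex-gemini
spec:
  schema:
    name: GCPVertexAI            # assumed schema identifier
  backendRef:
    name: gcp-vertex-backend     # an Envoy Gateway Backend for the Vertex AI endpoint
    kind: Backend
    group: gateway.envoyproxy.io
```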

Anthropic on Vertex AI Integration

Complete Anthropic Claude integration via GCP Vertex AI, moving from experimental to production-ready status with multi-tool support and configurable API versions for enterprise deployments.

Enhanced Gemini Capabilities

Improved request/response translation for Gemini models with support for tools, response format specification, and advanced conversation handling, making Gemini integration more robust and feature-complete.

Strengthened OpenAI-Compatible Ecosystem

Enhanced support for the broader OpenAI-compatible provider ecosystem including Groq, Together AI, Mistral, Cohere, DeepSeek, SambaNova, and more, ensuring seamless integration across the AI provider landscape.

Observability Enhancements

OpenInference Tracing Support

Added comprehensive OpenInference distributed tracing with OpenTelemetry integration, providing detailed request tracing and performance monitoring for LLM operations. Includes full chat completion request/response data capture, timing information, and compatibility with evaluation systems like Arize Phoenix. View the documentation →
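Configuration-wise, the exporter follows the standard OpenTelemetry environment variables; one plausible wiring via Helm values is sketched below (the collector endpoint is hypothetical, and where these variables must be set in your deployment may differ — see the linked documentation):

```yaml
# Sketch: standard OTel env vars, e.g. exporting traces to an Arize Phoenix collector.
controller:
  extraEnvVars:
    - name: OTEL_EXPORTER_OTLP_ENDPOINT
      value: http://phoenix.observability.svc.cluster.local:4318   # hypothetical collector address
    - name: OTEL_SERVICE_NAME
      value: ai-gateway
```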

Configurable Metrics Labels

Added support for configuring additional metrics labels corresponding to HTTP request headers. This enables custom labeling of metrics based on specific request headers like user identifiers, API versions, or application contexts, providing more granular monitoring and filtering capabilities.

Embeddings Metrics Support

Extended GenAI metrics support to include embeddings operations, providing comprehensive token usage tracking and performance monitoring for both chat completion and embeddings API endpoints with consistent OpenTelemetry semantic conventions.

Enhanced GenAI Metrics

Improved AI-specific metrics implementation with better error handling, enhanced attribute mapping, and more accurate token latency measurements. Maintains full compatibility with OpenTelemetry Gen AI semantic conventions while providing more reliable performance analysis data. View the documentation →

Infrastructure and Configuration

Model Name Virtualization

Added a new modelNameOverride field in the backendRef of AIGatewayRoute, enabling flexible model name abstraction across different providers. This allows unified model naming for downstream applications while routing to provider-specific model names, supporting both multi-provider scenarios and fallback configurations. View the documentation →
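For instance, a route could expose one public model name while a fallback backend receives its provider-specific name (all names here are illustrative):

```yaml
# Sketch: clients request "gpt-4o"; the second backend sees a Gemini model name.
apiVersion: aigateway.envoyproxy.io/v1alpha1
kind: AIGatewayRoute
metadata:
  name: unified-model-route
spec:
  parentRefs:
    - name: ai-gateway
  rules:
    - matches:
        - headers:
            - type: Exact
              name: x-ai-eg-model
              value: gpt-4o
      backendRefs:
        - name: openai-backend              # receives "gpt-4o" unchanged
        - name: vertex-gemini
          modelNameOverride: gemini-1.5-pro # rewritten before reaching the provider
```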

Unified Gateway Support

Enhanced Gateway resource management by allowing both standard HTTPRoute and AIGatewayRoute to be attached to the same Gateway object. This provides a unified routing configuration that supports both AI and non-AI traffic within a single gateway infrastructure, simplifying deployment and management.
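In practice this means both route kinds can share one parent Gateway, sketched below with illustrative names and route rules elided:

```yaml
# Sketch: AI and non-AI routes attached to the same Gateway object.
apiVersion: gateway.networking.k8s.io/v1
kind: HTTPRoute
metadata:
  name: web-traffic
spec:
  parentRefs:
    - name: shared-gateway      # plain HTTP traffic
  rules: []                     # route rules elided
---
apiVersion: aigateway.envoyproxy.io/v1alpha1
kind: AIGatewayRoute
metadata:
  name: ai-traffic
spec:
  parentRefs:
    - name: shared-gateway      # same Gateway object
  rules: []                     # route rules elided
```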

🔗 API Updates

  • BackendSecurityPolicy TargetRefs: Added targetRefs field to BackendSecurityPolicy spec, enabling direct targeting of AIServiceBackend resources using Gateway API policy attachment patterns.
  • Gateway API Inference Extension: The InferencePool resource from Gateway API Inference Extension v0.5.1 can now be specified as a backend reference in AIGatewayRoute, enabling intelligent endpoint selection.
  • modelNameOverride in the backend reference of AIGatewayRoute: Added a modelNameOverride field to the backend reference of AIGatewayRoute, allowing flexible model name rewriting for routing purposes.
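The new targetRefs attachment can be sketched as follows (the APIKey credential fields shown are assumptions for illustration):

```yaml
# Sketch: a BackendSecurityPolicy targeting an AIServiceBackend directly.
apiVersion: aigateway.envoyproxy.io/v1alpha1
kind: BackendSecurityPolicy
metadata:
  name: openai-credentials
spec:
  targetRefs:
    - name: openai-backend            # the AIServiceBackend to secure
      kind: AIServiceBackend
      group: aigateway.envoyproxy.io
  type: APIKey                        # auth fields below are illustrative
  apiKey:
    secretRef:
      name: openai-api-key
```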

Deprecations

  • backendSecurityPolicyRef Pattern: The old pattern of AIServiceBackend referencing BackendSecurityPolicy is deprecated in favor of the new targetRefs approach. Existing configurations will continue to work but should be migrated before v0.4.
  • AIGatewayRoute's targetRefs Pattern: The targetRefs pattern is deprecated for AIGatewayRoute. Existing configurations will continue to work but should be migrated to parentRefs before v0.4.
  • AIGatewayRoute's schema Field: The schema field is no longer needed for AIGatewayRoute. Existing configurations will continue to work but should be removed before v0.4.
  • controller.envoyGatewayNamespace helm value is no longer necessary: This value is no longer needed; setting it is redundant.
  • controller.podEnv helm value will be removed: Use controller.extraEnvVars instead. The controller.podEnv value will be removed in v0.4.

📖 Upgrade Guidance

For users upgrading from v0.2.x to v0.3.0:

  1. Upgrade Envoy Gateway to v1.5.0 - Ensure you are using Envoy Gateway v1.5.0 or later, as this is required for compatibility with the new AI Gateway features.
  2. Update Envoy Gateway config - Update your Envoy Gateway configuration to include the new settings as below. The full manifest is available in the manifests/envoy-gateway-config/config.yaml file as per the getting started guide.
    ```diff
    --- a/manifests/envoy-gateway-config/config.yaml
    +++ b/manifests/envoy-gateway-config/config.yaml
    @@ -43,9 +43,19 @@ data:
           extensionManager:
             hooks:
               xdsTranslator:
    +            translation:
    +              listener:
    +                includeAll: true
    +              route:
    +                includeAll: true
    +              cluster:
    +                includeAll: true
    +              secret:
    +                includeAll: true
                 post:
    -            - VirtualHost
    -            - Translation
    +            - Cluster
    +            - Route
    ```
  3. Migrate Gateway target references - Update from the deprecated AIGatewayRoute.targetRefs pattern to the new AIGatewayRoute.parentRefs approach after the upgrade to v0.3.0.
  4. Migrate backendSecurityPolicy references - Update from the deprecated AIServiceBackend.backendSecurityPolicyRef pattern to the new BackendSecurityPolicy.targetRefs approach after the upgrade to v0.3.0.
  5. Remove AIGatewayRoute.schema - remove the schema field from AIGatewayRoute resources after the upgrade to v0.3.0, as it is no longer used.
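Steps 3 and 4 amount to moving the same references to new fields; for example, the route migration in step 3 looks roughly like this (gateway name illustrative):

```yaml
# Before (deprecated): AIGatewayRoute attaching to the Gateway via targetRefs
spec:
  targetRefs:
    - name: ai-gateway
      kind: Gateway
      group: gateway.networking.k8s.io

# After: the same attachment expressed via parentRefs
spec:
  parentRefs:
    - name: ai-gateway
      kind: Gateway
      group: gateway.networking.k8s.io
```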

📦 Dependencies Versions

Go 1.24.6

Updated to latest Go version for improved performance and security.

Envoy Gateway v1.5

Built on Envoy Gateway for proven data plane capabilities.

Envoy v1.35

Leveraging Envoy Proxy's battle-tested networking capabilities.

Gateway API v1.3.1

Support for latest Gateway API specifications.

Gateway API Inference Extension v0.5.1

Integration with Gateway API Inference Extension for intelligent endpoint selection.

🙏 Acknowledgements

This release represents the collaborative effort of our growing community. Special thanks to contributors from Tetrate, Bloomberg, Tencent, Google, Nutanix and our independent contributors who made this release possible through their code contributions, testing, feedback, and community participation.

The Endpoint Picker Provider integration represents a significant milestone in making AI inference routing more intelligent and efficient. We appreciate all the feedback and testing from the community that helped shape this feature.

🔮 What's Next (beyond v0.3)

We're already working on exciting features for future releases:

  • Support for MCP Protocol - Handle routing for MCP requests on the Envoy AI Gateway
  • Restore support for referencing Kubernetes Services in AIServiceBackend to enable seamless integration with Kubernetes-native backends
  • What else do you want to see? Get involved and open an issue and let us know!