Envoy AI Gateway v0.3.x
v0.3.0
✨ New Features
Endpoint Picker Provider (EPP) Integration
Complete integration with Gateway API Inference Extension v0.5.1, enabling intelligent endpoint selection based on real-time AI inference metrics like KV-cache usage, queue depth, and LoRA adapter information.
Support for both HTTPRoute + InferencePool and AIGatewayRoute + InferencePool integration patterns, providing flexibility for different use cases, from simple to advanced AI routing scenarios.
Intelligent routing that automatically selects the optimal inference endpoint for each request, optimizing resource utilization across your entire inference infrastructure with real-time performance metrics.
Support for custom endpoint picker providers, allowing implementation of domain-specific routing logic tailored to unique AI workload requirements.
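To make the second pattern concrete, here is a minimal, hedged sketch of an AIGatewayRoute that sends one model to an InferencePool. The Gateway name (ai-gateway), pool name (llama-pool), and model value are placeholders, and field names should be checked against the v0.3 API reference:

apiVersion: aigateway.envoyproxy.io/v1alpha1
kind: AIGatewayRoute
metadata:
  name: llama-route
spec:
  parentRefs:
    - name: ai-gateway                          # existing Gateway (placeholder name)
  rules:
    - matches:
        - headers:
            - type: Exact
              name: x-ai-eg-model               # model name extracted from the request body
              value: meta-llama/Llama-3.1-8B-Instruct
      backendRefs:
        - group: inference.networking.x-k8s.io  # Gateway API Inference Extension group
          kind: InferencePool
          name: llama-pool                      # existing InferencePool (placeholder name)

The endpoint picker then selects among the pool's endpoints using the live metrics mentioned above, such as KV-cache usage and queue depth.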
Expanded Provider Ecosystem
Google Vertex AI has moved from work-in-progress to full production support, including complete streaming support for Gemini models with OpenAI API compatibility. View all supported providers →
Complete Anthropic Claude integration via GCP Vertex AI, moving from experimental to production-ready status with multi-tool support and configurable API versions for enterprise deployments.
Improved request/response translation for Gemini models with support for tools, response format specification, and advanced conversation handling, making Gemini integration more robust and feature-complete.
Enhanced support for the broader OpenAI-compatible provider ecosystem including Groq, Together AI, Mistral, Cohere, DeepSeek, SambaNova, and more, ensuring seamless integration across the AI provider landscape.
Observability Enhancements
Added comprehensive OpenInference distributed tracing with OpenTelemetry integration, providing detailed request tracing and performance monitoring for LLM operations. Includes full chat completion request/response data capture, timing information, and compatibility with evaluation systems like Arize Phoenix. View the documentation →
Added support for configuring additional metrics labels corresponding to HTTP request headers. This enables custom labeling of metrics based on specific request headers like user identifiers, API versions, or application contexts, providing more granular monitoring and filtering capabilities.
Extended GenAI metrics support to include embeddings operations, providing comprehensive token usage tracking and performance monitoring for both chat completion and embeddings API endpoints with consistent OpenTelemetry semantic conventions.
Improved AI-specific metrics implementation with better error handling, enhanced attribute mapping, and more accurate token latency measurements. Maintains full compatibility with OpenTelemetry Gen AI semantic conventions while providing more reliable performance analysis data. View the documentation →
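Tying the tracing items above together, a rough illustration of how the trace exporter is commonly wired: the OpenTelemetry SDK reads the standard OTEL_* environment variables, which can point at an OTLP collector such as Arize Phoenix. This is a sketch only; the variable placement (for example, on the AI Gateway's external processor container) and the collector endpoint are deployment-specific assumptions, and the linked documentation is authoritative:

# Hypothetical placement: environment variables on the container that emits the spans.
env:
  - name: OTEL_SERVICE_NAME                # standard OpenTelemetry SDK variable
    value: envoy-ai-gateway
  - name: OTEL_EXPORTER_OTLP_ENDPOINT      # OTLP collector, e.g. an Arize Phoenix endpoint
    value: "http://phoenix.observability.svc.cluster.local:4317"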
Infrastructure and Configuration
Added a new modelNameOverride field in the backendRef of AIGatewayRoute, enabling flexible model name abstraction across different providers. This allows unified model naming for downstream applications while routing to provider-specific model names, supporting both multi-provider scenarios and fallback configurations. View the documentation →
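As a sketch of how this can look in practice (placement and names are illustrative; see the linked documentation for the exact schema), applications request a single unified model name while each backend reference rewrites it to the provider-specific one:

rules:
  - matches:
      - headers:
          - type: Exact
            name: x-ai-eg-model
            value: my-unified-model              # the name applications request (placeholder)
    backendRefs:
      - name: openai-backend                     # AIServiceBackend (placeholder name)
        modelNameOverride: gpt-4o-mini           # name actually sent to this provider
      - name: bedrock-backend                    # fallback backend (placeholder name)
        modelNameOverride: "anthropic.claude-3-5-sonnet-20240620-v1:0"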
Enhanced Gateway resource management by allowing both standard HTTPRoute and AIGatewayRoute to be attached to the same Gateway object. This provides a unified routing configuration that supports both AI and non-AI traffic within a single gateway infrastructure, simplifying deployment and management.
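For example, a standard HTTPRoute and an AIGatewayRoute can now both reference the same Gateway through their parent references; all resource names below are placeholders:

apiVersion: gateway.networking.k8s.io/v1
kind: HTTPRoute
metadata:
  name: web-app
spec:
  parentRefs:
    - name: shared-gateway        # the same Gateway serves ordinary HTTP traffic...
  rules:
    - backendRefs:
        - name: web-app-svc
          port: 8080
---
apiVersion: aigateway.envoyproxy.io/v1alpha1
kind: AIGatewayRoute
metadata:
  name: llm-traffic
spec:
  parentRefs:
    - name: shared-gateway        # ...and AI traffic through the same listeners
  rules:
    - backendRefs:
        - name: openai-backend    # AIServiceBackend (placeholder name)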
🔗 API Updates
- BackendSecurityPolicy targetRefs: Added a targetRefs field to the BackendSecurityPolicy spec, enabling direct targeting of AIServiceBackend resources using Gateway API policy attachment patterns.
- Gateway API Inference Extension: Allows an InferencePool resource from Gateway API Inference Extension v0.5.1 to be specified as a backend ref in AIGatewayRoute for intelligent endpoint selection.
- modelNameOverride in the backend reference of AIGatewayRoute: Added a modelNameOverride field in the backend reference of AIGatewayRoute, allowing flexible model name rewrites for routing purposes.
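A minimal sketch of the first item, assuming an existing AIServiceBackend named openai-backend and a Secret holding the provider API key (both placeholders); the policy now targets the backend directly instead of the backend referencing the policy, and field names should be verified against the v0.3 API reference:

apiVersion: aigateway.envoyproxy.io/v1alpha1
kind: BackendSecurityPolicy
metadata:
  name: openai-api-key
spec:
  targetRefs:
    - group: aigateway.envoyproxy.io
      kind: AIServiceBackend
      name: openai-backend              # backend this policy secures (placeholder name)
  type: APIKey
  apiKey:
    secretRef:
      name: openai-api-key-secret       # Secret holding the provider API key (placeholder)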
Deprecations
- backendSecurityPolicyRef pattern: The old pattern of AIServiceBackend referencing BackendSecurityPolicy is deprecated in favor of the new targetRefs approach. Existing configurations will continue to work but should be migrated before v0.4.
- AIGatewayRoute's targetRefs pattern: The targetRefs pattern is no longer supported for AIGatewayRoute. Existing configurations will continue to work but should be migrated to parentRefs.
- AIGatewayRoute's schema field: The schema field is no longer needed for AIGatewayRoute. Existing configurations will continue to work but should be removed before v0.4.
- controller.envoyGatewayNamespace helm value: This value is no longer necessary and is redundant when configured.
- controller.podEnv helm value will be removed: Use controller.extraEnvVars instead. The controller.podEnv value will be removed in v0.4.
📖 Upgrade Guidance
For users upgrading from v0.2.x to v0.3.0:
- Upgrade Envoy Gateway to v1.5.0 - Ensure you are using Envoy Gateway v1.5.0 or later, as this is required for compatibility with the new AI Gateway features.
- Update Envoy Gateway config - Update your Envoy Gateway configuration to include the new settings shown below. The full manifest is available in the manifests/envoy-gateway-config/config.yaml file, as per the getting started guide.
--- a/manifests/envoy-gateway-config/config.yaml
+++ b/manifests/envoy-gateway-config/config.yaml
@@ -43,9 +43,19 @@ data:
     extensionManager:
       hooks:
         xdsTranslator:
+          translation:
+            listener:
+              includeAll: true
+            route:
+              includeAll: true
+            cluster:
+              includeAll: true
+            secret:
+              includeAll: true
           post:
-          - VirtualHost
           - Translation
+          - Cluster
+          - Route
- Migrate Gateway target references - Update from the deprecated AIGatewayRoute.targetRefs pattern to the new AIGatewayRoute.parentRefs approach after the upgrade to v0.3.0 (a before/after sketch follows this list).
- Migrate backendSecurityPolicy references - Update from the deprecated AIServiceBackend.backendSecurityPolicyRef pattern to the new BackendSecurityPolicy.targetRefs approach after the upgrade to v0.3.0.
- Remove AIGatewayRoute.schema - Remove the schema field from AIGatewayRoute resources after the upgrade to v0.3.0, as it is no longer used.
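As a rough before/after sketch of the route migration (excerpts only; the Gateway name ai-gateway is a placeholder and the full field set is in the API reference):

# Before (v0.2.x, now deprecated): AIGatewayRoute attached to the Gateway via targetRefs
spec:
  targetRefs:
    - group: gateway.networking.k8s.io
      kind: Gateway
      name: ai-gateway

# After (v0.3.0): attached via parentRefs, matching standard HTTPRoute conventions
spec:
  parentRefs:
    - name: ai-gateway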
📦 Dependencies Versions
Updated to latest Go version for improved performance and security.
Built on Envoy Gateway v1.5.0 for proven data plane capabilities.
Leveraging Envoy Proxy's battle-tested networking capabilities.
Support for latest Gateway API specifications.
Integration with Gateway API Inference Extension v0.5.1 for intelligent endpoint selection.
⏩ Patch Releases
🙏 Acknowledgements
This release represents the collaborative effort of our growing community. Special thanks to contributors from Tetrate, Bloomberg, Tencent, Google, Nutanix and our independent contributors who made this release possible through their code contributions, testing, feedback, and community participation.
The Endpoint Picker Provider integration represents a significant milestone in making AI inference routing more intelligent and efficient. We appreciate all the feedback and testing from the community that helped shape this feature.
🔮 What's Next (beyond v0.3)
We're already working on exciting features for future releases:
- Support for MCP Protocol - Handle routing for MCP requests on the Envoy AI Gateway
- Restore support for referencing Kubernetes Services in AIServiceBackend to enable seamless integration with Kubernetes-native backends
- What else do you want to see? Get involved: open an issue and let us know!