How to securely connect ADK agents to models on Cloud Run


How to Securely Connect ADK Agents to Models on Cloud Run - minherz: another techno-blog

The Agent Development Kit (ADK) simplifies authentication for agents and tools, but authentication is more challenging when the LiteLLM connector is used to access models hosted on Cloud Run. This guide explores how to acquire Google-signed OpenID Connect (ID) tokens and inject them into the LiteLLM communication channel using ADK.

Google Cloud Run provides a robust, built-in access control mechanism based on enforced authentication and IAM policies. When it is enabled, Cloud Run accepts only calls made by authenticated accounts that have the Cloud Run Invoker role (roles/run.invoker), which protects your service from unauthorized invocations.

To make an authenticated call, you have to complete the following steps:

Acquire credentials either by implementing a sign-in process or using Application Default Credentials

Fetch an ID token from a user or workload identity

Use the ID token as a bearer token in the HTTP Authorization header when you call the Cloud Run endpoint
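The three steps above can be sketched in a few lines of Python. This is a minimal sketch, assuming the google-auth package is installed and that Application Default Credentials resolve to a service account or workload identity. The function names (fetch_token_for, build_request, call_service) are illustrative and not part of any library.

```python
import urllib.request


def fetch_token_for(audience: str) -> str:
    """Steps 1 and 2: resolve credentials and fetch an ID token for `audience`."""
    # Imported lazily so the header-building helper works without google-auth.
    import google.auth.transport.requests
    from google.oauth2 import id_token

    auth_req = google.auth.transport.requests.Request()
    # Assumption: ADC points at a service account or the metadata server.
    return id_token.fetch_id_token(auth_req, audience)


def build_request(url: str, token: str) -> urllib.request.Request:
    """Step 3: attach the ID token as a bearer token."""
    return urllib.request.Request(url, headers={"Authorization": f"Bearer {token}"})


def call_service(url: str, audience: str) -> bytes:
    """Call a Cloud Run endpoint with a freshly fetched ID token."""
    token = fetch_token_for(audience)
    with urllib.request.urlopen(build_request(url, token), timeout=30) as resp:
        return resp.read()
```

The audience is normally the base URL of the Cloud Run service, while the request URL may include a path below it.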

Agent Development Kit (ADK) greatly simplifies the implementation of these steps. When your agent calls another agent or an MCP (Model Context Protocol) server that runs on Cloud Run, the framework handles discovering the Application Default Credentials, fetching the token, and injecting the token into each call between agents and each call from an agent to a remote MCP tool. The framework also refreshes the token when the current one expires. As a developer, you don’t need to implement anything beyond configuring your remote MCP server or agent objects.

The situation is different when you configure an ADK agent to use a model that is deployed in Cloud Run. ADK provides a LiteLLM connector that allows an agent to use non-Gemini models hosted at remote endpoints. However, you will need to take care of making authenticated calls to the model yourself. How can you do that?

Method 1: Static header

The LiteLLM connector uses the litellm Python package to call remote endpoints that expose Ollama, vLLM and other LLM engines. The package supports passing custom HTTP headers in calls to LiteLLM APIs like acompletion through the extra_headers parameter, which takes a mapping of header names to values.

```python
from google.adk.agents import Agent
from google.adk.models import LiteLlm
import google.auth
import google.auth.transport.requests
from google.oauth2 import id_token

# Obtain a Google-signed ID token for the given audience (Cloud Run service URL).
aud = "https://model-123456789012.us-central1.run.app"
creds, _ = google.auth.default()
auth_req = google.auth.transport.requests.Request()
creds.refresh(auth_req)
if hasattr(creds, "id_token") and creds.id_token:
    # User credentials (local development) carry an ID token after refresh.
    token = creds.id_token
else:
    # Service account credentials: fetch an ID token for the audience.
    token = id_token.fetch_id_token(auth_req, aud)

# Set up the model with the token as a static Authorization header.
model = LiteLlm(
    model="ollama_chat/gemma3:270m",
    api_base=aud,
    extra_headers={
        "Authorization": f"Bearer {token}",
    },
)

agent = Agent(
    name="content_builder",
    model=model,
    instruction="agent system instructions",
)
```

This approach works as long as the token is valid. Once the fetched token expires, calls to the model fail with the "HTTP 401 Unauthorized" status code because Cloud Run rejects calls made with an expired token. This method fits agents that are deployed on Cloud Run with a scale-to-zero configuration and are invoked infrequently. In this deployment pattern the agent is restarted often, and each restart fetches a new authentication token.
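To judge whether a static token will outlive an agent instance, it can help to look at the token's remaining lifetime. The following is a minimal diagnostic sketch that reads the exp claim of a JWT using only the standard library. The helper name token_seconds_left is hypothetical, and the function deliberately does not verify the signature, so it should not be used for any trust decision.

```python
import base64
import json
import time
from typing import Optional


def token_seconds_left(token: str, now: Optional[float] = None) -> float:
    """Return the seconds until the JWT's `exp` claim (signature NOT verified)."""
    # A JWT has three dot-separated segments: header, payload, signature.
    payload_b64 = token.split(".")[1]
    # Segments are base64url without padding; restore padding before decoding.
    payload_b64 += "=" * (-len(payload_b64) % 4)
    claims = json.loads(base64.urlsafe_b64decode(payload_b64))
    current = time.time() if now is None else now
    return claims["exp"] - current
```

A negative return value means the token has already expired and a call that carries it would be rejected.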

Method 2: Dynamic token injection

When the agent is expected to run continuously, the token has to be refreshed upon receiving an error. To implement this, you would need to extend the LiteLLMClient class of ADK:

```python
import os
from typing import Any, Optional, Union
from fastapi import HTTPException
from google.adk.models.lite_llm import LiteLLMClient
import google.auth
import google.auth.transport.requests
from google.oauth2 import id_token
from litellm.exceptions import AuthenticationError
from litellm import CustomStreamWrapper, ModelResponse

_creds, _ = google.auth.default()


def _get_auth_token(aud: str) -> Optional[str]:
    """
    Obtains a Google-signed ID token for the given audience (Cloud Run service URL).
    """
    try:
        auth_req = google.auth.transport.requests.Request()
        # support using user credentials for local development
        _creds.refresh(auth_req)
        if hasattr(_creds, "id_token") and _creds.id_token:
            return _creds.id_token
        # fetch token for service account credentials
        fetched_id_token = id_token.fetch_id_token(auth_req, aud)
        return fetched_id_token
    except Exception as e:
        print(f"Error obtaining ID token for {aud}: {e}")
        return None


class LiteLLMClientEx(LiteLLMClient):
    """
    Overrides the LiteLLMClient to inject a bearer token into the request headers.
    """

    def __init__(self, audience: str, **data: Any) -> None:
        self.token: Optional[str] = None
        self.aud = audience
        super().__init__(**data)
```

async def...
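The refresh-on-error idea behind this method can also be shown in isolation from ADK. The following is a minimal sketch and not the LiteLLMClientEx implementation: AuthError is a stand-in for the client's authentication error (such as litellm's AuthenticationError), and TokenRefresher, fetch_token and send are hypothetical names introduced for illustration.

```python
from typing import Awaitable, Callable, Dict, Optional


class AuthError(Exception):
    """Stand-in for an authentication failure such as an HTTP 401."""


class TokenRefresher:
    """Caches a bearer token and refreshes it once when a call is rejected."""

    def __init__(self, fetch_token: Callable[[], str]) -> None:
        self._fetch_token = fetch_token
        self._token: Optional[str] = None

    def headers(self, force_refresh: bool = False) -> Dict[str, str]:
        # Fetch lazily on first use, or again when the caller forces a refresh.
        if force_refresh or self._token is None:
            self._token = self._fetch_token()
        return {"Authorization": f"Bearer {self._token}"}

    async def call(self, send: Callable[[Dict[str, str]], Awaitable[str]]) -> str:
        try:
            return await send(self.headers())
        except AuthError:
            # The token has likely expired: fetch a new one and retry exactly once.
            return await send(self.headers(force_refresh=True))
```

Retrying only once keeps a persistent failure, such as a missing Cloud Run Invoker role, from turning into an endless loop; the second error propagates to the caller.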
