File Search in the Responses API

LiteLLM now supports file_search in the Responses API across both:

- providers that support it natively (like OpenAI / Azure), and
- providers that do not (like Anthropic, Bedrock, and other non-native providers) via emulation.

What this is

file_search lets models retrieve grounded context from your vector stores and answer with citations. LiteLLM keeps one OpenAI-compatible output shape while routing requests through either native passthrough or an emulated fallback.

Two paths are covered:

| Path | When it runs | What LiteLLM does |
|---|---|---|
| Native passthrough | Provider natively supports `file_search` (OpenAI, Azure) | Decodes unified vector store ID → forwards to provider as-is |
| Emulated fallback | Provider doesn't support `file_search` (Anthropic, Bedrock, etc.) | Converts to a function tool → intercepts tool call → runs vector search → synthesizes OpenAI-format output |

In `tools[].vector_store_ids`, LiteLLM accepts both provider-native IDs (e.g. `vs_...`) and managed vector store unified IDs (URL-safe base64 strings from the proxy managed-vector flow), e.g. `litellm.responses(..., tools=[{"type": "file_search", "vector_store_ids": ["bGl0ZWxsbV9wcm94eT..."]}])`.
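The difference between the two ID forms can be sketched in a few lines of Python. This is a caller-side heuristic only — the `looks_like_unified_id` helper, the prefix check, and the illustrative payload are this sketch's own, not LiteLLM's internal decoding logic:

```python
import base64
import re

# URL-safe base64 alphabet with optional '=' padding
_B64URL = re.compile(r"^[A-Za-z0-9_-]+={0,2}$")

def looks_like_unified_id(vector_store_id: str) -> bool:
    """Heuristic: 'vs_...' IDs are provider-native (OpenAI/Azure);
    LiteLLM-managed unified IDs are URL-safe base64 strings."""
    if vector_store_id.startswith("vs_"):
        return False  # provider-native ID
    return bool(_B64URL.match(vector_store_id))

# A unified ID, built here from an illustrative payload -- the real
# encoded contents are produced by the proxy's managed-vector flow
unified = base64.urlsafe_b64encode(b"litellm_proxy/example").decode()
print(looks_like_unified_id(unified))      # True
print(looks_like_unified_id("vs_abc123"))  # False
```

Either form can appear in the same `vector_store_ids` list; LiteLLM resolves each before routing.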

Usage

1. Setup config.yaml

config.yaml
```yaml
model_list:
  - model_name: gpt-4.1
    litellm_params:
      model: openai/gpt-4.1
      api_key: os.environ/OPENAI_API_KEY

  - model_name: claude-sonnet
    litellm_params:
      model: anthropic/claude-sonnet-4-5
      api_key: os.environ/ANTHROPIC_API_KEY
```

2. Start the proxy

```shell
litellm --config config.yaml
```

Proxy call
```python
from openai import OpenAI

client = OpenAI(base_url="http://localhost:4000", api_key="sk-your-proxy-key")

response = client.responses.create(
    model="claude-sonnet",  # swap to "gpt-4.1" for the native path
    input="What does LiteLLM support?",
    tools=[{
        "type": "file_search",
        "vector_store_ids": ["vs_abc123"]
    }],
    include=["file_search_call.results"],
)

print(response.output)
```

Behavior Matrix

| Path | SDK model | Proxy model | Behavior |
|---|---|---|---|
| Native passthrough | openai/gpt-4.1 | gpt-4.1 | Provider executes native `file_search` |
| Emulated fallback | anthropic/claude-sonnet-4-5 | claude-sonnet | LiteLLM converts to a function tool and synthesizes OpenAI-format output |

Architecture Diagram

Prerequisites

```shell
pip install 'litellm[proxy]'
export OPENAI_API_KEY="sk-..."        # for native path
export ANTHROPIC_API_KEY="sk-ant-..." # for emulated path
```

Validating the Output Format

Regardless of which path ran, the response always follows the OpenAI Responses API format. Example response shape:

```json
{
  "output": [
    {
      "type": "file_search_call",
      "id": "fs_abc123",
      "status": "completed",
      "queries": ["What does LiteLLM support?"],
      "search_results": null
    },
    {
      "type": "message",
      "role": "assistant",
      "content": [
        {
          "type": "output_text",
          "text": "LiteLLM is a unified interface...",
          "annotations": [
            {
              "type": "file_citation",
              "index": 150,
              "file_id": "file-xxxx",
              "filename": "knowledge.txt"
            }
          ]
        }
      ]
    }
  ]
}
```

Validation script:

Validate response structure
```python
def validate_file_search_response(response):
    """Assert that the response follows the OpenAI file_search output format."""
    output = response.output
    assert len(output) >= 2, "Expected at least 2 output items"

    # First item: file_search_call
    fs_call = output[0]
    fs_type = fs_call["type"] if isinstance(fs_call, dict) else fs_call.type
    assert fs_type == "file_search_call", f"Expected file_search_call, got {fs_type}"

    fs_status = fs_call["status"] if isinstance(fs_call, dict) else fs_call.status
    assert fs_status == "completed"

    # Second item: assistant message
    msg = output[1]
    msg_type = msg["type"] if isinstance(msg, dict) else msg.type
    assert msg_type == "message"

    content = msg["content"] if isinstance(msg, dict) else msg.content
    assert len(content) > 0
    text_block = content[0]
    text = text_block["text"] if isinstance(text_block, dict) else text_block.text
    assert isinstance(text, str) and len(text) > 0

    print("✅ Response structure valid")
    print(f"   Queries: {fs_call['queries'] if isinstance(fs_call, dict) else fs_call.queries}")
    print(f"   Answer length: {len(text)} chars")
    annotations = text_block["annotations"] if isinstance(text_block, dict) else text_block.annotations
    print(f"   Citations: {len(annotations)}")

validate_file_search_response(response)
```

Q&A

- Why do I see `UnsupportedParamsError`? This usually means `file_search` was passed to a provider that does not support it natively and emulation could not route correctly. Check:
  - The model string is valid (for example, `anthropic/claude-sonnet-4-5`).
  - `custom_llm_provider` resolves correctly so LiteLLM can load the provider config.
- Why does vector search return no results? Common causes:
  - The vector store ID is wrong or has no files attached.
  - For LiteLLM-managed stores, file ingestion is not complete (`status != completed`).
  - The query is too narrow; try a broader query.
- Why am I getting `403 Access denied` on vector store calls? The caller does not have access to that vector store.
  - The store may belong to another team.
  - Use an admin/proxy key if your setup requires cross-team access.
- Why are annotations empty in emulated mode? `file_citation` annotations require `file_id` metadata in search results. If your vector backend does not return file-level metadata, the answer text is still generated but citations can be empty.
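Because citations can legitimately be empty in emulated mode, downstream code should not assume annotations are present. A defensive sketch (the `extract_citations` helper is this example's own; it mirrors the dict-or-object access pattern from the validation script above):

```python
def extract_citations(response):
    """Collect file_citation annotations from an OpenAI-format Responses output.

    Returns a list of (filename, file_id) pairs; empty when the vector
    backend returned no file-level metadata (common in emulated mode).
    """
    citations = []
    output = response["output"] if isinstance(response, dict) else response.output
    for item in output:
        item_type = item["type"] if isinstance(item, dict) else item.type
        if item_type != "message":
            continue
        content = item["content"] if isinstance(item, dict) else item.content
        for block in content:
            # annotations may be missing or None -- treat both as "no citations"
            annotations = (block.get("annotations") if isinstance(block, dict)
                           else getattr(block, "annotations", None)) or []
            for ann in annotations:
                ann_type = ann["type"] if isinstance(ann, dict) else ann.type
                if ann_type == "file_citation":
                    filename = ann["filename"] if isinstance(ann, dict) else ann.filename
                    file_id = ann["file_id"] if isinstance(ann, dict) else ann.file_id
                    citations.append((filename, file_id))
    return citations

# Works with the example response shape shown earlier
sample = {"output": [
    {"type": "file_search_call", "id": "fs_abc123", "status": "completed"},
    {"type": "message", "role": "assistant", "content": [
        {"type": "output_text", "text": "LiteLLM is a unified interface...",
         "annotations": [{"type": "file_citation", "index": 150,
                          "file_id": "file-xxxx", "filename": "knowledge.txt"}]},
    ]},
]}
print(extract_citations(sample))  # [('knowledge.txt', 'file-xxxx')]
```

The same function returns `[]` rather than raising when a backend omits annotations entirely.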

What to check next