SelfReason AI Gateway

Build once, then route each request through the best execution path for the job: SelfReason on-device models for privacy and speed, cloud models for larger context or deeper reasoning, and automatic fallbacks when reliability matters more than provider loyalty.

Why this gateway exists

  • One OpenAI-compatible surface instead of separate local and cloud integrations
  • Local-first routing so private and latency-sensitive requests stay close to the user
  • Proven in our own client apps before it reaches yours
  • Shared sessions, prompts, tools, and schemas across Web, Desktop, Android, extensions, and bots
  • Fallback policies so one model outage does not become your product outage
  • A cleaner path for products already built around SelfReason workflows

What you get

Flexible routing

  • local: force SelfReason's on-device runtime
  • auto: try local first, then escalate only when limits or quality thresholds require it
  • cloud: go directly to hosted models for the largest context windows or strongest reasoning
  • fallback: define backup models or providers for reliability-sensitive flows
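
As a rough mental model, the local-first auto policy can be sketched in plain Python. The threshold, field names, and route strings below are illustrative only, not the gateway's actual configuration:

```python
# Illustrative sketch of a local-first "auto" routing decision.
# LOCAL_CONTEXT_LIMIT is an assumed on-device token budget, not a real default.
LOCAL_CONTEXT_LIMIT = 8_000

def choose_route(route: str, prompt_tokens: int, privacy_sensitive: bool) -> str:
    if route in ('local', 'cloud'):
        return route                        # explicit modes are honored as-is
    if privacy_sensitive:
        return 'local'                      # keep private work on-device
    if prompt_tokens > LOCAL_CONTEXT_LIMIT:
        return 'cloud'                      # escalate only when limits require it
    return 'local'                          # default: try local first

print(choose_route('auto', 2_000, False))   # local
print(choose_route('auto', 50_000, False))  # cloud
```

The point of the sketch is the ordering: explicit choices win, privacy pins work locally, and escalation happens only when a limit forces it.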

One contract across execution modes

  • OpenAI-compatible chat completion patterns
  • Real-time streaming for chat, summarization, and agent UX
  • Tool calling for workflow actions, database lookups, and API orchestration
  • Structured outputs for typed JSON responses
  • Shared observability for latency, safety actions, and route decisions
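
In practice, structured outputs mean constraining the model to a JSON Schema and parsing the result into typed application data. A minimal, SDK-agnostic sketch; the schema shape and field names are invented for illustration:

```python
import json

# Hypothetical JSON Schema for a typed ticket-summary response.
ticket_summary_schema = {
    "type": "object",
    "properties": {
        "summary": {"type": "string"},
        "next_steps": {"type": "array", "items": {"type": "string"}},
    },
    "required": ["summary", "next_steps"],
}

# A conforming response can be parsed directly into app types.
raw = '{"summary": "User cannot log in.", "next_steps": ["Reset password"]}'
parsed = json.loads(raw)

# Because required fields are guaranteed by the schema, access is safe.
assert set(ticket_summary_schema["required"]) <= parsed.keys()
print(parsed["summary"])  # User cannot log in.
```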

A practical integration shape

The gateway is designed to feel familiar if you already use modern AI SDK patterns. The exact base URL, models, and policy fields depend on your deployment, but the request pattern stays consistent:

from selfreason import AIClient

client = AIClient(
	api_key='YOUR_SELFREASON_KEY',
	base_url='https://YOUR_GATEWAY_BASE/v1'
)

stream = client.responses.stream(
	# 'auto' asks the gateway to pick the model for the chosen route
	model='auto',
	messages=[
		{'role': 'system', 'content': 'You are a concise assistant.'},
		{'role': 'user', 'content': 'Summarize this ticket and suggest next steps.'},
	],
	# routing policy: local | auto | cloud (auto tries local first)
	route='auto',
	# ordered backups used when the primary route fails or degrades
	fallbacks=[
		{'model': 'cloud-balanced'}
	],
	# free-form tags surfaced in latency and route-decision observability
	metadata={
		'app': 'support-dashboard'
	}
)

for event in stream:
	if event.type == 'token':
		print(event.text, end='')  # render tokens as they stream in
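
Tool calling follows the same OpenAI-compatible shape: you declare functions as JSON-schema tools, and the model returns structured calls for your code to execute. A minimal, SDK-agnostic sketch; the tool, handler, and ticket id below are invented for illustration:

```python
import json

# Hypothetical tool definition in the OpenAI-compatible function-calling shape.
lookup_ticket_tool = {
    "type": "function",
    "function": {
        "name": "lookup_ticket",
        "description": "Fetch a support ticket by id.",
        "parameters": {
            "type": "object",
            "properties": {"ticket_id": {"type": "string"}},
            "required": ["ticket_id"],
        },
    },
}

def lookup_ticket(ticket_id: str) -> dict:
    # Stand-in for a real database lookup.
    return {"id": ticket_id, "status": "open"}

# Dispatch a model-issued tool call (arguments arrive as a JSON string).
tool_call = {"name": "lookup_ticket", "arguments": '{"ticket_id": "T-42"}'}
result = lookup_ticket(**json.loads(tool_call["arguments"]))
print(result["status"])  # open
```

Because the contract is the same across routes, the dispatch code does not change when a request moves between device and cloud.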

Why it differs from a generic LLM proxy

  • The gateway is designed around SelfReason's local runtime, not just cloud-to-cloud rerouting
  • You can keep one application contract even when requests move between device and cloud
  • Route decisions can factor prompt size, safety posture, device capability, energy state, and user quality overrides
  • It fits naturally with SelfReason sessions, prompts, and browser-native tooling instead of bolting on another isolated proxy layer

Pick the right path

  • Start here if you need routing, fallbacks, streaming, and a shared app contract
  • Read SelfReason on-device API if your main concern is on-device inference, offline behavior, and model warm-up strategy
  • Explore SelfReason if you want to see the local runtime in a product-facing form

Build with confidence

  • Keep privacy-sensitive work local by default
  • Escalate only when larger context or stronger reasoning is worth the tradeoff
  • Use fallbacks to turn provider outages into policy decisions instead of user-visible failures
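
The fallback idea above can be sketched SDK-agnostically: try the primary model, and on failure walk an ordered fallback list instead of surfacing the error. The model names, caller, and exception type here are illustrative:

```python
# Illustrative fallback walk: try each candidate in order until one succeeds.
def complete_with_fallbacks(candidates, call):
    last_error = None
    for model in candidates:
        try:
            return call(model)              # first success wins
        except RuntimeError as err:         # stand-in for provider/transport errors
            last_error = err                # remember it and try the next candidate
    raise last_error                        # every candidate failed: surface it

def flaky_call(model):
    # Pretend the local runtime is down so the request escalates.
    if model == 'local-fast':
        raise RuntimeError('local runtime unavailable')
    return f'answer from {model}'

print(complete_with_fallbacks(['local-fast', 'cloud-balanced'], flaky_call))
# answer from cloud-balanced
```

An outage only reaches the user when the entire candidate list is exhausted, which is the policy decision the bullet above describes.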

Need a capability or model that is not covered yet? Open a ticket and help shape the SelfReason AI Gateway roadmap.