SelfReason on-device API

For apps that want the lowest-latency path and the strongest privacy posture, the local API runs inference directly against SelfReason-compatible on-device models, with no network round trip. That makes it a strong fit for classification, rewriting, moderation, summarization, and lightweight agent tasks that should stay on the device.

What local execution is good at

  • Fast text classification and tagging
  • Draft rewriting for tone, clarity, and length
  • Summaries, highlights, and TL;DR generation
  • Language detection and lightweight translation flows
  • Safety prechecks, redaction, and policy hints before anything leaves the device
  • Offline or unstable-network workflows

How local execution works

  1. Download the model bundle during setup or idle time.
  2. Bind the runtime to browser or app APIs.
  3. Run inference locally with no extra network hop.
  4. Stream tokens or structured results back to the caller.
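For example, the four steps above might look like the following TypeScript sketch. The LocalRuntime and LocalModel interfaces, the summarizeLocally helper, and the "selfreason-small" identifier are illustrative assumptions, not the actual SelfReason API.

```ts
// Hypothetical shapes; the real SelfReason on-device API may differ.
interface LocalModel {
  generate(prompt: string): AsyncIterable<string>; // streams tokens as they are produced
}

interface LocalRuntime {
  isReady(modelId: string): Promise<boolean>;
  download(modelId: string): Promise<void>;  // step 1: fetch the model bundle
  load(modelId: string): Promise<LocalModel>; // step 2: bind the runtime to the app
}

// Steps 3 and 4: run inference locally and stream results back to the caller.
async function summarizeLocally(
  runtime: LocalRuntime,
  text: string,
  onToken: (token: string) => void,
): Promise<string> {
  const modelId = "selfreason-small"; // assumed model identifier
  if (!(await runtime.isReady(modelId))) {
    await runtime.download(modelId); // ideally scheduled during setup or idle time
  }
  const model = await runtime.load(modelId);

  let summary = "";
  for await (const token of model.generate(`Summarize:\n${text}`)) {
    summary += token;
    onToken(token); // no extra network hop; tokens arrive as they are generated
  }
  return summary;
}
```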

Local-first design principles

  • Privacy first: keep sensitive text on-device whenever possible
  • Fast response times: remove network latency from the common path
  • Progressive enhancement: degrade to lighter heuristics when a model is unavailable
  • Sandboxed execution: keep memory and capability boundaries tight
  • Explicit escalation: let the gateway decide when a request should leave the local path
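As one illustration of the progressive-enhancement principle, the sketch below prefers an on-device classifier and degrades to a keyword heuristic when no model is available. The LocalClassifier interface and the sentiment labels are assumptions made for the example.

```ts
// Hypothetical classifier interface; the real API may differ.
interface LocalClassifier {
  classify(text: string): Promise<"positive" | "negative" | "neutral">;
}

// Lightweight heuristic used when the on-device model is not available.
function heuristicSentiment(text: string): "positive" | "negative" | "neutral" {
  const lowered = text.toLowerCase();
  if (/\b(love|great|excellent|thanks)\b/.test(lowered)) return "positive";
  if (/\b(hate|terrible|awful|broken)\b/.test(lowered)) return "negative";
  return "neutral";
}

// Progressive enhancement: prefer the model, degrade to the heuristic.
async function classifySentiment(
  text: string,
  model: LocalClassifier | null,
): Promise<"positive" | "negative" | "neutral"> {
  if (model) {
    try {
      return await model.classify(text);
    } catch {
      // Model present but failing (e.g. still warming up): fall through.
    }
  }
  return heuristicSentiment(text);
}
```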

Where the local API fits best

Great matches

  • Intent classification before tool execution
  • UI rewriting, inline drafting, and response cleanup
  • Article, chat, and ticket summarization
  • Personal workflows that must work offline
  • Mobile or extension experiences where privacy is a product feature

Less ideal matches

  • Very large prompts that exceed the local context window
  • Heavy multi-step reasoning that benefits from larger cloud models
  • Tasks that require centralized audit trails or strict hosted governance

Deployment notes

  • Web: lazy-load models after first idle and cache with a versioned checksum
  • Desktop: ship a warm local runtime for lower cold-start cost
  • Android: deliver larger model assets as optional downloads and verify hashes before activation
  • Extension: persist model assets carefully and validate integrity after updates
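For the web case, a minimal sketch of idle-time loading with checksum verification might look like this. The model URL, cache name, and expected checksum are placeholders you would replace with your own bundle metadata.

```ts
// Lazy-load the bundle after first idle and verify a versioned SHA-256
// checksum before caching it. Values below are placeholders.
const MODEL_URL = "/models/selfreason-small-v3.bin";
const EXPECTED_SHA256 = "<published sha-256 for this bundle version>";

async function sha256Hex(data: ArrayBuffer): Promise<string> {
  const digest = await crypto.subtle.digest("SHA-256", data);
  return Array.from(new Uint8Array(digest))
    .map((b) => b.toString(16).padStart(2, "0"))
    .join("");
}

async function prefetchModelBundle(): Promise<void> {
  const response = await fetch(MODEL_URL);
  const bytes = await response.arrayBuffer();
  if ((await sha256Hex(bytes)) !== EXPECTED_SHA256) {
    throw new Error("Model bundle checksum mismatch; refusing to activate");
  }
  const cache = await caches.open("selfreason-models-v3");
  await cache.put(MODEL_URL, new Response(bytes));
}

// Defer the download until the browser is idle so it never blocks first paint.
requestIdleCallback(() => {
  prefetchModelBundle().catch((err) => console.warn("Model prefetch failed", err));
});
```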

Local observability

  • Token usage per request and per session
  • Warm start versus cold start latency
  • Guardrail trigger counts and categories
  • Route reason codes when a request leaves the local path
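A simple way to capture these signals is a per-request record that rolls up into session totals. The field names below are an assumed shape, not a SelfReason telemetry contract.

```ts
// Hypothetical shape for the local metrics listed above; adapt field names
// to whatever your telemetry pipeline expects.
interface LocalInferenceMetrics {
  requestId: string;
  promptTokens: number;
  completionTokens: number;
  coldStart: boolean;          // warm start vs. cold start
  latencyMs: number;
  guardrailTriggers: string[]; // e.g. ["pii-redaction"]
  routeReason?: string;        // set only when the request left the local path
}

const sessionMetrics: LocalInferenceMetrics[] = [];

function recordMetrics(metrics: LocalInferenceMetrics): void {
  sessionMetrics.push(metrics);
}

// Example: per-session token usage derived from per-request records.
function sessionTokenUsage(): number {
  return sessionMetrics.reduce(
    (total, m) => total + m.promptTokens + m.completionTokens,
    0,
  );
}
```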

Common local error classes

  • MODEL_UNAVAILABLE: the model is missing or not ready yet
  • LIMIT_CONTEXT: the prompt is too large for the current local window
  • SAFETY_BLOCK: local guardrails flagged the output or intermediate content
  • DEVICE_CAPACITY: the current device cannot safely run the selected model tier
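One way to handle these classes is to map each code to a recovery strategy before retrying or escalating. The planRecovery helper and the strategy names are illustrative, not part of the API.

```ts
// The four error classes above, each mapped to a hypothetical recovery strategy.
type LocalErrorCode =
  | "MODEL_UNAVAILABLE"
  | "LIMIT_CONTEXT"
  | "SAFETY_BLOCK"
  | "DEVICE_CAPACITY";

type Recovery =
  | "retry_after_download"
  | "chunk_input"
  | "surface_policy_message"
  | "use_lighter_tier_or_gateway";

function planRecovery(code: LocalErrorCode): Recovery {
  switch (code) {
    case "MODEL_UNAVAILABLE":
      return "retry_after_download";        // kick off the bundle download, retry later
    case "LIMIT_CONTEXT":
      return "chunk_input";                 // split the prompt or escalate to a larger window
    case "SAFETY_BLOCK":
      return "surface_policy_message";      // explain why the content was blocked
    case "DEVICE_CAPACITY":
      return "use_lighter_tier_or_gateway"; // pick a smaller model or leave the local path
  }
}
```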

Performance tips

  • Preload during idle time instead of waiting for the first user action
  • Stream tokens early to improve perceived responsiveness
  • Chunk large documents and summarize in stages
  • Compress older context before it crowds out the active window
  • Keep frequently used models warm ahead of peak usage periods
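The chunk-and-summarize tip can be sketched as a two-stage pipeline: summarize each chunk, then summarize the combined chunk summaries. The Summarize type and the 4,000-character chunk size are assumptions made for the example.

```ts
// Staged summarization sketch. `summarize` stands in for any local
// summarization call (see the inference sketch earlier on this page).
type Summarize = (text: string) => Promise<string>;

function chunkText(text: string, maxChars: number): string[] {
  const chunks: string[] = [];
  for (let i = 0; i < text.length; i += maxChars) {
    chunks.push(text.slice(i, i + maxChars));
  }
  return chunks;
}

// Stage 1: summarize each chunk. Stage 2: summarize the combined summaries.
async function summarizeLargeDocument(
  text: string,
  summarize: Summarize,
  maxChars = 4_000, // keep each chunk well inside the local context window
): Promise<string> {
  const chunkSummaries = await Promise.all(
    chunkText(text, maxChars).map((chunk) => summarize(chunk)),
  );
  return summarize(chunkSummaries.join("\n"));
}
```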

Further reading

  • Read SelfReason AI Gateway for routing, fallbacks, and the shared contract
  • Explore SelfReason for the product experience built on the same local-first ideas

Need more than local execution alone? Pair this API with the gateway layer when you need cloud escalation, provider redundancy, or stricter orchestration controls.
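A minimal sketch of that pairing: try the local path first and hand the request to the gateway when it is too large or the local run fails. The runner signatures, the character-based size check, and the limit value are assumptions; a real router would use the gateway's own escalation signals.

```ts
// Hypothetical local-first routing with gateway escalation.
type Runner = (prompt: string) => Promise<string>;

const LOCAL_CONTEXT_CHAR_LIMIT = 16_000; // assumed local window, in characters

async function runLocalFirst(
  prompt: string,
  runLocal: Runner,
  runViaGateway: Runner,
): Promise<{ result: string; route: "local" | "gateway" }> {
  // Requests that obviously exceed the local window skip straight to the gateway.
  if (prompt.length > LOCAL_CONTEXT_CHAR_LIMIT) {
    return { result: await runViaGateway(prompt), route: "gateway" };
  }
  try {
    return { result: await runLocal(prompt), route: "local" };
  } catch {
    // Local failure (missing model, device capacity): fall back to the gateway.
    return { result: await runViaGateway(prompt), route: "gateway" };
  }
}
```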