SelfReason on-device API
For apps that want the fastest path and the strongest privacy posture, the local API runs directly against SelfReason-compatible on-device models. That makes it a strong fit for classification, rewriting, moderation, summarization, and lightweight agent tasks that should stay on the device.
What local execution is good at
- Fast text classification and tagging
- Draft rewriting for tone, clarity, and length
- Summaries, highlights, and TL;DR generation
- Language detection and lightweight translation flows
- Safety prechecks, redaction, and policy hints before anything leaves the device
- Offline or unstable-network workflows
How local execution works
- Download the model bundle during setup or idle time.
- Bind the runtime to browser or app APIs.
- Run inference locally with no extra network hop.
- Stream tokens or structured results back to the caller.
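The sketch below walks that flow end to end in TypeScript. The `LocalModel` interface, the `loadLocalModel` helper, and the placeholder token stream are illustrative assumptions rather than the actual SelfReason API; they only show the shape of download, bind, infer, and stream.

```ts
// Minimal sketch of the local flow. `LocalModel` and `loadLocalModel` are
// illustrative stand-ins, not the real SelfReason API.
interface LocalModel {
  stream(prompt: string): AsyncIterable<string>;
}

async function loadLocalModel(): Promise<LocalModel> {
  // In a real integration this step downloads/verifies the bundle and binds the runtime.
  return {
    async *stream(prompt: string) {
      // Placeholder token stream so the sketch runs end to end.
      for (const token of ["Local ", "summary ", "of: ", prompt.slice(0, 20)]) {
        yield token;
      }
    },
  };
}

async function summarizeOnDevice(text: string): Promise<string> {
  const model = await loadLocalModel(); // download + bind (setup or idle time)
  let out = "";
  // Inference runs locally; tokens stream back with no network hop.
  for await (const token of model.stream(`Summarize:\n${text}`)) {
    out += token;
  }
  return out;
}
```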
Local-first design principles
- Privacy first: keep sensitive text on-device whenever possible
- Fast response times: remove network latency from the common path
- Progressive enhancement: degrade to lighter heuristics when a model is unavailable
- Sandboxed execution: keep memory and capability boundaries tight
- Explicit escalation: let the gateway decide when a request should leave the local path
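To make the last two principles concrete, here is a hedged sketch of a classifier that prefers the local model, degrades to a lighter heuristic when none is ready, and returns an explicit escalation reason instead of silently leaving the device. The `Route` type and the `classify` signature are assumptions for illustration only.

```ts
// Progressive enhancement + explicit escalation, sketched with placeholder names.
type Route =
  | { path: "local"; result: string }
  | { path: "heuristic"; result: string }
  | { path: "escalate"; reason: string };

async function classifyIntent(
  text: string,
  model?: { classify(t: string): Promise<string> } // hypothetical local model handle
): Promise<Route> {
  if (model) {
    try {
      return { path: "local", result: await model.classify(text) };
    } catch {
      // Surface a reason code so the gateway can decide whether to leave the local path.
      return { path: "escalate", reason: "MODEL_UNAVAILABLE" };
    }
  }
  // Progressive enhancement: a lighter heuristic keeps the feature usable without a model.
  const heuristic = text.trim().endsWith("?") ? "question" : "statement";
  return { path: "heuristic", result: heuristic };
}
```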
Where the local API fits best
Great matches
- Intent classification before tool execution
- UI rewriting, inline drafting, and response cleanup
- Article, chat, and ticket summarization
- Personal workflows that must work offline
- Mobile or extension experiences where privacy is a product feature
Less ideal matches
- Very large prompts that exceed the local context window
- Heavy multi-step reasoning that benefits from larger cloud models
- Tasks that require centralized audit trails or strict hosted governance
Deployment notes
- Web: lazy-load models after first idle and cache with a versioned checksum (see the sketch after this list)
- Desktop: ship a warm local runtime for lower cold-start cost
- Android: deliver larger model assets as optional downloads and verify hashes before activation
- Extension: persist model assets carefully and validate integrity after updates
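As a concrete example of the web note above, the sketch below lazy-loads a model bundle during idle time and refuses to activate it unless a SHA-256 checksum matches. The bundle URL and expected hash are placeholders; `crypto.subtle` and `requestIdleCallback` are standard browser APIs, though `requestIdleCallback` may need a timeout fallback where it is unsupported.

```ts
// Fetch a model bundle and verify its checksum before activation.
// URL and expected hash below are placeholders, not real asset names.
async function fetchAndVerifyBundle(url: string, expectedSha256Hex: string): Promise<ArrayBuffer> {
  const buf = await (await fetch(url)).arrayBuffer();
  const digest = await crypto.subtle.digest("SHA-256", buf);
  const hex = Array.from(new Uint8Array(digest))
    .map((b) => b.toString(16).padStart(2, "0"))
    .join("");
  if (hex !== expectedSha256Hex) {
    throw new Error("Model bundle checksum mismatch; refusing to activate");
  }
  return buf;
}

// Defer the download until the browser is idle so it never blocks first paint.
requestIdleCallback(() => {
  fetchAndVerifyBundle("/models/selfreason-small-v3.bin", "EXPECTED_SHA256_HEX")
    .catch((err) => console.warn("Model preload skipped:", err));
});
```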
Local observability
- Token usage per request and per session
- Warm start versus cold start latency
- Guardrail trigger counts and categories
- Route reason codes when a request leaves the local path
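If you need a starting point, one possible shape for per-request telemetry covering these signals is sketched below; the field names are illustrative, not part of any SelfReason contract.

```ts
// Illustrative per-request telemetry record for local inference.
interface LocalInferenceMetrics {
  tokensIn: number;
  tokensOut: number;
  coldStart: boolean;          // distinguishes runtime bring-up cost from warm-path latency
  latencyMs: number;
  guardrailTriggers: string[]; // categories of any guardrail hits on this request
  routeReason?: string;        // set only when the request left the local path
}
```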
Common local error classes
- MODEL_UNAVAILABLE: the model is missing or not ready yet
- LIMIT_CONTEXT: the prompt is too large for the current local window
- SAFETY_BLOCK: local guardrails flagged the output or intermediate content
- DEVICE_CAPACITY: the current device cannot safely run the selected model tier
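A hedged sketch of how an app might react to each class is shown below. The code-to-action mapping is a suggestion, and the return values are placeholders for whatever retry, chunking, or gateway-escalation helpers your app already has.

```ts
// Suggested reactions to the local error classes above; tune to your product.
type LocalErrorCode = "MODEL_UNAVAILABLE" | "LIMIT_CONTEXT" | "SAFETY_BLOCK" | "DEVICE_CAPACITY";

function handleLocalError(code: LocalErrorCode): "retry_later" | "chunk_input" | "block" | "escalate" {
  switch (code) {
    case "MODEL_UNAVAILABLE": return "retry_later"; // model still downloading or warming up
    case "LIMIT_CONTEXT":     return "chunk_input"; // split or compress the prompt and retry locally
    case "SAFETY_BLOCK":      return "block";       // surface a policy message, do not auto-retry
    case "DEVICE_CAPACITY":   return "escalate";    // hand the request to the gateway/cloud path
  }
}
```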
Performance tips
- Preload during idle time instead of waiting for the first user action
- Stream tokens early to improve perceived responsiveness
- Chunk large documents and summarize in stages (see the sketch after this list)
- Compress older context before it crowds out the active window
- Keep frequently used models warm ahead of peak usage periods
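As an example of the chunk-and-stage tip, the sketch below splits an oversized document into fixed-size chunks, summarizes each one locally, then summarizes the partial summaries. It reuses the illustrative `summarizeOnDevice` helper from the flow sketch earlier; the chunk size and sequential strategy are assumptions you would tune per model and device.

```ts
// Staged summarization for documents that overflow the local context window.
async function summarizeLarge(text: string, chunkChars = 4000): Promise<string> {
  const partials: string[] = [];
  // Stage 1: summarize each chunk on its own so no single prompt overflows the window.
  // Sequential calls keep on-device memory pressure bounded.
  for (let i = 0; i < text.length; i += chunkChars) {
    partials.push(await summarizeOnDevice(text.slice(i, i + chunkChars)));
  }
  // Stage 2: summarize the concatenated partial summaries into the final result.
  return summarizeOnDevice(partials.join("\n"));
}
```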
Related docs
- Read SelfReason AI Gateway for routing, fallbacks, and the shared contract
- Explore SelfReason for the product experience built on the same local-first ideas
Need more than local execution alone? Pair this API with the gateway layer when you need cloud escalation, provider redundancy, or stricter orchestration controls.
