Back to Wiki
Development
Last updated: 2024-12-26•6 min read
Deployment
Deploying AI agents in production
Deployment
Moving an AI agent from a local Jupyter notebook to a scalable production environment brings a unique set of challenges.
Deployment Architectures
1. Stateless Microservices
The most common pattern. The agent logic runs in a container (Docker) and exposes an HTTP endpoint (REST or GraphQL).
- Pros: Easy to scale horizontally behind a load balancer.
- Cons: Managing state (conversation history) requires an external database (Redis/Postgres).
2. Stateful WebSocket Servers
For real-time, streaming interactions, WebSockets are preferred over HTTP.
- Pros: Lower latency, supports streaming tokens effectively.
- Cons: Harder to scale (sticky sessions needed), connection management is complex.
3. Edge Deployment
Running smaller models directly on the user's device (browser or mobile).
- Model: TensorFlow.js, ONNX Runtime, or specialized mobile models (e.g., Gemma 2B).
- Pros: Zero latency, works offline, better privacy.
- Cons: Limited model capability, drains battery.
Key Considerations
Streaming
Users hate waiting 5 seconds for a full paragraph to appear.
- Token Streaming: Send each chunk of text as it is generated. This creates a perception of instant response.
- Protocol: Server-Sent Events (SSE) is often easier than WebSockets for simple one-way streaming.
Rate Limiting & Cost Control
LLM APIs are expensive.
- User Quotas: Limit requests per user/day.
- Caching: Semantic caching (e.g., GPTCache) to store responses for similar queries and save costs.
Security
- Prompt Injection: Ensure specific firewalls (like Reask or Lakera) are in place to detect malicious inputs.
- Data Leakage: Ensure the agent doesn't output sensitive data (PII) from its training or context.
CI/CD for Agents
- Code Change: Developer commits new prompt or logic.
- Unit Tests: Run code assertions.
- Eval Run: Run a subset of "Golden Dataset" queries through the agent.
- Gate: If generic evaluation score drops > 5%, block deployment.
- Deploy: Push to staging, then production.
Development
Quick Navigation
Article Info
Category:Development
Last Updated:2024-12-26
Read Time:6 min read
Related Articles:3