OpenAI-Compatible API Reference

WaddleAI exposes an OpenAI-compatible API that works as a drop-in replacement for OpenAI's API. Every request also passes through WaddleAI features such as security scanning, token management, and routing.

Base URL

https://your-waddleai-proxy.com/v1

Authentication

Use your WaddleAI API key in the Authorization header:

Authorization: Bearer wa-your-api-key-here
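
A minimal Python sketch of building this header from an environment variable. The helper name `auth_headers` and the variable name `WADDLEAI_API_KEY` are illustrative assumptions, not part of WaddleAI:

```python
import os


def auth_headers(api_key=None):
    """Build the Authorization header for a WaddleAI request."""
    # WADDLEAI_API_KEY is an assumed name; use whatever your deployment defines.
    key = api_key or os.environ.get("WADDLEAI_API_KEY", "")
    if not key:
        raise ValueError("no API key provided")
    return {"Authorization": f"Bearer {key}"}
```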

Chat Completions

POST /v1/chat/completions

Creates a chat completion. The request and response follow OpenAI's format, with additional WaddleAI headers and response fields.

Request

curl https://your-waddleai-proxy.com/v1/chat/completions \
  -H "Content-Type: application/json" \
  -H "Authorization: Bearer wa-your-api-key" \
  -d '{
    "model": "gpt-4",
    "messages": [
      {"role": "user", "content": "Hello, how are you?"}
    ],
    "temperature": 0.7,
    "max_tokens": 150
  }'

Request Body Parameters

Parameter	Type	Required	Description
`model`	string	Yes	Model to use (e.g., "gpt-4", "claude-3-opus", "llama2")
`messages`	array	Yes	Array of message objects
`temperature`	number	No	Sampling temperature (0-2)
`max_tokens`	integer	No	Maximum tokens to generate
`top_p`	number	No	Nucleus sampling parameter
`frequency_penalty`	number	No	Frequency penalty (-2 to 2)
`presence_penalty`	number	No	Presence penalty (-2 to 2)
`stop`	string/array	No	Stop sequences
`stream`	boolean	No	Whether to stream responses

WaddleAI-Specific Headers

Header	Description
`X-WaddleAI-Route`	Force routing to specific provider (e.g., "openai", "anthropic")
`X-WaddleAI-Memory`	Enable conversation memory with session ID
`X-WaddleAI-Security`	Override security policy ("strict", "balanced", "permissive")
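
As a sketch of how these headers might be attached from Python, the hypothetical helper below builds (but does not send) a request with the standard library. It assumes that `X-WaddleAI-Memory` takes the session ID as its value, which this reference does not state explicitly:

```python
import json
import urllib.request

BASE_URL = "https://your-waddleai-proxy.com/v1"  # placeholder URL from this reference
SECURITY_POLICIES = {"strict", "balanced", "permissive"}


def build_chat_request(api_key, model, messages,
                       route=None, session_id=None, security=None):
    """Return an unsent Request for POST /v1/chat/completions."""
    headers = {
        "Content-Type": "application/json",
        "Authorization": f"Bearer {api_key}",
    }
    if route:
        headers["X-WaddleAI-Route"] = route
    if session_id:
        # Assumption: the header value is the session ID itself.
        headers["X-WaddleAI-Memory"] = session_id
    if security:
        if security not in SECURITY_POLICIES:
            raise ValueError(f"unknown security policy: {security}")
        headers["X-WaddleAI-Security"] = security
    body = json.dumps({"model": model, "messages": messages}).encode("utf-8")
    return urllib.request.Request(
        f"{BASE_URL}/chat/completions", data=body, headers=headers, method="POST"
    )
```

Passing the returned object to `urllib.request.urlopen` would send it.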

Response

{
  "id": "chatcmpl-abc123",
  "object": "chat.completion",
  "created": 1699896916,
  "model": "gpt-4",
  "choices": [
    {
      "index": 0,
      "message": {
        "role": "assistant",
        "content": "Hello! I'm doing well, thank you for asking. How can I help you today?"
      },
      "finish_reason": "stop"
    }
  ],
  "usage": {
    "prompt_tokens": 12,
    "completion_tokens": 19,
    "total_tokens": 31,
    "waddleai_tokens": 8
  },
  "waddleai": {
    "provider": "openai",
    "model_used": "gpt-4",
    "security_passed": true,
    "routing_rule": "default",
    "cost_waddleai": 8,
    "cost_usd": 0.008
  }
}
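
A short sketch of reading the WaddleAI-specific fields from a parsed response. The helper name `summarize_completion` is hypothetical, and the `waddleai` block is treated as optional so the same code tolerates plain OpenAI responses:

```python
def summarize_completion(resp):
    """Pull the reply text plus WaddleAI routing and cost fields from a response dict."""
    choice = resp["choices"][0]
    meta = resp.get("waddleai", {})  # may be absent on plain OpenAI responses
    return {
        "text": choice["message"]["content"],
        "finish_reason": choice["finish_reason"],
        "provider": meta.get("provider"),
        "waddleai_tokens": resp.get("usage", {}).get("waddleai_tokens"),
        "cost_usd": meta.get("cost_usd"),
    }
```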

Error Responses

{
  "error": {
    "type": "quota_exceeded",
    "message": "Daily token quota exceeded",
    "code": "quota_exceeded",
    "details": {
      "daily_used": 10000,
      "daily_limit": 10000,
      "monthly_used": 50000,
      "monthly_limit": 100000
    }
  }
}
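
One way to surface this envelope in client code is to wrap it in an exception. The class `WaddleAIError` and the function `parse_error` below are hypothetical names for illustration only:

```python
class WaddleAIError(Exception):
    """Hypothetical exception wrapping the error envelope shown above."""

    def __init__(self, type_, message, code=None, details=None):
        super().__init__(message)
        self.type = type_
        self.code = code
        self.details = details or {}


def parse_error(body):
    """Convert a parsed error response body into a WaddleAIError."""
    err = body["error"]
    return WaddleAIError(
        err.get("type"), err.get("message", ""), err.get("code"), err.get("details")
    )
```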

Streaming Responses

Set "stream": true to receive server-sent events:

curl https://your-waddleai-proxy.com/v1/chat/completions \
  -H "Content-Type: application/json" \
  -H "Authorization: Bearer wa-your-api-key" \
  -d '{
    "model": "gpt-4",
    "messages": [{"role": "user", "content": "Count to 5"}],
    "stream": true
  }'

Response:

data: {"id":"chatcmpl-abc","object":"chat.completion.chunk","created":1699896916,"model":"gpt-4","choices":[{"index":0,"delta":{"role":"assistant","content":"1"},"finish_reason":null}]}

data: {"id":"chatcmpl-abc","object":"chat.completion.chunk","created":1699896916,"model":"gpt-4","choices":[{"index":0,"delta":{"content":", 2"},"finish_reason":null}]}

...

data: [DONE]
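
A minimal sketch of consuming this stream: it reads `data:` lines, stops at `[DONE]`, and joins the `content` deltas. The function name `collect_stream_text` is an assumption, and the input is any iterable of already-decoded text lines:

```python
import json


def collect_stream_text(lines):
    """Concatenate content deltas from an iterable of SSE text lines."""
    parts = []
    for raw in lines:
        line = raw.strip()
        if not line.startswith("data:"):
            continue  # skip blank separators and non-data fields
        payload = line[len("data:"):].strip()
        if payload == "[DONE]":
            break  # end-of-stream sentinel
        chunk = json.loads(payload)
        for choice in chunk.get("choices", []):
            content = choice.get("delta", {}).get("content")
            if content:
                parts.append(content)
    return "".join(parts)
```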

Models

GET /v1/models

List available models across all configured providers.

Request

curl https://your-waddleai-proxy.com/v1/models \
  -H "Authorization: Bearer wa-your-api-key"

Response

{
  "object": "list",
  "data": [
    {
      "id": "gpt-4",
      "object": "model",
      "created": 1699896916,
      "owned_by": "openai",
      "provider": "openai",
      "capabilities": ["chat", "completion"],
      "context_length": 8192,
      "cost_per_waddleai_token": 0.001
    },
    {
      "id": "claude-3-opus",
      "object": "model", 
      "created": 1699896916,
      "owned_by": "anthropic",
      "provider": "anthropic",
      "capabilities": ["chat"],
      "context_length": 200000,
      "cost_per_waddleai_token": 0.0015
    },
    {
      "id": "llama2",
      "object": "model",
      "created": 1699896916,
      "owned_by": "meta",
      "provider": "ollama",
      "capabilities": ["chat", "completion"],
      "context_length": 4096,
      "cost_per_waddleai_token": 0.0001
    }
  ]
}
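
The extra fields make it possible to choose a model programmatically. As a sketch, the hypothetical helper below picks the lowest-cost model that advertises a given capability, assuming `cost_per_waddleai_token` is comparable across models:

```python
def cheapest_model(models, capability):
    """Return the id of the cheapest model with the capability, or None."""
    candidates = [
        m for m in models["data"] if capability in m.get("capabilities", [])
    ]
    if not candidates:
        return None
    return min(candidates, key=lambda m: m["cost_per_waddleai_token"])["id"]
```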

Completions (Legacy)

POST /v1/completions

Generates text completions. This is a legacy endpoint; chat completions are recommended for new integrations.

Request

curl https://your-waddleai-proxy.com/v1/completions \
  -H "Content-Type: application/json" \
  -H "Authorization: Bearer wa-your-api-key" \
  -d '{
    "model": "gpt-3.5-turbo",
    "prompt": "Once upon a time",
    "max_tokens": 100,
    "temperature": 0.7
  }'

Response

{
  "id": "cmpl-abc123",
  "object": "text_completion",
  "created": 1699896916,
  "model": "gpt-3.5-turbo",
  "choices": [
    {
      "text": " there was a small village nestled in the mountains...",
      "index": 0,
      "logprobs": null,
      "finish_reason": "length"
    }
  ],
  "usage": {
    "prompt_tokens": 4,
    "completion_tokens": 100,
    "total_tokens": 104,
    "waddleai_tokens": 12
  }
}

Embeddings

POST /v1/embeddings

Creates embeddings for text inputs, if the target model supports them.

Request

curl https://your-waddleai-proxy.com/v1/embeddings \
  -H "Content-Type: application/json" \
  -H "Authorization: Bearer wa-your-api-key" \
  -d '{
    "model": "text-embedding-ada-002",
    "input": "The food was delicious and the waiter was friendly."
  }'

Response

{
  "object": "list",
  "data": [
    {
      "object": "embedding",
      "embedding": [0.0023064255, -0.009327292, ...],
      "index": 0
    }
  ],
  "model": "text-embedding-ada-002",
  "usage": {
    "prompt_tokens": 8,
    "total_tokens": 8,
    "waddleai_tokens": 2
  }
}
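
A small sketch of extracting the vectors from this response in input order. The helper name `embedding_vectors` is hypothetical; sorting by `index` is a defensive choice rather than a documented requirement:

```python
def embedding_vectors(resp):
    """Return the embedding vectors ordered by their index field."""
    items = sorted(resp["data"], key=lambda item: item["index"])
    return [item["embedding"] for item in items]
```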

WaddleAI Extensions

Usage Information

Get current usage and quota information:

GET /api/usage

curl https://your-waddleai-proxy.com/api/usage \
  -H "Authorization: Bearer wa-your-api-key"

Response:

{
  "total_waddleai_tokens": 1500,
  "total_llm_input_tokens": 8000,
  "total_llm_output_tokens": 4000,
  "total_requests": 45,
  "llm_breakdown": {
    "openai_gpt4": {"input": 5000, "output": 2500},
    "anthropic_claude": {"input": 2000, "output": 1000},
    "ollama_llama2": {"input": 1000, "output": 500}
  },
  "daily_usage": {
    "2024-01-15": {"waddleai_tokens": 500, "requests": 15},
    "2024-01-14": {"waddleai_tokens": 750, "requests": 20}
  }
}
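
In the sample above, the per-provider entries in `llm_breakdown` add up to the `total_llm_input_tokens` and `total_llm_output_tokens` fields. A sketch of that cross-check, with `breakdown_totals` as a hypothetical helper name:

```python
def breakdown_totals(usage):
    """Sum input and output tokens across all llm_breakdown entries."""
    rows = list(usage.get("llm_breakdown", {}).values())
    total_input = sum(row["input"] for row in rows)
    total_output = sum(row["output"] for row in rows)
    return total_input, total_output
```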

GET /api/quota

curl https://your-waddleai-proxy.com/api/quota \
  -H "Authorization: Bearer wa-your-api-key"

Response:

{
  "quota_ok": true,
  "daily": {
    "used": 1200,
    "limit": 10000,
    "remaining": 8800,
    "ok": true
  },
  "monthly": {
    "used": 15000,
    "limit": 100000,
    "remaining": 85000,
    "ok": true
  }
}
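
A client can consult this response before sending a large request. The sketch below checks both windows against an estimated token cost; `can_spend` is a hypothetical helper, and estimating the cost is left to the caller:

```python
def can_spend(quota, tokens_needed):
    """True if the daily and monthly windows both have enough tokens remaining."""
    return all(
        quota[window]["ok"] and quota[window]["remaining"] >= tokens_needed
        for window in ("daily", "monthly")
    )
```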

Security Alerts

Get recent security alerts (if you have appropriate permissions):

GET /api/security/threats

curl https://your-waddleai-proxy.com/api/security/threats \
  -H "Authorization: Bearer wa-your-api-key"

Response:

{
  "recent_threats": [
    {
      "timestamp": "2024-01-15T10:30:00Z",
      "threat_type": "prompt_injection",
      "severity": "high",
      "blocked": true,
      "description": "Detected instruction override attempt"
    }
  ],
  "stats": {
    "last_24h": {
      "total_threats": 3,
      "blocked": 3,
      "allowed": 0
    }
  }
}

Rate Limits

WaddleAI enforces multiple types of limits:

Limit Type	Default	Description
Requests per minute	60	API calls per minute
Daily tokens	10,000	WaddleAI tokens per day
Monthly tokens	100,000	WaddleAI tokens per month

Rate limit information is included in response headers:

X-RateLimit-Limit-RPM: 60
X-RateLimit-Remaining-RPM: 45
X-RateLimit-Reset-RPM: 1699896976
X-RateLimit-Limit-Daily: 10000
X-RateLimit-Remaining-Daily: 8800
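
A sketch of reading these headers in Python. It assumes `X-RateLimit-Reset-RPM` is a Unix timestamp in seconds, which the sample value suggests but this reference does not confirm. The function name `parse_rate_limits` is hypothetical, and a plain `dict` is used here even though real HTTP clients usually expose case-insensitive header mappings:

```python
import time


def parse_rate_limits(headers, now=None):
    """Extract remaining per-minute and daily allowances from response headers."""
    now = time.time() if now is None else now
    reset = headers.get("X-RateLimit-Reset-RPM")
    return {
        "rpm_remaining": int(headers.get("X-RateLimit-Remaining-RPM", 0)),
        "daily_remaining": int(headers.get("X-RateLimit-Remaining-Daily", 0)),
        # Assumption: the reset value is a Unix timestamp in seconds.
        "seconds_to_reset": max(0.0, float(reset) - now) if reset else 0.0,
    }
```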

Error Codes

Code	Type	Description
400	`invalid_request`	Invalid request format
400	`security_blocked`	Request blocked by security scanning
401	`invalid_api_key`	Invalid or expired API key
403	`insufficient_permissions`	Insufficient permissions
429	`rate_limit_exceeded`	Rate limit exceeded
429	`quota_exceeded`	Token quota exceeded
500	`server_error`	Internal server error
502	`provider_error`	Upstream LLM provider error
503	`service_unavailable`	Service temporarily unavailable
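
When deciding whether to retry, one common convention (an assumption here, not a WaddleAI rule) is to retry rate limits and upstream or server failures, and to treat the remaining types as permanent until the request, key, or quota changes:

```python
# Error types from the table above that are plausibly transient (a judgment call).
RETRYABLE_TYPES = {
    "rate_limit_exceeded",
    "server_error",
    "provider_error",
    "service_unavailable",
}


def is_retryable(error_type):
    """Return True if an error type is worth retrying after a delay."""
    return error_type in RETRYABLE_TYPES
```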

Best Practices

Authentication

Store API keys securely in environment variables
Use different keys for different environments
Rotate keys regularly

Error Handling

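import os

import openai
from openai import OpenAI

# Point the official OpenAI SDK (v1+) at the WaddleAI proxy.
# WADDLEAI_API_KEY is an assumed variable name; use your own.
client = OpenAI(
    base_url="https://your-waddleai-proxy.com/v1",
    api_key=os.environ["WADDLEAI_API_KEY"],
)

try:
    response = client.chat.completions.create(
        model="gpt-4",
        messages=[{"role": "user", "content": "Hello"}],
    )
except openai.RateLimitError as e:
    # HTTP 429: rate limit or token quota exceeded
    print(f"Rate limited: {e}")
    # Implement exponential backoff before retrying
except openai.APIError as e:
    # Any other API error; APIError is the base class, so it must come last
    print(f"API error: {e}")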
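
The exponential backoff mentioned in the comment could look like the sketch below. The function `call_with_backoff` and its parameters are hypothetical; the delay doubles on each attempt up to `max_delay`, with random jitter added so that concurrent clients do not retry in lockstep:

```python
import random
import time


def call_with_backoff(fn, should_retry, max_attempts=5,
                      base_delay=1.0, max_delay=30.0, sleep=time.sleep):
    """Call fn(), retrying with exponential backoff while should_retry(exc) is true."""
    for attempt in range(max_attempts):
        try:
            return fn()
        except Exception as exc:
            # Give up on the last attempt or on non-retryable errors.
            if attempt == max_attempts - 1 or not should_retry(exc):
                raise
            delay = min(max_delay, base_delay * 2 ** attempt)
            sleep(delay + random.uniform(0, delay / 2))  # add jitter
```

With the OpenAI SDK, `should_retry` could be `lambda e: isinstance(e, openai.RateLimitError)`.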

Performance

Use connection pooling for high-volume applications
Implement request caching where appropriate
Monitor usage patterns and optimize model selection

Cost Optimization

Choose appropriate models for each task
Monitor WaddleAI token consumption
Use cheaper models for simple tasks
Implement usage budgets and alerts
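
A client-side budget with an alert threshold can be sketched by accumulating `usage.waddleai_tokens` from each response. The class `TokenBudget` and its 80% default threshold are illustrative assumptions, and this does not replace the server-side quotas described under Rate Limits:

```python
class TokenBudget:
    """Track WaddleAI token spend against a local budget."""

    def __init__(self, limit, alert_ratio=0.8):
        self.limit = limit
        self.alert_ratio = alert_ratio
        self.used = 0

    def record(self, response):
        """Add a response's waddleai_tokens to the total and return the new status."""
        self.used += response.get("usage", {}).get("waddleai_tokens", 0)
        return self.status()

    def status(self):
        """Return "ok", "alert", or "exceeded" for the current total."""
        if self.used >= self.limit:
            return "exceeded"
        if self.used >= self.limit * self.alert_ratio:
            return "alert"
        return "ok"
```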

For more advanced features, see the Claude Integration guide, which covers the Management API.