# Configuration Reference

Complete reference for all configuration options.

## Environment Variables

All configuration is managed through environment variables loaded from a `.env` file.
### Core Settings

#### ENVIRONMENT

- Type: `str`
- Default: `development`
- Options: `development`, `staging`, `production`
- Description: Application environment mode

#### LOG_LEVEL

- Type: `str`
- Default: `INFO`
- Options: `DEBUG`, `INFO`, `WARNING`, `ERROR`, `CRITICAL`
- Description: Logging verbosity level

#### API_RATE_LIMIT

- Type: `str`
- Default: `30/minute`
- Format: `{count}/{time_unit}`
- Description: Rate limiting for API endpoints
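A value like `30/minute` can be validated before use. A minimal sketch of such a check (the `parse_rate_limit` helper and its accepted set of units are illustrative, not part of the project):

```python
import re

# Time units a limit string may use; this set is an assumption for illustration.
_UNITS = {"second", "minute", "hour", "day"}

def parse_rate_limit(value: str) -> tuple[int, str]:
    """Split a '30/minute'-style limit into (count, unit), validating both parts."""
    match = re.fullmatch(r"(\d+)/(\w+)", value.strip())
    if not match:
        raise ValueError(f"Invalid rate limit format: {value!r}")
    count, unit = int(match.group(1)), match.group(2)
    if unit not in _UNITS:
        raise ValueError(f"Unknown time unit: {unit!r}")
    return count, unit

print(parse_rate_limit("30/minute"))  # (30, 'minute')
```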
### LLM Configuration

#### LLM_PROVIDER

- Type: `str`
- Default: `openai`
- Options: `openai`, `azure`, `bedrock`, `anthropic`
- Description: LLM service provider

#### LLM_MODEL

- Type: `str`
- Default: `gpt-5-mini`
- Options: Provider-specific model names
- Description: LLM model to use for generation

OpenAI models:

```bash
LLM_MODEL=gpt-5-mini   # Recommended: fast, cheap
LLM_MODEL=gpt-5        # Higher quality
LLM_MODEL=gpt-4o       # Best quality, expensive
```

Azure OpenAI models: availability depends on your deployment; set the model's deployment name via `AZURE_OPENAI_DEPLOYMENT`.

AWS Bedrock models:

```bash
LLM_MODEL=anthropic.claude-3-sonnet-20240229-v1:0
LLM_MODEL=anthropic.claude-3-haiku-20240307-v1:0
LLM_MODEL=amazon.titan-text-express-v1
```

#### LLM_TEMPERATURE

- Type: `float`
- Default: `0.2`
- Range: `0.0` to `2.0`
- Description: Response randomness (lower = more deterministic)

#### LLM_MAX_TOKENS

- Type: `int`
- Default: `4000`
- Description: Maximum tokens in the LLM response
### OpenAI Settings

#### OPENAI_API_KEY

- Type: `str`
- Required: Yes (if `LLM_PROVIDER=openai`)
- Description: OpenAI API key

#### OPENAI_ORG_ID

- Type: `str`
- Optional: Yes
- Description: OpenAI organization ID
### Azure OpenAI Settings

#### AZURE_OPENAI_ENDPOINT

- Type: `str`
- Required: Yes (if `LLM_PROVIDER=azure`)
- Format: `https://{resource-name}.openai.azure.com`
- Description: Azure OpenAI endpoint URL

#### AZURE_OPENAI_API_KEY

- Type: `str`
- Required: Yes (if `LLM_PROVIDER=azure`)
- Description: Azure OpenAI API key

#### AZURE_OPENAI_DEPLOYMENT

- Type: `str`
- Required: Yes (if `LLM_PROVIDER=azure`)
- Description: Deployment name for the model

#### AZURE_OPENAI_API_VERSION

- Type: `str`
- Default: `2023-05-15`
- Description: Azure OpenAI API version
### AWS Bedrock Settings

#### AWS_REGION

- Type: `str`
- Default: `us-east-1`
- Description: AWS region for Bedrock

#### AWS_ACCESS_KEY_ID

- Type: `str`
- Required: Yes (if `LLM_PROVIDER=bedrock`)
- Description: AWS access key ID

#### AWS_SECRET_ACCESS_KEY

- Type: `str`
- Required: Yes (if `LLM_PROVIDER=bedrock`)
- Description: AWS secret access key
### Anthropic Settings

#### ANTHROPIC_API_KEY

- Type: `str`
- Required: Yes (if `LLM_PROVIDER=anthropic`)
- Description: Anthropic API key
### Vector Store Configuration

#### VECTOR_STORE_TYPE

- Type: `str`
- Default: `faiss`
- Options: `faiss`, `qdrant`
- Description: Vector store backend

#### QDRANT_URL

- Type: `str`
- Default: `http://localhost:6333`
- Required: Yes (if `VECTOR_STORE_TYPE=qdrant`)
- Description: Qdrant server URL

#### QDRANT_API_KEY

- Type: `str`
- Optional: Yes
- Description: Qdrant API key (if authentication enabled)

#### QDRANT_COLLECTION_NAME

- Type: `str`
- Default: `greengovrag`
- Description: Qdrant collection name

#### FAISS_INDEX_PATH

- Type: `str`
- Default: `data/vectors/faiss_index`
- Description: Path to FAISS index file
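Backend selection might be resolved from these variables roughly as follows; a sketch with the documented defaults (`vector_store_config` is an illustrative helper, not the project's API):

```python
def vector_store_config(env: dict[str, str]) -> dict:
    """Resolve the vector store backend and its settings from env values."""
    backend = env.get("VECTOR_STORE_TYPE", "faiss")
    if backend == "faiss":
        return {"backend": "faiss",
                "index_path": env.get("FAISS_INDEX_PATH", "data/vectors/faiss_index")}
    if backend == "qdrant":
        return {"backend": "qdrant",
                "url": env.get("QDRANT_URL", "http://localhost:6333"),
                "api_key": env.get("QDRANT_API_KEY"),
                "collection": env.get("QDRANT_COLLECTION_NAME", "greengovrag")}
    raise ValueError(f"Unsupported VECTOR_STORE_TYPE: {backend!r}")
```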
### Embedding Configuration

#### EMBEDDING_MODEL

- Type: `str`
- Default: `sentence-transformers/all-MiniLM-L6-v2`
- Description: Embedding model name

```bash
EMBEDDING_MODEL=sentence-transformers/all-MiniLM-L6-v2
EMBEDDING_MODEL=sentence-transformers/all-mpnet-base-v2
```

#### EMBEDDING_DIMENSIONS

- Type: `int`
- Default: `384`
- Description: Embedding vector dimensions
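If `EMBEDDING_DIMENSIONS` disagrees with the model's actual output size, the vector index breaks silently, so a startup check can help. A sketch using the well-known dimensions of the two models listed above (the helper name is hypothetical):

```python
# Output dimensions of the two models listed above (well-known values).
KNOWN_DIMS = {
    "sentence-transformers/all-MiniLM-L6-v2": 384,
    "sentence-transformers/all-mpnet-base-v2": 768,
}

def check_embedding_dims(model: str, dims: int) -> None:
    """Fail fast if EMBEDDING_DIMENSIONS disagrees with the chosen model."""
    expected = KNOWN_DIMS.get(model)
    if expected is not None and expected != dims:
        raise ValueError(f"{model} produces {expected}-dim vectors, not {dims}")
```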
### Database Configuration

#### DATABASE_URL

- Type: `str`
- Required: Yes
- Format: `postgresql://{user}:{password}@{host}:{port}/{database}`
- Description: PostgreSQL connection string

#### DATABASE_POOL_SIZE

- Type: `int`
- Default: `20`
- Description: Database connection pool size

#### DATABASE_MAX_OVERFLOW

- Type: `int`
- Default: `10`
- Description: Max overflow connections beyond pool size
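When assembling the connection string programmatically, special characters in the password must be percent-encoded or the URL parses incorrectly. A minimal sketch (`build_database_url` is an illustrative helper):

```python
from urllib.parse import quote, urlsplit

def build_database_url(user: str, password: str, host: str,
                       port: int, database: str) -> str:
    """Assemble the documented postgresql:// URL, escaping the password."""
    return f"postgresql://{user}:{quote(password, safe='')}@{host}:{port}/{database}"

# Passwords containing '@' or '/' survive URL parsing once encoded.
url = build_database_url("greengovrag", "p@ss/word", "localhost", 5432, "greengovrag")
assert urlsplit(url).hostname == "localhost"
```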
### Cloud Storage Configuration

#### CLOUD_PROVIDER

- Type: `str`
- Default: `local`
- Options: `local`, `aws`, `azure`
- Description: Cloud storage provider

#### AWS_S3_BUCKET

- Type: `str`
- Required: Yes (if `CLOUD_PROVIDER=aws`)
- Description: S3 bucket name for document storage

#### AZURE_STORAGE_CONNECTION_STRING

- Type: `str`
- Required: Yes (if `CLOUD_PROVIDER=azure`)
- Description: Azure Blob Storage connection string

#### AZURE_STORAGE_CONTAINER

- Type: `str`
- Default: `greengovrag`
- Description: Azure Blob Storage container name
### Caching Configuration

#### CACHE_ENABLED

- Type: `bool`
- Default: `true`
- Description: Enable query result caching

#### CACHE_TTL_SECONDS

- Type: `int`
- Default: `3600`
- Description: Cache time-to-live in seconds

#### REDIS_URL

- Type: `str`
- Optional: Yes
- Format: `redis://{host}:{port}/{db}`
- Description: Redis connection string for caching
### ETL Pipeline Configuration

#### CHUNK_SIZE

- Type: `int`
- Default: `500`
- Description: Text chunk size in tokens

#### CHUNK_OVERLAP

- Type: `int`
- Default: `100`
- Description: Overlap between chunks in tokens

#### CHUNK_BATCH_SIZE

- Type: `int`
- Default: `100`
- Description: Number of chunks to process in parallel
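Together, `CHUNK_SIZE` and `CHUNK_OVERLAP` define a sliding window over the token stream. A sketch of that windowing (not the project's actual splitter):

```python
def chunk_tokens(tokens: list[str], chunk_size: int = 500,
                 chunk_overlap: int = 100) -> list[list[str]]:
    """Split a token sequence into windows of chunk_size tokens,
    each sharing chunk_overlap tokens with its predecessor."""
    if chunk_overlap >= chunk_size:
        raise ValueError("CHUNK_OVERLAP must be smaller than CHUNK_SIZE")
    step = chunk_size - chunk_overlap
    chunks = [tokens[i:i + chunk_size] for i in range(0, len(tokens), step)]
    # Drop a trailing chunk wholly contained in the previous window.
    if len(chunks) > 1 and len(chunks[-1]) <= chunk_overlap:
        chunks.pop()
    return chunks
```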
#### DOCUMENTS_CONFIG_PATH

- Type: `str`
- Default: `configs/documents_config.yml`
- Description: Path to document sources configuration

#### RAW_DATA_DIR

- Type: `str`
- Default: `data/raw`
- Description: Directory for downloaded documents

#### PROCESSED_DATA_DIR

- Type: `str`
- Default: `data/processed`
- Description: Directory for processed chunks
### RAG Configuration

#### TOP_K_RESULTS

- Type: `int`
- Default: `5`
- Description: Number of documents to retrieve

#### MIN_RELEVANCE_SCORE

- Type: `float`
- Default: `0.3`
- Range: `0.0` to `1.0`
- Description: Minimum relevance score to include a document

#### ENABLE_HYBRID_SEARCH

- Type: `bool`
- Default: `true`
- Description: Enable BM25 + vector hybrid search

#### BM25_WEIGHT

- Type: `float`
- Default: `0.3`
- Range: `0.0` to `1.0`
- Description: Weight for the BM25 score in hybrid search
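With hybrid search enabled, `BM25_WEIGHT` blends the two scores. Assuming both scores are normalised to `[0, 1]` and a linear combination (an assumption; the fusion formula is not specified here), the blend looks like:

```python
def hybrid_score(bm25_score: float, vector_score: float,
                 bm25_weight: float = 0.3) -> float:
    """Blend normalised BM25 and vector-similarity scores linearly."""
    if not 0.0 <= bm25_weight <= 1.0:
        raise ValueError("BM25_WEIGHT must be in [0.0, 1.0]")
    return bm25_weight * bm25_score + (1.0 - bm25_weight) * vector_score
```

With the default weight of `0.3`, the vector score dominates: a document must rank well semantically to surface, while BM25 keyword matches act as a tie-breaker.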
### CORS Configuration

#### CORS_ORIGINS

- Type: `list[str]`
- Default: `["http://localhost:3000"]`
- Format: Comma-separated URLs
- Description: Allowed CORS origins

#### CORS_ALLOW_CREDENTIALS

- Type: `bool`
- Default: `true`
- Description: Allow credentials in CORS requests
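Parsing the comma-separated value into the `list[str]` type might look like this (illustrative helper; the real parsing lives in the project's settings code):

```python
def parse_cors_origins(raw: str) -> list[str]:
    """Turn a comma-separated CORS_ORIGINS value into a clean list,
    trimming whitespace and dropping empty entries."""
    return [origin.strip() for origin in raw.split(",") if origin.strip()]
```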
## Configuration Profiles

### Development Profile

```bash
# .env.development
ENVIRONMENT=development
LOG_LEVEL=DEBUG
VECTOR_STORE_TYPE=faiss
DATABASE_URL=postgresql://greengovrag:greengovrag@localhost:5432/greengovrag
CLOUD_PROVIDER=local
LLM_PROVIDER=openai
LLM_MODEL=gpt-5-mini
CACHE_ENABLED=false
```

### Production Profile

```bash
# .env.production
ENVIRONMENT=production
LOG_LEVEL=INFO
VECTOR_STORE_TYPE=qdrant
QDRANT_URL=http://qdrant:6333
DATABASE_URL=postgresql://greengovrag:${DB_PASSWORD}@rds-endpoint:5432/greengovrag
CLOUD_PROVIDER=aws
AWS_S3_BUCKET=greengovrag-prod-documents
LLM_PROVIDER=azure
LLM_MODEL=gpt-5-mini
CACHE_ENABLED=true
CACHE_TTL_SECONDS=3600
```
## Configuration Loading

Configuration is loaded in this order (later sources override earlier ones):

1. Default values in `backend/green_gov_rag/config.py`
2. Environment variables from the `.env` file
3. System environment variables (highest priority)

Example:

```python
from green_gov_rag.config import settings

# Access configuration
print(settings.llm_provider)  # 'openai'
print(settings.llm_model)     # 'gpt-5-mini'
```
## Validation

Configuration is validated on startup using Pydantic:

```python
from pydantic import BaseSettings, validator

class Settings(BaseSettings):
    llm_provider: str = "openai"

    @validator("llm_provider")
    def validate_provider(cls, v):
        allowed = ["openai", "azure", "bedrock", "anthropic"]
        if v not in allowed:
            raise ValueError(f"LLM provider must be one of {allowed}")
        return v
```

Validation errors will prevent the application from starting.
## Secrets Management

### Local Development

Use a `.env` file (never commit it to Git):
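For example (illustrative values only; the key shown is a placeholder):

```bash
# .env  (git-ignored)
OPENAI_API_KEY=sk-...
DATABASE_URL=postgresql://greengovrag:greengovrag@localhost:5432/greengovrag
```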
### Production (AWS)

Use AWS Systems Manager Parameter Store:

```bash
aws ssm put-parameter \
  --name "/greengovrag/prod/openai-api-key" \
  --value "sk-..." \
  --type "SecureString"
```
### Production (Azure)

Use Azure Key Vault:
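For example, storing the OpenAI key as a secret (the vault and secret names are illustrative):

```bash
az keyvault secret set \
  --vault-name "greengovrag-kv" \
  --name "openai-api-key" \
  --value "sk-..."
```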
Last Updated: 2025-11-22