# Local Docker Deployment

Quick setup for development and testing using Docker Compose.
## Prerequisites
- Docker Desktop 4.0+ (or Docker Engine + Docker Compose)
- 8GB RAM minimum
- 20GB free disk space
- Git installed
## Quick Start (5 Minutes)
### 1. Clone Repository
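A minimal sketch, assuming a placeholder repository URL (substitute the real remote):

```bash
# Clone the project and enter it (the URL below is a placeholder)
git clone https://github.com/your-org/green-gov-rag.git
cd green-gov-rag
```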
### 2. Configure Environment
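Assuming the project ships an environment template (the `.env.example` filename is an assumption), copy it before editing:

```bash
# Create a local environment file from the template (template name assumed)
cp .env.example .env
```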
Edit `.env` with your API keys:

```bash
# Required: LLM API Key
OPENAI_API_KEY=sk-your-key-here

# Optional: Vector store (default: FAISS)
VECTOR_STORE_TYPE=faiss

# Optional: Database (defaults provided)
DATABASE_URL=postgresql://greengovrag:greengovrag@postgres:5432/greengovrag
```
### 3. Start Services
Production mode (backend + frontend + database):
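Assuming the base `docker-compose.yml` defines only the core services:

```bash
# Start database, backend, and frontend in the background
docker-compose up -d
```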
Development mode (includes Airflow UI):
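A sketch, assuming the Airflow services live in a separate override file (the `docker-compose.dev.yml` filename is an assumption):

```bash
# Layer the dev override on top of the base file (override filename assumed)
docker-compose -f docker-compose.yml -f docker-compose.dev.yml up -d
```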
### 4. Verify Services
Backend API:
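Assuming the API exposes a `/health` endpoint (the path is an assumption; FastAPI also serves interactive docs at `/docs` by default):

```bash
# Confirm the API responds (the /health path is an assumption)
curl http://localhost:8000/health
```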
Frontend: Open http://localhost:3000
Airflow UI (dev mode): Open http://localhost:8080

- Username: `admin`
- Password: `admin`
### 5. Run ETL Pipeline
Using the Airflow UI (dev mode):

1. Open http://localhost:8080
2. Enable the `greengovrag_full_pipeline` DAG
3. Click "Trigger DAG"
Using CLI:
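One way to trigger the same DAG from the Airflow CLI inside the scheduler container:

```bash
# Trigger the pipeline DAG without opening the web UI
docker-compose exec airflow-scheduler airflow dags trigger greengovrag_full_pipeline
```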
### 6. Query RAG System
```bash
curl -X POST http://localhost:8000/api/query \
  -H "Content-Type: application/json" \
  -d '{
    "query": "What are the NGER reporting requirements?",
    "max_sources": 5
  }'
```
## Architecture

### Docker Compose Services
```yaml
services:
  postgres:             # PostgreSQL database
  backend:              # FastAPI application
  frontend:             # React application
  airflow-webserver:    # Airflow UI (dev only)
  airflow-scheduler:    # Airflow scheduler (dev only)
  redis:                # Airflow backend (dev only)
```
### Ports
| Service | Port | URL |
|---|---|---|
| Backend API | 8000 | http://localhost:8000 |
| Frontend | 3000 | http://localhost:3000 |
| PostgreSQL | 5432 | localhost:5432 |
| Airflow UI | 8080 | http://localhost:8080 |
| Redis | 6379 | localhost:6379 |
### Volumes

- `postgres_data`: Database persistence
- `vector_data`: FAISS vector index
- `raw_documents`: Downloaded PDFs
- `processed_data`: Processed chunks
- `airflow_logs`: Airflow logs (dev mode)
## Configuration Options

### Vector Store Selection
FAISS (Default - In-Memory):
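The default needs no extra services; based on the `.env` keys shown in the Quick Start:

```bash
# In .env (default; no additional services required)
VECTOR_STORE_TYPE=faiss
```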
Qdrant (Production-like):
```yaml
# Add to docker-compose.yml
qdrant:
  image: qdrant/qdrant:latest
  ports:
    - "6333:6333"
  volumes:
    - qdrant_data:/qdrant/storage
```

```bash
# In .env
VECTOR_STORE_TYPE=qdrant
QDRANT_URL=http://qdrant:6333
```
### LLM Provider Selection
OpenAI (Default):
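In `.env`; the `LLM_PROVIDER=openai` value is inferred from the Azure and Bedrock examples below:

```bash
# In .env (provider value inferred from the examples below)
LLM_PROVIDER=openai
OPENAI_API_KEY=sk-your-key-here
```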
Azure OpenAI:
```bash
LLM_PROVIDER=azure
LLM_MODEL=gpt-5-mini
AZURE_OPENAI_ENDPOINT=https://your-resource.openai.azure.com
AZURE_OPENAI_API_KEY=...
AZURE_OPENAI_DEPLOYMENT=gpt-5-mini
```
AWS Bedrock:
```bash
LLM_PROVIDER=bedrock
LLM_MODEL=anthropic.claude-3-sonnet-20240229-v1:0
AWS_REGION=us-east-1
AWS_ACCESS_KEY_ID=...
AWS_SECRET_ACCESS_KEY=...
```
### Database Configuration
PostgreSQL (Default):
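The default connection string points at the bundled `postgres` service, as shown in the Quick Start:

```bash
# In .env (default; matches the bundled postgres service)
DATABASE_URL=postgresql://greengovrag:greengovrag@postgres:5432/greengovrag
```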
External PostgreSQL:
```bash
# Remove postgres service from docker-compose.yml
DATABASE_URL=postgresql://user:pass@external-host:5432/dbname
```
## Development Workflow

### Hot Reload
The backend has hot reload enabled by default:
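A sketch of how this is typically wired in Compose; the source path, module path, and command below are assumptions, not the project's actual settings:

```yaml
# Hot-reload sketch; the mount path and app module are assumptions
backend:
  volumes:
    - ./backend:/app                # mount source so edits appear in the container
  command: uvicorn green_gov_rag.api.main:app --host 0.0.0.0 --port 8000 --reload
```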
### Running Tests
```bash
# Inside backend container
docker-compose exec backend pytest tests/

# With coverage
docker-compose exec backend pytest --cov=green_gov_rag tests/
```
### Accessing the Database
```bash
# PostgreSQL shell
docker-compose exec postgres psql -U greengovrag -d greengovrag

# Run SQL query
docker-compose exec postgres psql -U greengovrag -d greengovrag \
  -c "SELECT COUNT(*) FROM documents;"
```
### Viewing Logs
```bash
# All services
docker-compose logs -f

# Specific service
docker-compose logs -f backend

# Last 100 lines
docker-compose logs --tail=100 backend
```
### Rebuilding Services
```bash
# Rebuild single service
docker-compose build backend

# Rebuild and restart
docker-compose up --build backend

# Clean rebuild (remove volumes)
docker-compose down -v
docker-compose up --build
```
## Troubleshooting

### Issue: Port Already in Use
Error: `Bind for 0.0.0.0:8000 failed: port is already allocated`
Solution: Change the host port in `docker-compose.yml`:
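For example, remapping the backend to a free host port (8001 is arbitrary):

```yaml
backend:
  ports:
    - "8001:8000"   # host:container; pick any free host port
```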
### Issue: Database Connection Failed
Error: `FATAL: password authentication failed`
Solution: Reset the database:
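Dropping the volume recreates the database with fresh credentials (this destroys stored data):

```bash
# Remove containers and volumes, then recreate the database
docker-compose down -v
docker-compose up -d postgres
```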
### Issue: Out of Memory
Error: Container crashes with OOM
Solution: Increase Docker memory:

- Docker Desktop: Settings → Resources → Memory (increase to 8GB+)
- Linux: Edit `/etc/docker/daemon.json`
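On Linux, an alternative is a per-service cap in the Compose file (a sketch; the `4g` value is arbitrary):

```yaml
# Per-service memory limit (Compose Spec syntax); the value is an example
backend:
  mem_limit: 4g
```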
### Issue: pgvector Extension Not Found
Error: `extension "vector" is not available`
Solution: Ensure the init script runs:

```bash
docker-compose down -v
docker-compose up postgres
# Wait for "database system is ready to accept connections"
docker-compose up backend
```
### Issue: Airflow DAG Not Showing
Solution: Wait for the DAG to load (30-60 seconds), then refresh the page.
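If refreshing does not help, the Airflow CLI can confirm whether the scheduler has parsed the DAG:

```bash
# Check that the scheduler has parsed the DAG
docker-compose exec airflow-scheduler airflow dags list

# Restart the scheduler if the DAG still does not appear
docker-compose restart airflow-scheduler
```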
### Issue: Vector Store Empty
Error: `No documents found in vector store`
Solution: Run the ETL pipeline:
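Triggering the pipeline works the same way as in the Quick Start:

```bash
# Populate the vector store by running the full pipeline
docker-compose exec airflow-scheduler airflow dags trigger greengovrag_full_pipeline
```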
## Performance Optimization

### Use Qdrant Instead of FAISS
For better performance with large datasets:

1. Add the Qdrant service to `docker-compose.yml` (shown above under Vector Store Selection)
2. Set `VECTOR_STORE_TYPE=qdrant` in `.env`
3. Run the ETL pipeline
### Limit Document Sources
For faster testing, limit sources in `backend/configs/documents_config.yml`:

```yaml
sources:
  - type: federal_legislation
    enabled: true    # Only enable a few sources
  - type: state_legislation
    enabled: false   # Disable others
```
### Reduce Chunk Size
For faster embedding generation:
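A sketch of the kind of setting involved; the key names and location are assumptions, so match them to the actual chunking config:

```yaml
# Chunking sketch; key names are assumptions
chunking:
  chunk_size: 500     # smaller chunks embed faster
  chunk_overlap: 50
```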
## Cleanup

### Stop Services
```bash
# Stop but keep data
docker-compose stop

# Stop and remove containers (keep volumes)
docker-compose down

# Stop and remove everything (including data)
docker-compose down -v
```
### Remove Unused Images
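The standard Docker commands:

```bash
# Remove dangling images left behind by rebuilds
docker image prune

# Also remove images not referenced by any container
docker image prune -a
```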
### Free Disk Space
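For a broader sweep:

```bash
# Reclaim space from stopped containers, unused networks, and build cache
docker system prune

# Include unused volumes (deletes persisted data in those volumes)
docker system prune --volumes
```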
## Next Steps
- Configuration Guide - Detailed configuration options
- First Query - Run your first RAG query
- AWS Deployment - Deploy to production
- Monitoring - Set up monitoring and logging
Last Updated: 2025-11-22