Testing Guide¶
Comprehensive guide to testing in GreenGovRAG
Table of Contents¶
- Overview
- Test Structure
- Running Tests
- Writing Unit Tests
- Writing Integration Tests
- Test Markers
- Mocking and Fixtures
- Code Coverage
- Testing Best Practices
- CI/CD Testing
- Debugging Tests
Overview¶
Testing is a critical part of GreenGovRAG's development process. We maintain high test coverage to ensure reliability, catch regressions early, and enable confident refactoring.
Testing Philosophy¶
- Write tests first (Test-Driven Development encouraged)
- Test behavior, not implementation (focus on what, not how)
- Keep tests simple and readable (tests are documentation)
- Isolate tests (no dependencies between tests)
- Fast feedback (unit tests should run in milliseconds)
- Comprehensive coverage (aim for 70%+ overall, 90%+ for core logic)
Test Types¶
- Unit Tests: Test individual functions/classes in isolation
- Integration Tests: Test interactions between components
- End-to-End Tests: Test complete workflows (future enhancement)
- Performance Tests: Benchmark critical paths (future enhancement)
Testing Framework¶
We use pytest with these key plugins:
- pytest-cov: Code coverage reporting
- pytest-mock: Simplified mocking
- pytest-asyncio: Async test support (see the example below)
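pytest-asyncio is listed above but not demonstrated elsewhere in this guide, so here is a minimal sketch of an async test; the fetch_answer coroutine is an illustrative stand-in, not a real project function.
"""Sketch of an async test with pytest-asyncio (illustrative only)."""
import asyncio
import pytest

async def fetch_answer(query: str) -> str:
    """Stand-in for an async call into the RAG pipeline."""
    await asyncio.sleep(0)
    return f"answer for: {query}"

@pytest.mark.asyncio
async def test_fetch_answer_returns_string():
    """Async tests are plain `async def` functions marked for pytest-asyncio."""
    result = await fetch_answer("What does the EPBC Act cover?")
    assert result.startswith("answer for")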
Test Structure¶
Directory Organization¶
backend/tests/
├── __init__.py
├── conftest.py # Shared fixtures and configuration
├── test_api/ # API endpoint tests
│ ├── __init__.py
│ ├── test_query.py
│ ├── test_documents.py
│ └── test_admin.py
├── test_rag/ # RAG component tests
│ ├── __init__.py
│ ├── test_vector_store.py
│ ├── test_enhanced_response.py
│ ├── test_llm_factory.py
│ └── test_hybrid_search.py
├── test_etl/ # ETL pipeline tests
│ ├── __init__.py
│ ├── test_pipeline.py
│ ├── test_ingest.py
│ ├── test_chunker.py
│ └── test_sources/
│ ├── test_epbc_scraper.py
│ └── test_sa_legislation.py
├── test_models/ # Database model tests
│ ├── __init__.py
│ └── test_document.py
├── fixtures/ # Test data files
│ ├── sample_documents/
│ │ ├── test.pdf
│ │ └── test.html
│ └── mock_responses/
│ └── llm_responses.json
└── utils.py # Test utilities and helpers
Test File Naming¶
# Good: Follows pytest discovery pattern
test_vector_store.py
test_enhanced_response.py
test_query_endpoint.py
# Bad: Won't be discovered by pytest
vector_store_test.py
tests_for_rag.py
rag.py
Test Function Naming¶
# Good: Descriptive test names
def test_query_endpoint_returns_relevant_documents():
"""Test that query endpoint returns relevant documents."""
pass
def test_vector_store_handles_empty_query():
"""Test vector store behavior with empty query."""
pass
def test_metadata_tagger_extracts_jurisdiction():
"""Test metadata tagger extracts jurisdiction correctly."""
pass
# Bad: Vague test names
def test_query():
pass
def test_vector_store():
pass
def test_1():
pass
Running Tests¶
Basic Test Execution¶
cd backend
# Run all tests
pytest
# Run specific test file
pytest tests/test_rag/test_vector_store.py
# Run specific test function
pytest tests/test_rag/test_vector_store.py::test_faiss_similarity_search
# Run tests matching pattern
pytest -k "vector_store"
pytest -k "test_query or test_search"
# Verbose output
pytest -v
# Very verbose output (full assertion diffs)
pytest -vv
# Stop on first failure
pytest -x
# Stop after N failures
pytest --maxfail=3
# Run last failed tests
pytest --lf
# Run failed tests first, then others
pytest --ff
Running by Markers¶
# Run only unit tests
pytest -m unit
# Run only integration tests
pytest -m integration
# Skip slow tests
pytest -m "not slow"
# Run integration but not slow
pytest -m "integration and not slow"
# Run unit and integration
pytest -m "unit or integration"
Parallel Execution¶
# Install pytest-xdist
pip install pytest-xdist
# Run tests in parallel (auto-detect CPU count)
pytest -n auto
# Run tests on 4 cores
pytest -n 4
Running with Coverage¶
# Run with coverage report
pytest --cov=green_gov_rag
# HTML coverage report
pytest --cov=green_gov_rag --cov-report=html
# XML coverage report (for CI)
pytest --cov=green_gov_rag --cov-report=xml
# Show missing lines
pytest --cov=green_gov_rag --cov-report=term-missing
# Coverage for specific module
pytest --cov=green_gov_rag.rag --cov-report=html
Writing Unit Tests¶
Unit tests verify individual functions or classes in isolation.
Basic Unit Test¶
"""Tests for document chunking functionality."""
from green_gov_rag.etl.chunker import chunk_text
def test_chunk_text_splits_by_size():
"""Test that chunk_text splits text into correct sizes."""
text = "a" * 2500
chunks = chunk_text(text, chunk_size=1000, chunk_overlap=100)
assert len(chunks) == 3
assert len(chunks[0]) == 1000
assert len(chunks[1]) == 1000
    assert len(chunks[2]) == 700  # stride = 900 (1000 - 100 overlap), so 700 chars remain in the final chunk
def test_chunk_text_maintains_overlap():
"""Test that chunks maintain specified overlap."""
text = "abcdefghij" * 200 # 2000 chars
chunks = chunk_text(text, chunk_size=1000, chunk_overlap=100)
# Verify overlap between consecutive chunks
overlap = chunks[0][-100:]
start_of_next = chunks[1][:100]
assert overlap == start_of_next
def test_chunk_text_handles_empty_input():
"""Test that chunk_text handles empty string."""
chunks = chunk_text("", chunk_size=1000)
assert chunks == []
def test_chunk_text_with_text_smaller_than_chunk_size():
"""Test chunking text smaller than chunk size."""
text = "Short text"
chunks = chunk_text(text, chunk_size=1000)
assert len(chunks) == 1
assert chunks[0] == text
Testing Classes¶
"""Tests for DocumentProcessor class."""
import pytest
from pathlib import Path
from green_gov_rag.etl.ingest import DocumentProcessor
from green_gov_rag.models.document import Document
class TestDocumentProcessor:
"""Test suite for DocumentProcessor."""
def test_init_sets_default_values(self):
"""Test that initialization sets correct defaults."""
processor = DocumentProcessor()
assert processor.chunk_size == 1000
assert processor.chunk_overlap == 200
assert processor.enable_ocr is False
def test_init_accepts_custom_values(self):
"""Test initialization with custom values."""
processor = DocumentProcessor(
chunk_size=500,
chunk_overlap=50,
enable_ocr=True,
)
assert processor.chunk_size == 500
assert processor.chunk_overlap == 50
assert processor.enable_ocr is True
def test_process_file_returns_chunks(self, tmp_path):
"""Test that process_file returns document chunks."""
# Create temporary test file
test_file = tmp_path / "test.txt"
test_file.write_text("Test content " * 100)
processor = DocumentProcessor(chunk_size=100)
chunks = processor.process_file(test_file)
assert len(chunks) > 0
assert all(isinstance(chunk, str) for chunk in chunks)
def test_process_file_raises_on_missing_file(self):
"""Test that process_file raises error for missing file."""
processor = DocumentProcessor()
with pytest.raises(FileNotFoundError):
processor.process_file(Path("nonexistent.txt"))
Testing with Parametrize¶
"""Tests using parametrization."""
import pytest
from green_gov_rag.etl.chunker import chunk_text
from green_gov_rag.rag.location_ner import extract_location
@pytest.mark.parametrize(
"query,expected_location",
[
("What are the rules in Adelaide?", "Adelaide"),
("Can I build in Dubbo Regional?", "Dubbo Regional"),
("Regulations for Sydney City Council", "Sydney"),
("Rules in South Australia", "South Australia"),
],
)
def test_extract_location_finds_australian_locations(query, expected_location):
"""Test location extraction from various queries."""
location = extract_location(query)
assert expected_location.lower() in location.lower()
@pytest.mark.parametrize(
"chunk_size,chunk_overlap",
[
(1000, 100),
(500, 50),
(2000, 200),
],
)
def test_chunker_with_various_sizes(chunk_size, chunk_overlap):
"""Test chunking with different size parameters."""
text = "a" * 5000
chunks = chunk_text(text, chunk_size=chunk_size, chunk_overlap=chunk_overlap)
# Verify all chunks respect size limit
assert all(len(chunk) <= chunk_size for chunk in chunks)
Testing Exceptions¶
"""Tests for error handling."""
import pytest
from green_gov_rag.rag.vector_store import VectorStoreFactory, VectorStoreError
def test_vector_store_factory_raises_on_invalid_type():
"""Test that factory raises error for invalid store type."""
with pytest.raises(ValueError, match="Unsupported vector store type"):
VectorStoreFactory.create(store_type="invalid")
def test_query_raises_on_empty_query():
"""Test that empty query raises appropriate error."""
from green_gov_rag.api.routes.query import query_endpoint
with pytest.raises(ValueError) as exc_info:
query_endpoint(query="")
assert "Query cannot be empty" in str(exc_info.value)
def test_database_connection_error_handling(monkeypatch):
"""Test handling of database connection errors."""
from green_gov_rag.models.database import get_session
def mock_failing_connection():
raise ConnectionError("Database unavailable")
monkeypatch.setattr("green_gov_rag.models.database.engine.connect", mock_failing_connection)
with pytest.raises(ConnectionError):
session = get_session()
Writing Integration Tests¶
Integration tests verify interactions between components.
Database Integration Tests¶
"""Integration tests for database operations."""
import pytest
from sqlmodel import Session, SQLModel
from green_gov_rag.models.database import engine, create_db_and_tables
from green_gov_rag.models.document import Document
@pytest.fixture(scope="function")
def test_db():
"""Create test database for each test."""
create_db_and_tables()
yield
# Cleanup after test
    SQLModel.metadata.drop_all(engine)
@pytest.mark.integration
def test_document_crud_operations(test_db):
"""Test create, read, update, delete operations."""
with Session(engine) as session:
# Create
doc = Document(
title="Test Document",
content="Test content",
source_url="https://example.com",
)
session.add(doc)
session.commit()
session.refresh(doc)
assert doc.id is not None
# Read
retrieved = session.get(Document, doc.id)
assert retrieved.title == "Test Document"
# Update
retrieved.title = "Updated Title"
session.commit()
# Verify update
updated = session.get(Document, doc.id)
assert updated.title == "Updated Title"
# Delete
session.delete(updated)
session.commit()
# Verify deletion
deleted = session.get(Document, doc.id)
assert deleted is None
API Integration Tests¶
"""Integration tests for API endpoints."""
import pytest
from fastapi.testclient import TestClient
from green_gov_rag.api.main import app
@pytest.fixture
def client():
"""Create test client."""
return TestClient(app)
@pytest.mark.integration
def test_query_endpoint_integration(client):
"""Test query endpoint with real components."""
response = client.post(
"/api/query",
json={
"query": "What are the EPBC Act requirements?",
"top_k": 5,
},
)
assert response.status_code == 200
data = response.json()
assert "answer" in data
assert "sources" in data
assert isinstance(data["sources"], list)
assert len(data["sources"]) <= 5
@pytest.mark.integration
def test_document_list_endpoint(client):
"""Test document listing endpoint."""
response = client.get("/api/documents")
assert response.status_code == 200
data = response.json()
assert "documents" in data
assert isinstance(data["documents"], list)
Vector Store Integration Tests¶
"""Integration tests for vector store."""
import pytest
from green_gov_rag.rag.vector_store import VectorStoreFactory
from green_gov_rag.rag.embeddings import get_embeddings
@pytest.fixture
def vector_store():
"""Create test vector store."""
embeddings = get_embeddings()
store = VectorStoreFactory.create(store_type="faiss")
return store
@pytest.mark.integration
@pytest.mark.slow
def test_vector_store_similarity_search(vector_store):
"""Test vector similarity search."""
# Add test documents
texts = [
"The EPBC Act protects threatened species.",
"Native vegetation clearing requires approval.",
"Environmental impact assessments are mandatory.",
]
vector_store.add_texts(texts)
# Search
results = vector_store.similarity_search(
"How to protect endangered animals?",
k=2,
)
assert len(results) == 2
assert "EPBC Act" in results[0].page_content
Test Markers¶
We use pytest markers to categorize tests.
Available Markers¶
Markers are registered in backend/pyproject.toml (a registration sketch follows the usage examples below) and applied to tests as decorators:
# Unit tests (default, fast)
@pytest.mark.unit
def test_something():
pass
# Integration tests (require services)
@pytest.mark.integration
def test_database_integration():
pass
# Slow tests (take >1 second)
@pytest.mark.slow
def test_large_document_processing():
pass
# End-to-end tests
@pytest.mark.e2e
def test_complete_workflow():
pass
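For reference, marker registration in backend/pyproject.toml typically looks like the sketch below; the descriptions are illustrative and the project's actual entries may differ.
[tool.pytest.ini_options]
markers = [
    "unit: fast, isolated unit tests",
    "integration: tests that require external services",
    "slow: tests that take more than one second",
    "e2e: end-to-end workflow tests",
]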
Using Markers¶
"""Example test file with markers."""
import pytest
@pytest.mark.unit
def test_fast_unit_test():
"""Fast unit test."""
assert 1 + 1 == 2
@pytest.mark.integration
def test_database_integration():
"""Integration test requiring database."""
# Test with database
pass
@pytest.mark.slow
@pytest.mark.integration
def test_slow_integration():
"""Slow integration test."""
# Long-running integration test
pass
@pytest.mark.skip(reason="Feature not implemented yet")
def test_future_feature():
"""Test for future feature."""
pass
@pytest.mark.skipif(
condition=not has_gpu(),
reason="Requires GPU",
)
def test_gpu_acceleration():
"""Test GPU-accelerated processing."""
pass
Mocking and Fixtures¶
Common Fixtures¶
Defined in tests/conftest.py:
"""Shared test fixtures."""
import pytest
from unittest.mock import Mock, MagicMock
from pathlib import Path
@pytest.fixture
def sample_document():
"""Provide sample document for testing."""
return {
"title": "Test Document",
"content": "Test content",
"source_url": "https://example.com",
"document_type": "regulation",
}
@pytest.fixture
def sample_documents():
"""Provide list of sample documents."""
return [
{"title": f"Document {i}", "content": f"Content {i}"}
for i in range(5)
]
@pytest.fixture
def mock_llm():
"""Mock LLM for testing."""
llm = Mock()
llm.invoke.return_value = "Mocked LLM response"
return llm
@pytest.fixture
def mock_vector_store():
"""Mock vector store for testing."""
store = MagicMock()
store.similarity_search.return_value = [
Mock(page_content="Test document 1"),
Mock(page_content="Test document 2"),
]
return store
@pytest.fixture
def temp_test_file(tmp_path):
"""Create temporary test file."""
test_file = tmp_path / "test.txt"
test_file.write_text("Test content")
return test_file
Using Fixtures¶
"""Tests using fixtures."""
import pytest
def test_with_sample_document(sample_document):
"""Test using sample document fixture."""
assert sample_document["title"] == "Test Document"
def test_with_mock_llm(mock_llm):
"""Test using mocked LLM."""
response = mock_llm.invoke("Test query")
assert response == "Mocked LLM response"
mock_llm.invoke.assert_called_once_with("Test query")
def test_with_temp_file(temp_test_file):
"""Test using temporary file."""
content = temp_test_file.read_text()
assert content == "Test content"
Mocking External Services¶
"""Tests mocking external services."""
import pytest
from unittest.mock import patch, Mock
@patch("green_gov_rag.rag.llm_factory.OpenAI")
def test_query_with_mocked_openai(mock_openai):
"""Test query endpoint with mocked OpenAI."""
# Configure mock
mock_instance = Mock()
mock_instance.invoke.return_value = "Mocked response"
mock_openai.return_value = mock_instance
# Test code that uses OpenAI
from green_gov_rag.rag.enhanced_response import EnhancedRAGPipeline
pipeline = EnhancedRAGPipeline()
result = pipeline.query("Test question")
assert "Mocked response" in result
@pytest.fixture
def mock_requests(monkeypatch):
"""Mock requests library."""
mock_response = Mock()
mock_response.status_code = 200
mock_response.json.return_value = {"data": "test"}
def mock_get(*args, **kwargs):
return mock_response
monkeypatch.setattr("requests.get", mock_get)
return mock_response
def test_document_scraper_with_mocked_requests(mock_requests):
"""Test scraper with mocked HTTP requests."""
from green_gov_rag.etl.sources.epbc_scraper import EPBCScraper
scraper = EPBCScraper()
documents = scraper.fetch_documents()
# Verify mocked request was used
assert len(documents) > 0
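pytest-mock exposes the same machinery as unittest.mock through the mocker fixture, and its patches are undone automatically at the end of each test. A minimal sketch, patching requests.get purely for illustration:
"""Sketch using the pytest-mock `mocker` fixture (illustrative only)."""
import requests

def test_fetch_page_with_mocker(mocker):
    """mocker.patch behaves like unittest.mock.patch but cleans up automatically."""
    mock_get = mocker.patch("requests.get")
    mock_get.return_value.status_code = 200
    mock_get.return_value.json.return_value = {"data": "test"}
    response = requests.get("https://example.com")
    assert response.status_code == 200
    assert response.json() == {"data": "test"}
    mock_get.assert_called_once_with("https://example.com")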
Code Coverage¶
Coverage Requirements¶
- Minimum overall: 70%
- Core RAG logic: 90%+
- API endpoints: 80%+
- ETL pipeline: 80%+
- Utilities: 70%+
Checking Coverage¶
# Generate coverage report
pytest --cov=green_gov_rag --cov-report=html
# Open HTML report
open htmlcov/index.html # macOS
xdg-open htmlcov/index.html # Linux
# Show coverage in terminal
pytest --cov=green_gov_rag --cov-report=term-missing
# Check coverage for specific module
pytest --cov=green_gov_rag.rag --cov-report=term
# Fail if coverage below threshold
pytest --cov=green_gov_rag --cov-fail-under=70
Coverage Configuration¶
In backend/pyproject.toml:
[tool.coverage.run]
source = ["green_gov_rag"]
omit = [
"*/tests/*",
"*/__pycache__/*",
"*/site-packages/*",
"*/.venv/*",
]
[tool.coverage.report]
exclude_lines = [
"pragma: no cover",
"def __repr__",
"if __name__ == .__main__.:",
"raise AssertionError",
"raise NotImplementedError",
"if TYPE_CHECKING:",
]
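The 70% threshold can also be enforced in configuration rather than via --cov-fail-under; assuming the project wants that, these standard coverage.py keys would be added to the existing [tool.coverage.report] section:
fail_under = 70
show_missing = true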
Improving Coverage¶
# Mark code that shouldn't be covered
def debug_only_function(): # pragma: no cover
"""Only used for debugging."""
print("Debug info")
# Cover edge cases
def test_all_branches():
"""Test all code branches."""
# Test happy path
result = function(valid_input)
assert result.success
# Test error cases
with pytest.raises(ValueError):
function(invalid_input)
# Test edge cases
result = function(edge_case_input)
assert result.handled_correctly
Testing Best Practices¶
1. Arrange-Act-Assert Pattern¶
def test_document_processing():
"""Test document processing."""
# Arrange - Set up test data
document = create_test_document()
processor = DocumentProcessor(chunk_size=1000)
# Act - Execute the code being tested
result = processor.process(document)
# Assert - Verify the results
assert len(result.chunks) > 0
assert result.metadata["processed"] is True
2. Test One Thing at a Time¶
# Good: Single responsibility
def test_chunker_respects_size_limit():
"""Test that chunker respects size limit."""
text = "a" * 5000
chunks = chunk_text(text, chunk_size=1000)
assert all(len(chunk) <= 1000 for chunk in chunks)
def test_chunker_maintains_overlap():
"""Test that chunker maintains overlap."""
text = "abcd" * 500
chunks = chunk_text(text, chunk_size=1000, chunk_overlap=100)
overlap = chunks[0][-100:]
assert overlap == chunks[1][:100]
# Bad: Testing multiple things
def test_chunker():
"""Test chunker."""
# Tests size, overlap, edge cases all at once
# Hard to debug when it fails
pass
3. Use Descriptive Names¶
# Good: Clear what's being tested
def test_query_endpoint_returns_404_when_document_not_found():
pass
# Bad: Unclear
def test_query():
pass
4. Avoid Test Interdependence¶
# Bad: Tests depend on each other
class TestDocumentProcessor:
document = None
def test_create_document(self):
self.document = create_document()
assert self.document
def test_process_document(self):
# Depends on test_create_document running first
process(self.document)
# Good: Independent tests
class TestDocumentProcessor:
def test_create_document(self):
document = create_document()
assert document
def test_process_document(self):
document = create_document() # Set up own test data
result = process(document)
assert result
5. Clean Up Resources¶
# Good: Using fixtures for cleanup
@pytest.fixture
def temp_database():
"""Create temporary database."""
db = create_test_db()
yield db
db.cleanup() # Automatic cleanup
def test_with_database(temp_database):
# Database automatically cleaned up after test
pass
CI/CD Testing¶
GitHub Actions Workflow¶
Tests run automatically on push and pull requests:
# .github/workflows/test.yml
name: Tests
on: [push, pull_request]
jobs:
test:
runs-on: ubuntu-latest
steps:
- uses: actions/checkout@v3
- name: Set up Python
uses: actions/setup-python@v4
with:
python-version: '3.12'
- name: Install dependencies
run: |
cd backend
pip install -e .[dev]
- name: Run tests
run: |
cd backend
pytest --cov=green_gov_rag --cov-report=xml
- name: Upload coverage
uses: codecov/codecov-action@v3
Local CI Simulation¶
# Run same checks as CI
cd backend
# Format check
ruff format --check .
# Lint check
ruff check .
# Type check
mypy green_gov_rag tests
# Tests with coverage
pytest --cov=green_gov_rag --cov-report=xml --cov-fail-under=70
Debugging Tests¶
Running in Debug Mode¶
# Add pytest.set_trace() for debugging
def test_complex_logic():
"""Test complex logic."""
result = complex_function()
import pytest; pytest.set_trace() # Debugger will stop here
assert result.is_valid
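Instead of editing the test, pytest can also drop into the debugger on its own:
# Drop into pdb automatically when a test fails
pytest --pdb
# Drop into pdb at the start of every test
pytest --trace
# Combine with -x to stop at the first failure
pytest -x --pdb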
VS Code Debugging¶
- Set breakpoint in test file
- Press F5 or use Debug panel
- Select the "Python: Pytest" configuration (a minimal example follows)
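If that configuration doesn't exist yet, a minimal .vscode/launch.json sketch might look like the following; the args and cwd are assumptions based on the backend layout above.
{
    "version": "0.2.0",
    "configurations": [
        {
            "name": "Python: Pytest",
            "type": "debugpy",
            "request": "launch",
            "module": "pytest",
            "args": ["-x", "tests/"],
            "cwd": "${workspaceFolder}/backend",
            "console": "integratedTerminal",
            "justMyCode": false
        }
    ]
}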
Print Debugging¶
# Show print statements
pytest -s
# Show very verbose output
pytest -vv
# Show local variables on failure
pytest --showlocals
Debugging Failures¶
# Show full traceback
pytest --tb=long
# Show shorter tracebacks
pytest --tb=short
# Show one line per failure
pytest --tb=line
Next step: Learn how to submit your changes with the Pull Request Guide!