Glossary¶

Key terms and concepts used in GreenGovRAG.

General Terms¶

RAG (Retrieval-Augmented Generation)¶

A pattern that combines information retrieval from a knowledge base with large language model generation to produce factual, grounded responses.

Vector Embedding¶

A numerical representation of text in high-dimensional space, where semantically similar texts are positioned close together.

Semantic Search¶

Search that understands meaning and context rather than just keyword matching. Uses vector embeddings to find conceptually similar content.
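
As a rough illustration of how "close together" is measured, the sketch below ranks two documents against a query by cosine similarity. The three-dimensional vectors are made-up toy values, not output from a real embedding model, and cosine similarity is only one of several distance measures a vector store may use.

```python
import math


def cosine_similarity(a: list[float], b: list[float]) -> float:
    """Cosine of the angle between two vectors; 1.0 means same direction."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0.0 or norm_b == 0.0:
        return 0.0  # treat a zero vector as similar to nothing
    return dot / (norm_a * norm_b)


# Toy 3-dimensional "embeddings" (hypothetical values for illustration only).
query = [0.9, 0.1, 0.0]
documents = {
    "land clearing permit": [0.8, 0.2, 0.1],
    "payroll tax schedule": [0.0, 0.1, 0.9],
}

# Rank documents from most to least similar to the query.
ranked = sorted(
    documents,
    key=lambda name: cosine_similarity(query, documents[name]),
    reverse=True,
)
print(ranked[0])  # land clearing permit
```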

Hybrid Search¶

Combination of keyword-based search (BM25) and semantic search (vector similarity) for improved retrieval accuracy.
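
One common way to merge a keyword ranking with a vector ranking is reciprocal rank fusion (RRF). The sketch below is an assumption about how such a merge could look, not a description of GreenGovRAG's actual fusion logic; the document IDs and the constant `k = 60` are illustrative.

```python
def reciprocal_rank_fusion(rankings: list[list[str]], k: int = 60) -> list[str]:
    """Fuse several ranked lists; each document scores sum(1 / (k + rank))."""
    scores: dict[str, float] = {}
    for ranking in rankings:
        for rank, doc_id in enumerate(ranking, start=1):
            scores[doc_id] = scores.get(doc_id, 0.0) + 1.0 / (k + rank)
    # Highest fused score first.
    return sorted(scores, key=lambda doc_id: scores[doc_id], reverse=True)


# Hypothetical results from a BM25 search and a vector search.
keyword_ranking = ["doc_a", "doc_b", "doc_c"]
vector_ranking = ["doc_b", "doc_d", "doc_a"]

fused = reciprocal_rank_fusion([keyword_ranking, vector_ranking])
print(fused)  # ['doc_b', 'doc_a', 'doc_d', 'doc_c']
```

Documents that rank well in both lists (here `doc_b` and `doc_a`) rise above documents that appear in only one.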

Chunking¶

Process of splitting large documents into smaller segments (chunks) suitable for embedding and retrieval.
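
A minimal sketch of one chunking strategy: fixed-size word windows with overlap, so that text near a boundary appears in both neighbouring chunks. The function name, the default sizes, and the choice to split on whitespace are assumptions for illustration; real pipelines often split on tokens, sentences, or document sections instead.

```python
def chunk_words(text: str, chunk_size: int = 200, overlap: int = 40) -> list[str]:
    """Split text into chunks of up to chunk_size words, sharing overlap words."""
    if chunk_size <= 0 or not 0 <= overlap < chunk_size:
        raise ValueError("require chunk_size > 0 and 0 <= overlap < chunk_size")
    words = text.split()
    step = chunk_size - overlap
    chunks: list[str] = []
    for start in range(0, len(words), step):
        chunks.append(" ".join(words[start:start + chunk_size]))
        if start + chunk_size >= len(words):
            break  # the last window already reached the end of the text
    return chunks


sample = " ".join(f"w{i}" for i in range(10))
for chunk in chunk_words(sample, chunk_size=4, overlap=1):
    print(chunk)
```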

ETL (Extract, Transform, Load)¶

Pipeline for downloading source documents, processing them into chunks, generating embeddings, and loading into the vector store.

Document Types¶

Federal Legislation¶

Commonwealth/national-level laws and regulations applicable across Australia.

Examples: EPBC Act, NGER Act

State Legislation¶

State-specific laws and regulations.

Examples: NSW Environmental Planning & Assessment Act, SA Native Vegetation Act

Local Government Policy¶

Council-level planning schemes, development controls, and local laws.

Examples: City of Adelaide Development Plan, Local Environmental Plans (LEPs)

Emissions Reporting¶

Documents related to greenhouse gas emissions measurement, reporting, and verification.

Examples: NGER Guidelines, GHG Protocol, ISSB Standards

Geographic Terms¶

LGA (Local Government Area)¶

Geographic area under the jurisdiction of a local council. Identified by ABS (Australian Bureau of Statistics) codes.

Example: City of Adelaide (LGA code: 40070)

Jurisdiction¶

Level of government authority:

Federal: Commonwealth level
State/Territory: NSW, VIC, SA, WA, QLD, TAS, NT, ACT
Local: Council/LGA level

Spatial Metadata¶

Geographic information about document applicability (state, LGA codes, spatial scope).
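
To make the idea concrete, the sketch below models spatial metadata as a small record and checks whether a document applies to a given location. The class, field names, and matching rules are hypothetical and are not taken from the GreenGovRAG schema.

```python
from dataclasses import dataclass, field
from typing import Optional


@dataclass
class SpatialMetadata:
    """Hypothetical record describing where a document applies."""
    jurisdiction: str                      # "federal", "state", or "local"
    state: Optional[str] = None            # e.g. "SA"; unused for federal documents
    lga_codes: list[str] = field(default_factory=list)  # ABS LGA codes


def applies_to(meta: SpatialMetadata, state: str, lga_code: str) -> bool:
    """Return True if the document is applicable at the given location."""
    if meta.jurisdiction == "federal":
        return True                        # Commonwealth law applies everywhere
    if meta.jurisdiction == "state":
        return meta.state == state
    return lga_code in meta.lga_codes      # local: match on LGA code


adelaide_plan = SpatialMetadata("local", state="SA", lga_codes=["40070"])
print(applies_to(adelaide_plan, state="SA", lga_code="40070"))  # True
```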

Technical Terms¶

Vector Store¶

Database optimized for storing and searching high-dimensional vectors.

Implementations:

FAISS (Facebook AI Similarity Search): In-memory, file-based
Qdrant: Persistent, distributed vector database

pgvector¶

PostgreSQL extension for vector similarity search, enabling hybrid queries combining SQL and semantic search.

Token¶

Unit of text for LLM processing. One token corresponds to roughly 0.75 English words on average; the exact count depends on the model's tokenizer.

Example: "The quick brown fox" ≈ 5 tokens
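
The rule of thumb can be turned into a quick estimate, as sketched below. This is a heuristic only; the function name is illustrative, and an accurate count requires running the model's own tokenizer.

```python
def estimate_tokens(text: str, words_per_token: float = 0.75) -> int:
    """Rough token estimate from a whitespace word count (heuristic only)."""
    word_count = len(text.split())
    return round(word_count / words_per_token)


print(estimate_tokens("The quick brown fox"))  # 5
```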

Embedding Model¶

Machine learning model that converts text into vector embeddings.

Default: sentence-transformers/all-MiniLM-L6-v2 (384 dimensions)

HNSW (Hierarchical Navigable Small World)¶

Graph-based algorithm for approximate nearest neighbor search in vector stores. Provides sub-linear search time.

LLM Terms¶

LLM (Large Language Model)¶

AI model trained on vast text corpora, capable of understanding and generating human-like text.

Providers: OpenAI (GPT-5-mini, GPT-5, GPT-4o), Anthropic (Claude), AWS (Bedrock), Azure (OpenAI)

Prompt¶

Input text provided to an LLM to elicit a response.

Temperature¶

Parameter controlling randomness in LLM output (0.0 = near-deterministic, 2.0 = very random).

GreenGovRAG default: 0.2 (low randomness for factual responses)
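
Temperature is typically applied by dividing the model's raw scores (logits) by the temperature before converting them to probabilities. The sketch below uses made-up logits for three candidate tokens to show the effect; it does not model any specific provider's sampling code, and it rejects a temperature of exactly 0, which providers usually handle as a special case.

```python
import math


def softmax_with_temperature(logits: list[float], temperature: float) -> list[float]:
    """Convert logits to probabilities after scaling by 1 / temperature."""
    if temperature <= 0:
        raise ValueError("temperature must be positive")
    scaled = [value / temperature for value in logits]
    peak = max(scaled)  # subtract the max for numerical stability
    exps = [math.exp(value - peak) for value in scaled]
    total = sum(exps)
    return [value / total for value in exps]


logits = [2.0, 1.0, 0.1]  # hypothetical scores for three candidate tokens
low = softmax_with_temperature(logits, 0.2)   # sharply peaked on the top token
high = softmax_with_temperature(logits, 2.0)  # much flatter distribution
print([round(p, 3) for p in low])
print([round(p, 3) for p in high])
```

At 0.2 almost all probability mass lands on the highest-scoring token, which is why a low temperature suits factual answers.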

Context Window¶

Maximum amount of text (in tokens) an LLM can process in a single request.

Examples:

GPT-5-mini: 400K tokens
GPT-4o: 128K tokens
Claude 3: 200K tokens

Hallucination¶

When an LLM generates plausible-sounding but factually incorrect information. RAG mitigates this by grounding responses in retrieved documents.

RAG-Specific Terms¶

Trust Score¶

Confidence metric (0.0-1.0) indicating how well the retrieved sources support the generated answer.

Calculation: Weighted average of source relevance scores
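
A minimal sketch of a weighted average over relevance scores follows. The glossary does not state which weights are used, so the default here (weight 1/rank, favouring the top-ranked sources) is purely an assumption, as is the function name.

```python
from typing import Optional


def trust_score(relevance_scores: list[float],
                weights: Optional[list[float]] = None) -> float:
    """Weighted average of relevance scores, clamped to the range 0.0-1.0."""
    if not relevance_scores:
        return 0.0  # no sources means nothing supports the answer
    if weights is None:
        # Assumed weighting: the source at rank r gets weight 1 / r.
        weights = [1.0 / rank for rank in range(1, len(relevance_scores) + 1)]
    if len(weights) != len(relevance_scores):
        raise ValueError("weights and relevance_scores must have equal length")
    total_weight = sum(weights)
    if total_weight <= 0:
        raise ValueError("weights must sum to a positive value")
    score = sum(w * s for w, s in zip(weights, relevance_scores)) / total_weight
    return max(0.0, min(1.0, score))


print(round(trust_score([0.9, 0.6]), 3))  # 0.8
```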

Source Document¶

Original regulatory document retrieved to support an answer.

Relevance Score¶

Similarity score (0.0-1.0) indicating how relevant a retrieved document is to the query.

Citation¶

Reference to the specific section and page of a source document used in the response.

Example: Clean Energy Regulator (2024), Scope 2 Emissions Guideline, Page 42, Section 3.2.1

Deep Link¶

URL pointing directly to a specific page or section within a source document.

Example: https://example.gov.au/doc.pdf#page=42
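
For PDF sources, a deep link of this form can be built by appending a `#page=N` fragment, which most browser PDF viewers honour. The helper below is an illustrative sketch; its name is not part of GreenGovRAG.

```python
from urllib.parse import urldefrag


def build_deep_link(document_url: str, page: int) -> str:
    """Return the document URL with a #page=N fragment for PDF viewers."""
    if page < 1:
        raise ValueError("page numbers start at 1")
    base = urldefrag(document_url).url  # drop any existing fragment
    return f"{base}#page={page}"


print(build_deep_link("https://example.gov.au/doc.pdf", 42))
# https://example.gov.au/doc.pdf#page=42
```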

ESG Terms¶

ESG (Environmental, Social, Governance)¶

Framework for evaluating an organization's environmental, social, and governance impacts and sustainability practices.

NGER (National Greenhouse and Energy Reporting)¶

Australian legislation requiring large emitters to report greenhouse gas emissions and energy consumption.

ISSB (International Sustainability Standards Board)¶

Global body developing sustainability disclosure standards (IFRS S1/S2).

GHG Protocol¶

International standard for measuring and reporting greenhouse gas emissions.

Scope 1 Emissions¶

Direct emissions from owned or controlled sources.

Examples: Company vehicles, on-site fuel combustion

Scope 2 Emissions¶

Indirect emissions from purchased electricity, steam, heating, cooling.

Scope 3 Emissions¶

All other indirect emissions in the value chain (upstream and downstream).

Examples: Purchased goods, business travel, waste disposal, product use

Carbon Neutral¶

Achieving net-zero carbon emissions by balancing emissions with carbon removal or offsets.

GHG (Greenhouse Gas)¶

Gases that trap heat in the atmosphere: CO₂, CH₄, N₂O, SF₆, HFCs, PFCs, NF₃.

Database Terms¶

ORM (Object-Relational Mapping)¶

Programming technique that maps database tables to Python classes.

GreenGovRAG uses: SQLModel (built on SQLAlchemy + Pydantic)

Migration¶

Versioned change to database schema, managed by Alembic.

JSONB¶

PostgreSQL's binary JSON data type, allowing efficient querying of nested structured data.

Index¶

Database structure that speeds up data retrieval.

Types:

B-tree: Standard index for equality/range queries
GIN: Generalized Inverted Index for JSONB/array data
IVFFlat: Index for vector similarity search (pgvector)

Connection Pool¶

Pre-established database connections reused for multiple queries, improving performance.

Cloud Terms¶

CDK (Cloud Development Kit)¶

AWS infrastructure-as-code framework using TypeScript/Python.

Bicep¶

Azure's domain-specific language for infrastructure-as-code.

ECS (Elastic Container Service)¶

AWS container orchestration service.

Fargate¶

Serverless compute engine for containers (AWS ECS/EKS).

Container Apps¶

Azure's serverless container platform.

RDS (Relational Database Service)¶

AWS managed PostgreSQL/MySQL service.

Spot Instance¶

AWS EC2 instance offered at a reduced price (up to 90% off On-Demand rates) in exchange for the possibility of interruption.

CloudWatch¶

AWS monitoring and logging service.

Application Insights¶

Azure monitoring and telemetry service.

API Terms¶

REST API¶

Web API using HTTP methods (GET, POST, PUT, DELETE) for resource manipulation.

FastAPI¶

Modern Python web framework for building APIs with automatic OpenAPI documentation.

Swagger/OpenAPI¶

API documentation standard. GreenGovRAG auto-generates docs at /docs.

CORS (Cross-Origin Resource Sharing)¶

Security mechanism allowing web apps from one domain to access resources from another.

Rate Limiting¶

Restricting the number of API requests a client can make in a time period.

GreenGovRAG default: 30 requests/minute
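
The sketch below shows one way a per-client limit such as 30 requests/minute can be enforced, using a sliding window of request timestamps. It is an in-memory illustration under assumed names, not the limiter GreenGovRAG ships; a production deployment would typically use middleware backed by a shared store.

```python
import time
from collections import deque
from typing import Optional


class SlidingWindowRateLimiter:
    """Allow at most max_requests per client within the last window_seconds."""

    def __init__(self, max_requests: int = 30, window_seconds: float = 60.0):
        self.max_requests = max_requests
        self.window_seconds = window_seconds
        self._timestamps: dict[str, deque] = {}  # client id -> request times

    def allow(self, client_id: str, now: Optional[float] = None) -> bool:
        """Record the request and return True if it is within the limit."""
        if now is None:
            now = time.monotonic()
        window = self._timestamps.setdefault(client_id, deque())
        # Discard requests that have fallen out of the window.
        while window and now - window[0] >= self.window_seconds:
            window.popleft()
        if len(window) >= self.max_requests:
            return False
        window.append(now)
        return True


limiter = SlidingWindowRateLimiter()
# 31 requests, one per second: the first 30 pass, the 31st is rejected.
results = [limiter.allow("client-1", now=float(i)) for i in range(31)]
print(results.count(True), results[-1])  # 30 False
```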

Endpoint¶

Specific URL path for accessing API functionality.

Example: /api/query (RAG query endpoint)

Development Terms¶

Docker¶

Platform for running applications in isolated containers.

Docker Compose¶

Tool for defining and running multi-container Docker applications.

Airflow¶

Workflow orchestration platform for scheduling and monitoring ETL pipelines.

Note: Used in local development only; production uses GitHub Actions.

CI/CD (Continuous Integration/Continuous Deployment)¶

Automated testing and deployment pipeline.

GreenGovRAG uses: GitHub Actions

Alembic¶

Database migration tool for SQLAlchemy/SQLModel.

Ruff¶

Fast Python linter and code formatter.

MyPy¶

Static type checker for Python.

Pytest¶

Python testing framework.

Performance Terms¶

Latency¶

Time delay between request and response.

Target: < 2 seconds (p95) for RAG queries

Throughput¶

Number of requests processed per unit time.

Example: 1000 queries/hour

Cache Hit Rate¶

Percentage of requests served from cache rather than recomputed.

Target: > 50%

p50, p95, p99¶

Percentile metrics:

p50 (median): 50% of requests complete faster than this value
p95: 95% of requests complete faster (the slowest 5% take longer)
p99: 99% of requests complete faster (the slowest 1% take longer)
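
As a worked example, the sketch below computes percentiles with the nearest-rank method over a small set of made-up latencies. Monitoring tools may use interpolation instead, so their values can differ slightly on small samples.

```python
import math


def percentile(values: list[float], p: int) -> float:
    """Nearest-rank percentile: smallest value with at least p% of samples at or below it."""
    if not values:
        raise ValueError("values must not be empty")
    if not 0 < p <= 100:
        raise ValueError("p must be in the range 1-100")
    ordered = sorted(values)
    rank = math.ceil(p * len(ordered) / 100)  # 1-based rank into the sorted list
    return ordered[rank - 1]


latencies_ms = [120, 80, 2000, 95, 110]  # hypothetical request latencies
print(percentile(latencies_ms, 50))  # 110
print(percentile(latencies_ms, 95))  # 2000
```

The single slow request barely moves p50 but dominates p95, which is why latency targets are usually stated against a high percentile rather than the average.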

Auto-scaling¶

Automatically adjusting compute resources based on load.

Monitoring Terms¶

Health Check¶

Endpoint verifying system components are operational.

GreenGovRAG: /api/health

Alert¶

Automated notification when metrics exceed thresholds.

Examples: High CPU, error rate spike

Dashboard¶

Visual display of system metrics and KPIs.

Tools: CloudWatch, Azure Monitor, Grafana

Trace¶

Record of a request's path through distributed system components.

Tools: AWS X-Ray, Application Insights

Log¶

Time-stamped record of events and errors.

Formats: JSON (structured), plain text

Acronyms¶

Acronym	Full Term
ABS	Australian Bureau of Statistics
API	Application Programming Interface
AWS	Amazon Web Services
BM25	Best Matching 25 (ranking function)
CDN	Content Delivery Network
CDK	Cloud Development Kit (AWS)
CER	Clean Energy Regulator
CLI	Command Line Interface
CORS	Cross-Origin Resource Sharing
CPU	Central Processing Unit
CRUD	Create, Read, Update, Delete
CSV	Comma-Separated Values
DDoS	Distributed Denial of Service
DNS	Domain Name System
ECS	Elastic Container Service (AWS)
EIA	Environmental Impact Assessment
EPBC	Environment Protection and Biodiversity Conservation (Act)
ESG	Environmental, Social, Governance
ETL	Extract, Transform, Load
FAISS	Facebook AI Similarity Search
GHG	Greenhouse Gas
GIN	Generalized Inverted Index
GPT	Generative Pre-trained Transformer
HNSW	Hierarchical Navigable Small World
HTTP	Hypertext Transfer Protocol
HTTPS	HTTP Secure
IAM	Identity and Access Management
IaC	Infrastructure as Code
ISSB	International Sustainability Standards Board
JSON	JavaScript Object Notation
JSONB	JSON Binary (PostgreSQL)
JWT	JSON Web Token
KPI	Key Performance Indicator
LEP	Local Environmental Plan
LGA	Local Government Area
LLM	Large Language Model
NER	Named Entity Recognition
NGER	National Greenhouse and Energy Reporting
NSG	Network Security Group (Azure)
OAuth	Open Authorization
OIDC	OpenID Connect
ORM	Object-Relational Mapping
PDF	Portable Document Format
PII	Personally Identifiable Information
PITR	Point-In-Time Recovery
QPS	Queries Per Second
RAG	Retrieval-Augmented Generation
RBAC	Role-Based Access Control
RDS	Relational Database Service (AWS)
REST	Representational State Transfer
RTO	Recovery Time Objective
RPO	Recovery Point Objective
S3	Simple Storage Service (AWS)
SLA	Service Level Agreement
SQL	Structured Query Language
SSM	Systems Manager (AWS)
SSL	Secure Sockets Layer
TLS	Transport Layer Security
TTL	Time To Live
URL	Uniform Resource Locator
VNet	Virtual Network (Azure)
VPC	Virtual Private Cloud (AWS)
WAF	Web Application Firewall
YAML	YAML Ain't Markup Language