Verified Queries

Verified Queries (VQs) are curated, production-validated SQL patterns that serve as few-shot examples for the NLQ engine. They are the highest-trust tier of SQL generation — when a question matches a verified query, the LLM receives the pattern as context and generates highly accurate SQL.

How It Works

The verified query system operates in three phases:

Matching — When a question arrives, the system scores existing verified queries for relevance using keyword overlap, table coverage, and semantic similarity
Injection — Matched VQs are injected into the LLM prompt as few-shot examples
Promotion — Queries that receive consistent positive feedback can be promoted to verified status

Finding Relevant VQs (`verified_query_loader.py`)

The find_relevant() function scores VQs against the incoming question:

Keyword scoring — Matches question keywords against VQ descriptions and SQL content
Table overlap — Prioritizes VQs that use the same tables as the schema context
Semantic fallback — When keyword scoring is insufficient, uses embedding similarity (if available)
Exact match — Direct string matching for known question patterns

from epistom.semantic_core.loaders.verified_query_loader import find_relevant
 
vqs = find_relevant(
    question="What is the monthly revenue trend?",
    tables=["invoices", "customers"],
    workspace_id=workspace_id,
    top_k=3
)

The format_for_prompt() function converts matched VQs into the prompt format expected by the LLM.

VQ Pipeline (`vq_pipeline/`)

The pipeline manages the lifecycle of verified queries:

QueryMiner (query_miner.py) — Mines the query log for frequently asked questions and high-confidence patterns that are candidates for verification
FeedbackPromoter (feedback_promoter.py) — Automatically promotes queries with consistent positive feedback ratings above a threshold
GenAIGenerator (genai_generator.py) — Uses the LLM to generate candidate VQs from concept definitions, which human stewards then review
SQLImporter (sql_importer.py) — Imports SQL patterns from external sources (dbt models, Looker views, etc.)

Architecture

Verified queries are stored in PostgreSQL with the following key fields:

question_pattern — The natural language pattern this VQ answers
sql_template — The validated SQL
tables — Tables referenced (for scoring)
status — Active, pending review, or retired
execution_count — How many times this VQ has been matched and used
workspace_id — Workspace isolation for multi-tenant deployments

Configuration

Verified queries can be loaded from:

Database — PostgreSQL verified_queries table
JSON files — demo/verified_queries.json for demo data
API — REST endpoints for CRUD operations

The invalidate_cache() function clears the in-memory VQ cache, forcing a reload from the database on the next request.

Technical Details

VQ matching uses a scoring algorithm that weights keyword overlap, table coverage, and freshness
Active VQ IDs are cached with a TTL for performance (get_active_vq_ids())
The promotion threshold is configurable — queries need N positive feedbacks with no negative feedback in the last M days
VQs support dialect-specific SQL (PostgreSQL, Snowflake, BigQuery) via the dialect compiler

How It Works

Finding Relevant VQs (verified_query_loader.py)

VQ Pipeline (vq_pipeline/)

Architecture

Configuration

Technical Details

On this page

Finding Relevant VQs (`verified_query_loader.py`)

VQ Pipeline (`vq_pipeline/`)