Pramiti Docs

NLQ Engine

Query routing, SQL generation, validation gate, and self-correction

The NLQ Engine is the core orchestrator that transforms natural language questions into validated, executed SQL. It routes questions through multiple trust tiers, generates SQL with semantic context, validates it before execution, and self-corrects on failure.

How It Works

Query Router (query_router.py)

The query router classifies incoming questions by intent using keyword scoring and pattern matching. It returns a RouteDecision with the classified intent and matched ontology classes.

Intent classes:

Intent      Description                                            LLM Required
knowledge   Direct definition lookup (acronyms, concepts)          No
vocabulary  Ontology context + LLM synthesis for plain English     Yes
discovery   Schema exploration ("what tables exist?")              No
analytical  Full SQL generation pipeline                           Yes
impact      Backward/forward propagation analysis                  No
ontop       SPARQL template match against Virtual Knowledge Graph  No
metric      Composable metric computation                          Yes

The classify_intent() function uses keyword-to-class mappings that can be customized per workspace. Keywords are scored and matched against registered ontology classes to determine the best routing path.
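
The keyword scoring described above can be sketched roughly as follows. This is a minimal illustration, not the actual query_router.py code: the INTENT_KEYWORDS mapping, the fields on RouteDecision, the function signature, and the fallback to the analytical intent are all assumptions made for the example.

```python
from dataclasses import dataclass, field

# Hypothetical keyword-to-intent mapping; real mappings are workspace-specific.
INTENT_KEYWORDS = {
    "knowledge": ["what does", "stand for", "define"],
    "discovery": ["what tables", "which columns", "schema"],
    "analytical": ["how many", "total", "average", "top"],
    "impact": ["affected by", "depends on", "downstream"],
}


@dataclass
class RouteDecision:
    # Assumed shape: the docs only say it carries the intent and matched classes.
    intent: str
    score: int
    matched_classes: list[str] = field(default_factory=list)


def classify_intent(question: str, ontology_classes: list[str]) -> RouteDecision:
    text = question.lower()
    # Score each intent by the number of its keywords found in the question.
    scores = {
        intent: sum(1 for kw in keywords if kw in text)
        for intent, keywords in INTENT_KEYWORDS.items()
    }
    best_intent = max(scores, key=scores.get)
    # Assumed fallback: with no keyword hits, use the full SQL pipeline.
    if scores[best_intent] == 0:
        best_intent = "analytical"
    # Naive substring match of ontology class names against the question.
    matched = [cls for cls in ontology_classes if cls.lower() in text]
    return RouteDecision(best_intent, scores[best_intent], matched)


decision = classify_intent("How many orders per customer?", ["Customer", "Order"])
print(decision.intent, decision.matched_classes)
```

A production router would likely weight keywords and combine them with pattern matching; the sketch keeps only the counting step to show the idea.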

SemanticEngine (engine.py)

The SemanticEngine class is the main orchestrator. Its answer() method:

  1. Sanitizes the input question (PII scrubbing, injection guards)
  2. Calls the query router to classify intent
  3. Routes to the appropriate trust tier handler
  4. Returns an NLQResult with SQL, data, confidence score, and metadata
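
The four steps of answer() can be pictured as a sanitize, classify, dispatch sequence. The sketch below is an assumption-heavy illustration: the NLQResult fields follow the list above, but the sanitizer (email masking only), the handler functions, and the HANDLERS table are hypothetical stand-ins for the real trust tier handlers.

```python
import re
from dataclasses import dataclass, field
from typing import Callable, Optional


@dataclass
class NLQResult:
    sql: Optional[str]
    data: list
    confidence: float
    metadata: dict = field(default_factory=dict)


# Hypothetical sanitizer: masks email addresses only.
EMAIL = re.compile(r"[\w.+-]+@[\w-]+\.[\w.]+")


def sanitize(question: str) -> str:
    return EMAIL.sub("[REDACTED]", question).strip()


def handle_discovery(question: str) -> NLQResult:
    # No LLM needed: a real handler would answer from schema metadata.
    return NLQResult(sql=None, data=["orders", "customers"], confidence=1.0)


def handle_analytical(question: str) -> NLQResult:
    # Placeholder for the full SQL generation pipeline.
    return NLQResult(sql="SELECT 1", data=[], confidence=0.5)


# Hypothetical dispatch table from intent name to handler.
HANDLERS: dict[str, Callable[[str], NLQResult]] = {
    "discovery": handle_discovery,
    "analytical": handle_analytical,
}


def answer(question: str, classify: Callable[[str], str]) -> NLQResult:
    clean = sanitize(question)                          # step 1
    intent = classify(clean)                            # step 2
    handler = HANDLERS.get(intent, handle_analytical)   # step 3
    result = handler(clean)
    result.metadata.update({"intent": intent, "question": clean})
    return result                                       # step 4


result = answer(
    "What tables exist? Reply to bob@example.com",
    classify=lambda q: "discovery" if "tables" in q.lower() else "analytical",
)
print(result.metadata)
```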

Key features:

  • Circuit breaker — Prevents cascading failures when LLM or database is down
  • Soft timeout — Queries that exceed EPISTOM_QUERY_TIMEOUT_SECONDS are cancelled
  • PII masking — Results are scanned for PII patterns and redacted
  • Multi-source routing — Questions spanning multiple data sources can route through Trino federation
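
To make the circuit breaker idea concrete, here is a generic sketch of the pattern. It is not the engine's implementation: the class name, the max_failures and reset_after parameters, and the use of RuntimeError for rejected calls are assumptions chosen for the example.

```python
import time


class CircuitBreaker:
    """Opens after max_failures consecutive errors, then allows a retry after a cooldown."""

    def __init__(self, max_failures: int = 3, reset_after: float = 30.0):
        self.max_failures = max_failures
        self.reset_after = reset_after
        self.failures = 0
        self.opened_at = None

    def call(self, fn, *args, **kwargs):
        if self.opened_at is not None:
            if time.monotonic() - self.opened_at < self.reset_after:
                raise RuntimeError("circuit open: dependency marked unavailable")
            # Cooldown elapsed: let one trial call through (half-open state).
            self.opened_at = None
        try:
            result = fn(*args, **kwargs)
        except Exception:
            self.failures += 1
            if self.failures >= self.max_failures:
                self.opened_at = time.monotonic()
            raise
        self.failures = 0
        return result


def failing_query():
    raise ConnectionError("database unreachable")


breaker = CircuitBreaker(max_failures=2, reset_after=60.0)
outcomes = []
for _ in range(3):
    try:
        breaker.call(failing_query)
    except ConnectionError:
        outcomes.append("failed")      # the dependency itself raised
    except RuntimeError:
        outcomes.append("rejected")    # the breaker short-circuited the call
print(outcomes)
```

Once the breaker is open, the third call is rejected without touching the failing dependency, which is what stops a down LLM or database from tying up every request.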

NLQ Engine (nlq_engine.py)

The NLQEngine class handles the LLM interaction for SQL generation:

  1. Builds a prompt with semantic context (schema, definitions, verified queries)
  2. Sends it to the configured LLM via the LLMAdapter
  3. Extracts SQL from the response
  4. Passes it through the SQL validator
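
Step 3, extracting SQL from the LLM response, might look something like the sketch below. The extract_sql name and the rule that a statement must start with SELECT or WITH are assumptions; the real extraction logic in nlq_engine.py is not shown in these docs.

```python
import re
from typing import Optional

# Build the code-fence marker programmatically to keep this example readable.
FENCE_MARK = "`" * 3
FENCE = re.compile(
    FENCE_MARK + r"(?:sql)?\s*(.*?)" + FENCE_MARK,
    re.DOTALL | re.IGNORECASE,
)


def extract_sql(response: str) -> Optional[str]:
    """Return the first SQL statement in an LLM reply, or None if none is found."""
    match = FENCE.search(response)
    # Prefer a fenced block; otherwise treat the whole reply as the candidate.
    candidate = match.group(1) if match else response
    candidate = candidate.strip().rstrip(";").strip()
    # Assumed rule: only statements beginning with SELECT or WITH count as SQL.
    if re.match(r"(?i)(select|with)\b", candidate):
        return candidate
    return None


reply = (
    "Here is the query:\n"
    + FENCE_MARK + "sql\nSELECT id FROM orders;\n" + FENCE_MARK
)
print(extract_sql(reply))
```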

Prompt Assembler (prompt_assembler.py)

Assembles the LLM prompt by combining:

  • Database schema (relevant tables and columns)
  • Ontology definitions (what concepts mean)
  • Verified query examples (few-shot patterns)
  • Business rules (constraints and aggregation rules)
  • The sanitized question (PII patterns and injection attempts removed)
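
As a rough illustration of how these pieces could be combined, the sketch below joins each context source into a labelled section. The assemble_prompt signature, the section labels, and the input shapes are hypothetical; the actual prompt layout used by prompt_assembler.py is not documented here.

```python
def assemble_prompt(question, schema, definitions, examples, rules):
    """Join the context sources into one prompt string, skipping empty sections."""
    sections = [
        ("Schema", "\n".join(f"{t}({', '.join(cols)})" for t, cols in schema.items())),
        ("Definitions", "\n".join(f"{k}: {v}" for k, v in definitions.items())),
        ("Verified examples", "\n\n".join(f"Q: {q}\nSQL: {s}" for q, s in examples)),
        ("Business rules", "\n".join(f"- {r}" for r in rules)),
        ("Question", question),
    ]
    return "\n\n".join(f"[{title}]\n{body}" for title, body in sections if body)


prompt = assemble_prompt(
    question="Total revenue by customer?",
    schema={"orders": ["id", "customer_id", "total"]},
    definitions={"revenue": "Sum of orders.total"},
    examples=[("How many orders?", "SELECT COUNT(*) FROM orders")],
    rules=[],  # empty, so the section is omitted from the prompt
)
print(prompt)
```

Placing the question last keeps it closest to where the model starts generating; that ordering is a choice made for the sketch, not a documented behavior.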

Self-Correction (self_correction.py)

When the SQL validator rejects generated SQL, the self-correction module:

  1. Analyzes the validation errors
  2. Appends error context to the prompt
  3. Asks the LLM to regenerate with corrections
  4. Re-validates the corrected SQL

This loop runs up to 2 times before returning a failure.
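
A minimal sketch of this loop, under stated assumptions: generate and validate are injected callables standing in for the LLM call and the SQL validator, validate returns a list of error strings, and the limit of 2 is read as two correction attempts after the initial generation. The function name and retry prompt wording are hypothetical.

```python
MAX_CORRECTIONS = 2  # retry limit from the text above


def generate_with_correction(prompt, generate, validate):
    """Generate SQL, then retry with validation errors appended to the prompt."""
    sql = generate(prompt)
    errors = validate(sql)
    attempts = 0
    while errors and attempts < MAX_CORRECTIONS:
        attempts += 1
        feedback = "\n".join(f"- {e}" for e in errors)
        retry_prompt = (
            f"{prompt}\n\nPrevious SQL:\n{sql}\n\n"
            f"Validation errors:\n{feedback}\n\nReturn corrected SQL."
        )
        sql = generate(retry_prompt)
        errors = validate(sql)
    # On failure, return no SQL along with the last set of errors.
    return (sql if not errors else None), errors, attempts


# Stubs standing in for the LLM and the validator.
replies = iter(["SELECT nmae FROM users", "SELECT name FROM users"])


def fake_llm(prompt):
    return next(replies)


def fake_validate(sql):
    return ["unknown column: nmae"] if "nmae" in sql else []


sql, errors, attempts = generate_with_correction("List user names", fake_llm, fake_validate)
print(sql, errors, attempts)
```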

SQL Validator (sql_validator.py)

The pre-execution validation gate checks SQL before it touches the database:

  • Column existence — Every referenced column must exist in the schema
  • Table existence — Every referenced table must be in a registered source
  • Join path validation — JOIN conditions must use valid foreign key relationships
  • PII column check — Queries selecting PII-annotated columns are flagged or blocked
  • Anti-pattern detection — Common LLM SQL mistakes (HAVING without GROUP BY, correlated subqueries, etc.)
  • Multi-statement rejection — Only single SELECT statements are allowed
  • SQL injection guard — DDL, DML, and system commands are rejected

The validator combines AST parsing (via sqlglot) with a regex fallback to broaden coverage.

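# validate_sql is defined in sql_validator.py; the import path depends on the package layout
result = validate_sql(sql, schema, source="demo_postgres")
if result.valid:
    # Safe to execute
    pass
else:
    # result.errors contains structured error descriptions
    for error in result.errors:
        print(error)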
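
The regex side of the gate can be illustrated with a standard-library-only sketch covering multi-statement rejection and the DDL/DML guard. This is not the validator's actual code: regex_guard, the keyword list, and the error strings are assumptions, and the AST checks done with sqlglot (column, table, and join validation) are deliberately left out.

```python
import re

# Hypothetical deny-list of statement keywords; a real list may differ.
FORBIDDEN = re.compile(
    r"\b(insert|update|delete|merge|drop|alter|create|truncate|grant|revoke|copy|call|exec)\b",
    re.IGNORECASE,
)
STRING_LITERAL = re.compile(r"'(?:[^']|'')*'")


def regex_guard(sql: str) -> list[str]:
    """Return a list of error strings; an empty list means the checks passed."""
    errors = []
    # Blank out string literals so keywords inside quoted text are not flagged.
    stripped = STRING_LITERAL.sub("''", sql).strip()
    body = stripped.rstrip(";").strip()
    if ";" in body:
        errors.append("multiple statements are not allowed")
    if not re.match(r"(?i)(select|with)\b", body):
        errors.append("only SELECT statements are allowed")
    hit = FORBIDDEN.search(body)
    if hit:
        errors.append(f"forbidden keyword: {hit.group(1).upper()}")
    return errors


print(regex_guard("SELECT id FROM orders;"))
print(regex_guard("SELECT 1; DROP TABLE users"))
```

A keyword deny-list like this is conservative: it can reject a legitimate query that uses, say, a column literally named copy, which is one reason to prefer AST parsing and keep regex as the fallback.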

Configuration

EPISTOM_LLM_PROVIDER=anthropic
EPISTOM_LLM_MODEL_ID=claude-sonnet-4-5
EPISTOM_MAX_QUERY_ROWS=1000
EPISTOM_QUERY_TIMEOUT_SECONDS=300
EPISTOM_SQL_READONLY=true
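
For reference, these variables could be read into a typed object along the following lines. The EngineConfig class and load_config function are hypothetical, and the fallback values simply repeat the example settings above; the engine's real defaults are not documented here.

```python
import os
from dataclasses import dataclass


@dataclass(frozen=True)
class EngineConfig:
    llm_provider: str
    llm_model_id: str
    max_query_rows: int
    query_timeout_seconds: int
    sql_readonly: bool


def load_config(env=os.environ) -> EngineConfig:
    # Fallbacks mirror the example values above (an assumption, not documented defaults).
    return EngineConfig(
        llm_provider=env.get("EPISTOM_LLM_PROVIDER", "anthropic"),
        llm_model_id=env.get("EPISTOM_LLM_MODEL_ID", "claude-sonnet-4-5"),
        max_query_rows=int(env.get("EPISTOM_MAX_QUERY_ROWS", "1000")),
        query_timeout_seconds=int(env.get("EPISTOM_QUERY_TIMEOUT_SECONDS", "300")),
        sql_readonly=env.get("EPISTOM_SQL_READONLY", "true").strip().lower()
        in {"1", "true", "yes"},
    )


# Pass a plain dict instead of os.environ to try different settings.
config = load_config({"EPISTOM_MAX_QUERY_ROWS": "50", "EPISTOM_SQL_READONLY": "false"})
print(config)
```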
