Enterprise Vector Knowledge Graph
A reusable knowledge graph architecture for compliance-aware document intelligence — deployed in insurance, piloted in research.
Project Overview
A reusable architecture for knowledge graph-based document intelligence, designed for compliance-sensitive domains where retrieval needs to be traceable, relationship-aware, and bounded by explicit rules.
The core idea is to treat a Neo4j knowledge graph as the retrieval layer — with vector embeddings on graph nodes for semantic search and explicit edges encoding compliance boundaries, entity relationships, and domain logic. Queries resolve through a combination of graph traversal and semantic similarity, rather than flat vector search alone.
Built once for Phonx AI's insurance compliance use case and structured to be redeployable. A second deployment is running in a research institution pilot, retrieving across academic literature and institutional documentation.
Why Standard RAG Falls Short Here
Standard retrieval-augmented generation retrieves by semantic similarity — find the chunks most similar to the query, inject them into the prompt, generate a response. That works well for general knowledge bases. It breaks down in compliance-sensitive domains for three reasons:
In regulatory text, similar-sounding language can mean legally distinct things. A flat cosine match on embeddings does not distinguish between a general statement and an overriding specific exception.
Questions like "does plan A cover procedure B for patient type C" require traversing relationships — plan → benefit → eligibility rule → exception. Flat retrieval can surface all the relevant chunks but cannot assemble the answer from the relationships between them.
Standard RAG generates a response then filters it, or relies on prompting to constrain output. For regulated domains, constraints need to be upstream of generation — encoded in how retrieval is structured, not bolted on after.
The graph is not a performance optimisation over flat vector search — it is a different data model that makes relationship traversal and compliance-boundary enforcement first-class operations at retrieval time, not filtering passes after generation.
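As a rough illustration of the second point, the sketch below walks a typed path over a toy in-memory graph. The node names, relationship types, and the `find_paths` helper are all invented for this example and are not taken from the deployed schema.

```python
from collections import deque

# Hypothetical toy graph as (source, relationship type, target) triples.
EDGES = [
    ("plan:A", "HAS_BENEFIT", "benefit:procedure-B"),
    ("benefit:procedure-B", "GOVERNED_BY", "rule:adult-eligibility"),
    ("rule:adult-eligibility", "HAS_EXCEPTION", "exception:patient-type-C"),
    ("plan:A", "HAS_BENEFIT", "benefit:dental"),
]


def find_paths(start, wanted_types, edges=EDGES):
    """Return every path from `start` that follows `wanted_types` in order."""
    paths = []
    queue = deque([(start, [start])])
    while queue:
        node, path = queue.popleft()
        depth = len(path) // 2  # number of edges already on this path
        if depth == len(wanted_types):
            paths.append(path)
            continue
        for src, rel, dst in edges:
            if src == node and rel == wanted_types[depth]:
                queue.append((dst, path + [rel, dst]))
    return paths


paths = find_paths("plan:A", ["HAS_BENEFIT", "GOVERNED_BY", "HAS_EXCEPTION"])
```

The dental benefit is semantically close to the plan but has no eligibility rule attached, so it drops out of the result. That distinction comes from the edges, not from embedding similarity.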
The Retrieval Pipeline
Five steps, executed on each query:
01. The incoming query is embedded using the same model used to embed graph nodes at ingestion time, so the query and the nodes share one vector space.
02. A vector similarity search over Neo4j's vector index returns the top-k semantically relevant nodes: entities, regulations, policy sections, or domain concepts, depending on the deployment.
03. For each retrieved node, the graph is traversed outward through typed relationships such as prerequisites, exceptions, superseding rules, and related entities. This surfaces context that a flat embedding match would miss entirely.
04. Before any content reaches the LLM context, explicit compliance edges are checked. Content connected to a boundary node that restricts what the system can assert is flagged or excluded at this stage.
05. The retrieved and validated subgraph is serialised into a structured context for the LLM. Every assertion in the generated response is traceable to a specific node and its source document.
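The five steps can be sketched end to end over an in-memory graph. This is a simplified stand-in, not the production code: the `Node` and `Graph` types, the `RESTRICTED_BY` edge type, the one-hop expansion, and the `embed` callable are all assumptions made for illustration, and cosine similarity over a Python list replaces the Neo4j vector index.

```python
import math
from dataclasses import dataclass, field


@dataclass
class Node:
    node_id: str
    text: str
    source_doc: str
    embedding: list[float]


@dataclass
class Graph:
    nodes: dict[str, Node] = field(default_factory=dict)
    edges: list[tuple[str, str, str]] = field(default_factory=list)  # (src, type, dst)


def cosine(a, b):
    dot = sum(x * y for x, y in zip(a, b))
    norm = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b))
    return dot / norm if norm else 0.0


def retrieve(graph, query, embed, k=2, boundary_type="RESTRICTED_BY"):
    # Step 1: embed the query with the same function used for the nodes.
    query_vec = embed(query)
    # Step 2: top-k nodes by similarity (stand-in for the vector index).
    ranked = sorted(graph.nodes.values(),
                    key=lambda n: cosine(query_vec, n.embedding), reverse=True)
    seeds = [n.node_id for n in ranked[:k]]
    # Step 3: expand one hop along non-boundary relationships.
    selected = list(seeds)
    for src, rel, dst in graph.edges:
        if src in seeds and rel != boundary_type and dst not in selected:
            selected.append(dst)
    # Step 4: drop anything attached to a boundary node before the LLM sees it.
    restricted = {src for src, rel, _ in graph.edges if rel == boundary_type}
    allowed = [nid for nid in selected if nid not in restricted]
    # Step 5: serialise with node IDs and source documents for traceability.
    return [{"node_id": nid,
             "text": graph.nodes[nid].text,
             "source_doc": graph.nodes[nid].source_doc} for nid in allowed]
```

In this sketch a restricted node is excluded outright. The flag-instead-of-exclude behaviour described in step 04 would need an extra field on each returned record.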
Key Design Decisions
Neo4j handles both vector search (via its native vector index) and graph traversal in a single query. This avoids the synchronisation complexity of maintaining a separate vector store alongside a graph database.
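A single query combining both operations might look roughly like the following. `db.index.vector.queryNodes` is the procedure Neo4j 5 documents for querying a vector index; the index name, relationship types, and the `id` property are hypothetical. The Cypher is kept as a plain string so the sketch runs without a database connection.

```python
# Hypothetical names: node_embeddings, HAS_EXCEPTION, SUPERSEDED_BY, REQUIRES, id.
VECTOR_THEN_TRAVERSE = """
CALL db.index.vector.queryNodes($index_name, $k, $query_embedding)
YIELD node, score
RETURN node.id AS seed_id,
       score,
       [(node)-[rel:HAS_EXCEPTION|SUPERSEDED_BY|REQUIRES]->(nb)
        | {type: type(rel), neighbour_id: nb.id}] AS context
ORDER BY score DESC
"""


def build_params(query_embedding, k=5, index_name="node_embeddings"):
    """Validate and package the parameters the query above expects."""
    if k < 1:
        raise ValueError("k must be at least 1")
    return {
        "index_name": index_name,
        "k": int(k),
        "query_embedding": [float(x) for x in query_embedding],
    }
```

With the official Python driver, the string and the parameter dictionary could be passed to `session.run(VECTOR_THEN_TRAVERSE, build_params(vec))`. The pattern comprehension returns an empty list for seed nodes with no matching neighbours, so those seeds still appear in the result.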
Embeddings are attached to individual nodes, not to document chunks. This means semantic search returns graph entities — not text passages — so the result is already in a form that supports relationship traversal.
Compliance constraints are modelled as explicit relationship types in the graph schema. The compliance check at step 04 is a graph query, not a prompt instruction. It cannot be overridden by the LLM.
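To make the "graph query, not a prompt instruction" point concrete, here is a minimal sketch of what such a check could look like. The `Entity` and `Boundary` labels, the `RESTRICTED_BY` relationship type, and the `split_by_boundary` helper are assumptions for illustration; the actual schema is not given in this write-up.

```python
# Hypothetical Cypher: for each candidate node, list the boundary nodes it touches.
BOUNDARY_CHECK = """
UNWIND $candidate_ids AS candidate_id
MATCH (n:Entity {id: candidate_id})
OPTIONAL MATCH (n)-[:RESTRICTED_BY]->(b:Boundary)
RETURN n.id AS node_id, collect(b.id) AS boundary_ids
"""


def split_by_boundary(rows):
    """Partition query rows into node IDs that may reach the LLM and those that may not."""
    allowed, excluded = [], []
    for row in rows:
        target = excluded if row["boundary_ids"] else allowed
        target.append(row["node_id"])
    return allowed, excluded


# Rows shaped like the query's RETURN clause, written by hand for the example.
rows = [
    {"node_id": "rule:1", "boundary_ids": []},
    {"node_id": "note:7", "boundary_ids": ["boundary:no-assert"]},
]
allowed, excluded = split_by_boundary(rows)
```

Because the split happens on query results before the prompt is assembled, excluded content never enters the model's context, which is what makes the constraint independent of the LLM's behaviour.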
Documents are parsed, entities and relationships are extracted, and the graph is upserted. The ingestion pipeline is idempotent — re-running on an updated document updates the affected nodes without duplicating the graph.
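One common way to get this idempotency is to derive a stable key for every entity and write by key, which is what Cypher's `MERGE` clause does in Neo4j. The in-memory sketch below shows the idea only; the key format, the dictionary-based graph, and the `upsert` helper are assumptions and not the pipeline's actual code.

```python
def node_key(doc_id, entity_name):
    """Stable key: the same entity in the same document always maps to the same node."""
    return f"{doc_id}::{entity_name.strip().lower()}"


def upsert(graph, doc_id, entities, relationships):
    """Write entities and relationships by key so that re-runs do not duplicate them.

    graph is {"nodes": dict, "edges": set}; entities is a list of (name, props);
    relationships is a list of (source name, relationship type, target name).
    """
    for name, props in entities:
        key = node_key(doc_id, name)
        # Merge new properties over existing ones instead of creating a second node.
        graph["nodes"][key] = {**graph["nodes"].get(key, {}), **props}
    for src, rel, dst in relationships:
        # A set makes repeated relationship writes a no-op.
        graph["edges"].add((node_key(doc_id, src), rel, node_key(doc_id, dst)))
    return graph
```

This sketch does not remove nodes for entities that disappear from an updated document; a real pipeline would need a separate cleanup pass for that.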
A single POST endpoint accepts the query, runs the retrieval pipeline, and returns a structured response with the generated answer and the source nodes that grounded it. Consumers can render citations or ignore them.
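The request and response shapes could be modelled along these lines. This is a framework-free sketch: the field names, the status codes, and the `run_pipeline` callable are assumptions, since the actual payload format is not specified here.

```python
from dataclasses import asdict, dataclass


@dataclass
class SourceNode:
    node_id: str
    source_doc: str


@dataclass
class QueryResponse:
    answer: str
    sources: list[SourceNode]


def handle_query(body, run_pipeline):
    """Handle one POST body and return (status code, JSON-serialisable dict).

    run_pipeline is a hypothetical callable that takes the query string and
    returns (answer text, list of {"node_id", "source_doc"} dicts).
    """
    query = body.get("query") if isinstance(body, dict) else None
    if not isinstance(query, str) or not query.strip():
        return 400, {"error": "field 'query' must be a non-empty string"}
    answer, nodes = run_pipeline(query.strip())
    response = QueryResponse(answer, [SourceNode(**n) for n in nodes])
    return 200, asdict(response)
```

Returning the source nodes alongside the answer in one payload is what lets a consumer choose between rendering citations and ignoring them without a second request.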
The core schema — nodes, vector index, compliance edges, audit log — is domain-agnostic. Domain-specific content comes from the ingestion configuration, not the graph model. This is what made the research deployment possible without rebuilding the system.
Traceability at Every Query
Tamper-evident audit log
Every query and its response are logged with the source node IDs that grounded the answer, a timestamp, and a hash of the response content. The log is append-only, and recomputing the hash against the stored response exposes any post-hoc modification. Compliance officers can reconstruct exactly what the system retrieved and asserted for any past query.
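A minimal sketch of such a log, using SHA-256 from the standard library. The entry fields mirror the ones listed above. Linking each entry to the previous entry's hash is an addition made for this example, so that deleting or reordering entries is also detectable; the text above only describes a hash of the response content.

```python
import hashlib
import json
from datetime import datetime, timezone


def _digest(payload):
    """SHA-256 over a canonical JSON encoding of the payload."""
    return hashlib.sha256(json.dumps(payload, sort_keys=True).encode("utf-8")).hexdigest()


def _response_hash(response):
    return hashlib.sha256(response.encode("utf-8")).hexdigest()


def append_entry(log, query, response, source_node_ids, now=None):
    entry = {
        "timestamp": (now or datetime.now(timezone.utc)).isoformat(),
        "query": query,
        "response": response,
        "source_node_ids": list(source_node_ids),
        "response_hash": _response_hash(response),
        # Assumption of this sketch: chain each entry to the one before it.
        "prev_hash": log[-1]["entry_hash"] if log else "",
    }
    entry["entry_hash"] = _digest(entry)
    log.append(entry)
    return entry


def verify(log):
    """Return True only if no entry was modified, removed, or reordered."""
    prev = ""
    for entry in log:
        body = {k: v for k, v in entry.items() if k != "entry_hash"}
        if entry["prev_hash"] != prev or _digest(body) != entry["entry_hash"]:
            return False
        if _response_hash(entry["response"]) != entry["response_hash"]:
            return False
        prev = entry["entry_hash"]
    return True
```

An attacker who can rewrite the whole log can also recompute every hash, so in practice the latest hash would need to be stored somewhere the log writer cannot modify.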
Encryption at rest and in transit
The Neo4j instance and the document store are encrypted at rest. The API surface is HTTPS only. For the Phonx deployment running on-premises, the encryption configuration was a hard requirement before going live — the knowledge graph indexes regulated insurance documents and policyholder-adjacent data.
Where It Runs
Phonx AI — US Insurance
The knowledge graph is the compliance backbone for Phonx's voice AI system. It holds insurance plan structures, eligibility rules, carrier-specific policies, and the regulatory boundaries that govern what the voice agent can assert during a live call. Graph traversal runs inside the real-time inference loop — the retrieval result must land within the one-second turn budget.
Status: Production
Research Institution — Document Intelligence
A pilot deployment for a research institution indexing academic literature, internal reports, and institutional documentation. The compliance boundary model is lighter in this deployment — the priority is relationship-aware retrieval across a large corpus, with citation traceability for researchers reviewing the sources behind any answer.
Status: Pilot
What I'd Approach Differently Today
Invest more in the graph schema design upfront.
The schema evolved during the Phonx build as we discovered new relationship types the compliance model needed. Schema changes in a populated Neo4j graph are manageable but costly. If I were starting again I'd spend two or three times as long on schema design before ingesting any production data — specifically mapping all compliance edge types and their directionality before the first document touches the graph.
Build the evaluation harness earlier.
Retrieval quality in a knowledge graph is harder to measure than in flat RAG because "correctness" depends on both the retrieved nodes and the traversal path. I built the evaluation tooling late, which meant early tuning decisions on neighbourhood expansion depth were made by inspection rather than measurement. A held-out query set with expected source-node annotations should be the first artefact, not a late addition.
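A held-out query set with expected source-node annotations can be scored with very little code. The sketch below computes per-query recall and precision against the annotated node IDs; the case format and the `retrieve` callable are assumptions, and it scores retrieved nodes only, not the traversal path.

```python
def evaluate(cases, retrieve, k=5):
    """Score a retrieval function against annotated expected source nodes.

    cases: list of {"query": str, "expected_node_ids": list of str}
    retrieve: callable taking a query and returning an ordered list of node IDs
    """
    rows = []
    for case in cases:
        retrieved = list(retrieve(case["query"]))[:k]
        expected = set(case["expected_node_ids"])
        hits = expected & set(retrieved)
        rows.append({
            "query": case["query"],
            "recall": len(hits) / len(expected) if expected else 1.0,
            "precision": len(hits) / len(set(retrieved)) if retrieved else 0.0,
            "missing": sorted(expected - hits),  # annotated nodes the system did not return
        })
    mean_recall = sum(r["recall"] for r in rows) / len(rows) if rows else 0.0
    return {"mean_recall": mean_recall, "cases": rows}
```

Running `evaluate` once per candidate expansion depth, with `retrieve` bound to that depth, would turn the depth choice into a comparison of numbers rather than a judgment by inspection.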
Abstract the compliance edge model earlier for reuse.
The research deployment reused the core architecture but required significant work to re-parameterise the compliance edge model for a different domain. That gap was smaller than rebuilding from scratch, but it pointed to a real opportunity: the compliance boundary layer should be a first-class configuration surface, not something that requires touching the schema definition directly.
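One possible shape for that configuration surface is a per-deployment list of rules mapping relationship types to actions, applied by generic code. Everything here is hypothetical: the `BoundaryRule` type, the edge type names, and the two rule sets are invented to show the idea.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class BoundaryRule:
    edge_type: str  # relationship type that marks a compliance boundary
    action: str     # "exclude" or "flag"


# Higher number wins when a node matches several rules.
SEVERITY = {"allow": 0, "flag": 1, "exclude": 2}


def decide(node_edge_types, rules):
    """Map each node ID to "allow", "flag", or "exclude" under the given rules.

    node_edge_types: {node_id: set of relationship types leaving that node}
    """
    by_type = {rule.edge_type: rule.action for rule in rules}
    decisions = {}
    for node_id, edge_types in node_edge_types.items():
        actions = [by_type[t] for t in edge_types if t in by_type]
        decisions[node_id] = max(actions, key=SEVERITY.__getitem__, default="allow")
    return decisions


# Two invented rule sets: same code path, different domain configuration.
INSURANCE_RULES = [BoundaryRule("MUST_NOT_ASSERT", "exclude"),
                   BoundaryRule("REQUIRES_DISCLAIMER", "flag")]
RESEARCH_RULES = [BoundaryRule("EMBARGOED", "exclude")]
```

With this split, moving to a new domain means writing a new rule list rather than editing the schema definition.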