Enterprise Vector Knowledge Graph Platform

Problem

A pharmaceutical research centre operated in a regulatory labyrinth spanning multiple domains. Researchers needed a way to validate SOPs, compliance documents and procedures against guidelines from seven different governing bodies.

Context / Business Need

The goal was to build a cross-domain validation and compliance engine that centralised regulatory guidelines, enabled audit-ready responses and provided instant clarity to engineering and scientific teams. The solution had to run within the client’s private network and integrate with existing knowledge systems.

Constraints

  • Seven separate governing bodies with varying regulations and vocabularies.
  • Requirement to run entirely within the client’s secure network.
  • Support for 5,000+ pages of documentation with near-real-time updates.
  • Interoperability with existing document management and search tools.
  • Accurate, legally defensible responses for audits.

My Role

As lead architect, I defined the vector knowledge graph data model, designed the ingestion and embedding pipelines, selected similarity metrics and oversaw the API layer. I collaborated with domain experts to ensure regulatory completeness and consulted security teams to meet stringent privacy requirements.

System-Thinking Approach

We approached the platform as a living knowledge ecosystem. Each regulation, SOP and domain entity became a node in the graph, with vector embeddings capturing semantic meaning. A hybrid search strategy combined vector similarity with rule-based compliance logic to deliver precise responses.

MVP Design

The MVP ingested documents from seven regulators, extracted entities, generated embeddings via a transformer model and stored them with graph relationships. A search API offered contextual retrieval and validation suggestions, with an audit log for traceability.
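
The ingestion flow described above can be sketched end to end with in-memory stand-ins. Everything here is an assumption made for illustration: `GraphStore`, `extract_entities`, `toy_embedding` and the `MENTIONS` relation are hypothetical names, the regular expression is a placeholder for real entity extraction, and the hash-derived vector only stands in for the transformer model.

```python
import hashlib
import re
from dataclasses import dataclass, field
from datetime import datetime, timezone


@dataclass
class GraphStore:
    """In-memory stand-in for the graph database (hypothetical)."""
    nodes: dict = field(default_factory=dict)      # node_id -> properties
    edges: list = field(default_factory=list)      # (source_id, relation, target_id)
    audit_log: list = field(default_factory=list)  # append-only list of events


def extract_entities(text: str) -> list[str]:
    """Placeholder extractor: treats upper-case acronyms as entities."""
    return sorted(set(re.findall(r"\b[A-Z]{2,}\b", text)))


def toy_embedding(text: str, dims: int = 8) -> list[float]:
    """Stand-in for the transformer: a deterministic vector derived from a hash."""
    digest = hashlib.sha256(text.encode("utf-8")).digest()
    return [byte / 255.0 for byte in digest[:dims]]


def ingest(store: GraphStore, doc_id: str, regulator: str, text: str) -> None:
    """Store the document node, link it to its entities and record the event."""
    store.nodes[doc_id] = {
        "kind": "document",
        "regulator": regulator,
        "embedding": toy_embedding(text),
    }
    for entity in extract_entities(text):
        entity_id = f"entity:{entity}"
        store.nodes.setdefault(entity_id, {"kind": "entity"})
        store.edges.append((doc_id, "MENTIONS", entity_id))
    store.audit_log.append({
        "action": "ingest",
        "doc_id": doc_id,
        "at": datetime.now(timezone.utc).isoformat(),
    })


store = GraphStore()
ingest(store, "doc-1", "regulator-a", "Each SOP must reference the current GMP guideline.")
print(store.edges)
```

Writing the audit entry in the same function as the graph update keeps every stored node traceable to the event that created it.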

Architecture Breakdown

  • An ingestion service parsing PDF, Word and HTML documents.
  • An embedding service built on domain-tuned transformer models.
  • A graph database supporting vector indexes and rule-based edges.
  • A compliance engine applying regulatory rules across domains.
  • An API layer providing search, validation and audit logging.
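
To make the idea of rule-based edges concrete, the sketch below shows one way a compliance check could walk them. The edge types `GOVERNED_BY` and `REQUIRES`, the node identifiers and the `missing_requirements` helper are hypothetical and chosen only for illustration; the real schema may differ.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Edge:
    """A typed, directed relationship between two graph nodes."""
    source: str
    relation: str
    target: str


# Hypothetical edges: an SOP is governed by regulations from two bodies,
# and each regulation requires a particular section to be present.
EDGES = [
    Edge("sop:cleaning", "GOVERNED_BY", "reg:a-12"),
    Edge("sop:cleaning", "GOVERNED_BY", "reg:b-7"),
    Edge("reg:a-12", "REQUIRES", "section:validation"),
    Edge("reg:b-7", "REQUIRES", "section:record-keeping"),
]


def targets(edges, source, relation):
    """Follow all edges of one relation type from a source node."""
    return [e.target for e in edges if e.source == source and e.relation == relation]


def missing_requirements(edges, sop_id, present_sections):
    """Map each governing regulation to the required sections the SOP lacks."""
    gaps = {}
    for regulation in targets(edges, sop_id, "GOVERNED_BY"):
        missing = [
            section
            for section in targets(edges, regulation, "REQUIRES")
            if section not in present_sections
        ]
        if missing:
            gaps[regulation] = missing
    return gaps


print(missing_requirements(EDGES, "sop:cleaning", {"section:validation"}))
# {'reg:b-7': ['section:record-keeping']}
```

Because the result is keyed by regulation, each reported gap points back to the rule that produced it, which suits audit-style reporting.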

Final Solution & Results

The deployed platform indexed 5,000+ documents and delivered 98%+ contextual accuracy. Research teams validated SOPs and resolved knowledge gaps in minutes instead of days, with strong traceability for audits.

Tech Stack

  • Python, FastAPI and Node.js
  • Transformer models (domain-tuned) with Faiss vector indexing
  • Graph database (Neo4j) with vector search extensions
  • Docker & Kubernetes (on-prem)
  • GraphQL and REST APIs