Build and Query Knowledge Graphs with LLMs


Graphs are relevant

A Knowledge Graph could be defined as a structured representation of information that connects concepts, entities, and their relationships in a way that mimics human understanding. It is often used to organise and integrate data from various sources, enabling machines to reason, infer, and retrieve relevant information more effectively.

In a previous post on Medium I made the point that this kind of structured representation can be used to enhance the performance of LLMs in Retrieval Augmented Generation applications. We could speak of GraphRAG as an ensemble of techniques and strategies that employ a graph-based representation of knowledge to serve information to LLMs more effectively than the more standard approaches usually taken for “Chat with your documents” use cases.

The “vanilla” RAG approach relies on vector similarity (and, sometimes, hybrid search) with the goal of retrieving from a vector database pieces of information (chunks of documents) that are similar to the user’s input according to some similarity measure, such as cosine or Euclidean distance. These pieces of information are then passed to a Large Language Model, which is prompted to use them as context to generate a relevant answer to the user’s query.
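
To make that baseline concrete, here is a minimal sketch of the loop, assuming a local Ollama setup with the models used later in this article; the chunk texts, the brute-force cosine ranking and the prompt are purely illustrative and not the repo’s implementation.

from langchain_ollama import ChatOllama, OllamaEmbeddings

# Illustrative "vanilla" RAG loop: embed the chunks, embed the question,
# rank by cosine similarity and pass the top-k chunks to the LLM as context.
embedder = OllamaEmbeddings(model="mxbai-embed-large")
llm = ChatOllama(model="llama3.2")

chunks = [
    "Chunk about the AI Continent Action Plan...",
    "Chunk about Europe Direct contact details...",
]
chunk_vectors = embedder.embed_documents(chunks)

def cosine(a: list[float], b: list[float]) -> float:
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = sum(x * x for x in a) ** 0.5
    norm_b = sum(y * y for y in b) ** 0.5
    return dot / (norm_a * norm_b)

question = "What is the EU strategy on Artificial Intelligence?"
question_vector = embedder.embed_query(question)

ranked = sorted(zip(chunks, chunk_vectors), key=lambda pair: cosine(question_vector, pair[1]), reverse=True)
context = "\n\n".join(text for text, _ in ranked[:2])

answer = llm.invoke(f"Answer using only this context:\n\n{context}\n\nQuestion: {question}")
print(answer.content)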

My argument is that the biggest point of failure in this kind of application is that similarity search relies on explicit mentions in the knowledge base (at the intra-document level), leaving the LLM blind to cross-references between documents, and even to implied (implicit) and contextual references. In brief, the LLM is limited because it cannot reason at an inter-document level.

This can be addressed by moving away from pure vector representations and vector stores to a more comprehensive way of organizing the knowledge base: extracting concepts from each piece of text and storing them while keeping track of the relationships between pieces of information.

A graph structure is, in my opinion, the best way of organizing a knowledge base of documents that cross-reference and implicitly mention each other, as constantly happens inside organizations and enterprises. A graph’s main features are in fact:

  • Entities (Nodes): they represent real-world objects like people, places, organizations, or abstract concepts;
  • Relationships (Edges): they define how entities are connected to each other (e.g. “Bill → WORKS_AT → Microsoft”);
  • Attributes (Properties): they provide additional details about entities (e.g., Microsoft’s founding year, revenue, or location) or relationships (e.g. “Bill → FRIENDS_WITH since: 2021 → Mark”).
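
To make these building blocks concrete, here is a minimal sketch using the official neo4j Python driver that creates the “Bill → WORKS_AT → Microsoft” pattern above; the connection details and property values are illustrative.

from neo4j import GraphDatabase

# Illustrative connection details; adjust to your own Neo4j instance.
driver = GraphDatabase.driver("bolt://localhost:7687", auth=("neo4j", "password"))

create_pattern = """
MERGE (p:Person {name: $person})
MERGE (o:Organization {name: $org})
  SET o.founding_year = $founded
MERGE (p)-[r:WORKS_AT]->(o)
RETURN p.name, type(r), o.name
"""

with driver.session() as session:
    # Entities (nodes), a relationship (edge) and attributes (properties) in one statement.
    session.run(create_pattern, person="Bill", org="Microsoft", founded=1975)

driver.close()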

A Knowledge Graph can then be defined as the Graph representation of corpora of documents coming from a coherent domain. But how exactly do we move from vector representation and vector databases to a Knowledge Graph?

Further, how do we even extract the key information to build a Knowledge Graph?

In this article, I will present my point of view on the subject, with code examples from a repository I developed while learning and experimenting with Knowledge Graphs. This repository is publicly available on my GitHub and contains:

  • the source code of the project
  • example notebooks written while building the repo
  • a Streamlit app to showcase the work done so far
  • a Dockerfile to build the image for this project without having to go through the manual installation of all the software needed to run the project.

The article will present the repo in order to cover the following topics:

  • Tech Stack Breakdown of the tools available, with a brief presentation of each of the components used to build the project.
  • How to get the Demo up and running in your own local environment.
  • How to perform the Ingestion Process of documents, including extracting concepts from them and assembling them into a Knowledge Graph.
  • How to query the Graph, with a focus on the variety of possible strategies that can be employed to perform semantic search, graph query language generation and hybrid search.

If you are a Data Scientist, an ML/AI Engineer or just someone curious about how to build smarter search systems, this guide will walk you through the full workflow with code, context and clarity.


Tech Stack Breakdown

As a Data Scientist who started learning programming in 2019/20, my main language is of course Python. Here, I am using version 3.12.

This project is built with a focus on open-source tools and free-tier accessibility, both on the storage side and in the availability of Large Language Models. This makes it a good starting point for newcomers or for those who are not willing to pay for cloud infrastructure or OpenAI API keys.

The source code is, however, written with production use cases in mind — focusing not just on quick demos, but on how to transition a project to real-world deployment. The code is therefore designed to be easily customizable, modular, and extendable, so it could be adapted to your own data sources, LLMs, and workflows with minimal friction.

Below is a breakdown of the key components and how they work together. You can also read the repo’s README.md for further information on how to get up and running with the demo app.

🕸️ Neo4j — Graph Database + Vector Store

Neo4j powers the knowledge graph layer and also stores vector embeddings for semantic search. The core of Neo4j is Cypher, the query language used to interact with a Neo4j database. Some of the other key Neo4j features used in this project are:

  • GraphDB: To store structured relationships between entities and concepts.
  • VectorDB: Embedding support allows similarity search and hybrid queries.
  • Python SDK: Neo4j offers a Python driver to interact with an instance and build wrappers around it. Thanks to the driver, knowing Cypher is not mandatory to interact with the code in this repo; it also lets us use other Python graph data science libraries, such as networkx or python-louvain.
  • Local Development: Neo4j offers a Desktop version, and it can also be easily deployed via Docker images into containers or on any virtual machine (Linux/macOS/Windows).
  • Production Cloud: You can also use Neo4j Aura for a fully-managed solution; this comes with a free tier, and it’s ready to be hosted in any cloud of your choice depending on your needs.

🦜 LangChain — Agent Framework for LLM Workflows

LangChain is used to coordinate how LLMs interact with tools like the vector index and the entities in the Knowledge Graphs, and of course with the user input.

  • Used to define custom agents and toolchains.
  • Integrates with retrievers, memory, and prompt templates.
  • Makes it easy to swap in different LLM backends.

🤖 LLMs + Embeddings

LLMs and Embeddings can be invoked either from a local deployment using Ollama or from an online endpoint of your choice. I am currently using the Groq free-tier API to experiment, switching between gemma2-9b-it and various versions of Llama, such as meta-llama/llama-4-scout-17b-16e-instruct. For Embeddings, I am using mxbai-embed-large running via Ollama on my M1 MacBook Air; on the same setup I was also able to run llama3.2 (2B) in the past, keeping in mind my hardware limitations.

Both Ollama and Groq are plug-and-play and have LangChain wrappers.
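
As a sketch of how interchangeable these backends are behind LangChain’s wrappers (the helper below is illustrative and not the repo’s actual fetch_llm factory):

from langchain_groq import ChatGroq
from langchain_ollama import ChatOllama, OllamaEmbeddings

def get_chat_model(provider: str):
    # Swap LLM backends without touching the rest of the pipeline.
    if provider == "groq":
        return ChatGroq(model="gemma2-9b-it", temperature=0)
    return ChatOllama(model="llama3.2", temperature=0)

# Embeddings served locally by Ollama, as described above.
embeddings = OllamaEmbeddings(model="mxbai-embed-large")

llm = get_chat_model("groq")
print(llm.invoke("Summarise what a Knowledge Graph is in one sentence.").content)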

👑 Streamlit — Frontend UI for Interactions & Demos

I have written a small demo app using Streamlit, a Python library that allows developers to build minimal frontend layers without writing any HTML or CSS, just pure Python.

In this demo app you will see how to

  • Ingest your documents into Neo4j under a Graph-based representation.
  • Run live demos of the graph-based querying, showcasing key differences between various querying strategies.

Streamlit’s main advantage is that it’s super lightweight, fast to deploy, and doesn’t require a separate frontend framework or backend. These features make it the perfect fit for demos and prototypes such as this one.

This is what an app looks like in Streamlit

However, it is not suitable for production apps because of its limited customisation features and UI control, the absence of a native way to perform authorisation and authentication, and the lack of a proper way to handle scaling. Going from demo to production usually requires a more suitable front-end framework and a clear separation between front-end and back-end responsibilities.

🐳 Docker — Containerisation for Local Dev & Deployment

Docker is a tool that lets you package your application and all its dependencies into a container — a lightweight, standalone, and portable environment that runs consistently on any system.

Since I imagined it could be challenging to manage all the mentioned dependencies, I also added a Dockerfile for building an image of the app, so that Neo4j, Ollama and the app itself could run in isolated, reproducible containers via docker-compose.

To run the demo app yourself, you can follow the instructions in the README.md.

Now that the tech stack has been presented, we can dive deep into how the app actually works behind the scenes, starting from the ingestion pipeline.


From Text Corpus to Knowledge Graph

As I previously mentioned, it is advisable that the documents being ingested into a Knowledge Graph come from the same domain. These could be manuals from the medical domain on diseases and their symptoms, code documentation from past projects, or newspaper articles on a particular subject.

Being a politics geek, to test and play with my code I chose PDF press materials from the European Commission’s Press corner.

Once the documents have been collected, we have to ingest them into the Knowledge Graph.

The ingestion pipeline follows the steps reported below.

The reference source code for this part of the article is in src/ingestion.

1. Load files into a machine-friendly format

In the code example below, the class Ingestor is used to infer the MIME type of each file we’re trying to read, and LangChain’s document loaders are employed to read its content accordingly; this allows for customisation regarding the format of the source files that will populate our Knowledge Graph.

class Ingestor:
    """ 
    Base `Ingestor` Class with common methods. 
    Can be specialized by source.
    """ 
    def __init__(self, source: Source):
        self.source = source
    
    @abstractmethod
    def list_files(self)-> List[str]:
        pass

    @abstractmethod
    def file_preparation(self, file) -> Tuple[str, dict]:
        pass

    @staticmethod
    def load_file(filepath: str, metadata: dict) -> List[Document]:
        mime = magic.Magic(mime=True)
        mime_type = mime.from_file(filepath) or metadata.get('Content-Type')
        if mime_type == 'inode/x-empty':
            return []

        loader_class = MIME_TYPE_MAPPING.get(mime_type)
        if not loader_class:
            logger.warning(f'Unsupported MIME type: {mime_type} for file {filepath}, skipping.')
            return []
        
        if loader_class == PDFPlumberLoader:
            loader = loader_class(
                file_path=filepath,
                extract_images=False,
            )
        elif loader_class == Docx2txtLoader:
            loader = loader_class(
                file_path=filepath
            )
        elif loader_class == TextLoader:
            loader = loader_class(
                file_path=filepath
            )
        elif loader_class == BSHTMLLoader:
            loader = loader_class(
                file_path=filepath,
                open_encoding="utf-8",
            )
        try: 
            return loader.load()
        except Exception as e:
            logger.warning(f"Error loading file: {filepath} with exception: {e}")
            
    @staticmethod
    def merge_pages(pages: List[Document]) -> str:
        return "\n\n".join(page.page_content for page in pages)

    @staticmethod
    def create_processed_document(file: str, document_content: str, metadata: dict):
        processed_doc = ProcessedDocument(filename=file, source=document_content, metadata=metadata)
        return processed_doc

    def ingest(self, filename: str, metadata: Dict[str, Any]) -> ProcessedDocument | None:
        """ 
        Loads a file from a path and turn it into a `ProcessedDocument`
        """

        base_name = os.path.basename(filename)

        document_pages = self.load_file(filename, metadata)

        document_content = None
        try: 
            document_content = self.merge_pages(document_pages)
        except TypeError:
            logger.warning(f"Empty document {filename}, skipping..")
        
        if document_content is not None:
            processed_doc = self.create_processed_document(
                base_name, 
                document_content, 
                metadata
            )
            return processed_doc
        
    def batch_ingest(self) -> List[ProcessedDocument]:
        """
        Ingests all files in a folder
        """
        processed_documents = []
        for file in self.list_files():
            file, metadata = self.file_preparation(file)
            processed_doc = self.ingest(file, metadata)
            if processed_doc:
                processed_documents.append(processed_doc)
        return processed_documents

2. Clean and split document content into text chunks

This is necessary for the graph extraction phase ahead of us. To clean texts, depending on the domain and on the document’s format, it might make sense to write custom cleaning and chunking functions. This is where the document’s chunks list is populated.

Chunk size, overlap and other possible configurations here can be domain-dependent and should be set according to the expertise of the DS / AI Engineer; the class in charge of chunking is shown below.

class Chunker:
    """
    Contains methods to chunk the text of a (list of) `ProcessedDocument`.
    """

    def __init__(self, conf: ChunkerConf):
        self.chunker_type = conf.type

        if self.chunker_type == "recursive":

            self.chunk_size = conf.chunk_size
            self.chunk_overlap = conf.chunk_overlap

            self.splitter = RecursiveCharacterTextSplitter(
                chunk_size=self.chunk_size, 
                chunk_overlap=self.chunk_overlap, 
                is_separator_regex=False
            )
        
        else: 
            logger.warning(f"Chunker type '{self.chunker_type}' not supported.")

    def _chunk_document(self, text: str) -> list[str]:
        """Chunks the document and returns a list of chunks."""
        return self.splitter.split_text(text)

    def get_chunked_document_with_ids(
        self, 
        text: str, 
        ) -> list[dict]:
        """Chunks the document and returns a list of dictionaries with chunk ids and chunk text."""
        return [
            {
                "chunk_id": i + 1,
                "text": chunk,
                "chunk_size": self.chunk_size, 
                "chunk_overlap": self.chunk_overlap
            }
            for i, chunk in enumerate(self._chunk_document(text))
        ]
    
    def chunk_document(self, doc: ProcessedDocument) -> ProcessedDocument:
        """
        Chunks the text of a `ProcessedDocument` instance.
        """
        chunks_dict = self.get_chunked_document_with_ids(doc.source)
        
        doc.chunks = [Chunk(**chunk) for chunk in chunks_dict]

        logger.info(f"Document {doc.filename} has been chunked into {len(doc.chunks)} chunks.")
        
        return doc

    def chunk_documents(self, docs: List[ProcessedDocument]) -> List[ProcessedDocument]:
        """
        Chunks the text of a list of `ProcessedDocument` instances.
        """
        updated_docs = []
        for doc in docs:
            updated_docs.append(self.chunk_document(doc))
        return updated_docs

3. Extract Concepts Graph

For each chunk in the document, we want to extract a graph of concepts. To do so, we program a custom agent powered by an LLM with this precise task. LangChain comes in handy here thanks to a method called with_structured_output, which wraps LLM calls and lets you define the expected output schema using a pydantic model. This ensures that the LLM of your choice returns structured, validated responses and not free-form text.

This is what the GraphExtractor looks like:

class GraphExtractor:
    """ 
    Agent able to extract information in a graph representation format from a given text.
    """
    def __init__(self, conf: LLMConf, ontology: Optional[Ontology]=None):
        self.conf = conf
        self.llm = fetch_llm(conf)
        self.prompt = get_graph_extractor_prompt()

        self.prompt.partial_variables = {
            'allowed_labels': ontology.allowed_labels if ontology and ontology.allowed_labels else "", 
            'labels_descriptions': ontology.labels_descriptions if ontology and ontology.labels_descriptions else "", 
            'allowed_relationships': ontology.allowed_relations if ontology and ontology.allowed_relations else ""
        }

    def extract_graph(self, text: str) -> _Graph:
        """ 
        Extracts a graph from a text.
        """

        if self.llm is not None:
            try:
                graph: _Graph = self.llm.with_structured_output(
                    schema=_Graph
                    ).invoke(
                        input=self.prompt.format(input_text=text)
                    )

                return graph 
                
            except Exception as e:
                logger.warning(f"Error while extracting graph: {e}")

Notice that the expected output _Graph is defined as:

class _Node(Serializable):
    id: str
    type: str
    properties: Optional[Dict[str, str]] = None

class _Relationship(Serializable):
    source: str
    target: str
    type: str
    properties: Optional[Dict[str, str]] = None

class _Graph(Serializable):
    nodes: List[_Node]
    relationships: List[_Relationship]

Optionally, the LLM agent in charge of extracting a graph from chunks can be provided with an Ontology describing the domain of the documents. 

An ontology can be described as the formal specification of the types of entities and relationships that can exist in the graph — it is, essentially, its blueprint.

class Ontology(BaseModel):
    allowed_labels: Optional[List[str]]=None
    labels_descriptions: Optional[Dict[str, str]]=None
    allowed_relations: Optional[List[str]]=None
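
As an illustration, an Ontology for the press-release domain used later in this article could be instantiated like this; the labels, descriptions and relations below are invented for the example, not taken from the repo.

press_ontology = Ontology(
    allowed_labels=["Person", "Organization", "Policy", "Location"],
    labels_descriptions={
        "Person": "Named individuals such as commissioners or spokespeople.",
        "Organization": "Institutions and bodies such as the European Commission.",
        "Policy": "Strategies, regulations and action plans.",
        "Location": "Countries, regions and cities.",
    },
    allowed_relations=["WORKS_AT", "ANNOUNCES", "REGULATES", "LOCATED_IN"],
)

# The ontology can then be passed to the extractor, e.g. GraphExtractor(conf=llm_conf, ontology=press_ontology)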

4. Embed each chunk of the document

Next, we want to obtain a vector representation of the text contained in each chunk. This can be done using the Embeddings model of your choice and passing the list of documents to the ChunkEmbedder class.

class ChunkEmbedder:
    """ Contains methods to embed Chunks from a (list of) `ProcessedDocument`."""
    def __init__(self, conf: EmbedderConf):
        self.conf = conf
        self.embeddings = get_embeddings(conf)

        if self.embeddings:
            logger.info(f"Embedder of type '{self.conf.type}' initialized.")

    def embed_document_chunks(self, doc: ProcessedDocument) -> ProcessedDocument:
        """
        Embeds the chunks of a `ProcessedDocument` instance.
        """
        if self.embeddings is not None:
            for chunk in doc.chunks:
                chunk.embedding = self.embeddings.embed_documents([chunk.text])
                chunk.embeddings_model = self.conf.model
            logger.info(f"Embedded {len(doc.chunks)} chunks.")
            return doc
        else: 
            logger.warning(f"Embedder type '{self.conf.type}' is not yet implemented")

    def embed_documents_chunks(self, docs: List[ProcessedDocument]) -> List[ProcessedDocument]:
        """
        Embeds the chunks of a list of `ProcessedDocument` instances.
        """
        if self.embeddings is not None:
            for doc in docs:
                doc = self.embed_document_chunks(doc)
            return docs
        else: 
            logger.warning(f"Embedder type '{self.conf.type}' is not yet implemented")
            return docs

5. Save the embedded chunks into the Knowledge Graph

Finally, we have to upload the documents and their chunks into our Neo4j instance. I’ve built upon the already available Neo4jGraph LangChain class to create a customised version for this repo.

The code of the KnowledgeGraph class is available at src/graph/knowledge_graph.py and this is how its core method add_documents works:

a. for each file, create a Document node on the Graph with its properties (metadata), such as the source of the file, the name, the ingestion date, and so on;

b. for each chunk, create a Chunk node, connected to the original Document node by a relationship (PART_OF), and save the embedding of the chunk as a property of the node; connect each Chunk node to the following one with another relationship (NEXT).

c. for each chunk, save the extracted subgraph: nodes, relationships and their properties; we also connect them to their source Chunk with a relationship (MENTIONS).

d. perform hierarchical clustering on the Graph to detect communities of nodes inside it. Then, use an LLM to summarise the resulting communities, obtaining Community Reports, and embed those summaries.

Communities in a graph are clusters or groups of nodes that are more densely connected to each other than to the rest of the graph. In other words, nodes within the same community have many connections with each other and relatively fewer connections with nodes outside the group.
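
As a rough sketch of steps a to c, the Cypher behind add_documents could look like the statements below; label and property names are simplified for illustration, and the actual implementation lives in src/graph/knowledge_graph.py.

# Steps a + b: create the Document node, the Chunk node with its embedding, and PART_OF.
add_chunk_query = """
MERGE (d:Document {filename: $filename})
  SET d += $metadata
MERGE (c:Chunk {filename: $filename, chunk_id: $chunk_id})
  SET c.text = $text, c.embedding = $embedding
MERGE (c)-[:PART_OF]->(d)
"""

# Step b: chain consecutive chunks of the same document with NEXT.
link_next_query = """
MATCH (c1:Chunk {filename: $filename}), (c2:Chunk {filename: $filename})
WHERE c2.chunk_id = c1.chunk_id + 1
MERGE (c1)-[:NEXT]->(c2)
"""

# Step c: attach the entities extracted from a chunk through MENTIONS.
add_mentions_query = """
MATCH (c:Chunk {filename: $filename, chunk_id: $chunk_id})
MERGE (e:Entity {id: $entity_id})
MERGE (c)-[:MENTIONS]->(e)
"""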

The result of this process in Neo4j looks something like this: data structured into entities and relationships with their properties, just as we wanted. In particular, Neo4j also offers the opportunity to have multiple vector indexes in the same instance, and we exploit this feature to separate the embeddings of chunks from those of communities.

Knowledge Graph obtained from European Commission Press Corner’s PDFs: we can observe Document nodes (lightblue), Chunk nodes (pink) and Entity nodes (orange). Blue nodes represent Community Reports and green nodes are for Graph Metrics.

In the image above, you might have noticed that some nodes in the Graph are more connected to each other, while other nodes have fewer connections and lie on the borders of the Graph. Since the image you are looking at is produced from the European Commission’s Press Corner PDFs, it is only natural that in the center we find entities such as “Von Der Leyen” (President of the European Commission) or even “European Commission”: in fact, those are some of the most mentioned entities in our Knowledge Graph.

Below, you can find a more zoomed-in screenshot, where relationship and entity names are actually visible. The original filename of the document (lightblue) at the center is “Commission sets course for Europe’s AI leadership with an ambitious AI Continent Action Plan”. Apparently, the extraction of entities and relationships via LLM worked fairly well on this one.

Here labels and relationships are visible and can be used to get a grasp on the subject of one of the press releases.

Once the Knowledge Graph has been created, we can employ LLMs and Agents to query it and ask questions on the available documents. Let’s go for it!


Graph-informed Retrieval Augmented Generation

Since the release of ChatGPT in late 2022, I have built my fair share of POCs and Demos on Retrieval Augmented Generation, “chat-with-your-documents” use cases.

They all share the same methodology for giving the end user the desired answer: embed the user question, perform similarity search on the vector store of choice, retrieve k chunks (pieces of information) from the vector store, then pass the user’s question and the context obtained from those chunks to an LLM; finally, answer the question.

You might want to add some memory of the conversation (read: a chat history) and even callbacks to perform some guardrail activities such as keeping track of tokens spent in the process and latency of the answer. Many vector stores also allow for hybrid search, which is the same process mentioned above, only adding a filter on chunks based on their metadata before the similarity search even happens.

This is the level of complexity you get with this kind of RAG application: choose the number k of texts you want to retrieve, predetermine the filters, choose the LLM in charge of answering. Eventually, these kinds of approaches reach an asymptote in terms of performance, and you might be left with only a handful of options for tweaking the LLM parameters to better handle user queries.

Instead, what does the RAG approach look like with a Knowledge Graph? The honest answer to that question is: it really boils down to what kind of questions you are going to ask.

While learning about Knowledge Graphs and their applications in real-world use cases, I spent a long time reading: blog posts, articles and Medium posts, even some books. The more I dug, the more questions came to my mind, and the less definitive my answers became: apparently, when dealing with knowledge that is structured BOTH in a graph representation and in vector indexes, a lot of options open up.

After my reading, I spent some time developing my own answers (and the code that goes with it) on strategies that can be applied when querying the Knowledge Graph using Large Language Models. What follows is a brief excursus on my take on the subject.

The reference source code is part of the GraphAgentResponder class, available at src/agents/graph_qa.py.

1. Enhanced RAG

First of all, you could always perform the usual RAG process: chunks embeddings are, after all, available in the Graph, and they can be retrieved like with any other vector database. Moreover, you could also perform hybrid search, since those chunks are actually nodes, and nodes have properties (attributes) that can be used for filtering.

What if we went a step further and took into consideration that Chunk nodes are connected to each other by a NEXT relationship, and to the original Document node by a PART_OF relationship? Querying for similarity on embeddings and then enriching the context with neighbouring chunks might unlock new possibilities when it comes to the level of detail provided in the answer, as well as the ability to quote the source file and its metadata.

Querying for neighbouring Chunks to enhance similarity search

For instance, similarity search could end up pointing to chunk 2 of the previously mentioned document, but querying for neighbouring chunks could allow the LLM to provide a more detailed, contextualised answer.
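
A hedged sketch of such a retrieval query is shown below; the vector index name ('chunks') and the property names are assumptions of mine, not necessarily the ones used in the repo.

# Vector search on the Chunk index, then expand to neighbouring chunks and the source Document.
enhanced_rag_query = """
CALL db.index.vector.queryNodes('chunks', $k, $question_embedding)
YIELD node AS chunk, score
OPTIONAL MATCH (chunk)-[:NEXT]-(neighbour:Chunk)
OPTIONAL MATCH (chunk)-[:PART_OF]->(doc:Document)
RETURN doc.filename AS source,
       chunk.text AS matched_chunk,
       collect(DISTINCT neighbour.text) AS neighbouring_chunks,
       score
ORDER BY score DESC
"""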

To give a working example, let’s consider a question on the nodes depicted in the screenshot previously shown about a press release focused on Artificial Intelligence: “what is the EU strategy on Artificial Intelligence?”. This is what our Agent answers:

> `The EU strategy on artificial intelligence is outlined in the AI Continent Action Plan, which aims to make Europe a global leader in AI. The plan focuses on five key pillars:`
> 1. `Building a large-scale AI data and computing infrastructure`
> 2. `Creating a true internal market for data (with a comprehensive Data Union Strategy to be launched in 2025)`
> 3. `Developing algorithms and fostering AI adoption in strategic EU sectors (through the Apply AI Strategy)`
> 4. `Strengthening AI skills and talents (through initiatives such as the Talent Pool, Marie Skłodowska-Curie Action, and AI fellowships schemes)`
> 
> `The plan also includes initiatives such as setting up AI Factories and AI Gigafactories to support EU AI startups, industry, and researchers in developing AI models and applications.`

2. Community Reports

Another version of enhanced RAG that came to my mind while building this first version of the repository employs Community Reports.

In the last step of the ingestion phase in fact, we used hierarchical clustering techniques such as the Leiden or Louvain algorithm to detect communities between nodes in the Knowledge Graph, and then employed a LLM to summarise those communities, obtaining Community Reports.

These reports are stored in the Graph under a different vector index than the one for Chunks, and both of these indexes can be queried via similarity search. Further, the retrieved contexts can be combined when passing them to the LLM, to obtain answers that are grounded at the intra-document level but are also informed of what is happening in the graph at the inter-document level.

Community Reports stored in the Knowledge Graph as additional nodes with a summary and its embedding
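
A possible sketch of this double retrieval follows; the index names ('chunks', 'communities') and the summary property are assumptions of mine for illustration.

from langchain_neo4j import Neo4jGraph
from langchain_ollama import OllamaEmbeddings

# Connection details are illustrative.
kg = Neo4jGraph(url="bolt://localhost:7687", username="neo4j", password="password")
question_embedding = OllamaEmbeddings(model="mxbai-embed-large").embed_query(
    "What is the EU strategy on Artificial Intelligence?"
)

chunk_hits = kg.query(
    "CALL db.index.vector.queryNodes('chunks', 4, $embedding) "
    "YIELD node, score RETURN node.text AS text, score",
    params={"embedding": question_embedding},
)
report_hits = kg.query(
    "CALL db.index.vector.queryNodes('communities', 2, $embedding) "
    "YIELD node, score RETURN node.summary AS text, score",
    params={"embedding": question_embedding},
)

# Combine intra-document (chunk) and inter-document (community) evidence into one context.
context = "\n\n".join(hit["text"] for hit in chunk_hits + report_hits)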

This strategy is certainly better suited to those kinds of questions that need information from multiple sources, in the form of overviews or high-level summaries.

Let’s go back to the “What is the EU strategy on Artificial Intelligence?” question and see how the bot answers now, being able to retrieve both chunks and community reports via similarity search:

> `The EU strategy on artificial intelligence aims to make the EU a leader in the development and deployment of AI that is trustworthy, safe, and respects human values and rights.`
> 
> 
> `To achieve this goal, in April 2019, the European Commission presented a Communication on AI, which outlined a European approach to AI. The approach focuses on:` 
> 
> 1. **`Promoting excellence in AI**: The EU wants to support research and innovation in AI, and encourage the development of AI startups and scale-ups.`
> 2. **`Ensuring trust in AI**: The EU aims to build trust in AI through a robust regulatory and legal framework. This includes the development of AI that respects EU values, such as the General Data Protection Regulation (GDPR).`
> 
> `The strategy also highlights the need for:`
> 
> - **`Investing in AI education and skills**: Developing the skills of the future workforce.`
> - **`Addressing the socio-economic changes driven by AI**: Preparing for changes in the job market.`
> - **`Establishing an EU AI ecosystem**: Encouraging collaboration and coordination among member states.`
> 
> `The European Commission also established an **AI regulatory framework**, proposing to regulate AI based on a **risk-based approach**, with stricter rules for high-risk AI applications. Would you like more information on specific aspects of the EU AI strategy?`

Clearly, the answer is more high-level than before. This is expected and is in fact what happens when having access to inter-document contexts.

3. Cypher Queries

Moving away from the purely RAG-based strategy, a different option at our disposal now that we have our knowledge base structured in a graph is to ask the LLM to traverse it using a graph query language. In Neo4j, this means that we want to instruct the LLM with the schema of the graph and then ask it to write Cypher queries to inspect nodes, entities and relationships, based on the user’s question.

This is all possible thanks to the GraphCypherQAChain, a Chain class from LangChain for question answering against a graph by generating Cypher statements.
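
A minimal sketch of wiring the chain together is shown below; credentials and construction flags are illustrative, and the repo wraps this logic inside GraphAgentResponder.

from langchain_groq import ChatGroq
from langchain_neo4j import GraphCypherQAChain, Neo4jGraph

graph = Neo4jGraph(url="bolt://localhost:7687", username="neo4j", password="password")
llm = ChatGroq(model="gemma2-9b-it", temperature=0)

# The chain reads the graph schema, asks the LLM to generate a Cypher query,
# runs it against Neo4j, and asks the LLM again to phrase the final answer.
cypher_chain = GraphCypherQAChain.from_llm(
    llm=llm,
    graph=graph,
    verbose=True,
    return_intermediate_steps=True,
    allow_dangerous_requests=True,
)

result = cypher_chain.invoke({"query": "Who is Thomas Regnier?"})
print(result["result"])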

In the example below, you can see what happens if you ask the LLM the question “Who is Thomas Regnier?”.

The model writes a Cypher query similar to

MATCH (person:Person {name: "Thomas Regnier"})-[r]-(connected)
RETURN person.name AS name,
  type(r) AS relationship_type,
  labels(connected) AS connected_node_labels,
  connected

and after looking at the intermediate results answers like:

Thomas Regnier is the Contact person for Tech Sovereignity, 
defence, space and Research of the European Commission
Query for “Who is Thomas Regnier?” would result in these nodes being fetched in my graph

Another example question that you might want to ask, and that needs graph traversal capabilities to be answered, could be “What Document mentions Europe Direct?”. The question would lead the Agent to write a Cypher query that searches for the Europe Direct node → searches for Chunk nodes mentioning that node → follows the PART_OF relationship that goes from those Chunks to the Document node(s).

This is what the answer looks like:

> `The following documents mention Europe Direct:`
> 1. `STATEMENT/25/964`
> 2. `STATEMENT/25/1028`
> 3. `European Commission Press release (about Discover EU travel passes)`
> `These documents provide a phone number (00 800 67 89 10 11) and an email for Europe Direct for general public inquiries.`

Notice that this purely query-based approach might work out best for those questions that have a concise and direct answer inside the Knowledge Graph or when the Graph schema is well defined. Of course, the concept of schema in the Graph is tightly linked with the Ontology concept mentioned in the ingestion part of this article: the more precise and descriptive the Ontology, the better defined the schema, the easier for the LLM to write Cypher queries to inspect the Graph.

4. Community Subgraph

This strategy is a combination of the Community Report approach and the Cypher approach, and can be broken down into the following steps:

  • obtain the most relevant Community Report(s) via similarity search
  • fetch the Chunks belonging to the most relevant communities
  • follow the MENTIONS relationship of those Chunks and use the community ids to obtain a community subgraph
  • pass the resulting context and a dictionary representing the subgraph to a Large Language Model to decide how to answer the user (a rough sketch of the subgraph-fetching step follows below).
Example of Community subgraph representing nodes in Leiden community between 0 and 2 in my graph
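
As a rough sketch, the subgraph-fetching query could look like this; the community property on entities is an assumption about how community ids are stored, not necessarily what the repo does.

# Given the community ids of the most relevant Community Reports,
# pull the chunks and the entities they mention inside those communities.
community_subgraph_query = """
MATCH (c:Chunk)-[:MENTIONS]->(e)
WHERE e.community IN $community_ids
RETURN c.text AS chunk_text,
       collect(DISTINCT {id: e.id, labels: labels(e)}) AS mentioned_entities
"""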

This is the most “work in progress” out of the strategies I listed so far, with results that vary greatly between different runs. They are not currently consistent and usually the LLM tends to get confused by all the information gathered. However, I do have the feeling this approach is worth exploring and investigating a little more.

What is immediately clear is how this strategy could get really complicated very quickly. To handle this, I am pondering whether to use filters of some sort when fetching the community subgraph, while a bigger context size for the LLM (or a bigger LLM) could also surely help.

5. Cypher + RAG

The last strategy I want to propose is based on the combination of the Enhanced RAG approach and the Cypher approach. In fact, it employs both the context from similarity search and the intermediate steps of the GraphCypherQAChain to come up with an exhaustive and coherent answer.

To give you an example of how the Agent might behave, let’s go back to one of the questions used for the query-based approach: “what documents mentions Europe Direct?”.

The answer is shown below. As can be noticed, it is both shorter and more informative.

> `The following documents mention Europe Direct:`
> 
> - `A press release about the European Commission offering 36,000 free EU travel passes to 18-year-olds, where general public inquiries can be made through Europe Direct by phone or email.`
> - `A statement about the European Commission's efforts to support businesses, workers, and Europeans, which includes contact information for general public inquiries through Europe Direct.`
> - `A press release about progress in research and innovation in Europe, which also provides contact information for general public inquiries through Europe Direct.`
> 
> `You can contact Europe Direct by phone at 00 800 67 89 10 11 or by email.`

This answering method is currently one of the most complete approaches I came up with, and it also has a fallback strategy: if something goes wrong on the query generation side (say, a query is too complex to write, or the LLM dedicated to it reaches its token limit), the Agent can still rely on the Enhanced RAG approach, so that we still get an answer.
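
In simplified form, the fallback logic can be sketched as below; the method names are illustrative, not the exact ones exposed by GraphAgentResponder.

import logging

logger = logging.getLogger(__name__)

def answer_with_fallback(responder, question: str) -> str:
    # Try the Cypher-generation strategy first; fall back to Enhanced RAG on any failure.
    try:
        cypher_answer = responder.answer_with_cypher(question)
        if cypher_answer:
            return cypher_answer
    except Exception as exc:
        logger.warning(f"Cypher strategy failed ({exc}), falling back to Enhanced RAG.")
    return responder.answer_with_enhanced_rag(question)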

Summing up and approach comparison

In the past few paragraphs, I presented my take on the different answering strategies available when our knowledge base is well organised into a Graph. My presentation, however, is far from complete: many other possibilities are available, and I plan to continue studying the matter and come up with more options.

In my opinion, since Graphs unlock so many options, the goal has to be understanding how these strategies would behave under different scenarios — from lightweight semantic lookups to multi-hop reasoning over a richly linked knowledge graph — and how to make informed trade-offs depending on the use case.

When building real-world applications, it’s critical to weigh answering strategies not just by accuracy, but also by cost, speed, and scalability.

When deciding what strategy to employ, the key drivers that we might want to look at are:

  • Token Usage: how many tokens are consumed per query, especially when traversing multi-hop paths or injecting large subgraphs into the prompt.
  • Latency: the time it takes to process a retrieval + generation cycle, including graph traversal, prompt construction, and model inference.
  • Performance: the quality and relevance of the generated responses, with respect to semantic fidelity, factual grounding, and coherence.

Below, I present a comparison table breaking down the answering methods proposed in this section in light of these drivers.


Closing Remarks

In this article, we walked through a complete pipeline for building and interacting with knowledge graphs using LLMs — from document ingestion all the way to querying the graph through a demo app.

We covered:

  • How to ingest documents and transform unstructured content into a structured Knowledge Graph representation using semantic concepts and relationships extracted via LLMs
  • How to host the Knowledge Graph in Neo4j
  • How to query the graph using a variety of strategies, from vector similarity and hybrid search to graph traversal and multi-hop reasoning — depending on the retrieval task
  • How the pieces integrate into a fully functional demo created with Streamlit and containerized with Docker.

Now I would like to hear opinions and comments, and contributions are also welcome!

If you find this project useful, have ideas for new features, or want to help improve the existing components, feel free to jump in, open issues, or send in Pull Requests.

Thank you for reading until this point!


References

[1]. Data showcased in this article come from the European Commission’s press corner: https://ec.europa.eu/commission/presscorner/home/en. Press releases are available under Creative Commons Attribution 4.0 International (CC BY 4.0) license.
