Review Paper: Knowledge graph-based image classification | by Mohammad Aryayi | May, 2024


Integrating knowledge into deep learning networks is a popular trend among artificial intelligence researchers. Using knowledge to improve the performance of machine learning models is especially common in fine-grained classification, where images from different classes can be highly similar, as seen in the images below from the Caltech-UCSD Birds-200-2011 dataset.

Six examples of birds from different classes that are difficult to distinguish from one another due to their great similarity

The main challenges of incorporating domain knowledge into deep learning models include:

  • Defining the domain knowledge and determining its relevance to the problem at hand
  • Representing the domain knowledge in a suitable format for the model
  • Addressing the overlap between the domain knowledge and the knowledge that can be learned by the deep learning model itself

Other challenges concern the incorporation methods themselves: how to adjust the amount of domain knowledge to incorporate, when to incorporate it, and where in the model it is best incorporated.

In this paper, the authors introduce a method called KGIC (Knowledge Graph-based Image Classification) that utilizes a knowledge graph to improve the performance of CNNs for image classification. The proposed model introduces a new loss function derived from knowledge represented as a knowledge graph. In this method, the integration of knowledge occurs only during the training phase.

Related work

The related work section of this paper is well done. I usually skip or skim this part of papers, but if you are interested in the topic, I recommend reading it. The authors organize the related work very well, and it can provide valuable information for you (and me!).


KGIC (Knowledge Graph-based Image Classification) relies on a knowledge graph to enhance the performance of CNNs for image classification and involves two key steps: (1) creating a knowledge graph from the attribute/value information associated with the training dataset and computing embeddings for its nodes, and (2) formulating a loss function that accounts for the similarity between image representations and classes. The figure below provides an overview of KGIC.

Overview of KGIC: an embedding algorithm (Node2vec) takes the knowledge graph as input and computes node representations according to their neighbourhoods. These representations are used in the loss function during the training step.

At the beginning of this section, the authors explain some basic graph concepts used in the rest of the paper. They then discuss the properties a graph embedding should have. The main objective is to introduce a method that fulfills these three requirements:

  • Maximize the similarity between pairs of image nodes belonging to the same class
  • Maximize the similarity between the image nodes and their corresponding class nodes
  • Preserve the dissimilarity between pairs of class nodes induced by the knowledge

More precisely, the goal of embedding the knowledge is to help identify difficult situations (low similarity between images of the same class and high similarity between two different classes) and to take this information into account when training the classifier.
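To make the three requirements concrete, here is a minimal Python sketch that measures them on a given set of node embeddings. It assumes cosine similarity as the similarity measure (the review does not say which measure the authors use), and the function names and toy vectors are hypothetical. A high similarity between two class nodes flags the kind of difficult class pair described above.

```python
import math
from itertools import combinations


def cosine(u, v):
    """Cosine similarity between two equal-length vectors (0.0 if either is zero)."""
    dot = sum(a * b for a, b in zip(u, v))
    norm = math.sqrt(sum(a * a for a in u)) * math.sqrt(sum(b * b for b in v))
    return dot / norm if norm else 0.0


def mean(values):
    return sum(values) / len(values) if values else float("nan")


def embedding_diagnostics(image_emb, class_emb, image_class):
    """Hypothetical helper measuring the three requirements.

    image_emb:   dict image_id -> embedding vector
    class_emb:   dict class_id -> embedding vector
    image_class: dict image_id -> class_id
    """
    # (1) Image-image similarity within the same class (should be high).
    same_class = [
        cosine(image_emb[i], image_emb[j])
        for i, j in combinations(sorted(image_emb), 2)
        if image_class[i] == image_class[j]
    ]
    # (2) Similarity between each image and its own class node (should be high).
    image_to_class = [cosine(image_emb[i], class_emb[image_class[i]]) for i in image_emb]
    # (3) Similarity for every pair of class nodes; high values flag hard-to-separate classes.
    class_pairs = {
        (a, b): cosine(class_emb[a], class_emb[b])
        for a, b in combinations(sorted(class_emb), 2)
    }
    return mean(same_class), mean(image_to_class), class_pairs


# Toy embeddings, made up for illustration.
image_emb = {"x1": [1.0, 0.0], "x2": [0.9, 0.1], "x3": [0.0, 1.0]}
class_emb = {"gull": [1.0, 0.0], "tern": [0.0, 1.0]}
image_class = {"x1": "gull", "x2": "gull", "x3": "tern"}
same, to_class, pairs = embedding_diagnostics(image_emb, class_emb, image_class)
print(round(same, 3), round(to_class, 3), pairs)
```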

The creation of the knowledge graph is crucial. To build it, the authors define the nodes as follows:

  • one node for each image
  • one node for each class
  • one (attribute-value) node for each pair $(a, v)$, where $a$ is an attribute and $v$ is one of its possible values
  • when a hierarchy is available, one (concept) node for each concept in the hierarchy

The edges and their weights are created as follows:

  • For each image $x$, let $c$ be its class, and let $v_x$ and $v_c$ be the nodes representing $x$ and $c$ respectively. An edge $\{v_x, v_c\}$ is created with weight $s = 1$.
  • For each image $x$ that is described by an attribute $a$ with value $v$ and weight $M_{x,(a,v)}$, let $v_x$ and $v_{av}$ be the nodes representing $x$ and $(a, v)$ respectively. An edge $\{v_x, v_{av}\}$ is created with weight $s = M_{x,(a,v)}$.
  • For each concept $c$ and its superclass $c'$, let $v_c$ and $v_{c'}$ be the nodes representing them respectively. An edge $\{v_c, v_{c'}\}$ is created with weight $s = 1$.
  • For each class $c$ and each pair $(a, v)$, let $v_c$ and $v_{av}$ be the nodes representing them respectively. An edge $\{v_c, v_{av}\}$ is created whose weight is the average certainty of $(a, v)$ over the images of class $c$.
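The node and edge rules above can be sketched in plain Python as an adjacency dictionary. This is a minimal illustration under stated assumptions, not the authors' code: the input format (a class label per image, a dictionary of attribute/value certainties per image, and an optional child-to-parent hierarchy) and all names are hypothetical, and an image lacking a given annotation is assumed to contribute a certainty of 0 to its class average.

```python
from collections import defaultdict


def build_knowledge_graph(image_class, image_attrs, superclass=None):
    """Return an undirected weighted graph as {node: {neighbour: weight}}.

    image_class: dict image_id -> class_id
    image_attrs: dict image_id -> {(attribute, value): certainty}
    superclass:  optional dict concept -> parent concept (the hierarchy)
    Nodes are tagged tuples so image, class, attribute-value and concept
    nodes can never collide.
    """
    graph = defaultdict(dict)

    def add_edge(u, v, weight):
        graph[u][v] = weight
        graph[v][u] = weight

    classes = set(image_class.values())

    def concept_node(name):
        # Classes are the leaves of the hierarchy; other names are concepts.
        return ("class", name) if name in classes else ("concept", name)

    # Image - class edges with weight s = 1.
    for x, c in image_class.items():
        add_edge(("image", x), ("class", c), 1.0)

    # Image - (attribute, value) edges weighted by the annotation certainty.
    for x, attrs in image_attrs.items():
        for av, certainty in attrs.items():
            add_edge(("image", x), ("av", av), certainty)

    # Concept - superclass edges with weight s = 1, if a hierarchy is given.
    for child, parent in (superclass or {}).items():
        add_edge(concept_node(child), concept_node(parent), 1.0)

    # Class - (attribute, value) edges: average certainty over the class's images
    # (assumption: an image without that annotation counts as certainty 0).
    members = defaultdict(list)
    for x, c in image_class.items():
        members[c].append(x)
    for c, xs in members.items():
        totals = defaultdict(float)
        for x in xs:
            for av, certainty in image_attrs.get(x, {}).items():
                totals[av] += certainty
        for av, total in totals.items():
            add_edge(("class", c), ("av", av), total / len(xs))

    return dict(graph)


# Toy data, made up for illustration.
kg = build_knowledge_graph(
    image_class={"img1": "laysan", "img2": "laysan", "img3": "sooty"},
    image_attrs={
        "img1": {("wing_color", "white"): 1.0},
        "img2": {("wing_color", "white"): 0.5},
        "img3": {("wing_color", "black"): 1.0},
    },
    superclass={"laysan": "albatross", "sooty": "albatross"},
)
print(kg[("class", "laysan")])
```

Tagging nodes by kind keeps, for example, a class and a concept with the same name from being merged by accident.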

The figure below illustrates a portion of the knowledge graph constructed for the Caltech-UCSD Birds-200-2011 dataset.

Example of a knowledge graph created from the Caltech-UCSD Birds-200-2011 dataset. Each node represents an image, a visual concept (attribute/value) or a class. Edges between image and visual-concept nodes are weighted by the certainty available in the dataset. Edges between class and visual-concept nodes are weighted by the average certainty over the images of that class.

For knowledge graph embedding, the authors use the Node2Vec algorithm to obtain an embedding for every node in the graph. From these embeddings they define a knowledge loss function, which acts as the injection point where knowledge enters the neural network architecture.
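Node2Vec learns node embeddings by generating biased second-order random walks over the graph and then training a skip-gram model on them. The sketch below covers only the walk-generation step, on a graph stored as {node: {neighbour: weight}}. The return parameter p and in-out parameter q follow the original Node2Vec formulation, but the values used by the KGIC authors are not given in this review, and the function names are hypothetical. The resulting walks could then be fed to a skip-gram implementation, such as gensim's Word2Vec, to obtain the embeddings.

```python
import random


def node2vec_walk(graph, start, length, p=1.0, q=1.0, rng=random):
    """One biased second-order random walk in the style of Node2Vec.

    graph: {node: {neighbour: weight}} (undirected, weighted)
    p: return parameter (small p -> the walk tends to step back)
    q: in-out parameter (small q -> the walk tends to move outward)
    """
    walk = [start]
    while len(walk) < length:
        current = walk[-1]
        neighbours = list(graph.get(current, {}))
        if not neighbours:
            break  # isolated node or dead end
        if len(walk) == 1:
            # First step: plain weighted choice among neighbours.
            weights = [graph[current][n] for n in neighbours]
        else:
            previous = walk[-2]
            weights = []
            for n in neighbours:
                w = graph[current][n]
                if n == previous:            # distance 0 from the previous node
                    weights.append(w / p)
                elif n in graph[previous]:   # distance 1 from the previous node
                    weights.append(w)
                else:                        # distance 2 from the previous node
                    weights.append(w / q)
        if sum(weights) <= 0:
            break  # all outgoing weights are zero; stop the walk
        walk.append(rng.choices(neighbours, weights=weights, k=1)[0])
    return walk


def generate_walks(graph, num_walks, length, p=1.0, q=1.0, seed=0):
    """Start num_walks walks from every node, in a shuffled order each round."""
    rng = random.Random(seed)
    nodes = list(graph)
    walks = []
    for _ in range(num_walks):
        rng.shuffle(nodes)
        for node in nodes:
            walks.append(node2vec_walk(graph, node, length, p, q, rng))
    return walks


# Toy graph, made up for illustration ("d" is isolated).
toy = {"a": {"b": 1.0, "c": 0.5}, "b": {"a": 1.0, "c": 1.0}, "c": {"a": 0.5, "b": 1.0}, "d": {}}
walks = generate_walks(toy, num_walks=2, length=5, p=0.5, q=2.0, seed=42)
print(walks[0])
```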

The loss puts more weight on cases where an image lies far from its associated class in the embedding space, or where two classes lie close to each other. This knowledge loss is then added to the standard cross-entropy loss, scaled by a weighting coefficient α that is treated as a hyperparameter; the coefficient is needed because the two losses are not on the same scale. The figure below illustrates their algorithm.
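The review does not reproduce the exact formula of the knowledge loss, so the sketch below uses a hypothetical stand-in that only mirrors the behaviour described above: probability assigned to a wrong class costs more when that class is close to the true class in the embedding, and the whole sample costs more when the image embedding is far from its class embedding. Only the final combination, cross-entropy plus α times the knowledge term, comes from the text; cosine similarity, the penalty form, and all function names are assumptions.

```python
import math


def cosine(u, v):
    dot = sum(a * b for a, b in zip(u, v))
    norm = math.sqrt(sum(a * a for a in u)) * math.sqrt(sum(b * b for b in v))
    return dot / norm if norm else 0.0


def softmax(logits):
    m = max(logits)  # subtract the max for numerical stability
    exps = [math.exp(z - m) for z in logits]
    total = sum(exps)
    return [e / total for e in exps]


def cross_entropy(logits, label):
    return -math.log(softmax(logits)[label])


def knowledge_loss_sketch(logits, label, image_vec, class_vecs):
    """HYPOTHETICAL knowledge term, not the KGIC paper's formula."""
    probs = softmax(logits)
    true_vec = class_vecs[label]
    image_far = 1.0 - cosine(image_vec, true_vec)  # 0 when the image sits on its class
    penalty = 0.0
    for c, p in enumerate(probs):
        if c != label:
            class_close = 1.0 + cosine(true_vec, class_vecs[c])  # larger for similar classes
            penalty += class_close * p
    return (1.0 + image_far) * penalty


def total_loss(logits, label, image_vec, class_vecs, alpha):
    # L = L_CE + alpha * L_K, with alpha the weighting hyperparameter.
    return cross_entropy(logits, label) + alpha * knowledge_loss_sketch(
        logits, label, image_vec, class_vecs
    )


# Toy values, made up for illustration: two classes with orthogonal embeddings.
class_vecs = [[1.0, 0.0], [0.0, 1.0]]
print(total_loss([2.0, 0.0], 0, [1.0, 0.0], class_vecs, alpha=0.5))
```

With α = 0 this reduces to plain cross-entropy, which is one way to sanity-check any knowledge term plugged into this slot.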


The authors tested their method on three public datasets and describe each of them in detail. They also discuss the parameters of the Vision Transformer (ViT) networks used as classifiers; see the original paper for more information.


The table below compares state-of-the-art methods with the proposed method on one of the datasets. The proposed method achieves better results, and the same trend holds for the other datasets.

Furthermore, the authors investigated the impact of the weighting hyperparameter α in the loss function. The value of α that yields the best accuracy varies from one dataset to another, so it has to be tuned separately for each dataset. The figure below shows how the results vary across datasets.
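Since the best α differs per dataset, it has to be chosen by a search such as the simple grid search sketched below. The train_and_evaluate callback is hypothetical: it stands for training the classifier with the combined loss for a given α and returning accuracy on a held-out validation split. The toy scoring function in the usage line is made up purely to exercise the code.

```python
def select_alpha(train_and_evaluate, alphas):
    """Grid search over the weighting coefficient alpha.

    train_and_evaluate: hypothetical callback, alpha -> validation accuracy.
    Returns the best alpha and the accuracy obtained for every candidate.
    """
    scores = {alpha: train_and_evaluate(alpha) for alpha in alphas}
    best_alpha = max(scores, key=scores.get)  # the first maximum wins on ties
    return best_alpha, scores


# Toy stand-in for a real training run (made-up accuracy curve).
best, scores = select_alpha(lambda a: 0.8 - (a - 0.3) ** 2, [0.0, 0.1, 0.3, 0.5, 1.0])
print(best, scores)
```

Choosing α on a validation split rather than the test set keeps the reported accuracy unbiased; the search would be repeated for each dataset.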

The variation of results on different datasets depending on α for KGIC


Here are some key points from their conclusion:

  • A method for integrating a knowledge graph into a deep architecture is proposed and applied to image classification.
  • An additional knowledge loss is proposed to encourage the model to focus on the boundaries between the classes that are most difficult to discriminate.
  • A major step in this method is the creation and embedding of the knowledge graph, which improved performance on all datasets by a margin of 0.1% to 3.1% over the second-best competitor.
