The success of neural networks has often been traced back to their biological inspiration: the human brain. Just as our brains are composed of neurons that pass signals to one another, artificial neural networks replicate this idea by passing information through layers of interconnected nodes. This simple yet powerful concept led to one of the most important breakthroughs of the century.
But while this neural analogy has served us well, most traditional networks are still limited in how they see the world: they assume that data comes in neat grids (like images) or ordered sequences (like text). The real world, however, is rarely that tidy. Think of a molecule made of atoms and bonds, or a social network of people and friendships. Both are naturally represented as graphs, and graphs are everywhere.
Graph theory dates back to 1736, when Euler wrote his famous paper on the bridges of Königsberg. Since then, these mathematical structures have become widely used in computer science, biology, and chemistry.
In 2009, Scarselli et al. introduced the concept of the Graph Neural Network (GNN). Unlike traditional neural networks, GNNs operate directly on irregular, graph-structured data. They allow us to build models that don’t just learn from individual data points, but from the connections between them. That makes them powerful tools for simulating physical systems, understanding complex interactions, and reasoning about structures that other networks can’t easily handle.
In this article, we’ll explore the basics of GNNs — what they are, how they work, and why they matter. In a follow-up, I’ll show how GNNs are being used to simulate the physical world — and how we might teach them to do more with less data.
At its core, a graph is just a collection of nodes (also called vertices) and edges. We denote it as G = (V, E), where V is the set of nodes and E ⊆ V × V is the set of edges. Each edge represents a connection between two nodes.
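In code, a graph needs nothing more exotic than a list of nodes, a list of edges, and (optionally) an adjacency structure built from them. Here is a minimal, library-free sketch with a made-up four-node graph:

```python
# A minimal, library-free representation of G = (V, E).
# Nodes are integers; edges are pairs (u, v) drawn from E ⊆ V × V.
V = [0, 1, 2, 3]
E = [(0, 1), (1, 2), (2, 3), (3, 0)]

# An adjacency list makes "which nodes are connected?" easy to answer.
adjacency = {v: [] for v in V}
for u, w in E:
    adjacency[u].append(w)
    adjacency[w].append(u)   # undirected graph: store the edge in both directions

print(adjacency)   # {0: [1, 3], 1: [0, 2], 2: [1, 3], 3: [2, 0]}
```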
Beyond their mathematical elegance, graphs are incredibly versatile. They naturally arise in domains where data is best understood as a system of interactions:
In a social network, nodes are individuals and edges represent social ties.
In molecular chemistry, nodes are atoms and edges are chemical bonds.
Graphs can also be attributed, meaning that nodes and edges carry associated feature vectors. For instance, a node could have properties (e.g., velocity, temperature), while edges might encode distances or types of relations. These features, along with the graph’s topology, form the input for models like Graph Neural Networks.
Node features might include things like age and interests in a social network, or temperature and velocity in a physical simulation.
Edge features might encode the type of relationship, the strength of a connection, or even directionality.
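As a small illustration, here is how an attributed graph might be laid out as plain arrays; the feature values and their meanings are invented for the example:

```python
import numpy as np

# Hypothetical attributed graph with 3 nodes and 2 undirected edges.
# Node features: [temperature, velocity] -- the values are invented.
node_features = np.array([
    [20.5, 0.1],   # node 0
    [21.0, 0.3],   # node 1
    [19.8, 0.0],   # node 2
])

# Edge list and per-edge features: [distance, relation_type_id].
edges = [(0, 1), (1, 2)]
edge_features = np.array([
    [1.5, 0.0],    # edge (0, 1)
    [2.3, 1.0],    # edge (1, 2)
])

# The topology (edges) plus these feature matrices form the input a GNN consumes.
```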
What distinguishes graph-structured data from other formats is its irregularity. Unlike images (regular 2D grids) or sequences (ordered 1D lists), graphs don’t impose a fixed structure: they vary in size, and each node can have a different number of neighbors.
This irregularity is precisely why traditional neural network architectures fall short — they’re not designed to handle variable neighborhood sizes.
GNNs address this by operating directly on graphs, leveraging their structure to perform computations that respect the relational nature of the data. But before diving into how they do that, it’s important to understand why traditional networks struggle — and what makes GNNs fundamentally different.
Traditional neural networks — such as CNNs or RNNs — operate on structured, regular domains like grids or sequences. Their success largely relies on built-in assumptions about data structure, known as inductive biases, that help models generalize beyond the training examples. For instance, CNNs embed the bias of translation equivariance, meaning they recognize patterns regardless of their exact position in an image, thanks to convolutional filters with shared weights.
But real-world data often defies these tidy assumptions. Data such as social networks, molecules, meshes, or interacting particles comes as graphs: irregular structures consisting of nodes and edges, with arbitrary connectivity. Traditional models struggle here, because the critical relationships aren’t defined by fixed spatial or sequential order but by complex interconnections.
Graph Neural Networks (GNNs) address this limitation by embedding a different type of inductive bias, suited explicitly for relational data. Central to this is permutation invariance: the idea that a graph’s identity doesn’t change if you rearrange the order of its nodes.
A GNN should give the same result whether a node is labeled “1” or “42,” as long as the structure and features are the same.
Another way to put it: if you permute the node ordering of the input graph, the output stays the same.
For graph-level tasks, the goal is a function whose output does not change when the rows of the node-feature matrix (and the corresponding rows and columns of the adjacency matrix) are permuted.
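As a quick sanity check, here is a toy illustration: a readout that sums node embeddings is permutation invariant, since reordering rows cannot change a sum. The embeddings here are random placeholders, not the output of any real GNN:

```python
import numpy as np

rng = np.random.default_rng(0)
X = rng.normal(size=(4, 3))   # 4 nodes, 3 placeholder features each

def graph_readout(X):
    # Summing over nodes is permutation invariant:
    # reordering the rows of X cannot change the result.
    return X.sum(axis=0)

perm = np.array([2, 0, 3, 1])   # an arbitrary reordering of the 4 nodes
assert np.allclose(graph_readout(X), graph_readout(X[perm]))
```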
But what if we need answers at the node level? In that case, the property we want is permutation equivariance: we still need to identify individual node outputs, which a fully permutation-invariant aggregator would destroy. In other words, we want functions that commute with node reordering: permuting the nodes before applying the function gives the same result as permuting its output afterwards.
Imagine you rename or reorder the nodes of a graph: the actual structure hasn’t changed, just the labels. If your model is equivariant, then the output will be reordered in the same way as the input.
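Here is the equivariance property checked numerically on a toy one-step propagation f(X, A) = A·X, where permuting the nodes means permuting both the feature matrix and the adjacency matrix. The graph is random and only serves the check:

```python
import numpy as np

rng = np.random.default_rng(1)
n = 4
X = rng.normal(size=(n, 2))                     # node features
A = (rng.random((n, n)) < 0.5).astype(float)    # random adjacency matrix
A = np.triu(A, 1)
A = A + A.T                                     # symmetric, no self-loops

def propagate(X, A):
    # One toy propagation step: each node sums its neighbors' features.
    return A @ X

perm = np.array([3, 1, 0, 2])
P = np.eye(n)[perm]                             # permutation matrix

# Equivariance: permuting the input graph permutes the output the same way,
# i.e. f(P X, P A P^T) == P f(X, A).
assert np.allclose(propagate(P @ X, P @ A @ P.T), P @ propagate(X, A))
```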
This idea is rooted in graph isomorphism. If two graphs are structurally identical (even if their node labels differ), a good GNN should produce the same result. That’s why permutation invariance and equivariance are so important: the model must respect the structure, not the label order.
As you explore different GNN architectures (e.g., GCN, GraphSAGE, GAT, or GIN), always check whether they maintain this equivariance. While it’s not the only factor that affects expressiveness, it is a foundational design criterion that separates well-behaved models from those that could break under reordering.
Next, we want to choose the most expressive GNN architecture. But first, we need to ask: how powerful are GNNs, really?
The expressive power of a GNN refers to its ability to distinguish different graph structures — a core challenge since graphs are inherently complex, with no fixed node ordering or spatial locality.
A classical test of expressiveness is graph isomorphism: can the GNN distinguish two non-isomorphic graphs? Message-passing GNNs, the most common class, operate by passing and aggregating messages between neighboring nodes: each node updates its representation by aggregating messages from its neighbors, and different GNNs (e.g., GCN, GraphSAGE, GAT) differ in how they perform this aggregation. However, not all aggregation functions are equally powerful. To evaluate this, we can model GNN neighborhood aggregation as a function over a multiset (a set with possible repeated elements), since node neighborhoods can contain repeated features.
If this function is injective, then different multisets (i.e., different neighborhood structures) yield different outputs, which is necessary to distinguish graph structures.
In simpler terms, we are asking whether the GNN can tell two nodes apart: if two nodes with different neighborhood structures end up with the same embedding, the aggregation has failed to distinguish them.
If you’re wondering why the term “injective” is used here: a function is injective if it maps distinct inputs to distinct outputs. In graphs, this translates to requiring that different neighborhoods (different multisets of neighbor features) are mapped to different aggregated representations.
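To make this concrete, here is a minimal sketch of one round of message passing in which the aggregator is a pluggable function over the multiset of neighbor features. The graph and features are made up for illustration; real libraries implement this far more efficiently:

```python
import numpy as np

def message_passing_step(node_features, adjacency, aggregate):
    """One round of message passing: every node aggregates the multiset of
    its neighbors' feature vectors using the supplied `aggregate` function."""
    updated = []
    for v in sorted(adjacency):
        messages = np.array([node_features[u] for u in adjacency[v]])
        updated.append(aggregate(messages))
    return np.array(updated)

# Toy star graph: node 0 is connected to nodes 1 and 2.
adjacency = {0: [1, 2], 1: [0], 2: [0]}
X = np.array([[1.0, 0.0],
              [0.0, 1.0],
              [0.0, 1.0]])

# Swap the aggregator to see how the choice changes the update:
print(message_passing_step(X, adjacency, lambda m: m.mean(axis=0)))  # GCN-like
print(message_passing_step(X, adjacency, lambda m: m.sum(axis=0)))   # GIN-like
```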
Classic GNNs like GCN use mean-pooling and GraphSAGE uses max-pooling — but both fail to be injective. Let’s say you have two different multisets of neighbor features:
- Multiset A: [a,a,b,b]
- Multiset B: [a,b]
Now suppose we use mean pooling to aggregate the features: we compute the average of the elements in each multiset. Let’s use one-hot encoded feature vectors to represent the node types, a = [1, 0] and b = [0, 1].
Mathematically:
mean(A) = (a + a + b + b) / 4 = [0.5, 0.5]
mean(B) = (a + b) / 2 = [0.5, 0.5]
So two different neighborhoods collapse to exactly the same output: mean pooling is not injective. Max pooling fails in the same way (both give [1, 1]), whereas sum pooling keeps them apart ([2, 2] vs. [1, 1]).
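Here is the same check written out in a few lines of numpy, so you can verify the collision yourself (a toy sketch, not tied to any particular GNN library):

```python
import numpy as np

a = np.array([1.0, 0.0])   # one-hot feature for node type "a"
b = np.array([0.0, 1.0])   # one-hot feature for node type "b"

multiset_A = np.stack([a, a, b, b])
multiset_B = np.stack([a, b])

# Mean pooling collapses the two different neighborhoods to the same vector...
print(multiset_A.mean(axis=0), multiset_B.mean(axis=0))  # [0.5 0.5] [0.5 0.5]

# ...and max pooling does too...
print(multiset_A.max(axis=0), multiset_B.max(axis=0))    # [1. 1.] [1. 1.]

# ...while sum pooling keeps them apart, because it preserves multiplicity.
print(multiset_A.sum(axis=0), multiset_B.sum(axis=0))    # [2. 2.] [1. 1.]
```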
The key theoretical insight comes from Xu et al. (ICLR 2019), who showed that a GNN can become maximally expressive if its aggregation function is injective over multisets. Specifically, they proved that any injective function over a multiset X can be modeled in the form f(X) = φ( Σ_{x∈X} ψ(x) ), where ψ transforms each element, the sum aggregates the multiset, and φ is applied to the result.
The critical idea is that sum aggregation — unlike mean or max — retains information. This makes the entire operation injective, meaning it can uniquely represent different neighborhood structures.
This formulation is the foundation of the Graph Isomorphism Network (GIN). GIN sums the transformed neighbor features and passes the result through an MLP, allowing it to match the expressive power of the 1-dimensional Weisfeiler-Lehman (1-WL) test. As a result, GIN is considered the most expressive architecture among message-passing GNNs.
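Below is a rough numpy sketch of a single GIN-style layer, h_v' = MLP((1 + ε) · h_v + Σ_{u∈N(v)} h_u). The tiny MLP weights are random stand-ins for learned parameters, and the graph is a made-up example, so treat this as an illustration of the update rule rather than a usable implementation:

```python
import numpy as np

def mlp(x, W1, W2):
    # A tiny two-layer MLP with ReLU; in a real GIN these weights are learned.
    return W2 @ np.maximum(W1 @ x, 0.0)

def gin_layer(node_features, adjacency, W1, W2, eps=0.0):
    """One GIN-style update: h_v' = MLP((1 + eps) * h_v + sum of neighbor features)."""
    updated = []
    for v in sorted(adjacency):
        neighbor_sum = np.sum([node_features[u] for u in adjacency[v]], axis=0)
        updated.append(mlp((1.0 + eps) * node_features[v] + neighbor_sum, W1, W2))
    return np.array(updated)

# Made-up graph and random stand-in weights, just to show the mechanics.
rng = np.random.default_rng(0)
adjacency = {0: [1, 2], 1: [0], 2: [0]}
X = np.array([[1.0, 0.0], [0.0, 1.0], [0.0, 1.0]])
W1, W2 = rng.normal(size=(8, 2)), rng.normal(size=(2, 8))
print(gin_layer(X, adjacency, W1, W2))   # one new 2-dimensional embedding per node
```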
However, GIN still has limitations. While it can distinguish a wide range of neighborhood structures, it cannot count specific substructures like cycles or cliques, and it fails to distinguish nodes that occupy different positions in a symmetric graph. To overcome this, researchers developed two key enhancements:
- Structure-aware GNNs augment the input with features that capture graph motifs, such as cycle counts, node centrality, or eigenvalue-based features.
- Position-aware GNNs embed nodes based on their distances to randomly selected anchor nodes or anchor sets, effectively encoding their global position in the graph.
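To make the position-aware idea concrete, here is a small sketch that picks arbitrary anchor nodes and uses breadth-first search to compute each node’s hop distance to them; real position-aware GNNs are more elaborate, so treat this purely as an illustration of where the extra features come from:

```python
import numpy as np
from collections import deque

def bfs_distances(adjacency, source):
    """Hop distance from `source` to every reachable node via breadth-first search."""
    dist = {source: 0}
    queue = deque([source])
    while queue:
        v = queue.popleft()
        for u in adjacency[v]:
            if u not in dist:
                dist[u] = dist[v] + 1
                queue.append(u)
    return dist

# Toy path graph 0 - 1 - 2 - 3, with nodes 0 and 3 arbitrarily chosen as anchors.
adjacency = {0: [1], 1: [0, 2], 2: [1, 3], 3: [2]}
anchors = [0, 3]

# Positional features: one column of anchor distances per node.
pos_features = np.array(
    [[bfs_distances(adjacency, a)[v] for a in anchors] for v in sorted(adjacency)]
)
print(pos_features)   # e.g. node 1 -> [1, 2], node 2 -> [2, 1]
```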
Briefly, building a truly expressive GNN starts with using an injective aggregator like sum. Further improvements come from integrating structural and positional node features. This combination currently represents the most effective and principled way to maximize a GNN’s expressive power. To recap why the GIN update rule works:
- The sum operation preserves the count of each element in the multiset (unlike mean or max).
- The MLP is powerful enough (due to the Universal Approximation Theorem) to approximate any function over the sum.
- This makes the whole update rule injective, allowing GIN to distinguish different local structures — just like the Weisfeiler-Lehman test.
Among message-passing GNNs, we’ve seen that models like GIN, which use sum aggregation, are the most expressive: they can uniquely capture neighborhood structures and even match the discriminative power of the Weisfeiler-Lehman test. This makes them more expressive than GCNs or GATs, which rely on mean or attention-based pooling and can fail to distinguish certain graph patterns. That said, expressiveness isn’t everything: GATs may still be better for tasks where edge importance varies, and GCNs offer simplicity and speed. Ultimately, choosing the right GNN means balancing expressiveness with task-specific needs like scalability and inductive capability.
- Kipf & Welling (2017)
Semi-Supervised Classification with Graph Convolutional Networks
arXiv:1609.02907
→ Introduced GCN, the foundational message-passing GNN. - Hamilton et al. (2017)
Inductive Representation Learning on Large Graphs
NIPS 2017 / arXiv:1706.02216
→ Proposed GraphSAGE with inductive learning and neighborhood sampling. - Veličković et al. (2018)
Graph Attention Networks
arXiv:1710.10903
→ Introduced GAT, which uses self-attention for neighborhood weighting. - Xu et al. (2019)
How Powerful Are Graph Neural Networks?
ICLR 2019 / arXiv:1810.00826
→ Introduced GIN and formalized the link to the 1-WL test. - You et al. (2021)
Identity-Aware Graph Neural Networks
AAAI 2021 / arXiv:2101.10320
→ Discussed structural limitations of GNNs and identity-aware designs.
- You et al. (2019)
Position-Aware Graph Neural Networks
ICML 2019 / arXiv:1906.04817
→ Addressed GNN limitations for position-sensitive tasks. - CS224W Lecture Notes
Machine Learning with Graphs — Stanford University
http://cs224w.stanford.edu
→ Excellent slides and resources for theoretical understanding of GNN.