Exploring Graphs: Basics for Data Scientists

Ishan | Virginia Tech & IIT Delhi
6 min readAug 20, 2023

--

Representation of a Complex Graph (Source: https://github.com/Elzawawy/graph-algorithms)

What is a Graph?

Imagine you have a bunch of friends, and you want to draw a picture that shows how you’re all connected by friendships. A graph is kind of like that picture, but for things that are connected in some way. It’s a way to show relationships between different items.

A graph is a way to organize and represent relationships between different items. Think of it like a map of connections. In this map, you have two main parts: nodes and edges.

In a graph, you have two main parts: nodes and edges. Here’s a simple formula to remember:

Graph = Nodes + Edges

Nodes: These are like dots or points on the map. Each node represents something, like a person, a place, a concept, or anything you want. For example, if you’re talking about people, each node could be a person’s name.

Edges: These are like lines connecting the nodes. An edge shows that there’s some kind of connection between two nodes. This connection could represent many things, like friendship, distance, or a relationship. For example, if you’re talking about people again, an edge could mean that two people are friends.

For example, let’s say you have a graph that represents your favorite places to visit in your town. The places would be nodes, and the paths you take to get from one place to another would be edges.

Now, there are different types of graphs. Two common types are:

  1. Undirected Graphs: These are like regular friendships where if you’re friends with someone, they’re also your friend. There’s no one-way relationship. It’s like a two-way street.
  2. Directed Graphs (Digraphs): These are like one-way streets. You can be friends with someone, but they might not be your friend back. You can go from one place to another, but not necessarily the other way around.

Here’s an example using a simple graph:

Let’s say you have four friends: Alice, Bob, Carol, and Dave. You can draw a graph like this:


Alice
/ \
Bob---Carol
\ /
Dave

In this graph:

  • Alice, Bob, Carol, and Dave are nodes.
  • The lines connecting them (edges) show who’s friends with whom.

Remember, graphs are a cool way to represent connections or relationships between things. Just like your friend map helps you remember who’s friends with who, graphs help

Graphs are used in many real-life situations to represent connections between things. They’re also used in computer science to solve problems like finding the shortest path between two places or figuring out the best way to organize data. Just like a map helps you navigate, graphs help computers navigate through relationships!

Graph Representations: Adjacency Matrix and Adjacency List

Representing graphs mathematically is essential for efficient algorithm development and analysis. We’ll delve into two primary methods of representing graphs: the adjacency matrix and the adjacency list.

Before we dive into graph representations, let’s briefly recap the components of a graph:

  • Nodes (Vertices): Represent entities or points in the graph.
  • Edges: Represent connections or relationships between nodes.

Adjacency Matrix

An adjacency matrix is a two-dimensional array that provides a compact way to represent the connections between nodes in a graph. Each row and column of the matrix corresponds to a node, and the value at the intersection of row “i” and column “j” indicates whether there is an edge between nodes “i” and “j.”

Formula:

Let “n” be the number of nodes in the graph.

  • If an edge exists between nodes “i” and “j,” then the value at the (i, j) position is 1.
  • If there’s no edge between nodes “i” and “j,” the value is usually 0.

Benefits:

  • Simplicity: Adjacency matrices are easy to understand and visualize for small graphs.
  • Quick Edge Queries: Determining if there’s an edge between two nodes takes O(1) time.
  • Space Complexity: For sparse graphs, the space complexity is relatively lower than dense graphs.

Interpretation:

Imagine you have a group of friends. You can create a matrix where the rows and columns represent your friends. If there’s a checkmark at the intersection of Row A and Column B, it means friend A is connected to friend B.

Use Cases:

  • Small to medium-sized graphs with a moderate number of nodes and edges.
  • Social network analysis, especially when edge existence matters.

Adjacency List

An adjacency list is a collection of lists or arrays, where each list represents a node and contains the nodes it’s directly connected to.

Formula:

For each node “i,” create a list that contains its neighbors (nodes connected to “i”).

Benefits:

  • Space Efficiency: Especially useful for sparse graphs, as it only stores information about existing edges.
  • Fast Neighbor Queries: Finding neighbors of a node takes O(1) time for each node’s list.

Interpretation:

Think of a notebook where you list each person and jot down their friends’ names. If you want to know someone’s friends, just flip to their page and see the list.

Use Cases:

  • Large graphs with many nodes and sparse connections.
  • Web pages and hyperlinks, where pages are nodes and hyperlinks are edges.

Choosing the Right Representation

The choice between adjacency matrix and adjacency list depends on the nature of the graph and the operations you’ll perform most frequently. Dense graphs might benefit from adjacency matrices, while sparse graphs could make better use of adjacency lists.

Graph Properties: Degree, Path, Cycle, Connectivity

Graphs are not just dots and lines; they come with fascinating properties that help us understand the relationships between nodes. We’ll explore four fundamental graph properties: Degree, Path, Cycle, and Connectivity.

1. Degree of a Node

The degree of a node in a graph refers to the number of edges connected to that node. In other words, it tells us how many neighbors a node has.

Interpretation:

  • Think of nodes as people and edges as connections between them. The degree of a person represents how many friends they have.

Formula:

=> For an undirected graph:

  • Degree of a node = Number of edges connected to that node

=> For a directed graph:

  • In-Degree of a node = Number of incoming edges
  • Out-Degree of a node = Number of outgoing edges

2. Path and Path Length

A path in a graph is a sequence of nodes where each consecutive pair is connected by an edge. The length of a path is the number of edges it contains.

Interpretation:

  • Imagine you’re following a path of friends from one person to another. Each step represents an edge, and the whole journey is a path.

Use Case:

  • Finding routes or connections between two points, like in navigation apps.

3. Cycle

A cycle in a graph is a path that starts and ends at the same node, passing through distinct nodes in between.

Interpretation:

  • Visualize this as taking a walk in a park and ending up at the same spot after visiting different places without retracing your steps.

Use Case:

  • Detecting loops or repeated patterns in a network.

4. Connectivity

Graph connectivity refers to how well nodes in a graph are connected. A graph can be connected or disconnected.

  • Connected Graph: Every pair of nodes has a path between them.
  • Disconnected Graph: There are at least two nodes that don’t have a path connecting them.

Use Case:

  • Ensuring that a communication network or social network is well-linked.

Whether it’s the degree of a node indicating popularity, the path length for efficient routing, the discovery of cycles for pattern recognition, or the assessment of connectivity, these properties provide insights into the intricacies of graph structures.

References

  • Cormen, T. H., Leiserson, C. E., Rivest, R. L., & Stein, C. (2009). “Introduction to Algorithms.” MIT Press.
  • NetworkX: Python library for creating, manipulating, and studying complex networks.
  • Graph Theory with Applications (Coursera): Course by the University of California, San Diego.

--

--

No responses yet