Algorithmically Speaking - #7: Dependency Graph Analysis
An example of how graph theory can influence software development.
Hello there!
Welcome to another edition of Algorithmically Speaking.
Today, I will show you some real-life examples of how graph theory is used in modern software development.
Most of what you will see today is the product of my experiments as an amateur data scientist during the past weeks. I have been dealing with data, visualizations, and feature engineering in the broader sense of its definition.
I’m pleased with my findings, and that’s the reason I come here today to tell you all about them.
Let’s get started!
Dependencies and Graphs
In software development, it is common to have multiple teams working on different applications and modules that will be assembled somehow to build the “final product”.
That final product can be anything. Examples go from a web application to a car to a recommendation system.
For the final product to be released it has to be tested first. For it to be tested, it usually has to be assembled, which means putting all the pieces together and running your tests.
Once again, testing a web application and a car will be different, but they must be tested.
In a dream world, all these different modules and applications that constitute the final product don’t depend on each other. This means that a team can make as many changes as they want to the module they are working on, which will not affect any other team.
But let’s be real. This is unimaginable.
Even the simplest web applications will struggle with not adjusting their “client“ side every time a major change is introduced on the “server“ side. It is also difficult to imagine that changes on, let’s say, a module that controls the energy distribution system of a car will not affect the module that controls the temperature.
The interesting part of this dependency system is that it can be modeled as a graph. Software developers can then find information on that graph that will help them on making better structural changes to their dependencies and improve aspects such as the time it takes to build the “final product“ just for testing purposes.
Let’s see some common analyses that we can do to dependency graphs.
Structural Analysis
Every team works on a different module or application as we said before. This is what one single module looks like in a dependency graph:
This module can have explicit modules depending on it. Which we will call “explicit downstream dependencies”:
However, other modules can also depend on the rightmost modules shown in the image above. These modules will also depend on our initial module. All modules that depend on the initial module are part of the so-called “transitive downstream dependencies”:
Knowing the number of transitive downstream dependencies for a module will allow the developers to know how many modules will be affected somehow every time changes are made to that specific module.
This is a useful metric to have because it allows us to understand the impact that apparently harmless changes can have on an organizational level. Changes to modules mean that people will be assigned to make those changes and time that could be used for something else now will be allocated to make those changes.
You sure want to make as few of these changes as possible the more modules, people, and time you will affect.
On a side note, the dependency graphs should always be Directed Acyclic Graphs. This means that the direction of the dependencies is well defined, in the sense that no two modules can depend on each other. Also, it cannot have a cycle of dependencies, because the order of prerequisites would not be well defined.
Another important metric to look at is the depth of the dependency chain. This will give a sense of how many “layers” of modules will be affected by changes made to a single module.
This metric is crucial to identify opportunities to adjust the dependency graph and make it “wider” instead of “deep”.
We can do several other structural analyses on dependency graphs, and we will discuss them in future posts.
For now, let’s focus on what information can we extract from these graphs if we include a bit of “behavioral analysis”.
Comment below if you are curious about a specific application of data structures and algorithms in modern software development. I will try to cover the most popular topics in future editions of the newsletter.
Behavioral Analysis
Let’s broadly define behavioral analysis as metrics that we can measure over time. These metrics will give us a sense of how things behave, and we can begin to notice patterns by analyzing those metrics over a sufficiently large period of time.
As long as the patterns we identify are positive, we keep doing what we are doing so far. If the pattern is starting to affect the development process which can mean that build times have gotten slower, then we take action to correct those behaviors.
Let’s take, for example, the number of changes that have been made to a particular module in the past month.
If you use Github, Gitlab, or any other alternative, this should be easy to calculate by requesting information about the commits made to specific repositories in a time period.
Take that number and multiply it by the number of transitive downstream dependencies of the module and you get a value for a metric usually called “Rebuilt Targets“.
This metric can give a score for every module showing how much the changes made to every module have affected others in the past month.
But we can take this idea even further.
In reality, this module that we have been analyzing all the time can also depend on other modules. We will call those “transitive upstream dependencies“.
As we can see in the image above, every change made to any of the modules on the left will affect the modules on the right.
This leads us to define a related metric, which is called “Rebuilt Targets by Transitive Dependencies“.
The metric is calculated by multiplying all the changes made to the transitive upstream dependencies of a module in the past month by the amount of transitive downstream dependencies of that module.
The score we assign to each module based on this metric can give a sense of which modules have been the bottlenecks of our dependency graph in the last 30 days or so.
Last Words
I hope you enjoyed this introduction to dependency graph analysis as much as I did when I was first diving into these topics a few weeks ago.
In future posts, we will discover new structural analysis metrics and we will be diving into the world of centrality algorithms in graphs.
Also, I want to start sharing a bit of my experience with data visualization so I might have an entire article showing how to create a dashboard to display all the metrics we gather from a dependency graph.
Feel free to read a little bit about those topics in the upcoming weeks and let me know if you find them as interesting as I do.
As always, if you think this post provides value for someone you know, share it with them. Nothing will make me happier.
See you next week,
Alberto