Mono Vs Multi Repository
Over the last couple of years, I’ve been drawn into multiple debates about repository structure: Mono-repo vs. Multi-repo. Every organization looks for a repository model that lets it move faster, prevents silos from forming between teams, and promotes collective ownership.
A major factor that biases the discussion is that many big organizations (Google, Facebook, Twitter) use a Mono-repo effectively. What is easy to overlook is that the large companies using a Mono-repo have developed advanced tooling to support it and have spent enormous resources, money, and time to make it work. For example, Facebook built a custom filesystem and its own source control (a customized Mercurial) to overcome the bottlenecks it encountered with the vanilla versions.
Further, in a microservices architecture, a monolithic application is broken down into multiple services, and each service performs a single business function without depending on other services or business functions. Each service is developed and deployed independently. The number of microservices behind a single application can grow quickly, which forces the architecture team to decide on the repository structure up front, before a single line of code is written.
Let’s discuss some of the advantages and drawbacks of both structures (the drawbacks are not really drawbacks once the right tools are in place). Note that numerous tools are available to help an organization implement either repository structure effectively.
The Mono Repository Construct
A Mono-repo is an architectural convention of keeping all the application code in a single repository. The repository can contain more than one logical project (e.g., a React Native client and a web application). Within the repository, projects can be grouped and organized in whatever manner best suits the organization and the nature of the application.
Key characteristics of a Mono repository:
- Centralisation: The codebase is contained in a single repository encompassing multiple projects.
- Visibility: Code is visible and accessible to all intended users.
- Synchronisation: The development process is trunk-based; engineers commit to the head of the repo.
- Completeness: Any project in the repo can be built using only dependencies that are also checked into the repo. Dependencies are un-versioned; projects must use whatever version of their dependency is at the repo head.
- Standardisation: A shared set of tooling governs how engineers interact with the code, including building, testing, browsing, and reviewing code.
Advantages of using Mono-repo
- Single Source of Truth: All the code lives in one place.
- Simplified Dependency Management: The repository houses the complete codebase and every component is integrated at its head version, so dependency management becomes much easier. This also reduces reliance on artifact management tools like Nexus or Artifactory.
- Coding Styles / Architectural Patterns: Standard coding styles and architectural patterns are much simpler to enforce and govern.
- Simplified Code Sharing: Teams can collaborate easily because they can see the complete repository. Learning from other teams' code can also speed up development.
- Large-Scale Code Refactoring: Commits are atomic across the entire codebase, so cross-module refactoring is much easier to implement.
- Continuous Deployment Pipeline: No new pipeline configuration is required; it can be as simple as configuring another folder from the repository.
- Diamond Dependency Problem (Simply Avoided): A diamond dependency conflict arises when several packages depend on the same shared package or library, but on different and incompatible versions of it. With a Mono-repo, such conflicts are avoided by keeping everyone on the HEAD revision of the shared code.
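To make the diamond dependency scenario concrete, here is a minimal Python sketch. The package names (`app`, `billing`, `reporting`, `common-utils`) and the version numbers are hypothetical, invented purely for illustration, and real package managers use much richer version constraints than the exact pins assumed here.

```python
from collections import defaultdict

# Hypothetical setup: each package pins an exact version of the shared
# libraries it depends on. All names and versions are made up.
PINNED_DEPENDENCIES = {
    "app": {"billing": "1.0", "reporting": "1.0"},
    "billing": {"common-utils": "1.4"},
    "reporting": {"common-utils": "2.0"},
    "common-utils": {},
}


def find_version_conflicts(root, pinned):
    """Walk the dependency graph from `root` and return every library
    requested at more than one version, mapped to
    {version: sorted list of packages requesting that version}."""
    requested = defaultdict(lambda: defaultdict(set))
    stack, seen = [root], set()
    while stack:
        package = stack.pop()
        if package in seen:
            continue
        seen.add(package)
        for library, version in pinned.get(package, {}).items():
            requested[library][version].add(package)
            stack.append(library)
    return {
        library: {v: sorted(users) for v, users in versions.items()}
        for library, versions in requested.items()
        if len(versions) > 1
    }


# Versioned world: billing and reporting disagree about common-utils.
print(find_version_conflicts("app", PINNED_DEPENDENCIES))

# Mono-repo model: nothing is pinned, every dependency resolves to HEAD.
AT_HEAD = {
    package: {library: "HEAD" for library in deps}
    for package, deps in PINNED_DEPENDENCIES.items()
}
print(find_version_conflicts("app", AT_HEAD))
```

In the first call, `common-utils` is reported because `billing` and `reporting` ask for different versions. In the second call the result is empty, because when every package builds against HEAD there is only ever one version to request.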
Drawbacks of using Mono-repo
- IDE Slowdown: As the codebase grows, the IDE slows down noticeably; a very large codebase requires longer indexing times.
- Git Slowdown: Git operations slow down as the number of files and versions grows.
- Broken Master: A broken master affects everyone working in the Mono-repo. This is a practical problem that has to be addressed with a defined process and supporting tooling.
- Long Build Times: Many applications face longer build times because the complete codebase is in a single repository, but this too can be avoided by having the right tooling and process in place.
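As an illustration of the kind of tooling that can keep build times manageable, one common approach is to rebuild only the projects touched by a change, plus everything that depends on them. The sketch below assumes a hypothetical repository layout and a hand-written dependency map; the directory names are invented, and real build tools derive this information from their own build files.

```python
# Hypothetical mono-repo layout: project directory -> projects it depends on.
PROJECT_DEPENDENCIES = {
    "libs/common": [],
    "services/orders": ["libs/common"],
    "services/payments": ["libs/common"],
    "apps/web": ["services/orders"],
}


def owning_project(path, projects):
    """Return the project whose directory contains `path`, or None."""
    for project in projects:
        if path == project or path.startswith(project + "/"):
            return project
    return None


def affected_projects(changed_files, dependencies):
    """Projects that need a rebuild: those containing a changed file,
    plus every project that depends on them directly or transitively."""
    affected = set()
    for path in changed_files:
        project = owning_project(path, dependencies)
        if project is not None:
            affected.add(project)

    # Keep expanding until no new dependent project is found.
    grew = True
    while grew:
        grew = False
        for project, deps in dependencies.items():
            if project not in affected and affected.intersection(deps):
                affected.add(project)
                grew = True
    return sorted(affected)


print(affected_projects(["services/orders/api.py"], PROJECT_DEPENDENCIES))
print(affected_projects(["libs/common/util.py"], PROJECT_DEPENDENCIES))
```

A change inside `services/orders` triggers a rebuild of that service and of `apps/web` only, while a change to the shared `libs/common` library triggers a rebuild of everything that depends on it.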
How to overcome
- Identify bottlenecks and bring in customized tooling to handle the growing codebase.
- Review all code before commit, using both automated tooling and manual review.
- Assign a set of owners to each directory to approve changes.
- Run tests and automated checks before and after commit.
- Automatically roll back a commit if it is found to cause breakage.
- Task Ordering: Build common libraries and building blocks first.
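The task ordering point can be sketched with a topological sort, which guarantees that every target is built only after the targets it depends on. This is a minimal sketch using `graphlib` from the Python standard library (Python 3.9+); the target names and the dependency graph are hypothetical.

```python
from graphlib import TopologicalSorter

# Hypothetical build graph: target -> set of targets that must be built first.
BUILD_GRAPH = {
    "libs/common": set(),
    "libs/auth": {"libs/common"},
    "services/orders": {"libs/common", "libs/auth"},
    "apps/web": {"services/orders"},
}


def build_order(graph):
    """Return the targets in an order where dependencies always come first.

    TopologicalSorter raises graphlib.CycleError if the graph has a cycle,
    which for a build means the dependency declarations need fixing.
    """
    return list(TopologicalSorter(graph).static_order())


order = build_order(BUILD_GRAPH)
print(order)
# ['libs/common', 'libs/auth', 'services/orders', 'apps/web']
```

The common library comes out first and the web application last, which matches the idea of building shared building blocks before the projects that consume them.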
The Multi Repository Construct
Software architectures today generally consist of smaller, more independent application modules that can be deployed and operated independently. Microservices are one such example.
A Multi-repo structure enables granular access and faster build times for the discrete modules or services of an application. As architectural patterns have changed, the Multi-repo structure has gained considerable popularity. A Multi-repo holds each discrete module or service in its own repository, and those repositories can be owned by the same team or by different teams.
Key characteristics of a Multi repository:
- Repo per Module: Each service or module has its own repository.
- Simplifies Polyglot: With a mix of languages and teams, Multi-repo is a good structure to adopt.
- Simplified DevOps Pipeline: Each new repository needs its own pipeline configuration, but CI pipeline execution becomes simpler.
Advantages of using Multi-repo
- Strong Ownership: Each repository has clear ownership. A small team can own a repository and develop and deploy it independently.
- Faster Build Time: The smaller the codebase, the faster the build, which in turn speeds up CI pipeline execution.
- Separate Repos: A broken master is isolated to its own repository.
- Versioning: Each repository manages its own dependencies and versions, and other repositories reference the versions it publishes.
Drawbacks of using Multi-repo
- Silos: Since the teams are scattered, they tend to drift into silos. Humans are good at creating boundaries, and they tend not to care what happens outside them.
- Code Style / Architectural Patterns: With different teams working on different codebases, enforcing guidelines and architectural patterns becomes challenging.
- Functionality Duplication (debatable)
- Multiple Commits: Related changes require separate commits in multiple repositories, even though they belong together.
- Dependency Resolution: The same dependency library can be at different versions in different repositories, which makes it really hard to maintain and upgrade.
- Version Drift: For instance, 10 different versions of Spring Boot, three different JDK versions, and numerous versions of dependent libraries.
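Version drift is easy to measure once each repository's dependencies are available in one place. The sketch below assumes the dependency manifests have already been collected into a dictionary; the repository names and the versions listed are hypothetical examples, and how the manifests are gathered (from `pom.xml`, `build.gradle`, and so on) is outside the scope of the sketch.

```python
from collections import defaultdict

# Hypothetical manifests, one per repository: dependency name -> version.
REPO_MANIFESTS = {
    "orders-service": {"spring-boot": "2.7.18", "jdk": "17"},
    "payments-service": {"spring-boot": "3.2.5", "jdk": "21"},
    "notifications-service": {"spring-boot": "2.7.18", "jdk": "11"},
}


def version_drift(manifests):
    """Map each dependency used at more than one version to
    {version: sorted list of repositories using that version}."""
    usage = defaultdict(lambda: defaultdict(list))
    for repo, deps in manifests.items():
        for name, version in deps.items():
            usage[name][version].append(repo)
    return {
        name: {version: sorted(repos) for version, repos in sorted(versions.items())}
        for name, versions in usage.items()
        if len(versions) > 1
    }


drift = version_drift(REPO_MANIFESTS)
for name, versions in drift.items():
    print(f"{name}: {len(versions)} versions in use")
    for version, repos in versions.items():
        print(f"  {version}: {', '.join(repos)}")
```

A report like this, run regularly across all repositories, shows where upgrades are lagging before the gap between versions becomes hard to close.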
How to overcome
- Collaboration Tools/Platforms: Tools and platforms such as Confluence, Jira, and SharePoint help teams collaborate, document, templatize, and share.
- An Environments Repo: A single separate repository for environment configuration makes things a lot easier.
- Dependency Version Management: Put dependency version management tools in place so that every dependency usage is tracked and managed.
Final Word — Which one to choose
We’ve now discussed both sides of Mono-repositories and Multi-repositories. There is no silver bullet, as each structure has its own benefits and drawbacks. To determine which repository structure will work, it is important to take a step back and analyze the nature of the application and the tools available in the enterprise that can support it. A brief study of different organizational patterns can also add value to the decision.
In closing, I’d say it is best to evaluate the points and scenarios discussed here against the situation your teams are in. Some of them may have a bigger impact than others. In my experience, unified knowledge across the organization is very important, and for smaller teams it is best to start with a Mono-repo. Larger and distributed teams would benefit more from a Multi-repo.