Published 2022-12-04
dream2nix

npm's biggest issues

My node modules journey

Recently I attended OceanSprint, an intense one-week Nix hackathon.

I never thought it would make so much sense to me, as I had only been using Nix on a very small scale: a few frameworks like dream2nix or node2nix to package my frontend application for corporate use.

At OceanSprint I started working on a completely new Node.js builder, heavily inspired by pnpm. I especially liked how pnpm brought isolation and symlinking into node_modules.

But that isolation comes at a very high cost, and I am not sure it is really worth it. How I came to that conclusion, and what I came up with in the new dream2nix builder for Node.js, is what this short post is about.

What's different in pnpm?

Contrary to npm, the original package manager of Node.js, pnpm uses a nested directory tree that allows private dependencies and deduplicates disk usage by relying heavily on symlinks.

The structure of pnpm allows strictly separating every package and its declared dependencies.

This is a huge benefit. ✅

But there is also a huge disadvantage. 👎

some broken packages rely on scope leakage

The npm registry contains some broken packages that rely on what I call scope leakage. They call code from packages that happen to appear somewhere in the tree without declaring a dependency on them, because some other package in their dependency tree already pulled them in. That does not work with pnpm's structure, because it strictly separates everything that is not declared explicitly.
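To see why scope leakage works at all in a flat tree, it helps to look at where Node searches for a package. The following is a minimal sketch of the documented lookup order (walk up the directory tree and try each node_modules folder on the way). The function name `lookupDirs` and the package name `broken-pkg` are made up for illustration, and only POSIX-style absolute paths are handled.

```javascript
// Lists the node_modules folders Node would try, in order, when a file
// inside `fromDir` calls require("some-package").
function lookupDirs(fromDir) {
  const parts = fromDir.split("/").filter(Boolean);
  const dirs = [];
  for (let i = parts.length; i >= 0; i--) {
    // A folder that is itself called node_modules gets no nested node_modules appended.
    if (i > 0 && parts[i - 1] === "node_modules") continue;
    dirs.push("/" + parts.slice(0, i).concat("node_modules").join("/"));
  }
  return dirs;
}

const dirs = lookupDirs("/app/node_modules/broken-pkg");
console.log(dirs);
// [ '/app/node_modules/broken-pkg/node_modules', '/app/node_modules', '/node_modules' ]
```

The second entry is the shared root folder. Anything that any other package caused to be installed there is visible to `broken-pkg`, declared or not.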

peerDependencies are wild

The second problem arises with so-called peerDependencies. Those are dependencies that are expected to be provided higher up in the tree. A package cannot depend on them explicitly, because that would either introduce multiple versions of a singleton (like React) or add a cyclic dependency, which is bad architecture and leads to even more problems.

Binary patching

A third problem is that binaries still MUST be found in the node_modules/.bin folder, but the actual binary and its dependencies are not located there. So all binaries must be wrapped in shell scripts that execute them from their actual installation path.
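Such a wrapper can be very small. Here is a simplified sketch of generating one. It is not pnpm's actual shim, and `makeBinWrapper`, `tool` and the path are hypothetical names chosen for the example.

```javascript
// Returns the text of a POSIX shell wrapper for node_modules/.bin/<tool>.
// `realScript` is the (hypothetical) path where the tool's entry script really lives.
function makeBinWrapper(realScript) {
  return [
    "#!/bin/sh",
    "# Run the real entry script so it resolves dependencies from its own location.",
    `exec node "${realScript}" "$@"`,
    "",
  ].join("\n");
}

const wrapper = makeBinWrapper(
  "/project/node_modules/.pnpm/tool@1.0.0/node_modules/tool/bin/cli.js"
);
console.log(wrapper);
```

Because the script is started from its real location, the usual upward lookup begins next to the package's own dependencies instead of in node_modules/.bin.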

Personal recommendation

Taken together, those problems outweigh the benefits of pnpm for me.

Especially because pnpm solves those problems by intentionally breaking the isolation. pnpm creates the folder node_modules/.pnpm/node_modules, which contains all indirect dependencies (all dependencies of your dependencies). That intermediate folder breaks the isolation, because every package can find anything there again. It may even be worse than before, because pnpm flattens the tree into that folder without resolving any conflicts.

On top of that, the very complex structure, the patches, and the cross-linking of things that could potentially have been a peerDependency make everything very hard to debug and even harder to understand.

That's why I would not recommend using pnpm.

How does npm create the 'flat' node_modules?

OK, back to the roots.

What does a valid npm package look like?

npm states in its docs:

About package formats

A package is any of the following:

a) A folder containing a program described by a package.json file.
b) A gzipped tarball containing (a).
c) A URL that resolves to (b).
d) A <name>@<version> that is published on the registry with (c).
e) A <name>@<tag> that points to (d).
f) A <name> that has a latest tag satisfying (e).
g) A git url that, when cloned, results in (a).

That means a valid package is just a folder containing a package.json.

That folder might be packed into a tarball and uploaded to a registry.

Then the simplest possible 'flat' node_modules looks like this:

node_modules/
├── pname
.   └── package.json -> `"name": "pname", "version": "1.0.0", ...`
.
.

The user specifies their dependencies, and npm creates a flat folder like this so that the Node runtime can resolve import and require statements.

For example, I created a dependency tree that looks like this:

uml diagram

With no conflicts a flat dependency folder of npm will be created like this:

node_modules/
├── a
│   └── package.json
├── b
│   └── package.json
├── c
│   └── package.json
└── d
    └── package.json

npm allows packages with very large dependency trees, and within those large trees potential conflicts are allowed. In more detail: every package is allowed to have private dependencies, but that privacy is not reflected in the node_modules directory structure.

So if I change the tree slightly so that a conflict arises:

uml diagram

As there is no version in the folder structure that npm generates, it is not possible to have both C in version 1.0.0 and C in version 2.0.0 in the same folder. In that case npm generates a nested node_modules folder at every place where the other version of C is needed. One version still lives in the top-level directory. All dependents that need a different version of C get a local copy of that version.

node_modules/
├── a (depends on C^2.0.0)
│   ├── node_modules
│   │   └── c
│   │       └── package.json (2.0.0)
│   └── package.json
├── b (depends on C^1.0.0)
│   └── package.json
├── c
│   └── package.json (1.0.0)
└── d
    └── package.json

There are many things that are problematic with this approach. Flat package directories would be possible if we used some sort of content-addressable storage. In the case of npm it would be enough to add the version to the package names, like this:

node_modules/
├── a@1.0.0
│   └── package.json
├── b@1.0.0
│   └── package.json
├── c@1.0.0
│   └── package.json
├── c@2.0.0
│   └── package.json
└── d@1.0.0
    └── package.json

As you can see, C appears twice in the flat node_modules.

But this is not supported in the Node ecosystem. We cannot just rename the paths and use content-addressable storage like this.

That is where pnpm tries to shine. Just to be clear from the beginning: pnpm is arbitrarily complex and is definitely not the solution to this.

I just want to discuss the link structure that pnpm utilizes.

With pnpm the conflict tree from above looks like the following:

node_modules/
├── a -> .pnpm/a@1.1.0/node_modules/a
└── .pnpm
    ├── a@1.1.0
    │   └── node_modules
# ---------- where `a` finds its dependencies --------
    │       ├── a
    │       ├── b -> ../../b@1.0.0/node_modules/b
    │       └── c -> ../../c@2.0.0/node_modules/c
# ----------------------------------------------------
    ├── b@1.0.0
    │   └── node_modules
    │       ├── b
    │       ├── c -> ../../c@1.0.0/node_modules/c
    │       └── d -> ../../d@1.0.0/node_modules/d
    ├── c@1.0.0
    │   └── node_modules
    │       └── c
    ├── c@2.0.0
    │   └── node_modules
    │       └── c
    └── d@1.0.0
        └── node_modules
            └── d
This promises a bunch of benefits:

Isolation between dependencies

Unlike in the flat node_modules, packages can only access their direct dependencies.

In the concrete example, a can only access b@1.0.0 and c@2.0.0, as specified in the package.json of a. It cannot accidentally access c@1.0.0, because that lives somewhere else in the directory tree.

  • Deduplication, by using symlinks
  • Private dependencies by separating each package into its own subdirectory
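The link structure can be derived mechanically from the resolved dependency graph. Below is a small sketch that computes which symlink points to which store folder for the example tree. The `resolved` object and the helper names are my own illustration and not pnpm's internal data model. Targets are written relative to the project folder to keep the output easy to read.

```javascript
// Hypothetical resolved graph for the example: id -> package name and direct dependency ids.
const resolved = {
  "a@1.1.0": { name: "a", deps: ["b@1.0.0", "c@2.0.0"] },
  "b@1.0.0": { name: "b", deps: ["c@1.0.0", "d@1.0.0"] },
  "c@1.0.0": { name: "c", deps: [] },
  "c@2.0.0": { name: "c", deps: [] },
  "d@1.0.0": { name: "d", deps: [] },
};

// Where a package's real files live inside the .pnpm folder.
const storePath = (id) => `node_modules/.pnpm/${id}/node_modules/${resolved[id].name}`;

// Maps every symlink location to the store folder it points at.
function symlinkPlan(rootDeps) {
  const links = {};
  for (const id of rootDeps) {
    links[`node_modules/${resolved[id].name}`] = storePath(id);
  }
  for (const [id, pkg] of Object.entries(resolved)) {
    for (const depId of pkg.deps) {
      // Each dependency is linked next to the package that declared it.
      links[`node_modules/.pnpm/${id}/node_modules/${resolved[depId].name}`] = storePath(depId);
    }
  }
  return links;
}

const links = symlinkPlan(["a@1.1.0"]);
console.log(links);
```

Only names that a package declared end up next to it, which is exactly the isolation property: `a` gets a link to c@2.0.0 and nothing that leads to c@1.0.0.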

flat node_modules are vulnerable to slight changes

In flat node_modules, conflicts are resolved by npm. But there is no single right way to resolve them. There are multiple valid ways to flatten the tree.

uml diagram

To resolve the conflict, which can be represented as just:

type-fest = ["0.20.2","3.3.0"]

Both of the following variants are fully valid node_modules structures.

Variant A

node_modules/
├── ansi-escapes        requires type-fest^3.0.0
│   └── node_modules
│       └── type-fest   3.3.0
├── globals             requires type-fest^0.20.2
└── type-fest           0.20.2

Variant B

node_modules/
├── ansi-escapes        requires type-fest^3.0.0
├── globals             requires type-fest^0.20.2
│   └── node_modules
│       └── type-fest   0.20.2
└── type-fest           3.3.0

(For simplicity, all other dependencies are hidden.)

Imagine being a package manager like npm: how would you resolve the tree? Since both solutions are valid, there is no right or wrong way.

type-fest = ["0.20.2","3.3.0"]

Which means you must somehow install both versions.

One can live in the root node_modules. The other one has to be copied and nested in the node_modules of the parent that requires it.

This also implies that direct dependencies must always stay where they are required. Only nested conflicts can be moved around!

To solve that problem, you have to come up with a well-defined rule. Options include:

Pick a root dependency version:

  • Pick the first version -> 0.20.2
  • Pick the last version -> 3.3.0
  • Pick the highest semver -> 3.3.0
  • Pick the version with most parents so we save copy time -> e.g. 0.20.2 (we don't know yet)
  • Pick the version that minimizes the size of the tree.
  • and so on.

The nested dependencies are then all the other versions, because there can only be one version in the root.
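A few of these rules can be written down in a couple of lines. The sketch below is only an illustration: the names `strategies` and `splitConflict` are made up, and `compareVersions` handles plain major.minor.patch strings only (no pre-release tags), which is an assumption that a real semver library would not need.

```javascript
// Numeric comparison of "major.minor.patch" strings (no pre-release handling).
function compareVersions(x, y) {
  const a = x.split(".").map(Number);
  const b = y.split(".").map(Number);
  for (let i = 0; i < 3; i++) {
    if (a[i] !== b[i]) return a[i] - b[i];
  }
  return 0;
}

// Each strategy picks the version that is installed in the root node_modules.
const strategies = {
  first: (versions) => versions[0],
  last: (versions) => versions[versions.length - 1],
  highest: (versions) => {
    const sorted = [...versions].sort(compareVersions);
    return sorted[sorted.length - 1];
  },
};

// Splits a conflict into the root version and the versions that must be nested.
function splitConflict(versions, strategy) {
  const root = strategies[strategy](versions);
  return { root, nested: versions.filter((v) => v !== root) };
}

const typeFest = ["0.20.2", "3.3.0"];
console.log(splitConflict(typeFest, "first"));   // { root: '0.20.2', nested: [ '3.3.0' ] }
console.log(splitConflict(typeFest, "highest")); // { root: '3.3.0', nested: [ '0.20.2' ] }
```

The "first" rule produces Variant A from above, while "highest" produces Variant B.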

But even slight changes in that choice can have a huge impact on the whole node_modules structure and on where packages get duplicated, because every conflict resolution triggers a chain reaction on everything that was below that dependency in the tree. Those packages might no longer find their dependencies, because they are now hidden in nested folders.

Another problem arises when you apply the nesting: you always need to copy all conflicting children recursively (thankfully only the conflicting ones). For every use of a conflicting dependency, we need to copy the whole section of the tree that has conflicts, like this:

# pseudocode
let
  conflictTree = filterTree (dependency: hasConflict dependency) subTree;
in
  copy conflictTree

Nobody really knows how large that tree will be. Plus, we need to copy it every time.
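The same idea as a small runnable sketch in JavaScript. The tree shape (`name`, `version`, `deps`), the package `shared` and the function names are assumptions made for this example. A child counts as conflicting when the root node_modules holds a different version of it.

```javascript
// Keeps only the children that cannot be served from the root node_modules.
function conflictSubtree(node, rootVersions) {
  const deps = node.deps
    .filter((dep) => rootVersions[dep.name] !== dep.version)
    .map((dep) => conflictSubtree(dep, rootVersions));
  return { name: node.name, version: node.version, deps };
}

// Counts how many package folders a subtree contains, i.e. how much must be copied.
function countPackages(node) {
  return 1 + node.deps.reduce((sum, dep) => sum + countPackages(dep), 0);
}

// Hypothetical example: `shared` matches the root version, `imaginary` does not.
const rootVersions = { "type-fest": "0.20.2", imaginary: "1.0.0", shared: "1.0.0" };
const typeFest = {
  name: "type-fest",
  version: "3.3.0",
  deps: [
    { name: "imaginary", version: "2.2.0", deps: [] },
    { name: "shared", version: "1.0.0", deps: [] },
  ],
};

const toCopy = conflictSubtree(typeFest, rootVersions);
console.log(countPackages(toCopy)); // 2 (type-fest itself plus imaginary)
```

This count has to be paid again for every package that needs the nested version.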

Nested conflicts are possible after resolving

When resolving dependency conflicts, npm pulls dependencies that cannot be installed in the root node_modules into the private node_modules folder of a package. But such a dependency may itself depend on a package in a version that differs from the one its parent resolves to.

For example, take a dependency like type-fest from the conflict shown above, and let it depend on another package. To make things worse, let it depend on a package that causes a conflict in the tree, like this:

uml diagram

type-fest@3.3.0 now depends on imaginary^2.0.0, but the Node resolution algorithm would find imaginary in the root node_modules in version 1.0.0.

To solve that problem, every time type-fest@3.3.0 appears, all of its conflicting dependencies have to be nested in another node_modules folder.

In our concrete case a nested node_modules folder could be created which then looks like this:

node_modules/
├── ansi-escapes        requires type-fest^3.0.0
│   └── node_modules
│       └── type-fest   3.3.0 -> requires imaginary ^2.0.0
│           └── node_modules
│               └── imaginary   2.2.0
├── imaginary           1.0.0
├── globals             requires type-fest^0.20.2
└── type-fest           0.20.2

However the tree that npm would generate looks like this:

node_modules/
├── ansi-escapes        requires type-fest^3.0.0
│   └── node_modules
│       ├── type-fest   3.3.0 -> requires imaginary ^2.0.0
│       └── imaginary   2.2.0
├── imaginary           1.0.0
├── globals             requires type-fest^0.20.2
└── type-fest           0.20.2

Since imaginary@2.2.0 does not conflict with the dependencies of its parent's parent (ansi-escapes), it can be pulled upward into the higher node_modules. That saves some copying, as dependencies can also share the same sub-dependencies.
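The check behind that decision can be sketched as a single function. This is my simplified reading of the rule and not npm's actual code: `canHoist` and the object shapes are hypothetical, and the sketch ignores other packages in the same folder that might rely on finding a different version further up.

```javascript
// `dep`: { name, version } of the nested dependency we would like to move up.
// `target`: the package whose node_modules folder would receive it, with
//   requires    - name -> version that `target` itself resolves to
//   nodeModules - name -> version already installed in its private node_modules
function canHoist(dep, target) {
  const required = target.requires[dep.name];
  const installed = target.nodeModules[dep.name];
  const clashesWithOwnDependency = required !== undefined && required !== dep.version;
  const clashesWithInstalled = installed !== undefined && installed !== dep.version;
  return !clashesWithOwnDependency && !clashesWithInstalled;
}

const imaginary = { name: "imaginary", version: "2.2.0" };

// ansi-escapes only needs type-fest, so imaginary@2.2.0 may move next to type-fest.
const ansiEscapes = {
  requires: { "type-fest": "3.3.0" },
  nodeModules: { "type-fest": "3.3.0" },
};
console.log(canHoist(imaginary, ansiEscapes)); // true

// Hypothetical variant: the parent itself uses imaginary@1.0.0 from the root.
const parentUsingOldVersion = {
  requires: { "type-fest": "3.3.0", imaginary: "1.0.0" },
  nodeModules: { "type-fest": "3.3.0" },
};
console.log(canHoist(imaginary, parentUsingOldVersion)); // false
```

In the second case the hoisted copy would shadow the root version for the parent itself, so the dependency has to stay nested.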

In terms of the dependency tree this means a transformation into the following:

uml diagram

The parent ansi-escapes now depends directly on imaginary@2.2.0. This transformation is the same as flattening the dependency tree, and it is only possible if the parent has no direct dependency conflict.

recursion

The same problem can occur over and over again in a recursive manner, as shown above. At the same time it gets less and less likely to occur, as conflicts can also be pulled into the root node_modules as soon as there is a possibility in the conflict chain.

What we implemented in dream2nix

There are different strategies for solving or avoiding conflicts:

  • Make all dependencies private
  • Flatten the dependency tree and pick one version. The other one must be installed privately.
  • Pull private dependencies into the parent if they don't cause conflicts.

In the new dream2nix builder, we calculate all root dependencies first; all other conflicts then depend on what has been installed in the root node_modules. If we encounter a conflict between versions that have the same precedence, dream2nix picks the higher version. Then we create a nesting that contains the other version. We do this recursively, so that all possible conflicts are resolved.

We do not yet pull dependencies into the parent's outer scope, though. The necessary scope checking and the recursive conflicts just add too much complexity.
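To make the strategy more tangible, here is a rough JavaScript sketch of it. It is not the dream2nix code (the real builder is written in Nix). The graph, the placeholder versions of ansi-escapes and globals, and all function names are assumptions, and so is the reading of "same precedence" as "same depth in the tree".

```javascript
// Hypothetical resolved graph: "name@version" -> ids of direct dependencies.
// The versions of ansi-escapes and globals are placeholders.
const graph = {
  "app@1.0.0": ["ansi-escapes@1.0.0", "globals@1.0.0"],
  "ansi-escapes@1.0.0": ["type-fest@3.3.0"],
  "globals@1.0.0": ["type-fest@0.20.2"],
  "type-fest@3.3.0": [],
  "type-fest@0.20.2": [],
};

const split = (id) => {
  const at = id.lastIndexOf("@");
  return [id.slice(0, at), id.slice(at + 1)];
};

// Numeric comparison of "major.minor.patch" strings (no pre-release handling).
function compareVersions(x, y) {
  const a = x.split(".").map(Number);
  const b = y.split(".").map(Number);
  for (let i = 0; i < 3; i++) {
    if (a[i] !== b[i]) return a[i] - b[i];
  }
  return 0;
}

// Step 1: fill the root level by level; within one level the higher version wins.
function rootVersions(top) {
  const rootLevel = {};
  const seen = new Set([top]);
  let level = graph[top];
  while (level.length > 0) {
    const picked = {};
    const next = [];
    for (const id of level) {
      const [name, version] = split(id);
      const better = !(name in picked) || compareVersions(version, picked[name]) > 0;
      if (!(name in rootLevel) && better) picked[name] = version;
      if (!seen.has(id)) {
        seen.add(id);
        next.push(...graph[id]);
      }
    }
    Object.assign(rootLevel, picked);
    level = next;
  }
  return rootLevel;
}

// Step 2: nest every dependency whose version is not the one visible from outside.
function nestedModules(id, visible) {
  const nested = {};
  for (const depId of graph[id]) {
    const [name, version] = split(depId);
    if (visible[name] !== version) nested[name] = version;
  }
  const scope = { ...visible, ...nested };
  const result = {};
  for (const [name, version] of Object.entries(nested)) {
    result[`${name}@${version}`] = nestedModules(`${name}@${version}`, scope);
  }
  return result;
}

const rootLevel = rootVersions("app@1.0.0");
const layout = {};
for (const [name, version] of Object.entries(rootLevel)) {
  layout[`${name}@${version}`] = nestedModules(`${name}@${version}`, rootLevel);
}
console.log(JSON.stringify(layout, null, 2));
```

For the type-fest example this yields Variant B: type-fest@3.3.0 lives in the root, and globals gets a private copy of type-fest@0.20.2.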

Private Dependencies

In the following case type-fest is a private dependency of ansi-escapes

node_modules/
├── ansi-escapes        requires type-fest^3.0.0
│   └── node_modules
│       └── type-fest   3.3.0 -> requires imaginary ^2.0.0

Pros:

  • isolation
  • no conflicts

Cons:

  • huge duplication if many other dependencies declare the same dependency

Every dependency that appears multiple times in the tree has to be copied for every occurrence. (Possibly >100x)

Flatten the dependency tree and pick one version. The other one must be installed privately

Pros:

  • de-duplication

Cons:

  • conflicts possible
  • resolve logic required
    • optimal resolving is controversial
    • can be very complex
    • error-prone
    • nobody knows how deterministically npm actually reproduces the node_modules (without a lockfile, which also locks the node_modules structure)

Pull private dependencies into the parent (optimization)

We could pull dependencies up in the tree as long as there is no conflict. When we pull a private dependency out of its scope, it might conflict with another version of the same package that already exists in the outer scope. In most cases pulling up is possible, and doing so saves a little directory nesting. It also works around undeclared access to sub-dependencies, which is very bad practice, but due to the structure and history of node_modules some packages might still do that.

Pros:

  • de-duplication

Cons:

  • nested conflicts could be possible
  • complex

my last word

All this hellish complexity is only needed because the node modules system 💩 has no content-addressable storage. Even adding just the version to the path name would have been enough to avoid all this.

Yet Node's module system is the best illustrative example to show that small mistakes in very ordinary fundamentals can have a huge impact.

Thank you for reading my blog.