Using pathlib.Path, how to efficiently check if a directory resides anywhere under another directory?

I prefer to work mainly with pathlib.Path objects. However, in the example code below, I ended up comparing strings like this:

if str(dir2).startswith(str(dir1)):

dir1 and dir2 are Paths. Is there a better way to find out if dir2 is an arbitrarily nested subdirectory of dir1?

from pathlib import Path

mydirs = [Path('/a/a/a/a'), Path('/a/a/'), Path('/a/b/'), Path('/a/a/b'), Path('/b/a/a/'), Path('/a/a/a/a/a/a/'), Path('/a/b/a/a/a/')]

mydirs.sort(key = lambda x: len(x.parts))
roots = mydirs.copy()

for dir1 in mydirs:
    for dir2 in mydirs:
        if dir1 == dir2 or len(dir2.parts) <= len(dir1.parts):
            continue
        if str(dir2).startswith(str(dir1)):
            roots.remove(dir2)
            print(f'pruned subdir {str(dir2)} of {str(dir1)}')

print(roots)

Answer

Instead of

str(dir2).startswith(str(dir1))

try

dir1 in dir2.parents
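
To illustrate why this is more than a style choice, here is a small sketch (the example paths and the helper name `is_under` are made up for illustration). A plain string prefix test also matches a sibling directory whose name merely starts with the same characters, whereas the `parents` test compares whole path components:

```python
from pathlib import Path

def is_under(child: Path, ancestor: Path) -> bool:
    """Return True if `child` is nested at any depth below `ancestor`."""
    # Path.parents lists every logical ancestor; it does not touch the filesystem.
    return ancestor in child.parents

base = Path('/a/a')
nested = Path('/a/a/b/c')
sibling = Path('/a/ab')  # not under /a/a, but shares its string prefix

print(str(sibling).startswith(str(base)))  # True: a false positive
print(is_under(sibling, base))             # False
print(is_under(nested, base))              # True
print(is_under(base, base))                # False: a path is not its own parent
```

Note that this comparison is purely lexical, so paths that may contain `..` or symlinks would need `resolve()` first.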

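If you have many paths, you probably shouldn’t test all pairs. Here’s a set-based solution: process the paths from shortest to longest and keep a path only if none of its parents has already been kept as a root: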

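# Shortest paths first, so ancestors are handled before their descendants.
mydirs.sort(key=lambda path: len(path.parts))
roots = set()
for path in mydirs:
    # Keep a path only if none of its parents is already a root.
    if roots.isdisjoint(path.parents):
        roots.add(path)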
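
For reference, a self-contained run of the set approach on the directories from the question might look like this (a sketch; the function name `prune_nested` is only illustrative and not part of pathlib):

```python
from pathlib import Path

def prune_nested(dirs):
    """Keep only the directories that are not nested under another one in `dirs`."""
    roots = set()
    # Shortest paths first, so every possible ancestor is seen before its descendants.
    for d in sorted(dirs, key=lambda p: len(p.parts)):
        # Keep `d` only if none of its ancestors has already been kept as a root.
        if roots.isdisjoint(d.parents):
            roots.add(d)
    return roots

mydirs = [Path('/a/a/a/a'), Path('/a/a/'), Path('/a/b/'), Path('/a/a/b'),
          Path('/b/a/a/'), Path('/a/a/a/a/a/a/'), Path('/a/b/a/a/a/')]

roots = prune_nested(mydirs)
print(sorted(roots))
```

With this input, only /a/a, /a/b and /b/a/a remain. Each path is checked against its own ancestors with set lookups, so the work grows with the number of paths times their depth (plus the sort) rather than with the number of pairs.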

By the way, your solution is buggy, because it removes items from a list while iterating over it. For example, for

mydirs = [Path('/a/'), Path('/a/a/'), Path('/a/b/')]

you only prune '/a/a/' but keep '/a/b/'.
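
The effect can be reproduced with a minimal sketch. It assumes both loops run over the same list that is being modified, which is the pattern described above; the variable names are illustrative. The second half shows one possible fix that builds the result separately instead of mutating the list in place:

```python
from pathlib import Path

mydirs = [Path('/a/'), Path('/a/a/'), Path('/a/b/')]

# Buggy pattern: removing from `roots` while looping over it.
roots = mydirs.copy()
for dir1 in roots:
    for dir2 in roots:
        if dir1 in dir2.parents:
            roots.remove(dir2)  # later items shift left, so the loop skips one
print(roots)  # /a/b is still present

# Safer: never mutate the list being iterated; build the result separately.
fixed = [d for d in mydirs if not any(other in d.parents for other in mydirs)]
print(fixed)  # only /a remains
```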