Mercurial manifest sharding
Problem statement: imagine a repository with 1 million to 1 billion files.
Somewhere in this range, the RAM overhead of a single manifest becomes a problem.
Warning: on checkout, we don’t want to materialize the working copy on the local machine.
Limitations of large manifests/repos:
- manifest too large for RAM
- checkout too large for local disk
- clone size too large for local disk
- manifest resolution takes too much CPU
- 100k+ files on HFS+ perform badly
Proposal: manifest entries could have pointers to other manifests
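A minimal sketch of the pointer idea in Python, assuming a hypothetical in-memory representation: `ShardPointer`, `ShardedManifest` and the `load` callback are illustrative names, not Mercurial's manifest API. Each entry is either a file node or a pointer to another manifest, and a shard is only loaded when a lookup or walk actually reaches it, which is what would keep RAM bounded for operations that touch a small part of the tree.

```python
from dataclasses import dataclass
from typing import Callable, Dict, Iterator, Optional, Tuple, Union


@dataclass(frozen=True)
class ShardPointer:
    """Hypothetical manifest entry that points at another manifest (a shard)."""
    manifest_id: str


# An entry is either a file node (hex string) or a pointer to a shard.
Entry = Union[str, ShardPointer]


@dataclass
class ShardedManifest:
    entries: Dict[str, Entry]
    # Hypothetical loader that fetches a shard by id (local store or server).
    load: Callable[[str], "ShardedManifest"]

    def find(self, path: str) -> Optional[str]:
        """Look up one file, loading only the shards along its path."""
        entry = self.entries.get(path)
        if isinstance(entry, str):
            return entry
        head, sep, rest = path.partition("/")
        pointer = self.entries.get(head)
        if sep and isinstance(pointer, ShardPointer):
            return self.load(pointer.manifest_id).find(rest)
        return None

    def walk(self, prefix: str = "") -> Iterator[Tuple[str, str]]:
        """Yield (path, filenode) pairs, loading each shard only when reached."""
        for name in sorted(self.entries):
            entry = self.entries[name]
            if isinstance(entry, ShardPointer):
                yield from self.load(entry.manifest_id).walk(prefix + name + "/")
            else:
                yield prefix + name, entry


# Usage: the root manifest delegates everything under mobile/ to one shard.
loaded = []
store: Dict[str, ShardedManifest] = {}


def load_shard(manifest_id: str) -> ShardedManifest:
    loaded.append(manifest_id)  # record which shards were actually fetched
    return store[manifest_id]


store["shard-mobile"] = ShardedManifest({"app.py": "aa11", "README": "bb22"}, load_shard)
root = ShardedManifest({"Makefile": "cc33", "mobile": ShardPointer("shard-mobile")}, load_shard)
```

Looking up `Makefile` loads nothing; looking up `mobile/app.py` loads only the `shard-mobile` manifest.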
It is not yet clear what should trigger sharding. One option is custom sharding based on common checkout scenarios.
Discarded: start a new shard whenever the hash of a filename, modulo some constant, equals 0. This is deterministic, but causes lots of churn.
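For reference, a sketch of the discarded scheme as we read it: walk the sorted file list and begin a new shard at every path whose hash is 0 modulo some constant. SHA-1, the use of the first 8 bytes, and the `modulus` parameter are assumptions for illustration; `hash_mod_shards` is a hypothetical name.

```python
import hashlib
from typing import List


def hash_mod_shards(paths: List[str], modulus: int) -> List[List[str]]:
    """Split sorted paths into shards; a path whose hash is 0 mod `modulus` starts a new shard."""
    shards: List[List[str]] = []
    current: List[str] = []
    for path in sorted(paths):
        # SHA-1 of the file name; only its first 8 bytes are used as the number.
        value = int.from_bytes(hashlib.sha1(path.encode("utf-8")).digest()[:8], "big")
        if current and value % modulus == 0:
            shards.append(current)  # close the shard before the boundary path
            current = []
        current.append(path)
    if current:
        shards.append(current)
    return shards


files = [f"dir{i % 7}/file{i}.txt" for i in range(200)]
shards = hash_mod_shards(files, 16)
```

The layout depends only on the set of file names, so every client computes the same shards. On the other hand, adding or removing a boundary file splits or merges neighbouring shards, and the boundaries ignore directory structure entirely.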
Pulling sharded manifests will require client support.
Discussion
Recursive directory hashes:
We could compute the hash of a submanifest by iterating recursively over the directories in its content. The resulting hash would be unique to the commit’s directory structure and independent of how the manifest is sharded (similar to how git computes tree hashes).
Pros:
- allows changing the manifest format in the future without changing the hashes
- allows delivering custom-sharded manifests to users on demand
Cons:
- more expensive hash algorithm
- requires a new manifest version flag, since the format won’t be backwards compatible
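A sketch of a git-tree-style recursive hash, assuming SHA-1 and a made-up serialization of one `kind name value` line per entry; the `@<id>` encoding for shard pointers and the helper names are hypothetical. Each directory's hash covers its sorted entries and its subdirectories' hashes, so the same files stored unsharded or sharded per directory produce the same root hash.

```python
import hashlib
from typing import Dict, Union

# A directory maps names to file nodes (str) or to subdirectories (dict).
Tree = Dict[str, Union[str, "Tree"]]


def build_tree(flat: Dict[str, str]) -> Tree:
    """Turn {'a/b/c.txt': node} into nested directory dicts."""
    root: Tree = {}
    for path, node in flat.items():
        *dirs, name = path.split("/")
        current = root
        for part in dirs:
            current = current.setdefault(part, {})
        current[name] = node
    return root


def tree_hash(tree: Tree) -> str:
    """Hash sorted entries; subdirectories contribute their own recursive hash."""
    h = hashlib.sha1()
    for name in sorted(tree):
        entry = tree[name]
        if isinstance(entry, dict):
            line = f"d {name} {tree_hash(entry)}\n"
        else:
            line = f"f {name} {entry}\n"
        h.update(line.encode("utf-8"))
    return h.hexdigest()


def flatten(manifest_id: str, store: Dict[str, Dict[str, str]], prefix: str = "") -> Dict[str, str]:
    """Expand a sharded manifest; a value '@<id>' points at another shard (hypothetical encoding)."""
    flat: Dict[str, str] = {}
    for name, value in store[manifest_id].items():
        if value.startswith("@"):
            flat.update(flatten(value[1:], store, prefix + name + "/"))
        else:
            flat[prefix + name] = value
    return flat


# Two storage layouts of the same files: unsharded, and sharded by directory.
unsharded = {"root": {"README": "n1", "mobile/app.py": "n2", "mobile/lib/util.py": "n3"}}
sharded = {
    "root": {"README": "n1", "mobile": "@m"},
    "m": {"app.py": "n2", "lib": "@ml"},
    "ml": {"util.py": "n3"},
}
```

The extra cost listed under Cons shows up here: computing the hash visits every directory, and changing one file changes the hash of every directory on its path up to the root.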
Sparse checkouts
$ hg sparse --include mobile/
It doesn’t matter what else the repo contains; you only get the mobile directory.
$ hg sparse --enable-profile mobile
Profiles live in the repo as .hgsparse files.
Proposal: put team-specific .hgsparse files in the teams’ own directories, so teams can change them without contention (hg sparse --enable-profile mobile[/.hgsparse]).
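A sketch of how `--enable-profile mobile[/.hgsparse]` could resolve a team's profile, plus a parser for it. The file format (an INI-like file with `[include]` and `[exclude]` sections and `#` comments) is an assumption loosely modeled on Mercurial's sparse profiles, and `profile_path`/`parse_profile` are hypothetical helpers.

```python
import posixpath
from typing import Dict, List, Optional


def profile_path(name: str) -> str:
    """Resolve 'mobile' or 'mobile/.hgsparse' to the profile file 'mobile/.hgsparse'."""
    if posixpath.basename(name) == ".hgsparse":
        return name
    return posixpath.join(name, ".hgsparse")


def parse_profile(text: str) -> Dict[str, List[str]]:
    """Parse an assumed INI-like profile with [include] and [exclude] sections."""
    rules: Dict[str, List[str]] = {"include": [], "exclude": []}
    section: Optional[str] = None
    for raw in text.splitlines():
        line = raw.strip()
        if not line or line.startswith("#"):
            continue  # skip blank lines and comments
        if line in ("[include]", "[exclude]"):
            section = line[1:-1]
        elif section is None:
            raise ValueError(f"rule outside a section: {line!r}")
        else:
            rules[section].append(line)
    return rules
```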
Future magic: hg clone --sparse mobile (to avoid an initial full clone).
Merges get a little complicated.
Matching currently uses regexps. The proposal is to use directories for includes, but allow regex/glob for excludes (e.g. to avoid writing certain types of files, like Photoshop files).
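A sketch of the proposed matching rules: includes are whole directories (plain prefix checks), while excludes may be regexes or globs. The `re:`/`glob:` prefixes borrow Mercurial's pattern-kind syntax, but `SparseMatcher` itself is hypothetical, and globs here use Python's `fnmatch`, whose `*` also matches `/` (unlike Mercurial's glob).

```python
import fnmatch
import re
from typing import Iterable


class SparseMatcher:
    """Include whole directories; exclude by regex ('re:') or glob ('glob:' or bare)."""

    def __init__(self, include_dirs: Iterable[str], exclude_patterns: Iterable[str]):
        # Normalize 'mobile' and 'mobile/' to the prefix 'mobile/'.
        self.include_prefixes = [d.rstrip("/") + "/" for d in include_dirs]
        self.excludes = []
        for pattern in exclude_patterns:
            if pattern.startswith("re:"):
                self.excludes.append(re.compile(pattern[len("re:"):]))
            else:
                glob = pattern[len("glob:"):] if pattern.startswith("glob:") else pattern
                # fnmatch's '*' also matches '/', so '*.psd' excludes .psd files at any depth.
                self.excludes.append(re.compile(fnmatch.translate(glob)))

    def __call__(self, path: str) -> bool:
        if not any(path.startswith(prefix) for prefix in self.include_prefixes):
            return False
        return not any(rx.match(path) for rx in self.excludes)


matcher = SparseMatcher(["mobile"], ["glob:*.psd", r"re:.*/generated/"])
manifest = ["mobile/app.py", "mobile/art/logo.psd", "mobile/generated/api.py", "server/main.py"]
checked_out = [path for path in manifest if matcher(path)]
```

Normalizing includes to a trailing `/` keeps `mobile` from also matching `mobilefoo/`.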
Narrow changelog
Ellipsis nodes listing heads and roots.
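One way to picture ellipsis nodes: keep a subset of changesets and give each kept changeset its nearest kept ancestors as parents, hiding the history in between. A simplified sketch, assuming the DAG is a plain dict from node to parent list; `ellipsis_parents` is hypothetical, and unlike a real changelog it does not cap the result at two parents.

```python
from typing import Dict, List, Set


def ellipsis_parents(parents: Dict[str, List[str]], kept: Set[str]) -> Dict[str, List[str]]:
    """Map each kept node to its nearest kept ancestors, skipping omitted nodes."""
    result: Dict[str, List[str]] = {}
    for node in kept:
        found: Set[str] = set()
        seen: Set[str] = set()
        stack = list(parents.get(node, []))
        while stack:
            parent = stack.pop()
            if parent in seen:
                continue
            seen.add(parent)
            if parent in kept:
                found.add(parent)  # stop: this kept node stands in for its history
            else:
                stack.extend(parents.get(parent, []))  # omitted: look further back
        result[node] = sorted(found)
    return result


# History a-b-c-d-e plus a side branch x off b that merges back into d.
history = {"a": [], "b": ["a"], "c": ["b"], "x": ["b"], "d": ["c", "x"], "e": ["d"]}
```

With `kept = {"a", "c", "e"}`, `e` ends up with parents `a` and `c`, because the omitted side branch `x` still connects it back to `a`.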
Running log and bisect across ellipsis nodes is difficult. Should hg log only consider downloaded changesets by default?
Can the DAG have dangling links that don’t cause errors when we hit them?
Should hg log have a remote fallback? durin thinks so.
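A sketch of the behaviour these questions point at: a log walk that visits only downloaded changesets, treats a parent it has never seen as a dangling link instead of an error, and optionally falls back to the remote. `local_log` and the `fetch_remote` callback are hypothetical, not existing hg APIs.

```python
from typing import Callable, Dict, Iterator, List, Optional


def local_log(
    heads: List[str],
    parents: Dict[str, List[str]],
    fetch_remote: Optional[Callable[[str], List[str]]] = None,
) -> Iterator[str]:
    """Yield ancestors of `heads`; parents missing locally are dangling links.

    Without `fetch_remote`, the walk silently stops at a dangling link.
    With it, the missing node's parents are fetched and the walk continues.
    """
    known = dict(parents)  # local copy so fetched entries don't leak to the caller
    seen = set()
    stack = list(heads)
    while stack:
        node = stack.pop()
        if node in seen:
            continue
        seen.add(node)
        if node not in known:
            if fetch_remote is None:
                continue  # dangling link: not downloaded, stop here
            known[node] = fetch_remote(node)  # hypothetical remote fallback
        yield node
        stack.extend(known[node])


# Only e, d and c were downloaded; c's parent b is a dangling link.
downloaded = {"e": ["d"], "d": ["c"], "c": ["b"]}
print(list(local_log(["e"], downloaded)))  # ['e', 'd', 'c']
```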