Skip the footage archaeology
Hand over terabytes of raw footage and get back an organized, searchable picture of everything you shot — no more scrubbing through hours to find the moment you remember.
SherpaEdit handles the two most tedious phases of the edit: archaeology and the rough cut.
The heavy analysis happens on your own machine, so your rushes never leave the building and you skip slow, costly cloud uploads.
Get back an assembly built around a genuine story — not a shallow summary — with the strongest narration and dialog already chosen and matching B-roll laid in.
Everything arrives as an editable Premiere or Final Cut Pro sequence, so you pick up right where the tedious work ends and spend your time on the craft.
I built SherpaEdit AI to take the manual drudgery out of documentary and video editing while preserving the human element of storytelling. The pipeline breaks the job into four steps.
Sending terabytes of video to a frontier LLM is far too expensive, so the heavy lifting happens locally. For every clip, analyze_clips.py builds a structured JSON manifest using local models: WhisperX for transcripts, pyannote for diarization, and Moondream to describe the visuals.
The result is a single manifest an LLM can read to pick A-roll lines and matching B-roll shots — without ever needing access to the actual video files.
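As a rough illustration of what such a manifest could look like, here is a minimal sketch. The field names (`clip_id`, `visual_description`, `transcript`, and so on) and the helper functions are assumptions made for this example, not the actual schema that analyze_clips.py writes.

```python
import json
from dataclasses import dataclass, field, asdict


@dataclass
class TranscriptSegment:
    start: float   # seconds from the start of the clip
    end: float
    speaker: str   # diarization label, e.g. "SPEAKER_00"
    text: str


@dataclass
class ClipEntry:
    clip_id: str                 # hypothetical identifier for one source file
    duration: float              # clip length in seconds
    visual_description: str      # text description of what the clip shows
    transcript: list[TranscriptSegment] = field(default_factory=list)


def build_manifest(clips: list[ClipEntry]) -> str:
    """Serialize all clip entries into one JSON document."""
    return json.dumps({"clips": [asdict(c) for c in clips]}, indent=2)


def find_lines(manifest_json: str, keyword: str) -> list[tuple[str, float, float]]:
    """Return (clip_id, start, end) for every spoken line containing keyword."""
    manifest = json.loads(manifest_json)
    hits = []
    for clip in manifest["clips"]:
        for seg in clip["transcript"]:
            if keyword.lower() in seg["text"].lower():
                hits.append((clip["clip_id"], seg["start"], seg["end"]))
    return hits
```

Because every entry carries timestamps and a clip identifier, anything that reads the manifest can point at an exact moment in the footage using text alone.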
In testing, a single general-purpose prompt produced shallow story arcs, so I built a multi-agent system that works with any frontier LLM (Claude, Gemini). A story agent proposes the arc, which the user can then refine in conversation with the agent. Next, an A-roll agent picks precise narration and dialog, a B-roll agent selects visuals while avoiding repetition, and a quality-check agent reviews pacing and checks for amateur editing mistakes.
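The hand-off between agents can be sketched as a chain of prompts, each one receiving the previous agent's output. This is only an illustration under stated assumptions: the `LLM` callable, the `ROLES` text, and `run_pipeline` are hypothetical names, and the model call is injected so the sketch is not tied to any vendor's API.

```python
from typing import Callable, Iterable

# Any function that takes a prompt string and returns the model's reply.
LLM = Callable[[str], str]

# Hypothetical role instructions; the real prompts are not shown in the source.
ROLES = {
    "story": "Propose a story arc for this footage.",
    "a_roll": "Pick the exact narration and dialog lines that carry the arc.",
    "b_roll": "Pick visuals for each line and do not reuse a shot.",
    "qc": "Review pacing and flag amateur editing mistakes.",
}


def ask(llm: LLM, role: str, context: str) -> str:
    """Send one role-specific prompt and return the reply."""
    return llm(f"ROLE: {role}\nTASK: {ROLES[role]}\n\n{context}")


def run_pipeline(llm: LLM, manifest_json: str,
                 user_feedback: Iterable[str] = ()) -> dict[str, str]:
    # 1. The story agent proposes an arc from the manifest alone.
    arc = ask(llm, "story", f"MANIFEST:\n{manifest_json}")
    # 2. Each editor note triggers one refinement round with the story agent.
    for note in user_feedback:
        arc = ask(llm, "story", f"CURRENT ARC:\n{arc}\n\nEDITOR NOTE:\n{note}")
    # 3. A-roll, then B-roll, then quality check, each building on the last.
    a_roll = ask(llm, "a_roll", f"ARC:\n{arc}\n\nMANIFEST:\n{manifest_json}")
    b_roll = ask(llm, "b_roll", f"A-ROLL:\n{a_roll}\n\nMANIFEST:\n{manifest_json}")
    review = ask(llm, "qc", f"ARC:\n{arc}\n\nA-ROLL:\n{a_roll}\n\nB-ROLL:\n{b_roll}")
    return {"arc": arc, "a_roll": a_roll, "b_roll": b_roll, "review": review}
```

Splitting the work this way keeps each prompt narrow, which is the property the single general-purpose prompt lacked.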
With the arc decided, the agents lay down the chosen A-roll and a second layer of matching B-roll, automating the rough cut that an assistant editor or story producer would normally assemble by hand.
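One way to picture the assembly step is as two tracks: A-roll lines placed end to end, with a B-roll shot laid over each line. The sketch below is an assumption about how that could work, not SherpaEdit's actual logic. `TimelineItem`, `assemble`, and the rule of starting each B-roll shot at its first frame are all hypothetical.

```python
from dataclasses import dataclass


@dataclass
class TimelineItem:
    track: int             # 1 = A-roll, 2 = B-roll layered above it
    clip_id: str
    source_in: float       # seconds into the source clip
    source_out: float
    timeline_start: float  # seconds from the start of the sequence


def assemble(a_roll: list[tuple[str, float, float]],
             b_roll_candidates: list[list[str]]) -> list[TimelineItem]:
    """Place A-roll lines back to back and cover each with an unused B-roll clip.

    a_roll holds (clip_id, source_in, source_out) in story order.
    b_roll_candidates[i] lists candidate B-roll clip ids for a_roll[i],
    best first. A line with no unused candidate is left uncovered.
    """
    items: list[TimelineItem] = []
    used: set[str] = set()
    cursor = 0.0
    for (clip_id, src_in, src_out), candidates in zip(a_roll, b_roll_candidates):
        length = src_out - src_in
        items.append(TimelineItem(1, clip_id, src_in, src_out, cursor))
        # Take the first candidate that has not appeared yet, to avoid repetition.
        choice = next((c for c in candidates if c not in used), None)
        if choice is not None:
            used.add(choice)
            # Simplification: use the B-roll clip from its first frame.
            items.append(TimelineItem(2, choice, 0.0, length, cursor))
        cursor += length
    return items
```

A real assembly would also need to check that each B-roll clip is long enough to cover its line; that check is left out here to keep the sketch short.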
The assembly is delivered as a real, editable timeline in Final Cut Pro or Premiere — the first two steps of the edit done, so you can spend your time on the craft that actually needs a human.
Check out the documentation, source files, and build guides to set up your own version.