Every conversation your company has should make the next one smarter.
Calls happen. Emails flow. Decisions get made. Then the context vanishes into recordings nobody rewatches and threads nobody resurfaces. We build a system that captures it all, structures it, and keeps it searchable forever.
[The problem]
Your company's knowledge walks out the door every evening.
Every call generates insights: pain points, decisions, commitments, real context about what a client needs. Those insights live in a recording nobody rewatches, in notes one person wrote and nobody else reads, in someone's memory. When that person is busy, on vacation, or gone, the context goes with them.
[How we solved it]
Pipeline
- 01
Call recording appears in CRM
When a recording gets linked in your CRM, a webhook fires immediately. No manual trigger, no batch job. The pipeline starts within seconds of the conversation ending.
- 02
Automatic classification
The system determines the call type: sales discovery, client check-in, project review. Each type routes to the right extraction template. A sales call extracts buying signals and objections. A client call extracts status, blockers, and satisfaction indicators.
- 03
Structured extraction
The analysis layer processes the full conversation and extracts structured data: summary, pain points with verbatim quotes, objections, commitments, next steps, sentiment, competitors, and budget signals. Every output is validated against a schema. If entity recall drops below 80%, the digest gets flagged for review.
- 04
Hierarchical storage
Each digest is stored as hierarchical chunks in your vector database. One parent chunk for broad context, plus three to five child chunks with precise detail. Broad questions get broad answers. Specific questions get specific answers. Same source material.
- 05
Knowledge base enrichment
The digest merges into a rolling entity brief for that company: a living document that gets richer with every interaction. Decisions route to your team's decisions channel. Summaries post to the briefs channel. CRM notes update automatically.
The institutional knowledge problem
Every company with 10-30 people has the same invisible problem. Knowledge is generated constantly in sales calls, client check-ins, internal discussions, and email threads, yet almost none of it compounds.
A sales call happens. The rep walks away knowing the prospect's pain points, budget, timeline, and competitive landscape. That context lives in their head and, if you're lucky, in a paragraph of CRM notes written before the next call. The recording sits in a tool nobody opens again. The email follow-up adds context. The Slack thread adds more. Each piece lives in a different system, retrievable only by the person who was there.
Most teams already have the recording software, the CRM, and the email platform. The tools exist. But raw recordings and unstructured threads are only data. Knowledge is structured, searchable, and available to everyone.
The cost shows up in predictable ways. A client mentions a blocker in a check-in, but the project lead was absent and doesn't hear about it until standup. A prospect raises an objection that another rep handled last month, but that conversation is buried in a recording nobody will find. Someone leaves, and six months of relationship context leaves with them.
How the digestion pipeline works
Your team already generates this intelligence. Calls happen across sales and delivery. Insights accumulate in recordings and email threads. This pipeline does the digestion automatically, consistently, and without anyone pressing a button.
When a call ends. A webhook fires the moment a recording gets linked in your CRM. The first step is classification: sales discovery or client delivery? The distinction matters because different call types produce different intelligence. A sales call needs buying signals, objections, and budget indicators. A client call needs project status, blockers, and satisfaction signals.
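A minimal sketch of the classify-then-route step, assuming a hypothetical `RecordingLinkedEvent` payload (the real CRM webhook body will look different). The two call types and their fields come from the paragraph above. The keyword classifier is only a stand-in for the model-based classification, so the example runs without external services.

```typescript
// Hypothetical event shape: the real CRM webhook payload will differ.
interface RecordingLinkedEvent {
  recordingId: string;
  companyId: string;
  transcript: string;
}

type CallType = "sales_discovery" | "client_delivery";

// Each call type routes to an extraction template listing what to pull out.
const TEMPLATES: Record<CallType, readonly string[]> = {
  sales_discovery: ["buying_signals", "objections", "budget_indicators"],
  client_delivery: ["project_status", "blockers", "satisfaction_signals"],
};

// Stand-in classifier. The described pipeline classifies with a model;
// keyword matching is used here only so the sketch runs offline.
function classifyCall(transcript: string): CallType {
  const salesCues = /\b(pricing|budget|demo|proposal|contract)\b/i;
  return salesCues.test(transcript) ? "sales_discovery" : "client_delivery";
}

// What the webhook handler does once the event arrives: classify, then route.
function routeRecording(event: RecordingLinkedEvent) {
  const callType = classifyCall(event.transcript);
  return {
    recordingId: event.recordingId,
    companyId: event.companyId,
    callType,
    template: TEMPLATES[callType],
  };
}

const routed = routeRecording({
  recordingId: "rec_123",
  companyId: "acme",
  transcript: "Before we talk budget, can you walk us through pricing?",
});
console.log(routed.callType, routed.template.join(", "));
```

Keeping the templates as data means a new call type is one more table entry rather than another branch in the handler.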
Structured extraction. The analysis layer processes the full conversation against the appropriate template. It pulls specific pain points with verbatim quotes attached, identifies objections and how they were addressed, catalogs commitments from both sides, and captures any competitors mentioned, budget signals, and the decision-making process described on the call.
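One way to picture the extraction output is as a typed record. The field names below are hypothetical, not the pipeline's actual schema. Because pain points carry verbatim quotes, a cheap follow-up check is to confirm that each quote really occurs in the transcript. That check is an illustration added here, not something the pipeline is described as doing.

```typescript
// Hypothetical digest shape for a sales discovery call.
interface PainPoint {
  summary: string;
  quote: string; // meant to be a verbatim excerpt from the transcript
}

interface Commitment {
  owner: "us" | "them";
  description: string;
}

interface SalesDigest {
  summary: string;
  painPoints: PainPoint[];
  objections: { objection: string; response: string }[];
  commitments: Commitment[];
  competitors: string[];
  budgetSignals: string[];
  decisionProcess: string;
}

// Return pain points whose quote cannot be found in the transcript, so
// paraphrased or invented "quotes" can be caught before storage.
function unverifiedQuotes(digest: SalesDigest, transcript: string): PainPoint[] {
  const normalize = (s: string) => s.toLowerCase().replace(/\s+/g, " ").trim();
  const haystack = normalize(transcript);
  return digest.painPoints.filter((p) => !haystack.includes(normalize(p.quote)));
}

const transcript =
  "Honestly, our onboarding takes three weeks and we lose people halfway through.";

const example: SalesDigest = {
  summary: "Discovery call about onboarding speed.",
  painPoints: [
    { summary: "Slow onboarding", quote: "our onboarding takes three weeks" },
    { summary: "Churn mid-onboarding", quote: "we lose half our users" },
  ],
  objections: [],
  commitments: [{ owner: "us", description: "Send onboarding benchmark" }],
  competitors: [],
  budgetSignals: [],
  decisionProcess: "Not discussed",
};

console.log(unverifiedQuotes(example, transcript).map((p) => p.summary));
```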
Every output is validated against a strict schema. If entity recall drops below 80% (that is, fewer than four in five of the entities discussed make it into the digest), the digest gets flagged for review rather than silently stored with gaps. Quality control is non-negotiable when building a knowledge base your company will rely on for months.
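A sketch of the validate-then-gate step under stated assumptions. The stack uses Zod for schema validation, but this version hand-rolls the checks so it runs without dependencies. How the reference list of discussed entities is produced is not described, so `referenceEntities` is assumed to come from elsewhere, and flagging schema failures (rather than discarding them) is also an assumption.

```typescript
// Hypothetical minimal digest shape used only for this sketch.
interface Digest {
  summary: string;
  entities: string[];
  nextSteps: string[];
}

type GateResult = { action: "store" } | { action: "flag"; reason: string };

const RECALL_THRESHOLD = 0.8; // the 80% rule described above

const isStringArray = (x: unknown): x is string[] =>
  Array.isArray(x) && x.every((item) => typeof item === "string");

// Dependency-free stand-in for a Zod schema: returns a list of problems.
function schemaErrors(value: unknown): string[] {
  if (typeof value !== "object" || value === null) return ["digest is not an object"];
  const v = value as Record<string, unknown>;
  const errors: string[] = [];
  const summary = v["summary"];
  if (typeof summary !== "string" || summary.trim() === "") {
    errors.push("summary must be a non-empty string");
  }
  if (!isStringArray(v["entities"])) errors.push("entities must be a string array");
  if (!isStringArray(v["nextSteps"])) errors.push("nextSteps must be a string array");
  return errors;
}

// Entity recall: fraction of reference entities that appear in the digest.
function entityRecall(reference: string[], extracted: string[]): number {
  if (reference.length === 0) return 1;
  const found = new Set(extracted.map((e) => e.toLowerCase()));
  const hits = reference.filter((e) => found.has(e.toLowerCase())).length;
  return hits / reference.length;
}

// Store only outputs that pass the schema and meet the recall threshold.
function gateDigest(value: unknown, referenceEntities: string[]): GateResult {
  const errors = schemaErrors(value);
  if (errors.length > 0) return { action: "flag", reason: errors.join("; ") };
  const recall = entityRecall(referenceEntities, (value as Digest).entities);
  return recall >= RECALL_THRESHOLD
    ? { action: "store" }
    : { action: "flag", reason: `entity recall ${recall.toFixed(2)} below ${RECALL_THRESHOLD}` };
}

const referenceEntities = ["Acme", "Dana", "Q3 migration", "CompetitorX", "SOC 2"];
const draft = {
  summary: "Kickoff recap",
  entities: ["Acme", "Dana", "Q3 migration"],
  nextSteps: ["Send SOW"],
};
console.log(gateDigest(draft, referenceEntities));
```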
Storage and distribution. The validated digest gets stored as hierarchical chunks and simultaneously distributed. A structured analysis posts as a CRM note on the relevant company record. A summary hits your briefs channel for team visibility. Decisions and commitments route to a dedicated channel so nothing gets lost between meetings.
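The fan-out can be sketched as a pure planning function that decides what goes where; separate senders (CRM API, Slack API) would then deliver each item and are not shown. The channel names `#briefs` and `#decisions` and the `ValidatedDigest` shape are hypothetical.

```typescript
// Hypothetical shapes and channel names for illustration.
type Target =
  | { kind: "crm_note"; companyId: string }
  | { kind: "slack"; channel: string };

interface Delivery {
  target: Target;
  text: string;
}

interface ValidatedDigest {
  companyId: string;
  summary: string;
  analysis: string;
  decisions: string[];
  commitments: string[];
}

const BRIEFS_CHANNEL = "#briefs";
const DECISIONS_CHANNEL = "#decisions";

// Decide every destination for one digest; sending happens elsewhere.
function planDistribution(d: ValidatedDigest): Delivery[] {
  const deliveries: Delivery[] = [
    { target: { kind: "crm_note", companyId: d.companyId }, text: d.analysis },
    { target: { kind: "slack", channel: BRIEFS_CHANNEL }, text: d.summary },
  ];
  const actionable = [
    ...d.decisions.map((x) => `Decision: ${x}`),
    ...d.commitments.map((x) => `Commitment: ${x}`),
  ];
  // Only post to the decisions channel when there is something to act on.
  if (actionable.length > 0) {
    deliveries.push({
      target: { kind: "slack", channel: DECISIONS_CHANNEL },
      text: actionable.join("\n"),
    });
  }
  return deliveries;
}

const targetLabel = (t: Target): string =>
  t.kind === "crm_note" ? `crm:${t.companyId}` : `slack:${t.channel}`;

const digest: ValidatedDigest = {
  companyId: "acme",
  summary: "Acme check-in: migration on track, one blocker.",
  analysis: "Status: on track. Blocker: API keys pending. Sentiment: positive.",
  decisions: ["Phase the rollout by region"],
  commitments: ["Acme sends API keys by Tuesday"],
};
console.log(planDistribution(digest).map((item) => targetLabel(item.target)));
```

Separating planning from sending keeps the routing rules testable on their own and lets the actual deliveries run in parallel.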
Email digestion. The same architecture runs on a separate track for email. An hourly job scans your CRM for new threads, digests each one through the same extraction process, validates the output, and stores it as hierarchical chunks. Emails carry context that matters and get the same treatment as calls.
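The hourly email track can be sketched as a cursor-based sweep: each run picks up threads updated since the last stored timestamp and returns the new cursor to persist. The `EmailThread` shape and `runEmailSweep` are hypothetical, and keying on "updated since" (so a thread with new replies gets digested again) is an assumption. The hourly schedule itself, a cron expression such as `0 * * * *`, would live in the orchestrator and is not shown.

```typescript
// Hypothetical thread shape; a real CRM API returns richer records.
interface EmailThread {
  id: string;
  companyId: string;
  updatedAt: number; // epoch milliseconds of the latest message
}

// One hourly run: digest threads newer than the cursor, oldest first, and
// return the cursor to persist. If `digest` throws, no new cursor is returned,
// so the next run retries the batch (digestion should therefore be idempotent).
function runEmailSweep(
  threads: EmailThread[],
  cursor: number,
  digest: (thread: EmailThread) => void,
): number {
  const fresh = threads
    .filter((t) => t.updatedAt > cursor)
    .sort((a, b) => a.updatedAt - b.updatedAt);
  for (const thread of fresh) digest(thread);
  return fresh.length > 0 ? fresh[fresh.length - 1].updatedAt : cursor;
}

const seen: string[] = [];
const next = runEmailSweep(
  [
    { id: "t1", companyId: "acme", updatedAt: 100 },
    { id: "t3", companyId: "acme", updatedAt: 300 },
    { id: "t2", companyId: "globex", updatedAt: 200 },
  ],
  150,
  (t) => {
    seen.push(t.id);
  },
);
console.log(seen, next);
```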
Entity brief merging. This is the part that makes the system compound. After every digestion, the extracted intelligence merges into a rolling entity brief for that company. The brief is a living summary that grows richer with every interaction. By the fifth touchpoint, it contains a detailed picture of pain points, decision process, internal dynamics, and the full history of commitments on both sides. Every conversation adds a layer. Nothing gets overwritten.
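The "nothing gets overwritten" property can be sketched as an append-only merge: each digestion adds dated entries and returns a new brief object instead of editing earlier ones. The brief described above is a prose summary; this structured log is a hypothetical representation of what could sit underneath it, and all type and function names are illustrative.

```typescript
// Hypothetical structures for illustration only.
interface BriefEntry {
  source: string; // e.g. a call or email id
  at: string; // ISO date of the interaction
  items: string[];
}

interface EntityBrief {
  companyId: string;
  touchpoints: number;
  painPoints: BriefEntry[];
  commitments: BriefEntry[];
}

interface DigestUpdate {
  source: string;
  at: string;
  painPoints: string[];
  commitments: string[];
}

// Append-only merge: returns a new brief and leaves earlier entries untouched.
function mergeIntoBrief(brief: EntityBrief, update: DigestUpdate): EntityBrief {
  const entry = (items: string[]): BriefEntry[] =>
    items.length > 0 ? [{ source: update.source, at: update.at, items }] : [];
  return {
    ...brief,
    touchpoints: brief.touchpoints + 1,
    painPoints: [...brief.painPoints, ...entry(update.painPoints)],
    commitments: [...brief.commitments, ...entry(update.commitments)],
  };
}

const empty: EntityBrief = { companyId: "acme", touchpoints: 0, painPoints: [], commitments: [] };
const afterCall = mergeIntoBrief(empty, {
  source: "call:rec_123",
  at: "2024-05-02",
  painPoints: ["Onboarding takes three weeks"],
  commitments: ["We send a benchmark by Friday"],
});
const afterEmail = mergeIntoBrief(afterCall, {
  source: "email:t2",
  at: "2024-05-06",
  painPoints: ["Security review blocks new vendors"],
  commitments: [],
});
console.log(afterEmail.touchpoints, afterEmail.painPoints.length, empty.painPoints.length);
```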
Hierarchical storage and why structure matters
Storing digestion output is straightforward. Making it retrievable months later, by someone absent from the original conversation, for a question nobody anticipated, is the hard part.
Each digest produces one parent chunk (400-600 tokens) capturing broad context: what happened, who was involved, what matters. Below that sit three to five child chunks (200-400 tokens each) with precise details: specific quotes, individual commitments, particular objections, exact next steps.
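A sketch of the parent/child split, assuming a rough four-characters-per-token estimate in place of the embedding model's real tokenizer. Detail items are packed into child chunks of at most 400 estimated tokens, and each child points back to its parent. `chunkDigest` and the id format are hypothetical, and the overview is assumed to be written to the 400-600 token target upstream.

```typescript
interface Chunk {
  id: string;
  parentId: string | null;
  level: "parent" | "child";
  text: string;
}

// Rough heuristic (~4 characters per token for English text); an assumption,
// not the tokenizer of any particular embedding model.
const estimateTokens = (text: string): number => Math.ceil(text.length / 4);

const CHILD_MAX_TOKENS = 400;

// Hypothetical chunker: one parent for the overview, children packed from details.
function chunkDigest(digestId: string, overview: string, details: string[]): Chunk[] {
  const parent: Chunk = { id: `${digestId}:p`, parentId: null, level: "parent", text: overview };
  const children: Chunk[] = [];
  let buffer: string[] = [];

  const flush = () => {
    if (buffer.length === 0) return;
    children.push({
      id: `${digestId}:c${children.length}`,
      parentId: parent.id,
      level: "child",
      text: buffer.join("\n"),
    });
    buffer = [];
  };

  for (const detail of details) {
    // Start a new child if adding this detail would exceed the budget.
    if (buffer.length > 0 && estimateTokens([...buffer, detail].join("\n")) > CHILD_MAX_TOKENS) {
      flush();
    }
    buffer.push(detail);
  }
  flush();
  return [parent, ...children];
}

// Six 600-character details (~150 tokens each) pack two per child.
const details = [0, 1, 2, 3, 4, 5].map((i) => `Detail ${i}: ` + "x".repeat(590));
const chunks = chunkDigest("d1", "Kickoff call with Acme: goals, risks, owners.", details);
console.log(chunks.map((c) => `${c.id} (${estimateTokens(c.text)} tokens)`));
```

A single detail longer than 400 estimated tokens still becomes its own oversized child here; a production version would need to split it further.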
When someone asks a broad question ("What do we know about this company?"), parent chunks surface first with a coherent overview. When someone asks a specific question ("What did their CTO say about the migration timeline?"), child chunks surface with the precise quote and context.
The embeddings come from a 3072-dimensional model. Higher dimensionality gives the model room to capture finer semantic distinctions, so related phrasings like "they are concerned about cost" and "they mentioned budget constraints around infrastructure spending" can register as distinct points in the vector space rather than collapsing together.
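Similarity between two embeddings is commonly scored with cosine similarity (pgvector exposes cosine distance, one minus cosine similarity, through its `<=>` operator). The sketch shows the arithmetic plus a dimension guard for the 3072-dimensional vectors described above. The helper names are hypothetical, and the tiny vectors in the usage lines are toy values, not real embeddings.

```typescript
const EMBEDDING_DIMENSIONS = 3072;

// Cosine similarity: dot(a, b) / (|a| * |b|), ranging from -1 to 1.
function cosineSimilarity(a: number[], b: number[]): number {
  if (a.length !== b.length) {
    throw new Error(`dimension mismatch: ${a.length} vs ${b.length}`);
  }
  let dot = 0;
  let normA = 0;
  let normB = 0;
  for (let i = 0; i < a.length; i++) {
    dot += a[i] * b[i];
    normA += a[i] * a[i];
    normB += b[i] * b[i];
  }
  if (normA === 0 || normB === 0) return 0; // treat zero vectors as unrelated
  return dot / (Math.sqrt(normA) * Math.sqrt(normB));
}

// Guard against storing vectors from a different model or a truncated response.
function checkEmbedding(vector: number[]): number[] {
  if (vector.length !== EMBEDDING_DIMENSIONS) {
    throw new Error(`expected ${EMBEDDING_DIMENSIONS} dimensions, got ${vector.length}`);
  }
  return vector;
}

// Toy 3-dimensional vectors, not real embeddings.
console.log(cosineSimilarity([1, 2, 3], [2, 4, 6])); // parallel: approximately 1
console.log(cosineSimilarity([1, 0, 0], [0, 1, 0])); // orthogonal: 0
```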
Retrieval uses hybrid search: vector similarity combined with full-text search via reciprocal rank fusion. Vector search catches semantic matches across different wording. Full-text search catches exact matches like names and product references. Both combined produce results neither achieves alone.
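Reciprocal rank fusion scores each document as the sum of 1 / (k + rank) over every ranked list it appears in, so items ranked well by both searches rise to the top. The sketch uses k = 60, a common default from the original RRF work; the value this pipeline uses is not stated. In a Supabase setup the fusion can also run in SQL inside Postgres; this TypeScript version only illustrates the arithmetic, and the chunk ids are toy inputs.

```typescript
interface Fused {
  id: string;
  score: number;
}

// Reciprocal rank fusion: score(d) = sum over lists of 1 / (k + rank), rank starting at 1.
function reciprocalRankFusion(rankings: string[][], k = 60): Fused[] {
  const scores = new Map<string, number>();
  for (const ranking of rankings) {
    ranking.forEach((id, index) => {
      const rank = index + 1;
      scores.set(id, (scores.get(id) ?? 0) + 1 / (k + rank));
    });
  }
  const fused: Fused[] = [];
  scores.forEach((score, id) => fused.push({ id, score }));
  return fused.sort((a, b) => b.score - a.score);
}

// Toy inputs: chunk ids as ranked by each search.
const vectorHits = ["chunk-a", "chunk-b", "chunk-c"];
const fullTextHits = ["chunk-c", "chunk-a", "chunk-d"];
const fused = reciprocalRankFusion([vectorHits, fullTextHits]);
console.log(fused.map((f) => f.id));
```

Here `chunk-a` wins because it sits near the top of both lists, even though `chunk-c` is first in full-text search. RRF only uses ranks, so the two searches never need comparable raw scores.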
Every chunk carries metadata: entity type, client identifier, source system, participants, topics, and timestamps. When you need context about a specific company from the last 90 days, metadata narrows the search space before semantic matching starts.
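A sketch of scoping before ranking: filter by client and a 90-day window first, then score only what remains. The metadata field names are hypothetical, and `similarity` stands in for whichever vector or hybrid score is used. In pgvector the same idea would be a `WHERE` clause on metadata columns combined with an `ORDER BY` on vector distance.

```typescript
// Metadata fields mirror the list above; exact names are hypothetical.
interface ChunkMetadata {
  entityType: string;
  clientId: string;
  sourceSystem: "call" | "email";
  participants: string[];
  topics: string[];
  timestamp: number; // epoch milliseconds
}

interface StoredChunk {
  id: string;
  metadata: ChunkMetadata;
}

const DAY_MS = 24 * 60 * 60 * 1000;

// Narrow the candidate set by client and recency before any similarity scoring.
function scopeChunks(chunks: StoredChunk[], clientId: string, now: number, windowDays = 90): StoredChunk[] {
  const since = now - windowDays * DAY_MS;
  return chunks.filter((c) => c.metadata.clientId === clientId && c.metadata.timestamp >= since);
}

// Rank only the scoped chunks; `similarity` is a stand-in scoring function.
function scopedSearch(
  chunks: StoredChunk[],
  clientId: string,
  now: number,
  similarity: (c: StoredChunk) => number,
  topK = 5,
): StoredChunk[] {
  return scopeChunks(chunks, clientId, now)
    .map((chunk) => ({ chunk, score: similarity(chunk) }))
    .sort((a, b) => b.score - a.score)
    .slice(0, topK)
    .map((r) => r.chunk);
}

const now = Date.UTC(2024, 5, 1);
const make = (id: string, clientId: string, daysAgo: number): StoredChunk => ({
  id,
  metadata: {
    entityType: "company",
    clientId,
    sourceSystem: "call",
    participants: [],
    topics: [],
    timestamp: now - daysAgo * DAY_MS,
  },
});
const corpus = [make("a", "acme", 10), make("b", "acme", 120), make("c", "globex", 5), make("d", "acme", 30)];
const hits = scopedSearch(corpus, "acme", now, (c) => (c.id === "d" ? 0.9 : 0.5));
console.log(hits.map((c) => c.id));
```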
What this enables downstream
The knowledge digestion pipeline is the foundation that makes the rest of your AI-native stack work.
Pre-call briefs get richer. The inbound lead intelligence pipeline pulls from this same knowledge base. The more calls and emails digested, the more context the briefs contain. A prospect's second call brief includes everything from the first call, every email exchanged since, and the relationship trajectory. Brief quality compounds automatically.
The self-improving loop has data to analyze. You cannot analyze which conversation approaches lead to closed deals if conversations are trapped in unstructured recordings. You cannot identify which objections correlate with lost opportunities if objections are absent from the data. The digestion pipeline produces the structured substrate that pattern recognition requires.
ICP scoring refines itself. Early scoring relies on surface signals: company size, funding stage, industry. Over time, digested calls reveal behavioral indicators that only emerge from accumulated data. How prospects describe problems. Which questions they ask first. How they talk about timelines. The scoring model incorporates these signals because the pipeline captured them in structured form.
Institutional memory becomes a real asset. New team members get the full history of every client relationship, every sales conversation, every decision and its reasoning. When someone goes on vacation, their context stays. When someone leaves, the knowledge stays.
This is what "if AI can't see it, AI can't help with it" means in practice. The digestion pipeline makes everything visible. Without this layer, AI-native is a label. With it, every interaction your company has makes the next one smarter.
[Results]
Outcomes
Interactions captured
Call to intelligence
Knowledge retained
[Stack]
Tools used
- Attio CRM: Webhook trigger & data source
- Claude: Structured extraction & analysis
- Supabase pgvector: Vector storage & hybrid search
- Gemini Embeddings: 3072-dimensional text embeddings
- Slack: Team notifications & decisions
- Trigger.dev: Workflow orchestration
- Zod: Output validation & quality control