Overview
Knowledge Bases are organized collections of documents that your agents can search and reference during conversations. Upload your company policies, research papers, technical documentation, and any reference materials to make them instantly searchable by your AI agents.
Key Features
Multi-Format Document Support
Upload documents in various formats and your agents will automatically understand the content:
| File Type | Formats | Notes |
|---|---|---|
| PDF Documents | .pdf | Text extraction with OCR fallback for scanned documents |
| Word Documents | .docx | Full formatting preservation |
| PowerPoint | .pptx | Slide content extraction |
| Spreadsheets | .xlsx, .xls, .xlsm | Table structure conversion |
| CSV Files | .csv | Structured data with automatic dialect detection |
| Images | .png, .jpg, .jpeg, .gif, .bmp, .webp | AI vision for text and content extraction |
| Text Files | .txt, .md, .json, .xml | Plain text processing |
| HTML Files | .htm, .html | Web content analysis |
| YAML Files | .yaml, .yml | Configuration and data files |
| Outlook Messages | .msg | Email content extraction |
Intelligent Processing Pipeline
Every file moves through a five-stage pipeline. Stage 3 forks into two durable outputs: the embedded chunks that power semantic search and, when an extraction schema is set, the structured data your agents later read as extracted_data.json.
Upload
Files are uploaded directly to secure cloud storage (Cloudflare R2). A record is created and background processing begins immediately.
Parse → Markdown
Text is extracted from each format and normalized to markdown with page markers ([[Page N]]). Scanned PDFs fall back to OCR; large files process asynchronously.
Chunk & Extract
Documents are split into page-aware chunks that preserve structure. If the knowledge base has an extraction schema, an AI model also writes structured JSON for each file.
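To make "page-aware" concrete, here is a minimal sketch of how markdown carrying [[Page N]] markers could be split into chunks that remember which pages they came from. It assumes a simple character budget per chunk and paragraph-level splitting; the Chunk type, the function names, and the default size are hypothetical, not the pipeline's actual implementation.

```python
import re
from dataclasses import dataclass

# Matches the page markers produced by the parse stage, e.g. "[[Page 3]]".
PAGE_MARKER = re.compile(r"\[\[Page (\d+)\]\]")


@dataclass
class Chunk:
    text: str
    pages: list[int]  # every page this chunk draws text from


def split_pages(markdown: str) -> list[tuple[int, str]]:
    """Return (page_number, page_text) pairs; text before the first marker is ignored."""
    parts = PAGE_MARKER.split(markdown)
    # re.split with one capture group yields [preamble, num, text, num, text, ...]
    return [(int(num), text.strip()) for num, text in zip(parts[1::2], parts[2::2])]


def chunk_pages(markdown: str, max_chars: int = 500) -> list[Chunk]:
    """Pack whole paragraphs into chunks of roughly max_chars, tracking source pages."""
    chunks: list[Chunk] = []
    buf: list[str] = []
    pages: list[int] = []
    size = 0
    for page, text in split_pages(markdown):
        for para in (p for p in text.split("\n\n") if p.strip()):
            if buf and size + len(para) > max_chars:
                # Budget exceeded: close the current chunk and start a new one.
                chunks.append(Chunk("\n\n".join(buf), pages))
                buf, pages, size = [], [], 0
            buf.append(para)
            if page not in pages:
                pages.append(page)
            size += len(para)
    if buf:
        chunks.append(Chunk("\n\n".join(buf), pages))
    return chunks


doc = "[[Page 1]]\nIntro paragraph.\n\nSecond paragraph.\n[[Page 2]]\nClosing paragraph."
for chunk in chunk_pages(doc, max_chars=40):
    print(chunk.pages, chunk.text.replace("\n\n", " | "))
```

Because each chunk carries its page list, a search hit can be reported with page numbers even when a chunk spans a page boundary.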
Embed
Each chunk is embedded in batches using your organization’s configured embedding model, producing searchable vectors.
Structured Data Extraction
Configure your knowledge bases to extract structured information from documents, creating an index of numeric and categorical fields for cross-document analysis:
- Custom JSON Schemas: Define exactly what data to extract from each file
- AI Model Selection: Choose which AI model processes your documents
- Automatic Processing: Extraction runs automatically as each uploaded file is processed
- Cross-Document Queries: Filter and aggregate across all files without scanning each one
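As an illustration, the sketch below shows an extraction schema an invoice knowledge base might use (standard JSON Schema, written as a Python dict) and the kind of filter-and-aggregate query the last bullet describes. The field names are hypothetical, and so is the assumed layout of the extracted data (a mapping from file name to that file's extracted object); check your knowledge base's actual extracted_data.json before relying on its shape.

```python
import json
from collections import defaultdict

# Hypothetical extraction schema for an invoices knowledge base.
INVOICE_SCHEMA = {
    "type": "object",
    "properties": {
        "vendor": {"type": "string"},
        "currency": {"type": "string"},
        "total": {"type": "number"},
        "status": {"type": "string", "enum": ["paid", "unpaid"]},
    },
    "required": ["vendor", "total"],
}


def unpaid_totals_by_vendor(extracted: dict[str, dict]) -> dict[str, float]:
    """Sum unpaid totals per vendor across every file without opening the files.

    Assumes `extracted` maps file name -> the object extracted for that file.
    """
    totals: dict[str, float] = defaultdict(float)
    for record in extracted.values():
        if record.get("status") == "unpaid":
            totals[record["vendor"]] += float(record["total"])
    return dict(totals)


# Example extracted data, shaped as assumed above.
sample = json.loads("""{
  "inv-001.pdf": {"vendor": "Acme", "total": 120.5, "status": "unpaid"},
  "inv-002.pdf": {"vendor": "Acme", "total": 79.5, "status": "unpaid"},
  "inv-003.pdf": {"vendor": "Globex", "total": 300, "status": "paid"}
}""")
print(unpaid_totals_by_vendor(sample))  # {'Acme': 200.0}
```

Keeping fields numeric or drawn from a fixed enum is what makes this kind of aggregation reliable; free-text fields are better left to search.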
How Agents Retrieve Knowledge
Once a knowledge base is processed, agents reach it through four tools (three for retrieval and one for writing back) plus the Python sandbox. Access is always organization-scoped, and you choose how broad it is per agent: either a selected list of knowledge bases or, with the Allow all knowledge bases option, every knowledge base in the organization (pair this with manage_knowledge_bases so the agent can discover them).
search_knowledge_base
Hybrid retrieval. Embeds the query, runs vector similarity and full-text search, then fuses the rankings. Returns the top-k chunks with file name, page numbers, and similarity scores.
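The fusion step can be pictured with reciprocal rank fusion (RRF), a common way to merge a vector ranking with a full-text ranking. This page does not say which fusion method search_knowledge_base uses, so treat the following as an illustrative sketch; the chunk IDs, the constant k=60, and the function name are assumptions.

```python
def reciprocal_rank_fusion(
    rankings: list[list[str]], k: int = 60, top_k: int = 5
) -> list[tuple[str, float]]:
    """Merge several best-first rankings of chunk IDs into one list.

    Each chunk scores sum(1 / (k + rank)) over the rankings it appears in,
    so chunks ranked well by both vector and full-text search rise to the top.
    """
    scores: dict[str, float] = {}
    for ranking in rankings:
        for rank, chunk_id in enumerate(ranking, start=1):
            scores[chunk_id] = scores.get(chunk_id, 0.0) + 1.0 / (k + rank)
    # Highest fused score first; ties broken by chunk ID for determinism.
    fused = sorted(scores.items(), key=lambda item: (-item[1], item[0]))
    return fused[:top_k]


vector_hits = ["c7", "c2", "c9"]    # best-first from vector similarity
fulltext_hits = ["c2", "c4", "c7"]  # best-first from full-text search
top = reciprocal_rank_fusion([vector_hits, fulltext_hits], top_k=3)
print([chunk_id for chunk_id, _ in top])  # ['c2', 'c7', 'c4']
```

Rank-based fusion sidesteps the fact that cosine similarities and full-text scores live on different scales.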
read_kb_file
Reads a single file front-to-back, paginated by page or chunk. Can render PDF pages to inline images for multimodal models.
list_kb_files
Inventories a knowledge base (file names, sizes, status, and chunk counts) with regex filters. This is how an agent discovers what’s available before reading.
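A regex filter over an inventory works roughly like the sketch below, which keeps only matching files that have finished processing. The record fields mirror what list_kb_files is described as returning, but the FileInfo type, the "processed" status value, and the function are hypothetical.

```python
import re
from dataclasses import dataclass


@dataclass
class FileInfo:
    name: str
    size_bytes: int
    status: str       # hypothetical values, e.g. "processed", "failed"
    chunk_count: int


def filter_files(files: list[FileInfo], pattern: str, status: str = "processed") -> list[str]:
    """Return names matching `pattern` (searched anywhere in the name) with the given status."""
    rx = re.compile(pattern, re.IGNORECASE)
    return [f.name for f in files if rx.search(f.name) and f.status == status]


inventory = [
    FileInfo("Q3-2024-report.pdf", 812_000, "processed", 41),
    FileInfo("Q3-2024-notes.docx", 22_000, "failed", 0),
    FileInfo("refund-policy.md", 4_100, "processed", 3),
]
print(filter_files(inventory, r"^q3-\d{4}"))  # ['Q3-2024-report.pdf']
```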
write_to_knowledge_base
Saves attachments or generated content back into a knowledge base, re-triggering the full pipeline. Gated by a separate writable allowlist.
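The access rules in this section (organization scoping, a selected list or Allow all knowledge bases for reading, and a separate writable allowlist) can be summarized in a small sketch. The AgentKBAccess type and its field names are hypothetical; they model the behavior described here, not the product's data model. In this sketch, write access depends only on the writable allowlist.

```python
from dataclasses import dataclass, field


@dataclass
class AgentKBAccess:
    org_id: str
    allow_all: bool = False                          # "Allow all knowledge bases"
    readable: set[str] = field(default_factory=set)  # selected KB IDs
    writable: set[str] = field(default_factory=set)  # separate write allowlist

    def can_read(self, kb_id: str, kb_org_id: str) -> bool:
        # Access never crosses organization boundaries.
        if kb_org_id != self.org_id:
            return False
        return self.allow_all or kb_id in self.readable

    def can_write(self, kb_id: str, kb_org_id: str) -> bool:
        # Writing is gated by its own allowlist, independent of read scope.
        return kb_org_id == self.org_id and kb_id in self.writable


access = AgentKBAccess(org_id="org_1", readable={"kb_a"}, writable={"kb_b"})
print(access.can_read("kb_a", "org_1"), access.can_write("kb_a", "org_1"))
```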
Getting Started
Creating a Knowledge Base
- Navigate to Control Hub → Knowledge Bases
- Click “Create Knowledge Base”
- Configure your settings:
- Name & Description: Help your team understand the content
- Embedding Model: Choose from your organization’s configured models
- Extraction Schema: Optional JSON schema for structured data extraction
Uploading Documents
- Drag & Drop: Simply drag files into the knowledge base interface
- Bulk Upload: Select multiple files at once
- Real-time Processing: Watch files process with live status updates
- Processing Status: See extraction progress and any errors
Email Documents to Your Knowledge Base
Every knowledge base has a unique email address that you can use to add documents without opening the app.
How to use:
- Find your KB’s email address in the upload area (e.g., kb-123@updates.asteragents.com)
- Send an email to that address
- With attachments: Documents are automatically added and processed
- Without attachments: The email body is saved as a .md file with subject, sender, and timestamp metadata
Security:
- Only organization members can add documents via email
- Sender email is verified against your team’s Clerk user accounts
- Unauthorized senders are silently ignored
Common uses:
- Forward emails with attachments directly to your KB
- Email notes and updates to keep a knowledge base current (e.g., interaction logs, meeting notes)
- Add documents from your phone without logging in
- Set up automated workflows that email documents to knowledge bases
- Quickly share files from any device with email access
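For automated workflows, any script that can send mail can feed a knowledge base. The sketch below uses Python's standard email and smtplib modules. The KB address is the example from above; the SMTP host, port, user, and password passed to send() are placeholders for your own mail provider's settings. The From address must belong to an organization member, or the message is silently ignored.

```python
import mimetypes
import smtplib
from email.message import EmailMessage
from pathlib import Path


def build_kb_email(sender: str, kb_address: str, subject: str,
                   body: str, attachments: list[Path]) -> EmailMessage:
    """Build a message for a KB address.

    Attachments become KB documents; a message without attachments
    is saved by the KB as a .md note built from the body.
    """
    msg = EmailMessage()
    msg["From"] = sender          # must match an organization member's account
    msg["To"] = kb_address
    msg["Subject"] = subject
    msg.set_content(body)
    for path in attachments:
        ctype, _ = mimetypes.guess_type(path.name)
        maintype, subtype = (ctype or "application/octet-stream").split("/", 1)
        msg.add_attachment(path.read_bytes(), maintype=maintype,
                           subtype=subtype, filename=path.name)
    return msg


def send(msg: EmailMessage, host: str, port: int, user: str, password: str) -> None:
    # Placeholder SMTP settings: substitute your own mail server's values.
    with smtplib.SMTP(host, port) as smtp:
        smtp.starttls()
        smtp.login(user, password)
        smtp.send_message(msg)
```

For example, build_kb_email("you@yourcompany.com", "kb-123@updates.asteragents.com", "Q3 report", "See attached.", [Path("q3.pdf")]) produces a message ready to pass to send().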
Trigger Agents on File Upload
Knowledge bases can automatically start an AI agent conversation whenever a file finishes processing, which enables document automation workflows.
How to configure:
- Go to your knowledge base settings (click edit)
- Enable “Trigger Agent on File Upload”
- Select which agent should process the documents
- Write instructions telling the agent what to do with the document
When triggered, the agent receives:
- The full extracted text content of the document
- Structured extraction data (if you’ve configured an extraction schema)
- The filename and knowledge base context
Tracking triggered conversations:
- Files that triggered a conversation show a “View Conversation” option in their menu
- The conversation is linked to the file for easy reference
Example use cases:
- Invoice Processing: Automatically extract line items and totals from uploaded invoices
- Contract Analysis: Summarize key terms and flag important clauses
- Report Summarization: Generate executive summaries of lengthy documents
- Content Routing: Have an agent read documents and route them to appropriate teams
Notes:
- Triggers fire for all upload methods: drag-and-drop, bulk upload, and email-to-KB
- Only files uploaded by organization members trigger conversations (the uploader becomes the conversation owner)
- If a file fails processing, no trigger fires until the file is successfully processed
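Conceptually, a trigger combines your instructions with the inputs listed above and fires only under the conditions in these notes. The sketch below models that flow; the functions, their parameters, and the message layout are hypothetical, and the prompt the platform actually builds may differ.

```python
import json
from typing import Optional


def build_trigger_message(
    instructions: str,
    filename: str,
    kb_name: str,
    extracted_text: str,
    structured: Optional[dict] = None,
) -> str:
    """Assemble what the agent sees: instructions, file context, text, structured data."""
    parts = [
        instructions,
        f"Knowledge base: {kb_name}",
        f"File: {filename}",
        "Document text:",
        extracted_text,
    ]
    if structured is not None:  # only present when an extraction schema is set
        parts += ["Structured extraction:", json.dumps(structured, indent=2)]
    return "\n\n".join(parts)


def should_trigger(status: str, uploader_is_member: bool, trigger_enabled: bool) -> bool:
    """Fire only for successfully processed files uploaded by organization members."""
    return trigger_enabled and status == "processed" and uploader_is_member
```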
Configuring Agents
Once your knowledge base is ready:
- Go to your agent configuration
- Enable the “Search Knowledge Base” tool and select which knowledge bases the agent can access (or enable Allow all knowledge bases)
- Your agent can now access and search your documents during conversations
Agent Integration
Natural Language Queries
Your agents can search using natural language:
- “Find information about our refund policy”
- “What does the Q3 financial report say about revenue growth?”
- “Show me technical specifications for our new product”
Advanced Search Options
Agents can also use advanced filtering:
- Search only within specific files
- Exclude outdated documents
- Control the number of results returned
- Get detailed metadata about search results
Search Results Include
- Relevant Content: The actual text chunks that match the query
- Source Information: File names, page numbers, and document metadata
- Similarity Scores: How relevant each result is to the query
- Search Method: Whether found via semantic or text search
Working with KB Data in Python
When an agent has execute_python enabled, it can pull KB files into the Python sandbox for bulk processing. Files are mounted on request, not automatically.
How files get into the sandbox
Request files via the kb_files parameter on the execute_python call (format kb/{kb_id}/{filename}). They then appear at /home/user/kb/{kb_id}/:
- Original files: the uploaded documents (PDFs, Excel, .docx, etc.)
- .md files: pre-extracted markdown text, auto-included alongside each requested file
- extracted_data.json: structured extraction results for all files in each accessible KB; available even when no individual files are requested (if an extraction schema is configured)
/home/user/kb/{kb_id}/ doesn’t exist until you request a file into it. Use list_kb_files to discover exact filenames, then pass them in kb_files. You can request files from any knowledge base in your organization.
Bulk processing pattern
For agents that process many documents at once (financial spreading, portfolio analysis, document comparison), request the files you need in one call and read their local .md files in Python. This is significantly faster than calling read_kb_file once per document.
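A minimal version of the bulk pattern might look like the sketch below. It assumes the files were already requested via kb_files, that the pre-extracted .md files sit directly in /home/user/kb/{kb_id}/, and that the markdown keeps its [[Page N]] markers; the helper names are hypothetical, and the KB ID is whatever list_kb_files reports. The base path is a parameter so the same code can be pointed at any directory.

```python
import json
import re
from pathlib import Path

PAGE_MARKER = re.compile(r"\[\[Page (\d+)\]\]")
DEFAULT_BASE = Path("/home/user/kb")


def load_kb_markdown(kb_id: str, base: Path = DEFAULT_BASE) -> dict[str, str]:
    """Read every mounted .md file for one knowledge base into memory."""
    kb_dir = base / kb_id
    if not kb_dir.is_dir():
        # The directory only exists once at least one file has been requested.
        raise FileNotFoundError(f"{kb_dir} not mounted; request files via kb_files first")
    return {p.name: p.read_text(encoding="utf-8") for p in sorted(kb_dir.glob("*.md"))}


def find_mentions(docs: dict[str, str], term: str) -> dict[str, list[int]]:
    """Map each file to the pages whose text mentions `term` (case-insensitive)."""
    hits: dict[str, list[int]] = {}
    for name, text in docs.items():
        parts = PAGE_MARKER.split(text)
        pages = [int(num) for num, body in zip(parts[1::2], parts[2::2])
                 if term.lower() in body.lower()]
        if pages:
            hits[name] = pages
    return hits


def load_extracted(kb_id: str, base: Path = DEFAULT_BASE) -> object:
    """Load extracted_data.json if the KB has an extraction schema, else None."""
    path = base / kb_id / "extracted_data.json"
    return json.loads(path.read_text(encoding="utf-8")) if path.exists() else None
```

Inside the sandbox, a call such as find_mentions(load_kb_markdown("<kb_id>"), "revenue") scans every requested document in one pass instead of one tool call per file.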
Use Cases
Customer Support
- Upload FAQs, product manuals, and policy documents
- Agents can instantly find answers to customer questions
- Ensure consistent, accurate responses across your team
Research & Analysis
- Store research papers, market reports, and analysis documents
- Agents can synthesize information across multiple sources
- Extract insights and trends from large document collections
Technical Documentation
- Upload API docs, system specifications, and troubleshooting guides
- Agents can help with code reviews and technical questions
- Keep documentation searchable and accessible
Compliance & Legal
- Store contracts, regulations, and compliance documents
- Agents can quickly reference relevant policies and procedures
- Ensure adherence to legal requirements and standards
Integration-Managed Knowledge Bases
Some knowledge bases are automatically populated by external integrations (e.g., Salesforce, SharePoint, or custom sync services). These are called integration-managed knowledge bases.
How to Identify
- A badge appears next to the knowledge base name showing the integration source
- An info banner displays at the top of the knowledge base detail page
- Example: “Managed by Salesforce”
What’s Different
| Feature | User-Managed KB | Integration-Managed KB |
|---|---|---|
| Upload files | ✅ Yes | ❌ No (synced automatically) |
| Delete files | ✅ Yes | ❌ No (synced automatically) |
| Retry failed processing | ✅ Yes | ✅ Yes |
| Search & query | ✅ Yes | ✅ Yes |
| View extracted data | ✅ Yes | ✅ Yes |
| Delete knowledge base | ✅ Yes | ❌ No (via integration settings) |
Why This Matters
Integration-managed KBs are kept in sync with external systems. Any file you uploaded or deleted manually would be overwritten on the next sync, so the UI blocks these actions to keep your data consistent with the source system.
Organization & Management
Organization-Scoped
- Each knowledge base belongs to your organization
- Admin controls for secure document management
- Team members see only knowledge bases they have access to
File Management
- View all uploaded files with processing status
- Remove outdated or incorrect documents
- Monitor storage usage and document counts
Performance Monitoring
- Track search usage and performance
- Monitor embedding generation status
- View extracted structured data
Best Practices
Document Organization
- Create Topic-Specific Bases: Separate knowledge bases for different subject areas
- Use Descriptive Names: Clear names help agents and users understand content
- Regular Updates: Remove outdated documents to maintain search quality
- Consistent Formatting: Well-formatted documents produce better search results
Search Optimization
- Structured Content: Use headings, bullet points, and clear sections
- Complete Information: Include context and background in documents
- Avoid Duplicates: Multiple versions of the same content can confuse search
- Test Searches: Verify your agents can find key information
Security Considerations
- Sensitive Data: Only upload documents appropriate for AI processing
- Access Controls: Use organization-level access management
- Regular Audits: Review uploaded content periodically
- Compliance: Ensure uploaded documents comply with your data policies
API Integration
Knowledge bases are also available through the Aster Agents API: you can call search programmatically or build custom workflows around your document collections. For detailed API documentation, see the API Reference section.
Transform your documents into searchable knowledge that your agents can access instantly. Knowledge bases make your information work harder for your team, providing AI-powered insights from your existing documentation and files.
