Overview

Knowledge Bases let you upload documents that your AI agents can search and reference. They also support structured data extraction and per-file search filtering.

Creating a Knowledge Base

  1. Navigate to Control Hub → Knowledge Bases
  2. Click Create Knowledge Base
  3. Configure the following settings:

Basic Settings

  • Name: A descriptive name for your knowledge base
  • Description: Optional description of the content and purpose
  • Embedding Model: Choose the AI model for generating search embeddings

Structured Data Extraction (Optional)

  • Extraction Model: AI model to use for structured data extraction
  • Extraction Schema: JSON Schema defining what data to extract from each document

Supported File Types

Upload any of these file formats:
File Type      | Formats                | Notes
PDF Documents  | .pdf                   | Text extraction with OCR fallback for scanned documents
Word Documents | .docx, .doc            | Full formatting preservation
Spreadsheets   | .xlsx, .xls, .xlsm     | Table structure conversion with formula support
CSV Files      | .csv                   | Structured data processing with automatic dialect detection
Text Files     | .txt, .md, .json, .xml | Plain text processing
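
The table mentions automatic dialect detection for CSV files. How the platform implements this is not documented here. Purely as an illustration of the idea, Python's standard csv.Sniffer can infer the delimiter from a sample of the file. The sample data and the read_rows helper below are made up for this sketch.

```python
import csv
import io

# Hypothetical sample: a semicolon-delimited export (made-up data).
SAMPLE = "name;amount\nalpha;1\nbeta;2\n"


def read_rows(text: str) -> list[list[str]]:
    """Guess the CSV dialect from the text, then parse every row with it."""
    # Restricting the candidates to common delimiters makes the guess more reliable.
    dialect = csv.Sniffer().sniff(text, delimiters=",;\t|")
    return list(csv.reader(io.StringIO(text), dialect))


rows = read_rows(SAMPLE)
print(rows)
```

Sniffing works from a sample, so very short or irregular files can be misdetected; checking a few parsed rows before a bulk upload is a cheap safeguard.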

File Processing Pipeline

When you upload files, they go through this automated pipeline:
  1. Upload: Secure cloud storage
  2. Content Extraction: Text extracted from various formats
  3. Structured Extraction: AI extracts data based on your schema (if configured)
  4. Intelligent Chunking: Documents split while preserving structure
  5. Vector Embeddings: AI generates searchable embeddings
  6. Storage: Everything indexed for fast retrieval
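
Step 4 splits documents while preserving structure. The actual chunking rules are not documented here, so the sketch below assumes one simple strategy: pack whole paragraphs into chunks up to a character budget and never split inside a paragraph. The names chunk_paragraphs and max_chars are hypothetical.

```python
def chunk_paragraphs(text: str, max_chars: int = 200) -> list[str]:
    """Group blank-line-separated paragraphs into chunks of up to max_chars.

    A single paragraph longer than max_chars is kept whole as its own chunk
    rather than being cut mid-sentence.
    """
    paragraphs = [p.strip() for p in text.split("\n\n") if p.strip()]
    chunks: list[str] = []
    current = ""
    for para in paragraphs:
        candidate = f"{current}\n\n{para}" if current else para
        if len(candidate) <= max_chars or not current:
            # Still within budget, or this is the first paragraph of a chunk.
            current = candidate
        else:
            # Budget exceeded: close the current chunk and start a new one.
            chunks.append(current)
            current = para
    if current:
        chunks.append(current)
    return chunks


document = "A" * 50 + "\n\n" + "B" * 50 + "\n\n" + "C" * 150
chunks = chunk_paragraphs(document, max_chars=120)
print(len(chunks))
```

Keeping paragraphs intact means each chunk that reaches the embedding step is a readable unit, at the cost of uneven chunk sizes.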

Structured Data Extraction

What is Structured Extraction?

Structured extraction uses AI to automatically pull specific information from your documents into a consistent JSON format. Typical uses include:
  • Financial Reports: Extract revenue, costs, and key metrics
  • Contracts: Pull out dates, parties, and terms
  • Research Papers: Extract methodology, results, and conclusions
  • Product Specs: Standardize technical specifications

Setting Up Extraction

  1. Choose an Extraction Model: Select from your available chat models
  2. Define JSON Schema: Specify the structure you want extracted

Example JSON Schema

{
  "type": "object",
  "properties": {
    "title": {
      "type": "string",
      "description": "Document title or heading"
    },
    "key_metrics": {
      "type": "array",
      "items": {
        "type": "object",
        "properties": {
          "metric": {"type": "string"},
          "value": {"type": "number"},
          "unit": {"type": "string"}
        }
      },
      "description": "Important numerical data points"
    },
    "summary": {
      "type": "string",
      "description": "Brief summary of main points"
    }
  },
  "required": ["title", "summary"]
}
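
Before a bulk upload, it can help to check a sample extraction result against the schema yourself. The sketch below is a deliberately small check written with the standard library only: it looks at required fields and top-level types from the example schema above and nothing else. The record is made up, and check_record is a hypothetical helper, not part of the product.

```python
import json

# Top level of the example schema above, reduced to what this check uses.
SCHEMA = {
    "type": "object",
    "properties": {
        "title": {"type": "string"},
        "key_metrics": {"type": "array"},
        "summary": {"type": "string"},
    },
    "required": ["title", "summary"],
}

# Mapping from JSON Schema type names to Python types.
PY_TYPES = {"string": str, "number": (int, float), "array": list, "object": dict}


def check_record(record: dict, schema: dict) -> list[str]:
    """Return a list of problems: missing required fields and wrong top-level types."""
    problems = [
        f"missing required field: {name}"
        for name in schema.get("required", [])
        if name not in record
    ]
    for name, spec in schema.get("properties", {}).items():
        if name in record and not isinstance(record[name], PY_TYPES[spec["type"]]):
            problems.append(f"wrong type for {name}: expected {spec['type']}")
    return problems


# Made-up extraction output, for illustration only.
record = json.loads('{"title": "Q3 Report", "key_metrics": []}')
problems = check_record(record, SCHEMA)
print(problems)
```

This does not validate nested items or the rest of JSON Schema; a full validator library would be the better choice for anything beyond a quick sanity check.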

Viewing Extracted Data

Once processing completes, you can:
  • View extracted JSON data in the file details
  • Use the data in custom workflows
  • Export for analysis in other tools
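
As one way to use extracted data in other tools, the sketch below flattens the key_metrics array of a record into CSV text. It assumes you already have the extracted JSON loaded as a Python dict shaped like the example schema; how you obtain it (file details view or otherwise) is not covered here. The sample record and the metrics_to_csv helper are made up.

```python
import csv
import io


def metrics_to_csv(extracted: dict) -> str:
    """Flatten the key_metrics array of one extracted record into CSV text."""
    buffer = io.StringIO()
    writer = csv.DictWriter(
        buffer, fieldnames=["metric", "value", "unit"], lineterminator="\n"
    )
    writer.writeheader()
    for item in extracted.get("key_metrics", []):
        # Missing keys become empty cells instead of raising an error.
        writer.writerow({key: item.get(key, "") for key in writer.fieldnames})
    return buffer.getvalue()


# Made-up record shaped like the example schema.
sample = {
    "title": "Q3 Report",
    "summary": "Revenue grew.",
    "key_metrics": [{"metric": "revenue", "value": 1.5, "unit": "USD millions"}],
}
csv_text = metrics_to_csv(sample)
print(csv_text)
```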

Agent Integration

Enabling Knowledge Base Access

  1. Edit your agent configuration
  2. Enable the Search Knowledge Base tool
  3. Select which knowledge bases the agent can access

Advanced Search Features

Your agents can use sophisticated search capabilities:

File Filtering

include_file_ids restricts the search to the listed files; exclude_file_ids skips the listed files:

{
  "query": "quarterly revenue",
  "knowledge_base_id": 123,
  "include_file_ids": [45, 67],
  "exclude_file_ids": [12, 34],
  "top_k": 10
}
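
When building such a request in code, a small helper keeps the filter fields optional. The sketch below uses the field names from the example above; the build_search_payload function itself and its overlap check are assumptions for illustration, not documented behavior of the search tool.

```python
from typing import Optional


def build_search_payload(
    query: str,
    knowledge_base_id: int,
    top_k: int = 10,
    include_file_ids: Optional[list[int]] = None,
    exclude_file_ids: Optional[list[int]] = None,
) -> dict:
    """Assemble a search request body, adding file filters only when given."""
    if include_file_ids and exclude_file_ids:
        # Client-side sanity check (an assumption): the same file should not
        # be both included and excluded.
        overlap = set(include_file_ids) & set(exclude_file_ids)
        if overlap:
            raise ValueError(f"file ids both included and excluded: {sorted(overlap)}")
    payload = {"query": query, "knowledge_base_id": knowledge_base_id, "top_k": top_k}
    if include_file_ids:
        payload["include_file_ids"] = include_file_ids
    if exclude_file_ids:
        payload["exclude_file_ids"] = exclude_file_ids
    return payload


payload = build_search_payload("quarterly revenue", 123, include_file_ids=[45, 67])
print(payload)
```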

Search Types

  • Vector Search: Semantic similarity using AI embeddings
  • Text Search: Traditional keyword matching
  • Hybrid Search: Combines vector and text search results
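
How the platform merges vector and text results for hybrid search is not stated here. One widely used technique for merging two ranked lists is reciprocal rank fusion, sketched below as an illustration only. The document ids are made up, and k = 60 is a conventional default rather than a documented setting.

```python
def reciprocal_rank_fusion(rankings: list[list[str]], k: int = 60) -> list[str]:
    """Merge ranked id lists; each id scores the sum of 1 / (k + rank) over the lists."""
    scores: dict[str, float] = {}
    for ranking in rankings:
        for rank, doc_id in enumerate(ranking, start=1):
            scores[doc_id] = scores.get(doc_id, 0.0) + 1.0 / (k + rank)
    # Highest fused score first; ties broken by id for a stable result.
    return sorted(scores, key=lambda doc_id: (-scores[doc_id], doc_id))


vector_hits = ["doc_a", "doc_b", "doc_c"]  # made-up semantic search ranking
text_hits = ["doc_b", "doc_d", "doc_a"]    # made-up keyword search ranking
fused = reciprocal_rank_fusion([vector_hits, text_hits])
print(fused)
```

Documents that appear in both lists rise to the top, which is the usual motivation for combining the two search types.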

Best Practices

Organization

  • Separate by Topic: Create different knowledge bases for different subjects
  • Regular Updates: Remove outdated documents to maintain search quality
  • Descriptive Names: Use clear, searchable file names

Structured Extraction

  • Start Simple: Begin with basic schemas and expand over time
  • Test with Sample Files: Validate your schema works before bulk uploads
  • Use Required Fields: Mark essential fields as required in your schema

Performance

  • Optimize File Sizes: Larger files take longer to process
  • Monitor Processing: Check the processing status of uploaded files
  • Use File Filtering: Help agents focus on relevant documents

Managing Knowledge Bases

File Management

  • Upload Status: Monitor processing progress for each file
  • Reprocess Files: Re-run extraction if you update your schema
  • Delete Files: Remove individual files without affecting others

Knowledge Base Settings

  • Update Schema: Modify extraction schema for future uploads
  • Change Models: Switch embedding or extraction models as needed
  • Access Control: Configure which agents can access the knowledge base

Troubleshooting

  • Processing Failures: Check file format compatibility
  • Poor Search Results: Consider switching to a different embedding model
  • Extraction Issues: Validate your JSON schema syntax
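
For extraction issues, a quick first step is to confirm that the schema text parses as JSON at all. The sketch below reports the position of the first syntax error using Python's standard json module. It checks JSON syntax only, not whether the result is a meaningful JSON Schema, and schema_syntax_error is a hypothetical helper name.

```python
import json
from typing import Optional


def schema_syntax_error(text: str) -> Optional[str]:
    """Return None if text is valid JSON, else a message with line and column."""
    try:
        json.loads(text)
    except json.JSONDecodeError as err:
        return f"line {err.lineno}, column {err.colno}: {err.msg}"
    return None


good = schema_syntax_error('{"type": "object"}')
bad = schema_syntax_error('{"type": "object",}')  # trailing comma is invalid JSON
print(good)
print(bad)
```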

API Access

Knowledge bases are also accessible via API for programmatic integration:

# Search a knowledge base
curl -X POST "https://asteragents.com/api/kb/search" \
  -H "Authorization: Bearer YOUR_API_KEY" \
  -H "Content-Type: application/json" \
  -d '{
    "query": "revenue trends",
    "knowledge_base_id": 123,
    "top_k": 5
  }'
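
The same request can be built from Python with the standard library. The sketch below mirrors the curl example (same URL, headers, and body) and only constructs the request; sending it is left commented out because the response format is not documented here. build_search_request is a hypothetical helper name.

```python
import json
import urllib.request

API_URL = "https://asteragents.com/api/kb/search"  # endpoint from the curl example


def build_search_request(
    api_key: str, query: str, knowledge_base_id: int, top_k: int = 5
) -> urllib.request.Request:
    """Build (but do not send) the same POST request as the curl example."""
    body = json.dumps(
        {"query": query, "knowledge_base_id": knowledge_base_id, "top_k": top_k}
    ).encode("utf-8")
    return urllib.request.Request(
        API_URL,
        data=body,
        method="POST",
        headers={
            "Authorization": f"Bearer {api_key}",
            "Content-Type": "application/json",
        },
    )


request = build_search_request("YOUR_API_KEY", "revenue trends", 123)
# To send it (the response is left unparsed, since its shape is an unknown here):
# with urllib.request.urlopen(request) as response:
#     print(response.read().decode("utf-8"))
```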

Pricing Notes

  • File processing and storage are included in your plan
  • Embedding generation uses your model provider credits
  • Structured extraction uses your chat model credits
  • No additional fees for search operations

Knowledge bases provide a powerful way to give your AI agents access to your organization’s documents while maintaining full control over access and processing.