StrideOps.ai Docs

Knowledge Bases

Manage knowledge bases and their sources for RAG.

Base path

/api/v1/bases

A knowledge base is a collection of documents (PDFs, URLs, transcripts, scraped pages) that has been chunked, embedded, and stored for retrieval. Bases plug into voice agents and chat assistants as their RAG corpus.

Source types

| Type | Notes |
| ---- | ----- |
| file | Pre-uploaded file. The dashboard handles direct uploads; the API accepts a storageUrl. |
| url | Public URL to a PDF, HTML page, or other text-extractable document. |
| youtube | YouTube URL - transcript is extracted. |
| scrape | Entire site crawled from a root URL. Respects robots.txt and a max-page cap. |
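As a rough illustration of how the four source types differ, the sketch below builds a request body for each one. Only `storageUrl` comes from the table above; the `type` and `url` keys and the `build_source_payload` helper are assumptions made for this example and may not match the actual request schema.

```python
VALID_SOURCE_TYPES = {"file", "url", "youtube", "scrape"}


def build_source_payload(source_type: str, location: str) -> dict:
    """Build a request body for adding a source.

    Hypothetical helper: the "type" and "url" keys are assumed names,
    not confirmed by the API reference.
    """
    if source_type not in VALID_SOURCE_TYPES:
        raise ValueError(f"unknown source type: {source_type!r}")
    # "file" sources point at a pre-uploaded object via storageUrl;
    # every other type is assumed to carry its address under "url".
    key = "storageUrl" if source_type == "file" else "url"
    return {"type": source_type, key: location}


if __name__ == "__main__":
    print(build_source_payload("file", "https://storage.example.com/handbook.pdf"))
    print(build_source_payload("scrape", "https://example.com"))
```

Validating the type on the client side catches typos before a request is sent, which is cheaper than waiting for the processing task to reject the source.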

Processing pipeline

  1. The source is ingested and uploaded to storage.
  2. The Trigger.dev process-source task picks it up.
  3. Docling extracts text (PDFs, HTML, Office docs).
  4. Text is chunked using the base's ragSettings.chunkingStrategy.
  5. Chunks are embedded with OpenAI text-embedding-3-small (1536 dimensions).
  6. Chunks are written to source_chunks with pgvector embeddings.
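To make the chunking step concrete, here is a minimal sketch of what a "fixed" strategy with overlap could look like. It is not the pipeline's actual implementation: `chunk_fixed` is a hypothetical function, and whitespace-separated words stand in for real tokenizer tokens.

```python
def chunk_fixed(tokens: list[str], chunk_size: int = 512,
                chunk_overlap: int = 64) -> list[list[str]]:
    """Split tokens into fixed-size windows that share chunk_overlap tokens."""
    if chunk_size <= 0:
        raise ValueError("chunk_size must be positive")
    if not 0 <= chunk_overlap < chunk_size:
        raise ValueError("chunk_overlap must be in [0, chunk_size)")

    step = chunk_size - chunk_overlap  # how far each window advances
    chunks = []
    for start in range(0, len(tokens), step):
        chunks.append(tokens[start:start + chunk_size])
        # Stop once a window has reached the end of the input, so no
        # trailing chunk made up only of overlap tokens is emitted.
        if start + chunk_size >= len(tokens):
            break
    return chunks


if __name__ == "__main__":
    # Assumption: words approximate tokens for illustration only.
    words = "a knowledge base is a collection of documents".split()
    for chunk in chunk_fixed(words, chunk_size=4, chunk_overlap=1):
        print(chunk)
```

The overlap means a sentence that straddles a chunk boundary still appears whole in at least one chunk, at the cost of embedding some tokens twice.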

Processing can take anywhere from a few seconds (a short PDF) to a few minutes (a long YouTube video). For production workflows, use webhooks rather than polling for status.

RAG settings

Each base has a ragSettings JSON object that controls retrieval:

| Field | Default | Notes |
| ----- | ------- | ----- |
| chunkingStrategy | "semantic" | One of "semantic", "sentence", or "fixed". |
| chunkSize | 512 | Tokens per chunk (for fixed). |
| chunkOverlap | 64 | Token overlap between adjacent chunks. |
| topK | 10 | Chunks returned per query. |
| similarityThreshold | 0.7 | Cosine similarity floor. |
| hybridSearch | true | Combine vector similarity with full-text BM25. |
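The interaction between topK and similarityThreshold can be sketched in a few lines. This is an illustrative model only: it assumes the threshold is applied before the top-K cut, it ignores the BM25 half of hybridSearch, and `retrieve` and `cosine_similarity` are hypothetical names rather than part of the API.

```python
import math


def cosine_similarity(a: list[float], b: list[float]) -> float:
    """Cosine of the angle between two vectors; 0.0 if either is all zeros."""
    dot = sum(x * y for x, y in zip(a, b))
    norm = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b))
    return dot / norm if norm else 0.0


def retrieve(query_vec, chunks, top_k=10, similarity_threshold=0.7):
    """Return up to top_k (score, text) pairs scoring at or above the floor.

    chunks is a list of (text, embedding) pairs. Assumption: the floor is
    applied first, then the best top_k survivors are returned.
    """
    scored = [(cosine_similarity(query_vec, vec), text) for text, vec in chunks]
    kept = [pair for pair in scored if pair[0] >= similarity_threshold]
    kept.sort(key=lambda pair: pair[0], reverse=True)
    return kept[:top_k]


if __name__ == "__main__":
    # Toy 2-dimensional embeddings; real ones have 1536 dimensions.
    corpus = [("refund policy", [1.0, 0.0]),
              ("shipping times", [1.0, 1.0]),
              ("office hours", [0.0, 1.0])]
    for score, text in retrieve([1.0, 0.0], corpus):
        print(f"{score:.3f}  {text}")
```

Under this model a query can return fewer than topK chunks, or none at all, when few chunks clear the similarity floor.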

To override these settings for an individual source, pass ragSettingsOverride when adding the source.