Step 2: Knowledge Pipeline Orchestration

In XpertAI, a knowledge pipeline functions like an intelligent data processing assembly line. Each node performs a specific task, and you can drag, drop, and connect different nodes to gradually transform raw document data into a searchable and understandable knowledge base. The entire process is highly visual and configurable, helping you quickly build knowledge acquisition and indexing flows tailored to your business logic.

This chapter will help you understand the overall flow of the knowledge pipeline, the role and configuration of each node, and how to customize and optimize your knowledge processing chain.

Interface Status Overview

When you enter the knowledge pipeline orchestration interface, you will see:

Tab Status: The Documents, Retrieval Test, and Settings tabs are grayed out and unavailable.
Prerequisites: You must complete the configuration, debugging, and publishing of the knowledge pipeline before uploading files and running retrieval tests.

If you choose a blank knowledge pipeline, the system will display a canvas with only the "Knowledge Base Node" by default. You can then follow the guide to create and connect other nodes step by step.

If you select a preset pipeline template, the canvas will immediately show the complete node structure of that template.

Overall Knowledge Pipeline Flow

Before configuring, let's understand how data flows through the knowledge pipeline:

**Data Source → Document Transformer → Chunker → Knowledge Base Node (Index Configuration) → Trigger Node (User Input Parameters) → Test & Publish**

Knowledge Pipeline Nodes

Data Source Configuration: Import raw content (local files, Notion, web pages, cloud drives, etc.).
Document Transformer Node: Convert raw files into standardized structured data (supports text and image extraction).
Chunker Node: Intelligently split structured content into chunks suitable for indexing.
Knowledge Base Node: Define tree structures and indexing strategies.
Trigger Node Configuration: Set input parameters to trigger pipeline execution.
Test & Publish: Validate the process and officially enable the knowledge base.
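The flow above can be pictured as a list of stages applied in order. The following is a minimal sketch, not XpertAI's actual API: the function names (`load_source`, `transform`, `chunk`, `run_pipeline`) and the use of plain strings as documents are assumptions made for illustration.

```python
from typing import Callable, List

# Hypothetical model: every stage maps a list of text items to a new list.
Stage = Callable[[List[str]], List[str]]

def load_source(_: List[str]) -> List[str]:
    # Stand-in for a data source node: returns one raw "document".
    return ["  Hello   world.  \n\nSecond   paragraph.  "]

def transform(docs: List[str]) -> List[str]:
    # Stand-in for a document transformer: normalizes whitespace per paragraph.
    out = []
    for doc in docs:
        paragraphs = [" ".join(p.split()) for p in doc.split("\n\n")]
        out.append("\n\n".join(p for p in paragraphs if p))
    return out

def chunk(docs: List[str]) -> List[str]:
    # Stand-in for a chunker: one chunk per paragraph.
    return [p for doc in docs for p in doc.split("\n\n")]

def run_pipeline(stages: List[Stage]) -> List[str]:
    # Feed the output of each stage into the next one.
    data: List[str] = []
    for stage in stages:
        data = stage(data)
    return data

chunks = run_pipeline([load_source, transform, chunk])
print(chunks)  # ['Hello world.', 'Second paragraph.']
```

In the real product the stages are nodes connected on the canvas; the sketch only shows that each node consumes the previous node's output.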

Step 1: Data Source Configuration

In XpertAI, you can select multiple data sources for knowledge extraction and configure each one independently. Supported source types include local uploads, online documents, cloud drives, and web crawling.

Currently supported data sources include:

Local file upload
Online documents (e.g., Notion)
Cloud drives (Google Drive, Dropbox, OneDrive)
Web crawlers (Firecrawl, etc.)

More data sources are available via the XpertAI Plugin Marketplace.

Step 2: Configure Data Processing Nodes

Data processing nodes are the core of the knowledge pipeline. They parse, transform, clean, and chunk raw files into structured semantic units. XpertAI divides data processing into two main parts: Document Transformer and Chunker.

Document Transformer

The document transformer converts various file formats (PDF, Word, Excel, etc.) into structured content understandable by models. It supports extraction of images, tables, and text, serving as the "first step" in the knowledge flow.

You can choose the built-in XpertAI transformer or other transformers from the Marketplace (such as Unstructured, MinerU, etc.).

Features

Supports multiple input formats (PDF, DOCX, XLSX, PPTX, TXT, Markdown, etc.)
Automatically extracts images and generates usable URLs
Supports asynchronous tasks and batch conversion
Supports OCR and structured table extraction

Chunker

After transformation, documents are often too large for direct vectorization and retrieval. The chunker splits content into semantically complete chunks for subsequent indexing and recall.

XpertAI provides multiple chunking strategies, including:

Type	Features	Use Case
General Chunker	Fixed-size chunks, supports delimiters and overlap	General text
Parent-Child Chunker	Automatically generates tree-structured context	Long or complex documents
Q&A Processor	Extracts Q&A pairs, e.g., from FAQ or Excel Q&A tables (in development)	Tables or structured Q&A

General Chunker

Configuration Options

Parameter	Description
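Delimiter	Splits content at line breaks or at a custom regex
Max Length	Maximum chunk length; content that exceeds it is split automatically
Overlap	Amount of content shared between adjacent chunks, which improves context association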

Input/Output

Type	Name	Description
Input	`Document`	Raw text content
Output	`Document with chunks`	Array of semantic chunks
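To make the three parameters concrete, here is a minimal sketch of a fixed-size chunker with a delimiter and overlap. It is an illustration under stated assumptions, not the built-in node: the function name `general_chunk` is hypothetical, and lengths are counted in characters, whereas the actual node may use a different unit.

```python
import re
from typing import List

def general_chunk(text: str, delimiter: str = r"\n+",
                  max_length: int = 40, overlap: int = 10) -> List[str]:
    """Split on a regex delimiter, then cut long segments into overlapping windows."""
    if overlap >= max_length:
        raise ValueError("overlap must be smaller than max_length")
    step = max_length - overlap  # how far each window advances
    chunks: List[str] = []
    for segment in re.split(delimiter, text):
        segment = segment.strip()
        if not segment:
            continue  # skip empty pieces between consecutive delimiters
        start = 0
        while True:
            chunks.append(segment[start:start + max_length])
            if start + max_length >= len(segment):
                break  # this window reached the end of the segment
            start += step
    return chunks

sample = "short line\n" + "a" * 50
chunks = general_chunk(sample, max_length=40, overlap=10)
print([len(c) for c in chunks])  # [10, 40, 20]
```

The 50-character segment is split into a 40-character window and a 20-character window whose first 10 characters repeat the end of the previous chunk, which is what the Overlap parameter controls.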

Parent-Child Chunker

The parent-child chunker generates a chunk tree, XpertAI's own chunking system for managing hierarchical relationships between parent and child chunks.

Unlike traditional flat chunk structures, XpertAI stores chunks in a tree, which supports multi-level traceability and aggregation.

Features

Automatically maintains context association
Supports semantic retrieval at parent level and precise matching at child level
Can be extended to a hybrid graph structure
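The idea of matching at the child level while returning parent-level context can be sketched with a small tree. This is a hypothetical illustration: `ChunkNode`, `build_tree`, and `find_with_context` are invented names, and splitting by paragraph and sentence is an assumption, not the node's documented behavior.

```python
from dataclasses import dataclass, field
from typing import List, Optional, Tuple

@dataclass
class ChunkNode:
    text: str
    # Exclude the back-reference from repr/eq to avoid recursive traversal.
    parent: Optional["ChunkNode"] = field(default=None, repr=False, compare=False)
    children: List["ChunkNode"] = field(default_factory=list)

    def add_child(self, text: str) -> "ChunkNode":
        child = ChunkNode(text=text, parent=self)
        self.children.append(child)
        return child

    def ancestors(self) -> List["ChunkNode"]:
        # Walk upward from this node to the root (traceability).
        path, node = [], self.parent
        while node is not None:
            path.append(node)
            node = node.parent
        return path

def build_tree(document: str) -> ChunkNode:
    # Assumed layout: root = document, level 1 = paragraphs, level 2 = sentences.
    root = ChunkNode(text=document)
    for paragraph in document.split("\n\n"):
        parent = root.add_child(paragraph)
        for sentence in paragraph.split(". "):
            if sentence:
                parent.add_child(sentence.strip().rstrip("."))
    return root

def find_with_context(root: ChunkNode, keyword: str) -> Optional[Tuple[str, str]]:
    # Depth-first search over leaves; return (matched child, its parent's text).
    stack = [root]
    while stack:
        node = stack.pop()
        if not node.children and keyword in node.text:
            context = node.parent.text if node.parent else node.text
            return (node.text, context)
        stack.extend(reversed(node.children))
    return None

tree = build_tree("Cats purr. Cats nap.\n\nDogs bark. Dogs fetch.")
print(find_with_context(tree, "fetch"))  # ('Dogs fetch', 'Dogs bark. Dogs fetch.')
```

The match is precise (a single sentence), while the returned context is the whole parent paragraph.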

Q&A Processor (In Development)

The Q&A processor combines extraction and chunking to pull Q&A pairs from CSV or Excel files. This node is under development and will support batch processing of structured Q&A knowledge.

Step 3: Configure Knowledge Base Node

The knowledge base node is the endpoint of the pipeline, responsible for building a searchable knowledge index structure.

XpertAI's knowledge base uses a tree-structured chunk system to manage chunk hierarchies. Each node (chunk) can be associated with vectors, images, source information, and context nodes.

Core Features

Module	Description
Structure	Unified tree-structured chunking
Indexing	Vector index
Retrieval	Semantic similarity-based recall
Keyword Index	In development; intended to enable hybrid retrieval
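Semantic similarity-based recall over a vector index can be illustrated as follows. This is a toy sketch under loud assumptions: `embed` uses word counts as a stand-in for a real embedding model, and `VectorIndex` is a hypothetical in-memory class, not XpertAI's storage layer.

```python
import math
from collections import Counter
from typing import Dict, List, Tuple

def embed(text: str) -> Dict[str, float]:
    # Toy stand-in for an embedding model: a sparse word-count vector.
    return {word: float(n) for word, n in Counter(text.lower().split()).items()}

def cosine(a: Dict[str, float], b: Dict[str, float]) -> float:
    dot = sum(value * b.get(key, 0.0) for key, value in a.items())
    norm_a = math.sqrt(sum(v * v for v in a.values()))
    norm_b = math.sqrt(sum(v * v for v in b.values()))
    if norm_a == 0.0 or norm_b == 0.0:
        return 0.0  # an empty vector is similar to nothing
    return dot / (norm_a * norm_b)

class VectorIndex:
    def __init__(self) -> None:
        self._entries: List[Tuple[str, Dict[str, float]]] = []

    def add(self, chunk: str) -> None:
        # Store the chunk together with its vector.
        self._entries.append((chunk, embed(chunk)))

    def search(self, query: str, top_k: int = 2) -> List[Tuple[float, str]]:
        # Score every chunk against the query and keep the best top_k.
        q = embed(query)
        scored = [(cosine(q, vec), chunk) for chunk, vec in self._entries]
        scored.sort(key=lambda pair: pair[0], reverse=True)
        return scored[:top_k]

index = VectorIndex()
for chunk in ["how to reset your password",
              "shipping takes three days",
              "password rules and length"]:
    index.add(chunk)

results = index.search("reset password")
print([chunk for _, chunk in results])
```

A production index would use dense embeddings and an approximate nearest-neighbor structure; the ranking principle (highest cosine similarity first) is the same.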

Step 4: Configure Trigger Node (User Input Parameters)

In XpertAI, user input parameters are managed via the Trigger Node.

The trigger node lets you define runtime input parameters (such as file upload, URL, delimiter, custom variables, etc.), which are injected into downstream nodes during execution.

Advantages

Unified parameter management
Automatic binding with other nodes
Visual configuration and default value support
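Unified parameter management with default values can be sketched as a small schema that is resolved once per run. The names here (`Parameter`, `resolve_inputs`) and the validation rules are assumptions for illustration, not the Trigger Node's actual schema.

```python
from dataclasses import dataclass
from typing import Any, Dict, List

@dataclass
class Parameter:
    name: str
    default: Any = None
    required: bool = False

def resolve_inputs(schema: List[Parameter], provided: Dict[str, Any]) -> Dict[str, Any]:
    """Merge user-provided values with defaults and validate against the schema."""
    unknown = set(provided) - {p.name for p in schema}
    if unknown:
        raise KeyError(f"unknown parameters: {sorted(unknown)}")
    resolved: Dict[str, Any] = {}
    for p in schema:
        if p.name in provided:
            resolved[p.name] = provided[p.name]  # user value wins
        elif p.required:
            raise ValueError(f"missing required parameter: {p.name}")
        else:
            resolved[p.name] = p.default  # fall back to the configured default
    return resolved

schema = [
    Parameter("url", required=True),
    Parameter("delimiter", default="\n\n"),
    Parameter("max_length", default=500),
]
inputs = resolve_inputs(schema, {"url": "https://example.com", "max_length": 200})
print(inputs)
```

The resolved dictionary is what other nodes would read from, so a value such as `delimiter` is defined in one place rather than in every node.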

Step 5: Test & Publish

After configuring the pipeline, click the Test Run button in the upper right to validate the entire process. The system will execute each node in sequence and output the final knowledge base result.

Once testing passes, click Publish to officially apply the knowledge pipeline to your knowledge base.
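The test-then-publish gate can be pictured as a dry run that executes nodes in order, records a status per node, and stops at the first failure. This is a hypothetical sketch: `dry_run` and the report format are invented for illustration and do not describe the Test Run button's internals.

```python
from typing import Any, Callable, List, Optional, Tuple

Node = Tuple[str, Callable[[Any], Any]]  # (node name, node function)

def dry_run(nodes: List[Node], initial: Any) -> Tuple[bool, List[Tuple[str, str]], Optional[Any]]:
    """Run nodes in sequence; return (passed, per-node report, final result)."""
    report: List[Tuple[str, str]] = []
    data = initial
    for name, fn in nodes:
        try:
            data = fn(data)
            report.append((name, "ok"))
        except Exception as exc:
            # Stop at the first failing node so the report points at it.
            report.append((name, f"failed: {exc}"))
            return False, report, None
    return True, report, data

good_nodes: List[Node] = [
    ("transform", str.strip),
    ("chunk", lambda s: s.split("\n\n")),
]
passed, report, result = dry_run(good_nodes, "  A\n\nB  ")
print(passed, report, result)

bad_nodes: List[Node] = [
    ("transform", str.strip),
    ("chunk", lambda s: s.split("")),  # an empty separator raises ValueError
]
failed_passed, failed_report, _ = dry_run(bad_nodes, "  A\n\nB  ")
can_publish = passed and not failed_passed  # only the passing pipeline qualifies
```

Publishing would be allowed only when the first element of the returned tuple is `True`.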

Summary

XpertAI's knowledge pipeline integrates data processing, chunk management, and index building into a unified architecture:

Trigger (Runtime Inputs) → Data Source → Document Transformer
    → Chunker (Tree Structure)
    → Image Understanding (VLM/OCR)
    → Knowledge Base (Vector Embedding)

This system simplifies knowledge base construction and is designed to keep chunks consistent across documents, keep context traceable, and improve retrieval performance.
