Lark Document Source Strategy
LarkSourceStrategy
is mainly responsible for document source loading logic—fetching folders and document data from the Lark Docs API, and converting them into unified Document
objects for use in knowledge base / AI vectorization.
Full code below (imports omitted):
@DocumentSourceStrategy(LarkName)
@Injectable()
export class LarkSourceStrategy implements IDocumentSourceStrategy<LarkDocumentsParams> {
readonly permissions = [
{
type: 'integration',
service: LarkName,
description: 'Access to Lark system integrations'
} as IntegrationPermission
]
readonly meta: IDocumentSourceProvider = {
name: LarkName,
category: DocumentSourceProviderCategoryEnum.OnlineDocument,
label: {
en_US: 'Lark Documents',
zh_Hans: '飞书文档'
} as I18nObject,
configSchema: {
type: 'object',
properties: {
folderToken: {
type: 'string',
title: { en_US: 'Folder Token', zh_Hans: '文件夹 Token' },
description: { en_US: 'The folder token to fetch documents from.', zh_Hans: '从中获取文档的文件夹 Token。' }
},
types: {
type: 'array',
title: { en_US: 'Document Types', zh_Hans: '文档类型' },
description: { en_US: 'The types of document to fetch.', zh_Hans: '要获取的文档类型。' },
default: ['docx'],
items: {
type: 'string',
enum: ['doc', 'sheet', 'mindnote', 'bitable', 'file', 'docx', 'folder', 'shortcut']
},
uniqueItems: true,
minItems: 0
}
},
required: ['folderToken']
},
icon: {
type: 'image',
value: iconImage,
color: '#4CAF50'
}
}
async validateConfig(config: LarkDocumentsParams): Promise<void> {
if (!config.folderToken) {
throw new Error('Folder Token is required')
}
}
test(config: LarkDocumentsParams): Promise<any> {
throw new Error('Method not implemented.')
}
async loadDocuments(config: LarkDocumentsParams, context?: { integration: IIntegration }): Promise<Document[]> {
const integration = context?.integration
if (!integration) {
throw new Error('Integration system is required')
}
await this.validateConfig(config)
const client = new LarkClient(integration)
const children = await client.listDriveFiles(config.folderToken)
const documents: Document[] = children
.filter((item) => config.types ? config.types.includes(item.type) : true)
.map((item) => {
return new Document({
id: item.token,
pageContent: `${item.name}\n${item.url}`,
metadata: {
...item,
chunkId: item.token,
title: item.name,
url: item.url,
createdAt: item.created_time
}
})
})
return documents
}
async loadDocument?(document: Document, context: { integration?: IIntegration }): Promise<Document> {
const integration = context?.integration
if (!integration) {
throw new Error('Integration system is required')
}
const client = new LarkClient(integration)
const content = await client.getDocumentContent(document.id)
return new Document({
id: document.id,
pageContent: content,
metadata: {
id: document.id,
title: `Lark Document ${document.id}`
}
})
}
}
1. Class Declaration & Decorators
@DocumentSourceStrategy(LarkName)
@Injectable()
export class LarkSourceStrategy implements IDocumentSourceStrategy<LarkDocumentsParams>
@DocumentSourceStrategy(LarkName)
Registers this class as a document source strategy with provider nameLarkName
("lark").
The Xpert AI system will automatically detect and invoke it.@Injectable()
Allows the class to be injected (standard NestJS usage).Implements interface:
IDocumentSourceStrategy<LarkDocumentsParams>
Ensures the class provides required methods likevalidateConfig
,loadDocuments
, etc.
2. Permission Definition
readonly permissions = [
{
type: 'integration',
service: LarkName,
description: 'Access to Lark system integrations'
} as IntegrationPermission
]
Explanation:
- The plugin requires access to Lark integration info (AppID, Secret, etc.) at runtime.
- The system checks if the user has authorized the plugin to access Lark.
3. Metadata Definition
readonly meta: IDocumentSourceProvider = {
name: LarkName,
category: DocumentSourceProviderCategoryEnum.OnlineDocument,
label: { en_US: 'Lark Documents', zh_Hans: '飞书文档' },
configSchema: { ... },
icon: { type: 'image', value: iconImage, color: '#4CAF50' }
}
- name/category: Identifies this as an online document source.
- label: UI display name (English/Chinese).
- configSchema: Defines user-configurable parameters:
folderToken
: Folder token (required).types
: Array of document types to load, default isdocx
.
- icon: Plugin icon for UI display.
4. Config Validation
async validateConfig(config: LarkDocumentsParams): Promise<void> {
if (!config.folderToken) {
throw new Error('Folder Token is required')
}
}
- Checks if
folderToken
is present. - Throws an error if the folder token is not configured.
5. Document Loading Method
async loadDocuments(config: LarkDocumentsParams, context?: { integration: IIntegration }): Promise<Document[]> {
const integration = context?.integration
if (!integration) {
throw new Error('Integration system is required')
}
await this.validateConfig(config)
const client = new LarkClient(integration)
const children = await client.listDriveFiles(config.folderToken)
const documents: Document[] = children
.filter((item) => config.types ? config.types.includes(item.type) : true)
.map((item) => {
return new Document({
id: item.token,
pageContent: `${item.name}\n${item.url}`,
metadata: {
...item,
chunkId: item.token,
title: item.name,
url: item.url,
createdAt: item.created_time
}
})
})
return documents
}
Logic steps:
- Check for integration info; throw error if missing.
- Validate config (must have
folderToken
). - Initialize
LarkClient
, calllistDriveFiles(folderToken)
to get folder contents. - Filter documents by
types
. - Convert each document to a LangChain
Document
object:id
: document tokenpageContent
: document title and URLmetadata
: extra info (token, title, url, creation time)
✅ The output is a list of documents, each containing basic info and link only.
6. Single Document Loading
async loadDocument?(document: Document, context: { integration?: IIntegration }): Promise<Document> {
const integration = context?.integration
if (!integration) {
throw new Error('Integration system is required')
}
const client = new LarkClient(integration)
const content = await client.getDocumentContent(document.id)
return new Document({
id: document.id,
pageContent: content,
metadata: {
id: document.id,
title: `Lark Document ${document.id}`
}
})
}
Purpose:
- Further loads the full content of a single document when needed.
- Uses
getDocumentContent(docToken)
to fetch the complete document content. - Returns a new
Document
object withpageContent
as the document body.
7. Design Summary
permissions
: Declares dependency on Lark integration permissions.meta
: Provides UI config schema, defines folderToken and document types input.validateConfig
: Ensures config is valid.loadDocuments
: Gets document list from folder, generates document summary.loadDocument
: Loads full content of a single document as needed.
The overall logic converts Lark Drive API data structures into LangChain Document objects, preparing them for downstream AI knowledge base processing.