Plugin Permission Design Guide
Design a configuration in `meta` that allows the plugin to request large-model invocation permissions from the core system (e.g., request an OpenAI client instance). Concretely: declare the large-model permission request in `meta`, design properties in `configSchema` so users can configure the model provider and model name, and have the core system, on detecting the permission request, instantiate the large-model client according to the user's config and inject it into the `understandImages` method via `config`.
Got it! You want to declare the required large-model permission in the plugin Strategy's `meta`, so the core system can parse it and automatically inject a `client` into `config` based on the user's configuration (model provider + model name + other parameters). This way the plugin never needs to create its own OpenAI / Anthropic / other VLM client. Below, I redesign the `OpenAIGPT4VStrategy` `meta` and `configSchema` so the strategy can request the client from the core system.
New Strategy
import { Injectable } from '@nestjs/common';
import { Document } from 'langchain/document';
import {
ImageUnderstandingStrategy,
IImageUnderstandingStrategy,
TImageUnderstandingFile,
TImageUnderstandingResult
} from './strategy.interface';
@Injectable()
@ImageUnderstandingStrategy('openai-gpt4v')
export class OpenAIGPT4VStrategy implements IImageUnderstandingStrategy<any> {
readonly meta = {
name: 'openai-gpt4v',
label: { en: 'OpenAI GPT-4 Vision', zh: 'OpenAI GPT-4 图像理解' },
/**
* Permission declaration: tells the core system this plugin needs an "llm" permission,
* provider is openai, capability is vision.
* The core system will instantiate the corresponding Client based on config.provider + config.model,
* and inject it into config.client.
*/
permissions: [
{
type: 'llm',
provider: 'openai',
capability: 'vision',
},
],
/**
* Config schema: users need to fill in provider and model.
* The core system will instantiate the Client based on provider + model + user's API Key,
* and inject it.
*/
configSchema: {
type: 'object',
properties: {
provider: {
type: 'string',
enum: ['openai'],
description: 'Model provider'
},
model: {
type: 'string',
default: 'gpt-4-vision-preview',
description: 'Model name'
},
prompt: {
type: 'string',
default: 'Describe this image in detail.',
description: 'Prompt (customize the understanding task)'
}
},
required: ['provider', 'model']
},
icon: {
svg: '<svg xmlns="http://www.w3.org/2000/svg" viewBox="0 0 20 20"><circle cx="10" cy="10" r="8" stroke="black" fill="none"/><text x="6" y="14">VLM</text></svg>',
color: '#10a37f'
}
};
async validateConfig(config: any): Promise<void> {
if (!config.client) {
throw new Error('Missing injected LLM client in config. Core system must inject client.');
}
}
async understandImages(
files: TImageUnderstandingFile[],
config: any
): Promise<TImageUnderstandingResult[]> {
const client = config.client; // injected by the core system
const results: TImageUnderstandingResult[] = [];
for (const file of files) {
const description = await this.runGPT4V(client, file.path, config);
const doc = new Document({
pageContent: description,
metadata: {
chunkId: `img-${Date.now()}-${Math.random().toString(36).slice(2, 8)}`,
parentChunkId: file.parentChunkId,
imagePath: file.path,
source: file.filename,
type: 'vlm',
engine: config.model
}
});
results.push({
docs: [doc],
metadata: { engine: config.model, file: file.filename }
});
}
return results;
}
private async runGPT4V(client: any, imagePath: string, config: any): Promise<string> {
// client is already an openai client, instantiated by the core system
const response = await client.chat.completions.create({
model: config.model,
messages: [
{
role: 'user',
content: [
{ type: 'text', text: config.prompt || 'Describe this image in detail.' },
{ type: 'image_url', image_url: { url: imagePath } }
]
}
],
max_tokens: 500
});
return response.choices[0]?.message?.content || '';
}
}
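For reference, the permission entry declared in `meta.permissions` above could be typed roughly like this (a sketch; the interface name and union members are assumptions, not an existing API):

```ts
// Hypothetical shape of a single permission declaration; mirrors the
// object literal used in meta.permissions above.
interface ILLMPermission {
  type: 'llm';          // permission category the core system dispatches on
  provider: 'openai';   // which provider's client the plugin expects
  capability: 'vision'; // which capability the injected client must support
}
```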
How the Core System Handles This
1. Parse the plugin's `meta.permissions` and find that it needs `{ "type": "llm", "provider": "openai", "capability": "vision" }`.
2. Instantiate the client, e.g. `new OpenAI({ apiKey: userApiKey })`, according to the user's `config.provider = "openai"` and `config.model = "gpt-4-vision-preview"`.
3. Inject the client into `config.client`.
4. Call `strategy.understandImages(files, config)` (see the sketch after this list).
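A minimal sketch of that injection flow, assuming the core system holds the user's API key; `resolveClients` and `userApiKey` are illustrative names, not part of any existing API:

```ts
import OpenAI from 'openai';

type TLLMPermission = { type: 'llm'; provider: string; capability: string };

// Resolve a plugin's declared permissions into injected client instances.
// Illustrative only: a real implementation would also validate the config
// against configSchema and support providers other than OpenAI.
function resolveClients(
  permissions: TLLMPermission[],
  config: Record<string, any>,
  userApiKey: string
): Record<string, any> {
  for (const permission of permissions) {
    if (permission.type === 'llm' && permission.provider === config.provider) {
      // Instantiate the provider client the plugin asked for and inject it.
      config.client = new OpenAI({ apiKey: userApiKey });
    }
  }
  return config;
}

// const enriched = resolveClients(strategy.meta.permissions, userConfig, apiKey);
// await strategy.understandImages(files, enriched);
```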
Usage Example
const strategy = this.imageUnderstandingRegistry.get('openai-gpt4v');
const result = await strategy.understandImages(
[
{
path: 'https://example.com/chart.png',
filename: 'chart.png',
extname: 'png',
parentChunkId: 'uuid-123'
}
],
{
provider: 'openai',
model: 'gpt-4-vision-preview',
prompt: 'Please describe this image in detail in Chinese.',
client: openaiClient // injected by the core system
}
);
console.log(result[0].docs[0].pageContent);
This way, the plugin itself does not depend on any `openai` wrapper logic: it only declares permissions plus the required provider/model, and the core system injects the appropriate client.
Would you like me to write an abstract permission declaration interface (e.g., ILLMPermissionRequest) so OCR/VLM/Embedding plugins can declare dependencies in a unified way?