agent-task-executor/docs/text_analysis.md
zhukang 6143abc83b docs: add comprehensive documentation
- Add contributing guidelines to README
- Add detailed architecture documentation
- Add text analysis task documentation
- Include usage examples and best practices
2025-01-14 21:05:33 +08:00


Text Analysis Task

Overview

The Text Analysis Task analyzes text content, with dedicated support for Chinese text. It is a sample task that shows how to use an LLM for summarization, keyword extraction, and a final combined analysis, run as five steps: input validation, text preprocessing, summary generation, keyword extraction, and final analysis.

Features

  1. Input Validation

    • UTF-8 encoding validation
    • Chinese character detection
    • Text length and format checks
  2. Text Preprocessing

    • Chinese punctuation normalization
    • Whitespace handling
    • Character standardization
  3. Summary Generation

    • Concise text summarization
    • Key point extraction
    • Main idea identification
  4. Keyword Extraction

    • Important term identification
    • Topic-related keyword extraction
    • Frequency and relevance analysis
  5. Final Analysis

    • Comprehensive text analysis
    • Structured report generation
    • Multi-aspect evaluation

Usage

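import asyncio

from agent_task_executor.tasksamples.text_analysis_task import TextAnalysisExecutor

# Prepare input text (a Chinese article excerpt; the title reads
# "From ChatGPT to Devin: four stages and paradigm shifts in AI programming")
text = """
从ChatGPT到Devin:AI编程的四个发展阶段与范式转变。
AI编程从ChatGPT出现到现在也就两年出头的时间但已经经历了四个阶段...
"""


async def main() -> None:
    # Create executor instance
    executor = TextAnalysisExecutor()

    # Execute analysis; `await` is only valid inside a coroutine
    result = await executor.execute({"text": text})
    print(result)


asyncio.run(main())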

Implementation Details

Step 1: Input Validation

async def handle_input_validation(self, step_input: dict) -> dict:
    """
    Validates input text for:
    - Non-empty content
    - Valid Chinese characters
    - Proper UTF-8 encoding
    """

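The checks listed in the docstring could look roughly like the sketch below. It is not the executor's actual code: the function name `validate_text`, the `max_chars` limit, and the choice of the basic CJK Unified Ideographs block as the definition of "Chinese character" are all assumptions made for illustration.

```python
import re

# Assumption: "Chinese character" means the basic CJK Unified Ideographs block.
_CJK_RE = re.compile(r"[\u4e00-\u9fff]")


def validate_text(raw, max_chars: int = 10_000) -> str:
    """Return the decoded text, or raise ValueError describing the problem."""
    # Bytes are decoded here so that the UTF-8 check is meaningful;
    # a str has already been decoded by the caller.
    if isinstance(raw, bytes):
        try:
            text = raw.decode("utf-8")
        except UnicodeDecodeError as exc:
            raise ValueError(f"invalid UTF-8 input: {exc}") from exc
    else:
        text = raw

    if not text.strip():
        raise ValueError("text is empty")
    if len(text) > max_chars:
        raise ValueError(f"text longer than {max_chars} characters")
    if not _CJK_RE.search(text):
        raise ValueError("no Chinese characters found")
    return text
```

Raising a single exception type keeps the caller simple: any `ValueError` from this step can be reported as an input error before an LLM call is made.
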
Step 2: Text Preprocessing

async def handle_text_preprocessing(self, step_input: dict) -> dict:
    """
    Preprocesses text by:
    1. Normalizing Chinese punctuation
    2. Handling whitespace
    3. Standardizing characters
    """

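One possible reading of these three steps, using only the standard library, is sketched below. The function name `preprocess` is hypothetical, and using Unicode NFKC normalization is an assumption: it folds full-width letters, digits, and punctuation into half-width forms, which may or may not match the normalization direction the task actually applies.

```python
import re
import unicodedata


def preprocess(text: str) -> str:
    """Normalize characters and whitespace in a block of text."""
    # NFKC folds full-width forms (e.g. "A", "1", ",") into half-width
    # ones and turns the ideographic space U+3000 into a plain space.
    # Ideographic punctuation such as "。" is left unchanged.
    text = unicodedata.normalize("NFKC", text)

    # Collapse runs of spaces and tabs, trim each line, drop empty lines.
    lines = (re.sub(r"[ \t]+", " ", line).strip() for line in text.splitlines())
    return "\n".join(line for line in lines if line)
```

For example, `preprocess("AI 编程,\n\n  第1阶段 ")` returns `"AI 编程,\n第1阶段"`.
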
Step 3: Summary Generation

async def handle_generate_summary(self, step_input: dict) -> dict:
    """
    Generates text summary using LLM:
    1. Extracts main points
    2. Creates concise summary
    3. Maintains key information
    """

Step 4: Keyword Extraction

async def handle_extract_keywords(self, step_input: dict) -> dict:
    """
    Extracts keywords:
    1. Identifies important terms
    2. Analyzes frequency and relevance
    3. Returns structured list
    """

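The frequency part of this step can be illustrated without an LLM or a word segmenter. The sketch below counts character bigrams in runs of Chinese text plus Latin-script words. It is a deliberately crude stand-in, not the task's method, and the name `candidate_keywords` is hypothetical.

```python
import re
from collections import Counter

_CJK_RUN = re.compile(r"[\u4e00-\u9fff]+")
_LATIN_WORD = re.compile(r"[A-Za-z][A-Za-z0-9]+")


def candidate_keywords(text: str, top_k: int = 5):
    """Return the top_k (term, count) pairs, most frequent first."""
    counts = Counter()
    # Character bigrams approximate two-character Chinese words.
    for run in _CJK_RUN.findall(text):
        counts.update(run[i:i + 2] for i in range(len(run) - 1))
    # Latin-script tokens such as "ChatGPT" are counted as whole words.
    counts.update(_LATIN_WORD.findall(text))
    return counts.most_common(top_k)
```

A list like this could be passed to the LLM as candidate terms, or used to sanity-check the keywords the LLM returns.
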
Step 5: Final Analysis

async def handle_final_analysis(self, step_input: dict) -> dict:
    """
    Performs comprehensive analysis:
    1. Combines all previous results
    2. Generates structured report
    3. Provides detailed insights
    """

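Combining earlier results into a structured report might look like the following sketch. The `AnalysisReport` fields and the `summary`, `keywords`, and `insights` keys are assumptions about the shape of the step outputs, not the documented schema.

```python
from dataclasses import asdict, dataclass, field


@dataclass
class AnalysisReport:
    summary: str
    keywords: list = field(default_factory=list)
    insights: list = field(default_factory=list)


def build_report(step_results: dict) -> dict:
    """Merge the outputs of earlier steps into one plain dict."""
    missing = [key for key in ("summary", "keywords") if key not in step_results]
    if missing:
        raise KeyError(f"missing step results: {missing}")

    report = AnalysisReport(
        summary=step_results["summary"].strip(),
        # dict.fromkeys removes duplicates while keeping first-seen order.
        keywords=list(dict.fromkeys(step_results["keywords"])),
        insights=list(step_results.get("insights", [])),
    )
    return asdict(report)
```

Returning a plain dict keeps the result easy to serialize and matches the `dict` return type of the step handlers above.
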
Configuration

The task uses the following LLM configuration:

llm:
  provider: deepseek
  model: deepseek-chat
  temperature: 0.7
  max_tokens: 2000
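
A configuration like this is easier to work with once it is loaded into a typed object and checked. The sketch below assumes the YAML has already been parsed into a dict. `LLMConfig`, `load_llm_config`, and the accepted temperature range are illustrative assumptions, not part of the project.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class LLMConfig:
    provider: str
    model: str
    temperature: float = 0.7
    max_tokens: int = 2000

    def __post_init__(self):
        # Assumed bounds; check the provider's documentation for real limits.
        if not 0.0 <= self.temperature <= 2.0:
            raise ValueError(f"temperature out of range: {self.temperature}")
        if self.max_tokens <= 0:
            raise ValueError(f"max_tokens must be positive: {self.max_tokens}")


def load_llm_config(raw: dict) -> LLMConfig:
    """Build an LLMConfig from the parsed top-level config dict."""
    return LLMConfig(**raw["llm"])
```

Validating at load time turns a mistyped value into an immediate, readable error instead of a failed API call later in the run.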

Error Handling

  1. Input Errors

    • Invalid encoding
    • Empty text
    • Non-Chinese content
  2. Processing Errors

    • LLM API failures
    • Token limit exceeded
    • Response parsing errors
  3. Output Validation

    • Result structure validation
    • Content quality checks
    • Format verification
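
Transient LLM API failures are usually worth retrying, while input errors are not. The sketch below shows one way to separate the two. The exception classes and the `call_with_retry` helper are hypothetical names, not part of the executor's API.

```python
import asyncio


class InputError(ValueError):
    """Bad input: invalid encoding, empty text, non-Chinese content."""


class LLMError(RuntimeError):
    """Failure while calling the LLM or parsing its response."""


async def call_with_retry(call, *, attempts: int = 3, base_delay: float = 0.5):
    """Await call() up to `attempts` times, backing off after each LLMError."""
    for attempt in range(1, attempts + 1):
        try:
            return await call()
        except LLMError:
            if attempt == attempts:
                raise  # out of retries: surface the last error
            # Exponential backoff: base_delay, 2 * base_delay, 4 * base_delay, ...
            await asyncio.sleep(base_delay * 2 ** (attempt - 1))
```

Only `LLMError` is retried here. An `InputError` propagates immediately, since resending the same invalid text cannot succeed.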

Best Practices

  1. Text Input

    • Proper encoding (UTF-8)
    • Reasonable text length
    • Clean input formatting
  2. LLM Prompts

    • Clear instructions
    • Specific requirements
    • Example outputs
  3. Result Processing

    • Validate all outputs
    • Handle edge cases
    • Maintain text quality
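
The prompt and result-processing advice can be combined: ask for a strict output format, show an example, then validate what comes back. The sketch below does this for keyword extraction. The prompt wording, the JSON shape, and both function names are assumptions for illustration.

```python
import json


def build_keyword_prompt(text: str, max_keywords: int = 5) -> str:
    """Build a prompt with clear instructions and an example output."""
    example = json.dumps({"keywords": ["关键词1", "关键词2"]}, ensure_ascii=False)
    return (
        f"Extract at most {max_keywords} keywords from the text below.\n"
        "Respond with JSON only: an object with one key, 'keywords', "
        "holding a list of strings.\n"
        f"Example output: {example}\n\n"
        f"Text:\n{text}"
    )


def parse_keyword_response(response: str, max_keywords: int = 5) -> list:
    """Validate the LLM response and return a clean keyword list."""
    try:
        data = json.loads(response)
    except json.JSONDecodeError as exc:
        raise ValueError(f"response is not valid JSON: {exc}") from exc

    keywords = data.get("keywords") if isinstance(data, dict) else None
    if not isinstance(keywords, list) or not all(isinstance(k, str) for k in keywords):
        raise ValueError("response must contain a 'keywords' list of strings")

    # Strip whitespace, drop empty entries, and enforce the requested limit.
    return [k.strip() for k in keywords if k.strip()][:max_keywords]
```

Treating the model's reply as untrusted input means a malformed response fails at parsing, where it can be retried or reported, instead of reaching the final report.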

Extensions

  1. Language Support

    • Add support for other languages
    • Language detection
    • Multi-language analysis
  2. Analysis Types

    • Sentiment analysis
    • Topic classification
    • Entity recognition
  3. Output Formats

    • Custom report formats
    • Export options
    • Integration capabilities