ClaudeWave
Skill · 617 repo stars · updated 3d ago

company-product-context

# company-product-context

This Claude Code skill compiles business intelligence by extracting data from company PDF documents, conducting targeted web research, and synthesizing industry knowledge into a structured context report. Use it when building detailed company profiles, conducting competitive analysis, preparing for business development conversations, or establishing baseline knowledge about organizations before client engagements or partnership evaluations.

Install in Claude Code
git clone --depth 1 https://github.com/lofcz/LLMTornado /tmp/company-product-context && cp -r /tmp/company-product-context/src/LlmTornado.Tests/Static/Files/Skills/company-product-context ~/.claude/skills/company-product-context
Then open a new Claude Code session; the skill loads automatically.

SKILL.md

## Company Product Context Compiler

This skill extracts information from company PDF documents, conducts web research, and synthesizes industry knowledge to create a comprehensive company product context report.

Copy this checklist and track your progress:

```
Company Product Context Progress:
- [ ] Step 1: Gather company materials and identify sources
- [ ] Step 2: Extract information from PDF documents
- [ ] Step 3: Structure extracted data
- [ ] Step 4: Conduct web research and validation
- [ ] Step 5: Synthesize industry knowledge
- [ ] Step 6: Compile comprehensive product context
- [ ] Step 7: Generate final report
- [ ] Step 8: Export deliverables
```

## **Step 1: Gather company materials and identify sources**

Collect all available company information:

**Required Inputs:**
- Company PDF documents (annual reports, product sheets, presentations, etc.)
- Company name and website URL
- Industry/sector information
- Specific products or services to focus on (if applicable)

**Actions:**
1. Request all relevant PDF files from the user
2. Confirm company name, website, and primary industry
3. Ask about specific focus areas or products of interest
4. Identify any competitive context needed
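
The inputs and confirmations above can be captured in a small record before any extraction starts. The following is a minimal sketch and not part of the skill itself: the `CompanyIntake` class, its field names, and the `missing_fields` helper are hypothetical names chosen for illustration, and treating industry as required is an assumption based on the Required Inputs list.

```python
from dataclasses import dataclass, field
from pathlib import Path
from typing import List


@dataclass
class CompanyIntake:
    """Hypothetical container for the Step 1 inputs (names are illustrative)."""
    company_name: str = ""
    website: str = ""
    industry: str = ""
    focus_areas: List[str] = field(default_factory=list)  # optional
    competitors: List[str] = field(default_factory=list)  # optional
    pdf_paths: List[Path] = field(default_factory=list)

    def missing_fields(self) -> List[str]:
        """Return the names of required inputs that are still empty."""
        required = {
            "company_name": self.company_name.strip(),
            "website": self.website.strip(),
            "industry": self.industry.strip(),
            "pdf_paths": self.pdf_paths,
        }
        return [name for name, value in required.items() if not value]


intake = CompanyIntake(company_name="Example Corp", website="https://example.com")
print(intake.missing_fields())  # → ['industry', 'pdf_paths']
```

A non-empty result from `missing_fields()` indicates which questions still need to be put to the user before moving on to Step 2.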

**Expected in INPUT_DIR:**
- `*.pdf` - Company documents
- `company_info.txt` - Basic company details (optional)
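
Before extraction, it may help to confirm that the input directory actually contains the expected files. This sketch assumes the directory is supplied through the `INPUT_DIR` environment variable with `/tmp` as the fallback, mirroring what the extraction script's `main()` does. `scan_input_dir` is a hypothetical helper name, not something the skill defines.

```python
import os
from pathlib import Path


def scan_input_dir(input_dir):
    """Return the PDFs and the optional company_info.txt found in input_dir.

    Hypothetical helper: not part of the skill's own script.
    """
    root = Path(input_dir)
    if not root.is_dir():
        # A missing directory is reported as "nothing found" instead of raising.
        return {"pdfs": [], "company_info": None}

    # Match the extension case-insensitively so "REPORT.PDF" is also found.
    pdfs = sorted(
        p for p in root.iterdir()
        if p.is_file() and p.suffix.lower() == ".pdf"
    )
    info_file = root / "company_info.txt"
    return {
        "pdfs": pdfs,
        "company_info": info_file if info_file.is_file() else None,
    }


if __name__ == "__main__":
    found = scan_input_dir(os.environ.get("INPUT_DIR", "/tmp"))
    if not found["pdfs"]:
        print("No PDF files found; ask the user for company documents.")
    for pdf in found["pdfs"]:
        print(f"Found: {pdf.name}")
```

Returning `None` for `company_info` keeps the optional file optional: callers can proceed with PDFs alone and fall back to asking the user for the basic company details.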

## **Step 2: Extract information from PDF documents**

Extract structured information from all provided PDF files.

**Use the Python script for PDF extraction:**

```python
import os
import re
from pathlib import Path
import PyPDF2
import json

def extract_pdf_content(pdf_path):
    """Extract text content from PDF file."""
    text_content = []
    metadata = {}
    
    try:
        with open(pdf_path, 'rb') as file:
            pdf_reader = PyPDF2.PdfReader(file)
            
            # Extract metadata
            if pdf_reader.metadata:
                metadata = {
                    'title': pdf_reader.metadata.get('/Title', ''),
                    'author': pdf_reader.metadata.get('/Author', ''),
                    'subject': pdf_reader.metadata.get('/Subject', ''),
                    'pages': len(pdf_reader.pages)
                }
            else:
                metadata = {'pages': len(pdf_reader.pages)}
            
            # Extract text from all pages
            for page_num, page in enumerate(pdf_reader.pages, 1):
                try:
                    # Treat a missing result as empty text (e.g. image-only pages)
                    text = page.extract_text() or ''
                    if text.strip():
                        text_content.append({
                            'page': page_num,
                            'text': text
                        })
                except Exception as e:
                    print(f"Error extracting page {page_num}: {e}")
                    
    except Exception as e:
        print(f"Error reading PDF {pdf_path}: {e}")
        return None
    
    return {
        'filename': os.path.basename(pdf_path),
        'metadata': metadata,
        'content': text_content
    }

def extract_key_sections(text):
    """Extract key sections from text based on common headers."""
    sections = {
        'company_overview': [],
        'products_services': [],
        'business_model': [],
        'market_position': [],
        'financials': [],
        'technology': [],
        'customers': [],
        'strategy': [],
        'other': []
    }
    
    # Keywords for section identification
    keywords = {
        'company_overview': ['about us', 'company overview', 'who we are', 'introduction', 'history'],
        'products_services': ['products', 'services', 'solutions', 'offerings', 'portfolio'],
        'business_model': ['business model', 'revenue model', 'how we work', 'operations'],
        'market_position': ['market', 'industry', 'competitive', 'position', 'landscape'],
        'financials': ['financial', 'revenue', 'earnings', 'profit', 'growth'],
        'technology': ['technology', 'platform', 'infrastructure', 'technical', 'innovation'],
        'customers': ['customers', 'clients', 'partners', 'case study', 'testimonial'],
        'strategy': ['strategy', 'vision', 'mission', 'goals', 'objectives', 'roadmap']
    }
    
    lines = text.split('\n')
    current_section = 'other'
    
    for line in lines:
        line_lower = line.lower().strip()
        
        # Check if line is a section header
        for section, section_keywords in keywords.items():
            if any(keyword in line_lower for keyword in section_keywords):
                if len(line_lower) < 100:  # Likely a header
                    current_section = section
                    break
        
        if line.strip():
            sections[current_section].append(line)
    
    return sections

def analyze_company_info(extracted_data):
    """Analyze extracted data for key company information."""
    analysis = {
        'company_name': '',
        'industry': '',
        'products': [],
        'key_terms': [],
        'metrics': [],
        'urls': [],
        'emails': []
    }
    
    # Join the text of every page; skip documents that failed to extract (None)
    all_text = '\n'.join(
        page['text']
        for doc in extracted_data if doc
        for page in doc['content']
    )
    
    # Extract URLs; strip trailing sentence punctuation and sort for stable output
    url_pattern = r'https?://(?:[a-zA-Z]|[0-9]|[$-_@.&+]|[!*\\(\\),]|(?:%[0-9a-fA-F][0-9a-fA-F]))+'
    analysis['urls'] = sorted({url.rstrip('.,;:)') for url in re.findall(url_pattern, all_text)})
    
    # Extract emails
    email_pattern = r'\b[A-Za-z0-9._%+-]+@[A-Za-z0-9.-]+\.[A-Za-z]{2,}\b'
    analysis['emails'] = list(set(re.findall(email_pattern, all_text)))
    
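    # Extract potential metrics (numbers with units/context).
    # Thousands separators are allowed, and word units must end at a word
    # boundary so that e.g. "3 members" is not read as "3 m".
    metrics_pattern = r'\$?\d[\d,]*(?:\.\d+)?\s*(?:(?:million|billion|trillion|percent|users|customers|employees|k|m|b)\b|%)'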
    analysis['metrics'] = re.findall(metrics_pattern, all_text, re.IGNORECASE)
    
    return analysis

def main():
    input_dir = os.environ.get('INPUT_DIR', '/tmp')
    output_dir = '/tmp/extracted_data'
    os.makedirs(output_dir, exist_ok=True)
    
    # Find all PDF files