Skill576 repo starsupdated 1mo ago

proprietary-data-generator

The proprietary-data-generator skill automates the design and execution of original surveys, benchmarks, and aggregated datasets to create defensible content competitive advantages. Use this skill when building authority through unique research, establishing a data moat competitors cannot replicate, or creating naturally linkable assets that earn backlinks through original statistics and findings unavailable elsewhere.

View source Repository: affiliate-skills

Install in Claude Code

Copy

git clone --depth 1 https://github.com/Affitor/affiliate-skills /tmp/proprietary-data-generator && cp -r /tmp/proprietary-data-generator/skills/automation/proprietary-data-generator ~/.claude/skills/proprietary-data-generator

Then start a new Claude Code session; the skill loads automatically.

Definition

SKILL.md

# Proprietary Data Generator

Create original surveys, benchmarks, and aggregated data that nobody else has. Proprietary data is the ultimate content moat — competitors can copy your writing style but they can't copy YOUR data. Automates the design and execution framework for data collection that feeds unique content angles.

## Stage

S7: Automation & Scale — Generating data at scale requires automation. This skill designs the collection system, not just one data point. Creates repeatable data assets that compound over time.

## When to Use

- User wants to create content that can't be replicated by competitors
- User asks about "original research", "surveys", "benchmarks", "proprietary data"
- User says "data moat", "unique data", "first-party data", "original statistics"
- After `content-moat-calculator` identifies the need for differentiated content
- User wants to build authority through data-driven content
- User wants to create linkable assets that earn backlinks naturally

## Input Schema

```yaml
niche: string                 # REQUIRED — topic area for data collection
                              # e.g., "AI video tools", "affiliate marketing"

data_type: string             # OPTIONAL — "survey" | "benchmark" | "aggregation" | "case_study"
                              # Default: recommend based on niche and resources

audience_access: string       # OPTIONAL — how you can reach respondents
                              # e.g., "email list of 500", "Reddit community", "Twitter followers"
                              # Default: suggest options

budget: string                # OPTIONAL — "zero" | "low" ($0-100) | "medium" ($100-500) | "high" ($500+)
                              # Default: "zero"

goal: string                  # OPTIONAL — "content_moat" | "backlink_magnet" | "authority" | "lead_gen"
                              # Default: "content_moat"
```

**Chaining from S3 content-moat-calculator**: Use `competitive_advantages` to identify data moat opportunities.

## Workflow

### Step 1: Identify Data Opportunity

Analyze the niche for data gaps:
1. `web_search`: `"[niche] statistics 2025" OR "[niche] survey" OR "[niche] benchmark"` — what data already exists?
2. Identify gaps: what questions does the industry ask that nobody has answered with data?
3. `web_search`: `"[niche] reddit" "I wish I knew" OR "does anyone know"` — find unmet data needs

### Step 2: Design Data Collection

Based on `data_type` (or recommend the best fit):

**Survey Design:**
- 8-12 questions (shorter = higher completion)
- Mix: 70% multiple choice, 20% scale (1-5), 10% open-ended
- One "surprising" question that will generate headline-worthy data
- Target sample size: 100+ for credibility
- Distribution plan: where and how to reach respondents

**Benchmark Study:**
- Define metrics to measure (3-5)
- Data sources: public data, API calls, manual collection
- Collection methodology: how often, what tools
- Comparison framework: how to present findings

**Data Aggregation:**
- Sources to aggregate from (public databases, APIs, web scraping targets)
- Aggregation logic: how to combine and normalize
- Update frequency: one-time or recurring
- Visualization plan

**Case Study Collection:**
- Template for collecting stories (5-7 structured questions)
- Outreach template for requesting case studies
- Anonymization rules
- Minimum viable sample: 10+ cases

### Step 3: Create Collection Assets

Produce ready-to-use assets:
1. **Survey questions** (if survey) — complete question list with answer options
2. **Collection template** — spreadsheet structure or form layout
3. **Outreach template** — email/message to recruit respondents
4. **Data analysis plan** — how to turn raw data into insights
5. **Content plan** — how to present findings (blog post, infographic, report)

### Step 4: Design Automation

Create a repeatable system:
- Schedule: when to collect data (monthly, quarterly, annually)
- Tools: recommended platforms (Google Forms, Typeform, Airtable)
- Automation: how to automate collection and reporting
- Update process: how to refresh and republish with new data

### Step 5: Self-Validation

- [ ] Data gap is real (verified by search — nobody else has this data)
- [ ] Sample size is realistic given audience access
- [ ] Questions are unbiased and well-structured
- [ ] Collection method is feasible with stated budget
- [ ] Output content plan is specific (not just "write a blog post")
- [ ] Data is ethically collected (no scraping private data, survey has consent)

## Output Schema

```yaml
output_schema_version: "1.0.0"
proprietary_data:
  niche: string
  data_type: string
  data_gap: string              # What data doesn't exist yet
  headline_potential: string    # The "surprising finding" angle

  collection:
    method: string
    sample_target: number
    tools: string[]
    timeline: string
    budget_needed: string

  assets:
    survey_questions: object[]  # If survey type
    collection_template: string # Template description
    outreach_template: string   # Recruitment message
    analysis_plan: string

  content_outputs:              # Content to create from the data
    - type: string              # "blog" | "infographic" | "report" | "social"
      title: string
      skill_to_use: string     # Which skill creates this content

  data_assets: string[]        # Moat strengtheners for chaining

chain_metadata:
  skill_slug: "proprietary-data-generator"
  stage: "automation"
  timestamp: string
  suggested_next:
    - "affiliate-blog-builder"
    - "content-pillar-atomizer"
    - "content-moat-calculator"
```

## Output Format

```
## Proprietary Data Plan: [Niche]

### The Data Gap
**Nobody has answered:** [the question]
**Why it matters:** [why people care]
**Headline potential:** "[Surprising finding template]"

### Collection Design

**Type:** [Survey / Benchmark / Aggregation / Case Study]
**Target sample:** XX responses
**Timeline:** X weeks
**Budget:** $XX
**Tools:** [tools list