Skill458 repo starsupdated 2mo ago
proprietary-data-generator
The proprietary-data-generator skill automates the design and execution of original surveys, benchmarks, and aggregated datasets to create defensible content competitive advantages. Use this skill when building authority through unique research, establishing a data moat competitors cannot replicate, or creating naturally linkable assets that earn backlinks through original statistics and findings unavailable elsewhere.
Install in Claude Code
Copygit clone --depth 1 https://github.com/Affitor/affiliate-skills /tmp/proprietary-data-generator && cp -r /tmp/proprietary-data-generator/skills/automation/proprietary-data-generator ~/.claude/skills/proprietary-data-generatorThen start a new Claude Code session; the skill loads automatically.
Definition
SKILL.md
# Proprietary Data Generator
Create original surveys, benchmarks, and aggregated data that nobody else has. Proprietary data is the ultimate content moat — competitors can copy your writing style but they can't copy YOUR data. Automates the design and execution framework for data collection that feeds unique content angles.
## Stage
S7: Automation & Scale — Generating data at scale requires automation. This skill designs the collection system, not just one data point. Creates repeatable data assets that compound over time.
## When to Use
- User wants to create content that can't be replicated by competitors
- User asks about "original research", "surveys", "benchmarks", "proprietary data"
- User says "data moat", "unique data", "first-party data", "original statistics"
- After `content-moat-calculator` identifies the need for differentiated content
- User wants to build authority through data-driven content
- User wants to create linkable assets that earn backlinks naturally
## Input Schema
```yaml
niche: string # REQUIRED — topic area for data collection
# e.g., "AI video tools", "affiliate marketing"
data_type: string # OPTIONAL — "survey" | "benchmark" | "aggregation" | "case_study"
# Default: recommend based on niche and resources
audience_access: string # OPTIONAL — how you can reach respondents
# e.g., "email list of 500", "Reddit community", "Twitter followers"
# Default: suggest options
budget: string # OPTIONAL — "zero" | "low" ($0-100) | "medium" ($100-500) | "high" ($500+)
# Default: "zero"
goal: string # OPTIONAL — "content_moat" | "backlink_magnet" | "authority" | "lead_gen"
# Default: "content_moat"
```
**Chaining from S3 content-moat-calculator**: Use `competitive_advantages` to identify data moat opportunities.
## Workflow
### Step 1: Identify Data Opportunity
Analyze the niche for data gaps:
1. `web_search`: `"[niche] statistics 2025" OR "[niche] survey" OR "[niche] benchmark"` — what data already exists?
2. Identify gaps: what questions does the industry ask that nobody has answered with data?
3. `web_search`: `"[niche] reddit" "I wish I knew" OR "does anyone know"` — find unmet data needs
### Step 2: Design Data Collection
Based on `data_type` (or recommend the best fit):
**Survey Design:**
- 8-12 questions (shorter = higher completion)
- Mix: 70% multiple choice, 20% scale (1-5), 10% open-ended
- One "surprising" question that will generate headline-worthy data
- Target sample size: 100+ for credibility
- Distribution plan: where and how to reach respondents
**Benchmark Study:**
- Define metrics to measure (3-5)
- Data sources: public data, API calls, manual collection
- Collection methodology: how often, what tools
- Comparison framework: how to present findings
**Data Aggregation:**
- Sources to aggregate from (public databases, APIs, web scraping targets)
- Aggregation logic: how to combine and normalize
- Update frequency: one-time or recurring
- Visualization plan
**Case Study Collection:**
- Template for collecting stories (5-7 structured questions)
- Outreach template for requesting case studies
- Anonymization rules
- Minimum viable sample: 10+ cases
### Step 3: Create Collection Assets
Produce ready-to-use assets:
1. **Survey questions** (if survey) — complete question list with answer options
2. **Collection template** — spreadsheet structure or form layout
3. **Outreach template** — email/message to recruit respondents
4. **Data analysis plan** — how to turn raw data into insights
5. **Content plan** — how to present findings (blog post, infographic, report)
### Step 4: Design Automation
Create a repeatable system:
- Schedule: when to collect data (monthly, quarterly, annually)
- Tools: recommended platforms (Google Forms, Typeform, Airtable)
- Automation: how to automate collection and reporting
- Update process: how to refresh and republish with new data
### Step 5: Self-Validation
- [ ] Data gap is real (verified by search — nobody else has this data)
- [ ] Sample size is realistic given audience access
- [ ] Questions are unbiased and well-structured
- [ ] Collection method is feasible with stated budget
- [ ] Output content plan is specific (not just "write a blog post")
- [ ] Data is ethically collected (no scraping private data, survey has consent)
## Output Schema
```yaml
output_schema_version: "1.0.0"
proprietary_data:
niche: string
data_type: string
data_gap: string # What data doesn't exist yet
headline_potential: string # The "surprising finding" angle
collection:
method: string
sample_target: number
tools: string[]
timeline: string
budget_needed: string
assets:
survey_questions: object[] # If survey type
collection_template: string # Template description
outreach_template: string # Recruitment message
analysis_plan: string
content_outputs: # Content to create from the data
- type: string # "blog" | "infographic" | "report" | "social"
title: string
skill_to_use: string # Which skill creates this content
data_assets: string[] # Moat strengtheners for chaining
chain_metadata:
skill_slug: "proprietary-data-generator"
stage: "automation"
timestamp: string
suggested_next:
- "affiliate-blog-builder"
- "content-pillar-atomizer"
- "content-moat-calculator"
```
## Output Format
```
## Proprietary Data Plan: [Niche]
### The Data Gap
**Nobody has answered:** [the question]
**Why it matters:** [why people care]
**Headline potential:** "[Surprising finding template]"
### Collection Design
**Type:** [Survey / Benchmark / Aggregation / Case Study]
**Target sample:** XX responses
**Timeline:** X weeks
**Budget:** $XX
**Tools:** [tools list