Skill · 7.9k repo stars · updated 1mo ago

implement

The implement skill creates a Jupyter notebook that executes a research paper's proposed method on identical baseline data and computes all defined comparison metrics. Use this in Phase 4 after baseline benchmarks are established to ensure fair comparison through matched train/test splits, shared data sources, and consistent metric evaluation across both implementations.

Repository: Upsonic

Install in Claude Code


git clone --depth 1 https://github.com/Upsonic/Upsonic /tmp/implement && mkdir -p ~/.claude/skills && cp -r /tmp/implement/src/upsonic/prebuilt/applied_scientist/template/skills/implement ~/.claude/skills/implement

Then start a new Claude Code session; the skill loads automatically.

Definition

SKILL.md

# Implement Skill

## Purpose
Create a new Jupyter notebook implementing the method from the research paper, using the same data as the baseline. Record implementation details and measured metrics as a structured JSON entry.

## When to Use
Phase 4 — after benchmark metrics are defined and baseline values are extracted.

## Input
| Parameter | Type | Description |
|-----------|------|-------------|
| experiment_path | path | `experiments/{research_name}/` |
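
A minimal sketch of resolving `experiment_path` and checking that the baseline artifacts this skill reads (`current_data/`, `current.ipynb`, `log.json`) exist before Phase 4 starts. `resolve_experiment` and the research name `churn` are hypothetical; the skill itself only specifies the `experiments/{research_name}/` layout.

```python
import tempfile
from pathlib import Path

# Baseline artifacts Phase 4 expects from earlier phases (per SKILL.md).
REQUIRED_INPUTS = ["current_data", "current.ipynb", "log.json"]


def resolve_experiment(research_name, root="experiments"):
    """Hypothetical helper: return (experiment_path, missing_inputs)."""
    experiment_path = Path(root) / research_name
    missing = [name for name in REQUIRED_INPUTS if not (experiment_path / name).exists()]
    return experiment_path, missing


# Usage in a throwaway directory; "churn" is a placeholder research name.
with tempfile.TemporaryDirectory() as tmp:
    exp = Path(tmp) / "experiments" / "churn"
    (exp / "current_data").mkdir(parents=True)
    (exp / "log.json").write_text('{"phases": []}')
    path, missing = resolve_experiment("churn", root=Path(tmp) / "experiments")
    print(path.name, missing)  # churn ['current.ipynb']
```

Reporting missing inputs instead of raising lets the agent decide whether to stop or rerun an earlier phase.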

## Actions

1. **Install dependencies:**
   - Install any new packages identified in Phase 2.
   - Capture installed package names and versions for the log entry below.

2. **Write `{experiment_path}/new_requirements.txt`:**
   - List all packages the new notebook needs (one per line, `package==version`).
   - Include both existing dependencies and new ones from the paper.

3. **Create `{experiment_path}/new.ipynb`** with this structure:

   ```
   [Markdown] # {Research Name} - New Method Implementation
   [Markdown] ## 1. Setup & Imports
   [Code]     import statements + dependency checks

   [Markdown] ## 2. Data Loading
   [Code]     load from experiments/{research_name}/current_data/
              (use the SAME data loading logic as current.ipynb)

   [Markdown] ## 3. Data Preprocessing
   [Code]     preprocessing as required by the new method
              (note any differences from baseline preprocessing)

   [Markdown] ## 4. Model Implementation
   [Code]     implement the new method from the paper

   [Markdown] ## 5. Training
   [Code]     train the model
              (use same train/test split as baseline for fair comparison)

   [Markdown] ## 6. Evaluation
   [Code]     compute ALL comparison metrics defined in Phase 3

   [Markdown] ## 7. Results Summary
   [Code]     print all metrics in a structured format
   ```

4. **Implementation rules:**
   - Use the SAME train/test split (same random seed, same ratio) as the baseline.
   - Use the SAME data — load from `current_data/`, do not download new data.
   - Compute ALL metrics defined in Phase 3 (including any with `"needs_computation": true`).
   - Add timing measurements for training (`training_time_seconds`).
   - Handle errors gracefully: if the method fails, record the reason in the `errors` array of the Phase 4 log entry.
   - **Efficiency:**
     - If the data is large (100K+ rows), sample it down to a manageable size (10K–30K rows). Both notebooks must use the exact same sample.
     - Use the paper's recommended hyperparameters; do not run exhaustive grid searches.
     - If training takes more than 10 minutes, reduce the data size or simplify the configuration.
     - The goal is a fair comparison, not a production model.

5. **Run the notebook** end-to-end and verify it executes without errors.

6. **Append a Phase 4 entry to `{experiment_path}/log.json`** under `phases`:
   ```json
   {
     "name": "Phase 4: Implement",
     "completed_at": "2026-04-17T11:30:00Z",
     "new_dependencies_installed": [
       {"name": "catboost", "version": "1.2.5"}
     ],
     "training": {
       "split": 0.2,
       "seed": 42,
       "stratified": true
     },
     "metrics": {
       "accuracy": 0.8721,
       "f1":       0.7310,
       "roc_auc":  0.9288,
       "training_time_seconds": 45.2
     },
     "notebook_executed": true,
     "errors":   [],
     "warnings": []
   }
   ```

   Do not overwrite earlier entries; append to the `phases` array.
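
The sampling, split, and timing rules from step 4 and the append in step 6 can be sketched with the standard library alone. This is an illustration under stated assumptions, not part of the skill: `sample_and_split` and `append_phase4_entry` are hypothetical helpers, the split is purely random rather than stratified (so the demo records `"stratified": false`), and the 20K default just falls inside the suggested 10K–30K range. If `current.ipynb` already uses a specific splitter, such as scikit-learn's `train_test_split` with a fixed `random_state`, reuse that exact call instead so both notebooks see identical rows.

```python
import json
import random
import tempfile
import time
from datetime import datetime, timezone
from pathlib import Path


def sample_and_split(n_rows, sample_size=20_000, test_size=0.2, seed=42):
    """Hypothetical helper: deterministic sample + train/test split of row positions.

    Calling it with the same arguments in current.ipynb and new.ipynb yields
    identical rows. The split is purely random, NOT stratified.
    """
    rng = random.Random(seed)
    rows = list(range(n_rows))
    if n_rows > sample_size:
        rows = rng.sample(rows, sample_size)  # cap large data; order is already random
    else:
        rng.shuffle(rows)
    n_test = round(len(rows) * test_size)
    return rows[n_test:], rows[:n_test]  # (train positions, test positions)


def append_phase4_entry(experiment_path, metrics, training, dependencies=(),
                        notebook_executed=True, errors=(), warnings=()):
    """Hypothetical helper: append a Phase 4 entry, keeping earlier phases intact."""
    log_path = Path(experiment_path) / "log.json"
    log = json.loads(log_path.read_text())
    log.setdefault("phases", []).append({
        "name": "Phase 4: Implement",
        "completed_at": datetime.now(timezone.utc).strftime("%Y-%m-%dT%H:%M:%SZ"),
        "new_dependencies_installed": list(dependencies),
        "training": training,
        "metrics": metrics,
        "notebook_executed": notebook_executed,
        "errors": list(errors),
        "warnings": list(warnings),
    })
    log_path.write_text(json.dumps(log, indent=2))
    return log


# Usage with placeholder values.
train_idx, test_idx = sample_and_split(n_rows=150_000)

start = time.perf_counter()
# ... fit the new method on the rows in train_idx here ...
training_time_seconds = round(time.perf_counter() - start, 1)

with tempfile.TemporaryDirectory() as exp:
    # Placeholder earlier entry standing in for the real Phase 1-3 entries.
    Path(exp, "log.json").write_text(json.dumps({"phases": [{"name": "Phase 3"}]}))
    log = append_phase4_entry(
        exp,
        metrics={"training_time_seconds": training_time_seconds},
        training={"split": 0.2, "seed": 42, "stratified": False},
    )

print(len(train_idx), len(test_idx), [p["name"] for p in log["phases"]])
# 16000 4000 ['Phase 3', 'Phase 4: Implement']
```

Reading the whole file and appending to `phases` (rather than rewriting it from scratch) is what keeps earlier entries intact.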

## Output
- `{experiment_path}/new.ipynb` — complete, executed notebook
- `{experiment_path}/new_requirements.txt` — pinned `package==version` list for the new notebook
- `{experiment_path}/log.json` — updated with Phase 4 implementation entry
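
For `new_requirements.txt` and the `new_dependencies_installed` field of the log entry, installed versions can be read from package metadata. A minimal sketch using the standard library's `importlib.metadata`; the helper names are hypothetical, and the names passed in must be distribution names as pip knows them (e.g. `scikit-learn`, not `sklearn`).

```python
import tempfile
from importlib import metadata
from pathlib import Path


def pin_packages(names):
    """Hypothetical helper: look up installed versions for distribution names."""
    pinned, missing = {}, []
    for name in names:
        try:
            pinned[name] = metadata.version(name)
        except metadata.PackageNotFoundError:
            missing.append(name)  # not installed here; report rather than guess a version
    return pinned, missing


def write_requirements(experiment_path, pinned):
    """Write one `package==version` line per package to new_requirements.txt."""
    path = Path(experiment_path) / "new_requirements.txt"
    path.write_text("".join(f"{name}=={ver}\n" for name, ver in sorted(pinned.items())))
    return path


def dependency_log_entries(pinned, new_names):
    """Build the `new_dependencies_installed` list for the Phase 4 log entry."""
    return [{"name": n, "version": pinned[n]} for n in new_names if n in pinned]


# Usage: baseline packages plus the paper's new ones (names here are placeholders).
pinned, missing = pin_packages(["pip", "not-a-real-package-xyz"])
with tempfile.TemporaryDirectory() as exp:
    requirements_text = write_requirements(exp, pinned).read_text()
print(requirements_text, missing)
```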
