Skill701 estrellas del repoactualizado 4d ago

knowledge-synthesis

knowledge-synthesis combines search results from multiple enterprise sources (chat, email, documents, project trackers) into single, coherent narratives while removing duplicates and attributing each claim to its source. Use it when search returns overlapping information across different systems and you need one clear answer with full provenance rather than reviewing scattered results manually.

Ver fuente Repositorio: openyak

Instalar en Claude Code

Copiar

git clone --depth 1 https://github.com/openyak/openyak /tmp/knowledge-synthesis && cp -r /tmp/knowledge-synthesis/backend/app/data/plugins/enterprise-search/skills/knowledge-synthesis ~/.claude/skills/knowledge-synthesis

Después abre una sesión nueva de Claude Code; el skill carga automáticamente.

Definición

SKILL.md

# Knowledge Synthesis

The last mile of enterprise search. Takes raw results from multiple sources and produces a coherent, trustworthy answer.

## The Goal

Transform this:
```
~~chat result: "Sarah said in #eng: 'let's go with REST, GraphQL is overkill for our use case'"
~~email result: "Subject: API Decision — Sarah's email confirming REST approach with rationale"
~~cloud storage result: "API Design Doc v3 — updated section 2 to reflect REST decision"
~~project tracker result: "Task: Finalize API approach — marked complete by Sarah"
```

Into this:
```
The team decided to go with REST over GraphQL for the API redesign. Sarah made the
call, noting that GraphQL was overkill for the current use case. This was discussed
in #engineering on Tuesday, confirmed via email Wednesday, and the design doc has
been updated to reflect the decision. The related ~~project tracker task is marked complete.

Sources:
- ~~chat: #engineering thread (Jan 14)
- ~~email: "API Decision" from Sarah (Jan 15)
- ~~cloud storage: "API Design Doc v3" (updated Jan 15)
- ~~project tracker: "Finalize API approach" (completed Jan 15)
```

## Deduplication

### Cross-Source Deduplication

The same information often appears in multiple places. Identify and merge duplicates:

**Signals that results are about the same thing:**
- Same or very similar text content
- Same author/sender
- Timestamps within a short window (same day or adjacent days)
- References to the same entity (project name, document, decision)
- One source references another ("as discussed in ~~chat", "per the email", "see the doc")

**How to merge:**
- Combine into a single narrative item
- Cite all sources where it appeared
- Use the most complete version as the primary text
- Add unique details from each source

### Deduplication Priority

When the same information exists in multiple sources, prefer:
```
1. The most complete version (fullest context)
2. The most authoritative source (official doc > chat)
3. The most recent version (latest update wins for evolving info)
```

### What NOT to Deduplicate

Keep as separate items when:
- The same topic is discussed but with different conclusions
- Different people express different viewpoints
- The information evolved meaningfully between sources (v1 vs v2 of a decision)
- Different time periods are represented

## Citation and Source Attribution

Every claim in the synthesized answer must be attributable to a source.

### Attribution Format

Inline for direct references:
```
Sarah confirmed the REST approach in her email on Wednesday.
The design doc was updated to reflect this (~~cloud storage: "API Design Doc v3").
```

Source list at the end for completeness:
```
Sources:
- ~~chat: #engineering discussion (Jan 14) — initial decision thread
- ~~email: "API Decision" from Sarah Chen (Jan 15) — formal confirmation
- ~~cloud storage: "API Design Doc v3" last modified Jan 15 — updated specification
```

### Attribution Rules

- Always name the source type (~~chat, ~~email, ~~cloud storage, etc.)
- Include the specific location (channel, folder, thread)
- Include the date or relative time
- Include the author when relevant
- Include document/thread titles when available
- For ~~chat, note the channel name
- For ~~email, note the subject line and sender
- For ~~cloud storage, note the document title

## Confidence Levels

Not all results are equally trustworthy. Assess confidence based on:

### Freshness

| Recency | Confidence impact |
|---------|------------------|
| Today / yesterday | High confidence for current state |
| This week | Good confidence |
| This month | Moderate — things may have changed |
| Older than a month | Lower confidence — flag as potentially outdated |

For status queries, heavily weight freshness. For policy/factual queries, freshness matters less.

### Authority

| Source type | Authority level |
|-------------|----------------|
| Official wiki / knowledge base | Highest — curated, maintained |
| Shared documents (final versions) | High — intentionally published |
| Email announcements | High — formal communication |
| Meeting notes | Moderate-high — may be incomplete |
| Chat messages (thread conclusions) | Moderate — informal but real-time |
| Chat messages (mid-thread) | Lower — may not reflect final position |
| Draft documents | Low — not finalized |
| Task comments | Contextual — depends on commenter |

### Expressing Confidence

When confidence is high (multiple fresh, authoritative sources agree):
```
The team decided to use REST for the API redesign. [direct statement]
```

When confidence is moderate (single source or somewhat dated):
```
Based on the discussion in #engineering last month, the team was leaning
toward REST for the API redesign. This may have evolved since then.
```

When confidence is low (old data, informal source, or conflicting signals):
```
I found a reference to an API migration discussion from three months ago
in ~~chat, but I couldn't find a formal decision document. The information
may be outdated. You might want to check with the team for current status.
```

### Conflicting Information

When sources disagree:
```
I found conflicting information about the API approach:
- The ~~chat discussion on Jan 10 suggested GraphQL
- But Sarah's email on Jan 15 confirmed REST
- The design doc (updated Jan 15) reflects REST

The most recent sources indicate REST was the final decision,
but the earlier ~~chat discussion explored GraphQL first.
```

Always surface conflicts rather than silently picking one version.

## Summarization Strategies

### For Small Result Sets (1-5 results)

Present each result with context. No summarization needed — give the user everything:
```
[Direct answer synthesized from results]

[Detail from source 1]
[Detail from source 2]

Sources: [full attribution]
```

### For Medium Result Sets (5-15 results)

Group by theme and summarize each group:
```
[Overall answer]

Theme 1: [summary of related results]
Theme 2: [summary of related re

Del mismo repositorio

instrument-data-to-allotropeSkill

Convert laboratory instrument output files (PDF, CSV, Excel, TXT) to Allotrope Simple Model (ASM) JSON format or flattened 2D CSV. Use this skill when scientists need to standardize instrument data for LIMS systems, data lakes, or downstream analysis. Supports auto-detection of instrument types. Outputs include full ASM JSON, flattened CSV for easy import, and exportable Python code for data engineers. Common triggers include converting instrument files, standardizing lab data, preparing data for upload to LIMS/ELN systems, or generating parser code for production pipelines.

nextflow-developmentSkill

Run nf-core bioinformatics pipelines (rnaseq, sarek, atacseq) on sequencing data. Use when analyzing RNA-seq, WGS/WES, or ATAC-seq data—either local FASTQs or public datasets from GEO/SRA. Triggers on nf-core, Nextflow, FASTQ analysis, variant calling, gene expression, differential expression, GEO reanalysis, GSE/GSM/SRR accessions, or samplesheet creation.

scientific-problem-selectionSkill

This skill should be used when scientists need help with research problem selection, project ideation, troubleshooting stuck projects, or strategic scientific decisions. Use this skill when users ask to pitch a new research idea, work through a project problem, evaluate project risks, plan research strategy, navigate decision trees, or get help choosing what scientific problem to work on. Typical requests include "I have an idea for a project", "I'm stuck on my research", "help me evaluate this project", "what should I work on", or "I need strategic advice about my research".

scvi-toolsSkill

Deep learning for single-cell analysis using scvi-tools. This skill should be used when users need (1) data integration and batch correction with scVI/scANVI, (2) ATAC-seq analysis with PeakVI, (3) CITE-seq multi-modal analysis with totalVI, (4) multiome RNA+ATAC analysis with MultiVI, (5) spatial transcriptomics deconvolution with DestVI, (6) label transfer and reference mapping with scANVI/scArches, (7) RNA velocity with veloVI, or (8) any deep learning-based single-cell method. Triggers include mentions of scVI, scANVI, totalVI, PeakVI, MultiVI, DestVI, veloVI, sysVI, scArches, variational autoencoder, VAE, batch correction, data integration, multi-modal, CITE-seq, multiome, reference mapping, latent space.

single-cell-rna-qcSkill

Performs quality control on single-cell RNA-seq data (.h5ad or .h5 files) using scverse best practices with MAD-based filtering and comprehensive visualizations. Use when users request QC analysis, filtering low-quality cells, assessing data quality, or following scverse/scanpy best practices for single-cell analysis.

startSkill

Set up your bio-research environment and explore available tools. Use when first getting oriented with the plugin, checking which literature, drug-discovery, or visualization MCP servers are connected, or surveying available analysis skills before starting a new project.

cowork-plugin-customizerSkill

create-cowork-pluginSkill