Skill · 29.2k repo stars · updated today

memory-load-check

The memory-load-check skill reviews code changes for unbounded data loading into Node/Bun processes, particularly in background jobs, cron tasks, file parsing, API integrations, and migrations. Use it when examining PRs that read multiple rows, files, API responses, or logs into memory to verify explicit bounds on row counts, byte limits, concurrent operations, and pagination to prevent memory exhaustion and process crashes.

View source · Repository: sim

Install in Claude Code

git clone --depth 1 https://github.com/simstudioai/sim /tmp/memory-load-check && cp -r /tmp/memory-load-check/.agents/skills/memory-load-check ~/.claude/skills/memory-load-check

Then open a new Claude Code session; the skill loads automatically.

Definition

SKILL.md

# Memory Load Check

Use this skill when a PR or diff could load unbounded data into a Node/Bun process, especially in cron routes, background tasks, API routes, workflow execution, file parsing, cleanup jobs, migrations, import/export flows, and external API integrations.

## Review Goal

Prove each changed path has explicit bounds for:
- rows held in memory
- bytes held in memory
- concurrent promises, DB queries, HTTP calls, storage operations, and jobs
- number of pages, batches, chunks, retries, and retained intermediate objects

If any bound depends only on current production size or "probably small" data, treat it as a finding.
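
A minimal sketch of what an explicit concurrency bound can look like, as an alternative to an unbounded `Promise.all(items.map(fn))`. `mapWithConcurrency` is a hypothetical helper, not a Sim or Node API. It bounds in-flight work only: the `items` array must already be bounded by a row cap or pagination.

```typescript
// Hypothetical helper (not a Sim or Node API): maps `fn` over `items` with at
// most `limit` calls pending at once, preserving input order in the result.
async function mapWithConcurrency<T, R>(
  items: readonly T[],
  limit: number,
  fn: (item: T, index: number) => Promise<R>,
): Promise<R[]> {
  if (!Number.isInteger(limit) || limit < 1) {
    throw new RangeError(`limit must be a positive integer, got ${limit}`);
  }
  const results = new Array<R>(items.length);
  let next = 0; // index of the next item no worker has started yet

  // Each worker claims the next index, awaits it, then claims another, so at
  // most `limit` promises from `fn` are pending at any moment.
  async function worker(): Promise<void> {
    while (next < items.length) {
      const index = next++;
      results[index] = await fn(items[index], index);
    }
  }

  // This Promise.all is bounded: it waits on at most `limit` workers.
  const workerCount = Math.min(limit, items.length);
  await Promise.all(Array.from({ length: workerCount }, () => worker()));
  return results;
}
```

The worker-pool shape keeps the pending-promise count fixed regardless of input size, which is the property a reviewer should be able to point to.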

## References

Read these when doing a deeper pass:
- Node.js streams/backpressure: https://nodejs.org/learn/modules/backpressuring-in-streams
- Node.js stream usage: https://nodejs.org/en/learn/modules/how-to-use-streams
- Keyset/cursor pagination over offset scans: https://blog.sequinstream.com/keyset-cursors-not-offsets-for-postgres-pagination/
- Postgres pagination tradeoffs: https://www.citusdata.com/blog/2016/03/30/five-ways-to-paginate/
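
The keyset approach from these references can be sketched as an async generator that holds one page in memory at a time and carries the last seen `id` forward, in the same `WHERE id > afterId ORDER BY id LIMIT N` shape the cleanup dispatcher uses. `keysetPages`, `FetchPage`, and the numeric `id` column are assumptions for illustration. The `maxPages` backstop turns "probably small" into an explicit bound.

```typescript
// Hypothetical row shape: any table with a unique, sortable numeric `id`.
interface Row {
  id: number;
}

// Hypothetical page fetcher. In real code this would run something like
// SELECT ... WHERE id > $afterId ORDER BY id LIMIT $limit (no OFFSET).
type FetchPage<R extends Row> = (afterId: number | null, limit: number) => Promise<R[]>;

// Yields one page at a time; only the current page is held by this generator.
async function* keysetPages<R extends Row>(
  fetchPage: FetchPage<R>,
  pageSize: number,
  maxPages: number,
): AsyncGenerator<R[]> {
  let afterId: number | null = null;
  for (let page = 0; page < maxPages; page++) {
    const rows = await fetchPage(afterId, pageSize);
    if (rows.length === 0) return;
    yield rows;
    if (rows.length < pageSize) return; // a short page means the scan is done
    afterId = rows[rows.length - 1].id; // cursor = last id of this page
  }
  // Every page so far was full, so rows may remain: fail loudly instead of
  // silently truncating the scan.
  throw new Error(`keysetPages: reached maxPages=${maxPages} with rows possibly remaining`);
}
```

Unlike `OFFSET`, the cursor query stays an index range scan no matter how deep the loop goes.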

## Sim Helpers To Prefer

- `apps/sim/lib/cleanup/batch-delete.ts`
  - `chunkedBatchDelete`: bounded SELECT -> optional side effect -> DELETE loop.
  - `batchDeleteByWorkspaceAndTimestamp`: common workspace/timestamp cleanup wrapper.
  - `selectRowsByIdChunks`: chunks large ID sets and enforces an overall row cap.
  - `chunkArray`: use only after the input set itself is already bounded.
- `apps/sim/lib/core/utils/stream-limits.ts`
  - `PayloadSizeLimitError`
  - `assertKnownSizeWithinLimit`
  - `assertContentLengthWithinLimit`
  - `readStreamToBufferWithLimit`
  - `readNodeStreamToBufferWithLimit`
  - `readResponseToBufferWithLimit`
  - `readResponseTextWithLimit`
- Cleanup dispatcher pattern in `apps/sim/lib/billing/cleanup-dispatcher.ts`
  - page active workspaces with `WHERE id > afterId ORDER BY id LIMIT N`
  - dispatch concrete chunks (`workspaceIds`, retention, label) instead of one giant scope
  - prefer Trigger.dev queue/concurrency keys when available
  - execute inline fallback chunks sequentially, not with unbounded `Promise.all`
- File parse route pattern in `apps/sim/app/api/files/parse/route.ts`
  - cap downloads and parsed output separately
  - preserve partial results when a later item exceeds the cap
  - never read untrusted response bodies without a byte cap
- KB connector file downloads in `apps/sim/connectors/utils.ts`
  - `CONNECTOR_MAX_FILE_BYTES`: shared per-file cap (aligned with the manual KB upload limit)
  - `readBodyWithLimit`: stream a download body to a Buffer with a hard byte cap (null on overflow)
  - `stubOrSkipBySize`: listing-time skip when the reported size exceeds the cap
  - `markSkipped` / `sizeLimitSkipReason`: surface oversized files as failed (skipped) KB rows
  - `ConnectorFileTooLargeError`: thrown mid-download when the listing under-reported size
- Large workflow value payloads
  - prefer durable references/manifests over inlining large arrays or files
  - materialize refs only behind an explicit byte budget
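
As an illustration of the bounded SELECT -> optional side effect -> DELETE loop listed for `chunkedBatchDelete`, here is a minimal sketch with hypothetical callbacks. It is not Sim's implementation, and `chunkedDeleteLoop` / `ChunkedDeleteOptions` are invented names. Each iteration holds at most `batchSize` ids, and the loop stops after `maxBatches`, leaving any remainder for the next run.

```typescript
// Hypothetical callbacks standing in for real DB and storage calls.
interface ChunkedDeleteOptions<Id> {
  selectBatch: (limit: number) => Promise<Id[]>; // bounded SELECT ... LIMIT
  beforeDelete?: (ids: Id[]) => Promise<void>; // optional side effect, e.g. storage cleanup
  deleteBatch: (ids: Id[]) => Promise<number>; // DELETE ... WHERE id IN (...)
  batchSize: number;
  maxBatches: number; // hard cap per run so a huge backlog cannot loop unbounded
}

// Hypothetical loop: returns how many rows were deleted in this run.
async function chunkedDeleteLoop<Id>(opts: ChunkedDeleteOptions<Id>): Promise<number> {
  let deleted = 0;
  for (let batch = 0; batch < opts.maxBatches; batch++) {
    // Only `batchSize` ids are held in memory at any time.
    const ids = await opts.selectBatch(opts.batchSize);
    if (ids.length === 0) break;
    if (opts.beforeDelete) await opts.beforeDelete(ids);
    deleted += await opts.deleteBatch(ids);
    if (ids.length < opts.batchSize) break; // last, partial batch
  }
  // Stopping at maxBatches leaves remaining rows for the next cron tick.
  return deleted;
}
```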

## KB Connector File Size Handling

The connector size pattern in `apps/sim/connectors/utils.ts` (`CONNECTOR_MAX_FILE_BYTES` + `readBodyWithLimit` + `stubOrSkipBySize`/`markSkipped`) exists for one risk: a knowledge-base connector downloading **arbitrary, user-controlled file bytes** that the source does not hard-cap. Apply it by that risk, not by the connector's name.

Use the pattern when the connector downloads file content via a stream/`download_url` where the user controls the size:
- file-storage connectors: Dropbox, OneDrive, SharePoint, Google Drive, S3, GitHub, GitLab, Azure DevOps
- any connector that fetches a file via a download URL even if it is not a "storage" service (e.g. the Zoom transcript `.vtt`)

For those, require all three:
- stream the body with `readBodyWithLimit(resp, CONNECTOR_MAX_FILE_BYTES)` — never raw `response.text()`/`response.arrayBuffer()`
- skip oversize at listing (`stubOrSkipBySize` with the reported size) and again at fetch time (overflow -> `markSkipped`), since the listing size can be missing or under-reported
- never drop/truncate silently — oversized files become content-less failed rows carrying `skippedReason`, so they stay visible in the KB UI instead of vanishing from the index
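
The three requirements above can be sketched together, assuming a WHATWG `Response` (Node 18+ or Bun). `readBodyCapped`, `fetchConnectorFile`, and `SyncResult` are hypothetical stand-ins modeled on the descriptions of `readBodyWithLimit`, `stubOrSkipBySize`, and `markSkipped`; they are not the actual Sim helpers or their signatures.

```typescript
// Hypothetical reader: streams `res.body` into a Buffer and returns null as
// soon as more than `maxBytes` arrive. Never calls res.text()/arrayBuffer().
async function readBodyCapped(res: Response, maxBytes: number): Promise<Buffer | null> {
  // Early exit on an oversized Content-Length. The header may be absent or
  // wrong, so the streaming check below still applies.
  const declared = Number(res.headers.get("content-length"));
  if (Number.isFinite(declared) && declared > maxBytes) {
    await res.body?.cancel();
    return null;
  }
  if (!res.body) return Buffer.alloc(0);

  const reader = res.body.getReader();
  const chunks: Uint8Array[] = [];
  let total = 0;
  while (true) {
    const { done, value } = await reader.read();
    if (done) break;
    total += value.byteLength;
    if (total > maxBytes) {
      await reader.cancel(); // stop the download instead of draining it
      return null;
    }
    chunks.push(value);
  }
  return Buffer.concat(chunks, total);
}

// Hypothetical result shape: oversized files become visible skipped rows.
type SyncResult =
  | { status: "ok"; content: Buffer }
  | { status: "skipped"; skippedReason: string };

async function fetchConnectorFile(
  reportedBytes: number | undefined,
  download: () => Promise<Response>,
  maxBytes: number,
): Promise<SyncResult> {
  const skippedReason = `File exceeds the ${maxBytes}-byte limit`;
  // Listing-time skip: no download at all when the reported size is too big.
  if (reportedBytes !== undefined && reportedBytes > maxBytes) {
    return { status: "skipped", skippedReason };
  }
  // Fetch-time cap: the listing size may be missing or under-reported.
  const content = await readBodyCapped(await download(), maxBytes);
  if (content === null) return { status: "skipped", skippedReason };
  return { status: "ok", content };
}
```

Both skip paths return a row carrying `skippedReason` rather than discarding the file, which keeps oversized files visible instead of silently dropping them.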

Skip the pattern when the source already bounds the payload:
- pure API/structured-data connectors (Jira, Linear, Notion, Confluence, Sentry, Slack, Zendesk, Gmail, ...) — paginated JSON/text; apply normal pagination + concurrency bounds instead of a per-file byte cap
- native-document connectors capped by the platform (Google Docs ~50 MB, Google Sheets via `MAX_ROWS`, Evernote ~25 MB/note) — a 100 MB cap can never fire, and wrapping a `response.json()`/Thrift parse in `readBodyWithLimit` is cargo-culting

Litmus test: "Can a user make this one fetch arbitrarily large, with nothing upstream stopping it?" Yes -> use the pattern. No (platform hard-cap, or already paginated) -> a per-file byte cap adds noise, not safety. Borderline: a user-configured/self-hosted endpoint with no platform cap (e.g. Obsidian) — bound it only if the content is genuinely unbounded.

## Review Workflow

1. Identify every changed data source:
   - database queries
   - storage lists/downloads/uploads
   - external API pagination
   - file reads and HTTP responses
   - workflow logs, snapshots, payloads, arrays, and manifests
   - queues, cron routes, and background jobs
2. For each source, write down the maximum cardinality and maximum bytes. If the code does not enforce one, it is unbounded.
3. Trace whether data is processed incrementally or accumulated:
   - arrays from `select`, `findMany`, `Promise.all`, `map`, `filter`, `flatMap`
   - maps/sets keyed by all users, workspaces, executions, files, or rows
   - `Buffer.concat`, `response.arrayBuffer()`, `response.text()`, `JSON.stringify`, `JSON.parse`
   - queues of promises or job payloads built before dispatch
4. Che
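
To make step 3 concrete, here is a minimal sketch contrasting an accumulating read with incremental processing of a newline-delimited JSON log. `summarizeLogLines` and the `{ level }` record shape are assumptions for illustration. The summary is constant-size, unlike a map keyed by every user or execution. One caveat: `readline` buffers each complete line, so `maxLineBytes` is checked only after a line is already in memory; a hard per-line cap needs a byte-level splitter.

```typescript
import { Readable } from "node:stream";
import { createInterface } from "node:readline";

// Accumulating version (flag this in review): the whole file string, the
// array of lines, and every parsed record are all live at once.
//   const text = await fs.promises.readFile(path, "utf8");
//   const records = text.split("\n").filter(Boolean).map((l) => JSON.parse(l));

// Hypothetical incremental version: one line and one parsed record at a time,
// folded into a constant-size summary.
interface LogSummary {
  lines: number;
  errors: number;
}

async function summarizeLogLines(input: Readable, maxLineBytes: number): Promise<LogSummary> {
  const summary: LogSummary = { lines: 0, errors: 0 };
  const rl = createInterface({ input, crlfDelay: Infinity });
  for await (const line of rl) {
    if (line.length === 0) continue; // tolerate blank lines
    // readline has already buffered this line; this check bounds parse work
    // and rejects pathological input, but is not a hard peak-memory cap.
    if (Buffer.byteLength(line, "utf8") > maxLineBytes) {
      throw new Error(`log line exceeds ${maxLineBytes} bytes`);
    }
    const record = JSON.parse(line) as { level?: string };
    summary.lines += 1;
    if (record.level === "error") summary.errors += 1;
  }
  return summary;
}
```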

From the same repository

add-block · Slash Command

Create a block configuration for a Sim integration with proper subBlocks, conditions, and tool wiring

add-connector · Slash Command

Add a knowledge base connector for syncing documents from an external source

add-enrichment · Slash Command

Add a code-defined table enrichment (registry entry) backed by a provider cascade, ensuring each provider tool has hosted-key support

add-hosted-key · Skill

Add hosted API key support to a tool so Sim provides the key when users don't bring their own. Use when adding hosted keys, BYOK support, hideWhenHosted, or hosted key pricing to a tool or block.

add-integration · Slash Command

Add a complete integration to Sim (tools, block, icon, registration)

add-model · Slash Command

Add a new LLM model to apps/sim/providers/models.ts with specs verified against the provider's live API docs (no hallucination)

add-tools · Slash Command

Create tool configurations for a Sim integration by reading API docs

add-trigger · Slash Command

Create webhook or polling triggers for a Sim integration