ClaudeWave
Skill · 429 repo stars · updated 10d ago

exp-simd-vectorization

This Claude Code skill optimizes performance-critical scalar loops in .NET 8+ applications by replacing them with either built-in `Span<T>` methods, `TensorPrimitives` API calls for mathematical operations on numeric arrays, or explicit SIMD intrinsics using Vector128/Vector256/Vector512 types. Use it when processing contiguous numeric data (byte, int, float, double, etc.) in hot paths where vectorization can significantly improve throughput for operations like aggregation, bitwise manipulation, type conversion, or element-wise math.

Install in Claude Code
git clone --depth 1 https://github.com/managedcode/dotnet-skills /tmp/exp-simd-vectorization && cp -r /tmp/exp-simd-vectorization/catalog/Platform/Official-DotNet-Experimental/skills/exp-simd-vectorization ~/.claude/skills/exp-simd-vectorization
Then open a new Claude Code session; the skill loads automatically.

SKILL.md

# SIMD Vectorization

## Decision Gate
1. **Check `Span<T>` and `MemoryExtensions` first.** If the operation can be expressed using built-in `Span<T>` methods (e.g., `Contains`, `IndexOf`, `CopyTo`, `SequenceEqual`) or `MemoryExtensions`, use them — no additional dependency is needed and the runtime already vectorizes many of these internally.
2. **Check for TensorPrimitives next.** If one or more `TensorPrimitives` (TP) methods cover the operation → use them. If the `.csproj` does NOT already reference `System.Numerics.Tensors`, **add the package**, for example: `<PackageReference Include="System.Numerics.Tensors" />` (or use the versioning approach already used by your solution). Then replace the scalar loop with TP calls and stop. See the full API table below. Compose multiple TP calls when needed (e.g., finding both min and max → `TensorPrimitives.Min(span)` + `TensorPrimitives.Max(span)` as two calls). Do NOT write manual Vector128 code for operations TP already handles.
3. **Scalar loop over contiguous array/span** of `byte`, `sbyte`, `short`, `ushort`, `int`, `uint`, `long`, `ulong`, `nint`, `nuint`, `float`, `double` (and `char` via reinterpretation as `ushort`)? → Implement with explicit `Vector128<T>` / `Vector256<T>` / `Vector512<T>` intrinsics using the patterns below.
4. **No contiguous numeric arrays to process** (dictionary lookups, tree traversals, linked lists, state machines, string formatting, small collections, enum comparisons, recursive algorithms, decimal arithmetic)? → Report `[NO SIMD OPPORTUNITY]` and write a **full paragraph** explaining WHY, referencing the specific code characteristics that prevent vectorization (e.g., "State machines require sequential branching on enum values — there are no contiguous numeric arrays to process in parallel, and each transition depends on the previous state"). This explanation is graded.
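Step 1 of the gate can be illustrated with a minimal sketch. The class and method names below (`SpanBuiltIns`, `ContainsScalar`, and so on) are hypothetical and chosen only for illustration; the built-in calls are the `MemoryExtensions` methods named in step 1.

```csharp
using System;

static class SpanBuiltIns
{
    // Hand-written linear search: the kind of scalar loop step 1 targets.
    public static bool ContainsScalar(ReadOnlySpan<int> data, int value)
    {
        for (int i = 0; i < data.Length; i++)
        {
            if (data[i] == value) return true;
        }
        return false;
    }

    // Same result through the built-in extension method; no extra package needed.
    public static bool ContainsBuiltIn(ReadOnlySpan<int> data, int value)
        => data.Contains(value);

    // Index of the first zero byte, or -1 when there is none.
    public static int IndexOfZero(ReadOnlySpan<byte> data)
        => data.IndexOf((byte)0);

    // Element-by-element equality of two buffers (lengths must match too).
    public static bool SameBytes(ReadOnlySpan<byte> a, ReadOnlySpan<byte> b)
        => a.SequenceEqual(b);
}
```

If a loop reduces to one of these calls, the gate stops here and neither `TensorPrimitives` nor manual intrinsics are needed.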

## TensorPrimitives API Reference
TensorPrimitives APIs are generic and work for any primitive type that satisfies the method's generic constraints, not just `float`/`double`. For example, `Sum` requires `IAdditionOperators<T,T,T>` + `IAdditiveIdentity<T,T>` and works for all primitive numeric types, while `CosineSimilarity` requires `IRootFunctions<T>` and therefore works only for floating-point types such as `float` and `double`. If the project doesn't already reference `System.Numerics.Tensors`, add it to the `.csproj`. Replace the entire manual loop with **one or more** `TensorPrimitives` calls as needed (prefer a single call when possible):

### Reductions (span → scalar)
| Operation | API |
|-----------|-----|
| Sum | `TensorPrimitives.Sum(span)` |
| Sum of squares | `TensorPrimitives.SumOfSquares(span)` |
| Sum of magnitudes (L1 norm) | `TensorPrimitives.SumOfMagnitudes(span)` |
| L2 norm | `TensorPrimitives.Norm(span)` |
| Product of all elements | `TensorPrimitives.Product(span)` |
| Min value | `TensorPrimitives.Min(span)` |
| Max value | `TensorPrimitives.Max(span)` |
| Index of max | `TensorPrimitives.IndexOfMax(span)` |
| Index of min | `TensorPrimitives.IndexOfMin(span)` |
| Dot product | `TensorPrimitives.Dot(a, b)` |
| Cosine similarity | `TensorPrimitives.CosineSimilarity(a, b)` |
| Euclidean distance | `TensorPrimitives.Distance(a, b)` |
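As a rough sketch of how the reductions above replace a scalar loop, assuming the project references the `System.Numerics.Tensors` package: the wrapper class `ReductionExamples` and its method names are hypothetical. The `int` overload assumes a package version that ships the generic overloads.

```csharp
using System;
using System.Numerics.Tensors;

static class ReductionExamples
{
    // The scalar loop being replaced.
    public static float SumScalar(ReadOnlySpan<float> values)
    {
        float sum = 0f;
        for (int i = 0; i < values.Length; i++)
        {
            sum += values[i];
        }
        return sum;
    }

    // One TensorPrimitives call replaces the whole loop.
    public static float SumTensor(ReadOnlySpan<float> values)
        => TensorPrimitives.Sum(values);

    // Composing two calls, as the decision gate suggests for min + max.
    public static (float Min, float Max) MinMax(ReadOnlySpan<float> values)
        => (TensorPrimitives.Min(values), TensorPrimitives.Max(values));

    // Generic overload on a non-floating-point type
    // (assumes a package version with the generic APIs).
    public static int SumInt32(ReadOnlySpan<int> values)
        => TensorPrimitives.Sum<int>(values);

    // Two-input reduction; both spans must have the same length.
    public static float Dot(ReadOnlySpan<float> a, ReadOnlySpan<float> b)
        => TensorPrimitives.Dot(a, b);
}
```

A vectorized floating-point sum may add elements in a different order than the scalar loop, so results can differ in the last bits for inputs that are not exactly representable.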

### Element-wise transforms (span → span)
| Operation | API |
|-----------|-----|
| Negate | `TensorPrimitives.Negate(src, dst)` |
| Abs | `TensorPrimitives.Abs(src, dst)` |
| Sqrt | `TensorPrimitives.Sqrt(src, dst)` |
| Exp | `TensorPrimitives.Exp(src, dst)` |
| Log | `TensorPrimitives.Log(src, dst)` |
| Log2 | `TensorPrimitives.Log2(src, dst)` |
| Tanh | `TensorPrimitives.Tanh(src, dst)` |
| Sigmoid | `TensorPrimitives.Sigmoid(src, dst)` |
| SoftMax | `TensorPrimitives.SoftMax(src, dst)` |
| Sinh | `TensorPrimitives.Sinh(src, dst)` |
| Cosh | `TensorPrimitives.Cosh(src, dst)` |
| Round | `TensorPrimitives.Round(src, dst)` |
| Floor | `TensorPrimitives.Floor(src, dst)` |
| Ceiling | `TensorPrimitives.Ceiling(src, dst)` |
| CopySign | `TensorPrimitives.CopySign(src, sign, dst)` |
| Pow | `TensorPrimitives.Pow(bases, exponents, dst)` |
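The `(src, dst)` shape of these transforms can be sketched as follows, again assuming the `System.Numerics.Tensors` package is referenced. `ElementWiseExamples` and its method names are hypothetical. The destination must be at least as long as the source, and passing the same span as both source and destination performs the transform in place.

```csharp
using System;
using System.Numerics.Tensors;

static class ElementWiseExamples
{
    // Writes |src[i]| into a freshly allocated destination.
    public static float[] AbsToNewArray(ReadOnlySpan<float> src)
    {
        var dst = new float[src.Length];
        TensorPrimitives.Abs(src, dst);
        return dst;
    }

    // Source and destination are the same span, so the buffer is updated in place.
    public static void NegateInPlace(Span<float> values)
        => TensorPrimitives.Negate(values, values);

    // Caller supplies the destination, which avoids an allocation on hot paths.
    public static void SigmoidInto(ReadOnlySpan<float> src, Span<float> dst)
        => TensorPrimitives.Sigmoid(src, dst);
}
```

Letting the caller own the destination buffer is usually the better fit for hot paths, since the buffer can be reused across calls.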

### Two-span operations (a, b → dst)
| Operation | API |
|-----------|-----|
| Add | `TensorPrimitives.Add(a, b, dst)` |
| Subtract | `TensorPrimitives.Subtract(a, b, dst)` |
| Multiply | `TensorPrimitives.Multiply(a, b, dst)` |
| Divide | `TensorPrimitives.Divide(a, b, dst)` |
| Element-wise Min | `TensorPrimitives.Min(a, b, dst)` |
| Element-wise Max | `TensorPrimitives.Max(a, b, dst)` |
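Two-span operations compose naturally. The sketch below chains `Subtract` and `Multiply` to compute element-wise squared differences, assuming the `System.Numerics.Tensors` package is referenced; the class and method names are hypothetical.

```csharp
using System;
using System.Numerics.Tensors;

static class TwoSpanExamples
{
    // Computes (a[i] - b[i])^2 for every element using two TensorPrimitives calls.
    public static float[] SquaredDifferences(ReadOnlySpan<float> a, ReadOnlySpan<float> b)
    {
        if (a.Length != b.Length)
        {
            throw new ArgumentException("Inputs must have the same length.");
        }

        var dst = new float[a.Length];
        TensorPrimitives.Subtract(a, b, dst);     // dst = a - b
        TensorPrimitives.Multiply(dst, dst, dst); // dst = dst * dst, reusing the same buffer
        return dst;
    }
}
```

Reusing `dst` as both input and output of the second call keeps the composition at a single allocation.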

### Three-span fused operations
| Operation | API |
|-----------|-----|
| (x+y)*z | `TensorPrimitives.AddMultiply(x, y, z, dst)` |
| x*y+z | `TensorPrimitives.MultiplyAdd(x, y, z, dst)` |
| fma(x,y,z) | `TensorPrimitives.FusedMultiplyAdd(x, y, z, dst)` |

> `AddMultiply` and `MultiplyAdd` are distinct: they compute different expressions, `(x + y) * z` and `(x * y) + z`, so they are not interchangeable. `FusedMultiplyAdd` is the IEEE 754 fused form of `(x * y) + z` with a single rounding step, so its results can differ slightly from those of `MultiplyAdd`.
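A small worked example makes the difference between the two non-fused forms concrete. This is a sketch that assumes the `System.Numerics.Tensors` package is referenced; `FusedShapeExamples` is a hypothetical wrapper.

```csharp
using System;
using System.Numerics.Tensors;

static class FusedShapeExamples
{
    // (x + y) * z, element by element.
    public static float[] AddThenMultiply(ReadOnlySpan<float> x, ReadOnlySpan<float> y, ReadOnlySpan<float> z)
    {
        var dst = new float[x.Length];
        TensorPrimitives.AddMultiply(x, y, z, dst);
        return dst;
    }

    // (x * y) + z, element by element.
    public static float[] MultiplyThenAdd(ReadOnlySpan<float> x, ReadOnlySpan<float> y, ReadOnlySpan<float> z)
    {
        var dst = new float[x.Length];
        TensorPrimitives.MultiplyAdd(x, y, z, dst);
        return dst;
    }
}
```

With `x = [1, 2]`, `y = [3, 4]`, `z = [5, 6]`, the first method yields `[20, 36]` and the second yields `[8, 14]`.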

## Manual SIMD with Vector128/Vector256/Vector512

Use this when neither the built-in `Span<T>` methods nor one or more `TensorPrimitives` calls cover the operation. This is typically the case for byte-level operations, character class counting, range validation, bitwise bulk ops, cross-type conversions, and custom patterns.

### Required imports
```csharp
using System.Runtime.CompilerServices;
using System.Runtime.InteropServices;
using System.Runtime.Intrinsics;
```
Prefer cross-platform APIs (`System.Runtime.Intrinsics`). Only use platform-specific intrinsics (`System.Runtime.Intrinsics.X86`, `.Arm`) when there is a significant performance advantage that justifies the increased code complexity of maintaining separate code paths.
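A minimal sketch of a loop written only with the cross-platform `System.Runtime.Intrinsics` APIs: counting ASCII digits in a byte span, a character-class count that `TensorPrimitives` does not cover. The class `DigitCounter` and the exact dispatch conditions are assumptions made for illustration, not the skill's canonical pattern. The vector widths are selected with `if`/`else if`, and a scalar loop handles the remaining elements.

```csharp
using System;
using System.Numerics;
using System.Runtime.InteropServices;
using System.Runtime.Intrinsics;

static class DigitCounter
{
    public static int CountAsciiDigits(ReadOnlySpan<byte> data)
    {
        ref byte start = ref MemoryMarshal.GetReference(data);
        nuint length = (nuint)data.Length;
        nuint i = 0;
        int count = 0;

        // (b - '0') <= 9 as an unsigned compare is true exactly for '0'..'9'.
        if (Vector512.IsHardwareAccelerated && length >= (nuint)Vector512<byte>.Count)
        {
            var zero = Vector512.Create((byte)'0');
            var nine = Vector512.Create((byte)9);
            nuint last = length - (nuint)Vector512<byte>.Count;
            for (; i <= last; i += (nuint)Vector512<byte>.Count)
            {
                var mask = Vector512.LessThanOrEqual(Vector512.LoadUnsafe(ref start, i) - zero, nine);
                count += BitOperations.PopCount(mask.ExtractMostSignificantBits());
            }
        }
        else if (Vector256.IsHardwareAccelerated && length >= (nuint)Vector256<byte>.Count)
        {
            var zero = Vector256.Create((byte)'0');
            var nine = Vector256.Create((byte)9);
            nuint last = length - (nuint)Vector256<byte>.Count;
            for (; i <= last; i += (nuint)Vector256<byte>.Count)
            {
                var mask = Vector256.LessThanOrEqual(Vector256.LoadUnsafe(ref start, i) - zero, nine);
                count += BitOperations.PopCount(mask.ExtractMostSignificantBits());
            }
        }
        else if (Vector128.IsHardwareAccelerated && length >= (nuint)Vector128<byte>.Count)
        {
            var zero = Vector128.Create((byte)'0');
            var nine = Vector128.Create((byte)9);
            nuint last = length - (nuint)Vector128<byte>.Count;
            for (; i <= last; i += (nuint)Vector128<byte>.Count)
            {
                var mask = Vector128.LessThanOrEqual(Vector128.LoadUnsafe(ref start, i) - zero, nine);
                count += BitOperations.PopCount(mask.ExtractMostSignificantBits());
            }
        }

        // Scalar fallback: short inputs and the leftover tail after the vector loop.
        for (; i < length; i++)
        {
            if ((uint)(data[(int)i] - '0') <= 9u) count++;
        }
        return count;
    }
}
```

No `X86` or `Arm` namespace appears here; the same source compiles for every target, and the runtime picks whichever width the hardware accelerates.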

### Three-tier dispatch pattern
Always include all three tiers. Use `if`/`else if` so that small inputs hit only one branch before reaching the scalar fallback — a fallthrough pattern (sequential `if`s) pessimizes the scalar case by requiring up to three not-taken branches that may mispredict. The `IsHardwareAccelerated` checks are JIT-time consta