distributed-storage

Distributed storage systems design and operation for cloud platforms. Covers the GFS/HDFS block-and-master pattern, object storage (Swift/S3) with consistent hashing and eventual consistency, block storage semantics, replication vs erasure coding, the CAP theorem in practice, read-repair and anti-entropy, snapshot chains, and the GFS/BigTable/Spanner evolution. Use when designing a storage subsystem, choosing between object/block/file, or reviewing a replication and consistency strategy.

Install in Claude Code
git clone --depth 1 https://github.com/Tibsfox/gsd-skill-creator /tmp/distributed-storage && cp -r /tmp/distributed-storage/examples/skills/cloud-systems/distributed-storage ~/.claude/skills/distributed-storage
Then start a new Claude Code session; the skill loads automatically.

SKILL.md

# Distributed Storage

Distributed storage is where durability, consistency, latency, and cost collide. Every design choice trades among them, and the trade-offs cannot be hidden from applications for long. This skill covers the landmark systems and the recurring patterns they embody, from the GFS master-chunkserver split that defined cloud-scale storage, to the Dynamo-style consistent-hash rings that power modern object stores, to the Spanner-style externally consistent databases built on TrueTime.

**Agent affinity:** ghemawat (GFS and storage systems craftsmanship), dean (BigTable, Spanner, and the evolution from GFS), decandia (Dynamo and eventually consistent stores)

**Concept IDs:** cloud-cinder-block-storage, cloud-swift-glance-object-image, cloud-nova-instances

## The Three Storage Shapes

Cloud platforms expose storage in three shapes, each with a different access model and consistency profile.

**Object storage.** Immutable blobs with rich metadata, accessed by key, usually over HTTP. Examples: Amazon S3, OpenStack Swift, Google Cloud Storage. Consistent hashing distributes objects across nodes; replication or erasure coding provides durability. Objects are typically appended to or replaced whole; in-place partial updates are not a native operation. Best for: media, backups, logs, analytics input, static web content.

**Block storage.** Virtual disks presented as block devices to virtual machines. Examples: Amazon EBS, OpenStack Cinder, Google Persistent Disk. Provides the POSIX-ish semantics of a local disk but with the durability and mobility of a network service. Typically attached to a single instance at a time. Best for: databases, filesystems, anything that expects random-access block semantics.

**File storage.** A shared filesystem accessed via NFS, SMB, or similar protocols. Examples: Amazon EFS, OpenStack Manila, Google Cloud Filestore. Multiple clients mount the same filesystem and see each other's writes. Best for: legacy applications expecting a POSIX filesystem and shared access.

These are not interchangeable. Picking the wrong shape — object where you need block, block where you need file, file where you need object — produces workarounds that dominate the cost of running the system.

## The GFS Pattern: Master and Chunkservers

The Google File System (Ghemawat, Gobioff, Leung, 2003) introduced the architecture that most large distributed filesystems still use:

- **Single master.** Holds all metadata: namespace, access control, mapping from files to chunks, mapping from chunks to chunkservers. Metadata fits in RAM for the scales GFS was built for.
- **Chunkservers.** Hold actual file data in 64 MB chunks (later 64-256 MB in successors). Each chunk is stored as three replicas by default, placed in different failure domains.
- **Clients.** Query the master for chunk locations, then read and write directly from/to chunkservers. The master is not on the data path.
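The split between the metadata path and the data path can be sketched in a few lines. This is a minimal illustration, not GFS code: the `Master` and `Chunkserver` classes, the `locate` and `read` helpers, and the chunk handles are hypothetical names, and the read is assumed to stay within one chunk.

```python
from dataclasses import dataclass, field

CHUNK_SIZE = 64 * 1024 * 1024  # 64 MB, the chunk size described above


@dataclass
class Master:
    """Metadata only: (path, chunk index) -> (chunk handle, replica names)."""
    chunk_table: dict = field(default_factory=dict)

    def lookup(self, path, chunk_index):
        return self.chunk_table[(path, chunk_index)]


@dataclass
class Chunkserver:
    """Holds chunk contents keyed by chunk handle (hypothetical in-memory store)."""
    chunks: dict = field(default_factory=dict)

    def read(self, handle, start, length):
        return self.chunks[handle][start:start + length]


def locate(offset, chunk_size=CHUNK_SIZE):
    """Translate a file byte offset into (chunk index, offset within the chunk)."""
    return divmod(offset, chunk_size)


def read(master, servers, path, offset, length, chunk_size=CHUNK_SIZE):
    """One metadata lookup on the master, then a direct read from a chunkserver."""
    index, start = locate(offset, chunk_size)
    handle, replicas = master.lookup(path, index)             # control path
    return servers[replicas[0]].read(handle, start, length)   # data path
```

The master returns only a handle and a list of replicas; the bytes themselves never pass through it, which is why a single master can serve a large cluster.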

Key design decisions that shaped the cloud era:

- Optimize for large files and append-heavy workloads, not random writes.
- Assume component failures are routine, not exceptional.
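- Push consistency compromises to the application. GFS has a relaxed consistency model: mutations are applied to a chunk in the same order on all of its replicas, but a file region can end up "defined", "consistent but undefined" (after concurrent writes), or inconsistent (after a failed write), so it is not fully linearizable.
- The master's single point of failure is mitigated by replicating its operation log and checkpoints to other machines, by read-only shadow masters, and by operational processes, not by distributed consensus.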

HDFS (Hadoop Distributed File System) is an open-source reimplementation of the GFS design with slightly different choices: a default block size of 128 MB in current releases (64 MB in early ones), and a simpler write-once, single-writer file model.

## Object Storage: Consistent Hashing and Eventual Consistency

Object stores at cloud scale are typically built around consistent hashing rings, as in Swift and in Dynamo-style stores such as Riak and Cassandra. Ceph RADOS uses CRUSH, a related hash-based placement algorithm, instead of a ring. Either way, objects are distributed across nodes without a central metadata server, and node additions and removals cause minimal data movement.

**Consistent hashing basics.**

1. Hash each node onto a ring (a large integer space).
2. Hash each object key onto the same ring.
3. An object is assigned to the first node found by walking clockwise from its hash.
4. For replication, use the first N nodes.
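The four steps above can be sketched as a small ring class. This is a minimal illustration under stated assumptions: MD5 is used only as a convenient, evenly distributed hash, and `HashRing`, `ring_position`, and `replicas` are hypothetical names rather than the API of any particular object store.

```python
import bisect
import hashlib


def ring_position(key: str) -> int:
    """Map a string to a position on the ring (a 128-bit integer space)."""
    return int.from_bytes(hashlib.md5(key.encode("utf-8")).digest(), "big")


class HashRing:
    def __init__(self, nodes):
        # Step 1: hash each node onto the ring, kept sorted by position.
        self._ring = sorted((ring_position(n), n) for n in nodes)
        self._positions = [p for p, _ in self._ring]

    def replicas(self, key: str, n: int = 1):
        """Steps 2-4: hash the key, walk clockwise, collect the first n distinct nodes."""
        if not self._ring:
            return []
        start = bisect.bisect_left(self._positions, ring_position(key))
        found = []
        for i in range(len(self._ring)):
            node = self._ring[(start + i) % len(self._ring)][1]  # wrap past the end
            if node not in found:
                found.append(node)
            if len(found) == n:
                break
        return found
```

For example, `HashRing(["node-a", "node-b", "node-c"]).replicas("photo.jpg", 2)` returns the two nodes responsible for that key. Removing a node only reassigns the keys that node owned; every other key keeps its owner.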

**Virtual nodes.** To smooth load imbalances, each physical node contributes many "virtual nodes" at different ring positions. This reduces the variance of the load distribution.
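The effect of virtual nodes can be measured directly. The sketch below is illustrative only: the `node#i` labelling scheme, the MD5 hash, and the helper names `key_counts` and `imbalance` are assumptions, and real systems choose their own token assignment.

```python
import bisect
import hashlib
from collections import Counter


def _pos(label: str) -> int:
    """Position of a label on a 128-bit ring."""
    return int.from_bytes(hashlib.md5(label.encode("utf-8")).digest(), "big")


def key_counts(nodes, vnodes, keys):
    """Count keys owned by each physical node when each node has `vnodes` ring positions."""
    ring = sorted((_pos(f"{node}#{i}"), node) for node in nodes for i in range(vnodes))
    positions = [p for p, _ in ring]
    counts = Counter({node: 0 for node in nodes})
    for key in keys:
        idx = bisect.bisect_left(positions, _pos(key)) % len(ring)  # wrap around
        counts[ring[idx][1]] += 1
    return counts


def imbalance(counts):
    """Ratio of the busiest node's load to the mean load (1.0 is perfectly even)."""
    mean = sum(counts.values()) / len(counts)
    return max(counts.values()) / mean


if __name__ == "__main__":
    nodes = [f"node-{i}" for i in range(5)]
    keys = [f"object-{i}" for i in range(10_000)]
    for v in (1, 16, 256):
        print(v, round(imbalance(key_counts(nodes, v, keys)), 2))
```

Running it prints the imbalance ratio for 1, 16, and 256 virtual nodes per physical node; the ratio should move toward 1.0 as the count grows, at the cost of a larger ring to store and search.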

**Consistency profile.** Eventually consistent object stores typically provide read-your-writes consistency for a single object (after a write, subsequent reads from the same client see the new value) and eventual consistency across replicas (all replicas converge to the same value over time, but a read from a different client right after a write may see the old value). Some stores are stronger: Amazon S3 has offered strong read-after-write consistency since December 2020.

## Replication vs Erasure Coding

Two ways to achieve durability:

**Replication.** Store N complete copies. Simple, fast reads (read any replica), survives N-1 failures. Storage overhead is Nx (3x for three copies).

**Erasure coding.** Encode an object as K data chunks plus M parity chunks such that any K chunks can reconstruct the original. Storage overhead is (K+M)/K. Example: Reed-Solomon (10,4) needs 14 chunks total to store 10 chunks of data, survives 4 failures, storage overhead 1.4x.
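The overhead figures for both schemes follow directly from the formulas above. A small sketch of the arithmetic, with hypothetical helper names and exact fractions to avoid rounding noise:

```python
from fractions import Fraction


def replication_overhead(copies: int) -> Fraction:
    """N full copies cost N units of raw storage per unit of data."""
    return Fraction(copies, 1)


def erasure_overhead(k: int, m: int) -> Fraction:
    """K data chunks plus M parity chunks cost (K + M) / K units per unit of data."""
    return Fraction(k + m, k)


def raw_capacity(logical_tb: int, overhead: Fraction) -> Fraction:
    """Raw terabytes needed to store `logical_tb` terabytes of user data."""
    return logical_tb * overhead


if __name__ == "__main__":
    print(float(erasure_overhead(10, 4)))                   # 1.4
    print(raw_capacity(1000, replication_overhead(3)))      # 3000
    print(raw_capacity(1000, erasure_overhead(10, 4)))      # 1400
```

For 1000 TB of data, three-way replication needs 3000 TB of raw capacity and tolerates 2 lost copies, while Reed-Solomon (10,4) needs 1400 TB and tolerates 4 lost chunks.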

Erasure coding is dramatically more space-efficient but slower to read (may need to reconstruct from multiple chunks) and more complex to repair. Modern object stores use both: replication for hot data and recent writes, erasure coding for cold data.
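The repair cost is easiest to see with the simplest possible code: K data chunks plus a single XOR parity chunk (M = 1). This is a deliberately simplified stand-in, not Reed-Solomon, and the `encode` and `reconstruct` names are hypothetical; it only shows that rebuilding one lost chunk requires reading every surviving chunk.

```python
from functools import reduce


def xor_bytes(a: bytes, b: bytes) -> bytes:
    """Byte-wise XOR of two equal-length byte strings."""
    return bytes(x ^ y for x, y in zip(a, b))


def encode(data: bytes, k: int):
    """Split data into k equal chunks (zero-padded) and append one XOR parity chunk."""
    size = -(-len(data) // k)  # ceiling division
    padded = data.ljust(size * k, b"\0")
    chunks = [padded[i * size:(i + 1) * size] for i in range(k)]
    return chunks + [reduce(xor_bytes, chunks)]


def reconstruct(chunks):
    """Rebuild the single missing chunk (marked None) by XOR-ing all survivors."""
    missing = [i for i, c in enumerate(chunks) if c is None]
    if len(missing) != 1:
        raise ValueError("single parity can repair exactly one missing chunk")
    survivors = [c for c in chunks if c is not None]
    repaired = list(chunks)
    repaired[missing[0]] = reduce(xor_bytes, survivors)
    return repaired
```

With replication, repairing a lost copy means reading one surviving replica. Here, repairing one chunk reads all K survivors, which is the extra read and network cost the paragraph above refers to.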

## CAP Theorem in Practice

Brewer's CAP theorem says that in the presence of a network partition, you can have Consistency (every read sees the latest write) or Availability (every request gets a response), but not both. You cannot opt out of partitions — they happen — so the real choice is between CP and AP during a partition.

- **CP systems.** Consistency over availability. During a partition, minority partitions stop serving. Examples: ZooKeeper, etcd, Spanner.
- **AP systems.** Availability over consistency. During a partition, all partitions keep serving but may see different values. Exa