ethics-governance
Bias detection and mitigation, fairness metrics, privacy frameworks, consent models, transparency requirements, and accountability structures for data science practice. Covers algorithmic bias sources, disparate impact testing, differential privacy, GDPR principles, model cards, datasheets for datasets, responsible AI frameworks, and the organizational governance needed to make ethics actionable. Use when auditing models for bias, designing privacy-preserving systems, establishing governance processes, or evaluating the social impact of data-driven decisions.
git clone --depth 1 https://github.com/Tibsfox/gsd-skill-creator /tmp/ethics-governance && cp -r /tmp/ethics-governance/examples/skills/data-science/ethics-governance ~/.claude/skills/ethics-governance

SKILL.md
# Ethics and Governance in Data Science

Data science operates on people's data, affects people's lives, and encodes human decisions into automated systems. Ethics in data science is not an afterthought or a compliance checkbox -- it is a design requirement. Ruha Benjamin's concept of the "New Jim Code" names the reality that automated systems can reproduce and amplify existing social inequalities while appearing objective. This skill covers the principles, frameworks, and practices that make ethical data science concrete and actionable.

**Agent affinity:** benjamin (bias audit, fairness analysis, ethical review), nightingale (routing ethics queries), cairo (communicating ethical findings)

**Concept IDs:** data-privacy-consent, data-algorithmic-bias, data-data-ownership, data-responsible-practice

## Sources of Bias

### Where Bias Enters the Pipeline

Bias can enter at every stage of the data science workflow. It is not a property of algorithms alone -- it is a property of the system: data, design decisions, deployment context, and feedback loops.

| Stage | Bias type | Example |
|---|---|---|
| **Problem formulation** | Framing bias | Defining "success" as engagement maximizes addictive behavior |
| **Data collection** | Selection bias | Training a facial recognition system on predominantly light-skinned faces |
| **Data labeling** | Annotation bias | Labelers' cultural assumptions influence what counts as "toxic" speech |
| **Feature engineering** | Proxy bias | ZIP code encodes race due to residential segregation |
| **Model training** | Optimization bias | Minimizing overall error ignores disparate performance across subgroups |
| **Evaluation** | Metric bias | Reporting aggregate accuracy hides poor performance on minority groups |
| **Deployment** | Automation bias | Decision-makers defer to model output without scrutiny |
| **Feedback loops** | Amplification bias | Predictive policing increases patrols in targeted areas, generating more arrests, confirming the model |

### Historical Bias vs. Representation Bias

- **Historical bias:** The world is unequal, and data reflects that inequality. A hiring model trained on historical decisions inherits past discrimination. Even a "perfect" model of biased reality produces biased outputs.
- **Representation bias:** The training data does not represent the deployment population. A speech recognition system trained on American English performs poorly on other dialects. This is not a bug in the algorithm -- it is a gap in the data.

Both are real. Neither is solved by "better algorithms" alone. The fix requires changes to data collection, problem formulation, and deployment monitoring.

## Fairness Metrics

### Impossibility Theorem

Chouldechova (2017) and Kleinberg, Mullainathan, and Raghavan (2016) independently proved that three natural fairness criteria cannot all be satisfied simultaneously when base rates differ between groups:

1. **Calibration:** Among those predicted positive, the fraction truly positive is the same across groups.
2. **Equal false positive rate:** The rate of incorrectly predicting positive is the same across groups.
3. **Equal false negative rate:** The rate of incorrectly predicting negative is the same across groups.

When the base rate (actual positive rate) differs between groups, satisfying any two of these requires violating the third, outside degenerate cases such as a perfect predictor. This is not a technical limitation to be solved -- it is a value judgment about which type of error matters more. The choice must be made explicitly, not hidden inside a loss function.

### Common Fairness Definitions

| Metric | Definition | When appropriate |
|---|---|---|
| **Demographic parity** | P(positive prediction) is equal across groups | When the prediction itself causes differential treatment |
| **Equalized odds** | TPR and FPR are equal across groups | When false positives and false negatives have different costs |
| **Equal opportunity** | TPR is equal across groups (weaker than equalized odds) | When false negatives are the primary concern (e.g., loan approval for qualified applicants) |
| **Predictive parity** | Precision is equal across groups | When the model's positive predictions trigger consequential actions |
| **Individual fairness** | Similar individuals receive similar predictions | When you can define a meaningful similarity metric |
| **Counterfactual fairness** | Prediction would be the same in a counterfactual world where the individual belonged to a different group | When causal reasoning is possible and the causal model is trusted |

### Measuring Disparate Impact

The four-fifths rule (EEOC, 1978): if the selection rate for a protected group is less than 80% of the rate for the most-selected group, there is evidence of adverse impact.

Disparate impact ratio = (selection rate for protected group) / (selection rate for most-selected group)

If this ratio is below 0.8, investigate. This is a screening heuristic, not a legal standard -- but it is widely used as a first check.

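The ratio above can be sketched in a few lines of Python. This is a minimal illustration, not a compliance tool: the function names (`selection_rates`, `disparate_impact_ratios`), the group labels, and the toy decisions are all hypothetical, and the 0.8 threshold is the screening heuristic described above.

```python
from collections import Counter


def selection_rates(groups, selected):
    """Return the fraction of positive decisions within each group."""
    totals = Counter(groups)
    positives = Counter(g for g, s in zip(groups, selected) if s)
    return {g: positives[g] / totals[g] for g in totals}


def disparate_impact_ratios(groups, selected):
    """Divide each group's selection rate by the most-selected group's rate."""
    rates = selection_rates(groups, selected)
    reference = max(rates.values())
    if reference == 0:
        raise ValueError("no group received any positive decisions")
    return {g: rate / reference for g, rate in rates.items()}


# Hypothetical toy data: 10 applicants per group, 1 = selected.
groups = ["a"] * 10 + ["b"] * 10
selected = [1] * 6 + [0] * 4 + [1] * 3 + [0] * 7

ratios = disparate_impact_ratios(groups, selected)

# Groups whose ratio falls below four-fifths warrant a closer look.
flagged = sorted(g for g, r in ratios.items() if r < 0.8)
print(ratios)
print(flagged)
```

Here group `a` is selected at 60% and group `b` at 30%, so `b` has a ratio of 0.5 and is flagged. With small groups the ratio is noisy, so a flag is a prompt to investigate rather than a conclusion.
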
## Privacy

### Privacy Principles (GDPR Framework)

| Principle | Meaning | Practical implication |
|---|---|---|
| **Lawfulness** | Legal basis for processing | Document the legal basis (consent, legitimate interest, contract, etc.) |
| **Purpose limitation** | Collect for specified purposes only | Do not repurpose data without new consent or legal basis |
| **Data minimization** | Collect only what is necessary | Every field in the dataset should have a documented purpose |
| **Accuracy** | Keep data correct and current | Provide mechanisms for correction; audit data quality |
| **Storage limitation** | Do not keep data longer than needed | Define retention periods; delete when expired |
| **Integrity and confidentiality** | Protect against unauthorized access | Encryption, access controls, audit logs |
| **Accountability** | Demonstrate compliance | Documentation, impact assessments, designated roles |

### Anonymization Techniques

| Technique | How it works | Limitation |