github-commit-recovery
This skill recovers deleted or inaccessible commits from GitHub repositories using commit SHAs through REST API calls, direct web URLs, and git operations. Use it when you have commit identifiers and need to retrieve commit content, diffs, patches, or metadata from commits that were force-pushed over or otherwise removed from standard branches, since GitHub retains these commits indefinitely on its servers despite marking them as deleted.
git clone --depth 1 https://github.com/deonmenezes/mantishack /tmp/github-commit-recovery && cp -r /tmp/github-commit-recovery/.claude/skills/oss-forensics/github-commit-recovery ~/.claude/skills/github-commit-recovery
# GitHub Commit Recovery
**Purpose**: Access commit content, diffs, and metadata directly from GitHub when you have commit SHAs. Includes methods for retrieving "deleted" commits that remain accessible on GitHub servers.
## When to Use This Skill
- You have commit SHAs and need actual code content
- Investigating commits that were force-pushed over ("deleted")
- Need commit diffs, patches, or full file contents
- Verifying commit authorship or metadata
- Retrieving content from dangling commits
**SHA Sources**: GitHub Archive, git reflog, CI/CD logs, PR comments, issue references, external archives, security reports.
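When the SHAs come from unstructured text such as CI/CD logs or issue threads, a small extractor can collect candidates before any GitHub request is made. The sketch below is illustrative only: the function name `extract_commit_shas` and the sample log are hypothetical, and any 40-character hex string matches, so each candidate still has to be confirmed against the repository.

```python
import re

# Full SHA-1 commit IDs are exactly 40 hex characters. The word boundaries
# keep longer hex runs (for example SHA-256 digests) from matching in part.
FULL_SHA_RE = re.compile(r"\b[0-9a-f]{40}\b", re.IGNORECASE)


def extract_commit_shas(text):
    """Return unique 40-character hex strings from text, in first-seen order."""
    seen = set()
    shas = []
    for match in FULL_SHA_RE.finditer(text):
        sha = match.group(0).lower()  # normalize case so duplicates collapse
        if sha not in seen:
            seen.add(sha)
            shas.append(sha)
    return shas


# Hypothetical CI log excerpt with a placeholder SHA.
sample_log = """
Checking out 0123456789abcdef0123456789abcdef01234567
HEAD is now at 0123456 update config
Checking out 0123456789ABCDEF0123456789ABCDEF01234567 again
"""
print(extract_commit_shas(sample_log))
```

Abbreviated SHAs (such as `0123456` above) are deliberately ignored here, because short hex runs produce too many false positives in free text.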
## Core Principles
**Deleted Commits Are Never Really Deleted**:
- When developers force-push to "delete" commits, GitHub keeps the overwritten commits on its servers indefinitely
- Any such commit remains accessible to anyone who knows its SHA
- GitHub displays a warning ("This commit does not belong to any branch") but still serves the content
- A prefix as short as 4 hex digits can resolve to a commit (with collision risk)
**Rate Limits Matter**:
- Authenticated API: 5,000 requests/hour
- Unauthenticated API: 60 requests/hour
- Web interface: Undocumented limits, WAF may block heavy usage
- Git operations: No explicit limit, but excessive cloning may trigger throttling
## Quick Start
**Access a "deleted" commit via web browser**:
```
https://github.com/org/repo/commit/FULL_COMMIT_SHA
```
**Get commit as patch file**:
```bash
curl -L https://github.com/org/repo/commit/FULL_COMMIT_SHA.patch
```
**Query via REST API**:
```bash
curl -H "Authorization: Bearer $GITHUB_TOKEN" \
https://api.github.com/repos/org/repo/commits/FULL_COMMIT_SHA
```
## Accessing Deleted Commits
### Method 1: Direct Web Access
GitHub serves "deleted" commits at predictable URLs. These commits show a warning banner but content remains fully accessible.
**Commit View**:
```
https://github.com/<ORG>/<REPO>/commit/<SHA>
```
**Patch Format** (raw diff with headers):
```
https://github.com/<ORG>/<REPO>/commit/<SHA>.patch
```
**Diff Format** (unified diff only):
```
https://github.com/<ORG>/<REPO>/commit/<SHA>.diff
```
**Example**:
```bash
# View commit that was force-pushed over
curl -L https://github.com/grapefruit623/gcloud-python/commit/e9c3d31212847723aec86ef96aba0a77f9387493
# Download as patch
curl -L -o leaked_commit.patch \
https://github.com/grapefruit623/gcloud-python/commit/e9c3d31212847723aec86ef96aba0a77f9387493.patch
```
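Once a `.patch` file is saved, its header block (the leading `From <sha> ...` line followed by `From:`, `Date:`, and `Subject:`) can be read with Python's standard `email` module, since the patch format is mailbox-style. This is a minimal sketch: `parse_patch_header` and `SAMPLE_PATCH` are hypothetical names and data, and encoded (RFC 2047) author names or folded subjects are returned as-is rather than decoded.

```python
from email import message_from_string


def parse_patch_header(patch_text):
    """Extract commit SHA, author, date, and subject from .patch text."""
    msg = message_from_string(patch_text)
    # The first line ("From <sha> <date>") is stored as the Unix-from line,
    # separately from the regular "From:" header.
    parts = (msg.get_unixfrom() or "").split()
    return {
        "sha": parts[1] if len(parts) > 1 else None,
        "author": msg["From"],
        "date": msg["Date"],
        "subject": msg["Subject"],
    }


# Hypothetical patch header with a placeholder SHA and author.
SAMPLE_PATCH = (
    "From 0123456789abcdef0123456789abcdef01234567 Mon Sep 17 00:00:00 2001\n"
    "From: Developer Name <dev@example.com>\n"
    "Date: Sun, 15 Jun 2025 14:23:11 +0000\n"
    "Subject: [PATCH] Add config\n"
    "\n"
    "---\n"
    " src/config.js | 3 +++\n"
)

info = parse_patch_header(SAMPLE_PATCH)
print(info["sha"], info["author"], info["subject"])
```

For a downloaded file, the same function could be called on the contents of `leaked_commit.patch` read as text.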
**Short SHA Access**: GitHub allows accessing commits with just 4+ hex characters (if unique):
```
https://github.com/org/repo/commit/e9c3
```
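The three URL forms above differ only in their suffix, so they can be generated from one helper. The following sketch assumes a hypothetical function name `commit_urls` and rejects anything that is not 4 to 40 hex characters, matching the short-SHA rule described above; it does not check whether the commit actually exists.

```python
import re

# 4 to 40 hex characters: the shortest prefix mentioned above up to a full SHA-1.
SHA_PREFIX_RE = re.compile(r"[0-9a-fA-F]{4,40}")


def commit_urls(owner, repo, sha):
    """Build the web, .patch, and .diff URLs for a commit SHA or SHA prefix."""
    if not SHA_PREFIX_RE.fullmatch(sha):
        raise ValueError(f"not a 4-40 character hex SHA: {sha!r}")
    base = f"https://github.com/{owner}/{repo}/commit/{sha}"
    return {
        "view": base,
        "patch": base + ".patch",
        "diff": base + ".diff",
    }


for kind, url in commit_urls("org", "repo", "e9c3").items():
    print(kind, url)
```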
### Method 2: REST API
The GitHub REST API provides structured commit data including file changes, author info, and commit message.
**Endpoint**:
```
GET https://api.github.com/repos/{owner}/{repo}/commits/{ref}
```
**Example Request**:
```bash
curl -H "Accept: application/vnd.github+json" \
-H "Authorization: Bearer $GITHUB_TOKEN" \
https://api.github.com/repos/org/repo/commits/abc123def456
```
**Response Structure**:
```json
{
  "sha": "abc123def456...",
  "commit": {
    "author": {
      "name": "Developer Name",
      "email": "dev@example.com",
      "date": "2025-06-15T14:23:11Z"
    },
    "message": "Commit message here"
  },
  "files": [
    {
      "filename": "src/config.js",
      "status": "added",
      "patch": "@@ -0,0 +1,3 @@\n+// config"
    }
  ]
}
```
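A response shaped like the one above can be reduced to the fields an investigation usually needs. This is a sketch under stated assumptions: `summarize_commit` is a hypothetical helper, the sample payload mirrors the structure shown above, and the `patch` field is read with `.get()` on the assumption that it may be missing for some files (for example binary ones).

```python
def summarize_commit(payload):
    """Reduce a commit API response to SHA, author, date, subject, and files."""
    commit = payload.get("commit") or {}
    author = commit.get("author") or {}
    message = commit.get("message") or ""
    return {
        "sha": payload.get("sha"),
        "author": f'{author.get("name")} <{author.get("email")}>',
        "date": author.get("date"),
        # First line of the commit message only
        "subject": message.split("\n", 1)[0],
        "files": [
            {
                "filename": f.get("filename"),
                "status": f.get("status"),
                "has_patch": f.get("patch") is not None,
            }
            for f in payload.get("files") or []
        ],
    }


# Sample payload mirroring the response structure shown above.
sample = {
    "sha": "abc123def456...",
    "commit": {
        "author": {
            "name": "Developer Name",
            "email": "dev@example.com",
            "date": "2025-06-15T14:23:11Z",
        },
        "message": "Commit message here\n\nLonger description.",
    },
    "files": [
        {"filename": "src/config.js", "status": "added",
         "patch": "@@ -0,0 +1,3 @@\n+// config"},
    ],
}
print(summarize_commit(sample))
```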
**Rate Limit Headers**:
```
x-ratelimit-limit: 5000
x-ratelimit-remaining: 4999
x-ratelimit-reset: 1623456789
```
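These headers make it possible to pause only when the quota is actually exhausted. The sketch below assumes a hypothetical helper `seconds_until_reset` that takes a mapping of lowercase header names to string values; `x-ratelimit-reset` is treated as a Unix timestamp in seconds, and the `now` parameter exists only so the calculation can be checked without waiting.

```python
import time


def seconds_until_reset(headers, now=None):
    """Return how long to wait before the next API call (0 if quota remains)."""
    remaining = int(headers.get("x-ratelimit-remaining", 1))
    if remaining > 0:
        return 0
    reset_at = int(headers.get("x-ratelimit-reset", 0))
    current = time.time() if now is None else now
    # Never return a negative wait if the reset time has already passed
    return max(0, reset_at - current)


# Header values taken from the example above, plus an exhausted-quota case.
fresh = {"x-ratelimit-limit": "5000", "x-ratelimit-remaining": "4999",
         "x-ratelimit-reset": "1623456789"}
exhausted = {"x-ratelimit-limit": "5000", "x-ratelimit-remaining": "0",
             "x-ratelimit-reset": "1623456789"}

print(seconds_until_reset(fresh, now=1623456700))
print(seconds_until_reset(exhausted, now=1623456700))
```

With the `requests` library, `response.headers` is case-insensitive and can be passed directly; a plain `dict` needs lowercase keys as shown.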
### Method 3: Git Fetch
For bulk analysis or when you need full repository context, fetch specific commits via Git.
**Minimal Clone + Fetch Specific Commit**:
```bash
# Clone without file contents (just history/trees/commits)
git clone --filter=blob:none --no-checkout https://github.com/org/repo.git
cd repo
# Fetch the specific "deleted" commit
git fetch origin <COMMIT_SHA>
# View the commit
git show FETCH_HEAD
# View specific file from that commit
git show FETCH_HEAD:path/to/file.txt
```
**Why This Works**:
- `--filter=blob:none`: Omits file contents initially (fast clone)
- `--no-checkout`: Doesn't populate working directory
- `git fetch origin <SHA>`: Retrieves specific commit even if "deleted"
- Blobs are fetched on-demand when you access them
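For several repositories, the same clone-and-fetch sequence can be scripted. The sketch below only wraps the commands shown above with `subprocess`; the function names `recovery_commands` and `run_recovery` are hypothetical, `--stat` is added to `git show` as an illustrative choice, and it assumes `git` is installed and that a full 40-character SHA is supplied (fetching by an abbreviated SHA may not work).

```python
import subprocess


def recovery_commands(repo_url, sha, workdir):
    """Return the git commands for a blobless clone plus a fetch of one commit."""
    return [
        # Clone history and trees only; file contents are fetched on demand
        ["git", "clone", "--filter=blob:none", "--no-checkout", repo_url, workdir],
        # Ask the server for the specific commit, even if no branch points to it
        ["git", "-C", workdir, "fetch", "origin", sha],
        # Show the commit metadata and a summary of changed files
        ["git", "-C", workdir, "show", "--stat", "FETCH_HEAD"],
    ]


def run_recovery(repo_url, sha, workdir):
    """Run each command in order, stopping at the first failure."""
    for cmd in recovery_commands(repo_url, sha, workdir):
        subprocess.run(cmd, check=True)


# Print the planned commands without touching the network.
for cmd in recovery_commands("https://github.com/org/repo.git",
                             "0123456789abcdef0123456789abcdef01234567",
                             "/tmp/repo"):
    print(" ".join(cmd))
```

Passing argument lists instead of a shell string avoids quoting problems when SHAs or URLs come from untrusted logs.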
## Investigation Patterns
### Batch Download Patches
**Scenario**: You have a list of commit SHAs to investigate and need their content.
```python
import time

import requests


def download_commit_patch(repo, sha, token=None):
    """Fetch the .patch rendering of a commit; return its text or None."""
    url = f"https://github.com/{repo}/commit/{sha}.patch"
    headers = {"Authorization": f"Bearer {token}"} if token else {}
    response = requests.get(url, headers=headers, allow_redirects=True, timeout=30)
    if response.status_code == 200:
        return response.text
    return None


# Download patches for a list of commits
commits = [
    {"repo": "org/repo1", "sha": "abc123..."},
    {"repo": "org/repo2", "sha": "def456..."},
]
for commit in commits:
    patch = download_commit_patch(commit["repo"], commit["sha"])
    if patch:
        # Name each file after the first 8 characters of its SHA
        with open(f"{commit['sha'][:8]}.patch", "w") as f:
            f.write(patch)
    time.sleep(0.5)  # Rate limit courtesy
```
### Verifying Commit Authorship
**Scenario**: Need to verify who actually authored a suspicious commit (committer vs author can differ).
**API Query**:
```bash
curl -s -H "Authorization: Bearer $GITHUB_TOKEN" \
"https://api.github.com/repos/org/repo/commits/SHA" | \
jq '{
author: .commit.author,
committer: .commit.committer,
verified: .commit.verification.verified
}'
```
**Response Analysis**:
```json
{
  "author": {
    "name": "Real Developer",
    "email": "dev@company.com",
    "date": "2025-06-15T10:00:00Z"
  },
  "committer": {
    "name": "CI Bot",
    "email": "bot@company.com",
    "date": "2025-06-15T10:05:00Z"
  },
  "verified": false
}
```
**Forensic Notes**:
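- Author: Who wrote the code (can be forged via `git commit --author`)
- Committer: Who created the commit object (also set locally by whoever commits, e.g. via `GIT_COMMITTER_NAME` and `GIT_COMMITTER_EMAIL`)
- Verified: Whether GitHub could validate a signature (GPG, SSH, or S/MIME) on the commit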
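The checks above can be applied mechanically to the full API response for a commit. This is a minimal sketch: `authorship_flags` is a hypothetical helper, the sample data mirrors the response analysis above, and a flag only marks a commit for closer review since author and committer legitimately differ in many workflows.

```python
def authorship_flags(commit_json):
    """Return review flags based on author, committer, and signature status."""
    commit = commit_json.get("commit") or {}
    author = commit.get("author") or {}
    committer = commit.get("committer") or {}
    verification = commit.get("verification") or {}

    flags = []
    if author.get("email") != committer.get("email"):
        flags.append("author and committer differ")
    if not verification.get("verified", False):
        flags.append("signature not verified")
    return flags


# Sample data mirroring the response analysis above (full API response shape).
sample = {
    "commit": {
        "author": {"name": "Real Developer", "email": "dev@company.com",
                   "date": "2025-06-15T10:00:00Z"},
        "committer": {"name": "CI Bot", "email": "bot@company.com",
                      "date": "2025-06-15T10:05:00Z"},
        "verification": {"verified": False},
    }
}
print(authorship_flags(sample))
```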