Audit CI/CD Integration — Gotchas
Hard-won lessons from getting theaccessible-audit-ci working end-to-end. Each entry has the symptom, the root cause, and the fix so future-you (or a future product) doesn’t pay the same tax.
1. gh secret set --body - (stdin pipe) silently mangles the value
Symptom: You run gh secret set FOO --repo R --body - with the value piped in from echo -n or printf. The secret stores “successfully” (no error, gh secret list shows the entry), but every workflow run reads it back as exactly 1 character. The actual value isn’t 1 char.
Root cause: Most likely not a pipe or shell problem. gh reads the value from standard input only when --body is omitted; --body takes its argument literally, so --body - stores the one-character string -. That matches the observed length of 1. Pre-existing secrets (stored via other means) are unaffected. Reproducible: re-storing the same value via --body 'literal' (no stdin) produces the correct length.
Fix: Never use --body - for secret-setting. Pass the value directly as the --body argument:

    gh secret set FOO --repo OWNER/REPO --body 'the-actual-value'

If the value contains shell metacharacters or you don’t want it in shell history, use a temp file with restrictive perms and redirect it to stdin (no --body flag at all):

    umask 077; printf '%s' "$VALUE" > /tmp/secret.txt
    gh secret set FOO --repo OWNER/REPO < /tmp/secret.txt
    rm /tmp/secret.txt

Quick diagnostic when a workflow secret looks wrong: add a step that prints ${#YOUR_SECRET} (length only, never the value). Compare against the expected length.
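The same length-only check can live inside a CLI so that a mangled secret fails fast with a readable message. This is a minimal sketch, not part of the existing tooling: checkSecretLength, the API_KEY name, and the minimum length of 20 are all hypothetical.

```typescript
// Hypothetical helper: report only the length of a secret, never its value.
type Env = Record<string, string | undefined>;

interface SecretCheck {
  name: string;
  length: number;     // 0 when the variable is unset
  plausible: boolean; // false when the value is shorter than expected
}

function checkSecretLength(env: Env, name: string, minLength: number): SecretCheck {
  const length = (env[name] ?? "").length;
  return { name, length, plausible: length >= minLength };
}

// Stand-in env object for illustration; a real CLI would pass process.env.
const mangled = checkSecretLength({ API_KEY: "x" }, "API_KEY", 20);
console.log(`${mangled.name}: length=${mangled.length} plausible=${mangled.plausible}`);
```

Logging only the name and the length keeps the value out of logs even if the runner's masking does not recognize it.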
2. Composite-action inputs mangle secrets the same way
Symptom: Your composite action takes with: api-key: ${{ secrets.X }}, then sets env: API_KEY: ${{ inputs.api-key }}. The CLI inside the action reads process.env.API_KEY and sees length=1. Workflow side shows the secret got passed (api-key: *** in logs). The runner just mangles secret values when they flow through inputs.* → env: inside a composite step.
Root cause: Not pinned down. The symptom is the same as in #1 (length 1), and it is probably related to how the runner re-redacts a value that’s already in its mask dictionary.
Fix: Don’t pass secrets as composite-action inputs. Have the caller set them as job-level env, and have the action read them straight from process.env:

    # In the consuming workflow:
    jobs:
      audit:
        env:
          THEACCESSIBLE_API_KEY: ${{ secrets.THEACCESSIBLE_API_KEY }}
        steps:
          - uses: our-org/our-action@v1
            with:
              target: marketing-site   # only non-secret inputs

    # In the composite action.yml (no api-key input):
    - name: Run audit
      shell: bash
      env:
        TARGET: ${{ inputs.target }}   # non-secret input, safe to map into env
      run: theaccessible audit "$TARGET"   # reads THEACCESSIBLE_API_KEY from env

Job-level env propagates correctly into composite sub-steps. We verified this with a debug step that printed ${#NODE_AUTH_TOKEN}: length=40 via job env, length=1 via composite input.
3. c.executionCtx throws on Hono’s Node adapter
Symptom: Route on Cloudflare Workers does c.executionCtx?.waitUntil(...) for fire-and-forget. Same code on the Node adapter (Hono behind a Cloudflare Tunnel to a Node server) returns 500 "This context has no ExecutionContext".
Root cause: c.executionCtx is a getter that throws when there’s no execution context (Node), not a property that returns undefined. Optional chaining (?.) doesn’t help — the throw is on access, before ?. can short-circuit.
Fix: Wrap the access in try/catch, fall back to a detached promise:

    try {
      const ctx = c.executionCtx; // getter: throws on the Node adapter
      if (ctx?.waitUntil) {
        ctx.waitUntil(asyncWork);
      } else {
        void Promise.resolve(asyncWork).catch(() => undefined);
      }
    } catch {
      void Promise.resolve(asyncWork).catch(() => undefined);
    }

Attaching .catch(() => undefined) synchronously is important: without it the detached promise can produce an unhandled rejection on Node.
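If several routes need the same fallback, the pattern can be pulled into one helper. The sketch below is an assumption about how that might look, not existing code: runInBackground is a hypothetical name, and the two small interfaces stand in for Hono's context types so the example runs without importing hono.

```typescript
// Minimal structural stand-ins for the parts of the Hono context used here.
interface ExecutionContextLike {
  waitUntil(promise: Promise<unknown>): void;
}
interface ContextLike {
  readonly executionCtx: ExecutionContextLike; // a throwing getter on Node
}

// Hypothetical helper. Returns which path was taken, which is handy for logs and tests.
function runInBackground(c: ContextLike, work: Promise<unknown>): "waitUntil" | "detached" {
  try {
    const ctx = c.executionCtx; // may throw before any ?. check can run
    if (typeof ctx?.waitUntil === "function") {
      ctx.waitUntil(work);
      return "waitUntil";
    }
  } catch {
    // No execution context (Node adapter): fall through to the detached path.
  }
  // Attach the handler in the same tick so a rejection is never unhandled.
  void work.catch(() => undefined);
  return "detached";
}

// Simulated Node adapter: accessing executionCtx throws.
const nodeLike: ContextLike = {
  get executionCtx(): ExecutionContextLike {
    throw new Error("This context has no ExecutionContext");
  },
};

// Simulated Workers runtime: waitUntil records what it was given.
const received: Promise<unknown>[] = [];
const workerLike: ContextLike = {
  executionCtx: { waitUntil: (p) => { received.push(p); } },
};

console.log(runInBackground(nodeLike, Promise.reject(new Error("boom"))));
console.log(runInBackground(workerLike, Promise.resolve()));
```

On the waitUntil path the runtime owns the promise, so the helper deliberately leaves error handling there untouched.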
4. Hono Node adapter doesn’t auto-map process.env to c.env
Symptom: You set MY_SECRET=foo in the container’s environment. Code reads c.env.MY_SECRET and sees undefined. Works fine on Cloudflare Workers.
Root cause: On CF Workers, c.env is the runtime’s bindings — populated automatically. On Node, c.env starts empty. Our server.ts has a buildSyntheticEnv() middleware that copies a hardcoded allowlist from process.env into c.env per request.
Fix: When you add a new env var that your routes read via c.env.NEW_VAR, add it to the allowlist in workers/api/src/server.ts:buildSyntheticEnv():

    return {
      // ... existing ...
      NEW_VAR: process.env.NEW_VAR,
    };

Also add it to the type in workers/api/src/types/env.ts. Forgetting this means: works in CF Workers tests, dies at runtime in prod (Lambda or Node-server).
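One way to make the "forgot to add it" failure harder to hit is to drive both the type and the copy from a single list. The following is a sketch of that idea under stated assumptions, not the actual contents of server.ts: the ENV_ALLOWLIST entries and the missingKeys helper are hypothetical.

```typescript
// Hypothetical allowlist; the real variable names live in server.ts.
const ENV_ALLOWLIST = ["MY_SECRET", "NEW_VAR"] as const;

type EnvKey = (typeof ENV_ALLOWLIST)[number];
type SyntheticEnv = Partial<Record<EnvKey, string>>;

// Copy only allowlisted keys from a process.env-shaped object.
function buildSyntheticEnv(source: Record<string, string | undefined>): SyntheticEnv {
  const env: SyntheticEnv = {};
  for (const key of ENV_ALLOWLIST) {
    const value = source[key];
    if (value !== undefined) env[key] = value;
  }
  return env;
}

// Report allowlisted keys that are absent, e.g. to log a warning at startup.
function missingKeys(env: SyntheticEnv): EnvKey[] {
  return ENV_ALLOWLIST.filter((key) => env[key] === undefined);
}

// Stand-in for process.env.
const demoEnv = buildSyntheticEnv({ MY_SECRET: "foo", UNRELATED: "ignored" });
console.log(Object.keys(demoEnv), missingKeys(demoEnv));
```

With this shape, adding a variable is a one-line change to the list, and a startup check on missingKeys surfaces an unset variable before the first request does.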
5. Native ESM imports need .js extensions even when TypeScript doesn’t require them
Symptom: npm publish succeeds. npm install -g your-package succeeds. Running the CLI: ERR_MODULE_NOT_FOUND: ./artifact. Tests pass locally.
Root cause: Package is "type": "module". TypeScript’s moduleResolution: "bundler" allows relative imports without extensions. But Node’s native ESM resolver requires .js extensions. dist/cli.js emits import x from './artifact' — Node looks for that exact path, doesn’t find it, dies. Vitest uses its own resolver that tolerates missing extensions, so unit tests pass.
Fix: Use .js extensions in all relative imports in the source TS files:

    // Wrong (works in TS, breaks at runtime):
    import { x } from './artifact';

    // Right:
    import { x } from './artifact.js';

This looks weird in TS source but is the documented Node ESM convention. Alternatively, configure moduleResolution: "nodenext", which enforces extensions at typecheck time.
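A cheap pre-publish guard is to scan the built output for relative specifiers that lack an extension. The sketch below assumes a regex scan is good enough for that purpose; it does not parse the code, so specifiers inside comments or strings can produce false positives. findExtensionlessImports is a hypothetical name.

```typescript
// Matches relative specifiers in: from './x', import './x', and import('./x').
const RELATIVE_SPECIFIER = /(?:from\s*|import\s*\(?\s*)(['"])(\.{1,2}\/[^'"]+)\1/g;
// Extensions Node's ESM resolver can load as written.
const OK_EXTENSION = /\.(?:js|mjs|cjs|json|node)$/;

// Return every relative import specifier that has no loadable extension.
function findExtensionlessImports(source: string): string[] {
  const hits: string[] = [];
  for (const match of source.matchAll(RELATIVE_SPECIFIER)) {
    const specifier = match[2];
    if (!OK_EXTENSION.test(specifier)) hits.push(specifier);
  }
  return hits;
}

// Illustrative input; in practice this would be the text of each file under dist/.
const sampleSource = [
  "import { x } from './artifact';",
  "import { y } from './report.js';",
  "import fs from 'node:fs';",
].join("\n");

console.log(findExtensionlessImports(sampleSource));
```

Running a check like this against dist/ in CI catches the failure before publish, whereas the Vitest suite cannot because its resolver tolerates the missing extension.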
6. GitHub Packages cross-owner installs require non-default tokens
Symptom: @org-a/package published to GitHub Packages. CI in user-b/repo does npm install -g @org-a/package with the default GITHUB_TOKEN env. Gets 403 “permission_denied: The requested installation does not exist”.
Root cause: GitHub Packages namespaces packages by owner. A repo’s GITHUB_TOKEN only has package read access for packages under that repo’s owner. Cross-owner installs need either:
- A PAT (classic, read:packages scope) stored as a repo secret
- Or the package made public (but an org-level setting may block public packages)
Fix: For internal-only packages, generate a classic PAT with read:packages, store as a repo secret, expose to the runner as job-level env:

    env:
      NODE_AUTH_TOKEN: ${{ secrets.ORG_PACKAGES_TOKEN }}

For broadly-distributed CLI tools, prefer the public npm registry: zero auth on the consumer side. Publish as an unscoped name or under a registered scope. We did this for theaccessible-audit-ci after the GH Packages route turned into a multi-hour debug session.
7. actions/upload-artifact paths inside composite actions
Symptom: Your composite action’s run: step writes to ./.theaccessible/. The next step uses actions/upload-artifact@v4 with path: ./.theaccessible/. Get No files were found with the provided path.
Root cause: Working-directory semantics inside composite actions aren’t 100% consistent. The CLI step might run from $GITHUB_WORKSPACE, but the upload step might resolve relative paths from elsewhere depending on action context.
Fix: Always use ${{ github.workspace }} absolute paths inside composite actions. Surface the path as a step output so downstream steps don’t have to recompute it:

    - name: Run tool
      id: tool
      shell: bash   # required for run: steps in composite actions
      working-directory: ${{ github.workspace }}
      run: |
        OUT_DIR="${{ github.workspace }}/.theaccessible/${{ inputs.target }}"
        mkdir -p "$OUT_DIR"
        mytool --output-dir "$OUT_DIR"
        echo "output-dir=$OUT_DIR" >> "$GITHUB_OUTPUT"

    - uses: actions/upload-artifact@v4
      with:
        name: report-${{ inputs.target }}
        path: ${{ steps.tool.outputs.output-dir }}/

The per-target subdir and artifact name also avoid the upload-artifact@v4 “artifact already exists” error when matrix runs share a workflow run id.
8. github/codeql-action/upload-sarif requires Code Scanning to be enabled on the repo
Symptom: Action runs fine on your test repo. Customer adds it to theirs. Their workflow fails on the SARIF upload step with Code scanning is not enabled for this repository.
Root cause: SARIF uploads go to GitHub’s Code Scanning UI, which is opt-in per-repo (Settings → Security → Code security and analysis).
Fix: Mark the SARIF upload step continue-on-error: true. Capture the real audit exit code earlier (in a $GITHUB_OUTPUT line) and use that for the final job result. The SARIF upload becomes a best-effort nice-to-have:

    - name: Upload SARIF
      continue-on-error: true
      if: hashFiles(format('{0}/report.sarif', steps.tool.outputs.output-dir)) != ''
      uses: github/codeql-action/upload-sarif@v3
      with:
        sarif_file: ${{ steps.tool.outputs.output-dir }}/report.sarif

9. Watching gh pr checks --watch returns before the run actually finishes
Symptom: gh pr checks 123 --watch exits 0. You check status — still pending.
Root cause: gh pr checks --watch waits for the set of statuses on the PR to stabilize, not for an individual run to terminate. If a new run has been queued (e.g. by a new commit you just pushed), the watcher catches the old run’s terminal state and exits.
Fix: Capture the exact run id and use gh run watch <id>:

    RUN_ID=$(gh run list --branch BRANCH --workflow=NAME --limit 1 --json databaseId -q '.[0].databaseId')
    gh run watch "$RUN_ID"

Or accept the false-positive and just re-query checks after the watcher returns.
10. SQS visibility-timeout retries make broken jobs look stuck
Symptom: Submit an audit job. Status stays running for 30 min. Resubmit identical, same result.
Root cause: Worker pulled the SQS message, started processing, then errored on the callback. Standard error path. SQS message goes back into the queue after the visibility timeout (30 min here, matching the symptom; the SQS default is 30 seconds). Worker retries the same message, fails the same way, and so on until the redrive policy moves it to the DLQ.
Fix when debugging:
- aws sqs get-queue-attributes to check queue depth and approximate-receives
- Read the batch-worker logs (we use Loki/Promtail on the api-node containers); look for the structured event: unhandled_error lines
- Fix the underlying issue
- Don’t bother purging the in-flight message: the next retry after the visibility timeout will succeed once the bug is fixed. Or purge if you want immediate confirmation.
The fact that we kept seeing the same job_id retried was actually a useful signal — it confirmed the worker code path was reaching the failure and not silently dropping the message.
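To see why a failing job looks stuck for so long, it helps to put numbers on the retry loop. The sketch below is simple arithmetic under stated assumptions: a 30-minute visibility timeout as in the symptom above, a hypothetical maxReceiveCount of 3 (the real redrive setting is not recorded here), and a worker that neither deletes the message nor shortens its visibility on failure. Polling and processing time are ignored.

```typescript
// Minutes (from first delivery) at which each receive of the same message happens.
function receiveTimeline(visibilityTimeoutMin: number, maxReceiveCount: number): number[] {
  return Array.from({ length: maxReceiveCount }, (_, i) => i * visibilityTimeoutMin);
}

// Approximate minutes until the message is eligible to move to the DLQ:
// each failed receive hides the message for one full visibility timeout.
function minutesUntilDlq(visibilityTimeoutMin: number, maxReceiveCount: number): number {
  return visibilityTimeoutMin * maxReceiveCount;
}

// Assumed values: 30-minute timeout, maxReceiveCount of 3.
console.log(receiveTimeline(30, 3));
console.log(minutesUntilDlq(30, 3));
```

Under those assumptions the job would report running for about an hour and a half before landing in the DLQ, with the worker only touching it three times.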
Time sink summary
The end-to-end smoke test from “PR opened” to “✅ audit passes in CI” took longer than the entire core implementation because of the above. Most of the time went into:
- Diagnosing gh secret set --body - mangling, which manifested as token-mismatch / 401 / 403 errors that looked like auth-flow bugs.
- The composite-action input mangling: same class of bug, different surface area.
- Two GH-Packages-vs-npm rounds before settling on public npm.
Lesson: when an end-to-end test fails with what looks like an auth error, add a ${#VAR} length-check first, before chasing token scopes or revocation states. The integrity check is cheap and rules out half the failure modes.