Security & Privacy
Overview
PG Atlas operates with a public-first security model: all data (dependency graphs, metrics, contributor statistics) is publicly accessible by design. Security measures focus on protecting system integrity, preventing abuse, and maintaining contributor privacy where applicable.
Current status: Operational with GitHub OIDC authentication for write endpoints, IP-based rate limiting for all endpoints, and privacy-preserving contributor data handling.
Core principles:
- Public data by design — All ecosystem metrics and dependency graphs are publicly readable
- Write authentication — SBOM ingestion authenticated via GitHub OIDC tokens
- Contributor privacy — Git log email addresses are SHA-256 hashed before storage
- Rate limiting — Per-IP throttling prevents API abuse
- Input validation — SPDX 2.3 schema validation prevents malformed submissions
- Auditability — All SBOM submissions logged with provenance metadata, and git log extracts are logged before we parse them
Authentication & Authorization
Read Access
All read endpoints (/contributors, /metadata,/projects, /repos) are public and require no authentication. This aligns with the transparency goals of the SCF Public Goods Maintenance Working Group.
Write Access (SBOM Ingestion)
SBOM submissions to /ingest/sbom are authenticated using GitHub OpenID Connect (OIDC) tokens:
Authentication flow:
- The pg-atlas-sbom-action requests a short-lived OIDC token from GitHub Actions
- The token is signed with GitHub’s private key (RS256) and includes claims:
repositoryandactor(GitHub username) - PG Atlas validates the token using GitHub’s public JWKS (JSON Web Key Set)
- Token validation checks:
- Signature matches GitHub’s public key
- Issuer is
https://token.actions.githubusercontent.com - Audience matches the configured API URL (prevents token reuse)
- Token has not expired
Error responses:
401 Unauthorized— Missing or malformedAuthorizationheader403 Forbidden— Invalid signature, expired token, or audience mismatch422 Unprocessable Content— Malformed SPDX payload
Rate Limiting
The API implements in-memory request throttling via ApiRateLimitMiddleware using PyrateLimiter at the ASGI level:
| Endpoint Category | Limit | Window |
|---|---|---|
| General endpoints | 100 requests | 1 minute per IP |
/health | 600 requests | 1 minute per IP |
/ingest/sbom | 600 requests | 1 minute per IP |
Implementation details:
- Requests bucketed by client IP address (resolved via
X-Forwarded-Forheader) - Exceeded limits return
429 Too Many RequestswithRetry-Afterheader
Input Validation & Integrity
SBOM Validation
All SBOM submissions undergo strict validation:
- Format validation — Must be valid SPDX 2.3 JSON
- Schema validation — Structure verified against SPDX specification
- Denial of Service protection — Content hash checked against previously processed submissions
- Provenance tracking — Repository and actor claims from OIDC token stored in audit records
Failure handling:
- Invalid SBOMs return
422 Unprocessable Contentwith detailed error messages - Failed submissions create audit records with
status='failed'anderror_detailfor triage
SQL Injection Protection
SQLAlchemy’s parameterized queries provide automatic protection against SQL injection. All database interactions use SQLAlchemy ORM or Core expressions with bound parameters.
Privacy Measures
Contributor Data
Git log parsing extracts contributor statistics while preserving privacy:
- Email hashing — Contributor email addresses are normalized and SHA-256 hashed before storage
- Aggregated statistics — Only commit counts and date ranges are included in contributor details
- Bot filtering — Automated accounts (e.g.,
[bot]suffix, CI/CD patterns) are filtered out during ingestion
User Tracking
The dashboard and API implement privacy-first analytics:
- No cookies — Dashboard does not set tracking cookies
- No session storage — API is stateless; no user sessions tracked
- Local preferences only — Theme/display preferences stored in browser local storage (never synced to server)
- No PII collection — No user registration, accounts, or personal data collected
Artifact Storage & Auditability
PG Atlas implements a content-addressed artifact storage strategy for durable audit trails. Raw submitted SBOMs and git log extracts are persisted to immutable storage with database audit records serving as the queryable index.
Storage architecture:
- Filebase S3 — IPFS-backed object storage in production providing cryptographic content integrity
- Content Identifier (CID) — Filebase returns an IPFS CID which becomes the durable reference stored in the database
- SHA-256 content hash — Computed for all artifacts enabling idempotent processing and deduplication
Audit record pattern:
Every artifact submission creates a corresponding database row with:
- Provenance metadata — OIDC claims (repository, actor) or other source identifiers
- Content hash — SHA-256 digest of raw payload bytes
- Artifact path — CID or storage location for retrieval
- Processing status —
pending,processed, orfailed - Error details — Failure messages for triage when applicable
- Timestamps — Submission time and processing completion time
Benefits:
- Auditability — Complete provenance trail for every submission with immutable artifacts
- Failure recovery — Failed submissions retain raw artifacts for manual triage and reprocessing
- Idempotency — Content-addressed storage ensures identical payloads are stored once, enabling safe retry semantics
- Decoupled lifecycle — Artifacts are independently retrievable via IPFS gateway, decoupling audit retention from database schema evolution
- Queryable history — Database index allows filtering by source, status, and timestamp without artifact retrieval
This pattern applies uniformly to SBOM submissions and git log artifacts, with schema details documented in Storage.
Future Enhancements
Near-term security improvements under consideration:
- Sentry integration — Automatic error grouping and security event tracking (free tier for OSS)
- Enhanced alerting — Discord webhooks for failed authentication attempts and rate limit breaches
- SBOM signature verification — Cryptographic signing of SBOM submissions beyond OIDC tokens
- API key authentication — Optional API keys that provide elevated rate limits to power users
These enhancements will build on the current foundation while maintaining the public-first security model.