Security & Privacy

Overview

PG Atlas operates with a public-first security model: all data (dependency graphs, metrics, contributor statistics) is publicly accessible by design. Security measures focus on protecting system integrity, preventing abuse, and maintaining contributor privacy where applicable.

Current status: Operational with GitHub OIDC authentication for write endpoints, IP-based rate limiting for all endpoints, and privacy-preserving contributor data handling.

Core principles:

Public data by design — All ecosystem metrics and dependency graphs are publicly readable
Write authentication — SBOM ingestion authenticated via GitHub OIDC tokens
Contributor privacy — Git log email addresses are SHA-256 hashed before storage
Rate limiting — Per-IP throttling prevents API abuse
Input validation — SPDX 2.3 schema validation prevents malformed submissions
Auditability — All SBOM submissions are logged with provenance metadata, and git log extracts are logged before they are parsed

Authentication & Authorization

Read Access

All read endpoints (/contributors, /metadata, /projects, /repos) are public and require no authentication. This aligns with the transparency goals of the SCF Public Goods Maintenance Working Group.

Write Access (SBOM Ingestion)

SBOM submissions to /ingest/sbom are authenticated using GitHub OpenID Connect (OIDC) tokens.

Authentication flow:

The pg-atlas-sbom-action requests a short-lived OIDC token from GitHub Actions
The token is signed with GitHub’s private key (RS256) and includes claims: repository and actor (GitHub username)
PG Atlas validates the token using GitHub’s public JWKS (JSON Web Key Set)
Token validation checks:
- Signature matches GitHub’s public key
- Issuer is https://token.actions.githubusercontent.com
- Audience matches the configured API URL (prevents reuse of tokens issued for other services)
- Token has not expired
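
The claim checks in the list above can be sketched with the standard library alone. This is a minimal illustration, not the PG Atlas implementation: it assumes the signature has already been verified against GitHub's JWKS by a JWT library (not shown), and the names check_claims and TokenRejected are hypothetical.

```python
import time
from typing import Optional

GITHUB_ISSUER = "https://token.actions.githubusercontent.com"


class TokenRejected(Exception):
    """Raised when signature-verified claims fail a check (hypothetical name)."""


def check_claims(claims: dict, expected_audience: str, now: Optional[float] = None) -> dict:
    """Check issuer, audience, and expiry; return provenance fields on success.

    Assumes `claims` is the decoded payload of a token whose RS256
    signature was already verified elsewhere.
    """
    now = time.time() if now is None else now

    if claims.get("iss") != GITHUB_ISSUER:
        raise TokenRejected("unexpected issuer")

    # In a JWT, "aud" may be a single string or a list of strings.
    aud = claims.get("aud")
    audiences = [aud] if isinstance(aud, str) else list(aud or [])
    if expected_audience not in audiences:
        raise TokenRejected("audience mismatch")

    exp = claims.get("exp")
    if not isinstance(exp, (int, float)) or exp <= now:
        raise TokenRejected("token expired")

    return {"repository": claims.get("repository"), "actor": claims.get("actor")}
```

The optional now argument exists only so the expiry check can be exercised with a fixed clock.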

Error responses:

401 Unauthorized — Missing or malformed Authorization header
403 Forbidden — Invalid signature, expired token, or audience mismatch
422 Unprocessable Content — Malformed SPDX payload
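
The split between 401 and 403 can be illustrated as follows. This is a sketch under assumptions: the function names are hypothetical, the token is assumed to arrive as a Bearer credential, and verify stands in for whatever token validation the service performs, signalling failure by raising ValueError.

```python
from typing import Callable, Optional


def extract_bearer_token(authorization: Optional[str]) -> Optional[str]:
    """Return the token from an 'Authorization: Bearer <token>' value, else None."""
    if not authorization:
        return None
    parts = authorization.split()
    if len(parts) != 2 or parts[0].lower() != "bearer":
        return None
    return parts[1]


def auth_error_status(authorization: Optional[str],
                      verify: Callable[[str], dict]) -> Optional[int]:
    """Return 401 or 403 when the request must be rejected, or None when it may proceed."""
    token = extract_bearer_token(authorization)
    if token is None:
        # Missing or malformed header: the caller never presented a usable credential.
        return 401
    try:
        verify(token)
    except ValueError:
        # A credential was presented but failed validation
        # (bad signature, expired token, audience mismatch).
        return 403
    return None
```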

Rate Limiting

The API throttles requests in memory at the ASGI level through ApiRateLimitMiddleware, which is built on PyrateLimiter:

Endpoint Category	Limit	Window
General endpoints	100 requests	1 minute per IP
`/health`	600 requests	1 minute per IP
`/ingest/sbom`	600 requests	1 minute per IP

Implementation details:

Requests are bucketed by client IP address (resolved from the X-Forwarded-For header)
Requests that exceed a limit receive 429 Too Many Requests with a Retry-After header
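
To make the per-IP bucketing concrete, here is a standard-library sliding-window limiter. It is a stand-in for illustration only and does not use or reproduce the PyrateLimiter API; the class name, the choice of the first X-Forwarded-For entry, and the Retry-After calculation are all assumptions.

```python
import math
import time
from collections import defaultdict, deque
from typing import Deque, Dict, Optional, Tuple


def client_ip(headers: Dict[str, str], peer_ip: str) -> str:
    """Use the first X-Forwarded-For entry, falling back to the socket peer address."""
    forwarded = headers.get("x-forwarded-for", "")
    first = forwarded.split(",")[0].strip()
    return first or peer_ip


class SlidingWindowLimiter:
    """Allow at most `limit` requests per key within the last `window_seconds`."""

    def __init__(self, limit: int, window_seconds: float = 60.0):
        self.limit = limit
        self.window = window_seconds
        self._hits: Dict[str, Deque[float]] = defaultdict(deque)

    def check(self, key: str, now: Optional[float] = None) -> Tuple[bool, int]:
        """Return (allowed, retry_after_seconds); retry_after is 0 when allowed."""
        now = time.monotonic() if now is None else now
        hits = self._hits[key]
        # Drop timestamps that have aged out of the window.
        while hits and hits[0] <= now - self.window:
            hits.popleft()
        if len(hits) >= self.limit:
            # The oldest remaining hit determines when a slot frees up.
            retry_after = math.ceil(hits[0] + self.window - now)
            return False, max(retry_after, 1)
        hits.append(now)
        return True, 0
```

A middleware would call check(client_ip(headers, peer), ...) and, on a refusal, respond with 429 and a Retry-After header set to the returned number of seconds. X-Forwarded-For is client-controlled unless a trusted proxy sets or overwrites it, so this resolution is only meaningful behind such a proxy.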

Input Validation & Integrity

SBOM Validation

All SBOM submissions undergo strict validation:

Format validation — Must be valid SPDX 2.3 JSON
Schema validation — Structure verified against SPDX specification
Denial of Service protection — Content hash checked against previously processed submissions, so an identical payload is not processed again
Provenance tracking — Repository and actor claims from OIDC token stored in audit records

Failure handling:

Invalid SBOMs return 422 Unprocessable Content with detailed error messages
Failed submissions create audit records with status='failed' and error_detail for triage
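
The validation and failure-handling steps might look roughly like this. The sketch checks only a handful of top-level fields that SPDX 2.3 documents carry; it is a simplified subset, not full schema validation, and the function names and response shape are hypothetical. Only the status='failed' and error_detail audit fields come from the description above.

```python
import json
from typing import List, Optional

# Simplified subset of mandatory top-level SPDX 2.3 document fields.
REQUIRED_TOP_LEVEL = ("spdxVersion", "dataLicense", "SPDXID", "name", "creationInfo")


def validate_spdx(raw: bytes) -> List[str]:
    """Return human-readable problems; an empty list means these checks passed."""
    try:
        doc = json.loads(raw)
    except ValueError as exc:  # covers JSON syntax errors and undecodable bytes
        return [f"not valid JSON: {exc}"]
    if not isinstance(doc, dict):
        return ["top-level JSON value must be an object"]

    errors = [f"missing required field: {name}"
              for name in REQUIRED_TOP_LEVEL if name not in doc]
    if "spdxVersion" in doc and doc["spdxVersion"] != "SPDX-2.3":
        errors.append(f"unsupported spdxVersion: {doc['spdxVersion']!r}")
    return errors


def rejection_for(raw: bytes) -> Optional[dict]:
    """Build a 422 response and audit fields for an invalid payload, or None if it passes."""
    errors = validate_spdx(raw)
    if not errors:
        return None
    return {
        "status_code": 422,
        "body": {"detail": errors},
        "audit": {"status": "failed", "error_detail": "; ".join(errors)},
    }
```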

SQL Injection Protection

SQLAlchemy’s parameterized queries provide automatic protection against SQL injection. All database interactions use SQLAlchemy ORM or Core expressions with bound parameters.
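
The difference bound parameters make can be shown with Python's built-in sqlite3 module, which exposes the same mechanism that SQLAlchemy relies on when it emits parameterized statements. SQLAlchemy itself is left out so the snippet runs with the standard library only; the table and column names are hypothetical.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE repos (name TEXT)")
conn.executemany("INSERT INTO repos (name) VALUES (?)", [("alpha",), ("beta",)])

hostile = "alpha' OR '1'='1"

# Unsafe: the input is spliced into the SQL text and changes the query's meaning.
unsafe_rows = conn.execute(
    f"SELECT name FROM repos WHERE name = '{hostile}'"
).fetchall()

# Safe: the input is passed separately as a bound parameter and is only
# ever compared as a value, never parsed as SQL.
safe_rows = conn.execute(
    "SELECT name FROM repos WHERE name = ?", (hostile,)
).fetchall()

print(len(unsafe_rows), len(safe_rows))
```

The protection holds only as long as values are passed as parameters; building SQL strings by interpolation bypasses it regardless of the library in use.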

Privacy Measures

Contributor Data

Git log parsing extracts contributor statistics while preserving privacy:

Email hashing — Contributor email addresses are normalized and SHA-256 hashed before storage
Aggregated statistics — Only commit counts and date ranges are included in contributor details
Bot filtering — Automated accounts (e.g., [bot] suffix, CI/CD patterns) are filtered out during ingestion
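
A minimal sketch of the three steps above, using only the standard library. The normalization rule (trim and lowercase), the bot pattern, and the input tuple shape are assumptions made for illustration; the actual rules may differ.

```python
import hashlib
import re
from typing import Dict, Iterable, Tuple

# Assumed bot heuristic: a "[bot]" suffix on the author name.
BOT_PATTERN = re.compile(r"\[bot\]$", re.IGNORECASE)


def hash_email(email: str) -> str:
    """Normalize (trim, lowercase) and return the SHA-256 hex digest."""
    normalized = email.strip().lower()
    return hashlib.sha256(normalized.encode("utf-8")).hexdigest()


def is_bot(author_name: str) -> bool:
    return bool(BOT_PATTERN.search(author_name.strip()))


def summarize(commits: Iterable[Tuple[str, str, str]]) -> Dict[str, dict]:
    """Aggregate (author_name, author_email, iso_date) tuples per hashed email.

    Dates are assumed to be ISO 8601 strings in one format (e.g. YYYY-MM-DD),
    so plain string comparison orders them correctly.
    """
    stats: Dict[str, dict] = {}
    for name, email, date in commits:
        if is_bot(name):
            continue
        entry = stats.setdefault(hash_email(email),
                                 {"commits": 0, "first": date, "last": date})
        entry["commits"] += 1
        entry["first"] = min(entry["first"], date)
        entry["last"] = max(entry["last"], date)
    return stats
```

Only the digest, the commit count, and the date range leave this function; the raw address is never stored in the result.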

User Tracking

The dashboard and API implement privacy-first analytics:

No cookies — Dashboard does not set tracking cookies
No session storage — API is stateless; no user sessions tracked
Local preferences only — Theme/display preferences stored in browser local storage (never synced to server)
No PII collection — No user registration, accounts, or personal data collected

Artifact Storage & Auditability

PG Atlas implements a content-addressed artifact storage strategy for durable audit trails. Raw submitted SBOMs and git log extracts are persisted to immutable storage with database audit records serving as the queryable index.

Storage architecture:

Filebase S3 — IPFS-backed object storage in production providing cryptographic content integrity
Content Identifier (CID) — Filebase returns an IPFS CID which becomes the durable reference stored in the database
SHA-256 content hash — Computed for all artifacts enabling idempotent processing and deduplication

Audit record pattern:

Every artifact submission creates a corresponding database row with:

Provenance metadata — OIDC claims (repository, actor) or other source identifiers
Content hash — SHA-256 digest of raw payload bytes
Artifact path — CID or storage location for retrieval
Processing status — pending, processed, or failed
Error details — Failure messages for triage when applicable
Timestamps — Submission time and processing completion time
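
The audit record fields listed above map naturally onto a small data structure. The sketch below uses an in-memory list in place of the database table and skips artifact upload entirely; the class and method names are hypothetical, and the choice to skip a payload whose hash was already processed (rather than add another row) is an assumption.

```python
import hashlib
from dataclasses import dataclass, field
from datetime import datetime, timezone
from typing import Dict, List, Optional


@dataclass
class AuditRecord:
    source: Dict[str, str]                 # provenance, e.g. OIDC repository/actor
    content_hash: str                      # SHA-256 of the raw payload bytes
    artifact_path: Optional[str] = None    # CID or storage location once stored
    status: str = "pending"                # pending, processed, or failed
    error_detail: Optional[str] = None
    submitted_at: datetime = field(default_factory=lambda: datetime.now(timezone.utc))
    processed_at: Optional[datetime] = None


class AuditIndex:
    """In-memory stand-in for the queryable audit table."""

    def __init__(self) -> None:
        self.records: List[AuditRecord] = []

    def already_processed(self, digest: str) -> bool:
        return any(r.content_hash == digest and r.status == "processed"
                   for r in self.records)

    def submit(self, payload: bytes, source: Dict[str, str]) -> Optional[AuditRecord]:
        """Record a submission, or return None if the same bytes were already processed."""
        digest = hashlib.sha256(payload).hexdigest()
        if self.already_processed(digest):
            return None
        record = AuditRecord(source=source, content_hash=digest)
        self.records.append(record)
        return record

    def mark_processed(self, record: AuditRecord, artifact_path: str) -> None:
        record.status = "processed"
        record.artifact_path = artifact_path
        record.processed_at = datetime.now(timezone.utc)
```

Because the check only considers records with status processed, a payload whose earlier attempt failed or is still pending can be submitted again.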

Benefits:

Auditability — Complete provenance trail for every submission with immutable artifacts
Failure recovery — Failed submissions retain raw artifacts for manual triage and reprocessing
Idempotency — Content-addressed storage ensures identical payloads are stored once, enabling safe retry semantics
Decoupled lifecycle — Artifacts are independently retrievable via IPFS gateway, decoupling audit retention from database schema evolution
Queryable history — Database index allows filtering by source, status, and timestamp without artifact retrieval

This pattern applies uniformly to SBOM submissions and git log artifacts, with schema details documented in Storage.

Future Enhancements

Near-term security improvements under consideration:

Sentry integration — Automatic error grouping and security event tracking (free tier for OSS)
Enhanced alerting — Discord webhooks for failed authentication attempts and rate limit breaches
SBOM signature verification — Cryptographic signing of SBOM submissions beyond OIDC tokens
API key authentication — Optional API keys that provide elevated rate limits to power users

These enhancements will build on the current foundation while maintaining the public-first security model.