Public data policy

Every project on Glimmer must expose a public, queryable summary, even when its raw data is members-only. This is what keeps the platform legible: anyone, including the public site agent, can learn what a study is and how it's built, while the detailed data stays gated.

The contract is enforced by a validator (_services/tests/test_projects_schema.py) that runs in the deploy gate and in CI. A project missing any required field fails the build — so the docs, the site landings, and the agent never drift from the manifest.

Required public fields

Each project entry in site/explore/projects.json must provide:

Field                        Meaning
id, name, type, visibility   identity
tagline, desc                one-line + full description
owner, contact               who runs it and how to reach them
license, access              what's open vs gated, and how to request access
compute                      current compute footprint (and that more can be requested)
links[]                      external links (papers, repos, viewers)
references[]                 public reference/paper nodes + linked datasets; each is { title, kind: dataset | paper, url | doi, relation }
summary                      { node_count, node_types, dataset_size, updated }: the at-a-glance numbers
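To make the field list above concrete, here is a minimal Python sketch of a required-field check. It is not the actual validator in _services/tests/test_projects_schema.py, which is not shown here. It assumes the manifest is a list of project objects, and every value in EXAMPLE_PROJECT (ids, names, URLs, sizes, dates) is hypothetical; only the field names come from the table.

```python
# Required top-level fields, taken from the field table.
REQUIRED_FIELDS = (
    "id", "name", "type", "visibility",
    "tagline", "desc",
    "owner", "contact",
    "license", "access",
    "compute",
    "links", "references",
    "summary",
)
# Required keys inside `summary`, also from the field table.
REQUIRED_SUMMARY_KEYS = ("node_count", "node_types", "dataset_size", "updated")

# Hypothetical entry: all values are made up for illustration.
EXAMPLE_PROJECT = {
    "id": "example-study",
    "name": "Example Study",
    "type": "study",
    "visibility": "members",
    "tagline": "One-line description of the study.",
    "desc": "Full description of the study.",
    "owner": "Example Lab",
    "contact": "owner@example.org",
    "license": "CC-BY-4.0",
    "access": "Summary is public; request access for raw data.",
    "compute": "1 node (more can be requested)",
    "links": ["https://example.org/repo"],
    "references": [
        {
            "title": "Example dataset",
            "kind": "dataset",
            "url": "https://example.org/dataset",
            "relation": "derived-from",
        }
    ],
    "summary": {
        "node_count": 120,
        "node_types": ["paper", "dataset"],
        "dataset_size": "4 GB",
        "updated": "2024-01-01",
    },
}


def missing_fields(project):
    """Return the required public fields absent from one project entry."""
    missing = [f for f in REQUIRED_FIELDS if f not in project]
    summary = project.get("summary")
    if isinstance(summary, dict):
        missing += [f"summary.{k}" for k in REQUIRED_SUMMARY_KEYS if k not in summary]
    return missing


def validate_manifest(projects):
    """Map each failing project's id to its missing fields; empty dict means pass."""
    failures = {}
    for index, project in enumerate(projects):
        missing = missing_fields(project)
        if missing:
            failures[project.get("id", f"<entry {index}>")] = missing
    return failures
```

A test in CI could load site/explore/projects.json with json.load and assert that validate_manifest returns an empty dict. The sketch checks only for presence; checking value types or allowed values would need details this page does not specify.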

references is how studies link to other datasets: reference and paper nodes are public, so the cross-study research graph is queryable even when the underlying data isn't.
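As an illustration of querying that cross-study graph, the sketch below groups projects by the references they share. It assumes the manifest is a list of project objects and that two references are the same when their doi (or, failing that, url) matches; that matching rule and the function names are assumptions, not something the platform is known to implement.

```python
from collections import defaultdict


def reference_key(ref):
    """Identify a reference by its DOI if present, else by its URL."""
    return ref.get("doi") or ref.get("url")


def shared_references(projects):
    """Map each reference key to the sorted ids of the projects citing it,
    keeping only references cited by two or more projects."""
    cited_by = defaultdict(set)
    for project in projects:
        for ref in project.get("references", []):
            key = reference_key(ref)
            if key:
                cited_by[key].add(project["id"])
    return {key: sorted(ids) for key, ids in cited_by.items() if len(ids) > 1}
```

Because the function reads only the public references field, it works the same whether or not the caller has a members session.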

What stays gated

Raw graph data (graph.json), the papers index (papers.json), PDFs, and volume data live under /explore/<study>/… and require a members session (server-enforced via nginx auth_request). The public summary never includes gated content, and the agents are grounded only in these public fields.
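One way to guarantee that a public summary cannot carry gated content is to build it from an allowlist rather than by removing known gated keys. The sketch below assumes a project record is a flat dict and that the public fields are exactly those in the required-fields table; public_summary and PUBLIC_FIELDS are hypothetical names, not part of the Glimmer codebase.

```python
# Allowlist of public top-level fields, taken from the required-fields table.
PUBLIC_FIELDS = frozenset({
    "id", "name", "type", "visibility",
    "tagline", "desc",
    "owner", "contact",
    "license", "access",
    "compute",
    "links", "references",
    "summary",
})


def public_summary(record):
    """Return a copy of the record holding only allowlisted top-level fields.

    Any key not on the allowlist is dropped, so a newly added gated field
    stays out of the summary by default.
    """
    return {key: value for key, value in record.items() if key in PUBLIC_FIELDS}
```

This filters top-level keys only; nested values are passed through unchanged, so it complements the server-side gate on /explore/<study>/… and does not replace it.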