The Inheritance
The pattern that names a brain tumor in a six-year-old takes about thirty seconds to recognize. The papers describing it existed for decades.
When I was that six-year-old, my mother took me to twelve doctors in four months before someone connected the three observations the textbooks would have connected. Headaches, vomiting, balance: the facts were simple. The pattern that eventually named the tumor existed because of what other children’s lives had already taught medicine. The system that should have carried it to me did not.
Every field has a version of those four months. A failed Phase II trial that another team enrolls patients into a year later because nobody told them. A correction that travels by rumor to one lab and never reaches the three downstream programs that depended on the original claim. A graduate student who joins a project and spends six months rebuilding her advisor’s institutional memory because none of it was written down. A clinician who misses a drug interaction described in a 2008 case report because no decision-support tool ingests case reports. None of it is exotic; it is what happens when a field produces more work than it can keep track of.
AI is making hypotheses, code, protocols, reviews, and experiment plans cheaper to produce than science's current institutions can absorb. A model now drafts a candidate experiment in seconds; the wet lab still needs weeks. OpenAI's collaboration with Ginkgo Bioworks made the asymmetry concrete: GPT-5 ran a closed-loop autonomous lab across six rounds and more than 36,000 cell-free protein synthesis reactions (model proposes, lab executes), cutting reagent cost by roughly 40% (OpenAI, Feb 2026). Wet-lab capacity, not generation, was the rate-limiter.
When generation gets cheap, judgment becomes the bottleneck. A frontier is the current state of a scientific question: what is known, what failed, what changed, what depends on what, and what would move the field next.
The question stops being how to produce more candidates and becomes which candidates deserve scarce experiment time and human trust. The answers can only compound if they are written down somewhere durable. Otherwise every lab re-adjudicates the same candidates in private, and every AI agent recompiles the same scattered prose into a context window that closes when the conversation ends.
Fig. 01. What survives the surface. Above the line is the work that travels. Below it is the work that stays in private memory.
Science already has the raw material. Findings, failures, methods, corrections, replications, and expert judgment are scattered across registries, repositories, supplementary materials, dashboards, Slack threads, and private memory. What it lacks is the first layer beneath all of those: a writable substrate where a frontier can be updated, inherited, and used by the next researcher, clinician, or agent.
The problem is not only that science forgets. The problem is that the next generation of scientific agents will inherit whatever memory layer exists. If that layer is private, incomplete, or controlled by the wrong incentives, AI will not fix scientific transmission. It will industrialize its distortions: the same models retrieving fluently from the same dominant prose, contracting the field’s effective hypothesis space at speed.
The first proof is one writable frontier: a place where findings, evidence, corrections, and the events that move them can compound. I am building Vela as the reference implementation for that test.
The Pattern
Take amyloid.
For more than twenty years it dominated Alzheimer's funding, attention, and late-stage trial investment. The logic was simple: plaques drive the disease, so clearing the plaques should stop the decline. Tau, inflammation, vascular, and lipid hypotheses never disappeared; they lost funding, trial slots, and senior-author attention to amyloid year after year.
Trial after trial failed. Each Phase III failure meant years of enrollment unwound and patients and families who had organized their lives around a hypothesis that ended at interim analysis. Wrong belief was only part of the cost. Most of it was patient time, trial capacity, funder attention, and expert labor organized around an outdated picture of the field.
Lecanemab and donanemab eventually arrived and showed modest benefit in early symptomatic disease, decades into a program in which hundreds of clinical trials targeted amyloid between 2003 and 2023 and nearly all failed; the benefit remains debated against cost, safety, and translation constraints. The field had not been calibrated to expect that outcome. Most actors were either fully bought in or fully out, with little room for “this works at the margin in a narrow population.” Citations to the original amyloid claim kept compounding while trial failures sat in separate registries, and nothing linked the two so a downstream reader saw both at once. See Cummings et al., Alzheimer’s & Dementia: TRCI 2023, for the 2023 pipeline; van Dyck et al., NEJM 2023, for lecanemab.
The evidence and the corrections existed in separate places, and nothing forced them to move together. Confidence traveled farther than correction because no shared record forced them to. The error-correcting tradition only works when the correction can find the claim it corrects; a field producing more work than its transmission layer can carry loses that property by default. The same pattern recurs wherever knowledge has to change action: retractions reach one lab but not the work that depends on them, null results stay unpublished, and agents read a field without writing anything back.
AI is making this worse, not better. AI tools let individual researchers publish more and earn more citations. The same tools surface the same dominant claims to everyone who asks. Productivity per researcher rises while the field’s portfolio of bets contracts toward whatever the models surface most fluently. In an analysis of 41.3 million natural-sciences papers (Nature 2026), AI-augmented researchers published roughly 3× more papers and received nearly 5× more citations, while the collective volume of topics studied contracted by 4.6% and scientist-to-scientist engagement fell by 22%.
The amyloid concentration took twenty years to produce. The next version may take months.
Fig. 02. The negative-results funnel. What the field remembers is not the same as what the field learned.
The same failure shows up everywhere: translation delay, unpublished nulls, unreported trials, retractions that don’t propagate, duplicated experiments that no one had a way to inherit. Balas & Boren, Yearbook of Medical Informatics 2000, on the often-cited 17-year translation gap; Morris et al., J. R. Soc. Med. 2011, note the estimate hides significant variation by field. Franco, Malhotra & Simonovits, Science 2014 — in a TESS social-science sample, roughly two-thirds of null-result studies were never written up. Goldacre et al., BMJ 2018: about half of due EUCTR trials had reported results.
Underneath all of those is the same missing thing: structured dependencies between scientific objects (claim to evidence, claim to contradicting trial, claim to retraction, claim to dependent claim). Whatever isn’t recorded as a dependency does not propagate when something changes.
Labs repeat failed paths because the failure remained local. Reviewers rebuild the same evidence map because the last map lived in someone’s spreadsheet. Funders allocate against stale confidence because correction did not travel. Anyone who has joined a mature project knows the feeling: before you can contribute, you first have to rebuild the missing map.
The Substrate
Software offers the nearest contrast. Git gave code a memory. GitHub and package ecosystems made work networked through commits, issues, reviews, checks, and releases. AI writes code at scale because this infrastructure exists, so it can inherit, test, merge, and distribute. By 2026 the majority of new code at frontier AI labs is machine-authored, and the substrate underneath holds because it was already content-addressed and machine-readable. Public statements from Anthropic and Google in 2025 disclosed that AI now authors the majority of new code in their own organizations (Google ~75%, Anthropic 70-90% company-wide). Cursor crossed $1B ARR by November 2025 and $2B by February 2026, with users accepting nearly a billion lines of agent-generated code per day. Pull requests, issues, and CI checks are objects an agent can read and write. Science has no equivalent. An agent that synthesizes the literature today produces a fluent summary that no other agent inherits, no record carries forward, and no reviewer can read as a delta. The next agent starts the same job from scratch.
The AI industry is already negotiating shared substrates for itself. Tool access, coordination across agents, and state that survives a session are becoming infrastructure, but none of them is a state layer for science specifically.
Scientific work is moving into the same regime. A model can read more literature than any human and reason across chemistry, biology, and physics in the same conversation. What it cannot do is write into a record that compounds. A writable substrate gives AI a place to deposit what it produces; the frontier is the object a scientist reads and writes.
Human experts gain time in the same loop. A reviewer who used to spend three weeks reconstructing a contested cluster (reading two hundred papers, tracking which ones had been retracted, emailing authors for clarifications, building a private spreadsheet of which caveats still mattered) spends three days when the cluster is already an addressable graph with provenance and confidence histories. The substrate amplifies expertise without replacing the judgment about what to trust, who to trust, and what to test next.
The four months at the start were three weeks on a child’s clock.
AlphaFold learned from structured state. The Protein Data Bank, established in 1971, holds over 250,000 experimentally determined structures; deposition became a condition of publication in major journals by the 1990s, and by making those structures shared, curated, and machine-readable, the PDB gave the model structured state to learn from. AlphaFold's training data was drawn directly from that resource, and the AlphaFold Protein Structure Database now holds predicted structures for over 200 million proteins. Jumper et al., Nature 2021; Hassabis and Jumper shared the 2024 Nobel Prize in Chemistry. Most fields don't have unique experimental ground truth the way protein structures do, so their version of the PDB cannot be a registry of settled facts. It has to be a registry of state (claims, evidence, contradictions, confidence) where corrections propagate even when the underlying answer is contested. The realistic adoption pattern looks closer to ClinicalTrials.gov than to the PDB: mandatory deposition with imperfect enforcement, where the value comes from the population of compliant deposits even as individual outliers fail. The substrate has to be canonical enough that downstream decisions reference it by default, not perfect. Every other field is now trying to work out what its version of that registry should look like.
Fig. 03. Structured state before intelligence. AlphaFold did not emerge from papers alone; it learned from a shared, curated scientific record.
Multiple agentic systems are now producing hypotheses, experiment designs, and contradiction checks at scale, and their outputs end up in private logs and transient context windows. Biohub’s Virtual Biology Initiative, Arc’s Virtual Cell Initiative, Google Research’s AI co-scientist, and FutureHouse’s agent platform are early examples; SoTA Letters, “AI Scientists Need a Social Network”, names the substrate gap directly. They make the gap more visible without filling it.
To a model reading prose alone, a retracted finding and a Nobel-winning one can look the same. A tentative claim travels through citations without carrying the uncertainty forward until it looks established. AI makes this faster because the model has no shared place to record the difference.
Fig. 04. Applications over shared state. New scientific applications become possible when they can read from and propose against the same record.
Whoever owns this layer decides what the next generation of agents inherits as fact, and that choice is being made now in the window where the layer’s structure is still negotiable. Tim Wu, The Master Switch: The Rise and Fall of Information Empires (Knopf, 2010), traces the same arc through telephone, radio, film, and broadcast: every information industry has a window in which its eventual structure is still negotiable, and a Cycle in which that window closes under a vertically integrated incumbent. Every prior information substrate (telephone switching, broadcast spectrum, film distribution, cable carriage) entered the same window open and exited it consolidated under whichever firm had fused infrastructure, content, and access. The likelier outcome here is not that no substrate appears, but that several do, each owned by the actor closest to distribution: the publisher, the AI lab, the pharma knowledge platform, the workflow vendor. Each will call itself infrastructure and make local work easier. The correction graph will not cross institutional boundaries unless the event layer is open, and once a private layer is established the switching costs are mechanical: agents are trained on its schema, tools are integrated against its APIs, depositing institutions cannot legally export their own state to a competitor.
The PDB worked because the layer stayed open and deposition became a condition of publication. If scientific state defaults to closed silos with AI wrappers on top, every model will require those wrappers to function, and every correction will require commercial cooperation to propagate.
The strongest counterargument is that the existing stack is enough. Trevor Bedford first imagined forkable papers and scientific pull requests; from inside outbreak genomics, he later argued that GitHub for code, figshare for data, virological.org for discussion, and blogs for review could carry much of the work. Trevor Bedford, “Some thoughts on a GitHub of Science”; Bedford, “On scientific publishing practices in the face of public health crises”.
For domains with disciplined practitioners and computational workflows, that stack is real. But experiments need grants, instruments, samples, patients, regulatory paths, and credit structures no repository produces. The remaining gap is the layer those surfaces leave implicit: changes to findings themselves. Existing tools move artifacts; they do not maintain the live state of the claims those artifacts contain.
Some of what licenses a scientific claim is irreducibly tacit. The trust an experienced reviewer places in a particular postdoc’s hands. The conditions of the lab notebook that don’t appear in the methods section. The way a particular instrument was calibrated last week. The substrate carries scope, evidence, dependency, and confidence; it does not carry social trust, and it cannot adjudicate which deposits are load-bearing until something downstream breaks. Begley and Ellis’s audit of fifty-three landmark preclinical cancer studies found methods sections complete, reagents named, conditions specified, and replications still failing in roughly nine out of ten cases. Begley & Ellis, Nature 2012: 47 of 53 landmark preclinical cancer studies failed to reproduce on independent replication. The variable that mattered was usually one no one knew to record at deposit time. What the substrate captures is the part of a finding that recurs across labs; what stays tacit is the part that does not reliably travel. What the substrate does is make the explicit part inheritable, so the tacit part can be applied to claims at the level where it actually changes decisions, and so the load-bearing variable becomes legible the next time it fails.
The diagnosis is older than this essay. Bush’s “As We May Think” wanted associative trails through the literature; Engelbart wanted augmented intellect with shared external memory; Licklider wanted libraries of the future with machine-actionable state. Bush, “As We May Think,” Atlantic 1945; Engelbart, “Augmenting Human Intellect” 1962; Licklider, Libraries of the Future 1965. The “GitHub for science” attempts that followed read papers and conversations, not state transitions.
The primitive is not a paper, post, artifact, or discussion. It is a state transition: this evidence changed this finding, under this scope, with this confidence, reviewed by these actors, affecting these dependencies. Accepted transitions change the frontier. That difference is small to describe and large in what it allows: dependency-tracked corrections, agent-readable deltas, a substrate that distinguishes a retracted claim from a Nobel-winning one without reading prose.
Vela starts with a small kernel of scientific primitives: findings, evidence, events, proposals, attestations, diffs, lineage, questions, protocols, experiments. Those primitives are versioned into living frontiers. Workbenches, capability graphs, agent runtimes, and labs can all sit above that layer, but the ordering matters: without state, runtime is just activity and network is just distribution.
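To make the kernel concrete, here is a minimal sketch of two of those primitives and how an accepted event changes a frontier. The class names, fields, and `apply` function are illustrative assumptions for this essay, not Vela's actual schema.

```python
from dataclasses import dataclass, field
from typing import Optional

# Hypothetical sketch: a Finding is an addressable claim with scope,
# confidence, evidence, and dependency edges; an Event is a state transition.

@dataclass
class Finding:
    id: str
    claim: str
    scope: str                      # population/conditions the claim covers
    confidence: float               # 0.0-1.0, revised only by accepted events
    evidence: list = field(default_factory=list)    # evidence ids
    depends_on: list = field(default_factory=list)  # upstream finding ids

@dataclass
class Event:
    """A state transition: this evidence changed this finding, under this
    scope, with this confidence, reviewed by these actors."""
    kind: str                       # "Correction", "Replication", "Retraction"
    finding_id: str
    evidence_id: str
    new_scope: Optional[str]
    new_confidence: Optional[float]
    reviewed_by: list

def apply(frontier: dict, event: Event) -> list:
    """Apply an accepted event to the frontier; return the ids of findings
    that depend on the changed one and therefore need re-review."""
    f = frontier[event.finding_id]
    if event.new_scope is not None:
        f.scope = event.new_scope
    if event.new_confidence is not None:
        f.confidence = event.new_confidence
    f.evidence.append(event.evidence_id)
    return [g.id for g in frontier.values() if event.finding_id in g.depends_on]
```

The point of the sketch is the return value: an accepted diff does not just edit one object, it names the downstream objects the change touches.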
Prior attempts mostly tried to make the artifact better. The substrate lets the state of the claim move, which is what lets criticism run on the layer instead of around it. David Deutsch, The Beginning of Infinity (Viking, 2011), frames knowledge growth as conditional on error-correction: “without error-correction all information processing, and hence all knowledge-creation, is necessarily bounded.” Provenance, confidence, and contradiction edges are what a downstream agent actually needs to read.
Take a single finding to make this concrete. Pericyte loss precedes cognitive decline in early Alzheimer’s. In the substrate, that sentence is an addressable object scoped to human sporadic AD, prodromal stage, with an evidence list pointing to Halliday and colleagues’ postmortem work and Nation et al.’s longitudinal MRI study. Halliday et al., J Cereb Blood Flow Metab 2016; Nation et al., Nature Medicine 2019. When Montagne et al.’s 2020 paper landed showing the BBB breakdown was concentrated in APOE4 carriers, a Correction event attaches: scope narrows from sporadic AD to APOE4-positive sporadic AD, confidence on the broader claim revises downward. Montagne et al., Nature 2020. Three downstream objects (a target hypothesis at one biotech, an inclusion criterion in a Phase II trial, a review article’s headline claim) get notified by the dependency graph. The substrate accelerates the evidence handoff, not the institutional decision: the trial PI, DSMB, and sponsor still negotiate the amendment, but they negotiate against current evidence within a trial cycle rather than across one. None of this currently happens at that speed.
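The notification step above can be sketched as a walk over the dependency graph. The object ids and graph shape below are hypothetical, chosen to mirror the pericyte example; the mechanism is just breadth-first traversal from the corrected finding.

```python
from collections import deque

# Hypothetical dependency graph for the worked example:
# dependents[x] = downstream objects that cite or build on x.
dependents = {
    "finding:pericyte-loss": [
        "hypothesis:biotech-target",
        "criterion:phase2-inclusion",
        "claim:review-headline",
    ],
    "claim:review-headline": ["guideline:draft"],
}

def notify(changed: str) -> list:
    """Walk the graph breadth-first from a corrected finding, collecting
    every transitively dependent object that needs re-review."""
    seen, queue, out = {changed}, deque([changed]), []
    while queue:
        node = queue.popleft()
        for dep in dependents.get(node, []):
            if dep not in seen:
                seen.add(dep)
                out.append(dep)
                queue.append(dep)
    return out

# A Correction narrowing scope to APOE4-positive sporadic AD triggers:
print(notify("finding:pericyte-loss"))
```

Note that the review article's headline claim propagates one hop further, to anything built on the review itself; that second hop is exactly what the paper system loses.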
A correction can only travel if the finding it corrects has an address.
A frontier is a question with memory: the current state of what is known, contested, weakened, unresolved, and worth testing next.
The Constellation
The first AI-science layers to appear are social and operational, not state: agents can post hypotheses, debate papers, chain tools, and preserve artifact lineage. Science Beach, Agent4Science, and ScienceClaw are early examples of social and runtime layers arriving before a shared state layer.
Those systems matter because they show that AI scientists will generate public activity. But posts are not state. Votes are not trust. Artifact DAGs are not belief change. Discourse relations are not dependency propagation.
Vela exists to turn activity into state: a hypothesis becomes a finding or question, an artifact becomes evidence, and either can enter a proposal that becomes a reviewed event changing a frontier. The important object is not the social post or the artifact, but the accepted diff to the frontier.
A constellation is how a frontier becomes navigable: findings connected by evidence, contradiction, correction, and dependency. A single finding sits alone until something connects it to others. What a working scientist navigates is the pattern: which findings depend on which evidence, which corrections have moved through which downstream claims, which questions still have no answer anyone trusts. The substrate makes the lines addressable.
Each node carries confidence, supporting and contradicting evidence, and outbound dependency edges. The questions a working scientist asks become tractable: Which findings can I trust right now? Which depend on each other? Which experiment would change confidence the most?
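Two of those questions reduce to simple queries once the nodes carry that structure. The node format, threshold, and field names below are illustrative assumptions, not a real API.

```python
# Hypothetical constellation: each node carries confidence, open
# contradictions, and upstream dependency edges.
nodes = {
    "f1": {"confidence": 0.85, "contradicted_by": [],     "depends_on": []},
    "f2": {"confidence": 0.60, "contradicted_by": ["e9"], "depends_on": ["f1"]},
    "f3": {"confidence": 0.40, "contradicted_by": [],     "depends_on": ["f2"]},
    "f4": {"confidence": 0.55, "contradicted_by": [],     "depends_on": ["f2"]},
}

def trustable(nodes, floor=0.7):
    """Which findings can I trust right now? High confidence and no open
    contradictions (threshold is an illustrative choice)."""
    return [k for k, n in nodes.items()
            if n["confidence"] >= floor and not n["contradicted_by"]]

def load_bearing(nodes):
    """Which finding has the most downstream dependents, i.e. where new
    evidence or a correction would propagate farthest?"""
    counts = {k: 0 for k in nodes}
    for n in nodes.values():
        for up in n["depends_on"]:
            counts[up] += 1
    return max(counts, key=counts.get)
```

Here `trustable(nodes)` returns only `f1`, and `load_bearing(nodes)` returns `f2`: the contested node is also the one the most downstream work rests on, which is precisely the situation a reader of prose alone cannot see.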
Today, those questions are answered by reconstruction. A clinician, researcher, or agent reads, follows citations, searches for reviews, chases down retractions, emails authors, guesses which caveats still matter. The reconstructed map lives in their head and disappears when they leave the project. In the substrate, that map is an artifact: one researcher’s reconstruction becomes the next researcher’s starting point, signed and inspectable.
When a result fails to replicate or a paper is retracted today, the update moves by rumor, by review article, by the luck of who happened to notice. In the substrate, a correction is an event inside the record, and every claim that depends on the original is updated automatically.
In 2014, two papers in Nature claimed a simple method for creating stem cells, and labs worldwide tried to replicate. A researcher in Hong Kong documented his attempt publicly, identified the flaw within weeks, and submitted his findings to Nature. The journal that published the flawed papers rejected the correction. For six more months, labs worldwide spent resources on an approach that had already been challenged in public. The correction existed; the publication system could not merge it. Obokata et al., Nature 2014, retracted July 2014. Ken Lee’s replication attempt was documented publicly on ResearchGate from February 2014; see Cyranoski, “Failed replications put STAP stem-cell claims to rest,” Nature News, January 2015.
An agent-generated correction in the same regime would face the same failure, only proposed faster. A writable record where any agent’s contribution merges automatically would amplify mistakes faster than corrections could catch up. Agents should be able to draft findings, attach evidence, and propose updates, but not silently become the canonical record of a field. The state of a finding (proposed, reviewed, replicated, contradicted, deprecated, retracted) is information downstream readers need.
The recursive risk is the same problem one generation up: agents that read the substrate today will write to it tomorrow, and tomorrow’s agents will inherit what was written. An agent-authored deposit that becomes canonical without human review becomes load-bearing on the next generation of agents that read against the substrate. Canonical status is gated by maintainer review; agent-drafted content sits in a draft state that is readable for transparency but excluded from the canonical layer.
Two groups can report opposite associations for the same gene variant in different populations, and the difference could reflect biology, sample size, analytic choices, or something no one has identified yet. PubPeer surfaces image problems, statistical errors, and failed replications faster than journals do; see Journalist’s Resource, “5 tips for using PubPeer”. Retraction Watch, “Stealth corrections”, documents journals quietly changing papers without transparent correction trails. A scientific contradiction is sometimes the answer, and the system has to represent “two replicated results disagree under conditions X and Y” as a state of its own. The substrate records disagreement; it does not resolve it.
Authority does not disappear; it relocates from narrative publication into visible merge decisions. Anyone can propose a finding, an agent can draft a correction, a lab can deposit a failed experiment, but canonical state requires maintainers with domain legitimacy.
Who maintains is not a placeholder problem. The protocol’s job is structural separation: the entity that holds the schema and signing keys must be structurally separate from the entity that runs the dominant client, sells the AI agents that read against the layer, or owns the corpus that trained them. Wu’s separations principle (Master Switch ch. 18) argues that those who create information, those who carry it, and those who control access to it must be held by structurally distinct actors. Crossref is the closest live instantiation of that rule in scientific infrastructure: since 2000 it has held DOI, citation, and retraction infrastructure across competing publishers, with a non-profit owning the spec, multiple independent implementations, no single entity able to take the layer private, and elected technical and steward seats with term limits. A foundation modeled on Crossref, combining non-profit governance with open infrastructure, carries that same separation for the substrate.
Per-frontier maintainers are domain consortia (a foundation-led BBB-Alzheimer working group, an FRO operating an early consortium, a lab network in materials science), each running an instance of the merge process under the protocol’s authentication and provenance rules. The Hugging Face capture path (open community, then a venture-backed company controlling the canonical hub) is exactly what a substrate-style protocol has to be structurally incapable of, and the foundation entity is what closes that path. Maintainer turnover, reputation revocation, and dispute escalation are part of the spec, not afterthoughts.
When maintainers themselves disagree over a contested Correction (where two scientifically defensible camps reach different conclusions on the same finding), the protocol allows plural canonical views to coexist on that finding, with the dependency graph recording the dispute as a state of its own. Convergence is not assumed; legibility is. Premature consensus is what closes the criticism layer, and a substrate that forced one canonical view would reproduce the failure mode it was built to prevent.
Open-by-default is the right rule for the overwhelming majority of scientific findings, and wrong for a small, defined set. For gain-of-function results, model-generated protein designs in dual-use space, and certain synthesis routes for controlled compounds, readability itself is part of the harm. The substrate carries these through a governed channel: deposition is mandatory, read access is permissioned, audited, and time-delayed, with dual-use review embedded in the merge process rather than asked of the depositor in good faith. Classification runs through existing institutional review machinery (IBCs and the federal DURC framework, extended to model-generated artifacts), with thresholds aligned to the capability gates frontier labs already publish under their own safety frameworks. Content above those thresholds is excluded from public deposit entirely, not gated; the openness default fails closed on ambiguous cases. The composition risk (capability uplift from aggregation across the dependency graph rather than any single deposit) is the harder problem and the part the foundation does not yet claim to have solved.
Contact With Reality
The substrate has to ingest the output of the lab: protocol, measured outcome, calibration, uncertainty, and updates to the affected findings.
RECOVERY is the practical proof. When COVID hit, trial efforts fragmented hospital by hospital, protocol by protocol. The UK did something simpler and more durable: one protocol, one ethics path, lightweight enrollment, integration with existing records, a system any hospital could join.
The first patient was enrolled within days. One in six hospitalized COVID patients in the UK entered the trial. Within 100 days, RECOVERY produced a result that changed care worldwide. Dexamethasone, a cheap generic steroid, reduced mortality in ventilated patients by about a third and has since saved hundreds of thousands of lives. Dexamethasone reduced 28-day mortality by roughly one-third in ventilated patients across 176 UK hospitals (RECOVERY Collaborative Group, NEJM 2021). The broader RECOVERY platform has since enrolled over 47,000 participants. See also Wellcome, “The Story of RECOVERY” (2021).
Execution structure mattered as much as intelligence: one protocol, one ethics path, an existing EHR backbone, and emergency regulatory cover. The substrate inherits none of those preconditions and has to substitute by other means: signed-finding portability across institutions, a foundation-held spec that no single funder controls, and conditional grants that align incentives the way emergency authority once did. The mechanism RECOVERY proved is that protocol consolidation produces speed; the substrate’s bet is that a shared schema can produce analogous consolidation across many institutions without an NHS to anchor it.
Fig. 05. Execution as infrastructure. RECOVERY worked because participation, measurement, and learning converged into one shared protocol.
The loop matters: models propose against the current record, labs test what would reduce uncertainty, failures return to the record, and human attention moves to the decisions only humans can make. Amodei’s “Machines of Loving Grace” argues that AI could compress decades of biological progress into years. McCarty’s “Levers for Biological Progress” is the useful constraint: experiment speed, cost, measurement, regulation, protocols, and human collaboration remain bottlenecks even if intelligence becomes abundant.
The Haverford lab showed this at small scale. Failed experiments contain real knowledge, and when that knowledge enters a shared system, it changes what the next researcher can do. Raccuglia et al., Nature 2016. A machine-learning model trained on years of failed vanadium selenite syntheses predicted reaction outcomes with 89% accuracy, vs. 78% for experienced chemists. Years of private waste became training signal because heterogeneous notebook failures were structured enough to learn from.
When a synthesis fails today, the failure stays local. In the substrate, it enters the record directly: this compound, these conditions, this measured outcome, this uncertainty. The next chemist designing a similar synthesis sees the dead end before she runs the experiment.
Fig. 06. The learning loop. Maps guide experiments; experiments update maps.
The primitive expands from the finding to the state transition: the prior belief, the evidence that arrived, the change, and the downstream claims affected. Results enter the record before they are polished into narrative, with protocols machine-operable and measurements attached to calibration, uncertainty, and lineage.
The IDs, schemas, signature format, and event log have to be open. Paper silos became digital silos when healthcare digitized records without interoperability, and the same chokepoint forms at field scale when scientific state runs through a single vendor.
The state is local-first by design. Hubs mirror the state; they do not own it. The local-first principle: data and identity live with the user; cloud services are convenient mirrors, not the source of truth. See Ink & Switch, “Local-first software” (2019).
The first proof is one writable frontier, not a universal ontology: one place where a correction travels farther than it would have in the paper system, with one artifact-to-state pipeline, one proposal review workflow, one accepted event, one diff, one dependency update.
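The "one diff, one dependency update" step can be sketched as a breadth-first walk over a claim dependency graph: when a correction is accepted, every transitively dependent claim is flagged for review, nearest dependents first. The graph shape and claim names below are hypothetical.

```python
from collections import deque

# Hypothetical dependency edges: claim -> claims that build on it.
DEPENDENTS = {
    "target-hypothesis": ["preclinical-model-a", "preclinical-model-b"],
    "preclinical-model-a": ["phase1-protocol"],
    "preclinical-model-b": [],
    "phase1-protocol": [],
}


def affected_by(corrected_claim: str) -> list:
    """Return every claim downstream of a corrected claim, in
    breadth-first order, so review can proceed nearest-first."""
    seen = {corrected_claim}
    order = []
    queue = deque(DEPENDENTS.get(corrected_claim, []))
    while queue:
        claim = queue.popleft()
        if claim in seen:
            continue
        seen.add(claim)
        order.append(claim)
        queue.extend(DEPENDENTS.get(claim, []))
    return order


print(affected_by("target-hypothesis"))
# → ['preclinical-model-a', 'preclinical-model-b', 'phase1-protocol']
```

This is the mechanical core of a correction traveling farther than it would in the paper system: the failed result updates one node, and the walk names every pipeline that should re-check its assumptions.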
Alzheimer’s neurovascular and blood-brain-barrier dysfunction is the right first proof on the clinical-translation side because mechanism-level findings travel where phenomenology does not: a claim of BBB pericyte loss scoped to APOE4 carriers carries the same evidential weight in Toronto and Tokyo, and the substrate’s primitives have to survive that transport before they are tested on softer kinds of evidence. It is also a domain with public failed trials, active commercial interest, funders who need better allocation, clinicians who need live confidence, and researchers who already know the field is too large to hold in one head, a field whose literature is contradictory enough that decisions can no longer be made on prose alone.
A parallel proof on the experimental-biology side runs through perturbation-response data: the cell-line, modality, and effect-size scope that frontier biology institutes already produce at scale, and that today accumulates as private CSV files with drifting cell-type labels and analyst-specific QC thresholds. The two corridors test different parts of the substrate’s surface: the clinical proof exercises evidence aggregation across heterogeneous study types; the perturbation proof exercises scope-rich state transitions where the same gene perturbation gives different signatures across cell contexts.
In that frontier, a researcher opens the question rather than a paper. She sees human evidence, animal-model claims that failed to translate, interventions that moved biomarkers without changing cognition, and the experiment most likely to separate vascular damage from downstream inflammation. She clicks a failed intervention and sees the protocol, model, dose, endpoint, measurement window, and reason confidence fell. She begins from the current state of the question. The substrate’s primitives (Finding, Evidence, Correction, dependency) are domain-agnostic, so a working frontier is evidence that the primitives carry; if the frontier fails, the obstacles to scaling become visible at the level of which primitives don’t fit which kinds of science.
In April 2026, the OpenAI Foundation announced a major Alzheimer’s initiative: more than $100 million in planned grants across six research institutions and a five-layer stack of causal mapping, AI-assisted drug design, open datasets, biomarkers, and off-patent intervention testing (OpenAI Foundation, “AI for Alzheimer’s”, April 8, 2026). Arc is treating Alzheimer’s as a blueprint for complex disease; see Arc’s Alzheimer’s Disease Initiative. Multiple teams can produce discoveries in parallel and still leave no shared map unless failed attempts, partial replications, biomarkers, target hypotheses, and changing confidence enter one record. Without that record, a $100M Alzheimer’s initiative replicates the field’s concentration on amyloid at speed.
The Human Genome Project, launched in 1990 and completed in 2003, gave biology a reference: the first sequence of the human genome. The next public work of comparable scope is harder because the object is the changing state of scientific belief itself, not a fixed reference. The institutional form that fits is the focused research organization, proposed for bottlenecks that fall between academia, startups, industry, and government: a time-bound, milestone-driven team building public-good science infrastructure that sunsets into a standing institution (Marblestone et al., Nature 2022; field notes in Issues in Science and Technology). The substrate is the kind of thing FROs were invented for, and the architecture is recursive: the corridor team is an FRO, the foundation that holds the protocol spec is perpetual, and standing maintainer consortia inherit the corridor’s frontier when the FRO sunsets.
The Crossing
If every field could close the loop (propose, test, return, update), a question would have a current state rather than an accumulated bibliography. A failed Phase II would surface in every preclinical pipeline that depended on the target hypothesis within hours, not years, with the affected scope marked explicitly. A failed experiment would enter the shared record before the result is written up, so the next team starts from the current field instead of rebuilding a private map.
At that scale, funders can see neglected bottlenecks across many fields at once, regulators can trace a claim through evidence, corrections, and dependencies, and models can propose discriminating tests against the current field instead of a private scrape of papers.
That requires coalition work. Make frontier state a public object that institutions can trust, extend, mirror, and build on. Public institutes, startups, foundations, funders, and AI labs should be able to coordinate against the same frontier without surrendering every workflow to a single platform. The objects that travel between them are signed findings, failed experiments, confidence updates, and review events.
Incentives drive adoption. The incentive is access to a capability that does not currently exist, not a marginal improvement to an existing one. Focused research organizations are time-bound by design, so their findings, methods, and accumulated state are supposed to outlive them as public-good infrastructure. Today most of that legacy gets archived as final reports and frozen datasets that no future researcher’s agent reads as live state. With the substrate, an FRO’s work continues compounding after the org sunsets: its experiments, corrections, and dependency graphs are still queryable, still updating as downstream evidence arrives, still answerable to the next researcher’s question. The same logic gives frontier biology institutes a shared graph instead of a thousand private CSV files, lets AI-for-science labs compound across sessions and labs, and gives patient-led foundations a live frontier instead of a stack of post-mortem reviews.
Some incentives run the other way. Closed AI-bio platforms (Recursion’s Phenom, Insitro, the pharma knowledge graphs) monetize private state and have a structural reason to withhold it. The substrate does not become canonical because every actor wakes up converted. It becomes canonical because enough actors with public legitimacy coordinate to make withholding state more costly than depositing it. Patient-led foundations, new science funders, FRO incubators, and regulators each hold a lever: foundations and funders can condition grants, incubators can field frontier teams, and regulators can require provenance. The substrate does not have to win every actor’s incentive analysis. It has to win this coalition’s.
Open infrastructure widens who can contribute. arXiv let mathematicians outside elite departments compete on the work; GitHub let outsiders ship code to projects used by Fortune 500 companies; Hugging Face let independent ML researchers ship models that production systems depend on. A scientific substrate would do the same for clinicians at non-research hospitals, researchers in fields without good shared archives, and students. Science is one of the few cumulative human activities still routed through gatekeepers chosen a century ago.
Credit reform comes after the substrate, not before it. Search committees, tenure committees, and study sections can count signed contributions only once signed contributions exist as inspectable objects.
With that layer, agents have a clearer role. They draft candidate updates, search for contradictions, propose discriminating experiments, audit provenance, and carry corrections across dependencies. Human experts spend less time rebuilding maps and more time deciding what should be trusted, tested, left unresolved, or admitted into the shared record.
The pattern that doesn’t reach in time is no longer the hardest case. Retrieval can surface what a textbook would have connected. The harder failures are the ones retrieval cannot solve: the local correction, the failed trial that never reaches the next protocol, the agent whose synthesis disappears at the end of the session. The test is whether the light arrived: confidence that compounds across labs, corrections that travel through dependencies, a question whose current state can be read and written in one place.
Human knowledge is never contained in one person. It grows from the relationships we create between each other and the world, and still it is never complete.
The first moves are justified by the new layer they create, not by marginal improvements to existing workflows. The institutions below are the first to operate inside that layer.
If you fund science (at a patient-led foundation, or a new science funder like Astera, Speculative Technologies, or Renaissance Philanthropy), conditional grant-making is the lever. Pick the bounded question your field cannot afford to keep relearning, the one that haunts every literature review, and condition the grant on deposition into a public record. You are not buying more trials; you are buying the first frontier where the question can be navigated by anyone, including the agents that will inherit it. The funder that conditions first sets the precedent every other funder has to match.
If you run a research institute or an FRO (Arc, FutureHouse, Astera-incubated programs, Convergent Research and its spin-up cohort), the work your team produces is supposed to compound across labs and outlive any single org. Today it accumulates in private CSVs and end-of-org archives. Writing it to the substrate makes your institute the first host of a frontier where the next decade of agents has somewhere to read state from. For an FRO the urgency is sharper: deposition is the deliverable, not the report, and the corridor becomes inheritable infrastructure only if its findings, corrections, and dependency graphs sit in a layer the next FRO can read.
If you run an AI lab building scientific agents, the public log is the contribution that compounds. The deposit unit is model-initiated synthesis against public corpora, produced by the same internal frontier-research arms already running bio uplift evaluations and red-team work. API-surface customer outputs are not the channel; enterprise contracts already prohibit deriving public artifacts from private sessions. The first lab to deposit at this layer gets to shape what readable means for every agent that reads against it next.
If you build for science, the layer is the thing to build above. Every dashboard, triage tool, and discriminating-experiment recommender becomes a client of the same record.
The literature is what science has said. The frontier is what science can act on now.
A six-year-old comes in with headaches, vomiting, and balance trouble. The frontier has changed since the last textbook was written, and the pattern reaches her in thirty seconds.
for M.