Ground Truth

Data quality

What we know about the limits of this data

Ground Truth is built on USAspending.gov. The data is real, and it is imperfect. Below is what oversight bodies have publicly said about its limits, and how we mitigate them.

What GAO has said

The Government Accountability Office has reported that approximately 70% of USAspending.gov information is inconsistent with program-level reporting, and that half of agencies with inconsistent data are not reporting at all. Reference: GAO blog post on USAspending.gov data quality.

What POGO has said

The Project On Government Oversight publishes “Ten Questions USAspending Can't Answer”, including limitations around sub-award visibility, contractor ownership obfuscation, and inconsistent set-aside reporting. We answer some of those questions (cost outlier patterns, contractor concentration, IDIQ task-order pricing); we do not answer others (sub-award accounting, bid rigging detection, cross-program portfolio analysis).

How we mitigate

  • Frozen evidence. Each anomaly's comparison set is captured at flag time. Cohort medians drift; published claims do not.
  • Ingestion run ledger. Every row in our database is stamped with the API call that produced it (source, query filters, timestamp). Reproducibility is built in.
  • Deterministic confidence gate. Five rules. No human judgment in publication. Method version stamped on every flag.
  • Right of reply rendered inline. A contractor's response is published on the same URL as the flag.
  • 48-hour dispute SLA. Every dispute resolution is logged at /corrections.
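
As an illustration of the first two mitigations, here is a minimal sketch of how rows could be stamped with their ingestion run and how a comparison set could be frozen at flag time. It is not our production code: the names IngestionRun, stamp_rows and freeze_evidence, the 16-character run_id digest, and the row fields award_id and amount are all assumptions made for this example.

```python
import hashlib
import json
from dataclasses import dataclass
from datetime import datetime, timezone
from statistics import median


@dataclass(frozen=True)
class IngestionRun:
    """One API call that produced rows (hypothetical schema)."""
    source: str
    query_filters: dict
    fetched_at: str  # ISO 8601 UTC timestamp

    @property
    def run_id(self) -> str:
        # Stable digest of source, filters and timestamp.
        payload = json.dumps(
            {"source": self.source, "filters": self.query_filters, "at": self.fetched_at},
            sort_keys=True,
        )
        return hashlib.sha256(payload.encode("utf-8")).hexdigest()[:16]


def stamp_rows(rows, run):
    """Return copies of rows tagged with the run that produced them."""
    return [{**row, "run_id": run.run_id} for row in rows]


def freeze_evidence(flagged, cohort, method_version):
    """Capture the comparison set at flag time as a snapshot.

    The snapshot copies the cohort ids and the median into new values,
    so later changes to the live cohort do not alter it.
    """
    amounts = tuple(r["amount"] for r in cohort)
    return {
        "award_id": flagged["award_id"],
        "cohort_award_ids": tuple(r["award_id"] for r in cohort),
        "cohort_median": median(amounts),
        "method_version": method_version,
        "frozen_at": datetime.now(timezone.utc).isoformat(),
    }
```

The point of the design is that a published claim cites the snapshot, not the live cohort: if new awards are ingested later and the cohort median moves, the frozen median and the list of cohort award ids stay exactly as they were when the flag was raised.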

Known issues with our data today

  • ~40% of awards have phase=unknown. For these awards, our description-keyword classifier finds no match in the description text. They are still scored against the broader cohort, but their signals may be noisier.
  • Detail enrichment is partial. The fields used for cost-growth and sole-source detection (cost_growth_ratio, extent_competed, parent IDIQ) are populated via a separate pass against /api/v2/awards/{id}. USAspending throttles this endpoint; the backfill runs nightly. Recently flagged awards may not yet have these fields populated.
  • Federal grants are not indexed. Most IIJA highway dollars flow as grants to state DOTs, not direct contracts. Our IIJA-tagged subset is heavily concentrated in Federal Lands and DOD work as a result. A grant-side ingester is planned for v2.
  • SAM.gov enrichment is deferred. Contractor exclusions and parent-company linkage from SAM.gov would meaningfully improve the small-business-kingpin and shell-game patterns. Pending API-key application.
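
To make the partial-enrichment issue concrete, here is a minimal sketch of a throttle-aware backfill pass. It assumes a caller-supplied fetch_detail function that wraps the /api/v2/awards/{id} call and raises a Throttled exception when rate-limited. The names backfill_details, Throttled and detail_status, the retry count, and the backoff delays are all hypothetical and chosen only for illustration.

```python
import time


class Throttled(Exception):
    """Raised by the fetcher when the API signals rate limiting (assumed)."""


def backfill_details(awards, fetch_detail, max_retries=3, base_delay=1.0, sleep=time.sleep):
    """Fill in detail fields for awards that do not have them yet.

    Awards still throttled after max_retries are marked pending and
    returned, so the next run can pick them up again.
    """
    pending = []
    for award in awards:
        if award.get("detail_status") == "enriched":
            continue  # already populated on an earlier run
        for attempt in range(max_retries):
            try:
                detail = fetch_detail(award["award_id"])
            except Throttled:
                sleep(base_delay * 2 ** attempt)  # exponential backoff
                continue
            award.update(detail)
            award["detail_status"] = "enriched"
            break
        else:
            # Every attempt was throttled: leave the fields unpopulated.
            award["detail_status"] = "pending"
            pending.append(award["award_id"])
    return pending
```

An award left in the pending state is exactly the case described above: it can be flagged on the fields that are present while its cost-growth and competition fields are still empty.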

See something that does not look right? Submit a dispute. 48-hour SLA.