How we collect this data

STT operates the same database for field staff and for this public site. The numbers you see have passed the same validation rules that apply when our researchers enter a record. Here's how that works.

1. Geocoded against the official Cambodian gazetteer (NCDD)

Every community, eviction incident, and environmental observation is anchored to the National Committee for Sub-National Democratic Development (NCDD) administrative hierarchy: Province → District → Commune → Village. Free-text location fields are not allowed — entries cascade through validated dropdowns powered by the official NCDD codes.

This guarantees that "Phnom Penh" in our data is exactly the same Phnom Penh that appears in government statistics, World Bank tables, and IDPoor records. Cross-walks with NSDP and IDPoor classifications are direct.
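As a rough illustration of the cascading check described above, the sketch below accepts a location only if each level is a child of the level selected before it. The type names, the function names, and the codes in the sample gazetteer are hypothetical placeholders (they are not real NCDD codes), and this is not STT's actual implementation.

```typescript
// Hypothetical sketch: names and codes are placeholders, not real NCDD data.
type Level = "province" | "district" | "commune" | "village";

interface GazetteerEntry {
  code: string;
  level: Level;
  parent: string | null; // code of the parent unit; null for a province
}

const LEVELS: Level[] = ["province", "district", "commune", "village"];

// The options a dropdown would offer once its parent has been chosen.
function childrenOf(entries: GazetteerEntry[], parent: string | null): GazetteerEntry[] {
  return entries.filter((e) => e.parent === parent);
}

// A location is valid only if it names all four levels, in order,
// and every code is a child of the code chosen one level up.
function isValidLocation(entries: GazetteerEntry[], path: string[]): boolean {
  if (path.length !== LEVELS.length) return false;
  let parent: string | null = null;
  for (let i = 0; i < path.length; i++) {
    const match = childrenOf(entries, parent).find(
      (e) => e.code === path[i] && e.level === LEVELS[i],
    );
    if (!match) return false;
    parent = match.code;
  }
  return true;
}

// Placeholder gazetteer with made-up codes.
const sampleGazetteer: GazetteerEntry[] = [
  { code: "P1", level: "province", parent: null },
  { code: "P1-D1", level: "district", parent: "P1" },
  { code: "P1-D1-C1", level: "commune", parent: "P1-D1" },
  { code: "P1-D1-C1-V1", level: "village", parent: "P1-D1-C1" },
  { code: "P2", level: "province", parent: null },
  { code: "P2-D1", level: "district", parent: "P2" },
];
```

Under these assumptions, a path that pairs province P2 with district P1-D1 is rejected, because P1-D1 is never offered as a child of P2.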

2. Range-checked at write time

Database CHECK constraints and Zod schemas reject impossible values before they ever reach a chart. Examples a reviewer can verify in the source code:

  • GPS points must fall within the Cambodia bounding box (lat 8–16, lng 102–108).
  • Population estimates must be 0–5,000,000.
  • Founding years must be between 1900 and the current year.
  • Categorical fields accept only fixed values: for example, income levels are Low / Medium / High, and eviction outcomes are Resolved / Ongoing / Pending / Resisted.
  • Phone numbers must contain 8–12 digits after stripping formatting.

These rules are enforced both in the browser (for immediate feedback) and on the server (where the client cannot alter them). Editing the page in browser devtools does not bypass them, because the server re-checks every write.
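The rules listed above can be sketched as a single validation function. This is a dependency-free stand-in written for illustration, not the Zod schemas or CHECK constraints in the codebase. The field names, the error labels, and the choice to treat every range as inclusive are assumptions.

```typescript
// Hypothetical record shape; field names are assumptions for illustration.
interface CommunityRecord {
  lat: number;
  lng: number;
  population: number;
  foundingYear: number;
  incomeLevel: string;
  phone: string;
}

const INCOME_LEVELS = ["Low", "Medium", "High"];

// Returns the names of the fields that break a rule; an empty array means valid.
// Ranges are assumed to be inclusive at both ends.
function validateRecord(
  r: CommunityRecord,
  currentYear: number = new Date().getFullYear(),
): string[] {
  const errors: string[] = [];
  // Written as !(a && b) so that NaN fails the check as well.
  if (!(r.lat >= 8 && r.lat <= 16)) errors.push("lat");
  if (!(r.lng >= 102 && r.lng <= 108)) errors.push("lng");
  if (!(r.population >= 0 && r.population <= 5000000)) errors.push("population");
  if (!(r.foundingYear >= 1900 && r.foundingYear <= currentYear)) errors.push("foundingYear");
  if (INCOME_LEVELS.indexOf(r.incomeLevel) === -1) errors.push("incomeLevel");
  // Strip formatting (spaces, dashes, plus signs) and count the digits left.
  const digits = r.phone.replace(/\D/g, "");
  if (digits.length < 8 || digits.length > 12) errors.push("phone");
  return errors;
}

// Made-up example record that satisfies every rule.
const exampleRecord: CommunityRecord = {
  lat: 11.55,
  lng: 104.92,
  population: 1200,
  foundingYear: 1995,
  incomeLevel: "Low",
  phone: "012 345 678",
};
```

Running the same function in the browser and on the server is one way to get the behaviour described above: the browser copy gives immediate feedback, and the server copy is the one that decides whether a write is accepted.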

3. Reporting periods + audit trail

The platform runs on a quarterly reporting cycle. Each period has a defined start/end window and a sign-off step before its data feeds the public dashboard. Every row in every table records created_at and updated_at timestamps and the user who made the change — so any downstream number can be traced back to a specific data-entry event.

When STT issues a year-end report, the figures match the public dashboard for the same period exactly. There is no second spreadsheet.

4. Bulk imports validated row-by-row, with duplicate detection

When partner organisations send us spreadsheets — or when our own field teams batch up monthly observations into a Google Sheet — every row is parsed, type-coerced, and re-validated against the same schema as manual entry. Duplicates are detected against:

  • The existing database (rows we've already imported).
  • Earlier rows in the same import batch.

Per-entity unique keys are documented in the codebase (e.g., publications are deduplicated on title + published date; environmental observations on site + date + type). No row enters the database without explicit human review of the preview.
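The two duplicate checks can be sketched as follows, using the site + date + type key given for environmental observations. The type and function names are hypothetical, and the normalisation step (trimming whitespace and ignoring letter case) is an assumption rather than a documented rule.

```typescript
// Hypothetical shape of an environmental observation row.
interface Observation {
  site: string;
  date: string; // e.g. "2024-01-05"
  type: string;
}

type RowStatus = "new" | "duplicate-in-database" | "duplicate-in-batch";

// Build the unique key from site + date + type.
// Trimming and lower-casing are assumptions made for this sketch.
function dedupeKey(o: Observation): string {
  return JSON.stringify([o.site, o.date, o.type].map((p) => p.trim().toLowerCase()));
}

// Label every row of an import batch for the human review preview.
function classifyBatch(existing: Observation[], batch: Observation[]): RowStatus[] {
  const inDatabase = new Set(existing.map(dedupeKey));
  const seenInBatch = new Set<string>();
  return batch.map((row): RowStatus => {
    const key = dedupeKey(row);
    if (inDatabase.has(key)) return "duplicate-in-database";
    if (seenInBatch.has(key)) return "duplicate-in-batch";
    seenInBatch.add(key);
    return "new";
  });
}

// Made-up rows for illustration only.
const alreadyImported: Observation[] = [
  { site: "Site A", date: "2024-01-05", type: "flooding" },
];
const incomingBatch: Observation[] = [
  { site: "site a ", date: "2024-01-05", type: "Flooding" }, // matches the database row
  { site: "Site B", date: "2024-01-06", type: "flooding" }, // first time seen
  { site: "Site B", date: "2024-01-06", type: "flooding" }, // repeats the row above
];
const previewStatuses = classifyBatch(alreadyImported, incomingBatch);
```

Nothing is written to the database in this sketch. It only produces the labels a reviewer would see in the preview, which matches the rule that a person approves every import.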

5. Privacy: aggregate-only public surface

The internal database stores some sensitive information by necessity: head-of-household names, contact phone numbers, and notes from field visits. None of that is exposed on this public site.

The endpoints powering /transparency return only counts, sums, and per-province roll-ups. There is no way to query an individual household, an individual eviction notice, or an individual GPS point through the public surface.
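A per-province roll-up of this kind might look like the sketch below. The row shape, the field names, and the function name are hypothetical, and the sample rows are invented. The point is that the output type has no field that could carry a name, phone number, or coordinate.

```typescript
// Hypothetical internal row: contains sensitive fields.
interface HouseholdRow {
  province: string;
  headOfHousehold: string;
  phone: string;
  members: number;
}

// Public shape: only a province label, a count, and a sum.
interface ProvinceRollup {
  province: string;
  households: number;
  people: number;
}

function rollupByProvince(rows: HouseholdRow[]): ProvinceRollup[] {
  const byProvince = new Map<string, ProvinceRollup>();
  for (const row of rows) {
    const entry =
      byProvince.get(row.province) ?? { province: row.province, households: 0, people: 0 };
    entry.households += 1;
    entry.people += row.members;
    byProvince.set(row.province, entry);
  }
  // Sort by province name so the output order is stable.
  return Array.from(byProvince.values()).sort((a, b) => a.province.localeCompare(b.province));
}

// Invented rows; the names and numbers are placeholders.
const internalRows: HouseholdRow[] = [
  { province: "Phnom Penh", headOfHousehold: "Example Person A", phone: "000000001", members: 4 },
  { province: "Phnom Penh", headOfHousehold: "Example Person B", phone: "000000002", members: 3 },
  { province: "Kandal", headOfHousehold: "Example Person C", phone: "000000003", members: 5 },
];
const publicRollup = rollupByProvince(internalRows);
```

Because the sensitive fields are never copied into `ProvinceRollup`, a response built from it cannot leak them, whatever the caller requests.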

Community-level data is shared only with prior consent from community representatives, in accordance with our long-standing field protocol.

6. Community consent, not extraction

STT's data collection follows community-led research principles. We document what communities tell us is happening to them; we do not survey "subjects". When data is published — whether in our annual reports, in joint statements, or on this dashboard — affected communities have already seen it and approved it.

We do not sell, license, or share row-level data with third parties.

7. Open data, freely reusable

All aggregate data on this site is published under the Creative Commons Attribution 4.0 International licence (CC BY 4.0). You can reuse it in your own reports, journalism, or research, provided you attribute Sahmakum Teang Tnaut (STT) and link back to the original source on this site.

Want to verify any of this?

The codebase is open to audit. Email contact@teangtnaut.org for read-only access to validation logs, dedupe reports, or our database schema. We regularly host data partners, donors, and academic peer reviewers for walk-throughs.