
Human Testing Protocol

When a delivery touches a UI surface, automated tests are not enough. This protocol turns "feature shipped" into "feature verified" through Riff-tracked human test tasks.

Projects

| Riff project | Tester | Surface | Constraints |
|---|---|---|---|
| po-tests-dev | akadmin | API + UI | Terminal allowed (curl, JWT minting, DB checks, browser devtools) |
| po-tests-client | Cristina | UI only | Browser only on *.qual.portugalodyssey.pt; no terminal, no JWTs, no env vars |

Engineering Riff stays as 8da0e2bf-ee46-499c-b864-10128714f1f4. Test tasks reference back to it but live in their own project to keep each tester's queue clean.

Trigger rule

A Riff in the engineering project cannot move to done until a paired test task exists for each applicable audience and every one of them has passed.

| Surface | po-tests-dev | po-tests-client |
|---|---|---|
| public-fo / partner-console / admin-console UI | required | required |
| API only (no UI consumer change) | required | skip |
| Strapi content schema | required | required (admin can verify shape via Strapi UI) |
| Infra / CI / DB migration / docs | skip | skip |

When in doubt → spawn both. An unneeded test task is cheap to skip; a missed test is expensive.
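The trigger rule can be read as a small lookup. The sketch below transcribes the table above; the `Surface` enum, its member names, and the function name are illustrative and not part of Riff. Passing `None` stands in for "in doubt".

```python
from enum import Enum


class Surface(Enum):
    """Delivery surfaces, one member per row of the trigger table (names are illustrative)."""
    UI = "public-fo / partner-console / admin-console UI"
    API_ONLY = "API only (no UI consumer change)"
    STRAPI_SCHEMA = "Strapi content schema"
    INFRA = "Infra / CI / DB migration / docs"


BOTH = ("po-tests-dev", "po-tests-client")

# Test projects that need a paired task, per surface.
REQUIRED_PROJECTS = {
    Surface.UI: BOTH,
    Surface.API_ONLY: ("po-tests-dev",),
    Surface.STRAPI_SCHEMA: BOTH,
    Surface.INFRA: (),
}


def required_test_projects(surface):
    """Return the projects that need a test task; None means 'in doubt', so spawn both."""
    if surface is None:
        return BOTH
    return REQUIRED_PROJECTS[surface]
```

For example, `required_test_projects(Surface.API_ONLY)` yields only the dev project, while an unclassified delivery falls back to both.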

Lifecycle

Both projects use the Kanban workflow (todo → doing → review → done).

engineering Riff: doing → review → done
                            ├─► test task (dev)    : todo → doing → review → done
                            │                          ↑                  │
                            │                          └─── on FAIL ──────┘
                            └─► test task (client) : todo → doing → review → done
                                                       ↑                  │
                                                       └─── on FAIL ──────┘

A failed test does not spawn a fresh task — the same task ping-pongs review → todo across rounds, with each round's feedback archived as a comment. One canonical task per "test for feature X"; comment thread holds history.

State semantics — test tasks only

| Status | Meaning to tester | Meaning to engineer |
|---|---|---|
| todo | "Go test now." | Build is on qual; description Result block is empty. |
| doing | "I'm testing." Optional; skip if the test runs in one sitting. | |
| review | "I tested; result is in the description." | Your turn to read. |
| done | Accepted; test passed. | |
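A minimal sketch of the test-task lifecycle as a transition table, assuming that `done` is terminal and that a tester may skip `doing`. The function names are illustrative; Riff itself is not called here.

```python
# Allowed status moves for a single test task, as read from the lifecycle above.
ALLOWED_MOVES = {
    "todo": {"doing", "review"},  # "doing" is optional, so todo -> review is allowed
    "doing": {"review"},
    "review": {"done", "todo"},   # PASS -> done, FAIL -> back to todo for another round
    "done": set(),                # assumption: done is terminal
}


def move(current, target):
    """Return the new status, or raise if the lifecycle does not allow the move."""
    if target not in ALLOWED_MOVES[current]:
        raise ValueError(f"illegal move: {current} -> {target}")
    return target


def run_rounds(verdicts):
    """Replay tester verdicts ("PASS" or "FAIL") on one task.

    Returns (final status, number of rounds used). The same task is reused
    across rounds, matching the one-canonical-task rule.
    """
    status, rounds = "todo", 0
    for verdict in verdicts:
        rounds += 1
        status = move(status, "review")  # tester records the result
        status = move(status, "done" if verdict == "PASS" else "todo")
        if status == "done":
            break
    return status, rounds
```

Two failed rounds followed by a pass leave the task at `done` after three rounds; a single FAIL leaves it back at `todo`.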

One-line rule for testers

todo = test now. review = I've tested. If a task moves from review back to todo, read the latest comment first — it tells you what was fixed and where to focus.

Engineer procedure when picking up a review task

  • Result = PASS → move task to done. No comment needed (description holds the verdict). Engineering Riff can move to done once all applicable test tasks pass.
  • Result = FAIL → follow the four steps below, in order:
    1. Paste-as-comment first. Add a new comment to the task, body prefixed with a round header:
           [Round 2 · 2026-05-07] FAIL — <one-line summary>

           <verbatim Result block from the description>

       Round number lives in the comment so the audit trail stays linear across multiple FAILs.
    2. Fix the bug, push to qual, verify the fix is live. Do not reset the description or move the task before this. Otherwise the tester may pull the task back in and retest the same broken code → false-fail or wasted round-trip.
    3. Add a retest-hint comment when flipping back to todo. One line on what was fixed and where to focus. Example:
           [Round 3 ready] Fixed cursor-jump after bullet toggle.
           Please re-run with focus on Step 4 (toggle list type while typing).

       Without this, the tester retests blind and may declare PASS without exercising the fix.
    4. Reset the Result section of the description (uncheck PASS/FAIL boxes, clear the FAIL description text) and move task to todo.
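The two comment formats in the FAIL procedure above are regular enough to generate. This is a sketch only: the helper names are hypothetical, and posting the comment to Riff is left out because no Riff API is described here.

```python
from datetime import date

# Written as escapes so the separators match the examples above exactly.
MIDDLE_DOT = "\u00b7"
EM_DASH = "\u2014"


def fail_comment(round_number, on, summary, result_block):
    """Build the archive comment: round header, blank line, verbatim Result block."""
    header = f"[Round {round_number} {MIDDLE_DOT} {on.isoformat()}] FAIL {EM_DASH} {summary}"
    return f"{header}\n\n{result_block.strip()}"


def retest_hint(next_round, fixed, focus):
    """Build the retest-hint comment posted when the task goes back to todo."""
    return f"[Round {next_round} ready] {fixed}\nPlease re-run with focus on {focus}."


if __name__ == "__main__":
    print(fail_comment(2, date(2026, 5, 7), "<one-line summary>", "- [x] FAIL"))
    print(retest_hint(3, "Fixed cursor-jump after bullet toggle.",
                      "Step 4 (toggle list type while typing)"))
```

Keeping the round number in the generated header, rather than in the description, is what keeps the comment thread usable as the audit trail.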

The engineering Riff stays at doing while any paired test task is in a non-done state.
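The gating condition in the sentence above can be sketched as a single check, assuming the caller already knows which test projects apply to the delivery and the current status of each paired task. The function name and the dictionary shape are illustrative.

```python
def may_move_to_done(required_projects, task_status_by_project):
    """True only if every required project has a paired test task in 'done'.

    A project with no task yet is absent from the mapping, so .get() returns
    None and the check fails. With no required projects (for example an
    infra-only delivery) nothing blocks the engineering Riff.
    """
    return all(task_status_by_project.get(project) == "done"
               for project in required_projects)
```

For example, `may_move_to_done(("po-tests-dev", "po-tests-client"), {"po-tests-dev": "done", "po-tests-client": "review"})` is `False` until the client task also reaches done.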

Task body template

Use this verbatim. The subject starts with the surface in square brackets, followed by the feature name and an audience suffix.

Subject: [<surface>] <feature> — <audience suffix>

Examples:

  • [partner-console] logout clears stale partner state — dev smoke
  • [admin-console] partner content review page — client UAT

Body:

## Source
Engineering Riff: #NN ("<subject>")
Surface: <partner-console | admin-console | public-fo | API>

## Setup
- (dev) <preconditions: JWT, DB state, env, devtools panel>
- (client) <plain-language preconditions: which URL, which login>

## Steps
1. <action> → <expected observation>
2. <action> → <expected observation>
3. ...

## Pass criteria
- All steps observe expected outcome
- No errors in network tab / no flash of stale data / no 500s
- <feature-specific assertions>

## Result (filled by tester)
- [ ] PASS
- [ ] FAIL — describe + paste screenshot

Steps are atomic (one action, one observation). Pass criteria are written by the engineer before the test runs — no negotiation about "did it work."
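Because the template is meant to be used verbatim, it can be rendered from structured fields. The sketch below assumes steps arrive as (action, expected observation) pairs; all function and parameter names are hypothetical, and the `#NN` reference is the template's own placeholder.

```python
EM_DASH = "\u2014"  # separator used in the subject line and the FAIL checkbox


def render_subject(surface, feature, audience_suffix):
    """Subject line: [<surface>] <feature>, then the audience suffix."""
    return f"[{surface}] {feature} {EM_DASH} {audience_suffix}"


def render_steps(steps):
    """Number the steps; each one is a single action and a single expected observation."""
    return "\n".join(f"{i}. {action} \u2192 {expected}"
                     for i, (action, expected) in enumerate(steps, start=1))


def render_body(riff_ref, riff_subject, surface, setup, steps, pass_criteria):
    """Fill the body template; the Result section is always left unchecked."""
    lines = [
        "## Source",
        f'Engineering Riff: {riff_ref} ("{riff_subject}")',
        f"Surface: {surface}",
        "",
        "## Setup",
        *[f"- {item}" for item in setup],
        "",
        "## Steps",
        render_steps(steps),
        "",
        "## Pass criteria",
        *[f"- {item}" for item in pass_criteria],
        "",
        "## Result (filled by tester)",
        "- [ ] PASS",
        f"- [ ] FAIL {EM_DASH} describe + paste screenshot",
    ]
    return "\n".join(lines)
```

Taking steps as pairs makes it hard to write a step with two actions or no expected observation, which is the point of the atomic-steps rule.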

Drafting guidance

Dev-grade tasks:

  • Setup may include temp/<feature>-jwt.sh, env vars, DB seed snippets.
  • Steps may reference network requests by URL, browser devtools panels, or terminal commands.
  • Pass criteria can include status codes, log assertions, response shape checks.

Client-grade tasks:

  • No technical jargon. Use UI labels, not React component names.
  • Setup is one sentence: which URL, which login. Provide creds out-of-band (not in the task body).
  • Steps are click/type/look; no copy-paste of payloads.
  • Pass criteria are user-visible outcomes only (text on screen, navigation, no spinners stuck).
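A draft client-grade task can be screened for jargon before it is filed. This is a rough sketch: the term list is an assumption assembled from words used in this protocol, not an agreed list, and a plain word match will raise false positives (for example the verb "react").

```python
import re

# Hypothetical starter list; extend or trim it to match the team's vocabulary.
CLIENT_JARGON = ("jwt", "curl", "terminal", "env var", "devtools",
                 "payload", "react", "redux")


def jargon_in_client_task(body):
    """Return the listed terms found in the task body, in list order."""
    lowered = body.lower()
    return [term for term in CLIENT_JARGON
            if re.search(rf"\b{re.escape(term)}\b", lowered)]
```

An empty result does not prove the task is readable; it only catches the most obvious leaks from a dev-grade draft.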

Anti-patterns

  • ❌ Adding a test:smoke label and skipping the lifecycle — without the trigger rule, tests get forgotten.
  • ❌ Reusing one test task across multiple deliveries — failure trail gets muddled.
  • ❌ Filing dev test tasks in po-tests-client because "Cristina can run them too" — separate audiences, separate queues.
  • ❌ Closing a blocked test task on the engineer's say-so. Only a PASS recorded by the tester, after a re-run against the fix, closes a task.

Bootstrap

Until volume justifies otherwise:

  • One test project per audience (dev / client). No further sub-projects.
  • No custom labels; use the Kanban workflow's todo / doing / review / done.
  • Backfill 5 tasks for already-shipped UI work (see temp/test-protocol-seed-tasks.md).

Glossary

Terms that appear in this doc OR are likely to show up in test task bodies. Aimed at testers who are not engineers — skim or search rather than read top-to-bottom.

Testing terms

| Term | Means |
|---|---|
| UAT | User Acceptance Testing. The end user (e.g. Cristina) checks that the feature works for their needs, as distinct from engineers checking that the code runs. |
| Smoke test | Quick check that the basic flow doesn't crash. "Does it turn on without smoke coming out?" Not a full feature audit. |
| Dev smoke | A smoke test run by a developer with terminal access. Faster, more technical. |
| Pass criteria | The yes/no checks the engineer writes BEFORE the test runs. No negotiation later about "did it work." |
| PASS / FAIL | The two outcomes a tester records in the Result section of a task. |
| Regression | Something that used to work but broke after a later change, including a bug that reappears after it was fixed. Catching these is one of the main reasons we test before shipping. |
| Edge case | An unusual input or state that breaks an assumption (empty list, very long text, slow network). Pass criteria should include the obvious ones. |

Workflow / project terms

| Term | Means |
|---|---|
| Riff | The task tracker (this tool: tasks-prod MCP). Holds all engineering tasks AND test tasks. |
| Engineering Riff | A task in the main po-platform project (where features get tracked). Spawns child test tasks under this protocol. |
| Kanban | The workflow name used by both test projects. Statuses: todo → doing → review → done. |
| Trigger rule | The condition that says "this delivery needs human tests"; see §Trigger rule above. |
| Backfill | Creating test tasks for features that shipped before the protocol existed. One-time exercise. |

Environments

| Term | Means |
|---|---|
| dev | Local development; runs on José's laptop. Self-signed HTTPS. |
| qual | Qualification (a.k.a. staging). URL pattern *.qual.portugalodyssey.pt. Where most testing happens. |
| prod | Production. URL pattern portugalodyssey.pt, partner.*, admin.*. Reserved for after launch. |

App surfaces

| Term | Means |
|---|---|
| public-fo | The traveler-facing website (a.k.a. front-office). URL: portugalodyssey.pt. |
| partner-console | The portal where partners manage their services. URL: partner.*. |
| admin-console | The internal staff portal. URL: admin.*. |
| API | The backend HTTP interface. Tests against it use curl or browser devtools. |
| CMS | Content Management System. We use Strapi for partner UGC and editorial content. |

Things you'll see in a browser while testing

| Term | Means |
|---|---|
| Devtools | Browser developer tools (F12 / Ctrl+Shift+I). Has tabs for Network, Console, Application, Elements. |
| Network tab | Shows every HTTP request the page makes. Useful to see status codes and request URLs. |
| Console | Devtools tab where JavaScript errors appear. A red line in the console = something broke. |
| HTTP 200 | "OK." The request succeeded. |
| HTTP 403 | "Forbidden." Server understood but refuses (usually a permissions issue). |
| HTTP 404 | "Not found." URL or resource doesn't exist. |
| HTTP 500 | "Internal server error." Backend crashed. Always a bug. |
| Spinner | The loading indicator that spins while waiting. "Stuck spinner" = >5s with no result = bug. |
| Toast | A small temporary message that pops up (e.g. "Saved", "Error"). Confirms an action. |
| Empty state | The screen shown when a list has no items ("No content yet"). Should never look like a crash. |
| Hard refresh | Reload bypassing cache. Ctrl+Shift+R (Windows/Linux) or Cmd+Shift+R (Mac). |
| Incognito / private window | Browser window with no cookies/cache. Useful to test "fresh user" without logging out. |

Auth / identity

| Term | Means |
|---|---|
| Keycloak | Our identity provider: the system that holds usernames/passwords. |
| Login / Logout | The standard flows. After logout, no protected page should be reachable without re-login. |
| Session | The logged-in state stored in the browser. Survives a page refresh, ends on logout or expiry. |
| Token / Bearer token | The string the browser sends with each request to prove "I'm logged in." |
| JWT | JSON Web Token, a common token format. A long string made of three base64url-encoded parts separated by dots. |
| OIDC / SSO | OpenID Connect / Single Sign-On. OIDC is the login protocol; SSO is the "log in once, access everywhere" behavior it enables. |
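For dev-grade tasks only (client testers never handle tokens), the three-part JWT shape described above can be inspected with the standard library. This is a sketch, not how the platform validates tokens: it reads the middle part and does not verify the signature.

```python
import base64
import json


def decode_jwt_payload(token):
    """Return the claims in a JWT's middle part, without verifying the signature."""
    _header_b64, payload_b64, _signature_b64 = token.split(".")
    # JWT parts drop base64 padding; restore it before decoding.
    padded = payload_b64 + "=" * (-len(payload_b64) % 4)
    return json.loads(base64.urlsafe_b64decode(padded))
```

A token with more or fewer than three dot-separated parts raises `ValueError` at the unpacking step, which is a quick way to spot a truncated paste.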

Frontend tech you might hear named

| Term | Means |
|---|---|
| React | The UI framework powering all 3 frontends. Pages are made of "components". |
| Redux | A library that holds shared state (like "who is the logged-in user") in one place. |
| Redux DevTools | A browser extension that shows the current Redux state; useful for dev tests where the state itself is the bug. |
| Component | A reusable UI piece (button, page, form). When test tasks say "the X component", it means that visual building block. |
| TipTap | The rich-text editor library used in partner-console for content authoring. |
| i18n | Internationalization: the mechanism that switches the UI between Portuguese, English, etc. |
| Strapi | Our CMS (Content Management System). Where editorial content + partner UGC lives. |
| UGC | User-Generated Content. Content authored by partners (not by platform staff). |

Business / product terms (likely in future tasks)

| Term | Means |
|---|---|
| B2B / B2C | Business-to-Business / Business-to-Consumer. po-platform is a hybrid: B2B for partners, B2C for travelers. |
| MoR | Merchant of Record. Whoever the customer's payment receipt is from; for po-platform, the platform itself (not individual partners). |
| KYC | Know Your Customer. Identity-verification step required before a partner can receive payouts. |
| DSAR | Data Subject Access Request. A user's GDPR right to see/export/delete their data. 30-day SLA. |
| GDPR | EU data protection regulation. Affects how we store + delete personal data. |
| IDD | Insurance Distribution Directive. EU regulation that may apply if we bundle insurance into experiences. |
| DAG | Directed Acyclic Graph. The data shape behind an Experience (services chained with dependencies, no loops). |

Terminology house style

  • "Above the fold" = visible on screen without scrolling.
  • "Happy path" = the successful flow with valid inputs. The opposite is the error path.
  • "Golden path" = same as happy path; preferred test target before edge cases.
  • "Stub" / "Stubbed" = a placeholder value that's correct in shape but not real (e.g. "0 contracts" while contract-service is being wired). Pass criteria should specify if a stub is acceptable.