Human Testing Protocol¶
When a delivery touches a UI surface, automated tests are not enough. This protocol turns "feature shipped" into "feature verified" through Riff-tracked human test tasks.
Projects¶
| Riff project | Tester | Surface | Constraints |
|---|---|---|---|
| po-tests-dev | akadmin | API + UI | Terminal allowed (curl, JWT minting, DB checks, browser devtools) |
| po-tests-client | Cristina | UI only | Browser only on *.qual.portugalodyssey.pt; no terminal, no JWTs, no env vars |
Engineering Riff stays as 8da0e2bf-ee46-499c-b864-10128714f1f4. Test tasks reference back to it but live in their own project to keep each tester's queue clean.
Trigger rule¶
A Riff in the engineering project cannot move to done until paired test tasks exist (one per applicable audience) and every applicable test task has passed.
| Surface | po-tests-dev | po-tests-client |
|---|---|---|
| public-fo / partner-console / admin-console UI | required | required |
| API only (no UI consumer change) | required | skip |
| Strapi content schema | required | required (admin can verify shape via Strapi UI) |
| Infra / CI / DB migration / docs | skip | skip |
When in doubt, spawn both. An unneeded test task is cheap to skip later; a missed test is expensive.
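The trigger table above can be read as a small lookup. The following is a minimal sketch, assuming a hypothetical `Surface` enum and function name (neither exists in Riff); it only mirrors the table and the "when in doubt" rule.

```python
from enum import Enum


class Surface(Enum):
    # Hypothetical names; the values are the row labels from the trigger table.
    UI = "public-fo / partner-console / admin-console UI"
    API_ONLY = "API only (no UI consumer change)"
    STRAPI_SCHEMA = "Strapi content schema"
    INFRA = "Infra / CI / DB migration / docs"


BOTH = ("po-tests-dev", "po-tests-client")

# One entry per table row: which test projects need a paired task.
REQUIRED_PROJECTS = {
    Surface.UI: BOTH,
    Surface.API_ONLY: ("po-tests-dev",),
    Surface.STRAPI_SCHEMA: BOTH,
    Surface.INFRA: (),
}


def required_test_projects(surface=None):
    """Return the test projects that need a paired task.

    Passing None models "in doubt", which spawns both.
    """
    if surface is None:
        return BOTH
    return REQUIRED_PROJECTS[surface]
```

For example, `required_test_projects(Surface.API_ONLY)` yields only the dev project, while calling it with no argument yields both.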
Lifecycle¶
Both projects use the Kanban workflow (todo → doing → review → done).
engineering Riff: doing → review → done
    │
    ├─► test task (dev)    : todo → doing → review → done
    │                         ↑               │
    │                         └─── on FAIL ───┘
    └─► test task (client) : todo → doing → review → done
                              ↑               │
                              └─── on FAIL ───┘
A failed test does not spawn a fresh task — the same task ping-pongs review → todo across rounds, with each round's feedback archived as a comment. One canonical task per "test for feature X"; comment thread holds history.
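The lifecycle can be pictured as a small state machine. This is a sketch only, assuming a hypothetical `HumanTestTask` class; Riff itself may enforce transitions differently or not at all.

```python
# Allowed status transitions for a single test task (sketch, not Riff's API).
ALLOWED_TRANSITIONS = {
    ("todo", "doing"),
    ("todo", "review"),   # "doing" is optional when the test runs in one sitting
    ("doing", "review"),
    ("review", "done"),   # PASS
    ("review", "todo"),   # FAIL: the same task goes back for another round
}


class HumanTestTask:
    """One canonical task per "test for feature X"; comments hold the history."""

    def __init__(self, subject):
        self.subject = subject
        self.status = "todo"
        self.comments = []

    def move(self, new_status):
        if (self.status, new_status) not in ALLOWED_TRANSITIONS:
            raise ValueError(f"illegal transition: {self.status} -> {new_status}")
        self.status = new_status
```

Note that `done` has no outgoing transition, and a FAIL never creates a second task object; it only moves the existing one from `review` back to `todo`.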
State semantics — test tasks only¶
| Status | Meaning to tester | Meaning to engineer |
|---|---|---|
| todo | "Go test now." Build is on qual; description Result block is empty. | n/a |
| doing | "I'm testing." Optional; skip if the test runs in one sitting. | n/a |
| review | "I tested; result is in the description." | Your turn to read. |
| done | n/a | Accepted; test passed. |
One-line rule for testers¶
todo = test now. review = I've tested. If a task moves from review back to todo, read the latest comment first: it tells you what was fixed and where to focus.
Engineer procedure when picking up a review task¶
- Result = PASS → move task to done. No comment needed (description holds the verdict). Engineering Riff can move to done once all applicable test tasks pass.
- Result = FAIL → follow the four steps below, in order:
  1. Paste-as-comment first. Copy the tester's FAIL report from the description into a new comment on the task, with the body prefixed by a round header. The round number lives in the comment so the audit trail stays linear across multiple FAILs.
  2. Fix the bug, push to qual, verify the fix is live. Do not reset the description or move the task before this. Otherwise the tester may pull the task back in and retest the same broken code, producing a false fail or a wasted round-trip.
  3. Add a retest-hint comment when flipping back to todo. One line on what was fixed and where to focus. Without this, the tester retests blind and may declare PASS without exercising the fix.
  4. Reset the Result section of the description (uncheck PASS/FAIL boxes, clear the FAIL description text) and move task to todo.
The engineering Riff stays at doing while any paired test task is in a non-done state.
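The FAIL procedure and the closing rule can be sketched as three small functions. Everything here is an assumption for illustration: the `ReviewTask` fields, the function names, and in particular the `[Round N] FAIL` header and `Retest hint:` prefix, which are hypothetical formats rather than ones this protocol prescribes.

```python
from dataclasses import dataclass, field
from typing import Optional


@dataclass
class ReviewTask:
    result: Optional[str]          # "PASS" or "FAIL", as recorded by the tester
    fail_report: str = ""
    status: str = "review"
    comments: list = field(default_factory=list)


def archive_fail(task, round_number):
    """Step 1: copy the FAIL report into a comment before touching anything else."""
    # Hypothetical header format; the protocol only requires a round header.
    task.comments.append(f"[Round {round_number}] FAIL\n{task.fail_report}")


def return_for_retest(task, fix_is_live, retest_hint):
    """Steps 2-4: allowed only once the fix is verified live on qual."""
    if not fix_is_live:
        raise RuntimeError("fix not verified on qual; leave the task in review")
    task.comments.append(f"Retest hint: {retest_hint}")  # step 3
    task.result = None                                    # step 4: reset Result
    task.fail_report = ""
    task.status = "todo"


def engineering_riff_can_close(paired_statuses):
    """True when every applicable paired test task is done.

    An empty list (no audience applies, e.g. infra work) does not block.
    """
    return all(status == "done" for status in paired_statuses)
```

Splitting the archive step from the reset step keeps the ordering explicit: the report is preserved as a comment first, and the description is only cleared after the fix is confirmed.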
Task body template¶
Use this verbatim. The subject starts with the surface in square brackets, followed by the feature name and the audience suffix.
Subject: [<surface>] <feature> — <audience suffix>
Examples:
- [partner-console] logout clears stale partner state — dev smoke
- [admin-console] partner content review page — client UAT
Body:
## Source
Engineering Riff: #NN ("<subject>")
Surface: <partner-console | admin-console | public-fo | API>
## Setup
- (dev) <preconditions: JWT, DB state, env, devtools panel>
- (client) <plain-language preconditions: which URL, which login>
## Steps
1. <action> → <expected observation>
2. <action> → <expected observation>
3. ...
## Pass criteria
- All steps observe expected outcome
- No errors in network tab / no flash of stale data / no 500s
- <feature-specific assertions>
## Result (filled by tester)
- [ ] PASS
- [ ] FAIL — describe + paste screenshot
Steps are atomic (one action, one observation). Pass criteria are written by the engineer before the test runs — no negotiation about "did it work."
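To keep task bodies uniform, the template can be filled programmatically. The sketch below assumes hypothetical helper names and argument shapes; it only reproduces the subject format and the section layout shown above.

```python
def render_subject(surface, feature, audience_suffix):
    # \u2014 is the dash separator used in the subject format.
    return f"[{surface}] {feature} \u2014 {audience_suffix}"


def render_body(riff_ref, riff_subject, surface, setup, steps, assertions):
    """Build a task body; `steps` is a list of (action, expected observation)."""
    lines = [
        "## Source",
        f'Engineering Riff: {riff_ref} ("{riff_subject}")',
        f"Surface: {surface}",
        "## Setup",
    ]
    lines += [f"- {item}" for item in setup]
    lines.append("## Steps")
    # One action and one observation per step keeps steps atomic.
    lines += [
        f"{number}. {action} → {expected}"
        for number, (action, expected) in enumerate(steps, start=1)
    ]
    lines += [
        "## Pass criteria",
        "- All steps observe expected outcome",
        "- No errors in network tab / no flash of stale data / no 500s",
    ]
    lines += [f"- {assertion}" for assertion in assertions]
    lines += [
        "## Result (filled by tester)",
        "- [ ] PASS",
        "- [ ] FAIL \u2014 describe + paste screenshot",
    ]
    return "\n".join(lines)
```

Taking steps as pairs makes it hard to write a step without an expected observation, which is the property the template relies on.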
Drafting guidance¶
Dev-grade tasks:
- Setup may include temp/<feature>-jwt.sh, env vars, DB seed snippets.
- Steps may reference network requests by URL, browser devtools panels, or terminal commands.
- Pass criteria can include status codes, log assertions, response shape checks.
Client-grade tasks:
- No technical jargon. Use UI labels, not React component names.
- Setup is one sentence: which URL, which login. Provide creds out-of-band (not in the task body).
- Steps are click/type/look; no copy-paste of payloads.
- Pass criteria are user-visible outcomes only (text on screen, navigation, no spinners stuck).
Anti-patterns¶
- ❌ Adding a test:smoke label and skipping the lifecycle: without the trigger rule, tests get forgotten.
- ❌ Reusing one test task across multiple deliveries: the failure trail gets muddled.
- ❌ Filing dev test tasks in po-tests-client because "Cristina can run them too": separate audiences, separate queues.
- ❌ Closing a blocked test task on the engineer's say-so: only the tester closes (with re-run after fix).
Bootstrap¶
Until volume justifies otherwise:
- One test project per audience (dev / client). No further sub-projects.
- No custom labels — use the Kanban workflow's todo / doing / review / done.
- Backfill 5 tasks for already-shipped UI work (see temp/test-protocol-seed-tasks.md).
Glossary¶
Terms that appear in this doc OR are likely to show up in test task bodies. Aimed at testers who are not engineers — skim or search rather than read top-to-bottom.
Testing terms¶
| Term | Means |
|---|---|
| UAT | User Acceptance Testing. The end-user (e.g. Cristina) checks the feature works for their needs — distinct from engineers checking that code runs. |
| Smoke test | Quick check that the basic flow doesn't crash. "Does it turn on without smoke coming out?" Not a full feature audit. |
| Dev smoke | A smoke test run by a developer with terminal access. Faster, more technical. |
| Pass criteria | The yes/no checks the engineer writes BEFORE the test runs. No negotiation later about "did it work." |
| PASS / FAIL | The two outcomes a tester records in the Result section of a task. |
| Regression | Something that used to work and is broken again, including a bug that re-appears after it was previously fixed. Catching these is one of the main reasons we test before shipping. |
| Edge case | An unusual input or state that breaks an assumption (empty list, very long text, slow network). Pass criteria should include the obvious ones. |
Workflow / project terms¶
| Term | Means |
|---|---|
| Riff | The task tracker (this tool — tasks-prod MCP). Holds all engineering tasks AND test tasks. |
| Engineering Riff | A task in the main po-platform project (where features get tracked). Spawns child test tasks under this protocol. |
| Kanban | The workflow name used by both test projects. Statuses: todo → doing → review → done. |
| Trigger rule | The condition that says "this delivery needs human tests" — see §Trigger rule above. |
| Backfill | Creating test tasks for features that shipped before the protocol existed. One-time exercise. |
Environments¶
| Term | Means |
|---|---|
| dev | Local development — runs on José's laptop. Self-signed HTTPS. |
| qual | Qualification (a.k.a. staging). URL pattern *.qual.portugalodyssey.pt. Where most testing happens. |
| prod | Production. URL pattern portugalodyssey.pt, partner.*, admin.*. Reserved for after launch. |
App surfaces¶
| Term | Means |
|---|---|
| public-fo | The traveler-facing website (a.k.a. front-office). URL: portugalodyssey.pt. |
| partner-console | The portal where partners manage their services. URL: partner.*. |
| admin-console | The internal staff portal. URL: admin.*. |
| API | The backend HTTP interface. Tests against it use curl or browser devtools. |
| CMS | Content Management System. We use Strapi for partner UGC and editorial content. |
Things you'll see in a browser while testing¶
| Term | Means |
|---|---|
| Devtools | Browser developer tools (F12 / Ctrl+Shift+I). Has tabs for Network, Console, Application, Elements. |
| Network tab | Shows every HTTP request the page makes. Useful to see status codes and request URLs. |
| Console | Browser tab where JavaScript errors appear. A red line in console = something broke. |
| HTTP 200 | "OK." The request succeeded. |
| HTTP 403 | "Forbidden." Server understood but refuses (usually a permissions issue). |
| HTTP 404 | "Not found." URL or resource doesn't exist. |
| HTTP 500 | "Internal server error." Backend crashed. Always a bug. |
| Spinner | The loading indicator that spins while waiting. "Stuck spinner" = >5s with no result = bug. |
| Toast | A small temporary message that pops up (e.g. "Saved", "Error"). Confirms an action. |
| Empty state | The screen shown when a list has no items ("No content yet"). Should never look like a crash. |
| Hard refresh | Reload bypassing cache. Ctrl+Shift+R (Windows/Linux) or Cmd+Shift+R (Mac). |
| Incognito / private window | Browser window with no cookies/cache. Useful to test "fresh user" without logging out. |
Auth / identity¶
| Term | Means |
|---|---|
| Keycloak | Our identity provider — the system that holds usernames/passwords. |
| Login / Logout | The standard flows. After logout, no protected page should be reachable without re-login. |
| Session | The logged-in state stored in the browser. Survives a page refresh, ends on logout or expiry. |
| Token / Bearer token | The string the browser sends with each request to prove "I'm logged in." |
| JWT | JSON Web Token, the most common kind of token. A long string made of three base64url-encoded parts separated by dots. |
| OIDC / SSO | OpenID Connect / Single Sign-On. The protocol behind "log in once, access everywhere." |
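For dev-grade tasks it can help to look inside a token. The following is a minimal sketch using only the Python standard library; the function name is hypothetical, and it decodes the middle part of a JWT for inspection without verifying the signature, so it must not be used to decide whether a token is trustworthy.

```python
import base64
import json


def decode_jwt_payload(token):
    """Decode (NOT verify) the payload, i.e. the middle part of a JWT."""
    parts = token.split(".")
    if len(parts) != 3:
        raise ValueError("expected three dot-separated parts")
    payload_b64 = parts[1]
    # JWT parts drop base64 padding; restore it before decoding.
    payload_b64 += "=" * (-len(payload_b64) % 4)
    return json.loads(base64.urlsafe_b64decode(payload_b64))
```

The returned dictionary holds the token's claims, which is usually enough to check which user or role a request was made with.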
Frontend tech you might hear named¶
| Term | Means |
|---|---|
| React | The UI framework powering all 3 frontends. Pages are made of "components". |
| Redux | A library that holds shared state (like "who is the logged-in user") in one place. |
| Redux DevTools | A browser extension that shows the current Redux state — useful for dev tests where the state itself is the bug. |
| Component | A reusable UI piece (button, page, form). When test tasks say "the X component", it means that visual building block. |
| TipTap | The rich-text editor library used in partner-console for content authoring. |
| i18n | Internationalization — the mechanism that switches the UI between Portuguese, English, etc. |
| Strapi | Our CMS (Content Management System). Where editorial content + partner UGC lives. |
| UGC | User-Generated Content. Content authored by partners (not by platform staff). |
Business / product terms (likely in future tasks)¶
| Term | Means |
|---|---|
| B2B / B2C | Business-to-Business / Business-to-Consumer. po-platform is a hybrid: B2B for partners, B2C for travelers. |
| MoR | Merchant of Record. Whoever the customer's payment receipt is from — for po-platform, the platform itself (not individual partners). |
| KYC | Know Your Customer. Identity-verification step required before a partner can receive payouts. |
| DSAR | Data Subject Access Request. A user's GDPR right to see/export/delete their data. 30-day SLA. |
| GDPR | EU data protection regulation. Affects how we store + delete personal data. |
| IDD | Insurance Distribution Directive. EU regulation that may apply if we bundle insurance into experiences. |
| DAG | Directed Acyclic Graph. The data shape behind an Experience (services chained with dependencies, no loops). |
Terminology house style¶
- "Above the fold" = visible on screen without scrolling.
- "Happy path" = the successful flow with valid inputs. The opposite is the error path.
- "Golden path" = same as happy path; preferred test target before edge cases.
- "Stub" / "Stubbed" = a placeholder value that's correct in shape but not real (e.g. "0 contracts" while contract-service is being wired). Pass criteria should specify if a stub is acceptable.