Intent Feed

Capture → ID Match → Enrichment → Segmentation

Specifications

Release Date: September 17, 2025
Total rows (daily): 2B+
Version: 1.2.2
File Size: ~60 GB (Parquet)
Update Freq: Daily

Purpose

The Intent Feed ingests raw intent signals (cookies, pixels, SDK events, social interactions, commerce activity, and partner feeds), normalizes them and matches them to an internal subject ID, enriches them with model outputs (interests, risk/fraud, value), and emits persona codes and playbook codes for immediate segmentation and activation.

Architecture Overview

Capture: pixels, SDKs, server‑to‑server, partner uploads.
Normalize: schema harmonization, bot/fraud filtering, consent enforcement.
Identity: deterministic/probabilistic stitching → subject_id (internal ID).
Enrich: embeddings, category affinities, price sensitivity, churn/propensity, compliance checks.
Classify: compute persona probabilities → persona codes; map to playbook codes.
Serve: stream to feature store, segments, and decision API; batch sink to warehouse.
Feedback: outcomes loop into model refresh and playbook tuning.

Latency targets (streaming path): p50 < 50 ms and p95 < 120 ms from capture until the enriched record is available in the feature store (deployment‑dependent).
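A minimal sketch of checking measured capture‑to‑feature‑store latencies against these targets, assuming per‑record latencies are already collected in milliseconds. It uses the nearest‑rank percentile definition; other definitions (e.g. linear interpolation) give slightly different values on small samples. The function names are illustrative, not part of the feed's tooling.

```python
import math

# Targets from the text; where latency is measured is deployment-dependent.
P50_TARGET_MS = 50.0
P95_TARGET_MS = 120.0


def percentile(samples: list[float], q: float) -> float:
    """Nearest-rank percentile for q in (0, 100]."""
    if not samples:
        raise ValueError("no latency samples")
    ordered = sorted(samples)
    rank = math.ceil(q * len(ordered) / 100)  # 1-based rank
    return ordered[max(rank, 1) - 1]


def check_latency(latencies_ms: list[float]) -> dict:
    """Compute p50/p95 and compare both against the stated targets."""
    p50 = percentile(latencies_ms, 50)
    p95 = percentile(latencies_ms, 95)
    return {
        "p50_ms": p50,
        "p95_ms": p95,
        "meets_targets": p50 < P50_TARGET_MS and p95 < P95_TARGET_MS,
    }


samples = [20, 30, 35, 40, 45, 48, 60, 80, 100, 110]
print(check_latency(samples))
```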

Data Sources

Web: first‑party cookies, pageview/search/cart pixels, consent banner states, referrer UTM.
Mobile: app SDK events (screen views, taps, purchases); device class/network only (no raw PII in feed).
Commerce: checkout starts/completes, SKU lines, refunds/returns, subscription changes.
Social & Ads: engagement callbacks (view/click), campaign/line‑item IDs; partner event streams (server‑to‑server).
Partner/Offline: POS batches, loyalty systems, call‑center outcomes, cooperative membership.

Sensitive attributes and special‑category data are not ingested. All sources must carry consent/legitimate‑interest flags; region‑aware enforcement occurs at ingestion.

Identity & Consent

Identifiers Accepted

cookie_id (first‑party), session_id
hashed_email (SHA‑256 of the trimmed, lower‑cased address), when consented
app_instance_id (random app GUID)
account_id / customer_id (first‑party)
partner_subject_key (for clean‑room mapped IDs)
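A sketch of producing hashed_email as described: trim, lower‑case, then SHA‑256. The text does not state the digest encoding, so hex output over UTF‑8 bytes is an assumption here, as is the boolean `consented` parameter standing in for the real consent check.

```python
import hashlib
from typing import Optional


def hash_email(raw_email: str, consented: bool) -> Optional[str]:
    """Trim and lower-case the address, then return its SHA-256 hex digest.

    Returns None without consent, so the identifier is never emitted.
    Hex encoding over UTF-8 bytes is an assumption, not a documented format.
    """
    if not consented:
        return None
    normalized = raw_email.strip().lower()
    return hashlib.sha256(normalized.encode("utf-8")).hexdigest()


print(hash_email("  Jane.Doe@Example.COM ", consented=True))
```

Normalizing before hashing matters because SHA‑256 is not tolerant of case or whitespace differences: without it, the same person would produce different hashes and fail to stitch.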

Stitching

Deterministic precedence: account_id > hashed_email > app_instance_id > cookie_id.
Probabilistic backfill: co‑occurrence plus device/geo/time windows, gated by confidence thresholds.
Output: subject_id (opaque, internal), linkage graph with decay (edge TTL).
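One plausible reading of the deterministic precedence rule, sketched below: walk the identifiers from highest to lowest precedence and return the subject_id linked to the first one already known. The `identity_index` lookup is hypothetical (a stand‑in for the linkage graph), and relinking or merging conflicting subjects is not shown.

```python
from typing import Optional

# Deterministic precedence from the text, highest first.
PRECEDENCE = ("account_id", "hashed_email", "app_instance_id", "cookie_id")


def resolve_subject(identifiers: dict[str, str],
                    identity_index: dict[tuple[str, str], str]) -> Optional[str]:
    """Return the subject_id linked to the highest-precedence known identifier.

    identity_index is a hypothetical map of (identifier_type, value) -> subject_id.
    None means no deterministic match; the event falls through to probabilistic backfill.
    """
    for id_type in PRECEDENCE:
        value = identifiers.get(id_type)
        if value and (id_type, value) in identity_index:
            return identity_index[(id_type, value)]
    return None


index = {("cookie_id", "ck-123"): "SUB-old", ("hashed_email", "h-abc"): "SUB-7f91b1c6"}
print(resolve_subject({"cookie_id": "ck-123", "hashed_email": "h-abc"}, index))
```

Here the cookie and the hashed email point at different subjects; the hashed email wins because it ranks higher.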

Consent & Purpose Limiting

Required fields: consent.purpose (one or more of analytics, personalization, ads), consent.region, consent.scope.
The policy engine hard‑blocks enrichment and activation when consent is missing or revoked.
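A minimal sketch of purpose‑based gating, assuming each pipeline stage needs one consent purpose. The stage names, the stage‑to‑purpose mapping, and the `revoked` flag are all hypothetical; the sketch also conservatively blocks every stage, including analytics, when consent is missing.

```python
from typing import Optional

REQUIRED_FIELDS = ("purpose", "region", "scope")
KNOWN_PURPOSES = {"analytics", "personalization", "ads"}

# Hypothetical mapping of pipeline stages to the consent purpose each one needs.
STAGE_PURPOSE = {
    "analytics": "analytics",
    "enrichment": "personalization",
    "activation": "personalization",
    "ad_activation": "ads",
}


def permitted_stages(consent: Optional[dict]) -> set[str]:
    """Return the stages an event may enter; an empty set is a hard block."""
    if not consent or consent.get("revoked", False):  # "revoked" is a hypothetical flag
        return set()
    if any(not consent.get(name) for name in REQUIRED_FIELDS):
        return set()
    purposes = set(consent["purpose"]) & KNOWN_PURPOSES
    return {stage for stage, needed in STAGE_PURPOSE.items() if needed in purposes}


example = {"region": "US", "purpose": ["analytics", "personalization"], "scope": "first_party"}
print(sorted(permitted_stages(example)))
```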

Event Normalization

Canonical Intent Event

Required: event_id (ULID), occurred_at (ISO‑8601 UTC), source, event_type, consent, capture_context.
Optional: item (SKU or content), value, currency, metadata (kv), page_context, campaign_context.
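A sketch of validating the required fields listed above, including a ULID format check and a UTC check on occurred_at. Note that the Intent Event JSON Schema in this document does not list capture_context as required; this sketch follows the prose list. The ULID check accepts only the canonical uppercase Crockford base32 form.

```python
import re
from datetime import datetime, timedelta

# Canonical ULID: 26 Crockford base32 chars (no I, L, O, U); first char 0-7 keeps it within 128 bits.
ULID_RE = re.compile(r"[0-7][0-9A-HJKMNP-TV-Z]{25}")
REQUIRED = ("event_id", "occurred_at", "source", "event_type", "consent", "capture_context")


def validate_event(event: dict) -> list[str]:
    """Return a list of problems; an empty list means the event passes."""
    errors = [f"missing {name}" for name in REQUIRED if name not in event]
    event_id = event.get("event_id")
    if event_id is not None and not ULID_RE.fullmatch(str(event_id)):
        errors.append("event_id is not a ULID")
    occurred_at = event.get("occurred_at")
    if occurred_at is not None:
        try:
            # datetime.fromisoformat() before Python 3.11 rejects a trailing "Z".
            ts = datetime.fromisoformat(str(occurred_at).replace("Z", "+00:00"))
        except ValueError:
            errors.append("occurred_at is not ISO-8601")
        else:
            if ts.utcoffset() != timedelta(0):  # naive or non-UTC offsets fail
                errors.append("occurred_at is not UTC")
    return errors


event = {
    "event_id": "01JC6Z8B9Q9K4AS3H7X5S3N1PZ",
    "occurred_at": "2025-09-21T14:05:37Z",
    "source": "web_pixel",
    "event_type": "add_to_cart",
    "consent": {"region": "US", "purpose": ["analytics"], "scope": "first_party"},
    "capture_context": {},
}
print(validate_event(event))
```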

Deduplication: exact idempotency on event_id, plus near‑duplicate suppression via LSH over (subject_id, event_type, item) within a ±5s window.
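A simplified sketch of the two dedup layers. It swaps LSH for an exact‑key match on (subject_id, event_type, item id) within the ±5s window, so it only catches near duplicates whose keys match exactly. It assumes occurred_at is already parsed to a datetime and keeps unbounded state; a production version would expire old entries.

```python
from datetime import datetime, timedelta

NEAR_DUP_WINDOW = timedelta(seconds=5)


class Deduplicator:
    """Exact dedup on event_id plus an exact-key, time-windowed near-duplicate check."""

    def __init__(self) -> None:
        self.seen_ids: set[str] = set()
        self.last_accepted: dict[tuple, datetime] = {}

    def accept(self, event: dict) -> bool:
        if event["event_id"] in self.seen_ids:
            return False  # idempotent replay of the same event
        self.seen_ids.add(event["event_id"])
        item_id = (event.get("item") or {}).get("id")
        key = (event.get("subject_id"), event["event_type"], item_id)
        ts = event["occurred_at"]
        prev = self.last_accepted.get(key)
        if prev is not None and abs(ts - prev) <= NEAR_DUP_WINDOW:
            return False  # within ±5s of the last accepted event with the same key
        self.last_accepted[key] = ts
        return True


t0 = datetime(2025, 9, 21, 14, 5, 37)
base = {"subject_id": "SUB-7f91b1c6", "event_type": "add_to_cart", "item": {"id": "SKU-1234"}}
dedup = Deduplicator()
results = [
    dedup.accept({**base, "event_id": "E1", "occurred_at": t0}),
    dedup.accept({**base, "event_id": "E1", "occurred_at": t0}),
    dedup.accept({**base, "event_id": "E2", "occurred_at": t0 + timedelta(seconds=3)}),
    dedup.accept({**base, "event_id": "E3", "occurred_at": t0 + timedelta(seconds=10)}),
]
print(results)
```

Suppressed events do not move the window forward, so a long burst of events spaced a few seconds apart still lets one event through every 5 seconds.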

Bot/Fraud Filtering: signature/heuristics (JA3, header entropy, rapid‑fire patterns), partner IP allowlists, anomaly scores.
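One of the listed heuristics, rapid‑fire patterns, can be sketched as a sliding‑window rate check per source key (for example an IP or cookie). The thresholds below are hypothetical defaults, and timestamps per key are assumed to arrive in non‑decreasing order.

```python
from collections import defaultdict, deque


class RapidFireDetector:
    """Flag a key that sends more than max_events within window_s seconds."""

    def __init__(self, max_events: int = 20, window_s: float = 1.0) -> None:
        self.max_events = max_events  # hypothetical threshold
        self.window_s = window_s      # hypothetical window length
        self.history = defaultdict(deque)

    def observe(self, key: str, ts: float) -> bool:
        """Record one event at time ts (seconds); return True if the key is rapid-firing."""
        window = self.history[key]
        window.append(ts)
        # Drop events older than the window; ts itself always stays.
        while ts - window[0] > self.window_s:
            window.popleft()
        return len(window) > self.max_events


detector = RapidFireDetector(max_events=3, window_s=1.0)
flags = [detector.observe("ip-203.0.113.7", t) for t in (0.0, 0.1, 0.2, 0.3)]
print(flags)
```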

Enrichment Models

Interest Embeddings: session‑aware text/item2vec to derive topical vectors and category affinities.
Propensities: conversion, churn, upgrade, repeat purchase, price sensitivity.
Quality Signals: fraud likelihood, bot probability, complaint risk.
Context Features: time‑bucket responsiveness, device/network cohort performance.
Value Signals: short‑term value (STV) and predicted LTV bands.

Outputs are calibrated and written to the Feature Store keyed by subject_id.

Persona & Playbook Classification

Persona Codes: map $P(\text{persona}_k \mid x)$ to canonical codes (e.g., PER-EXP-dealfirst-1.0).
- Assignment strategies: top‑1 with a threshold, top‑N mixture, or abstain when uncertainty is high.
Playbook Codes: deterministic/ML rules that translate enriched features plus persona into activation strategies, e.g.,
- PB-TRIAL-NUDGE, PB-REORDER, PB-COMPARE-GRID, PB-LOYALTY-PERK.
- Guardrails: fatigue caps, exclusion lists (sensitive categories), fairness constraints.
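A sketch of the three persona assignment strategies. Measuring uncertainty with Shannon entropy (in bits, over the listed probabilities) is an assumption, and the threshold, top‑N, and entropy defaults are hypothetical.

```python
import math


def assign_personas(probs: dict[str, float], strategy: str = "top1",
                    threshold: float = 0.5, top_n: int = 2,
                    max_entropy_bits: float = 1.5) -> list[str]:
    """Map persona probabilities to codes; an empty list means abstain."""
    if not probs:
        return []
    # Abstain when the distribution is too flat (high uncertainty).
    entropy = -sum(p * math.log2(p) for p in probs.values() if p > 0)
    if entropy > max_entropy_bits:
        return []
    ranked = sorted(probs, key=probs.get, reverse=True)
    if strategy == "top1":
        best = ranked[0]
        return [best] if probs[best] >= threshold else []
    if strategy == "topn":
        return ranked[:top_n]
    raise ValueError(f"unknown strategy: {strategy}")


probs = {"PER-EXP-dealfirst-1.0": 0.72, "PER-QLT-craft-1.0": 0.18}
print(assign_personas(probs))
```

With the probabilities from the stream output record, top‑1 assignment yields the single code shown in that record's persona.codes.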

Stream Output Record (Post‑Match)

{
  "record_id": "01JC6Z8B9Q9K4AS3H7X5S3N1PZ",
  "subject_id": "SUB-7f91b1c6",
  "occurred_at": "2025-09-21T14:05:37Z",
  "source": "web_pixel",
  "event_type": "add_to_cart",
  "item": {"id": "SKU-1234", "category": "brushes/digital"},
  "value": 19.99,
  "currency": "USD",
  "consent": {"region": "US", "purpose": ["analytics","personalization"], "scope": "first_party"},
  "enrichments": {
    "affinity": {"categories": [{"id": "digital_art", "score": 0.86}]},
    "propensity": {"convert_7d": 0.34, "repeat_30d": 0.21},
    "value": {"ltv_band": "M"},
    "quality": {"bot_prob": 0.01}
  },
  "persona": {
    "codes": ["PER-EXP-dealfirst-1.0"],
    "probs": {"PER-EXP-dealfirst-1.0": 0.72, "PER-QLT-craft-1.0": 0.18}
  },
  "playbooks": ["PB-TRIAL-NUDGE", "PB-COMPARE-GRID"],
  "routing": {"eligible": true, "reasons": ["fatigue_ok","policy_pass"]},
  "trace_id": "TRC-5b2a"
}

Schemas

Intent Event (Capture Layer)

{
  "$schema": "https://json-schema.org/draft/2020-12/schema",
  "title": "IntentEvent",
  "type": "object",
  "required": ["event_id","occurred_at","source","event_type","consent"],
  "additionalProperties": true,
  "properties": {
    "event_id": {"type": "string"},
    "occurred_at": {"type": "string", "format": "date-time"},
    "source": {"type": "string", "enum": ["web_pixel","mobile_sdk","server","partner_batch"]},
    "event_type": {"type": "string"},
    "consent": {
      "type": "object",
      "required": ["region","purpose","scope"],
      "properties": {
        "region": {"type": "string"},
        "purpose": {"type": "array", "items": {"type": "string"}},
        "scope": {"type": "string", "enum": ["first_party","clean_room","partner"]}
      }
    },
    "identifiers": {
      "type": "object",
      "properties": {
        "cookie_id": {"type": "string"},
        "session_id": {"type": "string"},
        "hashed_email": {"type": "string"},
        "app_instance_id": {"type": "string"},
        "account_id": {"type": "string"},
        "partner_subject_key": {"type": "string"}
      }
    },
    "page_context": {"type": "object"},
    "campaign_context": {"type": "object"},
    "item": {"type": "object"},
    "value": {"type": "number"},
    "currency": {"type": "string"},
    "metadata": {"type": "object"}
  }
}

Enriched Intent Record (Post‑Match)

{
  "$schema": "https://json-schema.org/draft/2020-12/schema",
  "title": "EnrichedIntentRecord",
  "type": "object",
  "required": ["record_id","subject_id","occurred_at","source","event_type","consent","enrichments"],
  "additionalProperties": true,
  "properties": {
    "record_id": {"type": "string"},
    "subject_id": {"type": "string"},
    "occurred_at": {"type": "string", "format": "date-time"},
    "source": {"type": "string"},
    "event_type": {"type": "string"},
    "item": {"type": "object"},
    "consent": {"type": "object"},
    "enrichments": {
      "type": "object",
      "required": ["affinity","propensity","value","quality"],
      "properties": {
        "affinity": {"type": "object"},
        "propensity": {"type": "object"},
        "value": {"type": "object"},
        "quality": {"type": "object"}
      }
    },
    "persona": {
      "type": "object",
      "required": ["codes"],
      "properties": {
        "codes": {"type": "array", "items": {"type": "string"}},
        "probs": {"type": "object"}
      }
    },
    "playbooks": {"type": "array", "items": {"type": "string"}},
    "routing": {"type": "object"},
    "trace_id": {"type": "string"}
  }
}

Playbooks

Definition: executable strategies with guardrails (eligibility, exclusions, exposure caps) and message/offer templates.
Examples:
- PB-TRIAL-NUDGE — trigger a low‑friction trial prompt with comparison grid.
- PB-REORDER — reorder reminder with last‑purchase context.
- PB-LOYALTY-PERK — perk/early access for loyal cohorts.

Playbooks are selected via rules or a policy model that consumes persona probabilities + enrichments.
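A rules‑based sketch of playbook selection with the eligibility, exclusion, and exposure‑cap guardrails. The `PlaybookRule` structure, the propensity floors, the per‑subject `exposures` counts, and the fatigue cap of 3 are hypothetical; a policy model could replace the rule loop.

```python
from dataclasses import dataclass, field


@dataclass
class PlaybookRule:
    code: str
    personas: set[str]           # persona codes that make the playbook eligible
    min_convert_7d: float = 0.0  # hypothetical propensity floor
    excluded_categories: set[str] = field(default_factory=set)


def select_playbooks(record: dict, rules: list[PlaybookRule],
                     exposures: dict[str, int], fatigue_cap: int = 3) -> list[str]:
    """Return playbook codes that pass eligibility, exclusion, and fatigue checks."""
    personas = set(record.get("persona", {}).get("codes", []))
    convert_7d = record.get("enrichments", {}).get("propensity", {}).get("convert_7d", 0.0)
    category = record.get("item", {}).get("category")
    selected = []
    for rule in rules:
        if personas.isdisjoint(rule.personas):
            continue  # eligibility: no matching persona
        if convert_7d < rule.min_convert_7d:
            continue  # eligibility: propensity too low
        if category in rule.excluded_categories:
            continue  # exclusion list
        if exposures.get(rule.code, 0) >= fatigue_cap:
            continue  # exposure cap reached for this subject
        selected.append(rule.code)
    return selected


rules = [
    PlaybookRule("PB-TRIAL-NUDGE", {"PER-EXP-dealfirst-1.0"}, min_convert_7d=0.2),
    PlaybookRule("PB-COMPARE-GRID", {"PER-EXP-dealfirst-1.0", "PER-QLT-craft-1.0"}),
    PlaybookRule("PB-REORDER", {"PER-EXP-dealfirst-1.0"}, min_convert_7d=0.5),
]
record = {
    "item": {"id": "SKU-1234", "category": "brushes/digital"},
    "enrichments": {"propensity": {"convert_7d": 0.34}},
    "persona": {"codes": ["PER-EXP-dealfirst-1.0"]},
}
print(select_playbooks(record, rules, exposures={}))
```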

Serving & Interfaces

Streaming: POST /v1/events (capture), GET /v1/stream/subjects/{id} (tail enriched records).
Batch: daily S3/Blob exports (Parquet) partitioned by dt=YYYY‑MM‑DD/region.
Segments: POST /v1/segments/query supports filters such as persona.codes ANY IN [...] and playbooks ANY IN [...].
Decisioning: POST /v1/decide accepts current context and returns ranked actions + playbook recommendations.
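The request format of /v1/segments/query is not specified here, so the sketch below only illustrates the filter semantics locally: `ANY IN` matches when at least one value in the record's array appears in the candidate list. Combining the two filters with AND is an assumption, and the function names are hypothetical.

```python
from typing import Iterable, Optional


def any_in(values: Iterable[str], candidates: Iterable[str]) -> bool:
    """ANY IN: true when at least one value appears among the candidates."""
    return not set(values).isdisjoint(candidates)


def matches_segment(record: dict,
                    persona_any: Optional[list[str]] = None,
                    playbooks_any: Optional[list[str]] = None) -> bool:
    """Apply both filters with AND semantics (assumed); None means no filter."""
    codes = record.get("persona", {}).get("codes", [])
    if persona_any is not None and not any_in(codes, persona_any):
        return False
    if playbooks_any is not None and not any_in(record.get("playbooks", []), playbooks_any):
        return False
    return True


record = {"persona": {"codes": ["PER-EXP-dealfirst-1.0"]},
          "playbooks": ["PB-TRIAL-NUDGE", "PB-COMPARE-GRID"]}
print(matches_segment(record, persona_any=["PER-EXP-dealfirst-1.0"],
                      playbooks_any=["PB-REORDER", "PB-TRIAL-NUDGE"]))
```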

Quality, Safety & Governance

Compliance: consent gating, purpose limiting, DSAR/erasure; clean‑room join paths for partner data.
Bias/Fairness: exclude protected categories; monitor lift and error parity across non‑sensitive cohorts.
Security: PII vaulting; transport encryption; signed pixel/SDK payloads; replay protection.
Observability: dedupe rate, bot rate, stitch confidence, enrichment latency, model drift.

Retention & TTLs

Raw capture: 30–90 days (configurable) with access controls.
Enriched features: 6–24 months (downsampled); linkage edges decay with inactivity.
Consent logs: retained per regulatory requirements.
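A sketch of enforcing the configurable raw‑capture window. Clamping an out‑of‑range setting into 30–90 days, rather than rejecting it, is an assumed design choice; the function names are hypothetical.

```python
from datetime import datetime, timedelta, timezone

RAW_TTL_BOUNDS_DAYS = (30, 90)  # raw capture retention range from the text


def raw_capture_ttl(configured_days: int) -> timedelta:
    """Clamp a configured raw-capture TTL into the 30-90 day range."""
    low, high = RAW_TTL_BOUNDS_DAYS
    return timedelta(days=min(max(configured_days, low), high))


def raw_capture_expired(captured_at: datetime, now: datetime, configured_days: int) -> bool:
    """True once a raw capture record is older than its effective TTL."""
    return now - captured_at > raw_capture_ttl(configured_days)


captured = datetime(2025, 9, 21, tzinfo=timezone.utc)
print(raw_capture_expired(captured, captured + timedelta(days=45), configured_days=60))
```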

Error Handling & Edge Cases

Unmatched events → subject_id = null → queued for backfill (bounded retry) or aggregated at the cohort level.
Low consent scope → enrichment/playbooks omitted; analytics only.
High bot/fraud score → drop or quarantine with routing.eligible=false.
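A sketch of routing a record through these edge cases. The bot threshold, the retry bound, and the route names are hypothetical, and "low consent scope" is interpreted here as consent without the personalization purpose.

```python
BOT_QUARANTINE_THRESHOLD = 0.8  # hypothetical cut-off for "high bot/fraud score"
MAX_BACKFILL_ATTEMPTS = 3       # hypothetical bound for "bounded retry"


def route_record(record: dict, backfill_attempts: int = 0) -> str:
    """Return a hypothetical route name for one record, following the rules above."""
    quality = record.get("enrichments", {}).get("quality", {})
    if quality.get("bot_prob", 0.0) >= BOT_QUARANTINE_THRESHOLD:
        record.setdefault("routing", {})["eligible"] = False
        return "quarantine"
    if record.get("subject_id") is None:
        if backfill_attempts < MAX_BACKFILL_ATTEMPTS:
            return "backfill_queue"
        return "cohort_aggregate"  # retries exhausted
    purposes = set(record.get("consent", {}).get("purpose", []))
    if "personalization" not in purposes:
        return "analytics_only"  # enrichment and playbooks omitted
    return "enrich_and_activate"


ok = {"subject_id": "SUB-7f91b1c6",
      "consent": {"purpose": ["analytics", "personalization"]},
      "enrichments": {"quality": {"bot_prob": 0.01}}}
print(route_record(ok))
```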

Versioning & Change Management

Schema evolution via MAJOR.MINOR.PATCH; producers/consumers validated in CI.
Backward‑compatible changes favored; deprecations announced with migration guidance.
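A sketch of a compatibility check a CI step might run, assuming one common semantic‑versioning convention: a consumer can read a producer's records when MAJOR matches and the producer's MINOR is at least the consumer's, because MINOR releases only add optional fields. Both schemas in this document set additionalProperties to true, which lets consumers tolerate such additions. The function names are hypothetical.

```python
def parse_version(version: str) -> tuple[int, int, int]:
    """Split 'MAJOR.MINOR.PATCH' into integers."""
    major, minor, patch = (int(part) for part in version.split("."))
    return major, minor, patch


def can_read(consumer_version: str, producer_version: str) -> bool:
    """Assumed rule: same MAJOR, producer MINOR >= consumer MINOR; PATCH is ignored."""
    c_major, c_minor, _ = parse_version(consumer_version)
    p_major, p_minor, _ = parse_version(producer_version)
    return c_major == p_major and p_minor >= c_minor


print(can_read("1.2.0", "1.2.2"))
```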

Limitations & Assumptions

Browser privacy controls (ITP/ETP) limit cookie longevity; first‑party capture and server events are recommended.
Identity stitching quality depends on consented identifiers and traffic mix.
Playbook efficacy is context‑dependent; continuous testing is required.
