Intent Feed

Capture → ID Match → Enrichment → Segmentation

Specifications

  • Release Date: September 17, 2025

  • Total rows (daily): 2B+

  • Version: 1.2.2

  • File Size: ~60 GB (Parquet)

  • Update Freq: Daily

Purpose

The Intent Feed ingests raw intent signals (cookies, pixels, SDK events, social interactions, commerce activity, and partner feeds), normalizes them, and matches them to an internal subject ID. It then enriches each record with model outputs (interests, risk/fraud, value) and emits persona codes and playbook codes for instant segmentation and activation.


Architecture Overview

  1. Capture: pixels, SDKs, server‑to‑server, partner uploads.

  2. Normalize: schema harmonization, bot/fraud filtering, consent enforcement.

  3. Identity: deterministic/probabilistic stitching → subject_id (internal ID).

  4. Enrich: embeddings, category affinities, price sensitivity, churn/propensity, compliance checks.

  5. Classify: compute persona probabilities → persona codes; map to playbook codes.

  6. Serve: stream to feature store, segments, and decision API; batch sink to warehouse.

  7. Feedback: outcomes loop into model refresh and playbook tuning.

Latency targets (deployment-dependent): stream p50 < 50 ms and p95 < 120 ms, measured from capture until the enriched record is available in the feature store.


Data Sources

  • Web: first‑party cookies, pageview/search/cart pixels, consent banner states, referrer UTM.

  • Mobile: app SDK events (screen views, taps, purchases); device class/network only (no raw PII in feed).

  • Commerce: checkout starts/completes, SKU lines, refunds/returns, subscription changes.

  • Social & Ads: engagement callbacks (view/click), campaign/line‑item IDs; partner event streams (server‑to‑server).

  • Partner/Offline: POS batches, loyalty systems, call‑center outcomes, cooperative membership.

Sensitive attributes and special‑category data are not ingested. All sources must carry consent/legitimate‑interest flags; region‑aware enforcement occurs at ingestion.


Identifiers Accepted

  • cookie_id (first‑party), session_id

  • hashed_email (SHA‑256, lower‑cased, trimmed) when consented

  • app_instance_id (random app GUID)

  • account_id / customer_id (first‑party)

  • partner_subject_key (for clean‑room mapped IDs)
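
The hashed_email rule above (trimmed, lower-cased, SHA-256) can be sketched as follows. The function name and the hex encoding of the digest are assumptions; the feed may expect a different encoding.

```python
import hashlib


def hash_email(raw_email: str) -> str:
    """Trim and lower-case the address, then return its SHA-256 digest.

    Hex output is an assumption; the source only names the hash function.
    """
    normalized = raw_email.strip().lower()
    return hashlib.sha256(normalized.encode("utf-8")).hexdigest()
```

Normalizing before hashing matters because two spellings of the same address would otherwise produce unrelated digests and fail to match.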

Stitching

  • Deterministic precedence: account_id > hashed_email > app_instance_id > cookie_id.

  • Probabilistic backfill: co‑occurrence + device/geo/time windows, capped with confidence thresholds.

  • Output: subject_id (opaque, internal), linkage graph with decay (edge TTL).

  • Required fields: consent.purpose=[analytics|personalization|ads], consent.region, consent.scope.

  • Policy engine hard‑blocks enrichment/activation when consent is missing or revoked.
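
A minimal sketch of the deterministic precedence and the consent check listed above. The function names are hypothetical, and treating an empty purpose list as "no consent" is an assumption, since the representation of revoked consent is not specified here.

```python
from typing import Optional, Tuple

# Strongest identifier first, mirroring the deterministic precedence.
PRECEDENCE = ("account_id", "hashed_email", "app_instance_id", "cookie_id")

# Consent fields that must be present before enrichment or activation.
CONSENT_FIELDS = ("purpose", "region", "scope")


def strongest_identifier(identifiers: dict) -> Optional[Tuple[str, str]]:
    """Return (name, value) of the highest-precedence identifier present."""
    for name in PRECEDENCE:
        value = identifiers.get(name)
        if value:
            return name, value
    return None


def has_required_consent(consent: Optional[dict]) -> bool:
    """True only when every required consent field is present and non-empty."""
    if not consent:
        return False
    return all(consent.get(field) for field in CONSENT_FIELDS)
```

Probabilistic backfill and edge decay are left out of this sketch; it covers only the deterministic path.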


Event Normalization

Canonical Intent Event

  • Required: event_id (ULID), occurred_at (ISO‑8601 UTC), source, event_type, consent, capture_context.

  • Optional: item (SKU or content), value, currency, metadata (kv), page_context, campaign_context.

Deduplication: idempotency on (event_id), and near‑duplicate suppression via LSH on (subject_id, event_type, item, ±5s).
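
The two deduplication stages can be illustrated with a small in-memory sketch. Note the substitution: instead of LSH, it uses an exact match on (subject_id, event_type, item id) within a 5-second window, which is simpler but only catches duplicates whose key fields are identical. The class name and the unbounded in-memory state are assumptions for illustration.

```python
from datetime import datetime, timedelta

WINDOW = timedelta(seconds=5)


class Deduplicator:
    """Idempotency on event_id plus exact-key near-duplicate suppression."""

    def __init__(self) -> None:
        self._seen_ids = set()
        self._last_accepted = {}  # (subject_id, event_type, item_id) -> datetime

    def accept(self, event: dict) -> bool:
        # Stage 1: drop exact replays of the same event_id.
        if event["event_id"] in self._seen_ids:
            return False
        self._seen_ids.add(event["event_id"])

        # Stage 2: drop events that repeat the same key within the window.
        item_id = (event.get("item") or {}).get("id")
        key = (event.get("subject_id"), event["event_type"], item_id)
        ts = datetime.fromisoformat(event["occurred_at"].replace("Z", "+00:00"))
        previous = self._last_accepted.get(key)
        if previous is not None and abs(ts - previous) <= WINDOW:
            return False
        self._last_accepted[key] = ts
        return True
```

The window is anchored to the last accepted event, so a long burst of repeats yields one accepted event per window rather than being suppressed indefinitely.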

Bot/Fraud Filtering: signature/heuristics (JA3, header entropy, rapid‑fire patterns), partner IP allowlists, anomaly scores.


Enrichment Models

  • Interest Embeddings: session‑aware text/item2vec to derive topical vectors and category affinities.

  • Propensities: conversion, churn, upgrade, repeat purchase, price sensitivity.

  • Quality Signals: fraud likelihood, bot probability, complaint risk.

  • Context Features: time‑bucket responsiveness, device/network cohort performance.

  • Value Signals: short‑term value (STV) and predicted LTV bands.

Outputs are calibrated and written to the Feature Store keyed by subject_id.


Persona & Playbook Classification

  • Persona Codes: map P(persona_k | x) to canonical codes (e.g., PER-EXP-dealfirst-1.0).

    • Assignment strategies: top‑1 with threshold, top‑N mixture, or abstain if uncertainty high.

  • Playbook Codes: deterministic/ML rules translating enriched features + persona to activation strategies, e.g.,

    • PB-TRIAL-NUDGE, PB-REORDER, PB-COMPARE-GRID, PB-LOYALTY-PERK.

    • Guardrails: fatigue caps, exclusion lists (sensitive categories), fairness constraints.
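
The assignment strategies above (top-1 with threshold, top-N mixture, abstain) could look like the sketch below. The function name and the default threshold of 0.5 are assumptions, and a low top-1 probability is used here as a simple stand-in for "uncertainty high".

```python
def assign_personas(probs: dict, threshold: float = 0.5, top_n: int = 1) -> list:
    """Return up to top_n persona codes, or [] to abstain.

    Abstains when the most likely persona falls below the threshold.
    """
    ranked = sorted(probs.items(), key=lambda pair: pair[1], reverse=True)
    if not ranked or ranked[0][1] < threshold:
        return []  # abstain: no persona is confident enough
    return [code for code, _ in ranked[:top_n]]


probs = {"PER-EXP-dealfirst-1.0": 0.72, "PER-QLT-craft-1.0": 0.18}
top_one = assign_personas(probs)            # top-1 with threshold
mixture = assign_personas(probs, top_n=2)   # top-N mixture
abstain = assign_personas(probs, threshold=0.8)
```

With these inputs, top_one contains only PER-EXP-dealfirst-1.0, mixture contains both codes in probability order, and abstain is empty.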


Stream Output Record (Post‑Match)

{
  "record_id": "01JC6Z8B9Q9K4AS3H7X5S3N1PZ",
  "subject_id": "SUB-7f91b1c6",
  "occurred_at": "2025-09-21T14:05:37Z",
  "source": "web_pixel",
  "event_type": "add_to_cart",
  "item": {"id": "SKU-1234", "category": "brushes/digital"},
  "value": 19.99,
  "currency": "USD",
  "consent": {"region": "US", "purpose": ["analytics","personalization"], "scope": "first_party"},
  "enrichments": {
    "affinity": {"categories": [{"id": "digital_art", "score": 0.86}]},
    "propensity": {"convert_7d": 0.34, "repeat_30d": 0.21},
    "value": {"ltv_band": "M"},
    "quality": {"bot_prob": 0.01}
  },
  "persona": {
    "codes": ["PER-EXP-dealfirst-1.0"],
    "probs": {"PER-EXP-dealfirst-1.0": 0.72, "PER-QLT-craft-1.0": 0.18}
  },
  "playbooks": ["PB-TRIAL-NUDGE", "PB-COMPARE-GRID"],
  "routing": {"eligible": true, "reasons": ["fatigue_ok","policy_pass"]},
  "trace_id": "TRC-5b2a"
}

Schemas

Intent Event (Capture Layer)

{
  "$schema": "https://json-schema.org/draft/2020-12/schema",
  "title": "IntentEvent",
  "type": "object",
  "required": ["event_id","occurred_at","source","event_type","consent"],
  "additionalProperties": true,
  "properties": {
    "event_id": {"type": "string"},
    "occurred_at": {"type": "string", "format": "date-time"},
    "source": {"type": "string", "enum": ["web_pixel","mobile_sdk","server","partner_batch"]},
    "event_type": {"type": "string"},
    "consent": {
      "type": "object",
      "required": ["region","purpose","scope"],
      "properties": {
        "region": {"type": "string"},
        "purpose": {"type": "array", "items": {"type": "string"}},
        "scope": {"type": "string", "enum": ["first_party","clean_room","partner"]}
      }
    },
    "identifiers": {
      "type": "object",
      "properties": {
        "cookie_id": {"type": "string"},
        "session_id": {"type": "string"},
        "hashed_email": {"type": "string"},
        "app_instance_id": {"type": "string"},
        "account_id": {"type": "string"},
        "partner_subject_key": {"type": "string"}
      }
    },
    "page_context": {"type": "object"},
    "campaign_context": {"type": "object"},
    "item": {"type": "object"},
    "value": {"type": "number"},
    "currency": {"type": "string"},
    "metadata": {"type": "object"}
  }
}
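
As a rough illustration of what the IntentEvent schema enforces, the following standard-library sketch checks only the required fields and the two enums. It is not a substitute for a full JSON Schema validator; the function name and error strings are hypothetical.

```python
REQUIRED = ("event_id", "occurred_at", "source", "event_type", "consent")
CONSENT_REQUIRED = ("region", "purpose", "scope")
SOURCES = {"web_pixel", "mobile_sdk", "server", "partner_batch"}
SCOPES = {"first_party", "clean_room", "partner"}


def validation_errors(event: dict) -> list:
    """Return a list of human-readable problems; empty means the checks pass."""
    errors = [f"missing: {name}" for name in REQUIRED if name not in event]

    if "source" in event and event["source"] not in SOURCES:
        errors.append(f"invalid source: {event['source']}")

    consent = event.get("consent")
    if isinstance(consent, dict):
        errors += [
            f"missing: consent.{name}"
            for name in CONSENT_REQUIRED
            if name not in consent
        ]
        if "scope" in consent and consent["scope"] not in SCOPES:
            errors.append(f"invalid consent.scope: {consent['scope']}")
    elif "consent" in event:
        errors.append("consent must be an object")

    return errors
```

Type and date-time format checks are deliberately omitted; a schema validator would cover those.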

Enriched Intent Record (Post‑Match)

{
  "$schema": "https://json-schema.org/draft/2020-12/schema",
  "title": "EnrichedIntentRecord",
  "type": "object",
  "required": ["record_id","subject_id","occurred_at","source","event_type","consent","enrichments"],
  "additionalProperties": true,
  "properties": {
    "record_id": {"type": "string"},
    "subject_id": {"type": "string"},
    "occurred_at": {"type": "string", "format": "date-time"},
    "source": {"type": "string"},
    "event_type": {"type": "string"},
    "item": {"type": "object"},
    "consent": {"type": "object"},
    "enrichments": {
      "type": "object",
      "required": ["affinity","propensity","value","quality"],
      "properties": {
        "affinity": {"type": "object"},
        "propensity": {"type": "object"},
        "value": {"type": "object"},
        "quality": {"type": "object"}
      }
    },
    "persona": {
      "type": "object",
      "required": ["codes"],
      "properties": {
        "codes": {"type": "array", "items": {"type": "string"}},
        "probs": {"type": "object"}
      }
    },
    "playbooks": {"type": "array", "items": {"type": "string"}},
    "routing": {"type": "object"},
    "trace_id": {"type": "string"}
  }
}

Playbooks

  • Definition: executable strategies with guardrails (eligibility, exclusions, exposure caps) and message/offer templates.

  • Examples:

    • PB-TRIAL-NUDGE — trigger a low‑friction trial prompt with comparison grid.

    • PB-REORDER — reorder reminder with last‑purchase context.

    • PB-LOYALTY-PERK — perk/early access for loyal cohorts.

Playbooks are selected via rules or a policy model that consumes persona probabilities + enrichments.


Serving & Interfaces

  • Streaming: POST /v1/events (capture), GET /v1/stream/subjects/{id} (tail enriched records).

  • Batch: daily S3/Blob exports (Parquet) partitioned by dt=YYYY‑MM‑DD/region.

  • Segments: POST /v1/segments/query supports filters such as persona.codes ANY IN [...] and playbooks ANY IN [...].

  • Decisioning: POST /v1/decide accepts current context and returns ranked actions + playbook recommendations.
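
To make the ANY IN filters concrete, here is a local sketch of how such a filter might be evaluated against enriched records. The semantics (a record matches when it shares at least one value with the filter list, and multiple filters combine with AND) are assumptions, as are the function names; the service's actual behavior may differ.

```python
from typing import Iterable, Optional


def any_in(values: Iterable[str], wanted: Iterable[str]) -> bool:
    """True when the two collections share at least one element."""
    return not set(values).isdisjoint(wanted)


def matches(record: dict,
            persona_codes: Optional[list] = None,
            playbooks: Optional[list] = None) -> bool:
    """Apply each supplied filter; a filter left as None is ignored."""
    if persona_codes is not None:
        codes = record.get("persona", {}).get("codes", [])
        if not any_in(codes, persona_codes):
            return False
    if playbooks is not None:
        if not any_in(record.get("playbooks", []), playbooks):
            return False
    return True


record = {
    "persona": {"codes": ["PER-EXP-dealfirst-1.0"]},
    "playbooks": ["PB-TRIAL-NUDGE", "PB-COMPARE-GRID"],
}
in_segment = matches(record, playbooks=["PB-REORDER", "PB-COMPARE-GRID"])
```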


Quality, Safety & Governance

  • Compliance: consent gating, purpose limiting, DSAR/erasure; clean‑room join paths for partner data.

  • Bias/Fairness: exclude protected categories; monitor lift and error parity across non‑sensitive cohorts.

  • Security: PII vaulting; transport encryption; signed pixel/SDK payloads; replay protection.

  • Observability: dedupe rate, bot rate, stitch confidence, enrichment latency, model drift.


Retention & TTLs

  • Raw capture: 30–90 days (configurable) with access controls.

  • Enriched features: 6–24 months (downsampled); linkage edges decay with inactivity.

  • Consent logs: retained per regulatory requirements.


Error Handling & Edge Cases

  • Unmatched events: subject_id = null → queued for backfill (bounded retry) or aggregated at the cohort level.

  • Low consent scope → enrichment/playbooks omitted; analytics only.

  • High bot/fraud score → drop or quarantine with routing.eligible=false.
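
The three edge cases above can be read as a small routing function. This is only a sketch: the 0.9 bot threshold, the action names, the order of the checks, and the interpretation of "low consent scope" as lacking both personalization and ads purposes are all assumptions.

```python
BOT_THRESHOLD = 0.9  # hypothetical cut-off; tune per deployment


def route(record: dict) -> dict:
    """Decide how a record is handled, following the edge cases above."""
    quality = record.get("enrichments", {}).get("quality", {})
    if quality.get("bot_prob", 0.0) >= BOT_THRESHOLD:
        # High bot/fraud score: quarantine and mark ineligible.
        return {"action": "quarantine", "eligible": False}

    if record.get("subject_id") is None:
        # Unmatched event: queue for identity backfill.
        return {"action": "backfill_queue", "eligible": False}

    purposes = set(record.get("consent", {}).get("purpose", []))
    if not purposes & {"personalization", "ads"}:
        # Consent covers analytics at most: skip enrichment and playbooks.
        return {"action": "analytics_only", "eligible": False}

    return {"action": "activate", "eligible": True}
```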


Versioning & Change Management

  • Schema evolution via MAJOR.MINOR.PATCH; producers/consumers validated in CI.

  • Backward‑compatible changes favored; deprecations announced with migration guidance.
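
A CI compatibility gate for MAJOR.MINOR.PATCH schema versions might look like the sketch below. It follows the usual semantic-versioning convention (same MAJOR, producer at or ahead of the consumer); whether this feed applies exactly that rule is an assumption, and the function names are hypothetical.

```python
def parse_version(text: str) -> tuple:
    """Split 'MAJOR.MINOR.PATCH' into a tuple of three integers."""
    major, minor, patch = (int(part) for part in text.split("."))
    return major, minor, patch


def is_backward_compatible(consumer: str, producer: str) -> bool:
    """True when a consumer built for one version can read the producer's.

    Same MAJOR is required; the producer's (MINOR, PATCH) must not be older
    than what the consumer was built against.
    """
    c, p = parse_version(consumer), parse_version(producer)
    return p[0] == c[0] and p[1:] >= c[1:]
```

For example, a consumer built against 1.2.0 would accept records produced at 1.2.2 or 1.3.0 but reject 2.0.0 and 1.1.9.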


Limitations & Assumptions

  • Browser privacy controls (ITP/ETP) limit cookie longevity; first‑party capture and server events are recommended.

  • Identity stitching quality depends on consented identifiers and traffic mix.

  • Playbook efficacy is context‑dependent; continuous testing is required.

Last updated