Extraction and matching

Behind every price data point in your Wiser dashboard is a two-stage process: extraction (getting data off retailer pages) and matching (connecting that data to the right product in your catalog). This article explains how both stages work and why getting them right is harder than it looks.

Stage 1: Extraction

Extraction is the process of retrieving structured data from retailer websites. Wiser's extraction infrastructure handles several challenges that trip up simpler solutions:

  • Domain-specific rules: Each retailer gets custom extraction logic. Amazon, Walmart, Target, and thousands of regional retailers all structure their product pages differently. Generic rules miss important fields or capture the wrong data.

  • Detection avoidance: Retailers actively block automated traffic. Wiser uses proxy waterfalls and advanced fingerprinting to minimize blocks and maintain consistent uptime, so your data feed doesn't develop unexplained gaps.

  • Layout change detection: When a retailer updates their site, Wiser's systems detect the change and flag affected extraction rules. Updates are integrated across all impacted domains, not just the one where the change was first noticed.

  • Attribute capture: Beyond price, Wiser captures product name, brand, color, MPN, UPC, SKU, availability status, promotional badges, and other attributes relevant to your analysis.
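To make domain-specific rules and layout change detection more concrete, here is a minimal Python sketch. It is not Wiser's implementation. The domains (shop-a.example, shop-b.example), the field patterns, and the `extract` function are hypothetical. Regular expressions stand in for the proper HTML or JSON parsing a production system would use. The sketch shows two ideas: each domain carries its own rules, and a required field that suddenly stops matching is treated as a sign that the page layout may have changed.

```python
import re
from dataclasses import dataclass, field

# Hypothetical per-domain rules: each field maps to a regex with one capture group.
# Regexes keep the sketch short; real extraction would parse HTML/JSON properly.
DOMAIN_RULES = {
    "shop-a.example": {
        "name": r'<h1 class="title">(.*?)</h1>',
        "price": r'<span class="price">\$([\d.]+)</span>',
        "upc": r'data-upc="(\d+)"',
    },
    "shop-b.example": {
        "name": r'"productName":\s*"(.*?)"',
        "price": r'"salePrice":\s*([\d.]+)',
        "upc": r'"gtin12":\s*"(\d+)"',
    },
}

# Fields whose absence suggests the page layout changed (hypothetical choice).
REQUIRED_FIELDS = ("name", "price")


@dataclass
class ExtractionResult:
    domain: str
    fields: dict = field(default_factory=dict)
    layout_changed: bool = False  # True when a required field could not be found


def extract(domain: str, html: str) -> ExtractionResult:
    """Apply the domain's own rules and flag a possible layout change."""
    result = ExtractionResult(domain=domain)
    for name, pattern in DOMAIN_RULES[domain].items():
        match = re.search(pattern, html)
        if match:
            result.fields[name] = match.group(1)
    result.layout_changed = any(f not in result.fields for f in REQUIRED_FIELDS)
    return result


page_a = '<h1 class="title">Desk Lamp</h1><span class="price">$24.99</span><div data-upc="012345678905">'
print(extract("shop-a.example", page_a).fields)
# {'name': 'Desk Lamp', 'price': '24.99', 'upc': '012345678905'}

# The same page after a hypothetical redesign renamed the title class and dropped the price span.
redesigned = '<h1 class="product-title">Desk Lamp</h1>'
print(extract("shop-a.example", redesigned).layout_changed)  # True
```

Because the rules are keyed by domain, fixing one retailer's rules cannot disturb another retailer's.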

Stage 2: Matching

Matching connects extracted data to the correct product in your catalog. Wiser supports two match types:

  • Exact matches: the same product and the same SKU, confirmed via shared identifiers (UPC, MPN, EAN). This is the highest-confidence match type.

  • Equivalent matches: products that serve the same function and compete directly, even if they're different SKUs. This includes private label equivalents, bundles, and multipacks.
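The following is a minimal sketch of exact matching on shared identifiers. Its rule set is hypothetical, not Wiser's. A UPC-A code and its 13-digit EAN form are treated as the same GTIN once both are validated with the standard GS1 check digit and left-padded to 14 digits. When either side lacks a valid GTIN, brand plus MPN serves as a fallback. The function names and record fields are illustrative only.

```python
from typing import Optional

DIGITS = "0123456789"


def gtin_check_digit_ok(code: str) -> bool:
    """Validate the GS1 check digit for UPC-A (12), EAN-13 (13) or GTIN-14 (14) codes."""
    # UPC-E (8 digits) is deliberately rejected: it must be expanded, not just padded.
    if len(code) not in (12, 13, 14) or any(ch not in DIGITS for ch in code):
        return False
    body, check = code[:-1], int(code[-1])
    # Weights alternate 3, 1, 3, ... starting from the digit next to the check digit.
    total = sum(int(d) * (3 if i % 2 == 0 else 1) for i, d in enumerate(reversed(body)))
    return (10 - total % 10) % 10 == check


def normalize_gtin(code: str) -> Optional[str]:
    """Drop separators and left-pad to 14 digits so UPC-A and EAN-13 forms compare equal."""
    digits = "".join(ch for ch in code if ch in DIGITS)
    return digits.zfill(14) if gtin_check_digit_ok(digits) else None


def is_exact_match(ours: dict, theirs: dict) -> bool:
    """Hypothetical rule: valid GTINs decide; otherwise brand + MPN must both agree."""
    a, b = normalize_gtin(ours.get("gtin", "")), normalize_gtin(theirs.get("gtin", ""))
    if a and b:
        # Two valid but different GTINs are treated as different products.
        return a == b
    mpn_a, mpn_b = ours.get("mpn", "").strip().upper(), theirs.get("mpn", "").strip().upper()
    brand_a = ours.get("brand", "").strip().lower()
    brand_b = theirs.get("brand", "").strip().lower()
    return bool(mpn_a) and mpn_a == mpn_b and brand_a == brand_b


ours = {"gtin": "012345678905", "brand": "Acme", "mpn": "DL-100"}
theirs = {"gtin": "0012345678905", "brand": "ACME", "mpn": "dl-100"}
print(is_exact_match(ours, theirs))  # True
```

Equivalent matches such as bundles and multipacks usually carry different identifiers. They therefore fail a check like this one by design, which is why they need a separate match type.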

Why equivalent matching matters

If a competitor sells a 3-pack of a product you sell individually, the 3-pack is a different SKU. A system that only does exact matching won't surface that comparison.
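The 3-pack example comes down to simple arithmetic. One common way to compare such listings is to normalize both offers to a per-unit price; this is offered as a hedged sketch, not Wiser's documented method. The prices and the `Offer` and `unit_price_gap_pct` names are made up for illustration.

```python
from dataclasses import dataclass


@dataclass
class Offer:
    seller: str
    price: float    # shelf price for the whole listing
    pack_size: int  # units in the listing (1 for a single item)

    @property
    def unit_price(self) -> float:
        return self.price / self.pack_size


def unit_price_gap_pct(ours: Offer, theirs: Offer) -> float:
    """How far our per-unit price sits above (positive) or below (negative) theirs, in percent."""
    return (ours.unit_price - theirs.unit_price) / theirs.unit_price * 100


ours = Offer("You", 9.99, 1)
theirs = Offer("Competitor", 27.00, 3)
print(f"{theirs.unit_price:.2f}")                  # 9.00
print(f"{unit_price_gap_pct(ours, theirs):.1f}%")  # 11.0%
```

Compared on shelf price alone, the competitor's $27.00 listing looks far more expensive than your $9.99 item. Per unit, it undercuts you by about 11%.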

How matching accuracy is maintained

Matching isn't a one-time configuration; it requires ongoing refinement:

  1. Catalog onboarding: When you join Wiser, your product catalog is ingested and used to seed the matching process. The more complete your catalog (including MPNs, EANs, and product descriptions), the higher your initial match rate.

  2. Machine learning refinement: AI models learn from validated matches and improve their understanding of product relationships over time.

  3. Manual review queue: Low-confidence matches are flagged and reviewed by Wiser's data analysts before they're surfaced to customers. You can also flag matches for review directly from the platform.
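The review step can be pictured as a confidence threshold. The sketch below assumes each candidate match carries a model confidence score between 0 and 1. The 0.85 cut-off, the `CandidateMatch` fields, and the `route` function are hypothetical, not documented Wiser values. Candidates below the threshold go to a review queue, as does any match a customer has flagged. The queue is sorted so analysts see the least certain matches first.

```python
from dataclasses import dataclass
from typing import List, Tuple

REVIEW_THRESHOLD = 0.85  # hypothetical cut-off, not a documented Wiser setting


@dataclass
class CandidateMatch:
    catalog_sku: str
    retailer_url: str
    confidence: float  # model score in [0, 1]
    flagged_by_customer: bool = False


def route(candidates: List[CandidateMatch]) -> Tuple[List[CandidateMatch], List[CandidateMatch]]:
    """Split candidates into (published, review_queue)."""
    published, review_queue = [], []
    for c in candidates:
        # Low-confidence matches and customer-flagged matches both go to human review.
        if c.flagged_by_customer or c.confidence < REVIEW_THRESHOLD:
            review_queue.append(c)
        else:
            published.append(c)
    # Least certain matches first, so analysts spend time where the model is weakest.
    review_queue.sort(key=lambda c: c.confidence)
    return published, review_queue


candidates = [
    CandidateMatch("SKU-1", "https://shop-a.example/p/1", 0.97),
    CandidateMatch("SKU-2", "https://shop-b.example/p/9", 0.62),
    CandidateMatch("SKU-3", "https://shop-a.example/p/4", 0.91, flagged_by_customer=True),
]
published, queue = route(candidates)
print([c.catalog_sku for c in published])  # ['SKU-1']
print([c.catalog_sku for c in queue])      # ['SKU-2', 'SKU-3']
```

In a setup like this, matches confirmed during review would also become the validated examples that the models in step 2 learn from.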
