Progressive Data Enrichment Pipeline with n8n

Last quarter, I watched a client burn through $4,000 in Clearbit credits in three weeks because they were enriching every single form fill—including job seekers, students, and spam submissions. Their enrichment “strategy” was essentially firing expensive API calls at anything with an email address.

We rebuilt their enrichment pipeline using progressive waterfall logic: start with free sources, escalate to paid APIs only when necessary, and skip enrichment entirely for low-value prospects. Their enrichment costs dropped 73% while data quality actually improved.

This is the power of intelligent enrichment architecture—and n8n is the perfect platform to build it.

Why Most Enrichment Strategies Waste Money

The “Enrich Everything” Fallacy

Many teams treat enrichment as an all-or-nothing decision. But enriching a personal Gmail address the same way you enrich an enterprise Fortune 500 prospect is like using a firehose to water a houseplant.

The Single-Source Dependency

Relying on one enrichment provider (Clearbit, ZoomInfo, Apollo) means you’re locked into their pricing, coverage gaps, and data freshness issues. When that API goes down or hits rate limits, your entire enrichment pipeline stops.

The Timing Problem

Enriching leads immediately at capture wastes credits on prospects who’ll never engage. But waiting too long means your sales team is working with incomplete data during critical early outreach.

The ENRICH Framework for n8n

After building enrichment pipelines for dozens of clients, I’ve developed the ENRICH framework:

Evaluate lead quality before enrichment
Native data sources first (free/cheap)
Rate limiting and cost controls
Intelligent provider waterfall
Caching and deduplication
Handoff and CRM updates

Architecture: The Waterfall Enrichment Model

Progressive enrichment works like a waterfall—each tier catches what the previous tier missed:

Tier 0: Pre-Enrichment Validation (Free)

Email validation
Domain categorization (corporate vs. personal)
Spam detection
Lead scoring

Tier 1: Free Public Sources (Free)

Company website scraping
LinkedIn public profiles
WHOIS data
DNS/SPF records
Social media APIs (limited data)

Tier 2: Budget Enrichment APIs ($0.01-0.05 per lookup)

Hunter.io for email finding
FullContact for social profiles
IPinfo for company location
Custom scrapers

Tier 3: Premium Enrichment ($0.50-2.00 per lookup)

Clearbit for firmographics
ZoomInfo for org charts
Apollo for technographics
PredictLeads for funding data

Tier 4: Manual Research (High-value only)

SDR manual research
Intent data platforms
Custom investigative research

Building the Foundation in n8n

Module 1: Intake and Initial Validation

Every enrichment workflow starts by determining if enrichment is even warranted:

Webhook Node Configuration (the POST body is expected to carry email, company, and source):

{
  "httpMethod": "POST",
  "path": "lead-enrichment",
  "responseMode": "onReceived"
}

Email Validation Node (Code Node, "Run Once for Each Item" mode):

// n8n Code Node
const lead = $input.item.json;
const email = String(lead.email || '').trim().toLowerCase();
const domain = email.split('@')[1] || '';

// Personal email domains - skip enrichment
const personalDomains = [
  'gmail.com', 'yahoo.com', 'hotmail.com',
  'outlook.com', 'aol.com', 'icloud.com'
];

// Small sample of disposable domains; swap in a maintained list or a validation API
const disposableDomains = ['mailinator.com', 'guerrillamail.com', '10minutemail.com'];

// Validation checks
const isValidFormat = /^[^\s@]+@[^\s@]+\.[^\s@]+$/.test(email);
const isPersonalEmail = personalDomains.includes(domain);
const isDisposable = disposableDomains.includes(domain);

// Keep the original fields (source, company, ...) so the scoring node can use them
return {
  json: {
    ...lead,
    email,
    domain,
    isPersonalEmail,
    isValidFormat,
    isDisposable,
    enrichmentPriority: !isValidFormat || isPersonalEmail || isDisposable ? 'skip' : 'proceed'
  }
};

Lead Scoring Node: Calculate enrichment priority before spending credits:

// n8n Code Node ("Run Once for Each Item")
const lead = $input.item.json;
let score = 0;

// Source quality
const sourceScores = {
  'demo-request': 30,
  'pricing-page': 25,
  'webinar': 20,
  'content-download': 10,
  'newsletter': 5
};
score += sourceScores[lead.source] || 0;

// Company size indicator. Assumption: a companySize hint comes from the form,
// the cache, or your CRM; sizing an unknown domain needs a lookup you supply.
if (lead.companySize === 'enterprise') {
  score += 30;
}

// Behavioral signals
if (lead.pageViews > 5) score += 15;
if (lead.timeOnSite > 300) score += 10; // assumes seconds

// Enrichment decision (leads flagged 'skip' by validation are never enriched)
let enrichmentTier = 'skip';
if (lead.enrichmentPriority === 'skip') enrichmentTier = 'skip';
else if (score >= 70) enrichmentTier = 'tier3'; // Premium
else if (score >= 40) enrichmentTier = 'tier2'; // Budget
else if (score >= 20) enrichmentTier = 'tier1'; // Free

return { json: { ...lead, enrichmentScore: score, enrichmentTier } };

Module 2: Deduplication and Cache Checking

Never pay to enrich the same domain twice:

Check Cache Node (MySQL, matching the schema below; on PostgreSQL write NOW() - INTERVAL '30 days'):

SELECT * FROM enrichment_cache
WHERE domain = '{{$json.domain}}'
  AND updated_at > NOW() - INTERVAL 30 DAY
LIMIT 1;

Cache Logic (IF Node):

If cache found and < 30 days old:
  → Use cached data
Else:
  → Proceed to enrichment waterfall
  → Store results in cache after enrichment
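The IF-node logic above is a read-through cache. Here is a minimal sketch of the same decision in plain JavaScript. It assumes a `Map` stands in for the `enrichment_cache` table and that `enrichFn` is a hypothetical async function wrapping the waterfall. In n8n, the lookup and the write-back would be MySQL nodes rather than a Map.

```javascript
// Read-through cache keyed by domain, with a TTL (default 30 days).
// Assumption: `cache` is a Map standing in for the enrichment_cache table,
// and `enrichFn` is a hypothetical async function that runs the waterfall.
const DAY_MS = 24 * 60 * 60 * 1000;

async function getOrEnrich(domain, cache, enrichFn, { ttlDays = 30, now = Date.now() } = {}) {
  const key = domain.trim().toLowerCase();
  const entry = cache.get(key);

  // Cache hit: reuse data younger than the TTL and spend nothing
  if (entry && now - entry.updatedAt < ttlDays * DAY_MS) {
    return { ...entry.data, fromCache: true };
  }

  // Cache miss or stale entry: enrich, then store the result with a fresh timestamp
  const data = await enrichFn(key);
  cache.set(key, { data, updatedAt: now });
  return { ...data, fromCache: false };
}
```

Normalizing the domain before the lookup matters. Without it, `Acme.com` and `acme.com` miss each other's cache entries and you pay twice.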

Cache Table Schema:

CREATE TABLE enrichment_cache (
  id INT PRIMARY KEY AUTO_INCREMENT,
  domain VARCHAR(255) UNIQUE,
  company_name VARCHAR(255),
  employee_count INT,
  industry VARCHAR(100),
  revenue_range VARCHAR(50),
  technologies JSON,
  enrichment_tier VARCHAR(20),
  cost_usd DECIMAL(10,4),
  created_at TIMESTAMP DEFAULT CURRENT_TIMESTAMP,
  updated_at TIMESTAMP DEFAULT CURRENT_TIMESTAMP ON UPDATE CURRENT_TIMESTAMP
);

Tier 1: Free Public Sources

Extract maximum value from free data sources before spending money:

Company Website Scraping

HTTP Request Node - Fetch Company Website (response format Text, so the HTML arrives in the data field):

{
  "method": "GET",
  "url": "https://{{$json.domain}}",
  "options": {
    "timeout": 10000,
    "redirect": {
      "followRedirects": true
    }
  }
}

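Parse Metadata (Code Node):

// n8n Code Node ("Run Once for Each Item")
// cheerio is not bundled with n8n: self-hosted instances must install it and
// allow it via NODE_FUNCTION_ALLOW_EXTERNAL=cheerio
const cheerio = require('cheerio');
const html = String($input.item.json.data || '');
// Named `doc`, not `$`, so n8n's $() helper stays usable
const doc = cheerio.load(html);

// The HTTP Request node replaced the lead fields with the response, so pull
// them back from the scoring node (assumes it is named "Lead Scoring")
const lead = $('Lead Scoring').item.json;

// Extract company info from metadata
const companyName = doc('meta[property="og:site_name"]').attr('content') ||
                    doc('title').text().split('|')[0].trim() || null;

const description = doc('meta[name="description"]').attr('content') || null;

// Look for employee count hints such as "250 employees" or "1,200+ team members"
const bodyText = doc('body').text();
const employeeMatch = bodyText.match(/(\d[\d,]*)\+?\s*(employees|team members)/i);
const employeeCount = employeeMatch ? parseInt(employeeMatch[1].replace(/,/g, ''), 10) : null;

// Extract social media links (Twitter profiles may now live on x.com)
const linkedinUrl = doc('a[href*="linkedin.com"]').attr('href') || null;
const twitterUrl = doc('a[href*="twitter.com"], a[href*="//x.com"]').attr('href') || null;

// Technology detection from page source
const technologies = [];
if (html.includes('Shopify.analytics')) technologies.push('Shopify');
if (html.includes('google-analytics.com') || html.includes('gtag/js?id=G-')) {
  technologies.push('Google Analytics');
}
if (doc('script[src*="hubspot"], script[src*="hs-scripts.com"]').length) technologies.push('HubSpot');

return {
  json: {
    ...lead,
    companyName,
    description,
    employeeCount,
    socialLinks: { linkedin: linkedinUrl, twitter: twitterUrl },
    technologies,
    dataSource: 'website_scrape'
  }
};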

DNS and Domain Intelligence

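DNS Lookup Node (Code Node):

// n8n Code Node using Node's built-in dns module
// (self-hosted n8n must allow it: NODE_FUNCTION_ALLOW_BUILTIN=dns)
const dns = require('dns').promises;
const lead = $input.item.json;
const domain = lead.domain;

// A domain without MX or TXT records makes the lookup throw; treat that as "no records"
const [mxRecords, txtRecords] = await Promise.all([
  dns.resolveMx(domain).catch(() => []),
  dns.resolveTxt(domain).catch(() => [])
]);

// Lowest priority value = preferred mail server
mxRecords.sort((a, b) => a.priority - b.priority);
const primaryMx = (mxRecords[0]?.exchange || '').toLowerCase();

// Detect email provider (Microsoft 365 hosts end in mail.protection.outlook.com)
const emailProvider = !primaryMx ? 'None' :
                      primaryMx.includes('google') ? 'Google Workspace' :
                      primaryMx.includes('outlook') ? 'Microsoft 365' :
                      'Other';

// resolveTxt returns string[][] (long records are split into chunks), so join each one
const spfRecord = txtRecords
  .map(chunks => chunks.join(''))
  .find(record => record.includes('v=spf1')) || '';

// SPF includes hint at the tools allowed to send mail for the domain
const spf = spfRecord.toLowerCase();
const techStack = [];
if (spf.includes('hubspot')) techStack.push('HubSpot');
if (spf.includes('salesforce')) techStack.push('Salesforce');
if (spf.includes('sendgrid')) techStack.push('SendGrid');

return {
  json: { ...lead, emailProvider, mxRecords, spfRecord, techStack, dataSource: 'dns_lookup' }
};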

LinkedIn Company Scraping (Public Data)

HTTP Request to LinkedIn:

{
  "method": "GET",
  "url": "https://www.linkedin.com/company/{{$json.companyLinkedinSlug}}",
  "options": {
    "headers": {
      "User-Agent": "Mozilla/5.0..."
    }
  }
}

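Parse LinkedIn Data (these class names track LinkedIn's markup and break when it changes; unauthenticated requests are also often redirected to a login wall):

// n8n Code Node: extract visible public data (no authentication)
const cheerio = require('cheerio'); // needs NODE_FUNCTION_ALLOW_EXTERNAL=cheerio
const html = String($input.item.json.data || '');
const doc = cheerio.load(html);

const infoItems = doc('.org-top-card-summary-info-list__info-item');

const companySize = infoItems
  .filter((i, el) => doc(el).text().includes('employees'))
  .text()
  .match(/[\d,]+-?[\d,]*/)?.[0] || null;

const industry = infoItems.first().text().trim() || null;

const followerCount = doc('.org-top-card-secondary-content__follower-count')
  .text()
  .match(/[\d,]+/)?.[0] || null;

return { json: { companySize, industry, followerCount, dataSource: 'linkedin_public' } };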

Tier 2: Budget Enrichment APIs

When free sources don’t provide enough data, escalate to affordable APIs:

Hunter.io for Email Finding

Hunter.io HTTP Node (authenticate with an n8n Query Auth credential whose parameter name is api_key, so the key never sits in the node parameters):

{
  "method": "GET",
  "url": "https://api.hunter.io/v2/domain-search",
  "qs": {
    "domain": "{{$json.domain}}",
    "limit": 10
  }
}

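Cost Tracking (Code Node):

// n8n Code Node ("Run Once for Each Item"): log API usage for cost monitoring
const hunterCost = 0.04; // example price per domain search; check your plan
const recordsFound = $input.item.json.meta?.results ?? 0;

// Attach searchable metadata to this execution (custom execution data takes strings)
$execution.customData.set('enrichmentProvider', 'Hunter.io');
$execution.customData.set('enrichmentCost', String(hunterCost));

// Update the running total in workflow static data, which persists between
// production (active-workflow) executions but not manual test runs
const staticData = $getWorkflowStaticData('global');
staticData.totalEnrichmentCost = (staticData.totalEnrichmentCost || 0) + hunterCost;

return { json: { ...$input.item.json, enrichmentProvider: 'Hunter.io', cost: hunterCost, recordsFound } };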

FullContact Person API (authenticate with an n8n Header Auth credential: name Authorization, value Bearer followed by your token):

{
  "method": "POST",
  "url": "https://api.fullcontact.com/v3/person.enrich",
  "body": {
    "email": "{{$json.email}}"
  }
}

Tier 3: Premium Enrichment

Reserve expensive APIs for high-value leads only:

Clearbit Enrichment

Clearbit Company API Node (authenticate with an n8n Header Auth credential: name Authorization, value Bearer followed by your secret key):

{
  "method": "GET",
  "url": "https://company-stream.clearbit.com/v2/companies/find",
  "qs": {
    "domain": "{{$json.domain}}"
  }
}

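Selective Enrichment Logic:

// n8n Code Node ("Run Once for All Items"): only pass leads that qualify for Clearbit
// 1. Lead score >= 70
// 2. Free + budget sources returned insufficient data
// 3. Monthly Clearbit budget not exceeded
const staticData = $getWorkflowStaticData('global');
const monthKey = `clearbitSpend_${new Date().toISOString().slice(0, 7)}`; // one counter per UTC month
const clearbitSpendThisMonth = staticData[monthKey] ?? 0; // default to 0, or the check always fails
const monthlyBudget = 500;

// Returning an empty array stops this branch: no expensive enrichment
return $input.all().filter(item =>
  item.json.enrichmentScore >= 70 &&
  !item.json.employeeCount && // missing critical data
  clearbitSpendThisMonth < monthlyBudget // under budget
);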

Intelligent Waterfall Logic

The magic is in the conditional flow between tiers:

n8n Switch Node Pattern (rules are checked in order; "complete" means dataCompleteness >= 80):

Tier 1 Complete → Evaluate Completeness
├─ If data complete: Skip to CRM Update
├─ If data partial & score >= 40: Proceed to Tier 2 (70+ leads included, since Tier 2 is cheaper)
└─ Otherwise: CRM Update with the data collected so far

Tier 2 Complete → Evaluate Completeness
├─ If data complete: Skip to CRM Update
├─ If data partial & score >= 70: Proceed to Tier 3
└─ Otherwise: CRM Update with the data collected so far

Tier 3 Complete → Proceed to CRM Update
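In code, the Switch pattern is a loop over tiers ordered by cost. The loop stops as soon as the record is complete enough, or as soon as the lead's score no longer justifies the next tier. This is a minimal sketch under a few assumptions. Each tier is a hypothetical object `{ name, minScore, run }` whose async `run` returns partial fields. The sketch uses the 20/40/70 score thresholds from the scoring node and the 80% cutoff from the Data Completeness Function. In n8n, each tier is a branch of nodes rather than a function.

```javascript
// Progressive waterfall: run tiers cheapest-first and stop once the record is
// complete enough or the lead's score no longer justifies the next tier.
// Assumption: each tier's `run` is a hypothetical async function returning partial fields.
const REQUIRED_FIELDS = ["companyName", "industry", "employeeCount", "revenue", "location", "technologies"];

function isFilled(value) {
  if (value === null || value === undefined || value === "") return false;
  return !(Array.isArray(value) && value.length === 0); // empty lists are not data
}

function completeness(record) {
  const filled = REQUIRED_FIELDS.filter((field) => isFilled(record[field]));
  return (filled.length / REQUIRED_FIELDS.length) * 100;
}

// `tiers` must be sorted cheapest first (and therefore by ascending minScore).
async function runWaterfall(lead, tiers, { targetCompleteness = 80 } = {}) {
  const record = { ...lead };
  const tiersRun = [];

  for (const tier of tiers) {
    if (completeness(record) >= targetCompleteness) break; // complete: go to CRM update
    if (lead.enrichmentScore < tier.minScore) break; // score too low for this and later tiers

    const result = await tier.run(record);
    // Fill gaps only, so a cheaper tier's value is never overwritten
    for (const [field, value] of Object.entries(result)) {
      if (!isFilled(record[field]) && isFilled(value)) record[field] = value;
    }
    tiersRun.push(tier.name);
  }

  return { ...record, tiersRun, dataCompleteness: completeness(record) };
}
```

Merging only fills gaps, so a value from a cheaper tier is never overwritten. If you trust premium sources more, let later tiers win conflicts instead.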

Data Completeness Function:

// n8n Code Node ("Run Once for Each Item"): calculate data completeness percentage
const record = $input.item.json;
const requiredFields = [
  'companyName', 'industry', 'employeeCount',
  'revenue', 'location', 'technologies'
];

const completedFields = requiredFields.filter(field => {
  const value = record[field];
  if (value === null || value === undefined || value === '') return false;
  return !(Array.isArray(value) && value.length === 0); // an empty technologies list is not data
});

const completeness = (completedFields.length / requiredFields.length) * 100;

return {
  json: {
    ...record,
    dataCompleteness: completeness,
    shouldContinueEnrichment: completeness < 80 // Continue if <80% complete
  }
};

Rate Limiting and Cost Controls

Prevent runaway API costs with built-in governors:

Rate Limit Implementation

Rate Limiter (Code Node, "Run Once for All Items"):

// Fixed-window limiter kept in workflow static data. Static data persists only in
// production (active-workflow) executions, and concurrent executions can overwrite
// each other, so use Redis or a database when the limit must be exact.
const staticData = $getWorkflowStaticData('global');
const now = Date.now();
const windowDuration = 60000; // 1 minute
const maxRequests = 100; // e.g., 100 requests per minute

// Start a new window if none exists or the current one has expired
if (!staticData.rateLimitWindowStart || now - staticData.rateLimitWindowStart > windowDuration) {
  staticData.rateLimitWindowStart = now;
  staticData.rateLimitCount = 0;
}

return $input.all().map(item => {
  if (staticData.rateLimitCount < maxRequests) {
    staticData.rateLimitCount += 1; // every allowed request counts, including a window's first
    return { json: { ...item.json, rateLimited: false } };
  }
  // Over the limit: flag for an IF node that routes to a Wait node or a retry queue
  const retryAfterMs = windowDuration - (now - staticData.rateLimitWindowStart);
  return { json: { ...item.json, rateLimited: true, retryAfterMs } };
});

Daily Budget Caps

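Cost Control Node (Code Node, "Run Once for All Items"):

// Track daily enrichment spend (UTC date). Assumes your cost-tracking step adds each
// paid call under the same key; with concurrent executions, keep the counter in Redis.
const staticData = $getWorkflowStaticData('global');
const today = new Date().toISOString().split('T')[0];
const dailyBudgetKey = `enrichment_cost_${today}`;
const dailySpend = parseFloat(staticData[dailyBudgetKey] ?? 0);
const dailyBudgetLimit = 100; // $100 per day

const budgetExceeded = dailySpend >= dailyBudgetLimit;

// Flag items instead of dropping them: an IF node can send budgetExceeded=true
// to a Slack alert and defer those leads to tomorrow or an approval queue
return $input.all().map(item => ({
  json: {
    ...item.json,
    budgetExceeded,
    budgetMessage: budgetExceeded
      ? `Enrichment budget limit reached: $${dailySpend}/$${dailyBudgetLimit}`
      : null
  }
}));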
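A cap only works if every paid call is recorded against the same counter the check reads. Below is a minimal ledger sketch with daily and monthly caps. It assumes `store` is any object that persists between runs: in n8n, the object returned by `$getWorkflowStaticData('global')`, or Redis when executions run concurrently. The $100/day matches the example above, and the $500/month mirrors the Clearbit budget. Both are illustrative.

```javascript
// Spend ledger with daily and monthly caps, keyed by UTC date.
// Assumption: `store` is any plain object that persists between runs.
function dayKey(date) {
  return date.toISOString().slice(0, 10); // "YYYY-MM-DD" in UTC
}

function monthKey(date) {
  return date.toISOString().slice(0, 7); // "YYYY-MM" in UTC
}

// True if a call costing `cost` fits under both caps.
function canSpend(store, cost, { dailyLimit = 100, monthlyLimit = 500, now = new Date() } = {}) {
  const daily = store[`day_${dayKey(now)}`] ?? 0;
  const monthly = store[`month_${monthKey(now)}`] ?? 0;
  return daily + cost <= dailyLimit && monthly + cost <= monthlyLimit;
}

// Record a completed paid call against both counters.
function recordSpend(store, cost, now = new Date()) {
  const d = `day_${dayKey(now)}`;
  const m = `month_${monthKey(now)}`;
  store[d] = (store[d] ?? 0) + cost;
  store[m] = (store[m] ?? 0) + cost;
  return { daily: store[d], monthly: store[m] };
}
```

Checking `spend + cost` before the call, rather than only the running total, keeps the last request of the day from overshooting the cap.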

CRM Integration and Data Handoff

HubSpot Update Node

Update Contact with Enriched Data (the data_enrichment_* and technologies properties are custom and must be created in HubSpot before this runs):

{
  "module": "HubSpot - Update Contact",
  "email": "{{$json.email}}",
  "properties": {
    "company": "{{$json.companyName}}",
    "industry": "{{$json.industry}}",
    "numberofemployees": "{{$json.employeeCount}}",
    "annualrevenue": "{{$json.revenue}}",
    "website": "{{$json.domain}}",
    "data_enrichment_date": "{{$now}}",
    "data_enrichment_tier": "{{$json.enrichmentTier}}",
    "data_enrichment_cost": "{{$json.totalEnrichmentCost}}",
    "data_completeness_score": "{{$json.dataCompleteness}}",
    "technologies": "{{($json.technologies || []).join(', ')}}"
  }
}

Salesforce Update with Field Mapping

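// n8n Code Node ("Run Once for Each Item"): map enriched data to Salesforce Lead fields
// (fields ending in __c are custom fields that must exist in your org)
const lead = $input.item.json;

// Minimal revenue parser, an assumption about provider formats:
// "$10M-$50M" → 10000000 (lower bound), "1.2B" → 1200000000, unparseable → null
function parseRevenue(value) {
  if (typeof value === 'number') return value;
  const match = String(value ?? '').replace(/[$,\s]/g, '').match(/^(\d+(?:\.\d+)?)([KMB])?/i);
  if (!match) return null;
  const multipliers = { K: 1e3, M: 1e6, B: 1e9 };
  return Math.round(parseFloat(match[1]) * (multipliers[(match[2] || '').toUpperCase()] || 1));
}

const salesforceMapping = {
  'Company': lead.companyName,
  'NumberOfEmployees': lead.employeeCount != null ? parseInt(lead.employeeCount, 10) : null,
  'Industry': lead.industry,
  'AnnualRevenue': parseRevenue(lead.revenue),
  'Website': lead.domain,
  'Enrichment_Date__c': new Date().toISOString(),
  'Enrichment_Tier__c': lead.enrichmentTier,
  'Enrichment_Cost__c': lead.totalEnrichmentCost,
  'Data_Quality_Score__c': lead.dataCompleteness
};

return { json: { ...lead, salesforceMapping } };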

Monitoring and Optimization

Enrichment Analytics Dashboard

Track these key metrics in a reporting table of your own (not n8n's internal database):

CREATE TABLE enrichment_metrics (
  id INT AUTO_INCREMENT PRIMARY KEY,
  date DATE,
  total_leads_processed INT,
  tier0_validated INT,
  tier1_enriched INT,
  tier2_enriched INT,
  tier3_enriched INT,
  skipped_low_quality INT,
  total_cost_usd DECIMAL(10,2),
  avg_completeness_score DECIMAL(5,2),
  avg_cost_per_lead DECIMAL(10,4),
  created_at TIMESTAMP DEFAULT CURRENT_TIMESTAMP
);

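Daily Metrics Aggregation Workflow (Schedule Trigger → MySQL node → Code node → Slack node):

-- MySQL node: aggregate yesterday's rows (assumes an enrichment_logs table
-- with tier, cost, completeness, and created_at columns)
SELECT
  COUNT(*) AS total_leads,
  SUM(CASE WHEN tier = 'tier1' THEN 1 ELSE 0 END) AS tier1_count,
  SUM(CASE WHEN tier = 'tier2' THEN 1 ELSE 0 END) AS tier2_count,
  SUM(CASE WHEN tier = 'tier3' THEN 1 ELSE 0 END) AS tier3_count,
  COALESCE(SUM(cost), 0) AS total_cost,
  COALESCE(AVG(completeness), 0) AS avg_completeness
FROM enrichment_logs
WHERE created_at >= CURDATE() - INTERVAL 1 DAY
  AND created_at < CURDATE();

// Code node: build the report text; the Slack node then posts {{$json.text}} to #revops-metrics
const m = $input.first().json;
// Aggregates may arrive as strings (DECIMAL), so convert explicitly
const totalLeads = Number(m.total_leads) || 0;
const totalCost = Number(m.total_cost) || 0;
const avgCompleteness = Number(m.avg_completeness) || 0;
const avgCostPerLead = totalLeads > 0 ? totalCost / totalLeads : 0; // avoid dividing by zero

const yesterday = new Date();
yesterday.setDate(yesterday.getDate() - 1);

const text = [
  `📊 Enrichment Report - ${yesterday.toDateString()}`,
  '',
  `Total Processed: ${totalLeads}`,
  `Free Sources: ${Number(m.tier1_count) || 0}`,
  `Budget APIs: ${Number(m.tier2_count) || 0}`,
  `Premium APIs: ${Number(m.tier3_count) || 0}`,
  '',
  `Total Cost: $${totalCost.toFixed(2)}`,
  `Avg Cost/Lead: $${avgCostPerLead.toFixed(2)}`,
  `Avg Data Quality: ${avgCompleteness.toFixed(1)}%`
].join('\n');

return [{ json: { text } }];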

Advanced Patterns

Batch Processing for Cost Efficiency

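Some providers offer bulk endpoints or volume pricing, so it can pay to send leads in groups:

// n8n Code Node ("Run Once for All Items"): accumulate leads in workflow
// static data (persists only for active workflows) until the batch is full
const batchSize = 25;
const staticData = $getWorkflowStaticData('global');
const currentBatch = staticData.enrichmentBatch || [];
currentBatch.push(...$input.all().map(item => item.json));

if (currentBatch.length >= batchSize) {
  // Process batch: send everything buffered to the next node and reset
  staticData.enrichmentBatch = [];
  return currentBatch.map(json => ({ json }));
}

// Wait for more leads
staticData.enrichmentBatch = currentBatch;
return []; // Don't proceed yet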
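A size-only buffer has one weakness: when leads trickle in, a partial batch can wait indefinitely. The sketch below also flushes when the oldest buffered lead has waited too long. It assumes `state` persists between executions (for example, workflow static data) and that a scheduled trigger calls it with no new leads to drain stragglers. The 25-lead batch matches the example above; the 15-minute maximum wait is an assumption.

```javascript
// Size-or-age batch buffer: flush when the batch is full OR the oldest lead has waited too long.
// Assumption: `state` persists between executions (e.g. workflow static data).
function addToBatch(state, newLeads, { batchSize = 25, maxWaitMs = 15 * 60 * 1000, now = Date.now() } = {}) {
  state.buffer = state.buffer ?? [];
  for (const lead of newLeads) {
    state.buffer.push({ lead, addedAt: now });
  }

  const oldest = state.buffer[0];
  const full = state.buffer.length >= batchSize;
  const stale = oldest !== undefined && now - oldest.addedAt >= maxWaitMs;
  if (!full && !stale) return []; // keep waiting

  // Flush at most one batch; anything beyond batchSize stays for the next run
  const batch = state.buffer.splice(0, batchSize);
  return batch.map((entry) => entry.lead);
}
```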

Conditional Re-Enrichment

Automatically refresh stale data:

// n8n Code Node ("Run Once for All Items"): decide which leads to re-enrich
// Re-enrich if:
// - Data is >90 days old AND lead recently engaged
// - Data is >180 days old (any lead)
// - Lead score increased significantly (> 30 points)
const DAY_MS = 1000 * 60 * 60 * 24;

return $input.all().filter(item => {
  const lead = item.json;
  const lastEnrichment = new Date(lead.enrichmentDate).getTime();
  // Never enriched or unparseable date: treat the data as infinitely stale
  const daysSinceEnrichment = Number.isNaN(lastEnrichment)
    ? Infinity
    : (Date.now() - lastEnrichment) / DAY_MS;

  return (daysSinceEnrichment > 90 && lead.recentEngagement) ||
    daysSinceEnrichment > 180 ||
    (lead.currentScore - lead.scoreAtEnrichment > 30);
});

FAQ

Q: How do I decide which leads deserve expensive enrichment? A: Use a scoring model that considers source quality, behavioral signals, and company size indicators. Only send leads scoring 70+ to premium APIs. For B2B SaaS, demo requests from corporate domains warrant premium enrichment, while ebook downloads from personal emails don’t.

Q: What’s a reasonable enrichment budget for a startup? A: Start with $200-500/month and adjust based on lead volume and conversion rates. Calculate your cost per SQL (Sales Qualified Lead) including enrichment costs. If enrichment adds $2 per lead but increases qualification accuracy by 40%, that’s usually a good trade.
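The cost-per-SQL comparison is simple arithmetic. The easy mistake is leaving enrichment spend out of the numerator. This sketch uses purely illustrative numbers and treats the 40% gain as 40% more SQLs from the same 1,000 leads, which is an assumption about what the accuracy gain means.

```javascript
// Cost per SQL = (acquisition spend + enrichment spend) / SQLs produced
function costPerSql({ acquisitionSpend, enrichmentSpend = 0, sqls }) {
  if (sqls <= 0) return Infinity; // no SQLs yet: cost per SQL is unbounded
  return (acquisitionSpend + enrichmentSpend) / sqls;
}

// Illustrative only: $20,000 to acquire 1,000 leads, $2/lead enrichment,
// SQLs rising from 50 to 70 (+40%) with enrichment
const baseline = costPerSql({ acquisitionSpend: 20000, sqls: 50 }); // 400
const enriched = costPerSql({ acquisitionSpend: 20000, enrichmentSpend: 1000 * 2, sqls: 70 }); // ≈ 314.29
console.log({ baseline, enriched });
```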

Q: How do I handle API rate limits across multiple providers? A: Implement a token bucket algorithm in n8n using workflow static data (or Redis if executions run concurrently). Track request counts per provider separately. When approaching limits, either queue requests for the next window or switch to an alternative provider for that data point.
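A token bucket refills at a steady rate up to a burst capacity. This avoids the fixed one-minute window's edge case, where close to twice the limit can pass in a short span straddling two windows. Below is a minimal per-provider sketch. It assumes bucket state lives in a persistent object (workflow static data or Redis). The capacities and rates are examples, not the providers' published limits.

```javascript
// Per-provider token buckets: each refills `ratePerSec` tokens per second up to `capacity`.
// Assumption: `state` persists between executions; limits below are examples only.
const LIMITS = {
  hunter: { capacity: 10, ratePerSec: 10 / 60 },
  clearbit: { capacity: 5, ratePerSec: 5 / 60 },
};

function tryAcquire(state, provider, now = Date.now()) {
  const limit = LIMITS[provider];
  if (!limit) throw new Error(`No rate limit configured for ${provider}`);
  const bucket = state[provider] ?? { tokens: limit.capacity, updatedAt: now };

  // Refill based on elapsed time, capped at capacity
  const elapsedSec = Math.max(0, now - bucket.updatedAt) / 1000;
  bucket.tokens = Math.min(limit.capacity, bucket.tokens + elapsedSec * limit.ratePerSec);
  bucket.updatedAt = now;
  state[provider] = bucket;

  if (bucket.tokens >= 1) {
    bucket.tokens -= 1;
    return { allowed: true, retryAfterMs: 0 };
  }
  // Time until one full token is available again
  const retryAfterMs = Math.ceil(((1 - bucket.tokens) / limit.ratePerSec) * 1000);
  return { allowed: false, retryAfterMs };
}
```

When `allowed` is false, `retryAfterMs` can feed a Wait node, or you can route the request to another provider in the same tier.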

Q: Should I enrich leads immediately or wait until they show engagement? A: Use a hybrid approach: do lightweight Tier 1 enrichment (free sources) immediately for scoring and routing. Defer expensive Tier 2-3 enrichment until leads hit engagement thresholds (email opens, multiple page visits, demo requests).
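The hybrid rule amounts to a cap on the tier a lead may reach right now: free sources at capture, and paid tiers only after an engagement threshold. This sketch assumes hypothetical `emailOpens` and `pageViews` fields, with thresholds picked for illustration.

```javascript
// Cap the tier a lead may use right now: free sources at capture,
// paid tiers (Tier 2-3) only once the lead shows engagement.
// Assumption: field names and thresholds are illustrative.
const TIER_ORDER = ["skip", "tier1", "tier2", "tier3"];

function isEngaged(lead) {
  return (
    lead.source === "demo-request" ||
    (lead.emailOpens ?? 0) >= 2 ||
    (lead.pageViews ?? 0) >= 3
  );
}

function tierAllowedNow(lead) {
  const earned = lead.enrichmentTier ?? "skip"; // tier assigned by the scoring node
  if (isEngaged(lead)) return earned;
  // Not engaged yet: stop at free sources and re-evaluate on the next engagement event
  return TIER_ORDER.indexOf(earned) > TIER_ORDER.indexOf("tier1") ? "tier1" : earned;
}
```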

Q: How do I prevent duplicate enrichment costs? A: Implement domain-level caching with 30-90 day TTL. Before enriching any lead, check if you’ve enriched that domain recently. For companies with multiple contacts, enrich once at the domain level and apply firmographic data to all contacts.
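Enriching once per domain and fanning the company fields out to every contact can be sketched as follows. The sketch assumes contacts arrive as objects with an `email` field. It also assumes `enrichDomain` is a hypothetical async function, for example one that checks the domain cache from Module 2 before calling a provider.

```javascript
// Group contacts by email domain, enrich each domain once, and copy the
// company fields onto every contact from that domain.
// Assumption: `enrichDomain` is a hypothetical async function returning company fields.
async function enrichByDomain(contacts, enrichDomain) {
  const byDomain = new Map();
  const passthrough = []; // malformed addresses are returned unenriched

  for (const contact of contacts) {
    const domain = String(contact.email ?? "").split("@")[1]?.toLowerCase();
    if (!domain) {
      passthrough.push(contact);
      continue;
    }
    if (!byDomain.has(domain)) byDomain.set(domain, []);
    byDomain.get(domain).push(contact);
  }

  const results = [...passthrough];
  for (const [domain, group] of byDomain) {
    const company = await enrichDomain(domain); // one lookup per domain, not per contact
    for (const contact of group) {
      // Company fields win over same-named contact fields
      results.push({ ...contact, ...company, domain });
    }
  }
  return results;
}
```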

Q: What’s the best way to measure enrichment ROI? A: Track conversion rates by enrichment tier. Calculate: (Additional revenue from enriched leads - Enrichment costs) / Enrichment costs. Also measure time saved for sales reps—if enrichment reduces research time by 15 minutes per lead, that’s quantifiable labor savings.
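The ROI formula translates directly into code. This sketch assumes you can attribute additional revenue to enriched leads. Folding rep time saved into the numerator is one option, and the $60/hour rep cost in the example is an assumption.

```javascript
// Enrichment ROI = (additional revenue from enriched leads - enrichment costs) / enrichment costs,
// optionally crediting rep research time saved as labor savings.
function enrichmentRoi({ additionalRevenue, enrichmentCost, leads = 0, minutesSavedPerLead = 0, hourlyRepCost = 0 }) {
  if (enrichmentCost <= 0) throw new Error("enrichmentCost must be positive");
  const laborSavings = leads * (minutesSavedPerLead / 60) * hourlyRepCost;
  return (additionalRevenue + laborSavings - enrichmentCost) / enrichmentCost;
}

// Illustrative: $30,000 attributed revenue on $5,000 of enrichment
console.log(enrichmentRoi({ additionalRevenue: 30000, enrichmentCost: 5000 })); // 5 (i.e. 500%)
// Adding 15 minutes saved on each of 1,000 leads at an assumed $60/hour
console.log(enrichmentRoi({ additionalRevenue: 30000, enrichmentCost: 5000, leads: 1000, minutesSavedPerLead: 15, hourlyRepCost: 60 })); // 8
```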

Q: How do I handle enrichment for international leads? A: Many US-centric enrichment APIs have poor coverage for EMEA/APAC. Build region-specific waterfalls: use Crunchbase for EU startups, use Owler for APAC companies. Always start with free website scraping—it works globally.
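A region-specific waterfall can start with a router that picks a provider chain per region. This sketch takes the region from a form field when present. Otherwise it falls back to the domain's country-code TLD, which is a crude signal because many non-US companies use .com. The provider lists echo the answer above and are illustrative, not coverage recommendations.

```javascript
// Route a lead to a region-specific provider chain; website scraping always comes first.
// Assumption: region comes from a form field, else from the ccTLD; lists are illustrative.
const REGION_BY_TLD = { de: "EU", fr: "EU", nl: "EU", es: "EU", it: "EU", jp: "APAC", sg: "APAC", au: "APAC", in: "APAC" };
const CHAINS = {
  EU: ["website_scrape", "crunchbase"],
  APAC: ["website_scrape", "owler"],
  DEFAULT: ["website_scrape", "hunter", "clearbit"],
};

function regionFor(lead) {
  if (lead.region) return lead.region; // an explicit region from the form wins
  const tld = String(lead.domain ?? "").toLowerCase().split(".").pop();
  return REGION_BY_TLD[tld] ?? "DEFAULT";
}

function providerChain(lead) {
  return CHAINS[regionFor(lead)] ?? CHAINS.DEFAULT;
}
```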

Progressive enrichment in n8n transforms enrichment from a cost center into a strategic advantage. Start with the free tiers, prove the value, then selectively invest in premium data where it actually drives revenue. Your CFO will love the cost efficiency, and your sales team will love the data quality.