CRM and Data Warehousing: Connecting CRM to BigQuery or Snowflake

CRM data warehousing becomes useful when the business needs a stronger analytics layer than the CRM alone can provide. Moving CRM data into BigQuery or Snowflake gives teams more control over history, joins, and cross-system reporting.

CRM platforms are optimised for transactional queries – give me the contacts in this pipeline stage, show me deals closing this month. What they are not designed for is the kind of analytical workload that RevOps, data teams, and finance run: cohort analysis on deal velocity, multi-year customer lifetime value calculations, attribution modelling across thousands of deals, and joins between CRM data and product usage or financial data. For these analytical use cases, connecting CRM data to a cloud data warehouse – Google BigQuery or Snowflake – is the standard modern data architecture. This guide covers how to connect Salesforce and HubSpot to BigQuery and Snowflake, what the data looks like once it’s there, and when this architecture is worth the complexity.

That setup is more demanding than a standard sync, but it also gives the team a single place to combine CRM data with product, finance, or marketing data when the reporting needs grow.

CRM to Data Warehouse: Architecture Overview

Component	Role	Examples
CRM (source)	Generates customer interaction and pipeline data	Salesforce, HubSpot
ETL/ELT pipeline	Extracts data from the CRM API and loads it into the warehouse, transforming it before loading (ETL) or after loading (ELT)	Fivetran, Airbyte, Stitch, HubSpot native sync
Data warehouse	Stores raw and transformed CRM data for analytical queries	Google BigQuery, Snowflake, Databricks, AWS Redshift
Data transformation layer	Applies business logic to raw warehouse data (e.g., calculating win rate, cohort definitions)	dbt (data build tool), custom SQL
BI / visualization layer	Presents transformed data in dashboards and reports	Looker, Tableau, Power BI, Metabase
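Here is a minimal ELT sketch in Python that makes the split between components concrete. It is an illustration under stated assumptions, not a production pipeline. The standard-library `sqlite3` module stands in for BigQuery or Snowflake, and a hard-coded list stands in for the CRM API. The table names `raw_opportunity` and `win_rate_by_source` are hypothetical. The load step copies records unchanged, and the business logic (win rate) lives in a separate SQL transformation, which is the role dbt plays in the table above.

```python
import sqlite3

# Hypothetical records standing in for data extracted from a CRM API.
EXTRACTED = [
    {"id": "006A", "lead_source": "Web", "is_closed": 1, "is_won": 1, "amount": 12000.0},
    {"id": "006B", "lead_source": "Web", "is_closed": 1, "is_won": 0, "amount": 8000.0},
    {"id": "006C", "lead_source": "Partner", "is_closed": 1, "is_won": 1, "amount": 20000.0},
    {"id": "006D", "lead_source": "Partner", "is_closed": 0, "is_won": 0, "amount": 5000.0},
]


def load_raw(conn: sqlite3.Connection, records: list) -> None:
    """Load step: copy records into a raw table with no business logic applied."""
    conn.execute(
        "CREATE TABLE IF NOT EXISTS raw_opportunity ("
        "id TEXT PRIMARY KEY, lead_source TEXT, is_closed INTEGER, "
        "is_won INTEGER, amount REAL)"
    )
    # Upsert by primary key so re-running the load does not duplicate rows.
    conn.executemany(
        "INSERT OR REPLACE INTO raw_opportunity "
        "VALUES (:id, :lead_source, :is_closed, :is_won, :amount)",
        records,
    )


def transform(conn: sqlite3.Connection) -> None:
    """Transform step: rebuild a reporting table from raw data, as a dbt model would."""
    conn.execute("DROP TABLE IF EXISTS win_rate_by_source")
    conn.execute(
        "CREATE TABLE win_rate_by_source AS "
        "SELECT lead_source, SUM(is_won) * 1.0 / COUNT(*) AS win_rate "
        "FROM raw_opportunity WHERE is_closed = 1 GROUP BY lead_source"
    )


conn = sqlite3.connect(":memory:")
load_raw(conn, EXTRACTED)
transform(conn)
print(conn.execute("SELECT * FROM win_rate_by_source ORDER BY lead_source").fetchall())
# [('Partner', 1.0), ('Web', 0.5)]
```

Keeping the raw table untouched means the win-rate definition can change later without re-extracting anything from the CRM. You re-run only the transformation.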

When This Architecture Is Worth It

Connecting CRM to a data warehouse adds complexity and cost – it requires a data engineer or at minimum a RevOps analyst with SQL skills, an ETL tool, and ongoing maintenance of the data pipeline. This investment is justified when:

CRM’s native reporting is insufficient for your analytical needs (cohort analysis, attribution modelling, multi-system joins)
You need to combine CRM data with non-CRM data sources (product usage, billing/invoicing, marketing spend, support tickets) in the same analysis
You’re running analyses on large historical datasets (10,000+ deals, multi-year history) where CRM reporting tools become slow or hit limits
You have multiple CRMs or CRM migration history that needs unified analysis
You’re building a centralised business intelligence platform that serves multiple departments

For most SMB and early-stage mid-market teams, CRM native reporting (Salesforce Reports + Dashboards, HubSpot Analytics) is sufficient. The data warehouse architecture typically becomes necessary at scale – $10M+ ARR organisations or those with dedicated data/RevOps teams.

Connecting Salesforce to BigQuery

Option 1: Fivetran: Fivetran’s Salesforce connector is one of the most widely deployed Salesforce-to-warehouse integrations. It extracts Salesforce objects (Opportunity, Contact, Account, Task, and any custom objects you specify), handles API rate limiting and incremental updates, and loads the data into BigQuery or Snowflake with a schema that mirrors Salesforce’s object structure. Setup time: 1-2 hours for configuration; ongoing maintenance is minimal. Cost: Fivetran pricing is usage-based (monthly active rows), so cost scales with data volume; budget roughly $500/month or more for a production deployment.

Option 2: BigQuery Data Transfer Service for Salesforce: Google’s native Salesforce-to-BigQuery connector, part of BigQuery Data Transfer Service, runs scheduled syncs of Salesforce objects into BigQuery without third-party tooling. It is less flexible than Fivetran for custom object handling. It also avoids a separate ETL vendor subscription, because costs fall under Google Cloud’s own pricing for the transfer service and BigQuery usage. It suits organisations already on Google Cloud infrastructure.

Option 3: Salesforce Data Export + custom pipeline: For simple use cases, schedule a weekly export with Salesforce’s Data Export tool (Setup → Data → Data Export). Then load the resulting CSV files into BigQuery manually or with a simple script. This works for one-time or infrequent analysis but is not suitable for real-time or daily analytical workloads.
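For the export route, the "simple script" can be quite small. The sketch below reads one exported CSV file and converts Salesforce API field names to snake_case, matching the column naming used elsewhere in this guide. It then inserts the rows into a table. It assumes the header row uses API field names, and it uses SQLite as a stand-in for BigQuery. For a real load you would swap in the BigQuery client library or a `bq load` job. `to_snake_case` and `load_export_csv` are hypothetical helpers. Every column is loaded as text, which leaves type casting to the transformation layer.

```python
import csv
import io
import re
import sqlite3


def to_snake_case(name: str) -> str:
    """Convert a Salesforce API field name such as 'CloseDate' to 'close_date'.

    Custom fields like 'Region__c' are simply lowercased ('region__c').
    """
    return re.sub(r"(?<=[a-z0-9])(?=[A-Z])", "_", name).lower()


def load_export_csv(conn: sqlite3.Connection, table: str, csv_text: str) -> int:
    """Load one exported CSV into `table`, all columns as TEXT; returns rows loaded."""
    reader = csv.DictReader(io.StringIO(csv_text))
    source_cols = reader.fieldnames or []
    columns = [to_snake_case(c) for c in source_cols]
    # Table and column names are interpolated: only use with trusted export files.
    conn.execute(f"CREATE TABLE IF NOT EXISTS {table} "
                 f"({', '.join(c + ' TEXT' for c in columns)})")
    placeholders = ", ".join("?" for _ in columns)
    rows = [tuple(row[c] for c in source_cols) for row in reader]
    conn.executemany(f"INSERT INTO {table} VALUES ({placeholders})", rows)
    conn.commit()
    return len(rows)


# Hypothetical extract from an exported Opportunity file.
SAMPLE = (
    "Id,AccountId,Amount,IsWon,CloseDate\n"
    "006A,001X,12000,true,2025-02-14\n"
    "006B,001Y,8000,false,2025-03-02\n"
)

conn = sqlite3.connect(":memory:")
print(load_export_csv(conn, "opportunity", SAMPLE))  # 2
print([col[1] for col in conn.execute("PRAGMA table_info(opportunity)")])
# ['id', 'account_id', 'amount', 'is_won', 'close_date']
```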

Connecting HubSpot to BigQuery or Snowflake

HubSpot native BigQuery integration: HubSpot Operations Hub Enterprise (released 2022) includes a native HubSpot-to-BigQuery data sync. It exports HubSpot Contact, Company, Deal, and Activity data to BigQuery on a configurable schedule. It is the lowest-friction option for HubSpot-to-BigQuery integration because no separate ETL tool is required. Limitation: it requires an Operations Hub Enterprise subscription. That is a significant monthly cost, which may not be justified if BigQuery sync is the only feature you need.

Fivetran HubSpot connector: Fivetran supports HubSpot as a source, connecting to HubSpot’s REST API to extract contact, company, deal, and activity data into BigQuery or Snowflake. More flexible than the native integration for custom property mapping. Same pricing structure as the Salesforce connector.

Airbyte (open source): Airbyte is an open-source ELT platform with HubSpot and Salesforce connectors. You self-host it on your own infrastructure, such as a virtual machine (Compute Engine or EC2) or a Kubernetes cluster. It provides Fivetran-like functionality without a license fee, though you still pay for the infrastructure it runs on. It requires engineering setup and ongoing maintenance, so it is best for organisations with engineering resources who want to minimise vendor costs.
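Connector-based options such as Fivetran, and Airbyte in its incremental sync modes, avoid re-extracting every record on each run. The usual technique is a high-water mark: remember the latest modification timestamp already loaded, then request only records changed after it. The sketch below illustrates that pattern in plain Python. It is not any vendor's actual implementation. The in-memory `SOURCE` list stands in for the Salesforce API, and `fetch_changed_since` and `incremental_sync` are hypothetical helpers. `SystemModstamp`, a standard Salesforce audit field, is assumed as the cursor column.

```python
from datetime import datetime, timezone
from typing import Dict, List, Optional

# Hypothetical stand-in for Opportunity records returned by the Salesforce API.
SOURCE: List[dict] = [
    {"Id": "006A", "Amount": 12000, "SystemModstamp": datetime(2025, 3, 1, tzinfo=timezone.utc)},
    {"Id": "006B", "Amount": 8000, "SystemModstamp": datetime(2025, 3, 5, tzinfo=timezone.utc)},
    {"Id": "006C", "Amount": 20000, "SystemModstamp": datetime(2025, 3, 9, tzinfo=timezone.utc)},
]


def fetch_changed_since(cursor: Optional[datetime]) -> List[dict]:
    """Hypothetical extract call: records modified after the cursor (all on first run)."""
    if cursor is None:
        return list(SOURCE)
    return [r for r in SOURCE if r["SystemModstamp"] > cursor]


def incremental_sync(warehouse: Dict[str, dict], cursor: Optional[datetime]) -> Optional[datetime]:
    """Upsert changed records by Id and return the new high-water mark."""
    changed = fetch_changed_since(cursor)
    for record in changed:
        warehouse[record["Id"]] = record  # latest version of each record wins
    if changed:
        cursor = max(r["SystemModstamp"] for r in changed)
    return cursor


warehouse: Dict[str, dict] = {}
cursor = incremental_sync(warehouse, None)  # initial load: all 3 records
# A rep edits one deal in the CRM, which bumps its SystemModstamp.
SOURCE[0] = {**SOURCE[0], "Amount": 15000,
             "SystemModstamp": datetime(2025, 3, 10, tzinfo=timezone.utc)}
cursor = incremental_sync(warehouse, cursor)  # fetches only the edited record
print(warehouse["006A"]["Amount"], cursor.date())  # 15000 2025-03-10
```

A strict `>` comparison can skip records that share the cursor's exact timestamp but arrive in a later batch. Production pipelines typically re-read a small overlap window and rely on the upsert by `Id` to discard duplicates.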

What CRM Data Looks Like in the Warehouse

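When Salesforce data lands in BigQuery or Snowflake via Fivetran, it appears as a schema with one table per Salesforce object: `salesforce.opportunity`, `salesforce.contact`, `salesforce.account`, `salesforce.task`. Each Salesforce field becomes a column in the corresponding table, with names converted to snake_case (for example, `CloseDate` becomes `close_date`). Relationships between objects are preserved as ID columns that can be JOINed in SQL. An Opportunity points to its Account through `account_id`, and Contacts are linked to Opportunities through the `opportunity_contact_role` junction table (OpportunityContactRole). The warehouse does not enforce these as foreign-key constraints, so the joins are only as clean as the source data.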
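Because relationships arrive as plain ID columns, a question like "who was the primary contact on each won deal, and at which account?" becomes a pair of joins. The sketch below builds tiny versions of those tables in SQLite, standing in for BigQuery or Snowflake, with Fivetran-style snake_case column names. The sample rows are invented. The SELECT should carry over to BigQuery with the `salesforce.` schema prefix and minor dialect changes, because SQLite stores booleans as 0/1.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE account (id TEXT PRIMARY KEY, name TEXT);
CREATE TABLE contact (id TEXT PRIMARY KEY, account_id TEXT, email TEXT);
CREATE TABLE opportunity (id TEXT PRIMARY KEY, account_id TEXT, amount REAL, is_won INTEGER);
-- Junction object linking contacts to opportunities.
CREATE TABLE opportunity_contact_role (
    id TEXT PRIMARY KEY, opportunity_id TEXT, contact_id TEXT, role TEXT, is_primary INTEGER);

INSERT INTO account VALUES ('001X', 'Acme Ltd');
INSERT INTO contact VALUES ('003A', '001X', 'ana@acme.example'), ('003B', '001X', 'ben@acme.example');
INSERT INTO opportunity VALUES ('006A', '001X', 12000, 1), ('006B', '001X', 8000, 0);
INSERT INTO opportunity_contact_role VALUES
    ('00KA', '006A', '003A', 'Decision Maker', 1),
    ('00KB', '006A', '003B', 'Evaluator', 0),
    ('00KC', '006B', '003B', 'Decision Maker', 1);
""")

# Account and primary contact for every won opportunity.
QUERY = """
SELECT o.id AS opportunity_id, a.name AS account_name, c.email AS primary_contact, o.amount
FROM opportunity AS o
JOIN account AS a ON a.id = o.account_id
JOIN opportunity_contact_role AS ocr ON ocr.opportunity_id = o.id AND ocr.is_primary = 1
JOIN contact AS c ON c.id = ocr.contact_id
WHERE o.is_won = 1
"""
print(conn.execute(QUERY).fetchall())
# [('006A', 'Acme Ltd', 'ana@acme.example', 12000.0)]
```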

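A simple BigQuery SQL query that breaks down win rate and won revenue by lead source:

```sql
-- Win rate and won revenue by lead source for deals closed since 1 Jan 2025.
-- Column names follow Fivetran's snake_case mapping of Salesforce fields.
SELECT
  o.lead_source,
  COUNT(*) AS deals_closed,                            -- won + lost
  COUNTIF(o.is_won) AS deals_won,
  SUM(IF(o.is_won, o.amount, 0)) AS won_revenue,
  AVG(IF(o.is_won, o.amount, NULL)) AS avg_won_deal_size,
  SAFE_DIVIDE(COUNTIF(o.is_won), COUNT(*)) AS win_rate
FROM salesforce.opportunity AS o
WHERE o.is_closed
  AND NOT o.is_deleted                                 -- skip soft-deleted records
  AND o.close_date >= DATE '2025-01-01'
GROUP BY o.lead_source
ORDER BY won_revenue DESC;
```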

“Our Fivetran sync runs but data in BigQuery is always 24 hours behind”

Sync frequency for Fivetran’s Salesforce connector is set in the connector settings, and the shortest interval available depends on your plan: lower-cost plans allow less frequent syncs. If the connector is set to a 24-hour frequency, the data will lag by up to a day. If you need near-real-time data, shorten the interval, and upgrade to a plan that supports hourly or sub-hourly syncs if necessary. For most analytical use cases (daily business reviews, weekly reporting), a 24-hour lag is acceptable. For operational analytics that require near-real-time CRM data, consider Salesforce’s Change Data Capture (CDC) feature combined with a streaming pipeline rather than a batch ETL approach.
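Before changing plans, it helps to confirm whether the lag comes from the sync schedule or from a stalled connector. Fivetran adds a `_fivetran_synced` timestamp column to the tables it loads. The sketch assumes you have already run a query such as `SELECT MAX(_fivetran_synced) FROM salesforce.opportunity` and pass the result to a hypothetical `check_freshness` helper. The 90-minute threshold is an arbitrary example; set it slightly above the connector's configured frequency. Note that this value only advances when a sync actually writes rows, so a table with no recent changes can look stale.

```python
from datetime import datetime, timedelta, timezone
from typing import Optional, Tuple


def check_freshness(last_synced: datetime, max_lag: timedelta,
                    now: Optional[datetime] = None) -> Tuple[timedelta, bool]:
    """Return (lag, is_stale) for a table's most recent _fivetran_synced value.

    last_synced: e.g. the result of SELECT MAX(_fivetran_synced) FROM salesforce.opportunity
    max_lag: tolerated delay, set just above the connector's sync frequency
    """
    now = now or datetime.now(timezone.utc)
    lag = now - last_synced
    return lag, lag > max_lag


# Example: connector configured for hourly syncs, so flag anything older than 90 minutes.
last = datetime(2025, 6, 2, 8, 0, tzinfo=timezone.utc)
lag, stale = check_freshness(last, timedelta(minutes=90),
                             now=datetime(2025, 6, 3, 8, 30, tzinfo=timezone.utc))
print(lag, stale)  # 1 day, 0:30:00 True
```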

“We connected HubSpot to BigQuery but deal amounts in BigQuery don’t match HubSpot reports”

Deal amount discrepancies between HubSpot and BigQuery usually have one of three causes. First, the BigQuery data may include deals that HubSpot’s native reports exclude based on pipeline or status filters. Second, HubSpot reports may apply currency conversion that the BigQuery query does not. Third, the HubSpot sync may not have captured the most recent updates (check the last sync timestamp). Fix: verify the filters applied in HubSpot’s native report and replicate them exactly in the BigQuery query, using the same date range, pipeline, and deal status. Decide how deals with a null amount should be treated and apply the same rule on both sides; nulls do not change a SQL SUM, but they do change deal counts. Check the sync timestamp for the most recent data load. If discrepancies persist after these checks, reconcile 10-20 specific deals one by one to determine whether the difference is systematic or random.
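The deal-by-deal reconciliation can be scripted once both sides are exported as deal ID to amount mappings. The sketch below is one possible heuristic, not a standard method. It lists deals present on only one side, which points to filter differences or sync lag. It then flags amount mismatches and checks whether every mismatch shares the same HubSpot-to-warehouse ratio. A single shared ratio hints at a systematic cause such as currency conversion, while scattered differences look random. The `reconcile` function and the sample data are hypothetical.

```python
from typing import Dict, Optional

Amounts = Dict[str, Optional[float]]  # deal ID -> amount (None if blank)


def reconcile(hubspot: Amounts, warehouse: Amounts, tol: float = 0.01) -> dict:
    """Compare per-deal amounts from a HubSpot report export and a warehouse query."""
    mismatched = {}
    for deal_id in sorted(hubspot.keys() & warehouse.keys()):
        h, w = hubspot[deal_id], warehouse[deal_id]
        # Mismatch if only one side is blank, or both are set but differ.
        if (h is None) != (w is None) or (h is not None and abs(h - w) > tol):
            mismatched[deal_id] = (h, w)
    # Same ratio on every mismatch suggests a systematic cause (e.g. currency).
    ratios = [h / w for h, w in mismatched.values() if h and w]
    systematic = (
        len(ratios) >= 2
        and len(ratios) == len(mismatched)
        and max(ratios) - min(ratios) < 1e-3
    )
    return {
        "only_in_hubspot": sorted(hubspot.keys() - warehouse.keys()),    # e.g. sync lag
        "only_in_warehouse": sorted(warehouse.keys() - hubspot.keys()),  # e.g. report filters
        "mismatched": mismatched,
        "systematic": systematic,
    }


# Hypothetical sample: the HubSpot report shows converted amounts, the query raw ones.
report = {"101": 1100.0, "102": 2200.0, "103": 500.0}
query = {"101": 1000.0, "102": 2000.0, "103": 500.0, "105": 300.0}
result = reconcile(report, query)
print(result["mismatched"], result["only_in_warehouse"], result["systematic"])
```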


The most durable setups are the ones the team can revisit later without re-learning the whole process. If the reporting or export step becomes hard to repeat, the workflow is probably too brittle.

Advanced Strategies and Common Pitfalls in CRM and Data Warehousing

Step-by-Step Fix: Build Your Foundation Before Scaling

Successful CRM data warehousing projects follow a consistent pattern. Start with a clearly defined use case for a single team, measure the baseline, implement incrementally, and scale only after the pilot achieves measurable results. Avoid configuring everything at once. A phased approach with 30-day review cycles catches configuration errors before they spread.

Measuring Success: KPIs and Review Cadence

Establish three to five quantifiable success metrics before launch. Examples include adoption rate, data completeness score, and process efficiency measured as time saved per rep per week. Review these metrics monthly and base configuration decisions on the data rather than on opinion.
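"Data completeness score" has no single standard definition. One simple version is the share of required fields that are populated across a set of records. The sketch below computes that for a list of deal records. The field names in `REQUIRED_FIELDS` are a hypothetical choice, and both `None` and empty strings count as missing.

```python
from typing import Mapping, Sequence

# Hypothetical choice of fields every deal record should have populated.
REQUIRED_FIELDS = ("amount", "close_date", "lead_source")


def completeness_score(records: Sequence[Mapping],
                       required: Sequence[str] = REQUIRED_FIELDS) -> float:
    """Share of required fields populated across all records (0.0 to 1.0).

    None and empty strings count as missing; returns 0.0 when there is nothing to score.
    """
    if not records or not required:
        return 0.0
    filled = sum(
        1
        for record in records
        for field in required
        if record.get(field) not in (None, "")
    )
    return filled / (len(records) * len(required))


deals = [
    {"amount": 12000, "close_date": "2025-01-10", "lead_source": "Web"},
    {"amount": None, "close_date": "2025-02-01", "lead_source": ""},
]
print(f"{completeness_score(deals):.0%}")  # 67%
```

Tracking the same score monthly, and per field, shows whether data quality is improving after launch rather than relying on impressions.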

What are the key benefits of CRM and Data Warehousing?

The primary benefits are analyses the CRM cannot run well on its own (cohort analysis, lifetime value, attribution), access to full historical data, and the ability to join CRM records with product, billing, and marketing data in one place. Together these give management better data visibility for decision-making. Reported productivity gains vary widely with implementation quality and user adoption levels.

How long does implementation typically take?

Simple configurations for small teams can be live in two to four weeks. Mid-complexity implementations for 20 to 100 users typically take 60 to 90 days. Enterprise-scale projects with custom integrations and data migrations usually require four to nine months from kickoff to full production deployment.

What is the most common reason implementations fail?

Implementations fail most often due to insufficient user adoption rather than technical problems. Systems are configured correctly but teams revert to old habits because training was insufficient, workflows were not simplified, or leadership did not reinforce usage. Executive sponsorship and simplicity of design are the two highest-leverage success factors.

How do you calculate ROI from this type of investment?

Calculate ROI by comparing costs against measurable gains. Gains include hours saved per week multiplied by average hourly cost, the pipeline increase attributable to the improved process, and the reduction in revenue lost to poor follow-up. A positive 12-month ROI only requires measurable value to exceed cost. Many organisations set a higher bar for the business case, such as three dollars of measurable value for every dollar spent.
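The arithmetic can be written out as a small function. This is a sketch of the calculation described above; `twelve_month_roi` and all input figures are hypothetical. Whether to count pipeline gain at face value or weight it by win rate is a judgment call, and the function simply uses whatever annual figure you pass in.

```python
def twelve_month_roi(hours_saved_per_week: float, hourly_cost: float,
                     pipeline_gain: float, recovered_revenue: float,
                     total_cost: float, weeks: int = 52) -> tuple:
    """Return (value per dollar of cost, ROI) over a 12-month window.

    pipeline_gain and recovered_revenue are annual figures supplied by the caller.
    """
    time_savings = hours_saved_per_week * hourly_cost * weeks
    value = time_savings + pipeline_gain + recovered_revenue
    return value / total_cost, (value - total_cost) / total_cost


# Hypothetical inputs: 40 team hours saved per week at $50/hour, $50,000 of
# attributable pipeline gain, $26,000 of recovered revenue, $60,000 total cost.
ratio, roi = twelve_month_roi(40, 50, 50_000, 26_000, 60_000)
print(f"{ratio:.1f}:1 value ratio, ROI {roi:.0%}")  # 3.0:1 value ratio, ROI 200%
```

In this example, time savings contribute $104,000 (40 × $50 × 52), for a total value of $180,000 against $60,000 of cost. That is exactly the three-to-one bar mentioned above.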

Common Problems and Fixes

Common Implementation Challenges to Anticipate

Organisations working on CRM data warehousing projects frequently encounter three recurring obstacles: inadequate stakeholder alignment during planning, underestimated data migration complexity, and an insufficient end-user training budget. Addressing all three before go-live substantially improves adoption rates and time-to-value. Build a project team with representatives from sales, marketing, and IT rather than delegating entirely to one function.
