
Data Pipelines Overview

⚠️

If you already set up an older version of pipelines via the API and want to manage it, see the docs here.

Customers on an Enterprise or Growth plan can access Data Pipelines as an add-on package. See our pricing page for more details.

Data Pipelines is a feature that continuously exports data from your Mixpanel project to a Cloud Storage bucket or Data Warehouse of your choice. This feature is ideal for those who wish to perform SQL analysis on Mixpanel data within their own environment.

Setting up Data Pipelines involves two main steps:

  1. Configuring your destination to accept data writes from Mixpanel.
  2. Creating data pipelines through the Integrations page in the UI.

We offer a 30-day free trial of the Data Pipelines add-on. For details on activation, refer to the FAQ.

Step 1: Configuring Your Destination

The configuration process varies depending on the type of destination you choose: Cloud Storage or Data Warehouse.

Cloud Storage

JSON pipelines export data as JSON files to a cloud storage bucket, providing a straightforward method for data handling.

For specific configuration instructions, see our guides for each storage destination:

Data is exported to the following structured paths in your bucket:

  • Events: <BUCKET_NAME>/<MIXPANEL_PROJECT_ID>/mp_master_event/<YEAR>/<MONTH>/<DAY>/
  • User profiles: <BUCKET_NAME>/<MIXPANEL_PROJECT_ID>/mp_people_data/
  • Identity mappings: <BUCKET_NAME>/<MIXPANEL_PROJECT_ID>/mp_identity_mappings_data/
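The path layout above can be sketched as a few small helpers, for example to build the prefix to list when loading one day of events. This is a minimal sketch rather than an official client: the function names, bucket name, and project ID are hypothetical, and zero-padded month and day values are an assumption that you should check against the objects in your own bucket.

```python
from datetime import date


def events_prefix(bucket: str, project_id: int, day: date) -> str:
    """Prefix holding exported events for a single day.

    Assumption: month and day are zero-padded (e.g. 2024/01/05).
    """
    return f"{bucket}/{project_id}/mp_master_event/{day:%Y}/{day:%m}/{day:%d}/"


def people_prefix(bucket: str, project_id: int) -> str:
    """Prefix holding exported user profiles."""
    return f"{bucket}/{project_id}/mp_people_data/"


def identity_mappings_prefix(bucket: str, project_id: int) -> str:
    """Prefix holding exported identity mappings."""
    return f"{bucket}/{project_id}/mp_identity_mappings_data/"


if __name__ == "__main__":
    # Hypothetical bucket name and project ID, for illustration only.
    print(events_prefix("my-export-bucket", 12345, date(2024, 1, 5)))
```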

Data Warehouse

JSON Pipelines also facilitate data export into tables, creating schemas that are inferred from your event data.

For detailed setup guides per destination, see:

Step 2: Creating the Pipeline

After configuring your destination, start the export from the Integrations page of your Mixpanel project: click Create Pipeline and fill in the required configuration. You can choose among data sources (events, people, and identity) and set other advanced options. See Data Pipelines for more details.

FAQ

Managing Existing Pipelines

To delete, pause, or unpause a JSON pipeline:

  1. Go to the Integrations page
  2. Find the pipeline you want to manage
  3. Click the 3-dot menu on the right side of the pipeline
  4. Select Delete Pipeline, Pause Pipeline, or Unpause Pipeline as needed

Note: JSON pipelines can only be managed through the UI, not via the data pipelines API.

Viewing Pipeline Configuration

To check a pipeline’s configuration:

  1. Go to the Integrations page
  2. Either:
    • Click on the pipeline name to view configuration at the top of the page, or
    • Click the 3-dot menu and select View Configuration

Why does the number of events in Mixpanel not match the number of exported events to my destination?

Discrepancies between the event counts in Mixpanel and those exported to your destination can occur for several reasons:

  • Data Sync: If Events Data Sync is not enabled or is unsupported for your pipeline, this could prevent some data from being exported.
  • Data Delay: Late-arriving data may take up to one day to sync from Mixpanel to your destination, leading to temporary discrepancies.
  • Hidden Events: Mixpanel exports all events, including those hidden in the Mixpanel UI via Lexicon. To reconcile differences in counts, check if the events in your destination include those hidden in the Mixpanel UI.

How can I count events exported by Mixpanel in the warehouse?

Counting events can be slightly different for each warehouse, since we use different partitioning methods. Here are examples for BigQuery and Snowflake.

How does the free trial work?

Mixpanel offers a 30-day free trial of Data Pipelines, allowing you to create one pipeline per data source for each project.

Trial limitations:

  • Exports are scheduled on a daily basis only.
  • Data synchronization feature is not available.
  • Only one pipeline can be created per data source per project.
  • Backfilled data is limited to one day prior to the creation date of the pipeline.

Why can’t I delete my trial pipeline?

You can’t delete a trial pipeline in Mixpanel — this is intentional. Each project is limited to one trial pipeline per data source, and keeping it prevents deleting/recreating trials to bypass limits. It also serves as a record that the trial has been used. Even after you upgrade to a paid Data Pipelines package, the trial pipeline remains visible and cannot be removed. This is a policy decision to preserve the integrity of the trial program, not a technical constraint.

What is the Active Pipeline Limit?

Each project can have 2 recurring pipelines and 1 date-ranged backfill pipeline active.

To maintain optimal performance across our services, we limit the number of concurrently running pipeline steps to one per project. This approach ensures that each job, including those involving substantial backfills, waits its turn, preventing any single project from monopolizing resources and thus promoting fair scheduling among all customers.

Is it possible to specify when pipeline exports run?

No. Hourly pipelines target a start time 30 minutes past the hour being exported, and daily pipelines target 30 minutes past the day being exported (e.g., 12:30 AM), both in the project’s timezone. For example, a project in the Pacific timezone with a daily events pipeline starts the export of data from 5/22 at 12:30 AM PT on 5/23.

Please note that the 24-hour SLA applies.
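The target start times described above can be worked out with simple date arithmetic. The sketch below is illustrative only: the function names are hypothetical, all values are naive datetimes in project-local time, and the hourly case assumes (by analogy with the daily example) that the export targets 30 minutes after the exported hour ends. These are targets, not guarantees, since the SLA still applies.

```python
from datetime import date, datetime, time, timedelta


def daily_export_target(data_day: date) -> datetime:
    """Target start (project-local time) for exporting one day of data.

    Data from 5/22 is targeted to start exporting on 5/23 at 00:30.
    """
    return datetime.combine(data_day + timedelta(days=1), time(0, 30))


def hourly_export_target(moment_in_hour: datetime) -> datetime:
    """Target start (project-local time) for exporting one hour of data.

    Assumption: the export targets 30 minutes after the hour window ends.
    """
    window_start = moment_in_hour.replace(minute=0, second=0, microsecond=0)
    return window_start + timedelta(hours=1, minutes=30)


if __name__ == "__main__":
    print(daily_export_target(date(2024, 5, 22)))
    print(hourly_export_target(datetime(2024, 5, 22, 13, 10)))
```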

What is the Service Level Agreement?

Our Service Level Agreement (SLA) stipulates a latency policy for exported events of up to 24 hours end-to-end, with an additional 24-hour allowance for late-arriving data.

Note: Mixpanel classifies late data as any data point or user profile update that reaches Mixpanel servers more than two hours after the end of the export window. Late-arriving event data is exported only for pipelines with sync enabled.

For hourly pipelines, data that arrives on time is expected to be exported within 24 hours of the moment it is ingested by Mixpanel. For daily pipelines, this 24-hour period begins at the start of the next calendar day in project time (e.g., data ingested on January 1st is scheduled to be exported between midnight at the start of January 2nd and the end of the day on January 3rd).

Late-arriving data is handled by a daily sync process that runs the day after ingestion, provided sync is enabled. For instance, if data for January 1st arrives on January 3rd, it is exported in the sync run on January 4th.
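The late-data rules above reduce to two small checks. This is a rough sketch under stated assumptions, not Mixpanel's implementation: the function and constant names are hypothetical, times are naive datetimes in project-local time, and the sync run date applies only when sync is enabled on the pipeline.

```python
from datetime import date, datetime, timedelta

# Data counts as late if it arrives more than two hours after the export window ends.
LATE_THRESHOLD = timedelta(hours=2)


def is_late(arrived_at: datetime, window_end: datetime) -> bool:
    """Return True if the data arrived more than two hours after the window ended."""
    return arrived_at - window_end > LATE_THRESHOLD


def sync_run_day(arrival_day: date) -> date:
    """Day of the sync run that picks up late data: the day after it arrives."""
    return arrival_day + timedelta(days=1)


if __name__ == "__main__":
    # Data for January 1st (window ends at midnight on January 2nd) arriving on January 3rd.
    window_end = datetime(2024, 1, 2, 0, 0)
    arrived_at = datetime(2024, 1, 3, 9, 15)
    print(is_late(arrived_at, window_end))
    print(sync_run_day(arrived_at.date()))
```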

What should I set for the Account Name and Storage Integration when creating a Snowflake pipeline?

Set the Account Name to your unique account identifier (e.g., “blah2321.us-west-2”) and the Storage Integration to the name of the storage integration you created in Snowflake (e.g., “MIXPANEL_EXPORT_STORAGE_INTEGRATION”).
