What is Segment?
Learn all about Segment's core features, capabilities, and the various downsides to the platform.
Luke Kline
November 13, 2023
13 minutes
What is Segment?
Segment is a traditional Customer Data Platform (CDP) that specializes in event collection and data activation. The platform collects data from websites, mobile apps, servers, and cloud applications and pushes that data to downstream destinations.
Founded in 2011, Segment is one of the oldest and largest CDPs by market share. While Segment does a lot, the company is primarily known for pioneering event tracking and introducing standardized software development kits (SDKs) that you can deploy on your website to capture key events like page views or button clicks and understand how users behave.
At its core, Segment helps you collect behavioral data on your users so you can build and manage audiences for marketing activation and automatically ingest that data back into the operational tools of your business teams.
Core Products and Capabilities
While Segment has many different features and capabilities, the company offers four base products: Connections, Protocols, Unify, and Twilio Engage.
- Connections is Segment’s core product offering. This feature allows you to collect event data from your mobile apps, websites, servers, and SaaS applications so you can federate it to your downstream operational tools.
- Protocols governs how your data must be collected and stored within Segment so you can define what events you want to track and align them with your business objectives.
- Unify is Segment’s product for creating actionable customer profiles from all your behavioral event data. This feature leverages identity resolution to stitch together multiple touchpoints into a single and coherent customer profile.
- Twilio Engage represents Segment’s audience-building and customer engagement capabilities. This feature enables you to create customizable audiences based on specific events, attributes, or computed traits that you define so you can automatically sync them to downstream tools to power personalization across your marketing channels. More recently, the platform also supports the ability to send emails and SMS messages directly.
All of these core features come bundled together within the Segment platform, which doesn’t offer much in terms of flexibility. While the packaged approach provides solid functionality, it’s worth noting that Segment lacks some core architectural benefits that more modern players, such as Composable CDPs, offer.
Below, we’ll dive into each of the specific functionalities and features Segment offers and highlight the benefits and frustrations companies face when using the platform.
Data Collection
For data collection, Segment offers both client-side and server-side tracking. Client-side tracking runs on the user’s device, and server-side tracking occurs directly on your application servers. All data collection within Segment is powered by libraries that generate messages and send them to Segment’s standardized API in a basic structure outlined within the implementation of that code. Segment then forwards these events automatically to downstream destinations that you define.
For client-side tracking, Segment offers a JavaScript library known as Analytics.js 2.0, which is completely open-source and available on GitHub. This is the default option that Segment recommends for any website or basic event tracking use case. Segment also offers a variety of SDKs that you can embed directly in your mobile app to capture key events. In both cases (website and mobile), when an event happens, the data is automatically sent to Segment.
Server-side tracking works slightly differently with Segment because the data collection happens directly from your server. However, the implementation is largely the same because Segment provides several different libraries that you can embed directly into your infrastructure. This approach offers a few advantages over the previous one because you have the ability to capture offline events and more sensitive information that wouldn’t be possible with client-side tracking. Segment also provides an HTTP API if you can’t find a library that works with your environment. There’s also a Pixel Tracking API to help you track events when you can’t execute code.
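To make the HTTP API option more concrete, here is a minimal sketch of a server-side track call built with Python's standard library. It assumes the public `https://api.segment.io/v1/track` endpoint and Basic authentication with the source write key as the username; the write key, user ID, and event name are placeholders, and the request is only built, not sent, so check the details against Segment's documentation before relying on it.

```python
import base64
import json
import urllib.request

# Assumption: Segment's public HTTP tracking endpoint; confirm the URL for your workspace region.
SEGMENT_TRACK_URL = "https://api.segment.io/v1/track"


def build_track_request(write_key: str, user_id: str, event: str, properties: dict) -> urllib.request.Request:
    """Build (but do not send) a server-side track request."""
    payload = {"userId": user_id, "event": event, "properties": properties}
    # The write key is sent as the Basic auth username with an empty password.
    token = base64.b64encode(f"{write_key}:".encode()).decode()
    return urllib.request.Request(
        SEGMENT_TRACK_URL,
        data=json.dumps(payload).encode("utf-8"),
        headers={"Content-Type": "application/json", "Authorization": f"Basic {token}"},
        method="POST",
    )


# Placeholder values for illustration only.
request = build_track_request("YOUR_WRITE_KEY", "user_123", "Order Completed", {"total": 49.99})
# urllib.request.urlopen(request) would send the event; it is omitted so the sketch runs offline.
```

Because the call originates from your own server, it can carry offline events or sensitive attributes that you would not want to expose in a browser.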
Because Segment was not natively designed to integrate with data warehouses, the platform itself is often relegated to behavioral data. This means that you don’t have access to other offline data, first-party attributes, or custom data science models that only live in your data warehouse. As data warehouse adoption becomes more widespread and is seen as the source of truth for businesses, companies are seeing that this causes inconsistencies between data in their analytics tools and downstream destinations.
Data Storage
For the Connections product, all event data is stored for an unlimited duration in Segment’s event archives on S3. By default, you can sync old events to any destination you’ve connected to Segment at any point in time.
However, when it comes to the data used by Segment Unify and Twilio Engage, customer data gets stored in a separate Segment-managed data store optimized for these features. In this case, the platform only stores data for three years, and this means your ability to historically understand your users and build granular audiences with that data is limited.
Given that more modern competitors, such as Composable CDPs, actually avoid storing your data altogether, this is a major limitation that leads to multiple sources of truth, security and privacy concerns, and higher total cost of ownership.
Data Modeling
There are two core data modeling components within Segment: schema management and identity resolution. Schema management refers to how your data is structured before and after it is collected, and identity resolution defines how your tracked data is stitched together to create coherent customer profiles.
Schema Management
The Segment Spec (or schema) outlines the structure and format in which your data must be collected before it can be ingested into Segment. The Spec supports six distinct API calls, all of which share the same set of common fields. The six calls are:
- Identify: This call identifies users and records specific traits like name or email.
- Track: This call records user actions like button clicks or purchases.
- Page: This call captures page information like URL, title, or path.
- Screen: This call helps you determine what app screen a mobile user is on.
- Group: This call helps you determine what account or organization a user is linked to.
- Alias: This call is used to link data recorded under an anonymous ID to a known user so you can leverage it later once you can identify that specific user.
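The six calls above can be sketched as plain messages that share a handful of common fields. This is an illustrative helper, not Segment's SDK: `make_message` is a hypothetical function, and the common fields shown (`type`, `messageId`, `timestamp`, `userId`, `anonymousId`) are a simplified subset of what the Spec describes.

```python
from datetime import datetime, timezone
from typing import Any, Dict, Optional
from uuid import uuid4

# The six call types described in the list above.
SPEC_CALLS = {"identify", "track", "page", "screen", "group", "alias"}


def make_message(
    call_type: str,
    user_id: Optional[str] = None,
    anonymous_id: Optional[str] = None,
    **fields: Any,
) -> Dict[str, Any]:
    """Hypothetical helper: build a message with a simplified set of common fields."""
    if call_type not in SPEC_CALLS:
        raise ValueError(f"unsupported call type: {call_type}")
    if not (user_id or anonymous_id):
        raise ValueError("a userId or an anonymousId is required")
    message: Dict[str, Any] = {
        "type": call_type,
        "messageId": str(uuid4()),
        "timestamp": datetime.now(timezone.utc).isoformat(),
    }
    if user_id:
        message["userId"] = user_id
    if anonymous_id:
        message["anonymousId"] = anonymous_id
    message.update(fields)  # call-specific fields such as traits, event, or properties
    return message


identify = make_message("identify", user_id="user_123", traits={"email": "ada@example.com"})
track = make_message("track", anonymous_id="anon_456", event="Button Clicked", properties={"label": "Sign up"})
```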
Within these tracking parameters, Segment’s Protocols feature enables you to create a tracking plan so you can choose exactly what events you want to track and also define standardized naming and collection practices. This tracking plan outlines the events and properties you intend to collect from your sources. Many companies end up creating multiple tracking plans depending on how many data sources they have.
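To illustrate what a tracking plan enforces, here is a small sketch that checks incoming events against a plan. The plan structure, the `validate_event` function, and the property names are all hypothetical; Protocols has its own format and validation rules, so treat this only as a picture of the idea.

```python
from typing import Dict, List

# Hypothetical tracking plan: each planned event lists its required properties and types.
TRACKING_PLAN: Dict[str, Dict[str, dict]] = {
    "Order Completed": {"required": {"order_id": str, "total": float}},
    "Product Viewed": {"required": {"product_id": str}},
}


def validate_event(event: str, properties: dict) -> List[str]:
    """Return a list of violations; an empty list means the event conforms to the plan."""
    rule = TRACKING_PLAN.get(event)
    if rule is None:
        return [f"unplanned event: {event}"]
    violations = []
    for name, expected_type in rule["required"].items():
        if name not in properties:
            violations.append(f"missing property: {name}")
        elif not isinstance(properties[name], expected_type):
            violations.append(f"wrong type for {name}: expected {expected_type.__name__}")
    return violations
```

For example, `validate_event("Order Completed", {"order_id": "o1"})` reports the missing `total` property, while an event name that is not in the plan is flagged as unplanned.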
Once this data is ingested into Segment, it has to conform to Segment’s schema, meaning all of your data has to fit underneath either a user or account object. This can be problematic if you're a complex business with custom objects, data models, and entities like pets, households, playlists, workspaces, products, subscriptions, etc. The only platform flexible enough to manage these types of relationships is the data warehouse, which is one of the many reasons that companies are adopting the Composable CDP.
Identity Resolution
Identity resolution is a critical part of the Unify feature. This capability allows you to combine all of your web, mobile, server, third-party interactions, and behavioral events into a single coherent customer profile to power anonymous identity stitching. Segment automatically creates, merges, or adds to existing user profiles by searching for identifiers like userID, anonymousID, or email on all incoming events. Once your profiles have been merged into one central identity graph, Segment will automatically generate and maintain a persistent ID for each user profile. For most use cases, Segment provides an out-of-the-box model that you can apply to your identity resolution use cases, but you also have some basic flexibility to update the identifiers you match on.
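The general idea of exact-match identity stitching can be sketched with a union-find structure: any two identifiers that appear on the same event are merged into one profile. This is a simplified illustration and not Segment's actual algorithm; the `stitch_profiles` function and the sample events are assumptions made for the example.

```python
from collections import defaultdict
from typing import Dict, List, Set, Tuple

Identifier = Tuple[str, str]  # (identifier type, value), e.g. ("email", "ada@example.com")
IDENTIFIER_KEYS = ("userId", "anonymousId", "email")


def stitch_profiles(events: List[dict]) -> List[Set[Identifier]]:
    """Group identifiers that co-occur on any event into a single profile."""
    parent: Dict[Identifier, Identifier] = {}

    def find(node: Identifier) -> Identifier:
        parent.setdefault(node, node)
        while parent[node] != node:
            parent[node] = parent[parent[node]]  # path compression
            node = parent[node]
        return node

    def union(a: Identifier, b: Identifier) -> None:
        root_a, root_b = find(a), find(b)
        if root_a != root_b:
            parent[root_b] = root_a

    for event in events:
        ids = [(key, event[key]) for key in IDENTIFIER_KEYS if event.get(key)]
        for identifier in ids:
            find(identifier)  # register identifiers that appear alone
        for other in ids[1:]:
            union(ids[0], other)

    profiles: Dict[Identifier, Set[Identifier]] = defaultdict(set)
    for identifier in list(parent):
        profiles[find(identifier)].add(identifier)
    return list(profiles.values())


events = [
    {"anonymousId": "anon_1", "event": "Page Viewed"},
    {"anonymousId": "anon_1", "userId": "user_9"},   # anonymous visitor logs in
    {"userId": "user_9", "email": "ada@example.com"},
    {"anonymousId": "anon_2"},                        # unrelated visitor
]
profiles = stitch_profiles(events)
```

Here the first three events collapse into one profile with three identifiers, while `anon_2` stays on its own until an event links it to a known user.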
One important factor to note about Segment’s identity resolution feature is that your identity graph lives in Segment, which means you don’t actually own your identity resolution process or the graph. Ultimately, this means you can’t adjust your identity resolution process to support other unique related models that don’t fall under a user or account object. This makes performing basic entity resolution difficult, especially if you want to understand which users belong to a specific household or subscription.
Additionally, Segment’s identity graph is restricted to exact match rules, so you cannot create “fuzzy match” models that are useful in paid media targeting. With a Composable CDP, these types of use cases are a breeze because the identity graph lives in your warehouse, and you can match profiles using non-event data.
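To show the difference between the two matching styles, the sketch below compares an exact-match rule with a simple fuzzy rule based on string similarity from Python's standard `difflib` module. The 0.85 threshold and the function names are arbitrary choices for illustration; real fuzzy matching models typically weigh several attributes rather than a single name field.

```python
from difflib import SequenceMatcher


def normalize(value: str) -> str:
    """Lowercase and collapse whitespace so trivial differences do not matter."""
    return " ".join(value.lower().split())


def exact_match(a: str, b: str) -> bool:
    return normalize(a) == normalize(b)


def fuzzy_match(a: str, b: str, threshold: float = 0.85) -> bool:
    """Treat two values as the same person if their similarity ratio clears the threshold."""
    return SequenceMatcher(None, normalize(a), normalize(b)).ratio() >= threshold
```

With these rules, "Jon Smith" and "John Smith" fail the exact match but pass the fuzzy match, while "Jon Smith" and "Mary Jones" fail both.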
Audience Management
Within Segment, you have access to a number of audience-building features to help you build granular audiences of users or accounts for specific marketing use cases. There’s a visual UI for non-technical users that enables you to define audiences from events, traits, or conditions that you define. For example, when building an audience using events, you might select a list of users who viewed a specific page over a certain period of time, but ultimately abandoned their cart.
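The cart abandonment example can be expressed as a small set operation over events. This is only a sketch of the logic that a visual audience builder hides: the `cart_abandoners` function, the event dictionaries, and the 30-day window are assumptions, and the "Product Added" and "Order Completed" event names are used purely as illustrative labels.

```python
from datetime import datetime, timedelta
from typing import List, Set


def cart_abandoners(events: List[dict], page_path: str, now: datetime, window_days: int = 30) -> Set[str]:
    """Users who viewed the page and added to cart in the window, but never completed an order."""
    cutoff = now - timedelta(days=window_days)
    recent = [e for e in events if e["timestamp"] >= cutoff]
    viewed = {e["userId"] for e in recent if e["type"] == "page" and e.get("path") == page_path}
    added = {e["userId"] for e in recent if e.get("event") == "Product Added"}
    purchased = {e["userId"] for e in recent if e.get("event") == "Order Completed"}
    return (viewed & added) - purchased


now = datetime(2023, 11, 13)
events = [
    {"userId": "u1", "type": "page", "path": "/checkout", "timestamp": now - timedelta(days=2)},
    {"userId": "u1", "type": "track", "event": "Product Added", "timestamp": now - timedelta(days=2)},
    {"userId": "u2", "type": "page", "path": "/checkout", "timestamp": now - timedelta(days=3)},
    {"userId": "u2", "type": "track", "event": "Product Added", "timestamp": now - timedelta(days=3)},
    {"userId": "u2", "type": "track", "event": "Order Completed", "timestamp": now - timedelta(days=1)},
    {"userId": "u3", "type": "page", "path": "/checkout", "timestamp": now - timedelta(days=90)},
    {"userId": "u3", "type": "track", "event": "Product Added", "timestamp": now - timedelta(days=90)},
]
audience = cart_abandoners(events, "/checkout", now)
```

Only `u1` lands in the audience: `u2` completed an order, and `u3` falls outside the time window.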
You can create computed traits on a per-user or per-account basis to calculate metrics like “total revenue,” or to flag users as “big spenders” once they reach a certain threshold, and save these pre-calculations for future audience building. There’s also a feature called SQL traits, which lets you run queries from your data warehouse and import those results into Segment. If you need to organize your audiences, Segment provides features like folders and cloning to improve re-usability and visibility.
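As a rough picture of what a computed trait does, the sketch below aggregates order totals per user and then flags anyone above a threshold. The function names, the order records, and the 500.0 threshold are hypothetical; in Segment these calculations are configured in the UI rather than written by hand.

```python
from collections import defaultdict
from typing import Dict, List, Set


def total_revenue_by_user(orders: List[dict]) -> Dict[str, float]:
    """Computed trait: sum of order totals for each user."""
    totals: Dict[str, float] = defaultdict(float)
    for order in orders:
        totals[order["userId"]] += order["total"]
    return dict(totals)


def big_spenders(totals: Dict[str, float], threshold: float = 500.0) -> Set[str]:
    """Audience: users whose total revenue has reached the threshold."""
    return {user_id for user_id, total in totals.items() if total >= threshold}


orders = [
    {"userId": "u1", "total": 300.0},
    {"userId": "u1", "total": 250.0},
    {"userId": "u2", "total": 40.0},
]
totals = total_revenue_by_user(orders)
spenders = big_spenders(totals)
```

Saving the per-user total as a trait means the audience definition only has to compare one number to a threshold instead of re-scanning every order.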
For journey and campaign orchestration and analytics, you can build and monitor campaigns directly in Segment. Depending on your scale, audience building can be problematic because Segment has certain concurrency limits around trait computations. By default, all audiences are updated automatically every eight hours.
All audience building is limited to the data housed in Segment, so any offline actions your customers take in other data sources will not be natively available for audience building. Given that audiences are limited to users or accounts, it’s impossible to leverage other related models in your data warehouse.
Depending on your industry, this can be challenging, especially if you have entities like households, playlists, subscriptions, etc. With a Composable CDP, you don’t have to worry about this because you can leverage all of the data that lives in your warehouse to build audiences–not just basic clickstream data.
Real-Time Capabilities
Real-time capabilities within Segment differ from product to product. For event collection, the Connections feature enables you to create event streams for real-time delivery to your downstream destinations (Note: this feature is only compatible with server-side integrations). Additionally, ingestion speed is largely dependent on the rate limits of the destination.
Segment does not support real-time audience syncing, but it does support real-time audience building in that users will auto-populate into audiences once they meet specific criteria that you’ve defined. Sync frequency is limited to batch ingestion every 2-3 hours. Real-time (or streaming) is only supported for events.
However, if you need to pull in real-time data, Segment has a feature called Profile API, which allows you to query entire user or account objects programmatically so you can leverage that data to power in-app personalization or power complex marketing campaigns with custom properties and traits.
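A Profile API lookup might look roughly like the sketch below, which builds a request for a user's traits. The URL pattern, the `user_id:` identifier prefix, and the Basic authentication scheme reflect our understanding of Segment's Profile API and should be verified against the official reference; the space ID and access token are placeholders, and the request is built but never sent.

```python
import base64
import urllib.request
from urllib.parse import quote

# Assumption: base URL of the Profile API; confirm against Segment's documentation.
PROFILE_API_BASE = "https://profiles.segment.com/v1"


def build_traits_request(space_id: str, access_token: str, id_type: str, id_value: str) -> urllib.request.Request:
    """Build (but do not send) a request for a single user's traits."""
    external_id = quote(f"{id_type}:{id_value}", safe=":")
    url = f"{PROFILE_API_BASE}/spaces/{space_id}/collections/users/profiles/{external_id}/traits"
    # Assumption: the access token is sent as the Basic auth username with an empty password.
    token = base64.b64encode(f"{access_token}:".encode()).decode()
    return urllib.request.Request(url, headers={"Authorization": f"Basic {token}"})


# Placeholder values for illustration only.
traits_request = build_traits_request("YOUR_SPACE_ID", "YOUR_ACCESS_TOKEN", "user_id", "user_123")
```

An application server could send a request like this at page load and use the returned traits to decide which content to show.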
One of the major downsides to Segment when it comes to real-time capabilities is that the platform does not support Streaming Reverse ETL. That is to say you cannot stream data from tables in your data warehouse leveraging more modern data cloud features like Snowflake’s recently launched dynamic tables.
Reverse ETL
Segment offers some basic functionality for Reverse ETL so you can query data from your warehouse and sync it to your chosen destination. The platform integrates directly with BigQuery, Databricks, Postgres, Redshift, and Snowflake. However, Segment was not built with a warehouse-first architecture, which means you can’t take advantage of any of the native features that Segment offers. Likewise, core capabilities around version control via Git, sync logs (sending sync data back to your warehouse), live debugging, and alerting are simply not available with this feature.
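Conceptually, Reverse ETL is a query against the warehouse followed by a mapping from rows to destination calls. The sketch below uses an in-memory SQLite database as a stand-in for a real warehouse; the table, the column names, and the `rows_to_identify_calls` function are invented for the example and do not represent how Segment implements the feature.

```python
import sqlite3
from typing import List


def rows_to_identify_calls(connection: sqlite3.Connection, query: str) -> List[dict]:
    """Run a query and turn each row into an identify-style payload keyed by user_id."""
    cursor = connection.execute(query)
    columns = [description[0] for description in cursor.description]
    calls = []
    for row in cursor:
        record = dict(zip(columns, row))
        user_id = record.pop("user_id")  # every remaining column becomes a trait
        calls.append({"type": "identify", "userId": str(user_id), "traits": record})
    return calls


# SQLite stands in for the warehouse so the sketch is runnable anywhere.
warehouse = sqlite3.connect(":memory:")
warehouse.execute("CREATE TABLE customers (user_id TEXT, plan TEXT, lifetime_value REAL)")
warehouse.executemany(
    "INSERT INTO customers VALUES (?, ?, ?)",
    [("user_1", "pro", 1200.0), ("user_2", "free", 0.0)],
)
calls = rows_to_identify_calls(
    warehouse, "SELECT user_id, plan, lifetime_value FROM customers ORDER BY user_id"
)
```

Each payload could then be forwarded to a destination, which is where capabilities like sync logs and alerting become important for knowing whether a row actually arrived.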
Security
For workspace management, Segment offers a number of features to help you manage roles, permissions, and workspace access. You also have the ability to create custom environments for development or production to reduce potential errors and streamline processes. The platform also provides a 90-day in-app audit trail so you can easily see changes or actions made by specific users in your workspace. Features like single-sign-on (SSO) and multi-factor authentication (MFA) are also available.
Because Segment is a traditional CDP, your data is stored outside of your cloud infrastructure, which can make security more complex than simply housing that data in your own managed infrastructure. Out of the box, you have to do a lot of work to comply with regulations like GDPR, CCPA, and HIPAA. One of the advantages of a Composable CDP is that storage takes place within your cloud infrastructure, not another provider’s.
Pros and Cons
Segment’s biggest advantage is that it’s one of the most established CDPs. That being said, no technology implementation is ever truly easy, and every platform comes with trade-offs. With that in mind, here’s a list of the biggest pros and cons of Segment.
Pros | Cons |
---|---|
Simple & easy to use | Long implementation |
Supports both client-side & server tracking | Expensive |
Many SDK options | Inflexible schema |
Can sync events directly to downstream destinations | Data is stored outside of your cloud infrastructure |
Supports testing & production environments | You don’t own the identity graph |
Closing Thoughts
While Segment can be a decent option for companies looking to implement a packaged solution, many companies are now adopting Composable CDPs or other Segment alternatives because they offer greater flexibility, security, time-to-value, and interoperability at a far lower cost. If you’re interested in learning more about the Composable CDP, book a demo with one of our solution architects today to see how we can help.