
Alert Rule Creation

Data Pipeline & Dataset Monitoring

Role: Senior UX Designer
Company: Databand · IBM
Focus: Data Observability

Databand Alert System

Overview

Databand is a proactive data observability tool for data engineers that helps detect and resolve data issues. The main value Databand offers its customers is catching bad data before it impacts the business.

Databand's alert system is the core engine customers use to keep track of their moving data, which makes the alert definition editor a strategically important part of the product. By setting alert rules, customers choose which metrics, datasets, and pipeline processes they want to monitor.

When I joined Databand, creating an alert was one of the most painful experiences in the product and, at the same time, one of the most challenging areas to design.


Goals

High-Level Goals

Product Goals:

UX Roadmap:

Old Create Alert
Before: the old Create Alert editor, which users found hard to work with
New Create Alert
After: the new Create Alert flow, focused on a specific alert type, target, and metric

My Role

Responsibilities


Research

User Research

We learned how users experienced the old alert creation editor through four main channels: customer support, user interviews, recordings of demos for potential users, and user session recordings on Smartlook.

Main Findings:

Smartlook Insights
All recordings insights documented according to different clients
Feature Requests
We collected all problems raised by users, along with frequently recurring ideas for solutions

Research

Competitor Research

To understand how to design the best experience for setting alerts on data and pipelines, I researched direct and indirect competitors: how they address the challenges Databand's users face, which features we could adopt, and which UX/UI solutions and patterns could serve as reference or inspiration.

Main Findings:

Competitor Research
Competitor research - mapping patterns and UI solutions across direct and indirect competitors

Ideation

The Vision for the Alert Creation Experience

The goal was to design a flexible experience that could grow with increasing complexity and with new alert types added over time.

Alert Concept Vision
Alert concept vision - a high-level map of the full wizard experience

We knew that the first iteration would add a Data Delay alert, so I used this scenario to create a low-fidelity wireframe flow to brainstorm and share ideas within the product team and to get feedback from customer success.

List of Alerts
List of alerts
Alerts Gallery
Alerts gallery
Data Delay Alert
Data delay alert
Alert Receiver
Alert receiver step
Test Alert and Alert Summary
Test alert & alert summary

Iteration No.1

Learn about Data Quality Issues

The new alert creation flow started with a request users raised repeatedly at the time: they wanted to know when something was wrong with their data quality, and to be notified whenever there was an issue with the data inside a specific dataset. To support this, we needed an advanced alert monitor that checks the relevant data source or pipeline process.

What does a Data Quality Issue mean?

One possible indication of problematic data is an issue with any of the metadata metrics, such as data freshness, null count, anomalies, and data statistics. Problematic data might be ingested into the target dataset undetected and, in turn, affect downstream data products.

Data Quality Issue
Mapping the types of data quality issues users needed to monitor

Starting with Data Delay Alert

Talking to users, we realized that knowing when data is updated late gives them a valuable signal that something is wrong. We therefore decided to let users set a Data Delay alert across multiple datasets, including the option to target the datasets related to a specific pipeline.
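To make the Data Delay rule concrete, here is a minimal Python sketch of the check it implies, assuming each monitored dataset exposes the timestamp of its latest update and the pipeline that writes it. `DatasetStatus`, `find_delayed`, and the sample datasets are hypothetical names for illustration only, not Databand's actual API or implementation.

```python
from dataclasses import dataclass
from datetime import datetime, timedelta, timezone
from typing import Optional


@dataclass
class DatasetStatus:
    name: str
    pipeline: str           # hypothetical: the pipeline that writes this dataset
    last_updated: datetime  # hypothetical: timestamp of the dataset's latest update


def find_delayed(datasets, max_delay: timedelta, now: datetime,
                 pipeline: Optional[str] = None):
    """Return names of datasets not updated within max_delay.

    When `pipeline` is given, only datasets related to that pipeline are checked,
    mirroring the option to target the datasets of a specific pipeline.
    """
    return [
        d.name
        for d in datasets
        if (pipeline is None or d.pipeline == pipeline)
        and now - d.last_updated > max_delay
    ]


now = datetime(2024, 1, 1, 12, 0, tzinfo=timezone.utc)
datasets = [
    DatasetStatus("orders", "ingest_orders", now - timedelta(hours=5)),
    DatasetStatus("customers", "ingest_orders", now - timedelta(minutes=30)),
    DatasetStatus("events", "ingest_events", now - timedelta(hours=3)),
]
print(find_delayed(datasets, timedelta(hours=2), now))
print(find_delayed(datasets, timedelta(hours=2), now, pipeline="ingest_orders"))
```

A single rule covers many datasets at once, which is the point of the multi-dataset design: the user defines one threshold instead of one alert per dataset.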

Data Delay First Iteration
New "Add Alert" experience for data delay

To begin with, we built only the alert definition experience, without the receiver step or the alerts gallery. We also temporarily kept the old experience in place for pipeline alerts. I created a prototype to test the new alert experience with a few customers, and it was clear that the new design was easier to understand and much clearer than the old one.

Preparing Design Spec for Development

I created detailed specs with high-fidelity designs covering the different use cases and flows. Together with the front-end team, we defined the interactions and validation behaviors.

Data Delay Spec
Detailed spec presenting different use cases and interactions
Defining Condition Options
Spec for condition rules
Multi Selection
Example of multi-selection dropdown interactions

Learnings from the First Release

Once the first iteration was live, we started collecting feedback from users.

Main positive feedback:

Needs improvement:

I decided to address the feedback during the following iterations.


Iteration No.2

Data Quality Check

Based on users' requests, the next iteration added the ability to set alerts on column-level metrics. The new alert lets users target one or more columns of a dataset, on metrics such as anomalies, nulls, and metadata statistics.

Data Quality Alert
Data quality alert, still without the full wizard steps and without receivers
Data Quality Alert Spec
Data quality alert spec for development

Users' Feedback

When users started working with the new data quality alert, we learned that they were missing two important capabilities that had been scoped out due to technical priorities:

These requests were critical for users, so we dedicated the following sprint to adding them.

Null Anomaly Alert
Null anomaly alert - the range threshold is calculated according to previous data
Null Range Alert
Null percentage error range alert
Alert on Multiple Datasets
Setting an alert on multiple datasets
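The two null alerts above differ only in where the allowed range comes from: the null range alert uses a range the user sets, while the null anomaly alert derives it from previous data. The Python sketch below illustrates that difference under stated assumptions. In particular, computing the anomaly range as mean ± k standard deviations of past null percentages (with `k=3.0`) is an assumed, common technique, not necessarily the method Databand uses, and all function names are hypothetical.

```python
import statistics


def null_percentage(values):
    """Percentage of None entries in a column sample."""
    if not values:
        return 0.0
    return 100.0 * sum(v is None for v in values) / len(values)


def outside_fixed_range(pct, low, high):
    """Null range alert: fires when the null % leaves the user-defined [low, high] range."""
    return not (low <= pct <= high)


def anomaly_range(history, k=3.0):
    """Null anomaly alert: derive the range from previous null % values.

    Assumption for illustration: range = mean +/- k * population standard deviation.
    """
    mean = statistics.mean(history)
    std = statistics.pstdev(history)
    return mean - k * std, mean + k * std


history = [2.0, 2.5, 1.5, 2.0, 2.0]   # hypothetical null % from previous runs
low, high = anomaly_range(history)
column = [1, None, 3, 4, None, 6, 7, 8, 9, 10]
pct = null_percentage(column)         # 20.0 (2 of 10 values are None)
print(pct, outside_fixed_range(pct, 0.0, 5.0), outside_fixed_range(pct, low, high))
```

Deriving the range from history spares users from guessing a threshold up front, while the fixed range stays available for columns where the acceptable null rate is already known.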

Iteration No.3

Choose an Owner / Receiver to be Notified

Next, we tackled another step from the original concept: assigning different receivers to different alerts. Until then, only one receiver could be set for all triggered alerts. The ability to choose receivers per alert rule was a deal-breaker: different people within the same team or company were responsible for different pipelines, and each wanted to receive only the triggered alerts relevant to them.

We built this feature on top of a request from an important client: adding a PagerDuty receiver integration.
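The shift from one shared receiver to per-rule receivers can be sketched as a small routing function. This is a minimal illustration, assuming each alert rule stores its own list of receiver identifiers; the `AlertRule` and `route` names and the receiver strings are hypothetical, and falling back to the shared receiver when a rule has none is an assumption, not documented Databand behavior.

```python
from dataclasses import dataclass, field


@dataclass
class AlertRule:
    name: str
    # Hypothetical receiver identifiers, e.g. a PagerDuty service or a Slack channel.
    receivers: list[str] = field(default_factory=list)


def route(triggered_rule: AlertRule, shared_receiver: str) -> list[str]:
    """Return who should be notified when this rule fires.

    A rule's own receivers take precedence; rules without receivers fall back
    to the single shared receiver (an assumption for this sketch).
    """
    return list(triggered_rule.receivers) or [shared_receiver]


orders_rule = AlertRule("orders_delay", ["pagerduty:orders-oncall", "slack:#orders"])
legacy_rule = AlertRule("legacy_rule")
print(route(orders_rule, "email:data-team@example.com"))
print(route(legacy_rule, "email:data-team@example.com"))
```

Keeping receivers on the rule itself is what lets each pipeline owner get only the alerts relevant to them.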

The experience was meant to be short and simple:

Add Receiver
Now users can set the receiver in the context of a specific alert

Iteration No.4

Simplify the Experience for Assigning a Receiver

We quickly learned that the receiver selection UI was hard to understand, and as a result customers were not using it. We changed the experience to a list that shows all available receivers from the start, so users only need to choose which receivers to notify.

New Receiver Step
Now it is possible to see all integrated receivers from the start

Iteration No.5

Creating an Alert on Multiple Assets

After we introduced the ability to set a data quality alert on multiple datasets, customers asked for the same capability for the three most popular pipeline alerts: Run State, Run Duration & Schema Change.

We were looking for a quick way to validate this and considered adding multiple selection to the old alert editor, but we faced major product and UX challenges that would have at least doubled our scope, because the old editor supported setting alerts on specific tasks inside pipelines.

Old Create Alert
Previous pipeline alert experience - limited to single asset selection

Our roadmap already included solving the complexity of setting alerts on entire pipelines versus inner tasks. So for this iteration, we decided to add multiple selection for those three alerts using the new experience pattern, which gave us a quick solution we could validate with users.

Bulk Alert
The 3 most popular and useful alerts, now supporting selecting multiple assets

Iteration No.6

Users Are Creating Duplicate Alerts

Our customer success team noticed that, for some reason, customers were creating duplicate alerts. Talking to them surfaced a few problems:

We realized the problem lay in our experience, which let users create similar alerts without warning them that they were doing so.

An alert is considered a duplicate if it is similar to an existing alert in three parameters:

The solution

If we detect that the user is setting an alert similar to an existing one, the create alert flow shows a validation screen that warns them they are about to create a duplicate alert and presents details about the existing alerts.

Duplicate Alerts
Duplicate alert validation screen - appears only when similar alerts exist

Iteration No.7 (WIP)

Creating an Alert for Multiple Assets

Lately, we have been working on unifying the entire alert experience and tackling more of the major problems in alert creation.

Product requirements:

UX requirements:

Assumptions we had:

I create low-fidelity mockups so that the product manager and I can explore and refine different ideas and use cases, and collect feedback from customers. Each version starts with assumptions that we validate with customers.

Here is a prototype of one of the latest concepts.
Note: This version is still in the ideation phase.

Iteration 6.1
Starting the wizard from a gallery that exposes and explains all possible alerts
Iteration 6.2
Alert define step - where the user creates the rule

Usability feedback

We tested this version with our customers and gained both positive and negative insights.

The positive feedback:

Less positive insights:

Adopting the feedback

After testing, we understood that we needed to slightly update our assumptions:

UX changes:

Iteration 6.3
In this version, it is more explicit that filtering is by pipelines, sources, or projects

Summary

What Have We Done So Far?