Databand is a proactive data observability tool for data engineers that helps detect and resolve data issues. The main value Databand gives to its customers is catching bad data before it impacts the business.
Databand's alert system is the core engine that customers use to keep track of their moving data, which makes the alert definition editor a strategic part of the Databand system. By setting alert rules, customers choose which metrics and which data or pipeline processes they want to monitor.
When I came to Databand, creating an alert was one of the most painful experiences in the product and, at the same time, one of the most challenging design problems.
Goals
High Level Goals
Product Goals:
Supporting more alert types.
More users would create more alerts without the help of Databand's customer success department.
Creating alerts for multiple assets - Pipelines, tasks, or datasets.
Setting alerts on data.
Setting alerts on column level.
UX Roadmap:
Creating an alert should be easy, and users should be able to do it alone.
Reduce, as much as possible, the complexity of setting alerts on different assets and processes.
Before: The old create alert - users found it hard to work with
After: New create alert, focused on specific alert type, target and metric
My Role
Responsibilities
Defining UX requirements.
Conducting competitor research.
Collecting feedback and iterating accordingly.
Creating mockups for different concepts and iterations.
Mapping use-cases and different flows.
Creating low and high-fidelity prototypes for testing.
Creating a fully detailed spec and working closely with the dev team.
Research
Users' Research
I learned how our users experienced the old alert creation editor through 4 main channels: customer support, user interviews, recordings of demos for potential users, and user recordings on Smartlook.
Main Findings:
Users don't know where to start.
The old experience is hard to understand.
Lack of confidence in the UI.
Setting up an alert is too complex for users to do themselves - they eventually ask for help from customer success.
There is no clear hierarchy in the UI.
All recording insights documented according to different clients
We collect all problems raised by users, and possible ideas for solutions that frequently come up
Research
Competitors Research
To understand how to design the best experience for setting an alert on data and pipelines, I researched direct and indirect competitors. I wanted to see how others address challenges similar to those Databand's users face, which features we could use, and which UX/UI solutions and patterns could serve as reference or inspiration.
Main Findings:
It is common to start creating alerts from a preset.
The process of defining an alert is split into several steps.
Conversational UX for setting the alert rule.
It is common to display a preview of past data (if available), to help users choose correct trigger values.
Choosing stakeholder receiver and platform, to be notified in case of a rule breach.
It would be nice to have some testing action.
There is a popular UX pattern where setting the alert is done as part of a conversational experience.
Another pattern: an "If condition" structure.
Competitor research - mapping patterns and UI solutions across direct and indirect competitors
Ideation
The Vision for Creating Alert Experience
The goal was to design a flexible experience that can grow over time as complexity increases and more alerts are added.
Alert concept vision - a high-level map of the full wizard experience
We knew that for the first iteration we would add a Data Delay alert, so I used this scenario to create a low-fidelity wireframe flow to brainstorm and share ideas inside the product team and get feedback from customer success.
List of alerts
Alerts gallery
Data delay alert
Alert receiver step
Test alert & alert summary
Iteration No.1
Learn about Data Quality Issues
The new alert creation started with a request that kept coming up from users at that time: they wanted to know when something was wrong with their data quality. Users wanted to be notified whenever there was an issue with data inside a specific dataset. For that, we needed to add an advanced alert monitor that checks the relevant data source or pipeline process.
What does a Data Quality Issue mean?
A possible indication of problematic data is an issue with any of the metadata metrics, such as data freshness, null count, anomalies, and data statistics. Problematic data might be ingested into the target dataset undetected and, in turn, affect downstream data products.
Mapping the types of data quality issues users needed to monitor
Starting with Data Delay Alert
Talking to users, we realized that knowing when data is updated late is a valuable signal that something is wrong. For that reason, we decided to allow users to set a Data Delay alert across multiple datasets, including the option to target datasets related to a specific pipeline.
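To make the idea of a data delay check concrete, here is a minimal sketch in Python. It is not Databand's implementation; the `DatasetStatus` type, the `find_delayed_datasets` function, and the 24-hour threshold are all hypothetical names and values chosen for illustration. It flags every dataset in a selection whose last update is older than an allowed delay.

```python
from dataclasses import dataclass
from datetime import datetime, timedelta


@dataclass(frozen=True)
class DatasetStatus:
    """Hypothetical snapshot of a monitored dataset (illustrative only)."""
    name: str
    last_updated: datetime


def find_delayed_datasets(datasets, max_delay, now):
    """Return the names of datasets whose last update is older than max_delay."""
    return [d.name for d in datasets if now - d.last_updated > max_delay]


# Example: one alert rule applied to multiple datasets at once.
now = datetime(2023, 1, 1, 12, 0)
datasets = [
    DatasetStatus("orders", now - timedelta(hours=1)),
    DatasetStatus("customers", now - timedelta(hours=30)),
]
delayed = find_delayed_datasets(datasets, timedelta(hours=24), now)
print(delayed)
```

Passing `now` explicitly keeps the check deterministic, which also makes a "test alert" action easier to reason about.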
New "Add Alert" experience for data delay
To begin with, we created only the alert definition experience, without a receiver or an alerts gallery. We also temporarily left the old experience in place for pipeline alerts. I created a prototype that allowed us to test the new alert experience with a few customers. It was pretty clear that the new design was easier to consume and much clearer than the old one.
Preparing Design Spec for Development
I created detailed specs, presenting different use cases and flows, with high-fidelity design. Together with the front-end team, we defined different interactions and validation behaviors.
Detailed spec presenting different use cases and interactions
Spec for condition rules
Example of multi-selection dropdown interactions
First Release Learnings
Once the first iteration was live, we started collecting feedback from users.
Main positive feedback:
The new conversational and direct experience is clearer.
The user's attention is focused on one alert.
Setting an alert on multiple targets is super valuable for data engineers' workflow.
Need improvement:
The multi-selection dropdown interactions were confusing. Users couldn't understand the difference between them.
Users wanted more details about how the alert is working.
I decided to address the feedback during the following iterations.
Iteration No.2
Data Quality Check
Following users' requests, the next iteration included the capability to set alerts on different metrics for column-level data. The new alert could be set on one or more columns of a dataset, on different metrics like anomaly, nulls, and metadata statistics.
Data quality alert, still without the full wizard steps experience and no receivers
Data quality alert spec for development
Users' Feedback
When users started working with the new data quality alert, we learned that they were missing 2 important things that had been scoped out because of technical priorities:
Anomaly/Range detection - Users were looking for a visual anomaly detection range; without it, they couldn't really know when an alert would be triggered. So we added a graph with anomaly detection, based on previously collected data.
Alert on multiple datasets - Users were missing the ability to set an alert on multiple datasets at once.
Those requests were quite critical for users, so we invested the following sprint in addressing them.
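As an illustration of how a range threshold can be derived from previous data, here is a minimal sketch. The source does not say which method Databand uses, so the mean plus or minus `k` standard deviations rule below is an assumption, as are the function names and the sample null counts.

```python
from statistics import mean, stdev


def anomaly_range(history, k=3.0):
    """Return (low, high) bounds: mean +/- k sample standard deviations.

    Assumption: this simple rule stands in for whatever detection
    method the real product uses.
    """
    center = mean(history)
    spread = stdev(history)
    return center - k * spread, center + k * spread


def is_anomalous(value, history, k=3.0):
    """True when value falls outside the range computed from history."""
    low, high = anomaly_range(history, k)
    return value < low or value > high


# Hypothetical null counts collected from previous runs of one column.
null_counts = [10, 12, 11, 9, 13, 10, 11]
low, high = anomaly_range(null_counts)
print(is_anomalous(12, null_counts))  # inside the expected range
print(is_anomalous(40, null_counts))  # far outside the expected range
```

Plotting `low` and `high` over the historical values is what gives users the visual range they asked for: they can see in advance which values would trigger the alert.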
Null anomaly alert - the range threshold is calculated according to previous data
Null percentage error range alert
Setting alert on multiple datasets
Iteration No.3
Choose an Owner / Receiver to be Notified
Now was the time to approach another step from the original concept - assigning different receivers for different alerts. Until this point, it was possible to set only one receiver for triggered alerts. The ability to choose different receivers per alert rule was a deal-breaker. Different people inside the same team or company were responsible for different pipelines, and they wanted to get only the triggered alerts relevant to them.
We took the opportunity to build this feature on top of a request from an important client: adding an integration for the PagerDuty receiver.
The experience was supposed to be very short and simple:
Select one of the receivers that are integrated with the system - Slack, Email, or PagerDuty.
Choose the email address, Slack channel, or contact person to be notified.
Now users can set the receiver in the context of a specific alert
Iteration No.4
Simplify the Experience for Assigning a Receiver
We quickly learned that the UI for selecting a receiver was really hard to understand, so customers ended up not using it. We updated the experience into a list that exposes all possible receivers from the start, so the user only needs to choose which receiver they want notified.
Now it is possible to see all integrated receivers from the start
Iteration No.5
Creating an Alert on Multiple Assets
After introducing the ability to set a data quality alert on multiple datasets, customers asked for the same capability for the 3 most popular pipeline alerts: Run State, Run Duration & Schema Change.
We were looking for a quick way to validate this and considered adding multiple selection to the old alert editor. However, the old editor supported setting an alert on specific tasks inside pipelines, which raised huge product and UX challenges that would at least double our scope to solve.
Previous pipeline alert experience - limited to single asset selection
We knew that our future roadmap included solving the complexity of setting alerts on entire pipelines vs inner tasks. So for this iteration, we decided to add multiple selection for those 3 alerts, using the new experience pattern. This would give us a quick solution that we could validate with users.
The 3 most popular and useful alerts, now supporting selecting multiple assets
Iteration No.6
Users are Creating Duplicated Alerts
Our customer success team started to notice that, for some reason, customers were creating duplicated alerts. Talking to them raised a few problems:
They were receiving a lot of spam alerts, so they ended up ignoring some of the alerts or turning them off.
Users didn't know whether they or someone else had already set up an alert on a specific asset.
We realized that the problem was in our experience, which allowed users to create similar alerts without warning them that they were about to do so.
An alert is considered a duplicate if it is similar to an existing alert in 3 parameters:
Alert type
Condition
Origin of assets
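The three-parameter rule can be sketched as a simple comparison. This is an illustrative sketch, not the product's logic: the `AlertDefinition` type and its field names are hypothetical, "origin of assets" is modeled here as the set of targeted assets, and "similarity" is simplified to exact equality on all three parameters.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class AlertDefinition:
    """Hypothetical alert record; field names are illustrative only."""
    alert_type: str
    condition: str
    asset_origin: frozenset  # assumption: the set of assets the alert targets


def find_duplicates(new_alert, existing_alerts):
    """Return existing alerts that match on type, condition, and asset origin."""
    return [
        alert
        for alert in existing_alerts
        if alert.alert_type == new_alert.alert_type
        and alert.condition == new_alert.condition
        and alert.asset_origin == new_alert.asset_origin
    ]


existing = [
    AlertDefinition("run_duration", "> 30m", frozenset({"daily_etl"})),
    AlertDefinition("run_state", "== failed", frozenset({"daily_etl"})),
]
new_alert = AlertDefinition("run_duration", "> 30m", frozenset({"daily_etl"}))

duplicates = find_duplicates(new_alert, existing)
if duplicates:
    print(f"Warning: {len(duplicates)} similar alert(s) already exist")
```

Returning the matching alerts, rather than a boolean, is what lets a warning screen show details about the alerts that already exist.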
The solution
When we recognize that the user is setting an alert similar to an existing one, we show a validation screen as part of the create alert flow. It warns the user that they are about to create a duplicated alert and presents details about the existing similar alerts.
Duplicated alert validation screen - appears only in case of similar alerts
Iteration No.7 (WIP)
Creating an Alert for Multiple Assets
Lately, we have been working on unifying the entire alert experience and tackling more of the major problems in alert creation.
Product requirements:
All alerts on pipelines should support the complexity of setting alerts on the entire pipeline vs alerts on inner tasks.
All alerts should be applied to single or multiple assets.
UX requirements:
Users should understand what each alert is supposed to test, with 1 or more examples given.
1 coherent experience for all alerts - Today it includes 3 different experiences with different capabilities.
Old experience is confusing, not clear, and not friendly to use.
The conversation UX pattern is not optimized for the localization of different languages.
Assumptions we had:
The user wishes to filter according to source or project, and possibly both together.
It is important that the user knows from the start whether they are setting an alert on the entire pipeline or at the task level.
I created low-fidelity mockups so that the product manager and I could explore and refine different ideas and use cases, and collect feedback from customers. Each version starts with assumptions that we validate with customers.
You can see a prototype example for one of the latest concepts. Note: This version is still in the ideation phase.
Starting the wizard from a gallery that exposes and explains all possible alerts
Alert define step - where the user creates the rule
Usability feedback
We tested this version with our customers and gathered both positive and negative insights.
The good feedback:
People love the Summary part: "Setting an alert can be complicated and it is good I can see it all the time"
The gallery explains each alert, and users liked that it includes some use case examples.
Less good insights:
Sometimes people miss that they need to choose whether to set the alert on the entire pipeline or on inner tasks.
Even when users notice this option, they don't always understand the difference.
Filtering pipeline assets according to source and project together is confusing. It seems this is not a useful scenario.
Adopting the feedback
After the testing, we understood that we needed to slightly update our assumptions:
If the user wishes to filter according to source or project, they probably wish to create alerts at the level of pipelines, or on any tasks of one or more pipelines.
The user would probably filter according to projects or sources, but not both of them.
Setting task alerts is derived from the context of a specific pipeline.
UX changes:
The user is asked to filter from the start according to source, project, or pipeline.
Then the user is asked about asset selection.
Applying an alert to a specific task is an option only when a specific pipeline is selected. The user needs to toggle on the options related to the tasks.
It is not possible to cross-filter according to source & project together.
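The rules above can be expressed as a small validation step. This is a sketch under stated assumptions, not the shipped logic: the `AssetFilter` type and `validate_filter` function are hypothetical, and requiring exactly one of source, project, or pipeline is my reading of the rules (the text only rules out combining source and project explicitly).

```python
from dataclasses import dataclass, field
from typing import List, Optional


@dataclass
class AssetFilter:
    """Hypothetical filter chosen at the start of the wizard."""
    source: Optional[str] = None
    project: Optional[str] = None
    pipeline: Optional[str] = None
    tasks: List[str] = field(default_factory=list)


def validate_filter(f):
    """Return a list of rule violations; an empty list means the filter is valid."""
    errors = []
    chosen = [v for v in (f.source, f.project, f.pipeline) if v is not None]
    # Assumption: exactly one filter dimension is selected at a time.
    if len(chosen) != 1:
        errors.append("choose exactly one of source, project, or pipeline")
    # Task-level targeting only makes sense in the context of one pipeline.
    if f.tasks and f.pipeline is None:
        errors.append("tasks can be targeted only when a specific pipeline is selected")
    return errors


print(validate_filter(AssetFilter(pipeline="daily_etl", tasks=["load"])))
print(validate_filter(AssetFilter(source="airflow", project="billing")))
```

Collecting all violations instead of stopping at the first one lets the UI show every problem with the current selection at once.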
In this version it is more explicit that filtering is according to pipelines, sources or projects
Summary
What Have We Done So Far?
Designed a new editor and kept improving it.
Added 2 more alerts for monitoring data quality: Data Delay and column-level quality check.
Added the option to set the receiver specific to each alert definition.
Managed to decrease the creation of useless monitors by warning users when creating duplicated alerts.