Exploratory Data Analysis (EDA) is the task of reviewing and profiling the client’s data, providing a list of findings on data quality, anomalies, and issues, and determining the best possible solutions for getting better data from the different sources in the future. Historically, it has been done with basic tools such as MS Excel, free-hand SQL, SAS frequency reports, Informatica Data Quality, etc.
At CX Driven, Interactive EDA (i-EDA) is a more robust, multi-step exercise, powered by one of the most effective column-based data discovery and visualization (DDV) applications on the market (Alterian). Alterian is mainly known as a campaign management suite (AMS/IMS) and as one of the leading tools for real-time CX; however, we use only its DDV capabilities for this task. The same exercise can be done by accessing the client’s data remotely on their DB platforms, but the task will take 5-10 times longer to complete.
Once the data is received from our client (several subject areas, each of which can be 1-5 different text files), it is loaded into an Alterian mart (highly indexed, column-based technology). We then explore the data to verify the keys for each file/table/subject area (surrogate keys, natural keys, alternate keys), the list of valid and invalid values for each field, data anomalies, the data transformations required to standardize each data element, data issues by source, the correlation between ID/Code/Description fields, foreign key relationships (to other subject areas), data types and the consistency of formatting, date and flag fields, and much more. The more data we receive from the client, the more effective the exercise is. We recommend full dimension and lookup files, and perhaps up to 100M records for the fact tables in the source system.
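To make a few of these checks concrete, here is a minimal Python sketch of three of them: key verification, value frequencies for a field, and a foreign key check between two subject areas. It is an illustration only and not how Alterian performs the work; the function names, the `customers` and `orders` samples, and their column names are all hypothetical.

```python
from collections import Counter


def is_candidate_key(rows, columns):
    """Return True if the columns are always populated and unique across rows."""
    seen = set()
    for row in rows:
        key = tuple(row.get(c) for c in columns)
        if any(v is None or v == "" for v in key) or key in seen:
            return False
        seen.add(key)
    return True


def value_frequencies(rows, column):
    """Count each distinct value in a column to expose invalid or inconsistent values."""
    return Counter(row.get(column) for row in rows)


def orphan_keys(child_rows, child_col, parent_rows, parent_col):
    """Return child values that have no matching value in the parent subject area."""
    parent_values = {row.get(parent_col) for row in parent_rows}
    child_values = {row.get(child_col) for row in child_rows}
    return sorted(child_values - parent_values, key=str)


# Hypothetical sample data standing in for two subject areas.
customers = [
    {"customer_id": "C1", "state": "NY"},
    {"customer_id": "C2", "state": "ny"},
    {"customer_id": "C3", "state": ""},
]
orders = [
    {"order_id": "O1", "customer_id": "C1"},
    {"order_id": "O2", "customer_id": "C9"},
    {"order_id": "O3", "customer_id": "C2"},
]

key_ok = is_candidate_key(customers, ["customer_id"])
state_counts = value_frequencies(customers, "state")
orphans = orphan_keys(orders, "customer_id", customers, "customer_id")
```

In this sample, `state_counts` shows the same state recorded as `NY`, `ny`, and an empty string, which is the kind of finding that leads to a standardization rule, and `orphans` lists the order that points to a customer missing from the customer file.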
The findings for each subject area are shared with the client’s data services IT and marketing teams, along with the best practices to solve each issue/finding. This usually results in several back-and-forth sessions to finalize the permanent fixes and/or temporary ones needed to get the proper data into the future CRM platform. i-EDA on unstructured data usually takes 10-20% more time than on structured data.
During the i-EDA, a logical data model is developed based on the data received, along with a detailed data dictionary document containing data element definitions and example lists of values (valid/invalid), so that the marketing team and future end-user teams become very familiar with the data structures and the data available to support their current and future needs. This step is conducted using ERwin Data Modeler. One or two interactive sessions are conducted with the client’s IT and marketing teams to walk them through the logical model, the relationships, and how to navigate from one subject area to the next. This usually turns out to be the first time the client’s users get to fully visualize their own logical data structures and learn which business cases they can and can’t produce in the future data mart once it is built.
“I did not know we have such data to drive our new marketing initiatives”
“I knew our data is bad but not that bad”
“The ability for my marketing team to visualize the data structures is priceless. Our campaign strategy team is happy”
“You’re telling me I will be able to do X Y Z once the campaign mart/platform is built!! This is great exercise”
“Knowing most of the data anomalies and issues will be identified and fixed makes me very comfortable. My campaign users will focus on execution instead of troubleshooting data issues during campaign build and testing”
The findings from the i-EDA exercise often drive many aspects of the design document, especially the source-to-target (S2T) mapping, where transformation rules must be enforced if the client’s IT decides NOT to implement the recommended fixes permanently in the source system or in the client’s EDW/CDP.
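As a rough illustration of what enforcing transformation rules in an S2T mapping can look like, the Python sketch below maps each target field to a source field and a standardization function. It is a simplified assumption for illustration, not our actual mapping format; the field names (`EMAIL_OPTIN_FLG`, `SIGNUP_DT`, `STATE`), the accepted flag spellings, and the date formats are all hypothetical.

```python
from datetime import datetime


def standardize_flag(value):
    """Map common yes/no spellings to 'Y' or 'N'; anything else becomes None."""
    text = (value or "").strip().upper()
    if text in {"Y", "YES", "1", "TRUE"}:
        return "Y"
    if text in {"N", "NO", "0", "FALSE"}:
        return "N"
    return None


def standardize_date(value):
    """Parse a date in one of the assumed source formats into ISO format (YYYY-MM-DD)."""
    text = (value or "").strip()
    for fmt in ("%Y-%m-%d", "%m/%d/%Y"):
        try:
            return datetime.strptime(text, fmt).date().isoformat()
        except ValueError:
            continue
    return None


def standardize_code(value):
    """Trim and upper-case a code field; empty values become None."""
    return (value or "").strip().upper() or None


# Hypothetical S2T mapping: target field -> (source field, transformation rule).
S2T_RULES = {
    "email_opt_in": ("EMAIL_OPTIN_FLG", standardize_flag),
    "signup_date": ("SIGNUP_DT", standardize_date),
    "state_code": ("STATE", standardize_code),
}


def apply_s2t(source_row, rules=S2T_RULES):
    """Build a target row by applying each rule to its source field."""
    return {
        target: transform(source_row.get(source))
        for target, (source, transform) in rules.items()
    }


target_row = apply_s2t(
    {"EMAIL_OPTIN_FLG": " yes ", "SIGNUP_DT": "03/15/2021", "STATE": "ny"}
)
```

Keeping each rule as a small function tied to one target field makes it easy to retire a rule later if the client’s IT fixes the issue at the source.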
This exercise is essential to the success of any campaign management implementation for any CRM suite, and it is a step that we don’t skip at CX Driven. Reach out to us at email@example.com to start the conversation and find out how we can help your business.
We would love to find out more about your needs and challenges. We focus on technology without the common distractions of media, creative, social, and content development. We’ve made the conscious decision to avoid distraction and pursue one thing and do it really well. We would love to join your family of agencies or in-house teams on a collective quest for innovation and excellence.