Today's big problem with the data domain is that we have an explosion of tools, but very few End-2-End solutions, especially turnkey or out-of-the-box solutions. Though we may understand and experience the utility of a particular tool or quantify its value somehow, once this tool is part of a bigger "Picture" integrated with other tools in a solution, this utility and value might fade away or become less relevant.
So far in my professional career, no one has been able to answer this "simple" question:
"What is the value of your data solution?"
All the discussions around this subject are about CAPEX, OPEX, and TCO, which are valid points but stressful for the CTO, CIO, Enterprise Architects, and other essential stakeholders.
Failing to answer this question in a few simple words may lead to different results:
- Your data program is poorly financed or, worse, canceled: "data is important for us, but we have other priorities, like increasing revenue and reducing costs!"
- Your small data team cannot grow "because we have a lot of people doing the same stuff in Excel."
- Your fancy "Modern Data Stack" solution is put on hold because "it will generate more technical debt."
- Your data governance initiative is a "must" only "because of the GDPR scarecrow."
If you have an answer to this question, and it can be expressed in a straightforward sentence, please send your comments. I'm always open to learning.
If you are not yet bored with my minor surgical incision into the most wanted worldwide data unknown, bear with me for a few minutes as I go into detail and tell you a story from my consultancy career.
But first, a brief background on what a data solution is about for everyone to understand.
An End-2-End data solution is usually built from several components: a data extraction and integration component, a data repository component, a data transformation component, a data visualization component, a data science component, a data management component, and so forth.
Some of these are seen as platforms fulfilling multiple capabilities. Some are dedicated software fulfilling one or at most two capabilities, and some are just plugins or libraries that sit on top of, or integrate with, the components above.
I will call them "tools" in this article for simplicity, though I feel that some of you reading this will disagree with me.
According to a study made by Gartner a couple of years ago, an End-2-End data solution is built from at least 10 tools. Putting all these puzzle pieces together is always challenging for companies.
Companies invest a lot of resources in it, 2-3 years or more, and rarely see the real benefits and the value associated with this investment.
Now, back to the story
I was working for a big client with thousands of employees. The goal was to optimize and rationalize their current data solution landscape and prepare it to evolve and support the company's digital transformation journey.
I started to work, and after one month, we were able to identify all the data tools in the company. We found not 10, not 20 but 111 data tools this company was using or "somehow using!" A magic number, I may say!
I informed my boss, and he told me:
"Adrian, let's put all of them in a big picture on our wall to scare people away when they enter our office!"
And this is how the "Picture Data Solution" concept was born.
We went to my boss's boss with the "Picture Data Solution" printed out (double A0 format). After four hours of discussion - where some harsh words were invoked that I cannot describe here - the final decision was to find a way to simplify, optimize and standardize the current data solution landscape.
For the next three months, I was the "Picasso" of the enterprise diagramming tools, struggling to develop the ideal solution for my client. Version 27 was the lucky one, where we reduced the 111 data tools to 23.
My way of approaching this problem was using TOGAF enterprise architecture standards. I took each tool, decomposed it into application components and functions, mapped them to different business processes and capabilities, and at the end, I built a matrix where we could spot the overlapping ones.
In this case, we had, for example:
- 5 data extraction, transformation, and integration tools (ETL tools),
- 12 relational databases,
- 8 data visualization tools.
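The overlap matrix described above can be sketched in a few lines of Python. This is only an illustration of the idea, not the TOGAF artifacts we actually produced: the tool names and capability labels are hypothetical placeholders, and a real inventory would come from the architecture repository.

```python
from collections import defaultdict

# Hypothetical inventory: tool name -> set of capabilities it provides.
TOOL_CAPABILITIES = {
    "etl_tool_a": {"extraction", "transformation", "integration"},
    "etl_tool_b": {"extraction", "integration"},
    "warehouse_x": {"storage"},
    "warehouse_y": {"storage"},
    "dashboard_z": {"visualization"},
}


def capability_matrix(tools):
    """Invert tool -> capabilities into capability -> sorted list of tools."""
    matrix = defaultdict(list)
    for tool, capabilities in tools.items():
        for capability in capabilities:
            matrix[capability].append(tool)
    return {cap: sorted(names) for cap, names in matrix.items()}


def overlaps(matrix):
    """Keep only the capabilities covered by more than one tool."""
    return {cap: names for cap, names in matrix.items() if len(names) > 1}


if __name__ == "__main__":
    matrix = capability_matrix(TOOL_CAPABILITIES)
    for capability, names in sorted(overlaps(matrix).items()):
        print(f"{capability}: {', '.join(names)}")
```

Every capability listed by `overlaps` is a candidate for consolidation; capabilities served by a single tool do not appear.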
From discussions with users, we learned that some of the tools were not used at all, some were used a couple of times per year or per month, and only a few were used in day-to-day operations.
That was the basis for Version 27 of the Picture Data Solution.
The big day came, and we presented these findings at the SteerCo meeting to all the essential stakeholders in the company. This was the first time they used the words "transparency" and "clarity" in relation to data solution costs, as in the meantime we had managed to assign some financial metrics to our picture.
The other outcome of this meeting was kind of unexpected for me, or I was just not prepared at that time to give a response.
When you have thousands of employees working with some of these 111 data tools, what is the best way to tell them that in 12 to 24 months, they will use only 23 of them?
You have the "lucky ones" that already work with these tools, and hopefully, there is no problem here, but you have the "unlucky ones" that clearly will have a problem.
- Do we need to train them? How fast will they learn?
- Do we need to move them to different positions or re-assign departments?
- Fire them? Or will they quit their jobs before we fire them?
From the technology perspective:
- What is the impact of decommissioning 88 tools, given their dependencies on the remaining 23 tools? Which ones should go first? Will the world end?
- Can we replace "n" existing tools with one tool? Will we have the same capabilities? How about usability?
- How about the new fancy data tools from the market popping up daily on ProductHunt?
- Can we integrate them into the existing "Picture Data Solution"?
- My favorite: Can we find the magical tool(s) that solves our existing problems?
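The "which ones go first" question can at least be framed as a dependency-graph problem. The sketch below is a minimal illustration, assuming you already have a map of which tool depends on which; all tool names are hypothetical. It uses `graphlib.TopologicalSorter` from the Python standard library (Python 3.9+).

```python
from graphlib import TopologicalSorter

# Hypothetical dependency map: tool -> tools it depends on.
DEPENDS_ON = {
    "sales_dashboard": {"legacy_etl"},
    "legacy_etl": {"legacy_db"},
    "legacy_db": set(),
    "core_warehouse": {"legacy_db"},  # a tool we intend to keep
}


def blockers(depends_on, retire):
    """Return kept tools that still depend on a tool marked for retirement."""
    return {
        tool: deps & retire
        for tool, deps in depends_on.items()
        if tool not in retire and deps & retire
    }


def retirement_order(depends_on, retire):
    """Order retirements so each dependent goes before what it depends on."""
    # Restrict the graph to the tools being retired.
    subgraph = {tool: depends_on.get(tool, set()) & retire for tool in retire}
    # static_order() lists dependencies first; retirement is the reverse.
    build_order = list(TopologicalSorter(subgraph).static_order())
    return build_order[::-1]


if __name__ == "__main__":
    retire = {"sales_dashboard", "legacy_etl", "legacy_db"}
    print("Retire in this order:", retirement_order(DEPENDS_ON, retire))
    print("Must be migrated first:", blockers(DEPENDS_ON, retire))
```

The `blockers` result is the uncomfortable part: any kept tool that shows up there has to be migrated before the tool it relies on can be switched off.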
These are sensitive questions with no simple answers, I'm afraid. But any company out there will have to be prepared with answers at some point in time. Sooner is better!
Now, back to the present day
Matt Turck's latest Data & AI landscape was just released, and I was talking with my friend Dragos, CTO of DataStema, about the future of the data landscape and the challenges companies face with the new wave of technologies.
We concluded with a couple of points and "rhetorical" questions which might be practical, philosophical, or "crazy" for this present time:
- We have a lot of technologies but very few solutions.
- Most companies will buy tools from the market, but these tools will be part of their solution: current or future.
- What is the best and fastest way for companies to test and validate these new tools?
- Or better, test and validate them with the existing solution to see how they integrate into the landscape?
- Do we need to buy data tools anymore, for the next 2-3 years, or just rent them on-demand when we have a use case?
- With the advancement of DevOps, Cloud Native Services, and Infrastructure as Code, this is possible today. Not with all the tools, but this is the way.
Imagine the following scenario:
- You come with your use-case,
- Choose different Data Solution Blueprints© to test your use-case with various tools and combinations,
- Deploy them in minutes, with a few clicks, on your cloud provider of choice,
- Run your experiments and validation,
- Select the Data Solution Blueprint that works well for your use-case,
- Keep it running until it produces value for the company,
- Save the outcome data to your "Data Temple" and in the end,
- Terminate the environment.
In this case, you keep fixed only the core, business-critical components of your data solution. I call this the "Data Temple", an architecture pattern I will detail in future articles.
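The deploy, run, save, terminate lifecycle of the scenario above can be sketched as a Python context manager. This is a simulation under stated assumptions, not a real deployment: the blueprint name, the `temple` list standing in for the "Data Temple", and the log entries are all hypothetical, and no cloud provider API is called.

```python
from contextlib import contextmanager


@contextmanager
def ephemeral_environment(blueprint, log):
    """Simulate deploying a blueprint and always tear it down afterwards."""
    log.append(f"deploy:{blueprint}")
    try:
        yield blueprint
    finally:
        # Runs even if the experiment raises, so nothing is left running.
        log.append(f"terminate:{blueprint}")


def run_use_case(blueprint, use_case, temple, log):
    """Run one experiment and persist its outcome before the teardown."""
    with ephemeral_environment(blueprint, log) as env:
        result = {"blueprint": env, "use_case": use_case, "status": "validated"}
        temple.append(result)  # the persistent core outlives the environment
        log.append(f"saved:{blueprint}")
    return result


if __name__ == "__main__":
    temple, log = [], []
    run_use_case("blueprint_a", "churn_analysis", temple, log)
    print(log)
    print(temple)
```

The point of the `finally` clause is the cost argument: the rented environment is terminated no matter how the experiment ends, while the outcome data stays in the fixed core.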
- Do you think this will improve your cost metrics?
- How simplified and transparent will your "Picture Data Solution" be in this scenario?
- How will companies know how to combine these tools into data solutions that work?
The last point triggered us to build the Data Platform Generator website to present the "Picture Data Solution" concept.
Our idea was simple:
- Let's take some technologies from the MAD Landscape and generate simple visual blueprints to inspire other data aficionados in the design process of a data platform.
- Also, expose them to different technologies that exist on the market, which could be good alternatives to already better-known technologies.
- Create a generator to display various "Picture Data Solutions" - there are unlimited combinations. Can you discover all of them?
- While we were working on the project, Mirela, our DataStema COO, told us about Scott Brinker's Martech landscape, and we lost our breath for a few seconds. Can we integrate the MAD & Martech landscapes? Do you like M&M?
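To give a feel for how quickly the combinations pile up, here is a minimal sketch of such a generator. It is not the code behind our website; the categories and tool names are hypothetical placeholders, and it assumes exactly one tool is picked per category.

```python
import itertools
import math

# Hypothetical catalog: category -> candidate tools.
CATALOG = {
    "ingestion": ["ingest_a", "ingest_b"],
    "storage": ["store_a", "store_b", "store_c"],
    "visualization": ["viz_a", "viz_b"],
}


def blueprints(catalog):
    """Yield one blueprint (category -> tool) per combination of choices."""
    categories = list(catalog)
    for combo in itertools.product(*(catalog[c] for c in categories)):
        yield dict(zip(categories, combo))


def blueprint_count(catalog):
    """Number of blueprints: the product of the options in each category."""
    return math.prod(len(options) for options in catalog.values())


if __name__ == "__main__":
    print(blueprint_count(CATALOG), "possible blueprints")
    for blueprint in blueprints(CATALOG):
        print(blueprint)
```

With only three categories the catalog above already yields 2 x 3 x 2 = 12 blueprints. The count multiplies with every category added, which is why the number of pictures feels unlimited even though it is finite.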
We officially launched our page today, though we ran into some design issues on mobile devices. We are working hard to fix this in the next version of the site.
Are these combinations valid?
We don't know. We are not experts in all the data tools, but if you are, share your knowledge with the world by submitting your comments.
We aim to familiarize you with the "Picture Data Solutions" concept or, in more commercial terms, the Data Solution Blueprints©.
Have fun, be inspired, and we hope you will enjoy it!