Case Study: Data & Analytics
SITUATION & BUSINESS CHALLENGE
A large American retail chain with a massive data store was having difficulty locating specific data on demand. With nearly 2 billion data assets stored across its enterprise, the company’s IT analytics department took anywhere from two or three days to a month to fulfill employee data requests, often tracking down the data through email requests or other ad hoc processes.
To address the problem, the analytics department began to use a MySQL relational database as a catalog to store metadata ingested from Oracle, IBM DB2, Domo, Tableau, MicroStrategy, and other sources. A crawler would locate data synchronously and store it in the MySQL database. Although the database was intended to be used with Elasticsearch, it was not distributed or scalable enough to work well with Elasticsearch, and it did not adequately meet the company’s needs.
The analytics team had some high-level architecture and application knowledge but lacked the resources and expertise to develop a scalable, production-ready application involving Elasticsearch. After six months of development, the project stalled because the application could not keep up with ingesting the millions of data sets and data attributes required of it, and project leaders looked for help. They found the expertise and clarity they needed in AIM Consulting.
SOLUTION
Two consultants from AIM Consulting’s Data & Analytics (D&A) practice joined the analytics team and provided practical design and development leadership to reengineer the application into a complete data portal that instantly fulfills employee data requests. The data portal gives visibility into the company’s massive, dispersed data store and provides the following:
- Immediate discovery of the data needed to answer business questions
- Less duplication of reports from more intelligent indexing
- Cleaner data with increased levels of trust
With the new data portal, it now takes less than a second to find the data that took days or weeks to locate in the past.
AIM Consulting began by presenting proof-of-concept plans for several technologies that could meet the requirements of the new data portal, helping project leaders select the technology that best fit their infrastructure. AIM recommended an open-source, scalable solution consisting of the NoSQL database MongoDB for storing and querying data; Elasticsearch for indexing, searching, and analyzing data; and RabbitMQ, a widely used open-source message broker, for communication among the solution components and systems. The team ultimately selected this option after proof-of-concept testing.
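As a rough illustration of how the three components might divide the work, the sketch below uses in-memory stand-ins rather than the real services: a queue plays the role of the RabbitMQ message broker, a dictionary plays the MongoDB document store, and a small inverted index plays Elasticsearch. Every name in it (crawl, ingest_worker, search, the sample assets) is hypothetical and is not taken from the client’s application.

```python
import queue
from collections import defaultdict

# In-memory stand-ins (assumptions for illustration, not the real services).
broker = queue.Queue()            # plays the role of a RabbitMQ queue
document_store = {}               # plays the role of a MongoDB collection
search_index = defaultdict(set)   # plays the role of an Elasticsearch index


def crawl(assets):
    """Publish metadata messages instead of writing to storage directly."""
    for asset in assets:
        broker.put(asset)


def ingest_worker():
    """Consume messages, store each document, and index its searchable terms."""
    while not broker.empty():
        asset = broker.get()
        document_store[asset["id"]] = asset
        text = f'{asset["name"]} {asset["description"]}'
        for term in text.lower().split():
            search_index[term].add(asset["id"])
        broker.task_done()


def search(term):
    """Return stored documents whose name or description has the term as a whole word."""
    matching_ids = sorted(search_index.get(term.lower(), set()))
    return [document_store[asset_id] for asset_id in matching_ids]


crawl([
    {"id": "a1", "name": "Weekly sales report",
     "description": "Tableau dashboard", "source": "Tableau"},
    {"id": "a2", "name": "Store inventory",
     "description": "Oracle table of stock levels", "source": "Oracle"},
])
ingest_worker()
print([doc["id"] for doc in search("sales")])  # ['a1']
```

Because the crawler only publishes messages, ingestion workers can be added or restarted without changing the crawler, which is the kind of decoupling a message broker is meant to provide.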
Because the existing application code did not perform to new standards, the team decided it was best to rewrite the entire application from scratch. AIM helped the team reevaluate its technology infrastructure before reengineering the project to ensure it would be sufficient to host the new solution.
AIM also helped the client team with error handling and data cleaning measures, and guided the team on intelligent search methods and on indexing data in Elasticsearch. For the data portal to perform efficiently, data must be indexed so that the tool can quickly determine where each asset comes from and where it goes. This notion of data lineage was new to the team.
Through these methods, AIM’s leadership on the project helped to ensure that the data ingested and exposed by the data portal is high quality and trustworthy. Additionally, smarter data tagging can help employees discover reports and other data more easily, reducing the chance of creating a report that already exists in the system.
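To make the idea of data lineage concrete, here is a minimal sketch that assumes lineage is recorded as (source, destination) pairs. The asset names and the trace_sources helper are hypothetical; the actual portal’s data model is not described in this case study.

```python
from collections import defaultdict

# Each edge records that data flows from a source asset to a destination asset.
# The asset names are made up for illustration.
lineage_edges = [
    ("oracle.sales_orders", "warehouse.daily_sales"),
    ("db2.store_master", "warehouse.daily_sales"),
    ("warehouse.daily_sales", "tableau.weekly_sales_report"),
]

# Index the edges by destination so upstream lookups do not scan every edge.
upstream = defaultdict(set)
for source, destination in lineage_edges:
    upstream[destination].add(source)


def trace_sources(asset):
    """Return every asset that feeds into `asset`, directly or indirectly."""
    found = set()
    stack = [asset]
    while stack:
        current = stack.pop()
        for parent in upstream.get(current, set()):
            if parent not in found:   # also guards against cycles
                found.add(parent)
                stack.append(parent)
    return found


print(sorted(trace_sources("tableau.weekly_sales_report")))
# ['db2.store_master', 'oracle.sales_orders', 'warehouse.daily_sales']
```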
With the new data portal, employees across the company can not only locate the data and reports they need but also tag data themselves to increase its discoverability for future users. They can also set data permissions and search data lineages, another way to determine whether duplicate data might exist in multiple databases.
AIM Consulting added further engineering value by helping to streamline numerous application development processes on the analytics team in the areas of code management, deployment, testing, and maintenance.
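The sketch below shows one way user tagging and a lineage-based duplicate check could work together. It is an assumption-laden illustration: the asset records, add_tag, find_by_tag, and duplicate_candidates are all hypothetical names, and treating "same set of sources" as a duplicate signal is a simplification chosen for the example.

```python
from collections import defaultdict

# Hypothetical catalog entries; "sources" holds each asset's direct lineage.
assets = {
    "r1": {"name": "Weekly sales report", "sources": {"warehouse.daily_sales"}, "tags": set()},
    "r2": {"name": "Sales by week", "sources": {"warehouse.daily_sales"}, "tags": set()},
    "r3": {"name": "Inventory aging", "sources": {"warehouse.stock_levels"}, "tags": set()},
}


def add_tag(asset_id, tag):
    """Attach a user-supplied tag, normalized to lowercase."""
    assets[asset_id]["tags"].add(tag.lower())


def find_by_tag(tag):
    """Return the IDs of all assets carrying the tag (case-insensitive)."""
    return sorted(a for a, meta in assets.items() if tag.lower() in meta["tags"])


def duplicate_candidates():
    """Group assets built from the same set of sources; groups of two or more deserve review."""
    groups = defaultdict(list)
    for asset_id, meta in assets.items():
        groups[frozenset(meta["sources"])].append(asset_id)
    return [sorted(ids) for ids in groups.values() if len(ids) > 1]


add_tag("r1", "Sales")
add_tag("r2", "sales")
print(find_by_tag("sales"))     # ['r1', 'r2']
print(duplicate_candidates())   # [['r1', 'r2']]
```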
RESULTS
Around 70 searches per day were performed through the data portal when it first went live; after only a few months, the number topped 1,000 as its popularity increased throughout the company. The analytics team is communicating with more teams across the company for access to their data to include in the portal.
Project leaders have received tremendously positive feedback on the portal, and they were so delighted with AIM Consulting’s work that they asked the consultants to remain on the team to help add more features and functionality. Among the enhancements to come are expanded data lineage capabilities, a rules engine for defining business rules and changing data based on those rules, and SLA permissions.
Most importantly, AIM Consulting helped the company quickly bring to life the vast amount of data dispersed throughout the organization, giving it the ability to leverage those assets in the way it had always wanted.