Do you analyze data? Whether you work in Sales, Finance, Manufacturing, Logistics, Engineering or Customer Service, there is a constant need to analyze data.
Ever since the industrial age and the conveyor belt, gaining efficiency and reducing errors has been a constant battle. We need more and better ways to analyze data. Today, billions of new data points are gathered every minute. There is more storage available in the cloud, and no dearth of computers powerful enough to process it quickly.
Yet, there is a lack of know-how, a lack of understanding of how to analyze data. As a result, many companies are still not able to tap into the promise of Big Data, Artificial Intelligence, the Internet of Things, or Machine Learning.
To understand the state of data analysis, and to look at solutions, we need to understand a few fundamentals about data and data analysis.
Why Do We Want To Analyze Data?
Of course there are a gazillion reasons.
No… not really. The need to analyze data is rooted in only a few basic reasons.
- To Gain Insights about our data. Why?
- So that we can find out what works, and what does not. We can compare past results with another company, another product, or another period, and see if there is some insight to glean.
- When we find out what works, and where the problem lies, we can then perform root cause analysis, which tells us the real problem to fix. When we have a fever, we can take a tablet of Paracetamol (Tylenol/Panadol). However, the fever is not the disease. It is a symptom that develops when something is not right with our body, and the body is fighting off bad bacteria or a virus. During this fight, the temperature rises. The Paracetamol can reduce the fever, but it does not end the fight. To get to the root cause of the problem, a visit to the doctor is required. The doctor may analyze the problem by looking at your blood pressure, heart rate, X-rays, probes, etc. and figure out the issue. If it is indeed bad bacteria causing some area to swell, the doctor can prescribe a course of antibiotics, which fights the bad guys and treats the real problem. Once the bad guys are gone, the fever automatically comes back down to normal.
- Make Better & Faster Decisions
- Based on the insights, we are able to make better decisions, and faster too. Backed by data, we can have more confidence in our decision-making process… rather than relying on gut feeling alone.
- With statistical methods for analyzing data and forecasting trends, the analysis and insights get sharper as we generate better analytics. We can then make more informed decisions, faster.
How Do We Analyze Data: The Traditional Data Analysis Process
The data analysis process is simple, and we have been following it for so long that it has almost become a habit… albeit not a good one. What we usually do is outlined below:
- Get the Raw Data File From The Source System
- This could be a text file, a Comma Separated Values (CSV) file, a PDF file, a SQL data dump, or a Microsoft Access or other RDBMS source (think Oracle, Microsoft SQL Server, or MySQL).
- Load into Excel
- Load this data into Excel. This is easily achieved, but it is a manual process that has to be repeated every day, week, or month, whenever the data changes or new data arrives.
- Incoming data is seldom clean. There will be missing values, duplicates, wrong headers, dates stuck in YYYYMMDD or some other odd text or numeric layout, and other data issues to fix.
- To do this, you’ll have to write some simple formulas and functions, run data cleanup steps, fix dates, and fill blanks or nulls with appropriate values. This is all a manual process, unless automated with the help of macros.
- Lookup Master Data
- Once the data is cleaned up, we need to replace codes with values, and get the correct product price, employee salary, or date of manufacture from master data tables. This often requires expertise with Excel formulas like VLOOKUP and HLOOKUP. You may have to do some interim calculations too, create some calculated columns, and build a wide table with all the fields/columns required for the analysis in the next step.
- Create Pivot Tables
- One of the quickest ways to analyze the data is to create a Pivot Table. Pivot Tables help you summarize data (sums, counts, differences, cumulative totals, etc.) split by rows and columns. Further, you can create multiple filters or slices using the Filters or Slicers options.
- Keep Refreshing
- By default, a Pivot Table does not refresh automatically. You have to either set up this option, or refresh manually. And even then, it may not pick up newly added data, because the source data range may be fixed to an absolute range. Updating that range every time is an extra step you have to remember to do.
- Create Charts
- To present the data, we mostly rely on the popular bar charts, column charts, and pie charts, and the newly introduced map, funnel, or treemap charts. However, charts often get too busy, and there are not many options to customize them. Once the charts are ready, you don’t want to send the entire Excel file along with them, yet sending only the charts, without the underlying data, is not an option either. This brings us to the next point…
- Paste Charts Into PowerPoint
- PowerPoint is the darling of the corporate boardroom. No presentation is ever complete without a mega slideshow. So people use Excel to analyze the data, and then paste the charts into PowerPoint. They create hundreds of slides, and then marvel at their slide deck. Finally, they are ready for the big presentation to the directors.
- Present to Management using PPT Slides
- The PowerPoint show begins, and your audience is thrilled by your analysis, presentation and charts. You are secretly gloating over the success. Suddenly a member of the audience has a question – can you compare this quarter with the same quarter 5 years ago?
- Well… yes you can. But you don’t have the data right now. You’ll have to get back to your desk, do the analysis, and probably show them the chart at your next quarterly meetup.
- The opportunity window is so small, by the time you get back to them, they probably do not need the information any longer. You have to cut a sorry figure, and all your great work comes to nothing…
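For concreteness, the manual pipeline above (load the raw file, clean it, look up master data, pivot) can be sketched in a few lines of Python with pandas. This is only an illustration of the steps described, not anyone's actual workflow; every file, column, and table name here is invented:

```python
import io
import pandas as pd

# Stand-in for the raw CSV export from the source system
# (contents invented: YYYYMMDD text dates, a duplicate row, a blank).
raw_csv = io.StringIO(
    "order_date,product_code,qty\n"
    "20240105,P1,3\n"
    "20240105,P1,3\n"      # duplicate row
    "20240212,P2,\n"       # missing quantity
)

# Steps 1-2: load and clean -- parse YYYYMMDD dates, drop duplicates,
# fill blanks; the work otherwise done by hand in Excel each time.
sales = (
    pd.read_csv(raw_csv, dtype={"order_date": str})
    .drop_duplicates()
    .assign(order_date=lambda d: pd.to_datetime(d["order_date"],
                                                format="%Y%m%d"))
    .fillna({"qty": 0})
)

# Step 3: the VLOOKUP step is, in database terms, a join against a
# master-data table.
master = pd.DataFrame({"product_code": ["P1", "P2"],
                       "product": ["Widget", "Gadget"],
                       "price": [10.0, 25.0]})
wide = sales.merge(master, on="product_code", how="left")
wide["amount"] = wide["qty"] * wide["price"]

# Step 4: the Pivot Table summary -- total amount per product.
pivot = wide.pivot_table(index="product", values="amount", aggfunc="sum")
```

Scripted this way, the cleanup runs identically on next month's file; the manual version has to be repeated, and trusted, by hand.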
What’s wrong with this whole scenario? Let’s find out.
Problems With Traditional Approach To Data Analysis
There are a number of major problems with the traditional approach to data analysis that has been around for the past 20–30 years… pretty much since the advent of spreadsheets. To name a few:
- Excel Charts and Reports Are Not Interactive
- Traditional Excel or PowerPoint reports and charts are not interactive; they are essentially screenshots, so they don’t change dynamically. That is good in that the output is always the same and fast to display, but bad in that it can’t be reused and has to be manually refreshed each time.
- You Can’t Share Excel Files Easily
- People are worried when it comes to sharing their Excel files; they have security and privacy concerns. And an Excel file is quite fragile: a change to any path, sheet, formula, or cell can break the workings completely, rendering the file useless.
- Excel VLookups are Slow & Cumbersome To Use
- Excel files rely on VLOOKUPs to pick the correct employee salary, product description attributes, or prices. It works great, and has been a mainstay of Excel’s popularity. However, VLOOKUPs recalculate very slowly on a linked Excel file with anywhere in excess of 200,000 records. At half a million, things come to a standstill, and many times Excel will stop responding and simply crash.
- Excel Takes Time to Clean & Refresh every time
- Newly added data has to be cleaned up, which is a slow, painstaking process. And if you forget to update or refresh on time, the report shows old or wrong data and is useless. New data has to be added, and everything refreshed, each time.
- Security of Excel Data is Quite Fragile
- Sending Excel reports without the data is simply not possible, and sending them with the data raises security concerns. Any leak exposes your pricing and erodes your competitive advantage. And any broken links will render the data useless too.
- A Very Small, Finite Limit of Excel Rows
- There is a finite number of rows that can be loaded into Excel. The current limit is a million rows (1,048,576 to be exact). But in today’s world, a million rows is considered nothing. We need ways to handle many millions of rows.
- Pivot Table Limitations
- Pivot Tables are quite slow to refresh, and they do not refresh automatically. There are also very few ways to summarize the data – only sum, count, average, YTD, and so on. They lack richer calculations like Year-on-Year or Quarter-on-Quarter analysis, and they are not very good at showing overall percentages and their breakdowns.
- Wasted Time in Cleanup
- Typically, 80% of the time is spent cleaning the data, leaving only 20% to analyze it. This does not accord analysis the time and importance it requires; the ratio is completely skewed. It should be the other way round – 20% to clean, and 80% for analysis.
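As an aside, the Year-on-Year comparison that plain Pivot Tables make awkward becomes a one-line calculation once the data sits in a proper table. A minimal sketch in Python with pandas, on invented yearly totals:

```python
import pandas as pd

# Invented yearly sales totals, for illustration only.
yearly = pd.Series([200.0, 220.0, 275.0],
                   index=[2021, 2022, 2023], name="sales")

# Year-on-Year growth: each year's change relative to the year before.
yoy = yearly.pct_change()
```

Here 2022 grows 10% over 2021, and 2023 grows 25% over 2022; the first year has no predecessor, so its value is left blank.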
Is There A Solution To All These Data Analysis Woes?
With the current state of Business Intelligence tools, there is more than a glimmer of hope. The industry has finally arrived at a point where Big Data processing is becoming commonplace, and is no longer the preserve of the wealthy or the academics. Today, the common man’s BI tools are already working wonders.
- Need for Interactive System
- Today, the best-of-breed BI tools – QlikView, Tableau, and Microsoft Power BI – all offer dynamic, interactive ways to present and visualize data, in numerous ways.
- Write Once. Use Again.
- Now the steps for cleaning the data can be recorded once, and reused again and again each month with hardly any tweaks. This reduces the need for manual cleanup, improves accuracy and reliability, and lets you scale up the data analysis process and spend time on more value-added work.
- Shift Cleanup vs Analyze Ratio to 20-80.
- With the added interactivity and automation, more time can finally be spent analyzing the data. Cleanup jobs are set up in the background, where they continue to run on auto-pilot. Plus, we can now process data in real time, which drastically improves the odds of having usable, actionable data and insights.
- Load Data From Multiple Sources
- The ability to load and merge data from multiple sources is becoming much easier, and automation is turning this chore into a breeze.
- Loading data from text files, CSV, Excel, databases, XML, websites, live tickers, and on- and off-site corporate data warehouses is becoming a reality. This opens up new possibilities.
- Remove the Need for Complex, Slow VLookups
- Forget slow VLOOKUPs: you can now look up any row, any column, irrespective of the order or distribution of the data rows and columns. This opens up new options and makes merging data a simple matter. In fact, we no longer need very wide tables; we can have narrower tables (fewer columns) and taller tables (more data rows).
- Load Huge Data Volume, beyond 1 million rows of Excel
- Options like Microsoft Power Query and Microsoft Power Pivot make loading a few million rows a simple matter. And there is no need to worry about running out of space in Excel, or of the bandwidth to process a few million rows.
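The "narrower, taller tables" idea above is what BI tools call a data model with relationships: one tall fact table joined by key to small dimension tables, instead of one giant wide sheet. A minimal sketch in Python with pandas, with all table and column names invented for illustration:

```python
import pandas as pd

# Tall, narrow fact table: one row per transaction, keys only.
fact = pd.DataFrame({"cust_id": [1, 2, 1],
                     "prod_id": [10, 10, 20],
                     "amount": [50.0, 75.0, 30.0]})

# Small dimension tables, related to the fact table by key --
# playing the role of VLOOKUPs, without per-cell recalculation.
customers = pd.DataFrame({"cust_id": [1, 2],
                          "customer": ["Acme", "Globex"]})
products = pd.DataFrame({"prod_id": [10, 20],
                         "product": ["Widget", "Gadget"]})

# The joins happen once, at query time, regardless of row order.
report = (fact.merge(customers, on="cust_id")
              .merge(products, on="prod_id"))
totals = report.groupby("customer")["amount"].sum()
```

Keeping the descriptive columns in their own small tables means the fact table stays narrow no matter how many rows it grows.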
Microsoft Power BI: The Magic Wand To Vanquish All Your Data Analysis Problems
- Super Fast
- Dynamic, Interactive
- Handle Big Data With Ease
- Hundreds of Data Sources
- Build Relationships. End of VLookups
- Multiple Ways to Refresh
- Visualize First
- Generate Insights
- Share With Ease
- New Ways to Visualize – Maps, Tree Maps, Funnels, KPIs, Speedometers
- Security of Sharing
- Multi-Device Support – Web, Desktop, Mobile, Tablet – without installing any additional software
- Real Time Processing
- Multiple Dashboards
- Slice & Dice to Your Heart’s Content