Your company generates a ton of data—so much that it's essential to pare it down and only store the most relevant stats, right?
Wrong. When data warehouses were developed in 1975, a gigabyte of storage cost $200,000. Today, one GB of storage will cost you around 2 cents. With storage this cheap, companies can stop worrying about compression and focus on making sure they fully understand their data.
What does this mean for brands? Heavily “cooking” (processing) data may have been necessary a few decades ago, but now its few benefits are far outweighed by the advantages of keeping data raw.
Read on to find out why all your data should be raw, and how it can benefit your brand.
“Cooked data” essentially refers to processed data: data that has been taken from its raw format and processed, reorganized, or compressed. Traditionally, companies have heavily cooked their data in order to optimize storage space and query times. Three major ways to cook data are:
However, cooking your data with any of these methods isn't the optimal choice anymore. These methods were initially created because they allowed the data to fit on a machine and allowed people to answer queries quickly—not because they actually made sense. Subtle bugs, like email automation pulling information from the wrong table, are exceedingly difficult to find when data is processed like this. And again, the motivation behind cooking data no longer exists as storage prices have dropped.
“The Sushi Principle” says that raw data is better than cooked data because it keeps your data analysis fast, secure, and easily comprehensible. There are three steps you need to take to keep your data raw.
When your data pipeline already has to read every line of your data, it's tempting to make it perform some fancy transformations. However, brands should steer clear of these add-ons to avoid:
Of course, there are a few circumstances where you will need business logic in your pipeline. Regulations may require you to purge old user accounts and drop IP addresses. But every time you think about pushing a piece of business logic into your pipeline you need to consider the risks. We're all still relatively bad at writing software—every complicated bit you add increases your chances of an error. And since storage is so much cheaper now, you have every incentive to just perform those calculations later.
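To make the idea concrete, here is a minimal Python sketch of a pipeline that only parses raw events and leaves the business logic to query time. The event fields, the `load_events` and `revenue_by_user` helpers, and the sample log are all hypothetical and exist only for illustration. They are not part of any particular product.

```python
import json

# Hypothetical raw event log: each line is stored exactly as it was collected.
RAW_LOG = [
    '{"user": "ana", "action": "login", "ts": 1}',
    '{"user": "ben", "action": "purchase", "ts": 2, "amount": 30}',
    '{"user": "ana", "action": "purchase", "ts": 3, "amount": 12}',
]

def load_events(lines):
    """The pipeline only parses; it does not filter, rename, or aggregate."""
    return [json.loads(line) for line in lines]

def revenue_by_user(events, min_amount=0):
    """Business logic lives in the query, so changing it needs no re-ingest."""
    totals = {}
    for event in events:
        if event["action"] == "purchase" and event["amount"] >= min_amount:
            totals[event["user"]] = totals.get(event["user"], 0) + event["amount"]
    return totals

events = load_events(RAW_LOG)
print(revenue_by_user(events))                 # {'ben': 30, 'ana': 12}
print(revenue_by_user(events, min_amount=20))  # {'ben': 30}
```

Because the raw log is untouched, changing the definition of a qualifying purchase is a one-argument change at query time rather than a pipeline rewrite.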
Once you've gone through the trouble of collecting all your data, you shouldn't toss out portions of it. With data storage costs so low, there's no reason to discard any of your data, and plenty of reasons to keep all of it:
Keeping your original data cuts down on unnecessary work, so you can get to the parts that actually add value. It removes the need for extensive up-front planning and for time spent figuring out where your stats came from, so more time can be spent on fully exploring your data.
You may be tempted to summarize and sample your data early in the pipeline. The thinking goes, "I'm going to have to do these things no matter what, why not shrink my data and make it easier to process?" But sampling and summarizing early on can harm the accuracy of your data. It's much less risky to do so at query time:
Yes, you will likely need to sample your data at some point to get answers to your queries quickly. But if you sample at query time, you can draw the representative sample that each individual query needs.
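As a rough illustration of why query-time sampling is more flexible, the Python sketch below keeps the full event set and lets each query choose its own sampling strategy. The `EVENTS` data and the `sample_by_user` and `sample_by_event` helpers are assumptions made up for this example. A real system would sample inside its query engine.

```python
import hashlib
import random

# Hypothetical raw events, kept in full: 10 users with 10 events each.
EVENTS = [{"user": f"u{i % 10}", "action": "view"} for i in range(100)]

def sample_by_user(events, rate):
    """Keep every event for a deterministic subset of users.

    Hashing the user id means a sampled user keeps their full history,
    which suits per-user questions such as funnels or retention.
    """
    kept = []
    for event in events:
        digest = hashlib.sha256(event["user"].encode()).digest()
        bucket = int.from_bytes(digest[:8], "big") / 2**64  # value in [0, 1)
        if bucket < rate:
            kept.append(event)
    return kept

def sample_by_event(events, rate, seed=0):
    """Keep a random subset of individual events.

    This suits event-level questions such as total view counts, but it
    breaks up each user's history.
    """
    rng = random.Random(seed)
    return [event for event in events if rng.random() < rate]

# Each query picks the sample that is representative for its own question.
user_sample = sample_by_user(EVENTS, rate=0.5)
event_sample = sample_by_event(EVENTS, rate=0.5)
```

If the data had been sampled by event early in the pipeline, the per-user view would no longer be recoverable. Keeping the raw events leaves both options open.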
Data is necessary to grow any business—so stop wasting it.
At Scuba Analytics, we believe data works best when brands can iterate queries continuously instead of having to craft the perfect idea first—if you throw business logic into your pipeline, you lose this ability. By keeping your data raw, you can ask any query you want without having to plan for it in advance.
With Scuba's continuous intelligence platform, brands don't have to worry about tedious ETL or constantly updating their data, and they gain the agency to explore and understand their data with ease.
Ready to learn how Scuba can help you optimize your data? Request a demo today or talk to a Scuba expert.