Database strategy for providing data analytics

I provide a solution that handles operations for brick-and-mortar shops. My next step is to provide analytics for my customers.

As I am still in the starting phase, I am hoping to find a free way to do this myself instead of using third-party solutions. I am not expecting massive scale at this point, but I would like to do it properly rather than run analytics queries against the production tables.

For performance reasons, I am thinking of running the analytics queries against separate tables in the same database. A cron job would run every night to copy the data from the production tables to the analytics tables.
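To make the idea concrete, here is a minimal sketch of what such a nightly copy might look like. It uses Python's built-in `sqlite3` with an in-memory database purely as a stand-in for MySQL so that it runs on its own. The table names `orders` and `analytics_orders`, their columns, and the function names are hypothetical, and the "copy only rows with a higher id" rule is just one assumed way to make the job incremental.

```python
import sqlite3


def replicate_orders(conn):
    """Copy rows that are not yet in analytics_orders; return how many were copied."""
    cur = conn.execute(
        """
        INSERT INTO analytics_orders (id, shop_id, total, created_at)
        SELECT o.id, o.shop_id, o.total, o.created_at
        FROM orders AS o
        WHERE o.id > (SELECT COALESCE(MAX(id), 0) FROM analytics_orders)
        """
    )
    conn.commit()
    return cur.rowcount


def demo():
    # In-memory SQLite stands in for the real MySQL database (assumption).
    conn = sqlite3.connect(":memory:")
    conn.executescript(
        """
        CREATE TABLE orders (
            id INTEGER PRIMARY KEY, shop_id INTEGER, total REAL, created_at TEXT
        );
        CREATE TABLE analytics_orders (
            id INTEGER PRIMARY KEY, shop_id INTEGER, total REAL, created_at TEXT
        );
        INSERT INTO orders VALUES
            (1, 1, 10.0, '2024-01-01 09:00:00'),
            (2, 1,  5.5, '2024-01-01 17:30:00'),
            (3, 2, 20.0, '2024-01-02 11:00:00');
        """
    )
    first = replicate_orders(conn)   # copies all three rows
    second = replicate_orders(conn)  # nothing new, so copies none
    conn.close()
    return first, second


if __name__ == "__main__":
    print(demo())
```

A cron entry would then simply invoke this script once a night. Note that this version only picks up newly inserted rows; rows that are updated or deleted in production after being copied would need extra handling.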

Is that the proper way to do this?

The other option I have in mind is to run the analytics from a different database (as opposed to just separate tables). I am using Amazon RDS with MySQL, in case that makes one of the options more convenient.

Answer

It depends on how much analytics you want to provide.

I am a data warehouse (DWH) manager, and I would start off with a small, free BI (business intelligence) solution.
Your production DB and analytics DB should always be separate.

  1. Take a look at Pentaho Data Integration (Community Edition). It’s a free ETL tool that will help you move your data from your production database to your analytics database, and it can also perform transformations along the way.
  2. Check out free reporting software such as Jaspersoft to give your customers a reporting platform (if that’s what you want; otherwise just use Excel).
  3. BI never wants to throw away data. If you think the data in your analytics DB is going to grow large (2 TB or more), I would use PostgreSQL rather than MySQL. In my experience, MySQL does not handle analytical workloads on data of that size well.
  4. If you are really serious about this, read “The Data Warehouse Toolkit” by Ralph Kimball. That will give you some basic data warehouse knowledge.
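To illustrate what the ETL step in point 1 does, independent of any particular tool, here is a rough sketch of an extract-transform-load run between two separate databases. It is not how Pentaho works internally; it is just the same idea written by hand. Two in-memory SQLite connections stand in for the production and analytics databases so the example is self-contained, and the `orders` and `daily_sales` tables, their columns, and all function names are assumptions made up for the example.

```python
import sqlite3


def extract_daily_sales(prod):
    """Extract + transform: aggregate raw orders into one row per shop per day."""
    return prod.execute(
        """
        SELECT shop_id, date(created_at) AS day, COUNT(*), SUM(total)
        FROM orders
        GROUP BY shop_id, day
        ORDER BY shop_id, day
        """
    ).fetchall()


def load_daily_sales(analytics, rows):
    """Load: upsert so that re-running the job does not duplicate rows."""
    # INSERT OR REPLACE is SQLite syntax; MySQL would use
    # INSERT ... ON DUPLICATE KEY UPDATE instead.
    analytics.executemany(
        "INSERT OR REPLACE INTO daily_sales (shop_id, day, order_count, revenue) "
        "VALUES (?, ?, ?, ?)",
        rows,
    )
    analytics.commit()


def run_demo():
    prod = sqlite3.connect(":memory:")       # stand-in for the production DB
    analytics = sqlite3.connect(":memory:")  # stand-in for the analytics DB
    prod.executescript(
        """
        CREATE TABLE orders (
            id INTEGER PRIMARY KEY, shop_id INTEGER, total REAL, created_at TEXT
        );
        INSERT INTO orders VALUES
            (1, 1, 10.0, '2024-01-01 09:00:00'),
            (2, 1,  5.5, '2024-01-01 17:30:00'),
            (3, 2, 20.0, '2024-01-02 11:00:00');
        """
    )
    analytics.execute(
        "CREATE TABLE daily_sales ("
        "shop_id INTEGER, day TEXT, order_count INTEGER, revenue REAL, "
        "PRIMARY KEY (shop_id, day))"
    )
    # Run the job twice to show that the load step is idempotent.
    for _ in range(2):
        load_daily_sales(analytics, extract_daily_sales(prod))
    result = analytics.execute(
        "SELECT shop_id, day, order_count, revenue FROM daily_sales "
        "ORDER BY shop_id, day"
    ).fetchall()
    prod.close()
    analytics.close()
    return result


if __name__ == "__main__":
    for row in run_demo():
        print(row)
```

Pre-aggregating into a table like `daily_sales` keeps reporting queries small and fast, while the raw orders can still be kept in the analytics database for anything the summary does not cover.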
