This question was published by the Tutorial Guruji team.
I provide a solution that handles operations for brick-and-mortar shops. My next step is to provide analytics for my customers.
As I am in the starting phase, I am hoping to find a free way to do it myself instead of using third-party solutions. I am not expecting massive scale at this point, but I would like to get it done right instead of running queries off the production database.
For performance reasons, I am thinking of running the analytics queries against separate tables in the same database. A cron job would run every night to copy the data from the production tables to the analytics tables.
Is that the proper way to do this?
The other option I have in mind is to run the analytics from a different database (as opposed to just separate tables). I am using Amazon RDS with MySQL, in case that makes it more convenient.
Answer
It depends on how much analytics you want to provide.
I am a data warehouse (DWH) manager, and I would start off with a small (free) BI (Business Intelligence) solution.
Your production DB and analytics DB should always be separate.
- Take a look at Pentaho Data Integration (Community Edition). It’s a free ETL tool that will help you get your data from your production database to your analytics database, and it can also perform transformations along the way.
- Check out some free reporting software like Jaspersoft to help you provide a reporting platform for customers (if that’s what you want; otherwise just use Excel).
- BI never wants to throw away data. If you think the data in the analytics DB is going to grow large (2 TB or more), don’t use MySQL but rather PostgreSQL. MySQL does not handle data at that scale well.
- If you are really serious about this, read “The Data Warehouse Toolkit” by Ralph Kimball. That will set you up with some basic data warehouse knowledge.
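
To make the ETL idea above concrete, here is a minimal sketch of what a nightly job does: extract one day of data from production, transform it (here, aggregate per shop), and load it into a separate analytics database. It is only an illustration, not how Pentaho works internally. The table names (`sales`, `daily_sales`), their columns, and the function names are all hypothetical, and SQLite in-memory databases stand in for the MySQL instances so the example runs without any setup.

```python
import sqlite3


def create_demo_databases():
    """Build two in-memory databases standing in for production and analytics.

    The schemas are hypothetical; a real setup would connect to two MySQL
    databases instead.
    """
    prod = sqlite3.connect(":memory:")
    prod.execute(
        "CREATE TABLE sales ("
        "id INTEGER PRIMARY KEY, shop_id INTEGER, sold_at TEXT, amount_cents INTEGER)"
    )
    prod.executemany(
        "INSERT INTO sales (shop_id, sold_at, amount_cents) VALUES (?, ?, ?)",
        [
            (1, "2024-03-01 09:15:00", 1250),
            (1, "2024-03-01 17:40:00", 800),
            (2, "2024-03-01 11:05:00", 4999),
            (1, "2024-03-02 10:00:00", 300),
        ],
    )
    prod.commit()

    dwh = sqlite3.connect(":memory:")
    dwh.execute(
        "CREATE TABLE daily_sales ("
        "shop_id INTEGER, day TEXT, sale_count INTEGER, total_cents INTEGER, "
        "PRIMARY KEY (shop_id, day))"
    )
    dwh.commit()
    return prod, dwh


def load_daily_sales(prod, dwh, day):
    """Copy one day's sales, aggregated per shop, into the analytics database.

    Returns the number of (shop, day) rows written.
    """
    # Extract and transform: one aggregated row per shop for the given day.
    rows = prod.execute(
        "SELECT shop_id, COUNT(*), SUM(amount_cents) FROM sales "
        "WHERE date(sold_at) = ? GROUP BY shop_id",
        (day,),
    ).fetchall()

    # Load: delete and re-insert inside one transaction, so running the job
    # twice for the same day does not duplicate rows.
    with dwh:
        dwh.execute("DELETE FROM daily_sales WHERE day = ?", (day,))
        dwh.executemany(
            "INSERT INTO daily_sales (shop_id, day, sale_count, total_cents) "
            "VALUES (?, ?, ?, ?)",
            [(shop_id, day, count, total) for shop_id, count, total in rows],
        )
    return len(rows)


if __name__ == "__main__":
    prod, dwh = create_demo_databases()
    load_daily_sales(prod, dwh, "2024-03-01")
    for row in dwh.execute(
        "SELECT shop_id, day, sale_count, total_cents FROM daily_sales ORDER BY shop_id"
    ):
        print(row)
```

A cron entry would call something like `load_daily_sales` once per night for the previous day. Because the load step replaces that day's rows instead of appending, a failed run can simply be repeated.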