I am not so expert in SQL and I have the following doubt about how to optimize a query. I am using MySQL.
I have this DB schema:
And this is the query that returns the last price (the latest date in the Market_Commodity_Price_Series table) of a specific commodity in a specific market. It contains a lot of joins to retrieve all the related information:
SELECT MCPS.id AS series_id,
       MD_CD.market_details_id AS market_id,
       MD_CD.commodity_details_id AS commodity_id,
       MD.market_name AS market_name,
       MCPS.price_date AS price_date,
       MCPS.avg_price AS avg_price,
       CU.ISO_4217_cod AS currency,
       MU.unit_name AS measure_unit,
       CD.commodity_name_en,
       CN.commodity_name
FROM Market_Commodity_Price_Series AS MCPS
INNER JOIN MeasureUnit AS MU ON MCPS.measure_unit_id = MU.id
INNER JOIN Currency AS CU ON MCPS.currency_id = CU.id
INNER JOIN MarketDetails_CommodityDetails AS MD_CD ON MCPS.market_commodity_details_id = MD_CD.id
INNER JOIN MarketDetails AS MD ON MD_CD.market_details_id = MD.id
INNER JOIN CommodityDetails AS CD ON MD_CD.commodity_details_id = CD.id
INNER JOIN CommodityName AS CN ON CD.id = CN.commodity_details_id
INNER JOIN Languages AS LN ON CN.language_id = LN.id
WHERE MD.id = 4
  AND CD.id = 4
  AND LN.id = 1
ORDER BY price_date DESC
LIMIT 1
My doubt is: with the previous query I am extracting all the records related to a specific commodity in a specific market from the Market_Commodity_Price_Series table, doing a lot of joins, ordering these records by the price_date field, and limiting the result to the last one.
I think that it could be expensive because I can have a lot of records (the Market_Commodity_Price_Series table contains daily information).
This query works, but I think it can be done in a smarter way.
So I thought that I can do something like this:
1) Select the record related to the last price of a specific commodity in a specific market, using a query like this:
SELECT measure_unit_id,
       currency_id,
       market_commodity_details_id,
       MAX(price_date) price_date
FROM Market_Commodity_Price_Series AS MCPS
INNER JOIN MarketDetails_CommodityDetails AS MD_CD ON MCPS.market_commodity_details_id = MD_CD.id
WHERE MD_CD.market_details_id = 4
  AND MD_CD.commodity_details_id = 4
GROUP BY measure_unit_id, currency_id, market_commodity_details_id
that returns the single record related to this information:
measure_unit_id  currency_id  market_commodity_details_id  price_date
----------------------------------------------------------------------
1                2            24                           05/10/2017
2) Use this output like a table (I don't know the exact name; maybe a view?) and join this "table" to the other required information that is in the MeasureUnit, Currency, MarketDetails, CommodityDetails, CommodityName and Languages tables.
I think that this could be better because in this way I am using MAX(price_date) to extract only the record related to the latest price in Market_Commodity_Price_Series, instead of obtaining all the records, ordering them, and limiting the result to the latest one.
Furthermore, most of the JOIN operations are done on the single record returned by the previous query, and not on all the records returned by the first version of my query (potentially hundreds or thousands).
Could this be a smart solution?
If yes, what is the correct syntax to join the output of this query (considering it as a table) with the other tables?
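I imagine something like the following, where the aggregate query from step 1 is placed in the FROM clause and joined like an ordinary table, but I am not sure about the syntax (the alias latest is just a placeholder name I made up; the columns and filters come from my queries above):

```sql
SELECT MD.market_name,
       CD.commodity_name_en,
       CN.commodity_name,
       latest.price_date,
       CU.ISO_4217_cod AS currency,
       MU.unit_name    AS measure_unit
FROM (
    -- the aggregate query from step 1, used as a subquery in FROM
    SELECT measure_unit_id,
           currency_id,
           market_commodity_details_id,
           MAX(price_date) AS price_date
    FROM Market_Commodity_Price_Series AS MCPS
    INNER JOIN MarketDetails_CommodityDetails AS MD_CD
            ON MCPS.market_commodity_details_id = MD_CD.id
    WHERE MD_CD.market_details_id = 4
      AND MD_CD.commodity_details_id = 4
    GROUP BY measure_unit_id, currency_id, market_commodity_details_id
) AS latest
INNER JOIN MeasureUnit AS MU ON latest.measure_unit_id = MU.id
INNER JOIN Currency    AS CU ON latest.currency_id = CU.id
INNER JOIN MarketDetails_CommodityDetails AS MD_CD
        ON latest.market_commodity_details_id = MD_CD.id
INNER JOIN MarketDetails    AS MD ON MD_CD.market_details_id = MD.id
INNER JOIN CommodityDetails AS CD ON MD_CD.commodity_details_id = CD.id
INNER JOIN CommodityName    AS CN ON CD.id = CN.commodity_details_id
WHERE CN.language_id = 1;
```

(Note that the series_id from my first query is not available here, because MAX() discards the id of the row it came from.)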
You’ve done a reasonably good job of writing an efficient query.
You didn't use SELECT *, which can mess up performance in a query with lots of joins, because it generates bloated and redundant intermediate result sets. And your intermediate result set, the one you apply ORDER BY to, is not bloated.
Your WHERE col = val clauses mostly mention primary keys of tables (I guess). That's good.
Your big table Market_Commodity_Price_Series could maybe use a compound covering index. Similarly, some other tables may need that kind of index. But that should be the topic of another question.
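As a sketch only (the index name is made up, and the column choice is a guess that should be checked with EXPLAIN against real data), such a compound index might look like:

```sql
-- Hypothetical compound index: the leading column supports the join on
-- market_commodity_details_id, and price_date supports MAX(price_date)
-- and ORDER BY price_date DESC within that group.
CREATE INDEX idx_mcps_details_date
    ON Market_Commodity_Price_Series (market_commodity_details_id, price_date);
```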
Your proposed optimization, ordering an intermediate result set consisting mostly of id values, would help a lot if you were doing ORDER BY ... LIMIT and using the LIMIT clause to discard most of your results. But you are not doing that.
Without knowing more about your data, it's hard to offer a crisp opinion. But if it were me, I'd use your first query, and I'd keep an eye on it (and on other complex queries) as you go into production. When (not if) performance starts to deteriorate, you can run EXPLAIN and figure out the best way to index your tables. You've done a good job of writing a query that will get your application up and running. Go with it!
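For reference, running EXPLAIN is just a matter of prefixing your existing query with the keyword; for example (column list shortened here purely for illustration):

```sql
EXPLAIN
SELECT MCPS.id, MCPS.price_date, MCPS.avg_price
FROM Market_Commodity_Price_Series AS MCPS
WHERE MCPS.market_commodity_details_id = 24
ORDER BY MCPS.price_date DESC
LIMIT 1;
```

The output shows, per table, which indexes MySQL considered and chose, and roughly how many rows it expects to examine; a full scan on the big table would be the signal to add an index.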