Optimizing a query returning a lot of records: a way to avoid hundreds of joins. Is it a smart solution?

This question and its answer were published by the Tutorial Guruji team on the Tutorial Guruji website.

I am not very experienced with SQL and I have the following doubt about how to optimize a query. I am using MySQL.

I have this DB schema:

[DB schema diagram]

And this is the query that returns the last price (the row with the most recent date in the Market_Commodity_Price_Series table) of a specific commodity in a specific market.

It contains a lot of joins to retrieve all the related information:

SELECT MCPS.id AS series_id,
        MD_CD.market_details_id AS market_id,
        MD_CD.commodity_details_id AS commodity_id,
        MD.market_name AS market_name,
        MCPS.price_date AS price_date,
        MCPS.avg_price AS avg_price,
        CU.ISO_4217_cod AS currency, 
        MU.unit_name AS measure_unit, 
        CD.commodity_name_en,
        CN.commodity_name 
FROM Market_Commodity_Price_Series AS MCPS
INNER JOIN MeasureUnit AS MU ON MCPS.measure_unit_id = MU.id
INNER JOIN Currency AS CU ON MCPS.currency_id = CU.id
INNER JOIN MarketDetails_CommodityDetails AS MD_CD ON MCPS.market_commodity_details_id = MD_CD.id
INNER JOIN MarketDetails AS MD ON MD_CD.market_details_id = MD.id
INNER JOIN CommodityDetails AS CD ON MD_CD.commodity_details_id = CD.id
INNER JOIN CommodityName AS CN ON CD.id = CN.commodity_details_id
INNER JOIN Languages AS LN ON CN.language_id  = LN.id
WHERE MD.id = 4
AND CD.id = 4 
AND LN.id=1
ORDER BY price_date DESC LIMIT 1

My doubt is: with the previous query I am extracting all the records related to a specific commodity in a specific market from the Market_Commodity_Price_Series table, doing a lot of joins, ordering these records by the price_date field and limiting the result to the last one.

I think that this could be expensive because I can have a lot of records (the Market_Commodity_Price_Series table contains daily information).

This query works, but I think it can be done in a smarter way.

So I thought I could do something like this:

1) Select the record related to the last price of a specific commodity in a specific market using a query like this:

SELECT measure_unit_id, 
        currency_id, 
        market_commodity_details_id, 
        MAX(price_date) price_date
FROM Market_Commodity_Price_Series  AS MCPS 
INNER JOIN MarketDetails_CommodityDetails AS MD_CD ON MCPS.market_commodity_details_id = MD_CD.id
WHERE MD_CD.market_details_id = 4
AND MD_CD.commodity_details_id = 4
GROUP BY measure_unit_id, currency_id, market_commodity_details_id

which returns a single record with this information:

measure_unit_id      currency_id          market_commodity_details_id price_date
--------------------------------------------------------------------------------
1                    2                    24                          05/10/2017

Use this output as a table (I don't know the exact term; maybe a view?) and join this "table" to the other required information that is in the MeasureUnit, Currency, MarketDetails, CommodityDetails, CommodityName and Languages tables.

I think that this could be better because in this way I am using MAX(price_date) to extract only the record related to the latest price in Market_Commodity_Price_Series, instead of obtaining all the records, ordering them and limiting to the latest one.

Furthermore, most of the JOIN operations are done on the single record returned by the previous query and not on all the records returned by the first version of my query (potentially hundreds or thousands).

Could this be a smart solution?

If yes, what is the correct syntax to join the output of this query (treating it as a table) with the other tables?

Answer

You’ve done a reasonably good job of writing an efficient query.

You didn’t use SELECT *, which can mess up performance in a query with lots of joins, because it generates bloated and redundant intermediate result sets. But your intermediate result set — the one you apply ORDER BY to — is not bloated.

Your WHERE col = val clauses mostly mention primary keys of tables (I guess). That’s good.

Your big table Market_Commodity_Price_Series could maybe use a compound covering index. Similarly, some other tables may need that kind of index. But that should be the topic of another question.
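As a sketch of what such a compound index might look like (the column choice and order here are an assumption based on the WHERE and ORDER BY clauses shown in the question, and should be validated with EXPLAIN against real data):

```sql
-- Hypothetical compound index for Market_Commodity_Price_Series.
-- It supports filtering by market_commodity_details_id and then
-- reading rows in price_date order, so ORDER BY price_date DESC LIMIT 1
-- can stop after one index entry instead of sorting.
CREATE INDEX idx_mcps_details_date
    ON Market_Commodity_Price_Series (market_commodity_details_id, price_date);
```

With this index in place, `EXPLAIN SELECT ...` on the original query should show the index being used for the big table; if it does not, the column order or the query shape needs revisiting.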

Your proposed optimization (ordering an intermediate result set consisting mostly of id values) would help a lot if you were relying on ORDER BY ... LIMIT to discard most of a large result set. But you are not doing that: your WHERE clauses already narrow the result before the sort.

Without knowing more about your data, it’s hard to offer a crisp opinion. But, if it were me I’d use your first query. I’d keep an eye on it as you go into production (and on other complex queries). When (not if) performance starts to deteriorate, then you can do EXPLAIN and figure out the best way to index your tables. You’ve done a good job of writing a query that will get your application up and running. Go with it!
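For completeness, since the question also asks for the syntax: the grouped query can be used as a derived table by wrapping it in parentheses and giving it an alias, then joining to it like any other table. This is only a sketch built from the schema shown in the question, not a recommendation to prefer it over the original query:

```sql
-- Sketch: the grouped query as a derived table (alias "latest").
-- Table and column names are taken from the question's schema.
SELECT MD.market_name,
       latest.price_date,
       CU.ISO_4217_cod AS currency,
       MU.unit_name    AS measure_unit
FROM (
    SELECT measure_unit_id,
           currency_id,
           market_commodity_details_id,
           MAX(price_date) AS price_date
    FROM Market_Commodity_Price_Series AS MCPS
    INNER JOIN MarketDetails_CommodityDetails AS MD_CD
            ON MCPS.market_commodity_details_id = MD_CD.id
    WHERE MD_CD.market_details_id = 4
      AND MD_CD.commodity_details_id = 4
    GROUP BY measure_unit_id, currency_id, market_commodity_details_id
) AS latest
INNER JOIN MeasureUnit AS MU ON latest.measure_unit_id = MU.id
INNER JOIN Currency    AS CU ON latest.currency_id = CU.id
INNER JOIN MarketDetails_CommodityDetails AS MD_CD
        ON latest.market_commodity_details_id = MD_CD.id
INNER JOIN MarketDetails AS MD ON MD_CD.market_details_id = MD.id;
```

Note one pitfall of this approach: the derived table carries only MAX(price_date), not the avg_price of that row, so recovering the price itself would require joining back to Market_Commodity_Price_Series on market_commodity_details_id and price_date. That extra join is part of why the original single query is a reasonable choice.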
