So we have this table with over 20 million rows of stats data. The general structure of the table is like this:
ItemId (varchar) | GroupId (varchar) | Stat1 (float) | Stat2 (float) | … StatN (float)
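For concreteness, here's a minimal sketch of what the schema might look like (MySQL assumed; the table name, column sizes, and index name are illustrative, not the actual DDL):

-- Hypothetical schema matching the structure above (MySQL syntax).
CREATE TABLE ItemStats (
    ItemId  VARCHAR(20) NOT NULL,
    GroupId VARCHAR(20) NOT NULL,
    Stat1   FLOAT,
    Stat2   FLOAT,
    -- ...more stat columns up to StatN...
    INDEX idx_item_group (ItemId, GroupId)
);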
With that composite index on (ItemId, GroupId) in place, I observed that queries would still take a very long time, up to 100 seconds. The queries were simple and looked like this:
SELECT * FROM TABLE WHERE ItemId=123456789 AND GroupId=987654321
After banging my head against different partitioning schemes and query styles, I had almost decided that I needed to upgrade the server (it was a micro instance on Amazon EC2, after all, and the CPU would spike whenever the query was issued).
Then it struck me. I changed the query to this, and the query times came down to 0.2 seconds!
SELECT * FROM TABLE WHERE ItemId='123456789' AND GroupId='987654321'
Got it? The columns are varchar, but the parameters I was supplying were ints, which forced an implicit type conversion of every stored value (over 40 million values across the two columns) followed by a non-indexed full-table scan. No wonder it was so slow.
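A quick way to confirm this (again assuming MySQL; the ItemStats table and index name come from the sketch above) is to compare the execution plans of the two queries:

-- Integer literals: MySQL casts every stored ItemId/GroupId to a number
-- before comparing, so the index on the varchar columns can't be used.
EXPLAIN SELECT * FROM ItemStats WHERE ItemId = 123456789 AND GroupId = 987654321;
-- type: ALL  (full table scan)

-- String literals: the comparison matches the column type, so the
-- composite index is used for a direct lookup.
EXPLAIN SELECT * FROM ItemStats WHERE ItemId = '123456789' AND GroupId = '987654321';
-- type: ref  (lookup on idx_item_group)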
One of those days 🙂 Though I do think the query optimizer should probably try to typecast the parameter instead of the table columns.
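Until it does, you can apply that cast yourself on the parameter side; a minimal sketch, again assuming MySQL:

-- Casting the parameter (not the column) keeps the comparison on the
-- string side, so the varchar index stays usable.
SELECT * FROM ItemStats
WHERE ItemId = CAST(123456789 AS CHAR)
  AND GroupId = CAST(987654321 AS CHAR);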