[Exercise] SQL - Record history

arithma · Apr 27, 2012

Given those tables:
currency (id PK, name)
usd_rate (id PK, stamp (timestamp), currency_id FK, rate (number))

And given a time instance (e.g. 2009-10-14 1300GMT), determine the currency exchanges for all the records in the currency table.

Result should be of shape (id, name, rate).

You should provide the SQL schema for the DB, and a query against it.

rolf · Apr 27, 2012

SELECT 
  `currency`.`id` AS `id`,
  `currency`.`name` AS `name`,
  `usd_rate`.`rate` as `rate`
FROM `currency` 
LEFT JOIN 
  `usd_rate` ON `currency`.`id` = `usd_rate`.`id` 
WHERE 
  `usd_rate`.`stamp` < '2009-10-14 1300GMT' 
ORDER BY `usd_rate`.`stamp` DESC 
LIMIT 1

I'm not sure about the date formatting, it might be necessary to use a function to properly format it, such as UNIX_TIMESTAMP('2009-10-14 1300GMT')... but I'm pretty sure about the rest.

That should get the currency exchange rate at the given time (the last rate we have in the database before that time, that is). For all records just remove the ORDER BY and LIMIT clauses.

I'm not even sure I understood the exercise.

The database schema, generated with the help of the HeidiSQL GUI client:

-- --------------------------------------------------------
-- Host:                         localhost
-- Server version:               5.5.15-log - MySQL Community Server (GPL)
-- Server OS:                    Win32
-- HeidiSQL version:             7.0.0.4053
-- Date/time:                    2012-04-27 13:18:21
-- --------------------------------------------------------

/*!40101 SET @OLD_CHARACTER_SET_CLIENT=@@CHARACTER_SET_CLIENT */;
/*!40101 SET NAMES utf8 */;
/*!40014 SET FOREIGN_KEY_CHECKS=0 */;

-- Dumping database structure for test
CREATE DATABASE IF NOT EXISTS `test` /*!40100 DEFAULT CHARACTER SET utf8 */;
USE `test`;


-- Dumping structure for table test.currency
CREATE TABLE IF NOT EXISTS `currency` (
  `id` int(10) NOT NULL AUTO_INCREMENT,
  `name` varchar(250) NOT NULL,
  PRIMARY KEY (`id`),
  UNIQUE KEY `name` (`name`)
) ENGINE=InnoDB DEFAULT CHARSET=utf8;

-- Dumping data for table test.currency: ~0 rows (approximately)
/*!40000 ALTER TABLE `currency` DISABLE KEYS */;
/*!40000 ALTER TABLE `currency` ENABLE KEYS */;


-- Dumping structure for table test.usd_rate
CREATE TABLE IF NOT EXISTS `usd_rate` (
  `id` int(10) NOT NULL AUTO_INCREMENT,
  `stamp` timestamp NOT NULL DEFAULT CURRENT_TIMESTAMP ON UPDATE CURRENT_TIMESTAMP,
  `currency_id` int(10) NOT NULL,
  `rate` float NOT NULL,
  PRIMARY KEY (`id`),
  KEY `FK_usd_rate_currency` (`currency_id`),
  CONSTRAINT `FK_usd_rate_currency` FOREIGN KEY (`currency_id`) REFERENCES `currency` (`id`) ON DELETE CASCADE ON UPDATE CASCADE
) ENGINE=InnoDB DEFAULT CHARSET=utf8;

-- Dumping data for table test.usd_rate: ~0 rows (approximately)
/*!40000 ALTER TABLE `usd_rate` DISABLE KEYS */;
/*!40000 ALTER TABLE `usd_rate` ENABLE KEYS */;
/*!40014 SET FOREIGN_KEY_CHECKS=1 */;
/*!40101 SET CHARACTER_SET_CLIENT=@OLD_CHARACTER_SET_CLIENT */;

Ayman · Apr 27, 2012

SELECT Currency.Name, usd_rate.Stamp, usd_rate.Rate
FROM Currency, usd_rate
WHERE usd_rate.Stamp =  '2012-04-27 00:00:00'
AND Currency.ID = usd_rate.Currency_ID

arithma · Apr 27, 2012

usd_rates is a time series. This is evident (Ayman's schema captures that).
So both queries are still off.

rolf · Apr 27, 2012

Can you explain more?

Ayman · Apr 27, 2012

@Arithma, the Rates actually change during the day?

xterm · Apr 27, 2012

Not that it matters, but shouldn't rate be a float?

arithma · Apr 27, 2012

I used rate exchange and timestamps to specifically answer that. Many changes per second even.

arithma · Apr 27, 2012

xterm wroteNot that it matters, but shouldn't rate be a float?

I said number because I didn't want it to make a difference. Some people prefer decimal types or fixed precision in finance, I imagine. Float has a lot of undesireable traits.

arithma · Apr 27, 2012

Just to make sure you guys don't miss what I meant: In the result, every currency must at most be represented once. You should have considered this by yourselves as a plausible scenario and picked to solve against it. At any rate...

arithma · Apr 27, 2012

Conversation with Ayman wroteMohammad Skafi
day1, day2, day4
those are example series
u're asked for day3's rate
u just default to day2

2:50
Ayman
why?

2:50
Mohammad Skafi
imagine people entering the rates manually
every month

2:50
Ayman
ok

2:50
Mohammad Skafi
but they also want to emergency add a rate in the middle of the month

2:50
Ayman
once per month

2:50
Mohammad Skafi
so it should handle all those details

2:51
Ayman
ahhhhhhhh
ye3ne hene
mesh every single day
are setting
the rate

2:51
Mohammad Skafi
I never said that

2:51
Ayman
ye3ne whats between may 1 and may 23 is a fixed rate
rej3o maslan on may 24
sar fi new rate

2:51
Mohammad Skafi
yep

2:51
Ayman
ok ok

2:52
Mohammad Skafi
I'll use this conversation
in the post
ok?

rolf · Apr 27, 2012

arithma wroteJust to make sure you guys don't miss what I meant: In the result, every currency must at most be represented once. You should have considered this by yourselves as a plausible scenario and picked to solve against it. At any rate...

SELECT 
  `currency`.`id` AS `id`,
  `currency`.`name` AS `name`,
  `usd_rate`.`rate` as `rate`
FROM `currency`
LEFT JOIN 
  `usd_rate` ON `currency`.`id` = `usd_rate`.`id` 
WHERE 
  `usd_rate`.`stamp` < '2009-10-14 13:00:00' 
GROUP BY `currency`.`id`

(note the slightly modified time format)

SELECT DISTINCT also works. One thing that bothers me is that I'm not sure that this query gets the most recent rate before the given date. It will surely return a rate that is before the current date for every currency but how can we guarantee that it's the most recent for each currency...?

PS: Thanks for clarifying.

Ayman · Apr 27, 2012

Based on my schema above, by doing an inner join between the original usd_rate table and a derived table of usd_rate with the max timestamps, should return all unique the unique currencies rates based on the latest timestamps of each.

SELECT *
FROM usd_rate t
INNER JOIN
(
    SELECT usd_rate.Currency_ID, max(usd_rate.Stamp) AS maxStamp
    FROM   usd_rate
    GROUP BY Currency_ID
) AS m
ON t.Currency_ID = m.Currency_ID
AND t.Stamp = m.maxStamp

arithma · Apr 27, 2012

@Ayman: So it doesn't depend on the input timestamp?

Ayman · Apr 27, 2012

@Arithma yes you're right this would only work if the time instance is the current timestamp. In case the timestamp was at a prior point in time(less than the real max) the max value should be derived relative to the input timestamp.

arithma · Apr 27, 2012

OK, so don't come so close and tease me with an almost there solution. I gotta go and use this code in production after all :P

rolf · Apr 27, 2012

AymanFarhat wrote@Arithma yes you're right this would only work if the time instance is the current timestamp. In case the timestamp was at a prior point in time(less than the real max) the max value should be derived relative to the input timestamp.

Then just add a WHERE, no? (See my query...)

arithma · Apr 28, 2012

@rolf:
Your first query will return more than one record for each currency (if there are enough records of course). This is what's wrong with it.
The second query is semantically incorrect since you don't have anything to group over.
@Ayman:
I don't understand the need for this piece of code on the last line. Seems you're over constraining:

t.Stamp = m.maxStamp

Let me chime in with a solution. Do check it out for correctness though, (never can be 100% sure):
currency (id PK, name)
usd_rate (id PK, stamp (timestamp), currency_id FK, rate (number))

SELECT `id`, `name`,
	(SELECT `rate`
		FROM `usd_rate`
		WHERE `currency_id`=`c`.`id` AND `stamp` <= %%%
		ORDER BY `stamp` DESC
		LIMIT 1
	) `rate`
FROM `currency` `c`

If the stamp of the rate was needed, it's easier to join over the usd_rate and then extract out whatever you need.

Ayman · Apr 28, 2012

@Arithma, the t.Stamp = m.maxStamp is needed because we're matching the two tables and getting only the rows that have the max timestamp, without it, it would return be returning all possible values of rates for the currencies.

I think your solution should work well, will try it.

arithma · Apr 28, 2012

@Ayman: You're correct. I saw I made a mistake with my comment but didn't want to reply on the iPhone.
@Rolf: The problem with your query is that it will return all the rates before the set timestamp rather than the last. Mind you, the query is supposed to return the rates of all the currencies together, no just one. Hope this helps you see my point.