mysql - How to find previous record [n-per-group max(timestamp) < timestamp]? -
i have large table containing time series sensor data. large few thousand 10m record divided amongst various channels being monitored. sensor type need calculate time interval between current , previous reading, i.e. find largest timestamp prior current one.
the obvious approaches come mind, each measured on core i5 channel of 40k entries:
correlated subquery
select collect.*, prev.timestamp prev_timestamp data collect left join data prev on prev.channel_id = collect.channel_id , prev.timestamp = ( select max(timestamp) data data.channel_id = collect.channel_id , data.timestamp < collect.timestamp ) collect.channel_id=14 , collect.timestamp >= 0 order collect.timestamp
time (exec, fetch) 11sec, 21sec
plan
+----+--------------------+---------+------+------------------------------+---------+---------+-------------------------+-------+--------------------------+ | id | select_type | table | type | possible_keys | key | key_len | ref | rows | | +----+--------------------+---------+------+------------------------------+---------+---------+-------------------------+-------+--------------------------+ | 1 | primary | collect | ref | ts_uniq,idx_adf3f36372f5a1aa | ts_uniq | 5 | const | 45820 | using | | 1 | primary | prev | ref | ts_uniq,idx_adf3f36372f5a1aa | ts_uniq | 13 | const,func | 1 | using index | | 2 | dependent subquery | data | ref | ts_uniq,idx_adf3f36372f5a1aa | ts_uniq | 5 | nils.collect.channel_id | 2495 | using where; using index | +----+--------------------+---------+------+------------------------------+---------+---------+-------------------------+-------+--------------------------+
anti join
select d1.*, d2.timestamp prev_timestamp data d1 left join data d2 on d2.channel_id=14 , d2.timestamp < d1.timestamp left join data d3 on d3.channel_id=14 , d3.timestamp < d1.timestamp , d3.timestamp > d2.timestamp d3.timestamp null , d1.channel_id=14 order timestamp
time 12sec, 21sec
plan
+----+-------------+-------+------+------------------------------+---------+---------+-------+-------+--------------------------------------+ | id | select_type | table | type | possible_keys | key | key_len | ref | rows | | +----+-------------+-------+------+------------------------------+---------+---------+-------+-------+--------------------------------------+ | 1 | simple | d1 | ref | ts_uniq,idx_adf3f36372f5a1aa | ts_uniq | 5 | const | 45820 | using | | 1 | simple | d2 | ref | ts_uniq,idx_adf3f36372f5a1aa | ts_uniq | 5 | const | 47194 | using index | | 1 | simple | d3 | ref | ts_uniq,idx_adf3f36372f5a1aa | ts_uniq | 5 | const | 47194 | using where; using index; not exists | +----+-------------+-------+------+------------------------------+---------+---------+-------+-------+--------------------------------------+
and i've come pattern i'm calling naive count
select current.*, prev.timestamp prev_timestamp ( select data.*, @r1 := @r1+1 rownum data cross join (select @r1 := 0) vars channel_id=14 order timestamp ) current left join ( select data.*, @r2 := @r2+1 rownum data cross join (select @r2 := 0) vars channel_id=14 order timestamp ) prev on current.rownum = prev.rownum+1
time 1.1sec (this 1 fastest!)
plan
+----+-------------+------------+--------+------------------------------+---------+---------+-----+-------+----------------+ | id | select_type | table | type | possible_keys | key | key_len | ref | rows | | +----+-------------+------------+--------+------------------------------+---------+---------+-----+-------+----------------+ | 1 | primary | <derived2> | | | | | | 24475 | | | 1 | primary | <derived4> | | | | | | 24475 | | | 4 | derived | <derived5> | system | | | | | 1 | | | 4 | derived | data | ref | ts_uniq,idx_adf3f36372f5a1aa | ts_uniq | 5 | | 45820 | using | | 5 | derived | | | | | | | | no tables used | | 2 | derived | <derived3> | system | | | | | 1 | | | 2 | derived | data | ref | ts_uniq,idx_adf3f36372f5a1aa | ts_uniq | 5 | | 45820 | using | | 3 | derived | | | | | | | | no tables used | +----+-------------+------------+--------+------------------------------+---------+---------+-----+-------+----------------+
as query runs on small platforms raspi performance critical- couple seconds highest acceptable.
my question: last approach 1 greatest-n-per-group or there better ones? expected correlated subquery slow experienced?
the last approach variables reasonable. might try:
select collect.*, (select max(timestamp) data data.channel_id = collect.channel_id , data.timestamp < collect.timestamp ) prev_timestamp data collect collect.channel_id = 14 , collect.timestamp >= 0 order collect.timestamp;
in addition, create indexes on: collect(channel_id, timestamp).
edit:
the following might fastest:
select d.*, if(@channel_id = channel_id, @prev_timestamp, null) prev_timestamp, @channel_id := channel_id, @prev_timestamp = timestamp data d cross join (select @channel_id := 0, @prev_timestamp := 0) vars collect.channel_id = 14 , collect.timestamp >= 0 order channel_id, timestamp;
Comments
Post a Comment