mysql - How to find previous record [n-per-group max(timestamp) < timestamp]? -
i have large table containing time series sensor data. large few thousand 10m record divided amongst various channels being monitored. sensor type need calculate time interval between current , previous reading, i.e. find largest timestamp prior current one.
the obvious approaches come mind, each measured on core i5 channel of 40k entries:
correlated subquery
select collect.*, prev.timestamp prev_timestamp data collect  left join data prev on prev.channel_id = collect.channel_id , prev.timestamp = (      select max(timestamp)      data      data.channel_id = collect.channel_id , data.timestamp < collect.timestamp )  collect.channel_id=14 , collect.timestamp >= 0  order collect.timestamp time (exec, fetch) 11sec, 21sec
plan
+----+--------------------+---------+------+------------------------------+---------+---------+-------------------------+-------+--------------------------+ | id |    select_type     |  table  | type |        possible_keys         |   key   | key_len |           ref           | rows  |                    | +----+--------------------+---------+------+------------------------------+---------+---------+-------------------------+-------+--------------------------+ |  1 | primary            | collect | ref  | ts_uniq,idx_adf3f36372f5a1aa | ts_uniq |       5 | const                   | 45820 | using              | |  1 | primary            | prev    | ref  | ts_uniq,idx_adf3f36372f5a1aa | ts_uniq |      13 | const,func              |     1 | using index              | |  2 | dependent subquery | data    | ref  | ts_uniq,idx_adf3f36372f5a1aa | ts_uniq |       5 | nils.collect.channel_id |  2495 | using where; using index | +----+--------------------+---------+------+------------------------------+---------+---------+-------------------------+-------+--------------------------+ anti join
select d1.*, d2.timestamp prev_timestamp data d1 left join data d2 on     d2.channel_id=14 ,     d2.timestamp < d1.timestamp  left join data d3 on     d3.channel_id=14 ,     d3.timestamp < d1.timestamp ,     d3.timestamp > d2.timestamp      d3.timestamp null ,     d1.channel_id=14 order timestamp time 12sec, 21sec
plan
+----+-------------+-------+------+------------------------------+---------+---------+-------+-------+--------------------------------------+ | id | select_type | table | type |        possible_keys         |   key   | key_len |  ref  | rows  |                                | +----+-------------+-------+------+------------------------------+---------+---------+-------+-------+--------------------------------------+ |  1 | simple      | d1    | ref  | ts_uniq,idx_adf3f36372f5a1aa | ts_uniq |       5 | const | 45820 | using                          | |  1 | simple      | d2    | ref  | ts_uniq,idx_adf3f36372f5a1aa | ts_uniq |       5 | const | 47194 | using index                          | |  1 | simple      | d3    | ref  | ts_uniq,idx_adf3f36372f5a1aa | ts_uniq |       5 | const | 47194 | using where; using index; not exists | +----+-------------+-------+------+------------------------------+---------+---------+-------+-------+--------------------------------------+ and i've come pattern i'm calling naive count
select current.*, prev.timestamp prev_timestamp (     select data.*, @r1 := @r1+1 rownum data     cross join (select @r1 := 0) vars      channel_id=14     order timestamp ) current left join (     select data.*, @r2 := @r2+1 rownum data     cross join (select @r2 := 0) vars      channel_id=14     order timestamp ) prev on current.rownum = prev.rownum+1 time 1.1sec (this 1 fastest!)
plan
+----+-------------+------------+--------+------------------------------+---------+---------+-----+-------+----------------+ | id | select_type |   table    |  type  |        possible_keys         |   key   | key_len | ref | rows  |          | +----+-------------+------------+--------+------------------------------+---------+---------+-----+-------+----------------+ |  1 | primary     | <derived2> |    |                              |         |         |     | 24475 |                | |  1 | primary     | <derived4> |    |                              |         |         |     | 24475 |                | |  4 | derived     | <derived5> | system |                              |         |         |     |     1 |                | |  4 | derived     | data       | ref    | ts_uniq,idx_adf3f36372f5a1aa | ts_uniq |       5 |     | 45820 | using    | |  5 | derived     |            |        |                              |         |         |     |       | no tables used | |  2 | derived     | <derived3> | system |                              |         |         |     |     1 |                | |  2 | derived     | data       | ref    | ts_uniq,idx_adf3f36372f5a1aa | ts_uniq |       5 |     | 45820 | using    | |  3 | derived     |            |        |                              |         |         |     |       | no tables used | +----+-------------+------------+--------+------------------------------+---------+---------+-----+-------+----------------+ as query runs on small platforms raspi performance critical- couple seconds highest acceptable.
my question: last approach 1 greatest-n-per-group or there better ones? expected correlated subquery slow experienced?
the last approach variables reasonable. might try:
select collect.*,        (select max(timestamp)         data         data.channel_id = collect.channel_id , data.timestamp < collect.timestamp        ) prev_timestamp data collect  collect.channel_id = 14 , collect.timestamp >= 0  order collect.timestamp; in addition, create indexes on: collect(channel_id, timestamp).
edit:
the following might fastest:
  select d.*,          if(@channel_id = channel_id, @prev_timestamp, null) prev_timestamp,          @channel_id := channel_id, @prev_timestamp = timestamp   data d cross join        (select @channel_id := 0, @prev_timestamp := 0) vars   collect.channel_id = 14 , collect.timestamp >= 0    order channel_id, timestamp; 
Comments
Post a Comment