Redshift UPDATE prohibitively slow


I have a table in a Redshift cluster with ~1 billion rows. I have a job that tries to update some column values based on a filter. Updating anything at all in this table is incredibly slow. Here's an example:

SELECT col1, col2, col3 FROM sometable WHERE col1 = 'a value of col1' AND col2 = 12;

The above query returns in less than a second, because I have sort keys on col1 and col2. Only 1 row meets the criteria, so the result set is 1 row. However, if I run:

UPDATE sometable SET col3 = 20 WHERE col1 = 'a value of col1' AND col2 = 12;

This query takes an unknown amount of time (I stopped it after 20 minutes). Again, it should only be updating 1 column value of 1 row.

I have tried to follow the documentation here: http://docs.aws.amazon.com/redshift/latest/dg/merge-specify-a-column-list.html, which talks about creating a temporary staging table and then updating the main table from it, but I got the same results.

Any idea what is going on here?

You didn't mention what percentage of the table you're updating, but it's important to note that an UPDATE in Redshift is a 2-step process:

  1. Each row that will be changed must first be marked for deletion.
  2. Then a new version of the data must be written for each column in the table.

If you have a large number of columns and/or are updating a large number of rows, this process can be labor-intensive for the database.
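To make the two steps concrete, here is a conceptual sketch of roughly the work the single-row UPDATE from the question implies, written out as an explicit delete followed by an insert. It is not meant to be faster than the UPDATE; it only shows why every column gets rewritten. It assumes, for illustration only, that sometable has just the three columns col1, col2 and col3 (a real table would need every column listed), and the temp table name changed_rows is hypothetical.

```sql
-- Conceptual sketch only: approximately what
--   UPDATE sometable SET col3 = 20 WHERE col1 = 'a value of col1' AND col2 = 12;
-- amounts to. Assumes sometable has only col1, col2, col3 (hypothetical simplification).
BEGIN;

-- Capture the rows that will change, with the new col3 value already applied.
-- changed_rows is a hypothetical temp table name.
CREATE TEMP TABLE changed_rows AS
SELECT col1, col2, 20 AS col3
FROM sometable
WHERE col1 = 'a value of col1' AND col2 = 12;

-- Step 1: the old row versions are marked as deleted.
DELETE FROM sometable
WHERE col1 = 'a value of col1' AND col2 = 12;

-- Step 2: a new version of every column is written for each changed row,
-- not just the one column whose value actually changed.
INSERT INTO sometable (col1, col2, col3)
SELECT col1, col2, col3 FROM changed_rows;

END;
```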

You could experiment with using a CREATE TABLE AS statement to create a new "updated" version of the table, then dropping the existing table and renaming the new table. This has the added benefit of leaving you with a sorted table.
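A minimal sketch of that rebuild-and-swap approach, under some stated assumptions: sometable is again treated as having only col1, col2 and col3 (list every column in a real table), its sort key is assumed to be the compound key (col1, col2) mentioned in the question, and sometable_new is a hypothetical name. The SORTKEY clause is repeated explicitly on the CREATE TABLE AS so the rebuilt table keeps the sort key the fast SELECT relies on; a DISTKEY or DISTSTYLE clause can be added the same way if the original table has one.

```sql
-- Sketch: rebuild the table with the new value applied instead of updating in place.
-- Assumptions: only col1, col2, col3 exist; original sort key is (col1, col2);
-- sometable_new is a hypothetical name.
CREATE TABLE sometable_new
SORTKEY (col1, col2)
AS
SELECT col1,
       col2,
       -- Apply the "update" while copying: matching rows get 20, all others keep col3.
       CASE WHEN col1 = 'a value of col1' AND col2 = 12
            THEN 20
            ELSE col3
       END AS col3
FROM sometable;

-- Swap the rebuilt table in place of the old one.
BEGIN;
DROP TABLE sometable;
ALTER TABLE sometable_new RENAME TO sometable;
END;
```

Since this rewrites all ~1 billion rows, it mainly pays off when a batch job changes many rows at once. Tables created with CREATE TABLE AS don't inherit constraints or default column values from the source table, and grants on the old table are not copied, so those need re-creating after the swap.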

