Redshift UPDATE prohibitively slow
I have a table in a Redshift cluster with ~1 billion rows. I have a job that tries to update some column values based on a filter. Updating anything in this table is incredibly slow. Here's an example:

    select col1, col2, col3 from sometable where col1 = 'a value of col1' and col2 = 12;

The above query returns in less than a second, because I have sort keys on col1 and col2. Only 1 row meets the criteria, so the result set is 1 row. However, if I run:

    update sometable set col3 = 20 where col1 = 'a value of col1' and col2 = 12;

this query takes an unknown amount of time (I stopped it after 20 minutes). Again, it should be updating 1 column value of 1 row.
I have tried to follow the documentation here: http://docs.aws.amazon.com/redshift/latest/dg/merge-specify-a-column-list.html, which talks about creating a temporary staging table and then updating the main table from it, but I got the same results.
Any idea what is going on here?
You didn't mention what percentage of the table you're updating, but it's important to note that an UPDATE in Redshift is a two-step process:
- each row that will be changed must first be marked for deletion
- then a new version of the data must be written for each column in the table
If you have a large number of columns and/or are updating a large number of rows, this process can be very labor-intensive for the database.
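The two steps above can be made concrete with a toy model. `ToyColumnStore` is a hypothetical Python class, not Redshift's storage engine: it keeps one list per column plus a "marked for deletion" flag per row version, and it ignores blocks, compression, and sort order. It illustrates why changing a single value in a single row still writes one value to every column.

```python
class ToyColumnStore:
    """Toy column store: one Python list per column plus a per-row delete flag.

    Illustrative only (hypothetical); it ignores blocks, compression and sorting.
    """

    def __init__(self, column_names):
        self.columns = {name: [] for name in column_names}
        self.deleted = []  # one "marked for deletion" flag per stored row version

    def insert(self, row):
        """Append one row version: one value is written to every column."""
        for name, values in self.columns.items():
            values.append(row[name])
        self.deleted.append(False)

    def update(self, predicate, changes):
        """Delete-then-append update; returns the number of column values written."""
        writes = 0
        existing = len(self.deleted)  # don't revisit versions appended below
        for i in range(existing):
            if self.deleted[i]:
                continue
            row = {name: values[i] for name, values in self.columns.items()}
            if not predicate(row):
                continue
            self.deleted[i] = True   # step 1: mark the old version for deletion
            row.update(changes)
            self.insert(row)         # step 2: write a new value for every column
            writes += len(self.columns)
        return writes

    def live_rows(self):
        """Rows that are not marked for deletion."""
        return [
            {name: values[i] for name, values in self.columns.items()}
            for i in range(len(self.deleted))
            if not self.deleted[i]
        ]


if __name__ == "__main__":
    names = [f"col{i}" for i in range(1, 51)]  # a 50-column table
    store = ToyColumnStore(names)
    for j in range(3):
        row = dict.fromkeys(names, 0)
        row.update(col1="a value of col1", col2=12 + j)
        store.insert(row)
    n = store.update(
        lambda r: r["col1"] == "a value of col1" and r["col2"] == 12, {"col3": 20}
    )
    print(n)  # 50: every column is rewritten to change one value in one row
```

In this model the number of values written grows with (rows updated) × (number of columns), which is the cost described above.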
You could experiment with using a CREATE TABLE AS statement to create a new "updated" version of the table, then dropping the existing table and renaming the new table. This has the added benefit of leaving you with a sorted table.
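A minimal sketch of that rebuild-and-swap pattern, using Python's built-in sqlite3 module as a stand-in for Redshift so it can be run anywhere. The table name, column names, and literal values come from the question; the function name `rebuild_with_update` is hypothetical. Redshift-specific options are left out: Redshift's CREATE TABLE AS accepts DISTKEY and SORTKEY clauses, so on Redshift you would likely add `SORTKEY (col1, col2)` to keep the original sort keys (check the CREATE TABLE AS documentation for the exact syntax).

```python
import sqlite3

# Names and literals are taken from the question; hard-coded for illustration.
REBUILD_SQL = """
BEGIN;
-- 1. Write the "updated" copy of the table in one pass instead of running UPDATE.
--    (On Redshift: CREATE TABLE sometable_new SORTKEY (col1, col2) AS ...;
--     SQLite has no sort keys, so that clause is omitted here.)
CREATE TABLE sometable_new AS
SELECT col1,
       col2,
       CASE WHEN col1 = 'a value of col1' AND col2 = 12
            THEN 20      -- new value for the matching row(s)
            ELSE col3    -- every other row is copied unchanged
       END AS col3
FROM sometable
ORDER BY col1, col2;     -- insert rows in (col1, col2) order
-- 2. Swap the new table in for the old one.
DROP TABLE sometable;
ALTER TABLE sometable_new RENAME TO sometable;
COMMIT;
"""


def rebuild_with_update(conn: sqlite3.Connection) -> None:
    """Apply the update by rebuilding and swapping the table (hypothetical helper)."""
    conn.executescript(REBUILD_SQL)


if __name__ == "__main__":
    conn = sqlite3.connect(":memory:")
    conn.execute("CREATE TABLE sometable (col1 TEXT, col2 INTEGER, col3 INTEGER)")
    conn.executemany(
        "INSERT INTO sometable VALUES (?, ?, ?)",
        [("a value of col1", 12, 1), ("a value of col1", 13, 2), ("other", 12, 3)],
    )
    conn.commit()
    rebuild_with_update(conn)
    print(conn.execute("SELECT * FROM sometable ORDER BY col1, col2").fetchall())
```

Because the whole table is rewritten in one sequential pass, this approach is likely to pay off mainly when a large fraction of the rows change; for a handful of rows the full copy may cost more than the UPDATE itself.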