Redshift UPDATE prohibitively slow

July 15, 2012

I have a table in a Redshift cluster with ~1 billion rows. I have a job that tries to update some column values based on a filter. Updating anything in this table is incredibly slow. Here's an example:

SELECT col1, col2, col3 FROM sometable WHERE col1 = 'a value of col1' AND col2 = 12;

The above query returns in less than a second, because I have sort keys on col1 and col2. Only 1 row meets the criteria, so the result set is 1 row. However, if I run:

UPDATE sometable SET col3 = 20 WHERE col1 = 'a value of col1' AND col2 = 12;

This query takes an unknown amount of time (I stopped it after 20 minutes). Again, it should only be updating 1 column value in 1 row.

I have tried following the documentation here: http://docs.aws.amazon.com/redshift/latest/dg/merge-specify-a-column-list.html, which talks about creating a temporary staging table and then updating the main table from it, but I got the same results.
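For reference, a staging-table update of the kind that page describes looks roughly like the sketch below. The staging table name sometable_stage, its column types, and the use of (col1, col2) as the join key are assumptions for illustration, not taken from the AWS page.

```sql
BEGIN;

-- Hypothetical staging table; column types are assumed, not known from the question.
CREATE TEMP TABLE sometable_stage (
    col1 VARCHAR(256),
    col2 INTEGER,
    col3 INTEGER
);

-- Load the new value for the one row that should change.
INSERT INTO sometable_stage VALUES ('a value of col1', 12, 20);

-- Update the main table from the staging table, joining on the filter columns.
UPDATE sometable
SET col3 = sometable_stage.col3
FROM sometable_stage
WHERE sometable.col1 = sometable_stage.col1
  AND sometable.col2 = sometable_stage.col2;

END;
```

The final step is still an UPDATE against sometable, so it goes through the same process described in the answer below, which is consistent with it being no faster.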

Any idea what's going on here?

You didn't mention what percentage of the table you're updating, but it's important to note that an UPDATE in Redshift is a 2-step process:

Each row that will be changed must first be marked for deletion
Then a new version of the data must be written for each column in the table

If you have a large number of columns and/or are updating a large number of rows, this process can be labor intensive for the database.
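As a rough mental model (not a description of Redshift internals), the single-row UPDATE from the question does about the same work as the explicit delete-and-reinsert below. The sketch assumes, hypothetically, that sometable has only the three columns col1, col2 and col3; on a real table every column of each changed row has to be written again, which is why wide tables make updates expensive.

```sql
-- Rough equivalent of:
--   UPDATE sometable SET col3 = 20 WHERE col1 = 'a value of col1' AND col2 = 12;
-- Assumes (hypothetically) that sometable has exactly the columns col1, col2, col3.
BEGIN;

-- Copy the affected rows, with the new value applied, into a temp table.
CREATE TEMP TABLE changed_rows AS
SELECT col1, col2, 20 AS col3
FROM sometable
WHERE col1 = 'a value of col1' AND col2 = 12;

-- Step 1: mark the old row versions as deleted.
DELETE FROM sometable
WHERE col1 = 'a value of col1' AND col2 = 12;

-- Step 2: write a new version of every column for those rows.
INSERT INTO sometable (col1, col2, col3)
SELECT col1, col2, col3 FROM changed_rows;

END;
```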

You could experiment with using a CREATE TABLE AS statement to create a new "updated" version of the table, then dropping the existing table and renaming the new table. This has the added benefit of leaving you with a sorted table.
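A minimal sketch of that approach, assuming (hypothetically) that sometable has only the columns col1, col2 and col3. SORTKEY (col1, col2) mirrors the sort keys mentioned in the question; any distribution key or style on the real table would need to be restated in the CREATE TABLE AS, since it is not inferred from the old table.

```sql
-- Assumes (hypothetically) that sometable has exactly the columns col1, col2, col3
-- and is sorted on (col1, col2); add DISTKEY/DISTSTYLE here if the original had one.
BEGIN;

-- Build the "updated" copy in one pass, applying the change with CASE.
CREATE TABLE sometable_new
SORTKEY (col1, col2)
AS
SELECT col1,
       col2,
       CASE WHEN col1 = 'a value of col1' AND col2 = 12
            THEN 20
            ELSE col3
       END AS col3
FROM sometable;

-- Swap the new table in for the old one.
DROP TABLE sometable;
ALTER TABLE sometable_new RENAME TO sometable;

END;
```

This rewrites every row of the table, so it is most likely to pay off when a large share of the rows is changing. Privileges granted on the old table are not carried over to the new one and have to be granted again.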
