database - T-SQL help - can someone think of a set-based operation to replace our current row-basead operation in sql instead? -
i have question in 1 of process, it's written in row-based operation, , takes hour populate around 50,000 records take on 5 days 3million records. have been trying think of different approach involve set-based tsql , not use row-by-row/cursor, cursor makes process super slow, can't think of other ways. can help? here's trying accomplish, have table of email addresses gets updated everyday, , need parse out each part , add them separate email part table.
the email address table in format:
note domain or mailbox can empty values, not both. , email addresses unique in table.
specs:
(1) full domain split “.” parts. each part added parts table. right-most part assigned “level” value of 0. left-most part, part after @ sign, given “is_leaf_domain” value of 1. otherwise, parts numbered starting @ 0 on right-most part, , increasing left.
(2) mailbox added parts table as-is, “is_address” value of 1 , “level” value of number of parts in domain, plus one.
(3) each address have "is.address" , "is_leaf_domain" part, if empty value.
based on specs abive, above email address parsed out , inserted parts table in manner in parts table:
i can think of row-by-row operation accomplish this. @ each email address in email table 1 row @ time, starting looking @ right character of fulldomain string, moving left 1 character @ time until hit ".", , insert part parts table level 0, keep doing same thing , adding 1 level everytime new part inserted, when hit left (the beginning) of string , can't go further left anymore, part have "is_leaf_domain" = 1, appropriate label , level_con , parent_con being inserted. after done final step add whole mailbox parts table, level being previous level + 1 , "is.address" = 1, , fulldomain email table being parent_con.
populating parent_con/level_con values pretty self-explanatory looking @ parts table below.
for empty fulldomain, inserted blank value parts table "is_leaf_domain" = 1 empty level_con , empty parent_con.
for empty mailbox, inserted blank value parts table "is_address" = 1 , empty level_con (all mailbox/is_address = 1 has empty value level_con), , full_domain email table parent_con.
with approach, looking @ 1 record/email @ time, parse out string moving right left , insert each part parts table, move on next record.. , it's taking forever email table containing millions of rows, email address can have on 10 levels or so...
is there easier way without using row-by-row operation using set-based approach?
i thought creating temp table maybe calculate position of each "." using charindex() , using substring() parts out based on differnet position recorded in temp table, have no idea how start, there many email addresses have on 10 parts 30 parts...
does have idea how approach in set-based approach? row-based approach works takes long time on million records, if can think of approach appreciated, , if can write out whole code me in t-sql in sql server 2008 or create stored procedure me can run, forever in debt :)
or if can tell me there no way except doing row-by-row, appreciated stop thinking :)
thanks in advance help!
assuming i'm following you...
you have 3 distinct sets in parts table. is_address records, is_leaf_domain records, , exploded email domain records.
the first 2 pretty easy handle , i've done below. note - there's repeated crap handle additional columns on source table or creating temp table break ups. example, break rightofthe@ column , same leftofthe@. have same substring/charindex stuff repeated multiple times.
you should able take sql below , run in own environment (probably have make temptbl schema or drop schema specification):
if exists (select * sys.objects object_id = object_id(n'[temptbl].[addr]') , type in (n'u')) drop table temptbl.addr go create table temptbl.addr ( addressid int, clean_address varchar(100), mailbox varchar(100), fulldomain varchar(100) ) go insert temptbl.addr values (1,'john.scott@gmail.com','john.scott','gmail.com'); insert temptbl.addr values (2,'joseph@ny.email.gs.com','joseph','ny.email.gs.com'); insert temptbl.addr values (3,'@cornell.edu','','cornell.edu'); insert temptbl.addr values (4,'lenny@','lenny',''); go select addressid, levelcol = len(clean_address) - len(replace(clean_address,'.','')) + 1, label = substring(clean_address,0,coalesce(charindex('@',clean_address),0)), level_con = '', parent_con = substring(clean_address,coalesce(charindex('@',clean_address) + 1,0),999), is_address = 1, is_leaf_domain = 0 temptbl.addr union select addressid, levelcol = len(substring(clean_address,coalesce(charindex('@',clean_address),0),999)) - len(replace(substring(clean_address,coalesce(charindex('@',clean_address),0),999),'.','')), label = substring( substring(clean_address,coalesce(charindex('@',clean_address) + 1,0),999), 0, charindex('.',substring(clean_address,coalesce(charindex('@',clean_address) + 1,0),999))), level_con = substring(clean_address,coalesce(charindex('@',clean_address) + 1,0),999), parent_con = substring( substring(clean_address,coalesce(charindex('@',clean_address) + 1,0),999), charindex('.',substring(clean_address,coalesce(charindex('@',clean_address) + 1,0),999)) ,999), is_address = 0, is_leaf_domain = 1 temptbl.addr order addressid, levelcol desc
let me know if helps. today sure. i'll think on exploding set, or maybe gets before me.
i'm pretty close last piece i've gotta run , won't able @ least couple days. see gets you. i'm using recursion explode out each piece between dot.
select addressid, levelcol = 0, clean_address, rightofat = substring(clean_address,coalesce(charindex('@',clean_address) + 1,0),999), mailbox, fulldomain #t temptbl.addr substring(clean_address,coalesce(charindex('@',clean_address) + 1,0),999) > ''; t ( select addressid, levelcol, 0 r, currentlocation = 0, rightofat, charindex('.',reverse(rightofat)) dotindex #t union select addressid, levelcol = levelcol + 1, dotindex + 1 r, currentlocation = dotindex, rightofat, dotindex = charindex('.',reverse(rightofat),dotindex + 1) t dotindex > 0 --reach 0 in recursion when there no more located ) select addressid, levelcol, --currentlocation, --dotindex, label = case when dotindex > 0 reverse(substring(reverse(rightofat),currentlocation, dotindex)) else reverse(substring(reverse(rightofat),currentlocation + 1, 9999)) end, parent_con = rightofat, reverse(rightofat), is_address = 0, is_leaf_domain = 0 t dotindex <> 0 order addressid, levelcol desc drop table #t;
Comments
Post a Comment