Parse json arrays using HIVE -
i have many json arrays stored in table (jt) looks this:
[{"ts":1403781896,"id":14,"log":"show"},{"ts":1403781896,"id":14,"log":"start"}] [{"ts":1403781911,"id":14,"log":"press"},{"ts":1403781911,"id":14,"log":"press"}]
each array record.
i parse table in order new table (logs) 3 fields: ts, id, log. tried use get_json_object method, seems method not compatible json arrays because null values.
this code have tested:
create table logs select get_json_object(jt.value, '$.ts') ts, get_json_object(jt.value, '$.id') id, get_json_object(jt.value, '$.log') log jt;
i tried use other functions seem complicated. thank you! :)
update! solved issue performing regexp:
create table jt_reg select regexp_replace(regexp_replace(value,'\\}\\,\\{','\\}\\\n\\{'),'\\[|\\]','') valuereg jt; create table logs select get_json_object(jt_reg.valuereg, '$.ts') ts, get_json_object(jt_reg.valuereg, '$.id') id, get_json_object(jt_reg.valuereg, '$.log') log ams_json_reg;
i ran problem, json array stored string in hive table.
the solution bit hacky , ugly, works , doesn't require serdes or external udfs
select get_json_object(single_json_table.single_json, '$.ts') ts, get_json_object(single_json_table.single_json, '$.id') id, get_json_object(single_json_table.single_json, '$.log') log ( select explode ( split(regexp_replace(substr(json_array_col, 2, length(json_array_col)-2), '"}","', '"}",,,,"'), ',,,,') ) src_table) single_json_table;
i broke lines little easier read. i'm using substr() strip first , last characters, removing [ , ] . i'm using regex_replace match separator between records in json array , adding or changing separator unique can used split() turn string hive array of json objects can used explode() described in previous solution.
note, separator regex used here ( "}"," ) wouldn't work original data set...the regex have ( "},\{" ) , replacement need "},,,,{" eg..
split(regexp_replace(substr(json_array_col, 2, length(json_array_col)-2), '"},\\{"', '"},,,,{"'), ',,,,')
Comments
Post a Comment