Syncing Hive to PostgreSQL

TriumPhSK

License: CC 4.0 BY-SA

Category: Linux. Tags: linux, hive, pgsql, big data

Article link: https://blog.csdn.net/qq_38821502/article/details/111690086

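This post describes how to sync data from a Hive database into a PostgreSQL database. It covers the export from Hive and the load into PostgreSQL, and offers one workable approach to this kind of data migration.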

The script below syncs data from a Hive database into a PostgreSQL database:

#!/bin/bash
# Argument check: the script expects one argument, the reference date
if [ $# -lt 1 ] ; then
  echo "Expected 1 argument" >&2
  exit 1
fi

# Date handling: dt is the day before the date given in $1 (GNU date syntax)
dt="$(date -d "$1 1 day ago" +'%Y-%m-%d')"

# PostgreSQL connection settings
rdb="postgresql"
host="**********************"
port="**********************"
user="**********************"
password="******************"
database="******************"

export LANG="en_US.UTF-8"
export PGPASSWORD="${password}"

hive_table="****************"
hive_partition="dt='${dt}'"
pg_table="******************"

# Paths: the export file is written next to this script
current_path="$(cd "$(dirname "$0")"; pwd)"
tmp_file="$current_path/data"

hive_sql="
  select 
    name,
    sex,
    age
  from 
    ${hive_table}
  where ${hive_partition};
"
# Export the query result to the temp file; stop if Hive fails
sudo -u root /home/apache-hive-2.1.1-bin/bin/hive -e "${hive_sql}" > "$tmp_file" || exit 1

pg_sql="
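  begin;                                                 -- start a transaction so a failure can be rolled back
  truncate table ${pg_table};                            -- empty the target table
  alter sequence ${pg_table}_id_seq restart with 1;      -- reset the auto-increment id to 1
  copy ${pg_table}(name,sex,age) from stdin;             -- load the exported rows into PostgreSQL
  commit;                                                -- commit; if any step above fails, nothing is committed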
"
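# The sequence behind the id column is usually prefixed with the table name: ${pg_table}_id_seq
# If "alter sequence ${pg_table}_id_seq restart with 1;" fails with:
# ERROR: must be owner of sequence ${pg_table}_id_seq
# replace it with:  select setval('${pg_table}_id_seq'::regclass, 1, false);
# setval does not require ownership, only UPDATE privilege on the sequence.
# The third argument (is_called = false) makes the next nextval return 1, matching "restart with 1".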

# Feed the exported file to COPY through psql's stdin; stop if the load fails
psql -h "${host}" -p "${port}" -d "${database}" -U "${user}" -c "${pg_sql}" < "$tmp_file" || exit 1
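
Two details of the script above can cause trouble in practice. A malformed date argument only shows up as a `date` error, and the Hive CLI prints null values as the literal text `NULL`, while `COPY` in text format expects `\N` by default, so a null `age` would fail to parse as a number. The following is a minimal sketch of two helpers that address this. The function names `partition_date` and `build_copy_sql` are hypothetical and not part of the original script, and the sketch assumes GNU `date` and a `YYYY-MM-DD` argument, which the original does not state.

```shell
#!/bin/bash
# Hypothetical helpers; the names are illustrative and not from the original script.

# Print the day before the given YYYY-MM-DD date (assumes GNU date).
# Returns non-zero if the argument does not look like a date or cannot be parsed.
partition_date() {
  local input="$1"
  # Reject anything that is not shaped like YYYY-MM-DD before calling date.
  if [[ ! "$input" =~ ^[0-9]{4}-[0-9]{2}-[0-9]{2}$ ]]; then
    echo "expected a date in YYYY-MM-DD format, got: ${input}" >&2
    return 1
  fi
  # Same relative-date expression as the original script.
  date -d "${input} 1 day ago" +'%Y-%m-%d'
}

# Print a COPY statement for the given table and column list.
# The option null 'NULL' makes COPY read the literal text NULL as SQL NULL.
build_copy_sql() {
  local table="$1" columns="$2"
  printf "copy %s(%s) from stdin with (format text, null 'NULL');\n" "$table" "$columns"
}
```

In the script, `dt="$(partition_date "$1")" || exit 1` could replace the direct `date` call, and the output of `build_copy_sql "${pg_table}" "name,sex,age"` could replace the plain `copy` line inside `pg_sql`. One trade-off to keep in mind: with `null 'NULL'`, a genuine string value `NULL` in the `name` or `sex` column would also be loaded as SQL NULL.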