Doris: Data Import, Data Permissions, and Resource Management

Original article, first published 2024-06-20 16:12:47, last modified 2024-06-20 16:14:21.

License: CC 4.0 BY-SA

Tags: #doris


Server environment

172.16.10.65 node1 (FE)

172.16.10.66 node2 (BE)

172.16.10.75 node3 (BE)

172.16.10.76 node4 (BE)

Installation directory

Doris: /home/apps/doris-2.1.1

Start commands

# FE (on 172.16.10.65)
/home/apps/doris-2.1.1/fe/bin/start_fe.sh --daemon
# BE (on 172.16.10.66)
sh /home/apps/start/doris.sh start

Stop commands

# FE (on 172.16.10.65)
/home/apps/doris-2.1.1/fe/bin/stop_fe.sh
# BE (on 172.16.10.66)
sh /home/apps/start/doris.sh stop

Doris JDBC connection (MySQL protocol)

Host 172.16.10.65, port 9030, user root (empty password)
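The FE accepts MySQL-protocol connections on port 9030. That means the same address works for a JDBC URL and for the stock `mysql` client.

The bash sketch below is minimal and assumes the standard `jdbc:mysql://host:port/db` URL scheme. The function name `doris_jdbc_url` is hypothetical.

```shell
#!/usr/bin/env bash
# Hypothetical helper: build a MySQL-protocol JDBC URL that points at a Doris FE.
# The port defaults to 9030, the FE query port used in this cluster.
doris_jdbc_url() {
  local host="$1" port="${2:-9030}" db="${3:-}"
  printf 'jdbc:mysql://%s:%s/%s\n' "$host" "$port" "$db"
}

# Interactive connection with the stock MySQL client (not executed here):
#   mysql -h 172.16.10.65 -P 9030 -u root
```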

HTTPS secure transport **

HTTP Secure Transport - Apache Doris

Stream Load internal workflow

High-risk SQL restrictions

High-Risk SQL Restrictions - Apache Doris

Regex-based restrictions
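Regex-based blocking is configured with `CREATE SQL_BLOCK_RULE`, whose `sql` property holds the regular expression.

The bash sketch below only prints the statement, so it can be reviewed before being piped into the `mysql` client. Some parts are hypothetical:

- the helper name;
- the rule, table, and user names in the usage comment.

The properties shown (`sql`, `global`, `enable`) are only a minimal subset of the available ones.

```shell
# Hypothetical helper: print a CREATE SQL_BLOCK_RULE statement that blocks queries
# matching a regex. The regex is embedded verbatim in a SQL string literal, so the
# caller must already escape backslashes (e.g. 'select \\* from t' for "select \* ...").
sql_block_rule() {
  local name="$1" regex="$2" global="${3:-false}"
  cat <<EOF
CREATE SQL_BLOCK_RULE ${name}
PROPERTIES (
  "sql" = "${regex}",
  "global" = "${global}",
  "enable" = "true"
);
EOF
}

# Usage (not executed here):
#   sql_block_rule block_full_scan 'select \\* from order_analysis' | mysql -h 172.16.10.65 -P 9030 -u root
# A non-global rule can then be attached to a user:
#   SET PROPERTY FOR 'alice' 'sql_block_rules' = 'block_full_scan';
```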

Kill Query

Kill Query - Apache Doris

Scan for timed-out connections
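One way to scan for long-running connections is to take the `Id` and `Time` columns of `SHOW PROCESSLIST` and emit `KILL` statements for anything over a threshold.

This is a minimal bash sketch, and the helper name is hypothetical. The column positions in the usage comment are an assumption; check them against the `SHOW PROCESSLIST` output of your Doris version.

```shell
# Hypothetical helper: read "<connection id><TAB><seconds>" lines on stdin and print
# a KILL statement for every connection whose time exceeds the threshold.
# KILL <id> closes the connection; KILL QUERY <id> would only stop its current statement.
kill_statements_over() {
  local threshold="$1"
  awk -F'\t' -v t="$threshold" '$2 + 0 > t + 0 { printf "KILL %s;\n", $1 }'
}

# Usage (not executed here). Columns 2 (Id) and 9 (Time) are an assumed layout:
#   mysql -h 172.16.10.65 -P 9030 -u root -N -e 'SHOW PROCESSLIST' \
#     | awk -F'\t' '{ print $2 "\t" $9 }' | kill_statements_over 600
```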

Monitoring and alerting

Monitoring and Alerting - Apache Doris

Doris uses Prometheus and Grafana to collect and display monitoring metrics.
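Prometheus needs scrape targets for every node, and both node types serve metrics at `/metrics`:

- each FE on HTTP port 8030;
- each BE on HTTP port 8040.

The sketch below prints a `scrape_configs` fragment for the hosts listed under the server environment. The job name and the `group` label are assumptions, and the helper name is hypothetical.

```shell
# Hypothetical helper: print a Prometheus scrape_configs fragment for a Doris cluster.
# Arguments are space-separated host lists; FE targets use port 8030, BE targets 8040.
doris_scrape_config() {
  local fe_hosts="$1" be_hosts="$2"
  local fe_targets="" be_targets="" h
  for h in $fe_hosts; do fe_targets+="${fe_targets:+, }'${h}:8030'"; done
  for h in $be_hosts; do be_targets+="${be_targets:+, }'${h}:8040'"; done
  cat <<EOF
scrape_configs:
  - job_name: 'DORIS_CLUSTER'
    metrics_path: '/metrics'
    static_configs:
      - targets: [${fe_targets}]
        labels:
          group: fe
      - targets: [${be_targets}]
        labels:
          group: be
EOF
}

# Usage (not executed here):
#   doris_scrape_config '172.16.10.65' '172.16.10.66 172.16.10.75 172.16.10.76' >> prometheus.yml
```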

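Automatic service restart

Service Auto-Restart - Apache Doris

This document describes how to configure automatic restart for a Doris cluster. The goal is that a service which goes down unexpectedly in production is brought back up promptly, instead of staying down and disrupting the business.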
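A periodic pid-file check is a lighter-weight alternative to a full process supervisor for restarting a dead service. This is a swapped-in technique, not the procedure from the linked page.

The function name is hypothetical. The pid file locations (`fe/bin/fe.pid`, `be/bin/be.pid`) are assumptions about where the start scripts write them. The check should run as the same user as the service, because `kill -0` fails for processes owned by other users.

```shell
# Hypothetical watchdog: if the process recorded in the pid file is not alive,
# run the start command given as the remaining arguments.
ensure_running() {
  local pidfile="$1"; shift
  local pid=""
  [ -r "$pidfile" ] && pid="$(cat "$pidfile")"
  if [ -n "$pid" ] && kill -0 "$pid" 2>/dev/null; then
    echo "running"
  else
    echo "restarting"
    "$@"
  fi
}

# Example call from a script scheduled by cron every minute (pid path is assumed):
#   ensure_running /home/apps/doris-2.1.1/fe/bin/fe.pid \
#     /home/apps/doris-2.1.1/fe/bin/start_fe.sh --daemon
```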

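Database read/write permissions ***

Authentication and Authorization - Apache Doris

Each user gets a dedicated database: users can read and write their own database, but can only read the main database.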
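The per-user layout above maps onto Doris `CREATE USER`, `CREATE DATABASE`, and `GRANT` statements, using privilege names such as `SELECT_PRIV` and `LOAD_PRIV` on the `internal` catalog.

This bash sketch only prints the statements. Some parts are hypothetical:

- the helper name;
- the password;
- the `<user>_db` naming for each user's own database.

The shared database is passed in as an argument.

```shell
# Hypothetical helper: print statements for "own database read/write,
# main database read-only". The password must not contain a single quote.
per_user_db_sql() {
  local user="$1" password="$2" main_db="$3"
  local own_db="${user}_db"
  cat <<EOF
CREATE USER '${user}'@'%' IDENTIFIED BY '${password}';
CREATE DATABASE IF NOT EXISTS ${own_db};
GRANT SELECT_PRIV, LOAD_PRIV, ALTER_PRIV, CREATE_PRIV, DROP_PRIV ON internal.${own_db}.* TO '${user}'@'%';
GRANT SELECT_PRIV ON internal.${main_db}.* TO '${user}'@'%';
EOF
}

# Usage (not executed here):
#   per_user_db_sql alice 'change-me' test | mysql -h 172.16.10.65 -P 9030 -u root
```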

Server resource permissions ***

SET-PROPERTY - Apache Doris

Per-user CPU, memory, and timeout limits
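Per-user limits are set with `SET PROPERTY FOR '<user>' '<key>' = '<value>'`. This sketch covers two of them, and CPU limits are not shown:

- memory, via `exec_mem_limit` (in bytes);
- query timeout, via `query_timeout` (in seconds).

The helper name and the example values are hypothetical.

```shell
# Hypothetical helper: print per-user memory and query-timeout limits as
# SET PROPERTY statements. exec_mem_limit is in bytes, query_timeout in seconds.
user_limits_sql() {
  local user="$1" mem_bytes="$2" query_timeout_s="$3"
  cat <<EOF
SET PROPERTY FOR '${user}' 'exec_mem_limit' = '${mem_bytes}';
SET PROPERTY FOR '${user}' 'query_timeout' = '${query_timeout_s}';
EOF
}

# Usage (not executed here): 4 GiB of memory and a 300-second timeout for alice
#   user_limits_sql alice 4294967296 300 | mysql -h 172.16.10.65 -P 9030 -u root
```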

Audit log plugin **

Audit Log Plugin - Apache Doris

Records the SQL statements each user runs
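Once the plugin is collecting data, a query against the audit log table is enough to review a user's SQL history. This sketch rests on several assumptions:

- Doris 2.1's built-in audit log is in use, enabled with `SET GLOBAL enable_audit_plugin = true`.
- Its data is stored in `__internal_schema.audit_log`.
- The column names `time`, `user`, `query_time`, and `stmt` are correct; verify them with `DESC`.

The helper name is hypothetical.

```shell
# Hypothetical helper: print a query for a user's most recent statements.
# Table and column names are assumptions; check DESC __internal_schema.audit_log.
recent_user_sql() {
  local user="$1" limit="${2:-20}"
  cat <<EOF
SELECT \`time\`, query_time, stmt
FROM __internal_schema.audit_log
WHERE \`user\` = '${user}'
ORDER BY \`time\` DESC
LIMIT ${limit};
EOF
}

# Usage (not executed here):
#   recent_user_sql alice 5 | mysql -h 172.16.10.65 -P 9030 -u root
```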

Import methods


# Stream Load: read CSV columns c1..c6 and map them to table columns, hashing c4 and c3 with md5
curl --location-trusted \
-u root: \
-H "label:test1" \
-H "column_separator:," \
-H "columns: c1,c2,c3,c4,c5,c6,part_id=c1,month_id=c2,day_id=c3,prov_id=md5(c4),area_id=c5,create_time=c6,xxxx=md5(c3)" \
-T test.csv \
http://localhost:8030/api/test/TESTXX/_stream_load

# BUG: http_stream load that transforms data via the sql header (new in 2.1)
# Uses a different label from the load above, since a label cannot be reused
curl --location-trusted \
-u root: \
-T /home/apps/testdata1/test.csv \
-H "label:test2" \
-H "sql:insert into test.TESTXX(part_id,month_id,day_id,prov_id,area_id,create_time,xxxx) select c1,TO_BASE64(SM4_ENCRYPT(c2,'xxxxx')),c3,c4,c5,c6,md5(c2) from http_stream('format' = 'CSV', 'column_separator' = ',')" \
http://localhost:8030/api/_http_stream

Example response from a successful load:

{
    "TxnId": 87810,
    "Label": "11",
    "Comment": "",
    "TwoPhaseCommit": "false",
    "Status": "Success",
    "Message": "OK",
    "NumberTotalRows": 9,
    "NumberLoadedRows": 9,
    "NumberFilteredRows": 0,
    "NumberUnselectedRows": 0,
    "LoadBytes": 504,
    "LoadTimeMs": 363,
    "BeginTxnTimeMs": 0,
    "StreamLoadPutTimeMs": 5,
    "ReadDataTimeMs": 0,
    "WriteDataTimeMs": 338,
    "CommitAndPublishTimeMs": 17
}

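Example response from a failed load. The counters are inconsistent: NumberLoadedRows is -9 and NumberFilteredRows is 18, twice the 9 total rows.

{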
    "TxnId": 87812,
    "Label": "112",
    "Comment": "",
    "TwoPhaseCommit": "false",
    "Status": "Fail",
    "Message": "[DATA_QUALITY_ERROR]too many filtered rows",
    "NumberTotalRows": 9,
    "NumberLoadedRows": -9,
    "NumberFilteredRows": 18,
    "NumberUnselectedRows": 0,
    "LoadBytes": 504,
    "LoadTimeMs": 128,
    "BeginTxnTimeMs": 0,
    "StreamLoadPutTimeMs": 16,
    "ReadDataTimeMs": 0,
    "WriteDataTimeMs": 110,
    "CommitAndPublishTimeMs": 0,
    "ErrorURL": "http://172.16.10.66:8040/api/_load_error_log?file=__shard_54/error_log_insert_stmt_564669773e05b944-2d7a303f7cb98396_564669773e05b944_2d7a303f7cb98396"
}
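Scripts that run Stream Load usually need to decide success from the JSON response. This jq-free bash sketch extracts `Status` and treats two values as success:

- `Success`;
- `Publish Timeout`, which means the data was committed but becomes visible with a delay.

The helper names are hypothetical.

```shell
# Hypothetical helper: print the value of the "Status" field from a Stream Load
# JSON response read on stdin (pretty-printed or single-line).
stream_load_status() {
  sed -n 's/.*"Status"[[:space:]]*:[[:space:]]*"\([^"]*\)".*/\1/p'
}

# Hypothetical helper: exit 0 if the response on stdin indicates committed data.
stream_load_ok() {
  case "$(stream_load_status)" in
    "Success"|"Publish Timeout") return 0 ;;
    *) return 1 ;;
  esac
}

# Usage (not executed here):
#   curl ... /_stream_load | stream_load_ok || echo "load failed" >&2
```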

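//  label: the label of a load job. Data cannot be loaded again under a label that has already been used, so specifying a label lets users avoid importing the same data twice.
//  Doris currently retains the labels of loads that succeeded within the last 30 minutes.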
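Since a label can only be used once, deriving it from the file content makes retries idempotent: resending the same file reuses its label and is rejected instead of being loaded twice. This sketch assumes GNU `md5sum` is available.

The helper name and the `<table>_<md5>` label format are hypothetical. The protection only lasts as long as Doris retains the label.

```shell
# Hypothetical helper: derive a Stream Load label from the table name and the
# MD5 of the file's content. Same content -> same label; changed content -> new label.
label_for_file() {
  local table="$1" file="$2"
  local digest
  digest="$(md5sum < "$file" | cut -d' ' -f1)"
  printf '%s_%s\n' "$table" "$digest"
}

# Usage (not executed here):
#   curl --location-trusted -u root: -H "label:$(label_for_file TESTXX test.csv)" \
#     -H "column_separator:," -T test.csv http://localhost:8030/api/test/TESTXX/_stream_load
```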

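//  max_filter_ratio: the maximum tolerated ratio of rows that may be filtered out (for example because the data is malformed). The default is zero tolerance. Rows filtered out by a where condition do not count as malformed.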

//  exec_mem_limit: memory limit for the load, in bytes; defaults to 2 GB. Running too many loads concurrently can exhaust memory.

//  skip_lines: integer, default 0; the number of lines to skip at the beginning of a CSV file. It has no effect when format is set to csv_with_names or csv_with_names_and_types.

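//  sql: data transformation through a TVF, new in 2.1; the columns produced by the TVF are named c1, c2, ..., cn.
//  column_separator: the column separator of the input file; defaults to \t. For an invisible character, write it in hexadecimal with a \x prefix.
//  For example, Hive files use \x01 as the separator, which must be specified as -H "column_separator:\x01".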
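If you are unsure of an invisible separator's code, bash's `printf` can print it in the `\xNN` form. This is a minimal sketch for single-byte separators only, and the helper name is hypothetical.

```shell
# Hypothetical helper: print a single-byte separator in the \xNN form used by
# the column_separator header. A leading quote makes printf use the character's code.
hex_separator() {
  printf '\\x%02x\n' "'$1"
}

# Usage (not executed here):
#   -H "column_separator:$(hex_separator $'\001')"
```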

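//  strict_mode: strict mode. Because table columns have declared types, a row whose string value in the text file fails to convert to the column type is treated as an error row when strict mode is on.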
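The headers described in these notes can be combined into a single request. The bash sketch below builds the curl arguments in an array and, with `DRY_RUN=1`, only prints them.

The following are illustrative assumptions:

- the wrapper name;
- `DRY_RUN`;
- the header values: a 10% filter ratio, one skipped header line, and 2 GB of memory.

```shell
# Hypothetical wrapper: assemble a Stream Load request with the headers above.
# With DRY_RUN=1 it prints the curl arguments one per line instead of sending them.
stream_load_csv() {
  local db="$1" table="$2" file="$3" label="$4"
  local -a args=(
    --location-trusted -u "root:"
    -H "label:${label}"
    -H "column_separator:,"
    -H "max_filter_ratio:0.1"       # tolerate up to 10% malformed rows
    -H "strict_mode:true"           # failed type conversions count as error rows
    -H "skip_lines:1"               # skip one header line (plain csv format only)
    -H "exec_mem_limit:2147483648"  # 2 GB, in bytes
    -T "$file"
    "http://localhost:8030/api/${db}/${table}/_stream_load"
  )
  if [ "${DRY_RUN:-0}" = "1" ]; then
    printf '%s\n' "${args[@]}"
  else
    curl "${args[@]}"
  fi
}

# Usage (not executed here):
#   stream_load_csv test TESTXX test.csv load_001
```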