## `best-practices/readonly-nodes.md` (+1 −9)

````diff
@@ -115,15 +115,7 @@ To read data from read-only nodes when using TiDB, you can set the system variab
 set tidb_replica_read=learner;
 ```
 
-#### 3.2 Use Follower Read in TiSpark
-
-To read data from read-only nodes when using TiSpark, you can set the configuration item `spark.tispark.replica_read` to `learner` in the Spark configuration file:
-
-```
-spark.tispark.replica_read learner
-```
-
-#### 3.3 Use Follower Read when backing up cluster data
+#### 3.2 Use Follower Read when backing up cluster data
 
 To read data from read-only nodes when backing up cluster data, you can specify the `--replica-read-label` option in the br command line. Note that when running the following command in shell, you need to use single quotes to wrap the label to prevent `$` from being parsed.
````
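The single-quote caveat in that br note is easy to demonstrate with plain `echo`. The `$mode:readonly` label below is an assumption about the label shape, not a value taken from this diff:

```shell
# br labels can carry a literal `$` (for example, `$mode:readonly`).
# In double quotes the shell expands `$mode` as a variable; in single
# quotes the label reaches br untouched.
unset mode
echo "double-quoted: $mode:readonly"   # $mode expands to empty -> "double-quoted: :readonly"
echo 'single-quoted: $mode:readonly'   # prints the literal label
```

With single quotes, an option such as `--replica-read-label '$mode:readonly'` is passed through verbatim, while double quotes would silently strip the `$mode` part.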
## `ecosystem-tool-user-guide.md` (+0 −4)

```diff
@@ -132,7 +132,3 @@ The following are the basics of sync-diff-inspector:
 - Source: MySQL/TiDB clusters
 - Target: MySQL/TiDB clusters
 - Supported TiDB versions: all versions
-
-## OLAP Query tool - TiSpark
-
-[TiSpark](/tispark-overview.md) is a product developed by PingCAP to address the complexity of OLAP queries. It combines strengths of Spark, and the features of distributed TiKV clusters and TiDB to provide a one-stop Hybrid Transactional and Analytical Processing (HTAP) solution.
```
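The sync-diff-inspector basics kept in the context lines correspond to a small TOML file. The sketch below follows the tool's v2-style configuration layout from memory; hosts, ports, credentials, and the table filter are all placeholders, so confirm the exact keys against the sync-diff-inspector overview:

```toml
# Placeholders throughout; a sketch of a sync-diff-inspector config, not a verified template.
[data-sources.upstream]
host = "127.0.0.1"
port = 4000
user = "root"
password = ""

[data-sources.downstream]
host = "127.0.0.1"
port = 3306
user = "root"
password = ""

[task]
output-dir = "./output"
source-instances = ["upstream"]
target-instance = "downstream"
target-check-tables = ["test.*"]
```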
## `explore-htap.md` (+7 −13)

```diff
@@ -57,21 +57,15 @@ For more information about the architecture, see [architecture of TiDB HTAP](/ti
 
 ## Environment preparation
 
-Before exploring the features of TiDB HTAP, you need to deploy TiDB and the corresponding storage engines according to the data volume. If the data volume is large (for example, 100 T), it is recommended to use TiFlash Massively Parallel Processing (MPP) as the primary solution and TiSpark as the supplementary solution.
+Before exploring TiDB HTAP features, you need to deploy TiDB and its columnar storage engine TiFlash. If the data volume is large (for example, 100 T), it is recommended to use TiFlash Massively Parallel Processing (MPP) as the solution.
 
-- TiFlash
+- If you have deployed a TiDB cluster with no TiFlash node, add the TiFlash nodes in the current TiDB cluster. For detailed information, see [Scale out a TiFlash cluster](/scale-tidb-using-tiup.md#scale-out-a-tiflash-cluster).
+- If you have not deployed a TiDB cluster, see [Deploy a TiDB Cluster Using TiUP](/production-deployment-using-tiup.md). Based on the minimal TiDB topology, you also need to deploy the [topology of TiFlash](/tiflash-deployment-topology.md).
+- When deciding how to choose the number of TiFlash nodes, consider the following scenarios:
 
-    - If you have deployed a TiDB cluster with no TiFlash node, add the TiFlash nodes in the current TiDB cluster. For detailed information, see [Scale out a TiFlash cluster](/scale-tidb-using-tiup.md#scale-out-a-tiflash-cluster).
-    - If you have not deployed a TiDB cluster, see [Deploy a TiDB Cluster Using TiUP](/production-deployment-using-tiup.md). Based on the minimal TiDB topology, you also need to deploy the [topology of TiFlash](/tiflash-deployment-topology.md).
-    - When deciding how to choose the number of TiFlash nodes, consider the following scenarios:
-
-        - If your use case requires OLTP with small-scale analytical processing and Ad-Hoc queries, deploy one or several TiFlash nodes. They can dramatically increase the speed of analytic queries.
-        - If the OLTP throughput does not cause significant pressure to I/O usage rate of the TiFlash nodes, each TiFlash node uses more resources for computation, and thus the TiFlash cluster can have near-linear scalability. The number of TiFlash nodes should be tuned based on expected performance and response time.
-        - If the OLTP throughput is relatively high (for example, the write or update throughput is higher than 10 million lines/hours), due to the limited write capacity of network and physical disks, the I/O between TiKV and TiFlash becomes a bottleneck and is also prone to read and write hotspots. In this case, the number of TiFlash nodes has a complex non-linear relationship with the computation volume of analytical processing, so you need to tune the number of TiFlash nodes based on the actual status of the system.
-
-- TiSpark
-
-    - If your data needs to be analyzed with Spark, deploy TiSpark. For specific process, see [TiSpark User Guide](/tispark-overview.md).
+    - If your use case requires OLTP with small-scale analytical processing and Ad-Hoc queries, deploy one or several TiFlash nodes. They can dramatically increase the speed of analytic queries.
+    - If the OLTP throughput does not cause significant pressure to I/O usage rate of the TiFlash nodes, each TiFlash node uses more resources for computation, and thus the TiFlash cluster can have near-linear scalability. The number of TiFlash nodes should be tuned based on expected performance and response time.
+    - If the OLTP throughput is relatively high (for example, the write or update throughput is higher than 10 million lines/hours), due to the limited write capacity of network and physical disks, the I/O between TiKV and TiFlash becomes a bottleneck and is also prone to read and write hotspots. In this case, the number of TiFlash nodes has a complex non-linear relationship with the computation volume of analytical processing, so you need to tune the number of TiFlash nodes based on the actual status of the system.
 
 <!-- - Real-time stream processing
 If you want to build an efficient and easy-to-use real-time data warehouse with TiDB and Flink, you are welcome to participate in Apache Flink x TiDB meetups.-->
```
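The TiFlash scale-out step described above can be sketched as a TiUP workflow. The topology snippet is minimal, and the host IP and cluster name are placeholders, not values from this PR:

```shell
# Write a minimal scale-out topology for one extra TiFlash node.
# 10.0.1.5 is a placeholder host.
cat > scale-out-tiflash.yaml <<'EOF'
tiflash_servers:
  - host: 10.0.1.5
EOF

# With a running TiUP-managed cluster you would then apply it:
#   tiup cluster scale-out <cluster-name> scale-out-tiflash.yaml
cat scale-out-tiflash.yaml
```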
## `production-deployment-using-tiup.md` (+0 −1)

```diff
@@ -257,7 +257,6 @@ The following examples cover six common scenarios. You need to modify the config
 | OLTP | [Deploy minimal topology](/minimal-deployment-topology.md) | [Simple minimal configuration template](https://github.com/pingcap/docs/blob/master/config-templates/simple-mini.yaml) <br/> [Full minimal configuration template](https://github.com/pingcap/docs/blob/master/config-templates/complex-mini.yaml) | This is the basic cluster topology, including tidb-server, tikv-server, and pd-server. |
 | HTAP | [Deploy the TiFlash topology](/tiflash-deployment-topology.md) | [Simple TiFlash configuration template](https://github.com/pingcap/docs/blob/master/config-templates/simple-tiflash.yaml) <br/> [Full TiFlash configuration template](https://github.com/pingcap/docs/blob/master/config-templates/complex-tiflash.yaml) | This is to deploy TiFlash along with the minimal cluster topology. TiFlash is a columnar storage engine, and gradually becomes a standard cluster topology. |
 | Replicate incremental data using [TiCDC](/ticdc/ticdc-overview.md) | [Deploy the TiCDC topology](/ticdc-deployment-topology.md) | [Simple TiCDC configuration template](https://github.com/pingcap/docs/blob/master/config-templates/simple-cdc.yaml) <br/> [Full TiCDC configuration template](https://github.com/pingcap/docs/blob/master/config-templates/complex-cdc.yaml) | This is to deploy TiCDC along with the minimal cluster topology. TiCDC supports multiple downstream platforms, such as TiDB, MySQL, Kafka, MQ, and storage services. |
-| Use OLAP on Spark | [Deploy the TiSpark topology](/tispark-deployment-topology.md) | [Simple TiSpark configuration template](https://github.com/pingcap/docs/blob/master/config-templates/simple-tispark.yaml) <br/> [Full TiSpark configuration template](https://github.com/pingcap/docs/blob/master/config-templates/complex-tispark.yaml) | This is to deploy TiSpark along with the minimal cluster topology. TiSpark is a component built for running Apache Spark on top of TiDB/TiKV to answer the OLAP queries. Currently, TiUP cluster's support for TiSpark is still **experimental**. |
 | Deploy multiple instances on a single machine | [Deploy a hybrid topology](/hybrid-deployment-topology.md) | [Simple configuration template for hybrid deployment](https://github.com/pingcap/docs/blob/master/config-templates/simple-multi-instance.yaml) <br/> [Full configuration template for hybrid deployment](https://github.com/pingcap/docs/blob/master/config-templates/complex-multi-instance.yaml) | The deployment topologies also apply when you need to add extra configurations for the directory, port, resource ratio, and label. |
 | Deploy TiDB clusters across data centers | [Deploy a geo-distributed deployment topology](/geo-distributed-deployment-topology.md) | [Configuration template for geo-distributed deployment](https://github.com/pingcap/docs/blob/master/config-templates/geo-redundancy-deployment.yaml) | This topology takes the typical architecture of three data centers in two cities as an example. It introduces the geo-distributed deployment architecture and the key configuration that requires attention. |
```
## `releases/release-5.4.0.md` (+1 −1)

```diff
@@ -160,7 +160,7 @@ In v5.4, the key new features or improvements are as follows:
 This feature is disabled by default. When it is enabled, if a user operating through TiSpark does not have the needed permissions, the user gets an exception from TiSpark.
```
## `releases/release-6.6.0.md` (+2 −2)

```diff
@@ -414,8 +414,8 @@ In v6.6.0-DMR, the key new features and improvements are as follows:
 | DM | [`on-duplicate-physical`](/dm/task-configuration-file-full.md) | Newly added | This configuration item controls how DM resolves conflicting data in the physical import mode. The default value is `"none"`, which means not resolving conflicting data. `"none"` has the best performance, but might lead to inconsistent data in the downstream database. |
 | DM | [`sorting-dir-physical`](/dm/task-configuration-file-full.md) | Newly added | This configuration item specifies the directory used for local KV sorting in the physical import mode. The default value is the same as the `dir` configuration. |
 | sync-diff-inspector | [`skip-non-existing-table`](/sync-diff-inspector/sync-diff-inspector-overview.md#configuration-file-description) | Newly added | This configuration item controls whether to skip checking upstream and downstream data consistency when tables in the downstream do not exist in the upstream. |
-| TiSpark | [`spark.tispark.replica_read`](/tispark-overview.md#tispark-configurations) | Newly added | This configuration item controls the type of replicas to be read. The value options are `leader`, `follower`, and `learner`. |
-| TiSpark | [`spark.tispark.replica_read.label`](/tispark-overview.md#tispark-configurations) | Newly added | This configuration item is used to set labels for the target TiKV node. |
+| TiSpark | [`spark.tispark.replica_read`](https://docs-archive.pingcap.com/tidb/v6.6/tispark-overview/#tispark-configurations) | Newly added | This configuration item controls the type of replicas to be read. The value options are `leader`, `follower`, and `learner`. |
+| TiSpark | [`spark.tispark.replica_read.label`](https://docs-archive.pingcap.com/tidb/v6.6/tispark-overview#tispark-configurations) | Newly added | This configuration item is used to set labels for the target TiKV node. |
```
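Taken together, the two TiSpark rows above would land in the Spark configuration file in Spark's plain `key value` form. The `learner` value comes from the table; the label value and its syntax are an assumption, so confirm them against the linked TiSpark configuration page:

```
spark.tispark.replica_read learner
# Assumed label syntax (key=value); verify against the TiSpark docs.
spark.tispark.replica_read.label $mode=readonly
```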
## `telemetry.md` (+2 −35)

````diff
@@ -9,22 +9,19 @@
-When the telemetry feature is enabled, TiUP and TiSpark collect usage information and share the information with PingCAP to help understand how to improve the product.
+When the telemetry feature is enabled, TiUP collects usage information and shares the information with PingCAP to help understand how to improve the product.
 
 > **Note:**
 >
 > - Starting from TiUP v1.11.3, the telemetry feature in TiUP is disabled by default, which means TiUP usage information is not collected by default. If you upgrade from a TiUP version earlier than v1.11.3 to v1.11.3 or a later version, the telemetry feature keeps the same status as before the upgrade.
-> - Starting from TiSpark v3.0.3, the telemetry feature in TiSpark is disabled by default, which means TiSpark usage information is not collected by default.
 > - For versions from v8.1.0 to v8.5.1, the telemetry feature in TiDB and TiDB Dashboard is removed.
 > - Starting from v8.5.3, TiDB reintroduces the telemetry feature. However, it only logs telemetry-related information locally and no longer sends data to PingCAP over the network.
 
 ## What is shared when telemetry is enabled?
 
-The following sections describe the shared usage information in detail for TiUP and TiSpark. The usage details that get shared might change over time. These changes (if any) will be announced in [release notes](/releases/_index.md).
+The following sections describe the shared usage information in detail for TiUP. The usage details that get shared might change over time. These changes (if any) will be announced in [release notes](/releases/_index.md).
 
 > **Note:**
 >
 > In **ALL** cases, user data stored in the TiDB cluster will **NOT** be shared. You can also refer to [PingCAP Privacy Policy](https://pingcap.com/privacy-policy).
 
-### TiUP
-
 When the telemetry collection feature is enabled in TiUP, usage details of TiUP will be shared, including (but not limited to):
 
 - A randomly generated telemetry ID.
@@ -37,52 +34,22 @@ To view the full content of the usage information shared to PingCAP, set the `TI
 TIUP_CLUSTER_DEBUG=enable tiup cluster list
 ```
 
-### TiSpark
-
-> **Note:**
->
-> Starting from v3.0.3, the telemetry collection is disabled by default in TiSpark, and usage information is not collected and shared with PingCAP.
-
-When the telemetry collection feature is enabled for TiSpark, the Spark module will share the usage details of TiSpark, including (but not limited to):
-
-- A randomly generated telemetry ID.
-- Some configuration information of TiSpark, such as the read engine and whether streaming read is enabled.
-- Cluster deployment information, such as the machine hardware information, OS information, and component version number of the node where TiSpark is located.
-
-You can view TiSpark usage information that is collected in Spark logs. You can set the Spark log level to INFO or lower, for example:
-
-```shell
-grep "Telemetry report" {spark.log} | tail -n 1
-```
-
 ## Enable telemetry
 
-### Enable TiUP telemetry
-
 To enable the TiUP telemetry collection, execute the following command:
 
 ```shell
 tiup telemetry enable
 ```
 
-### Enable TiSpark telemetry
-
-To enable the TiSpark telemetry collection, configure `spark.tispark.telemetry.enable = true` in the TiSpark configuration file.
-
 ## Disable telemetry
 
-### Disable TiUP telemetry
-
 To disable the TiUP telemetry collection, execute the following command:
 
 ```shell
 tiup telemetry disable
 ```
 
-### Disable TiSpark telemetry
-
-To disable the TiSpark telemetry collection, configure `spark.tispark.telemetry.enable = false` in the TiSpark configuration file.
-
 ## Check telemetry status
 
 For TiUP telemetry, execute the following command to check the telemetry status:
````
## `tiflash/tiflash-overview.md` (+2 −19)

```diff
@@ -30,7 +30,7 @@ Deploying TiFlash on Linux AMD64 architecture requires a CPU that supports the A
 
 <CustomContent platform="tidb">
 
-TiFlash is compatible with both TiDB and TiSpark, which enables you to freely choose between these two computing engines.
+TiFlash is compatible with TiDB. You can use TiDB as the computing engine for TiFlash.
 
 </CustomContent>
 
@@ -85,27 +85,10 @@ TiFlash shares the computing workload in the same way as the TiKV Coprocessor do
 
 After TiFlash is deployed, data replication does not automatically begin. You need to manually specify the tables to be replicated.
 
-<CustomContent platform="tidb">
-
-You can either use TiDB to read TiFlash replicas for medium-scale analytical processing, or use TiSpark to read TiFlash replicas for large-scale analytical processing, which is based on your own needs. See the following sections for details:
-
-</CustomContent>
-
-<CustomContent platform="tidb-cloud">
-
-You can use TiDB to read TiFlash replicas for analytical processing. See the following sections for details:
-
-</CustomContent>
+You can use TiDB to read TiFlash replicas. See the following sections for details:
```
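The per-table replication step kept by the hunk above ("manually specify the tables to be replicated") is done with DDL in TiDB. A minimal sketch, assuming a database `test` and table `t` (both placeholders):

```sql
-- Create one TiFlash replica of `test`.`t`.
ALTER TABLE test.t SET TIFLASH REPLICA 1;

-- Check replication progress; AVAILABLE = 1 means the replica can serve reads.
SELECT TABLE_SCHEMA, TABLE_NAME, REPLICA_COUNT, AVAILABLE, PROGRESS
FROM information_schema.tiflash_replica
WHERE TABLE_SCHEMA = 'test' AND TABLE_NAME = 't';
```

Setting the replica count back to `0` removes the TiFlash replica again.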