
Commit f7cb545

TiSpark: remove TiSpark docs and references (#22456)
1 parent f0ab86d commit f7cb545

15 files changed: +15 additions, −754 deletions

TOC.md

Lines changed: 0 additions & 4 deletions
```diff
@@ -23,7 +23,6 @@
 - [PD Microservices Topology](/pd-microservices-deployment-topology.md)
 - [TiProxy Topology](/tiproxy/tiproxy-deployment-topology.md)
 - [TiCDC Topology](/ticdc-deployment-topology.md)
-- [TiSpark Topology](/tispark-deployment-topology.md)
 - [Cross-DC Topology](/geo-distributed-deployment-topology.md)
 - [Hybrid Topology](/hybrid-deployment-topology.md)
 - [Deploy Using TiUP](/production-deployment-using-tiup.md)
@@ -552,8 +551,6 @@
 - [Quick Start](/clinic/quick-start-with-clinic.md)
 - [Troubleshoot Clusters Using PingCAP Clinic](/clinic/clinic-user-guide-for-tiup.md)
 - [PingCAP Clinic Diagnostic Data](/clinic/clinic-data-instruction-for-tiup.md)
-- TiSpark
-  - [User Guide](/tispark-overview.md)
 - sync-diff-inspector
 - [Overview](/sync-diff-inspector/sync-diff-inspector-overview.md)
 - [Data Check for Tables with Different Schema/Table Names](/sync-diff-inspector/route-diff.md)
@@ -586,7 +583,6 @@
 - [Overview](/tiflash/tiflash-overview.md)
 - [Create TiFlash Replicas](/tiflash/create-tiflash-replicas.md)
 - [Use TiDB to Read TiFlash Replicas](/tiflash/use-tidb-to-read-tiflash.md)
-- [Use TiSpark to Read TiFlash Replicas](/tiflash/use-tispark-to-read-tiflash.md)
 - [Use MPP Mode](/tiflash/use-tiflash-mpp-mode.md)
 - [Use FastScan](/tiflash/use-fastscan.md)
 - [Disaggregated Storage and Compute Architecture and S3 Support](/tiflash/tiflash-disaggregated-and-s3.md)
```

best-practices/readonly-nodes.md

Lines changed: 1 addition & 9 deletions
````diff
@@ -115,15 +115,7 @@ To read data from read-only nodes when using TiDB, you can set the system variab
 set tidb_replica_read=learner;
 ```
 
-#### 3.2 Use Follower Read in TiSpark
-
-To read data from read-only nodes when using TiSpark, you can set the configuration item `spark.tispark.replica_read` to `learner` in the Spark configuration file:
-
-```
-spark.tispark.replica_read learner
-```
-
-#### 3.3 Use Follower Read when backing up cluster data
+#### 3.2 Use Follower Read when backing up cluster data
 
 To read data from read-only nodes when backing up cluster data, you can specify the `--replica-read-label` option in the br command line. Note that when running the following command in shell, you need to use single quotes to wrap the label to prevent `$` from being parsed.
 
````
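The retained context above notes that the label passed to `--replica-read-label` must be single-quoted so the shell does not expand `$`. A minimal demonstration of that quoting difference, using plain `echo` rather than br (the label value `$zone=readonly` is an illustrative assumption, not from the docs):

```shell
# Single vs. double quotes around a label containing `$`.
unset zone                   # make sure $zone is not set
single='$zone=readonly'      # single quotes: $zone stays literal
double="$zone=readonly"      # double quotes: $zone expands (empty here)

echo "single: $single"
echo "double: $double"
```

With double quotes the shell silently swallows `$zone`, which is exactly the failure mode the br note warns about.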

credits.md

Lines changed: 0 additions & 1 deletion
```diff
@@ -20,7 +20,6 @@ TiDB developers contribute to new feature development, performance improvement,
 - [pingcap/tidb-dashboard](https://github.com/pingcap/tidb-dashboard/graphs/contributors)
 - [pingcap/tiflow](https://github.com/pingcap/tiflow/graphs/contributors)
 - [pingcap/tidb-tools](https://github.com/pingcap/tidb-tools/graphs/contributors)
-- [pingcap/tispark](https://github.com/pingcap/tispark/graphs/contributors)
 - [tikv/client-java](https://github.com/tikv/client-java/graphs/contributors)
 - [tidb-incubator/TiBigData](https://github.com/tidb-incubator/TiBigData/graphs/contributors)
 - [ti-community-infra](https://github.com/orgs/ti-community-infra/people)
```

ecosystem-tool-user-guide.md

Lines changed: 0 additions & 4 deletions
```diff
@@ -132,7 +132,3 @@ The following are the basics of sync-diff-inspector:
 - Source: MySQL/TiDB clusters
 - Target: MySQL/TiDB clusters
 - Supported TiDB versions: all versions
-
-## OLAP Query tool - TiSpark
-
-[TiSpark](/tispark-overview.md) is a product developed by PingCAP to address the complexity of OLAP queries. It combines strengths of Spark, and the features of distributed TiKV clusters and TiDB to provide a one-stop Hybrid Transactional and Analytical Processing (HTAP) solution.
```

explore-htap.md

Lines changed: 7 additions & 13 deletions
```diff
@@ -57,21 +57,15 @@ For more information about the architecture, see [architecture of TiDB HTAP](/ti
 
 ## Environment preparation
 
-Before exploring the features of TiDB HTAP, you need to deploy TiDB and the corresponding storage engines according to the data volume. If the data volume is large (for example, 100 T), it is recommended to use TiFlash Massively Parallel Processing (MPP) as the primary solution and TiSpark as the supplementary solution.
+Before exploring TiDB HTAP features, you need to deploy TiDB and its columnar storage engine TiFlash. If the data volume is large (for example, 100 T), it is recommended to use TiFlash Massively Parallel Processing (MPP) as the solution.
 
-- TiFlash
+- If you have deployed a TiDB cluster with no TiFlash node, add the TiFlash nodes in the current TiDB cluster. For detailed information, see [Scale out a TiFlash cluster](/scale-tidb-using-tiup.md#scale-out-a-tiflash-cluster).
+- If you have not deployed a TiDB cluster, see [Deploy a TiDB Cluster Using TiUP](/production-deployment-using-tiup.md). Based on the minimal TiDB topology, you also need to deploy the [topology of TiFlash](/tiflash-deployment-topology.md).
+- When deciding how to choose the number of TiFlash nodes, consider the following scenarios:
 
-    - If you have deployed a TiDB cluster with no TiFlash node, add the TiFlash nodes in the current TiDB cluster. For detailed information, see [Scale out a TiFlash cluster](/scale-tidb-using-tiup.md#scale-out-a-tiflash-cluster).
-    - If you have not deployed a TiDB cluster, see [Deploy a TiDB Cluster Using TiUP](/production-deployment-using-tiup.md). Based on the minimal TiDB topology, you also need to deploy the [topology of TiFlash](/tiflash-deployment-topology.md).
-    - When deciding how to choose the number of TiFlash nodes, consider the following scenarios:
-
-        - If your use case requires OLTP with small-scale analytical processing and Ad-Hoc queries, deploy one or several TiFlash nodes. They can dramatically increase the speed of analytic queries.
-        - If the OLTP throughput does not cause significant pressure to I/O usage rate of the TiFlash nodes, each TiFlash node uses more resources for computation, and thus the TiFlash cluster can have near-linear scalability. The number of TiFlash nodes should be tuned based on expected performance and response time.
-        - If the OLTP throughput is relatively high (for example, the write or update throughput is higher than 10 million lines/hours), due to the limited write capacity of network and physical disks, the I/O between TiKV and TiFlash becomes a bottleneck and is also prone to read and write hotspots. In this case, the number of TiFlash nodes has a complex non-linear relationship with the computation volume of analytical processing, so you need to tune the number of TiFlash nodes based on the actual status of the system.
-
-- TiSpark
-
-    - If your data needs to be analyzed with Spark, deploy TiSpark. For specific process, see [TiSpark User Guide](/tispark-overview.md).
+    - If your use case requires OLTP with small-scale analytical processing and Ad-Hoc queries, deploy one or several TiFlash nodes. They can dramatically increase the speed of analytic queries.
+    - If the OLTP throughput does not cause significant pressure to I/O usage rate of the TiFlash nodes, each TiFlash node uses more resources for computation, and thus the TiFlash cluster can have near-linear scalability. The number of TiFlash nodes should be tuned based on expected performance and response time.
+    - If the OLTP throughput is relatively high (for example, the write or update throughput is higher than 10 million lines/hours), due to the limited write capacity of network and physical disks, the I/O between TiKV and TiFlash becomes a bottleneck and is also prone to read and write hotspots. In this case, the number of TiFlash nodes has a complex non-linear relationship with the computation volume of analytical processing, so you need to tune the number of TiFlash nodes based on the actual status of the system.
 
 <!-- - Real-time stream processing
     - If you want to build an efficient and easy-to-use real-time data warehouse with TiDB and Flink, you are welcome to participate in Apache Flink x TiDB meetups.-->
```

production-deployment-using-tiup.md

Lines changed: 0 additions & 1 deletion
```diff
@@ -257,7 +257,6 @@ The following examples cover six common scenarios. You need to modify the config
 | OLTP | [Deploy minimal topology](/minimal-deployment-topology.md) | [Simple minimal configuration template](https://github.com/pingcap/docs/blob/master/config-templates/simple-mini.yaml) <br/> [Full minimal configuration template](https://github.com/pingcap/docs/blob/master/config-templates/complex-mini.yaml) | This is the basic cluster topology, including tidb-server, tikv-server, and pd-server. |
 | HTAP | [Deploy the TiFlash topology](/tiflash-deployment-topology.md) | [Simple TiFlash configuration template](https://github.com/pingcap/docs/blob/master/config-templates/simple-tiflash.yaml) <br/> [Full TiFlash configuration template](https://github.com/pingcap/docs/blob/master/config-templates/complex-tiflash.yaml) | This is to deploy TiFlash along with the minimal cluster topology. TiFlash is a columnar storage engine, and gradually becomes a standard cluster topology. |
 | Replicate incremental data using [TiCDC](/ticdc/ticdc-overview.md) | [Deploy the TiCDC topology](/ticdc-deployment-topology.md) | [Simple TiCDC configuration template](https://github.com/pingcap/docs/blob/master/config-templates/simple-cdc.yaml) <br/> [Full TiCDC configuration template](https://github.com/pingcap/docs/blob/master/config-templates/complex-cdc.yaml) | This is to deploy TiCDC along with the minimal cluster topology. TiCDC supports multiple downstream platforms, such as TiDB, MySQL, Kafka, MQ, and storage services. |
-| Use OLAP on Spark | [Deploy the TiSpark topology](/tispark-deployment-topology.md) | [Simple TiSpark configuration template](https://github.com/pingcap/docs/blob/master/config-templates/simple-tispark.yaml) <br/> [Full TiSpark configuration template](https://github.com/pingcap/docs/blob/master/config-templates/complex-tispark.yaml) | This is to deploy TiSpark along with the minimal cluster topology. TiSpark is a component built for running Apache Spark on top of TiDB/TiKV to answer the OLAP queries. Currently, TiUP cluster's support for TiSpark is still **experimental**. |
 | Deploy multiple instances on a single machine | [Deploy a hybrid topology](/hybrid-deployment-topology.md) | [Simple configuration template for hybrid deployment](https://github.com/pingcap/docs/blob/master/config-templates/simple-multi-instance.yaml) <br/> [Full configuration template for hybrid deployment](https://github.com/pingcap/docs/blob/master/config-templates/complex-multi-instance.yaml) | The deployment topologies also apply when you need to add extra configurations for the directory, port, resource ratio, and label. |
 | Deploy TiDB clusters across data centers | [Deploy a geo-distributed deployment topology](/geo-distributed-deployment-topology.md) | [Configuration template for geo-distributed deployment](https://github.com/pingcap/docs/blob/master/config-templates/geo-redundancy-deployment.yaml) | This topology takes the typical architecture of three data centers in two cities as an example. It introduces the geo-distributed deployment architecture and the key configuration that requires attention. |
```
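Every template in the table above is consumed by the same TiUP deploy step. A sketch of that flow, where the cluster name, TiDB version, and downloaded file name are placeholder assumptions rather than values from the table:

```shell
# Hypothetical deploy flow: download one of the templates listed above to
# topology.yaml, then hand it to `tiup cluster deploy <name> <version> <topo>`.
template="simple-mini.yaml"        # assumed local copy of the template
cluster_name="demo-cluster"        # placeholder
tidb_version="v8.5.0"              # placeholder

deploy_cmd="tiup cluster deploy $cluster_name $tidb_version $template"
echo "$deploy_cmd"                 # run this once the template is edited
```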

releases/release-5.4.0.md

Lines changed: 1 addition & 1 deletion
```diff
@@ -160,7 +160,7 @@ In v5.4, the key new features or improvements are as follows:
 
     This feature is disabled by default. When it is enabled, if a user operating through TiSpark does not have the needed permissions, the user gets an exception from TiSpark.
 
-    [User document](/tispark-overview.md#security)
+    [User document](https://docs.pingcap.com/tidb/v5.4/tispark-overview#security)
 
 - **TiUP supports generating an initial password for the root user**
 
```

releases/release-6.6.0.md

Lines changed: 2 additions & 2 deletions
```diff
@@ -414,8 +414,8 @@ In v6.6.0-DMR, the key new features and improvements are as follows:
 | DM | [`on-duplicate-physical`](/dm/task-configuration-file-full.md) | Newly added | This configuration item controls how DM resolves conflicting data in the physical import mode. The default value is `"none"`, which means not resolving conflicting data. `"none"` has the best performance, but might lead to inconsistent data in the downstream database. |
 | DM | [`sorting-dir-physical`](/dm/task-configuration-file-full.md) | Newly added | This configuration item specifies the directory used for local KV sorting in the physical import mode. The default value is the same as the `dir` configuration. |
 | sync-diff-inspector | [`skip-non-existing-table`](/sync-diff-inspector/sync-diff-inspector-overview.md#configuration-file-description) | Newly added | This configuration item controls whether to skip checking upstream and downstream data consistency when tables in the downstream do not exist in the upstream. |
-| TiSpark | [`spark.tispark.replica_read`](/tispark-overview.md#tispark-configurations) | Newly added | This configuration item controls the type of replicas to be read. The value options are `leader`, `follower`, and `learner`. |
-| TiSpark | [`spark.tispark.replica_read.label`](/tispark-overview.md#tispark-configurations) | Newly added | This configuration item is used to set labels for the target TiKV node. |
+| TiSpark | [`spark.tispark.replica_read`](https://docs-archive.pingcap.com/tidb/v6.6/tispark-overview/#tispark-configurations) | Newly added | This configuration item controls the type of replicas to be read. The value options are `leader`, `follower`, and `learner`. |
+| TiSpark | [`spark.tispark.replica_read.label`](https://docs-archive.pingcap.com/tidb/v6.6/tispark-overview#tispark-configurations) | Newly added | This configuration item is used to set labels for the target TiKV node. |
 
 ### Others
```
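The two TiSpark rows in the table above describe Spark configuration properties. As a sketch of how they would be set in a Spark configuration file such as `spark-defaults.conf` (property names and the `learner` value come from the table; the label value is an illustrative assumption):

```
spark.tispark.replica_read        learner
spark.tispark.replica_read.label  zone=east
```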

telemetry.md

Lines changed: 2 additions & 35 deletions
````diff
@@ -6,25 +6,22 @@ aliases: ['/docs/dev/telemetry/','/tidb/dev/sql-statement-admin-show-telemetry']
 
 # Telemetry
 
-When the telemetry feature is enabled, TiUP and TiSpark collect usage information and share the information with PingCAP to help understand how to improve the product.
+When the telemetry feature is enabled, TiUP collects usage information and share the information with PingCAP to help understand how to improve the product.
 
 > **Note:**
 >
 > - Starting from TiUP v1.11.3, the telemetry feature in TiUP is disabled by default, which means TiUP usage information is not collected by default. If you upgrade from a TiUP version earlier than v1.11.3 to v1.11.3 or a later version, the telemetry feature keeps the same status as before the upgrade.
-> - Starting from TiSpark v3.0.3, the telemetry feature in TiSpark is disabled by default, which means TiSpark usage information is not collected by default.
 > - For versions from v8.1.0 to v8.5.1, the telemetry feature in TiDB and TiDB Dashboard is removed.
 > - Starting from v8.5.3, TiDB reintroduces the telemetry feature. However, it only logs telemetry-related information locally and no longer sends data to PingCAP over the network.
 
 ## What is shared when telemetry is enabled?
 
-The following sections describe the shared usage information in detail for TiUP and TiSpark. The usage details that get shared might change over time. These changes (if any) will be announced in [release notes](/releases/_index.md).
+The following sections describe the shared usage information in detail for TiUP. The usage details that get shared might change over time. These changes (if any) will be announced in [release notes](/releases/_index.md).
 
 > **Note:**
 >
 > In **ALL** cases, user data stored in the TiDB cluster will **NOT** be shared. You can also refer to [PingCAP Privacy Policy](https://pingcap.com/privacy-policy).
 
-### TiUP
-
 When the telemetry collection feature is enabled in TiUP, usage details of TiUP will be shared, including (but not limited to):
 
 - A randomly generated telemetry ID.
@@ -37,52 +34,22 @@ To view the full content of the usage information shared to PingCAP, set the `TI
 TIUP_CLUSTER_DEBUG=enable tiup cluster list
 ```
 
-### TiSpark
-
-> **Note:**
->
-> Starting from v3.0.3, the telemetry collection is disabled by default in TiSpark, and usage information is not collected and shared with PingCAP.
-
-When the telemetry collection feature is enabled for TiSpark, the Spark module will share the usage details of TiSpark, including (but not limited to):
-
-- A randomly generated telemetry ID.
-- Some configuration information of TiSpark, such as the read engine and whether streaming read is enabled.
-- Cluster deployment information, such as the machine hardware information, OS information, and component version number of the node where TiSpark is located.
-
-You can view TiSpark usage information that is collected in Spark logs. You can set the Spark log level to INFO or lower, for example:
-
-```shell
-grep "Telemetry report" {spark.log} | tail -n 1
-```
-
 ## Enable telemetry
 
-### Enable TiUP telemetry
-
 To enable the TiUP telemetry collection, execute the following command:
 
 ```shell
 tiup telemetry enable
 ```
 
-### Enable TiSpark telemetry
-
-To enable the TiSpark telemetry collection, configure `spark.tispark.telemetry.enable = true` in the TiSpark configuration file.
-
 ## Disable telemetry
 
-### Disable TiUP telemetry
-
 To disable the TiUP telemetry collection, execute the following command:
 
 ```shell
 tiup telemetry disable
 ```
 
-### Disable TiSpark telemetry
-
-To disable the TiSpark telemetry collection, configure `spark.tispark.telemetry.enable = false` in the TiSpark configuration file.
-
 ## Check telemetry status
 
 For TiUP telemetry, execute the following command to check the telemetry status:
````

tiflash/tiflash-overview.md

Lines changed: 2 additions & 19 deletions
```diff
@@ -30,7 +30,7 @@ Deploying TiFlash on Linux AMD64 architecture requires a CPU that supports the A
 
 <CustomContent platform="tidb">
 
-TiFlash is compatible with both TiDB and TiSpark, which enables you to freely choose between these two computing engines.
+TiFlash is compatible with TiDB. You can use TiDB as the computing engine for TiFlash.
 
 </CustomContent>
 
@@ -85,27 +85,10 @@ TiFlash shares the computing workload in the same way as the TiKV Coprocessor do
 
 After TiFlash is deployed, data replication does not automatically begin. You need to manually specify the tables to be replicated.
 
-<CustomContent platform="tidb">
-
-You can either use TiDB to read TiFlash replicas for medium-scale analytical processing, or use TiSpark to read TiFlash replicas for large-scale analytical processing, which is based on your own needs. See the following sections for details:
-
-</CustomContent>
-
-<CustomContent platform="tidb-cloud">
-
-You can use TiDB to read TiFlash replicas for analytical processing. See the following sections for details:
-
-</CustomContent>
+You can use TiDB to read TiFlash replicas. See the following sections for details:
 
 - [Create TiFlash Replicas](/tiflash/create-tiflash-replicas.md)
 - [Use TiDB to Read TiFlash Replicas](/tiflash/use-tidb-to-read-tiflash.md)
-
-<CustomContent platform="tidb">
-
-- [Use TiSpark to Read TiFlash Replicas](/tiflash/use-tispark-to-read-tiflash.md)
-
-</CustomContent>
-
 - [Use MPP Mode](/tiflash/use-tiflash-mpp-mode.md)
 
 <CustomContent platform="tidb">
```
