
Commit f7cb545

TiSpark: remove TiSpark docs and references (#22456)
1 parent f0ab86d commit f7cb545

15 files changed: +15 additions, −754 deletions

TOC.md

Lines changed: 0 additions & 4 deletions
```diff
@@ -23,7 +23,6 @@
 - [PD Microservices Topology](/pd-microservices-deployment-topology.md)
 - [TiProxy Topology](/tiproxy/tiproxy-deployment-topology.md)
 - [TiCDC Topology](/ticdc-deployment-topology.md)
-- [TiSpark Topology](/tispark-deployment-topology.md)
 - [Cross-DC Topology](/geo-distributed-deployment-topology.md)
 - [Hybrid Topology](/hybrid-deployment-topology.md)
 - [Deploy Using TiUP](/production-deployment-using-tiup.md)
@@ -552,8 +551,6 @@
 - [Quick Start](/clinic/quick-start-with-clinic.md)
 - [Troubleshoot Clusters Using PingCAP Clinic](/clinic/clinic-user-guide-for-tiup.md)
 - [PingCAP Clinic Diagnostic Data](/clinic/clinic-data-instruction-for-tiup.md)
-- TiSpark
-  - [User Guide](/tispark-overview.md)
 - sync-diff-inspector
 - [Overview](/sync-diff-inspector/sync-diff-inspector-overview.md)
 - [Data Check for Tables with Different Schema/Table Names](/sync-diff-inspector/route-diff.md)
@@ -586,7 +583,6 @@
 - [Overview](/tiflash/tiflash-overview.md)
 - [Create TiFlash Replicas](/tiflash/create-tiflash-replicas.md)
 - [Use TiDB to Read TiFlash Replicas](/tiflash/use-tidb-to-read-tiflash.md)
-- [Use TiSpark to Read TiFlash Replicas](/tiflash/use-tispark-to-read-tiflash.md)
 - [Use MPP Mode](/tiflash/use-tiflash-mpp-mode.md)
 - [Use FastScan](/tiflash/use-fastscan.md)
 - [Disaggregated Storage and Compute Architecture and S3 Support](/tiflash/tiflash-disaggregated-and-s3.md)
```

best-practices/readonly-nodes.md

Lines changed: 1 addition & 9 deletions
````diff
@@ -115,15 +115,7 @@ To read data from read-only nodes when using TiDB, you can set the system variab
 set tidb_replica_read=learner;
 ```
 
-#### 3.2 Use Follower Read in TiSpark
-
-To read data from read-only nodes when using TiSpark, you can set the configuration item `spark.tispark.replica_read` to `learner` in the Spark configuration file:
-
-```
-spark.tispark.replica_read learner
-```
-
-#### 3.3 Use Follower Read when backing up cluster data
+#### 3.2 Use Follower Read when backing up cluster data
 
 To read data from read-only nodes when backing up cluster data, you can specify the `--replica-read-label` option in the br command line. Note that when running the following command in shell, you need to use single quotes to wrap the label to prevent `$` from being parsed.
 
````
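The retained context above notes that the label passed to `--replica-read-label` must be single-quoted so the shell does not expand `$`. A minimal demonstration of that quoting difference, using plain `echo` rather than br (the label value `$zone=readonly` is an illustrative assumption, not from the docs):

```shell
# Single vs. double quotes around a label containing `$`.
unset zone                   # make sure $zone is not set
single='$zone=readonly'      # single quotes: $zone stays literal
double="$zone=readonly"      # double quotes: $zone expands (empty here)

echo "single: $single"
echo "double: $double"
```

With double quotes the shell silently swallows `$zone`, which is exactly the failure mode the br note warns about.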

credits.md

Lines changed: 0 additions & 1 deletion
```diff
@@ -20,7 +20,6 @@ TiDB developers contribute to new feature development, performance improvement,
 - [pingcap/tidb-dashboard](https://github.com/pingcap/tidb-dashboard/graphs/contributors)
 - [pingcap/tiflow](https://github.com/pingcap/tiflow/graphs/contributors)
 - [pingcap/tidb-tools](https://github.com/pingcap/tidb-tools/graphs/contributors)
-- [pingcap/tispark](https://github.com/pingcap/tispark/graphs/contributors)
 - [tikv/client-java](https://github.com/tikv/client-java/graphs/contributors)
 - [tidb-incubator/TiBigData](https://github.com/tidb-incubator/TiBigData/graphs/contributors)
 - [ti-community-infra](https://github.com/orgs/ti-community-infra/people)
```

ecosystem-tool-user-guide.md

Lines changed: 0 additions & 4 deletions
```diff
@@ -132,7 +132,3 @@ The following are the basics of sync-diff-inspector:
 - Source: MySQL/TiDB clusters
 - Target: MySQL/TiDB clusters
 - Supported TiDB versions: all versions
-
-## OLAP Query tool - TiSpark
-
-[TiSpark](/tispark-overview.md) is a product developed by PingCAP to address the complexity of OLAP queries. It combines strengths of Spark, and the features of distributed TiKV clusters and TiDB to provide a one-stop Hybrid Transactional and Analytical Processing (HTAP) solution.
```

explore-htap.md

Lines changed: 7 additions & 13 deletions
```diff
@@ -57,21 +57,15 @@ For more information about the architecture, see [architecture of TiDB HTAP](/ti
 
 ## Environment preparation
 
-Before exploring the features of TiDB HTAP, you need to deploy TiDB and the corresponding storage engines according to the data volume. If the data volume is large (for example, 100 T), it is recommended to use TiFlash Massively Parallel Processing (MPP) as the primary solution and TiSpark as the supplementary solution.
+Before exploring TiDB HTAP features, you need to deploy TiDB and its columnar storage engine TiFlash. If the data volume is large (for example, 100 T), it is recommended to use TiFlash Massively Parallel Processing (MPP) as the solution.
 
-- TiFlash
+- If you have deployed a TiDB cluster with no TiFlash node, add the TiFlash nodes in the current TiDB cluster. For detailed information, see [Scale out a TiFlash cluster](/scale-tidb-using-tiup.md#scale-out-a-tiflash-cluster).
+- If you have not deployed a TiDB cluster, see [Deploy a TiDB Cluster Using TiUP](/production-deployment-using-tiup.md). Based on the minimal TiDB topology, you also need to deploy the [topology of TiFlash](/tiflash-deployment-topology.md).
+- When deciding how to choose the number of TiFlash nodes, consider the following scenarios:
 
-    - If you have deployed a TiDB cluster with no TiFlash node, add the TiFlash nodes in the current TiDB cluster. For detailed information, see [Scale out a TiFlash cluster](/scale-tidb-using-tiup.md#scale-out-a-tiflash-cluster).
-    - If you have not deployed a TiDB cluster, see [Deploy a TiDB Cluster Using TiUP](/production-deployment-using-tiup.md). Based on the minimal TiDB topology, you also need to deploy the [topology of TiFlash](/tiflash-deployment-topology.md).
-    - When deciding how to choose the number of TiFlash nodes, consider the following scenarios:
-
-        - If your use case requires OLTP with small-scale analytical processing and Ad-Hoc queries, deploy one or several TiFlash nodes. They can dramatically increase the speed of analytic queries.
-        - If the OLTP throughput does not cause significant pressure to I/O usage rate of the TiFlash nodes, each TiFlash node uses more resources for computation, and thus the TiFlash cluster can have near-linear scalability. The number of TiFlash nodes should be tuned based on expected performance and response time.
-        - If the OLTP throughput is relatively high (for example, the write or update throughput is higher than 10 million lines/hours), due to the limited write capacity of network and physical disks, the I/O between TiKV and TiFlash becomes a bottleneck and is also prone to read and write hotspots. In this case, the number of TiFlash nodes has a complex non-linear relationship with the computation volume of analytical processing, so you need to tune the number of TiFlash nodes based on the actual status of the system.
-
-- TiSpark
-
-    - If your data needs to be analyzed with Spark, deploy TiSpark. For specific process, see [TiSpark User Guide](/tispark-overview.md).
+    - If your use case requires OLTP with small-scale analytical processing and Ad-Hoc queries, deploy one or several TiFlash nodes. They can dramatically increase the speed of analytic queries.
+    - If the OLTP throughput does not cause significant pressure to I/O usage rate of the TiFlash nodes, each TiFlash node uses more resources for computation, and thus the TiFlash cluster can have near-linear scalability. The number of TiFlash nodes should be tuned based on expected performance and response time.
+    - If the OLTP throughput is relatively high (for example, the write or update throughput is higher than 10 million lines/hours), due to the limited write capacity of network and physical disks, the I/O between TiKV and TiFlash becomes a bottleneck and is also prone to read and write hotspots. In this case, the number of TiFlash nodes has a complex non-linear relationship with the computation volume of analytical processing, so you need to tune the number of TiFlash nodes based on the actual status of the system.
 
 <!-- - Real-time stream processing
     - If you want to build an efficient and easy-to-use real-time data warehouse with TiDB and Flink, you are welcome to participate in Apache Flink x TiDB meetups.-->
```

production-deployment-using-tiup.md

Lines changed: 0 additions & 1 deletion
```diff
@@ -257,7 +257,6 @@ The following examples cover six common scenarios. You need to modify the config
 | OLTP | [Deploy minimal topology](/minimal-deployment-topology.md) | [Simple minimal configuration template](https://github.com/pingcap/docs/blob/master/config-templates/simple-mini.yaml) <br/> [Full minimal configuration template](https://github.com/pingcap/docs/blob/master/config-templates/complex-mini.yaml) | This is the basic cluster topology, including tidb-server, tikv-server, and pd-server. |
 | HTAP | [Deploy the TiFlash topology](/tiflash-deployment-topology.md) | [Simple TiFlash configuration template](https://github.com/pingcap/docs/blob/master/config-templates/simple-tiflash.yaml) <br/> [Full TiFlash configuration template](https://github.com/pingcap/docs/blob/master/config-templates/complex-tiflash.yaml) | This is to deploy TiFlash along with the minimal cluster topology. TiFlash is a columnar storage engine, and gradually becomes a standard cluster topology. |
 | Replicate incremental data using [TiCDC](/ticdc/ticdc-overview.md) | [Deploy the TiCDC topology](/ticdc-deployment-topology.md) | [Simple TiCDC configuration template](https://github.com/pingcap/docs/blob/master/config-templates/simple-cdc.yaml) <br/> [Full TiCDC configuration template](https://github.com/pingcap/docs/blob/master/config-templates/complex-cdc.yaml) | This is to deploy TiCDC along with the minimal cluster topology. TiCDC supports multiple downstream platforms, such as TiDB, MySQL, Kafka, MQ, and storage services. |
-| Use OLAP on Spark | [Deploy the TiSpark topology](/tispark-deployment-topology.md) | [Simple TiSpark configuration template](https://github.com/pingcap/docs/blob/master/config-templates/simple-tispark.yaml) <br/> [Full TiSpark configuration template](https://github.com/pingcap/docs/blob/master/config-templates/complex-tispark.yaml) | This is to deploy TiSpark along with the minimal cluster topology. TiSpark is a component built for running Apache Spark on top of TiDB/TiKV to answer the OLAP queries. Currently, TiUP cluster's support for TiSpark is still **experimental**. |
 | Deploy multiple instances on a single machine | [Deploy a hybrid topology](/hybrid-deployment-topology.md) | [Simple configuration template for hybrid deployment](https://github.com/pingcap/docs/blob/master/config-templates/simple-multi-instance.yaml) <br/> [Full configuration template for hybrid deployment](https://github.com/pingcap/docs/blob/master/config-templates/complex-multi-instance.yaml) | The deployment topologies also apply when you need to add extra configurations for the directory, port, resource ratio, and label. |
 | Deploy TiDB clusters across data centers | [Deploy a geo-distributed deployment topology](/geo-distributed-deployment-topology.md) | [Configuration template for geo-distributed deployment](https://github.com/pingcap/docs/blob/master/config-templates/geo-redundancy-deployment.yaml) | This topology takes the typical architecture of three data centers in two cities as an example. It introduces the geo-distributed deployment architecture and the key configuration that requires attention. |
```
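Every template in the table above is consumed by the same TiUP deploy step. A sketch of that flow, where the cluster name, TiDB version, and downloaded file name are placeholder assumptions rather than values from the table:

```shell
# Hypothetical deploy flow: download one of the templates listed above to
# topology.yaml, then hand it to `tiup cluster deploy <name> <version> <topo>`.
template="simple-mini.yaml"        # assumed local copy of the template
cluster_name="demo-cluster"        # placeholder
tidb_version="v8.5.0"              # placeholder

deploy_cmd="tiup cluster deploy $cluster_name $tidb_version $template"
echo "$deploy_cmd"                 # run this once the template is edited
```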

releases/release-5.4.0.md

Lines changed: 1 addition & 1 deletion
```diff
@@ -160,7 +160,7 @@ In v5.4, the key new features or improvements are as follows:
 
     This feature is disabled by default. When it is enabled, if a user operating through TiSpark does not have the needed permissions, the user gets an exception from TiSpark.
 
-    [User document](/tispark-overview.md#security)
+    [User document](https://docs.pingcap.com/tidb/v5.4/tispark-overview#security)
 
 - **TiUP supports generating an initial password for the root user**
 
```

releases/release-6.6.0.md

Lines changed: 2 additions & 2 deletions
```diff
@@ -414,8 +414,8 @@ In v6.6.0-DMR, the key new features and improvements are as follows:
 | DM | [`on-duplicate-physical`](/dm/task-configuration-file-full.md) | Newly added | This configuration item controls how DM resolves conflicting data in the physical import mode. The default value is `"none"`, which means not resolving conflicting data. `"none"` has the best performance, but might lead to inconsistent data in the downstream database. |
 | DM | [`sorting-dir-physical`](/dm/task-configuration-file-full.md) | Newly added | This configuration item specifies the directory used for local KV sorting in the physical import mode. The default value is the same as the `dir` configuration. |
 | sync-diff-inspector | [`skip-non-existing-table`](/sync-diff-inspector/sync-diff-inspector-overview.md#configuration-file-description) | Newly added | This configuration item controls whether to skip checking upstream and downstream data consistency when tables in the downstream do not exist in the upstream. |
-| TiSpark | [`spark.tispark.replica_read`](/tispark-overview.md#tispark-configurations) | Newly added | This configuration item controls the type of replicas to be read. The value options are `leader`, `follower`, and `learner`. |
-| TiSpark | [`spark.tispark.replica_read.label`](/tispark-overview.md#tispark-configurations) | Newly added | This configuration item is used to set labels for the target TiKV node. |
+| TiSpark | [`spark.tispark.replica_read`](https://docs-archive.pingcap.com/tidb/v6.6/tispark-overview/#tispark-configurations) | Newly added | This configuration item controls the type of replicas to be read. The value options are `leader`, `follower`, and `learner`. |
+| TiSpark | [`spark.tispark.replica_read.label`](https://docs-archive.pingcap.com/tidb/v6.6/tispark-overview#tispark-configurations) | Newly added | This configuration item is used to set labels for the target TiKV node. |
 
 ### Others
```
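The two TiSpark rows in the table above describe Spark configuration properties. As a sketch of how they would be set in a Spark configuration file such as `spark-defaults.conf` (property names and the `learner` value come from the table; the label value is an illustrative assumption):

```
spark.tispark.replica_read        learner
spark.tispark.replica_read.label  zone=east
```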

telemetry.md

Lines changed: 2 additions & 35 deletions
````diff
@@ -6,25 +6,22 @@ aliases: ['/docs/dev/telemetry/','/tidb/dev/sql-statement-admin-show-telemetry']
 
 # Telemetry
 
-When the telemetry feature is enabled, TiUP and TiSpark collect usage information and share the information with PingCAP to help understand how to improve the product.
+When the telemetry feature is enabled, TiUP collects usage information and share the information with PingCAP to help understand how to improve the product.
 
 > **Note:**
 >
 > - Starting from TiUP v1.11.3, the telemetry feature in TiUP is disabled by default, which means TiUP usage information is not collected by default. If you upgrade from a TiUP version earlier than v1.11.3 to v1.11.3 or a later version, the telemetry feature keeps the same status as before the upgrade.
-> - Starting from TiSpark v3.0.3, the telemetry feature in TiSpark is disabled by default, which means TiSpark usage information is not collected by default.
 > - For versions from v8.1.0 to v8.5.1, the telemetry feature in TiDB and TiDB Dashboard is removed.
 > - Starting from v8.5.3, TiDB reintroduces the telemetry feature. However, it only logs telemetry-related information locally and no longer sends data to PingCAP over the network.
 
 ## What is shared when telemetry is enabled?
 
-The following sections describe the shared usage information in detail for TiUP and TiSpark. The usage details that get shared might change over time. These changes (if any) will be announced in [release notes](/releases/_index.md).
+The following sections describe the shared usage information in detail for TiUP. The usage details that get shared might change over time. These changes (if any) will be announced in [release notes](/releases/_index.md).
 
 > **Note:**
 >
 > In **ALL** cases, user data stored in the TiDB cluster will **NOT** be shared. You can also refer to [PingCAP Privacy Policy](https://pingcap.com/privacy-policy).
 
-### TiUP
-
 When the telemetry collection feature is enabled in TiUP, usage details of TiUP will be shared, including (but not limited to):
 
 - A randomly generated telemetry ID.
@@ -37,52 +34,22 @@ To view the full content of the usage information shared to PingCAP, set the `TI
 TIUP_CLUSTER_DEBUG=enable tiup cluster list
 ```
 
-### TiSpark
-
-> **Note:**
->
-> Starting from v3.0.3, the telemetry collection is disabled by default in TiSpark, and usage information is not collected and shared with PingCAP.
-
-When the telemetry collection feature is enabled for TiSpark, the Spark module will share the usage details of TiSpark, including (but not limited to):
-
-- A randomly generated telemetry ID.
-- Some configuration information of TiSpark, such as the read engine and whether streaming read is enabled.
-- Cluster deployment information, such as the machine hardware information, OS information, and component version number of the node where TiSpark is located.
-
-You can view TiSpark usage information that is collected in Spark logs. You can set the Spark log level to INFO or lower, for example:
-
-```shell
-grep "Telemetry report" {spark.log} | tail -n 1
-```
-
 ## Enable telemetry
 
-### Enable TiUP telemetry
-
 To enable the TiUP telemetry collection, execute the following command:
 
 ```shell
 tiup telemetry enable
 ```
 
-### Enable TiSpark telemetry
-
-To enable the TiSpark telemetry collection, configure `spark.tispark.telemetry.enable = true` in the TiSpark configuration file.
-
 ## Disable telemetry
 
-### Disable TiUP telemetry
-
 To disable the TiUP telemetry collection, execute the following command:
 
 ```shell
 tiup telemetry disable
 ```
 
-### Disable TiSpark telemetry
-
-To disable the TiSpark telemetry collection, configure `spark.tispark.telemetry.enable = false` in the TiSpark configuration file.
-
 ## Check telemetry status
 
 For TiUP telemetry, execute the following command to check the telemetry status:
````

tiflash/tiflash-overview.md

Lines changed: 2 additions & 19 deletions
```diff
@@ -30,7 +30,7 @@ Deploying TiFlash on Linux AMD64 architecture requires a CPU that supports the A
 
 <CustomContent platform="tidb">
 
-TiFlash is compatible with both TiDB and TiSpark, which enables you to freely choose between these two computing engines.
+TiFlash is compatible with TiDB. You can use TiDB as the computing engine for TiFlash.
 
 </CustomContent>
 
@@ -85,27 +85,10 @@ TiFlash shares the computing workload in the same way as the TiKV Coprocessor do
 
 After TiFlash is deployed, data replication does not automatically begin. You need to manually specify the tables to be replicated.
 
-<CustomContent platform="tidb">
-
-You can either use TiDB to read TiFlash replicas for medium-scale analytical processing, or use TiSpark to read TiFlash replicas for large-scale analytical processing, which is based on your own needs. See the following sections for details:
-
-</CustomContent>
-
-<CustomContent platform="tidb-cloud">
-
-You can use TiDB to read TiFlash replicas for analytical processing. See the following sections for details:
-
-</CustomContent>
+You can use TiDB to read TiFlash replicas. See the following sections for details:
 
 - [Create TiFlash Replicas](/tiflash/create-tiflash-replicas.md)
 - [Use TiDB to Read TiFlash Replicas](/tiflash/use-tidb-to-read-tiflash.md)
-
-<CustomContent platform="tidb">
-
-- [Use TiSpark to Read TiFlash Replicas](/tiflash/use-tispark-to-read-tiflash.md)
-
-</CustomContent>
-
 - [Use MPP Mode](/tiflash/use-tiflash-mpp-mode.md)
 
 <CustomContent platform="tidb">
```
