0% found this document useful (0 votes)

5 views

Clustering for Gene Expression Analysis

The paper discusses the iterative clustering procedure for detecting differentially expressed genes in gene expression analysis, emphasizing the importance of clustering algorithms and their parameters. It reviews various clustering techniques, including hierarchical, objective function, and density-based methods, and highlights the challenges of identifying significant gene groups from large datasets. The proposed solution involves using an iterative DBSCAN approach to improve the detection of differentially expressed genes while addressing data preprocessing and parameter selection issues.

Uploaded by

Erick Loetz

Available Formats

Download as PDF, TXT or read online on Scribd

0% found this document useful (0 votes)

5 views

Clustering for Gene Expression Analysis

Uploaded by

Erick Loetz

Available Formats

Download as PDF, TXT or read online on Scribd

You are on page 1/ 8

Clustering for Gene Expression Analysis

Olga Georgieva
Sofia University „St. Kliment Ohridski”, 5 James Bourchier blvd., 1164 Sofia, Bulgaria

Abstract
Since several decades intensive research for revealing and understanding gene expression and
consequent their role in organisms live are research focus of scientists in different areas. A part
of these investigations is the analysis of gene expression data. The present paper extends and
by that improves an iterative clustering procedure for detection genes that are differentially
expressed. The presented analytical review of the clustering algorithms applied to the gene
expression analysis serves as a good prerequisite for this aim. Additional investigation is
presented and conclusion for appropriate choice of the procedure parameters is summarized.

Keywords 1
Gene expression analysis, Differential gene expression, Clustering analysis

1 Introduction
Since several decades intensive research for revealing and understanding gene expression and
consequent their role in organisms live are research focus of scientists in different areas. A part of these
investigations is the analysis of gene expression data. This is a numerical table of values of expression
levels of thousands of genes observed for (usually large amounts) of samples [1].
The expression data tables are obtained through well-established technologies, which have been
significantly improved in recent years. Thus, the contemporary technologies named Next Generation
Sequencing provide large volumes of DNA-seq and RNA-seq data. The obtained gene expression data
by them are more precise having higher resolution than the older technologies of microarray. It
significantly increases the opportunities for effective research to gather knowledge for existing
dependencies of the genes’ activity. Despite the technology used to obtain gene expression, it results in
gene expression table that contains up to several hundred thousand numbers, where each row
corresponds to one particular gene and each column to a sample. The number of genes in such table
comprises a part or whole genome of a given organism. This huge amount of data needs to be processed
applying specific data mining algorithms to reveal new information about unknown biological
dependences. Typical information that could be revealed by processing these tables is summarized in
the following general tasks [1,5,6]:
 To understand how genome sequences specify the forms and functions in the organisms and
how remarkable diversity of life appears on the Earth.
 To understand the mechanisms and evolution of the organism’s response to environment
changes.
 To find groups of genes that act equally and by that to identify distinct subsets of genes that are
unknown in their biological response.

Specific solutions of the above general tasks could be mentioned as the aim to identify genes with
expression levels that reflect biological processes of certain interest as for instance cancer. One can
identify a specific type of cancer in respect to group of genes responsible for it. Thus, classification

Education and Research in the Information Society, October 13-14, 2022, Plovdiv, Bulgaria
EMAIL: [email protected]
ORCID: 0000-0001-5376-6729
© 2022 Copyright for this paper by its authors.
Use permitted under Creative Commons License Attribution 4.0 International (CC BY 4.0).
CEUR Workshop Proceedings (CEUR-WS.org)

17
obtained from gene expression analysis can be used for more precise tumor diagnostic and its effective
treatment [1].
The posed general tasks need to discover groups of genes could be solved in a supervised manner if
preliminary knowledge about these groups exists. However, in most cases it is searched for unknown
data partitioning. In this case unsupervised methods of cluster analysis are appropriate tool [3,4]. Due
to the vast amount of data in the gene expression matrix it is helpful to process data by subdividing the
genes into a smaller number of categories and then analyzing the obtained groups [2]. It is more
interesting to cluster genes that show similar expression patterns across a number of samples, rather
than clustering the samples themselves.
A specific task of gene expression research is the aim for defining differentially expressed genes.
These are genes that act differently in equal strains, for which many samples of each strain have been
collected. In this way the genes responsible for certain disease states can be discovered or it is possible
to find differences between two species strains. In searching an appropriate solution several methods
for RNA-seq data analysis that identify different genes according to their expression levels basically on
statistical data analysis have been explored [7,8,9]. However, there is no good agreement among the
applied methods as beside the commonly identified genes each method detects additional genes that are
not identified by the others. Furthermore, the additional genes are large number among the distinct
methods. By taking the advantage of the unsupervised data analysis methods an iterative clustering
procedure to deal with this task was recently introduced and compared with several statistical methods
[10]. The difficulty of the selection problem due to the large number of indistinguishable genes is solved
by iterative solutions. The procedure depends on several parameters, which values are subject of a
preliminary choice.
The present paper aims to extend and by that to improve the iterative clustering procedure for
differential gene expression detection. The analytical review presented in the paper underlines the
powerful ability of the clustering algorithms to deal with gene expression data. The review serves as a
good prerequisite to solve the research aim. Based on it additional investigation is presented and
conclusion for appropriate choice of the procedure parameters is summarized.

2 Clustering analysis for gene expression data analysis

Clustering is machine learning method that represents the collection of data in groups/clusters. The
data are similar to each other within the cluster and different from data of other clusters. There is large
diversity of algorithms that are able to deal with this problem. Diversity is governed by the need to
incorporate solutions to two important questions. First, what similarity measure to involve to assess the
data proximity and second, what procedure to use in order to find the data groups. The first one is the
most important factor that defines the shape of the determined clusters. According to [3] “There is no
single best criterion for obtaining a partition” due to the fact that “each clustering criterion imposes a
certain structure on the data”.
Despite the large diversity of clustering methods, they could be grouped according to the
implemented clustering technique. Three main groups should be underlined for gene expression
analysis – hierarchical clustering, partitioning according a criterion for proximity and density-based
solutions.

2.1 Hierarchical clustering for gene expression analysis

The most applicable clustering method for gene expression analysis is hierarchical clustering [4,5].
Each cluster is built by smaller clusters, forming a tree-shaped data structure or dendrogram. Two
strategies for tree development are known. Agglomerative hierarchical clustering starts with the single-
gene clusters and successively joins the closest clusters until all genes have been joined into the
supercluster. The other strategy starts in opposite manner – all data are collected in a single cluster and
further they are divided into smaller groups. The agglomerative hierarchical algorithm that joins pairs
of clusters on the basis of their proximity, is the most widely used for gene expression analysis [5].
The important question is the cutting level of the dendrogram in order to obtain the right clusters.
Another difficulty concerns how data are linked to form a cluster. A family of clustering methods is

18
known according to the implemented linkage function. Due to the existing indistinguishability of genes
in their expression different initial linking and different linkage functions have to be explored in order
to establish good clustering results.
The establishment of hierarchical clustering method as a widely used method for clustering gene
expression data is owed to its visibility features. The clustering results are presented in a dendrogram.
However, they are also given in a colored view way that has become a standard for visualization of the
gene expression data.

2.2 Objective function clustering for gene expression analysis

The most used objective function clustering algorithm is k-means. It divides the data into a
predetermined number of k clusters. The algorithm identifies the clusters according to their
representatives – cluster centers. Data points are assigned to a cluster on the basis of the distances from
the centroids. For instance, an application for gene expression analysis is demonstrated in [11].
A family of objective functions algorithms based on k-means have been proposed. The major
question of their application is how many clusters actually exist and thus to initialize the algorithm. The
answer of this question is not trivial. It is a subject of additional search through extended procedure.
One possible approach is to initialize with k randomly chosen cluster centroids, and each gene is
assigned to the cluster with the closest centroid. Other good strategy is seeding prototype centroids with
the eigen vectors identified by PCA performed optimally for genes of yeast bacteria [12]. Subtractive
clustering was successfully implemented to determine the optimal number of clusters of gene
expression of soil bacterium data [13].
The fuzzy variant of the k-means algorithm – Fuzzy C-means, shows large advantage for gene
expression analysis as it finds clusters that are overlapped. It is a powerful tool to reflect the real
relationship between genes pointing distinct regulations and features of each gene’s function [12,13].
The self organizing map (SOM) methods are underlined to join these groups of clustering methods.
They find clusters, which are organized into a grid structure. However, the search procedure follows
the same idea of proximity assessment of the input vector and lattice representation [1,2].

2.3 Density-based algorithms for gene expression analysis

The density-based clustering searching for dense areas in the data space. It is not necessarily to
generate clusters explicitly, but instead to show the bunches of data that form cluster structure of the
data set. These algorithms are good way to separate clusters from the noise. It allows for centralized
description of irregularly shaped clusters in a data set with high dimension to identify outliers as data
points with low cardinality [14,15]. Such proposed algorithm [15] implemented in gene-based
clustering the dense and nondense areas of data are revealed, which explains complexes and patterns of
the gene associations. The implementation shows better efficacy than DBSCAN, which is another
density-based clustering algorithm. On other hand, DBSCAN clustering [16] has a simple scheme for
clusters detection. It uses a matrix of pairwise distances between data and finds a number of outliers
and core points. The key clustering parameters are the threshold radius r for neighborhood search and
a minimum number of neighbors Nmin required to identify a core point.
It should be underlined that other methods different from the discussed above have been proposed
in the recent years for gene expression analysis. Pattern-based clustering algorithms form clusters by
objects, which attributes present difference of changes of the attributes values smaller than a threshold
value. Other workable idea was to construct and apply clustering algorithm by iterative manner, which
is a helpful strategy in case of vast amount of data.

3 Clustering for differential gene expression analysis

Differential gene expression problem is slightly different from the general task of gene expression
analysis, which tries to separate genes in groups by their similarity. Instead, the differential expression
analysis aims to find genes with significantly different expression than the others with the same
behavior [7,9].

19
The difficulty of the task could be summarized as:
• Due to the large number of genes and lack of significance in separation by the estimated p-
values no valuable conclusion about the gene separation could be done.
• Often the number of searched genes is (quite) smaller than the whole genes’ amount. These
genes could be treated more as outliers than a significant group.
• The genes do not have (near) constantly activity levels among different samples of a certain
strain. This imposes to work with average values than to single sample data.
A recently proposed solution takes advantages from the density-based clustering approach dueto its
ability to find outliers with the core clusters. The implemented algorithm separates genes densely
scabbed around the equivalence area from the outliers that are away from this area. In fact, the outliers
are data of the interested differentially expressed genes. Figure 1 illustrates this observation for
DBSCAN clustering applied to the average values of the gene’s activity of two mice strains. However,
only 154 genes were discovered against several hundred found by statistical data analysis.

Figure 1: Results of DBSCAN, r = 0,4, Nmin = 20. Discovered clusters are enumerated in the picture
legends: outliers “-1”, core cluster “1” [10].

In order to overcome the problem of genes’ undistinguishing as small number of differentially

expressed genes could be found when clustering the entire data set, an iterative clustering scheme is
proposed. By that the number of discovered genes significantly increases. The procedure motivation is
detailly given in [10]. Its steps are briefly presented in the next section. In addition, the following text
concerns further procedure improvement by more insight and analysis about the values of the procedure
parameters.

3.1 Iterative clustering solution for differential gene expression analysis

The proposed clustering procedure [10] incorporates DBSCAN algorithm applied iteratively to
genes data divided into batches. The outliers defined in each batch are added to form a common set of
differently expressed genes.

3.2 Data preprocessing

Two main obstacles have to be tackled before clustering. First, logarithm transformation is needed
in order to solve the scalability problem. Some of the genes’ expressions are very large in respect to
others. By that the distance estimation and then grouping will be distorted. Second, zero expression
values are commonly seen in some genes, which were not active or their expression have been not
measured. In order to ensure the logarithm calculation, filtering for removing genes with zero activity
value is a requisite.

3.3 Data processing

Iterative implementation of DBSCAN requires setting the clustering parameters – the threshold
radius r for neighborhood search, minimum number of neighbors Nmin and the number of genes’ data
that form the batch.

20
Again, due to the similarity in the genes' behavior as in case of the whole data set (Fig. 1), it could
be expected that the data of each batch form a large compact group along the equivalence area. This
area is rather oblong than spherical one. It suggests that clusters are not Euclidean. This observation
requires further detail investigation into the selection of an appropriate proximity measure. In searching
of proper distance measure different distances have to be applied and the results to be assessed in terms
of separation abilities.
The minimum number Nmin is difficult to find in advance as it depends on the density of the data.
Here we determine heuristically in terms to it increase the outlier’s separation.
The number of genes that form a batch is other problem to be solved. The larger batch could make
impossible to detect the outliers at all, whereas small batch could embarrass detection of clusters and
thus the right distinguish between equivalent and differentially expressed genes.
Once the parameters are determined they are applied to each data batch according to the accepted
iterative scheme of clustering.

3.4 Parameter analysis and application results

The iterative procedure of differently expressed genes’ selection is applied for the set of samples of
two mice strains – ten of strain C57BL/6J and eleven of strain DBA/2J, represented in data of 13932
genes having non-all-zero rows in the dataset [17]. As still there are zero expression values in some
rows of the data, after the preprocessing stage the data set was reduced to 9196 genes.
First, we explore different distance metrics in the attempt to find the best method parameters (Table
1). For this the accumulated group of differently expressed genes marked by ML is compared with
genes’ groups separated by statistical data analysis of four statistical methods – ttest, edgeR, limma,
DESeq2. The number of discovered genes by each statistical method of the filtered dataset provided by
[7] is presented at the first (sub)column of the respective method column. The number of genes
discovered by our procedure that are common for the respective statistical method is given at the second
(sub)column. The last column of the table “ML all data” consists the total amount of differently
expressed genes that are identified by the proposed iterative clustering method.
Several distance metrics have been investigated and those that produce good results are presented
and discussed here. Both Euclidean distance as well as Minkowski distance with value of its parameter
p=2 (or close to 2) give relatively good results according to the separated clusters. However, as they
form spherical clusters that do not correspond to the data structure, it is a prerequisite to distort the
separation result by mixing some outliers with the selected good clusters. The distances that produce
oblong clusters are more valuable in respect to the real data structure. These are Mahalonobis,
Minkowski with another parameter value than p = 2. The distances Cityblock and Chebishev are also
investigated. By varying DBSCAN parameters different amounts of genes are discovered and the best
results for each method are presented at the table. Notwithstanding the fact that Minkowski distance
with small value p=0,5 presents best result according to commonly identified genes with respective
statistical method, the preference is given to Mahalanobis distance clustering as well (both in bold). The
oblong clusters through Mahalanobis distance determine the smallest number of 1905 selected genes.

Table 1: Number of differentially expressed genes selected by different cluster distances compared
with results of different statistical methods
Method ttest edgeR limma DESeq2
ML
ttest ML edgeR ML limma ML DESeq2 ML all data
Distance
Euclidean 71 71 915 647 736 537 982 648 2848
Mahalanobis 71 71 915 738 736 611 982 735 1905
Cityblock 71 70 915 417 736 365 982 407 994
Minkowski, p=2 71 68 915 286 736 266 982 283 614
Minkowski, p=0.5 71 71 915 789 736 641 982 807 3475
Chebychev, r=0,1, Nmin=5 71 71 915 667 736 548 982 665 2264

In searching the appropriate batch volume, we explore different volumes for clustering by best found
cluster distance (Table 2). The results for batches consisting of 511 and 1022 genes with appropriate

21
settings of the rest two parameters – r and Nmin, are comparable. Certain preferences can be given to
clustering at batch of 511 genes because of the smaller number of all separated genes (ML all data). On
the other hand, batch of 1022 genes ensures larger number of differentially expressed genes.

Table 2: Number of differentially expressed genes selected by iterative procedure with different
batch volumes size and applied Mahalonobis distance
ttest edgeR limma DESeq2
Number of genes
ML
in a batch,
ttest ML edgeR ML limma ML DESeq2 ML all data
algorithm
parameters
511, r=0,2,
71 71 915 738 736 611 982 735 1905
Nmin=5
511, r=0,3,
71 71 915 527 736 457 982 516 896
Nmin=5
1022, r=0,2,
71 71 915 584 736 508 982 573 1058
Nmin=5
1022, r=0,2,
71 71 915 791 736 642 982 787 2050
Nmin=10
1022, r=0,3,
71 71 915 525 736 455 982 514 891
Nmin=10

The advantage of the procedure is its result visibility abilities. For each batch the clustering result
could be seen and assessed visually. Further assessment and interpretation according the biological
meaning of the compact clusters found in the equivalent area and the respective outliers could be applied
(Figure 2).

Figure 2: Results of iterative DBSCAN clustering by Mahalonobis distance measure, r=0,3, Nmin=10
and bach=1022 genes.

22
4 Conclusion
The paper improves an iterative clustering procedure for differential gene expression detection. For
this aim, first an analytical review that underlines the powerful ability of the clustering algorithms to
deal with gene expression data by revealing their advantages and disadvantages for solving gene
expression data analysis tasks is presented. Second, the specific task for searching genes that are
differentially expressed is considered. The task is solved by procedure taking the advantage both of
density-based clustering and the iterative clustering.
The paper makes conclusions about the appropriate choice of the procedure parameters – proper
distance measure and batch volume. The results obtained are compared with statistical data analysis
applied to the investigated dataset of gene expression of two mice strains. The best distance measures
are found that are Mahalonobis and Minkowski distances. Good results are obtained for batches of 511
and 1022 genes.
The iterative procedure is standalone applicable. In other cases, it could be used in combination with
statistical methods in order to stick their search in a smaller number of genes.

5 Acknowledgements
This research work has been supported by GATE project, funded by the Horizon 2020
WIDESPREAD-2018-2020 TEAMING Phase 2 programme under grant agreement No.857155 and
Operational Programme Science, by Operational Programme Science and Education for Smart Growth
under Grant Agreement no. BG05M2OP001-1.003-0002-C01 and by the Science Fund of Sofia
University under project no. 80-10-169/23.05.2022.

6 References
[1] E. Domany, Cluster Analysis of Gene Expression Data. Journal of Statistical Physics 110 (2003): 1117–
1139.
[2] P. D'haeseleer, How does gene expression clustering work? Nat Biotechnol 23 (2005): 1499–1501.
[3] A.K. Jain, R.C.Dubes, Algorithms for Clustering Data. (Prentice Hall, Englewood Cliffs, New Jersey,
1988.
[4] D. Jiang, C. Tang, A. Zhang, Cluster analysis for gene expression data: a survey. IEEE Trans. Know.
Data Eng. 16 (2004): 1370–1386.
[5] M.B. Eisen, P.T. Spellman, P.O. Brown, D. Botstein, Cluster analysis and display of genome-wide
expression patterns. Proc. Natl. Acad. Sci. USA 95 (1998): 14863–14868.
[6] P.T.Spellman, Cluster analysis and display, In: “DNA Microarrays” edited by D. Bowtell and J.
Sambrook, Cold Spring Harbor Laboratory (2002): 569-581.
[7] D. Spies, P.F. Renz, T.A. Beyer, C. Ciaudo, Comparative analysis of differential gene expression tools
for RNA sequencing time course data, Briefings in Bioinformatics 20(1) (2019): 288–298.
[8] D. Palejev, Comparison of RNA-Seq Differential Expression Methods, Cybernetics and Information
Technologies 17(5) (2017): 60-67.
[9] T. Wang, B. Li, C.E. Nelson, et al. Comparative analysis of differential gene expression analysis tools
for single-cell RNA sequencing data. BMC Bioinformatics 20, 40, 2019.
[10] O. Georgieva, Iterative Clustering for Differential Gene Expression Analysis. In: Rojas, I., Valenzuela,
O., Rojas, F., Herrera, L.J., Ortuño, F. (eds) Bioinformatics and Biomedical Engineering. IWBBIO
2022. Lecture Notes in Computer Science 13347. Springer, Cham., 2022.
[11] F.D. Smet, J. Mathys, K. Marchal, G. Thijs, B.D. Moor, and Y. Moreau, “Adaptive Quality-Based
Clustering of Gene Expression Profiles,” Bioinformatics 18 (2002): 735-746.
[12] A. P. Gasch, M.B. Eisen, Exploring the conditional coregulation of yeast gene expression through
fuzzy k-means clustering, Genome Biology, 3(11), (2002): 1-22.

23
[13] O. Georgieva, F. Klawonn, E. Härtig, Fuzzy Clustering of Macroarray Data. In: Reusch, B. (eds)
Computational Intelligence, Theory and Applications. Advances in Soft Computing 33. Springer,
Berlin, Heidelberg, 2005.
[14] D. Kumar, U. Batra, Clustering algorithm for gene expression data. Int J Recent Res Asp. 4, (2017):
122-128.
[15] S. Srivastava, N. Joshi. Clustering techniques analysis for microarray data. Int J Comput Sci Mob
Comput. 3 (2014); 359-364.
[16] M. Ester, H.-P Kriegel, J. Sander, X. Xiaowei, A density-based algorithm for discovering clusters in
large spatial databases with noise. In: Proceedings of the Second International Conference on
Knowledge Discovery in Databases and Data Mining, 226–231. AAAI Press, Portland, 1996.
[17] D. .Bottomly, N. A. R. Walter, J. E. Hunter et al. Evaluating Gene Expression in C57BL/6J and DBA/2J
Mouse Striatum Using RNA-Seq and Microarrays. – PLoS ONE 6(3), e17820, 2011.

1471-2105-15-S2-S2
No ratings yet
1471-2105-15-S2-S2
18 pages
Gene Clustering Review Paper
No ratings yet
Gene Clustering Review Paper
4 pages
Cluster Analysis in DNA Microarray Experiments: Sandrine Dudoit and Robert Gentleman
No ratings yet
Cluster Analysis in DNA Microarray Experiments: Sandrine Dudoit and Robert Gentleman
48 pages
Synopsis: Data Mining Feasibility in Gene Expression Data Analysis Using Weka
No ratings yet
Synopsis: Data Mining Feasibility in Gene Expression Data Analysis Using Weka
12 pages
How Does Gene Expression Clustering Work?: Primer
No ratings yet
How Does Gene Expression Clustering Work?: Primer
3 pages
Ijcet 10 01 005 PDF
No ratings yet
Ijcet 10 01 005 PDF
10 pages
2445-gene-expression-clustering-with-functional-mixture-models
No ratings yet
2445-gene-expression-clustering-with-functional-mixture-models
8 pages
An Overview On Gene Expression Analysis: Dr. R. Radha, P. Rajendiran
No ratings yet
An Overview On Gene Expression Analysis: Dr. R. Radha, P. Rajendiran
6 pages
An Approach of Hybrid Clustering Technique For Maximizing Similarity of Gene Expression
No ratings yet
An Approach of Hybrid Clustering Technique For Maximizing Similarity of Gene Expression
14 pages
Multivariate Exploratory
No ratings yet
Multivariate Exploratory
13 pages
Genes 13 01839 v2
No ratings yet
Genes 13 01839 v2
22 pages
CMMB 461 Dna Microarray 2 2019 For D2L
No ratings yet
CMMB 461 Dna Microarray 2 2019 For D2L
27 pages
Methods Used For Identification of Differentially Expressing Genes (Degs) From Microarray Gene Dataset: A Review
No ratings yet
Methods Used For Identification of Differentially Expressing Genes (Degs) From Microarray Gene Dataset: A Review
8 pages
Introduction to Bioinformatics, Sequence and Genome Analysis
From Everand
Introduction to Bioinformatics, Sequence and Genome Analysis
Jerry H. Swift
No ratings yet
Plagiarism1 - Report
No ratings yet
Plagiarism1 - Report
8 pages
Gene Based Disease Prediction Using Pattern Similarity Based Classification
No ratings yet
Gene Based Disease Prediction Using Pattern Similarity Based Classification
6 pages
A Survey On Biclustering
No ratings yet
A Survey On Biclustering
3 pages
Journal of Computer Applications - Volume 1 Issue 4 P2
No ratings yet
Journal of Computer Applications - Volume 1 Issue 4 P2
2 pages
Gene Control: Unlocking Genetic Secrets
From Everand
Gene Control: Unlocking Genetic Secrets
Deevakar Asan
No ratings yet
WGCNA [Autosaved]
No ratings yet
WGCNA [Autosaved]
54 pages
5 Microarray PDF
No ratings yet
5 Microarray PDF
79 pages
23-GWO Papers-27-02-2024
No ratings yet
23-GWO Papers-27-02-2024
15 pages
(IJCST-V4I3P23) :fadoua Rafii, Badr Dine Rossi Hassani, M'hamed Aït Kbir
No ratings yet
(IJCST-V4I3P23) :fadoua Rafii, Badr Dine Rossi Hassani, M'hamed Aït Kbir
8 pages
Microarray Data Analysis: Class Discovery and Class Prediction: Clustering and Discrimination
No ratings yet
Microarray Data Analysis: Class Discovery and Class Prediction: Clustering and Discrimination
70 pages
Differences Between Biclustering and Clustering
No ratings yet
Differences Between Biclustering and Clustering
3 pages
Smart Business Problems and Analytical Hints in Cancer Research
From Everand
Smart Business Problems and Analytical Hints in Cancer Research
Zemelak Goraga
No ratings yet
Craig Microarray 02
No ratings yet
Craig Microarray 02
7 pages
Gene Regulatory Networks: Gregory T Dewey and David J. Galas
No ratings yet
Gene Regulatory Networks: Gregory T Dewey and David J. Galas
26 pages
Clustering Gene Expression Data: CS 838 WWW - Cs.wisc - Edu/ Craven/cs838.html Mark Craven Craven@biostat - Wisc.edu April 2001
No ratings yet
Clustering Gene Expression Data: CS 838 WWW - Cs.wisc - Edu/ Craven/cs838.html Mark Craven Craven@biostat - Wisc.edu April 2001
9 pages
A Survey On Clustering Algorithm For Microarray Gene Expression Data
No ratings yet
A Survey On Clustering Algorithm For Microarray Gene Expression Data
7 pages
2000-Gene Expression Data Analysis
No ratings yet
2000-Gene Expression Data Analysis
8 pages
Gene Expression Data Analysis: Minireview
No ratings yet
Gene Expression Data Analysis: Minireview
8 pages
Notes For Lectures 11 To 16 - 2024
No ratings yet
Notes For Lectures 11 To 16 - 2024
68 pages
Cancer Info
No ratings yet
Cancer Info
11 pages
Cluster Analysis For Gene Expression Data: Jiong Yang Eecs Case Western Reserve University
No ratings yet
Cluster Analysis For Gene Expression Data: Jiong Yang Eecs Case Western Reserve University
34 pages
Discretization of Gene Expression Data Revised
No ratings yet
Discretization of Gene Expression Data Revised
13 pages
bbt086 (1)
No ratings yet
bbt086 (1)
12 pages
Bioinformatics 23-20-2692
No ratings yet
Bioinformatics 23-20-2692
8 pages
043 Chenb Hierarchical
No ratings yet
043 Chenb Hierarchical
4 pages
Microarray Full
No ratings yet
Microarray Full
56 pages
Optimization Based Clustering and Classification A
No ratings yet
Optimization Based Clustering and Classification A
171 pages
SURVEY ON MODELLING METHODS APPLICABLE TO GENE REGULATORY NETWORK
No ratings yet
SURVEY ON MODELLING METHODS APPLICABLE TO GENE REGULATORY NETWORK
11 pages
Co Expresion
No ratings yet
Co Expresion
18 pages
Dhaeseleer_nb05
No ratings yet
Dhaeseleer_nb05
4 pages
Functional Analysis of Genes: DOI: 10.2478/v10052-010-0001-Y
No ratings yet
Functional Analysis of Genes: DOI: 10.2478/v10052-010-0001-Y
16 pages
Data Mining MetaAnalysis
No ratings yet
Data Mining MetaAnalysis
39 pages
DNA Microarrays and Gene Expression (BookFi)
No ratings yet
DNA Microarrays and Gene Expression (BookFi)
227 pages
Bioinformatics Unveiled
From Everand
Bioinformatics Unveiled
Joan Melody
No ratings yet
Advanced analysis of gene expression microarray data 1st Edition Aidong Zhang pdf download
100% (1)
Advanced analysis of gene expression microarray data 1st Edition Aidong Zhang pdf download
64 pages
Modeling T-Cell Activation Using Gene Expression Profiling and State Space Models
No ratings yet
Modeling T-Cell Activation Using Gene Expression Profiling and State Space Models
12 pages
Support Vector Machine Classification of Microarray Gene Expression Data UCSC-CRL-99-09
No ratings yet
Support Vector Machine Classification of Microarray Gene Expression Data UCSC-CRL-99-09
31 pages
Genetic Algorithm: Fundamentals and Applications
From Everand
Genetic Algorithm: Fundamentals and Applications
Fouad Sabry
No ratings yet
Using Bayesian Networks To Analyze Expression Data: Nir Friedman Michal Linial
No ratings yet
Using Bayesian Networks To Analyze Expression Data: Nir Friedman Michal Linial
25 pages
BMC Research Notes: Biggests: Integrated Environment For Biclustering Analysis of Time Series Gene Expression Data
No ratings yet
BMC Research Notes: Biggests: Integrated Environment For Biclustering Analysis of Time Series Gene Expression Data
11 pages
White Paper Robotics Process Automation September 2015
No ratings yet
White Paper Robotics Process Automation September 2015
7 pages
An Iterative Data Mining Approach For Mining Overlapping Co Expression Patterns in Noisy Gene Expression
No ratings yet
An Iterative Data Mining Approach For Mining Overlapping Co Expression Patterns in Noisy Gene Expression
22 pages
Evolutionary Computing Strategies For Gene Selection
No ratings yet
Evolutionary Computing Strategies For Gene Selection
6 pages
Casey Kevin MSThesis
No ratings yet
Casey Kevin MSThesis
51 pages
Introduction to Bioinformatics Using Action Labs
From Everand
Introduction to Bioinformatics Using Action Labs
Jean-Louis Lassez
5/5 (1)
MATH3353 Notes
No ratings yet
MATH3353 Notes
100 pages
AI in Talent Development For Proactive Retention Strategies
No ratings yet
AI in Talent Development For Proactive Retention Strategies
9 pages
Full Download Data-driven Solutions to Transportation Problems 1st Edition- eBook PDF PDF DOCX
100% (4)
Full Download Data-driven Solutions to Transportation Problems 1st Edition- eBook PDF PDF DOCX
69 pages
Comparison of Different Clustering Algorithms Using WEKA Tool
No ratings yet
Comparison of Different Clustering Algorithms Using WEKA Tool
3 pages
CS614 FinalTerm Solved Papers
No ratings yet
CS614 FinalTerm Solved Papers
24 pages
ML GTU Solution
No ratings yet
ML GTU Solution
83 pages
Crime Rate Analysis Using Machine Learning Final
No ratings yet
Crime Rate Analysis Using Machine Learning Final
37 pages
Bumps and Pothole Detection Report Final
No ratings yet
Bumps and Pothole Detection Report Final
64 pages
DM Witten 03
No ratings yet
DM Witten 03
56 pages
DMBI Sample Questions
No ratings yet
DMBI Sample Questions
7 pages
Factors of Investment Decision Practices Among Certified Public Accountants in Davao Del Norte
No ratings yet
Factors of Investment Decision Practices Among Certified Public Accountants in Davao Del Norte
13 pages
Course work syllabus Revised
No ratings yet
Course work syllabus Revised
12 pages
Detection of Abnormalities in Real-Time Computer Network Traffic Empowered by Machine Learning
No ratings yet
Detection of Abnormalities in Real-Time Computer Network Traffic Empowered by Machine Learning
8 pages
Data Mining of Nutrigenomics Experiments Identification of A Cancer Protective Gene Signature - 2018
No ratings yet
Data Mining of Nutrigenomics Experiments Identification of A Cancer Protective Gene Signature - 2018
3 pages
Predicting Earthquakes Through Data Mining
No ratings yet
Predicting Earthquakes Through Data Mining
12 pages
The Fundamentals of Computational Intelligence (2017)
100% (1)
The Fundamentals of Computational Intelligence (2017)
389 pages
PDF
No ratings yet
PDF
25 pages
ASSIGNMENT 1 TO 4
No ratings yet
ASSIGNMENT 1 TO 4
4 pages
Customer Segmentation With K-means Clustering and Visualization - Colab
No ratings yet
Customer Segmentation With K-means Clustering and Visualization - Colab
3 pages
Project Report
No ratings yet
Project Report
14 pages
Cluster Analysis Hierarchical Cluster
No ratings yet
Cluster Analysis Hierarchical Cluster
12 pages
2015 - Hosseini, Shabani - New Approach To Customer Segmentation Based On Changes in Customer Value - Journal of Marketing Analytics
No ratings yet
2015 - Hosseini, Shabani - New Approach To Customer Segmentation Based On Changes in Customer Value - Journal of Marketing Analytics
12 pages
ECE/CS 559 - Neural Networks Lecture Notes #6: Learning: Erdem Koyuncu
No ratings yet
ECE/CS 559 - Neural Networks Lecture Notes #6: Learning: Erdem Koyuncu
13 pages
Customer Segmentation Analysis
No ratings yet
Customer Segmentation Analysis
44 pages
Machine Learning Notes
No ratings yet
Machine Learning Notes
5 pages
R18CSE4102-UNIT 2 Data Mining Notes
100% (1)
R18CSE4102-UNIT 2 Data Mining Notes
31 pages
BUSINESS INTELLIGENCE sem 6 question paper
No ratings yet
BUSINESS INTELLIGENCE sem 6 question paper
6 pages
Cluster Analysis in Spark
No ratings yet
Cluster Analysis in Spark
10 pages
Crowdsourced Failure Reports
No ratings yet
Crowdsourced Failure Reports
22 pages
Cluster Analysis and Applications
No ratings yet
Cluster Analysis and Applications
277 pages
Pattern Discovery From Stock Time Series Using Self-Organizing Maps PDF
No ratings yet
Pattern Discovery From Stock Time Series Using Self-Organizing Maps PDF
10 pages

Clustering for Gene Expression Analysis

Uploaded by

Clustering for Gene Expression Analysis

Uploaded by

Clustering for Gene Expression Analysis

2 Clustering analysis for gene expression data analysis

2.1 Hierarchical clustering for gene expression analysis

2.2 Objective function clustering for gene expression analysis

2.3 Density-based algorithms for gene expression analysis

3 Clustering for differential gene expression analysis

In order to overcome the problem of genes’ undistinguishing as small number of differentially

3.1 Iterative clustering solution for differential gene expression analysis

3.2 Data preprocessing

3.3 Data processing

3.4 Parameter analysis and application results

You might also like