© 2014 Massachusetts Institute of Technology
All rights reserved. No part of this book may be reproduced in any form by any electronic or mechanical means
(including photocopying, recording, or information storage and retrieval) without permission in writing from
the publisher.
MIT Press books may be purchased at special quantity discounts for business or sales promotional use. For
information, please email special_sales@mitpress.mit.edu.
This book was set in LaTeX by the author. Printed and bound in the United States of America.
10 9 8 7 6 5 4 3 2 1
Series Foreword
The yearly Neural Information Processing Systems (NIPS) workshops bring together sci-
entists with broadly varying backgrounds in statistics, mathematics, computer science,
physics, electrical engineering, neuroscience, and cognitive science, unified by a com-
mon desire to develop novel computational and statistical strategies for information
processing and to understand the mechanisms for information processing in the brain.
In contrast to conferences, these workshops maintain a flexible format that both allows
and encourages the presentation and discussion of work in progress. They thus serve as
an incubator for the development of important new ideas in this rapidly evolving field.
The series editors, in consultation with workshop organizers and members of the NIPS
Foundation Board, select specific workshop topics on the basis of scientific excellence,
intellectual breadth, and technical impact. Collections of papers chosen and edited by
the organizers of specific workshops are built around pedagogical introductory chapters,
and research monographs provide comprehensive descriptions of workshop-related top-
ics, to create a series of books that provides a timely, authoritative account of the latest
developments in the exciting field of neural computation.
However, is the promise of sparse modeling fully realized in practice? Despite the
significant advances in the field, a number of open issues remain when sparse mod-
eling meets real-life applications. For example, achieving stability and reproducibil-
ity of sparse models is essential for their interpretability, particularly in computational
biology and other scientific applications. Scalability of sparse learning and sparse sig-
nal recovery algorithms is essential when the number of variables goes much beyond
thousands, as, for example, in neuroimaging applications such as functional magnetic
resonance imaging (fMRI) analysis. Novel, more complex types of structure, dictated by
the nature of applications, require the choice of novel regularizers (so-called structured
sparsity). Moreover, feature construction, or finding a proper dictionary allowing for
sparse representations, remains a critical issue in many practical domains.
The aim of this book is to discuss a range of practical applications of sparse model-
ing, from biology and neuroscience to topic modeling in video analysis, and to provide
an overview of state-of-the-art approaches developed for tackling the challenges pre-
sented by these applications. This book is based on the contributions presented at the
NIPS-2010 Workshop on Practical Applications of Sparse Modeling and several invited
chapters.
The book is structured as follows. Chapter 2 provides a brief overview of some
challenging issues arising in computational biology, one of the traditional applications
of sparse modeling, where the primary goal is to identify biological variables such as
genes and proteins that are most relevant (ultimately, causally related) to a biologi-
cal phenomenon of interest. The chapter introduces several biological fields, such as
genomics, proteomics, metabolomics, and transcriptomics, and discusses some high-
dimensional problems arising in these areas, including genome-wide association stud-
ies (GWAS), gene expression (DNA microarray) data analysis, reverse engineering of
cellular networks, and metabolic network reconstruction. Neuroimaging applications, that is,
statistical analysis of fMRI, EEG, PET, and other brain imaging data that involves predicting
mental states and localizing the brain areas most relevant to a particular mental activity, are
also introduced here as another rich source of high-dimensional,
small-sample problems that can benefit from sparse techniques. Overall, the goal of
chapter 2 is to provide biological background for the subsequent five chapters, which
focus on particular aspects of sparse modeling in applications to biology and neuro-
science.
Chapter 3 discusses several key properties of applications that influence the
choice of the sparse methods: (1) the amount of correlation among the predictive vari-
ables, (2) the expected level of sparsity (the fraction of important variables versus the
total number of predictors), and (3) the primary objective of predictive modeling, such
as accurate recovery of the true underlying sparsity pattern versus an accurate predic-
tion of the target variable. Chapter 3 focuses on two popular biological problems—the
genome-wide association studies (GWAS) and gene expression (DNA microarray) data
analysis—as examples of practical applications with different properties. A simplify-
ing assumption that is traditionally adopted in GWAS and often realized in practice
is that only a very small number of almost uncorrelated input variables (predictors),
significance of stability based on such a null hypothesis. This method appears to significantly
impact the stability results and provides a better, significance-based approach to
stability evaluation. Also, chapter 7 proposes that spatial smoothing be used as a simple
way of improving stability without sacrificing much of prediction accuracy. Studies of
predictive accuracy versus model stability, as defined in that chapter, also demonstrate
that the two metrics can be positively correlated, though highly nonlinearly; thus, as
observed in prior work, including chapter 6, equally predictive models may have quite
different stability, and clearly more stable ones are preferred for the purpose of neuro-
scientific interpretation.
Since highly efficient sparse recovery techniques are essential in large-scale appli-
cations, chapter 8 focuses on improving the efficiency of sparse recovery methods by
using sequential testing approaches. Unlike traditional (nonsequential) sparse recov-
ery, sequential (adaptive) approaches make use of information about previously taken
measurements of an unknown sparse signal when deciding on the next measurement.
While the standard procedures require the number of measurements logarithmic in the
dimension of the signal in order to recover the signal accurately, sequential procedures
require the number of measurements logarithmic in the sparsity level, that is, the num-
ber of nonzeros. This can lead to a dramatic reduction in the number of measurements
when the signals are sufficiently sparse. The chapter considers two motivating appli-
cations: a biological one, concerned with identifying a small subset of a large number
of genes (e.g., more than 13,000 genes in a fruit fly) that are involved in virus replica-
tion using single-deletion strains, and an engineering application known as cognitive
radio, where the task is to quickly perform spectrum sensing (identification of currently
unused bands of the radio spectrum). Chapter 8 discusses the advantages of a novel
sequential testing procedure, sequential thresholding, which does not require knowl-
edge of underlying data distributions and the sparsity level (unlike the standard sequen-
tial probability ratio test (SPRT)), is very simple to implement, and yet is nearly optimal.
The chapter also provides a historic overview of the sequential testing field and sum-
marizes key theoretical results in this domain.
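To make the chapter's idea concrete, here is a minimal toy sketch of sequential thresholding in Python (an illustrative simplification assuming positive nonzero entries and a fixed threshold at zero, not the chapter's exact procedure). Each pass re-measures only the surviving coordinates, so roughly half of the remaining zero coordinates are discarded per pass and the total number of measurements stays close to twice the signal dimension:

import numpy as np

def sequential_thresholding(x, num_passes=10, noise_std=1.0, seed=0):
    """Toy sequential thresholding: in each pass, take one noisy
    measurement of every surviving coordinate and discard those whose
    measurement falls below zero (nonzeros are assumed positive)."""
    rng = np.random.default_rng(seed)
    survivors = np.arange(x.size)        # start with all coordinates
    total_measurements = 0
    for _ in range(num_passes):
        y = x[survivors] + noise_std * rng.standard_normal(survivors.size)
        total_measurements += survivors.size
        survivors = survivors[y > 0]     # keep coordinates above threshold
    return survivors, total_measurements

# Example: a 10,000-dimensional signal with 10 positive nonzero entries.
x = np.zeros(10_000)
x[:10] = 3.0
found, m = sequential_thresholding(x)
print(sorted(found.tolist()), m)         # effort concentrates on the nonzeros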
Algorithmic aspects of sparse recovery are further explored in chapter 9. Two
novel sparse recovery methods are proposed that, unlike most of their predecessors,
combine two types of sparsity-enforcing regularizers, or priors: the convex l1-norm and the
nonconvex l0-norm (the number of nonzeros, or sparsity level). Interestingly, this com-
bination results in better empirical performance as compared to state-of-the-art Lasso
solvers and also allows better theoretical sparse recovery guarantees based on weaker
assumptions than traditionally used in sparse recovery. One of the algorithms, called
the game-theoretic approximate matching estimator (GAME), reformulates the sparse
approximation problem that combines both l1- and l0-norm regularizers as a zero-sum
game and solves it efficiently. The second algorithm, combinatorial selection and least
absolute shrinkage (CLASH), leads to even better empirical performance than GAME but
requires stronger assumptions on the measurement matrix for estimation guarantees.
Chapter 10 considers the problem of learning sparse latent models, that is, mod-
els including unobserved, or latent, variables. This problem is often encountered in
applications such as text or image analysis, where one might be interested in finding
a relatively small subset of (hidden) topics or dictionary elements that accurately approx-
imate given data samples. That chapter advocates using Bayesian sparsity-enforcing
methods with various sparsity-enforcing priors that go beyond the standard Laplace
prior corresponding to popular l1-norm minimization. (Note that maximizing the Laplace
log-likelihood is equivalent to minimizing the l1-norm, and thus maximum a posteriori
(MAP) inference with a Laplace prior is equivalent to standard l1-norm minimiza-
tion.) Specifically, chapter 10 focuses on the spike-and-slab prior and demonstrates
on multiple real-life data sets, including analysis of natural scenes, human judgments,
newsgroup text, and SNP data, that this approach consistently outperforms the
l1-norm-based methods in terms of predictive accuracy. However, this is a classic exam-
ple of accuracy versus (computational) efficiency trade-off, since Bayesian approaches
based on Markov Chain Monte Carlo (MCMC) inference can be considerably slower
than the l1 optimization. Overall, the message of the chapter is that the Laplace prior
that gives rise to l1-norm formulations is just one out of many possible ways of enforc-
ing sparsity, and depending on a particular application and modeling goals, other pri-
ors may be preferred. While the current literature on sparse modeling is heavily biased
towards l1-norm-based approaches, chapter 10 provides a convincing argument for more
widespread use of alternative sparsity-enforcing techniques.
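To spell out the parenthetical equivalence noted above, assume for concreteness a linear-Gaussian likelihood (an illustrative choice, not taken from the chapter) and an i.i.d. Laplace prior $p(\beta) \propto \exp(-\lambda\|\beta\|_1)$. MAP estimation then reduces to an l1-penalized least-squares problem:

$$\hat{\beta}_{\mathrm{MAP}} = \arg\max_{\beta}\,\big[\log p(y \mid X, \beta) + \log p(\beta)\big] = \arg\min_{\beta}\,\frac{1}{2\sigma^{2}}\|y - X\beta\|_{2}^{2} + \lambda\|\beta\|_{1},$$

since the Gaussian log-likelihood contributes the squared-error term and the Laplace log-prior contributes $-\lambda\|\beta\|_1$, up to constants.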
Learning latent variable models, or topic models, is also the focus of chapter 11.
This chapter is motivated by computer vision applications, such as scene analysis and
event detection from video. An example considered here involves the scene analysis of
traffic videos taken at a busy intersection, where many vehicle- and pedestrian-related
activities occur simultaneously, and one would like to identify key activity components,
or sequences (motifs), corresponding to the car and pedestrian movements and perform
event detection. The problem is similar to the detection of changing topics using topic
models in text analysis but is also much more complex and challenging, since there
are multiple simultaneous activities, and no prior knowledge is given about the number
of such activities in the scene. The chapter reviews some sparsity-enforcing methods
for topic modeling and focuses specifically on a topic-based method for temporal activ-
ity mining that extracts temporal patterns from documents where multiple activities
occur simultaneously. Sparsity is enforced on the motif start time distributions of the
probabilistic latent sequential motif (PLSM) model, using an information-theoretic formulation.
Empirical results on simulated data and real-life video suggest that the sparsity
constraint improves the performance of the method and makes the model more robust in the
presence of noise.
CHAPTER 2
The Challenges of Systems Biology
Pablo Meyer and Guillermo A. Cecchi
Biology oozes with complexity, from viruses to multicellular organisms. While the
complete physiology of a vertebrate animal, with its brain included, may apparently
dwarf that of a single cell, the intricacy of the interlocking mechanisms that account for
generic and type-specific cellular mechanisms is bewildering in itself. Eukaryotic cells,
for instance, need to coordinate a vast number of processes such as DNA transcription
into RNA, translation of RNA into the amino acid chains that make up proteins, trans-
port of proteins in and out of the nucleus, energy storage, regulation of protein synthesis
in response to sensed external signaling and genetically determined programs. The ini-
tial response to this complexity in the early years of modern molecular biology was
to develop a theoretical perspective that associated specific cellular functions and dis-
eases, such as circadian rhythms or cancer, with one or a handful of genes. It is still
quite common to find journalistic accounts and even scholarly articles on the “gene
for X.” For similar reasons, neuroscience has also been dominated by the grandmother
cell doctrine, the idea that each sufficiently elaborate mental function is reflected in the
activity of a specific neuron.
However, over the past two decades molecular biology has experienced a qual-
itative increase in the amount of data produced to answer its key scientific ques-
tions, forcing the transformation from molecular to systems biology. Molecular biology
tries to discover the missing molecular links between phenotype and genotype, that
is, to find the genes responsible for a particular phenotype/disease. The revolution of
genome sequencing led to new computational methodologies allowing the comparison
and study of species at the whole genome level (Loots 2008). Hence genes responsi-
ble for innate immunity in the fruit fly could be inferred in humans via gene sequence
comparison. Gene comparisons, however, are not enough. The function of genes does
not rely only on their sequence but also on their spatiotemporal expression resulting
from complex regulatory processes. With the advent of high-throughput technologies,
omics1 data types have provided quantitative data for thousands of cellular components
across a variety of scales, or systems. For instance, genomics provides data on a cell’s
such as the 1000 genomes project (Clarke et al. 2012). It has also allowed genome-wide
association studies (GWAS), where researchers take an unbiased survey of common
single-nucleotide polymorphisms (SNPs) across the genome and look for alleles whose
presence correlates with phenotypic traits such as disease.
SNPs are defined by a single-nucleotide variant in a DNA fragment of the genome
across individuals of the same species or in paired chromosomes of the same individual.
While SNPs tend to be found more in noncoding regions, increasing evidence indicates
that these regions are functionally relevant. It is expected that differences between indi-
viduals in susceptibility to disease and response to treatment are associated with these
genetic variations. GWAS are designed to scan the entire genome for these associations
between SNPs and disease, emerging potentially from millions of single-nucleotide vari-
ants. The sheer dimensionality of the genome as a target for variants poses a significant
challenge from a computational point of view, compounded by the current lack of
generative models that can connect SNPs and function in a mechanistic way.
As an example, the first GWAS study reported that patients with macular degen-
eration carry two SNPs with altered allele frequency compared to the healthy control
group (Klein et al. 2005). As the example highlights, this approach faces the challenge
of detecting a handful of variables out of several thousands or tens of thousands. More-
over, the molecular mechanisms that link these SNPs with the disease are completely
unclear, as is the extent to which other SNPs, perhaps with individually weaker sta-
tistical associations, may also contribute collectively to patients’ susceptibility to the
disease. However, hundreds of disease-related gene candidates have been found since
then, although most have only a modest effect (McCarthy et al. 2008). A more recent
example, using sparse (l1-regularized) regression techniques, identified the risk loci
common to five major psychiatric disorders (schizophrenia, major depression, autism
spectrum, bipolar, and attention deficit hyperactivity disorders) and a subset of affected
genes involved in calcium channel signaling, which at least points in the direction of
biological interpretability (Smoller et al. 2013).
Genome sequencing has also facilitated the production of DNA microarrays to
generate genome-wide gene expression profiles based on the Watson-Crick base pair
complementarity of DNA. mRNA extracted from tissues or cells is commonly reverse-
transcribed into cDNA and hybridized onto small glass or silicon arrays where a
short section of each of the expressed genes has been attached. The amount of DNA
hybridized is measured with fluorescent markers attached to the short DNA sections
printed on the arrays and reflects the amount of mRNA present in the biological sample.
This field of functional genomics has extended the classical gene-by-gene approach to
find sets of genes that are differentially expressed in cases of disease, such as in breast
cancer where 70 genes are used as a signature for diagnosis and prevention (van ’t Veer
et al. 2002). The extent of functional genomics growth is exemplified in the database
ArrayExpress containing publicly accessible microarray data from 2,284 different exper-
iments, 97,006 assays in 20,458 conditions. DNA sequencing and gene expression have
been recently engulfed in the revolution of new sequencing techniques (Gunderson et al.
2004; Rothberg et al. 2011) by which sequence and expression levels can be extracted;
they rely on a higher number of sequencing repeats per nucleotide, also called depth.
Deep-sequencing of mRNA transcripts, also called RNA-seq, can detect 25 percent more
genes than microarrays as well as previously unidentified splicing events (Sultan et al.
2008).
generate proteins. Consisting of the three main steps of initiation, elongation, and ter-
mination, translation is a central cellular process with ramifications related to all biolog-
ical and clinical research, including human health (Kimchi-Sarfaty et al. 2007; Coleman
et al. 2008; Lee et al. 2006; Bahir et al. 2009; van Weringh et al. 2011; Vogel et al. 2010;
Pearson 2011; Lavner and Kotlar 2011; Comeron 2006), biotechnology (Gustafsson,
Govindarajan, and Minshull 2004; Kudla et al. 2009; Plotkin and Kudla 2010; Supek
and Smuc 2010), evolution (Bahir et al. 2009; van Weringh et al. 2011; Drummond and
Wilke 2008, 2009; Shah and Gilchrist 2010a, 2010b; Plata, Gottesman, and Vitkup 2010;
Bulmer 1991; Sharp and Li 1987), functional genomics (Danpure 1995; Lindblad-Toh
et al. 2011; Schmeing et al. 2011; Warnecke and Hurst 2010; Zhou, Weems, and Wilke
2009; F. Zhang et al. 2010; Fredrick and Ibba 2010), and systems biology (Bahir et al.
2009; Shah and Gilchrist 2010a; Fredrick and Ibba 2010; Z. Zhang et al. 2010; Man and
Pilpel 2007; Cannarozzi et al. 2010; Schmidt et al. 2007; Elf et al. 2003). There has been
a long-standing debate regarding the rate-limiting stage of translation and whether ini-
tiation or elongation is the bottleneck (Gustafsson, Govindarajan, and Minshull 2004;
Kudla et al. 2009; Burgess-Brown et al. 2008; Supek and Smuc 2010). If the initiation
step is relatively slow compared to elongation, codon bias (i.e., which bases in the third
position are preferred by ribosomes) should not affect the translation rate. However, if
initiation is fast relative to elongation, codon bias should have substantial influence on
protein levels. Additionally, determining which variables of mRNA transcripts are rel-
evant to initiation efficiency is not yet fully resolved, with recently reassessed features
such as mRNA folding strength (Tuller, Waldman et al. 2010) and the nucleotide con-
text of the first start codon ATG at the beginning of the open reading frame (ORF) (Kozak
2005) providing only very weak correlations with protein levels. Finally, it is not clear
if ORF features affect the elongation rate or which features are relevant to elongation
or how they affect translation efficiency (Kudla et al. 2009; Tuller, Waldman et al. 2010;
Welch et al. 2009; Ingolia, Lareau and Weissman 2011; Frenkel-Morgenstern et al. 2012).
Various features related to the translation process (e.g., protein levels, ribosomal
densities, initiation rates) have been taken into account in various model organisms (see
Tuller, Waldman et al. 2010; Tuller, Kupiec, and Ruppin 2007; Zur and Tuller 2012a;
Tuller 2011; Tuller, Veksler et al. 2011; Zur and Tuller 2012b; Reuveni et al. 2011; Tuller,
Carmi et al. 2010) and to engineer gene translation (Dana and Tuller 2011). A gen-
eral predictor can be based on the different features of the untranslated region (UTR)
(e.g., small ORFs in the UTR named uORFs, GC content, mRNA folding in different parts
of the UTR), the ORF (e.g., codon frequencies and order, amino acid bias, ORF length),
mRNA levels, number of available ribosomes, and degradation rates when available.
Predictors may also be based on machine learning approaches or biophysical models
(Kudla et al. 2009; Welch et al. 2009; Reuveni et al. 2011). The challenge in inferring
causal relations between features of the transcripts and their expression levels is related
to the fact that highly expressed genes are often under evolutionary selection for various
features that do not improve translation. Thus, these features may show a significant
correlation with a gene’s protein levels that is not causal, that is, one that does not reflect
an actual effect on translation efficiency. For example, highly expressed genes are under selection for features such
as increased mRNA self-folding to prevent aggregation of mRNA molecules (because of
potential interaction with other genes), even though for a certain gene not interacting
with other mRNA molecules, increased mRNA folding may actually decrease translation
efficiency (Tuller, Veksler et al. 2011; Zur and Tuller 2012b).
The number of transcription factors is relatively large, and the density per gene
depends on the specific species. The human genome contains more than 2,500 bind-
ing sites, in all likelihood corresponding to a similar number of transcription factors.
Transcriptional regulation is involved in most cellular functions: in mature, fully differ-
entiated cells they control housekeeping, mostly through the precise timing of expres-
sion; they regulate the processes associated with the development of an organism and
the differentiation of cells; and they are a necessary mechanism for cells to respond and
adapt to environmental challenges or normal signals. However, from the point of view
of computational complexity, a remarkable feature of transcription factors is that they
also act on themselves. That is, they form a network of interactions of a highly dynamic
nature (time is of the essence for regulatory purposes), which only in some cases can
be reduced to Boolean functions of a handful of inputs. As such, the small motifs they
form lend themselves to engineering-type analysis as signal processing and detection
devices (Alon 2006). Larger-scale network motifs, however, have been more difficult
to interpret, and their study has relied on statistical characterization and comparison
with generic network models such as small-world and scale-free topologies (Jeong et al.
2000). Moreover, these larger motifs pose a significant computational problem because
search algorithms scale supralinearly with the number of nodes in the network (Ma’ayan
et al. 2008).
The challenges associated with the analysis of reconstructed networks are
compounded with the basic problem of validating the reconstruction itself. Given the
intricate nature of interactions giving rise to function, traditional approaches to net-
work validation based on targeted biochemical interventions, for instance, knock-ins
and knock-outs, are of limited applicability. The notion of model validation through
prediction has taken root recently in the systems community. In particular, the Dia-
logue on Reverse Engineering Assessments and Methods (DREAM) is a project designed
to evaluate model predictions and pathway inference algorithms in systems biology
(Stolovitzky, Prill, and Califano 2009). DREAM is structured in the form of challenges
that comprise open problems presented to the community, whose solutions are known
to the organizers but not to the participants. Participants submit their predictions of
the solutions to the challenges, which are evaluated by the organizers so that rigor-
ous scrutiny of scientific research based on community involvement is possible. In
its most recent edition, the DREAM consortium evaluated more than 30 network infer-
ence methods on microarray data from eukaryotic and prokaryotic cells. Sparse regres-
sion methods performed particularly well for linear network motifs (cascades), whereas
more complex motifs such as loops proved quite difficult across all inference meth-
ods. Interestingly, eukaryotic networks also proved more difficult than prokaryotic ones,
possibly related to the higher degree of post-transcriptional regulation in the former,
which makes the correlation between the levels of mRNA of transcription factors and
their corresponding targets weaker than in the latter. However, the method aggregation
approach resulted in a significantly improved reconstruction accuracy: by integrating
predictions from multiple methods, networks including close to 1,700 transcriptional
interactions were identified with high precision for each of E. coli and
S. aureus cells. Moreover, the study identified more than 50 novel interac-
tions, of which close to half were experimentally confirmed (Marbach et al. 2012).
5 OUTLOOK
The high-dimensional nature of cellular processes and the inevitable sources of noise
in data make learning statistical models in this field particularly prone to generalization
errors. Thus, regularization approaches, such as sparse regression, become an essential
tool for improving prediction accuracy as well as for the validation and parameter esti-
mation of mechanistic, interpretable models in biomedical and clinical applications.
Specific applications of sparse modeling in the context of systems biology are discussed
in chapters 3, 4, and 5.
So far, we have focused on systems biology, but similar challenges are confronted
by researchers trying to make sense of neuroscientific data, in particular, those produced
by multielectrode arrays and brain imaging. The technology of arrays is in accelerated
development, and while at present arrays consist of fewer than 1,000 electrodes, typi-
cally sampled at the high-end spiking frequency of 1 kHz, potentially a few orders of
magnitude more electrodes may be recorded and sampled at higher frequencies if mem-
brane potentials are considered (Nicolelis and Lebedev 2009). However, it is in the con-
text of brain imaging that sparse modeling has shown the most promising results (Carroll
et al. 2009). In particular, fMRI can at present record the activity of about 30,000 brain
voxels, sampled at 0.5 to 1 Hz. Given that for humans scanning time is typically lim-
ited to a few minutes, samples are limited to less than 1,000 independent volumes, and
therefore multivariate models are severely underdetermined. Chapters 6 and 7 address
issues arising in sparse modeling of fMRI data, particularly the stability of sparse models
across multiple subjects and experiments.
Finally, the near future is very likely to witness the increasing convergence of sys-
tems biology data with other organism-level measurements, such as heart and brain
imaging, as well as the myriad behavioral markers routinely utilized by clinicians
(e.g., temperature, blood pressure, skin conductance, tremors, speech). We envision an
integrated approach to the simultaneous characterization of genotypic and phenotypic
features related to diseases ranging from Alzheimer’s and Parkinson’s to autism and
schizophrenia, for the purpose of better prognosis and drug development. In this hypo-
thetical (but realistic) landscape of flooding data, sparse modeling will be an essential
tool for the challenges of an augmented systems biology.
NOTE
1. Omics is a general term referring to biological subfields such as genomics, proteomics, metabolomics,
and transcriptomics. Genomics is a subfield of genetics focused on sequencing, assembling, and analyz-
ing the function and structure of genomes, that is, the complete set of DNA within a single cell of an
organism. Proteomics studies the structure and function of proteins, and metabolomics is concerned
with chemical processes involving metabolites (the intermediates and products of metabolism). The tran-
scriptome is the set of all RNA molecules (mRNA, rRNA, tRNA, and other noncoding RNA); the field of
transcriptomics, or expression profiling, analyzes the expression levels of mRNAs in a given population of
cells, often using methods such as DNA microarray technology.
REFERENCES
Alon, U. An Introduction to Systems Biology. Chapman and Hall, 2006.
Bahir, I., et al. Viral adaptation to host: A proteome-based analysis of codon usage and
amino acid preferences. Molecular Systems Biology 5(311):1–14, 2009.
Çakir, T., et al. Integration of metabolome data with metabolic networks reveals reporter
reactions. Molecular Systems Biology 2(Oct.), 2006.
Cannarozzi, G., et al. A role for codon order in translation dynamics. Cell 141(2):355–
367, 2010.
Clarke, L., et al. The 1000 genomes project: Data management and community access.
Nature Methods 9(5):459–462, 2012.
Coleman, J. R., et al. Virus attenuation by genome-scale changes in codon pair bias.
Science 320(5884):1784–1787, 2008.
Dana, A., and T. Tuller. Efficient manipulations of synonymous mutations for controlling
translation rate. Journal of Computational Biology 19(2):200–231, 2011.
Danpure, C. J. How can the products of a single gene be localized to more than one
intracellular compartment? Trends in Cell Biology 5(6):230–238, 1995.
Elf, J., et al. Selective charging of tRNA isoacceptors explains patterns of codon usage.
Science 300(5626):1718–1722, 2003.
Fredrick, K., and M. Ibba. How the sequence of a gene can tune its translation. Cell
141(2):227–229, 2010.
Frenkel-Morgenstern, M., et al. Genes adopt nonoptimal codon usage to generate cell
cycle-dependent oscillations in protein levels. Molecular Systems Biology 8(572):572,
2012.
Gunderson, K. L., et al. Decoding randomly ordered DNA arrays. Genome Research
14(5):870–877, 2004.
Gustafsson, C., S. Govindarajan, and J. Minshull. Codon bias and heterologous protein
expression. Trends in Biotechnology 22(7):346–353, 2004.
Jeong, H., et al. The large-scale organization of metabolic networks. Nature 407:651–654,
2000.
Kimchi-Sarfaty, C., et al. A silent polymorphism in the MDR1 gene changes substrate
specificity. Science 315(5811):525–528, 2007.
Kochetov, A. V. Alternative translation start sites and their significance for eukaryotic
proteomes. Molecular Biology 40(5):705–712, 2006.
Lander, E. S., et al. Initial sequencing and analysis of the human genome. Nature
409(6822):860–921, 2001.
Lavner, Y., and D. Kotlar. Codon bias as a factor in regulating expression via translation
rate in the human genome. Gene 345(1):127–138, 2005.
Lee, J. W., et al. Editing-defective tRNA synthetase causes protein misfolding and neu-
rodegeneration. Nature 443(7107):50–55, 2006.
Ma’ayan, A., et al. Ordered cyclic motifs contribute to dynamic stability in bio-
logical and engineered networks. Proceedings of the National Academy of Sciences
105(49):19235–19240, 2008.
Marbach, D., et al. Wisdom of crowds for robust gene network inference. Nature Methods
9(8):796–804, 2012.
McCarthy, M. I., et al. Genome-wide association studies for complex traits: Consensus,
uncertainty and challenges. Nature Reviews Genetics 9(5):356–369, 2008.
Pfau, T., N. Christian, and O. Ebenhoh. Systems approaches to modelling pathways and
networks. Briefings in Functional Genomics 10(5):266–279, 2011.
Plata, G., M. E. Gottesman, and D. Vitkup. The rate of the molecular clock and the cost
of gratuitous protein synthesis. Genome Biology 11(9):R98, 2010.
Plotkin, J. B., and G. Kudla. Synonymous but not the same: The causes and conse-
quences of codon bias. Nature Reviews Genetics 12(1):32–42, 2010.
Reuveni, S., et al. Genome-scale analysis of translation elongation with a ribosome flow
model. PLoS Computational Biology 7(9):e1002127, 2011.
Schmeing, T. M., et al. How mutations in tRNA distant from the anticodon affect the
fidelity of decoding. Nature Structural and Molecular Biology 18(4):432–436, 2011.
Schmidt, M. W., et al. Comparative proteomic and transcriptomic profiling of the fission
yeast Schizosaccharomyces pombe. Molecular Systems Biology 3:79, 2007.
Shah, P., and M. A. Gilchrist. Effect of correlated tRNA abundances on translation errors
and evolution of codon usage bias. PLoS Genetics 6(9):e1001128, 2010a.
———. Explaining complex codon usage patterns with selection for translational effi-
ciency, mutation bias, and genetic drift. Proceedings of the National Academy of Sci-
ences 108(25):10231–10236, 2010b.
Smoller, J. W., et al. Identification of risk loci with shared effects on five major psychi-
atric disorders: A genome-wide analysis. Lancet 381(9875):1371–1379, 2013.
Stolovitzky, G., R. J. Prill, and A. Califano. Lessons from the DREAM2 Challenges.
Annals of the New York Academy of Sciences 1158:159–195, 2009.
Sultan, M., et al. A global view of gene activity and alternative splicing by deep sequenc-
ing of the human transcriptome. Science 321(5891):956–960, 2008.
Supek, F., and T. Smuc. On relevance of codon usage to expression of synthetic and
natural genes in Escherichia coli. Genetics 185(3):1129–1134, 2010.
Tuller, T., A. Carmi et al. An evolutionarily conserved mechanism for controlling the
efficiency of protein translation. Cell 141(2):344–354, 2010.
Tuller, T., M. Kupiec, and E. Ruppin. Determinants of protein abundance and translation
efficiency in S. cerevisiae. PLoS Computational Biology 3(12):2510–2519, 2007.
Tuller, T., I. Veksler, et al. Composite effects of gene determinants on the translation
speed and density of ribosomes. Genome Biology 12(11):R110, 2011.
Tuller, T., Y. Waldman, et al. Translation efficiency is determined by both codon bias
and folding energy. Proceedings of the National Academy of Sciences 107(8):3645–3650,
2010.
van Weringh, A., et al. HIV-1 modulates the tRNA pool to improve translation efficiency.
Molecular Biology and Evolution 28(6):1827–1834, 2011.
van ’t Veer, L. J., et al. Gene expression profiling predicts clinical outcome of breast
cancer. Nature 415(6871):530–536, 2002.
Vogel, C., et al. Sequence signatures and mRNA concentration can explain two-thirds of
protein abundance variation in a human cell line. Molecular Systems Biology 6(400):1–9,
2010.
Warnecke, T., and L. D. Hurst. GroEL dependency affects codon usage: Support for a
critical role of misfolding in gene evolution. Molecular Systems Biology 6(340):1–11,
2010.
Welch, M., et al. Design parameters to control synthetic gene expression in Escherichia
coli. PLoS One 4(9):1–10, 2009.
Zhang, Z., et al. Nonsense-mediated decay targets have multiple sequence-related fea-
tures that can inhibit translation. Molecular Systems Biology 6(442):1–9, 2010.
Zhou, T., M. Weems, and C. O. Wilke. Translationally optimal codons associate with
structurally sensitive sites in proteins. Molecular Biology and Evolution 26(7):1571–
1580, 2009.
Zur, H., and T. Tuller. RFMapp: Ribosome flow model application. Bioinformatics
28(12):1663–1664, 2012a.
———. Strong association between mRNA folding strength and protein abundance in S.
cerevisiae. EMBO Reports 13:272–277, 2012b.
CHAPTER 3
Practical Sparse Modeling
An Overview and Two Examples from Genetics
Saharon Rosset
$$E(Y \mid x) = \sum_{l=1}^{q} \beta_{l} x_{j_l},$$
to capture the effect of genotype on the phenotype. It is usually assumed (and invari-
ably confirmed by GWAS results) that only a small number of SNPs are associated with
any specific phenotype. Thus, the GWAS-based model describing the dependence of the
phenotype on SNP genotypes is expected to be sparse, usually extremely sparse.1 This
example is discussed further in the next section.
A second class of relevant problems is gene microarray modeling. Before the
advent of GWAS, the major technology geared toward finding connections between
genetic and phenotypic information was to measure gene expression levels in differ-
ent individuals or different tissues. In this mode, the quantities being measured are the
expressions or activity levels of actual proteins. Proteins are encoded by genes, which
are fragments of the genome. Hence, gene expression experiments can be thought of as
measuring the association between genomic regions and phenotypes except that this is
done through the actual biological mechanisms as expressed in proteins rather than by
direct inspection of genetic sequences, as in GWAS. Not surprisingly, gene expression
analysis also typically assumes that only a few genes are actually directly related to
the phenotype of interest. Thus, this is also a sparse modeling situation, although the
statistical setup has some major differences from the GWAS.
Fundamentally, sparse recovery approaches pursue the following two major
goals:
Earlier methods for sparse recovery typically fell into one of two main categories:
they combine variable selection with estimation of model parameters by setting some
of the parameters to zero. Such sparse techniques can provably succeed in situations
where both wrapper and filter methods are unlikely to result in successful recovery.
A detailed technical review of this class of methods is omitted here; rather, a qualitative
description of these approaches and their properties is given. l1-type methods all share
some version of the same basic (and quite intuitive) conditions for success in sparse
recovery:
Three qualitative levels of sparsity are considered: very sparse, where the number
of important variables is O(1); sparse, where the number is O(n), n being the number
of samples; and not sparse otherwise. Three qualitative levels of correlation between
nonzero covariates and other covariates are considered: uncorrelated/orthogonal; low
correlation, as defined in the l1 sparse recovery literature; and high correlation. The
genetic motivating applications can be characterized in terms of these dimensions: in
the GWAS example, it is typically assumed that the model is very sparse and the nonzero
covariates (SNPs) almost uncorrelated between them and with almost all zero covariates;
in the gene expression modeling example, it is typically assumed that large groups of
covariates (genes) may have high correlation within them but low correlations between
groups, and so the sparse situation pertains (Leung and Cavalieri 2003).
Considering which sparse recovery approaches fit which situation, one can make
some observations: (1) in the very sparse situation, combinatorial wrapper approaches
are often likely to do well, in particular, if one assumes $q$ is very small and $p^q$ is
Figure 3.1: A schematic view of sparse modeling scenarios.
observations (individuals) n in the thousands, and there are also the following statis-
tical characteristics:
• Frequently, only a very small number of SNPs are associated with the pheno-
type y, typically ten or fewer. Thus, it is clearly the very sparse scenario.
• The vast majority of SNP pairs are uncorrelated. This is owing to the recombina-
tion process driving the SNP-SNP correlation in the genome. SNPs that are far
from each other on the genome, and certainly SNPs on different chromosomes,
are in linkage equilibrium, meaning they are completely uncorrelated, because
of being separated by many recombination events in the genetic history of the
sample being considered. Hence one can assume that each SNP is correlated
only with a tiny fraction of all other SNPs, and typically all truly associated
SNPs are uncorrelated between them. However, keep in mind that every SNP
typically has some neighboring SNPs that are in high correlation with it.
The first two questions are addressed here, starting from the second: Is selec-
tion by p-value justified? To frame the discussion theoretically, let’s assume a standard
univariate linear regression formulation, where
$$y = \beta^{T} x + \epsilon, \qquad \epsilon \sim N(0, \sigma^{2}),$$
and assume for simplicity that $\sigma^2$ is known and there is only one truly associated SNP. In
other words, we assume all $\beta_j$ are zero except for one. The coordinates $x_j$ can be highly
correlated and the dimensionality of the problem is not too high (i.e., assume concentra-
tion in the genomic region around the true association). The primary goal is to identify
the SNP $j_0$ with the true association.
A statistical approach in this situation is to use maximum likelihood (ML) esti-
mation. Assuming the noise distribution is Gaussian and only one coefficient is nonzero,
it is easy to see that ML estimation in this case amounts to finding the univariate model
with the minimal residual sum of squares (RSS):
$$\hat{j}_0 = \arg\min_{j,\,\beta_j} \sum_{i=1}^{n} (y_i - \beta_j x_{ij})^2.$$
How does this compare to selecting $\hat{j}_0$ as the SNP attaining the minimal p-value in per-
forming a z-test on the coefficient of the SNP (or equivalently, a test of the univariate
model against the null model)? As it turns out, the two are completely equivalent in this
case, in the sense that the ranking of the SNPs according to RSS is identical to their ranking
according to z-test p-values. To see this, denote for SNP $j$ the sum of squares of $x_{\cdot j}$ by
$Sxx_j = \sum_i x_{ij}^2 - n\bar{x}_{\cdot j}^2$, and denote $Sxy_j = \sum_i x_{ij} y_i - n\bar{x}_{\cdot j}\bar{y}$ and $Syy = \sum_i y_i^2 - n\bar{y}^2$.
Then the coefficient of the regression of $y$ on $x_j$ is $\beta_j = Sxy_j / Sxx_j$, and the p-value
of the z-test is
$$p_j = 2\,\Phi\!\left(\frac{-|\beta_j|}{\sqrt{\sigma^2 / Sxx_j}}\right),$$
where $\Phi(\cdot)$ is the cumulative standard normal distribution function. Note that this
expression is a monotone function of
$$\frac{|\beta_j|}{\sqrt{\sigma^2 / Sxx_j}} \propto \frac{|Sxy_j|}{\sqrt{Sxx_j}}.$$
From the standard theory of linear regression it follows that the best RSS for the uni-
variate model with SNP $j$ is
$$\mathrm{RSS}(\hat{\beta}_j) = Syy - \frac{Sxy_j^2}{Sxx_j},$$
which is also clearly a monotone function of $|Sxy_j| / \sqrt{Sxx_j}$. Thus, selecting the lowest
p-value or using ML is mathematically equivalent.
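This equivalence is easy to check numerically. The following sketch (simulated data with a known noise level, matching the assumptions above; all names and constants are illustrative) verifies that ranking predictors by univariate RSS and by z-test p-value gives the same order:

import numpy as np
from scipy.stats import norm

rng = np.random.default_rng(1)
n, p = 200, 50
X = rng.standard_normal((n, p))
sigma = 1.0
y = 2.0 * X[:, 7] + sigma * rng.standard_normal(n)   # predictor 7 truly associated

Xc, yc = X - X.mean(axis=0), y - y.mean()
Sxx = (Xc ** 2).sum(axis=0)
Sxy = Xc.T @ yc
Syy = (yc ** 2).sum()

rss = Syy - Sxy ** 2 / Sxx                            # best univariate RSS per predictor
pvals = 2 * norm.cdf(-np.abs(Sxy / Sxx) / np.sqrt(sigma ** 2 / Sxx))

assert (np.argsort(rss) == np.argsort(pvals)).all()   # identical rankings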
This perfect equivalence breaks down once one moves away from the simplest lin-
ear regression setting. For example, consider a logistic regression setup, where GWAS
typically uses the Wald statistic for p-value calculation (McCullagh and Nelder 1989).
This is based on a quadratic approximation of the likelihood around the estimate.
Selecting the SNP that gives the lowest p-value is no longer equivalent to selecting
the one that gives the best likelihood in a univariate model. One would intuitively
expect that the maximum likelihood approach would be slightly better than the p-value-
based approach. To demonstrate that this is indeed the case, consider a simplistic sim-
ulation. Assume there are two SNPs, with $x_{i1} \sim N(0, 1)$ and $x_{i2} = x_{i1} + r \cdot N(0, 1)$, and
$P(y_i = 1 \mid x_i) = \exp(x_{i1})/(1 + \exp(x_{i1}))$. Thus, SNP 1 is the true association, but the two
SNPs are correlated, with
$$\mathrm{cor}(x_{\cdot 1}, x_{\cdot 2}) = 1/\sqrt{1 + r^2}.$$
Examine the rate of success of both approaches in identifying SNP 1 as the more highly
associated, as a function of $r$. Results are given in figure 3.2. As expected, the success
rates of the two approaches are similar, but the approach based on likelihood is slightly better
for all values of $r$.
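A hedged sketch of such a simulation (the chapter's exact constants are not given, so the sample size, number of replicates, and values of r below are illustrative) might look as follows, using statsmodels for the logistic fits:

import numpy as np
import statsmodels.api as sm

def trial(r, n, rng):
    """One replicate: does each criterion pick SNP 1 (the true association)?"""
    x1 = rng.standard_normal(n)
    x2 = x1 + r * rng.standard_normal(n)
    y = (rng.random(n) < 1 / (1 + np.exp(-x1))).astype(float)
    stats = []
    for x in (x1, x2):
        fit = sm.Logit(y, sm.add_constant(x)).fit(disp=0)
        stats.append((fit.llf, fit.pvalues[1]))     # log-likelihood, Wald p-value
    ml_correct = stats[0][0] > stats[1][0]          # higher likelihood wins
    pv_correct = stats[0][1] < stats[1][1]          # lower p-value wins
    return ml_correct, pv_correct

rng = np.random.default_rng(0)
for r in (0.25, 0.5, 1.0):
    res = np.array([trial(r, 200, rng) for _ in range(500)])
    print(r, res.mean(axis=0))   # success rates: [likelihood, p-value]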
Figure 3.2: Percentage of cases in which the correct true association is identified by maximum likelihood and by the Wald-test p-value in a logistic regression setup. The maximum likelihood criterion is slightly superior for all levels of correlation.
To summarize the discussion of the use of p-values for model selection: this crite-
rion is generally similar to using maximum likelihood but can be inferior, depending
on the approximations used for calculating the p-value, which may break the equiv-
alence.
The other question to be addressed pertains to the use of univariate models, as
opposed to multivariate sparse modeling approaches like Lasso (Tibshirani 1996). Con-
sider again a genomic region with correlated SNPs, where at most one SNP is associated,
and one would like to compare the use of univariate models to find the associated SNP
to the use of Lasso or similar methods. The Lasso formulation,
$$\hat{\beta}(\lambda) = \arg\min_{\beta} \sum_i (y_i - \beta^{T} x_i)^2 + \lambda \|\beta\|_1, \tag{3.1}$$
Figure 3.3 presents the results. The x-axis is the Lasso constraint (in its Lagrange-
equivalent constrained form), and the y-axis is the percentage of correct identification
of the first explanatory variable as the best association. The univariate approach and
the standardized Lasso with small constraint (high penalty) are much better than the
other two approaches. On the simulation data, there were a few examples where the
standardized Lasso added the wrong variable first but then for higher constraint values
the order of absolute coefficients reversed and the first variable was correctly chosen.
Hence, there is a range of constraint around 0.4 where the Lasso does very slightly bet-
ter than univariate. The generality of this phenomenon requires further research.
Figure 3.3: Success of different variable selection schemes (univariate, least squares, standardized Lasso, and nonstandardized Lasso) on a simulated GWAS example; the y-axis shows the percentage of correct selection as a function of the Lasso constraint.
Not
surprisingly, the least squares approach and the nonstandardized Lasso are far inferior
in their model selection performance.
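For readers who want to experiment with this comparison, here is a minimal scikit-learn sketch (the data-generating process is illustrative, since the chapter's exact simulation design is not reproduced above). It contrasts univariate screening with the first variable to enter the standardized Lasso path:

import numpy as np
from sklearn.linear_model import lasso_path

rng = np.random.default_rng(2)
n, p = 200, 5
z = rng.standard_normal(n)
# Five highly correlated predictors; predictor 0 is the true association.
X = np.column_stack([z + 0.3 * rng.standard_normal(n) for _ in range(p)])
y = X[:, 0] + rng.standard_normal(n)

Xs = (X - X.mean(axis=0)) / X.std(axis=0)   # standardize, as the text advises
yc = y - y.mean()

# Univariate screening: pick the largest absolute correlation with y.
univariate_pick = np.abs(Xs.T @ yc).argmax()

# Standardized Lasso: pick the first variable to enter the path,
# i.e., the one that becomes nonzero at the largest penalty.
alphas, coefs, _ = lasso_path(Xs, yc)       # coefs has shape (p, n_alphas)
entry = [np.flatnonzero(np.abs(c) > 0) for c in coefs]
lasso_pick = min(range(p), key=lambda j: entry[j][0] if entry[j].size else np.inf)

print(univariate_pick, lasso_pick)          # coincide at high penalty

That the two picks coincide at a high penalty is exactly why the standardized Lasso with a small constraint behaves like the univariate screen in figure 3.3.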
To summarize the analysis of univariate GWAS tests, it has been shown that the
common practice of using p-values for selection is generally similar to using maximum
likelihood, although the latter may be slightly superior in some cases. Also, under the
assumption of a very sparse problem with almost uncorrelated variables, the univariate
approach works quite well and is comparable to multivariate approaches such as Lasso
for the purpose of identifying the associated SNPs.
The third question, how the selection should be affected by follow-up study
design, has not been discussed. As a simple example, if planned follow-up work is
a search for the biological mechanisms underlying statistical associations, then it may
make sense to bias modeling toward identification of associations in biologically plausi-
ble genomic regions (such as inside genes). This can be accomplished by using Bayesian
priors or other intuitive weighting schemes (Cantor, Lange, and Sinsheimer 2010). Fur-
ther discussion of this aspect is outside the scope of this chapter.
be different individuals, different tissues, or even the same tissue under different envi-
ronmental conditions. The most prevalent goal in analyzing gene expression data is to
identify which genes are associated with the response of interest, which can be disease
status, as in GWAS (in which case, the same case control design as in GWAS can be
used), a measure of the environmental conditions being applied (such as concentration
of sugar or temperature), and so on. The number of samples (n) is usually in the tens
or low hundreds, and the number of genes (p) is usually in the thousands or tens of
thousands; hence one is in the $p \gg n$ situation of wide data.
As in GWAS, it is usually assumed that the true association relation between
gene expression and the response is sparse or very sparse, in the sense that the true
dependence (e.g., conditional expectation) of the response on the gene expression can
be almost fully modeled using a few true genes. However, the correlation structure
among expressions of genes is much more complex than the correlation among SNPs,
since genes are organized in pathways and networks (Davidson and Levin 2005), which
interact and co-regulate in complex ways. It is usually not assumed that these interac-
tions and the resulting correlation structure are known; hence, one can consider this an
example of a sparse modeling scenario with arbitrary complex correlations between
the explanatory variables. In particular, one cannot assume that the few true genes
are uncorrelated as in the GWAS case. Hence, univariate approaches are unlikely to
properly address this situation, and although they had originally been used for gene
expression analysis, in particular, for identification of differentially expressed genes
(Leung and Cavalieri 2003), they have been surpassed in this task, too, by multivariate
approaches, which have been demonstrated to be much more effective (Meinshausen
2007; Wang et al. 2011). It should be noted that combinatorial variable selection wrap-
per approaches are unlikely to be relevant, since enumerating all sparse models with
several dozens of nonzero coefficients out of thousands is clearly intractable.
Another important difference between GWAS and gene expression analysis is that
in the latter case we are often interested in building an actual prediction model to
describe the relation between gene expression and the response rather than just identi-
fying the associated genes for further study (Leung and Cavalieri 2003). This also affects
the choice of models.
Since we are seeking a sparse prediction model in high dimension with limited
samples, Lasso-type methods are a natural approach to consider. The standard Lasso has
some major shortcomings in this situation:
• With $p \gg n$, Lasso-regularized models are limited to choosing at most n genes
in the model (Efron et al. 2004). This can become a problem in gene expres-
sion modeling with very few samples. Furthermore, Lasso typically selects
one representative from each group of highly correlated explanatory variables
(in gene expression, this could represent genes in a specific pathway). This is
not necessarily desirable, as there could be multiple independent associations
in the same pathway, or separating the true association from other genes that are
Practical Sparse Modeling 31
highly correlated with it can be very difficult. Hence a selection of a single gene
can be arbitrary or nonrepresentative.
• If one is interested in prediction, then the shrinkage Lasso performs on
its selected variables is likely to lead to a suboptimal predictive model
(Meinshausen 2007).
Several extensions of the Lasso address these shortcomings:
• Elastic Net (Zou and Hastie 2005), which adds a second, quadratic penalty to
the Lasso formulation in (3.1), thus allowing solutions with more than n distinct
features and similar coefficients for highly correlated features (see the sketch after this list).
• Adaptive Lasso (Zou 2006), which adds weighting to the Lasso penalty of each
feature, using the least squares coefficients as weights. This leads to favorable
theoretical properties and has also shown improved empirical performance.
• Relaxed Lasso (Meinshausen 2007), which uses Lasso for variable selection but
then fits a less regularized model in these variables only, thus partly avoiding
the excessive shrinkage behavior.
• VISA (Radchenko and James 2008), which implements a more involved version
of the same idea, of performing less shrinkage on the “good” variables Lasso
identifies than warranted by the Lasso solution.
• Random Lasso (Wang et al. 2011).
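As a concrete illustration of the first variant above, here is a hedged scikit-learn sketch of the Elastic Net's behavior on correlated features (the data are illustrative, not from the chapter):

import numpy as np
from sklearn.linear_model import ElasticNet, Lasso

rng = np.random.default_rng(3)
n = 100
z = rng.standard_normal(n)
# Three nearly identical predictors plus seven independent noise features.
X = np.column_stack([z + 0.05 * rng.standard_normal(n) for _ in range(3)]
                    + [rng.standard_normal(n) for _ in range(7)])
y = z + 0.5 * rng.standard_normal(n)

lasso = Lasso(alpha=0.1).fit(X, y)
enet = ElasticNet(alpha=0.1, l1_ratio=0.5).fit(X, y)
print(np.round(lasso.coef_[:3], 2))   # tends to load on one of the near-duplicates
print(np.round(enet.coef_[:3], 2))    # tends to spread weight across all three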
Random Lasso is described here in more detail, and the relative performance of these
algorithms is demonstrated on simulated and real gene expression data, following Wang
et al. (2011).
4 RANDOM LASSO
When many highly correlated features are present, one wants to consider the portion of
them that is useful for predictive modeling purposes. Lasso-type regularization tends to
pick one of them semiarbitrarily, which can be considered a model instability issue.
The statistics literature offers some recipes for dealing with instability, most pop-
ular among them Breiman’s proposals of Bagging and Random Forest (Breiman 2001).
The basic idea is to generate a variety of slightly modified versions of the data or mod-
ified versions of the model-fitting algorithm, generating a variety of different prediction
models that approximately fit the data. Then averaging these models has a stabilizing
effect, as one hopes that models not chosen for the original data would occasionally
get chosen when the data are changed. Empirically, this usually leads to much more
accurate prediction models (Breiman 2001).
1. Iterate B1 times:
a. Bootstrap sample the data and subsample the features (two-dimensional sampling).
b. Fit a Lasso model to the sample.
2. Collect the coefficient estimates of each variable across the B1 first-stage models.
3. Generate an importance measure for each variable, typically proportional to its average coefficient.
4. Iterate B2 times:
a. Bootstrap sample the data and subsample the features according to their importance measure.
b. Fit a Lasso model to the sample.
5. The final model is the average of the B2 models from the second stage.
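A compact sketch of this two-stage procedure (a simplified reading of the algorithm above; the choices of B1, B2, the feature subsample size q, and the fixed penalty are illustrative):

import numpy as np
from sklearn.linear_model import Lasso

def random_lasso(X, y, B1=100, B2=100, q=10, alpha=0.1, seed=0):
    """Simplified Random Lasso: stage 1 estimates variable importances;
    stage 2 samples features proportionally to importance and averages
    the resulting Lasso models."""
    rng = np.random.default_rng(seed)
    n, p = X.shape

    def stage(B, probs):
        coef_sum = np.zeros(p)
        for _ in range(B):
            rows = rng.integers(0, n, size=n)                  # bootstrap rows
            cols = rng.choice(p, size=q, replace=False, p=probs)
            model = Lasso(alpha=alpha).fit(X[np.ix_(rows, cols)], y[rows])
            coef_sum[cols] += model.coef_
        return coef_sum / B

    importance = np.abs(stage(B1, None)) + 1e-12               # stage 1
    return stage(B2, importance / importance.sum())            # stage 2

Applied to the simulation design described next, the averaged coefficients of the ten correlated, truly associated variables should all receive appreciable weight rather than one arbitrary representative.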
Detailed discussion of the motivation behind the exact formulation of the algo-
rithm is beyond the scope of this chapter, but a comparison of the various Lasso exten-
sions is shown here on simulation and real gene expression data.
In the simulation scenario there are $p = 40$ variables. The first ten coefficients are
nonzero. The correlation between each pair of the first ten variables is set to 0.9. The
remaining 30 variables are independent of each other and also independent of the
first ten variables. Let
and
$$y = \beta^{T} x + \epsilon, \qquad \epsilon \sim N(0, 9).$$
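Generating data from this design is straightforward; in the sketch below the nonzero coefficient values are placeholders, since the actual values of β used in the chapter are elided in the text above:

import numpy as np

rng = np.random.default_rng(4)
n, p = 100, 40
# First ten variables pairwise correlated at 0.9; remaining 30 independent.
cov = np.eye(p)
cov[:10, :10] = 0.9
np.fill_diagonal(cov, 1.0)
X = rng.multivariate_normal(np.zeros(p), cov, size=n)

beta = np.zeros(p)
beta[:10] = 1.0                                # placeholder coefficient values
y = X @ beta + rng.normal(0.0, 3.0, size=n)    # noise variance 9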
Table 3.1: Variable selection frequencies (%) of different methods for the simulation example, n = 100

IV    69   82   76   62   62   99
UV    52   21   35   36   37   30
RME  505  313  471  487  487  132

IV, important variables; UV, unimportant variables; RME, relative model error (lower is better).
Table 3.2: Analysis of the glioblastoma data set
5 SUMMARY
Practical applications of sparse modeling can possess quite different properties, and the selection of appropriate sparse methods should therefore depend strongly on the problem at hand. In particular, as shown in this chapter, the specific type of sparsity and the correlation structure across the covariates, or predictive variables, are two important considerations, as are the desired performance metrics for the model: successful variable selection, favorable predictive performance, or both.
Two common problems from computational biology were considered as examples:
GWAS and gene expression analysis. In the case of the GWAS problem, where the main
goal is to identify associated SNPs for follow-up studies, the commonly used univariate
filter approach often appears to be sufficient, under the common assumptions of extreme
sparsity and uncorrelated covariates. However, in the case of gene expression analysis,
where the correlation structure among the variables is more complex, and both vari-
able selection and good predictive performance are equally important, a more complex
methodology is required. Accordingly, variants of Lasso were surveyed that aim to take
the specifics of the problem into account and accomplish both goals.
Chapter 4 focuses on GWAS problems, extending the traditional sparse
approaches discussed here to cases of more complex relation structure among the
covariates and also among the multiple output variables in GWAS.
NOTE
1. In recent years, random- and mixed-effect models have been used to demonstrate that there are likely
many more associations between genotype and phenotype that we are currently unable to discover (Yang
et al. 2010; Lee et al. 2011). Since current studies lack power to identify the specific SNPs underlying
these associations, this intriguing direction is outside of the scope of our discussion, which focuses on
the traditional fixed effects regression framework.
REFERENCES
Breiman, L. Random forests. Machine Learning 45:5–32, 2001.
Candès, E., and T. Tao. The Dantzig selector: Statistical estimation when p is much larger
than n. Annals of Statistics 35(6):2313–2351, 2007.
Davidson, E., and M. Levine. Gene regulatory networks. Proceedings of the National Academy of Sciences 102(14):4935, 2005.
Efron, B., T. Hastie, I. Johnstone, and R. Tibshirani. Least angle regression. Annals of
Statistics 32(2):407–499, 2004.
Guyon, I., and A. Elisseeff. An introduction to variable and feature selection. Journal of
Machine Learning Research 3:1157–1182, 2003.
Leung, Y. F., and D. Cavalieri. Fundamentals of cDNA microarray data analysis. Trends
in Genetics 19(11):649–659, 2003.
McCullagh, P., and J. Nelder. Generalized Linear Models. Chapman and Hall, 1989.
Meinshausen, N., and B. Yu. Lasso-type recovery of sparse representations for high-
dimensional data. Annals of Statistics 37(1):246–270, 2009.
Radchenko, P., and G. M. James. Variable inclusion and shrinkage algorithms. Journal of
the American Statistical Association, 103(483):1304–1315, 2008.
Tibshirani, R. Regression shrinkage and selection via the Lasso. Journal of the Royal
Statistical Society 58(1):267–288, 1996.
Wang, S., B. Nan, S. Rosset, and J. Zhu. Random lasso. Annals of Applied Statistics
5(1):468–485, 2011.
Zou, H. The adaptive Lasso and its oracle properties. Journal of the American Statistical
Association 101(476):1418–1429, 2006.
Zou, H., and T. Hastie. Regularization and variable selection via the Elastic Net. Journal
of the Royal Statistical Society Series B, 67(2):301–320, 2005.
CHAPTER 4

High-Dimensional Sparse Structured Input-Output Models, with Applications to GWAS

Eric P. Xing, Mladen Kolar, Seyoung Kim, and Xi Chen
Efron et al. 2004; Beck and Teboulle 2009, and references therein) as well as theory on
generalization properties and variable selection consistency (see, e.g., Wainwright 2009;
Zhao and Yu 2006; Bickel, Ritov, and Tsybakov 2009; Zhang 2009).
Although a widely studied and popular procedure, Lasso has been shown to be limited in its power for selecting SNPs that truly influence complex traits. The main reason is that regularization with the $\ell_1$-norm is equivalent to assuming that the regression coefficients are independent variables (following Laplace priors) and hence cannot model more complex relations among the predictors, such as, for example, group selection. Similarly, Lasso does not model potentially nontrivial relations among multiple outputs.
outputs. In practice, however, relations and structures among input or output variables
exist, which should be leveraged to improve the estimation procedure. For example,
module structures in gene co-expression patterns are often captured by gene networks
or hierarchical clustering trees. Thus, in an investigation for genetic effects on gene
expression traits, the module structures could be leveraged to improve the statistical
power by considering multiple related gene expression traits jointly to identify SNPs
influencing gene modules. Regarding input structures, it is well known in genetics that
in genomes there exist local correlation structures known as linkage disequilibrium,
nonlinear interaction among SNPs in their influence on traits, and population structure
often captured by different genotype frequencies in different populations.
These problems can be approached using structurally penalized linear regression,
where the penalty reflects some prior knowledge or structure of the problem, such as
relations among input or output variables. Early work considered variables to be partitioned into nonoverlapping groups, reflecting prior knowledge that blocks of variables should be selected or ignored jointly. The resulting estimator, in the context of multivariate regression, is called the group Lasso (M. Yuan and Lin 2006). The grouped
penalty was shown to improve both predictive performance and interpretability of the
models (Lounici et al. 2010; Huang and Zhang 2010). More complex prior knowledge
can be encoded by allowing groups to overlap (see, e.g., Zhao, Rocha, and Yu 2009;
Jacob, Obozinski, and Vert 2009; Jenatton, Audibert, and Bach 2009/2011; Bach et al.
2011). Another structural penalty arising in applications to GWAS is the total variation
penalty, which in the context of multivariate linear regression results in the fused Lasso
(Tibshirani et al. 2005). It is assumed that there is a natural ordering of the input variables, and the total variation penalty is used to encode the prior information that nearby regression coefficients have similar values.
These structural penalties also arise in the context of multitask learning. In GWAS
it is common to observe multiple traits that are all related to the same set of input vari-
ables. In this context it is useful to use multioutput multivariate regression models to
further reduce the number of falsely selected input variables. The simplest multitask
model assumes that the output variables are only related by sharing the same feature
set. In this context one can use the nonoverlapping group penalty to select the relevant variables for all tasks (see, e.g., Turlach, Venables, and Wright 2005; Liu, Palatucci, and Zhang 2009; Obozinski, Taskar, and Jordan 2010; Lounici et al. 2009; Kolar, Lafferty, and Wasserman 2011; and references therein). With additional prior knowledge one
can use overlapping group penalties (Kim and Xing 2010) or fusion penalties (Kim and
Xing 2009).
Therefore, given structures on either or both the input and output sides of a regres-
sion problem, what we need to consider in GWAS is a sparse structured input-output
regression model of high dimensionality. General interior point convex program solvers
can be used to find parameters of the structurally penalized regression models. However,
interior point methods are not suitable for solving relevant real-world problems arising
in GWAS. Although they provide high accuracy solutions, they are not scalable to high-
dimensional problems because they do not exploit the special structure of the penalties
commonly used in practice. For large-scale problems, it is found that first-order meth-
ods, especially proximal gradient algorithms can effectively exploit the special structure
of the typical convex programs and can be efficiently applied to the problems arising in
GWAS.
In the remainder of this chapter, we review various designs of penalties used to
incorporate prior knowledge in the inputs and outputs of the aforementioned structured
input-output regression models used in GWAS, followed by a survey of convex opti-
mization algorithms applicable to estimating such models in general. Then we provide
details on the proximal methods that are particularly effective in solving the convex
problems in high-dimensional settings in GWAS, followed by an empirical comparison
of different optimization approaches on simulation data. We conclude with a number
of illustrative examples of applying the structured input-output models to GWAS under
various contexts.
$$y_k = X\beta_k + \epsilon_k, \quad \forall k = 1, \ldots, K, \qquad (4.1)$$

where $\beta_k$ is a vector of $J$ regression coefficients $(b_{1k}, \ldots, b_{Jk})^T$ for the $k$th output, and $\epsilon_k$ is a vector of $N$ independent error terms having mean 0 and a constant variance. We center the $y_k$'s and $x_j$'s such that $\sum_i y_{ki} = 0$ and $\sum_i x_{ji} = 0$, and consider the model without an intercept. Let $B = (\beta_1, \ldots, \beta_K)$ denote the $J \times K$ matrix of regression coefficients for all $K$ outputs.
As discussed, when $J$ is large and the number of inputs relevant to the output is small, ordinary multivariate regression does not perform well and penalized linear regression should be used. Throughout the chapter we consider problems of the form

$$\min_B\; \ell(B) + V(B), \qquad (4.2)$$

where

$$\ell(B) = \frac{1}{2}\,\|Y - XB\|_F^2 = \frac{1}{2}\sum_k\, (y_k - X\beta_k)^T (y_k - X\beta_k) \qquad (4.3)$$

is the quadratic loss function and $V : \mathbb{R}^{J\times K} \to \mathbb{R}$ is a penalty that encodes prior knowledge about the problem into the optimization procedure.
Lasso offers an effective feature selection method for the model in eq. (4.1). The Lasso estimator $\hat{B}^{\text{Lasso}}$ can be obtained by solving the optimization problem in eq. (4.2) with the following penalty:

$$V_{\text{Lasso}}(B) = \lambda \sum_j \sum_k |b_{jk}|. \qquad (4.4)$$

The estimator $\hat{B}^{\text{Lasso}}$ will be sparse in the sense that a number of its elements will exactly equal zero. The sparsity of $\hat{B}^{\text{Lasso}}$ is controlled by a tuning parameter $\lambda$: setting $\lambda$ to larger values leads to a smaller number of nonzero regression coefficients. The resulting estimator is good in situations where one has only the information that the true parameter $B$ has few nonzero elements. However, the penalty $V_{\text{Lasso}}$ offers no mechanism to explicitly couple the estimates of the regression coefficients for correlated output variables, nor to incorporate information about correlation between input variables.
For a single output $y$, the model in eq. (4.1) reduces to

$$y = X\beta + \epsilon, \qquad (4.5)$$

and $V_{\text{Lasso}}(\beta) = \lambda\,\|\beta\|_1 = \lambda \sum_{j=1}^J |b_j|$.
$$V_{\text{struct}}(\beta) \equiv \gamma \sum_{g\in G} w_g\, \|\beta_g\|_2, \qquad (4.7)$$

where $\beta_g \in \mathbb{R}^{|g|}$ is the subvector of $\beta$ for the inputs in group $g$; $w_g$ is the predefined weight for group $g$; and $\|\cdot\|_2$ is the vector $\ell_2$-norm. The $\ell_1/\ell_2$ mixed-norm penalty $V(\beta)$ plays the role of setting all the coefficients within each group to zero or nonzero values. The widely used hierarchical tree-structured penalty (Zhao, Rocha, and Yu 2009) is a special case of eq. (4.7). It is worth noting that the $\ell_1/\ell_\infty$ mixed-norm penalty can achieve a similar grouping effect. Although our approach can be used for the $\ell_1/\ell_\infty$ penalty as well, we focus on the $\ell_1/\ell_2$ penalty.

We also note that the penalty $V_{\text{struct}}(\beta) \equiv \gamma \sum_{g\in G} w_g \|\beta_g\|_2$ enforces group-level sparsity but not sparsity within each group. More precisely, if the estimated $\|\hat\beta_g\|_2 \neq 0$, each $\hat{b}_j$ for $j \in g$ will be nonzero. With the $\ell_1$ regularization $V_{\text{Lasso}}(\beta)$ on top of $V_{\text{struct}}(\beta)$ as in eq. (4.6), we not only select groups but also variables within each group. Simon et al. (2012) give more details.
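As a concrete reading of the combined penalty, a direct evaluation of the $\ell_1$ plus $\ell_1/\ell_2$ (sparse group Lasso) penalty might look as follows. This is a sketch; defaulting the group weights to $\sqrt{|g|}$ is a common convention, not something fixed by the text.

```python
import numpy as np

def sparse_group_penalty(beta, groups, lam, gamma, weights=None):
    """Evaluate lam * ||beta||_1 + gamma * sum_g w_g * ||beta_g||_2
    for nonoverlapping groups (lists of coefficient indices)."""
    if weights is None:
        weights = [np.sqrt(len(g)) for g in groups]   # common default w_g = sqrt(|g|)
    l1_part = lam * np.abs(beta).sum()
    group_part = gamma * sum(w * np.linalg.norm(beta[g])
                             for g, w in zip(groups, weights))
    return l1_part + group_part
```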
Given a graph $G$ with edge set $E$, in which each edge $e = (m, l)$ carries the correlation $r_{ml}$ between inputs $m$ and $l$, the graph-guided fusion penalty takes the form

$$V_{\text{struct}}(\beta) = \gamma \sum_{e=(m,l)\in E,\ m<l} \tau(r_{ml})\, |b_m - \text{sign}(r_{ml})\, b_l|, \qquad (4.8)$$

where $\tau(r)$ weights the fusion penalty for each edge $e = (m, l)$, such that $b_m$ and $b_l$ for highly correlated inputs with larger $|r_{ml}|$ receive a greater fusion effect. We consider $\tau(r) = |r|$, but any monotonically increasing function of the absolute values of correlations can be used. The $\text{sign}(r_{ml})$ indicates that for two positively correlated nodes, the corresponding coefficients tend to influence the output in the same direction, whereas for two negatively correlated nodes, the effects ($b_m$ and $b_l$) take opposite directions. Since this fusion effect is calibrated by the edge weight, the graph-guided fusion penalty in eq. (4.8) encourages highly correlated inputs, corresponding to a densely connected subnetwork in $G$, to be jointly selected as relevant. Notice that if $r_{ml} = 1$ for all $e = (m, l)$, the penalty function in eq. (4.8) reduces to

$$V_{\text{struct}}(\beta) = \gamma \sum_{e=(m,l)\in E,\ m<l} |b_m - b_l|. \qquad (4.9)$$

The standard fused Lasso penalty (Tibshirani et al. 2005), defined as $\gamma \sum_{j=1}^{J-1} |b_{j+1} - b_j|$, is a special case of eq. (4.9) in which the graph structure is confined to be a chain; the widely used fused signal approximator refers to the simple case where the design matrix $X$ is orthogonal.
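To illustrate, the graph-guided fusion penalty just described can be evaluated directly from an edge list; this is a sketch with $\tau(r) = |r|$, as in the text.

```python
import numpy as np

def graph_fusion_penalty(beta, edges, gamma):
    """Evaluate gamma * sum over edges of tau(r_ml) * |b_m - sign(r_ml) * b_l|.
    edges: iterable of (m, l, r_ml) tuples, one per edge of the graph."""
    total = 0.0
    for m, l, r in edges:
        # Positively correlated nodes are pulled toward equal coefficients;
        # negatively correlated nodes toward opposite-signed coefficients.
        total += abs(r) * abs(beta[m] - np.sign(r) * beta[l])
    return gamma * total
```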
3 OPTIMIZATION ALGORITHMS
In this section, we discuss numerical procedures for solving the optimization problem
in eq. (4.6) with penalties introduced in the previous sections. The problem in eq. (4.6)
is convex, and there are a number of methods that can be used to find a minimizer. Gen-
eral techniques like subgradient methods and interior point methods (IPMs) for second-
order cone programs (SOCPs) can be used. However, these methods are not suitable for
high-dimensional problems arising in practical applications because of their slow con-
vergence rate or poor scalability. On the other hand, block gradient methods and proxi-
mal gradient methods, although not as general, do exploit the structure of the penalties
and can scale well to large problems. In the following section, we first discuss some
general methods for solving convex programs and then focus on proximal methods.
Each optimization algorithm is measured by its convergence rate, that is, the number of iterations $t$ needed to achieve an $\epsilon$-accurate solution: $f(\beta^t) - f(\beta^*) \le \epsilon$, where $\beta^*$ is one of the minimizers of $f(\beta)$.
The method involves updating the estimate $\beta^{t+1}$ with the following iterations:

$$\beta^{t+1} = \beta^t - \frac{c_1}{t^{c_2}}\, \partial f(\beta^t), \qquad (4.10)$$

where $c_1$ is a constant parameter, and $c_2 = 1$ for strongly convex loss $\ell(\beta)$ while $c_2 = 1/2$ for nonstrongly convex loss $\ell(\beta)$. The updates are equivalent to the usual gradient descent with the gradient replaced by a subgradient. The algorithm converges under suitable conditions, but the convergence is slow. In particular, the convergence rate of subgradient descent is $O(\frac{1}{\epsilon})$ for strongly convex loss and $O(\frac{1}{\epsilon^2})$ for nonstrongly convex loss $\ell(\beta)$. In high-dimensional settings with $J \gg N$, $X^T X$ is rank-deficient and hence $\ell(\beta)$ is not strongly convex. Therefore, vanilla subgradient descent has a slow convergence rate of $O(\frac{1}{\epsilon^2})$ in our problems.
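As an illustration of the update in eq. (4.10), here is a sketch of vanilla subgradient descent for the single-output Lasso objective, using the nonstrongly convex step exponent $c_2 = 1/2$; the constant c1 and the iteration budget are arbitrary choices, not values from the text.

```python
import numpy as np

def subgradient_lasso(X, y, lam, c1=1e-3, T=5000):
    """Subgradient descent for f(b) = 0.5 * ||y - X b||^2 + lam * ||b||_1,
    with step size c1 / t**0.5 (the nonstrongly convex case)."""
    beta = np.zeros(X.shape[1])
    for t in range(1, T + 1):
        subgrad = X.T @ (X @ beta - y) + lam * np.sign(beta)  # a subgradient of f
        beta -= c1 / np.sqrt(t) * subgrad
    return beta
```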
This optimality condition can be obtained for each block of coefficients $\beta_g$, and using this condition, we can derive an optimization procedure that iteratively computes an optimal $\beta_g$ while fixing the other coefficients. The general optimization procedure is as follows: for each group $g$, we check the group sparsity condition that $\beta_g = 0$. If it holds, no update is needed for $\beta_g$. Otherwise, we solve eq. (4.6) over $\beta_g$ with all other coefficients fixed. This step can be solved efficiently by a standard optimization technique such as accelerated generalized gradient descent (Simon et al. 2012; Beck and Teboulle 2009). This procedure continues until a convergence condition is met. Block coordinate descent is efficient for solving eq. (4.6) only with the nonoverlapping group Lasso penalty; it cannot be used for the overlapping group Lasso penalty owing to the lack of a convergence guarantee (Tseng and Yun 2009).
The optimization problem in eq. (4.6) with the overlapping group Lasso penalty can be reformulated as a second-order cone program with objective

$$\min\; \frac{1}{2}\, s + \gamma \sum_{g\in G} w_g\, t_g + \lambda \sum_{j=1}^{J} q_j.$$

We can also formulate the optimization problem with the graph-guided fusion penalty as a QP by letting $b_j = q_j^+ - q_j^-$ with $q_j^+, q_j^- \ge 0$, and $b_m - \text{sign}(r_{ml})\, b_l = s_{ml}^+ - s_{ml}^-$ with $s_{ml}^+, s_{ml}^- \ge 0$.
The benefit of these approaches is that standard IPMs, along with many readily available toolboxes (e.g., SDPT3 (Tütüncü, Toh, and Todd 2003)), can be used directly to solve the convex problems. Even though IPMs achieve a fast convergence rate of $O(\log\frac{1}{\epsilon})$ and can lead to solutions with very high precision, solving the Newton linear system at each iteration of an IPM is computationally too expensive. Therefore, IPMs can be used only to solve small or medium-scale problems.
Proximal gradient methods handle composite objectives of the form

$$\min_\beta\; \ell(\beta) + V(\beta), \qquad (4.11)$$

where the function $\ell(\beta)$ is a differentiable convex function and $V(\beta)$ is a nonsmooth penalty. Proximal gradient methods, which are descendants of the classical projected gradient algorithms, have become popular because they utilize only gradient information and hence can scale up to very large problems. A typical iteration of the algorithm is

$$\beta^{t+1} = \arg\min_\beta\; \ell(\beta^t) + \langle \nabla\ell(\beta^t),\, \beta - \beta^t \rangle + \frac{L}{2}\,\|\beta - \beta^t\|_2^2 + V(\beta), \qquad (4.12)$$

where $L > 0$ is a parameter that should upper-bound the Lipschitz constant of $\nabla\ell(\beta)$. This step is often called the proximal operator, proximal mapping, or simply the projection step.
The efficiency of this iterative algorithm relies on the ability to solve the proximal operator exactly and efficiently. When there is an exact solution of the proximal operator, it can be shown that the proximal gradient method with an acceleration scheme (Nesterov 2007; Beck and Teboulle 2009) achieves a convergence rate of $O(\frac{1}{\sqrt{\epsilon}})$, and this rate is optimal under the first-order black-box model (Nesterov 2003). The proximal operator in eq. (4.12) can be rewritten as

$$\min_\beta\; \frac{1}{2}\,\Big\|\beta - \Big(\beta^t - \frac{1}{L}\nabla\ell(\beta^t)\Big)\Big\|_2^2 + \frac{1}{L}\,V(\beta);$$

letting $v = \beta^t - \frac{1}{L}\nabla\ell(\beta^t)$, this is

$$\hat\beta = \arg\min_\beta\; \frac{1}{2}\,\|\beta - v\|_2^2 + \frac{1}{L}\,V(\beta).$$
This can be done in closed form for the Lasso-type penalty; that is, when $V(\beta) = V_{\text{Lasso}}(\beta) = \lambda\|\beta\|_1$, the solution $\hat\beta$ can be obtained by the soft-thresholding operator (Friedman, Hastie, and Tibshirani 2010):

$$\hat{b}_j = \text{sign}(v_j)\, \max\!\left(0,\; |v_j| - \frac{\lambda}{L}\right). \qquad (4.13)$$
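Putting eq. (4.12) and eq. (4.13) together gives the basic (unaccelerated) proximal gradient method for the Lasso; a minimal sketch, with the Lipschitz bound L taken as the squared spectral norm of X:

```python
import numpy as np

def soft_threshold(v, tau):
    # Elementwise soft-thresholding of eq. (4.13): sign(v_j) * max(0, |v_j| - tau).
    return np.sign(v) * np.maximum(0.0, np.abs(v) - tau)

def proximal_gradient_lasso(X, y, lam, T=500):
    """Proximal gradient iterations for the Lasso objective."""
    L = np.linalg.norm(X, 2) ** 2            # upper bound on the Lipschitz constant
    beta = np.zeros(X.shape[1])
    for _ in range(T):
        v = beta - X.T @ (X @ beta - y) / L  # gradient step on the smooth loss
        beta = soft_threshold(v, lam / L)    # proximal step, eq. (4.13)
    return beta
```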
A closed-form solution can also be obtained for the $\ell_1/\ell_2$ mixed-norm penalties with nonoverlapping groups. In particular, when $V(\beta) = V_{\text{struct}}(\beta) = \gamma \sum_{g\in G} \|\beta_g\|_2$ with nonoverlapping groups, the closed-form solution of the proximal operator takes the form (Duchi and Singer 2009)

$$\hat\beta_g = \max\!\left(0,\; 1 - \frac{\gamma}{L\,\|v_g\|_2}\right) v_g.$$
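The corresponding proximal step for the nonoverlapping group penalty is equally simple; a sketch of the blockwise shrinkage formula above:

```python
import numpy as np

def group_soft_threshold(v, groups, gamma, L):
    """Proximal operator for V(beta) = gamma * sum_g ||beta_g||_2 with
    nonoverlapping groups: beta_g = max(0, 1 - gamma / (L * ||v_g||)) * v_g."""
    beta = np.zeros_like(v, dtype=float)
    for g in groups:
        norm_g = np.linalg.norm(v[g])
        if norm_g > gamma / L:               # otherwise the entire group is zeroed
            beta[g] = (1.0 - gamma / (L * norm_g)) * v[g]
    return beta
```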
A smooth approximation to $V_{\text{struct}}(\beta)$ can be introduced using the technique from Nesterov (2005) such that its gradient with respect to $\beta$ can be easily calculated. To this end, the overlapping group Lasso penalty is first rewritten via the dual norm of the $\ell_2$-norm:

$$V_{\text{struct}}(\beta) = \gamma \sum_{g\in G} w_g \max_{\|\alpha_g\|_2 \le 1} \alpha_g^T \beta_g = \max_{\alpha\in Q}\, \sum_{g\in G} \gamma w_g\, \alpha_g^T \beta_g = \max_{\alpha\in Q}\, \alpha^T C\beta, \qquad (4.14)$$

where $Q = \{\alpha \mid \|\alpha_g\|_2 \le 1,\ \forall g \in G\}$, and $C \in \mathbb{R}^{\sum_{g\in G}|g| \times J}$ is a matrix defined as follows. The rows of $C$ are indexed by all pairs $(i,g) \in \{(i,g) \mid i \in g,\ i \in \{1, \ldots, J\}\}$, the columns are indexed by $j \in \{1, \ldots, J\}$, and each element of $C$ is given as

$$C_{(i,g),j} = \begin{cases} \gamma w_g & \text{if } i = j, \\ 0 & \text{otherwise}. \end{cases} \qquad (4.15)$$

Note that $C$ is a highly sparse matrix with only a single nonzero element in each row, and $\sum_{g\in G} |g|$ nonzero elements in the entire matrix, and hence it can be stored with only a small amount of memory during the optimization procedure.
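The construction of $C$ in eq. (4.15) translates directly into code; a sketch using a SciPy sparse matrix, with rows laid out group by group:

```python
from scipy.sparse import csr_matrix

def build_group_C(groups, weights, gamma, J):
    """Assemble C of eq. (4.15): one row per pair (i, g), with a single
    nonzero entry gamma * w_g in column i."""
    rows, cols, vals = [], [], []
    r = 0
    for g, w in zip(groups, weights):
        for i in g:
            rows.append(r)
            cols.append(i)
            vals.append(gamma * w)
            r += 1
    return csr_matrix((vals, (rows, cols)), shape=(r, J))
```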
Similarly, the graph-guided fusion penalty can be written as

$$\gamma \sum_{e=(m,l)\in E,\ m<l} \tau(r_{ml})\, |b_m - \text{sign}(r_{ml})\, b_l| \;\equiv\; \|C\beta\|_1,$$

where each edge of the graph indexes one row of $C$:

$$C_{e=(m,l),j} = \begin{cases} \gamma \cdot \tau(r_{ml}) & \text{if } j = m, \\ -\gamma \cdot \text{sign}(r_{ml})\, \tau(r_{ml}) & \text{if } j = l, \\ 0 & \text{otherwise}. \end{cases} \qquad (4.16)$$
Again, note that $C$ is a highly sparse matrix with $2 \cdot |E|$ nonzero elements. Since the dual norm of the $\ell_\infty$-norm is the $\ell_1$-norm, the graph-guided fusion penalty can be further rewritten as

$$\|C\beta\|_1 \equiv \max_{\|\alpha\|_\infty \le 1} \alpha^T C\beta, \qquad (4.17)$$
Smooth Approximation
With the reformulation using the dual norm, all the different structured sparsity-inducing penalties can be written as a maximization problem of the form

$$V_{\text{struct}}(\beta) = \max_{\alpha \in Q}\, \alpha^T C\beta. \qquad (4.18)$$

However, this is still a nonsmooth function of $\beta$, which makes the optimization challenging. To tackle this problem, a smooth approximation of $V_{\text{struct}}(\beta)$ can be constructed using Nesterov's smoothing technique (Nesterov 2005):

$$V^\mu_{\text{struct}}(\beta) = \max_{\alpha \in Q}\, \big( \alpha^T C\beta - \mu\, d(\alpha) \big), \qquad (4.19)$$

where $d(\alpha)$ is defined as $\frac{1}{2}\|\alpha\|_2^2$, and $\mu$ is the positive smoothness parameter that controls the quality of the approximation:

$$V^\mu_{\text{struct}}(\beta) \le V_{\text{struct}}(\beta) \le V^\mu_{\text{struct}}(\beta) + \mu D,$$

where $D = \max_{\alpha\in Q} d(\alpha)$. Given the desired accuracy $\epsilon$, the convergence result suggests $\mu = \frac{\epsilon}{2D}$ to achieve the best convergence rate.
The function $V^\mu_{\text{struct}}(\beta)$ is smooth in $\beta$, with a simple form of the gradient:

$$\nabla V^\mu_{\text{struct}}(\beta) = C^T \alpha^*, \qquad (4.20)$$

where $\alpha^*$ is the optimal solution to eq. (4.19). The optimal $\alpha^*$ can be obtained in closed form for a number of penalties of interest. In particular, for the overlapping group Lasso penalty, $\alpha^*$ is composed of $\{\alpha^*_g\}_{g\in G}$ for each group $g \in G$, with $\alpha^*_g = S\!\big(\frac{\gamma w_g \beta_g}{\mu}\big)$. Here $S$ is the operator that projects any vector $u$ onto the $\ell_2$ ball:

$$S(u) = \begin{cases} \dfrac{u}{\|u\|_2}, & \|u\|_2 > 1, \\ u, & \|u\|_2 \le 1. \end{cases}$$
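Combining the projection $S$ with eq. (4.20) yields the gradient of the smoothed penalty; a sketch, assuming $C$ was built with rows ordered group by group as in the build_group_C sketch above:

```python
import numpy as np

def project_l2_ball(u):
    # The operator S: rescale u onto the unit l2 ball if it lies outside.
    norm_u = np.linalg.norm(u)
    return u / norm_u if norm_u > 1.0 else u

def smoothed_group_grad(beta, groups, weights, gamma, mu, C):
    """Gradient of the smoothed overlapping group penalty, eq. (4.20):
    nabla V_mu(beta) = C^T alpha*, with alpha*_g = S(gamma * w_g * beta_g / mu)."""
    alpha_star = np.concatenate([
        project_l2_ball(gamma * w * beta[g] / mu)
        for g, w in zip(groups, weights)
    ])
    return C.T @ alpha_star
```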