
Text Processing with Ruby Extract Value from the Data That Surrounds You 1st Edition Rob Miller - The ebook in PDF format is ready for immediate access

The document promotes various ebooks available for instant download on ebookgate.com, focusing on topics such as text processing with Ruby, social media mining, and business valuation tools. It highlights the practical applications of Ruby for text processing and includes endorsements from notable authors in the programming community. Additionally, it provides an overview of the contents of the book 'Text Processing with Ruby' by Rob Miller, emphasizing its relevance for programmers working with text.

Uploaded by

izarulritja

Early praise for Text Processing with Ruby

It is rare that a programming language can be unequivocally stated to be the right
tool for a job. But when it comes to scanning, extracting, and transforming text,
Ruby is that tool, and Rob Miller is the right guide to instruct you in the most
effective and efficient application of it.
➤ Avdi Grimm
Author, Confident Ruby; Head Chef, RubyTapas.com

This is a fun, readable, and very useful book. I’d recommend it to anyone who
needs to deal with text—which is probably everyone.
➤ Paul Battley
Developer, maintainer of text gem

While Ruby has become established as a Web development language, thanks to
Rails, it’s an excellent language for working with text as well. Text Processing with
Ruby covers the nuts and bolts of what I believe is a natural domain for Ruby, all
the way from bringing text into the environment via files, the Web, and other
means through to parsing what it says and sending it back out again.
➤ Peter Cooper
Editor of Ruby Weekly, Cooper Press

I’d recommend this book to anyone who wants to get started with text processing.
Ruby has powerful tools and libraries for the whole ETL workflow, and this book
describes everything you need to get started and succeed in learning.
➤ Hajba Gábor László
Developer

A lot of people get into Ruby via Rails. This book is really well suited to anyone
who knows Rails, but wants to know more Ruby.
➤ Drew Neil
Director, Studio Nelstrom, and author of Practical Vim


Text Processing with Ruby
Extract Value from the Data That Surrounds You

Rob Miller

The Pragmatic Bookshelf


Dallas, Texas • Raleigh, North Carolina


Many of the designations used by manufacturers and sellers to distinguish their products
are claimed as trademarks. Where those designations appear in this book, and The Pragmatic
Programmers, LLC was aware of a trademark claim, the designations have been printed in
initial capital letters or in all capitals. The Pragmatic Starter Kit, The Pragmatic Programmer,
Pragmatic Programming, Pragmatic Bookshelf, PragProg and the linking g device are
trademarks of The Pragmatic Programmers, LLC.
Every precaution was taken in the preparation of this book. However, the publisher assumes
no responsibility for errors or omissions, or for damages that may result from the use of
information (including program listings) contained herein.
Our Pragmatic courses, workshops, and other products can help you and your team create
better software and have more fun. For more information, as well as the latest Pragmatic
titles, please visit us at https://pragprog.com.

The team that produced this book includes:


Jacquelyn Carter (editor)
Potomac Indexing, LLC (index)
Cathleen Small; Liz Welch (copyedit)
Dave Thomas (layout)
Janet Furlow (producer)
Ellie Callahan (support)

For international rights, please contact [email protected].

Copyright © 2015 The Pragmatic Programmers, LLC.


All rights reserved.

No part of this publication may be reproduced, stored in a retrieval system, or transmitted,
in any form, or by any means, electronic, mechanical, photocopying, recording, or otherwise,
without the prior consent of the publisher.

Printed in the United States of America.


ISBN-13: 978-1-68050-070-7
Encoded using the finest acid-free high-entropy binary digits.
Book version: P1.0—September 2015


Contents

Acknowledgments . . . . . . . . . . . ix
Introduction . . . . . . . . . . . . . xi

Part I — Extract: Acquiring Text


1. Reading from Files . . . . . . . . . . . 3
Opening a File 3
Reading from a File 4
Treating Files as Streams 7
Reading Fixed-Width Files 12
Wrapping Up 18

2. Processing Standard Input . . . . . . . . . 19


Redirecting Input from Other Processes 19
Example: Extracting URLs 22
Concurrency and Buffering 25
Wrapping Up 27

3. Shell One-Liners . . . . . . . . . . . 29
Arguments to the Ruby Interpreter 30
Prepending and Appending Code 35
Example: Parsing Log Files 37
Wrapping Up 39

4. Flexible Filters with ARGF . . . . . . . . . 41


Reading from ARGF as a Stream 42
Modifying Files 45
Manipulating ARGV 47
Wrapping Up 49

5. Delimited Data . . . . . . . . . . . . 51
Parsing a TSV 52
Delimited Data and the Command Line 56
The CSV Format 58
Wrapping Up 62

6. Scraping HTML . . . . . . . . . . . . 63
The Right Tool for the Job: Nokogiri 63
Searching the Document 64
Working with Elements 72
Exploring a Page 77
Example: Reading a League Table 80
Wrapping Up 88

7. Encodings . . . . . . . . . . . . . 89
A Brief Introduction to Character Encodings 90
Ruby’s Support for Character Encodings 92
Detecting Encodings 98
Wrapping Up 99

Part II — Transform: Modifying and Manipulating Text

8. Regular Expressions Basics . . . . . . . . 103
A Gentle Introduction 104
Pattern Syntax 105
Regular Expressions in Ruby 108
Wrapping Up 114

9. Extraction and Substitution with Regular Expressions . . 115


Matching Against Patterns 115
Global Match Variables 117
Extracting Multiple Matches 119
Transforming Text 122
Wrapping Up 126

10. Writing Parsers . . . . . . . . . . . . 127


Simple Parsers with StringScanner 128
Example: Parsing a Config File 132
Rule-Based Parsers 135

Example: Parsing RTF Files 143
Wrapping Up 153

11. Natural Language Processing . . . . . . . . 155


What Is Natural Language Processing? 155
Example: Extracting Keywords from Articles 156
Example: Fuzzy Searching 161
Wrapping Up 169

Part III — Load: Writing Text


12. Standard Output and Standard Error . . . . . . 173
Simple Output 173
Formatting Output with printf 178
Redirecting Standard Output 182
Wrapping Up 187

13. Writing to Other Processes and to Files . . . . . . 189


Writing to Other Processes 189
Writing to Files 193
Temporary Files 195
Wrapping Up 198

14. Serialization and Structure: JSON, XML, CSV . . . . 199


JSON 200
XML 205
CSV 207
Wrapping Up 211

15. Templating Output with ERB . . . . . . . . 213


Writing Templates 214
Example: Generating a Purchase Ledger 217
Evaluating Templates 218
Passing Data to Templates 221
Controlling Presentation with Decorators 224
Wrapping Up 226

Part IV — Appendices
A1. A Shell Primer . . . . . . . . . . . . 229
Running Commands 229
Controlling Output 230
Exit Statuses and Flow Control 232

A2. Useful Shell Commands . . . . . . . . . 235


Index . . . . . . . . . . . . . . 245


Acknowledgments
Thanks to my long-suffering partner, Gillian, for enduring a year of lost
weekends, late nights, and generally having a sullen and distracted boyfriend
who woke up in the middle of the night in a cold sweat, having had another
nightmare about character encodings. Who knew writing a book could be so
stressful?

Many thanks to Alessandro Bahgat, Paul Battley, Jacob Chae, Peter Cooper,
Iris Faraway, Kevin Gisi, Derek Graham, James Edward Gray II, Avdi Grimm,
Hajba Gábor László, Jeremy Hinegardner, Kerri Miller, and Drew Neil for their
helpful technical review comments, questions, and suggestions—all of which
shaped this book for the better.

Thanks to Rob Griffiths, Mark Rogerson, Samuel Ryzycki, David Webb, Lewis
Wilkinson, Alex Windett, and Mike Wright for ensuring there was no chance
I got too big for my football boots.

Finally, the amazing folks at Pragmatic. Thanks to Susannah Davidson
Pfalzer for taking a chance on me and my daft idea. Thanks to Jackie Carter
for her incredible patience in guiding a first-time author through the editing
process, and for contributing much to the structure and readability of the
book. And thanks to Andy and Dave for creating a truly brilliant publisher
that I’m proud to be even a tiny part of.


Introduction
Text is everywhere. Newspaper articles, database dumps, spreadsheets, the
output of shell commands, keyboard input; it’s all text, and it can all be
processed in the same fundamental way. Text has been called “the universal
interface,” and since the early days of Unix in the 1960s this universal
interface has survived and flourished, and with good reason.

Unlike binary formats, text has the pleasing quality of being readable by
humans as well as computers, making it easy to debug and requiring no
distinction between output that’s for human consumption and output that’s
to be used as the input for another step in a process.

Processing text, then, is a valuable skill for any programmer today, just as
it was fifty years ago, and just as it’s likely to be fifty years hence. In this book
I hope to provide a practical guide to all the major aspects of working with
text, viewed through the lens of the Ruby programming language, a language
that I think is ideally suited to this task.

About This Book


Processing text is generally concerned with three things. The first concern is
acquiring the text to be processed and getting it into your program. This is
the subject of Part I of this book, which deals with reading from plain text
files, standard input, delimited files, and binary files such as PDFs and Word
documents.

This first part is fundamentally an exploration of Ruby’s core and standard
library, and what’s possible with IO and its derived classes like File. Ruby’s
history and design, and the high-level nature of these tasks, mean that we
don’t need to dip into third-party libraries much, but we’ll use one in
particular, Nokogiri, when looking at scraping data from web pages.

The second concern is with actually processing the text once we’ve got it into
the program. This usually means either extracting data from within the text,
parsing it into a Ruby data structure, or transforming it into another format.

The most important subject in this second stage is, without a doubt, regular
expressions. We’ll look at regular expression syntax, how Ruby uses regular
expressions in particular, and, importantly, when not to use them and instead
reach for solutions such as parsers.

We’ll also look at the subject of natural language processing in this part of
the book, and how we can use tools from computational linguistics to make
our programs smarter and to process data that we otherwise couldn’t.

The final step is outputting the transformed text or the extracted data
somewhere: to a file, to a network service, or just to the screen. Part of this process
is concerned with the actual writing process, and part of it is concerned with
the form of the written data. We’ll look at both of these aspects in the third
part of the book.

Together, these three steps are often described as “extract, transform, and
load” (ETL). It’s a term especially popular with the “big data” folks. Many text
processing tasks, even ones that seem on the surface to be very different from
one another, fall into this pattern of three steps, so I’ve tried to mirror that
structure in the book.

In general, we’re going to explore why Ruby is an excellent tool to reach for
when working with text. I also hope to persuade you that you might reach
for Ruby sooner than you think—not necessarily just for more complex tasks,
but also for quick one-liners.

Most of all, I hope this book offers you some useful techniques that help you
in your day-to-day programming tasks. Where possible, I’ve erred toward the
practical rather than the theoretical: if it does anything, I’d like this book to
point you in the direction of practical solutions to real-world problems. If your
day job is anything like mine, you probably find yourself trawling through
text files, CSVs, and command-line output more often than you might like.
Helping to make that process quick and (dare I say it?) fun would be
fantastic.

Who This Book Is For


Throughout the book, I try not to assume an advanced understanding of
Ruby. If you’re familiar with Ruby’s syntax (perhaps after having dabbled
with Rails a little), then that should be enough to get by. Likewise, if Ruby is
your first programming language and you’re looking to learn about data
processing, you should be able to pick things up as you go along, though
naturally this book is about teaching text processing more than it is about
teaching Ruby.

While the book starts with material likely to be familiar to anyone who’s
written a command-line application in Ruby, there’s still something here for
the more advanced user. Even people who’ve worked with Ruby a lot aren’t
necessarily aware of the material covered in Chapter 3, Shell One-Liners, on
page 29, for example, and I see far too many developers reaching for regular
expressions to parse HTML rather than using the techniques outlined in
Chapter 6, Scraping HTML, on page 63.

Even experienced developers might not have written parsers before (covered
in Chapter 10, Writing Parsers, on page 127), or dabbled in natural language
processing (as we do in Chapter 11, Natural Language Processing, on page
155)—so hopefully those subjects will be interesting regardless of your level of
experience.

How to Read This Book


Although the book follows a structure of extractions first, transformations
second, and loading third, the chapters are relatively self-contained and can
be read in any order you wish—so feel free to dive into a later chapter if you’re
particularly interested in the material it covers.

I’ve tried to include in each of the chapters material of interest even to more
advanced Rubyists, so there aren’t any chapters that are obvious candidates
to skip if you’re at that end of the skill spectrum.

If you’re not familiar with how to use the command line, there’s a beginner’s
tutorial in Appendix 1, A Shell Primer, on page 229, and a guide to various
commands in Appendix 2, Useful Shell Commands, on page 235. These
appendixes will give you more than enough command-line knowledge to follow
all of the examples in the book.

About the Code


All of the code samples in the book can be downloaded from the book’s
website [1]. They’ve been tested in Ruby 2.2 running on OS X, Linux, and Cygwin
on Microsoft Windows, but they should run just fine on any version of Ruby
after 2.0 (released in February 2013).

The book assumes that you’re running in a Unix-like environment. Users of
Mac OS X, Linux, and BSD will be right at home. Microsoft Windows users,
though, will only be able to get the most out of some sections of the book by
installing Cygwin [2]. Cygwin provides a Unix-like environment on Windows,
including a full command-line environment and Unix shell. This gives Windows
users access to the core text processing utilities referenced in this book. This
is particularly true of the chapters on shell one-liners, writing flexible filters
with ARGF, and writing to other processes.

1. https://pragprog.com/book/rmtpruby/text-processing-with-ruby

Online Resources
The page for this book on the Pragmatic Bookshelf website [3] contains a
discussion forum, where you can post any comments or questions you might have
about the book and make suggestions for any changes or expansions you’d
like to see in future editions. If you discover any errors in the book, you can
submit them there, too.

Rob Miller
August 2015

2. https://www.cygwin.com/
3. https://pragprog.com/book/rmtpruby/text-processing-with-ruby


Part I

Extract: Acquiring Text

The first part of our text processing journey is concerned with getting text into our
program. This text might reside in files, might be entered by the user, or might come
from other processes; wherever it comes from, we’ll learn how to read it.

We’ll also look at taking structure from the text that we read, learning how to parse
CSV files and even scrape information from web pages.


CHAPTER 1

Reading from Files


Our first concern when processing text is to get the text into our program,
and perhaps the most common place to source text is from the humble file.
Whether it’s log files from a server, exports from a database, or text you’ve
written yourself, there’s lots of information that lives on the filesystem.
Learning to read from files effectively opens up a world of text to process.

Throughout the course of this chapter, we’ll look at how we can use Ruby to
reach text that resides in files. We’ll look at the basics you might expect, with
some methods to straightforwardly read files in one go. We’ll then look at a
technique that will allow us to read even the biggest files in a memory-efficient
way, by treating files as streams, and look at how this can give us random
access into even the largest files. Let’s take a look.

Opening a File
Before we can do something with a file, we need to open it. This signals our
intent to read from or write to the file, allowing Ruby to do the low-level work
that makes that intention actually happen on the filesystem. Once it’s done those
things, Ruby gives us a File object that we can use to manipulate the file.

Once we have this File object, we can do all sorts of things with it: read from
the file, write to it, inspect its permissions, find out its path on the filesystem,
check when it was last modified, and much more.
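
As a rough illustration of those capabilities, the sketch below opens a file and asks the resulting File object a few questions about itself. The file name and contents are invented for the example, and the file is created in a temporary directory so the snippet can be run anywhere.

```ruby
require "tmpdir"

# Create a throwaway file so the example is self-contained.
# "example.txt" is a hypothetical name, not one used in the book.
path = File.join(Dir.mktmpdir, "example.txt")
File.write(path, "hello\nworld\n")

File.open(path) do |file|
  puts file.path                # where the file lives on the filesystem
  puts file.size                # size in bytes
  puts file.mtime               # when it was last modified (a Time)
  puts file.stat.mode.to_s(8)   # permission bits, shown in octal
  puts file.stat.readable?      # can the current user read it?
end
```

The exact permission bits and timestamps printed will depend on the operating system and on when the snippet is run.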

To open a file in Ruby, we use the open method of the File class, telling it the
path to the file we’d like to open. We pass a block to the open method, in which
we can do whatever we like with the file. Here’s an example:
File.open("file.txt") do |file|
  # ...
end

Because we passed a block to open, Ruby will automatically close the file for
us after the block finishes, freeing us from doing that cleanup work ourselves.
The argument that open passes to our block, which in this example I’ve called
file, is a File object that points to the file we’ve requested access to (in this case,
file.txt). Unless we tell Ruby otherwise, it will open files in read-only mode, so
we can’t write to them accidentally—a safe default.
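
To make the automatic cleanup and the read-only default concrete, here is a minimal sketch. It assumes a temporary file created just for the demonstration; the variable names are illustrative only.

```ruby
require "tempfile"

# A throwaway file to open; its contents are irrelevant here.
temp = Tempfile.new("example")
temp.write("some text\n")
temp.close

# Block form: Ruby closes the file when the block finishes,
# even if the block raises an exception.
leaked = nil
File.open(temp.path) do |file|
  leaked = file         # keep a reference so we can inspect it afterward
  puts file.closed?     # false while inside the block
end
puts leaked.closed?     # true once the block has returned

# Without a block, closing the file is our responsibility.
file = File.open(temp.path)
begin
  first_line = file.gets
ensure
  file.close
end

# Files opened without an explicit mode are read-only,
# so an attempt to write raises an IOError.
begin
  File.open(temp.path) { |f| f.write("oops") }
rescue IOError => e
  puts "write refused: #{e.message}"
end
```

The begin/ensure version does by hand what the block form does for us, which is why the block form is usually the better choice.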

Kernel#open
In the real world, it’s common to see people using the global open method rather than
explicitly using File.open:

open("file.txt") do |file|
  # ...
end

As well as being shorter, which is always nice, this convenient method is actually a
wrapper for a number of different types of IO objects, not just files. You can use it to
open other processes and, with the open-uri library loaded, URLs (on Ruby 3.0 and later,
URLs must be opened with URI.open instead). We’ll cover some more uses of open later;
for now, use either File.open or regular open as you prefer.

There’s nothing in our block yet, so this code isn’t very useful; it doesn’t
actually do anything with the file once it’s opened. Let’s take a look at how
we can read content from the file.

Reading from a File


Once we’ve opened a file, the next step is to read its contents. We’ll start with
the simplest way to do this—reading the whole file into a string, allowing us
to perform many kinds of processing with the text contained in the file. We’ll
then look at how we can break the file’s content up into lines and loop through
them, a task that’s frequently necessary when processing log files, when
processing text written by people, and in many other situations.

Reading a Whole File at Once


The easiest way to access the contents of a file in Ruby is to read the entire
file in one go. It’s not always the right solution, especially when working with
bigger files, but it makes sense in many cases.

We can achieve this by using the read method on our File object:
File.open("file.txt") do |file|
  contents = file.read
end

The read method returns for us a string containing the file’s contents, no
matter how large they might be.

Alternatively, if all we’re doing is reading the file and we have no further use
for the File object once we’ve done so, Ruby offers us a shortcut. There’s a read
method on the File class itself, and if we pass it the name of a file, then it will
open the file, read it, and close it for us, returning the contents:
contents = File.read("file.txt")

Whichever method we use, the result is that we have the entire contents of
the file stored in a string. This is useful if we want to blindly pass those
contents over to something else for processing: to a Markdown parser, for
example, or to insert it into a database, or to parse it as JSON. These are all
very common things to want to do, so read is a widely used method.

For example, if our file contained some JSON data, we could parse it using
Ruby’s built-in JSON library:
require "json"

json = File.read("file.json")
data = JSON.parse(json)
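
For a version that can be run without an existing file.json, the sketch below writes a small JSON document to a temporary file first. The keys and values are made up for illustration; only File.read and JSON.parse come from the text above.

```ruby
require "json"
require "tempfile"

# Write a small JSON document to a temporary file.
# The keys and values are invented for this example.
file = Tempfile.new(["example", ".json"])
file.write('{"name": "Alice", "languages": ["Ruby", "Shell"]}')
file.close

data = JSON.parse(File.read(file.path))

puts data["name"]               # JSON objects become Hashes with string keys
puts data["languages"].length   # JSON arrays become Arrays

# JSON.parse raises JSON::ParserError when the text is not valid JSON.
begin
  JSON.parse("{not json}")
rescue JSON::ParserError => e
  puts "could not parse: #{e.class}"
end
```

Rescuing JSON::ParserError is worth considering whenever the file comes from a source we do not control.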

Often, though, we want to do something with the contents ourselves. The
most common task we’re likely to face is to split the file into lines and do
something with each line. Let’s look at a simple way to achieve this.

Line-by-line Processing
Lots of plain-text formats (log files, for instance) use the lines of a file as a
way of structuring the content within them. In files like this, each line represents
a distinct item or record. It’s about the simplest way to separate data,
but this kind of structure is more than enough for many use cases, so it’s
something you’ll run into frequently when processing text.

One example of this sort of log file that you might have encountered before
is from the popular web server Apache. For each request made to it, Apache
will log some information: things like the IP address the request came from,
the date and time that the request was made, the URL that was requested,
and so on. The end result looks like this:
127.0.0.1 - [10/Oct/2014:13:55:36] "GET / HTTP/1.1" 200 561
127.0.0.1 - [10/Oct/2014:13:55:36] "GET /images/logo.png HTTP/1.1" 200 23260
192.168.0.42 - [10/Oct/2014:14:10:21] "GET / HTTP/1.1" 200 561
192.168.0.91 - [10/Oct/2014:14:20:51] "GET /person.jpg HTTP/1.1" 200 46780
192.168.0.42 - [10/Oct/2014:14:20:54] "GET /about.html HTTP/1.1" 200 483
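
Since each line is one record, a single line can be pulled apart into named fields. The following is only a sketch: the regular expression is written for the simplified sample above rather than for Apache's full log formats, and the method name parse_log_line is hypothetical.

```ruby
# Returns a Hash of fields for one log line, or nil if the line
# does not look like the simplified sample format shown above.
def parse_log_line(line)
  # Named captures keep the pattern readable; this pattern is an
  # assumption based on the sample lines, not an official format.
  pattern = /\A(?<ip>\S+) - \[(?<time>[^\]]+)\] "(?<method>\S+) (?<path>\S+) (?<protocol>[^"]+)" (?<status>\d{3}) (?<bytes>\d+)\s*\z/
  match = pattern.match(line)
  return nil unless match

  {
    ip:     match[:ip],
    time:   match[:time],
    method: match[:method],
    path:   match[:path],
    status: match[:status].to_i,
    bytes:  match[:bytes].to_i
  }
end

line = '192.168.0.91 - [10/Oct/2014:14:20:51] "GET /person.jpg HTTP/1.1" 200 46780'
request = parse_log_line(line)

puts request[:ip]
puts request[:path]
puts request[:status]
```

Returning nil for lines that do not match lets the caller decide whether to skip or report them.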

Let’s imagine we wanted to process this log file so that we could see all the
requests made by a certain IP address. Because each line in the file represents
one request, we need some way to loop over the lines in the file and check
whether each one matches our conditions—that is, whether the IP address
at the start of the line is the one we’re interested in.

One way to do this would be to use the readlines method on our File object. This
method reads the file in its entirety, breaking the content up into individual
lines and returning an array:
File.open("access_log") do |log_file|
  requests = log_file.readlines
end

At this point, we’ve got an array—requests—that contains every line in the file.
The next step is to loop over those lines and only output the ones that match
our conditions:
File.open("access_log") do |log_file|
  requests = log_file.readlines

  requests.each do |request|
    if request.start_with?("127.0.0.1 ")
      puts request
    end
  end
end

Using each, we loop over each request. We then ask the request if it starts
with 127.0.0.1, and if the response is true, we output it. Lines that don’t start
with 127.0.0.1 will simply be ignored.
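
The same filter can be written a little more compactly with select. This is a sketch, not the book's listing: it uses File.readlines and a temporary file holding a few sample lines so that it runs on its own, and it still reads the whole file into memory.

```ruby
require "tempfile"

# Sample data in the simplified log format used above.
log = Tempfile.new("access_log")
log.write(<<~LOG)
  127.0.0.1 - [10/Oct/2014:13:55:36] "GET / HTTP/1.1" 200 561
  192.168.0.42 - [10/Oct/2014:14:10:21] "GET / HTTP/1.1" 200 561
  127.0.0.1 - [10/Oct/2014:13:55:36] "GET /images/logo.png HTTP/1.1" 200 23260
LOG
log.close

# File.readlines opens, reads, and closes the file in one call.
requests = File.readlines(log.path)

# select keeps only the elements for which the block returns true.
local_requests = requests.select { |request| request.start_with?("127.0.0.1 ") }

puts local_requests
```

Note the trailing space in "127.0.0.1 ": without it, an address such as 127.0.0.10 would also match.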

While this solution works, it has a problem. Because it reads the whole file
at once, it consumes an amount of memory at least equal to the size of the
file. This will hold up okay for small files, but as our log file grows, so will the
memory consumed by our script.

If you think about it, though, we don’t actually need to have the whole file in
memory to solve our problem. We’re only ever dealing with one line of the file
at any given moment, so we only really need to have that particular line in
memory. For some problems it’s necessary to read the whole file at once, but
this isn’t one of them. Let’s look at how we can rework this example so that
we only read one line at a time.

Treating Files as Streams


We’ve seen that reading the entire contents of a file isn’t always the best
solution. For a start, it forces us to keep the entire contents of the file in
memory. This might merely be wasteful with smaller files, but it can turn out
to be plain impossible with much larger ones. Imagine wanting to process a
50GB file on a computer that has only 4GB of memory; it would be impossible
for us to read the entire file at once.

The solution is to treat the file as a stream. Instead of reading from the
beginning of the file to the end in one go, and keeping all of that information
in memory, we read only a small amount at a time. We might read the first
line, for example, then discard it and move on to the second, then discard that
and move on to the third, and so on until we reach the end of the file. Or we
might instead read it character by character, or word by word. The important
thing is that at no point do we have the full file in memory: we only ever store
the little bit that we’re processing.

This enables us to work with enormous files (gigabytes in size, if necessary)
without consuming anywhere near that much memory. By varying
exactly what that “bit by bit” is, we can also step through the file in a way
that reflects its structure. If we know that the file has many lines, each of
which represents a record, then we can read one line at a time. If we know
that the file is one enormous line, but that fields are separated by commas,
we can read up to the next comma each time, processing the text one field at
a time.
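
As a sketch of the comma-separated case, each_line accepts an optional separator, which makes it read up to the next comma rather than the next newline. StringIO stands in for a File here so the example needs nothing on disk, and the data is invented.

```ruby
require "stringio"

# StringIO behaves like an open file for reading purposes,
# so the same each_line call would work on a File object.
io = StringIO.new("alice,bob,carol,dave")

fields = []
# Passing a separator makes each_line read up to the next comma
# instead of the next newline.
io.each_line(",") do |field|
  fields << field.chomp(",")   # drop the trailing separator, if present
end

p fields
```

Each chunk yielded to the block includes the separator that ended it, which is why chomp is applied. For character-by-character reading, each_char works in the same streaming fashion.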

Streaming Files Line by Line


Let’s revisit our web server log example, where we outputted only those
requests that came from a certain IP address, and see how we can adapt it
to use streaming. Luckily, the solution is straightforward—in fact, it’s actually
easier than the method that reads the whole file into memory.

The File object yielded to our block has a method called each_line. This method
accepts a block and will step through the file one line at a time, executing
that block once for each line.
File.open("access_log") do |log_file|
  log_file.each_line do |request|
    if request.start_with?("127.0.0.1 ")
      puts request
    end
  end
end

That’s it. The each_line method allows us to step through each line in the file
without ever having more than one line of the file in memory at a time. This
method will consume the same amount of memory no matter how large the
file is, unlike our first solution.

Just like with File.read, Ruby offers us a shortcut that doesn’t require us to
open the file ourselves: File.foreach. Using it trims the previous example down
a little:
File.foreach("access_log") do |request|
  if request.start_with?("127.0.0.1 ")
    puts request
  end
end

On my machine, working on a 5,000,000-line, 315MB file, the streaming method
uses 18MB of memory, while the non-streaming version uses 706MB, almost
forty times as much. As the size of the file you’re dealing with
increases, the streaming method won’t use any more memory, whereas the
readlines method will. So if you’re dealing with files that are more than a few
kilobytes in size, and if the processing that you’re doing doesn’t require you
to have the whole file in memory at once, each_line will result in a noticeably
more efficient solution.
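
To make the contrast concrete, here is a minimal sketch that counts matching requests both ways. The helper names count_eager and count_streaming and the sample log lines are made up for illustration; only File.readlines and File.foreach come from Ruby itself.

```ruby
require "tempfile"

# Hypothetical helper: loads every line into an array before counting.
def count_eager(path, prefix)
  File.readlines(path).count { |line| line.start_with?(prefix) }
end

# Hypothetical helper: File.foreach without a block returns an Enumerator,
# so count pulls one line at a time instead of holding the whole file.
def count_streaming(path, prefix)
  File.foreach(path).count { |line| line.start_with?(prefix) }
end

# Made-up sample data standing in for a real access log.
sample = Tempfile.new("access_log")
sample.write("127.0.0.1 GET /\n192.168.0.42 GET /about.html\n127.0.0.1 GET /images/logo.png\n")
sample.close

puts "eager:     #{count_eager(sample.path, '127.0.0.1 ')}"
puts "streaming: #{count_streaming(sample.path, '127.0.0.1 ')}"
```

Both helpers return the same count; the difference only shows up in memory use once the file is large.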

Enumerable Streams
The each_line method of the File class is aliased to each. This might not seem
particularly remarkable, but it’s actually tremendously useful. This is because
Ruby has a module called Enumerable that defines methods like map, find_all,
count, reduce, and many more. The purpose of Enumerable is to make it easy to
search within, add to, delete from, iterate over, and otherwise manipulate
collections. (You’ve probably used methods like these when working with
arrays, for example.)

Well, a file is a collection too. By default, Ruby considers the elements of that
collection to be the lines within the file, so because File includes the Enumerable
module, we can use all of its methods on those lines. This can make many
processing operations simple and expressive. Some of Enumerable’s methods,
such as find and first, stop reading as soon as they have their answer, so
with those we retain the performance benefits of streaming, too. Others,
such as map and group_by, build their whole result in memory.
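
Which calls stop reading early is easiest to see by trying it. The following is a sketch under stated assumptions: first_request_from and first_matches are hypothetical helpers, the three-line sample file is invented, and IO#lineno is used only to report how many lines had been read when the search ended.

```ruby
require "tempfile"

# Hypothetical helper: returns the first line from the given IP, plus the
# number of lines that were read before find stopped.
def first_request_from(path, ip)
  File.open(path) do |file|
    match = file.find { |line| line.start_with?("#{ip} ") }
    [match, file.lineno]
  end
end

# Hypothetical helper: lazy defers select until first asks for values, so
# reading stops once `limit` matching lines have been found.
def first_matches(path, ip, limit)
  File.open(path) do |file|
    file.each_line.lazy.select { |line| line.start_with?("#{ip} ") }.first(limit)
  end
end

# Made-up sample data.
sample = Tempfile.new("access_log")
sample.write("192.168.0.42 /\n127.0.0.1 /\n127.0.0.1 /images/logo.png\n")
sample.close

line, lines_read = first_request_from(sample.path, "127.0.0.1")
puts "found #{line.inspect} after reading #{lines_read} lines"
p first_matches(sample.path, "127.0.0.1", 1)
```

Methods that need every element to produce their result will still read to the end of the file; the early exit applies only to calls that can answer from a prefix of the data.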

To explore what this means, we can revisit our log example. Let’s imagine you
wanted to group all of the requests made by each IP address, and within that
group them by the URL requested. In other words, you want to end up with
a data structure that looks something like this:

{
  "127.0.0.1" => [
    "/",
    "/images/logo.png"
  ],
  "192.168.0.42" => [
    "/",
    "/about.html"
  ],
  "192.168.0.91" => [
    "/person.jpg"
  ]
}

Here’s a script that uses the methods offered by Enumerable to achieve this:
requests-by-ip.rb
requests =
  File.open("data/access_log") do |file|
    file
      .map { |line| { ip: line.split[0], url: line.split[5] } }
      .group_by { |request| request[:ip] }
      .each { |ip, requests| requests.map! { |r| r[:url] } }
  end

We open the file just like we did previously. But instead of using each_line to
iterate over the lines of the file, we use map. This loops over the lines of the
file, building up an array as it does so by taking the return value of the block
we pass to it. Here our block is using split to separate the lines on whitespace.
The first of these whitespace-separated fields contains the IP, and the sixth
contains the URL that was requested, so the block returns a hash. The result
of our map operation is therefore an array of hashes that contain only the
information about the request that we’re interested in—the IP address and
the URL.

Next, we use the group_by method. This transforms our array of hashes into a
single hash. It does so by checking the return value of the block that we pass
to it; all the elements of the array that return the same value will be grouped
together. In this case, our block returns the IP part of the request, which
means that all of the requests made by the same IP address will be grouped
together.

The data structure after the group_by operation looks something like this:
{
  "127.0.0.1" => [
    {:ip=>"127.0.0.1", :url=>"/"},
    {:ip=>"127.0.0.1", :url=>"/images/logo.png"}
  ],
  "192.168.0.42" => [
    {:ip=>"192.168.0.42", :url=>"/"},
    {:ip=>"192.168.0.42", :url=>"/about.html"}
  ],
  "192.168.0.91" => [
    {:ip=>"192.168.0.91", :url=>"/person.jpg"}
  ]
}

This is almost what we were after. The problem is that we have both the IP
address and the URL of the request, rather than just the URL. So the next
step in our chain uses each to loop over these IP address and request
combinations. It then uses map! to replace the array of hashes with just the URL
portion, leaving us with an array of strings.

The final result is exactly what we wanted:


{
  "127.0.0.1" => [
    "/",
    "/images/logo.png"
  ],
  "192.168.0.42" => [
    "/",
    "/about.html"
  ],
  "192.168.0.91" => [
    "/person.jpg"
  ]
}

This transformation is relatively complex, but it was easily achieved with
Ruby’s enumerable methods. Each step in the chain of methods performs
one particular transformation on the data, getting us closer and closer to the
final structure that we’re after. If you can break your problem down into small
steps like these, then you’ll find that you can get Ruby to do much of the
work for you. When processing text, you’ll find yourself writing collection
pipelines like this fairly frequently, so it’s definitely worth getting acquainted
with Enumerable and the functionality that it offers you.
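
One thing to keep in mind is that map builds an array holding every request before group_by runs, so that pipeline’s memory use grows with the file. If that matters, a single pass can build the final hash directly. This is a sketch rather than the script above: urls_by_ip is a hypothetical name, and the sample lines use an invented layout in which the sixth whitespace-separated field is the URL, matching the assumption made in the text.

```ruby
require "stringio"

# Hypothetical helper: groups URLs by IP in one pass, keeping only the
# result hash in memory rather than an intermediate array of hashes.
def urls_by_ip(io)
  io.each_line.each_with_object({}) do |line, grouped|
    fields = line.split
    (grouped[fields[0]] ||= []) << fields[5]
  end
end

# Made-up log lines; field 1 is the IP and field 6 is the URL.
sample_log = StringIO.new(<<~LOG)
  127.0.0.1 - - [01/Jan/2015:10:00:00 +0000] / 200
  192.168.0.42 - - [01/Jan/2015:10:00:01 +0000] /about.html 200
  127.0.0.1 - - [01/Jan/2015:10:00:02 +0000] /images/logo.png 200
LOG

p urls_by_ip(sample_log)
```

The helper accepts any object with each_line, so the same code can be handed an open File instead of the StringIO used here.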

Other Streaming Methods


Although it’s most common to want to stream files line by line, it’s not your
only option. As well as each_line, Ruby also offers a general streaming method
in the form of each.

By default, each behaves exactly like each_line, looping over the lines in the file.
But it also accepts an argument allowing you to change the character on
which it will split, from a newline to anything else you might like.

Let’s imagine we had a file with only a single line in it, but that contained
many different records separated by commas:
this is field one,this is field two,this is field three

To process this file as a stream, we could pass a comma character to each,
thereby telling it to give us a new record each time it encountered a comma:
File.open("commas.txt") do |file|
  file.each(",") do |record|
    puts record
  end
end
# >> this is field one,
# >> this is field two,
# >> this is field three

Instead of giving us a new record whenever it encountered a new line, as
each_line did and as is the default behavior of each, we now get a new record
each time Ruby sees a comma. This allows us to process this type of file with
all the benefits of streaming.
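
Each record that each(",") yields still ends with its separator, and the last one ends with the file’s newline. A small sketch of tidying that up follows; each_field is a hypothetical helper and the sample file is made up.

```ruby
require "tempfile"

# Hypothetical helper: yields each comma-separated field with the trailing
# comma (or, for the last field, the trailing newline) removed.
def each_field(path)
  File.open(path) do |file|
    file.each(",") { |record| yield record.chomp(",").chomp }
  end
end

# Made-up single-line sample file.
sample = Tempfile.new("commas")
sample.write("this is field one,this is field two,this is field three\n")
sample.close

each_field(sample.path) { |field| puts field }
```

Only one field is held in memory at a time, so this works the same way on a file that is one enormous line.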

Another commonly used streaming method is each_char, which will yield to us
each character in the file. So if we wanted to see how many b characters were
in a file, we could use each_char:
n = 0
File.open("file.txt") do |file|
  file.each_char do |char|
    n += 1 if char == "b"
  end
end

puts "#{n} b characters in file.txt"

Again, this method has all of the benefits of other streaming examples; we
only ever have a single character in memory at one time, so we can process
even the largest of files.

Like many enumerating methods in Ruby, if we don’t pass a block to methods
like each_char and each, they’ll return for us an Enumerator. That means we can
also use these other streaming methods with all of the collection-related
methods Enumerable offers us simply by calling them on the Enumerator that’s
returned for us.

For example, we could rewrite the previous example, where we quite verbosely
initialized our n variable and incremented it manually, by using Enumerable’s
count method:

character-count.rb
n =
  File.open("file.txt") do |file|
    file.each_char.count { |char| char == "b" }
  end

puts "#{n} b characters in file.txt"

The count method accepts a block and will return the number of values for
which the block returned true. This is exactly what our previous code was
doing, but this way is a little shorter and a little neater, and reveals our
intentions more clearly.

IO: Ruby’s Input/Output Class


All of the methods we’ve seen so far—read, each_line, each_char, and so on—aren’t actually
defined by the File class itself. They’re defined by the class that File inherits from: IO.

This might seem like an academic distinction, but it has an important benefit: it
means that other types of IO in Ruby have those same methods, too. Files, streams,
sockets, Unix pipelines—all of these things are fundamentally similar, and it’s in IO
that these similarities are gathered into one abstraction. In the words of Ruby’s own
documentation, IO is “the basis for all input and output in Ruby.” By learning to read
from files, then, you’ll learn both principles and concrete methods that will translate
to all the other places from which you might want to acquire text.

If you know how to write output to the screen, then—using puts—you already know
how to write to a file: by calling puts on the file object. Our screen and a file are both
IO objects—of two different kinds—so the way we interact with them is the same.
This similarity will be very useful throughout our text processing journey.
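
As a small illustration of that shared interface, the sketch below passes a File and the read end of a pipe to the same method. count_lines is a hypothetical helper and the data is made up; the point is only that both objects are IO instances and both respond to each_line.

```ruby
require "tempfile"

# Hypothetical helper: works on any IO object, whatever it is attached to.
def count_lines(io)
  io.each_line.count
end

# A File: write three lines, rewind, and count them.
file_count = Tempfile.create("io-demo") do |file|
  file.puts "one", "two", "three"
  file.rewind
  count_lines(file)
end

# A pipe: not a file at all, but still an IO.
reader, writer = IO.pipe
writer.puts "alpha", "beta"
writer.close
pipe_count = count_lines(reader)
reader.close

puts "file: #{file_count} lines, pipe: #{pipe_count} lines"
puts File.ancestors.include?(IO) # File inherits from IO
```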

Reading Fixed-Width Files


Another way of processing files without consuming them whole is to consume
a fixed number of bytes. It’s much like the streaming examples that we’ve
seen, where we read from the file until we encountered a newline or until we
encountered a comma. But instead of reading until we hit a particular
character, we read, say, ten bytes from the current position, receiving a string
containing those ten bytes.

This might seem an inflexible and impractical way of doing things. After all,
how can we know at how many bytes from the start of the file we’ll find the
information we want? But this sort of processing has several real-world
applications and has an important performance characteristic that gives it
many of the benefits of a “proper” database system, so it’s worth exploring.
Let’s see how we can use it.

The Data File


In this section, we’ll be playing with some scientific data. The National
Oceanic and Atmospheric Administration (NOAA) releases data on the surface
temperature of four regions in the Pacific Ocean, as measured every week
since 1990, offering for download a text file that looks like this:1
 03JAN1990     23.4-0.4     25.1-0.3     26.6 0.0     28.6 0.3
 10JAN1990     23.4-0.8     25.2-0.3     26.6 0.1     28.6 0.3
 17JAN1990     24.2-0.3     25.3-0.3     26.5-0.1     28.6 0.3

…and so on for many hundreds more rows.

Let’s imagine we wanted to dig into this data. We might want to find out what
the warmest week was, or plot the results on a graph, or just show what the
temperature of a particular region was last week. To do any of these things,
we need to parse the data and get it into our script.

First, a quick explanation of the data. The first column contains the date of
the week in which the measurements were taken. The other four columns
represent different regions of the ocean. For each of them we have two
numbers: the first representing the recorded temperature, and the second
representing the departure from the expected temperature that this recording
represents (the “sea surface temperature anomaly”). In the first row, then,
the first region recorded a temperature in the week of 3 January 1990 of 23.4
degrees, which is an anomaly of -0.4 degrees.

The pleasing visual quality that this data has—the fact that all the columns
in the table line up neatly—will help us in this task. If we were to count the
characters across each line, we’d see that each field started at exactly the
same place in each row. The first column, containing the date of the week in
question, is always ten characters long. The next number is nine characters
long, always, and the following number is always four characters, regardless
of whether it has a negative sign. This nine/four pattern repeats three more
times for the other three regions.

In trying to get this data into our script, let’s look at how to read the first row
of data.

1. The data is available on the NOAA website: http://www.cpc.ncep.noaa.gov/data/indices/wksst8110.for.

Reading a Fixed Number of Bytes


We know that in the file we’re looking at, each column is a fixed size. That
means to read each one, we just need to read a different number of bytes,
and to do this we need to revisit a method that we’ve already looked at: read.

Previously, we used read in its basic form, without any arguments, which read
the entire file into memory. But if we pass an integer as the first argument,
read will read only that many bytes forward from the current position in the
file.

So, from the start of the file, we could read in each field in the first row as
follows:
noaa-first-row-simple.rb
File.open("data/wksst8110.for") do |file|
  puts file.read(10)
  4.times do
    puts file.read(9)
    puts file.read(4)
  end
end
# >> 03JAN1990
# >> 23.4
# >> -0.4
# >> 25.1
# >> -0.3
# >> 26.6
# >> 0.0
# >> 28.6
# >> 0.3

We first read ten bytes, to get the name of the week. Then we read nine bytes
followed by four bytes to extract the numbers, doing this four times so that
we extract all four regions.

From here, it’s not much work to have our script continue through the rest
of the file, slurping up all of the data within and converting it into a Ruby
data structure—in this case, a hash:
noaa-all-rows.rb
File.open("data/wksst8110.for") do |file|
  weeks = []

  until file.eof?
    week = {
      date: file.read(10).strip,
      temps: {}
    }

    [:nino12, :nino3, :nino34, :nino4].each do |region|
      week[:temps][region] = {
        temp: file.read(9).to_f,
        change: file.read(4).to_f
      }
    end

    file.read(1)

    weeks << week
  end

  weeks
  # => [{:date=>"03JAN1990",
  # :temps=>
  # {:nino12=>{:temp=>23.4, :change=>-0.4},
  # :nino3=>{:temp=>25.1, :change=>-0.3},
  # :nino34=>{:temp=>26.6, :change=>0.0},
  # :nino4=>{:temp=>28.6, :change=>0.3}}},
  # {:date=>"10JAN1990",
  # :temps=>
  # {:nino12=>{:temp=>23.4, :change=>-0.8},
  # :nino3=>{:temp=>25.2, :change=>-0.3},
  # :nino34=>{:temp=>26.6, :change=>0.1},
  # :nino4=>{:temp=>28.6, :change=>0.3}}},
  # {:date=>"17JAN1990",
  # :temps=>
  # {:nino12=>{:temp=>24.2, :change=>-0.3},
  # :nino3=>{:temp=>25.3, :change=>-0.3},
  # :nino34=>{:temp=>26.5, :change=>-0.1},
  # :nino4=>{:temp=>28.6, :change=>0.3}}},
  # ...snip...
end

The logic is fundamentally the same as when reading the first row. To loop
over all the rows in the file, there are two main changes: first, we loop until we
hit the end of the file by checking file.eof?; it will return true when the end of
the file is reached and therefore end our loop. The other addition is the call
to file.read(1) at the end of the row; this will consume the newline character at
the end of each line. We’re also using strip to strip the whitespace from the
week name, and to_f to convert the temperature numbers to floats.
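
Because every record in this file also sits on its own line, another option is to stream line by line and slice each line with String#unpack. This is an alternative sketch, not the approach used above: parse_row, REGIONS, and ROW_FORMAT are hypothetical names, and the format string assumes the ten/nine/four column widths described earlier. The "A" directive takes a fixed number of bytes and drops trailing spaces.

```ruby
REGIONS = [:nino12, :nino3, :nino34, :nino4]

# One 10-byte date field, then a 9-byte and a 4-byte field per region.
ROW_FORMAT = "A10" + "A9A4" * REGIONS.size

# Hypothetical helper: turns one fixed-width line into a hash.
def parse_row(line)
  date, *numbers = line.unpack(ROW_FORMAT)
  readings = numbers.each_slice(2).map do |temp, change|
    { temp: temp.to_f, change: change.to_f }
  end
  { date: date.strip, temps: REGIONS.zip(readings).to_h }
end

# Sample row laid out with the column widths described in the text.
row = " 03JAN1990     23.4-0.4     25.1-0.3     26.6 0.0     28.6 0.3\n"
p parse_row(row)
```

With a helper like this, File.foreach could feed each line to parse_row in turn. The trade-off is that line-by-line streaming gives up the ability to jump straight to a given record.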

This method works and is fast. But by only using read to consume a fixed
number of bytes, we haven’t seen the most important advantage of treating
the file in this way: the fact that it offers us random access to the records
within the file.

Seeking Through the File


Until now we’ve looked at advancing through the file as a stream, starting at
the beginning and moving through the whole file. But just as we can use read
to advance through and consume a portion of the file, we can also move to a
specific location without consuming anything. This is a fast way to skip data
that we’re not interested in.

To continue with our temperature data, let’s imagine we wanted to be able to
access a particular week. Not necessarily the first one and not all of them at
once, but instead just a single row from within the records.

Well, because each of the columns within the data has a fixed width, that
means that all of the rows do, too. Adding up the columns, including the
newline at the end, gives us 10 + 4 * (9 + 4) + 1 = 63 characters, so we know that
each of our records is 63 bytes long.

If we used seek to skip 63 bytes into the file, then our first call to read would
begin reading from the second record:
noaa-skip-first-row.rb
File.open("data/wksst8110.for") do |file|
  file.seek(63)
  file.read(10)
  # => " 10JAN1990"
end

As we can see, our first call to read returns for us the date of the second week
in the file, not the first. Using this method, we can now skip to arbitrary
records—the first, the tenth, the thousandth, whatever we like—and read
their data.

The most important part of this is that seeking happens in constant time.
That means that it takes the same amount of time no matter how large the
file is and no matter how far into the file we want to seek. We’ve finally
uncovered the amazing benefit to fixed-width files like this—that we gain the
ability to access records within them at random, so it’s no slower to find the
303rd record than it is to find the third—or even the 300,003rd.

In the final version of our script, then, we can write a get_week method that
will retrieve a record for us given an index for that record (1 for the first, 2 for
the second, and so on):

noaa-seek.rb
def get_week(file, index)
  # Records are 63 bytes long, so record N starts (N - 1) * 63 bytes in.
  file.seek((index - 1) * 63)

  week = {
    date: file.read(10).strip,
    temps: {}
  }

  [:nino12, :nino3, :nino34, :nino4].each do |region|
    week[:temps][region] = {
      temp: file.read(9).to_f,
      change: file.read(4).to_f
    }
  end

  week
end

File.open("data/wksst8110.for") do |file|
  get_week(file, 3)
  # => {:date=>"17JAN1990",
  # :temps=>
  # {:nino12=>{:temp=>24.2, :change=>-0.3},
  # :nino3=>{:temp=>25.3, :change=>-0.3},
  # :nino34=>{:temp=>26.5, :change=>-0.1},
  # :nino4=>{:temp=>28.6, :change=>0.3}}}
  get_week(file, 303)
  # => {:date=>"18OCT1995",
  # :temps=>
  # {:nino12=>{:temp=>20.0, :change=>-0.8},
  # :nino3=>{:temp=>24.1, :change=>-0.9},
  # :nino34=>{:temp=>25.8, :change=>-0.9},
  # :nino4=>{:temp=>28.2, :change=>-0.5}}}
  get_week(file, 1303)
  # => {:date=>"17DEC2014",
  # :temps=>
  # {:nino12=>{:temp=>22.9, :change=>0.1},
  # :nino3=>{:temp=>26.0, :change=>0.8},
  # :nino34=>{:temp=>27.4, :change=>0.8},
  # :nino4=>{:temp=>29.4, :change=>1.0}}}
end

Here we use the get_week method to fetch the third, 303rd, and 1,303rd records.
With this method we can treat the data within the file almost as though it
was a data structure within our script—like an array—even though we haven’t
had to read any of it in. This allows us to randomly access data within even
the largest of files in a very fast and efficient way.
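
One gap worth noting: if the requested index lies past the end of the file, read returns nil and the call to strip fails. The sketch below shows one way to guard against that by deriving the record count from the file’s size. The names RECORD_SIZE, record_offset, record_count, and raw_record are hypothetical, and the code assumes every record, including the last, ends with a single newline byte.

```ruby
require "tempfile"

RECORD_SIZE = 63 # 10 + 4 * (9 + 4) + 1, as worked out above

# Byte position at which the record with the given 1-based index starts.
def record_offset(index)
  (index - 1) * RECORD_SIZE
end

# Number of complete records in the file.
def record_count(file)
  file.size / RECORD_SIZE
end

# Hypothetical helper: returns the raw 63-byte record, or nil when the
# index is out of range.
def raw_record(file, index)
  return nil unless index.between?(1, record_count(file))

  file.seek(record_offset(index))
  file.read(RECORD_SIZE)
end

# Made-up three-record file using the column widths from the text.
rows = [
  " 03JAN1990     23.4-0.4     25.1-0.3     26.6 0.0     28.6 0.3",
  " 10JAN1990     23.4-0.8     25.2-0.3     26.6 0.1     28.6 0.3",
  " 17JAN1990     24.2-0.3     25.3-0.3     26.5-0.1     28.6 0.3"
]

Tempfile.create("noaa") do |file|
  rows.each { |row| file.write(row + "\n") }
  file.flush

  p raw_record(file, 2) # the second row
  p raw_record(file, 4) # past the end of the file
end
```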

One important caveat is that read and seek operate at the level of bytes, not
characters. You’ll learn more about the difference between the two in Chapter
7, Encodings, on page 89, but it’s worth noting that if you’re using a multibyte
character encoding, like UTF-8, then using seek carelessly might leave you in
the middle of a multibyte character and might mean that you get some
gibberish when you try to read data.

You should therefore use these methods only when you know that you’re
dealing solely with single-byte characters or when you know that the location
you’re seeking to will never be in the middle of a character, as in our
temperature data example, where we’re seeking to the boundaries between records.
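
A short demonstration of the problem, as a sketch: read_from is a hypothetical helper and the sample word is made up. The ï in “naïve” occupies two bytes in UTF-8, so byte offset 3 falls in the middle of it.

```ruby
require "tempfile"

# Hypothetical helper: seeks to a byte offset, then reads the rest of the
# file as UTF-8 text.
def read_from(path, byte_offset)
  File.open(path, "r:UTF-8") do |file|
    file.seek(byte_offset)
    file.read
  end
end

text = "na\u00EFve" # "naïve": five characters, six bytes

sample = Tempfile.new("utf8")
sample.close
File.binwrite(sample.path, text) # write the raw UTF-8 bytes

puts text.length   # number of characters
puts text.bytesize # number of bytes

# Offset 2 is the first byte of ï, so the result is valid text.
puts read_from(sample.path, 2).valid_encoding?

# Offset 3 is the second byte of ï, so the result starts with a stray
# continuation byte and is not valid UTF-8.
puts read_from(sample.path, 3).valid_encoding?
```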

Despite this limitation of seek, hopefully you can see the benefit of using a
fixed-width file like this. We can retrieve any value, no matter how big the file
is, without reading any unnecessary data; we have what’s called random
access to the data within. To retrieve the tenth record, we just need to seek
567 bytes from the start of the file; to retrieve the 703rd, we just need to seek
44,226 bytes from the start; and so on. The wonderful thing is that no matter
how large our file gets, this operation will always take the exact same amount
of time—even if we’ve got hundreds of megabytes of data. That’s why it’s
sometimes worth putting up with the limitations of such a format: it’s both
very simple and very fast.

Wrapping Up
That’s about it for reading files. We looked at how to open a file and what we
can do with the resulting File object. We covered reading files in one go and
processing them like streams, and why you’d prefer one or the other. We
explored how we can use the methods offered by Enumerable to transform and
manipulate the content of files. We looked at line-by-line processing and
reading arbitrary numbers of bytes, and how we can seek to arbitrary locations
in the file to replicate some of the functionality of a database.

With these techniques, we’ve gained an impressive arsenal for reading text
from files large and small. Next, we’ll take our newfound knowledge of streams
and apply it to another source of text: standard input.


CHAPTER 2

Processing Standard Input


We’ve looked at how we can use Ruby to read data that exists on the
filesystem. Another common source of information, though, is direct input to a
script. This might be text that users input using their keyboard, or it might
be text that’s redirected to your script from another process. From the
perspective of Ruby, these two different types of input are actually the same
thing, processed in the same way.

This source of input is called standard input, and it’s one of the foundations
of text processing. Along with its output equivalents standard output and
standard error, it enables different programs to communicate with one
another in a way that doesn’t rely on complex protocols or unintelligible
binary data, but instead on straightforward, human-readable text.

Learning how to process standard input will allow you to write flexible and
powerful utilities for processing text, primarily by enabling you to write
programs that form part of text processing pipelines. These chains of programs,
linked together so that each one’s output flows into the input of the next, are
incredibly powerful. Mastering them will allow you to make the most both of
your own utilities and of those that already exist, giving you the most text
processing ability for the least amount of typing possible.

Let’s take a look at how we can write scripts that process text from standard
input, slotting into pipeline chains and giving us flexibility, reusability, and
power.

Redirecting Input from Other Processes


Standard input can be used to read text from the keyboard. You might have
used it that way when learning Ruby, prompting users for input and storing
the text they typed:

print "What's your name? "
name = $stdin.gets.chomp
puts "Hi, #{name}!"

Here we ask standard input—$stdin—for a line of input using the gets method,
using chomp to remove the trailing newline. This gives us a string, which we
store in name.

This simplistic use of standard input isn’t particularly useful, let’s face it.
But it’s actually only half of the story. Standard input isn’t just used to read
from the keyboard interactively; it can also read from input that’s been
redirected (or piped) to your script from another process.

The ultimate goal here is to be able to use your scripts in pipeline chains.
These are chains of programs strung together so that the output from the
first is fed into the input of the second, the output of the second becomes the
input of the third, and so on. Here’s an example:
$ ps ax | grep ruby | cut -d' ' -f1

Here we use three separate commands, each of which performs a different
individual task, and combine them to perform quite a complex operation. In
this case, that operation is to fetch all of the processes running on the system,
search for ones that contain ruby in their name, and then display the first
whitespace-separated column of the output (which contains the process ID).
That’s actually quite a feat, and it was achieved without writing a script; we
can do all of that processing just by typing a command into our shell.

That example used preexisting commands to do its work. But we can write
our own programs that slot into such workflows. Imagine that we frequently
wanted to convert sections of text to uppercase. We know how to convert text
to uppercase in Ruby, so we could write a script that works like this:
$ echo "hello world" | ruby to-uppercase.rb
HELLO WORLD

…and that also works like this:


$ hostname | ruby to-uppercase.rb
ROB.LOCAL

In other words, we could write a program that converts any text it receives
on standard input to uppercase, then outputs that converted text. It won’t
know where the text is coming from (for example, the echo command we saw
previously versus the hostname command)—it accepts anything you pass to it.
This gives you great flexibility in how you use the script, opening up ways of
using it that you might not have foreseen when writing it.

This flexibility is what makes such scripts useful. Your goal, or at least a
pleasant side effect of processing text in this way, is to build up a library of
such scripts so that, if you encounter the same problem again, you can just
slot the script you wrote last time into the new pipeline chain and be on your
way. The to-uppercase.rb script is a good example of this: you might need to write
it from scratch the first time you encounter the problem of converting input
to uppercase, but after that it can be used again and again in completely
different situations.

Actually writing our to-uppercase.rb script is pretty straightforward. In fact, we
don’t need to know anything more than we learned when prompting users
for their name. That’s because Ruby doesn’t distinguish between standard
input that comes from the keyboard and standard input that’s redirected;
you can just read blindly from standard input. Likewise, you don’t have to
care whether your output is being written to the screen or being piped to
another process; you can just write blindly to standard output. The shell will
take care of redirections for you.

To write our to-uppercase.rb example, we need to read from standard input,
convert that text to uppercase, and then output it to standard output. In
Ruby, that’s one line:
puts $stdin.read.upcase

Saving this script as to-uppercase.rb, we’ve got everything we need. We can run
it like this:
$ echo "hello world" | ruby to-uppercase.rb
HELLO WORLD
$ hostname | ruby to-uppercase.rb
ROB.LOCAL
$ whoami | ruby to-uppercase.rb
ROB

We now have a script that reads from standard input, modifies what it receives,
and outputs it to standard output. It’s general purpose. It doesn’t know or
care where its input comes from, but it processes the input happily regardless.

Countless examples of this type of tool already exist, distributed with Unix-
like operating systems: grep, for example, which outputs only lines that match
a given pattern, or sort, which outputs an alphabetically sorted version of its
input. The scripts you write yourself will be right at home with these standard
Unix utilities as part of your text processing pipelines.

Example: Extracting URLs


In the previous example, we used read to read all the standard input in one
go. But as we saw with files, reading everything into memory in one gulp often
isn’t the best idea, especially when our input begins to grow in size.
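
For comparison, a streaming take on the uppercase converter might look like the sketch below. upcase_stream is a hypothetical name, and the method takes its input and output as arguments so that it can be tried on in-memory strings; in a real script the call would be upcase_stream($stdin, $stdout).

```ruby
require "stringio"

# Hypothetical helper: converts one line at a time, so memory use stays
# flat however much input arrives.
def upcase_stream(input, output)
  input.each_line { |line| output.print line.upcase }
end

# Demonstration with in-memory stand-ins for standard input and output.
demo_output = StringIO.new
upcase_stream(StringIO.new("hello world\nsecond line\n"), demo_output)
puts demo_output.string
```

Using print rather than puts passes each line through with exactly the line ending it arrived with.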

It was also annoying in the previous example that we had to type
ruby to-uppercase.rb. Other commands are short and snappy (cut, grep), but we
had to type what feels like a lot of superfluous information.

For our next example, we’re going to write a script that extracts URLs from
the input passed to it, outputting any that it finds and ignoring the rest of
the input. So, if we passed it the following text:
Alice's website is at http://www.example.com
While Jane's website is at https://example.net and contains a useful blog.

we’d expect to have these URLs extracted from it:


http://www.example.com
https://example.net

This script will be called urls, and once we’ve written it we’ll be able to use it
in any pipeline we like. Because it will treat its input as a stream, we’ll be
able to use it on whatever input we like, no matter how large it is. So we’ll be
able to extract the URLs from a text file:
$ cat file.txt | urls

Or extract the URLs from within a web page:


$ curl http://example.com | urls

Let’s take a look at what we need to do to write the urls script.

The Shebang
Up until now we’ve only run our Ruby scripts by telling the Ruby interpreter
the name of the file to execute. But when we’re using ordinary Unix commands,
such as grep or uniq, we just specify them as commands in their own right.
Ideally, we want to be able to do the same with our URL extractor. It would
be annoying if we had to type ruby urls.rb or something similar each time we
wanted to use it, especially if we’re going to be using it a lot.

But if we just called our script urls, how would our shell know that it was a
Ruby script and know to pass its contents to Ruby to execute? The answer
is, because we tell it to, and we tell it using a special line at the top of our
script called the shebang. In this case, we’d use:

#!/usr/bin/env ruby

The special part is the #!—it’s this that gives the line its name (“hash” +
“bang”). Since the Ruby interpreter might be in different places on different
people’s computers, we use a command called env to tell the shell to use ruby,
wherever ruby might be.

The presence of this shebang allows us to save our script as a file called urls
and run it directly, rather than as ruby urls. The final step in this process is to
allow the file to be executed. We can do this with the chmod command:
$ chmod +x urls

That’s it. We can now call ./urls from within the directory our urls file resides
in, and it will execute our script as Ruby code.

If we wanted to be able to call our version from anywhere, not just from the
directory in which it’s saved, we could put it into a directory that’s within our
PATH—/usr/local/bin, for example. Many people create a directory under their
home directory—typically called bin—and put that into their path, so that they
have a place to keep all of their own scripts without losing them among the
ones that came with their system or that they’ve installed from elsewhere.

Putting the script in a directory that’s in your PATH will make it feel just like
any other text processing command and make it really easy to use wherever
you are. If you think you’ll use a particular script regularly, then don’t hesitate
to put it there. The only thing you need to do is to make sure the name of the
script doesn’t clash with an existing command that you still want to be able
to use—otherwise, you’ll run your script when you type the command, rather
than whatever command originally had that name. So don’t call it ls or mv!
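
One way to check for such a clash before settling on a name is to search the directories in PATH yourself. The following is a rough sketch: command_on_path? is a hypothetical helper, and it only looks for executable files, so shell builtins and aliases are not detected.

```ruby
# Hypothetical helper: true if an executable file called `name` exists in
# any directory listed in `path` (the PATH environment variable by default).
def command_on_path?(name, path = ENV.fetch("PATH", ""))
  path.split(File::PATH_SEPARATOR).any? do |dir|
    candidate = File.join(dir, name)
    File.file?(candidate) && File.executable?(candidate)
  end
end

if command_on_path?("urls")
  puts "A command called urls already exists on this system"
else
  puts "The name urls is free to use"
end
```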

Looping Over Lines


When writing scripts like these, we’ll often want to loop over the input we’re
passed one line at a time. That way, we don’t need to read the whole input
into memory at once, which would be less scalable and would also slow down
the other parts of our pipeline chain.

Just like the File objects we saw in the previous chapter, $stdin has an each_line
method that allows us to iterate over the lines in our input:
$stdin.each_line do |line|
  # ...
end

Wherever possible, we should try to treat standard input as a stream. If our
script is used to process a large amount of data, this stream processing will
mean that we can pass our output along to the next stage in the process as
and when we process it. If our script is the last stage in the pipeline, that
means the user sees output more quickly; and if we’re earlier in the pipeline,
then it means the next part of the pipeline can be doing its processing while
we’re working on our next chunk.
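
To make the difference concrete, here is a small sketch contrasting the two approaches. Both method names are invented for this example, and StringIO stands in for real standard input and output so that the code can be run anywhere.

```ruby
require "stringio"

# Streaming: handles one line at a time, so memory use stays flat and each
# result is written as soon as it is ready.
def shout_streaming(input, output)
  input.each_line { |line| output.puts line.upcase }
end

# Slurping: reads everything first, so nothing is written until the input
# has ended. Shown only for contrast.
def shout_slurping(input, output)
  output.puts input.read.upcase
end

out = StringIO.new
shout_streaming(StringIO.new("foo\nbar\n"), out)
print out.string
```

Both methods produce the same text; what differs is when the output becomes available and how much input has to be held in memory at once.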

The Logic
Unlike our to-uppercase.rb example, we’re not actually interested in printing
each line of input, even in a modified form. Instead we want to extract any URLs
we find in it and then output those. To do that, we’ll use a regular expression.
We’ll be covering these in depth in Chapter 8, Regular Expressions Basics,
on page 103, so don’t worry too much about them now:
urls
#!/usr/bin/env ruby

$stdin.each_line do |line|
  urls = line.scan(%r{https?://\S+})
  urls.each do |url|
    puts url
  end
end

Here we use String’s scan method to extract everything that looks like a URL.
Then, we loop over them—after all, there might be multiple URLs in a single
line—and output each one of them.
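
It can help to see what scan returns on its own. The sketch below wraps the same pattern in a helper (extract_urls and URL_PATTERN are names chosen for this illustration) and tries it on a line with no URLs and a line with two.

```ruby
# The same pattern the urls script uses: "http" or "https", then "://",
# then everything up to the next whitespace character.
URL_PATTERN = %r{https?://\S+}

# Returns every substring of `line` that looks like a URL.
# (extract_urls is a hypothetical helper name.)
def extract_urls(line)
  line.scan(URL_PATTERN)
end

p extract_urls("no links here")
p extract_urls("see http://www.example.com/ and https://example.net/docs, thanks")
```

The first call returns an empty array, so the inner loop in our script simply does nothing for that line. In the second, the comma after docs is returned as part of the URL, because the pattern stops only at whitespace; that is a reasonable trade-off for a quick tool, but worth knowing about.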

Running the Script


If we invoke our script as follows, assuming that we’ve put it in our PATH, we’ll
see the output we’re expecting:
$ printf "hello\nworld http://www.example.com/\nhttps://example.net/" | urls
http://www.example.com/
https://example.net/

We now have the general-purpose URL-matcher that we were after, and it
took only a few lines of Ruby code! That’s great. We can now find all the URLs
in a file:
$ cat some-file.txt | urls
http://www.example.org.uk
https://example.co.uk
[snip]


Or all the URLs in a log file from our web server:


$ cat /var/log/webserver/access | urls
https://example.com/about-us
http://www.example.com/contact
https://example.com/about-us
[snip]

Of course, we’re not limited to having our script be the final stage in the
pipeline. We could use it as an intermediary step—for example, to fetch a web
page, extract the URLs from it, and then download each of those URLs:
$ curl http://example.com | urls | xargs wget

Hopefully you can imagine many scenarios where having such a script and
other tools like it would come in handy. Before long, if you’re anything like
me, you’ll have built up quite the collection of them, each in true Unix fashion
built to do one thing—but to do it well.

Concurrency and Buffering


When thinking about pipeline chains, you could be forgiven for thinking that
they’re executed in sequence; that is, that the first command generates all of
its output, then the second command takes that input and outputs whatever
it needs to, and then after that the third process does its bit, and so on.

In reality, though, that’s not the case. All of the programs in the pipeline chain
run simultaneously, and data flows between them bit by bit—just like water
through a real pipe. While the second process is working with the first chunk
of information, the first process is generating another chunk; by the time the
first chunk is through to the third or fourth process in the pipeline, the first
process may be onto the third, tenth, or hundredth chunk.

The amazing thing about this concurrency is that the processes themselves
need know nothing about it. It’s all taken care of by the operating system and
the shell, leaving the individual process to worry only about fetching input
and producing output.

We can prove this concurrency by typing the following into our command
line:
$ sleep 5 | echo "hello, world"
hello, world

If the tasks were executed in series, we’d see nothing for five seconds, and
only then would hello, world appear on our screen. But instead, because the
echo command starts at the same time as sleep, we see the output immediately.
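
The same experiment can be timed from Ruby. This is only a sketch: first_output_after is a made-up helper, and it assumes a Unix-like system where IO.popen hands the command string to a shell that provides sleep and echo.

```ruby
# Runs `command` through the shell and returns the first line it prints
# together with the number of seconds we waited for that line.
# (first_output_after is a hypothetical helper name.)
def first_output_after(command)
  started = Process.clock_gettime(Process::CLOCK_MONOTONIC)
  IO.popen(command) do |pipeline|
    line = pipeline.gets
    waited = Process.clock_gettime(Process::CLOCK_MONOTONIC) - started
    return [line, waited]
  end
end

line, waited = first_output_after('sleep 2 | echo "hello, world"')
puts "#{line.chomp} arrived after #{waited.round(2)}s"
```

The line should arrive almost at once, even though the pipeline as a whole does not finish until sleep does. The method itself still takes about two seconds to return, because closing the pipe waits for the shell to exit.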


When we request more data from standard input—when calling $stdin.gets, for
example—Ruby will do one of two things. If it has input available in its buffer,
it will pass it on immediately. If it doesn’t, though, it will block, waiting until
the process before it in the pipeline has generated enough output.

What constitutes “enough output” is up to the operating system, but the
upshot is that input will be passed to your script in chunks; on a Linux
system, for example, those chunks will be 65,536 bytes in size. If the process
before you generates 65,535 bytes and then waits ten seconds before
generating some more output, then your process will wait ten seconds before
receiving any input at all.

This can be frustrating when the input you’re receiving is in many small
chunks, especially if those small chunks are slow to generate. One example
is the find command, which searches the filesystem for files matching given
conditions. It might generate hundreds of filenames per second, or it might
generate one per minute, depending on how many files you’re searching
through and how many of them match your conditions.

If we pipe the result of a find into this script, it will be a long time before the
script actually receives any input, and because this buffering happens at the
output stage, not the input stage, there’s nothing we can do about it. Our
supposedly concurrent pipeline sometimes doesn’t behave concurrently at
all.

While we have no control over the behavior of other programs, if we’re writing
programs ourselves that generate slow output like find does, then we can
remove this buffering by telling our standard output stream to behave
synchronously. To illustrate the change, here’s a script that uses the default
behavior and therefore has its output buffered:
stdout-async.rb
100.times do
  "hello world".each_char do |c|
    print c
    sleep 0.1
  end
  print "\n"
end

If we run this script and pipe the output into cat:


$ ruby stdout-async.rb | cat

then we’ll see the problem: nothing happens for a very, very long time. Because
we’re outputting a character only every 0.1 seconds, it would take us 410
seconds to fill up a 4,096-byte buffer and a staggering two hours to fill up a
65,536-byte buffer. Our script writes only 1,200 bytes in total, over roughly
110 seconds, so we see nothing until the program ends.

The synchronous version of the script avoids this problem:


stdout-sync.rb
$stdout.sync = true

100.times do
  "hello world".each_char do |c|
    print c
    sleep 0.1
  end
  print "\n"
end

Here we set $stdout.sync to true, telling our standard output stream not to buffer
but instead to flush constantly. If we pipe the output from this script into cat,
we’ll see a character appear every 0.1 seconds. Although the script will take
the same amount of time in total to execute, the next program in the pipeline
will have the chance to work with the output immediately, potentially reducing
the overall time the pipeline takes.
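
Setting sync flushes after every write. If that is more than we need, one alternative (not covered in the text above, so treat it as a suggestion rather than the book’s approach) is to leave buffering on and call flush ourselves at the points where a complete unit of output is ready. The helper name emit_promptly is invented for this sketch.

```ruby
require "stringio"

# Writes each line and flushes straight away, so a downstream program can
# read it without waiting for Ruby's output buffer to fill.
# (emit_promptly is a hypothetical helper name.)
def emit_promptly(lines, output = $stdout)
  lines.each do |line|
    output.puts line
    output.flush
  end
end

emit_promptly(["hello world", "goodbye world"])
```

Flushing once per line keeps the pipeline responsive while still letting Ruby batch the individual writes that make up a line.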

Wrapping Up
We’ve now looked at how to use standard input to obtain input from users’
keyboards, how to redirect the output of other programs into our own, and
how powerful text processing pipelines can be. We saw the value of small
tools that perform a single task and how they can be composed in
different ways to perform complex text processing tasks. We learned how to
write scripts that can be directly executed and that can process standard
input as a stream and so can work with large quantities of input.

So far, we’ve used standard input from the perspective of scripts: simple ones
at times, but scripts nevertheless. Sometimes, though, we don’t want to have
to go to the trouble of writing a full-fledged script to process some data.
Wouldn’t it be nice to have the flexibility of Ruby in throwaway one-liners
that we write in our shell? The next chapter looks at how we can do just that.



CHAPTER 3

Shell One-Liners
We’ve looked at processing text in Ruby scripts, but there exists a stage of
text processing in which writing full-blown scripts isn’t the correct approach.
It might be because the problem you’re trying to solve is temporary, where
you don’t want the solution hanging around. It might be that the problem is
particularly lightweight or simple, unworthy of being committed to a file. Or
it might be that you’re in the early stages of formulating a solution and are
just trying to explore things for now.

In such cases, it would be advantageous to be able to process text from the
command line, without having to go to the trouble of committing your thoughts
to a file. This would allow you to quickly throw together text processing
pipelines and scratch whatever particular itch you have, either solving
the problem directly or forming the foundation of a future, more solid solution.

Such processing pipelines will inevitably make use of standard Unix utilities,
such as cat, grep, cut, and so on. In fact, those utilities might actually be
sufficient; tasks like these are, after all, what they’re designed for. But it’s
common to encounter problems that get just a little too complex for them, or that
for some reason aren’t well suited to the way they work. At times like these, it
would be nice if we could introduce Ruby into this workflow, allowing us to
perform the more complex parts of the processing in a language that’s familiar
to us.

It turns out that Ruby comes with a whole host of features that make it a
cinch to integrate it into such workflows. First, we need to discover how we
can use it to execute code from the command line. Then we can explore
different ways to process input within pipelines and some tricks for avoiding
lengthy boilerplate, something that’s very important when we’re writing
scripts as many times as we run them!


Arguments to the Ruby Interpreter


You probably learned on your first day of programming Ruby that you can
invoke Ruby from the command line by passing it the filename of a script to
run:
$ ruby foo.rb

This will execute the code found in foo.rb, but otherwise it won’t do anything
too special. If you’ve ever written Ruby on the command line, you’ll definitely
have started Ruby in this way.

What you might not know is that by passing options to the ruby command,
you can alter the behavior of the interpreter. There are three key options that
will make life much easier when writing one-liners in the shell. The first is
essential, freeing you from having to store code in files; the second and third
allow you to skip a lot of boilerplate code when working with input. Let’s take
a look at each of them in turn.

Passing Code with the -e Switch


By default, the Ruby interpreter assumes that you’ll pass it a file that contains
code. This file can contain references to other files (require and load statements,
for example), but Ruby expects us to pass it a single file in which execution
will begin.

When it comes to using Ruby in the shell, this is hugely limiting. We don’t
want to have to store code in files; we want to be able to compose it on the
command line as we go.

By using the -e flag when invoking Ruby, we can execute code that we pass
in directly on the command line, removing the need to commit our script to
a file on disk. (It might be helpful to remember -e as standing for evaluate,
because Ruby evaluates the code contained in this option.)
The universal “hello world” example, then, would be as follows:
$ ruby -e 'puts "Hello world"'
Hello world

Any code that we could write in a script file can be passed on the command
line in this way. We could, though it wouldn’t be much fun, define classes
and methods, require libraries, and generally write a full-blown script, but
in all likelihood we’ll limit our code to relatively short snippets that just do a
couple of things. Indeed, this desire to keep things short will lead to making
choices that favor terseness over even readability, which isn’t usually the
choice we make when writing scripts.

This is the first step toward being able to use Ruby in an ad hoc pipeline: it
frees us from having to write our scripts to the filesystem. The second step
is to be able to read from input. After all, if we want our script to be able to
behave as part of a pipeline, as we saw in the previous chapter, then it needs
to be able to read from standard input.

The obvious solution might be to read from STDIN in the code that we pass in
to Ruby, looping over it line by line as we did in the previous chapter:
$ printf "foo\nbar\n" | ruby -e 'STDIN.each { |line| puts line.upcase }'
FOO
BAR

But this is a bit clunky. Considering how often we’ll want to process input
line by line, it would be much nicer if we didn’t have to write this tedious
boilerplate every time. Luckily, we don’t. Ruby offers a shortcut for just this
use case.

Streaming Lines with the -n Switch


If we pass Ruby the -n switch as well as -e, Ruby will act as though the code
we pass to it was wrapped in the following:
while gets
  # execute code passed in -e here
end

This means that the code we pass in the -e argument is executed once for
each line in our input. The content of the line is stored in the $_ variable. This
is one of Ruby’s many global variables, sometimes referred to as cryptic globals,
and it always points to the last line that was read by gets.
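
To see the $_ mechanism without the -n switch doing the work for us, we can write the loop out by hand. In this sketch the method name upcase_like_dash_n is invented, and it calls gets on an explicit input object rather than the bare gets that -n uses, purely so that the example can be run against a StringIO.

```ruby
require "stringio"

# A hand-written version of the loop that -n wraps around our code.
# (upcase_like_dash_n is a hypothetical name for this illustration.)
def upcase_like_dash_n(input, output)
  while input.gets          # gets stores the line it just read in $_
    output.puts $_.upcase   # ...so we never need to name the line ourselves
  end
end

out = StringIO.new
upcase_like_dash_n(StringIO.new("foo\nbar\n"), out)
print out.string
```

The loop ends when gets returns nil at the end of the input, which is also how the -n loop knows when to stop.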

So instead of writing the clunky looping example that we saw earlier:


$ printf "foo\nbar\n" | ruby -e 'STDIN.each { |line| puts line.upcase }'
FOO
BAR

we can simply write:


$ printf "foo\nbar\n" | ruby -ne 'puts $_.upcase'
FOO
BAR

There’s more to $_ than this, though. Ruby also defines some global methods
that either act on $_ or have it as a default argument. print is one of them:
if you call it with no arguments, it will output the value of $_. So we can
output the input that we receive with this short script:
$ printf "foo\nbar\n" | ruby -ne 'print'
foo
bar

This implicit behavior is particularly useful for filtering down the input to
only those lines that match a certain condition—only those that start with f,
for example:
$ printf "foo\nbar\n" | ruby -ne 'print if $_.start_with? "f"'
foo

This kind of conditional output can be made even more terse with another
shortcut. As well as print, regular expressions also operate implicitly on $_:
a regular expression literal used on its own as a condition is matched against
it. We’ll be covering regular expressions in depth in Chapter 8, Regular
Expressions Basics, on page 103, but if in the previous example we changed our
start_with? call to use a regular expression instead, it would read:

$ printf "foo\nbar\n" | ruby -ne 'print if /^f/'
foo

This one-liner is brief almost to the point of being magical; the subjects of
the print statement and the if are both completely implicit. But one-liners like
this are optimized more for typing speed than for clarity, and so tricks like
this, which have a subtlety that might be frowned upon in more permanent
scripts, are a boon.
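
For comparison, this is roughly what the same filter looks like when nothing is left implicit, as we might write it in a more permanent script. The method name filter_lines is an assumption made for the sketch, and StringIO again stands in for real input and output.

```ruby
require "stringio"

# Copies to `output` only the lines of `input` that match `pattern`.
# (filter_lines is a hypothetical name, not something Ruby provides.)
def filter_lines(input, output, pattern)
  input.each_line do |line|
    output.print line if line =~ pattern
  end
end

out = StringIO.new
filter_lines(StringIO.new("foo\nbar\n"), out, /^f/)
print out.string
```

Every piece the one-liner hides is visible here: the loop, the name of the line, and the thing being matched and printed.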

There are also shortcut methods for manipulating input. If we invoke Ruby
with either the -n or -p flag, Ruby creates two global methods for us: sub and
gsub. These act just like their ordinary string counterparts, but they operate
on $_ implicitly.

This means we can perform search and replace operations on our lines of
input in a really simple way. For example, to replace all instances of COBOL
with Ruby:
$ echo 'COBOL is the best!' | ruby -ne 'print gsub("COBOL", "Ruby")'
Ruby is the best!

We didn’t need to call $_.gsub, as you might expect, since the gsub method
operates on $_ automatically. This is a really handy shortcut.
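
Since the global sub and gsub behave like their String counterparts, the difference between the two is easiest to see on an ordinary string. The two helper names below are invented for this sketch.

```ruby
# sub replaces only the first match; gsub replaces every match.
# (replace_first and replace_all are hypothetical helper names.)
def replace_first(line)
  line.sub("COBOL", "Ruby")
end

def replace_all(line)
  line.gsub("COBOL", "Ruby")
end

line = "COBOL here, COBOL there\n"
print replace_first(line)
print replace_all(line)
```

The first call changes only the first COBOL on the line, while the second changes both, so gsub is usually the one we want when a pattern can occur more than once per line.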

Howell, James. Familiar letters of James Howell;
with an introd. by Agnes Repplier. 2v. $6; Special
ltd. ed. 4v. *$15. Houghton.
7–15871.

An attractive new edition of letters which “speak for


themselves, and surely no reader will pine for erudite guidance
through the maze of curious anecdote, lively narrative, and
characteristically intimate comment and reflection which Howell
has constructed, writing always crisply and lucidly, in accordance
with his belief that a letter should be ‘short-coated and closely
couch’d’ and should ‘not preach but epistolize.’” (Dial.)

“The letters themselves ... possess all the charm and gossipy
interest of their time that the letters of Horace Walpole contained
a century later.” Laurence Burnham.
+ Bookm. 26: 101. S. ’07. 360w. (Review of 4 v. ed.)
+

+ Dial. 43: 214. O. 1, ’07. 430w. (Review of 2 v. ed.)


+
“In her pleasant way Miss Repplier brings out, by incident and
characterization, the qualities which have made his letters the
constant reading of lovers of literature since they first appeared.”
+ Outlook. 87: 357. O. 19, ’07. 280w. (Review of 2 v. ed.)
+
“It is a book that seems as fresh to-day as when it was written
nearly three centuries ago, and, though it may never be popular,
it will always be valued by the discriminating few.” Charlotte
Harwood.
+ Putnam’s. 2: 446. Jl. ’07. 700w. (Review of 4 v. ed.)
+
“The wide careless world will pay little attention to these
volumes, but they will have their own sure welcome.” H. W.
Boynton.
+ Putnam’s. 3: 233. N. ’07. 830w. (Review of 2 v. ed.)
+

Howells, William Dean. Between the dark and the


daylight. †$1.50. Harper.
7–34775.

Of the seven tales told by old friends at the club four are
psychological romances, stories of that mental borderland
suggested by the book’s title. “A sleep and a forgetting” tells of a
strange lapse of memory in a young girl; “The eidolons of Brooks
Alford” concerns the visions of a broken down professor and the
pretty widow who disperses them; “A memory that worked over
time” is a confusion of memory and imagination; and “A case of
metaphantasmia” enters into the question of dream-
transference. The three stories which conclude the book,
“Editha,” “Braybridge’s offer,” and “The chick of the Easter egg”
are plain day-light stories, a protest against war, a speculation as
to the average proposal, and an amusing Easter comedy.

Reviewed by A. Schade van Westrum.


+ Bookm. 26: 275. N. ’07. 1000w.
“They are queer and creepy without being exactly
supernatural.”
+ Ind. 63: 1377. D. 5, ’07. 150w.
“The stories are graceful social pictures written with charm
and humor.”
+ N. Y. Times. 12: 652. O. 19, ’07. 30w.
“We can only congratulate ourselves that he does not sit
before his fire enjoying it all to himself, as he might be tempted
to do.”
+ Outlook. 87: 624. N. 23, ’07. 190w.
+
“All the stories are full of delightful reading. They would not be
Mr. Howells’s if they were not.”
+ Spec. 99: 717. N. 9, ’07. 210w.
+

Howells, William Dean. Certain delightful English


towns, with glimpses of the pleasant country
between. **$3. Harper.
6–38895.

Descriptive note in Annual, 1906.


“It is only a Stevenson or a Howells who could achieve
fascination for [this task]. But Mr. Howells is triumphantly
successful. The American humor, which has always been
attuned, in Mr. Howells, to a delicate strain, becomes tender
whimsicality. We know no one who writes more beautifully in
modern English.”
+ Ath. 1907. 1: 435. Ap. 13. 1040w.
+
“How dare we use anything so rough and rude as the
downright word praise of anything so delicate?”
+ Lond. Times. 6: 100. Mr. 29, ’07. 1590w.
+

+ Spec. 98: 450. Mr. 23, ’07. 1560w.


Howells, William Dean. Through the eye of the
needle. †$1.50. Harper.
7–15545.

Part 1 of this sociological story contains a view of modern New


York as seen by a traveler from Altruria. The tall, bleak
apartment houses, the social distinctions, and the greed for gain
impress him so strongly that he says at the very outset,—“If I
spoke with Altrurian breath of the way New Yorkers live, I should
begin by saying that the New Yorkers do not live at all.” Part 2
contains an account of Altruria as seen by the American wife
whom he takes home with him, and who has a difficult time
adjusting her American ideas to a country which has neither
money nor social gradations, and, where lord and farmer work
happily for their living, side by side.

“Done in the author’s usual delightful manner.”


+ A. L. A. Bkl. 3: 135. My. ’07.
+
“Unhappily, these sociological criticisms are not conveyed in an
interesting form of fiction. We cannot be absorbed in Mr.
Homos’s love affair with an attractive American widow, and we
are thrown back for diversion on his strictures on American
conditions.”
+ Ath. 1907, 1: 786. Je. 29. 250w.

“He is writing, not a thesis on the future economics of the
world at large, but a kindly satire, a sort of twentieth century
parable.” Frederic Taber Cooper.
+ Bookm. 25: 394. Je. ’07. 270w.
Reviewed by A. Schade van Westrum.
Bookm. 25: 434. Je. ’07. 1230w.
+ Ind. 62: 1207. My. 23, ’07. 670w.
“In this novel, dealing with a theme peculiarly congenial to
him, we have an example of Mr. Howells’s style arrived at its
perihelion.”
+ Lit. D. 34: 885. Je. 1, ’07. 330w.
+
“We should rather be thankful for a piece of very grateful
fancy, and not the least for a deft and witty introduction which is
an almost faultless little piece of irony.”
+ Lond. Times. 6: 165. My. 24, ’07. 530w.
“The account of these plutocrats endeavoring to maintain the
forms of an obsolete social order verges perilously upon comic
opera. This, however, is of small consequence, the point of
interest being that with Mr. Howells’s deep love of humanity as
he finds it, the apostle of realism in American fiction should care
to spend (almost waste) his precious gifts upon such a toy of the
imagination as the island of Altruria.”
+ Nation. 84: 134. My. 9, ’07. 690w.

N. Y. Times. 12: 255. Ap. 20, ’07. 170w.


“Certain it is that whatever be our attitude toward socialism, or
our opinion of what we may presume to be Mr. Howells’s own
theories, we must needs enjoy the exquisite literary flavor of
these letters to and from Altruria, and can hardly fail to be lifted
to a higher plane by the author’s own sincere enthusiasm of
humanity and widely inclusive sympathies.” M. Gordon Pryor
Rice.
+ N. Y. Times. 12: 297. My. 11, ’07. 3370w.
“Mr. Howells has written in his characteristic whimsical vein.”
+ N. Y. Times. 12: 581. Je. 15. ’07. 210w.
“Mr. Howells writes, not as a reformer with a grievance, but
simply as a lover of his kind, perturbed over current errors but
too wise to let them warp his judgment.” Royal Cortissoz.
+ No. Am. 186: 127. S. ’07. 650w.
+

+ Outlook. 86: 339. Je. 15, ’07. 400w.


+
“Somehow, it leaves the reader not half so kindly disposed
toward his fellow-men, not half so eager to make this a better
world, as he was after reading ‘Lemuel Barker’ or ‘Silas
Lapham.’” Vernon Atwood.
+ Putnam’s. 2: 619. Ag. ’07. 290w.
+

“It embodies much cogent criticism of every important phase
of American life.”
+ R. of Rs. 35: 761. Je. ’07. 80w.
“Mr. Howells is always welcome in whatever guise his message
comes, and a special interest attaches to his new romance, since
it exhibits his distinguished talent in an unfamiliar light.”
+ Spec. 98: 836. My. 25, ’07. 840w.
+

Hoy, Mary Lavinia Thompson (Mrs. Frank L.


Hoy). Adrienne. $1.50. Neale.
6–46252.

A southern story of Civil war days in which the fair play-day


world is transformed for a group of irresponsible Southern girls
into a dreary world of waiting and anxiety.

Hoyt, William Henry. Mecklenburg declaration of


independence. **$2.50. Putnam.
7–15929.

A study of evidence that the alleged early declaration of


independence of Mecklenburg county, North Carolina, on May 20,
1775, is spurious.

“The last page leaves the reader as helpless as the first, in


ability to separate hearsay from evidence. But the book is
valuable as a history of a controversy.”
+ Dial. 43: 123. S. 1, ’07. 400w.

“The book offers a very good example of an historical
investigation, conducted in a judicial spirit, and carries conviction
with its conclusions. The illustrations are excellent, but nothing
can excuse the absence of an index.”
+ Nation. 85: 187. Ag. 29. ’07. 540w.
+

R. of Rs. 36: 128. Jl. ’07. 120w.

* Hubbard, Elbert (Fra Elbertus, pseud.). Little


journeys to the homes of eminent orators. (Little
journeys, new ser.) $2.50. Putnam.
7–36125.

An unusual aggregation of orators is presented here. The


group includes Pericles, Mark Antony, Savonarola, Marat,
Ingersoll, Patrick Henry, Starr King, Henry Ward Beecher and
Wendell Phillips.
“It is an incongruous array in time, character, and purpose, but
the author brings out strongly their common characteristics.”
+ N. Y. Times. 12: 711. N. 9, ’07. 100w.
“The book has real interest, especially to that curious boy, or
man, who ‘wants to know.’”
+ Outlook. 87: 618. N. 23, ’07. 60w.

Hubbard, Frank McKinney. Abe Martin, of Brown


county, Indiana. il. **$1. Bobbs.
7–15475.

Mr. Meredith Nicholson characterizes Abe Martin as a “Plato on


a cracker barrel; or radiant Socrates after Xantippe’s departure to
visit her own folks in Tecumseh township.” Cartoons of Abe’s
neighbors who are characterized in epigram appear,
accompanied by brief bibliographical bits. Then follow the “mirth-
provoking epigrams” themselves, which do justice to an Artemus
Ward.

Hubbard, George H. Teachings of Jesus in


parables. *$1.50. Pilgrim press.
7–16710.

“Mr. Hubbard recognizes the fact that the parables of Jesus


were addressed to plain people.... He abstains from dogmatizing
and from critical exegesis, and gives a free homiletical exposition
of what he sees as the central truth of the short story.”—Outlook.

“These popular and interesting expositions of the parables


reveal clear religious insight, practical common-sense, and no
small degree of literary skill.”
+ Bib. World. 30: 79. Jl. ’07. 20w.
“Fresh thoughts in new points of view make this volume a
helpful addition to the abundant literature of its subject. Those
who have read any number of works upon the gospel parables
find need to supplement or correct one author by another, and
this volume, though excellent, occasions no exception to that
experience.”
+ Outlook. 86: 835. Ag. 17, ’07. 140w.

Hubbard, Winfred D., and Kiersted, Wynkoop.


Water-works management and maintenance. $4.
Wiley.
7–21739.

“Part 1., which fills 217 out of a total of 419 pages, deals with
the securing of water supplies from various sources, and the
selection and installation of pumps; Part 2, 167 pages, discusses
more particularly the various features of management and
maintenance, but also necessarily contains much that relates to
construction work; and Part 3, 35 pages, treats from various
points of view the subjects of franchise, water rates and
depreciation.”—Engin. N.

A. L. A. Bkl. 3: 167. O. ’07.


“The title of this important book is somewhat misleading, as
less than half the volume is devoted to the management and
maintenance of water-works. Along with a reproduction of many
facts already well known to every competent water-works man,
and many citations from papers which have already been
frequently published, there are a great many useful and practical
suggestions nearly all of which are in the line of good modern
practice. All of these make the work a valuable addition to water-
works literature.” Dabney H. Maury.
+ Engin. N. 58: 294. S. 12, ’07. 1720w.
+

+ Nature. 76: 517. S. 19, ’07. 340w.

Huber, John Bessner. Consumption. **$3.


Lippincott.
6–17682.

Descriptive note in Annual, 1906.


“This work, though burdened by a too ambitious title, is really
a very valuable compilation of the facts of the present day anti-
tuberculosis campaign in this and other countries.” Christopher
Easton.
+ Charities. 17: 493. D. 15, ’06. 980w.
+

Huchon, Rene. George Crabbe and his times,


1754–1832: a critical and biographical study; tr.
from the French by Frederick Clarke. *$5. Dutton.
W 7–149.

With less of narrative and more of criticism, M. Huchon aims


to write “a psychological biography of the poet, with a view to
the interpretation of his works.”
“The picture he presents of the young Crabbe is clear and
convincing. When in the later portion of his book he is dealing
with the actual poems he develops these tendencies at which he
has previously hinted, with great skill, so that he brings the
reader very close to the intimate side of the poet’s character.”
+ Acad. 72: 286. Mr. 23, ’07. 1360w.
+
“As a biographer M. Huchon is full, clear, and precise, rivalling
the late James Dykes Campbell in his zest for research and
verification.”
+ Ath. 1907, 1: 407. Ap. 6. 2090w.
+
“At times the narrative is too discursive ... but on the whole it
is a just and clear biography, with sympathetic interpretation.”
Annie Russell Marble.
+ Dial. 43: 39. Jl. 16, ’07. 1290w.
+

“To speak frankly, a book that proposes to introduce an
English poet to the French, and yet in some 700 pages scarcely
quotes a line of his verse as he wrote it, seems to us an
absurdity. The truth is that it has gone a long way to spoil an
admirable book. It is an injustice to the French reader; to the
English reader it is a constant annoyance. And yet the book,
even as it is, deserves to have plenty of English readers.”
+ Lond. Times. 6: 193. Je. 21. ’07. 1370w.

“Its abundance of literary judgment is presented rather in
dispersion than compactness, for the purpose of elucidating the
biographical theses; and the complete proportion and harmony
preserved throughout may well be considered the crowning
achievement of the work.”
+ Nation. 84: 476. My. 23, ’07. 1170w.
+
“Though the French scholar may have prepared a better
biography than the younger Crabbe’s, time will have to judge
whether he has written a better book.” H. W. Boynton.
+ N. Y. Times. 12: 491. Ag. 10, ’07. 1790w.
“Is distinctly original and unconventional.”
+ Outlook. 87: 41. S. 7, ’07. 1720w.
“Of M. Huchon’s volume (not at all badly translated by Mr.
Clarke) we may say, in one word, that it is the work of an expert.
If only as a piece of social history the work is full of value. Our
main praise, however, we reserve for the judgment and taste
with which M. Huchon has made his quotations.”
+ Sat. R. 103: 462. Ap. 13, ’07. 1020w.
+

* Huck, A. Synopsis of the first three Gospels


arranged for English readers; ed. by Ross L.
Finney. *$1. Meth. bk.
An English version of Huck’s “Synopse,” a Greek harmony used
widely in Germany as an aid to Holtzmann’s “Hand-commentar.”
“The present volume exhibits Mark as the basal work of the
evangelic records, the use of Mark by both Matthew and Luke,
the collection of Logia, and the material peculiar to each
evangelist. The use of this harmony does not blind the student
to the special characteristics of the several evangelists and their
relations of mutual dependence, as is often the case with the
older manuals.”

“The work is faithfully done, but it is based on Huck’s second
edition in 1898. This is most unfortunate, as in his recent third
edition, 1906, Huck has fundamentally remodeled his work,
greatly improving and enriching it.”
+ Bib. World. 30: 480. D. ’07. 80w.

“This is decidedly the best harmony for historical study, and its
wide use would promote greatly the knowledge of the New
Testament.”
+ Ind. 63: 1314. N. 28, ’07. 190w.
“This harmony, which follows the order of Mark, is the most
useful in existence for historical students.”
+ + Nation. 85: 398. O. 31, ’07. 140w.

Huckel, Oliver. Modern study of conscience. (Boardman lectureship
in Christian ethics.) 50c. Univ. of Pa.
7–13922.

The study looks into the origin and nature of conscience, its
means of education and enlightenment, and finally considers the
grounds for the present and perpetual authority of conscience.

Hudson, Charles Bradford. Crimson conquest: a romance of Pizarro
and Peru. il. †$1.50. McClurg.
7–32156.

A story of aboriginal America. The events fall in the period of
Pizarro’s conquest of the Peruvian chief and his determined
hosts. The hero, Viracocha Christoval, is one of the bravest of
the Castilian knights and the heroine is an Inca princess for love
of whom Christoval fights against his own army. Barbaric
splendor and Spanish chivalry combine in producing splendid
dramatic coloring.

N. Y. Times. 12: 656. O. 19, ’07. 20w.


“There is not a bit of harm in the book, except that it is very
long and strikes us as being very dull.”
− + N. Y. Times. 12: 678. O. 26, ’07. 90w.

Hudson, William Henry. Crystal age. **$1.50. Dutton.
“This is a second edition of a book published in the eighties....
One Smith of Great Britain loses consciousness through a fall
and wakes to find himself in a crystal age of organized human
beings with senses of exquisite keenness and souls of crystal
purity.... The cloud on Smith’s horizon is the strange fact that
warmer than fraternal love is unknown. The passion that he
conceives for a daughter of ‘The house’ brings him against a
blank wall of incomprehension. For the perfecting of the race it
has come about that its renewal is vouchsafed only to elect
mortals who must be fitted for their high office by a sacred
training. A cryptic catastrophe ends the story, leaving the reader
free to suppose anything.”—Nation.

Lond. Times. 5: 368. N. 2, ’06. 1060w.


“Like most stories of the impossible future it contains its
touches of the credible among the prevailing absurdities and the
occasional touch of the tiresome amid many fascinations. Unlike
most, it has the ring of genuine poetry, the zeal of the open air,
kinship with beauty of all sorts, and a relieving glint of humor.”
+ + Nation. 84: 341. Ap. 11, ’07. 400w.

+ N. Y. Times. 12: 178. Mr. 23, ’07. 230w.

Hueffer, Ford Madox. England and the English: an interpretation.
**$2. McClure.
7–19051.

The three divisions of Mr. Hueffer’s book, “The soul of
London,” “The heart of the country,” and “The spirit of the
people,” constitute a view of modern life. “Mr. Hueffer here
dedicates himself to essays in descriptive impressionism” (Ath.)
offering to the traveler in and about London almost every type to
be met with and revealing an intimate understanding of
prevailing conditions.
“The volume may be profitably read by anyone proposing a
trip to England for the introductory impressions it affords of the
people and their environment. The reader of serious purpose will
feel no little disappointment that the ‘interpretation’ is not more
interpretative. The author’s over-fondness for dissertation is a
blemish that grows more trying to the reader as he advances.”
+ Dial. 43: 255. O. 16, ’07. 370w.

“Here is an antidote to the tour of the sights which leaves an
American visitor far better informed about historical monuments
and the homes of distinguished Englishmen than any English
resident, but without any real insight into the lives and ideals of
the English of to-day. It is a pity that a volume otherwise
admirably got up should be marred by so many errors in
proofreading. Their number is inexcusable.”
+ Nation. 85: 148. Ag. 15, ’07. 400w.

“As for the success of the book in its desire to interpret for us
the spirit of England and her people, that is as it may be. But it
does give a wonderful series of pictures—a vitascope, as it were,
of life on the island, yet not a photographic one; for each picture
is tinged with the personality of the author, if it be no other than
the desire he feels that his personality shall not intrude.”
Hildegarde Hawthorne.
+ N. Y. Times. 12: 650. O. 19, ’07. 2900w.

“A voluminous ‘author’s note’ is prefixed, supplemented by one
of similar length, in which egotism and over-sophistication of
view-point and utterance contend, as, indeed, they do
throughout.”
− Outlook. 86: 746. Ag. 3, ’07. 140w.
“A rather ambitious volume which, on the whole, fairly reaches
its aim.”
+ R. of Rs. 36: 128. Jl. ’07. 100w.

Hueffer, Ford Madox. Hans Holbein the younger: a critical
monograph. *75c. Dutton.
6–1911.

Uniform with the “Popular library of art.” “A striking feature of
Mr. Hueffer’s text is his comparison of Holbein with Dürer. Both
stand between the Old World and the modern, between the old
faith and the new learning. With Dürer the old age ends; with
Holbein a new age begins.... Dürer stands for the great
imaginers who went before—the Minnesingers, the Tristan poets,
the great feudal upholders. As defining his country’s great place
in art, Holbein represented what Bach did in music—namely,
completeness and thoroughness in getting out of a preceding
epoch and in getting into our own.” (Outlook.)

“Is a model of what such a study should be.”
+ + Dial. 41: 285. N. 1, ’06. 240w.
“Authoritatively informing, sufficiently critical and admirably
well written.”
+ + Ind. 61: 818. O. 4, ’06. 50w.

N. Y. Times. 11: 329. My. 19, ’06. 240w.


“A worthy addition to that attractive series.”
+ + Outlook. 83: 670. Jl. 21, ’06. 180w.

Hugo, Victor. Novels. 8v. ea. $1.25. Crowell.
Uniform with the thin paper sets. The eight volumes included
are Les Miserables, two volumes, Notre Dame, Ninety-three,
Toilers of the sea, Man who laughs, Hans of Iceland, and Bug
Jargal.

Hugo, Victor. Poems; ed. by Arthur Graves Canfield. $1. Holt.
6–43525.

A student’s edition of Hugo’s poems in handy form, containing
an introduction, biographical summary and notes.

+ Nation. 84: 387. Ap. 25, ’07. 130w.

Hugo, Victor Marie, vicomte. Victor Hugo’s intellectual
autobiography; tr. with an introd. by Lorenzo O’Rourke. **$1.20. Funk.
7–21356.

A translation of “what will hereafter be regarded as Victor
Hugo’s ultimate Confession of faith. The volume dates from the
period of the great romanticist’s exile in the English island of
Guernsey, to which he fled when Napoleon III. usurped the
throne of France. It is composed of a group of rhapsodies on
such themes as ‘Genius’, ‘Life and death’, ‘Reveries on God’, in
which the most versatile of nineteenth century men-of-letters
sets down his final convictions on art, on religion, and on life.”—
Ind.
“Of the sons of the nineteenth century, Victor Hugo, it seems
to us, was preëminent as a transmitter of the light.” B. O. Flower.
+ + Arena. 38: 263. S. ’07. 9000w.
“An interesting and, on the whole, a well-written volume.”
+ Ath. 1907, 2: 238. Ag. 31. 600w.

“A graceful and scholarly translation.”
+ Ind. 62: 1469. Je. 20, ’07. 610w.
“A well-written and illuminating piece of work, being not only
critical but to some extent biographical.”
+ Lit. D. 35: 131. Jl. 27, ’07. 170w.
“The effect of the volume in its English form is of a wild
medley of jerky phrases.”
− Nation. 85: 124. Ag. 8, ’07. 540w.

N. Y. Times. 12: 383. Je. 15, ’07. 100w.


“Lorenzo O’Rourke, has contrived to throw into his rendering
some of the eloquence of the Titan—more than a suggestion of
his volcanic force and white hot rush of his burning words.”
+ N. Y. Times. 12: 414. Je. 29, ’07. 1050w.
“The whole book is but a last illustration of Hugo’s
incomparable gift of phrase-making, of his self-consciousness,
his egotism, his reliance upon a superb, but purely external,
literary gift, upon a craftmanship that apparently never was in
close communion with its possessor’s essential inner self, which,
instead, always looked abroad for stimulation to the intellectual,
social or political preoccupations of the hour.” A. Schade Van
Westrum.
+ No. Am. 185: 783. Ag. 2, ’07. 1470w.

R. of Rs. 36: 636. N. ’07. 90w.


“We cannot but feel however, that Mr. O’Rourke is not always
qualified for his task.”
− Spec. 99: 170. Ag. 3, ’07. 250w.

Hulbert, Archer Butler. Ohio river; a course of empire. **$3.50.
Putnam.
6–35979.

The sixth river to be treated in the series known as “Great
waterways of America.” “The illustrations which are numerous,
are from photographs, old prints, maps, and paintings, and are a
distinct contribution to the value of the book.... The age of the
canoe, the flatboat, and the steamer, as he names the divisions
of the Ohio’s history, are each treated fully and entertainingly, in
a fashion to vivify the heroes of each period, from La Salle,
Boone, and the Clarks, to St. Clair, ‘Mad Anthony’ Wayne, and
the rest of the Indian fighters who in their turn were supplanted
by the heterogeneous multitude of pioneers.” (Dial.)

“By far the most valuable portions of the book are those which
deal with the distinctly human side of the subject—the conditions
of pioneer existence with which the emigrant had to wrestle, the
life of flatboatman and trader, the reign of outlaw and rowdy, the
intermingling of racial elements, and particularly the jealous
contact of Yankee and Virginian on the north and south banks of
the river. So far as political history is concerned, the student will
find nothing new. The book is unfortunately subject to the
limitations and defects of a hasty and somewhat scrappy
narrative.” Frederic Austin Ogg.
+ + Am. Hist. R. 12: 662. Ap. ’07. 790w.

“A useful survey, not scientific, but helpful in illustrating the
successive phases of social life on the river.”
+ A. L. A. Bkl. 3: 68. Mr. ’07.
“Mr. Hulbert brings to his work unusual qualifications, for he
unites a local interest and pride in the region of which he writes,
with a large perspective, and accuracy and perseverance in
research with picturesque and pungent style.”
+ + Dial. 41: 395. D. 1, ’06. 320w.
“Fewer extracts and more concise treatment would make for
vividness, but the book, with its excellent illustrations, shows
careful research and gives a thoro knowledge of the region with
which it deals.”
+ + Ind. 62: 100. Ja. 10, ’07. 220w.

“Comes near to being a model of what such a book ought to
be.”
+ + Ind. 63: 1233. N. 21, ’07. 140w.
“Mr. Hulbert has made what we are inclined to think is the most
intrinsically important addition yet made to the Messrs. Putnam’s
series.”
+ + Lit. D. 33: 727. N. 17, ’06. 140w.
“There is no chapter in this book which is not of historical
interest and value. But without depreciating its genuine worth, it
must be said that the treatment should have been more
systematic and complete.”
+ + Nation. 84: 60. Ja. 17, ’07. 910w.

“On the whole the author has produced a volume of great
historic value and interest.”
+ + N. Y. Times. 12: 12. Ja. 5, ’07. 2300w.

Hulbert, Archer Butler. Pilots of the republic. *$1.50. McClurg.
6–41537.

Descriptive note in Annual, 1906.


Am. Hist. R. 12: 721. Ap. ’07. 50w.

A. L. A. Bkl. 3: 69. Mr. ’07. S.


“Narrated in a pleasant popular manner.”
+ Dial. 42: 147. Mr. 1, ’07. 260w.

+ Ind. 63: 457. Ag. 22, ’07. 270w.


“The book is a direct and forceful contribution to American
history, and is well printed, as its text merits.”
+ + Outlook. 85: 526. Mr. 7, ’07. 200w.
“Mr. Hulbert’s style is attractive and in general, his
presentation of historical facts is good. One of the best chapters
of the book is that on Marcus Whitman, the hero of Oregon.”
+ R. of Rs. 35: 112. Ja. ’07. 250w.

Hulbert, Homer Beza. Passing of Korea. **$3.80. Doubleday.
6–32372.

Descriptive note in Annual, 1906.


“Exhaustive, authoritative, and readable.”
+ A. L. A. Bkl. 3: 10. Ja. ’07.
“The author has long resided in the country, and is conversant
with its language and literature. He is, we believe, the first writer
on Korea who possesses the latter indispensable qualification.”
+ + Ath. 1906, 2: 765. D. 15. 1720w.
“Certain fundamental changes which are coming about as
results of the late war in the far east are described with insight
and vigor.” Frederic Austin Ogg.
+ Dial. 43: 85. Ag. 16, ’07. 900w.
“One of the best books on Korea that has yet been written.”
+ + Sat. R. 103: 114. Ja. 26, ’07. 1440w.
“In so far as it is a picture of the social life of a backward
people, it is intensely interesting; but Mr. Hulbert is bitter when
he ventures on politics, so much so that one feels that he should
have named his book ‘The betrayal of Korea.’ He has nothing
good to say of the Japanese. Mr. Hulbert knows Korea and
Koreans thoroughly, and writes of both authoritatively and
attractively.”
+ Spec. 98: sup. 646. Ap. 27, ’07. 620w.

Huling, Caroline A. Letters of a business woman to her niece. *$1.
Fenno.
7–508.

In a series of personal letters to a young woman there is a
vast deal of sound sense which forms a general and impersonal
contribution to conduct. The writer is a woman of keen
observation and ready sympathies who has solved her problems
of business life in a great city thru experience, and from her fund
of acquired wisdom, talks freely to her niece. Matters of conduct,
morals and dress are taught with matter-of-fact allegiance to
independence and dignity.

“The advice is sensible, if trite.”


+ Ind. 62: 742. Mr. 28, ’07. 80w.

Reviewed by Hildegarde Hawthorne.
N. Y. Times. 12: 41. Ja. 26, ’07. 1220w.

Hull, Walter Henry, ed. Practical problems in banking and
currency; being a number of selected addresses delivered in recent
years by prominent bankers, financiers, and economists. **$3.50.
Macmillan.
7–17036.

The sixty addresses included in this volume cover the period
since 1900 and deal authoritatively with practical problems as
they affect actual conditions. The papers are grouped in three
sections; General banking, Banking reform and currency, and
The trust company, and they discuss these subjects in their
various subdivisions and from various points of view. The volume
is intended as a reference book in connection with studies in
banking and currency.

“The collection will be found useful to students of our
monetary situation even though few of these papers have any
such value as would make them worthy of a permanent place in
the literature of money.” L.
+ J. Pol. Econ. 15: 494. O. ’07. 390w.
N. Y. Times. 12: 296. My. 11, ’07. 60w.

N. Y. Times. 12: 300. My. 11, ’07. 560w.


“It brings together a mass of valuable information not usually
dealt with—or, at any rate, not dealt with in detail—in the
standard textbook.”
+ + Outlook. 86: 972. Ag. 31, ’07. 480w.

+ Pol. Sci. Q. 22: 560. S. ’07. 150w.


“The present volume is a valuable addition to our knowledge
and understanding of the theory of credit, and when this is said
no fuller acknowledgment of its importance can be made.”
+ Spec. 99: sup. 642. N. 2, ’07. 310w.

Hume, Martin Andrew Sharp. Through Portugal. **$2. McClure.
7–25498.

The author says that this volume is a self-prescribed penance
for his former injustice toward the most beautiful country and
the most unspoiled and courteous peasantry in Southern Europe.
So he makes amends for hitherto rating the Portuguese as a
Spaniard without any good qualities. His greatest interest centers
in such places as “Bussaco, Thomar and Leiria, of which he gives
a vivid series of impressions, picturesque, alert, and eminently
good-humoured.” (Ath.)

“His vivid description of the scenery and the people, and his
observations on art, history and archaeology make up a book of
more than usual interest and charm.”
+ A. L. A. Bkl. 3: 167. O. ’07. S.
“The easy, flowing style of the book takes one from one scene
to another without effort, and the vivid descriptions enable the
reader to ‘see without traveling.’”
+ Ann. Am. Acad. 30: 594. N. ’07. 140w.
“The book is charmingly illustrated, and abounds in engaging,
sincere enthusiasm.”
+ Ath. 1907, 1: 350. Mr. 23. 190w.
“Whatever Mr. Hume describes in and about Oporto, Bussaco,
Coimbra, Alcoboca, Cintra, Lisbon, or places of lesser note, is
done with a well-considered and creditable enthusiasm, and in
an unusually graceful style.”
+ Dial. 42: 373. Je. 16, ’07. 200w.
“It ought to be a revelation to those who know Portugal only
from a guide book, or who think of it only as an unimportant
strip of seashore to be neglected for royal Spain.”
+ Nation. 85: 236. S. 12, ’07. 490w.
“The fault we have to find with the clever sketches in colour is
that they are somewhat faint in tint and rather too much en
vignette.”
+ Sat. R. 103: 434. Ap. 6, ’07. 190w.

Hunt, Thomas Forsyth. How to choose a farm. **$1.75. Macmillan.
6–26525.

“The chief elements considered are: First, character and
topography of the soil; second, climatic conditions, including
healthfulness and water supply; third, location; fourth,
improvements. A complete and somewhat technical classification
of the soils of the United States is given, along with the crops
best adapted to them.... The subject is treated from an
economic point of view, abundant statistical data being given in
support of statements.”—Ann. Am. Acad.

“The book suffers through an attempt to cover too wide a
field. The style is ordinary. Though at times involved, it is
generally lucid. The subject is treated practically and
dispassionately. The book is valuable to persons considering the
possibility of owning or living on a farm.”
+ Ann. Am. Acad. 29: 216. Ja. ’07. 310w.

“A remarkable volume for the amount of information that has
been compressed without loss of enthusiasm and dryness of
style.”
+ Nation. 83: 467. N. 29, ’06. 270w.

Hunt, Rev. William, ed. Irish parliament, 1775; from an official
and contemporary manuscript. *$1.20. Longmans.
7–26445.

An interesting addition to the literature of Parliament. It is a
reprint of a manuscript, supposedly a confidential document,
prepared probably with the object of guiding the Irish
government in its course of bribing the Parliament. Dr. Hunt has
prefixed an introduction describing the regime of the time.

“The volume adds less than might be expected from a
document introduced by Dr. Hunt.” C. Litton Falkiner.
+ Eng. Hist. R. 22: 811. O. ’07. 770w.
“As a collection of character-portraits by a keen, if prejudiced
critic, the black list of Sir John Blaquiere presents a very curious
study.”
+ Lond. Times. 6: 116. Ap. 12, ’07. 1950w.
“Had the manuscript been put forward quite alone it would
have told its own sordid story, and more graphically than any
monograph on the Irish parliament that now exists it would have
exemplified the character of the institution that disappeared at
the Union of 1800.”
+ Nation. 85: 78. Jl. 25, ’07. 1600w.
“The book adds nothing of the substance to what is already
known of the state of politics or of political morality in the period
immediately preceding Grattan’s. Though Mr. Hunt’s essay
exhibits the acumen and judgment which are characteristic of all
his work, it supplies nothing of importance which cannot be as
readily found in familiar authorities.”
+ Sat. R. 104: 368. S. 21, ’07. 660w.

Spec. 98: 544. Ap. 6, ’07. 100w.

Hunt, Rev. William, and Poole, Reginald Lane, eds. Political
history of England. 12v. ea. *$2.60. Longmans.
Descriptive note in December, 1905.
“We must confess that Mr. Fisher’s portrait of Henry VII. is not
satisfactory.”
+ Acad. 72: 159. F. 16, ’07. 1310w. (Review of v. 5.)

“We leave his book convinced of its very great historical, and
we might add literary value.”
+ + Acad. 72: 247. Mr. 9, ’07. 2270w. (Review of v. 4.)