Clojure Guides: Strings
This cookbook covers working with strings in Clojure using built-in functions, standard and contrib
libraries, and parts of the JDK via interoperability.
This work is licensed under a Creative Commons Attribution 3.0 Unported License
(https://creativecommons.org/licenses/by/3.0/) (including images & stylesheets). The source is available
on GitHub (https://github.com/clojure-doc/clojure-doc.github.io).
Overview
Strings are plain Java strings
(https://docs.oracle.com/en/java/javase/17/docs/api/java.base/java/lang/String.html). You can use
anything that operates on them.
Java strings are immutable, so they're convenient to use in Clojure.
You can't add metadata to Java strings.
Clojure supports some convenient notations:
"foo"   java.lang.String
#"\d"   java.util.regex.Pattern (in this case, one which matches a single digit)
\f      java.lang.Character (in this case, the letter 'f')
Caveat: Human brains and electronic computers are rather different devices. So Java strings
(sequences of UTF-16 characters
(https://docs.oracle.com/en/java/javase/17/docs/api/java.base/java/lang/Character.html#unicode))
don't always map nicely to user-perceived characters. For example, a single Unicode "code point"
doesn't necessarily equal a user-perceived character. (Consider Korean Hangul Jamo, where
user-perceived characters are composed from two or three Unicode code points.) Also, a Unicode code
point outside the Basic Multilingual Plane requires two UTF-16 characters (a surrogate pair) to encode it.
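A small sketch of the caveat above, using only count and the JDK's String.codePointCount. The var names grinning-face and han-jamo are made up for illustration, and the strings are written with \u escapes so the example does not depend on file encoding.

```clojure
;; U+1F600 lies outside the Basic Multilingual Plane, so UTF-16 needs
;; a surrogate pair (two chars) to encode this single code point.
(def grinning-face "\uD83D\uDE00")

(count grinning-face)                                   ;=> 2 - UTF-16 chars
(.codePointCount grinning-face 0 (count grinning-face)) ;=> 1 - code points

;; One user-perceived Hangul syllable written as three conjoining Jamo.
(def han-jamo "\u1112\u1161\u11AB")

(count han-jamo)                                        ;=> 3
(.codePointCount han-jamo 0 (count han-jamo))           ;=> 3
```

Neither count equals the number of user-perceived characters in the second case, so be careful when truncating or reversing text by index.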
Preliminaries
Some examples use clojure.string (https://clojure.github.io/clojure/clojure.string-api.html), clojure.edn
(https://github.com/edn-format/edn) and clojure.pprint (https://clojure.github.io/clojure/clojure.pprint-api.html).
We'll assume your ns macro contains:
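The listing itself appears to be missing here, so the following is a reconstruction rather than the original text. The str alias is the one the examples below rely on. The edn and pp aliases and the namespace name example.strings are assumptions chosen for illustration.

```clojure
(ns example.strings                     ; hypothetical namespace name
  (:require [clojure.string :as str]    ; alias used throughout this guide
            [clojure.edn :as edn]       ; alias is an assumption
            [clojure.pprint :as pp]))   ; alias is an assumption
```

Aliasing clojure.string as str does not shadow the core str function; (str "a" "b") and (str/join ...) can be used side by side.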
;; Concatenate
(str "foo" "bar") ;=> "foobar"
(str/join ["0" "1" "2"]) ;=> "012"
(str/join "." ["0" "1" "2"]) ;=> "0.1.2"
As of Clojure 1.8, clojure.string provides index-of and last-index-of. Unlike Java's .indexOf and
.lastIndexOf methods, which return -1, they return nil for no match:
(str/index-of "foo" "oo") ;=> 1
(str/index-of "foo" "x") ;=> nil
(str/last-index-of "foo" \o) ;=> 2 - can find string or character, not int
(str/last-index-of "foo" (int \o))
;; Execution error: java.lang.Integer cannot be cast to java.lang.String
;; Substring
(subs "0123" 1) ;=> "123"
(subs "0123" 1 3) ;=> "12"
(str/trim " foo ") ;=> "foo"
(str/triml " foo ") ;=> "foo "
(str/trimr " foo ") ;=> " foo"
;; Multiple substrings
(seq "foo") ;=> (\f \o \o)
(str/split "foo/bar/quux" #"/") ;=> ["foo" "bar" "quux"]
(str/split "foo/bar/quux" #"/" 2) ;=> ["foo" "bar/quux"]
(str/split-lines "foo
bar") ;=> ["foo" "bar"]
;; Case
(str/lower-case "fOo") ;=> "foo"
(str/upper-case "fOo") ;=> "FOO"
(str/capitalize "fOo") ;=> "Foo"
;; Escaping
(str/escape "foo|bar|quux" {\| "||"}) ;=> "foo||bar||quux"
;; Making keywords
(keyword "foo") ;=> :foo
;; Parsing numbers
(bigint "20000000000000000000000000000") ;=> 20000000000000000000000000000N
(bigdec "20000000000000000000.00000000") ;=> 20000000000000000000.00000000M
(Integer/parseInt "2") ;=> 2 - java.lang.Integer
(Float/parseFloat "2") ;=> 2.0 - java.lang.Float
(Long/parseLong "2") ;=> 2 - java.lang.Long
(Double/parseDouble "2") ;=> 2.0 - java.lang.Double
As of Clojure 1.11, clojure.core has parsing functions for numbers, Booleans, and UUIDs:
(parse-long "2") ;=> 2 - java.lang.Long
(parse-double "2") ;=> 2.0 - java.lang.Double
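The Boolean and UUID parsers mentioned above are parse-boolean and parse-uuid. A short sketch of how they behave, including what happens with unparseable strings (the UUID value is an arbitrary example):

```clojure
;; Requires Clojure 1.11 or later.
(parse-boolean "true")    ;=> true
(parse-boolean "false")   ;=> false
(parse-boolean "yes")     ;=> nil - only "true" and "false" are recognized
(parse-uuid "b6883c0a-0342-4007-9966-bc2dfed6b099")
;=> #uuid "b6883c0a-0342-4007-9966-bc2dfed6b099"

;; Unparseable strings give nil instead of throwing an exception.
(parse-long "2.5")        ;=> nil
(parse-double "foo")      ;=> nil
(parse-uuid "not-a-uuid") ;=> nil
```

This differs from the Java methods such as Long/parseLong, which throw NumberFormatException on bad input.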
Regex reference.
(https://docs.oracle.com/en/java/javase/17/docs/api/java.base/java/util/regex/Pattern.html)
Groups: Regex groups are useful when we want to match more than one substring (or refer to matches
later). In the regex #"(group-1) (group-2)", the 0th group is the whole match. The 1st group is started
by the left-most (, the 2nd group is started by the second-left-most (, etc. You can even nest groups.
You can refer to groups later, in replacement strings, using $0, $1, etc.
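To make the group numbering concrete, here is a small sketch using re-find and str/replace. The sample strings are invented for illustration.

```clojure
(require '[clojure.string :as str])

;; With groups, re-find returns a vector: group 0 (the whole match)
;; first, then each numbered group.
(re-find #"(\d+)-(\d+)" "range 10-20")       ;=> ["10-20" "10" "20"]

;; Nested groups are numbered by the position of their opening paren.
(re-find #"((\d+)-(\d+))!" "10-20!")         ;=> ["10-20!" "10-20" "10" "20"]

;; $1, $2, ... refer to groups in a replacement string.
(str/replace "10-20" #"(\d+)-(\d+)" "$2-$1") ;=> "20-10"
```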
Matching
;; Simple matching
(re-find #"\d+" "foo 123 bar") ;=> "123"
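A few related core functions are worth knowing alongside re-find. This is a brief sketch, not an exhaustive list:

```clojure
;; re-find returns nil when nothing matches.
(re-find #"\d+" "foo bar")        ;=> nil

;; re-matches succeeds only if the whole string matches.
(re-matches #"\d+" "123")         ;=> "123"
(re-matches #"\d+" "foo 123 bar") ;=> nil

;; re-seq returns a lazy sequence of all successive matches.
(re-seq #"\d+" "1 22 333")        ;=> ("1" "22" "333")
```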
Replacing
We use str/replace. After the first arg (the initial string), the next two args are match and
replacement. The match / replacement pair can be:
string / string
char / char
pattern / (string or function of match).
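One example per combination listed above, as a minimal sketch with invented sample strings:

```clojure
(require '[clojure.string :as str])

;; string / string
(str/replace "foo bar" "oo" "ee")             ;=> "fee bar"

;; char / char
(str/replace "foo bar" \o \0)                 ;=> "f00 bar"

;; pattern / string
(str/replace "foo 123 bar 45" #"\d+" "N")     ;=> "foo N bar N"

;; pattern / function of match: with no groups in the pattern,
;; the function receives each match as a string.
(str/replace "foo bar" #"\w+" str/capitalize) ;=> "Foo Bar"

;; str/replace-first replaces only the first occurrence.
(str/replace-first "foo bar" \o \0)           ;=> "f0o bar"
```

When the pattern does contain groups, the function receives a vector of the whole match followed by the groups, in the same shape re-find returns.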
Context-free grammars
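Context-free grammars offer yet another boost in expressive matching power, compared to regexes. You
can express ideas like nesting. The example below sketches a JSON grammar and assumes the Instaparse
library is required under the alias insta.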
(def barely-tested-json-parser
  (insta/parser
   "object     = <'{'> <w*> (members <w*>)* <'}'>
    <members>  = pair (<w*> <','> <w*> members)*
    <pair>     = string <w*> <':'> <w*> value
    <value>    = string | number | object | array | 'true' | 'false' | 'null'
    array      = <'['> elements* <']'>
    <elements> = value <w*> (<','> <w*> elements)*
    number     = int frac? exp?
    <int>      = '-'? digits
    <frac>     = '.' digits
    <exp>      = e digits
    <e>        = ('e' | 'E') (<'+'> | '-')?
    <digits>   = #'[0-9]+'
    (* First sketched state machine; then it was easier to figure out
       regex syntax and all the maddening escape-backslashes. *)
    string     = <'\\\"'> #'([^\"\\\\]|\\\\.)*' <'\\\"'>
    <w>        = #'\\s+'"))
Format strings
Java's templating mini-language helps you build many strings conveniently. Reference.
(https://docs.oracle.com/en/java/javase/17/docs/api/java.base/java/util/Formatter.html)
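A few basic directives, as a minimal sketch. The argument values are invented, and the outputs shown assume a default locale that uses Western digits.

```clojure
;; %s formats any value as a string, %d formats an integer.
(format "%s has %d items" "cart" 3)   ;=> "cart has 3 items"

;; Width, left-justification (-) and zero-padding (0).
(format "%5d|%-5d|%05d" 42 42 42)     ;=> "   42|42   |00042"

;; Hexadecimal.
(format "%x" 255)                     ;=> "ff"

;; 1$, 2$, etc. prefixes pick args by position instead of in order.
(format "%2$s, %1$s" "world" "hello") ;=> "hello, world"
```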
;; Again, 1$, 2$, etc prefixes let us refer to args in arbitrary orders.
(format "New year: %2$tY. Old year: %1$tY"
#inst"2000-01-02T00:00:00"
#inst"3111-12-31T00:00:00")
;=> "New year: 3111. Old year: 2000"
CL-Format
cl-format is a port of Common Lisp's notorious, powerful string formatting mini-language. For example,
you can build strings from sequences. (It also offers oddities like printing numbers in English or in two
varieties of Roman numerals.) However, it's weaker than plain format at printing dates and referring to
args in arbitrary order.
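The features just mentioned can be sketched as follows. This assumes clojure.pprint is required under the alias pp; passing nil as the first argument makes cl-format return a string instead of printing.

```clojure
(require '[clojure.pprint :as pp])

;; ~{ ... ~} iterates over a sequence; ~^ stops before the trailing separator.
(pp/cl-format nil "~{~a~^, ~}" [1 2 3]) ;=> "1, 2, 3"

;; ~r prints an integer in English words.
(pp/cl-format nil "~r" 42)              ;=> "forty-two"

;; ~@r prints Roman numerals; ~:@r prints the old style without subtraction.
(pp/cl-format nil "~@r" 14)             ;=> "XIV"
(pp/cl-format nil "~:@r" 14)            ;=> "XIIII"
```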
Remember that cl-format represents a (potentially unreadable) language which your audience didn't
sign up to learn. If you're the sort of person who likes it, try to use it only in sweet spots where it
provides clarity for little complexity.
Contributors
Tj Gabbour, 2013 (original author)