Skip to content

concat can be faster #550

Closed
Closed
@meooow25

Description

@meooow25

I recently wanted a reverseConcat, which led me to the current implementation of concat and the realization that it is not very efficient:

text/src/Data/Text.hs

Lines 1033 to 1046 in e84c7a3

-- | /O(n)/ Concatenate a list of 'Text's.
concat :: [Text] -> Text
concat ts = case ts' of
[] -> empty
[t] -> t
_ -> Text (A.run go) 0 len
where
ts' = L.filter (not . null) ts
len = sumP "concat" $ L.map lengthWord8 ts'
go :: ST s (A.MArray s)
go = do
arr <- A.new len
let step i (Text a o l) = A.copyI l arr i a o >> return (i + l)
foldM step 0 ts' >> return arr

  • Allocates a new list of Texts (lazily, but it still has a cost) because of the filter
  • Allocates a list of boxed int lengths to sum them, since sumP does not fuse

Here is an alternate straightforward implementation:

concat :: [T.Text] -> T.Text
concat ts0 = case ts0 of
  [] -> T.empty
  [t] -> t
  _ | len == 0 -> T.empty
    | otherwise -> T.Text arr 0 len
  where
    flen acc (T.Text _ _ l)
      | acc' > 0 || l == 0 = acc'
      | otherwise = concatOverflowError
      where
        acc' = acc + l
    len = foldl' flen 0 ts0
    arr = A.run $ do
      marr <- A.new len
      let loop !i [] = pure marr
          loop i (T.Text a o l : ts) = A.copyI l marr i a o *> loop (i+l) ts
      loop 0 ts0

This not exactly the same, since the current implementation performs a case match on the list after filtering out null Texts. But I doubt such a preemptive step helps.

Benchmarks on GHC 9.6.3 with -O, concating a list of all Chars:

    concat:     OK
      54.8 ms ± 3.1 ms, 174 MB allocated,  52 MB copied, 325 MB peak memory
    alt concat: OK
      13.1 ms ± 901 μs, 4.2 MB allocated, 773 B  copied, 325 MB peak memory
    ==:         OK
Benchmark file
{-# LANGUAGE BangPatterns #-}
import Prelude hiding (concat)
import Data.Foldable (foldl')
import Test.Tasty.Bench
import Test.Tasty.HUnit

import qualified Data.Text as T
import qualified Data.Text.Internal as T
import qualified Data.Text.Array as A

main :: IO ()
main = defaultMain
  [ env (pure xs_) $ \xs -> bgroup ""
    [ bench "concat" $ whnf T.concat xs
    , bench "alt concat" $ whnf concat xs
    , testCase "==" $ concat xs @?= T.concat xs
    ]
  ]
  where
    xs_ = map T.singleton [minBound .. maxBound]

concat :: [T.Text] -> T.Text
concat ts0 = case ts0 of
  [] -> T.empty
  [t] -> t
  _ | len == 0 -> T.empty
    | otherwise -> T.Text arr 0 len
  where
    flen acc (T.Text _ _ l)
      | acc' > 0 || l == 0 = acc'
      | otherwise = concatOverflowError
      where
        acc' = acc + l
    len = foldl' flen 0 ts0
    arr = A.run $ do
      marr <- A.new len
      let loop !i [] = pure marr
          loop i (T.Text a o l : ts) = A.copyI l marr i a o *> loop (i+l) ts
      loop 0 ts0

concatOverflowError :: a
concatOverflowError = error "Data.Text.concat: size overflow"

Metadata

Metadata

Assignees

No one assigned

    Labels

    No labels
    No labels

    Type

    No type

    Projects

    No projects

    Milestone

    No milestone

    Relationships

    None yet

    Development

    No branches or pull requests

    Issue actions