Performance of reading strings to Int in Haskell (ByteString vs [Char])

Source: Internet   Collected by: 自由互联   Published: 2021-06-22
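I was doing some simple benchmarking of ByteString versus String. The code loads a file of 10,000,000 lines, one integer per line, and then converts each string to an integer. It turns out that Prelude.read is much slower than ByteString.readInt.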

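I would like to know what causes the inefficiency. I am also not sure which part of the profiling report corresponds to the time cost of loading the file (the data file is about 75 MB).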

Here is the test code:

import System.Environment
import System.IO
import qualified Data.ByteString.Lazy.Char8 as LC

main :: IO ()
main = do
  xs <- getArgs
  let file = xs !! 0

  inputIo <- readFile file
  let iIo = map readInt . linesStr $ inputIo
  let sIo = sum iIo

  inputIoBs <- LC.readFile file
  let iIoBs = map readIntBs . linesBs $ inputIoBs
  let sIoBs = sum iIoBs

  print [sIo, sIoBs]

linesStr = lines

linesBs  = LC.lines


readInt :: String -> Int
readInt x = read x :: Int

readIntBs :: LC.ByteString -> Int
readIntBs bs = case LC.readInt bs of
                Nothing -> error "Not an integer"
                Just (x, _) -> x

The code was compiled and run as follows:

> ghc -o strO2 -O2  --make Str.hs -prof -auto-all -caf-all -rtsopts
> ./strO2  a.dat +RTS -K500M -p

Note that "a.dat" is in the format described above and is about 75 MB. The profiling result is:

strO2 +RTS -K500M -p -RTS a.dat

    total time  =      116.41 secs   (116411 ticks @ 1000 us, 1 processor)
    total alloc = 117,350,372,624 bytes  (excludes profiling overheads)

COST CENTRE MODULE  %time %alloc

readInt     Main     86.9   74.6
main.iIo    Main      8.7    9.5
main        Main      2.9   13.5
main.iIoBs  Main      0.6    1.9


                                                        individual     inherited
COST CENTRE   MODULE                  no.     entries  %time %alloc   %time %alloc

MAIN          MAIN                     54           0    0.0    0.0   100.0  100.0
 main         Main                    109           0    2.9   13.5   100.0  100.0
  main.iIoBs  Main                    116           1    0.6    1.9     1.3    2.4
   readIntBs  Main                    118    10000000    0.7    0.5     0.7    0.5
  main.sIoBs  Main                    115           1    0.0    0.0     0.0    0.0
  main.sIo    Main                    113           1    0.2    0.0     0.2    0.0
  main.iIo    Main                    111           1    8.7    9.5    95.6   84.1
   readInt    Main                    114    10000000   86.9   74.6    86.9   74.6
  main.file   Main                    110           1    0.0    0.0     0.0    0.0
 CAF:main1    Main                    106           0    0.0    0.0     0.0    0.0
  main        Main                    108           1    0.0    0.0     0.0    0.0
 CAF:linesBs  Main                    105           0    0.0    0.0     0.0    0.0
  linesBs     Main                    117           1    0.0    0.0     0.0    0.0
 CAF:linesStr Main                    104           0    0.0    0.0     0.0    0.0
  linesStr    Main                    112           1    0.0    0.0     0.0    0.0
 CAF          GHC.Conc.Signal         100           0    0.0    0.0     0.0    0.0
 CAF          GHC.IO.Encoding          93           0    0.0    0.0     0.0    0.0
 CAF          GHC.IO.Encoding.Iconv    91           0    0.0    0.0     0.0    0.0
 CAF          GHC.IO.FD                86           0    0.0    0.0     0.0    0.0
 CAF          GHC.IO.Handle.FD         84           0    0.0    0.0     0.0    0.0
 CAF          Text.Read.Lex            70           0    0.0    0.0     0.0    0.0

Edit:

The input file "a.dat" contains 10,000,000 lines of numbers:

1
2
3
...
10000000

After some discussion, I replaced "a.dat" with 10,000,000 lines each containing 1. This does not affect the performance observation above:

1
1
...
1

read is doing a much harder job than readInt. For example, compare:

> map read ["(100)", " 100", "- 100"] :: [Int]
[100,100,-100]
> map readInt ["(100)", " 100", "- 100"]
[Nothing,Nothing,Nothing]

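read is essentially parsing Haskell: as the example shows, it accepts parentheses, leading whitespace, and a minus sign separated from its digits. Add to that the fact that it consumes a linked list of characters, and it is no surprise that it is very slow.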
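
To make the contrast concrete, here is a minimal sketch of a String parser that does only the narrow job described above: an optional minus sign followed by decimal digits, and nothing else. The names readDecimal and digits are hypothetical helpers, not part of the original post or of any library. Unlike LC.readInt, this sketch requires the whole string to be a number and does not return the unparsed remainder. It still walks a linked list of characters, so it should not be expected to match the ByteString version, and no timings were measured for it here.

```haskell
import Data.Char (isDigit, ord)
import Data.List (foldl')

-- Hypothetical helper (not from the original post): parse an optionally
-- negated decimal Int from a String, rejecting everything else.
readDecimal :: String -> Maybe Int
readDecimal ('-' : ds) = negate <$> digits ds
readDecimal ds         = digits ds

-- Accept one or more ASCII digits and nothing else, accumulating strictly.
digits :: String -> Maybe Int
digits ds
  | not (null ds) && all isDigit ds = Just (foldl' step 0 ds)
  | otherwise                       = Nothing
  where
    step acc c = acc * 10 + (ord c - ord '0')

main :: IO ()
main =
  -- The last three inputs are the ones read accepts in the example above.
  print (map readDecimal ["100", "-42", "(100)", " 100", "- 100"])
  -- prints [Just 100,Just (-42),Nothing,Nothing,Nothing]
```

Like readInt, this rejects the parenthesised, space-padded, and split-sign inputs that read accepts, which illustrates how much extra grammar read has to handle for every line.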
