我们的想法是尝试从两个映射列表中的完整日期列表中查找每个日期,并创建一个“三元组”列表(日期,值1,值2),其中仅包含两个数据集都有日期和值的日期.然后我可以将它们写入文件并正确地比较它们.
请不要仔细阅读代码,仅包括良好的测量方法
这是代码(它根本不是最优的,但对于那个小任务,它很好地完成了它的工作):
import qualified Data.Map as M
import Data.List (transpose)
import Data.Maybe (fromJust)
main = do
dts <- readFile "dates.txt"
cts1 <- readFile "eu.txt"
cts2 <- readFile "usa.txt"
let
dates = lines dts
cols1 = transpose $map words $lines cts1
cols2 = transpose $map words $lines cts2
prs1 = zip (head cols1) (last cols1)
prs2 = zip (head cols2) (last cols2)
map1 = M.fromList prs1
map2 = M.fromList prs2
trips = map fromJust (filter (/=Nothing) (map (\date -> getTrips date map1 map2) dates))
cols3 = map (\(a,b,c) -> [a,b,c]) trips
result = unlines $map unwords $cols3
writeFile "trips.txt" result
getTrips :: String -> M.Map String String -> M.Map String String -> Maybe (String, String, String)
getTrips date map1 map2
| is1 /= Nothing && is2 /= Nothing = Just (date, fromJust is1, fromJust is2)
| otherwise = Nothing
where
is1 = M.lookup date map1
is2 = M.lookup date map2
TL; DR:代码有效(虽然我很乐意听到一些意见/建议),但我有一些问题:
>只有大约2000个日期,因此我不太关心性能(你可以看到我在各处使用Strings);是在使用Data.Map一个矫枉过正的呢?什么时候Data.Map应该优先于元组列表?
> Map是从字符串元组创建的 – 它是否正常或者密钥是否始终为数字,以便平衡和查找正常工作?
there were only around 2000 dates, therefore I didn’t care much about
performance (you can see that I was using Strings everywhere); was
using Data.Map an overkill then? When should Data.Map be preferred
over lists of tuples?
您应该使用适合您的问题和性能/编程时间约束的数据结构,因此使用Map可能是一个好主意.也许在您的情况下,如果您的数据已经订购,您可以做到
union [] _ = []
union _ [] = []
union xss@((dx,vx):xs) yss@((dy,vy):ys) =
case compare dx dy of
EQ -> (dx, vx, vy) : union xs ys
GT -> union xss ys
LT -> union xs yss
the Map was created from tuples of Strings – is it fine or should the
key always be numeric in order for the balancing and lookups to work
properly?
不,如果你的代码类型检查你的Map将正常工作(w / r / t你定义Ord实例的方式).但正如CA McCann建议的那样,如果您的密钥是列表,则trie可能更合适,特别是如果密钥前缀之间存在很多重叠(查看列表上的Ord实例如何实现,并想象必须进行的操作数量)将“abcdx”,“abcdy”和“abcdz”键插入地图与trie结构中以说服自己).
