Safe Haskell | None |
---|---|
Language | Haskell2010 |
Agda.Utils.VarSet
Description
Manage sets of natural numbers (de Bruijn indices).
This file contains an implementation of finite sets of integers
that is optimized for storing sets of free variables. Notably,
de Bruijn indices/levels are not uniformly distributed across the
range of Int
, and are always very small (EG: less than 64).
This makes IntSet
somewhat ill-suited for the task.
IntSet
is designed to be able to handle values distributed
across the entire range of Int
, at the cost of pointer indirections
and poor memory locality.
Instead, VarSet
stores free variables as a bitvector. When the
elements in the set are all smaller than 64, we can fit the entire set
into a single unboxed Word#
, which makes most of our set operations
a single instruction.
When we exceed this bound, we fall back to an unboxed ByteArray#
.
Experimentally, this happens less than 0.0001% of the time.
Experimental Results
Agda builds instrumented with traceVarSetSize
give the following
statistics for VarSet
sizes across the following libraries:
library | commit hash | varsets allocated | non-word sized sets |
---|---|---|---|
1lab | 11f4672ca7dcfdccb1b4ce94ca4ca42c9b732e62 | 118198551 | 0 |
agda-stdlib | 0e97e2ed1f999ea0d2cce2a3b2395d5a9a7bd36a | 38960049 | 2178 |
cubical | 1f2af52701945bef003a525f78fa41aeadb7c6ae | 82751805 | 0 |
Synopsis
- data VarSet
- = VB# ByteArray#
- | VS# Word#
- empty :: VarSet
- singleton :: Int -> VarSet
- fromList :: [Int] -> VarSet
- fromDescList :: [Int] -> VarSet
- full :: Int -> VarSet
- range :: Int -> Int -> VarSet
- insert :: Int -> VarSet -> VarSet
- inserts :: [Int] -> VarSet -> VarSet
- insertsDesc :: [Int] -> VarSet -> VarSet
- delete :: Int -> VarSet -> VarSet
- null :: VarSet -> Bool
- member :: Int -> VarSet -> Bool
- lookupMin :: VarSet -> Maybe Int
- lookupMax :: VarSet -> Maybe Int
- size :: VarSet -> Int
- disjoint :: VarSet -> VarSet -> Bool
- isSubsetOf :: VarSet -> VarSet -> Bool
- inRange :: Int -> Int -> VarSet -> Bool
- union :: VarSet -> VarSet -> VarSet
- unions :: Foldable f => f VarSet -> VarSet
- intersection :: VarSet -> VarSet -> VarSet
- difference :: VarSet -> VarSet -> VarSet
- (\\) :: VarSet -> VarSet -> VarSet
- complement :: Int -> VarSet -> VarSet
- split :: Int -> VarSet -> (VarSet, VarSet)
- filterLT :: Int -> VarSet -> VarSet
- filterLE :: Int -> VarSet -> VarSet
- filterGT :: Int -> VarSet -> VarSet
- filterGE :: Int -> VarSet -> VarSet
- foldr :: (Int -> a -> a) -> a -> VarSet -> a
- foldl :: (a -> Int -> a) -> a -> VarSet -> a
- foldr' :: (Int -> a -> a) -> a -> VarSet -> a
- foldl' :: (Int -> a -> a) -> a -> VarSet -> a
- minView :: VarSet -> Maybe (Int, VarSet)
- maxView :: VarSet -> Maybe (Int, VarSet)
- strengthen :: Int -> VarSet -> VarSet
- weaken :: Int -> VarSet -> VarSet
- toDescList :: VarSet -> [Int]
- toAscList :: VarSet -> [Int]
VarSet type
Invariants
A variable set whose maximum entry is less than than 64 is always encoded
via the VS#
constructor.
The final word of a VB#
is always non-zero.
Performance
GHC will use pointer-tagging for datatypes with less than 4 constructors
on 32 bit machines (8 on 64 bit), which means that VarSet
will be laid
out as a tagged pointer to a word-sized piece of data.
We also pay attention to the constructor ordering here.
GHC will lay out the machine code for case statements in
reverse order of the datatypes constructors. Both ARM and x86 statically
predict that forward branches are not taken, so putting VB#
before
VS#
means that we can eek out a few extra drops of performance.
Finally, many functions in this module follow the following pattern:
@
union :: VarSet -> VarSet -> VarSet
union (VS# w1) (VS# w2) = VS# (w1 or#
w2)
union vs1 vs2 = unionSlow vs1 vs2
{-# INLINE union #-}
unionSlow :: VarSet -> VarSet -> VarSet
unionSlow (VS# w1) (VS# w2) = VS# (w1 or#
w2)
unionSlow (VS# w1) (VB# bs2) = VB# (bigNatOrWord# bs2 w1)
unionSlow (VB# bs1) (VS# w2) = VB# (bigNatOrWord# bs1 w2)
unionSlow (VB# bs1) (VB# bs2) = VB# (bigNatOr bs1 bs2)
{-# NOINLINE unionSlow #-}
@
This makes the 0.0001% of slow cases notably worse, but makes the
happy path approximately 3x faster for things like inRange
, as
GHC is able to coalesce all of the happy paths.
A set of de Bruijn indices/levels.
Constructors
VB# ByteArray# | A variable set whose maximum entry is greater than or equal to 64. |
VS# Word# | A small variable set whose maximum entry is less than than 64. |
Instances
EmbPrj VarSet Source # | |
Null VarSet Source # | |
Binary VarSet Source # | |
NFData VarSet Source # | |
Defined in Agda.Utils.VarSet | |
Monoid VarSet Source # |
|
Semigroup VarSet Source # |
|
Show VarSet Source # | |
Eq VarSet Source # | |
Ord VarSet Source # | |
Hashable VarSet Source # | |
Defined in Agda.Utils.VarSet | |
IsVarSet () VarSet Source # | |
Defined in Agda.TypeChecking.Free | |
Singleton Int VarSet Source # | |
Construction
fromDescList :: [Int] -> VarSet Source #
Construct a set of variables from a list of indices sorted in descending order.
This can be more efficient than fromList
, as we only need to perform a single
to determine that elements of the list are smaller than 64.
The precondition that the variables are sorted in descending order is
not checked by fromDescList
.
range :: Int -> Int -> VarSet Source #
Construct a variable set that contains the indices [lo+1..hi-1]
.
Insertion
insertsDesc :: [Int] -> VarSet -> VarSet Source #
Insert a list of variables sorted in descending order into a VarSet
.
This can be more efficient than inserts
, as we only need to perform a single
to determine that elements of the list are smaller than 64.
The precondition that the variables are sorted in descending order is
not checked by insertsDesc
.
Deletion
Queries
isSubsetOf :: VarSet -> VarSet -> Bool Source #
isSubsetOf vs1 vs2
determines if vs1
is a subset of vs2
.
inRange :: Int -> Int -> VarSet -> Bool Source #
inRange lo hi vs
determines if all the variables in vs
are within the range [lo+1 .. hi-1]
.
Combining variable sets
Filters
filterLT :: Int -> VarSet -> VarSet Source #
Filter a variable set to elements that are less than to an index.
filterLE :: Int -> VarSet -> VarSet Source #
Filter a variable set to elements that are less than or equal to an index.
filterGT :: Int -> VarSet -> VarSet Source #
Filter a variable set to elements that are greater than to an index.
filterGE :: Int -> VarSet -> VarSet Source #
Filter a variable set to elements that are greater than or equal to an index.
Folds
Like most unordered containers, foldr
and foldl
are somewhat ambigious
names for folds over a VarSet
. For the sake of consistency, we adopt the
same convention as IntSet
, where
VarSet.foldr f x = foldr f x . VarSet.toAscList
VarSet.foldl f x = foldl f x . VarSet.toAscList
Unfortunately, this convention the opposite of how our sets are laid out in memory.
foldr :: (Int -> a -> a) -> a -> VarSet -> a Source #
Lazily fold over the elements of a variable set in ascending order.
foldl' :: (Int -> a -> a) -> a -> VarSet -> a Source #
Strictly fold over the elements of a variable set in descending order.
Views
minView :: VarSet -> Maybe (Int, VarSet) Source #
Retrieve the smallest index in the variable set along with the set sans that
element, or Nothing
if the set was empty.
maxView :: VarSet -> Maybe (Int, VarSet) Source #
Retrieve the smallest index in the variable set along with the set sans that
element, or Nothing
if the set was empty.
Contextual operations
strengthen :: Int -> VarSet -> VarSet Source #
Remove the first n
entries from the variable set,
and shift the indices of all other entries down by n
.
Conversions
toDescList :: VarSet -> [Int] Source #
Convert a variable set to a descending list of indices.