<p>Brian Hamrick's Blog: Thoughts about whatever comes to mind.</p><h1><a href="https://www.brianhamrick.com/blog/diversity">Thoughts on Diversity</a></h1><p>Thu, 20 Apr 2017</p><p>Diversity is a bit of a politicized topic. For most of my blog posts
so far, I've intentionally been avoiding topics like this, as discussions can
often be unproductive. But that avoidance comes with the downside that I don't
get the opportunity to talk about my views and allow them to evolve.</p><p>When people talk about diversity, there are many components to it that are
very different. In general, the goal of diversity initiatives is to create a society with
minimal luck of birth. In the current world, there are a lot of ways in which your
birth can affect your whole life. These include things like overtly racist or sexist
policies, but also things like your family's economic situation and even the quality
of your local school system.</p><p>Even within the goal of minimal luck of birth, there are different interpretations
that often come into conflict. Should we work to minimize luck of birth for people
right now or for people in 20 years, or even 100 years? Or is any sort of timeframe
like this wrong and we should envision the ideal end state and work to minimize the
time it takes to get there? It's not necessarily obvious that these ideas conflict,
and it's difficult to give a straightforward example, but when discussing diversity
it's important for every participant to understand where they stand on that tradeoff.</p><h3>Affirmative Action</h3><p>As a relatively young person, affirmative action is probably the diversity initiative
that I'm most familiar with. It refers to the practice of preferring applicants from
historically disadvantaged groups, usually in the context of college admissions.</p><p>First of all, let's be clear that any sort of preference does mean that the standards
for qualification will be lower for the preferred group. As the applicant pool gets
larger you'll have more flexibility in who you accept, so the difference between
the standards can become smaller, but in practice the applicant pool for the disadvantaged
group will be very small and as a result there are a lot of schools that have
significantly different standards for applicants of different races.</p><p>I regard affirmative action as a bad idea. Especially for colleges, I believe that
affirmative action is a negative for three groups of people, which add up to a huge
section of students:</p><ul><li>The people who get rejected in exchange for the preferred applicants. This group
should be pretty obvious. They get less choice in schools, and they may end up
going to a lower quality one.</li><li>The people of the disadvantaged group who would have been accepted regardless.
In general, affirmative action creates the question "Did this person get accepted
because of their credentials or because they are a member of a preferred group?"
Even when the answer is the credentials, the fact that the question exists can
be detrimental, especially since it takes a long time to learn the answer.</li><li>The accepted students that aren't part of the disadvantaged group. In my eyes,
the biggest draw to a particular school is the other students that you interact
with. Lowering the average quality of the students lowers the average quality
of these interactions, and as a result lowers the quality of the education being
offered.</li></ul><p>In exchange for these negatives, affirmative action creates a world in which some
members of a disadvantaged group will have access to a (presumably) better college
education. The positive effect for a single student in this group is certainly much
larger than the negative effects for a student in the other groups, but the negatives
impact a much larger group of students. Is the tradeoff worth it? In my eyes, no.</p><p>In addition to my feeling that this tradeoff isn't worth it, I dislike affirmative
action because it reinforces negative stereotypes. To illustrate, let's imagine
an extreme example, where affirmative action creates a situation where the disadvantaged
group (for brevity we'll call it group B, and the rest of the student body group A)
is preferred so much that only 10% of the people from group B would get in otherwise.
In the resulting world, 90% of interactions with group B students will be reinforcing
the stereotype that group B students are worse. This feedback loop makes me believe that
affirmative action programs extend the lifetime of these stereotypes.</p><h3>Financial Aid</h3><p>Colleges in the US are rather expensive, and the price is a big factor in many students'
choice of school. Financial aid makes it so that students who would otherwise turn
themselves away are able to attend. Unlike affirmative action, I think that financial aid
is extremely beneficial and overall a great idea.</p><p>Unlike affirmative action, financial aid raises the standards for qualification by
increasing the size of the pool of potential students. Whenever an extremely qualified
student turns down a college because of price, every other student at that school suffers
a little. Financial aid makes that situation less common.</p><p>Financial aid actually parallels the idea of a progressive tax system. The wealthy pay
more taxes because money is disproportionately beneficial for the poor. As a result,
a progressive tax is a fairer way to fund the government than a flat tax would be.
Similarly, tuition serves as a way to fund a college. Students who come from wealthy
families will be less pained by the burden of tuition, and as a result it makes sense
for their tuition to be higher.</p><p>Financial aid tends to come in two flavors. One is financial aid for particular groups,
such as minorities. The other is financial aid based on need, which is usually determined
by the family's income. I strongly prefer need-based financial aid, because the benefit of
financial aid is preventing students from turning down the college due to cost, so the
students receiving aid should be the ones who would otherwise be at risk of turning themselves
away. Financial aid based on other metrics such as race can lead to confounding situations,
where (for example), a financially secure black student might get aid while a struggling
white student wouldn't.</p><p>That said, in the current world race and socioeconomic status are correlated reasonably strongly,
so race-based financial aid will still get a large amount of the distribution right,
and I definitely prefer race-based aid to no aid at all. I have also heard some horror stories
of need-based financial aid, where one parent has a good amount of wealth but is unwilling to pay for
college. In a perfect world, the cost of tuition shouldn't be a consideration for students at all,
but it is a difficult road to get there.</p><h3>Curriculum Changes</h3><p>It's common to notice that in STEM (Science, Technology, Engineering, and Math) fields, you
generally find significantly more males than females. One possible factor in this
phenomenon is that the education system pushes boys toward these fields and girls away from
them. It's hard to say exactly to what extent this is true, but removing such an effect would
be a good goal.</p><p>That said, it's important to avoid compromising the quality of education. For example,
I read a newspaper article once that said programming classes appealed more to female students
when there was more emphasis on fun and less emphasis on arcane symbols and text.
Of course since the article was written for a general audience, it was probably simplified
in a misleading way, but one has to wonder whether the resulting class is really worthy
of being called a programming class. After all, a large part of a programmer's day
is spent reading or writing these arcane texts. Furthermore, what the heck does it
mean to emphasize fun and do boys not care about having fun?</p><p>Overall, I think that curriculum adjustments are a promising idea, but I am skeptical
of the current initiatives on that front. I also believe that this should be a technique
primarily aimed at lower education. Once you are studying to be an expert in a field,
the things you are being taught should be the accepted best practice in that field.</p><h3>Women-Only Events</h3><p>Many competitions see a pattern of male-dominance. For example, only one woman has ever
been in the top 10 chess players worldwide. In my own experience from math competitions,
those are also dominated by males at the top level. As a result, some people have started
women-only competitions.</p><p>It's hard to say whether I'm for or against these competitions. There's a very difficult
balance to achieve here. Essentially, women-only competitions are trying to encourage
women to enter into the competitive field by lowering the expectations for them.
It's a similar idea to having local competitions, state competitions, and national competitions.
They allow a wider audience to enjoy the sense of competition at a level right for them.</p><p>However, in practice women-only competitions have a very significant difference: they aren't
considered stepping stones. When someone wins a local competition, they will often set
their sights on the state or national competition. This means
that you get their attention with a lowered bar, and then turn that attention toward
the top. When that happens, the field gains a valuable member.</p><p>Math Prize for Girls is a high school math competition that falls in this category.
They offer larger prizes than almost any other high school math competition, and as such
it is difficult to turn the resulting attention to more prestigious competitions such as
the <a href="https://en.wikipedia.org/wiki/United_States_of_America_Mathematical_Olympiad" target="_blank">USAMO</a>.
Essentially, the existence of such large prizes sends the signal that that level is "good enough,"
and so it is reasonable for girls to be motivated to meet that level and then stop.</p><p>If there's one thing that humans are good at, it's meeting expectations with the bare minimum.
I don't believe that Math Prize for Girls has had or will have any effect in making the US's
team to the IMO include more girls. In fact, I am concerned that the lowered ceiling makes
it even less likely for a girl to decide to pursue math to the extent of making the IMO team.</p><h1><a href="https://www.brianhamrick.com/blog/mixed-strategies">Mixed Strategies</a></h1><p>Thu, 6 Apr 2017</p><p>Competitive Pokemon has been an interesting experience, especially because it
has some major differences with most other competitive games. Most competitive
games take one of two forms. In the first, players are making actions in real
time. Most physical sports fall into this category, along with Starcraft,
DOTA, League of Legends, and so on. In the second category, players take
alternating turns making moves. This category includes games like Chess,
Go, Magic, Hearthstone, and Poker.</p><p>In contrast to these games, in Pokemon the players make their decisions simultaneously.
Once both players have sent their orders, then those orders get executed and the players
begin thinking about their orders for the next turn. As a result, the type of skill
needed to play Pokemon differs quite a bit from the skills needed for the other games
I mentioned before.</p><p>Real-time games typically demand a high degree of mechanical
skill, and doing the wrong thing well is often better than doing the right thing poorly.
For example, in Golf you could spend a lot of time thinking about the proper club to use,
but the decision is meaningless if your swing is bad. In Starcraft, you could have the right
unit composition, but if your macro is bad your army will be too small for that advantage to
matter.</p><p>In the turn-based games, there isn't the pressure of good execution. The decision making
and the execution are one and the same. As a result, the focus is on finding the "best"
move. Even in games with hidden information, there is typically a theoretically optimal
play based on the information that you have, and your goal is to find that play.</p><p>Pokemon works entirely differently. In a sense, there is still a theoretically best play,
but that play is almost always going to be a mixed strategy instead of a pure one.
In other words, given the same position in multiple games, you'll play differently in some
than in others. You might say that this is true in other games as well, but the situation
is a bit different. In Chess and Go, your opponent is going to see your move before they
have to make theirs, so they can respond appropriately. When you play different moves in
Chess or Go, it's because you believe they are approximately equally good, or that one is
slightly better for the day or opponent. In Poker, you want to create uncertainty for
your opponent, but the way you balance betting and folding is by basing the decision on the cards
in your hand.</p><p>To illustrate, let's look at the simultaneous decision game that everyone knows: Rock Paper Scissors.
Rock beats scissors, scissors beats paper, and paper beats rock. It's clear that there's no
single move that is the best here. If you were to always play rock, you'd lose to someone who
always plays paper. Indeed, the theoretical optimum is to play each option randomly one third
of the time (however, in practice opponents are predictable and you can do better by accurately
predicting them). But let's change the situation a bit.</p><p>Let's play a new game, called Modified Rock Paper Scissors. In this game, rock beats scissors and
scissors beats paper, just like before, but if one player plays paper and the other plays rock, then
paper wins only two thirds of the time (for example, we roll a die and paper wins if it lands 3 or higher).
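The optimal mix for this game can be found mechanically: at equilibrium, the blend of rock, paper, and scissors wins equally often against each of the opponent's pure options (counting a tie as half a win, as in the analysis below). As a rough sketch of my own (not code from the post), here is that indifference system solved with a small Gaussian elimination:

```rust
// Solve a dense linear system by Gaussian elimination with partial
// pivoting. Each row of `m` holds the coefficients followed by the
// right-hand side.
fn solve(mut m: Vec<Vec<f64>>) -> Vec<f64> {
    let n = m.len();
    for i in 0..n {
        // Pick the largest pivot in column i for numerical stability.
        let pivot = (i..n)
            .max_by(|&a, &b| m[a][i].abs().partial_cmp(&m[b][i].abs()).unwrap())
            .unwrap();
        m.swap(i, pivot);
        for j in (i + 1)..n {
            let f = m[j][i] / m[i][i];
            for k in i..=n {
                let v = m[i][k];
                m[j][k] -= f * v;
            }
        }
    }
    // Back substitution.
    let mut x = vec![0.0; n];
    for i in (0..n).rev() {
        let tail: f64 = ((i + 1)..n).map(|j| m[i][j] * x[j]).sum();
        x[i] = (m[i][n] - tail) / m[i][i];
    }
    x
}

fn main() {
    // Unknowns: [r, p, s, w], where w is the common winrate.
    let eqs = vec![
        vec![0.5, 2.0 / 3.0, 0.0, -1.0, 0.0], // vs rock:     r/2 + 2p/3    = w
        vec![1.0, 0.0, 0.5, -1.0, 0.0],       // vs scissors: r + s/2       = w
        vec![1.0 / 3.0, 0.5, 1.0, -1.0, 0.0], // vs paper:    r/3 + p/2 + s = w
        vec![1.0, 1.0, 1.0, 0.0, 1.0],        // r + p + s = 1
    ];
    let x = solve(eqs);
    println!("r = {:.4}, p = {:.4}, s = {:.4}, w = {:.4}", x[0], x[1], x[2], x[3]);
}
```

Partial pivoting is overkill for a 4-by-4 system, but it keeps the sketch general enough to reuse on larger games.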
It's tempting to think, "Why would you ever play paper?" but at the same time it's obvious that if your
opponent always plays rock then the proper answer is to always play paper.</p><p>In the end, the best strategy still involves playing all three options some portion of the time, but
some will be played more often than others. The proper balance will have the property that it has
the same chance to win regardless of the opponent's move. For analysis, we'll pretend that a tie
is actually a coin flip to win, since of course if our opponent also plays optimally, the rematch will
be a 50-50. Let's use \(r, p, s\) to denote the probabilities that we play rock, paper, and scissors,
respectively. We know that \(r + p + s = 1\).</p><ul><li>Against an opponent playing rock, our winrate is \(\frac{1}{2}r + \frac{2}{3}p\).</li><li>Against an opponent playing scissors, our winrate is \(r + \frac{1}{2}s\).</li><li>Against an opponent playing paper, our winrate is \(\frac{1}{3}r + s + \frac{1}{2}p\).</li></ul><p>Counterintuitively (for me, at least), if you solve this system of equations you get \( r = p = \frac{3}{7} \)
and \( s = \frac{1}{7} \). The move that was weakened gets played more than when it was strong! Instead,
the move that gets played less is the move that defeats the "bad" move. Once you've seen the result, it
makes sense. Playing rock a lot already gives you a decent matchup against paper, so you don't need to
counter it with scissors as often. On the other hand, your opponent is going to be playing rock quite a bit,
so you'll still play paper a lot to catch that.</p><p>Comparatively, Pokemon is like Modified Modified Modified Modified Modified Modified Modified Modified
Rock Paper Scissors. Each round leads to a slightly different situation in the future round, and so the
calculations to determine the proper mix of moves are very complicated, but the general idea is the same.</p><p>In the Pokemon VGC format there are two Pokemon on each side of the field at a time. Pokemon make their
moves based on their speed, so fast Pokemon will move before slower Pokemon. Let's call the Pokemon on
one side A and B, while the Pokemon on the other side are C and D. It is a very common situation where
neither A nor B can knock out C or D on their own, but combined they'd be able to knock one out.
If both A and B are faster than C and "double up" onto it to knock it out, then C effectively won't get a turn.</p><p>It's usually very powerful to get two moves off while your opponent only gets one, so a newcomer to
VGC might wonder why you don't double up every turn. The answer is the move Protect. Protect defends
the user from (with a few rare exceptions) any attack that might hit it for the turn. If A and B
double up into C, but C used Protect, then A and B did nothing while D got a free turn. Just like
getting two moves off while your opponent gets one is strong, so is getting one move while your
opponent gets none.</p><p>Once you learn the power of Protect, it is tempting to ask the opposite question: Why would you ever
double up? If you always split your attacks, say A attacks C and B attacks D, then if they protect
one Pokemon you'll still get one hit in. The answer lies in the fact that Pokemon are generally balanced
so that fast Pokemon are more fragile or hit weaker than slow Pokemon. So what appears to be a fair
one-for-one or two-for-two trade will usually not be so fair: the slower side will tend to slowly
gain an advantage this way.</p><p>It's difficult to internalize this fact. When you double up into a protect, it often feels like
you lost the game on the spot. On the other hand, splitting the attacks feels like the safe play,
because even though your opponent might have gained an advantage, you still got something done.
In reality, both of those are usually false. Comebacks after a bad turn happen all the time, so
the downsides of doubling up into a protect are smaller than they appear (and the upside of doubling
up while your opponent protects the other Pokemon is massive), and the apparently small disadvantage
is often an almost guaranteed loss.</p><p>This is the basis for two of the most important skills for high level Pokemon play. The first
is the ability to accurately judge situations. Ideally, you'd want to know your exact win probability
for each of the various decision combinations. However, you only have 45 seconds to make a move, so
such a calculation is going to be impossible. Advice like "Think about your win condition" and
"Consider what the threats are on each side" are about judging the situation. It's important to
consider the decision from both sides. Just like we saw in Modified Rock Paper Scissors, even though
scissors seemed weaker, it is still a very valuable play because it counters your opponent's apparently
strong play.</p><p>The other important skill is the ability to fall flat on your face, stand back up, and continue walking.
Pokemon is full of luck-based mechanics such as critical hits, moves with 85 percent accuracy, moves
with a 30 percent chance of a beneficial side effect, and so on. However, even without these mechanics
there would be a significant amount of luck created from the mixed strategy dynamic. Even when playing
optimally, sometimes you'll get a terrible result. In order to be a top player, you have to be able to
move past that and continue making the best decisions possible.</p><h1><a href="https://www.brianhamrick.com/blog/exploring-rust">Exploring Rust</a></h1><p>Wed, 29 Mar 2017</p><p>For a while now, people have been saying that <a href="https://www.rust-lang.org/" target="_blank">Rust</a>
should appeal to <a href="https://www.haskell.org/" target="_blank">Haskell</a> programmers as an intermediate
between Haskell and C. Rust gives you a much more low-level view of your program's
execution, but with valuable language constructs such as <a href="https://en.wikipedia.org/wiki/Algebraic_data_type" target="_blank">algebraic data types</a>
that allow you to build up to a high-level view of your algorithm. Rust's community
places a priority on the ability to create "zero-cost abstractions" so that writing
high level code doesn't come at the cost of performance.</p><p>The idea of zero-cost abstractions is very appealing. In the Haskell community, there
are many great libraries that build up useful abstractions. For some of them, the
authors have put a great deal of work in making sure that the performance costs are
minimal, such as by making sure the compiler will inline code in common use cases.
However, there are also quite a few abstractions that sound very appealing but
kill the asymptotics of your program. <a href="https://hackage.haskell.org/package/free-4.12.4/docs/Control-Monad-Free.html" target="_blank">Free monads</a>
are one example, where some algorithms will have bad asymptotics unless you
apply a <a href="http://comonad.com/reader/2011/free-monads-for-less/" target="_blank">non-obvious trick</a>.</p><p>In any case, it's probably been two or three years since I first heard of Rust, and
it was about time for me to actually write some code in it. I was especially interested
in trying out Rust's style of generics. I've previously <a href="/blog/why-generics" target="_blank">ranted</a> about
why I've been frustrated by the lack of generics in <a href="https://golang.org/" target="_blank">Go</a>, so
I was keen to see what generics in an imperative language could look like.</p><p>In order to get the picture I wanted, I chose to implement a data structure based on
<a href="https://en.wikipedia.org/wiki/Splay_tree" target="_blank">splay trees</a>, because data structures are
the realm in which generics truly shine. By writing the splay algorithm in an appropriately
general form, it becomes possible to incorporate that existing code into a new,
slightly different data structure.</p><h2>The Problem</h2><p>The problem I chose is to write a data structure that supports the following operations:</p><ul><li>Set the value at a given index to a given value</li><li>Get the value at a given index</li><li>Given two indices <code>l</code> and <code>r</code>, reverse the interval <code>[l, r)</code></li></ul><p>The first two operations can be trivially implemented in constant time with an array.
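For a concrete baseline, a naive array-backed version of the three operations might look like the following (a hypothetical sketch of my own, not code from either linked repository):

```rust
// A naive array-backed structure supporting the three operations.
// `set` and `get` are O(1); `reverse` must touch every element in
// the interval, so it is O(r - l) per call.
struct NaiveRangeReverse {
    data: Vec<i64>,
}

impl NaiveRangeReverse {
    fn new(data: Vec<i64>) -> Self {
        NaiveRangeReverse { data }
    }

    fn set(&mut self, i: usize, v: i64) {
        self.data[i] = v;
    }

    fn get(&self, i: usize) -> i64 {
        self.data[i]
    }

    // Reverse the half-open interval [l, r).
    fn reverse(&mut self, l: usize, r: usize) {
        self.data[l..r].reverse();
    }
}

fn main() {
    let mut t = NaiveRangeReverse::new(vec![10, 20, 30, 40, 50]);
    t.reverse(1, 4); // data is now [10, 40, 30, 20, 50]
    t.set(0, 99);
    println!("{} {}", t.get(0), t.get(1)); // prints "99 40"
}
```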
However, if you make the decision to use an array, then the third operation will
(as far as I know) require linear time, as you need to iterate over the entire interval.</p><p>On the other hand, any balanced binary search tree will allow you to implement the
first two operations in logarithmic time if you keep the size of the subtree at
each node. Splay trees additionally have the property that with a proper sequence of
splay operations, you can isolate an interval into its own subtree. By then using
a technique called "lazy propagation," it is possible to implement all three
operations in (amortized) logarithmic time.</p><h2>The Code</h2><ul><li><a href="https://github.com/bhamrick/fixalgs" target="_blank">Haskell code</a> (RangeReverse example)</li><li><a href="https://github.com/bhamrick/rust_splay" target="_blank">Rust code</a></li></ul><p>For these purposes, it's not enough to simply create a splay tree parameterized by
the type of data it contains. In a data structure with lazy propagation, each node
has a marker for any transformation that has been applied to that subtree. Whenever
we visit that node, we want to push that transformation down to its children.
In the case of reversals, the pushing operation involves swapping the left and
right children and marking each of them as reversed (or unmarking if they are already
marked).</p><p>Since binary search tree operations and splay operations visit nodes, we want to write
them in such a way that they can automatically call this push operation as needed
without being unnecessarily aware of it. This can be done very cleanly with the ideas
of <a href="https://en.wikipedia.org/wiki/F-algebra" target="_blank">F-Algebras</a> and <a href="https://en.wikipedia.org/wiki/F-coalgebra" target="_blank">F-Coalgebras</a>.</p><p>For this particular case, we start with a functor that describes the branching structure
of a tree holding values of type <code>a</code>.</p><pre><code>data TreeF a b = Empty | Branch a b b</code></pre><p>A <code>TreeF a</code>-algebra is a type <code>t</code> with a function <code>TreeF a t -> t</code>. Similarly, a
<code>TreeF a</code>-coalgebra is a type <code>t</code> with a function <code>t -> TreeF a t</code>. If we were to
define a binary tree holding <code>a</code>s in the usual fashion, we'd find it is both a
<code>TreeF a</code>-algebra and a <code>TreeF a</code>-coalgebra.</p><pre><code>data Tree a = Leaf | Node a (Tree a) (Tree a)
alg :: TreeF a (Tree a) -> Tree a
alg Empty = Leaf
alg (Branch v l r) = Node v l r
coalg :: Tree a -> TreeF a (Tree a)
coalg Leaf = Empty
coalg (Node v l r) = Branch v l r</code></pre><p>These <code>alg</code> and <code>coalg</code> functions become the appropriate places to put extra computation
that does things like maintain a size annotation or perform the push operations for
lazy propagation. In both the Haskell and Rust code, I define all the tree operations
in terms of these functions rather than direct pattern matching, which makes the
actual lazy propagation relatively painless.</p><h2>Differences In Rust</h2><h3>No Higher Kinded Types</h3><p>If you read through the code, you might notice that the way I define an algebra in Haskell
is more general than the way I define it in Rust. Haskell gives me the ability to talk about
higher kinded types, so I can have an <code>Algebra</code> typeclass that's parameterized by a
functor, which in this case will be <code>TreeF a</code>. In Rust, all type parameters need to be
concrete types, so I define a <code>TreeAlgebra</code> trait that is parameterized by <code>a</code>
instead. This can somewhat reduce the potential for code reuse, but I think Rust's
system still gets you most of the way there.</p><h3>Think About Memory</h3><p>Rust is famously not garbage collected, and as a result memory concerns spread themselves
through every function. For this use case I wanted to keep only one tree around at any given time, so
it was convenient to have functions that take ownership of the whole tree, and return a structure
(either the new tree or a zipper) that encapsulates all of the information needed to construct
the next tree. This is a particularly big benefit for splay trees, because improper use of an old
value could destroy the amortized time bounds.</p><p>Because I was writing my functions in a take-ownership style, I was able to avoid having to think
too hard about lifetimes. In fact, none of my functions use explicit lifetime parameters. However,
the move semantics did force me to use a few tricks. For example, when I pattern matched on <code>separate(node)</code>
and found a value I didn't want to change, I couldn't just go back and use <code>node</code>
like I could in Haskell; instead I needed to recombine the result into a new node.</p><p>Overall, both implementations felt rather mathematical, with the same sort of pattern matching and
formulas. However, Haskell's combination of lazy evaluation and garbage collection makes the experience
a bit closer to the mathematics. For example, in Rust I have a <code>right_zipper</code> function and a <code>left_zipper</code>
function to move down the tree. In Haskell, I can replace these functions with a
<code>Coalgebra (TreeF a)</code> instance for zippers, and because of lazy evaluation as long as I use only one of
the branches the other won't be evaluated at all.</p><h2>The Payoff</h2><p>The most clear payoff is that the Rust code is dramatically more efficient. I ran both versions of the
code (compiled with optimizations) on a randomly generated test case containing 1 million operations
on a structure of size 100,000. The two programs yielded identical output (as they should!), but
the Rust code ran almost 6 times faster and with 20 times less memory:</p><pre><code>Haskell
-------
Elapsed (wall clock) time: 47.59 seconds
Maximum resident set size: 334368 kB
Rust
----
Elapsed (wall clock) time: 8.78 seconds
Maximum resident set size: 16856 kB</code></pre><p>Although not something I observed in this experience, another benefit of Rust's memory management system
is that there is no garbage collection. Haskell's garbage collector is optimized for throughput and as
a result <a href="https://making.pusher.com/latency-working-set-ghc-gc-pick-two/" target="_blank">can cause significant execution pauses</a>.
This makes Haskell a questionable choice for real-time applications, as one of these pauses will likely
blow right past any latency requirements you might have. If there's one thing that would make me hesitant to use
Haskell in a production system right now, it's the garbage collector. Rust sidesteps the problem entirely.</p><p>Overall, Rust seems superior to Haskell for most end products. The performance gains are significant without
any real increase in work. However, the Haskell code was very conducive
to exploring the abstraction space, whereas I had the advantage of knowing the approach from the get-go when
writing the Rust code. If I were trying to do that exploration in Rust, I think the memory management concerns
would be quite annoying, because I'd be trying to focus on what the abstraction should be, rather than how it's
implemented. In the future I expect a lot of my projects to start in Haskell for the exploratory phase, then
get ported to Rust if I want a more performant product.</p><h1><a href="https://www.brianhamrick.com/blog/trivial-dependent-types">Trivial Dependent Types</a></h1><p>Wed, 22 Mar 2017</p><p><a href="https://en.wikipedia.org/wiki/Dependent_type" target="_blank">Dependent types</a> are types that depend on
values. With dependent types, in addition to being able to express a type that is
"a list of integers," you could also express things like "a list of integers of length 5."
Having access to dependent types allows you to express invariants whose values aren't
known until runtime, but have relations that can be proven at compile time.</p><p>For example, consider the following Haskell function that zips two lists together:</p><pre><code>zip :: [a] -> [b] -> [(a,b)]
zip _ [] = []
zip [] _ = []
zip (x:xs) (y:ys) = (x,y) : zip xs ys</code></pre><p>This function has the property that if the two lists are different sizes, then
the end of the longer one is essentially thrown away. In some cases, that might be
undesirable or surprising behavior. So we might hope for a version of <code>zip</code> that
can only be called when we know for sure that the two lists are the same length.
If we have a list type that is parameterized by length, then that's possible.
We'd give <code>zip</code> a type like <code>forall a b (n : Int). List a n -> List b n -> List (a, b) n</code>.
(The syntax could vary considerably). As an added bonus, we'd no longer need two
base cases, and we could instead replace them with the case <code>zip [] [] = []</code>.</p><p>Unfortunately, dependent types are very difficult to implement in a reasonable way.
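As an aside, a restricted form of this length-indexed <code>zip</code> is already expressible in Rust, whose const generics track array lengths in the type. This is my own illustration (using fixed-size arrays rather than lists):

```rust
// A zip that the compiler only accepts when both arrays have the same
// length N; a length mismatch is a type error, not a silent truncation.
fn zip_same_len<A: Copy, B: Copy, const N: usize>(xs: [A; N], ys: [B; N]) -> [(A, B); N] {
    std::array::from_fn(|i| (xs[i], ys[i]))
}

fn main() {
    let zipped = zip_same_len([1, 2, 3], ['a', 'b', 'c']);
    println!("{:?}", zipped); // [(1, 'a'), (2, 'b'), (3, 'c')]
    // zip_same_len([1, 2, 3], ['a', 'b']); // rejected at compile time
}
```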
Ideally, we want to be able to encode a property in our type, and we want the
type checker to prove that the code we write has that property. In extreme cases,
these properties could be things like "This operation takes logarithmic runtime."
The problem is that this means that the compiler needs to also be a <a href="https://en.wikipedia.org/wiki/Automated_theorem_proving" target="_blank">theorem prover</a>,
and as the proofs become more and more complicated, the programmer is going to
need to significantly guide the compiler.</p><p>With our current technology and techniques, using dependent types in this way
is a far cry from the ideal of writing down properties and letting the compiler
check that they hold. (On the other hand, if you are willing to use randomized testing
instead of theorem proving, tools such as <a href="https://en.wikipedia.org/wiki/QuickCheck" target="_blank">QuickCheck</a>
do a good job.)</p><p>That said, there are some applications of dependent types where the proofs are not complicated
at all. For example, if we're working with numbers <a href="https://en.wikipedia.org/wiki/Modular_arithmetic" target="_blank">mod n</a>,
then in C we'd need to write code like <code>((a*b % n)*c % n)</code>, and so on, being very careful to
<code>% n</code> after each operation to prevent overflow.
It'd be much nicer and lead to simpler code if we could have a type that was "Integers modulo <code>n</code>",
and have the arithmetic operators do the intermediate modulos for us.
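</p><p>When the modulus is a compile-time constant, GHC's type-level naturals can already express this; the <code>Mod</code> type below is my own sketch of the idea, not a library type (a runtime modulus is where the <code>reflection</code> machinery discussed next comes in):</p><pre><code>{-# LANGUAGE DataKinds, KindSignatures, ScopedTypeVariables #-}

import Data.Proxy (Proxy (..))
import GHC.TypeLits (KnownNat, Nat, natVal)

-- integers modulo n, with the modulus n carried in the type
newtype Mod (n :: Nat) = Mod Integer deriving (Eq, Show)

-- every operation reduces mod n, so intermediate results stay reduced
instance KnownNat n => Num (Mod n) where
  Mod a + Mod b  = Mod ((a + b) `mod` natVal (Proxy :: Proxy n))
  Mod a - Mod b  = Mod ((a - b) `mod` natVal (Proxy :: Proxy n))
  Mod a * Mod b  = Mod ((a * b) `mod` natVal (Proxy :: Proxy n))
  fromInteger k  = Mod (k `mod` natVal (Proxy :: Proxy n))
  abs            = id
  signum (Mod a) = Mod (signum a)</code></pre><p>Now an expression like <code>a * b * c :: Mod 1000000007</code> performs the intermediate reductions automatically, and accidentally mixing numbers with different moduli is a type error.</p><p>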
This kind of thing is possible in Haskell <a href="https://www.schoolofhaskell.com/user/thoughtpolice/using-reflection" target="_blank">using the <code>reflection</code> package</a>,
but the resulting type signatures take a while to get used to, and feel much
more complicated than the problem that I want to be able to solve.</p><p>In an application like this, our type parameter is just going to be a constant for a huge
section of our program. It's trivial to prove equality in cases like this: we're just saying
x = x. So I'd like to propose a variant of dependent types where only trivial proofs are allowed.
For lack of a better name, let's call these trivial dependent types.</p><p>Implementing these in a Haskell-like language with immutable data should not be too hard. Suppose
we have a dependent type <code>Foo (x : Int)</code>, which has a numerical parameter. During type checking
we might need to determine whether <code>Foo x</code> and <code>Foo y</code> are equal. The answer in this case is
the same as the answer to whether <code>x</code> and <code>y</code> can share the same memory location. So if
<code>x</code> is read from user input, while <code>y</code> is the result of a long computation, then <code>Foo x</code> and
<code>Foo y</code> won't be compatible. On the other hand, if you write <code>let y = x</code>, then they will.</p><p>It should also be straightforward to have type inference with this style of dependent types.
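</p><p>To make that compatibility rule concrete, here's how it might look in a hypothetical surface syntax (every name below is invented for illustration):</p><pre><code>n = readInt()          -- n : Int, known only at runtime
a = makeFoo n          -- a : Foo n
let m = n              -- m shares n's memory location
b = makeFoo m          -- b : Foo m, and Foo m = Foo n, so b is compatible with a
k = longComputation()  -- even if k happens to equal n at runtime...
c = makeFoo k          -- ...Foo k is NOT compatible with Foo n: only the
                       --    trivial proof x = x is accepted</code></pre><p>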
A value parameter to a type can be represented in the compiler by the location in the AST
that defines the expression, or an equivalence class of those locations that you're computing
anyway in order to do <a href="https://en.wikipedia.org/wiki/Common_subexpression_elimination" target="_blank">common subexpression elimination</a>.</p><p>In the presence of mutable values, inference will be harder, but should still be possible.
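</p><p>As a hypothetical illustration (again with invented syntax) of the kind of error this would produce:</p><pre><code>var n = readInt()
a = makeFoo n     -- a : Foo n
n = n + 1         -- the checker treats this as a fresh n' shadowing n
b = makeFoo n     -- b : Foo n'
combine a b       -- type error: Foo n and Foo n' differ, even though n and n'
                  --   share one memory location in the generated code</code></pre><p>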
You could, for example, have your type checker (but not your code generator!) interpret a
changing value as the creation of a new variable of the same name that shadows the old one.
In this way, if you try to use <code>Foo n</code> both before and after a potential modification, you'd
get a type mismatch because the <code>n</code> after the assignment refers to a different <code>n</code> from the one before
the assignment, even though those two variables will use the same memory location when it
comes to code generation.</p><p>Loops also pose a potential problem, but again it should be possible to work with. In addition
to values that might have changed in prior lines, you also need to worry about the possibility
that the value changed in lines that appear later in the loop but in earlier iterations. It's possible
that this could lead you down a similar road that <a href="https://www.rust-lang.org/" target="_blank">Rust</a> went down
to manage memory, but I'm not familiar enough with Rust's borrow checker to highlight the similarities.</p><p>I'd love to try out a language with these types to see if it feels expressive enough or
if I'll just end up wishing I had the next step of dependent types, and I think it'd be
a fun project to work on building such a language, but realistically I won't have the time
or the motivation to do it. If someone out there is interested, let me know and I'd
probably help out.</p><p>To close this post, I'll highlight a couple cool things that you could do with this type of
dependent type, whether in functional or imperative style.</p><h3>Augmented Data Structures</h3><p>Binary search tree variants such as splay trees can often be used as a starting point
for efficient algorithms by augmenting each node of the tree with some auxiliary data, such
as the sum of all the elements in that subtree. With trivial dependent types, we could
write a tree parameterized by the type of the augmentation and the function that produces
a node's augment from either its value (if the node is a leaf) or the augments of its children.</p><pre><code>data TreeF a x = Leaf a | Branch x x
data AugmentedTree a aug (combine : TreeF a aug -> aug) = ...</code></pre><h3>Safe indexing without bounds checking</h3><p>The technique here provides less help than something like <a href="https://en.wikipedia.org/wiki/Refinement_(computing)#Refinement_types" target="_blank">refinement types</a>,
but we can still do something. If you write C code, it's very common to write a loop
such as:</p><pre><code>// a is an array of size N
for(int i = 0; i < N; i++) {
// do something with a[i]
}</code></pre><p>With trivial dependent types, we could create a type <code>Index (x : Array a)</code> that is parameterized
by any type of array and represents an in-bounds index into that array. We could then write
an indexing function <code>index : (x : Array a) -> (i : Index x) -> a</code> that is essentially
implemented as <code>x[i]</code>. Then, we could define iteration constructs that give us an <code>Index x</code>
instead of an <code>int</code>, and avoid any need to do bounds checking if we only index by values
of this type.</p><p>Of course, we lose the ability to add two indices together, but that's an inherently unsafe operation,
and in most cases we'd want a bounds check in that case anyway. Refinement types or similar
techniques might be able to prove that a particular computation is safe, but once again those
techniques are much more difficult to get right. While we lose some amount of arithmetic, we can
still compare indices perfectly fine, and more generally provide any operation that cannot produce
invalid indices.</p>Math is About Communicationhttps://www.brianhamrick.com/blog/math-is-communicationWed, 15 Mar 2017 23:59:59 UTC<p>When students come out of middle school, or even high school, a lot of them
have a mistaken idea of the point of math. Of course, the students aren't to blame.
It's conceivable, likely even, that the teachers have the same idea.
Math is fundamentally about communicating ideas, often very precise ideas.
You might also be familiar with the argument that <a href="https://www.maa.org/external_archive/devlin/LockhartsLament.pdf" target="_blank">math is art</a>
but of course art is also fundamentally about the communication of
ideas from the artist.</p><p>The problem, if you can call it a problem, is that the baseline of math
is so old and so well tested that to a young student it is indistinguishable
from fact. Instead of thinking of math in terms of communication, these
students think of math in terms of right and wrong answers.
This misunderstanding leads to more misunderstandings down the road,
and I want to highlight a couple examples.</p><h2>1 is not prime</h2><p>When you learned about prime numbers, you were probably told something like "A number is prime if
it is only divisible by 1 and itself." So numbers like 2, 3, 5, and 7 are prime. On the other hand,
1 also seems to fit this definition, as it is not divisible by anything other than 1 and itself
(although the two are the same in this case). You might've even gotten a bullshit reason like
"Oh, primes have to have exactly two factors" for why 1 is not prime.</p><p>The real reason that 1 is not prime is because it is more convenient to say that 1 is not prime.
It is true that 1 has a lot of properties that primes have, but it also has a lot of properties
that primes do not have, and there are a huge number of situations where we would want to separate
1 out. Here's the usual definition of a few terms in commutative ring theory (here I use the word "number"
to mean "element of a commutative ring"):</p><blockquote><p>A <i>unit</i> is a number \( u \) with an inverse, i.e. there is some other number \( v \) such that \( uv = 1 \).
A <i>prime</i> is a nonunit number \( p \) such that if \( p \) divides \( ab \) then either \( p \) divides \( a \) or
\( p \) divides \( b \) (or both).
An <i>irreducible</i> is a nonunit number \( p \) such that if \( p = ab \) then either \( a \) or \( b \) is a unit.</p></blockquote><p>For the integers, the units are 1 and -1. The concepts of prime and irreducible both correspond to what we usually think
of as primes, but also include the negative primes -2, -3, -5, etc.
Notice the sneaky "nonunit" in both the definitions of primes and irreducibles. In fact, units
satisfy both of the other criteria for primes and irreducibles.</p><p>So why do mathematicians want to not include units in the list of primes? Well, for example, when we
define a <a href="https://en.wikipedia.org/wiki/Unique_factorization_domain" target="_blank">unique factorization domain</a>,
we write a factorization of a number \( n \) as a product \( u p_1^{e_1} p_2^{e_2}\cdots p_k^{e_k} \)
where \( u \) is a unit and \( p_1, \ldots, p_k \) are primes.</p><p>If units were included in primes, then instead of "unit" and "prime", we'd be saying "unit" and
"nonunit prime". When you look at the interesting theorems in ring theory and the topics based on it,
the prime-without-unit idea shows up in a huge number of places, and the prime-with-unit idea
shows up essentially not at all. It's like the question of whether a hotdog is a sandwich.
Maybe you could come up with an argument of why hotdogs should technically be sandwiches, but
the practical result is that if you want a sandwich you'd have to ask for a "non-hotdog sandwich"
instead. So it's more convenient to just say that hotdogs are not sandwiches.</p><h2>0.999... = 1</h2><p>This one is common to the point of absurdity. It is absolutely and
unquestionably true that the real number denoted by 0.999... and the real
number denoted by 1 are the same number. There are plenty of basic "proofs"
of this fact, that look something like the following:</p><ol><li>1/3 = 0.3333...</li><li>3*1/3 = 0.999...</li><li>1 = 0.999...</li></ol><p>The problem with these proofs is that they don't actually answer the fundamental
misunderstandings of the people who believe that 0.999... and 1 are different.
These people usually have the following gaps:</p><ol><li>They don't know what a real number is.</li><li>They don't know what 0.999... means.</li></ol><p>If you've seen enough of these discussions I'm sure that you've seen cases
where the person accepts the given "proof" but then still believes that 0.999...
and 1 are different with one of the following resolutions:</p><ul><li>Well, 0.333... is very close to 1/3, but not exactly.</li><li>The sum \( \sum_{n=1}^{\infty} \frac{9}{10^n} \) is 1 but 0.999... is something different.</li></ul><p>These outcomes occur because nobody addressed the fundamental gaps in knowledge.
The real numbers aren't usually formally defined until you take a real analysis course,
which means either college or never for most people. Instead, they have this fuzzy
understanding that the real numbers include the integers, the rational numbers, as well
as numbers like \( \sqrt{2} \) and \( \pi \) and \( e \), but not things like \( \sqrt{-1} \).</p><p>There are two typical definitions of real numbers that you are likely to see in a real analysis course.
The first is in terms of <a href="https://en.wikipedia.org/wiki/Cauchy_sequence" target="_blank">Cauchy sequences</a>.
A sequence \( x_1, x_2, \ldots \) is said to be a Cauchy sequence if for every rational \( \varepsilon > 0 \)
there is some \( N \) such that for any \( n, m > N \), we have \( |x_n - x_m| < \varepsilon \).
The intuition here is that every sequence with this property ought to converge, because we can find smaller
and smaller intervals narrowing down what the limit should be, but when we narrow it down "all the way",
the point that we want might be missing!</p><p>However, some sequences should converge to the same number. For example, we don't want the sequence
\( 0, 1, 1, 1, 1, \ldots \) to be considered differently from \( 1, 1, 1, 1, \ldots \). They should both
simply represent the number 1. Essentially, two sequences should converge to the same value if they eventually
become arbitrarily close: for any \( \varepsilon > 0 \), there is some \( N \) such that for any \( n > N \),
we have \( |x_n - y_n| < \varepsilon \). This notion of "should converge to the same value" is an equivalence
relation on Cauchy sequences, and we call one of these equivalence classes a real number.</p><p>The second definition is in terms of <a href="https://en.wikipedia.org/wiki/Dedekind_cut" target="_blank">Dedekind cuts</a>.
If you have a totally ordered set \( S \), you can always "cut" it at any element (call it \( x \)) into two
sets, \( A = \{ y \in S\ |\ y < x \} \) and \( B = \{ y \in S\ |\ y \geq x \} \). But there are some pairs of
sets that look like cuts that don't come from elements of \( S \). For example, if \( S \) is the set of rational
numbers, then \( A = \{ y \in \mathbb{Q}\ |\ y < 0 \text{ or } y^2 < 2 \} \), \( B = \{ y \in \mathbb{Q}\ |\ y > 0 \text{ and } y^2 \geq 2 \} \) looks like a cut that should correspond to a number that squares to 2, but no rational number squares to 2.
In the Dedekind cut construction of real numbers, each of these pairs of sets is called a real number.</p><p>I have left out a lot of details, such as the full definition of a Dedekind cut,
how arithmetic is defined, why these are sensible definitions
of what we think of as "real numbers", why these two definitions give the same result, and so on.
However, if you trust me that these definitions work, it's not too difficult to see why 0.999... = 1.</p><p>If we're working with the Cauchy sequence construction, when we write 0.999... we probably mean the Cauchy sequence
\( 0, 0.9, 0.99, 0.999, \ldots \). It is straightforward to check that this is in the same equivalence class as
the sequence \( 1, 1, 1, 1, \ldots \), which is clearly the number 1.</p><p>If we're working with the Dedekind cut construction, when we write 0.999... we probably mean the cut
\( A = \{ y \in \mathbb{Q}\ |\ \exists n.\ y < 1 - \frac{1}{10^n} \} \), \( B = \{ y \in \mathbb{Q}\ |\ \forall n.\ y \geq 1 - \frac{1}{10^n} \} \). It is not too difficult to check that \( A = \{ y \in \mathbb{Q}\ |\ y < 1 \} \) and \( B = \{ y \in \mathbb{Q}\ |\ y \geq 1 \} \), which
is the cut corresponding to the number 1.</p>
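<p>To spell out that Cauchy-sequence check: the \( n \)th terms of \( 0, 0.9, 0.99, \ldots \) and \( 1, 1, 1, \ldots \) differ by \[ \left|1 - \left(1 - \frac{1}{10^{n-1}}\right)\right| = \frac{1}{10^{n-1}}, \] which is smaller than any fixed rational \( \varepsilon > 0 \) as soon as \( 10^{n-1} > 1/\varepsilon \). So the two sequences are equivalent, and 0.999... = 1.</p>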