<?xml version="1.0" encoding="utf-8"?><feed xmlns="http://www.w3.org/2005/Atom" ><generator uri="https://jekyllrb.com/" version="4.3.2">Jekyll</generator><link href="https://embracingtherandom.com/feed.xml" rel="self" type="application/atom+xml" /><link href="https://embracingtherandom.com/" rel="alternate" type="text/html" /><updated>2023-06-17T09:43:54+10:00</updated><id>https://embracingtherandom.com/feed.xml</id><title type="html">Embracing the Random</title><subtitle>This is my life</subtitle><author><name> Hello, world!&lt;br&gt; My name is Justin.</name></author><entry><title type="html">Statistical significance intuition</title><link href="https://embracingtherandom.com/experiments/python/statistical-significance-intuition/" rel="alternate" type="text/html" title="Statistical significance intuition" /><published>2023-05-15T05:00:00+10:00</published><updated>2023-05-15T05:00:00+10:00</updated><id>https://embracingtherandom.com/experiments/python/statistical-significance-intuition</id><content type="html" xml:base="https://embracingtherandom.com/experiments/python/statistical-significance-intuition/"><![CDATA[<blockquote>
  <p>An attempt at explaining a difficult concept in a characteristically dense but intuitive way</p>
</blockquote>

<p>References (I highly recommend all of these books):</p>

<blockquote>
  <p>“Statistical methods in online A/B testing” by Georgi Z. Georgiev<br />
“Introductory Statistics and Analytics: A Resampling Perspective” by Peter C. Bruce<br />
“Practical Statistics for Data Scientists” by Peter Bruce, Andrew Bruce and Peter Gedeck<br />
“Reasoned Writing” and “A Framework for Scientific Papers” by Devin Jindrich<br />
“Trustworthy Online Controlled Experiments: A Practical Guide to A/B Testing” by Ron Kohavi, Diane Tang and Ya Xu</p>
</blockquote>

<p>I’m going to try to explain what “statistical significance” means, using a fake experiment to test a rate-based metric (the conversion rate). I’ll try to use little jargon and almost no maths. I might be imprecise. The goal of this post is intuition over precision.</p>

<p><strong>Notes:</strong></p>

<ul>
  <li>I’ll be ignoring concepts like power, minimum detectable effect, and one/two-tailed tests as they’ll cloud intuition.</li>
  <li>I’ll be using a resampling approach rather than relying on traditional formulae because the resampling approach is more intuitive.</li>
</ul>

<p>Let’s jump right in!</p>

<h2 id="python-stuff-used-in-this-post">Python stuff used in this post</h2>

<div class="language-python highlighter-rouge"><div class="highlight"><pre class="highlight"><code><span class="kn">import</span> <span class="n">random</span>
<span class="kn">import</span> <span class="n">numpy</span> <span class="k">as</span> <span class="n">np</span>
<span class="kn">from</span> <span class="n">statsmodels.stats.proportion</span> <span class="kn">import</span> <span class="n">score_test_proportions_2indep</span>
<span class="kn">from</span> <span class="n">scipy</span> <span class="kn">import</span> <span class="n">stats</span>
<span class="kn">from</span> <span class="n">numba</span> <span class="kn">import</span> <span class="n">njit</span><span class="p">,</span> <span class="n">prange</span>
<span class="kn">import</span> <span class="n">matplotlib.pyplot</span> <span class="k">as</span> <span class="n">plt</span>
<span class="kn">import</span> <span class="n">matplotlib.ticker</span> <span class="k">as</span> <span class="n">ticker</span>
<span class="kn">import</span> <span class="n">seaborn</span> <span class="k">as</span> <span class="n">sns</span>
</code></pre></div></div>

<p>Some settings:</p>

<div class="language-python highlighter-rouge"><div class="highlight"><pre class="highlight"><code><span class="n">rng</span> <span class="o">=</span> <span class="n">np</span><span class="p">.</span><span class="n">random</span><span class="p">.</span><span class="nf">default_rng</span><span class="p">(</span><span class="mi">1337</span><span class="p">)</span> <span class="c1"># random number generator using the one true seed
</span><span class="n">sns</span><span class="p">.</span><span class="nf">set_style</span><span class="p">(</span><span class="sh">'</span><span class="s">white</span><span class="sh">'</span><span class="p">)</span>
</code></pre></div></div>

<h1 id="set-up-a-fake-experiment">Set up a fake experiment</h1>

<p><strong>Note: We’ll be using absurdly small sample sizes in the following. Again, intuition is the goal.</strong></p>

<p>Let’s say that we have a new search ranking algorithm that we think outperforms our current search ranking algorithm. In our experiment, we have two groups that we’ll be comparing:</p>

<ul>
  <li>One set of users who are exposed to the search ranking algorithm in production today. We’ll call this the “control group”.</li>
  <li>Another set of users who are exposed to the new search ranking algorithm. We’ll call this the “variant group”.</li>
</ul>

<p>How do users get into these groups in the first place?</p>

<h2 id="on-the-random-assignment-of-users-to-our-control-and-variant-groups">On the random assignment of users to our control and variant groups</h2>

<p>We will randomly assign users who appear on our website to our control and variant groups. The key reason for random assignment is this:</p>

<blockquote>
  <p>We want to be confident that the only difference between our two groups is the difference in search ranking algorithms.</p>
</blockquote>

<p>What if we were to do this instead?</p>

<ul>
  <li>Assign users who live in Australia to the control group</li>
  <li>Assign users who live in the USA to the variant group</li>
</ul>

<p>Now the search ranking algorithm isn’t the only difference between our groups. We also have a geographical difference. Users living in Australia might not behave in the same way as users in the USA. The two countries might also differ in demographic factors like age.</p>

<p>These differences cloud our ability to measure the “true” degree of outperformance between our algorithms. So let’s stick to random assignments.</p>
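
<p>Here’s a minimal sketch of what random assignment could look like. The user IDs are made up for illustration, and a real website would more likely assign users one at a time as they arrive, but the idea is the same: chance alone decides who lands in which group.</p>

```python
import numpy as np

rng = np.random.default_rng(1337)

# Hypothetical user IDs; a real experiment would use IDs from the website.
user_ids = np.arange(12)

# Shuffle the users, then split the shuffled list in half.
shuffled_ids = rng.permutation(user_ids)
control_ids = shuffled_ids[:6]
variant_ids = shuffled_ids[6:]

print("control:", sorted(control_ids.tolist()))
print("variant:", sorted(variant_ids.tolist()))
```

<p>Nothing about a user (country, age, device) influences which group they land in, which is exactly what we want.</p>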

<h2 id="how-will-we-measure-the-performance-of-each-algorithm">How will we measure the performance of each algorithm?</h2>

<p>We now have two groups of randomly assigned users who are exposed to different search ranking algorithms.</p>

<p>We need a metric that we can use to call our experiment a success or a failure. Let’s use something familiar to most - the <strong>“conversion rate”</strong>. We’ll define “conversion” as “a user buying something”.</p>

<h3 id="the-conversion-rate-for-a-single-group-in-our-experiment">The conversion rate for a single group in our experiment</h3>

<p>For each user in our experiment, there are two outcomes:</p>

<ul>
  <li>A user converts during our experiment</li>
  <li>A user doesn’t convert during our experiment</li>
</ul>

<p>We’ll define our conversion rate like this:</p>

\[\text{Control group conversion rate} = \frac{\text{Number of unique converting users in our control group}}{\text{Number of unique users in our control group}}\]

<p>We do the same for our variant group:</p>

\[\text{Variant group conversion rate} = \frac{\text{Number of unique converting users in our variant group}}{\text{Number of unique users in our variant group}}\]

<p>Let’s illustrate our conversion rate. Here is a timeline of an experiment:</p>

<div>
    <img src="/assets/post_images/2023-05-15-statistical-significance-intuition/conversion_rate_0.jpg" width="55%" class="align-center" />
</div>

<p>A user shows up in our experiment. They haven’t bought anything yet, so the conversion rate is 0%:</p>

<div>
    <img src="/assets/post_images/2023-05-15-statistical-significance-intuition/conversion_rate_1.jpg" width="85%" class="align-center" />
</div>

<p>Hooray! The user bought a pair of sweet, sweet sneakers. The conversion rate is now 100%:</p>

<div>
    <img src="/assets/post_images/2023-05-15-statistical-significance-intuition/conversion_rate_2.jpg" width="85%" class="align-center" />
</div>

<p>A second user shows up in our experiment. They unfortunately don’t buy anything and the experiment ends. The conversion rate is 50% for the experiment:</p>

<div>
    <img src="/assets/post_images/2023-05-15-statistical-significance-intuition/conversion_rate_3.jpg" width="85%" class="align-center" />
</div>
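
<p>The timeline above can be sketched in a few lines of Python. The <code class="language-plaintext highlighter-rouge">conversion_rate</code> helper and the user names are hypothetical and only here for illustration:</p>

```python
def conversion_rate(users_seen, users_converted):
    """Share of unique users seen so far who have converted at least once."""
    unique_users = set(users_seen)
    unique_converters = unique_users.intersection(users_converted)
    if not unique_users:
        return float("nan")  # no users yet, so the rate is undefined
    return len(unique_converters) / len(unique_users)


# Hypothetical event logs matching the timeline above.
print(f"{conversion_rate(['user_1'], []):.1%}")                    # 0.0%
print(f"{conversion_rate(['user_1'], ['user_1']):.1%}")            # 100.0%
print(f"{conversion_rate(['user_1', 'user_2'], ['user_1']):.1%}")  # 50.0%

# Repeat visits and repeat purchases by the same user are only counted once.
print(f"{conversion_rate(['user_1', 'user_1', 'user_2'], ['user_1', 'user_1']):.1%}")  # 50.0%
```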

<h3 id="how-will-we-measure-the-outperformance-of-our-new-algorithm-over-the-existing-one">How will we measure the “outperformance” of our new algorithm over the existing one?</h3>

<p>We have two groups with an equal number of randomly assigned users. We now have two conversion rates:</p>

<div>
    <img src="/assets/post_images/2023-05-15-statistical-significance-intuition/diff_in_rates.jpg" width="80%" class="align-center" />
</div>

<p>How can we distil these two rates into a single metric that can be used to describe the difference in performance between the two algorithms? That’s easy!</p>

<p>What we’re interested in is the <strong>difference in conversion rates between these groups</strong>. Let’s calculate the difference in conversion rates in the above scenario:</p>

\[\begin{align*}
\text{Difference in conversion rates} &amp;= \text{Variant conversion rate} - \text{Control conversion rate} \\
&amp;\approx 66.7\% - 50.0\% \\
&amp;\approx 16.7\%
\end{align*}\]

<p>Wow, a <code class="language-plaintext highlighter-rouge">+~16.7%</code> absolute difference to control! That’s a relative change of \(\frac{16.7\%}{50\%} \approx 33.3\%\)!!!</p>
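
<p>To make the absolute versus relative distinction concrete, here’s the same arithmetic as a quick sketch. It assumes the counts used later in this post: 3 of 6 control users and 4 of 6 variant users converted.</p>

```python
control_rate = 3 / 6  # 3 of 6 control users converted
variant_rate = 4 / 6  # 4 of 6 variant users converted

# Absolute difference: a simple subtraction of the two rates.
absolute_diff = variant_rate - control_rate

# Relative change: the absolute difference as a share of the control rate.
relative_diff = absolute_diff / control_rate

print(f"absolute difference: {absolute_diff:+.1%}")  # +16.7%
print(f"relative change: {relative_diff:+.1%}")      # +33.3%
```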

<h1 id="is-the-difference-in-conversion-rates-just-random-noise">Is the difference in conversion rates just random noise?</h1>

<p>We have a <code class="language-plaintext highlighter-rouge">+~16.7%</code> absolute difference in conversion rates. It looks like our variant algorithm performs better! We have a winner! <strong>Let’s release this to production.</strong> Right? Right?</p>

<p>No! Not yet. Let’s dive head-first into the rabbit hole.</p>

<p>We can’t include all current and all future users in our experiment. That’s impossible! We’re constrained by reality. In our experiment, we’ve only sampled 12 users in total, 6 of whom were randomly assigned to the control group, and 6 of whom were randomly assigned to the variant group.</p>

<p>By only sampling 12 users from a hypothetically infinite group of users, there’s some <strong>randomness</strong> in which users show up during our experiment. Furthermore, even though we’ve randomly allocated users to the control and variant groups, there’s natural <strong>random variation</strong> in how users behave within each group.</p>

<p>This means our conversion rate measurements are <strong>“tinged with randomness”</strong>. Oh, beautiful randomness.</p>
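
<p>Here’s a rough sketch of what “tinged with randomness” can look like. It assumes a made-up world in which both groups share exactly the same underlying conversion rate of 50% (the <code class="language-plaintext highlighter-rouge">TRUE_RATE</code> below is an assumption for illustration, not something we can know in practice). Even then, tiny groups of 6 users can easily produce differences in rates that aren’t zero:</p>

```python
import numpy as np

rng = np.random.default_rng(1337)

TRUE_RATE = 0.5      # assumed: both groups share the same underlying conversion rate
USERS_PER_GROUP = 6
NUM_REPEATS = 5

diffs = []
for i in range(NUM_REPEATS):
    # Each user converts (1) or doesn't (0) with the same probability in both groups.
    control = rng.binomial(n=1, p=TRUE_RATE, size=USERS_PER_GROUP)
    variant = rng.binomial(n=1, p=TRUE_RATE, size=USERS_PER_GROUP)
    diff = variant.mean() - control.mean()
    diffs.append(diff)
    print(f"repeat {i + 1}: diff in rates = {diff:+.1%}")
```

<p>Any non-zero differences printed here come from randomness alone, because we built this fake world so that neither group is truly better.</p>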

<p>Our job is to make an intelligent guess, from our limited set of experiment users, about whether what we’ve observed is a random blip in outperformance, or whether it could be “true outperformance”.</p>

<h2 id="on-our-parallel-worlds">On our parallel worlds</h2>

<p>We have two competing states of the world that describe our observed difference in conversion rates:</p>

<div>
    <img src="/assets/post_images/2023-05-15-statistical-significance-intuition/two_states_of_the_world.jpg" width="45%" class="align-center" />
</div>

<p>We don’t know which state of the world we’re in. We need to <strong>infer</strong> which state of the world best describes our experiment results.</p>

<h2 id="our-reasoning-process">Our reasoning process</h2>

<p>We reason in reverse.</p>

<p>We first assume we’re in World 1:</p>

<div>
    <img src="/assets/post_images/2023-05-15-statistical-significance-intuition/hypothesis_testing_1.jpg" width="75%" class="align-center" />
</div>

<p>We consider our experiment results and how <strong>special they are</strong> in World 1. We can think of this as answering the question “<strong>How out of place</strong> are our experiment results if we’re in a world where randomness might have caused our experiment results?”:</p>

<div>
    <img src="/assets/post_images/2023-05-15-statistical-significance-intuition/hypothesis_testing_2.jpg" width="65%" class="align-center" />
</div>

<p>If the experiment results are special enough (i.e. they’re rare in a world where randomness might have caused the results we’ve observed), we say that randomness is a poor descriptor of our experiment results. We conclude that there’s good evidence that our variant algorithm performs better than our control algorithm:</p>

<div>
    <img src="/assets/post_images/2023-05-15-statistical-significance-intuition/hypothesis_testing_3.jpg" width="80%" class="align-center" />
</div>

<p>If the experiment results aren’t special enough, we conclude that there’s not enough evidence to say that the variant algorithm performs better than the control algorithm:</p>

<div>
    <img src="/assets/post_images/2023-05-15-statistical-significance-intuition/hypothesis_testing_4.jpg" width="77%" class="align-center" />
</div>

<h2 id="on-quantifying-how-well-randomness-explains-our-results">On quantifying how well randomness explains our results</h2>

<p>How can we quantify how special or not special our experiment result of <code class="language-plaintext highlighter-rouge">~16.7%</code> difference in conversion rates really is?</p>

<ul>
  <li>We could come up with a range of differences in conversion rates caused by randomness alone, and see how extreme our result of <code class="language-plaintext highlighter-rouge">~16.7%</code> difference is.</li>
  <li>To come up with a range of random variation in the difference in rates, we can use <strong>simulations</strong> by “injecting randomness” into our experiment results!</li>
  <li>We’ll inject randomness into our experiment results by <strong>randomly shuffling</strong> our users and creating many different control and variant groups. The key is that we’re randomly shuffling our users. It’s like we’re shuffling a deck of cards. <strong>This is pure randomness!</strong></li>
</ul>

<p>Here are our users from the above artificial experiment, indicating whether they converted or didn’t convert during our experiment:</p>

<div>
    <img src="/assets/post_images/2023-05-15-statistical-significance-intuition/original_groups.jpg" width="90%" class="align-center" />
</div>

<p>We’re going to chuck our users into a bag. Let’s create our “bag”:</p>
<div>
    <img src="/assets/post_images/2023-05-15-statistical-significance-intuition/our_bag.jpg" width="90%" class="align-center" />
</div>

<p>Let’s do the “chucking”:</p>
<div>
    <img src="/assets/post_images/2023-05-15-statistical-significance-intuition/chuck_em_into_bag.jpg" width="90%" class="align-center" />
</div>

<p>Let’s randomly shuffle our users:</p>
<div>
    <img src="/assets/post_images/2023-05-15-statistical-significance-intuition/shuffled_users.jpg" width="91%" class="align-center" />
</div>

<p>From here on, we’ll keep only our users’ conversion statuses so that it’s easier to understand:</p>
<div>
    <img src="/assets/post_images/2023-05-15-statistical-significance-intuition/shuffled_users_simplified.jpg" width="87%" class="align-center" />
</div>

<p>We’ll take the first 6 users and call this our <strong>“simulated variant group created by pure randomness”</strong>. We’ll take the next 6 users and call this our <strong>“simulated control group created by pure randomness”</strong>:</p>
<div>
    <img src="/assets/post_images/2023-05-15-statistical-significance-intuition/new_simulated_groups.jpg" width="88%" class="align-center" />
</div>

<p>We then calculate our difference in rates between our simulated control and variant groups caused by <strong>“pure randomness”</strong>:</p>
<div>
    <img src="/assets/post_images/2023-05-15-statistical-significance-intuition/simulated_diff_in_rates.jpg" width="90%" class="align-center" />
</div>

<p>We repeat this process many times to get a <strong>distribution of pure randomness</strong> and see where our observed difference of <code class="language-plaintext highlighter-rouge">~16.7%</code> lies.</p>

<p>Now, onto Python!</p>

<h2 id="creating-our-range-of-pure-randomness-in-python">Creating our range of pure randomness in Python</h2>

<p>We’ll be using these techniques for the rest of the post.</p>

<p>Let’s create an array that represents users in our control group and whether the user at that position in the array converted or not. We’re dealing with binary outcomes (the user either converts or not):</p>

<ul>
  <li>A user who converted will be given the value <code class="language-plaintext highlighter-rouge">1</code></li>
  <li>A user who didn’t convert will be given the value <code class="language-plaintext highlighter-rouge">0</code></li>
</ul>

<div class="language-python highlighter-rouge"><div class="highlight"><pre class="highlight"><code><span class="n">NUM_CONTROL_USERS</span> <span class="o">=</span> <span class="mi">6</span> <span class="c1"># total number of users in control group
</span></code></pre></div></div>

<p>We’ll create an array in which each index represents a user in our control group:</p>

<div class="language-python highlighter-rouge"><div class="highlight"><pre class="highlight"><code><span class="n">control_users</span> <span class="o">=</span> <span class="n">np</span><span class="p">.</span><span class="nf">zeros</span><span class="p">(</span><span class="n">NUM_CONTROL_USERS</span><span class="p">)</span>

<span class="n">control_users</span>
</code></pre></div></div>

<div class="language-plaintext highlighter-rouge"><div class="highlight"><pre class="highlight"><code>array([0., 0., 0., 0., 0., 0.])
</code></pre></div></div>

<p>We’ll set the converting users to <code class="language-plaintext highlighter-rouge">1</code>’s:</p>

<div class="language-python highlighter-rouge"><div class="highlight"><pre class="highlight"><code><span class="n">NUM_CONVERTING_CONTROL_USERS</span> <span class="o">=</span> <span class="mi">3</span>
<span class="n">control_users</span><span class="p">[:</span><span class="n">NUM_CONVERTING_CONTROL_USERS</span><span class="p">]</span> <span class="o">=</span> <span class="mf">1.0</span>

<span class="n">control_users</span>
</code></pre></div></div>

<div class="language-plaintext highlighter-rouge"><div class="highlight"><pre class="highlight"><code>array([1., 1., 1., 0., 0., 0.])
</code></pre></div></div>

<p>What’s our control conversion rate?</p>

<div class="language-python highlighter-rouge"><div class="highlight"><pre class="highlight"><code><span class="n">control_conversion_rate</span> <span class="o">=</span> <span class="n">control_users</span><span class="p">.</span><span class="nf">mean</span><span class="p">()</span>
<span class="nf">print</span><span class="p">(</span><span class="sa">f</span><span class="sh">"</span><span class="s">control conversion rate: </span><span class="si">{</span><span class="n">control_conversion_rate</span><span class="si">:</span><span class="p">.</span><span class="mi">1</span><span class="o">%</span><span class="si">}</span><span class="sh">"</span><span class="p">)</span>
</code></pre></div></div>

<div class="language-plaintext highlighter-rouge"><div class="highlight"><pre class="highlight"><code>control conversion rate: 50.0%
</code></pre></div></div>

<p>We’ll do the same for our variant group users:</p>

<div class="language-python highlighter-rouge"><div class="highlight"><pre class="highlight"><code><span class="n">NUM_VARIANT_USERS</span> <span class="o">=</span> <span class="mi">6</span>
<span class="n">NUM_CONVERTING_VARIANT_USERS</span> <span class="o">=</span> <span class="mi">4</span>

<span class="n">variant_users</span> <span class="o">=</span> <span class="n">np</span><span class="p">.</span><span class="nf">zeros</span><span class="p">(</span><span class="n">NUM_VARIANT_USERS</span><span class="p">)</span>
<span class="n">variant_users</span><span class="p">[:</span><span class="n">NUM_CONVERTING_VARIANT_USERS</span><span class="p">]</span> <span class="o">=</span> <span class="mf">1.0</span>
</code></pre></div></div>

<div class="language-python highlighter-rouge"><div class="highlight"><pre class="highlight"><code><span class="n">variant_conversion_rate</span> <span class="o">=</span> <span class="n">variant_users</span><span class="p">.</span><span class="nf">mean</span><span class="p">()</span>
<span class="nf">print</span><span class="p">(</span><span class="sa">f</span><span class="sh">"</span><span class="s">variant conversion rate: </span><span class="si">{</span><span class="n">variant_conversion_rate</span><span class="si">:</span><span class="p">.</span><span class="mi">1</span><span class="o">%</span><span class="si">}</span><span class="sh">"</span><span class="p">)</span>
</code></pre></div></div>

<div class="language-plaintext highlighter-rouge"><div class="highlight"><pre class="highlight"><code>variant conversion rate: 66.7%
</code></pre></div></div>

<p>The difference in rates we saw in our experiment is this:</p>

<div class="language-python highlighter-rouge"><div class="highlight"><pre class="highlight"><code><span class="n">observed_diff_in_rates</span> <span class="o">=</span> <span class="n">variant_conversion_rate</span> <span class="o">-</span> <span class="n">control_conversion_rate</span>
<span class="nf">print</span><span class="p">(</span><span class="sa">f</span><span class="sh">"</span><span class="s">observed difference in conversion rates: </span><span class="si">{</span><span class="n">observed_diff_in_rates</span><span class="si">:</span><span class="p">.</span><span class="mi">1</span><span class="o">%</span><span class="si">}</span><span class="sh">"</span><span class="p">)</span>
</code></pre></div></div>

<div class="language-plaintext highlighter-rouge"><div class="highlight"><pre class="highlight"><code>observed difference in conversion rates: 16.7%
</code></pre></div></div>

<p>Very good!</p>

<p>Let’s run one iteration of our simulation. Let’s chuck our control and variant users into a bag:</p>

<div class="language-python highlighter-rouge"><div class="highlight"><pre class="highlight"><code><span class="n">all_users</span> <span class="o">=</span> <span class="n">np</span><span class="p">.</span><span class="nf">hstack</span><span class="p">([</span><span class="n">control_users</span><span class="p">,</span> <span class="n">variant_users</span><span class="p">])</span>

<span class="n">all_users</span>
</code></pre></div></div>

<div class="language-plaintext highlighter-rouge"><div class="highlight"><pre class="highlight"><code>array([1., 1., 1., 0., 0., 0., 1., 1., 1., 1., 0., 0.])
</code></pre></div></div>

<p>Let’s randomly shuffle them:</p>

<div class="language-python highlighter-rouge"><div class="highlight"><pre class="highlight"><code><span class="n">rng</span><span class="p">.</span><span class="nf">shuffle</span><span class="p">(</span><span class="n">all_users</span><span class="p">)</span>

<span class="n">all_users</span>
</code></pre></div></div>

<div class="language-plaintext highlighter-rouge"><div class="highlight"><pre class="highlight"><code>array([0., 1., 1., 0., 1., 1., 1., 0., 0., 0., 1., 1.])
</code></pre></div></div>

<p>We create our simulated control and variant groups:</p>

<div class="language-python highlighter-rouge"><div class="highlight"><pre class="highlight"><code><span class="n">simulated_control_group</span> <span class="o">=</span> <span class="n">all_users</span><span class="p">[:</span><span class="n">NUM_CONTROL_USERS</span><span class="p">]</span>
<span class="n">simulated_variant_group</span> <span class="o">=</span> <span class="n">all_users</span><span class="p">[</span><span class="n">NUM_CONTROL_USERS</span><span class="p">:]</span>

<span class="n">simulated_control_group</span><span class="p">,</span> <span class="n">simulated_variant_group</span>
</code></pre></div></div>

<div class="language-plaintext highlighter-rouge"><div class="highlight"><pre class="highlight"><code>(array([0., 1., 1., 0., 1., 1.]), array([1., 0., 0., 0., 1., 1.]))
</code></pre></div></div>

<p>What are our simulated conversion rates?</p>

<div class="language-python highlighter-rouge"><div class="highlight"><pre class="highlight"><code><span class="nf">print</span><span class="p">(</span><span class="sa">f</span><span class="sh">"</span><span class="s">simulated control conversion rate: </span><span class="si">{</span><span class="n">simulated_control_group</span><span class="p">.</span><span class="nf">mean</span><span class="p">()</span><span class="si">:</span><span class="p">.</span><span class="mi">1</span><span class="o">%</span><span class="si">}</span><span class="sh">"</span><span class="p">)</span>
<span class="nf">print</span><span class="p">(</span><span class="sa">f</span><span class="sh">"</span><span class="s">simulated variant conversion rate: </span><span class="si">{</span><span class="n">simulated_variant_group</span><span class="p">.</span><span class="nf">mean</span><span class="p">()</span><span class="si">:</span><span class="p">.</span><span class="mi">1</span><span class="o">%</span><span class="si">}</span><span class="sh">"</span><span class="p">)</span>
</code></pre></div></div>

<div class="language-plaintext highlighter-rouge"><div class="highlight"><pre class="highlight"><code>simulated control conversion rate: 66.7%
simulated variant conversion rate: 50.0%
</code></pre></div></div>

<div class="language-python highlighter-rouge"><div class="highlight"><pre class="highlight"><code><span class="n">simulated_diff_in_rates</span> <span class="o">=</span> <span class="n">simulated_variant_group</span><span class="p">.</span><span class="nf">mean</span><span class="p">()</span> <span class="o">-</span> <span class="n">simulated_control_group</span><span class="p">.</span><span class="nf">mean</span><span class="p">()</span>

<span class="nf">print</span><span class="p">(</span><span class="sa">f</span><span class="sh">"</span><span class="s">simulated difference in conversion rates: </span><span class="si">{</span><span class="n">simulated_diff_in_rates</span><span class="si">:</span><span class="p">.</span><span class="mi">1</span><span class="o">%</span><span class="si">}</span><span class="sh">"</span><span class="p">)</span>
</code></pre></div></div>

<div class="language-plaintext highlighter-rouge"><div class="highlight"><pre class="highlight"><code>simulated difference in conversion rates: -16.7%
</code></pre></div></div>

<p>Not quite our result of <code class="language-plaintext highlighter-rouge">+~16.7%</code>, is it?</p>

<p>Let’s do this many times:</p>

<div class="language-python highlighter-rouge"><div class="highlight"><pre class="highlight"><code><span class="n">NUM_SIMULATIONS</span> <span class="o">=</span> <span class="mi">10_000</span>
<span class="n">simulated_diffs_in_rates</span> <span class="o">=</span> <span class="p">[]</span>

<span class="k">for</span> <span class="n">_</span> <span class="ow">in</span> <span class="nf">range</span><span class="p">(</span><span class="n">NUM_SIMULATIONS</span><span class="p">):</span>
    <span class="n">rng</span><span class="p">.</span><span class="nf">shuffle</span><span class="p">(</span><span class="n">all_users</span><span class="p">)</span>
    <span class="n">control_conversion_rate</span> <span class="o">=</span> <span class="n">all_users</span><span class="p">[:</span><span class="n">NUM_CONTROL_USERS</span><span class="p">].</span><span class="nf">mean</span><span class="p">()</span>
    <span class="n">variant_conversion_rate</span> <span class="o">=</span> <span class="n">all_users</span><span class="p">[</span><span class="n">NUM_CONTROL_USERS</span><span class="p">:].</span><span class="nf">mean</span><span class="p">()</span>
    <span class="n">simulated_diffs_in_rates</span><span class="p">.</span><span class="nf">append</span><span class="p">(</span><span class="n">variant_conversion_rate</span> <span class="o">-</span> <span class="n">control_conversion_rate</span><span class="p">)</span>

<span class="n">simulated_diffs_in_rates</span> <span class="o">=</span> <span class="n">np</span><span class="p">.</span><span class="nf">array</span><span class="p">(</span><span class="n">simulated_diffs_in_rates</span><span class="p">)</span>
</code></pre></div></div>

<p>Let’s inspect our first 10 simulated differences in rates:</p>

<div class="language-python highlighter-rouge"><div class="highlight"><pre class="highlight"><code><span class="k">for</span> <span class="n">i</span><span class="p">,</span> <span class="n">rate</span> <span class="ow">in</span> <span class="nf">enumerate</span><span class="p">(</span><span class="n">simulated_diffs_in_rates</span><span class="p">[:</span><span class="mi">10</span><span class="p">]):</span>
    <span class="nf">print</span><span class="p">(</span><span class="sa">f</span><span class="sh">"</span><span class="s">simulated diff in rates </span><span class="si">{</span><span class="n">i</span><span class="o">+</span><span class="mi">1</span><span class="si">}</span><span class="s">: </span><span class="si">{</span><span class="n">rate</span><span class="si">:</span><span class="p">.</span><span class="mi">1</span><span class="o">%</span><span class="si">}</span><span class="sh">"</span><span class="p">)</span>
</code></pre></div></div>

<div class="language-plaintext highlighter-rouge"><div class="highlight"><pre class="highlight"><code>simulated diff in rates 1: -50.0%
simulated diff in rates 2: -16.7%
simulated diff in rates 3: 16.7%
simulated diff in rates 4: 16.7%
simulated diff in rates 5: 50.0%
simulated diff in rates 6: 50.0%
simulated diff in rates 7: -16.7%
simulated diff in rates 8: -16.7%
simulated diff in rates 9: -50.0%
simulated diff in rates 10: 16.7%
</code></pre></div></div>
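
<p>Because our groups are so tiny, we can also sanity-check the shuffling approach by listing every possible way of splitting our 12 users into two groups of 6. This is only a cross-check under the same setup as above (7 converters among 12 users), and it’s only feasible because the sample is absurdly small. It isn’t something we’d do for a real experiment:</p>

```python
from collections import Counter
from itertools import combinations

# 1 = converted, 0 = didn't convert: 7 converters among 12 users, as above.
all_users = [1, 1, 1, 0, 0, 0, 1, 1, 1, 1, 0, 0]
GROUP_SIZE = 6

counts = Counter()
for variant_idx in combinations(range(len(all_users)), GROUP_SIZE):
    variant_converters = sum(all_users[i] for i in variant_idx)
    control_converters = sum(all_users) - variant_converters
    # Key by the difference in converting users (an integer) to avoid float keys.
    counts[variant_converters - control_converters] += 1

total = sum(counts.values())
for diff_in_users in sorted(counts):
    diff_in_rates = diff_in_users / GROUP_SIZE
    share = counts[diff_in_users] / total
    print(f"diff {diff_in_rates:+.1%}: {counts[diff_in_users]} of {total} splits ({share:.1%})")
```

<p>There are 924 possible splits, and only six distinct differences in rates can ever occur (which is why the simulated values above keep repeating). With enough shuffles, the simulated proportions should land close to these exact ones.</p>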

<p>Let’s plot our <strong>distribution of pure randomness</strong>. This is our distribution of random noise!</p>

<p><strong>Note: we’re dealing with extremely small sample sizes so the distribution ain’t pretty. We’ll be running a large-scale fake example next.</strong></p>

<div class="language-python highlighter-rouge"><div class="highlight"><pre class="highlight"><code><span class="k">def</span> <span class="nf">plot_hist</span><span class="p">(</span><span class="n">experiment_results</span><span class="p">:</span> <span class="n">np</span><span class="p">.</span><span class="n">ndarray</span><span class="p">,</span>
              <span class="n">bins</span><span class="o">=</span><span class="mi">100</span><span class="p">,</span>
              <span class="n">observed_rate</span><span class="p">:</span> <span class="nb">float</span> <span class="o">=</span> <span class="bp">None</span><span class="p">,</span>
              <span class="n">title</span><span class="p">:</span> <span class="nb">str</span> <span class="o">=</span> <span class="bp">None</span><span class="p">)</span> <span class="o">-&gt;</span> <span class="bp">None</span><span class="p">:</span>
    <span class="n">sns</span><span class="p">.</span><span class="nf">histplot</span><span class="p">(</span><span class="n">experiment_results</span><span class="p">,</span> <span class="n">bins</span><span class="o">=</span><span class="n">bins</span><span class="p">)</span>
    <span class="k">if</span> <span class="n">observed_rate</span> <span class="ow">is</span> <span class="ow">not</span> <span class="bp">None</span><span class="p">:</span>
        <span class="n">plt</span><span class="p">.</span><span class="nf">axvline</span><span class="p">(</span><span class="n">observed_rate</span><span class="p">,</span> <span class="n">color</span><span class="o">=</span><span class="sh">'</span><span class="s">r</span><span class="sh">'</span><span class="p">,</span> <span class="n">label</span><span class="o">=</span><span class="sh">'</span><span class="s">Diff in rates observed in experiment</span><span class="sh">'</span><span class="p">)</span>
        <span class="n">plt</span><span class="p">.</span><span class="nf">legend</span><span class="p">(</span><span class="n">bbox_to_anchor</span><span class="o">=</span><span class="p">(</span><span class="mf">0.5</span><span class="p">,</span> <span class="o">-</span><span class="mf">0.2</span><span class="p">),</span> <span class="n">loc</span><span class="o">=</span><span class="sh">"</span><span class="s">lower center</span><span class="sh">"</span><span class="p">)</span>
    <span class="k">if</span> <span class="n">title</span><span class="p">:</span>
        <span class="n">plt</span><span class="p">.</span><span class="nf">title</span><span class="p">(</span><span class="n">title</span><span class="p">)</span>
    <span class="n">plt</span><span class="p">.</span><span class="nf">gca</span><span class="p">().</span><span class="n">xaxis</span><span class="p">.</span><span class="nf">set_major_formatter</span><span class="p">(</span><span class="n">ticker</span><span class="p">.</span><span class="nc">PercentFormatter</span><span class="p">(</span><span class="n">xmax</span><span class="o">=</span><span class="mf">1.0</span><span class="p">))</span>
    <span class="n">sns</span><span class="p">.</span><span class="nf">despine</span><span class="p">(</span><span class="n">left</span><span class="o">=</span><span class="bp">True</span><span class="p">,</span> <span class="n">bottom</span><span class="o">=</span><span class="bp">True</span><span class="p">)</span>
    <span class="n">plt</span><span class="p">.</span><span class="nf">tight_layout</span><span class="p">()</span>


<span class="nf">plot_hist</span><span class="p">(</span><span class="n">simulated_diffs_in_rates</span><span class="p">,</span>
          <span class="n">bins</span><span class="o">=</span><span class="mi">15</span><span class="p">,</span>
          <span class="n">title</span><span class="o">=</span><span class="sh">"</span><span class="s">Our range of pure randomness</span><span class="sh">"</span><span class="p">)</span>
</code></pre></div></div>

<div>
    <img src="/assets/post_images/2023-05-15-statistical-significance-intuition/output_107_0.png" width="55%" class="align-center" />
</div>

<h2 id="how-special-are-our-experiment-results-on-statistical-significance">How “special” are our experiment results? On statistical significance</h2>

<p>Let’s now infer which world we’re in. Let’s recap our reasoning process:</p>

<ul>
  <li>If our observed experiment result of <code class="language-plaintext highlighter-rouge">~+16.7%</code> <strong>“isn’t special enough”</strong>, we’re probably in World 1. This is the world where we assume that our experiment result is probably due to random chance. We conclude that, given our experiment, <strong>there’s insufficient evidence</strong> to say that our variant algorithm performs better than our control algorithm.</li>
  <li>If our experiment result <strong>“is special enough”</strong>, then we’re probably in World 2. This is the world where random chance isn’t a good description for our observed experiment result of <code class="language-plaintext highlighter-rouge">~+16.7%</code>. We conclude that, given our experiment, <strong>there’s strong evidence</strong> to suggest that our variant algorithm performs better than our control algorithm.</li>
</ul>

<p>Let’s restate our phrases “isn’t special” and “special” in the context of our distribution of random noise:</p>

<ul>
  <li>If it’s <strong>“common”</strong> to see an experiment result greater than or equal to <code class="language-plaintext highlighter-rouge">~+16.7%</code> in our distribution of random noise, then our result “isn’t special”. This means that random noise explains our experiment results well.</li>
  <li>If it’s <strong>“rare”</strong> to see an experiment result greater than or equal to <code class="language-plaintext highlighter-rouge">~+16.7%</code> in our distribution of random noise, then our result is “special”. This means that random noise doesn’t explain our experiment results well.</li>
</ul>

<p>We can now address the <strong>“enough”</strong> part of the phrases “isn’t special enough” and “is special enough”. The most common threshold of “specialness” used is <code class="language-plaintext highlighter-rouge">5%</code>. What does this mean in our context?</p>

<ul>
  <li>If more than <code class="language-plaintext highlighter-rouge">5%</code> of our randomly generated differences in conversion rates are greater than or equal to our observed experiment result of <code class="language-plaintext highlighter-rouge">~+16.7%</code>, then we say that our experiment result “isn’t special enough”.
    <ul>
      <li>We conclude that random noise is a plausible explanation for the difference in performance we’ve seen, so we don’t have enough evidence that there’s a real difference.</li>
      <li>We conclude that our result <strong>“is not statistically significant”</strong>.</li>
    </ul>
  </li>
  <li>If <code class="language-plaintext highlighter-rouge">5%</code> or less of our randomly generated differences in conversion rates are greater than or equal to our observed experiment result of <code class="language-plaintext highlighter-rouge">~+16.7%</code>, then we say that our experiment result “is special enough”.
    <ul>
      <li>We conclude that our experiment result is unlikely to be random noise and that there’s enough evidence that the variant algorithm performs better than our control algorithm.</li>
      <li>We conclude that our result <strong>is statistically significant</strong>.</li>
    </ul>
  </li>
</ul>
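
<p>The decision rule above can be sketched in a few lines. This is only a minimal illustration: the function names and the toy noise values are made up for this example, and the <code class="language-plaintext highlighter-rouge">5%</code> threshold is passed in as a default argument.</p>

```python
import numpy as np


def share_at_least_as_large(simulated_diffs, observed_diff):
    """Share of simulated differences that are >= the observed difference."""
    simulated_diffs = np.asarray(simulated_diffs)
    return float((simulated_diffs >= observed_diff).mean())


def is_special_enough(simulated_diffs, observed_diff, threshold=0.05):
    """True when `threshold` or less of the random noise is >= the observed result."""
    return share_at_least_as_large(simulated_diffs, observed_diff) <= threshold


# made-up "random noise": 100 evenly spaced differences from -0.50 to +0.49
toy_noise = np.arange(-50, 50) / 100

# 10 of the 100 toy values sit to the right of 0.395, which is more than 5%
print(is_special_enough(toy_noise, 0.395))
# only 3 of the 100 toy values sit to the right of 0.465, which is 5% or less
print(is_special_enough(toy_noise, 0.465))
```

<p>The first call is the “isn’t special enough” case and the second is the “is special enough” case.</p>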

<p>Let’s see the above paragraphs in action. Where do our experiment results lie in our distribution of random noise?</p>

<div class="language-python highlighter-rouge"><div class="highlight"><pre class="highlight"><code><span class="nf">plot_hist</span><span class="p">(</span><span class="n">simulated_diffs_in_rates</span><span class="p">,</span>
          <span class="n">bins</span><span class="o">=</span><span class="mi">15</span><span class="p">,</span>
          <span class="n">observed_rate</span><span class="o">=</span><span class="mf">0.167</span><span class="p">,</span>
          <span class="n">title</span><span class="o">=</span><span class="sh">"</span><span class="s">Our range of pure randomness</span><span class="sh">"</span><span class="p">)</span>
</code></pre></div></div>

<div>
    <img src="/assets/post_images/2023-05-15-statistical-significance-intuition/output_113_0.png" width="55%" class="align-center" />
</div>

<p><strong>Looking to the right of our red line</strong>, we can see the observations that are greater than or equal to our observed result of <code class="language-plaintext highlighter-rouge">~+16.7%</code>. It already looks like more than <code class="language-plaintext highlighter-rouge">5%</code> of our randomly generated differences in conversion rates are <code class="language-plaintext highlighter-rouge">&gt;= ~+16.7%</code>. But let’s confirm!</p>

<div class="language-python highlighter-rouge"><div class="highlight"><pre class="highlight"><code><span class="n">OBSERVED_DIFF_IN_RATES</span> <span class="o">=</span> <span class="mf">0.167</span> <span class="c1"># this is our experiment result
</span>
<span class="n">num_diffs_gte_observed</span> <span class="o">=</span> <span class="p">(</span><span class="n">simulated_diffs_in_rates</span> <span class="o">&gt;=</span> <span class="n">OBSERVED_DIFF_IN_RATES</span><span class="p">).</span><span class="nf">sum</span><span class="p">()</span>
<span class="n">num_samples</span> <span class="o">=</span> <span class="n">simulated_diffs_in_rates</span><span class="p">.</span><span class="n">shape</span><span class="p">[</span><span class="mi">0</span><span class="p">]</span>

<span class="nf">print</span><span class="p">(</span><span class="sa">f</span><span class="sh">"</span><span class="si">{</span><span class="n">num_diffs_gte_observed</span><span class="si">:</span><span class="p">,</span><span class="si">}</span><span class="s"> out of </span><span class="si">{</span><span class="n">num_samples</span><span class="si">:</span><span class="p">,</span><span class="si">}</span><span class="s"> random samples show differences in rates greater than or equal to </span><span class="si">{</span><span class="n">OBSERVED_DIFF_IN_RATES</span><span class="si">:</span><span class="p">.</span><span class="mi">1</span><span class="o">%</span><span class="si">}</span><span class="sh">"</span><span class="p">)</span>
<span class="nf">print</span><span class="p">(</span><span class="sa">f</span><span class="sh">"</span><span class="s">percentage of random noise distribution with difference in rates greater than or equal to </span><span class="si">{</span><span class="n">OBSERVED_DIFF_IN_RATES</span><span class="si">:</span><span class="p">.</span><span class="mi">1</span><span class="o">%</span><span class="si">}</span><span class="s">: </span><span class="si">{</span><span class="n">num_diffs_gte_observed</span> <span class="o">/</span> <span class="n">num_samples</span><span class="si">:</span><span class="p">.</span><span class="mi">2</span><span class="o">%</span><span class="si">}</span><span class="sh">"</span><span class="p">)</span>
</code></pre></div></div>

<div class="language-plaintext highlighter-rouge"><div class="highlight"><pre class="highlight"><code>1,267 out of 10,000 random samples show differences in rates greater than or equal to 16.7%
percentage of random noise distribution with difference in rates greater than or equal to 16.7%: 12.67%
</code></pre></div></div>

<p>Now to conclude:</p>

<ul>
  <li><code class="language-plaintext highlighter-rouge">~12.67%</code> of our random samples are greater than or equal to our observed experiment result of <code class="language-plaintext highlighter-rouge">~+16.7%</code>.</li>
  <li>We hoped that <code class="language-plaintext highlighter-rouge">5%</code> or less of our random samples would be greater than or equal to our observed experiment result of <code class="language-plaintext highlighter-rouge">~+16.7%</code>.</li>
  <li>Our experiment result “isn’t special enough”. In other words, our result “is not statistically significant”.</li>
  <li>Given how we’ve set up our experiment, random noise is a plausible explanation for our result.</li>
  <li>We don’t have enough evidence to say that our variant algorithm performs better than our control algorithm.</li>
</ul>

<p>Nice!</p>

<h1 id="an-example-where-our-results-are-special-enough">An example where our results are “special enough”</h1>

<p>To wrap this up, let’s simulate an example where the difference in performance is “special enough” and we therefore conclude that our variant algorithm might perform better than our control algorithm.</p>

<p>Let’s create a larger fake experiment:</p>

<div class="language-python highlighter-rouge"><div class="highlight"><pre class="highlight"><code><span class="n">NUM_CONTROL_USERS</span> <span class="o">=</span> <span class="mi">1_000_000</span>
<span class="n">NUM_CONVERTING_CONTROL_USERS</span> <span class="o">=</span> <span class="mi">26_000</span>

<span class="n">NUM_VARIANT_USERS</span> <span class="o">=</span> <span class="mi">1_000_000</span>
<span class="n">NUM_CONVERTING_VARIANT_USERS</span> <span class="o">=</span> <span class="mi">26_400</span>

<span class="c1"># create our arrays of users
</span><span class="n">control_users</span> <span class="o">=</span> <span class="n">np</span><span class="p">.</span><span class="nf">zeros</span><span class="p">(</span><span class="n">NUM_CONTROL_USERS</span><span class="p">)</span>
<span class="n">control_users</span><span class="p">[:</span><span class="n">NUM_CONVERTING_CONTROL_USERS</span><span class="p">]</span> <span class="o">=</span> <span class="mf">1.0</span>
<span class="n">control_conversion_rate</span> <span class="o">=</span> <span class="n">control_users</span><span class="p">.</span><span class="nf">mean</span><span class="p">()</span>

<span class="n">variant_users</span> <span class="o">=</span> <span class="n">np</span><span class="p">.</span><span class="nf">zeros</span><span class="p">(</span><span class="n">NUM_VARIANT_USERS</span><span class="p">)</span>
<span class="n">variant_users</span><span class="p">[:</span><span class="n">NUM_CONVERTING_VARIANT_USERS</span><span class="p">]</span> <span class="o">=</span> <span class="mf">1.0</span>
<span class="n">variant_conversion_rate</span> <span class="o">=</span> <span class="n">variant_users</span><span class="p">.</span><span class="nf">mean</span><span class="p">()</span>

<span class="c1"># calculate our experiment result
</span><span class="n">observed_diff_in_rates</span> <span class="o">=</span> <span class="n">variant_conversion_rate</span> <span class="o">-</span> <span class="n">control_conversion_rate</span>

<span class="c1"># chuck our users into a bag
</span><span class="n">all_users</span> <span class="o">=</span> <span class="n">np</span><span class="p">.</span><span class="nf">hstack</span><span class="p">([</span><span class="n">control_users</span><span class="p">,</span> <span class="n">variant_users</span><span class="p">])</span>

<span class="nf">print</span><span class="p">(</span><span class="sa">f</span><span class="sh">"</span><span class="s">control conversion rate = </span><span class="si">{</span><span class="n">control_conversion_rate</span><span class="si">:</span><span class="p">.</span><span class="mi">2</span><span class="o">%</span><span class="si">}</span><span class="sh">"</span><span class="p">)</span>
<span class="nf">print</span><span class="p">(</span><span class="sa">f</span><span class="sh">"</span><span class="s">variant conversion rate = </span><span class="si">{</span><span class="n">variant_conversion_rate</span><span class="si">:</span><span class="p">.</span><span class="mi">2</span><span class="o">%</span><span class="si">}</span><span class="sh">"</span><span class="p">)</span>
<span class="nf">print</span><span class="p">(</span><span class="sa">f</span><span class="sh">"</span><span class="s">observed absolute difference in conversion rates = </span><span class="si">{</span><span class="n">observed_diff_in_rates</span><span class="si">:</span><span class="p">.</span><span class="mi">2</span><span class="o">%</span><span class="si">}</span><span class="sh">"</span><span class="p">)</span>
<span class="nf">print</span><span class="p">(</span><span class="sa">f</span><span class="sh">"</span><span class="s">observed relative difference in conversion rates = </span><span class="si">{</span><span class="n">observed_diff_in_rates</span> <span class="o">/</span> <span class="n">control_conversion_rate</span><span class="si">:</span><span class="p">.</span><span class="mi">2</span><span class="o">%</span><span class="si">}</span><span class="sh">"</span><span class="p">)</span>
</code></pre></div></div>

<div class="language-plaintext highlighter-rouge"><div class="highlight"><pre class="highlight"><code>control conversion rate = 2.60%
variant conversion rate = 2.64%
observed absolute difference in conversion rates = 0.04%
observed relative difference in conversion rates = 1.54%
</code></pre></div></div>

<p>We’ll use the <code class="language-plaintext highlighter-rouge">numba</code> library to do the resampling efficiently:</p>

<div class="language-python highlighter-rouge"><div class="highlight"><pre class="highlight"><code><span class="nd">@njit</span><span class="p">(</span><span class="n">parallel</span><span class="o">=</span><span class="bp">True</span><span class="p">)</span>
<span class="k">def</span> <span class="nf">sample_diffs_in_rates</span><span class="p">(</span><span class="n">all_users</span><span class="p">,</span> <span class="n">num_control_users</span><span class="p">,</span> <span class="n">num_simulations</span><span class="p">):</span>
    <span class="n">results</span> <span class="o">=</span> <span class="n">np</span><span class="p">.</span><span class="nf">zeros</span><span class="p">(</span><span class="n">num_simulations</span><span class="p">)</span>
    <span class="k">for</span> <span class="n">i</span> <span class="ow">in</span> <span class="nf">prange</span><span class="p">(</span><span class="n">num_simulations</span><span class="p">):</span>
        <span class="c1"># numpy random shuffling appears to be slower when using numba
</span>        <span class="c1"># shuffle a private copy so that parallel iterations never mutate the same array
</span>        <span class="n">shuffled_users</span> <span class="o">=</span> <span class="n">all_users</span><span class="p">.</span><span class="nf">copy</span><span class="p">()</span>
        <span class="n">random</span><span class="p">.</span><span class="nf">shuffle</span><span class="p">(</span><span class="n">shuffled_users</span><span class="p">)</span>
        <span class="n">control_rate</span> <span class="o">=</span> <span class="n">shuffled_users</span><span class="p">[:</span><span class="n">num_control_users</span><span class="p">].</span><span class="nf">mean</span><span class="p">()</span>
        <span class="c1"># we assume the rest of the users are variant users
</span>        <span class="n">variant_rate</span> <span class="o">=</span> <span class="n">shuffled_users</span><span class="p">[</span><span class="n">num_control_users</span><span class="p">:].</span><span class="nf">mean</span><span class="p">()</span>
        <span class="n">results</span><span class="p">[</span><span class="n">i</span><span class="p">]</span> <span class="o">=</span> <span class="n">variant_rate</span> <span class="o">-</span> <span class="n">control_rate</span>
    <span class="k">return</span> <span class="n">results</span>
</code></pre></div></div>

<p>Run the simulations!</p>

<div class="language-python highlighter-rouge"><div class="highlight"><pre class="highlight"><code><span class="n">NUM_SIMULATIONS</span> <span class="o">=</span> <span class="mi">10_000</span>

<span class="n">sampled_diffs</span> <span class="o">=</span> <span class="nf">sample_diffs_in_rates</span><span class="p">(</span><span class="n">all_users</span><span class="p">,</span> <span class="n">NUM_CONTROL_USERS</span><span class="p">,</span> <span class="n">NUM_SIMULATIONS</span><span class="p">)</span>
</code></pre></div></div>

<p>Let’s look at where our observed difference of <code class="language-plaintext highlighter-rouge">0.04%</code> lies in our range of randomness:</p>

<div class="language-python highlighter-rouge"><div class="highlight"><pre class="highlight"><code><span class="nf">plot_hist</span><span class="p">(</span><span class="n">sampled_diffs</span><span class="p">,</span>
          <span class="n">bins</span><span class="o">=</span><span class="mi">50</span><span class="p">,</span>
          <span class="n">observed_rate</span><span class="o">=</span><span class="n">observed_diff_in_rates</span><span class="p">,</span>
          <span class="n">title</span><span class="o">=</span><span class="sh">"</span><span class="s">Our range of pure randomness</span><span class="sh">"</span><span class="p">)</span>
</code></pre></div></div>
<div>
    <img src="/assets/post_images/2023-05-15-statistical-significance-intuition/output_125_0.png" width="55%" class="align-center" />
</div>

<p>Given our “specialness threshold” of <code class="language-plaintext highlighter-rouge">5%</code>, how well does randomness describe our experiment results?</p>

<div class="language-python highlighter-rouge"><div class="highlight"><pre class="highlight"><code><span class="n">sampling_specialness_result</span> <span class="o">=</span> <span class="p">(</span><span class="n">sampled_diffs</span> <span class="o">&gt;=</span> <span class="n">observed_diff_in_rates</span><span class="p">).</span><span class="nf">sum</span><span class="p">()</span> <span class="o">/</span> <span class="n">sampled_diffs</span><span class="p">.</span><span class="n">shape</span><span class="p">[</span><span class="mi">0</span><span class="p">]</span>
<span class="nf">print</span><span class="p">(</span><span class="sa">f</span><span class="sh">"</span><span class="s">sampled </span><span class="sh">'</span><span class="s">specialness</span><span class="sh">'</span><span class="s"> result: </span><span class="si">{</span><span class="n">sampling_specialness_result</span><span class="si">:</span><span class="p">.</span><span class="mi">2</span><span class="o">%</span><span class="si">}</span><span class="sh">"</span><span class="p">)</span>
</code></pre></div></div>

<div class="language-plaintext highlighter-rouge"><div class="highlight"><pre class="highlight"><code>sampled 'specialness' result: 3.41%
</code></pre></div></div>

<p>That’s less than <code class="language-plaintext highlighter-rouge">5%</code>! We have a special, statistically significant result. Our variant algorithm probably performs better than our control one.</p>

<p>Let’s compare this to the “actual” result as derived through traditional statistics:</p>

<div class="language-python highlighter-rouge"><div class="highlight"><pre class="highlight"><code><span class="n">actual_specialness_result</span> <span class="o">=</span> <span class="nf">score_test_proportions_2indep</span><span class="p">(</span><span class="n">NUM_CONVERTING_VARIANT_USERS</span><span class="p">,</span> <span class="n">NUM_VARIANT_USERS</span><span class="p">,</span> <span class="n">NUM_CONVERTING_CONTROL_USERS</span><span class="p">,</span> <span class="n">NUM_CONTROL_USERS</span><span class="p">,</span> <span class="n">alternative</span><span class="o">=</span><span class="sh">"</span><span class="s">larger</span><span class="sh">"</span><span class="p">)</span>
<span class="nf">print</span><span class="p">(</span><span class="sa">f</span><span class="sh">"</span><span class="s">actual </span><span class="sh">'</span><span class="s">specialness</span><span class="sh">'</span><span class="s"> result: </span><span class="si">{</span><span class="n">actual_specialness_result</span><span class="p">.</span><span class="n">pvalue</span><span class="si">:</span><span class="p">.</span><span class="mi">2</span><span class="o">%</span><span class="si">}</span><span class="sh">"</span><span class="p">)</span>
</code></pre></div></div>

<div class="language-plaintext highlighter-rouge"><div class="highlight"><pre class="highlight"><code>actual 'specialness' result: 3.83%
</code></pre></div></div>

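<p>This value is also below our <code class="language-plaintext highlighter-rouge">5%</code> threshold, so the traditional test agrees that our result is statistically significant.</p>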

<p>The “specialness” values aren’t equal because they’re derived in different ways. However, they’re close enough for the day-to-day of data scientists. What’s more important is that you <strong>specify your “specialness threshold” before running an experiment</strong> and stick to it when making decisions based on your experiment’s results.</p>
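
<p>If you’re curious what a “traditional” calculation looks like, here’s a rough sketch using only the standard library. It’s a pooled two-proportion z-test with a one-sided normal tail. I’m assuming this is a reasonable stand-in for the score test used above, not a reproduction of the library’s exact method, and the function name is made up for this example. With our experiment’s counts it should land close to the traditional result.</p>

```python
from math import erfc, sqrt


def pooled_z_test_one_sided(conversions_variant, users_variant,
                            conversions_control, users_control):
    """Approximate chance of seeing a difference this large if both groups share one rate."""
    rate_variant = conversions_variant / users_variant
    rate_control = conversions_control / users_control
    # under "pure randomness" both groups share a single conversion rate
    pooled_rate = (conversions_variant + conversions_control) / (users_variant + users_control)
    standard_error = sqrt(
        pooled_rate * (1 - pooled_rate) * (1 / users_variant + 1 / users_control)
    )
    z = (rate_variant - rate_control) / standard_error
    # upper tail of the standard normal distribution
    return 0.5 * erfc(z / sqrt(2))


approx_specialness = pooled_z_test_one_sided(26_400, 1_000_000, 26_000, 1_000_000)
print(f"approximate 'specialness' result: {approx_specialness:.2%}")
```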

<h1 id="a-note-for-if-youre-doing-this-at-work">A note if you’re doing this at work</h1>

<p>You shouldn’t implement this stuff from scratch. Use a robust implementation like the one in <a href="https://docs.scipy.org/doc/scipy/reference/generated/scipy.stats.permutation_test.html" target="_blank" rel="noopener">scipy</a>.</p>
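
<p>Here’s a minimal sketch of what that could look like with <code class="language-plaintext highlighter-rouge">scipy.stats.permutation_test</code>. The experiment numbers are made up for this example (200 users per group, 20 control conversions and 60 variant conversions), and <code class="language-plaintext highlighter-rouge">diff_in_rates</code> is a helper I’ve written for illustration.</p>

```python
import numpy as np
from scipy.stats import permutation_test


def diff_in_rates(variant, control, axis):
    """Variant conversion rate minus control conversion rate."""
    return np.mean(variant, axis=axis) - np.mean(control, axis=axis)


# made-up experiment: 1.0 marks a converting user, 0.0 a non-converting user
control_users = np.zeros(200)
control_users[:20] = 1.0
variant_users = np.zeros(200)
variant_users[:60] = 1.0

result = permutation_test(
    (variant_users, control_users),
    diff_in_rates,
    permutation_type="independent",  # shuffle users between the two groups
    vectorized=True,                 # our statistic accepts an `axis` argument
    n_resamples=9_999,
    alternative="greater",           # only count random diffs >= the observed diff
)

print(f"observed difference in rates: {result.statistic:.1%}")
print(f"'specialness' result: {result.pvalue:.2%}")
```

<p>The <code class="language-plaintext highlighter-rouge">alternative="greater"</code> argument matches the “greater than or equal to” comparison we’ve been using throughout this post.</p>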

<p>Phew! We’re done!</p>

<p>Justin</p>]]></content><author><name> Hello, world!&lt;br&gt; My name is Justin.</name></author><category term="[&quot;Experiments&quot;, &quot;Python&quot;]" /><summary type="html"><![CDATA[An attempt at explaining a difficult concept in a characteristically dense but intuitive way]]></summary></entry><entry><title type="html">Faster web scraping with Python and asyncio</title><link href="https://embracingtherandom.com/software/python/faster-web-scraping-with-asyncio/" rel="alternate" type="text/html" title="Faster web scraping with Python and asyncio" /><published>2023-02-11T12:39:00+11:00</published><updated>2023-02-11T12:39:00+11:00</updated><id>https://embracingtherandom.com/software/python/faster-web-scraping-with-asyncio</id><content type="html" xml:base="https://embracingtherandom.com/software/python/faster-web-scraping-with-asyncio/"><![CDATA[<p>Do you use Jupyter? Are you still sending sequential web requests like a noob? Then this article is for you!</p>

<p>This article is for intermediate Python programmers, so I’m going to skip over a lot of the detail.</p>

<p>I couldn’t have done this without these two books:</p>

<blockquote>
  <p><em>Python Concurrency with asyncio</em> by Matthew Fowler</p>
</blockquote>

<blockquote>
  <p><em>Using Asyncio in Python: Understanding Python’s Asynchronous Programming Features</em> by Caleb Hattingh</p>
</blockquote>

<p>These books are great. Buy them!</p>

<h2 id="packages-used-in-this-post">Packages used in this post</h2>

<div class="language-python highlighter-rouge"><div class="highlight"><pre class="highlight"><code><span class="c1"># standard library packages
</span><span class="kn">import</span> <span class="n">asyncio</span> 
<span class="kn">import</span> <span class="n">gzip</span> 
<span class="kn">import</span> <span class="n">functools</span> 
<span class="kn">from</span> <span class="n">functools</span> <span class="kn">import</span> <span class="n">partial</span> 
<span class="kn">from</span> <span class="n">time</span> <span class="kn">import</span> <span class="n">perf_counter</span>
<span class="kn">import</span> <span class="n">random</span>

<span class="c1"># other packages
</span><span class="kn">import</span> <span class="n">requests</span>
<span class="kn">import</span> <span class="n">aiohttp</span>
</code></pre></div></div>

<h2 id="getting-some-data-to-work-with">Getting some data to work with</h2>

<p>We’ll be using a sample file from the Wikimedia dumps:</p>

<div class="language-python highlighter-rouge"><div class="highlight"><pre class="highlight"><code><span class="n">WIKIMEDIA_DUMP_URL</span> <span class="o">=</span> <span class="sh">"</span><span class="s">https://dumps.wikimedia.org/enwiki/latest/enwiki-latest-all-titles.gz</span><span class="sh">"</span>

<span class="n">response</span> <span class="o">=</span> <span class="n">requests</span><span class="p">.</span><span class="nf">get</span><span class="p">(</span><span class="n">WIKIMEDIA_DUMP_URL</span><span class="p">)</span>
</code></pre></div></div>

<p>The file is a <code class="language-plaintext highlighter-rouge">gzip</code> file, so we need to decompress it. <code class="language-plaintext highlighter-rouge">gzip.decompress</code> returns a byte string, which we decode as UTF-8 to get a regular string:</p>

<div class="language-python highlighter-rouge"><div class="highlight"><pre class="highlight"><code><span class="n">titles</span> <span class="o">=</span> <span class="n">gzip</span><span class="p">.</span><span class="nf">decompress</span><span class="p">(</span><span class="n">response</span><span class="p">.</span><span class="n">content</span><span class="p">).</span><span class="nf">decode</span><span class="p">(</span><span class="sh">"</span><span class="s">utf-8</span><span class="sh">"</span><span class="p">)</span>
</code></pre></div></div>

<p>Let’s take a look at our data. There are some interesting article titles here. I’m leaving them in <strong>for science</strong>! No need to hide them.</p>

<div class="language-python highlighter-rouge"><div class="highlight"><pre class="highlight"><code><span class="n">titles</span><span class="p">[:</span><span class="mi">100</span><span class="p">]</span>
</code></pre></div></div>

<div class="language-plaintext highlighter-rouge"><div class="highlight"><pre class="highlight"><code>'page_namespace\tpage_title\n0\t!\n0\t!!\n0\t!!!\n0\t!!!!!!!\n0\t!!!Fuck_You!!!\n0\t!!!Fuck_You!!!_And_Then_Some\n0'
</code></pre></div></div>

<p>We make some observations:</p>
<ul>
  <li>We’ve got one huge string</li>
  <li>There’s a header row, indicating there are two columns: <code class="language-plaintext highlighter-rouge">page_namespace</code> and <code class="language-plaintext highlighter-rouge">page_title</code></li>
  <li>The file is tab-delimited, indicated by the <code class="language-plaintext highlighter-rouge">\t</code> characters</li>
  <li>There are new line characters, indicated by the <code class="language-plaintext highlighter-rouge">\n</code> characters</li>
</ul>

<p>I don’t want to process this whole string for this demonstration. Let’s take the first 100k article titles:</p>

<div class="language-python highlighter-rouge"><div class="highlight"><pre class="highlight"><code><span class="n">num_new_lines</span> <span class="o">=</span> <span class="mi">0</span>
<span class="k">for</span> <span class="n">i</span> <span class="ow">in</span> <span class="nf">range</span><span class="p">(</span><span class="nf">len</span><span class="p">(</span><span class="n">titles</span><span class="p">)):</span>
    <span class="k">if</span> <span class="n">titles</span><span class="p">[</span><span class="n">i</span><span class="p">]</span> <span class="o">==</span> <span class="sh">"</span><span class="se">\n</span><span class="sh">"</span><span class="p">:</span>
        <span class="n">num_new_lines</span> <span class="o">+=</span> <span class="mi">1</span>
    <span class="c1"># we add 1 to the number of lines to account for the header row
</span>    <span class="k">if</span> <span class="n">num_new_lines</span> <span class="o">&gt;=</span> <span class="mi">100_000</span> <span class="o">+</span> <span class="mi">1</span><span class="p">:</span>
        <span class="n">target_index</span> <span class="o">=</span> <span class="n">i</span>
        <span class="k">break</span>    
</code></pre></div></div>

<p>This is the position of the new line character immediately after the 100,000th article title:</p>

<div class="language-python highlighter-rouge"><div class="highlight"><pre class="highlight"><code><span class="nf">print</span><span class="p">(</span><span class="n">target_index</span><span class="p">)</span>
</code></pre></div></div>

<div class="language-plaintext highlighter-rouge"><div class="highlight"><pre class="highlight"><code>2302543
</code></pre></div></div>

<div class="language-python highlighter-rouge"><div class="highlight"><pre class="highlight"><code><span class="n">titles_substr</span> <span class="o">=</span> <span class="n">titles</span><span class="p">[:</span><span class="n">target_index</span><span class="p">]</span>
</code></pre></div></div>

<p>Next, we extract the article titles. We’ll do these things:</p>
<ul>
  <li>Split the string on new line characters</li>
  <li>Split each line by the tab delimiter and take the second element, which is the article title</li>
  <li>Remove the first element to discard the header row</li>
</ul>

<div class="language-python highlighter-rouge"><div class="highlight"><pre class="highlight"><code><span class="n">titles_sample</span> <span class="o">=</span> <span class="p">[</span><span class="n">line</span><span class="p">.</span><span class="nf">split</span><span class="p">(</span><span class="sh">'</span><span class="se">\t</span><span class="sh">'</span><span class="p">)[</span><span class="mi">1</span><span class="p">]</span> <span class="k">for</span> <span class="n">line</span> <span class="ow">in</span> <span class="n">titles_substr</span><span class="p">.</span><span class="nf">split</span><span class="p">(</span><span class="sh">'</span><span class="se">\n</span><span class="sh">'</span><span class="p">)]</span>
<span class="n">titles_sample</span> <span class="o">=</span> <span class="n">titles_sample</span><span class="p">[</span><span class="mi">1</span><span class="p">:]</span>
</code></pre></div></div>
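
<p>To see these steps on something small, here’s a sketch that runs the same idea on a tiny made-up string. The <code class="language-plaintext highlighter-rouge">first_n_titles</code> helper and the toy data are assumptions for illustration only. It uses the <code class="language-plaintext highlighter-rouge">maxsplit</code> argument of <code class="language-plaintext highlighter-rouge">str.split</code> so that we never split more of the string than we need.</p>

```python
def first_n_titles(dump_text, n):
    """Return the first n article titles from a tab-delimited dump with a header row."""
    # split off at most n + 1 lines (header + n titles); the remainder stays unsplit
    lines = dump_text.split("\n", n + 1)[1 : n + 1]
    # keep the second column; skip lines without a tab (e.g. a trailing empty line)
    return [line.split("\t")[1] for line in lines if "\t" in line]


toy_dump = "page_namespace\tpage_title\n0\t!\n0\t!!\n0\t!!!\n0\tApple\n"

print(first_n_titles(toy_dump, 3))
```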

<p>The result is a list of 100k article titles:</p>

<div class="language-python highlighter-rouge"><div class="highlight"><pre class="highlight"><code><span class="n">titles_sample</span><span class="p">[:</span><span class="mi">10</span><span class="p">]</span>
</code></pre></div></div>

<div class="language-plaintext highlighter-rouge"><div class="highlight"><pre class="highlight"><code>['!',
 '!!',
 '!!!',
 '!!!!!!!',
 '!!!Fuck_You!!!',
 '!!!Fuck_You!!!_And_Then_Some',
 '!!!Fuck_You!!!_and_Then_Some',
 '!!!_(!!!_album)',
 '!!!_(American_band)',
 '!!!_(Chk_Chk_Chk)']
</code></pre></div></div>

<div class="language-python highlighter-rouge"><div class="highlight"><pre class="highlight"><code><span class="nf">print</span><span class="p">(</span><span class="sa">f</span><span class="sh">"</span><span class="s">number of article titles: </span><span class="si">{</span><span class="nf">len</span><span class="p">(</span><span class="n">titles_sample</span><span class="p">)</span><span class="si">:</span><span class="p">,</span><span class="si">}</span><span class="sh">"</span><span class="p">)</span>
</code></pre></div></div>

<div class="language-plaintext highlighter-rouge"><div class="highlight"><pre class="highlight"><code>number of article titles: 100,000
</code></pre></div></div>

<p>Noice!</p>

<h2 id="making-the-requests">Making the requests</h2>

<p>We’ll be making requests to URLs that look like this:</p>

<div class="language-python highlighter-rouge"><div class="highlight"><pre class="highlight"><code><span class="n">WIKIPEDIA_BASE_URL</span> <span class="o">=</span> <span class="sh">"</span><span class="s">https://en.wikipedia.org/wiki/{TITLE}</span><span class="sh">"</span>
</code></pre></div></div>

<p>Here’s the first URL:</p>

<div class="language-python highlighter-rouge"><div class="highlight"><pre class="highlight"><code><span class="n">title</span> <span class="o">=</span> <span class="n">titles_sample</span><span class="p">[</span><span class="mi">0</span><span class="p">]</span>
<span class="n">url</span> <span class="o">=</span> <span class="n">WIKIPEDIA_BASE_URL</span><span class="p">.</span><span class="nf">format</span><span class="p">(</span><span class="n">TITLE</span><span class="o">=</span><span class="n">title</span><span class="p">)</span>
<span class="nf">print</span><span class="p">(</span><span class="n">url</span><span class="p">)</span>
</code></pre></div></div>

<div class="language-plaintext highlighter-rouge"><div class="highlight"><pre class="highlight"><code>https://en.wikipedia.org/wiki/!
</code></pre></div></div>

<p>Let’s make a single request:</p>

<div class="language-python highlighter-rouge"><div class="highlight"><pre class="highlight"><code><span class="n">response</span> <span class="o">=</span> <span class="n">requests</span><span class="p">.</span><span class="nf">get</span><span class="p">(</span><span class="n">url</span><span class="p">)</span>
</code></pre></div></div>

<p>The response body (<code class="language-plaintext highlighter-rouge">response.content</code>) is a byte string, so we’ll decode it as UTF-8:</p>

<div class="language-python highlighter-rouge"><div class="highlight"><pre class="highlight"><code><span class="n">response</span><span class="p">.</span><span class="n">content</span><span class="p">.</span><span class="nf">decode</span><span class="p">(</span><span class="sh">'</span><span class="s">utf-8</span><span class="sh">'</span><span class="p">)[:</span><span class="mi">1000</span><span class="p">]</span>
</code></pre></div></div>

<div class="language-plaintext highlighter-rouge"><div class="highlight"><pre class="highlight"><code>'&lt;!DOCTYPE html&gt;\n&lt;html class="client-nojs vector-feature-language-in-header-enabled vector-feature-language-in-main-page-header-disabled vector-feature-language-alert-in-sidebar-enabled vector-feature-sticky-header-disabled vector-feature-page-tools-disabled vector-feature-page-tools-pinned-disabled vector-feature-main-menu-pinned-disabled vector-feature-limited-width-enabled vector-feature-limited-width-content-enabled" lang="en" dir="ltr"&gt;\n&lt;head&gt;\n&lt;meta charset="UTF-8"/&gt;\n&lt;title&gt;Exclamation mark - Wikipedia&lt;/title&gt;\n&lt;script&gt;document.documentElement.className="client-js vector-feature-language-in-header-enabled vector-feature-language-in-main-page-header-disabled vector-feature-language-alert-in-sidebar-enabled vector-feature-sticky-header-disabled vector-feature-page-tools-disabled vector-feature-page-tools-pinned-disabled vector-feature-main-menu-pinned-disabled vector-feature-limited-width-enabled vector-feature-limited-width-content-enabled";(function(){var cookie=document.cookie.matc'
</code></pre></div></div>

<p>Note the <code class="language-plaintext highlighter-rouge">&lt;title&gt;Exclamation mark - Wikipedia&lt;/title&gt;</code>. We have an article about the exclamation mark!</p>

<p>Let’s make a list of URLs we’ll be using for the rest of this post:</p>

<div class="language-python highlighter-rouge"><div class="highlight"><pre class="highlight"><code><span class="n">urls</span> <span class="o">=</span> <span class="p">[</span><span class="n">WIKIPEDIA_BASE_URL</span><span class="p">.</span><span class="nf">format</span><span class="p">(</span><span class="n">TITLE</span><span class="o">=</span><span class="n">title</span><span class="p">)</span> <span class="k">for</span> <span class="n">title</span> <span class="ow">in</span> <span class="n">titles_sample</span><span class="p">]</span>

<span class="n">urls</span><span class="p">[:</span><span class="mi">10</span><span class="p">]</span>
</code></pre></div></div>

<div class="language-plaintext highlighter-rouge"><div class="highlight"><pre class="highlight"><code>['https://en.wikipedia.org/wiki/!',
 'https://en.wikipedia.org/wiki/!!',
 'https://en.wikipedia.org/wiki/!!!',
 'https://en.wikipedia.org/wiki/!!!!!!!',
 'https://en.wikipedia.org/wiki/!!!Fuck_You!!!',
 'https://en.wikipedia.org/wiki/!!!Fuck_You!!!_And_Then_Some',
 'https://en.wikipedia.org/wiki/!!!Fuck_You!!!_and_Then_Some',
 'https://en.wikipedia.org/wiki/!!!_(!!!_album)',
 'https://en.wikipedia.org/wiki/!!!_(American_band)',
 'https://en.wikipedia.org/wiki/!!!_(Chk_Chk_Chk)']
</code></pre></div></div>

<h3 id="the-sequential-way">The sequential way</h3>

<p>Let’s first make 10 requests sequentially using the <code class="language-plaintext highlighter-rouge">requests</code> package. This is painfully slow:</p>

<div class="language-python highlighter-rouge"><div class="highlight"><pre class="highlight"><code><span class="k">def</span> <span class="nf">sequential_requests</span><span class="p">(</span><span class="n">urls</span><span class="p">):</span>
    <span class="k">for</span> <span class="n">url</span> <span class="ow">in</span> <span class="n">urls</span><span class="p">:</span>
        <span class="n">response</span> <span class="o">=</span> <span class="n">requests</span><span class="p">.</span><span class="nf">get</span><span class="p">(</span><span class="n">url</span><span class="p">)</span>
</code></pre></div></div>

<p>We’ll time how long it takes on average by running it 5 times:</p>

<div class="language-python highlighter-rouge"><div class="highlight"><pre class="highlight"><code><span class="o">%%</span><span class="n">timeit</span> <span class="o">-</span><span class="n">n</span> <span class="mi">1</span> <span class="o">-</span><span class="n">r</span> <span class="mi">5</span>
<span class="nf">sequential_requests</span><span class="p">(</span><span class="n">urls</span><span class="p">[:</span><span class="mi">10</span><span class="p">])</span>
</code></pre></div></div>

<div class="language-plaintext highlighter-rouge"><div class="highlight"><pre class="highlight"><code>4.43 s ± 76.7 ms per loop (mean ± std. dev. of 5 runs, 1 loop each)
</code></pre></div></div>

<p>Boo-urns!</p>

<h3 id="the-async-way">The async way</h3>

<p>We’ll be using <code class="language-plaintext highlighter-rouge">asyncio</code> and <code class="language-plaintext highlighter-rouge">aiohttp</code> to make async requests.</p>

<h4 id="creating-a-timing-decorator">Creating a timing decorator</h4>

<p>I can’t use cell magic here, so I’ll create an async timer decorator to use in the rest of this post:</p>

<div class="language-python highlighter-rouge"><div class="highlight"><pre class="highlight"><code><span class="k">def</span> <span class="nf">async_timer</span><span class="p">(</span><span class="n">num_iter</span><span class="p">):</span>
    <span class="k">def</span> <span class="nf">timer_decorator</span><span class="p">(</span><span class="n">func</span><span class="p">):</span>
        <span class="nd">@functools.wraps</span><span class="p">(</span><span class="n">func</span><span class="p">)</span>
        <span class="k">async</span> <span class="k">def</span> <span class="nf">wrapper</span><span class="p">(</span><span class="o">*</span><span class="n">args</span><span class="p">,</span> <span class="o">**</span><span class="n">kwargs</span><span class="p">):</span>
            <span class="n">results</span> <span class="o">=</span> <span class="p">[]</span>
            <span class="k">for</span> <span class="n">i</span> <span class="ow">in</span> <span class="nf">range</span><span class="p">(</span><span class="n">num_iter</span><span class="p">):</span>
                <span class="nf">print</span><span class="p">(</span><span class="sa">f</span><span class="sh">"</span><span class="s">running iteration </span><span class="si">{</span><span class="n">i</span> <span class="o">+</span> <span class="mi">1</span><span class="si">}</span><span class="sh">"</span><span class="p">)</span>
                <span class="n">start_time</span> <span class="o">=</span> <span class="nf">perf_counter</span><span class="p">()</span>
                <span class="k">await</span> <span class="nf">func</span><span class="p">(</span><span class="o">*</span><span class="n">args</span><span class="p">,</span> <span class="o">**</span><span class="n">kwargs</span><span class="p">)</span>
                <span class="n">end_time</span> <span class="o">=</span> <span class="nf">perf_counter</span><span class="p">()</span>
                <span class="n">time_taken</span> <span class="o">=</span> <span class="n">end_time</span> <span class="o">-</span> <span class="n">start_time</span>
                <span class="nf">print</span><span class="p">(</span><span class="sa">f</span><span class="sh">"</span><span class="s">iteration </span><span class="si">{</span><span class="n">i</span> <span class="o">+</span> <span class="mi">1</span><span class="si">}</span><span class="s"> took </span><span class="si">{</span><span class="n">time_taken</span><span class="si">:</span><span class="p">.</span><span class="mi">2</span><span class="n">f</span><span class="si">}</span><span class="s"> seconds</span><span class="sh">"</span><span class="p">)</span>
                <span class="n">results</span><span class="p">.</span><span class="nf">append</span><span class="p">(</span><span class="n">time_taken</span><span class="p">)</span>
            <span class="n">mean_time</span> <span class="o">=</span> <span class="nf">sum</span><span class="p">(</span><span class="n">results</span><span class="p">)</span> <span class="o">/</span> <span class="nf">len</span><span class="p">(</span><span class="n">results</span><span class="p">)</span>
            <span class="nf">print</span><span class="p">(</span><span class="sa">f</span><span class="sh">"</span><span class="s">Average time over </span><span class="si">{</span><span class="n">num_iter</span><span class="si">}</span><span class="s"> iterations: </span><span class="si">{</span><span class="n">mean_time</span><span class="si">:</span><span class="p">.</span><span class="mi">2</span><span class="n">f</span><span class="si">}</span><span class="s"> seconds</span><span class="sh">"</span><span class="p">)</span>
        <span class="k">return</span> <span class="n">wrapper</span>
    <span class="k">return</span> <span class="n">timer_decorator</span>
</code></pre></div></div>
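<p>The wrapper above prints timings but has no <code class="language-plaintext highlighter-rouge">return</code> statement, so awaiting a decorated coroutine gives back <code class="language-plaintext highlighter-rouge">None</code>. If you also want the result, one possible variant is sketched below. The names <code class="language-plaintext highlighter-rouge">async_timer_returning</code> and <code class="language-plaintext highlighter-rouge">fake_io</code> are made up for this example, and <code class="language-plaintext highlighter-rouge">asyncio.sleep</code> stands in for a real request.</p>

```python
import asyncio
import functools
from time import perf_counter


def async_timer_returning(num_iter):
    """Hypothetical variant of async_timer that also returns the last result."""
    def timer_decorator(func):
        @functools.wraps(func)
        async def wrapper(*args, **kwargs):
            timings = []
            result = None
            for _ in range(num_iter):
                start_time = perf_counter()
                result = await func(*args, **kwargs)
                timings.append(perf_counter() - start_time)
            mean_time = sum(timings) / len(timings)
            print(f"Average time over {num_iter} iterations: {mean_time:.2f} seconds")
            # Hand back the result of the final iteration
            return result
        return wrapper
    return timer_decorator


@async_timer_returning(num_iter=3)
async def fake_io(delay):
    # asyncio.sleep stands in for a network call
    await asyncio.sleep(delay)
    return f"slept for {delay} seconds"


# asyncio.run is used so the sketch also works outside a notebook
timed_result = asyncio.run(fake_io(0.01))
print(timed_result)
```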

<h4 id="write-some-coroutines">Write some coroutines</h4>

<p>What’s a coroutine? I can’t describe them better than Matthew Fowler, so here’s a quote from page 24 of his book:</p>

<blockquote>
  <p>Think of a coroutine like a regular Python function but with the superpower that it
can pause its execution when it encounters an operation that could take a while to
complete. When that long-running operation is complete, we can “wake up” our
paused coroutine and finish executing any other code in that coroutine. While a
paused coroutine is waiting for the operation it paused for to finish, we can run other
code. This running of other code while waiting is what gives our application concurrency.
We can also run several time-consuming operations concurrently, which can
give our applications big performance improvements.</p>
</blockquote>

<p>We create coroutines using the <code class="language-plaintext highlighter-rouge">async</code> keyword. We tell them to pause using the <code class="language-plaintext highlighter-rouge">await</code> keyword.</p>

<p>Calling a coroutine function doesn’t run it. It only creates a coroutine object, and nothing executes until we await it or schedule it on the event loop. To schedule our coroutines, we wrap them in <code class="language-plaintext highlighter-rouge">Tasks</code>, which we’ll create using <code class="language-plaintext highlighter-rouge">asyncio.create_task</code>.</p>

<p>Here’s how we’ll be creating our list of tasks to complete:</p>
<ul>
  <li>We create a single task by passing our coroutine <code class="language-plaintext highlighter-rouge">make_request</code> into <code class="language-plaintext highlighter-rouge">asyncio.create_task</code>.</li>
  <li>We do this for each url in our urls list</li>
</ul>
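<p>To make the difference between a bare coroutine and a <code class="language-plaintext highlighter-rouge">Task</code> concrete, here’s a small sketch with no network calls. The <code class="language-plaintext highlighter-rouge">record</code> coroutine and the <code class="language-plaintext highlighter-rouge">log</code> list are made up for this example. It uses <code class="language-plaintext highlighter-rouge">asyncio.run</code> rather than a top-level <code class="language-plaintext highlighter-rouge">await</code> so that it also runs outside a notebook.</p>

```python
import asyncio


async def record(label, log):
    # Append a label so we can see when this coroutine actually ran
    log.append(label)
    return label


async def main():
    log = []

    # Calling the coroutine function only builds a coroutine object
    coro = record("coroutine", log)
    is_coroutine = asyncio.iscoroutine(coro)

    # create_task schedules its coroutine on the event loop straight away
    task = asyncio.create_task(record("task", log))

    # Yield control to the event loop once so the scheduled task can run
    await asyncio.sleep(0)
    ran_before_await = list(log)

    # The bare coroutine only runs once we await it
    await coro
    await task
    return is_coroutine, ran_before_await, log


is_coroutine, ran_before_await, final_log = asyncio.run(main())
print(ran_before_await)
print(final_log)
```

<p>The task runs as soon as the event loop gets control, while the bare coroutine sits idle until it’s awaited.</p>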

<p>We then wait for our tasks to complete using <code class="language-plaintext highlighter-rouge">asyncio.gather</code>:</p>
<ul>
  <li>We specify <code class="language-plaintext highlighter-rouge">return_exceptions=True</code> because the default behaviour of <code class="language-plaintext highlighter-rouge">asyncio.gather</code> is to raise the first exception it encounters.</li>
  <li>This leaves any remaining tasks running in the background even though it looks like code execution has stopped.</li>
  <li>We instead want <code class="language-plaintext highlighter-rouge">asyncio.gather</code> to return exceptions in its list of results. This allows us to handle any exceptions as we see fit.</li>
</ul>
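<p>Here’s a small sketch of what <code class="language-plaintext highlighter-rouge">return_exceptions=True</code> gives us, using <code class="language-plaintext highlighter-rouge">asyncio.sleep</code> in place of real requests. The <code class="language-plaintext highlighter-rouge">fake_job</code> coroutine and its deliberate failure are assumptions made for illustration.</p>

```python
import asyncio


async def fake_job(n):
    # asyncio.sleep stands in for a network call
    await asyncio.sleep(0.01 * n)
    if n == 2:
        raise ValueError(f"job {n} failed")
    return n


async def main():
    tasks = [asyncio.create_task(fake_job(n)) for n in (1, 2, 3)]
    # Exceptions come back as items in the results list, in task order
    return await asyncio.gather(*tasks, return_exceptions=True)


gathered = asyncio.run(main())

# Separate the successful results from the exceptions
successes = [r for r in gathered if not isinstance(r, Exception)]
failures = [r for r in gathered if isinstance(r, Exception)]

print(successes)
print(failures)
```

<p>Because the exceptions are ordinary items in the list, we can decide afterwards whether to log them, retry them or re-raise them.</p>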

<div class="language-python highlighter-rouge"><div class="highlight"><pre class="highlight"><code><span class="k">async</span> <span class="k">def</span> <span class="nf">make_request</span><span class="p">(</span><span class="n">session</span><span class="p">,</span> <span class="n">url</span><span class="p">):</span>
    <span class="k">async</span> <span class="k">with</span> <span class="n">session</span><span class="p">.</span><span class="nf">get</span><span class="p">(</span><span class="n">url</span><span class="p">)</span> <span class="k">as</span> <span class="n">response</span><span class="p">:</span>
        <span class="k">return</span> <span class="k">await</span> <span class="n">response</span><span class="p">.</span><span class="nf">text</span><span class="p">()</span>

<span class="nd">@async_timer</span><span class="p">(</span><span class="n">num_iter</span><span class="o">=</span><span class="mi">5</span><span class="p">)</span>
<span class="k">async</span> <span class="k">def</span> <span class="nf">make_requests</span><span class="p">(</span><span class="n">urls</span><span class="p">):</span>
    <span class="k">async</span> <span class="k">with</span> <span class="n">aiohttp</span><span class="p">.</span><span class="nc">ClientSession</span><span class="p">()</span> <span class="k">as</span> <span class="n">session</span><span class="p">:</span>
        <span class="n">tasks</span> <span class="o">=</span> <span class="p">[</span><span class="n">asyncio</span><span class="p">.</span><span class="nf">create_task</span><span class="p">(</span><span class="nf">make_request</span><span class="p">(</span><span class="n">session</span><span class="p">,</span> <span class="n">url</span><span class="p">))</span> <span class="k">for</span> <span class="n">url</span> <span class="ow">in</span> <span class="n">urls</span><span class="p">]</span>
        <span class="k">return</span> <span class="k">await</span> <span class="n">asyncio</span><span class="p">.</span><span class="nf">gather</span><span class="p">(</span><span class="o">*</span><span class="n">tasks</span><span class="p">,</span> <span class="n">return_exceptions</span><span class="o">=</span><span class="bp">True</span><span class="p">)</span>
</code></pre></div></div>

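<p>The <code class="language-plaintext highlighter-rouge">@async_timer(num_iter=5)</code> decorator runs <code class="language-plaintext highlighter-rouge">make_requests</code> 5 times, just like we did for our sequential requests. Note that the decorator’s wrapper only prints timings and has no return statement, so <code class="language-plaintext highlighter-rouge">results</code> below is <code class="language-plaintext highlighter-rouge">None</code>:</p>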

<div class="language-python highlighter-rouge"><div class="highlight"><pre class="highlight"><code><span class="n">results</span> <span class="o">=</span> <span class="k">await</span> <span class="nf">make_requests</span><span class="p">(</span><span class="n">urls</span><span class="p">[:</span><span class="mi">10</span><span class="p">])</span>
</code></pre></div></div>

<div class="language-plaintext highlighter-rouge"><div class="highlight"><pre class="highlight"><code>running iteration 1
iteration 1 took 0.64 seconds
running iteration 2
iteration 2 took 0.61 seconds
running iteration 3
iteration 3 took 0.64 seconds
running iteration 4
iteration 4 took 0.61 seconds
running iteration 5
iteration 5 took 1.55 seconds
Average time over 5 iterations: 0.81 seconds
</code></pre></div></div>

<p>That’s more than five times faster than the sequential run (0.81 seconds versus 4.43 seconds on average). The benefit of the async approach is even clearer when we make a larger number of requests:</p>

<div class="language-python highlighter-rouge"><div class="highlight"><pre class="highlight"><code><span class="k">await</span> <span class="nf">make_requests</span><span class="p">(</span><span class="n">urls</span><span class="p">[:</span><span class="mi">1_000</span><span class="p">])</span>
</code></pre></div></div>

<div class="language-plaintext highlighter-rouge"><div class="highlight"><pre class="highlight"><code>running iteration 1
iteration 1 took 16.05 seconds
running iteration 2
iteration 2 took 32.65 seconds
running iteration 3
iteration 3 took 65.37 seconds
running iteration 4
iteration 4 took 65.73 seconds
running iteration 5
iteration 5 took 32.74 seconds
Average time over 5 iterations: 42.51 seconds
</code></pre></div></div>

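<p>That’s a lot faster! Averaged over the 5 runs, we’re making about 24 requests per second, and about 62 per second in the fastest run. The sequential approach managed roughly 2 per second (10 requests in 4.43 seconds).</p>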
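<p>The throughput can be worked out directly from the timings printed above. This is just arithmetic on those numbers, and the variable names are made up for the example:</p>

```python
# Per-iteration timings (in seconds) copied from the output above
iteration_times = [16.05, 32.65, 65.37, 65.73, 32.74]
num_requests = 1_000

mean_time = sum(iteration_times) / len(iteration_times)

# Requests per second on average, and in the fastest iteration
average_rate = num_requests / mean_time
fastest_rate = num_requests / min(iteration_times)

# Sequential baseline: 10 requests took 4.43 seconds on average
sequential_rate = 10 / 4.43

print(f"average: {average_rate:.1f} requests per second")
print(f"fastest: {fastest_rate:.1f} requests per second")
print(f"sequential: {sequential_rate:.1f} requests per second")
```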

<h4 id="why-is-this-so-much-faster">Why is this so much faster?</h4>

<p>We can see that <code class="language-plaintext highlighter-rouge">aiohttp</code> opens up a whole lotta non-blocking sockets!</p>

<p><img src="/assets/post_images/2023-02-11-faster-web-scraping-with-asyncio/sockets.jpg" alt="sockets" width="90%" class="align-center" /></p>

<p>What’s happening under the hood? In the <a href="https://docs.aiohttp.org/en/stable/http_request_lifecycle.html?highlight=blocking#why-is-aiohttp-client-api-that-way">aiohttp documentation</a>, they give us an insight, comparing it to the <code class="language-plaintext highlighter-rouge">requests</code> package. Firstly, they describe what happens in the <code class="language-plaintext highlighter-rouge">requests</code> package when we call <code class="language-plaintext highlighter-rouge">.get()</code>:</p>

<blockquote>
  <p>When doing <code class="language-plaintext highlighter-rouge">response.text</code> in <code class="language-plaintext highlighter-rouge">requests</code>, you just read an attribute. The call to <code class="language-plaintext highlighter-rouge">.get()</code> already preloaded and decoded the entire response payload, in a blocking manner.</p>
</blockquote>

<p>Then on how <code class="language-plaintext highlighter-rouge">aiohttp</code> does it:</p>

<blockquote>
  <p><code class="language-plaintext highlighter-rouge">aiohttp</code> loads only the headers when <code class="language-plaintext highlighter-rouge">.get()</code> is executed, letting you decide to pay the cost of loading the body afterward, in a second asynchronous operation. Hence the <code class="language-plaintext highlighter-rouge">await response.text()</code>.</p>
</blockquote>
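<p>The effect of overlapping the waiting time can be sketched without touching the network. In the example below, <code class="language-plaintext highlighter-rouge">asyncio.sleep</code> stands in for the time spent waiting on a server. The function names are made up, and this illustrates the general idea rather than what <code class="language-plaintext highlighter-rouge">aiohttp</code> does internally.</p>

```python
import asyncio
from time import perf_counter


async def fake_request(delay=0.05):
    # Pretend we're waiting on a server for `delay` seconds
    await asyncio.sleep(delay)


async def run_sequentially(n):
    start = perf_counter()
    for _ in range(n):
        # Each wait has to finish before the next one starts
        await fake_request()
    return perf_counter() - start


async def run_concurrently(n):
    start = perf_counter()
    # All the waits overlap, so the total is close to a single wait
    await asyncio.gather(*(fake_request() for _ in range(n)))
    return perf_counter() - start


async def main():
    sequential = await run_sequentially(10)
    concurrent = await run_concurrently(10)
    return sequential, concurrent


sequential_time, concurrent_time = asyncio.run(main())
print(f"sequential: {sequential_time:.2f} seconds")
print(f"concurrent: {concurrent_time:.2f} seconds")
```

<p>The sequential version pays for every wait in turn, while the concurrent version pays for roughly one.</p>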

<h2 id="what-if-we-want-to-handle-exceptions">What if we want to handle exceptions?</h2>

<h3 id="cancelling-pending-tasks-on-the-first-exception">Cancelling pending tasks on the first exception</h3>

<p>What if we want to shut down any remaining tasks on an exception? We can use <code class="language-plaintext highlighter-rouge">asyncio.wait</code> for more granular control. In the code below, we do these things:</p>
<ul>
  <li>We simulate a GET request timing out in <code class="language-plaintext highlighter-rouge">make_request</code>. We set the timeout to a random number of seconds between 0 and 10.</li>
  <li>We create a list of tasks to execute using <code class="language-plaintext highlighter-rouge">asyncio.create_task</code>. Each task created using this function is immediately scheduled for execution.</li>
  <li>We issue <code class="language-plaintext highlighter-rouge">asyncio.wait(tasks, return_when=asyncio.FIRST_EXCEPTION)</code>. It returns as soon as any task raises an exception (in this case, an <code class="language-plaintext highlighter-rouge">asyncio.TimeoutError</code>), or once every task has finished if none do. It returns two <code class="language-plaintext highlighter-rouge">set</code>s:
    <ul>
      <li>The <code class="language-plaintext highlighter-rouge">done set</code> contains tasks that are either completed successfully or completed with an exception.</li>
      <li>The <code class="language-plaintext highlighter-rouge">pending set</code> contains tasks which we don’t have results for yet.</li>
    </ul>
  </li>
  <li>We loop through the <code class="language-plaintext highlighter-rouge">done set</code> to demonstrate that at least one task has returned an exception.</li>
  <li>We then loop through all tasks in the <code class="language-plaintext highlighter-rouge">pending set</code> and cancel them.
    <ul>
      <li>If we don’t cancel them like this, the tasks continue to run despite our exception.</li>
    </ul>
  </li>
  <li>We wait for a few seconds and demonstrate that all pending tasks have been cancelled.</li>
</ul>

<div class="language-python highlighter-rouge"><div class="highlight"><pre class="highlight"><code><span class="k">async</span> <span class="k">def</span> <span class="nf">make_request</span><span class="p">(</span><span class="n">session</span><span class="p">,</span> <span class="n">url</span><span class="p">):</span>
    <span class="c1"># set a random request timeout to demonstrate cancelling pending
</span>    <span class="c1"># tasks on a TimeoutError
</span>    <span class="n">timeout</span> <span class="o">=</span> <span class="n">aiohttp</span><span class="p">.</span><span class="nc">ClientTimeout</span><span class="p">(</span><span class="n">total</span><span class="o">=</span><span class="n">random</span><span class="p">.</span><span class="nf">random</span><span class="p">()</span> <span class="o">*</span> <span class="mi">10</span><span class="p">)</span>
    <span class="k">async</span> <span class="k">with</span> <span class="n">session</span><span class="p">.</span><span class="nf">get</span><span class="p">(</span><span class="n">url</span><span class="p">,</span> <span class="n">timeout</span><span class="o">=</span><span class="n">timeout</span><span class="p">)</span> <span class="k">as</span> <span class="n">response</span><span class="p">:</span>
        <span class="k">return</span> <span class="k">await</span> <span class="n">response</span><span class="p">.</span><span class="nf">text</span><span class="p">()</span>
        

<span class="k">async</span> <span class="k">def</span> <span class="nf">make_requests</span><span class="p">(</span><span class="n">urls</span><span class="p">):</span>
    <span class="k">async</span> <span class="k">with</span> <span class="n">aiohttp</span><span class="p">.</span><span class="nc">ClientSession</span><span class="p">()</span> <span class="k">as</span> <span class="n">session</span><span class="p">:</span>
        <span class="n">tasks</span> <span class="o">=</span> <span class="p">[</span><span class="n">asyncio</span><span class="p">.</span><span class="nf">create_task</span><span class="p">(</span><span class="nf">make_request</span><span class="p">(</span><span class="n">session</span><span class="p">,</span> <span class="n">url</span><span class="p">))</span> 
                 <span class="k">for</span> <span class="n">url</span> <span class="ow">in</span> <span class="n">urls</span><span class="p">]</span>
        
        <span class="n">done</span><span class="p">,</span> <span class="n">pending</span> <span class="o">=</span> <span class="k">await</span> <span class="n">asyncio</span><span class="p">.</span><span class="nf">wait</span><span class="p">(</span><span class="n">tasks</span><span class="p">,</span> <span class="n">return_when</span><span class="o">=</span><span class="n">asyncio</span><span class="p">.</span><span class="n">FIRST_EXCEPTION</span><span class="p">)</span>
        <span class="nf">print</span><span class="p">(</span><span class="sa">f</span><span class="sh">'</span><span class="s">number of done tasks: </span><span class="si">{</span><span class="nf">len</span><span class="p">(</span><span class="n">done</span><span class="p">)</span><span class="si">}</span><span class="sh">'</span><span class="p">)</span>
        <span class="n">msg_prefix</span> <span class="o">=</span> <span class="sh">"</span><span class="s">task result: </span><span class="sh">"</span>
        <span class="k">for</span> <span class="n">task</span> <span class="ow">in</span> <span class="n">done</span><span class="p">:</span>
            <span class="k">try</span><span class="p">:</span>
                <span class="n">result</span> <span class="o">=</span> <span class="n">task</span><span class="p">.</span><span class="nf">result</span><span class="p">()</span>
                <span class="nf">print</span><span class="p">(</span><span class="n">msg_prefix</span> <span class="o">+</span> <span class="sh">'</span><span class="s">task completed successfully!</span><span class="sh">'</span><span class="p">)</span>
            <span class="k">except</span> <span class="n">asyncio</span><span class="p">.</span><span class="nb">TimeoutError</span><span class="p">:</span>
                <span class="nf">print</span><span class="p">(</span><span class="n">msg_prefix</span> <span class="o">+</span> <span class="sh">"</span><span class="s">task timed out!</span><span class="sh">"</span><span class="p">)</span>
            
        <span class="nf">print</span><span class="p">(</span><span class="sa">f</span><span class="sh">'</span><span class="s">number of pending tasks to cancel: </span><span class="si">{</span><span class="nf">len</span><span class="p">(</span><span class="n">pending</span><span class="p">)</span><span class="si">}</span><span class="sh">'</span><span class="p">)</span>
        <span class="k">for</span> <span class="n">pending_task</span> <span class="ow">in</span> <span class="n">pending</span><span class="p">:</span>
            <span class="n">pending_task</span><span class="p">.</span><span class="nf">cancel</span><span class="p">()</span>

        <span class="c1"># wait a few seconds to demonstrate if our pending tasks were actually cancelled
</span>        <span class="k">await</span> <span class="n">asyncio</span><span class="p">.</span><span class="nf">sleep</span><span class="p">(</span><span class="mi">2</span><span class="p">)</span>
        
        <span class="k">for</span> <span class="n">i</span><span class="p">,</span> <span class="n">pending_task</span> <span class="ow">in</span> <span class="nf">enumerate</span><span class="p">(</span><span class="n">pending</span><span class="p">):</span>
            <span class="nf">print</span><span class="p">(</span><span class="sa">f</span><span class="sh">'</span><span class="s">pending task </span><span class="si">{</span><span class="n">i</span> <span class="o">+</span> <span class="mi">1</span><span class="si">}</span><span class="s"> cancelled? </span><span class="si">{</span><span class="n">pending_task</span><span class="p">.</span><span class="nf">cancelled</span><span class="p">()</span><span class="si">}</span><span class="sh">'</span><span class="p">)</span>

</code></pre></div></div>

<div class="language-python highlighter-rouge"><div class="highlight"><pre class="highlight"><code><span class="k">await</span> <span class="nf">make_requests</span><span class="p">(</span><span class="n">urls</span><span class="p">[:</span><span class="mi">30</span><span class="p">])</span>
</code></pre></div></div>

<div class="language-plaintext highlighter-rouge"><div class="highlight"><pre class="highlight"><code>number of done tasks: 1
task result: task timed out!
number of pending tasks to cancel: 29
pending task 1 cancelled? True
pending task 2 cancelled? True
pending task 3 cancelled? True
pending task 4 cancelled? True
pending task 5 cancelled? True
pending task 6 cancelled? True
pending task 7 cancelled? True
pending task 8 cancelled? True
pending task 9 cancelled? True
pending task 10 cancelled? True
pending task 11 cancelled? True
pending task 12 cancelled? True
pending task 13 cancelled? True
pending task 14 cancelled? True
pending task 15 cancelled? True
pending task 16 cancelled? True
pending task 17 cancelled? True
pending task 18 cancelled? True
pending task 19 cancelled? True
pending task 20 cancelled? True
pending task 21 cancelled? True
pending task 22 cancelled? True
pending task 23 cancelled? True
pending task 24 cancelled? True
pending task 25 cancelled? True
pending task 26 cancelled? True
pending task 27 cancelled? True
pending task 28 cancelled? True
pending task 29 cancelled? True
</code></pre></div></div>

<p>Nice!</p>
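<p>The same pattern can be sketched deterministically, without the network or random timeouts. In the example below, <code class="language-plaintext highlighter-rouge">fake_request</code> is a made-up stand-in that sleeps and optionally raises. Instead of sleeping for a fixed couple of seconds after cancelling, this sketch awaits the cancelled tasks with <code class="language-plaintext highlighter-rouge">asyncio.gather(..., return_exceptions=True)</code>, which is a different technique from the one above: it returns as soon as the cancellations have been processed.</p>

```python
import asyncio


async def fake_request(delay, fail=False):
    # asyncio.sleep stands in for a network call
    await asyncio.sleep(delay)
    if fail:
        raise asyncio.TimeoutError("simulated timeout")
    return delay


async def main():
    tasks = [
        asyncio.create_task(fake_request(0.01, fail=True)),  # fails quickly
        asyncio.create_task(fake_request(5)),
        asyncio.create_task(fake_request(5)),
    ]

    done, pending = await asyncio.wait(tasks, return_when=asyncio.FIRST_EXCEPTION)

    for pending_task in pending:
        pending_task.cancel()

    # Wait for the cancellations to be processed; the CancelledErrors
    # are collected into the results instead of being raised here
    await asyncio.gather(*pending, return_exceptions=True)

    timed_out = [isinstance(t.exception(), asyncio.TimeoutError) for t in done]
    cancelled = [t.cancelled() for t in pending]
    return timed_out, cancelled


timed_out, cancelled = asyncio.run(main())
print(f"done tasks that timed out: {timed_out}")
print(f"pending tasks cancelled: {cancelled}")
```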

<h2 id="what-if-i-have-a-huge-number-of-tasks-and-want-to-avoid-creating-them-up-front">What if I have a huge number of tasks and want to avoid creating them up-front?</h2>

<p>An issue with creating an iterable of <code class="language-plaintext highlighter-rouge">Tasks</code> is that we have to spend time creating all of them before any of them can start executing. Here we time just the task creation step:</p>

<div class="language-python highlighter-rouge"><div class="highlight"><pre class="highlight"><code><span class="k">async</span> <span class="k">def</span> <span class="nf">dummy_task</span><span class="p">():</span>
    <span class="k">pass</span>

<span class="k">async</span> <span class="k">def</span> <span class="nf">make_requests</span><span class="p">(</span><span class="n">num_tasks</span><span class="p">):</span>
    <span class="n">start_time</span> <span class="o">=</span> <span class="nf">perf_counter</span><span class="p">()</span>
    <span class="n">tasks</span> <span class="o">=</span> <span class="p">[</span><span class="n">asyncio</span><span class="p">.</span><span class="nf">create_task</span><span class="p">(</span><span class="nf">dummy_task</span><span class="p">())</span> <span class="k">for</span> <span class="n">_</span> <span class="ow">in</span> <span class="nf">range</span><span class="p">(</span><span class="n">num_tasks</span><span class="p">)]</span>
    <span class="n">end_time</span> <span class="o">=</span> <span class="nf">perf_counter</span><span class="p">()</span>
    <span class="k">await</span> <span class="n">asyncio</span><span class="p">.</span><span class="nf">gather</span><span class="p">(</span><span class="o">*</span><span class="n">tasks</span><span class="p">)</span>
    <span class="nf">print</span><span class="p">(</span><span class="sa">f</span><span class="sh">'</span><span class="s">took </span><span class="si">{</span><span class="n">end_time</span> <span class="o">-</span> <span class="n">start_time</span><span class="si">:</span><span class="p">.</span><span class="mi">4</span><span class="n">f</span><span class="si">}</span><span class="s"> seconds</span><span class="sh">'</span><span class="p">)</span>
    <span class="nf">print</span><span class="p">(</span><span class="sa">f</span><span class="sh">'</span><span class="s">number of tasks to run: </span><span class="si">{</span><span class="nf">len</span><span class="p">(</span><span class="n">tasks</span><span class="p">)</span><span class="si">:</span><span class="p">,</span><span class="si">}</span><span class="sh">'</span><span class="p">)</span>
</code></pre></div></div>

<div class="language-python highlighter-rouge"><div class="highlight"><pre class="highlight"><code><span class="k">for</span> <span class="n">num_tasks</span> <span class="ow">in</span> <span class="p">[</span><span class="mi">10</span><span class="o">**</span><span class="n">x</span> <span class="k">for</span> <span class="n">x</span> <span class="ow">in</span> <span class="nf">range</span><span class="p">(</span><span class="mi">1</span><span class="p">,</span> <span class="mi">7</span><span class="p">)]:</span>
    <span class="k">await</span> <span class="nf">make_requests</span><span class="p">(</span><span class="n">num_tasks</span><span class="o">=</span><span class="n">num_tasks</span><span class="p">)</span>
</code></pre></div></div>

<div class="language-plaintext highlighter-rouge"><div class="highlight"><pre class="highlight"><code>took 0.0001 seconds
number of tasks to run: 10
took 0.0005 seconds
number of tasks to run: 100
took 0.0073 seconds
number of tasks to run: 1,000
took 0.0683 seconds
number of tasks to run: 10,000
took 0.4629 seconds
number of tasks to run: 100,000
took 7.3401 seconds
number of tasks to run: 1,000,000
</code></pre></div></div>

<p>How can we avoid creating tasks up front? We can use generators with async <code class="language-plaintext highlighter-rouge">workers</code> and <code class="language-plaintext highlighter-rouge">asyncio.Queue</code>s!</p>

<p>In the code below, we do the following:</p>
<ul>
  <li>We create a queue using <code class="language-plaintext highlighter-rouge">asyncio.Queue</code>. I’ll explain the <code class="language-plaintext highlighter-rouge">QUEUE_SIZE</code> in a bit.</li>
  <li>We have an async producer function named <code class="language-plaintext highlighter-rouge">producer_fn</code>.
    <ul>
      <li>Its responsibility is to put tasks onto the queue.</li>
    </ul>
  </li>
  <li>We have an async worker function named <code class="language-plaintext highlighter-rouge">worker_fn</code>.
    <ul>
      <li>Its responsibility is to get tasks from the queue and complete them.</li>
    </ul>
  </li>
  <li>We initialise the <code class="language-plaintext highlighter-rouge">asyncio.Queue</code> with a maximum queue size of <code class="language-plaintext highlighter-rouge">5</code>. Setting a maximum queue size is the key to solving this problem.</li>
  <li>Now on the <code class="language-plaintext highlighter-rouge">QUEUE_SIZE</code>:
    <ul>
      <li>If we had an unbounded queue, the <code class="language-plaintext highlighter-rouge">producer_fn</code> would quickly create all tasks we want to execute, giving us a result like the one we saw before.</li>
      <li>When our queue holds <code class="language-plaintext highlighter-rouge">QUEUE_SIZE</code> items, the call to <code class="language-plaintext highlighter-rouge">await queue.put()</code> suspends the producer until a worker removes an item from the queue. The event loop itself is not blocked, so the workers keep running.</li>
    </ul>
  </li>
  <li>Our workers run in an infinite loop. To exit the infinite loop while cleaning up after ourselves, we do these things:
    <ul>
      <li>When we interrupt the kernel, an <code class="language-plaintext highlighter-rouge">asyncio.CancelledError</code> is raised. We catch it in our <code class="language-plaintext highlighter-rouge">run_tasks</code> coroutine.</li>
      <li>We cancel the running workers and print their statuses to confirm that our cancellations have worked.</li>
    </ul>
  </li>
</ul>

<div class="language-python highlighter-rouge"><div class="highlight"><pre class="highlight"><code><span class="n">QUEUE_SIZE</span> <span class="o">=</span> <span class="mi">5</span>
<span class="n">NUM_WORKERS</span> <span class="o">=</span> <span class="mi">10</span>

<span class="k">async</span> <span class="k">def</span> <span class="nf">make_request</span><span class="p">(</span><span class="n">session</span><span class="p">,</span> <span class="n">url</span><span class="p">):</span>
    <span class="k">async</span> <span class="k">with</span> <span class="n">session</span><span class="p">.</span><span class="nf">get</span><span class="p">(</span><span class="n">url</span><span class="p">)</span> <span class="k">as</span> <span class="n">response</span><span class="p">:</span>
        <span class="k">return</span> <span class="k">await</span> <span class="n">response</span><span class="p">.</span><span class="nf">text</span><span class="p">()</span>

<span class="k">async</span> <span class="k">def</span> <span class="nf">producer_fn</span><span class="p">(</span><span class="n">queue</span><span class="p">,</span> <span class="n">urls</span><span class="p">):</span>
    <span class="k">for</span> <span class="n">url</span> <span class="ow">in</span> <span class="n">urls</span><span class="p">:</span>
        <span class="k">if</span> <span class="n">queue</span><span class="p">.</span><span class="nf">qsize</span><span class="p">()</span> <span class="o">==</span> <span class="n">QUEUE_SIZE</span><span class="p">:</span>
            <span class="nf">print</span><span class="p">(</span><span class="sh">'</span><span class="s">PRODUCER - QUEUE FULL...waiting...</span><span class="sh">'</span><span class="p">)</span>
        <span class="k">await</span> <span class="n">queue</span><span class="p">.</span><span class="nf">put</span><span class="p">(</span><span class="n">url</span><span class="p">)</span>
        <span class="nf">print</span><span class="p">(</span><span class="sa">f</span><span class="sh">'</span><span class="s">producer enqueued </span><span class="si">{</span><span class="n">url</span><span class="si">}</span><span class="sh">'</span><span class="p">)</span>
    <span class="nf">print</span><span class="p">(</span><span class="sh">'</span><span class="s">producer exhausted</span><span class="sh">'</span><span class="p">)</span>

<span class="k">async</span> <span class="k">def</span> <span class="nf">worker_fn</span><span class="p">(</span><span class="n">queue</span><span class="p">,</span> <span class="n">session</span><span class="p">):</span>
    <span class="n">task</span> <span class="o">=</span> <span class="n">asyncio</span><span class="p">.</span><span class="nf">current_task</span><span class="p">()</span>
    <span class="k">while</span> <span class="bp">True</span><span class="p">:</span>
        <span class="n">val</span> <span class="o">=</span> <span class="k">await</span> <span class="n">queue</span><span class="p">.</span><span class="nf">get</span><span class="p">()</span>
        <span class="nf">print</span><span class="p">(</span><span class="sa">f</span><span class="sh">"</span><span class="s">--- worker </span><span class="si">{</span><span class="n">task</span><span class="p">.</span><span class="nf">get_name</span><span class="p">()</span><span class="si">}</span><span class="s"> got </span><span class="si">{</span><span class="n">val</span><span class="si">}</span><span class="s"> from queue</span><span class="sh">"</span><span class="p">)</span>
        <span class="n">result</span> <span class="o">=</span> <span class="k">await</span> <span class="nf">make_request</span><span class="p">(</span><span class="n">session</span><span class="p">,</span> <span class="n">val</span><span class="p">)</span>
        <span class="nf">print</span><span class="p">(</span><span class="sa">f</span><span class="sh">"</span><span class="s">--- worker </span><span class="si">{</span><span class="n">task</span><span class="p">.</span><span class="nf">get_name</span><span class="p">()</span><span class="si">}</span><span class="s"> completed task</span><span class="sh">"</span><span class="p">)</span>
        <span class="n">queue</span><span class="p">.</span><span class="nf">task_done</span><span class="p">()</span>
        

<span class="k">async</span> <span class="k">def</span> <span class="nf">run_tasks</span><span class="p">(</span><span class="n">urls</span><span class="p">):</span>
    <span class="n">queue</span> <span class="o">=</span> <span class="n">asyncio</span><span class="p">.</span><span class="nc">Queue</span><span class="p">(</span><span class="n">QUEUE_SIZE</span><span class="p">)</span>
    
    <span class="k">async</span> <span class="k">with</span> <span class="n">aiohttp</span><span class="p">.</span><span class="nc">ClientSession</span><span class="p">()</span> <span class="k">as</span> <span class="n">session</span><span class="p">:</span>
        <span class="n">producer</span> <span class="o">=</span> <span class="n">asyncio</span><span class="p">.</span><span class="nf">create_task</span><span class="p">(</span><span class="nf">producer_fn</span><span class="p">(</span><span class="n">queue</span><span class="p">,</span> <span class="n">urls</span><span class="p">))</span>
        <span class="n">workers</span> <span class="o">=</span> <span class="p">[</span><span class="n">asyncio</span><span class="p">.</span><span class="nf">create_task</span><span class="p">(</span><span class="nf">worker_fn</span><span class="p">(</span><span class="n">queue</span><span class="p">,</span> <span class="n">session</span><span class="p">),</span> <span class="n">name</span><span class="o">=</span><span class="sa">f</span><span class="sh">'</span><span class="s">worker_</span><span class="si">{</span><span class="n">i</span><span class="si">}</span><span class="sh">'</span><span class="p">)</span> <span class="k">for</span> <span class="n">i</span> <span class="ow">in</span> <span class="nf">range</span><span class="p">(</span><span class="n">NUM_WORKERS</span><span class="p">)]</span>
        
        <span class="k">try</span><span class="p">:</span>
            <span class="k">await</span> <span class="n">asyncio</span><span class="p">.</span><span class="nf">gather</span><span class="p">(</span><span class="n">producer</span><span class="p">,</span>  <span class="o">*</span><span class="n">workers</span><span class="p">)</span>
        <span class="k">except</span> <span class="n">asyncio</span><span class="p">.</span><span class="n">CancelledError</span><span class="p">:</span>
            <span class="nf">print</span><span class="p">(</span><span class="sh">'</span><span class="s">received CancelledError...killing workers</span><span class="sh">'</span><span class="p">)</span>
            <span class="k">for</span> <span class="n">w</span> <span class="ow">in</span> <span class="n">workers</span><span class="p">:</span>
                <span class="n">w</span><span class="p">.</span><span class="nf">cancel</span><span class="p">()</span>
            <span class="k">for</span> <span class="n">w</span> <span class="ow">in</span> <span class="n">workers</span><span class="p">:</span>
                <span class="nf">print</span><span class="p">(</span><span class="sa">f</span><span class="sh">'</span><span class="s">worker cancelled? </span><span class="si">{</span><span class="n">w</span><span class="p">.</span><span class="nf">cancelled</span><span class="p">()</span><span class="si">}</span><span class="sh">'</span><span class="p">)</span>

</code></pre></div></div>

<div class="language-python highlighter-rouge"><div class="highlight"><pre class="highlight"><code><span class="k">await</span> <span class="nf">run_tasks</span><span class="p">(</span><span class="n">urls</span><span class="p">[:</span><span class="mi">20</span><span class="p">])</span>
</code></pre></div></div>

<div class="language-plaintext highlighter-rouge"><div class="highlight"><pre class="highlight"><code>producer enqueued https://en.wikipedia.org/wiki/!
producer enqueued https://en.wikipedia.org/wiki/!!
producer enqueued https://en.wikipedia.org/wiki/!!!
producer enqueued https://en.wikipedia.org/wiki/!!!!!!!
producer enqueued https://en.wikipedia.org/wiki/!!!Fuck_You!!!
PRODUCER - QUEUE FULL...waiting...
--- worker worker_0 got https://en.wikipedia.org/wiki/! from queue
--- worker worker_1 got https://en.wikipedia.org/wiki/!! from queue
--- worker worker_2 got https://en.wikipedia.org/wiki/!!! from queue
--- worker worker_3 got https://en.wikipedia.org/wiki/!!!!!!! from queue
--- worker worker_4 got https://en.wikipedia.org/wiki/!!!Fuck_You!!! from queue
producer enqueued https://en.wikipedia.org/wiki/!!!Fuck_You!!!_And_Then_Some
producer enqueued https://en.wikipedia.org/wiki/!!!Fuck_You!!!_and_Then_Some
producer enqueued https://en.wikipedia.org/wiki/!!!_(!!!_album)
producer enqueued https://en.wikipedia.org/wiki/!!!_(American_band)
producer enqueued https://en.wikipedia.org/wiki/!!!_(Chk_Chk_Chk)
PRODUCER - QUEUE FULL...waiting...
--- worker worker_5 got https://en.wikipedia.org/wiki/!!!Fuck_You!!!_And_Then_Some from queue
--- worker worker_6 got https://en.wikipedia.org/wiki/!!!Fuck_You!!!_and_Then_Some from queue
--- worker worker_7 got https://en.wikipedia.org/wiki/!!!_(!!!_album) from queue
--- worker worker_8 got https://en.wikipedia.org/wiki/!!!_(American_band) from queue
--- worker worker_9 got https://en.wikipedia.org/wiki/!!!_(Chk_Chk_Chk) from queue
producer enqueued https://en.wikipedia.org/wiki/!!!_(album)
producer enqueued https://en.wikipedia.org/wiki/!!!_(band)
producer enqueued https://en.wikipedia.org/wiki/!!!_(disambiguation)
producer enqueued https://en.wikipedia.org/wiki/!!!_discography
producer enqueued https://en.wikipedia.org/wiki/!!Destroy-Oh-Boy!!
PRODUCER - QUEUE FULL...waiting...
--- worker worker_0 completed task
--- worker worker_0 got https://en.wikipedia.org/wiki/!!!_(album) from queue
producer enqueued https://en.wikipedia.org/wiki/!!Fuck_you!!
PRODUCER - QUEUE FULL...waiting...
--- worker worker_1 completed task
--- worker worker_1 got https://en.wikipedia.org/wiki/!!!_(band) from queue
producer enqueued https://en.wikipedia.org/wiki/!!Going_Places!!
PRODUCER - QUEUE FULL...waiting...
--- worker worker_2 completed task
--- worker worker_2 got https://en.wikipedia.org/wiki/!!!_(disambiguation) from queue
producer enqueued https://en.wikipedia.org/wiki/!!Que_Corra_La_Voz!!
PRODUCER - QUEUE FULL...waiting...
--- worker worker_3 completed task
--- worker worker_3 got https://en.wikipedia.org/wiki/!!!_discography from queue
producer enqueued https://en.wikipedia.org/wiki/!!_(chess)
PRODUCER - QUEUE FULL...waiting...
--- worker worker_4 completed task
--- worker worker_4 got https://en.wikipedia.org/wiki/!!Destroy-Oh-Boy!! from queue
--- worker worker_5 completed task
--- worker worker_5 got https://en.wikipedia.org/wiki/!!Fuck_you!! from queue
producer enqueued https://en.wikipedia.org/wiki/!!_(disambiguation)
producer exhausted
--- worker worker_8 completed task
--- worker worker_8 got https://en.wikipedia.org/wiki/!!Going_Places!! from queue
--- worker worker_9 completed task
--- worker worker_9 got https://en.wikipedia.org/wiki/!!Que_Corra_La_Voz!! from queue
--- worker worker_6 completed task
--- worker worker_6 got https://en.wikipedia.org/wiki/!!_(chess) from queue
--- worker worker_0 completed task
--- worker worker_0 got https://en.wikipedia.org/wiki/!!_(disambiguation) from queue
--- worker worker_1 completed task
--- worker worker_3 completed task
--- worker worker_2 completed task
--- worker worker_4 completed task
--- worker worker_5 completed task
--- worker worker_8 completed task
--- worker worker_9 completed task
--- worker worker_6 completed task
--- worker worker_0 completed task
--- worker worker_7 completed task
received CancelledError...killing workers
worker cancelled? True
worker cancelled? True
worker cancelled? True
worker cancelled? True
worker cancelled? True
worker cancelled? True
worker cancelled? True
worker cancelled? True
worker cancelled? True
worker cancelled? True
</code></pre></div></div>
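
<p>If you would rather not interrupt the kernel to stop the workers, one option is to wait for the queue to drain with <code class="language-plaintext highlighter-rouge">queue.join()</code> and then cancel the idle workers. Here is a minimal, offline sketch of that idea, not a drop-in replacement for the code above: <code class="language-plaintext highlighter-rouge">fake_request</code>, the <code class="language-plaintext highlighter-rouge">NUM_WORKERS</code> value of <code class="language-plaintext highlighter-rouge">3</code> and the list used to record queue sizes are assumptions made for illustration, and <code class="language-plaintext highlighter-rouge">asyncio.sleep</code> stands in for the web request.</p>

```python
import asyncio

QUEUE_SIZE = 5      # same bound as in the post
NUM_WORKERS = 3     # hypothetical value chosen for this sketch


async def fake_request(item):
    # Stand-in for a real web request (assumption: no network involved).
    await asyncio.sleep(0.01)
    return item * 2


async def producer_fn(queue, items, queue_sizes):
    for item in items:
        # put() suspends this coroutine while the queue is full.
        await queue.put(item)
        queue_sizes.append(queue.qsize())


async def worker_fn(queue, results):
    while True:
        item = await queue.get()
        try:
            results.append(await fake_request(item))
        finally:
            # Always mark the item as processed so queue.join() can return.
            queue.task_done()


async def run_tasks(items):
    queue = asyncio.Queue(QUEUE_SIZE)
    results, queue_sizes = [], []
    workers = [asyncio.create_task(worker_fn(queue, results))
               for _ in range(NUM_WORKERS)]

    await producer_fn(queue, items, queue_sizes)  # returns once everything is enqueued
    await queue.join()                            # returns once every item is marked done

    # The workers are now idle in queue.get(): cancel them and wait for them to stop.
    for w in workers:
        w.cancel()
    await asyncio.gather(*workers, return_exceptions=True)
    return results, max(queue_sizes)


results, largest_queue = asyncio.run(run_tasks(range(20)))
```

<p>Run as a script, this uses <code class="language-plaintext highlighter-rouge">asyncio.run</code>. In a Jupyter notebook, where an event loop is already running, you would write <code class="language-plaintext highlighter-rouge">await run_tasks(range(20))</code> instead. Recording the queue size after each <code class="language-plaintext highlighter-rouge">put</code> is only there to confirm that the queue never grows beyond <code class="language-plaintext highlighter-rouge">QUEUE_SIZE</code>.</p>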

<h2 id="what-if-i-want-to-retry-failed-requests">What if I want to retry failed requests?</h2>

<p>This is an idea from p. 258 of Matthew Fowler’s great book <em>Python Concurrency with asyncio</em>:</p>

<ul>
  <li>We create a <code class="language-plaintext highlighter-rouge">retry</code> coroutine, which wraps the coroutine we want to retry (<code class="language-plaintext highlighter-rouge">make_request</code>)</li>
  <li>We specify some arguments in our calls to <code class="language-plaintext highlighter-rouge">retry</code>:
    <ul>
      <li>The maximum number of times we want to retry (<code class="language-plaintext highlighter-rouge">max_retries</code>)</li>
      <li>The number of seconds before we decide that our request has timed out (<code class="language-plaintext highlighter-rouge">timeout</code>)</li>
      <li>The number of seconds between our retry attempts (<code class="language-plaintext highlighter-rouge">retry_interval</code>)</li>
    </ul>
  </li>
  <li>If our <code class="language-plaintext highlighter-rouge">retry</code> coroutine uses up all <code class="language-plaintext highlighter-rouge">max_retries</code> attempts, we raise a <code class="language-plaintext highlighter-rouge">TooManyRetries</code> exception, which we pick up from the finished task in our entry point coroutine, <code class="language-plaintext highlighter-rouge">make_requests</code></li>
  <li>We accumulate our successful requests along with our failed requests for future use</li>
</ul>

<div class="language-python highlighter-rouge"><div class="highlight"><pre class="highlight"><code><span class="k">class</span> <span class="nc">TooManyRetries</span><span class="p">(</span><span class="nb">Exception</span><span class="p">):</span>
    <span class="k">def</span> <span class="nf">__str__</span><span class="p">(</span><span class="n">self</span><span class="p">):</span>
        <span class="n">class_name</span> <span class="o">=</span> <span class="nf">type</span><span class="p">(</span><span class="n">self</span><span class="p">).</span><span class="n">__name__</span>
        <span class="n">exception_args</span> <span class="o">=</span> <span class="n">self</span><span class="p">.</span><span class="n">args</span>
        <span class="k">if</span> <span class="nf">len</span><span class="p">(</span><span class="n">exception_args</span><span class="p">)</span> <span class="o">&gt;</span> <span class="mi">0</span><span class="p">:</span>
            <span class="k">return</span> <span class="sa">f</span><span class="sh">'</span><span class="si">{</span><span class="n">class_name</span><span class="si">}</span><span class="s">: </span><span class="si">{</span><span class="n">exception_args</span><span class="p">[</span><span class="mi">0</span><span class="p">]</span><span class="si">}</span><span class="sh">'</span>
        <span class="k">return</span> <span class="n">class_name</span>


<span class="k">async</span> <span class="k">def</span> <span class="nf">retry</span><span class="p">(</span><span class="n">coro_fn</span><span class="p">,</span> <span class="n">url</span><span class="p">,</span> <span class="n">max_retries</span><span class="o">=</span><span class="mi">2</span><span class="p">,</span> <span class="n">timeout</span><span class="o">=</span><span class="mf">2.0</span><span class="p">,</span> <span class="n">retry_interval</span><span class="o">=</span><span class="mf">1.0</span><span class="p">):</span>
    <span class="c1"># coro_fn is a callable that builds a new coroutine per attempt; a coroutine object can only be awaited once
</span>    <span class="k">for</span> <span class="n">retry_num</span> <span class="ow">in</span> <span class="nf">range</span><span class="p">(</span><span class="n">max_retries</span><span class="p">):</span>
        <span class="k">try</span><span class="p">:</span>
            <span class="k">return</span> <span class="k">await</span> <span class="n">asyncio</span><span class="p">.</span><span class="nf">wait_for</span><span class="p">(</span><span class="nf">coro_fn</span><span class="p">(</span><span class="n">url</span><span class="p">),</span> <span class="n">timeout</span><span class="o">=</span><span class="n">timeout</span><span class="p">)</span>
        <span class="k">except</span> <span class="nb">Exception</span> <span class="k">as</span> <span class="n">e</span><span class="p">:</span>
            <span class="c1"># catch any exception because we want to retry upon any failure
</span>            <span class="nf">print</span><span class="p">(</span><span class="sa">f</span><span class="sh">'</span><span class="s">request to </span><span class="si">{</span><span class="n">url</span><span class="si">}</span><span class="s"> failed. (tried </span><span class="si">{</span><span class="n">retry_num</span> <span class="o">+</span> <span class="mi">1</span><span class="si">}</span><span class="s"> times)</span><span class="sh">'</span><span class="p">)</span>
            <span class="k">await</span> <span class="n">asyncio</span><span class="p">.</span><span class="nf">sleep</span><span class="p">(</span><span class="n">retry_interval</span><span class="p">)</span>
    <span class="k">raise</span> <span class="nc">TooManyRetries</span><span class="p">(</span><span class="n">url</span><span class="p">)</span>

<span class="k">async</span> <span class="k">def</span> <span class="nf">make_request</span><span class="p">(</span><span class="n">session</span><span class="p">,</span> <span class="n">url</span><span class="p">):</span>
    <span class="k">async</span> <span class="k">with</span> <span class="n">session</span><span class="p">.</span><span class="nf">get</span><span class="p">(</span><span class="n">url</span><span class="p">)</span> <span class="k">as</span> <span class="n">response</span><span class="p">:</span>
        <span class="n">result</span> <span class="o">=</span> <span class="k">await</span> <span class="n">response</span><span class="p">.</span><span class="nf">text</span><span class="p">()</span>
        <span class="nf">print</span><span class="p">(</span><span class="sa">f</span><span class="sh">'</span><span class="s">request to </span><span class="si">{</span><span class="n">url</span><span class="si">}</span><span class="s"> completed successfully!</span><span class="sh">'</span><span class="p">)</span>
    <span class="k">return</span> <span class="n">result</span>
    

<span class="k">async</span> <span class="k">def</span> <span class="nf">make_requests</span><span class="p">(</span><span class="n">urls</span><span class="p">,</span> <span class="n">timeout_seconds</span><span class="o">=</span><span class="mf">2.0</span><span class="p">):</span>
    <span class="n">results</span> <span class="o">=</span> <span class="p">[]</span>
    <span class="n">failed</span> <span class="o">=</span> <span class="p">[]</span>
    <span class="k">async</span> <span class="k">with</span> <span class="n">aiohttp</span><span class="p">.</span><span class="nc">ClientSession</span><span class="p">()</span> <span class="k">as</span> <span class="n">session</span><span class="p">:</span>
        <span class="n">make_request_func</span> <span class="o">=</span> <span class="nf">partial</span><span class="p">(</span><span class="n">make_request</span><span class="p">,</span> <span class="n">session</span><span class="p">)</span>
        
        <span class="c1"># we set the name of the task to recover it during exception handling
</span>        <span class="n">pending_tasks</span> <span class="o">=</span> <span class="p">[</span><span class="n">asyncio</span><span class="p">.</span><span class="nf">create_task</span><span class="p">(</span><span class="nf">retry</span><span class="p">(</span><span class="n">make_request_func</span><span class="p">,</span>
                                                   <span class="n">url</span><span class="p">,</span>
                                                   <span class="n">max_retries</span><span class="o">=</span><span class="mi">2</span><span class="p">,</span>
                                                   <span class="n">timeout</span><span class="o">=</span><span class="n">timeout_seconds</span><span class="p">,</span>
                                                   <span class="n">retry_interval</span><span class="o">=</span><span class="mf">1.0</span><span class="p">),</span>
                                             <span class="n">name</span><span class="o">=</span><span class="n">url</span><span class="p">)</span> <span class="k">for</span> <span class="n">url</span> <span class="ow">in</span> <span class="n">urls</span><span class="p">]</span>
        <span class="c1"># process any tasks that completed successfully or completed with an exception
</span>        <span class="k">while</span> <span class="n">pending_tasks</span><span class="p">:</span>
            <span class="n">done</span><span class="p">,</span> <span class="n">pending_tasks</span> <span class="o">=</span> <span class="k">await</span> <span class="n">asyncio</span><span class="p">.</span><span class="nf">wait</span><span class="p">(</span><span class="n">pending_tasks</span><span class="p">,</span> <span class="n">return_when</span><span class="o">=</span><span class="n">asyncio</span><span class="p">.</span><span class="n">FIRST_COMPLETED</span><span class="p">)</span>

            <span class="k">for</span> <span class="n">task</span> <span class="ow">in</span> <span class="n">done</span><span class="p">:</span>
                <span class="k">if</span> <span class="n">task</span><span class="p">.</span><span class="nf">exception</span><span class="p">()</span> <span class="ow">is</span> <span class="ow">not</span> <span class="bp">None</span><span class="p">:</span>
                    <span class="n">task_name</span> <span class="o">=</span> <span class="n">task</span><span class="p">.</span><span class="nf">get_name</span><span class="p">()</span>
                    <span class="nf">print</span><span class="p">(</span><span class="n">task</span><span class="p">.</span><span class="nf">exception</span><span class="p">())</span>
                    <span class="n">failed</span><span class="p">.</span><span class="nf">append</span><span class="p">(</span><span class="nf">str</span><span class="p">(</span><span class="n">task</span><span class="p">.</span><span class="nf">exception</span><span class="p">()))</span>
                <span class="k">else</span><span class="p">:</span>
                    <span class="n">results</span><span class="p">.</span><span class="nf">append</span><span class="p">(</span><span class="n">task</span><span class="p">.</span><span class="nf">result</span><span class="p">())</span>

        <span class="k">return</span> <span class="n">results</span><span class="p">,</span> <span class="n">failed</span>
</code></pre></div></div>

<p>Let’s run it with a short timeout duration to inspect our retrying behaviour:</p>

<div class="language-python highlighter-rouge"><div class="highlight"><pre class="highlight"><code><span class="n">results</span><span class="p">,</span> <span class="n">failed</span> <span class="o">=</span> <span class="k">await</span> <span class="nf">make_requests</span><span class="p">(</span><span class="n">urls</span><span class="p">[:</span><span class="mi">10</span><span class="p">],</span> <span class="n">timeout_seconds</span><span class="o">=</span><span class="mf">0.5</span><span class="p">)</span>
</code></pre></div></div>

<div class="language-plaintext highlighter-rouge"><div class="highlight"><pre class="highlight"><code>request to https://en.wikipedia.org/wiki/!! completed successfully!
request to https://en.wikipedia.org/wiki/!!!Fuck_You!!! completed successfully!
request to https://en.wikipedia.org/wiki/!!!_(!!!_album) completed successfully!
request to https://en.wikipedia.org/wiki/!!!_(American_band) completed successfully!
request to https://en.wikipedia.org/wiki/!!!_(Chk_Chk_Chk) completed successfully!
request to https://en.wikipedia.org/wiki/!!! completed successfully!
request to https://en.wikipedia.org/wiki/! failed. (tried 1 times)
request to https://en.wikipedia.org/wiki/!!!!!!! failed. (tried 1 times)
request to https://en.wikipedia.org/wiki/!!!Fuck_You!!!_And_Then_Some failed. (tried 1 times)
request to https://en.wikipedia.org/wiki/!!!Fuck_You!!!_and_Then_Some failed. (tried 1 times)
request to https://en.wikipedia.org/wiki/! failed. (tried 2 times)
request to https://en.wikipedia.org/wiki/!!!!!!! failed. (tried 2 times)
request to https://en.wikipedia.org/wiki/!!!Fuck_You!!!_And_Then_Some failed. (tried 2 times)
request to https://en.wikipedia.org/wiki/!!!Fuck_You!!!_and_Then_Some failed. (tried 2 times)
TooManyRetries: https://en.wikipedia.org/wiki/!!!!!!!
TooManyRetries: https://en.wikipedia.org/wiki/!!!Fuck_You!!!_and_Then_Some
TooManyRetries: https://en.wikipedia.org/wiki/!!!Fuck_You!!!_And_Then_Some
TooManyRetries: https://en.wikipedia.org/wiki/!
</code></pre></div></div>

<p>Nice! How many successful requests do we have?</p>

<div class="language-python highlighter-rouge"><div class="highlight"><pre class="highlight"><code><span class="nf">len</span><span class="p">(</span><span class="n">results</span><span class="p">)</span>
</code></pre></div></div>

<div class="language-plaintext highlighter-rouge"><div class="highlight"><pre class="highlight"><code>6
</code></pre></div></div>

<p>Let’s see how many characters each successful result contains:</p>

<div class="language-python highlighter-rouge"><div class="highlight"><pre class="highlight"><code><span class="nf">print</span><span class="p">([</span><span class="nf">len</span><span class="p">(</span><span class="n">result</span><span class="p">)</span> <span class="k">for</span> <span class="n">result</span> <span class="ow">in</span> <span class="n">results</span><span class="p">])</span>
</code></pre></div></div>

<div class="language-plaintext highlighter-rouge"><div class="highlight"><pre class="highlight"><code>[41274, 73642, 73078, 115246, 115237, 114936]
</code></pre></div></div>

<p>How many failed requests do we have?</p>

<div class="language-python highlighter-rouge"><div class="highlight"><pre class="highlight"><code><span class="nf">len</span><span class="p">(</span><span class="n">failed</span><span class="p">)</span>
</code></pre></div></div>

<div class="language-plaintext highlighter-rouge"><div class="highlight"><pre class="highlight"><code>4
</code></pre></div></div>

<p>What have they returned?</p>

<div class="language-python highlighter-rouge"><div class="highlight"><pre class="highlight"><code><span class="n">failed</span>
</code></pre></div></div>

<div class="language-plaintext highlighter-rouge"><div class="highlight"><pre class="highlight"><code>['TooManyRetries: https://en.wikipedia.org/wiki/!!!!!!!',
 'TooManyRetries: https://en.wikipedia.org/wiki/!!!Fuck_You!!!_and_Then_Some',
 'TooManyRetries: https://en.wikipedia.org/wiki/!!!Fuck_You!!!_And_Then_Some',
 'TooManyRetries: https://en.wikipedia.org/wiki/!']
</code></pre></div></div>

<p>Good!</p>
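
<p>One detail worth checking in isolation: a coroutine object can only be awaited once, so a retry helper needs to build a new coroutine for every attempt. Below is a small offline sketch of that idea. The names <code class="language-plaintext highlighter-rouge">coro_fn</code>, <code class="language-plaintext highlighter-rouge">flaky_request</code> and <code class="language-plaintext highlighter-rouge">always_slow</code> are hypothetical stand-ins for <code class="language-plaintext highlighter-rouge">make_request</code>, and the simplified <code class="language-plaintext highlighter-rouge">retry</code> signature is an assumption for this example rather than the version from the book.</p>

```python
import asyncio


class TooManyRetries(Exception):
    pass


async def retry(coro_fn, max_retries=2, timeout=2.0, retry_interval=0.0):
    # coro_fn is a zero-argument callable that returns a new coroutine on each call.
    for attempt in range(max_retries):
        try:
            return await asyncio.wait_for(coro_fn(), timeout=timeout)
        except Exception as e:
            print(f'attempt {attempt + 1} failed: {e!r}')
            await asyncio.sleep(retry_interval)
    raise TooManyRetries()


calls = {'flaky': 0}


async def flaky_request():
    # Hypothetical stand-in for a web request: fails once, then succeeds.
    calls['flaky'] += 1
    if calls['flaky'] == 1:
        raise ConnectionError('simulated failure')
    return 'ok'


async def always_slow():
    # Hypothetical request that never finishes within the timeout used below.
    await asyncio.sleep(1.0)


async def main():
    first = await retry(flaky_request, max_retries=3)
    try:
        await retry(always_slow, max_retries=2, timeout=0.01)
        second = 'succeeded'
    except TooManyRetries:
        second = 'gave up'
    return first, second


outcome = asyncio.run(main())
```

<p>The first call recovers on its second attempt, while the second call times out on every attempt and ends with <code class="language-plaintext highlighter-rouge">TooManyRetries</code>. In a notebook, use <code class="language-plaintext highlighter-rouge">await main()</code> in place of <code class="language-plaintext highlighter-rouge">asyncio.run(main())</code>.</p>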

<h2 id="conclusion">Conclusion</h2>

<p>While I’m no expert in asynchronous programming, the journey so far has been fulfilling.</p>

<p>I hope that I’ve inspired you to start your own <code class="language-plaintext highlighter-rouge">asyncio</code> adventure!</p>

<p>Justin</p>]]></content><author><name> Hello, world!&lt;br&gt; My name is Justin.</name></author><category term="[&quot;Software&quot;, &quot;Python&quot;]" /><summary type="html"><![CDATA[Do you use Jupyter? Are you still sending sequential web requests like a noob? Then this article is for you!]]></summary></entry><entry><title type="html">How to plan your life</title><link href="https://embracingtherandom.com/life/time-management/how-to-plan-your-life/" rel="alternate" type="text/html" title="How to plan your life" /><published>2022-09-17T08:45:00+10:00</published><updated>2022-09-17T08:45:00+10:00</updated><id>https://embracingtherandom.com/life/time-management/how-to-plan-your-life</id><content type="html" xml:base="https://embracingtherandom.com/life/time-management/how-to-plan-your-life/"><![CDATA[<blockquote>
  <p>Practical lessons from Oliver Burkeman’s “Four Thousand Weeks”</p>
</blockquote>

<p><em>The first post about the lessons found in “Four Thousand Weeks” can be found <a href="https://embracingtherandom.com/life/time-management/antidote-to-hustle-culture/">here</a>.</em></p>

<p>I finished reading <a href="https://www.oliverburkeman.com/books">Oliver Burkeman’s “Four Thousand Weeks”</a>. What a book it is! I still highly recommend it.</p>

<p>The book ends with practical things you can do to spend your time wisely. These tips can be used not only to make your life more meaningful in the long run but also to help you plan out your day.</p>

<p>Let’s learn how you can spend your limited time on this Earth wisely.</p>

<h1 id="the-five-questions-to-ask-yourself">The five questions to ask yourself</h1>

<p>The book ends with some things you can start doing today to implement the philosophy of the book.</p>

<p>The author encourages you to reflect on the following five questions. Below each question are the important ideas that give the questions some context.</p>

<ol>
  <li><strong>Where in your life or your work are you currently pursuing comfort when what’s called for is a little discomfort?</strong>
    <ul>
      <li>Doing the most meaningful things in life is inherently uncomfortable because we might fail at them for any number of reasons.</li>
      <li>By avoiding this discomfort, we tend to make time allocation choices that “prioritize anxiety-avoidance instead.”</li>
      <li>When making significant life decisions such as leaving a job or relationship (or committing to diving deeper into these things), ask yourself, “Does this choice diminish me, or enlarge me?”</li>
    </ul>
  </li>
  <li><strong>Are you holding yourself to, and judging yourself by, standards of productivity or performance that are impossible to meet?</strong>
    <ul>
      <li>What would you choose to do with your time if you knew that you don’t have the time to accomplish all that you want in life?</li>
    </ul>
  </li>
  <li><strong>In what ways have you yet to accept the fact that you are who you are, not the person you think you ought to be?</strong>
    <ul>
      <li>You enjoy doing the things you enjoy doing and not the things you think you ought to enjoy. We should accept this.</li>
      <li>You’re talented in the things you’re talented in and not the things you wished you were talented in. We should accept this, too.</li>
      <li>There’s a lot in this world that’s broken. That doesn’t mean that your contribution to this world is limited to making a large impact that can potentially change the world.</li>
      <li>Maybe your contribution is more “local”. Perhaps it’s taking care of those around you. Maybe it’s making and releasing the music you’ve written. Maybe it’s being a pastry chef.</li>
    </ul>
  </li>
  <li><strong>In which areas of life are you still holding back until you feel like you know what you’re doing?</strong>
    <ul>
      <li>Many of us tend to treat our lives as dress rehearsals of the “real thing” that might come around in the future.</li>
      <li>Maybe that takes the form of staying in a job you hate for “just one more year” as you gain the experience you think you need to get your next job.</li>
      <li>No matter what you do, you’ll never feel 100% in control of that thing. You might take a new job and not know all the answers to the questions people ask you.</li>
      <li>Everyone around you is winging it, all the time. The people you think are experts in what they do also don’t know all the answers!</li>
      <li>We’re all in the same boat, and that’s a liberating thought.</li>
      <li>So take the leap and do that thing you’ve been thinking of doing!</li>
    </ul>
  </li>
  <li><strong>How would you spend your days differently if you didn’t care so much about seeing your actions reach fruition?</strong>
    <ul>
      <li>Don’t judge what you do by its results.</li>
      <li>There are things that you work on during your life that you may never finish because you’ll die before you see them reach their conclusion.</li>
      <li>We should allocate our time to do the things that we enjoy doing. We should work on the things that we find meaningful, and not the things that we think we ought to find meaningful.</li>
      <li>This way, the results don’t matter.</li>
    </ul>
  </li>
</ol>

<h2 id="how-im-implementing-these-questions-into-my-life">How I’m implementing these questions into my life</h2>

<p>I simply have a calendar reminder set up with these questions in it. I’ve got the calendar invite to repeat every 2 months. Simple!</p>

<p><img src="https://user-images.githubusercontent.com/6435319/190827531-cc9fb7b7-f8e6-426a-a37f-9adfba142ad0.png" alt="2022-09-16-08-38-38" width="40%" class="align-center" /></p>

<p>I have the above dot points in the “Description” section of my calendar:</p>

<p><img src="https://user-images.githubusercontent.com/6435319/190827542-0475d0fe-d248-4b56-a219-917e7f66cfa3.png" alt="2022-09-16-08-44-47" width="70%" class="align-center" /></p>

<p>“Simple!” x 2.</p>

<h1 id="the-ten-tools-to-embrace-your-finitude">The ten tools to embrace your finitude</h1>

<p>In the appendix, you’ll find an additional set of tools that can help us remind ourselves that our time isn’t unlimited and that we need to make conscious decisions about how we spend it to build more meaningful lives.</p>

<ol>
  <li><strong>Adopt a “fixed volume” approach to productivity</strong>
    <ul>
      <li>Create two lists - one is the “open” list and the other is the “closed” list.</li>
      <li>Your “open” list is where all the things on your plate go. This is going to be a huge list!</li>
      <li>Your “closed” list is the few items you’ve moved from your “open” list that you choose to focus on.</li>
      <li>You need to finish a task in your “closed” list before you can move a new task into it from the “open” list.</li>
    </ul>
  </li>
  <li><strong>Serialize, serialize, serialize</strong>
    <ul>
      <li>We try to get rid of that anxious feeling of having too many things to do by getting started on them all and not completing any of them.</li>
      <li>Focus on one big project at a time and see it to completion.</li>
      <li>Apply this same thinking at work if you can.</li>
    </ul>
  </li>
  <li><strong>Decide in advance what to fail at</strong>
    <ul>
      <li>You can’t over-achieve in every area of your life. You simply don’t have the time to be excellent at everything.</li>
      <li>Instead, choose to be less than exceptional in certain areas of your life. For example, you could choose to have a poorly kept garden while you focus your energies on excelling at other parts of your life.</li>
      <li>Proactively choosing to fail makes it easier to accept failures when they occur.</li>
    </ul>
  </li>
  <li><strong>Focus on what you’ve already completed, not just on what’s left to complete</strong>
    <ul>
      <li>If we focus exclusively on what we need to do, without also taking note of the things that we’ve completed, we wake up each day feeling as though we owe some “productivity debt” that we need to pay down during the day.</li>
      <li>Keep a “Done” list that contains the things that you’ve completed during the day.</li>
    </ul>
  </li>
  <li><strong>Consolidate your caring</strong>
    <ul>
      <li>Social media and the news are full of depictions of atrocities occurring around the world.</li>
      <li>While each of them deserves our full attention, we simply don’t have the capacity to focus on all of them.</li>
      <li>Instead, devote yourself to working on a select few of them. This way, you’ll make real progress in addressing these pressing world issues.</li>
    </ul>
  </li>
  <li><strong>Embrace boring and single-purpose technology</strong>
    <ul>
      <li>Smartphones mean that we never have to be bored again, no matter where we are.</li>
      <li>These things are detrimental to our being present in the moment.</li>
      <li>Opt for technologies that don’t allow you to do much else but the main thing they are designed for.</li>
      <li>A Kindle is a good example of such a device. It’s difficult to do anything using it but read books.</li>
    </ul>
  </li>
  <li><strong>Seek out novelty in the mundane</strong>
    <ul>
      <li>As we grow older, time seems to speed up.</li>
      <li>One theory to explain this is that the older we grow, the less new information we process from moment to moment. We get to a stage where we’ve seen it all before. We move less, living in similar neighbourhoods. We commute to similar jobs. We see the same friends. When you’re younger, there are many more things you’re experiencing for the first time so your brain is processing lots of new information all the time.</li>
      <li>Seek out novelty in what you do daily. Meditate to train your brain to appreciate the present. Take a different route to work. Go for an unplanned walk.</li>
    </ul>
  </li>
  <li><strong>Be a “researcher” in relationships</strong>
    <ul>
      <li>You can’t control how people act.</li>
      <li>Instead of trying to achieve a goal from an interaction, be curious about the human being who is in front of you. Wonder what life they have led to become the person that they are today. Try to guess how this person might react to your proposal.</li>
      <li>Curiosity is a better stance to take given the unpredictability of human interaction because curiosity is “satisfied by their behaving in ways you like or dislike—whereas the stance of demanding a certain result is frustrated each time things fail to go your way.”</li>
    </ul>
  </li>
  <li><strong>Cultivate instantaneous generosity</strong>
    <ul>
      <li>Whenever you feel the desire to do something generous like helping a stranger or sending a nice message to your friend, act on it right away.</li>
      <li>This is simply a good way to live your life.</li>
    </ul>
  </li>
  <li><strong>Practice doing nothing</strong>
    <ul>
      <li>If you can’t stand doing nothing, you’re more likely to make a bad choice with your time because you’ll end up doing anything to convince yourself that you’re making progress toward some future goal.</li>
      <li>Practice “Do Nothing” meditation. Set a timer for five minutes and do nothing. If you catch yourself focusing on your breathing, stop that. Keep stopping yourself until the timer goes off. One could argue that the act of stopping ourselves from doing something is us doing something - but that isn’t the point!</li>
    </ul>
  </li>
</ol>

<h2 id="how-ive-implemented-the-ten-tools">How I’ve implemented the ten tools</h2>

<p>In the true spirit of the book, I’ve chosen to focus on using a few of the above tools rather than focusing on all of them.</p>

<h3 id="implementing-the-fixed-volume-approach-to-productivity-and-paying-attention-to-the-things-ive-completed">Implementing the fixed volume approach to productivity and paying attention to the things I’ve completed</h3>

<p>I’ve set up a few lists in Google Keep:</p>

<p><img src="https://user-images.githubusercontent.com/6435319/190827546-77709c05-4645-44c7-a88a-342c76477a73.png" alt="2022-09-17-07-59-18" width="70%" class="align-center" /></p>

<p>As I complete things in my “Closed” list, I get a list of things I’ve completed during the day:</p>

<p><img src="https://user-images.githubusercontent.com/6435319/190827547-583167a8-6f7e-4025-9a61-35c6bd6efd74.png" alt="2022-09-17-08-00-02" width="45%" class="align-center" /></p>

<p>I clear the “Done” items every morning when I choose what I want to move from the “Open” list to the “Closed” list.</p>

<h3 id="implementing-the-serialised-approach-to-projects">Implementing the serialised approach to projects</h3>

<p>Right now, I’m itching to work on another blog post. Completing this blog post before moving on to the next is one way I’m implementing the advice in this section.</p>

<p>I have too many unfinished blog post series. The main reason for this is that I invest a lot of energy into each one and it gets tiring! I’m going to opt for working on blog posts for 15 mins per day on most days. My hope is that by doing this, I’ll keep making progress while having fun writing blog posts about solving tricky problems. I want to avoid my pattern of working on a problem for 3 hours straight and eventually hating the problem I’m working on!</p>

<h3 id="implementing-seeking-out-novelty-in-the-mundane">Implementing seeking out novelty in the mundane</h3>

<p>I constantly fail at this, but I try my best to pay attention to the present by bringing meditation into my daily life, outside of the dedicated (sitting) meditation time. In my decade or so of meditation, it only became apparent to me a few years ago that I should be paying attention to sensations and thoughts while I go about my day.</p>

<p>When I catch myself using my phone too much, I’ve started putting my phone away in a cupboard so as to be more present in what I’m doing.</p>

<h3 id="implementing-being-a-researcher-in-relationships">Implementing being a researcher in relationships</h3>

<p>I’ve found wondering why a person acts in certain ways to be very interesting. Some questions I’ve wondered about are these:</p>

<ul>
  <li>What was the person’s life like before this moment where they are interacting with me?</li>
  <li>What happened in their life for them to have the reactions they have?</li>
  <li>What does this person value? Why do they value these things?</li>
</ul>

<p>Being more curious about people at a deeper level has helped me become more compassionate as these are questions I’ve used to psychoanalyse myself!</p>

<h3 id="implementing-instantaneous-generosity">Implementing instantaneous generosity</h3>

<p>I just try to do this day-to-day. When I feel like I want to help someone, I try to act on it straight away. It feels good to help others!</p>

<p>The other day, I was at the post office. A lady was trying to send something to a family member in Greece. The post office employee told her that she needed to complete a customs declaration form. That form was much easier to complete using a smartphone. The alternative was to fill out a paper form. The lady didn’t know how to scan QR codes to access the online form as she had just bought a new phone so she opted to fill out the paper form. I saw her struggling with it so I asked if she wanted some help. It turns out that her kids didn’t have the patience to teach her how to use her new phone! She was very grateful for my help. She felt happy. I felt happy. Everyone was happy! It felt great to have made this small, positive contribution to this lady’s day.</p>

<h1 id="ending-with-a-beautiful-quote">Ending with a beautiful quote</h1>

<p>This is from the last paragraph of the book:</p>

<blockquote>
  <p>If you can face the truth about time
in this way—if you can step more fully into the condition of
being a limited human—you will reach the greatest heights of
productivity, accomplishment, service, and fulfillment that were
ever in the cards for you to begin with. And the life you will
see incrementally taking shape, in the rearview mirror, will be
one that meets the only definitive measure of what it means to
have used your weeks well: not how many people you helped,
or how much you got done; but that working within the limits
of your moment in history, and your finite time and talents,
you actually got around to doing—and made life more luminous
for the rest of us by doing—whatever magnificent task or weird
little thing it was that you came here for.</p>
</blockquote>

<p>I hope you read this book.</p>

<p>Justin</p>]]></content><author><name> Hello, world!&lt;br&gt; My name is Justin.</name></author><category term="[&quot;Life&quot;, &quot;Time-management&quot;]" /><summary type="html"><![CDATA[Practical lessons from Oliver Burkeman’s “Four Thousand Weeks”]]></summary></entry><entry><title type="html">An antidote to hustle culture: a better way to manage your time</title><link href="https://embracingtherandom.com/life/time-management/antidote-to-hustle-culture/" rel="alternate" type="text/html" title="An antidote to hustle culture: a better way to manage your time" /><published>2022-08-13T08:35:00+10:00</published><updated>2022-08-13T08:35:00+10:00</updated><id>https://embracingtherandom.com/life/time-management/antidote-to-hustle-culture</id><content type="html" xml:base="https://embracingtherandom.com/life/time-management/antidote-to-hustle-culture/"><![CDATA[<blockquote>
  <p>Oliver Burkeman’s “Four Thousand Weeks” is already one of the best books I’ve ever read</p>
</blockquote>

<h1 id="a-rant-about-hustle-culture">A rant about “hustle culture”</h1>

<p>For all of my 20s, I was obsessed with optimising my life. I was a keen listener of the Tim Ferriss Show. I’d read “The 4-Hour Workweek”. I had experimented with different sleep cycles to squeeze an extra few productive hours out of each day. I wanted to craft the most efficient morning routine to set myself up for a productive day.</p>

<p>At the same time, I was keenly aware of the fact that my time on this planet is short. Everyone I cared about around me was growing older by the day. My dad had died and my mum was growing older. My dog, Milo (RIP old boy), was starting to go grey. I became focused on maximising my time with them in order to avoid any future regret about not having spent enough time with them. The thought of them one day dying filled me with dread.</p>

<p>In hindsight, that was all so stupid!</p>

<p>All of these things reduced my happiness and satisfaction with my life. My mind would be focused on whether I was living life to the fullest instead of just enjoying it.</p>

<p>I’m sure you’re all aware of the cultural phenomenon called <strong>“hustle culture”</strong>. My opinion is that hustle culture is toxic. Here’s a hilarious video that’ll give you a taste of the sort of behaviour it glorifies:</p>

<iframe width="560" height="315" src="https://www.youtube.com/embed/_o7qjN3KF8U" title="YouTube video player" frameborder="0" allow="accelerometer; autoplay; clipboard-write; encrypted-media; gyroscope; picture-in-picture" allowfullscreen=""></iframe>
<p><br />
To put it kindly, I think hustle culture is “silly”. Sleeping less to fit more “crap” into your day? That’s ridiculous! Sleep is so important for your health and well-being:</p>

<blockquote class="twitter-tweet tw-align-center"><p lang="en" dir="ltr">Best nootropic: sleep<br />Best stress relief: sleep<br />Best trauma release: sleep<br />Best immune booster: sleep<br />Best hormone augmentation: sleep<br />Best emotional stabilizer: sleep<br />Sleep Tools: Ep. 2 Huberman Lab Podcast, HLP interview w/Matt Walker <a href="https://t.co/TphgkozQyg">https://t.co/TphgkozQyg</a></p>&mdash; Andrew D. Huberman, Ph.D. (@hubermanlab) <a href="https://twitter.com/hubermanlab/status/1438316907598258177?ref_src=twsrc%5Etfw">September 16, 2021</a></blockquote>
<script async="" src="https://platform.twitter.com/widgets.js" charset="utf-8"></script>

<p>Waking up at 4:00 AM just to do more stuff? Go back to sleep if that’s not how your body works or if you don’t have a good reason to wake up at that time.</p>

<p>Listening to your audiobook at 300% speed to absorb ideas “TO THE MAX”? Bleugh! Just enjoy the damn book!</p>

<h1 id="oliver-burkemans-four-thousand-weeks">Oliver Burkeman’s “Four Thousand Weeks”</h1>

<p>This book has, so far, been amazing. It’ll help you live a more fulfilling life. The key ideas have so far been these:</p>

<ul>
  <li>We try to manage our time to get the most out of our days in the hope that once we get through the tasks on our ever-growing to-do lists, we will finally get to what is truly important to us.</li>
  <li>The paradox is that the more efficient we get at clearing our lists and our inboxes, the more stuff appears to fill them.</li>
  <li>We avoid facing the reality that our time on this earth is frighteningly short by fooling ourselves that we have time to work on everything because this means that we don’t have to make difficult trade-offs as to how we choose to use our time. We fool ourselves by thinking that with enough hard work, we can make all of our dreams come true and that we’re capable of doing everything.</li>
  <li>Others can make impossible demands on your time by asking you to do so many things that it’s simply not possible to do them all. You might be the one making these impossible demands on your time. Once you accept that these demands are in fact impossible, you empower yourself to resist them.</li>
  <li>We should embrace our own limits. We simply don’t have the time to do all of the things we want to do. We might not have the talent required to do some things. We also don’t have the time to do all the things others want us to do. So we shouldn’t beat ourselves up over it!</li>
  <li>Making a choice to spend your time on something inevitably means that you’re choosing not to spend your time doing something else. The important thing is that you’re making a conscious choice and not letting others make that choice for you.</li>
</ul>

<p>One of my favourite quotes so far is this one from page 30:</p>

<blockquote>
  <p>Every decision to use a portion of time on anything
represents the sacrifice of all the other ways in which you
could have spent that time, but didn’t—and to willingly make
that sacrifice is to take a stand, without reservation, on what
matters most to you.</p>
</blockquote>

<h1 id="how-am-i-applying-what-ive-learned-so-far-to-my-own-life">How am I applying what I’ve learned so far to my own life?</h1>

<p>This book has allowed me to reflect on a few aspects of my life:</p>
<ul>
  <li>I probably will never finish my Master’s in computing because it takes up too much of my time and energy. I really love learning about computers, networking, algorithms etc. But I don’t need to go to uni to learn these things. Formal studies detract from my ability to do the things I love, like spending time with my family, hiking and learning and writing about random things just because they’ve captivated my imagination.</li>
  <li>As a result, I will probably never get a PhD in computing or AI. This would mean subjecting myself to years of hard work and taking a big pay cut, which will detract from my ability to take care of my aging mum and build a life of adventure together with my wife (who I love very much).</li>
  <li>I will probably never live and work in Silicon Valley. Making this dream come true would mean giving up time with my loved ones in Australia in exchange for working my ass off in the Mecca of hustle culture.</li>
</ul>

<p>Choosing not to do the above means I choose to spend my time on the things that mean a lot to me:</p>
<ul>
  <li>Hiking with my wife</li>
  <li>Spending time with my mum</li>
  <li>Reading books because I want to and not because I need to read a prescribed textbook for uni</li>
  <li>Saying “yes” to more social engagements, and not saying “no” because I need to study for an upcoming exam</li>
  <li>Living in another country for a few months whenever we choose to because I don’t have to be back in the country to take an exam</li>
</ul>

<p>The above choices feel good and I will have to remind myself to reassess the things that matter to me most again a few years down the track.</p>

<h1 id="some-choice-quotes">Some choice quotes</h1>

<p>From page 11:</p>

<blockquote>
  <p>It follows from this that time management, broadly defined,
should be everyone’s chief concern. Arguably, time management
is all life is. Yet the modern discipline known as time
management—like its hipper cousin, productivity—is a
depressingly narrow-minded affair, focused on how to crank
through as many work tasks as possible, or on devising the
perfect morning routine, or on cooking all your dinners for the
week in one big batch on Sundays.</p>
</blockquote>

<p>From page 14:</p>

<blockquote>
  <p>In the modern
world, the American anthropologist Edward T. Hall once pointed
out, time feels like an unstoppable conveyor belt, bringing us
new tasks as fast as we can dispatch the old ones; and
becoming “more productive” just seems to cause the belt to
speed up.</p>
</blockquote>

<p>From page 16:</p>

<blockquote>
  <p>Our days are spent trying to “get through” tasks,
in order to get them “out of the way,” with the result that we
live mentally in the future, waiting for when we’ll finally get
around to what really matters—and worrying, in the meantime,
that we don’t measure up, that we might lack the drive or
stamina to keep pace with the speed at which life now seems
to move.</p>
</blockquote>

<p>From page 17:</p>

<blockquote>
  <p>Productivity is a trap. Becoming more efficient
just makes you more rushed, and trying to clear the decks
simply makes them fill up again faster. Nobody in the history
of humanity has ever achieved “work-life balance,” whatever
that might be, and you certainly won’t get there by copying the
“six things successful people do before 7:00 a.m.”</p>
</blockquote>

<p>From page 24:</p>

<blockquote>
  <p>Once time
is a resource to be used, you start to feel pressure, whether
from external forces or from yourself, to use it well, and to
berate yourself when you feel you’ve wasted it. When you’re
faced with too many demands, it’s easy to assume that the
only answer must be to make better use of time, by becoming
more efficient, driving yourself harder, or working for longer—as
if you were a machine in the Industrial Revolution—instead of
asking whether the demands themselves might be unreasonable.</p>
</blockquote>

<p>From pages 24-25:</p>

<blockquote>
  <p>The fundamental problem is that this attitude toward time sets
up a rigged game in which it’s impossible ever to feel as
though you’re doing well enough. Instead of simply living our
lives as they unfold in time—instead of just being time, you
might say—it becomes difficult not to value each moment
primarily according to its usefulness for some future goal, or
for some future oasis of relaxation you hope to reach once your tasks are finally “out of the way.”</p>
</blockquote>

<p>From page 28:</p>

<blockquote>
  <p>After all, it’s painful to confront how limited your
time is, because it means that tough choices are inevitable and
that you won’t have time for all you once dreamed you might
do. It’s also painful to accept your limited control over the time
you do get: maybe you simply lack the stamina or talent or
other resources to perform well in all the roles you feel you
should. And so, rather than face our limitations, we engage in
avoidance strategies, in an effort to carry on feeling limitless.
We push ourselves harder, chasing fantasies of the perfect
work-life balance; or we implement time management systems
that promise to make time for everything, so that tough choices
won’t be required. Or we procrastinate, which is another means
of maintaining the feeling of omnipotent control over
life—because you needn’t risk the upsetting experience of failing
at an intimidating project, obviously, if you never even start it.</p>
</blockquote>

<p>From page 30:</p>

<blockquote>
  <p>In practical terms, a limit-embracing attitude to time means
organizing your days with the understanding that you definitely
won’t have time for everything you want to do, or that other
people want you to do—and so, at the very least, you can stop
beating yourself up for failing. Since hard choices are
unavoidable, what matters is learning to make them consciously,
deciding what to focus on and what to neglect, rather than
letting them get made by default—or deceiving yourself that,
with enough hard work and the right time management tricks,
you might not have to make them at all.</p>
</blockquote>

<h1 id="take-the-antidote">Take the antidote</h1>

<p>There’s a lot to ponder here. Take the antidote and count yourself out of the “rise and grind” culture. Read the book!</p>

<p>Justin</p>]]></content><author><name> Hello, world!&lt;br&gt; My name is Justin.</name></author><category term="[&quot;Life&quot;, &quot;Time-management&quot;]" /><summary type="html"><![CDATA[Oliver Burkeman’s “Four Thousand Weeks” is already one of the best books I’ve ever read]]></summary></entry><entry><title type="html">Docker and Makefiles</title><link href="https://embracingtherandom.com/software/docker/docker-and-makefiles/" rel="alternate" type="text/html" title="Docker and Makefiles" /><published>2022-08-07T08:29:00+10:00</published><updated>2022-08-07T08:29:00+10:00</updated><id>https://embracingtherandom.com/software/docker/docker-and-makefiles</id><content type="html" xml:base="https://embracingtherandom.com/software/docker/docker-and-makefiles/"><![CDATA[<blockquote>
  <p>A whale of a time!</p>
</blockquote>

<p>I’m learning PyTorch!</p>

<p>I’m writing a Dockerfile using a <a href="https://hub.docker.com/layers/pytorch/pytorch/pytorch/1.12.0-cuda11.3-cudnn8-runtime/images/sha256-1ef1f61b13738de8086ae7e1ce57c89f154e075dae0b165f7590b9405efeb6fe?context=explore">PyTorch base image</a> and installing some Python packages that’ll be useful when developing my models.</p>

<p>I use Makefiles a lot to make my Docker-based workflows easier. I stumbled across a nice Makefile pattern in the <a href="https://github.com/pytorch/pytorch">PyTorch repo</a> and wanted to share it with y’all.</p>

<h1 id="the-original-makefile">The original Makefile</h1>

<p>With my simple Dockerfile in the same directory as my Makefile, I started out writing my Makefile like this:</p>

<div class="language-plaintext highlighter-rouge"><div class="highlight"><pre class="highlight"><code>.PHONY: build
build:
	docker build --progress=plain -t pytorch .

.PHONY: check-gpu
check-gpu:
	docker run --rm --gpus all pytorch nvidia-smi

.PHONY: bash
bash:
	docker run --rm --gpus all pytorch bash
</code></pre></div></div>

<h1 id="the-pytorch-makefile-pattern">The PyTorch Makefile pattern</h1>

<p>In my journey into the PyTorch repo, I found <a href="https://github.com/pytorch/pytorch/blob/master/docker.Makefile">this Makefile</a>, which is used in the <a href="https://github.com/pytorch/pytorch/blob/master/.github/scripts/build_publish_nightly_docker.sh"><code class="language-plaintext highlighter-rouge">build_publish_nightly_docker.sh</code></a> script.</p>

<p>It extracts the <code class="language-plaintext highlighter-rouge">docker build</code> and <code class="language-plaintext highlighter-rouge">docker push</code> commands into the <code class="language-plaintext highlighter-rouge">DOCKER_BUILD</code> and <code class="language-plaintext highlighter-rouge">DOCKER_PUSH</code> Makefile variables:</p>

<div class="language-plaintext highlighter-rouge"><div class="highlight"><pre class="highlight"><code>DOCKER_BUILD  = DOCKER_BUILDKIT=1 \
		docker build \
			--progress=$(BUILD_PROGRESS) \
			$(EXTRA_DOCKER_BUILD_FLAGS) \
			--target $(BUILD_TYPE) \
			-t $(DOCKER_FULL_NAME):$(DOCKER_TAG) \
			$(BUILD_ARGS) .
DOCKER_PUSH = docker push $(DOCKER_FULL_NAME):$(DOCKER_TAG)
</code></pre></div></div>

<p>It also extracts Docker build args into their own variable:</p>

<div class="language-plaintext highlighter-rouge"><div class="highlight"><pre class="highlight"><code>BUILD_ARGS  = --build-arg BASE_IMAGE=$(BASE_IMAGE) \
		--build-arg PYTHON_VERSION=$(PYTHON_VERSION) \
		--build-arg CUDA_VERSION=$(CUDA_VERSION) \
		--build-arg CUDA_CHANNEL=$(CUDA_CHANNEL) \
		--build-arg PYTORCH_VERSION=$(PYTORCH_VERSION) \
		--build-arg INSTALL_CHANNEL=$(INSTALL_CHANNEL)
</code></pre></div></div>

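<p>To build and push the Docker images, other Makefile targets make use of the above variables. In these targets, some variables (such as <code class="language-plaintext highlighter-rouge">BASE_IMAGE</code> and <code class="language-plaintext highlighter-rouge">DOCKER_TAG</code>) are overridden using target-specific variable assignments before executing the commands contained within the <code class="language-plaintext highlighter-rouge">DOCKER_BUILD</code> and <code class="language-plaintext highlighter-rouge">DOCKER_PUSH</code> variables:</p>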

<div class="language-plaintext highlighter-rouge"><div class="highlight"><pre class="highlight"><code>runtime-image: BASE_IMAGE := $(BASE_RUNTIME)
runtime-image: DOCKER_TAG := $(PYTORCH_VERSION)-runtime
runtime-image:
	$(DOCKER_BUILD)
	docker tag $(DOCKER_FULL_NAME):$(DOCKER_TAG) $(DOCKER_FULL_NAME):latest

runtime-push: BASE_IMAGE := $(BASE_RUNTIME)
runtime-push: DOCKER_TAG := $(PYTORCH_VERSION)-runtime
runtime-push:
	$(DOCKER_PUSH)
</code></pre></div></div>

<h1 id="my-new-makefile">My new Makefile</h1>

<p>My Makefile is far simpler than the PyTorch one. However, thanks to their Makefile pattern, my simple Makefile is a little bit cleaner and a little bit more maintainable:</p>

<div class="language-plaintext highlighter-rouge"><div class="highlight"><pre class="highlight"><code>IMAGE_TAG = pytorch
INTERACTIVE = 
DOCKER_RUN = docker run \
		--rm \
		--gpus all \
		$(INTERACTIVE) \
		$(IMAGE_TAG)

.PHONY: build
build:
	docker build --progress=plain -t $(IMAGE_TAG) .

.PHONY: check-gpu
check-gpu:
	$(DOCKER_RUN) nvidia-smi

.PHONY: bash
bash: INTERACTIVE := -it
bash:
	$(DOCKER_RUN) bash	
</code></pre></div></div>

<p>Thank you, PyTorch maintainers!</p>

<p>Justin</p>]]></content><author><name> Hello, world!&lt;br&gt; My name is Justin.</name></author><category term="[&quot;Software&quot;, &quot;Docker&quot;]" /><summary type="html"><![CDATA[A whale of a time!]]></summary></entry><entry><title type="html">Render LaTeX in Google Docs</title><link href="https://embracingtherandom.com/software/latex-and-google-docs/" rel="alternate" type="text/html" title="Render LaTeX in Google Docs" /><published>2021-09-23T08:11:00+10:00</published><updated>2021-09-23T08:11:00+10:00</updated><id>https://embracingtherandom.com/software/latex-and-google-docs</id><content type="html" xml:base="https://embracingtherandom.com/software/latex-and-google-docs/"><![CDATA[<blockquote>
  <p>This is for all the students out there!</p>
</blockquote>

<p>It turns out that a Google image search for the word “latex” returns many not-safe-for-work images. That’s not the sort of “latex” I’m talking about!</p>

<p>What I’m talking about here is the famous typesetting system, \(\LaTeX\). If you study anything technical at university (or college for readers in the US who seem to make up the majority of my readers!), you would have come across it.</p>

<p>I use Google Docs a lot. I use it at university, at work, and in my personal life. Being a nerd, I frequently need to write equations in Google Docs. Is there a way to write \(\LaTeX\) equations in Google Docs?</p>

<p>Yes, there most certainly is!</p>

<h1 id="how-to-write-and-render-latex-in-google-docs">How to write and render LaTeX in Google Docs</h1>

<h2 id="install-auto-latex-equations">Install “Auto-LaTeX Equations”</h2>

<ol>
  <li>Open up Google Docs and create a new document.</li>
  <li>Go to “Add-ons” -&gt; “Get add-ons”</li>
</ol>

<p><img src="/assets/post_images/2021-09-23-latex-and-google-docs/104984b48fa4c8f956db3242bcaca89295a4c342443272af2e8e9d2bd668a775.jpg" alt="picture 1" width="50%" class="align-center" /></p>

<ol start="3">
  <li>Search for the word “latex”. The first result should be “Auto-LaTeX Equations”. Install it!</li>
</ol>

<p><img src="/assets/post_images/2021-09-23-latex-and-google-docs/853a760edcb356d7c1c198fa19f1d46ac74ad75a6ac56e4129ed707692f3cf2a.jpg" alt="picture 2" width="70%" class="align-center" /></p>

<ol start="4">
  <li>Once installed, go back to “Add-ons” -&gt; “Auto-LaTeX Equations” -&gt; “Start”</li>
</ol>

<p><img src="/assets/post_images/2021-09-23-latex-and-google-docs/30d65c5447815a0e41b55842aa6dc8a2ef55bf9037b763b0b899e66f9b55faed.jpg" alt="picture 3" width="50%" class="align-center" /></p>

<p>The Auto-LaTeX Equations toolbar should appear on the right-hand side of the screen.</p>

<p><img src="/assets/post_images/2021-09-23-latex-and-google-docs/8e01d8207ba3686751b8bb70cd3f963862a319fb509d414a6b6d752639b5467c.jpg" alt="picture 5" width="40%" class="align-center" /></p>

<p>Noice.</p>

<h2 id="writing-single-line-equations">Writing single-line equations</h2>

<p>Just wrap the LaTeX you want to render in double dollar signs.</p>

<p><img src="/assets/post_images/2021-09-23-latex-and-google-docs/d265ced8e70b831749df4796a4612798871a15670a8d99dc524351f557a51845.jpg" alt="picture 4" width="50%" class="align-center" /></p>

<p>Then click on “Render Equations”.</p>

<p><img src="/assets/post_images/2021-09-23-latex-and-google-docs/543ba45c8a34a6d0b29bf1adf0f4a4c0b8822cdae391776d5c3694d3b80ffa83.jpg" alt="picture 6" width="30%" class="align-center" /></p>

<p>After a little while, you should see something like this!</p>

<p><img src="/assets/post_images/2021-09-23-latex-and-google-docs/c06ecdabf7444faa8c146e24bc95215d210dea782c56fe29b2213267038413fd.jpg" alt="picture 7" width="80%" class="align-center" /></p>

<p>Nooooooice.</p>

<h2 id="writing-multi-line-equations">Writing multi-line equations</h2>

<ol>
  <li>Start with two dollar signs, just like with single-line equations. Press <code class="language-plaintext highlighter-rouge">shift + enter</code>.</li>
  <li>Type the LaTeX for your first equation. Press <code class="language-plaintext highlighter-rouge">shift + enter</code>.</li>
  <li>Type the LaTeX for your second equation. Press <code class="language-plaintext highlighter-rouge">shift + enter</code>.</li>
  <li>Type the LaTeX for your \(n\)th equation. Press <code class="language-plaintext highlighter-rouge">shift + enter</code>.</li>
  <li>Type two dollar signs.</li>
  <li>Press “Render Equations”</li>
</ol>

<p>Something like this:</p>

<p><img src="/assets/post_images/2021-09-23-latex-and-google-docs/8c120a9960f9a5344bfc0045f43f820821bccd524bb99dc5641dd212b2c019e2.jpg" alt="picture 2" width="15%" class="align-center" /></p>

<p>will turn into this:</p>

<p><img src="/assets/post_images/2021-09-23-latex-and-google-docs/6e3cc3af58962876410fb49c37696017b8277b1f8f7cdaa0028e10a67b3efb86.jpg" alt="picture 3" width="35%" class="align-center" /></p>

<p>Noooooooooooooooice.</p>

<h1 id="how-to-write-some-mathematical-things-using-latex">How to write some mathematical things using LaTeX</h1>

<p>Here’s a brain dump of things I commonly use.</p>

<h2 id="fractions">Fractions</h2>

<p><code class="language-plaintext highlighter-rouge">\frac{1}{2}</code> gives you \(\frac{1}{2}\)</p>

<h2 id="less-than-greater-than-less-than-or-equal-to-greater-than-or-equal-to">Less than, greater than, less than or equal to, greater than or equal to</h2>

<p><code class="language-plaintext highlighter-rouge">&lt;</code> gives you \(&lt;\)<br />
<code class="language-plaintext highlighter-rouge">&gt;</code> gives you \(&gt;\)<br />
<code class="language-plaintext highlighter-rouge">\leq</code> gives you \(\leq\)<br />
<code class="language-plaintext highlighter-rouge">\geq</code> gives you \(\geq\)</p>

<h2 id="exponents-and-subscripts">Exponents and subscripts</h2>

<p><code class="language-plaintext highlighter-rouge">x^i</code> gives you \(x^i\)<br />
<code class="language-plaintext highlighter-rouge">x^{2n + 1}</code> gives you \(x^{2n + 1}\)<br />
<code class="language-plaintext highlighter-rouge">x_{i}</code> gives you \(x_{i}\)<br />
<code class="language-plaintext highlighter-rouge">x_{2n + 1}</code> gives you \(x_{2n + 1}\)</p>

<h2 id="approximately">Approximately</h2>

<p><code class="language-plaintext highlighter-rouge">\approx</code> gives you \(\approx\)</p>

<h2 id="equivalence">Equivalence</h2>

<p><code class="language-plaintext highlighter-rouge">\equiv</code> gives you \(\equiv\)</p>

<h2 id="sums-and-products">Sums and products</h2>

<p><code class="language-plaintext highlighter-rouge">\sum\limits_{i=1}^{n} x_i</code> gives you \(\sum\limits_{i=1}^{n} x_i\)<br />
<code class="language-plaintext highlighter-rouge">\prod\limits_{i=1}^{n} x_i</code> gives you \(\prod\limits_{i=1}^{n} x_i\)</p>

<h2 id="partial-derivatives">Partial derivatives</h2>

<p><code class="language-plaintext highlighter-rouge">\frac{\partial}{\partial x} x^2 + 3y</code> gives you \(\frac{\partial}{\partial x} x^2 + 3y\)</p>

<h2 id="gradient">Gradient</h2>

<p><code class="language-plaintext highlighter-rouge">\nabla f</code> gives you \(\nabla f\)</p>

<h2 id="that-dot-from-one-of-the-ways-to-depict-a-dot-product">That dot from one of the ways to depict a dot product</h2>

<p><code class="language-plaintext highlighter-rouge">x \cdot y</code> gives you \( x \cdot y \)</p>

<h2 id="big-parentheses-and-brackets">Big parentheses and brackets</h2>

<p><code class="language-plaintext highlighter-rouge">7\left(\frac{x + y}{2}\right)</code> gives you \(7\left(\frac{x + y}{2}\right)\)</p>

<p><code class="language-plaintext highlighter-rouge">z\left[x^2 + 7y\right]</code> gives you \(z\left[x^2 + 7y\right]\)</p>

<h2 id="adding-text-in-equations">Adding text in equations</h2>

<p><code class="language-plaintext highlighter-rouge">n\text{th}</code> gives you \(n\text{th}\)</p>

<h2 id="arrows-like-implies">Arrows like “implies”</h2>

<p><code class="language-plaintext highlighter-rouge">\to</code> and <code class="language-plaintext highlighter-rouge">\rightarrow</code> give you \(\to\)</p>

<p><code class="language-plaintext highlighter-rouge">\leftarrow</code> gives you \(\leftarrow\)</p>

<p><code class="language-plaintext highlighter-rouge">\implies</code> gives you \(\implies\)</p>

<h2 id="greek-alphabet">Greek alphabet</h2>

<p>They follow a pattern where the lower case variant of the Greek letter begins with a lower case letter. The upper case variant of the same Greek letter begins with an upper case letter. Here are some examples:</p>

<p><code class="language-plaintext highlighter-rouge">\pi</code> and <code class="language-plaintext highlighter-rouge">\Pi</code> give you \(\pi\) and \(\Pi\)<br />
<code class="language-plaintext highlighter-rouge">\phi</code> and <code class="language-plaintext highlighter-rouge">\Phi</code> give you \(\phi\) and \(\Phi\)<br />
<code class="language-plaintext highlighter-rouge">\theta</code> and <code class="language-plaintext highlighter-rouge">\Theta</code> give you \(\theta\) and \(\Theta\)</p>

<h2 id="ellipses">Ellipses</h2>

<p><code class="language-plaintext highlighter-rouge">\dots</code> gives you \(\dots\)</p>

<h2 id="proper-subsets-and-subsets">Proper subsets and subsets</h2>

<p><code class="language-plaintext highlighter-rouge">\subset</code> gives you \(\subset\)<br />
<code class="language-plaintext highlighter-rouge">\subseteq</code> gives you \(\subseteq\)</p>

<h2 id="union-intersection">Union, intersection</h2>

<p><code class="language-plaintext highlighter-rouge">\cup</code> gives you \(\cup\)<br />
<code class="language-plaintext highlighter-rouge">\cap</code> gives you \(\cap\)</p>

<h2 id="aligning-equations">Aligning equations</h2>

<p>You can align multi-line equations around their equal signs by wrapping them in the <code class="language-plaintext highlighter-rouge">align</code> environment. Here’s an example:</p>

<div class="language-plaintext highlighter-rouge"><div class="highlight"><pre class="highlight"><code>\begin{align*}
x &amp;= 20y + z \\
z &amp;= \frac{x}{y}
\end{align*}
</code></pre></div></div>

\[\begin{align*}
x &amp;= 20y + z \\
z &amp;= \frac{x}{y}
\end{align*}\]
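
<p>As a slightly bigger sketch that combines a few of the commands above (fractions, sums, subscripts, big parentheses and the <code class="language-plaintext highlighter-rouge">align</code> environment), here’s the sample mean and sample variance. Note that <code class="language-plaintext highlighter-rouge">\bar{x}</code> isn’t covered above: it’s a standard LaTeX command that draws a bar over \(x\). I haven’t run this exact snippet through the add-on, so treat it as an illustration rather than a guarantee:</p>

<div class="language-plaintext highlighter-rouge"><div class="highlight"><pre class="highlight"><code>\begin{align*}
\bar{x} &amp;= \frac{1}{n} \sum\limits_{i=1}^{n} x_i \\
s^2 &amp;= \frac{1}{n - 1} \sum\limits_{i=1}^{n} \left(x_i - \bar{x}\right)^2
\end{align*}
</code></pre></div></div>

\[\begin{align*}
\bar{x} &amp;= \frac{1}{n} \sum\limits_{i=1}^{n} x_i \\
s^2 &amp;= \frac{1}{n - 1} \sum\limits_{i=1}^{n} \left(x_i - \bar{x}\right)^2
\end{align*}\]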

<h1 id="conclusion">Conclusion</h1>

<p>Nooooooooooooooooooooooooooooooooooooooooooooooooooooooooice.</p>

<p>No conclusion, really.</p>

<p>I hope you keep on learning!</p>

<p>Justin</p>]]></content><author><name> Hello, world!&lt;br&gt; My name is Justin.</name></author><category term="[&quot;Software&quot;]" /><summary type="html"><![CDATA[This is for all the students out there!]]></summary></entry><entry><title type="html">The best bolognese recipe</title><link href="https://embracingtherandom.com/food/bolognese/" rel="alternate" type="text/html" title="The best bolognese recipe" /><published>2021-09-12T08:00:00+10:00</published><updated>2021-09-12T08:00:00+10:00</updated><id>https://embracingtherandom.com/food/bolognese</id><content type="html" xml:base="https://embracingtherandom.com/food/bolognese/"><![CDATA[<blockquote>
  <p>Well, that was unexpected…</p>
</blockquote>

<p>Hello!</p>

<p>It’s been ages since I’ve felt like writing anything here. Yesterday I felt the urge again. So here I am! I have a few unfinished post series I want to eventually complete. It can get tiring always writing about technical things so I’m going to mix it up from time to time.</p>

<p>My wife and I have been enjoying <a href="https://italianfoodies.wordpress.com/2010/10/24/bolognese-like-mamma-used-to-make/">this bolognese recipe</a> for years. If you like an intense sauce, this one’s for you. I’m going to write a variation of it in a no-nonsense way. Here we go!</p>

<h1 id="buy">Buy</h1>

<ul>
  <li>1 carrot</li>
  <li>1 small onion</li>
  <li>1 garlic clove</li>
  <li>500 grams of your preferred minced meat. I normally go with straight-up beef.</li>
  <li>100 grams of pancetta</li>
  <li>Unsalted butter</li>
  <li>Olive oil</li>
  <li>Salt</li>
  <li>Pepper</li>
  <li>Tomato paste</li>
  <li>400 gram tin of chopped tomatoes</li>
  <li>1 bay leaf</li>
  <li>At least 100 ml of beef stock</li>
  <li>Red wine</li>
  <li>Parmigiano-Reggiano</li>
  <li>375 grams of fresh fettuccine or pappardelle.</li>
</ul>

<p>Most importantly - no celery. I hate celery!</p>

<h1 id="prepare">Prepare</h1>

<ul>
  <li>Dice onions</li>
  <li>Dice carrots</li>
  <li>Mince garlic clove</li>
  <li>Dice pancetta into small cubes</li>
  <li>Grate some Parmigiano-Reggiano. You’ll be sprinkling this on top of your portions.</li>
</ul>

<h1 id="cook">Cook</h1>

<ul>
  <li>Get a medium sized pot. Put it on low-medium heat.</li>
  <li>Add 3 tablespoons of olive oil and 50 grams of butter to pot.</li>
  <li>Once butter is melted, add your pancetta cubes. Cook them until golden.</li>
  <li>Add onions, carrots, garlic clove and bay leaf to the pot. Cook until onions turn translucent.</li>
  <li>Add minced meat. Break it up and season with pepper. Cook until meat is coloured.</li>
  <li>Turn up heat to high. Give it a few minutes to heat up.</li>
  <li>Add red wine. Let it cook for a few minutes until smell of alcohol disappears.</li>
  <li>Reduce heat to low-medium.</li>
  <li>Add 2 tablespoons of tomato paste, the whole tin of chopped tomatoes and 100 ml of beef stock to the pot.</li>
  <li>If you have the time, let the sauce cook for 1 hour. Add a little bit of water if the sauce is drying out. If you’re in a rush, cook until sauce is thickened to your liking.</li>
  <li>Remove sauce from heat and let it rest for 5 mins.</li>
  <li>About 15 mins before your sauce is done, fill the largest pot you can find with water and bring it to a boil. Salt the water heavily once boiling. Cook your pasta to your liking.</li>
</ul>

<h1 id="serve">Serve</h1>

<ul>
  <li>Strain pasta. Return it to the large pot it was cooked in.</li>
  <li>Pour sauce into the large pot with pasta and mix.</li>
  <li>Serve with heaps of grated Parmigiano-Reggiano.</li>
</ul>

<p>Done!</p>

<p>Justin</p>]]></content><author><name> Hello, world!&lt;br&gt; My name is Justin.</name></author><category term="[&quot;Food&quot;]" /><summary type="html"><![CDATA[Well, that was unexpected…]]></summary></entry><entry><title type="html">Learning to rank is good for your ML career - Part 2: let’s implement ListNet!</title><link href="https://embracingtherandom.com/machine-learning/tensorflow/ranking/deep-learning/learning-to-rank-part-2/" rel="alternate" type="text/html" title="Learning to rank is good for your ML career - Part 2: let’s implement ListNet!" /><published>2020-06-07T07:00:00+10:00</published><updated>2020-06-07T07:00:00+10:00</updated><id>https://embracingtherandom.com/machine-learning/tensorflow/ranking/deep-learning/learning-to-rank-part-2</id><content type="html" xml:base="https://embracingtherandom.com/machine-learning/tensorflow/ranking/deep-learning/learning-to-rank-part-2/"><![CDATA[<blockquote>
  <p>The second post in an epic to learn to rank lists of things!</p>
</blockquote>

<h2 id="introduction">Introduction</h2>

<p>Now that we know something about word embeddings, let’s use them as inputs into a model that ranks things!</p>

<p>We’ll be working through my implementation of a model called ListNet, which was proposed in this paper:</p>

<blockquote>
  <p><a href="https://www.microsoft.com/en-us/research/wp-content/uploads/2016/02/tr-2007-40.pdf"><em>Cao, Zhe et al. “Learning to rank: from pairwise approach to listwise approach.” ICML ‘07 (2007)</em></a></p>
</blockquote>

<p>There’ll be a bunch of maths in this post. But don’t worry! We’ll be stepping through it together. I’m here for you!</p>

<h2 id="the-setup">The setup</h2>

<p>Here be our packages for this post.</p>

<div class="language-python highlighter-rouge"><div class="highlight"><pre class="highlight"><code><span class="kn">import</span> <span class="n">random</span>
<span class="n">random</span><span class="p">.</span><span class="nf">seed</span><span class="p">(</span><span class="mi">1</span><span class="p">)</span>

<span class="kn">import</span> <span class="n">numpy</span> <span class="k">as</span> <span class="n">np</span>
<span class="n">np</span><span class="p">.</span><span class="n">random</span><span class="p">.</span><span class="nf">seed</span><span class="p">(</span><span class="mi">1</span><span class="p">)</span>

<span class="kn">import</span> <span class="n">itertools</span>
<span class="kn">import</span> <span class="n">matplotlib.pyplot</span> <span class="k">as</span> <span class="n">plt</span>
<span class="n">plt</span><span class="p">.</span><span class="n">style</span><span class="p">.</span><span class="nf">use</span><span class="p">(</span><span class="sh">'</span><span class="s">ggplot</span><span class="sh">'</span><span class="p">)</span>

<span class="kn">import</span> <span class="n">tensorflow</span> <span class="k">as</span> <span class="n">tf</span>
</code></pre></div></div>

<p><strong>Note:</strong> it’s notoriously difficult to make TensorFlow and Keras reproducible.  Randomness plays an important part in training neural networks, after all! You might not get the exact numbers shown once we start using TensorFlow later on in the post. But the outcome that I arrive at should be similar to yours! <a href="https://machinelearningmastery.com/reproducible-results-neural-networks-keras/">See this post</a> by fellow Aussie Jason Brownlee for more info on this topic.</p>

<h2 id="lets-start-at-the-end-and-break-it-down">Let’s start at the end and break it down</h2>

<p>Let’s start with a high-level view of what we want to accomplish with ListNet. We’ll use <strong>icons of items of clothing in place of our documents</strong> because they’re more visually pleasing than article headlines!</p>

<p>We’re going to give ListNet a query, and a bunch of documents to rank. Then, as if through some sorcery, we get a ranked list of documents:</p>

<p><img src="/assets/post_images/2020-05-26-learning-to-rank-part-2/letor-part-2-overview.png" alt="" width="700px" class="align-center" /></p>

<p>What magic is involved in producing this ranked list? Prepare to be disappointed - ‘cause it ain’t too complex!</p>

<p>ListNet outputs a bunch of real numbers. Each real number is a <strong>score</strong> assigned to the document we want to rank. We simply sort the documents in <strong>descending order of score</strong>, and this tells us how the original list of documents should be ranked!</p>

<p><img src="/assets/post_images/2020-05-26-learning-to-rank-part-2/letor-part-2-overview-with-scores.png" alt="" width="700px" class="align-center" /></p>
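
<p>Here’s a minimal sketch of that last step. The scores below are made up purely for illustration (they don’t come from a trained ListNet), and the variable names are my own:</p>

<div class="language-python highlighter-rouge"><div class="highlight"><pre class="highlight"><code># Hypothetical scores for three documents (illustrative values only).
scores = {"dress": 2.1, "shirt": 0.3, "pants": -1.2}

# Sort the documents in descending order of score to get the ranking.
ranking = sorted(scores, key=scores.get, reverse=True)

print(ranking)
</code></pre></div></div>

<p>The document with the highest score comes first, so this prints <code class="language-plaintext highlighter-rouge">['dress', 'shirt', 'pants']</code>. All of the hard work is in learning a model that produces good scores!</p>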

<p>So how does the paper itself describe ListNet? We find this on page four:</p>

<blockquote>
  <p>We employ a new learning method for optimizing the listwise loss function based on top one probability, with Neural Network as model and Gradient Descent as optimization algorithm. We refer to the method as ListNet.</p>
</blockquote>

<p>Let’s break this down and attack its smaller pieces relentlessly in our usual way!</p>

<p>We’ll attack in this order:</p>

<blockquote>
  <p>What do they mean by <strong>listwise</strong>?</p>

  <p>What is this <strong>top one probability</strong> they speak of?</p>

  <p>What is the <strong>listwise loss function</strong>?</p>

  <p>What is the <strong>neural network architecture</strong>?</p>
</blockquote>

<p>If you’ve been reading this, I’ll assume that you know what gradient descent is.</p>

<p>Let’s do this!</p>

<h2 id="whats-a-listwise-approach-to-learning-to-rank">What’s a ‘listwise approach’ to learning to rank?</h2>

<p>Let’s start with our first question!</p>

<p>There are several approaches to learning to rank. In <a href="http://times.cs.uiuc.edu/course/598f14/l2r.pdf"><em>Li, Hang. (2011). A Short Introduction to Learning to Rank.</em></a>, the author describes three such approaches: <strong>pointwise, pairwise and listwise approaches</strong>.</p>

<p>On page seven, the author describes listwise approaches:</p>

<blockquote>
  <p>The listwise approach addresses the ranking problem in a more straightforward way. Specifically, it takes ranking lists as instances in both learning and prediction. The group structure of ranking is maintained and ranking evaluation measures can be more directly incorporated into the loss functions in learning.</p>
</blockquote>

<p>Alright! That’s not too bad. We can make some observations at this point.</p>

<p><strong>Firstly, pointwise and pairwise approaches ignore the group structure of rankings.</strong> Lists can be thought of as groups of objects placed in specific orders. It makes sense that, if we take a listwise approach, the structure of objects within our list is maintained!</p>

<p><strong>Learning to rank often involves optimising a surrogate loss function.</strong> This is because the loss function that we want to optimise for our ranking task may be difficult to minimise because it isn’t continuous and uses sorting! ListNet allows us to construct our ranking task in such a way that decreasing its loss values more directly impacts our “true” objective (for example, increasing Normalised Discounted Cumulative Gain or Mean Average Precision).</p>
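
<p>To see why a sorting-based metric is awkward to optimise directly, here’s a small sketch using Discounted Cumulative Gain (the unnormalised cousin of NDCG). I’m assuming the simple “relevance divided by log of position” form of DCG, and the relevance labels and scores are invented for illustration:</p>

<div class="language-python highlighter-rouge"><div class="highlight"><pre class="highlight"><code>import math


def dcg(relevances_in_ranked_order):
    # Each relevance label is discounted by log2(position + 1), positions start at 1.
    return sum(
        rel / math.log2(pos + 1)
        for pos, rel in enumerate(relevances_in_ranked_order, start=1)
    )


def dcg_from_scores(scores, relevances):
    # Rank documents by descending score, then compute DCG of that ranking.
    order = sorted(range(len(scores)), key=lambda i: scores[i], reverse=True)
    return dcg([relevances[i] for i in order])


relevances = [3, 1, 2]       # hypothetical relevance labels
scores = [0.2, 0.5, 0.9]     # hypothetical model scores
nudged = [0.21, 0.5, 0.9]    # tiny change: the ordering stays the same
swapped = [0.6, 0.5, 0.9]    # bigger change: the first two documents swap places

print(dcg_from_scores(scores, relevances))
print(dcg_from_scores(nudged, relevances))
print(dcg_from_scores(swapped, relevances))
</code></pre></div></div>

<p>The first two values are identical because the nudge doesn’t change the ordering, so the metric gives the model no signal about whether it’s moving in the right direction. The third value only changes because the ordering itself changed. The metric is flat almost everywhere and jumps at the points where documents swap places, which is exactly what gradient descent doesn’t like.</p>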

<p>First question answered. Tick!</p>

<h2 id="where-do-probabilities-fit-into-listnet">Where do probabilities fit into ListNet?</h2>

<p>The authors use a probability-based approach to map their lists of scores to probability distributions. Once this is done, they calculate their loss between the predicted probability distribution and a target probability distribution. The authors describe their rationale for defining the problem in this way on page three:</p>

<blockquote>
  <p>We assume that there is uncertainty in the prediction of ranking lists (permutations) using the ranking function. In other words, any permutation is assumed to be possible, but different permutations may have different likelihood calculated based on the ranking function. We define the permutation probability, so that it has desirable properties for representing the likelihood of a permutation (ranking list), given the ranking function.</p>
</blockquote>

<p>Very nice! The two probability models described are the permutation and top one probability models. We’ll now go through them in turn.</p>

<h3 id="warning-detail-ahead">Warning: detail ahead!</h3>

<p>If you’re pragmatic, then I’ll let you in on a secret: the authors end up using the top one probability model so you can skip the section on ‘permutation probability’.</p>

<p>However, if you have a burning desire to understand things from their deepest depths, read on! Let’s flex our mathematical muscles!</p>

<h3 id="permutation-probability">Permutation probability</h3>

<p>Let’s use the same ‘dress’, ‘shirt’ and ‘pants’ example from above. We have \(n = 3\) objects to rank:</p>

<div class="language-python highlighter-rouge"><div class="highlight"><pre class="highlight"><code><span class="n">objects_to_rank</span> <span class="o">=</span> <span class="p">{</span><span class="sh">'</span><span class="s">dress</span><span class="sh">'</span><span class="p">,</span> <span class="sh">'</span><span class="s">shirt</span><span class="sh">'</span><span class="p">,</span> <span class="sh">'</span><span class="s">pants</span><span class="sh">'</span><span class="p">}</span>
</code></pre></div></div>

<p><img src="/assets/post_images/2020-05-26-learning-to-rank-part-2/letor-2-set.png" alt="" width="350px" class="align-center" /></p>

<p>What are all the possible permutations of these three objects?</p>

<div class="language-python highlighter-rouge"><div class="highlight"><pre class="highlight"><code><span class="n">all_permutations</span> <span class="o">=</span> <span class="nf">list</span><span class="p">(</span><span class="n">itertools</span><span class="p">.</span><span class="nf">permutations</span><span class="p">(</span><span class="n">objects_to_rank</span><span class="p">))</span>

<span class="k">for</span> <span class="n">x</span> <span class="ow">in</span> <span class="nf">sorted</span><span class="p">(</span><span class="n">all_permutations</span><span class="p">):</span>
    <span class="nf">print</span><span class="p">(</span><span class="n">x</span><span class="p">)</span>
</code></pre></div></div>

<div class="language-plaintext highlighter-rouge"><div class="highlight"><pre class="highlight"><code>('dress', 'pants', 'shirt')
('dress', 'shirt', 'pants')
('pants', 'dress', 'shirt')
('pants', 'shirt', 'dress')
('shirt', 'dress', 'pants')
('shirt', 'pants', 'dress')
</code></pre></div></div>

<p><img src="/assets/post_images/2020-05-26-learning-to-rank-part-2/letor-part-2-permutations.png" alt="" width="350px" class="align-center" /></p>

<p>The authors depict this set of possible permutations of \(n\) objects as \(\Omega_n\). The authors depict a single permutation in \(\Omega_n\) as \(\pi = \langle \pi(1), \pi(2), \dots, \pi(n)\rangle\). Each \(\pi(j)\) denotes the object at position \(j\) in the particular permutation.</p>

<p>Say that each one of these objects is given a real number (a score) by our model which can be used to rank the objects. The authors denote the list of scores associated with each object in a permutation \(\pi\) as \(s = (s_1, s_2, \dots, s_n)\), where each \(s_j\) is the score of the \(j\)-th object.</p>

<p><img src="/assets/post_images/2020-05-26-learning-to-rank-part-2/letor-part-2-scores.png" alt="" width="350px" class="align-center" /></p>

<p>How can we determine the probability of one of the permutations above, given the ranking function that created these scores?</p>

<p>The authors say that this is how you can do just that:</p>

\[P_s(\pi) = \prod_{j=1}^n \frac{\phi(s_{\pi(j)})}{\sum_{k=j}^n \phi(s_{\pi(k)})}\]

<p>This looks like a lot of stuff! But again I say <strong>“don’t be scared”</strong>! Let’s break it down into tiny pieces.</p>

<ul>
  <li>Firstly, what are we calculating? We are calculating the probability of some permutation \(\pi\) given some list of scores \(s\). This is depicted on the LHS of the above as \(P_s(\pi)\).</li>
  <li>Next, we notice the big \(\Pi\). This is capital \(\pi\). This symbol says that we will be calculating the product of \(n\) terms. This will become clearer when we go through an example, below.</li>
  <li>Next, we have some \(\phi\)’s. This is the letter ‘phi’. Here, it’s simply some transformation applied to our scores. The only requirement is that it is “an increasing and strictly positive function”, as mentioned on page three.</li>
  <li>The denominator contains a big \(\Sigma\). It tells us that we will be summing \(n - j + 1\) terms (one for each position from \(j\) to \(n\)). Each one of these terms is a score transformed by the same function \(\phi\).</li>
</ul>
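
<p>The pieces above can be sketched as a small function. This is my own minimal sketch rather than code from the paper: the function name and the scores are hypothetical, and I’m assuming \(\phi\) is the exponential function.</p>

<div class="language-python highlighter-rouge"><div class="highlight"><pre class="highlight"><code>import itertools
import math


def permutation_probability(permutation, scores):
    # Apply phi = exp to the score of the object at each position.
    transformed = [math.exp(scores[obj]) for obj in permutation]

    probability = 1.0
    for j in range(len(transformed)):
        # Numerator: object at position j.
        # Denominator: objects at position j and every later position.
        probability *= transformed[j] / sum(transformed[j:])
    return probability


# Hypothetical scores (illustrative values only).
example_scores = {"dress": 1.5, "shirt": -0.5, "pants": 0.0}

probabilities = {
    perm: permutation_probability(perm, example_scores)
    for perm in itertools.permutations(example_scores)
}

for perm, prob in probabilities.items():
    print(perm, round(prob, 4))

# The probabilities of all permutations should add up to (very nearly) 1.
print(sum(probabilities.values()))
</code></pre></div></div>

<p>Two things are worth checking in the output. The probabilities across all \(3! = 6\) permutations sum to one (up to floating point error), so this really is a probability distribution. And the most likely permutation is the one that sorts the objects in descending order of score.</p>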

<p>Walking through an example will clear things up further! We will depict \(\phi\) as an exponential function just like the authors do. Specifically, we will define it as \(\phi(x) = e^x = \exp(x)\).</p>

<p>Let’s randomly generate scores for our three objects:</p>

<div class="language-python highlighter-rouge"><div class="highlight"><pre class="highlight"><code><span class="n">scores_dict</span> <span class="o">=</span> <span class="p">{</span><span class="n">x</span><span class="p">:</span> <span class="n">np</span><span class="p">.</span><span class="n">random</span><span class="p">.</span><span class="nf">randn</span><span class="p">(</span><span class="mi">1</span><span class="p">)[</span><span class="mi">0</span><span class="p">]</span> <span class="k">for</span> <span class="n">x</span> <span class="ow">in</span> <span class="p">[</span><span class="sh">'</span><span class="s">dress</span><span class="sh">'</span><span class="p">,</span> <span class="sh">'</span><span class="s">shirt</span><span class="sh">'</span><span class="p">,</span> <span class="sh">'</span><span class="s">pants</span><span class="sh">'</span><span class="p">]}</span>

<span class="nf">print</span><span class="p">(</span><span class="n">scores_dict</span><span class="p">)</span>
</code></pre></div></div>

<div class="language-plaintext highlighter-rouge"><div class="highlight"><pre class="highlight"><code>{'dress': 1.6243453636632417, 'shirt': -0.6117564136500754, 'pants': -0.5281717522634557}
</code></pre></div></div>

<p>Let’s pick one of our permutations:</p>

<div class="language-python highlighter-rouge"><div class="highlight"><pre class="highlight"><code><span class="n">pi</span> <span class="o">=</span> <span class="n">random</span><span class="p">.</span><span class="nf">choice</span><span class="p">(</span><span class="n">all_permutations</span><span class="p">)</span>

<span class="nf">print</span><span class="p">(</span><span class="n">pi</span><span class="p">)</span>
</code></pre></div></div>

<div class="language-plaintext highlighter-rouge"><div class="highlight"><pre class="highlight"><code>('dress', 'shirt', 'pants')
</code></pre></div></div>

<p><img src="/assets/post_images/2020-05-26-learning-to-rank-part-2/letor-part-2-permutation-choice.png" alt="" width="350px" class="align-center" /></p>

<div class="language-python highlighter-rouge"><div class="highlight"><pre class="highlight"><code><span class="n">obj_pos_1</span><span class="p">,</span> <span class="n">obj_pos_2</span><span class="p">,</span> <span class="n">obj_pos_3</span> <span class="o">=</span> <span class="n">pi</span>

<span class="nf">print</span><span class="p">(</span><span class="sa">f</span><span class="sh">"</span><span class="s">object at position 1 is </span><span class="sh">'</span><span class="si">{</span><span class="n">obj_pos_1</span><span class="si">}</span><span class="sh">'"</span><span class="p">)</span>
<span class="nf">print</span><span class="p">(</span><span class="sa">f</span><span class="sh">"</span><span class="s">object at position 2 is </span><span class="sh">'</span><span class="si">{</span><span class="n">obj_pos_2</span><span class="si">}</span><span class="sh">'"</span><span class="p">)</span>
<span class="nf">print</span><span class="p">(</span><span class="sa">f</span><span class="sh">"</span><span class="s">object at position 3 is </span><span class="sh">'</span><span class="si">{</span><span class="n">obj_pos_3</span><span class="si">}</span><span class="sh">'"</span><span class="p">)</span>
</code></pre></div></div>

<div class="language-plaintext highlighter-rouge"><div class="highlight"><pre class="highlight"><code>object at position 1 is 'dress'
object at position 2 is 'shirt'
object at position 3 is 'pants'
</code></pre></div></div>

<p>We get the scores of the objects at the above positions in our permutation:</p>

<div class="language-python highlighter-rouge"><div class="highlight"><pre class="highlight"><code><span class="n">score_obj_pos_1</span> <span class="o">=</span> <span class="n">scores_dict</span><span class="p">[</span><span class="n">obj_pos_1</span><span class="p">]</span>
<span class="n">score_obj_pos_2</span> <span class="o">=</span> <span class="n">scores_dict</span><span class="p">[</span><span class="n">obj_pos_2</span><span class="p">]</span>
<span class="n">score_obj_pos_3</span> <span class="o">=</span> <span class="n">scores_dict</span><span class="p">[</span><span class="n">obj_pos_3</span><span class="p">]</span>
</code></pre></div></div>

<p>Let’s write out the \(n = 3\) terms in our product explicitly!</p>

<p>This is what our first term is:</p>

\[\text{first term} = \frac{e^{s_{dress}}}{e^{s_{dress}} + e^{s_{pants}} + e^{s_{shirt}}}\]

<p>Evaluating it in Python, we get this:</p>

<div class="language-python highlighter-rouge"><div class="highlight"><pre class="highlight"><code><span class="n">first_term_numerator</span> <span class="o">=</span> <span class="n">np</span><span class="p">.</span><span class="nf">exp</span><span class="p">(</span><span class="n">score_obj_pos_1</span><span class="p">)</span>
<span class="n">first_term_denominator</span> <span class="o">=</span> <span class="n">np</span><span class="p">.</span><span class="nf">exp</span><span class="p">(</span><span class="n">score_obj_pos_1</span><span class="p">)</span> <span class="o">+</span> <span class="n">np</span><span class="p">.</span><span class="nf">exp</span><span class="p">(</span><span class="n">score_obj_pos_2</span><span class="p">)</span> <span class="o">+</span> <span class="n">np</span><span class="p">.</span><span class="nf">exp</span><span class="p">(</span><span class="n">score_obj_pos_3</span><span class="p">)</span>

<span class="n">first_term</span> <span class="o">=</span> <span class="n">first_term_numerator</span> <span class="o">/</span> <span class="n">first_term_denominator</span>

<span class="nf">print</span><span class="p">(</span><span class="sa">f</span><span class="sh">"</span><span class="s">first term is </span><span class="si">{</span><span class="n">first_term</span><span class="si">}</span><span class="sh">"</span><span class="p">)</span>
</code></pre></div></div>

<div class="language-plaintext highlighter-rouge"><div class="highlight"><pre class="highlight"><code>first term is 0.8176176084739423
</code></pre></div></div>

<p>According to our formula, this is what our second term is:</p>

\[\text{second term} = \frac{e^{s_{shirt}}}{e^{s_{shirt}} + e^{s_{pants}}}\]

<p>Evaluating the second term in Python, we get this:</p>

<div class="language-python highlighter-rouge"><div class="highlight"><pre class="highlight"><code><span class="n">second_term_numerator</span> <span class="o">=</span> <span class="n">np</span><span class="p">.</span><span class="nf">exp</span><span class="p">(</span><span class="n">score_obj_pos_2</span><span class="p">)</span>
<span class="n">second_term_denominator</span> <span class="o">=</span> <span class="n">np</span><span class="p">.</span><span class="nf">exp</span><span class="p">(</span><span class="n">score_obj_pos_2</span><span class="p">)</span> <span class="o">+</span> <span class="n">np</span><span class="p">.</span><span class="nf">exp</span><span class="p">(</span><span class="n">score_obj_pos_3</span><span class="p">)</span>

<span class="n">second_term</span> <span class="o">=</span> <span class="n">second_term_numerator</span> <span class="o">/</span> <span class="n">second_term_denominator</span>

<span class="nf">print</span><span class="p">(</span><span class="sa">f</span><span class="sh">"</span><span class="s">second term is </span><span class="si">{</span><span class="n">second_term</span><span class="si">}</span><span class="sh">"</span><span class="p">)</span>
</code></pre></div></div>

<div class="language-plaintext highlighter-rouge"><div class="highlight"><pre class="highlight"><code>second term is 0.47911599189971854
</code></pre></div></div>

<p>Finally, the third term is this:</p>

\[\text{third term} = \frac{e^{s_{pants}}}{e^{s_{pants}}} = 1\]

<p>We’ll just assign this value to a variable for the third term:</p>

<div class="language-python highlighter-rouge"><div class="highlight"><pre class="highlight"><code><span class="n">third_term</span> <span class="o">=</span> <span class="mf">1.0</span>
</code></pre></div></div>

<p>It’s not that bad when you break it down, right? Putting it all together, the probability of our permutation is then this:</p>

\[P_s(\langle \text{dress, shirt, pants} \rangle) = \prod_{j=1}^3 \frac{e^{s_{\pi(j)}}}{\sum_{k=j}^3 e^{s_{\pi(k)}}}\]

<p>This is equivalent to the following:</p>

\[\frac{e^{s_{dress}}}{e^{s_{dress}} + e^{s_{shirt}} + e^{s_{pants}}} \cdot \frac{e^{s_{shirt}}}{e^{s_{shirt}} + e^{s_{pants}}} \cdot \frac{e^{s_{pants}}}{e^{s_{pants}}}\]

<p>Evaluating this in Python, we get this:</p>

<div class="language-python highlighter-rouge"><div class="highlight"><pre class="highlight"><code><span class="n">prob_of_permutation</span> <span class="o">=</span> <span class="n">first_term</span> <span class="o">*</span> <span class="n">second_term</span> <span class="o">*</span> <span class="n">third_term</span>

<span class="nf">print</span><span class="p">(</span><span class="sa">f</span><span class="sh">"</span><span class="s">probability of permutation is </span><span class="si">{</span><span class="n">prob_of_permutation</span><span class="si">}</span><span class="sh">"</span><span class="p">)</span>
</code></pre></div></div>

<div class="language-plaintext highlighter-rouge"><div class="highlight"><pre class="highlight"><code>probability of permutation is 0.39173367147866855
</code></pre></div></div>

<p>If we calculate the probability of each permutation in our set, we can see that each one is greater than zero and that they sum to one!</p>

<p><img src="/assets/post_images/2020-05-26-learning-to-rank-part-2/letor-part-2-permutations-probs.png" alt="" width="400px" class="align-center" /></p>

<p>We can make an interesting observation at this point:</p>

<blockquote>
  <p>The permutation that sorts the objects by score in descending order has the highest permutation probability.<br />
The permutation that sorts the objects by score in ascending order has the lowest permutation probability.</p>
</blockquote>
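
<p>Here’s a minimal sketch of how we could check these observations ourselves by brute force. The scores below are hypothetical values that I’ve made up for illustration (they are not the scores used in this post), and <code>permutation_probability</code> is just a helper name that I’ve picked. It’s written in plain Python so that it runs on its own:</p>

```python
import itertools
import math

# Hypothetical scores, made up for illustration only.
scores = {"dress": 2.0, "shirt": 0.5, "pants": 1.0}


def permutation_probability(permutation, scores):
    """Multiply the per-position terms of the permutation probability."""
    prob = 1.0
    for j in range(len(permutation)):
        numerator = math.exp(scores[permutation[j]])
        # The denominator only covers the objects at position j or later.
        denominator = sum(math.exp(scores[obj]) for obj in permutation[j:])
        prob *= numerator / denominator
    return prob


# Probability of every possible ordering of our three objects.
perm_probs = {
    perm: permutation_probability(perm, scores)
    for perm in itertools.permutations(scores)
}

for perm, prob in sorted(perm_probs.items(), key=lambda kv: kv[1], reverse=True):
    print(perm, round(prob, 4))

print("sum of probabilities:", sum(perm_probs.values()))

most_likely = max(perm_probs, key=perm_probs.get)
least_likely = min(perm_probs, key=perm_probs.get)

# Orderings of the objects by descending and ascending score.
descending = tuple(sorted(scores, key=scores.get, reverse=True))
ascending = tuple(sorted(scores, key=scores.get))

print("most likely permutation: ", most_likely)
print("least likely permutation:", least_likely)
```

<p>With these made-up scores, the most likely permutation should match <code>descending</code> and the least likely one should match <code>ascending</code>.</p>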

<p>Interesting! We’re done with the hardest part!</p>

<h3 id="whats-the-issue-with-calculating-permutation-probability">What’s the issue with calculating permutation probability?</h3>

<p>To calculate the difference between our distributions using a listwise loss function, we could first calculate the permutation probability distributions for each training example. But the issue with this approach is that <strong>there are \(n!\) permutations!</strong> The number of permutations that need to be calculated quickly gets out of hand.</p>

<p>Instead, the authors propose using another probability model that is based on “top one” probability.</p>
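
<p>To get a feel for how quickly \(n!\) gets out of hand, here is a tiny sketch that counts the permutations for a few list sizes. The list sizes are arbitrary choices of mine:</p>

```python
import math

# Number of permutations we would need to score for lists of different sizes.
permutation_counts = {n: math.factorial(n) for n in [3, 5, 10, 20]}

for n, count in permutation_counts.items():
    print(f"{n} objects to rank: {count:,} permutations")
```

<p>Even a list of twenty documents is already far too large to enumerate for every training example.</p>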

<h3 id="top-one-probability">Top one probability</h3>

<p>Given some object we want to rank, \(j\), the top one probability for that object is the sum of the permutation probabilities of the permutations where \(j\) is ranked first.</p>

\[P_s(j) = \sum_{\pi(1)=j,\pi \in \Omega_n} P_s(\pi)\]

<p>Given our above example, the top one probability for ‘shirt’ is then \(\approx 0.0783 + 0.0091 = 0.087\).</p>

<p>The authors then observe that to calculate the top one probability of a given object, we don’t need to calculate the probabilities of all \(n!\) permutations of our \(n\) objects! The top one probability of our object is equivalent to this:</p>

\[P_s(j) = \frac{\exp(s_j)}{\sum_{k=1}^n \exp(s_k)}\]

<p>where \(s_j\) is the score of the \(j\)-th object.</p>

<p>Let’s not take their word for it…let’s confirm this using Python!</p>

<div class="language-python highlighter-rouge"><div class="highlight"><pre class="highlight"><code><span class="n">np</span><span class="p">.</span><span class="nf">exp</span><span class="p">(</span><span class="n">scores_dict</span><span class="p">[</span><span class="sh">'</span><span class="s">shirt</span><span class="sh">'</span><span class="p">])</span> <span class="o">/</span> <span class="nf">sum</span><span class="p">(</span><span class="n">np</span><span class="p">.</span><span class="nf">exp</span><span class="p">(</span><span class="nf">list</span><span class="p">(</span><span class="n">scores_dict</span><span class="p">.</span><span class="nf">values</span><span class="p">())))</span>
</code></pre></div></div>

<div class="language-plaintext highlighter-rouge"><div class="highlight"><pre class="highlight"><code>0.08738232042105001
</code></pre></div></div>

<p>Would you look at that? It works! The proof of the above can be found in the appendix of the paper for those who are keen.</p>
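
<p>If you’d like to see the two definitions agree for every object and not just for ‘shirt’, here is a minimal sketch. It sums the permutation probabilities by brute force and compares the result with the shortcut formula. The scores are hypothetical values that I’ve made up, and the function names are my own:</p>

```python
import itertools
import math

# Hypothetical scores, made up for illustration only.
scores = {"dress": 2.0, "shirt": 0.5, "pants": 1.0}


def permutation_probability(permutation, scores):
    """Multiply the per-position terms of the permutation probability."""
    prob = 1.0
    for j in range(len(permutation)):
        numerator = math.exp(scores[permutation[j]])
        denominator = sum(math.exp(scores[obj]) for obj in permutation[j:])
        prob *= numerator / denominator
    return prob


def top_one_by_enumeration(obj, scores):
    """Sum the probabilities of all permutations that rank `obj` first."""
    return sum(
        permutation_probability(perm, scores)
        for perm in itertools.permutations(scores)
        if perm[0] == obj
    )


def top_one_by_formula(obj, scores):
    """The shortcut: exp(score) divided by the sum of exp over all scores."""
    return math.exp(scores[obj]) / sum(math.exp(s) for s in scores.values())


comparison = {
    obj: (top_one_by_enumeration(obj, scores), top_one_by_formula(obj, scores))
    for obj in scores
}

for obj, (brute_force, shortcut) in comparison.items():
    print(obj, round(brute_force, 6), round(shortcut, 6))
```

<p>The brute-force version loops over all \(n!\) permutations, while the shortcut only needs one pass over the \(n\) scores.</p>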

<h2 id="converting-scores-and-relevance-labels-into-probability-distributions">Converting scores and relevance labels into probability distributions</h2>

<p>The astute reader may have realised that the formula we used to calculate our top one probability looks a lot like the <strong>softmax function</strong>. You are correct! Given the way in which we defined our probability function, we can apply the softmax function to our scores to get the top one probability for each object to rank!</p>

<div class="language-python highlighter-rouge"><div class="highlight"><pre class="highlight"><code><span class="n">ordered_scores</span> <span class="o">=</span> <span class="n">np</span><span class="p">.</span><span class="nf">array</span><span class="p">([</span><span class="n">scores_dict</span><span class="p">[</span><span class="n">x</span><span class="p">]</span> <span class="k">for</span> <span class="n">x</span> <span class="ow">in</span> <span class="n">xlabs</span><span class="p">]).</span><span class="nf">astype</span><span class="p">(</span><span class="n">np</span><span class="p">.</span><span class="n">float32</span><span class="p">)</span>
<span class="n">predicted_prob_dist</span> <span class="o">=</span> <span class="n">tf</span><span class="p">.</span><span class="n">nn</span><span class="p">.</span><span class="nf">softmax</span><span class="p">(</span><span class="n">ordered_scores</span><span class="p">)</span>

<span class="nf">print</span><span class="p">(</span><span class="n">predicted_prob_dist</span><span class="p">)</span>
</code></pre></div></div>

<div class="language-plaintext highlighter-rouge"><div class="highlight"><pre class="highlight"><code>tf.Tensor([0.8176176  0.08738231 0.09500005], shape=(3,), dtype=float32)
</code></pre></div></div>

<p>Simple! We’ll also <strong>convert our relevance grades into probability distributions using the softmax function.</strong> We’ll assign each item of clothing an arbitrary relevance grade to illustrate this step:</p>

<div class="language-python highlighter-rouge"><div class="highlight"><pre class="highlight"><code><span class="n">raw_relevance_grades</span> <span class="o">=</span> <span class="n">tf</span><span class="p">.</span><span class="nf">constant</span><span class="p">([</span><span class="mf">3.0</span><span class="p">,</span> <span class="mf">1.0</span><span class="p">,</span> <span class="mf">0.0</span><span class="p">],</span> <span class="n">dtype</span><span class="o">=</span><span class="n">tf</span><span class="p">.</span><span class="n">float32</span><span class="p">)</span>
<span class="n">true_prob_dist</span> <span class="o">=</span> <span class="n">tf</span><span class="p">.</span><span class="n">nn</span><span class="p">.</span><span class="nf">softmax</span><span class="p">(</span><span class="n">raw_relevance_grades</span><span class="p">)</span>

<span class="nf">print</span><span class="p">(</span><span class="n">true_prob_dist</span><span class="p">)</span>
</code></pre></div></div>

<div class="language-plaintext highlighter-rouge"><div class="highlight"><pre class="highlight"><code>tf.Tensor([0.8437947  0.11419519 0.04201007], shape=(3,), dtype=float32)
</code></pre></div></div>
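
<p>If you’re curious about what the softmax function does under the hood, here is a minimal plain Python sketch. It is not TensorFlow’s implementation, just the same formula written out by hand. Subtracting the largest value before exponentiating is a common trick to avoid overflow, and it doesn’t change the result:</p>

```python
import math


def softmax(values):
    """Turn a list of real numbers into a probability distribution."""
    # Shifting by the maximum leaves the result unchanged but avoids overflow.
    max_value = max(values)
    exps = [math.exp(v - max_value) for v in values]
    total = sum(exps)
    return [e / total for e in exps]


# The same relevance grades as in the TensorFlow example.
relevance_grades = [3.0, 1.0, 0.0]
true_prob_dist = softmax(relevance_grades)
print(true_prob_dist)

# Adding the same constant to every grade gives the same distribution.
shifted_prob_dist = softmax([1003.0, 1001.0, 1000.0])
print(shifted_prob_dist)
```

<p>Only the differences between the grades matter, which is why the shifted grades give us the same distribution.</p>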

<p>This is what these probability distributions look like:</p>

<p><img src="/assets/post_images/2020-05-26-learning-to-rank-part-2/true_vs_predicted_prob_dists.png" alt="" width="800px" class="align-center" /></p>

<p>We can see that the score for ‘dress’ ranks it at position one. However, the probabilities for ‘shirt’ and ‘pants’ rank them in the incorrect order.</p>

<p>We now have a probability distribution across our scores and our relevance labels. How can we compare them?</p>

<p>Enter our loss function!</p>

<h2 id="our-loss-function---kl-divergence">Our loss function - KL divergence</h2>

<p>Here’s where we will diverge from the paper. The ListNet paper uses cross entropy as its loss. On page seven, they say this:</p>

<blockquote>
  <p>Future work includes exploring the performance of other objective function besides cross entropy and the performance of other ranking model instead of linear Neural Network model.</p>
</blockquote>

<p>We’ll be using <strong>Kullback-Leibler divergence</strong> (KL divergence) to explicitly measure the difference between our predicted and target distributions! Let’s learn about it now.</p>

<p>On page seventy-two of <strong><em>‘Deep Learning’ by Goodfellow et al</em></strong>, the authors describe KL divergence:</p>

<blockquote>
  <p>If we have two separate probability distributions \(P(X)\) and \(Q(X)\) over the same random variable \(X\), we can measure how different these two distributions are using the Kullback-Leibler (KL) divergence.</p>
</blockquote>

<p>Later on the same page, they make this statement:</p>

<blockquote>
  <p>The KL divergence is \(0\) if and only if \(P\) and \(Q\) are the same distribution in the case of discrete variables.</p>
</blockquote>

<p>Given our true and predicted probability distributions, we can define KL divergence as a sum over the objects \(j\) that we are ranking:</p>

\[D_{KL} = \sum_{j} \text{true distribution}_j \cdot \log\left( \frac{\text{true distribution}_j}{\text{predicted distribution}_j} \right)\]

<p>Let’s apply it to our little clothing example:</p>

<div class="language-python highlighter-rouge"><div class="highlight"><pre class="highlight"><code><span class="nf">sum</span><span class="p">(</span><span class="n">true_prob_dist</span> <span class="o">*</span> <span class="n">np</span><span class="p">.</span><span class="nf">log</span><span class="p">(</span><span class="n">true_prob_dist</span> <span class="o">/</span> <span class="n">predicted_prob_dist</span><span class="p">))</span>
</code></pre></div></div>

<div class="language-plaintext highlighter-rouge"><div class="highlight"><pre class="highlight"><code>&lt;tf.Tensor: shape=(), dtype=float32, numpy=0.022873338&gt;
</code></pre></div></div>

<p>This is a small loss value. We see that this makes sense because our true and predicted probability distributions look similar to each other!</p>

<p>We can confirm the second quote from <em>Goodfellow et al</em> by making the following observation:</p>

<blockquote>
  <p>The logarithm of one is zero. So it follows that KL divergence is zero when both distributions are identical.</p>
</blockquote>

<p>Let’s test this out:</p>

<div class="language-python highlighter-rouge"><div class="highlight"><pre class="highlight"><code><span class="nf">sum</span><span class="p">(</span><span class="n">true_prob_dist</span> <span class="o">*</span> <span class="n">np</span><span class="p">.</span><span class="nf">log</span><span class="p">(</span><span class="n">true_prob_dist</span> <span class="o">/</span> <span class="n">true_prob_dist</span><span class="p">))</span>
</code></pre></div></div>

<div class="language-plaintext highlighter-rouge"><div class="highlight"><pre class="highlight"><code>&lt;tf.Tensor: shape=(), dtype=float32, numpy=0.0&gt;
</code></pre></div></div>

<p>Hooray! As expected, we get a zero loss when the distributions are identical.</p>
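
<p>How far have we actually diverged from the paper by swapping cross entropy for KL divergence? Not very far! Cross entropy is equal to KL divergence plus the entropy of the true distribution, and that entropy doesn’t depend on our predictions. Here is a minimal plain Python sketch that checks this, using the two distributions printed earlier in this post (the function names are my own):</p>

```python
import math

# The distributions printed earlier in this post.
true_prob_dist = [0.8437947, 0.11419519, 0.04201007]
predicted_prob_dist = [0.8176176, 0.08738231, 0.09500005]


def kl_divergence(p, q):
    """Sum of p * log(p / q). Assumes every probability is greater than zero."""
    return sum(p_j * math.log(p_j / q_j) for p_j, q_j in zip(p, q))


def cross_entropy(p, q):
    """Minus the sum of p * log(q)."""
    return -sum(p_j * math.log(q_j) for p_j, q_j in zip(p, q))


def entropy(p):
    """Minus the sum of p * log(p)."""
    return -sum(p_j * math.log(p_j) for p_j in p)


kl = kl_divergence(true_prob_dist, predicted_prob_dist)
ce = cross_entropy(true_prob_dist, predicted_prob_dist)
h = entropy(true_prob_dist)

print("KL divergence:                ", kl)
print("cross entropy minus entropy:  ", ce - h)
```

<p>Since the entropy of the true distribution is a constant with respect to our scores, minimising KL divergence and minimising cross entropy push our predictions in the same direction.</p>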

<h2 id="whats-our-neural-network-architecture">What’s our neural network architecture?</h2>

<p>We now know how to transform our document scores into probability distributions. We also know how to compare the probability distribution over our scores to the one over our relevance grades using KL divergence.</p>

<p>We haven’t yet covered how we get our scores in the first place. This is the job of our neural network!</p>

<p>The authors denote the neural network by \(\omega\) and the ranking function based on this neural network by \(f_{\omega}\). The neural network takes in a feature vector \(x_{j}^{(i)}\) and outputs a real number, which is the score for that object. The feature vector represents a <strong>(query, document) pair</strong>. You’ll find out how we create these feature vectors later.</p>

<p>We can restate our top one probability equation from above like this:</p>

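\[P_{\text{neural net score}}(j) = \frac{\exp(\text{neural net score for object }j )}{\sum_{k=1}^n \exp(\text{neural net score for object }k)}\]

<p>where the neural net score for object \(j\) is \(f_{\omega}(x_{j}^{(i)})\), the real number that our network outputs for the \(j\)-th object’s feature vector.</p>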
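
<p>To make this concrete, here is a minimal sketch of the idea in plain Python. I’m assuming the simplest possible “network”, a single linear layer, purely as a stand-in. The weights, the bias and the three feature vectors are all hypothetical numbers that I’ve made up:</p>

```python
import math

# Hypothetical parameters of a single linear layer (a stand-in for our network).
weights = [0.5, -1.0]
bias = 0.1

# Hypothetical feature vectors, one per (query, document) pair.
feature_vectors = [
    [1.0, 0.2],
    [0.3, 0.8],
    [-0.5, 0.1],
]


def score(feature_vector):
    """Map one feature vector to a single real-valued score."""
    return sum(w * x for w, x in zip(weights, feature_vector)) + bias


def top_one_probabilities(scores):
    """Apply the top one probability formula to a list of scores."""
    exps = [math.exp(s) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


scores = [score(x) for x in feature_vectors]
probs = top_one_probabilities(scores)

print("scores:               ", scores)
print("top one probabilities:", probs)
```

<p>Whatever the network looks like on the inside, all that the top one probability needs from it is one real number per object.</p>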

<p>We’ve done all the hard work upfront, so this part was easy! Let’s walk through our <strong>neural network’s forward pass.</strong></p>

<h3 id="our-inputs">Our inputs</h3>

<p>From our first post, we know that we can represent words as embeddings. Let’s use a document retrieval example to illustrate our forward pass. This time, instead of Wikipedia articles, <strong>we’ll rank Microsoft Bing and Google search engine results!</strong></p>

<p>Say that we have two queries:</p>

<blockquote>
  <p>dog</p>
</blockquote>

<p>and</p>

<blockquote>
  <p>what is a dog?</p>
</blockquote>

<p>We’ll associate the first query with the top five search results that Bing returns for it. We’ll associate the second query with the top five search results that Google returns for it.</p>

<div class="language-python highlighter-rouge"><div class="highlight"><pre class="highlight"><code><span class="n">query_1</span> <span class="o">=</span> <span class="sh">"</span><span class="s">dog</span><span class="sh">"</span>

<span class="n">bing_search_results</span> <span class="o">=</span> <span class="p">[</span>
    <span class="sh">"</span><span class="s">Dog - Wikipedia</span><span class="sh">"</span><span class="p">,</span>
    <span class="sh">"</span><span class="s">Adopting a dog or puppy | RSPCA Australia</span><span class="sh">"</span><span class="p">,</span>
    <span class="sh">"</span><span class="s">dog | History, Domestication, Physical Traits, &amp; Breeds</span><span class="sh">"</span><span class="p">,</span>
    <span class="sh">"</span><span class="s">New South Wales | Dogs &amp; Puppies | Gumtree Australia Free</span><span class="sh">"</span><span class="p">,</span>
    <span class="sh">"</span><span class="s">dog - Wiktionary</span><span class="sh">"</span>
<span class="p">]</span>
</code></pre></div></div>

<div class="language-python highlighter-rouge"><div class="highlight"><pre class="highlight"><code><span class="n">query_2</span> <span class="o">=</span> <span class="sh">"</span><span class="s">what is a dog</span><span class="sh">"</span>

<span class="n">google_search_results</span> <span class="o">=</span> <span class="p">[</span>
    <span class="sh">"</span><span class="s">Dog - Wikipedia</span><span class="sh">"</span><span class="p">,</span>
    <span class="sh">"</span><span class="s">Dog - Simple English Wikipedia, the free encyclopedia</span><span class="sh">"</span><span class="p">,</span>
    <span class="sh">"</span><span class="s">Dog | National Geographic</span><span class="sh">"</span><span class="p">,</span>
    <span class="sh">"</span><span class="s">dog | History, Domestication, Physical Traits, &amp; Breeds</span><span class="sh">"</span><span class="p">,</span>
    <span class="sh">"</span><span class="s">What is a Dog | Facts About Dogs | DK Find Out</span><span class="sh">"</span>
<span class="p">]</span>
</code></pre></div></div>

<p>Let’s assign each document an arbitrary relevance grade:</p>

<div class="language-python highlighter-rouge"><div class="highlight"><pre class="highlight"><code><span class="n">relevance_grades</span> <span class="o">=</span> <span class="n">tf</span><span class="p">.</span><span class="nf">constant</span><span class="p">([</span>
    <span class="p">[</span><span class="mf">3.0</span><span class="p">,</span> <span class="mf">2.0</span><span class="p">,</span> <span class="mf">2.0</span><span class="p">,</span> <span class="mf">2.0</span><span class="p">,</span> <span class="mf">1.0</span><span class="p">],</span>
    <span class="p">[</span><span class="mf">3.0</span><span class="p">,</span> <span class="mf">3.0</span><span class="p">,</span> <span class="mf">1.0</span><span class="p">,</span> <span class="mf">1.0</span><span class="p">,</span> <span class="mf">0.0</span><span class="p">]</span>
<span class="p">])</span>
</code></pre></div></div>

<p>At this point, we make an observation:</p>

<blockquote>
  <p>The number of words in our queries and documents can vary. It follows that the number of word embeddings that make up the queries and documents can vary.</p>
</blockquote>

<p><em>(Note: the number of documents per query can also vary! We’ll deal with how to account for that in the next post, smarty pants!)</em></p>

<p>How can we remove this variation so that our neural network is given a single feature vector, regardless of how many words are contained in our documents and queries? Let’s answer this question now.</p>

<p>We’ll be using a single embedding matrix for the words in our queries and for the words in our documents. So let’s tokenise our queries and documents using the same Keras Tokenizer:</p>

<div class="language-python highlighter-rouge"><div class="highlight"><pre class="highlight"><code><span class="n">combined_texts</span> <span class="o">=</span> <span class="p">[</span><span class="n">query_1</span><span class="p">,</span> <span class="o">*</span><span class="n">bing_search_results</span><span class="p">,</span> <span class="n">query_2</span><span class="p">,</span> <span class="o">*</span><span class="n">google_search_results</span><span class="p">]</span>

<span class="n">tokeniser</span> <span class="o">=</span> <span class="n">tf</span><span class="p">.</span><span class="n">keras</span><span class="p">.</span><span class="n">preprocessing</span><span class="p">.</span><span class="n">text</span><span class="p">.</span><span class="nc">Tokenizer</span><span class="p">()</span>
<span class="n">tokeniser</span><span class="p">.</span><span class="nf">fit_on_texts</span><span class="p">(</span><span class="n">combined_texts</span><span class="p">)</span>

<span class="c1"># we add one here to account for the padding word
</span><span class="n">vocab_size</span> <span class="o">=</span> <span class="nf">max</span><span class="p">(</span><span class="n">tokeniser</span><span class="p">.</span><span class="n">index_word</span><span class="p">)</span> <span class="o">+</span> <span class="mi">1</span>
<span class="nf">print</span><span class="p">(</span><span class="n">vocab_size</span><span class="p">)</span>
</code></pre></div></div>

<div class="language-plaintext highlighter-rouge"><div class="highlight"><pre class="highlight"><code>35
</code></pre></div></div>

<p>Here’s our full vocabulary. Notice that there’s no “index 0” as it’s reserved for padding values!</p>

<div class="language-python highlighter-rouge"><div class="highlight"><pre class="highlight"><code><span class="k">for</span> <span class="n">idx</span><span class="p">,</span> <span class="n">word</span> <span class="ow">in</span> <span class="n">tokeniser</span><span class="p">.</span><span class="n">index_word</span><span class="p">.</span><span class="nf">items</span><span class="p">():</span>
    <span class="nf">print</span><span class="p">(</span><span class="sa">f</span><span class="sh">"</span><span class="s">index </span><span class="si">{</span><span class="n">idx</span><span class="si">}</span><span class="s"> - </span><span class="si">{</span><span class="n">word</span><span class="si">}</span><span class="sh">"</span><span class="p">)</span>
</code></pre></div></div>

<div class="language-plaintext highlighter-rouge"><div class="highlight"><pre class="highlight"><code>index 1 - dog
index 2 - wikipedia
index 3 - a
index 4 - australia
index 5 - history
        ...
        ...
        ...
index 30 - facts
index 31 - about
index 32 - dk
index 33 - find
index 34 - out
</code></pre></div></div>
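
<p>If you’re wondering where these indices come from, here is a rough plain Python sketch of the idea: lowercase the text, strip the punctuation, count the words and give the most frequent words the smallest indices. This is my own approximation and not the Keras implementation. In particular, the regular expression is a simplification of the tokeniser’s default filtering, and <code>split_into_words</code> is a helper name that I’ve made up:</p>

```python
import re
from collections import Counter

# The two queries and their search results, in the same order as above.
texts = [
    "dog",
    "Dog - Wikipedia",
    "Adopting a dog or puppy | RSPCA Australia",
    "dog | History, Domestication, Physical Traits, & Breeds",
    "New South Wales | Dogs & Puppies | Gumtree Australia Free",
    "dog - Wiktionary",
    "what is a dog",
    "Dog - Wikipedia",
    "Dog - Simple English Wikipedia, the free encyclopedia",
    "Dog | National Geographic",
    "dog | History, Domestication, Physical Traits, & Breeds",
    "What is a Dog | Facts About Dogs | DK Find Out",
]


def split_into_words(text):
    """Lowercase the text and keep only runs of letters and digits."""
    return re.findall(r"[a-z0-9]+", text.lower())


word_counts = Counter()
for text in texts:
    word_counts.update(split_into_words(text))

# most_common() sorts by count; words with equal counts keep first-seen order.
# Indices start at one because index zero is reserved for padding.
word_index = {
    word: idx
    for idx, (word, _) in enumerate(word_counts.most_common(), start=1)
}

vocab_size = len(word_index) + 1
print(vocab_size)
print(list(word_index.items())[:5])
```

<p>For our small set of texts, this sketch should reproduce the vocabulary size and the indices shown above.</p>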

<p>Let’s create a bunch of toy embedding vectors. We’ll stick with two dimensions so that we can easily plot them on a two-dimensional plane:</p>

<div class="language-plaintext highlighter-rouge"><div class="highlight"><pre class="highlight"><code>EMBEDDING_DIMS = 2

embeddings = np.random.randn(vocab_size, EMBEDDING_DIMS).astype(np.float32)

print(embeddings)
</code></pre></div></div>

<div class="language-plaintext highlighter-rouge"><div class="highlight"><pre class="highlight"><code>[[-1.0729686   0.86540765]
 [-2.3015387   1.7448118 ]
 [-0.7612069   0.3190391 ]
 [-0.24937038  1.4621079 ]
 [-2.0601406  -0.3224172 ]
            ...
            ...
            ...             
 [-0.29809284  0.48851815]
 [-0.07557172  1.1316293 ]
 [ 1.5198169   2.1855755 ]
 [-1.3964963  -1.4441139 ]
 [-0.5044659   0.16003707]]
</code></pre></div></div>

<p>Our first query consists of a single word. It can be naturally represented by a single embedding vector:</p>

<div class="language-python highlighter-rouge"><div class="highlight"><pre class="highlight"><code><span class="n">query_1_embedding_index</span> <span class="o">=</span> <span class="n">tokeniser</span><span class="p">.</span><span class="nf">texts_to_sequences</span><span class="p">([</span><span class="n">query_1</span><span class="p">])</span>
<span class="n">query_1_embeddings</span> <span class="o">=</span> <span class="n">np</span><span class="p">.</span><span class="nf">array</span><span class="p">([</span><span class="n">embeddings</span><span class="p">[</span><span class="n">x</span><span class="p">]</span> <span class="k">for</span> <span class="n">x</span> <span class="ow">in</span> <span class="n">query_1_embedding_index</span><span class="p">])</span>

<span class="nf">print</span><span class="p">(</span><span class="n">query_1_embeddings</span><span class="p">)</span>
</code></pre></div></div>

<div class="language-plaintext highlighter-rouge"><div class="highlight"><pre class="highlight"><code>[[[-2.3015387  1.7448118]]]
</code></pre></div></div>

<p>However, our second query consists of four words, so it requires four embeddings to represent it!</p>

<div class="language-python highlighter-rouge"><div class="highlight"><pre class="highlight"><code><span class="n">query_2_embedding_indices</span> <span class="o">=</span> <span class="n">tokeniser</span><span class="p">.</span><span class="nf">texts_to_sequences</span><span class="p">([</span><span class="n">query_2</span><span class="p">])</span>
<span class="n">query_2_embeddings</span> <span class="o">=</span> <span class="n">np</span><span class="p">.</span><span class="nf">array</span><span class="p">([</span><span class="n">embeddings</span><span class="p">[</span><span class="n">x</span><span class="p">]</span> <span class="k">for</span> <span class="n">x</span> <span class="ow">in</span> <span class="n">query_2_embedding_indices</span><span class="p">])</span>

<span class="nf">print</span><span class="p">(</span><span class="n">query_2_embeddings</span><span class="p">)</span>
</code></pre></div></div>

<div class="language-plaintext highlighter-rouge"><div class="highlight"><pre class="highlight"><code>[[[-0.93576944 -0.26788807]
  [ 0.53035545 -0.69166076]
  [-0.24937038  1.4621079 ]
  [-2.3015387   1.7448118 ]]]
</code></pre></div></div>

<p>How can we remove the potential variation in the number of embeddings from query to query and from document to document?</p>

<blockquote>
  <p>We can aggregate our embedding vectors!</p>
</blockquote>

<p>Specifically, we’ll be taking the <strong>component-wise average</strong> of our word embeddings.</p>

<div class="language-python highlighter-rouge"><div class="highlight"><pre class="highlight"><code><span class="n">query_2_embeddings_avg</span> <span class="o">=</span> <span class="n">tf</span><span class="p">.</span><span class="nf">reduce_mean</span><span class="p">(</span><span class="n">query_2_embeddings</span><span class="p">,</span> <span class="n">axis</span><span class="o">=</span><span class="mi">1</span><span class="p">,</span> <span class="n">keepdims</span><span class="o">=</span><span class="bp">True</span><span class="p">).</span><span class="nf">numpy</span><span class="p">()</span>

<span class="nf">print</span><span class="p">(</span><span class="n">query_2_embeddings_avg</span><span class="p">)</span>
</code></pre></div></div>

<div class="language-plaintext highlighter-rouge"><div class="highlight"><pre class="highlight"><code>[[[-0.7390808  0.5618427]]]
</code></pre></div></div>

<p>What does this average vector look like if we plot it in our two-dimensional space?</p>

<p><img src="/assets/post_images/2020-05-26-learning-to-rank-part-2/embedding_averages.png" alt="" width="400px" class="align-center" /></p>

<p>Interesting! This gives us a nice <strong>fixed-sized representation</strong> of our query.</p>
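
<p>Here is a minimal plain Python sketch of the component-wise average, using the four embedding vectors that we printed for our second query. The helper name <code>average_embedding</code> is my own. Notice that the output always has one value per embedding dimension, no matter how many words go in:</p>

```python
# The four embedding vectors for "what is a dog", copied from the output above.
query_2_word_embeddings = [
    [-0.93576944, -0.26788807],
    [0.53035545, -0.69166076],
    [-0.24937038, 1.4621079],
    [-2.3015387, 1.7448118],
]


def average_embedding(word_embeddings):
    """Average a list of embedding vectors, one component at a time."""
    n_words = len(word_embeddings)
    # zip(*...) groups the values by component rather than by word.
    return [sum(component) / n_words for component in zip(*word_embeddings)]


query_2_average = average_embedding(query_2_word_embeddings)
print(query_2_average)

# A single-word query is its own average.
query_1_average = average_embedding([[-2.3015387, 1.7448118]])
print(query_1_average)
```

<p>A one-word query and a four-word query both end up as a single vector of length two.</p>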

<p>Let’s create a new array out of the fixed-sized representations of our queries.</p>

<div class="language-python highlighter-rouge"><div class="highlight"><pre class="highlight"><code><span class="n">query_embeddings</span> <span class="o">=</span> <span class="n">np</span><span class="p">.</span><span class="nf">vstack</span><span class="p">([</span><span class="n">query_1_embeddings</span><span class="p">,</span> <span class="n">query_2_embeddings_avg</span><span class="p">])</span>
</code></pre></div></div>

<p>Nice! We now have an array of shape <strong>(number of queries, 1, embedding dimensions)</strong>, where the “1” represents the number of embedding vectors we have per query after we averaged them. Let’s inspect the shape of our array of queries:</p>

<div class="language-python highlighter-rouge"><div class="highlight"><pre class="highlight"><code><span class="nf">print</span><span class="p">(</span><span class="n">query_embeddings</span><span class="p">.</span><span class="n">shape</span><span class="p">)</span>
</code></pre></div></div>

<div class="language-plaintext highlighter-rouge"><div class="highlight"><pre class="highlight"><code>(2, 1, 2)
</code></pre></div></div>

<p>Great success! <strong>We take the same approach for our documents.</strong> We take each word in our document and look up its embedding vector.</p>

<div class="language-python highlighter-rouge"><div class="highlight"><pre class="highlight"><code><span class="n">docs_sequences</span> <span class="o">=</span> <span class="p">[]</span>
<span class="k">for</span> <span class="n">docs_list</span> <span class="ow">in</span> <span class="p">[</span><span class="n">bing_search_results</span><span class="p">,</span> <span class="n">google_search_results</span><span class="p">]:</span>
    <span class="n">docs_sequences</span><span class="p">.</span><span class="nf">append</span><span class="p">(</span><span class="n">tokeniser</span><span class="p">.</span><span class="nf">texts_to_sequences</span><span class="p">(</span><span class="n">docs_list</span><span class="p">))</span>

<span class="n">docs_embeddings</span> <span class="o">=</span> <span class="p">[]</span>
<span class="k">for</span> <span class="n">docs_set</span> <span class="ow">in</span> <span class="n">docs_sequences</span><span class="p">:</span>
    <span class="n">this_docs_set</span> <span class="o">=</span> <span class="p">[]</span>
    <span class="k">for</span> <span class="n">doc</span> <span class="ow">in</span> <span class="n">docs_set</span><span class="p">:</span>
        <span class="n">this_doc_embeddings</span> <span class="o">=</span> <span class="n">np</span><span class="p">.</span><span class="nf">array</span><span class="p">([</span><span class="n">embeddings</span><span class="p">[</span><span class="n">idx</span><span class="p">]</span> <span class="k">for</span> <span class="n">idx</span> <span class="ow">in</span> <span class="n">doc</span><span class="p">])</span>
        <span class="n">this_docs_set</span><span class="p">.</span><span class="nf">append</span><span class="p">(</span><span class="n">this_doc_embeddings</span><span class="p">)</span>
    <span class="n">docs_embeddings</span><span class="p">.</span><span class="nf">append</span><span class="p">(</span><span class="n">this_docs_set</span><span class="p">)</span>
</code></pre></div></div>

<p>For our Bing results, we get this:</p>

<div class="language-python highlighter-rouge"><div class="highlight"><pre class="highlight"><code><span class="k">for</span> <span class="n">doc_embeddings</span> <span class="ow">in</span> <span class="n">docs_embeddings</span><span class="p">[</span><span class="mi">0</span><span class="p">]:</span>
    <span class="nf">print</span><span class="p">()</span>
    <span class="nf">print</span><span class="p">(</span><span class="n">doc_embeddings</span><span class="p">)</span>
</code></pre></div></div>

<div class="language-plaintext highlighter-rouge"><div class="highlight"><pre class="highlight"><code>[[-2.3015387  1.7448118]
 [-0.7612069  0.3190391]]

[[-0.39675352 -0.6871727 ]
 [-0.24937038  1.4621079 ]
 [-2.3015387   1.7448118 ]
 [-0.84520566 -0.6712461 ]
 [-0.0126646  -1.1173104 ]
 [ 0.2344157   1.6598022 ]
 [-2.0601406  -0.3224172 ]]

[[-2.3015387   1.7448118 ]
 [-0.38405436  1.1337694 ]
 [-1.0998913  -0.1724282 ]
 [-0.8778584   0.04221375]
 [ 0.58281523 -1.1006192 ]
 [ 1.1447237   0.9015907 ]]

[[ 0.74204415 -0.19183555]
 [-0.887629   -0.7471583 ]
 [ 1.6924546   0.05080776]
 [ 0.50249434  0.90085596]
 [-0.6369957   0.19091548]
 [ 2.1002553   0.12015896]
 [-2.0601406  -0.3224172 ]
 [-0.68372786 -0.12289023]]

[[-2.3015387   1.7448118 ]
 [ 0.6172031   0.30017033]]
</code></pre></div></div>

<p>For our Google results, we get this:</p>

<div class="language-python highlighter-rouge"><div class="highlight"><pre class="highlight"><code><span class="k">for</span> <span class="n">doc_embeddings</span> <span class="ow">in</span> <span class="n">docs_embeddings</span><span class="p">[</span><span class="mi">1</span><span class="p">]:</span>
    <span class="nf">print</span><span class="p">()</span>
    <span class="nf">print</span><span class="p">(</span><span class="n">doc_embeddings</span><span class="p">)</span>
</code></pre></div></div>

<div class="language-plaintext highlighter-rouge"><div class="highlight"><pre class="highlight"><code>[[-2.3015387  1.7448118]
 [-0.7612069  0.3190391]]

[[-2.3015387   1.7448118 ]
 [-0.35224986 -1.1425182 ]
 [-0.34934273 -0.20889424]
 [-0.7612069   0.3190391 ]
 [ 0.5866232   0.8389834 ]
 [-0.68372786 -0.12289023]
 [ 0.9311021   0.2855873 ]]

[[-2.3015387  1.7448118]
 [ 0.8851412 -0.7543979]
 [ 1.2528682  0.5129298]]

[[-2.3015387   1.7448118 ]
 [-0.38405436  1.1337694 ]
 [-1.0998913  -0.1724282 ]
 [-0.8778584   0.04221375]
 [ 0.58281523 -1.1006192 ]
 [ 1.1447237   0.9015907 ]]

[[-0.93576944 -0.26788807]
 [ 0.53035545 -0.69166076]
 [-0.24937038  1.4621079 ]
 [-2.3015387   1.7448118 ]
 [-0.29809284  0.48851815]
 [-0.07557172  1.1316293 ]
 [ 0.50249434  0.90085596]
 [ 1.5198169   2.1855755 ]
 [-1.3964963  -1.4441139 ]
 [-0.5044659   0.16003707]]
</code></pre></div></div>

<p>We’ll collapse each document into a fixed-size vector by averaging its word embeddings along each embedding dimension. Stacking the results gives an array with dimensions <strong>(number of queries, number of documents per query, embedding dimensions)</strong>.</p>

<div class="language-python highlighter-rouge"><div class="highlight"><pre class="highlight"><code><span class="n">docs_averaged_embeddings</span> <span class="o">=</span> <span class="p">[]</span>
<span class="k">for</span> <span class="n">docs_set</span> <span class="ow">in</span> <span class="n">docs_embeddings</span><span class="p">:</span>
    <span class="n">this_docs_set</span> <span class="o">=</span> <span class="p">[]</span>
    <span class="k">for</span> <span class="n">doc</span> <span class="ow">in</span> <span class="n">docs_set</span><span class="p">:</span>
        <span class="n">this_docs_set</span><span class="p">.</span><span class="nf">append</span><span class="p">(</span><span class="n">tf</span><span class="p">.</span><span class="nf">reduce_mean</span><span class="p">(</span><span class="n">doc</span><span class="p">,</span> <span class="n">axis</span><span class="o">=</span><span class="mi">0</span><span class="p">,</span> <span class="n">keepdims</span><span class="o">=</span><span class="bp">True</span><span class="p">))</span>
    <span class="n">concatenated_docs_set</span> <span class="o">=</span> <span class="n">tf</span><span class="p">.</span><span class="nf">concat</span><span class="p">(</span><span class="n">this_docs_set</span><span class="p">,</span> <span class="n">axis</span><span class="o">=</span><span class="mi">0</span><span class="p">).</span><span class="nf">numpy</span><span class="p">()</span>
    <span class="n">docs_averaged_embeddings</span><span class="p">.</span><span class="nf">append</span><span class="p">(</span><span class="n">concatenated_docs_set</span><span class="p">)</span>
    
<span class="n">docs_averaged_embeddings</span> <span class="o">=</span> <span class="n">np</span><span class="p">.</span><span class="nf">array</span><span class="p">(</span><span class="n">docs_averaged_embeddings</span><span class="p">)</span>
</code></pre></div></div>

<div class="language-plaintext highlighter-rouge"><div class="highlight"><pre class="highlight"><code>[[[-1.5313728   1.0319254 ]
  [-0.80446535  0.29551077]
  [-0.4893006   0.42488968]
  [ 0.09609441 -0.01519538]
  [-0.8421678   1.0224911 ]]

 [[-1.5313728   1.0319254 ]
  [-0.41862014  0.24487413]
  [-0.0545098   0.50111455]
  [-0.4893006   0.42488968]
  [-0.32086387  0.56698734]]]
</code></pre></div></div>

<p>We inspect our array’s shape and see that this is so:</p>

<div class="language-python highlighter-rouge"><div class="highlight"><pre class="highlight"><code><span class="nf">print</span><span class="p">(</span><span class="n">docs_averaged_embeddings</span><span class="p">.</span><span class="n">shape</span><span class="p">)</span>
</code></pre></div></div>

<div class="language-plaintext highlighter-rouge"><div class="highlight"><pre class="highlight"><code>(2, 5, 2)
</code></pre></div></div>
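
<p>If the averaging step feels opaque, here’s a minimal sketch of it in plain Python, with no TensorFlow involved. The helper name <code class="language-plaintext highlighter-rouge">mean_pool</code> is my own invention for illustration. The input is the first Bing document’s two word embeddings, copied from the output above.</p>

<div class="language-python highlighter-rouge"><div class="highlight"><pre class="highlight"><code>def mean_pool(word_vectors):
    """Average a list of equal-length word vectors, one component at a time."""
    num_words = len(word_vectors)
    # zip(*rows) groups the values of each embedding dimension together.
    return [sum(component) / num_words for component in zip(*word_vectors)]


# The two word embeddings of the first Bing document, copied from the output above.
first_bing_doc = [
    [-2.3015387, 1.7448118],
    [-0.7612069, 0.3190391],
]

pooled = mean_pool(first_bing_doc)
print(pooled)
</code></pre></div></div>

<p>The two numbers should agree with the first row of our averaged array, up to float32 rounding. However many words a document has, it always comes out as a single vector of length two.</p>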

<h3 id="showing-documents-in-the-context-of-other-documents-and-a-query">Showing documents in the context of other documents and a query</h3>

<p>A single query is potentially associated with multiple documents. Here’s an illustration of our second query with its documents:</p>

<p><img src="/assets/post_images/2020-05-26-learning-to-rank-part-2/letor-part-2-pre-batch-expansion.png" alt="" width="1000px" class="align-center" /></p>

<p>How can we represent a group of documents in the context of a single query? To do this, we can <strong>copy the fixed-size representation of our query once for each of its n documents.</strong> We <strong>expand</strong> our training example into a rectangular shape. Here’s what a single expanded example looks like:</p>

<p><img src="/assets/post_images/2020-05-26-learning-to-rank-part-2/letor-part-2-post-batch-expansion.png" alt="" width="1000px" class="align-center" /></p>

<p>We calculate our loss within the context of each expanded example. We’ll call a batch of such expanded examples an <strong>expanded batch</strong>.</p>

<p>How can we repeat our queries as many times as there are documents associated with them using TensorFlow? Thankfully, the <a href="https://github.com/tensorflow/ranking/blob/99ab8ee062ff632617e09ff0904a840f335e9468/tensorflow_ranking/python/model.py#L346">TensorFlow ranking repo</a> shows us how we can do this:</p>

<div class="language-python highlighter-rouge"><div class="highlight"><pre class="highlight"><code><span class="n">NUM_DOCS_PER_QUERY</span> <span class="o">=</span> <span class="mi">5</span>

<span class="n">expanded_queries</span> <span class="o">=</span> <span class="n">tf</span><span class="p">.</span><span class="nf">gather</span><span class="p">(</span><span class="n">query_embeddings</span><span class="p">,</span> <span class="p">[</span><span class="mi">0</span> <span class="k">for</span> <span class="n">x</span> <span class="ow">in</span> <span class="nf">range</span><span class="p">(</span><span class="n">NUM_DOCS_PER_QUERY</span><span class="p">)],</span> <span class="n">axis</span><span class="o">=</span><span class="mi">1</span><span class="p">).</span><span class="nf">numpy</span><span class="p">()</span>

<span class="nf">print</span><span class="p">(</span><span class="n">expanded_queries</span><span class="p">)</span>
</code></pre></div></div>

<div class="language-plaintext highlighter-rouge"><div class="highlight"><pre class="highlight"><code>array([[[-2.3015387,  1.7448118],
        [-2.3015387,  1.7448118],
        [-2.3015387,  1.7448118],
        [-2.3015387,  1.7448118],
        [-2.3015387,  1.7448118]],

       [[-0.7390808,  0.5618427],
        [-0.7390808,  0.5618427],
        [-0.7390808,  0.5618427],
        [-0.7390808,  0.5618427],
        [-0.7390808,  0.5618427]]], dtype=float32)
</code></pre></div></div>

<p>And to show our groups of documents in the context of their associated queries, we simply concatenate the two arrays along the last axis to get our expanded batch:</p>

<div class="language-python highlighter-rouge"><div class="highlight"><pre class="highlight"><code><span class="n">expanded_batch</span> <span class="o">=</span> <span class="n">np</span><span class="p">.</span><span class="nf">concatenate</span><span class="p">([</span><span class="n">expanded_queries</span><span class="p">,</span> <span class="n">docs_averaged_embeddings</span><span class="p">],</span> <span class="n">axis</span><span class="o">=-</span><span class="mi">1</span><span class="p">)</span>

<span class="nf">print</span><span class="p">(</span><span class="n">expanded_batch</span><span class="p">)</span>
</code></pre></div></div>

<div class="language-plaintext highlighter-rouge"><div class="highlight"><pre class="highlight"><code>[[[-2.3015387   1.7448118  -1.5313728   1.0319254 ]
  [-2.3015387   1.7448118  -0.80446535  0.29551077]
  [-2.3015387   1.7448118  -0.4893006   0.42488968]
  [-2.3015387   1.7448118   0.09609441 -0.01519538]
  [-2.3015387   1.7448118  -0.8421678   1.0224911 ]]

 [[-0.7390808   0.5618427  -1.5313728   1.0319254 ]
  [-0.7390808   0.5618427  -0.41862014  0.24487413]
  [-0.7390808   0.5618427  -0.0545098   0.50111455]
  [-0.7390808   0.5618427  -0.4893006   0.42488968]
  [-0.7390808   0.5618427  -0.32086387  0.56698734]]]
</code></pre></div></div>
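
<p>For a single query, the expansion and concatenation boil down to gluing the same query vector onto the front of each document vector. Here’s a rough sketch in plain Python. The function name <code class="language-plaintext highlighter-rouge">expand_and_concat</code> is hypothetical, and I’ve only used the first two documents of our second query to keep it short.</p>

<div class="language-python highlighter-rouge"><div class="highlight"><pre class="highlight"><code>def expand_and_concat(query_vector, doc_vectors):
    """Pair one query vector with each of its document vectors."""
    # Repeating the query once per document is what gathering index 0
    # num_docs times along axis 1 achieves for a whole batch.
    return [query_vector + doc_vector for doc_vector in doc_vectors]


# Second query and the first two of its averaged document vectors, from the outputs above.
query = [-0.7390808, 0.5618427]
docs = [
    [-1.5313728, 1.0319254],
    [-0.41862014, 0.24487413],
]

expanded_example = expand_and_concat(query, docs)
for row in expanded_example:
    print(row)
</code></pre></div></div>

<p>Each row has four numbers: the two query dimensions followed by the two document dimensions. These are the first two rows of the second block in our expanded batch.</p>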

<p>Not too bad, right?</p>

<h3 id="the-hidden-layers">The hidden layers</h3>

<p>We’ll pass our expanded batch into some fully-connected layers. For our prototype, we’ll use a single layer.</p>

<p><em>Remember what we said about the reproducibility of TensorFlow and Keras results, above!</em></p>

<div class="language-python highlighter-rouge"><div class="highlight"><pre class="highlight"><code><span class="n">dense_1</span> <span class="o">=</span> <span class="n">tf</span><span class="p">.</span><span class="n">keras</span><span class="p">.</span><span class="n">layers</span><span class="p">.</span><span class="nc">Dense</span><span class="p">(</span><span class="n">units</span><span class="o">=</span><span class="mi">3</span><span class="p">,</span> <span class="n">activation</span><span class="o">=</span><span class="sh">'</span><span class="s">relu</span><span class="sh">'</span><span class="p">)</span>
<span class="n">dense_1_out</span> <span class="o">=</span> <span class="nf">dense_1</span><span class="p">(</span><span class="n">expanded_batch</span><span class="p">)</span>

<span class="nf">print</span><span class="p">(</span><span class="n">dense_1_out</span><span class="p">)</span>
</code></pre></div></div>

<div class="language-plaintext highlighter-rouge"><div class="highlight"><pre class="highlight"><code>tf.Tensor(
[[[0.96246356 0.         2.3214347 ]
  [0.5498358  0.         2.0962873 ]
  [0.4715745  0.         2.1984253 ]
  [0.17358822 0.         2.0852127 ]
  [0.72574073 0.         2.414626  ]]

 [[0.8194035  0.         0.91152126]
  [0.26407483 0.         0.7183531 ]
  [0.197609   0.         0.88388896]
  [0.3285144  0.         0.7885119 ]
  [0.30305254 0.         0.87557834]]], shape=(2, 5, 3), dtype=float32)
</code></pre></div></div>
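
<p>A Keras dense layer fed a 3-D input applies the same weights to the last axis of every row. Here’s a minimal sketch of that in plain Python. The weights, biases and input rows are made up for illustration, so the numbers won’t match the output above, where Keras initialised the weights randomly.</p>

<div class="language-python highlighter-rouge"><div class="highlight"><pre class="highlight"><code>def dense_relu(rows, weights, biases):
    """Apply relu(row @ weights + biases) to every row independently."""
    outputs = []
    for row in rows:
        # One weighted sum per unit, i.e. per column of the weight matrix.
        pre_activations = [
            sum(x * w for x, w in zip(row, column)) + b
            for column, b in zip(zip(*weights), biases)
        ]
        # ReLU clips negative values to zero.
        outputs.append([max(0.0, z) for z in pre_activations])
    return outputs


# Hypothetical weights (4 inputs, 3 units) and biases, chosen only for illustration.
weights = [
    [0.5, -1.0, 0.0],
    [0.0, -1.0, 1.0],
    [1.0, 0.0, 0.0],
    [0.0, 0.0, 1.0],
]
biases = [0.0, 0.0, 0.0]

rows = [
    [1.0, 2.0, 3.0, 4.0],
    [2.0, 0.0, 1.0, 0.5],
]
hidden = dense_relu(rows, weights, biases)
print(hidden)
</code></pre></div></div>

<p>With these made-up weights, the middle unit’s weighted sum is negative for both rows, so ReLU sets it to zero. A column of zeros like the one in our real output above is what we’d expect when the same thing happens for every row of the batch.</p>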

<h3 id="the-output-layer---our-scores">The output layer - our scores!</h3>

<p>This is a dense layer with a single unit. We use a linear unit (i.e. we won’t apply a non-linearity to this unit), as in the ListNet paper:</p>

<div class="language-python highlighter-rouge"><div class="highlight"><pre class="highlight"><code><span class="n">scores</span> <span class="o">=</span> <span class="n">tf</span><span class="p">.</span><span class="n">keras</span><span class="p">.</span><span class="n">layers</span><span class="p">.</span><span class="nc">Dense</span><span class="p">(</span><span class="n">units</span><span class="o">=</span><span class="mi">1</span><span class="p">,</span> <span class="n">activation</span><span class="o">=</span><span class="sh">'</span><span class="s">linear</span><span class="sh">'</span><span class="p">)</span>
<span class="n">scores_out</span> <span class="o">=</span> <span class="nf">scores</span><span class="p">(</span><span class="n">dense_1_out</span><span class="p">)</span>

<span class="nf">print</span><span class="p">(</span><span class="n">scores_out</span><span class="p">)</span>
</code></pre></div></div>

<div class="language-plaintext highlighter-rouge"><div class="highlight"><pre class="highlight"><code>tf.Tensor(
[[[-0.51760715]
  [-0.18927467]
  [-0.10698503]
  [ 0.13695028]
  [-0.29851556]]

 [[-0.58782816]
  [-0.13076714]
  [-0.04999146]
  [-0.1772059 ]
  [-0.14299354]]], shape=(2, 5, 1), dtype=float32)
</code></pre></div></div>

<h3 id="calculate-kl-divergence-in-the-context-of-our-expanded-batch">Calculate KL divergence in the context of our expanded batch</h3>

<p>So we now have a bunch of scores. We need to convert them into probability distributions. We observed above that we can do this via the softmax function. So let’s apply it here:</p>

<div class="language-python highlighter-rouge"><div class="highlight"><pre class="highlight"><code><span class="n">scores_for_softmax</span> <span class="o">=</span> <span class="n">tf</span><span class="p">.</span><span class="nf">squeeze</span><span class="p">(</span><span class="n">scores_out</span><span class="p">,</span> <span class="n">axis</span><span class="o">=-</span><span class="mi">1</span><span class="p">)</span>
<span class="n">scores_prob_dist</span> <span class="o">=</span> <span class="n">tf</span><span class="p">.</span><span class="n">nn</span><span class="p">.</span><span class="nf">softmax</span><span class="p">(</span><span class="n">scores_for_softmax</span><span class="p">,</span> <span class="n">axis</span><span class="o">=-</span><span class="mi">1</span><span class="p">)</span>

<span class="nf">print</span><span class="p">(</span><span class="n">scores_prob_dist</span><span class="p">)</span>
</code></pre></div></div>

<div class="language-plaintext highlighter-rouge"><div class="highlight"><pre class="highlight"><code>tf.Tensor(
[[0.14152995 0.19653566 0.21339257 0.27234477 0.17619705]
 [0.1358749  0.21460423 0.23265839 0.20486614 0.21199636]], shape=(2, 5), dtype=float32)
</code></pre></div></div>
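
<p>To check our understanding, here’s a small sketch of softmax written by hand in plain Python and applied to the first query’s scores from above. Subtracting the maximum before exponentiating is a common numerical-stability trick that I’ve added; it isn’t something the post has relied on so far.</p>

<div class="language-python highlighter-rouge"><div class="highlight"><pre class="highlight"><code>import math


def softmax(scores):
    """Turn a list of real-valued scores into probabilities that sum to 1."""
    # Subtracting the maximum does not change the result but avoids overflow.
    largest = max(scores)
    exps = [math.exp(s - largest) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


# Scores for the first query, copied from the output layer above.
first_query_scores = [-0.51760715, -0.18927467, -0.10698503, 0.13695028, -0.29851556]

probs = softmax(first_query_scores)
print(probs)
</code></pre></div></div>

<p>The result should agree with the first row of the TensorFlow output to several decimal places. The fourth document has the highest score, so it gets the largest share of the probability mass.</p>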

<p>We also observed above that we can do the same for our relevance grades. Let’s apply our softmax function to them here:</p>

<div class="language-python highlighter-rouge"><div class="highlight"><pre class="highlight"><code><span class="n">relevance_grades_prob_dist</span> <span class="o">=</span> <span class="n">tf</span><span class="p">.</span><span class="n">nn</span><span class="p">.</span><span class="nf">softmax</span><span class="p">(</span><span class="n">relevance_grades</span><span class="p">,</span> <span class="n">axis</span><span class="o">=-</span><span class="mi">1</span><span class="p">)</span>

<span class="nf">print</span><span class="p">(</span><span class="n">relevance_grades_prob_dist</span><span class="p">)</span>
</code></pre></div></div>

<div class="language-plaintext highlighter-rouge"><div class="highlight"><pre class="highlight"><code>tf.Tensor(
[[0.44663328 0.1643072  0.1643072  0.1643072  0.06044524]
 [0.4309495  0.4309495  0.05832267 0.05832267 0.02145571]], shape=(2, 5), dtype=float32)
</code></pre></div></div>

<p>Calculating our batch KL divergence is as simple as this:</p>

<div class="language-python highlighter-rouge"><div class="highlight"><pre class="highlight"><code><span class="n">loss</span> <span class="o">=</span> <span class="n">tf</span><span class="p">.</span><span class="n">keras</span><span class="p">.</span><span class="n">losses</span><span class="p">.</span><span class="nc">KLDivergence</span><span class="p">()</span>
<span class="n">batch_loss</span> <span class="o">=</span> <span class="nf">loss</span><span class="p">(</span><span class="n">relevance_grades_prob_dist</span><span class="p">,</span> <span class="n">scores_prob_dist</span><span class="p">)</span>

<span class="nf">print</span><span class="p">(</span><span class="n">batch_loss</span><span class="p">)</span>
</code></pre></div></div>

<div class="language-plaintext highlighter-rouge"><div class="highlight"><pre class="highlight"><code>tf.Tensor(0.4439875, shape=(), dtype=float32)
</code></pre></div></div>

<p>But we aren’t satisfied with this simplicity. We must know what this function is calculating behind the scenes!</p>

<p>We already know how to calculate our loss for a single training example:</p>

<div class="language-python highlighter-rouge"><div class="highlight"><pre class="highlight"><code><span class="n">per_example_loss</span> <span class="o">=</span> <span class="n">tf</span><span class="p">.</span><span class="nf">reduce_sum</span><span class="p">(</span>
    <span class="n">relevance_grades_prob_dist</span> <span class="o">*</span> <span class="n">tf</span><span class="p">.</span><span class="n">math</span><span class="p">.</span><span class="nf">log</span><span class="p">(</span><span class="n">relevance_grades_prob_dist</span> <span class="o">/</span> <span class="n">scores_prob_dist</span><span class="p">),</span>
    <span class="n">axis</span><span class="o">=-</span><span class="mi">1</span>
<span class="p">)</span>

<span class="nf">print</span><span class="p">(</span><span class="n">per_example_loss</span><span class="p">)</span>
</code></pre></div></div>

<div class="language-plaintext highlighter-rouge"><div class="highlight"><pre class="highlight"><code>tf.Tensor([0.29320744 0.5947675 ], shape=(2,), dtype=float32)
</code></pre></div></div>

<p>To get our batch loss, we’ll simply take the mean of our batch of individual training example losses:</p>

<div class="language-python highlighter-rouge"><div class="highlight"><pre class="highlight"><code><span class="n">batch_loss</span> <span class="o">=</span> <span class="n">tf</span><span class="p">.</span><span class="nf">reduce_mean</span><span class="p">(</span><span class="n">per_example_loss</span><span class="p">)</span>

<span class="nf">print</span><span class="p">(</span><span class="n">batch_loss</span><span class="p">)</span>
</code></pre></div></div>

<div class="language-plaintext highlighter-rouge"><div class="highlight"><pre class="highlight"><code>tf.Tensor(0.4439875, shape=(), dtype=float32)
</code></pre></div></div>

<p>We see the two numbers are the same and have satisfied our yearning for knowledge.</p>
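
<p>For the extra curious, here’s the same calculation sketched in plain Python, using the two probability distributions printed above. The function name <code class="language-plaintext highlighter-rouge">kl_divergence</code> is my own. The last two lines illustrate two properties worth remembering: the divergence of a distribution from itself is zero, and swapping the arguments gives a different number.</p>

<div class="language-python highlighter-rouge"><div class="highlight"><pre class="highlight"><code>import math


def kl_divergence(p, q):
    """KL(p || q) for two discrete distributions given as equal-length lists."""
    # Terms where p is zero contribute nothing; q is assumed positive wherever p is.
    return sum(pi * math.log(pi / qi) for pi, qi in zip(p, q) if pi != 0.0)


# Target (relevance grade) and predicted (score) distributions, copied from above.
target = [
    [0.44663328, 0.1643072, 0.1643072, 0.1643072, 0.06044524],
    [0.4309495, 0.4309495, 0.05832267, 0.05832267, 0.02145571],
]
predicted = [
    [0.14152995, 0.19653566, 0.21339257, 0.27234477, 0.17619705],
    [0.1358749, 0.21460423, 0.23265839, 0.20486614, 0.21199636],
]

per_example = [kl_divergence(p, q) for p, q in zip(target, predicted)]
batch = sum(per_example) / len(per_example)
print(per_example, batch)

print(kl_divergence(target[0], target[0]))  # identical distributions give 0.0
print(kl_divergence(predicted[0], target[0]))  # swapping the arguments changes the value
</code></pre></div></div>

<p>Because swapping the arguments changes the result, the order in which we pass our target and predicted distributions to the loss matters.</p>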

<h2 id="a-toy-listnet-implemenetation">A toy ListNet implementation</h2>

<p>In the following implementation, we’ll assume a few things. Firstly, I want to leave topics like <strong>padding and zero-masking</strong> for the next post, so we’ll input our pre-averaged query and document embeddings into our network. Secondly, we’ll pass in precalculated probability distributions over our relevance grades, because I can only show you how to compute these dynamically in a training pipeline once I’ve covered padding and zero-masking. Hold your horses for the next post!</p>

<p>We’ll set some constants upfront that depict the dimensions of our data:</p>

<div class="language-python highlighter-rouge"><div class="highlight"><pre class="highlight"><code><span class="n">NUM_DOCS_PER_QUERY</span> <span class="o">=</span> <span class="mi">5</span>
<span class="n">EMBEDDING_DIMS</span> <span class="o">=</span> <span class="mi">2</span>
</code></pre></div></div>

<p>We’ll wrap our batch expansion in a custom Keras layer:</p>

<div class="language-python highlighter-rouge"><div class="highlight"><pre class="highlight"><code><span class="k">class</span> <span class="nc">ExpandBatchLayer</span><span class="p">(</span><span class="n">tf</span><span class="p">.</span><span class="n">keras</span><span class="p">.</span><span class="n">layers</span><span class="p">.</span><span class="n">Layer</span><span class="p">):</span>
    <span class="k">def</span> <span class="nf">__init__</span><span class="p">(</span><span class="n">self</span><span class="p">,</span> <span class="o">**</span><span class="n">kwargs</span><span class="p">):</span>
        <span class="nf">super</span><span class="p">(</span><span class="n">ExpandBatchLayer</span><span class="p">,</span> <span class="n">self</span><span class="p">).</span><span class="nf">__init__</span><span class="p">(</span><span class="o">**</span><span class="n">kwargs</span><span class="p">)</span>
        
    <span class="k">def</span> <span class="nf">call</span><span class="p">(</span><span class="n">self</span><span class="p">,</span> <span class="n">inputs</span><span class="p">):</span>
        <span class="c1"># queries: (batch, 1, dims); docs: (batch, num_docs, dims)</span>
        <span class="n">queries</span><span class="p">,</span> <span class="n">docs</span> <span class="o">=</span> <span class="n">inputs</span>
        <span class="n">batch</span><span class="p">,</span> <span class="n">num_docs</span><span class="p">,</span> <span class="n">embedding_dims</span> <span class="o">=</span> <span class="n">tf</span><span class="p">.</span><span class="nf">unstack</span><span class="p">(</span><span class="n">tf</span><span class="p">.</span><span class="nf">shape</span><span class="p">(</span><span class="n">docs</span><span class="p">))</span>
        <span class="c1"># Gather index 0 along axis 1 num_docs times to repeat each query once per document</span>
        <span class="n">expanded_queries</span> <span class="o">=</span> <span class="n">tf</span><span class="p">.</span><span class="nf">gather</span><span class="p">(</span><span class="n">queries</span><span class="p">,</span> <span class="n">tf</span><span class="p">.</span><span class="nf">zeros</span><span class="p">([</span><span class="n">num_docs</span><span class="p">],</span> <span class="n">tf</span><span class="p">.</span><span class="n">int32</span><span class="p">),</span> <span class="n">axis</span><span class="o">=</span><span class="mi">1</span><span class="p">)</span>
        <span class="c1"># Concatenate on the last axis: (batch, num_docs, 2 * dims)</span>
        <span class="k">return</span> <span class="n">tf</span><span class="p">.</span><span class="nf">concat</span><span class="p">([</span><span class="n">expanded_queries</span><span class="p">,</span> <span class="n">docs</span><span class="p">],</span> <span class="n">axis</span><span class="o">=-</span><span class="mi">1</span><span class="p">)</span>
</code></pre></div></div>

<p>Once we’ve taken care of the above, the rest of the model is intuitive:</p>

<div class="language-python highlighter-rouge"><div class="highlight"><pre class="highlight"><code><span class="n">query_input</span> <span class="o">=</span> <span class="n">tf</span><span class="p">.</span><span class="n">keras</span><span class="p">.</span><span class="n">layers</span><span class="p">.</span><span class="nc">Input</span><span class="p">(</span><span class="n">shape</span><span class="o">=</span><span class="p">(</span><span class="mi">1</span><span class="p">,</span> <span class="n">EMBEDDING_DIMS</span><span class="p">,</span> <span class="p">),</span> <span class="n">dtype</span><span class="o">=</span><span class="n">tf</span><span class="p">.</span><span class="n">float32</span><span class="p">,</span> <span class="n">name</span><span class="o">=</span><span class="sh">'</span><span class="s">query</span><span class="sh">'</span><span class="p">)</span>
<span class="n">docs_input</span> <span class="o">=</span> <span class="n">tf</span><span class="p">.</span><span class="n">keras</span><span class="p">.</span><span class="n">layers</span><span class="p">.</span><span class="nc">Input</span><span class="p">(</span><span class="n">shape</span><span class="o">=</span><span class="p">(</span><span class="n">NUM_DOCS_PER_QUERY</span><span class="p">,</span> <span class="n">EMBEDDING_DIMS</span><span class="p">,</span> <span class="p">),</span> <span class="n">dtype</span><span class="o">=</span><span class="n">tf</span><span class="p">.</span><span class="n">float32</span><span class="p">,</span> 
                <span class="n">name</span><span class="o">=</span><span class="sh">'</span><span class="s">docs</span><span class="sh">'</span><span class="p">)</span>

<span class="n">expand_batch</span> <span class="o">=</span> <span class="nc">ExpandBatchLayer</span><span class="p">(</span><span class="n">name</span><span class="o">=</span><span class="sh">'</span><span class="s">expand_batch</span><span class="sh">'</span><span class="p">)</span>
<span class="n">dense_1</span> <span class="o">=</span> <span class="n">tf</span><span class="p">.</span><span class="n">keras</span><span class="p">.</span><span class="n">layers</span><span class="p">.</span><span class="nc">Dense</span><span class="p">(</span><span class="n">units</span><span class="o">=</span><span class="mi">3</span><span class="p">,</span> <span class="n">activation</span><span class="o">=</span><span class="sh">'</span><span class="s">linear</span><span class="sh">'</span><span class="p">,</span> <span class="n">name</span><span class="o">=</span><span class="sh">'</span><span class="s">dense_1</span><span class="sh">'</span><span class="p">)</span>
<span class="n">dense_out</span> <span class="o">=</span> <span class="n">tf</span><span class="p">.</span><span class="n">keras</span><span class="p">.</span><span class="n">layers</span><span class="p">.</span><span class="nc">Dense</span><span class="p">(</span><span class="n">units</span><span class="o">=</span><span class="mi">1</span><span class="p">,</span> <span class="n">activation</span><span class="o">=</span><span class="sh">'</span><span class="s">linear</span><span class="sh">'</span><span class="p">,</span> <span class="n">name</span><span class="o">=</span><span class="sh">'</span><span class="s">scores</span><span class="sh">'</span><span class="p">)</span>
<span class="n">scores_prob_dist</span> <span class="o">=</span> <span class="n">tf</span><span class="p">.</span><span class="n">keras</span><span class="p">.</span><span class="n">layers</span><span class="p">.</span><span class="nc">Dense</span><span class="p">(</span><span class="n">units</span><span class="o">=</span><span class="n">NUM_DOCS_PER_QUERY</span><span class="p">,</span> <span class="n">activation</span><span class="o">=</span><span class="sh">'</span><span class="s">softmax</span><span class="sh">'</span><span class="p">,</span> 
                      <span class="n">name</span><span class="o">=</span><span class="sh">'</span><span class="s">scores_prob_dist</span><span class="sh">'</span><span class="p">)</span>

<span class="n">expanded_batch</span> <span class="o">=</span> <span class="nf">expand_batch</span><span class="p">([</span><span class="n">query_input</span><span class="p">,</span> <span class="n">docs_input</span><span class="p">])</span>
<span class="n">dense_1_out</span> <span class="o">=</span> <span class="nf">dense_1</span><span class="p">(</span><span class="n">expanded_batch</span><span class="p">)</span>
<span class="n">scores</span> <span class="o">=</span> <span class="n">tf</span><span class="p">.</span><span class="n">keras</span><span class="p">.</span><span class="n">layers</span><span class="p">.</span><span class="nc">Flatten</span><span class="p">()(</span><span class="nf">dense_out</span><span class="p">(</span><span class="n">dense_1_out</span><span class="p">))</span>
<span class="n">model_out</span> <span class="o">=</span> <span class="nf">scores_prob_dist</span><span class="p">(</span><span class="n">scores</span><span class="p">)</span>

<span class="n">model</span> <span class="o">=</span> <span class="n">tf</span><span class="p">.</span><span class="n">keras</span><span class="p">.</span><span class="n">models</span><span class="p">.</span><span class="nc">Model</span><span class="p">(</span><span class="n">inputs</span><span class="o">=</span><span class="p">[</span><span class="n">query_input</span><span class="p">,</span> <span class="n">docs_input</span><span class="p">],</span> <span class="n">outputs</span><span class="o">=</span><span class="p">[</span><span class="n">model_out</span><span class="p">])</span>

<span class="n">model</span><span class="p">.</span><span class="nf">compile</span><span class="p">(</span><span class="n">optimizer</span><span class="o">=</span><span class="n">tf</span><span class="p">.</span><span class="n">keras</span><span class="p">.</span><span class="n">optimizers</span><span class="p">.</span><span class="nc">SGD</span><span class="p">(</span><span class="n">learning_rate</span><span class="o">=</span><span class="mf">0.03</span><span class="p">,</span> <span class="n">momentum</span><span class="o">=</span><span class="mf">0.9</span><span class="p">),</span> 
              <span class="n">loss</span><span class="o">=</span><span class="n">tf</span><span class="p">.</span><span class="n">keras</span><span class="p">.</span><span class="n">losses</span><span class="p">.</span><span class="nc">KLDivergence</span><span class="p">())</span>
</code></pre></div></div>

<p>Here be our topology:</p>

<div class="language-plaintext highlighter-rouge"><div class="highlight"><pre class="highlight"><code>Model: "model_1"
__________________________________________________________________________________________________
Layer (type)                    Output Shape         Param #     Connected to                     
==================================================================================================
query (InputLayer)              [(None, 1, 2)]       0                                            
__________________________________________________________________________________________________
docs (InputLayer)               [(None, 5, 2)]       0                                            
__________________________________________________________________________________________________
expand_batch (ExpandBatchLayer) (None, 5, 4)         0           query[0][0]                      
                                                                 docs[0][0]                       
__________________________________________________________________________________________________
dense_1 (Dense)                 (None, 5, 3)         15          expand_batch[0][0]               
__________________________________________________________________________________________________
scores (Dense)                  (None, 5, 1)         4           dense_1[0][0]                    
__________________________________________________________________________________________________
flatten_1 (Flatten)             (None, 5)            0           scores[0][0]                     
__________________________________________________________________________________________________
scores_prob_dist (Dense)        (None, 5)            30          flatten_1[0][0]                  
==================================================================================================
Total params: 49
Trainable params: 49
Non-trainable params: 0
</code></pre></div></div>
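
<p>The parameter counts in the summary are easy to verify by hand: a dense layer has one weight per input-unit pair plus one bias per unit. Here’s a small sketch of that arithmetic. The helper name <code class="language-plaintext highlighter-rouge">dense_param_count</code> is hypothetical.</p>

<div class="language-python highlighter-rouge"><div class="highlight"><pre class="highlight"><code>def dense_param_count(num_inputs, num_units):
    """Weights plus biases of a fully-connected layer."""
    return num_inputs * num_units + num_units


NUM_DOCS_PER_QUERY = 5
EMBEDDING_DIMS = 2

# The expanded batch concatenates query and document embeddings on the last axis.
expanded_dims = 2 * EMBEDDING_DIMS

param_counts = {
    "dense_1": dense_param_count(expanded_dims, 3),
    "scores": dense_param_count(3, 1),
    "scores_prob_dist": dense_param_count(NUM_DOCS_PER_QUERY, NUM_DOCS_PER_QUERY),
}
total_params = sum(param_counts.values())
print(param_counts, total_params)
</code></pre></div></div>

<p>Note that the final layer accounts for 30 of the 49 parameters. In this toy model it is a dense layer with a softmax activation, so it has trainable weights of its own rather than being a parameter-free softmax over the scores.</p>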

<p><img src="/assets/post_images/2020-05-26-learning-to-rank-part-2/model.png" alt="" width="800px" class="align-center" /></p>

<p>Here’s a comparison of what our target and predicted probability distributions look like before we train our network:</p>

<p><img src="/assets/post_images/2020-05-26-learning-to-rank-part-2/true_vs_predicted_initial.png" alt="" width="800px" class="align-center" /></p>

<p>We train for 50 epochs:</p>

<div class="language-python highlighter-rouge"><div class="highlight"><pre class="highlight"><code><span class="n">hist</span> <span class="o">=</span> <span class="n">model</span><span class="p">.</span><span class="nf">fit</span><span class="p">(</span>
    <span class="p">[</span><span class="n">query_embeddings</span><span class="p">,</span> <span class="n">docs_averaged_embeddings</span><span class="p">],</span> 
    <span class="n">relevance_grades_prob_dist</span><span class="p">,</span> 
    <span class="n">epochs</span><span class="o">=</span><span class="mi">50</span><span class="p">,</span> 
    <span class="n">verbose</span><span class="o">=</span><span class="bp">False</span>
<span class="p">)</span>
</code></pre></div></div>

<p>We see that our loss has converged:</p>

<p><img src="/assets/post_images/2020-05-26-learning-to-rank-part-2/training_loss.png" alt="" width="500px" class="align-center" /></p>

<p>We inspect our target and predicted probability distributions once we have trained our network:</p>

<p><img src="/assets/post_images/2020-05-26-learning-to-rank-part-2/true_vs_predicted_after.png" alt="" width="800px" class="align-center" /></p>

<p>And we jump for joy, for our neural network has learnt to rank!</p>

<h2 id="conclusion">Conclusion</h2>

<p>Wow! What an adventure!</p>

<p>We worked through the ListNet paper and we implemented it. Along the way, we covered some of its maths!</p>

<p>Next time, we’ll apply ListNet to a Kaggle competition dataset. We’ll add some stuff to our basic ListNet implementation to cover off some scenarios that come up in real life before we train it on our dataset.</p>

<p>Until next time,</p>

<p>Justin</p>]]></content><author><name> Hello, world!&lt;br&gt; My name is Justin.</name></author><category term="[&quot;Machine-learning&quot;, &quot;TensorFlow&quot;, &quot;Ranking&quot;, &quot;Deep-learning&quot;]" /><summary type="html"><![CDATA[The second post in an epic to learn to rank lists of things!]]></summary></entry><entry><title type="html">Learning to rank is good for your ML career - Part 1: background and word embeddings</title><link href="https://embracingtherandom.com/machine-learning/tensorflow/ranking/deep-learning/learning-to-rank-part-1/" rel="alternate" type="text/html" title="Learning to rank is good for your ML career - Part 1: background and word embeddings" /><published>2020-05-26T07:00:00+10:00</published><updated>2020-05-26T07:00:00+10:00</updated><id>https://embracingtherandom.com/machine-learning/tensorflow/ranking/deep-learning/learning-to-rank-part-1</id><content type="html" xml:base="https://embracingtherandom.com/machine-learning/tensorflow/ranking/deep-learning/learning-to-rank-part-1/"><![CDATA[<blockquote>
  <p>The first post in an epic to learn to rank lists of things!</p>
</blockquote>

<p>A lot of machine learning problems we deal with day to day are classification and regression problems. As a result, we probably have developed some strong intuition on how to approach these types of problems.</p>

<p>But what would we do if we were asked to solve a problem like this one?</p>

<blockquote>
  <p>Say that each training example in our data set belongs to a customer. Say that we have some feature vector for this customer which serves as an input to our model. For our labels we have an ordered list of products which are ordered by relevance to that customer. How can we go about training a model that learns to rank this list of products in the order described by our labels?</p>
</blockquote>

<p><img src="/assets/post_images/2020-05-25-learning-to-rank-part-1/learning-to-rank-intro.png" alt="" width="800px" class="align-center" /></p>

<p>I’ll tell you the tautological answer to this question:</p>

<blockquote>
  <p>We must <strong>‘learn to rank’</strong>!</p>
</blockquote>

<h1 id="on-this-epic">On this epic</h1>

<p>In the <strong>first post</strong> we’ll be:</p>

<ul>
  <li>describing a motivating example as an introduction to the field of ‘learning to rank’, and</li>
  <li>exploring word embeddings as we’ll be using them as our features!</li>
</ul>

<p>In the <strong>second post</strong> we’ll be:</p>

<ul>
  <li>learning about the <strong>ListNet</strong> model architecture, and</li>
  <li>building a prototype of ListNet on some synthetic data.</li>
</ul>

<p>In the <strong>third and final post</strong>, we’ll be applying our implementation of ListNet <strong>on a Kaggle data set!</strong> In that post, we’ll be:</p>

<ul>
  <li>preparing the above data set so that we can use it with our model,</li>
  <li>training our model,</li>
  <li>briefly describing Normalised Discounted Cumulative Gain which will serve as our evaluation metric, and</li>
  <li>taking a look at our results!</li>
</ul>

<p>By the end of this series, I hope that you’ll have some idea of how to approach a similar problem in the future.</p>

<h1 id="what-do-you-mean-by-learning-to-rank">What do you mean by ‘learning to rank’?</h1>

<p>Much of the following is based on this great paper: <a href="http://times.cs.uiuc.edu/course/598f14/l2r.pdf"><em>Li, Hang. (2011). A Short Introduction to Learning to Rank.</em></a></p>

<p>The very first line of this paper summarises the field of ‘learning to rank’:</p>

<blockquote>
  <p>Learning to rank refers to machine learning techniques for training the model in a ranking task.</p>
</blockquote>

<p>Great! That was easy!</p>

<p>The paper then goes on to describe learning to rank in the context of <strong>‘document retrieval’</strong>. Let’s use a scenario most of us are familiar with to understand what this is:  <strong>searching for an article on Wikipedia.</strong></p>

<ul>
  <li>We have a website, Wikipedia, with a search function.</li>
  <li>Users submit search requests (<strong>‘queries’</strong>) to the search function.</li>
  <li>Users are then presented with ranked lists of articles (<strong>‘documents’</strong>).</li>
</ul>

<p>In learning to rank, the list ranking is performed by a ranking model \(f(q, d)\), where:</p>

<ul>
  <li>\(f\) is some ranking function that is learnt through supervised learning,</li>
  <li>\(q\) is our query, and</li>
  <li>\(d\) is our document.</li>
</ul>
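
<p>To make this a little more concrete, here’s a minimal sketch of how a ranking function can be used to sort documents. The <code class="language-plaintext highlighter-rouge">score</code> function below is a hypothetical, hand-written stand-in for \(f(q, d)\) that simply counts matching words. It isn’t learnt from data and it isn’t how Wikipedia ranks its articles. The documents are made up too.</p>

```python
def score(query, document):
    """Hypothetical stand-in for f(q, d): counts document words that appear in the query."""
    query_words = set(query.lower().split())
    return sum(word in query_words for word in document.lower().split())


def rank(query, documents):
    """Sort documents from the highest score to the lowest score for this query."""
    return sorted(documents, key=lambda document: score(query, document), reverse=True)


# Made-up documents, for illustration only
documents = [
    "reservoir dogs film",
    "dogs are domesticated animals and dogs bark",
    "cats purr",
]

ranked = rank("dogs", documents)
for position, document in enumerate(ranked, start=1):
    print(position, document)
```

<p>In learning to rank, we swap the hand-written <code class="language-plaintext highlighter-rouge">score</code> for a function whose parameters are learnt from labelled data. The sorting step stays the same.</p>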

<p>Applying this to our Wikipedia example, our user might be looking for an article on ‘dogs’ (the animals). The user types the word <strong>‘dogs’</strong> into the search bar and is presented with a list of articles that’s ‘sorted by relevance’. The top 3 results are these:</p>

<blockquote>
  <ol>
    <li><em>Dog (redirect from Dogs)</em></li>
    <li><em>Dogs Eating Dogs</em> (an EP by the band Blink-182)</li>
    <li><em>Reservoir Dogs</em> (the Quentin Tarantino film)</li>
  </ol>
</blockquote>

<p>This is a well-ranked list for our user!</p>

<p>We mentioned that this is a <strong>supervised learning task</strong>. What does our training data look like for such a task?</p>

<h1 id="what-do-our-labels-look-like">What do our labels look like?</h1>

<p>Let’s continue on with our Wikipedia example.</p>

<p>Let’s say that someone has created a dataset by asking real people to submit queries to the Wikipedia search engine and asking them to assign a number to indicate the relevance of an article in the search results set. Let’s say that the curator asks each user to assign each article one of these numbers:</p>

<blockquote>
  <ul>
    <li><code class="language-plaintext highlighter-rouge">2</code> for <code class="language-plaintext highlighter-rouge">relevant</code></li>
    <li><code class="language-plaintext highlighter-rouge">1</code> for <code class="language-plaintext highlighter-rouge">somewhat relevant</code></li>
    <li><code class="language-plaintext highlighter-rouge">0</code> for <code class="language-plaintext highlighter-rouge">irrelevant</code></li>
  </ul>
</blockquote>

<p>These are arbitrary numbers where the larger the number, the more relevant the article is. We call these <strong>relevance grades</strong>, and they are one way of representing relevance in a learning to rank task.</p>

<p>We should take note of a few things about our example:</p>

<ul>
  <li>Each query is associated with one or more documents.</li>
  <li>There are as many relevance grades as there are documents associated with a given query.</li>
  <li>We might have multiple articles for a query with the same relevance grade. For example, a user might deem two Wikipedia articles to be ‘somewhat relevant’ to their query. In our example, we are indifferent to the ranking of articles with the same relevance grade. What we will be focusing our efforts on instead is ranking articles with higher relevance grades above those with lower relevance grades.</li>
</ul>
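
<p>Here’s a rough sketch of what a single training example could look like in code. The relevance grades below are made up for illustration, and the variable names are my own. The point is that sorting by relevance grade gives us the ordering that we’d like our model to reproduce, and that documents with tied grades can appear in either order.</p>

```python
# One hypothetical training example: a query, its documents and one grade per document
query = "dogs"
documents = ["Reservoir Dogs", "Dog", "Dogs Eating Dogs"]
relevance_grades = [1, 2, 1]  # made-up grades: 2 = relevant, 1 = somewhat relevant, 0 = irrelevant

# There are as many relevance grades as there are documents for the query
assert len(documents) == len(relevance_grades)

# Sort documents by relevance grade, highest first.
# Python's sort is stable, so tied documents keep their original relative order.
graded_documents = sorted(
    zip(relevance_grades, documents),
    key=lambda pair: pair[0],
    reverse=True,
)
ideal_ranking = [document for grade, document in graded_documents]

print(ideal_ranking)
```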

<p><img src="/assets/post_images/2020-05-25-learning-to-rank-part-1/letor-wikipeda-search.png" alt="" width="800px" class="align-center" /></p>

<h1 id="what-features-will-we-be-using">What features will we be using?</h1>

<p>We’ll be using neural nets so we could be using any arbitrary feature that we think might help in our ranking task.</p>

<p>However, for our example, we’ll be focusing on using the words in our queries and documents!</p>

<blockquote>
  <p>How on earth can we use words as inputs into our neural net? Words aren’t numbers that can be optimised! You’ve lost your mind!</p>
</blockquote>

<p>I concede the last statement. But give me a chance to explain. Let’s briefly explore the wonderful world of word embeddings!</p>

<h1 id="enter-word-embeddings">Enter word embeddings!</h1>

<p>Say that we start with a two-dimensional space:</p>

<p><img src="/assets/post_images/2020-05-25-learning-to-rank-part-1/2d-space.jpg" alt="" width="400px" class="align-center" /></p>

<p>We’re all familiar with this! Each point in this space can be described by two numbers - an \(x\) coordinate, and a \(y\) coordinate. In other words, each point in the space can be described by pairs of the form \((x, y)\).</p>

<p>Let’s take a word - <strong>‘beagle’</strong>. We’ll arbitrarily place it in our space at the point \((2, -1)\):</p>

<p><img src="/assets/post_images/2020-05-25-learning-to-rank-part-1/2d-space-beagle.jpg" alt="" width="400px" class="align-center" /></p>

<p>Easy! Now, instead of saying that each point in this space can be described by a ‘pair’, let’s say that it can be described by a <strong>‘vector’</strong>. <strong>Don’t be scared!</strong> Just think of these as lists of numbers! We can depict our vectors like this:</p>

\[\begin{bmatrix} x \\ y\end{bmatrix}\]

<p>The first <strong>component</strong> of our vector represents its coordinate in the first dimension (in this case, the \(x\)-axis), and the second component represents its coordinate in the second dimension (the \(y\)-axis). So taking our beagle example, we can describe our word in this two-dimensional space using this vector:</p>

\[\begin{bmatrix} 2 \\ -1 \end{bmatrix}\]

<p>Let’s repeat the process with another word. Let’s plot the name <strong>‘snoopy’</strong> at the point represented by this vector:</p>

\[\begin{bmatrix}-3 \\ 1 \end{bmatrix}\]

<p><img src="/assets/post_images/2020-05-25-learning-to-rank-part-1/2d-space-beagle-snoopy.jpg" alt="" width="400px" class="align-center" /></p>

<p>Take a look at that! We have two <strong>words</strong> which we’ve represented using a bunch of <strong>numbers</strong>! We call these vectors <strong>word embeddings</strong>!</p>

<h2 id="a-necessary-warning">A necessary warning</h2>

<p>If you’re more pragmatically inclined, then you can stop reading here. Just keep in mind that we’ll be using such word embeddings as our features in the upcoming posts.</p>

<p>However, if you’re more inclined to obsessively understand how things work like I am, then please read on, my friend!</p>

<h2 id="why-would-we-want-to-represent-our-words-as-embedding-vectors">Why would we want to represent our words as embedding vectors?</h2>

<p>To understand the benefits of using word embeddings to represent our words, it’s useful to know a bit about how some of the successful <strong>language models</strong> were built in the past. Let’s go on a journey!</p>

<blockquote>
  <p>Most of the following summary is based on section <strong>‘12.4 Natural Language Processing’</strong> of the 
bible of deep learning, <strong><em>‘Deep Learning’ by Goodfellow et al</em></strong>.</p>
</blockquote>

<h3 id="whats-a-natural-language">What’s a natural language?</h3>

<p>Let’s start with the basics! <strong>What’s a natural language?</strong> For this, I consult the <a href="https://en.wikipedia.org/wiki/Natural_language">Wikipedia page for ‘Natural language’</a>:</p>

<blockquote>
  <p>… a <strong>natural language</strong> or <strong>ordinary language</strong> is any language that has evolved naturally in humans through use and repetition without conscious planning or premeditation.</p>
</blockquote>

<p>Interesting!</p>

<h3 id="what-aretokens">What are tokens?</h3>

<p>We need to understand what tokens are to understand what language models are. So, what are ‘tokens’ in the context of natural languages? Say that we have a bunch of sentences. We want to build some model that uses individual words as its smallest units. Then our tokens are individual words. Say that instead, we want to build some model that uses individual characters as its smallest units. Then our tokens are individual characters. Either way, we start with strings and chop them up into useful little pieces. <strong>These little pieces are our tokens!</strong></p>
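
<p>As a quick sketch of the two options described above, here’s one sentence chopped up into word-level tokens and into character-level tokens. Splitting on spaces is a simplification that I’m making for illustration. Real tokenisers usually also deal with things like punctuation and casing.</p>

```python
sentence = "Snoopy is a beagle"

# Word-level tokens: split the string on spaces
word_tokens = sentence.split(" ")

# Character-level tokens: every character (including each space) is a token
char_tokens = list(sentence)

print(word_tokens)
print(char_tokens[:6])
print(len(word_tokens), len(char_tokens))
```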

<h3 id="what-are-languagemodels">What are language models?</h3>

<p>We’re finally ready to define <strong>language models</strong>. From page 456 of Goodfellow et al:</p>

<blockquote>
  <p>A <strong>language model</strong> defines a probability distribution over sequences of tokens in a natural language.</p>
</blockquote>

<p>Why would we want to define such probability distributions? Good question! Given our language model, we could ask a question like this:</p>

<blockquote>
  <p>Which sequence is more likely in our language: <em>“Snoopy is a beagle”</em> or <em>“Beagle Snoopy is”</em>?</p>
</blockquote>

<p>If we’ve built our language model using grammatically correct texts, then we would find that the first sequence is more likely to occur than the second one! We could also ask a question like this:</p>

<blockquote>
  <p>Given the sequence <em>“Snoopy is a”</em>, which word out of my vocabulary of words maximises the probability of the entire sequence?</p>
</blockquote>

<p>These probabilities are very useful! For example, they can be used to solve real-life problems like predicting the next word you are about to type in a sentence.</p>

<h3 id="whats-an-n-gram">What’s an n-gram?</h3>

<p>Many traditional language models are based on specific types of sequences of tokens in a natural language. These are called \(n\)-grams and are simply sequences of <strong>\(n\) tokens</strong>! These language models define the conditional probability of the \(n\)-th token given the \(n-1\) tokens that came before it.</p>
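
<p>Here’s a small sketch of this idea for \(n = 2\) (bigrams). We estimate the conditional probability of a word given the one word before it by counting bigrams in a tiny corpus. The corpus is made up and the helper names (<code class="language-plaintext highlighter-rouge">ngrams</code>, <code class="language-plaintext highlighter-rouge">conditional_probability</code>) are my own. Plain counting like this is only one simple way of estimating these probabilities.</p>

```python
from collections import Counter


def ngrams(tokens, n):
    """Return every run of n consecutive tokens as a tuple."""
    return [tuple(tokens[i:i + n]) for i in range(len(tokens) - n + 1)]


# A tiny made-up corpus, for illustration only
corpus = ["snoopy is a beagle", "milo is a dog", "snoopy is a dog"]

bigram_counts = Counter()   # how often each (previous word, word) pair occurs
context_counts = Counter()  # how often each previous word is followed by some word

for sentence in corpus:
    for previous_word, word in ngrams(sentence.split(), 2):
        bigram_counts[(previous_word, word)] += 1
        context_counts[previous_word] += 1


def conditional_probability(word, previous_word):
    """Estimate P(word | previous_word) from the bigram counts."""
    return bigram_counts[(previous_word, word)] / context_counts[previous_word]


print(ngrams("snoopy is a beagle".split(), 2))
print(conditional_probability("dog", "a"))
print(conditional_probability("beagle", "a"))
```

<p>In this corpus, the word ‘a’ is followed by ‘dog’ twice and by ‘beagle’ once, so the model considers ‘dog’ to be the more likely next word.</p>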

<p>You might be wondering:</p>

<blockquote>
  <p>Why the ‘gram’ in \(n\)-gram?</p>
</blockquote>

<p>Apparently it is a Greek suffix which means “something written”!</p>

<h2 id="how-were-these-words-represented-traditionally">How were these words represented traditionally?</h2>

<p>Traditionally, \(n\)-grams were represented in the one-hot vector space.</p>

<p>Let’s create word-level tokens from our sentence, ‘Snoopy is a beagle’. At this point, we’ll also import the packages used in this article:</p>

<div class="language-python highlighter-rouge"><div class="highlight"><pre class="highlight"><code><span class="kn">import</span> <span class="n">matplotlib.pyplot</span> <span class="k">as</span> <span class="n">plt</span>
<span class="kn">import</span> <span class="n">numpy</span> <span class="k">as</span> <span class="n">np</span>
<span class="kn">import</span> <span class="n">tensorflow</span> <span class="k">as</span> <span class="n">tf</span>

<span class="n">sentence</span> <span class="o">=</span> <span class="sh">"</span><span class="s">Snoopy is a beagle</span><span class="sh">"</span>

<span class="n">tokens</span> <span class="o">=</span> <span class="n">sentence</span><span class="p">.</span><span class="nf">split</span><span class="p">(</span><span class="sh">"</span><span class="s"> </span><span class="sh">"</span><span class="p">)</span>

<span class="nf">print</span><span class="p">(</span><span class="n">tokens</span><span class="p">)</span>
</code></pre></div></div>
<div class="language-plaintext highlighter-rouge"><div class="highlight"><pre class="highlight"><code>['Snoopy', 'is', 'a', 'beagle']
</code></pre></div></div>

<p>We’ll map each word to an index and assign a one to the component at the same index in our one-hot vector. The rest of our components will be zeros.</p>

<div class="language-python highlighter-rouge"><div class="highlight"><pre class="highlight"><code><span class="n">index_word</span> <span class="o">=</span> <span class="p">{</span><span class="n">i</span><span class="p">:</span> <span class="n">x</span> <span class="k">for</span> <span class="n">i</span><span class="p">,</span> <span class="n">x</span> <span class="ow">in</span> <span class="nf">enumerate</span><span class="p">(</span><span class="n">tokens</span><span class="p">)}</span>

<span class="nf">print</span><span class="p">(</span><span class="n">index_word</span><span class="p">)</span>
</code></pre></div></div>
<div class="language-plaintext highlighter-rouge"><div class="highlight"><pre class="highlight"><code>{0: 'Snoopy', 1: 'is', 2: 'a', 3: 'beagle'}
</code></pre></div></div>

<div class="language-python highlighter-rouge"><div class="highlight"><pre class="highlight"><code><span class="n">num_classes</span> <span class="o">=</span> <span class="nf">len</span><span class="p">(</span><span class="n">index_word</span><span class="p">)</span>

<span class="n">index_one_hot</span> <span class="o">=</span> <span class="p">{</span><span class="n">i</span><span class="p">:</span> <span class="n">tf</span><span class="p">.</span><span class="nf">one_hot</span><span class="p">(</span><span class="n">x</span><span class="p">,</span> <span class="n">depth</span><span class="o">=</span><span class="n">num_classes</span><span class="p">)</span> <span class="k">for</span> <span class="n">i</span><span class="p">,</span> <span class="n">x</span> <span class="ow">in</span> <span class="nf">enumerate</span><span class="p">(</span><span class="n">index_word</span><span class="p">.</span><span class="nf">keys</span><span class="p">())}</span>

<span class="k">for</span> <span class="n">k</span><span class="p">,</span> <span class="n">v</span> <span class="ow">in</span> <span class="n">index_one_hot</span><span class="p">.</span><span class="nf">items</span><span class="p">():</span>
    <span class="n">word</span> <span class="o">=</span> <span class="n">index_word</span><span class="p">[</span><span class="n">k</span><span class="p">]</span>
    <span class="n">one_hot_vector</span> <span class="o">=</span> <span class="n">v</span><span class="p">.</span><span class="nf">numpy</span><span class="p">()</span>
    <span class="nf">print</span><span class="p">(</span><span class="sa">f</span><span class="sh">"</span><span class="si">{</span><span class="n">word</span><span class="si">:</span><span class="o">&lt;</span><span class="mi">6</span><span class="si">}</span><span class="s">: </span><span class="si">{</span><span class="n">one_hot_vector</span><span class="si">}</span><span class="sh">"</span><span class="p">)</span>
</code></pre></div></div>

<div class="language-plaintext highlighter-rouge"><div class="highlight"><pre class="highlight"><code>Snoopy: [1. 0. 0. 0.]
is    : [0. 1. 0. 0.]
a     : [0. 0. 1. 0.]
beagle: [0. 0. 0. 1.]
</code></pre></div></div>

<p>We can observe a few things about these vectors.</p>

<p><strong>Firstly, the one-hot vector space is discrete.</strong></p>

<p><strong>Secondly, we can see that the dimensions of our one-hot vectors are as large as our vocabulary is.</strong> This is a problem as our vocabulary could consist of millions of words! We can also see that these vectors are sparse (they contain mostly zeros). Embedding vectors, on the other hand, commonly have <strong>dimensions that are far smaller than the sizes of our vocabularies.</strong> Each of the components of our embedding vectors is a <strong>floating-point number.</strong> They are not sparse but are <strong>dense vectors</strong>. Given the same number of dimensions, our embedding vectors can represent many more distinct configurations than their one-hot counterparts.</p>

<p>Let’s say that we have the same four words. This time, we’ll represent each word with a two-dimensional vector of floating-point numbers. We’ll randomly create them like this:</p>

<div class="language-python highlighter-rouge"><div class="highlight"><pre class="highlight"><code><span class="n">embeddings</span> <span class="o">=</span> <span class="n">tf</span><span class="p">.</span><span class="n">random</span><span class="p">.</span><span class="nf">uniform</span><span class="p">((</span><span class="mi">4</span><span class="p">,</span> <span class="mi">2</span><span class="p">),</span> <span class="n">minval</span><span class="o">=-</span><span class="mf">0.05</span><span class="p">,</span> <span class="n">maxval</span><span class="o">=</span><span class="mf">0.05</span><span class="p">).</span><span class="nf">numpy</span><span class="p">()</span>

<span class="nf">print</span><span class="p">(</span><span class="n">embeddings</span><span class="p">)</span>
</code></pre></div></div>

<div class="language-plaintext highlighter-rouge"><div class="highlight"><pre class="highlight"><code>[[-0.00841825 -0.02467561]
 [-0.03953496  0.01846253]
 [-0.03010724  0.03095749]
 [-0.01248298  0.00497364]]
</code></pre></div></div>

<p>Behold the density of these vectors! We’ll come back to these vectors shortly.</p>

<p><strong>Thirdly, we can’t use our one-hot vectors to answer questions like</strong> <em>“Is the word ‘Snoopy’ more similar to the word ‘beagle’ than it is to the word ‘is’?”</em>. Let’s calculate the Euclidean distance between our one-hot vectors. Let’s start with the distance between ‘Snoopy’ and ‘beagle’:</p>

<div class="language-python highlighter-rouge"><div class="highlight"><pre class="highlight"><code><span class="n">snoopy_vec</span> <span class="o">=</span> <span class="n">index_one_hot</span><span class="p">[</span><span class="mi">0</span><span class="p">]</span>
<span class="n">beagle_vec</span> <span class="o">=</span> <span class="n">index_one_hot</span><span class="p">[</span><span class="mi">3</span><span class="p">]</span>

<span class="n">snoopy_vs_beagle</span> <span class="o">=</span> <span class="n">tf</span><span class="p">.</span><span class="nf">sqrt</span><span class="p">(</span><span class="n">tf</span><span class="p">.</span><span class="nf">reduce_sum</span><span class="p">(</span><span class="n">tf</span><span class="p">.</span><span class="nf">square</span><span class="p">(</span><span class="n">snoopy_vec</span> <span class="o">-</span> <span class="n">beagle_vec</span><span class="p">)))</span>

<span class="nf">print</span><span class="p">(</span><span class="n">snoopy_vs_beagle</span><span class="p">.</span><span class="nf">numpy</span><span class="p">())</span>
</code></pre></div></div>
<div class="language-plaintext highlighter-rouge"><div class="highlight"><pre class="highlight"><code>1.4142135
</code></pre></div></div>

<p>Next, the distance between ‘Snoopy’ and ‘is’:</p>

<div class="language-python highlighter-rouge"><div class="highlight"><pre class="highlight"><code><span class="n">is_vec</span> <span class="o">=</span> <span class="n">index_one_hot</span><span class="p">[</span><span class="mi">1</span><span class="p">]</span>

<span class="n">snoopy_vs_is</span> <span class="o">=</span> <span class="n">tf</span><span class="p">.</span><span class="nf">sqrt</span><span class="p">(</span><span class="n">tf</span><span class="p">.</span><span class="nf">reduce_sum</span><span class="p">(</span><span class="n">tf</span><span class="p">.</span><span class="nf">square</span><span class="p">(</span><span class="n">snoopy_vec</span> <span class="o">-</span> <span class="n">is_vec</span><span class="p">)))</span>

<span class="nf">print</span><span class="p">(</span><span class="n">snoopy_vs_is</span><span class="p">.</span><span class="nf">numpy</span><span class="p">())</span>
</code></pre></div></div>
<div class="language-plaintext highlighter-rouge"><div class="highlight"><pre class="highlight"><code>1.4142135
</code></pre></div></div>

<p>In both cases, we can see that the distance is \(\sqrt 2\)! <strong>These words are equally dissimilar!</strong></p>
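
<p>This isn’t a coincidence. Any two distinct one-hot vectors differ in exactly two components, each by one, so the distance between them is always \(\sqrt{1^2 + 1^2} = \sqrt 2\). Here’s a quick sketch that checks every pair of words. I’m using NumPy’s identity matrix in place of the TensorFlow one-hot vectors above purely for brevity.</p>

```python
import itertools

import numpy as np

words = ["Snoopy", "is", "a", "beagle"]

# Row i of the identity matrix is the one-hot vector for words[i]
one_hot = np.eye(len(words))

# Euclidean distance between every pair of distinct words
distances = {
    (words[i], words[j]): float(np.linalg.norm(one_hot[i] - one_hot[j]))
    for i, j in itertools.combinations(range(len(words)), 2)
}

for pair, distance in distances.items():
    print(pair, round(distance, 7))
```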

<p>Let’s return to our randomly created word vectors. We can see that the distances between these word vectors don’t all equal \(\sqrt 2\)! This is a good start. Assuming that each word vector corresponds to the same words as in the one-hot vector example, we can observe the differences in our Euclidean distances:</p>

<div class="language-python highlighter-rouge"><div class="highlight"><pre class="highlight"><code><span class="n">snoopy_vs_beagle</span> <span class="o">=</span> <span class="n">tf</span><span class="p">.</span><span class="nf">sqrt</span><span class="p">(</span><span class="n">tf</span><span class="p">.</span><span class="nf">reduce_sum</span><span class="p">(</span><span class="n">tf</span><span class="p">.</span><span class="nf">square</span><span class="p">(</span><span class="n">embeddings</span><span class="p">[</span><span class="mi">0</span><span class="p">]</span> <span class="o">-</span> <span class="n">embeddings</span><span class="p">[</span><span class="mi">3</span><span class="p">])))</span>

<span class="nf">print</span><span class="p">(</span><span class="n">snoopy_vs_beagle</span><span class="p">.</span><span class="nf">numpy</span><span class="p">())</span>
</code></pre></div></div>

<div class="language-plaintext highlighter-rouge"><div class="highlight"><pre class="highlight"><code>0.029926574
</code></pre></div></div>

<div class="language-python highlighter-rouge"><div class="highlight"><pre class="highlight"><code><span class="n">snoopy_vs_is</span> <span class="o">=</span> <span class="n">tf</span><span class="p">.</span><span class="nf">sqrt</span><span class="p">(</span><span class="n">tf</span><span class="p">.</span><span class="nf">reduce_sum</span><span class="p">(</span><span class="n">tf</span><span class="p">.</span><span class="nf">square</span><span class="p">(</span><span class="n">embeddings</span><span class="p">[</span><span class="mi">0</span><span class="p">]</span> <span class="o">-</span> <span class="n">embeddings</span><span class="p">[</span><span class="mi">1</span><span class="p">])))</span>

<span class="nf">print</span><span class="p">(</span><span class="n">snoopy_vs_is</span><span class="p">.</span><span class="nf">numpy</span><span class="p">())</span>
</code></pre></div></div>

<div class="language-plaintext highlighter-rouge"><div class="highlight"><pre class="highlight"><code>0.05318974
</code></pre></div></div>

<p>Wouldn’t it be nice if we could learn representations for each word where the distance between vectors can be used as a gauge for their similarities?</p>

<p>Let’s now talk about why these dense vectors can help us achieve this. I can’t explain this as well as Yoav Goldberg did in <a href="http://u.cs.biu.ac.il/~yogo/nnlp.pdf"><em>A Primer on Neural Network Models
for Natural Language Processing</em></a>, so I will quote from it!</p>

<p>The author starts with this, beginning on page 6:</p>

<blockquote>
  <p>The main benefit of the dense representations is in generalization power: if we believe some features may provide similar clues, it is worthwhile to provide a representation that is able to capture these similarities.</p>
</blockquote>

<p>The author then describes a scenario:</p>

<blockquote>
  <p>For example, assume we have observed the word ‘dog’
many times during training, but only observed the word ‘cat’ a handful of times, or not at all.</p>
</blockquote>

<p>He then explains what the outcome of this scenario would be if we were to represent the words in the one-hot vector space:</p>

<blockquote>
  <p>If each of the words is associated with its own dimension, occurrences of ‘dog’ will not tell us anything about the occurrences of ‘cat’.</p>
</blockquote>

<p>He then explains what the outcome could be if we were to use word embeddings to represent the same words:</p>

<blockquote>
  <p>However, in the dense vectors representation
the learned vector for ‘dog’ may be similar to the learned vector from ‘cat’, allowing the model to share statistical strength between the two events.</p>
</blockquote>

<p>By allowing a concept of a ‘dog’ to be distributed across potentially multiple vectors and multiple dimensions, our dense word embeddings allow us to <strong>“recognize
that two words are similar without losing the ability to encode each word as distinct from the other”</strong> (Goodfellow et al, pages 458-459).</p>

<p>To summarise this section, we can say that word embeddings are generally more efficient and meaningful representations of our words compared to one-hot vectors. We’ll be using these word embeddings as features in the rest of our tutorial.</p>

<h1 id="lets-build-a-toy-model">Let’s build a toy model</h1>

<p>Let’s create some ‘sentences’. Each sentence contains two words that share similar meanings.</p>

<div class="language-python highlighter-rouge"><div class="highlight"><pre class="highlight"><code><span class="n">sentences</span> <span class="o">=</span> <span class="p">[</span>
    <span class="sh">"</span><span class="s">snoopy dog</span><span class="sh">"</span><span class="p">,</span>
    <span class="sh">"</span><span class="s">milo dog</span><span class="sh">"</span><span class="p">,</span>
    <span class="sh">"</span><span class="s">dumbo elephant</span><span class="sh">"</span><span class="p">,</span>
    <span class="sh">"</span><span class="s">portugal country</span><span class="sh">"</span><span class="p">,</span> 
    <span class="sh">"</span><span class="s">brazil country</span><span class="sh">"</span><span class="p">,</span>
<span class="p">]</span>
</code></pre></div></div>

<p>We will represent each of these as word vectors. Our goal is to train a model that places word vectors with similar meanings closer together in some two-dimensional space.</p>

<p>Instead of manually preparing our tokens and assigning indices to them, we’ll use the Keras <code class="language-plaintext highlighter-rouge">Tokenizer</code>. Firstly, we’ll create our vocabulary:</p>

<div class="language-python highlighter-rouge"><div class="highlight"><pre class="highlight"><code><span class="n">tokeniser</span> <span class="o">=</span> <span class="n">tf</span><span class="p">.</span><span class="n">keras</span><span class="p">.</span><span class="n">preprocessing</span><span class="p">.</span><span class="n">text</span><span class="p">.</span><span class="nc">Tokenizer</span><span class="p">()</span>
<span class="n">tokeniser</span><span class="p">.</span><span class="nf">fit_on_texts</span><span class="p">(</span><span class="n">sentences</span><span class="p">)</span>

<span class="nf">print</span><span class="p">(</span><span class="n">tokeniser</span><span class="p">.</span><span class="n">word_index</span><span class="p">)</span>
</code></pre></div></div>
<div class="language-plaintext highlighter-rouge"><div class="highlight"><pre class="highlight"><code>{'dog': 1, 'country': 2, 'snoopy': 3, 'milo': 4, 'dumbo': 5, 'elephant': 6, 'portugal': 7, 'brazil': 8}
</code></pre></div></div>

<p>Then we’ll convert our sentences into sequences of indices which map to words in our vocabulary:</p>

<div class="language-python highlighter-rouge"><div class="highlight"><pre class="highlight"><code><span class="n">sequences</span> <span class="o">=</span> <span class="n">tokeniser</span><span class="p">.</span><span class="nf">texts_to_sequences</span><span class="p">(</span><span class="n">sentences</span><span class="p">)</span>
<span class="k">for</span> <span class="n">x</span> <span class="ow">in</span> <span class="n">sequences</span><span class="p">:</span>
    <span class="nf">print</span><span class="p">(</span><span class="n">x</span><span class="p">)</span>
</code></pre></div></div>
<div class="language-plaintext highlighter-rouge"><div class="highlight"><pre class="highlight"><code>[3, 1]
[4, 1]
[5, 6]
[7, 2]
[8, 2]
</code></pre></div></div>
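
<p>If you’re curious what the <code class="language-plaintext highlighter-rouge">Tokenizer</code> did for us in those two steps, here’s a rough plain-Python sketch. It assumes the five sentences shown above and only lower-cases and splits on whitespace (the real class also strips punctuation and has more options), so treat it as an illustration rather than a description of the Keras implementation:</p>

```python
from collections import Counter

# Assumption: the same five toy sentences used in this post.
sentences = [
    "snoopy dog",
    "milo dog",
    "dumbo elephant",
    "portugal country",
    "brazil country",
]

# Count every lower-cased, whitespace-separated token.
counts = Counter(word for sentence in sentences for word in sentence.lower().split())

# The most frequent words get the smallest indices; ties keep first-seen order.
# Numbering starts at 1, so index 0 is never assigned to a word.
word_index = {word: i for i, (word, _) in enumerate(counts.most_common(), start=1)}

# Replace each word with its index.
sequences = [[word_index[w] for w in s.lower().split()] for s in sentences]

print(word_index)
print(sequences)
```

<p>For our sentences this reproduces the vocabulary and the sequences printed above.</p>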

<p>We take note of the size of our vocabulary, which we will need when creating our <code class="language-plaintext highlighter-rouge">Embedding</code> layer. The <code class="language-plaintext highlighter-rouge">Tokenizer</code> never assigns index zero to a word (it is reserved, typically for padding), so our word indices run from one to eight. The embedding matrix still needs a row for index zero, so we add one to our largest word index:</p>

<div class="language-python highlighter-rouge"><div class="highlight"><pre class="highlight"><code><span class="n">VOCAB_SIZE</span> <span class="o">=</span> <span class="nf">max</span><span class="p">(</span><span class="n">tokeniser</span><span class="p">.</span><span class="n">index_word</span><span class="p">)</span> <span class="o">+</span> <span class="mi">1</span>
<span class="nf">print</span><span class="p">(</span><span class="sa">f</span><span class="sh">"</span><span class="s">VOCAB_SIZE: </span><span class="si">{</span><span class="n">VOCAB_SIZE</span><span class="si">}</span><span class="sh">"</span><span class="p">)</span>
</code></pre></div></div>
<div class="language-plaintext highlighter-rouge"><div class="highlight"><pre class="highlight"><code>VOCAB_SIZE: 9
</code></pre></div></div>

<p>We want our neural network to learn to pull the words in each of our sentences closer together, while also learning to push each word away from a randomly chosen ‘negative example’. This negative sampling is controlled by the <code class="language-plaintext highlighter-rouge">negative_samples</code> argument of <code class="language-plaintext highlighter-rouge">tf.keras.preprocessing.sequence.skipgrams</code>: a value of 1.0 asks for as many negative pairs as positive ones. We use this function to create a freshly sampled training set at the beginning of each epoch:</p>

<div class="language-python highlighter-rouge"><div class="highlight"><pre class="highlight"><code><span class="k">def</span> <span class="nf">make_skipgrams</span><span class="p">():</span>
    <span class="n">train_x</span><span class="p">,</span> <span class="n">all_labels</span> <span class="o">=</span> <span class="p">[],</span> <span class="p">[]</span>
    <span class="k">for</span> <span class="n">sequence</span> <span class="ow">in</span> <span class="n">sequences</span><span class="p">:</span>
        <span class="n">pairs</span><span class="p">,</span> <span class="n">labels</span> <span class="o">=</span> <span class="n">tf</span><span class="p">.</span><span class="n">keras</span><span class="p">.</span><span class="n">preprocessing</span><span class="p">.</span><span class="n">sequence</span><span class="p">.</span><span class="nf">skipgrams</span><span class="p">(</span>
            <span class="n">sequence</span><span class="p">,</span> <span class="n">VOCAB_SIZE</span><span class="p">,</span> <span class="n">negative_samples</span><span class="o">=</span><span class="mf">1.0</span><span class="p">,</span> <span class="n">window_size</span><span class="o">=</span><span class="mi">1</span><span class="p">,</span> <span class="n">shuffle</span><span class="o">=</span><span class="bp">True</span>
        <span class="p">)</span>
        <span class="n">train_x</span><span class="p">.</span><span class="nf">extend</span><span class="p">(</span><span class="n">pairs</span><span class="p">)</span>
        <span class="n">all_labels</span><span class="p">.</span><span class="nf">extend</span><span class="p">(</span><span class="n">labels</span><span class="p">)</span>

    <span class="n">train_x</span> <span class="o">=</span> <span class="n">np</span><span class="p">.</span><span class="nf">array</span><span class="p">(</span><span class="n">train_x</span><span class="p">)</span>
    <span class="n">all_labels</span> <span class="o">=</span> <span class="n">np</span><span class="p">.</span><span class="nf">array</span><span class="p">(</span><span class="n">all_labels</span><span class="p">,</span> <span class="n">dtype</span><span class="o">=</span><span class="n">np</span><span class="p">.</span><span class="n">float32</span><span class="p">)</span>
    
    <span class="n">content_words</span> <span class="o">=</span> <span class="n">train_x</span><span class="p">[:,</span> <span class="mi">0</span><span class="p">]</span>
    <span class="n">context_words</span> <span class="o">=</span> <span class="n">train_x</span><span class="p">[:,</span> <span class="mi">1</span><span class="p">]</span>
    
    <span class="k">return</span> <span class="n">content_words</span><span class="p">,</span> <span class="n">context_words</span><span class="p">,</span> <span class="n">all_labels</span>
</code></pre></div></div>
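
<p>To see what kind of training data this produces, here’s a simplified sketch of skip-gram pair generation with negative sampling. <code class="language-plaintext highlighter-rouge">toy_skipgrams</code> is a hypothetical helper written for this illustration, not the Keras function, and it glosses over details such as sampling tables:</p>

```python
import random


def toy_skipgrams(sequence, vocab_size, window_size=1, negative_ratio=1.0, seed=None):
    """Simplified illustration of skip-gram pairs with negative sampling."""
    rng = random.Random(seed)

    # Positive pairs: every word paired with each neighbour inside the window.
    positives = []
    for i, target in enumerate(sequence):
        lo = max(0, i - window_size)
        hi = min(len(sequence), i + window_size + 1)
        for j in range(lo, hi):
            if j != i:
                positives.append((target, sequence[j]))

    # Negative pairs: keep a real target word but pair it with a random index.
    # Index 0 is reserved, so we draw from 1..vocab_size-1.
    # Simplification: the random word may occasionally be a true neighbour.
    n_negative = int(len(positives) * negative_ratio)
    negatives = [
        (rng.choice(positives)[0], rng.randint(1, vocab_size - 1))
        for _ in range(n_negative)
    ]

    pairs = positives + negatives
    labels = [1] * len(positives) + [0] * len(negatives)

    # Shuffle pairs and labels together so they stay aligned.
    order = list(range(len(pairs)))
    rng.shuffle(order)
    return [pairs[k] for k in order], [labels[k] for k in order]


# "snoopy dog" is the sequence [3, 1] in our vocabulary of size 9.
pairs, labels = toy_skipgrams([3, 1], vocab_size=9, seed=0)
for pair, label in zip(pairs, labels):
    print(pair, label)
```

<p>For a two-word sentence we get two positive pairs (each word with its neighbour) and, with a ratio of 1.0, two negative pairs. Because the negatives are drawn at random, every call gives a slightly different training set.</p>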

<p>We then build our model. The focus of this post isn’t to explain this toy model so I’ll be brief:</p>

<ul>
  <li>We input into our network a pair of integers. Each integer is the row of one word’s embedding in our embedding matrix.</li>
  <li>Each pair comes with a binary label, which is the target the network learns to predict. The label is one if the two words should be associated with each other and zero if the pair is a negative example.</li>
  <li>We look up the corresponding word vectors in our matrix of embedding vectors.</li>
  <li>We calculate the cosine similarity between the two vectors and pass it into a single sigmoid unit. This allows us to treat this as a binary classification problem.</li>
</ul>

<div class="language-python highlighter-rouge"><div class="highlight"><pre class="highlight"><code><span class="c1"># inputs
</span><span class="n">content_input</span> <span class="o">=</span> <span class="n">tf</span><span class="p">.</span><span class="n">keras</span><span class="p">.</span><span class="n">layers</span><span class="p">.</span><span class="nc">Input</span><span class="p">(</span><span class="n">shape</span><span class="o">=</span><span class="p">(</span><span class="mi">1</span><span class="p">,</span> <span class="p">),</span> <span class="n">dtype</span><span class="o">=</span><span class="n">tf</span><span class="p">.</span><span class="n">int32</span><span class="p">,</span> <span class="n">name</span><span class="o">=</span><span class="sh">'</span><span class="s">content_word</span><span class="sh">'</span><span class="p">)</span>
<span class="n">context_input</span> <span class="o">=</span> <span class="n">tf</span><span class="p">.</span><span class="n">keras</span><span class="p">.</span><span class="n">layers</span><span class="p">.</span><span class="nc">Input</span><span class="p">(</span><span class="n">shape</span><span class="o">=</span><span class="p">(</span><span class="mi">1</span><span class="p">,</span> <span class="p">),</span> <span class="n">dtype</span><span class="o">=</span><span class="n">tf</span><span class="p">.</span><span class="n">int32</span><span class="p">,</span> <span class="n">name</span><span class="o">=</span><span class="sh">'</span><span class="s">context_word</span><span class="sh">'</span><span class="p">)</span>

<span class="c1"># layers
</span><span class="n">embeddings</span> <span class="o">=</span> <span class="n">tf</span><span class="p">.</span><span class="n">keras</span><span class="p">.</span><span class="n">layers</span><span class="p">.</span><span class="nc">Embedding</span><span class="p">(</span><span class="n">input_dim</span><span class="o">=</span><span class="n">VOCAB_SIZE</span><span class="p">,</span> <span class="n">output_dim</span><span class="o">=</span><span class="mi">2</span><span class="p">,</span> <span class="n">name</span><span class="o">=</span><span class="sh">'</span><span class="s">embeddings</span><span class="sh">'</span><span class="p">)</span>
<span class="n">dot_prod</span> <span class="o">=</span> <span class="n">tf</span><span class="p">.</span><span class="n">keras</span><span class="p">.</span><span class="n">layers</span><span class="p">.</span><span class="nc">Dot</span><span class="p">(</span><span class="n">axes</span><span class="o">=</span><span class="mi">2</span><span class="p">,</span> <span class="n">normalize</span><span class="o">=</span><span class="bp">True</span><span class="p">,</span> <span class="n">name</span><span class="o">=</span><span class="sh">'</span><span class="s">dot_product</span><span class="sh">'</span><span class="p">)</span>
<span class="c1"># graph
</span><span class="n">content_embedding</span> <span class="o">=</span> <span class="nf">embeddings</span><span class="p">(</span><span class="n">content_input</span><span class="p">)</span>
<span class="n">context_embedding</span> <span class="o">=</span> <span class="nf">embeddings</span><span class="p">(</span><span class="n">context_input</span><span class="p">)</span>

<span class="n">cosine_sim</span> <span class="o">=</span> <span class="n">tf</span><span class="p">.</span><span class="n">keras</span><span class="p">.</span><span class="n">layers</span><span class="p">.</span><span class="nc">Flatten</span><span class="p">(</span><span class="n">name</span><span class="o">=</span><span class="sh">'</span><span class="s">flatten</span><span class="sh">'</span><span class="p">)(</span><span class="nf">dot_prod</span><span class="p">([</span><span class="n">content_embedding</span><span class="p">,</span> <span class="n">context_embedding</span><span class="p">]))</span>
<span class="n">dense_out</span> <span class="o">=</span> <span class="n">tf</span><span class="p">.</span><span class="n">keras</span><span class="p">.</span><span class="n">layers</span><span class="p">.</span><span class="nc">Dense</span><span class="p">(</span><span class="mi">1</span><span class="p">,</span> <span class="n">activation</span><span class="o">=</span><span class="sh">'</span><span class="s">sigmoid</span><span class="sh">'</span><span class="p">,</span> <span class="n">name</span><span class="o">=</span><span class="sh">'</span><span class="s">sigmoid_out</span><span class="sh">'</span><span class="p">)(</span><span class="n">cosine_sim</span><span class="p">)</span>

<span class="c1"># model
</span><span class="n">model</span> <span class="o">=</span> <span class="n">tf</span><span class="p">.</span><span class="n">keras</span><span class="p">.</span><span class="n">models</span><span class="p">.</span><span class="nc">Model</span><span class="p">(</span><span class="n">inputs</span><span class="o">=</span><span class="p">[</span><span class="n">content_input</span><span class="p">,</span> <span class="n">context_input</span><span class="p">],</span> <span class="n">outputs</span><span class="o">=</span><span class="p">[</span><span class="n">dense_out</span><span class="p">])</span>

<span class="n">DECAY_RATE</span> <span class="o">=</span> <span class="mf">5e-6</span>
<span class="n">LR</span> <span class="o">=</span> <span class="mf">0.1</span>

<span class="n">optimiser</span> <span class="o">=</span> <span class="n">tf</span><span class="p">.</span><span class="n">keras</span><span class="p">.</span><span class="n">optimizers</span><span class="p">.</span><span class="nc">SGD</span><span class="p">(</span><span class="n">learning_rate</span><span class="o">=</span><span class="n">LR</span><span class="p">,</span> <span class="n">decay</span><span class="o">=</span><span class="n">DECAY_RATE</span><span class="p">)</span>
<span class="n">model</span><span class="p">.</span><span class="nf">compile</span><span class="p">(</span><span class="n">loss</span><span class="o">=</span><span class="sh">'</span><span class="s">binary_crossentropy</span><span class="sh">'</span><span class="p">,</span> <span class="n">optimizer</span><span class="o">=</span><span class="n">optimiser</span><span class="p">,</span> <span class="n">metrics</span><span class="o">=</span><span class="p">[</span><span class="sh">'</span><span class="s">accuracy</span><span class="sh">'</span><span class="p">])</span>
</code></pre></div></div>

<p>This is what the above model looks like:</p>

<p><img src="/assets/post_images/2020-05-25-learning-to-rank-part-1/model_schema.png" alt="" /></p>
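
<p>To make the steps in that diagram concrete, here’s a plain-Python sketch of the forward pass for a single pair of words. The randomly filled <code class="language-plaintext highlighter-rouge">embedding_matrix</code> and the <code class="language-plaintext highlighter-rouge">weight</code> and <code class="language-plaintext highlighter-rouge">bias</code> defaults are placeholders I’ve made up for illustration. In the real model these values are learnt, and Keras guards against zero-length vectors, which this sketch doesn’t:</p>

```python
import math
import random

rng = random.Random(0)
VOCAB_SIZE, EMBEDDING_DIM = 9, 2

# One row per word index (row 0 is the reserved index), filled with small
# random values as a stand-in for freshly initialised embeddings.
embedding_matrix = [
    [rng.uniform(-0.05, 0.05) for _ in range(EMBEDDING_DIM)]
    for _ in range(VOCAB_SIZE)
]


def cosine_similarity(u, v):
    """Dot product of the two vectors after scaling each to unit length."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    return dot / (norm_u * norm_v)


def predict(content_idx, context_idx, weight=1.0, bias=0.0):
    """Probability that the two words belong together."""
    u = embedding_matrix[content_idx]  # embedding lookup
    v = embedding_matrix[context_idx]
    cos = cosine_similarity(u, v)  # always between -1 and 1
    # The final layer applies one weight and one bias to this single number,
    # then squashes the result with a sigmoid.
    return 1.0 / (1.0 + math.exp(-(weight * cos + bias)))


print(predict(3, 1))  # 'snoopy' and 'dog' before any training
```

<p>Since cosine similarity ignores the length of the vectors, the only way for training to raise this probability for a positive pair is to rotate the two vectors towards each other.</p>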

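<p>We train the model one epoch at a time, saving a plot of our embedding vectors after each epoch. The first pass through the loop skips training so that we also capture the randomly initialised embeddings:</p>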

<div class="language-python highlighter-rouge"><div class="highlight"><pre class="highlight"><code><span class="n">loss_hist</span> <span class="o">=</span> <span class="p">[]</span>

<span class="k">for</span> <span class="n">i</span> <span class="ow">in</span> <span class="nf">range</span><span class="p">(</span><span class="mi">20</span><span class="p">):</span>
    
    <span class="k">if</span> <span class="n">i</span> <span class="o">&gt;</span> <span class="mi">0</span><span class="p">:</span>
        
        <span class="n">content_words</span><span class="p">,</span> <span class="n">context_words</span><span class="p">,</span> <span class="n">labels</span> <span class="o">=</span> <span class="nf">make_skipgrams</span><span class="p">()</span>
        
        <span class="n">hist</span> <span class="o">=</span> <span class="n">model</span><span class="p">.</span><span class="nf">fit</span><span class="p">([</span><span class="n">content_words</span><span class="p">,</span> <span class="n">context_words</span><span class="p">],</span> <span class="n">labels</span><span class="p">,</span> <span class="n">epochs</span><span class="o">=</span><span class="mi">1</span><span class="p">,</span> <span class="n">verbose</span><span class="o">=</span><span class="mi">0</span><span class="p">)</span>
        <span class="nf">print</span><span class="p">(</span><span class="sa">f</span><span class="sh">"</span><span class="s">loss: </span><span class="si">{</span><span class="n">hist</span><span class="p">.</span><span class="n">history</span><span class="p">[</span><span class="sh">'</span><span class="s">loss</span><span class="sh">'</span><span class="p">][</span><span class="o">-</span><span class="mi">1</span><span class="p">]</span><span class="si">:</span><span class="p">.</span><span class="mi">4</span><span class="n">f</span><span class="si">}</span><span class="sh">"</span><span class="p">)</span>
        <span class="n">loss_hist</span><span class="p">.</span><span class="nf">extend</span><span class="p">(</span><span class="n">hist</span><span class="p">.</span><span class="n">history</span><span class="p">[</span><span class="sh">'</span><span class="s">loss</span><span class="sh">'</span><span class="p">])</span>
    
    <span class="n">embedding_vectors</span> <span class="o">=</span> <span class="n">np</span><span class="p">.</span><span class="nf">array</span><span class="p">(</span><span class="n">embeddings</span><span class="p">.</span><span class="n">weights</span><span class="p">[</span><span class="mi">0</span><span class="p">].</span><span class="nf">numpy</span><span class="p">())</span>

    <span class="n">fig</span><span class="p">,</span> <span class="n">ax</span> <span class="o">=</span> <span class="n">plt</span><span class="p">.</span><span class="nf">subplots</span><span class="p">(</span><span class="n">figsize</span><span class="o">=</span><span class="p">(</span><span class="mi">10</span><span class="p">,</span><span class="mi">10</span><span class="p">))</span>

    <span class="n">ax</span><span class="p">.</span><span class="nf">scatter</span><span class="p">(</span><span class="n">embedding_vectors</span><span class="p">[</span><span class="mi">1</span><span class="p">:,</span> <span class="mi">0</span><span class="p">],</span> <span class="n">embedding_vectors</span><span class="p">[</span><span class="mi">1</span><span class="p">:,</span> <span class="mi">1</span><span class="p">],</span>  <span class="n">c</span><span class="o">=</span><span class="sh">'</span><span class="s">white</span><span class="sh">'</span><span class="p">)</span>

    <span class="k">for</span> <span class="n">idx</span><span class="p">,</span> <span class="n">word</span> <span class="ow">in</span> <span class="nf">sorted</span><span class="p">(</span><span class="n">tokeniser</span><span class="p">.</span><span class="n">index_word</span><span class="p">.</span><span class="nf">items</span><span class="p">()):</span>
        <span class="n">x_coord</span> <span class="o">=</span> <span class="n">embedding_vectors</span><span class="p">[</span><span class="n">idx</span><span class="p">,</span> <span class="mi">0</span><span class="p">]</span>
        <span class="n">y_coord</span> <span class="o">=</span> <span class="n">embedding_vectors</span><span class="p">[</span><span class="n">idx</span><span class="p">,</span> <span class="mi">1</span><span class="p">]</span>

        <span class="n">ax</span><span class="p">.</span><span class="nf">annotate</span><span class="p">(</span>
            <span class="n">word</span><span class="p">,</span> 
            <span class="p">(</span><span class="n">x_coord</span><span class="p">,</span> <span class="n">y_coord</span><span class="p">),</span> 
            <span class="n">horizontalalignment</span><span class="o">=</span><span class="sh">'</span><span class="s">center</span><span class="sh">'</span><span class="p">,</span>
            <span class="n">verticalalignment</span><span class="o">=</span><span class="sh">'</span><span class="s">center</span><span class="sh">'</span><span class="p">,</span>
            <span class="n">size</span><span class="o">=</span><span class="mi">20</span><span class="p">,</span>
            <span class="n">alpha</span><span class="o">=</span><span class="mf">0.7</span>
        <span class="p">)</span>
        
    <span class="n">ax</span><span class="p">.</span><span class="nf">set_title</span><span class="p">(</span><span class="sa">f</span><span class="sh">"</span><span class="s">iteration-</span><span class="si">{</span><span class="n">i</span><span class="o">+</span><span class="mi">1</span><span class="si">}</span><span class="sh">"</span><span class="p">)</span>

    <span class="n">plt</span><span class="p">.</span><span class="nf">savefig</span><span class="p">(</span><span class="sa">f</span><span class="sh">"</span><span class="s">iteration-</span><span class="si">{</span><span class="n">i</span><span class="o">+</span><span class="mi">1</span><span class="si">:</span><span class="mi">03</span><span class="n">d</span><span class="si">}</span><span class="s">.jpg</span><span class="sh">"</span><span class="p">)</span>
    <span class="n">plt</span><span class="p">.</span><span class="nf">close</span><span class="p">(</span><span class="n">fig</span><span class="p">)</span>  <span class="c1"># release the figure before the next iteration</span>

</code></pre></div></div>

<p>The result is a bunch of embeddings that move through space!</p>

<p><img src="/assets/post_images/2020-05-25-learning-to-rank-part-1/embeddings.gif" alt="" width="600px" class="align-center" /></p>

<p>We start out with our words scattered randomly throughout our two-dimensional space. Over the course of training, our model learns to place the words in our pairs closer together!</p>

<h1 id="conclusion">Conclusion</h1>

<p>We’ve begun our ‘learning to rank’ adventure. We briefly explored a motivating example. We then spent some time exploring word embeddings so that we can use the words in our queries and documents as features in our upcoming model.</p>

<p>Next time, we will be exploring ListNet and implementing it.</p>

<p>Let’s do this!</p>

<p>Justin</p>]]></content><author><name> Hello, world!&lt;br&gt; My name is Justin.</name></author><category term="[&quot;Machine-learning&quot;, &quot;TensorFlow&quot;, &quot;Ranking&quot;, &quot;Deep-learning&quot;]" /><summary type="html"><![CDATA[The first post in an epic to learn to rank lists of things!]]></summary></entry><entry><title type="html">Learn to code for data: a pragmatist’s guide</title><link href="https://embracingtherandom.com/coding/data-analysis/learning/learn-to-code-for-data/" rel="alternate" type="text/html" title="Learn to code for data: a pragmatist’s guide" /><published>2020-04-30T07:00:00+10:00</published><updated>2020-04-30T07:00:00+10:00</updated><id>https://embracingtherandom.com/coding/data-analysis/learning/learn-to-code-for-data</id><content type="html" xml:base="https://embracingtherandom.com/coding/data-analysis/learning/learn-to-code-for-data/"><![CDATA[<blockquote>
  <p>A recipe to go from spreadsheet to code</p>
</blockquote>

<p><img src="/assets/post_images/2020-04-30-learn-to-code-for-data/header.jpg" alt="" /></p>

<p>Spreadsheet applications like Microsoft Excel and Google Sheets are great. They’re hard to beat when you want to perform simple calculations or complete some financial modelling.</p>

<p>However, there comes a point when you should ditch the spreadsheet and go with another solution. I’ve seen some hideous spreadsheet reports in my time! When you start using your spreadsheet as a report, a database, and as a data transformation tool, you need to stop! You have gone too far. These spreadsheets are nightmares to maintain and debug.</p>

<p>So what’s the alternative? The alternative is to <strong>learn to code for data analysis!</strong> You’ve no reason to be scared of code. If you already know how to use a spreadsheet program, you have plenty of analogies you can draw from to make this an easy process. I’ve mentored several analysts through these stages of their development before. I’ve seen the approach outlined in this article work time and time again!</p>

<p>Let’s get started!</p>

<h1 id="prerequisite-learn-how-to-use-a-spreadsheet-to-analyse-data">Prerequisite: Learn how to use a spreadsheet to analyse data</h1>

<p>If you don’t know how to use a spreadsheet application, this is where you should start. This is where you’ll learn the data-related concepts that’ll make your transition into ‘coding for data’ easier.</p>

<h2 id="starting-with-zero-experience">Starting with zero experience</h2>

<p>If you’re starting with zero experience, then I’d recommend taking a free spreadsheet course on YouTube. There are many of these so just pick one and complete it.</p>

<h2 id="starting-with-some-experience">Starting with some experience</h2>

<p>If you’re starting with some experience, then you just need to practise until you feel comfortable with spreadsheets. There are many open-source data sets available online. Pick one, come up with five questions you could ask of the data, and answer them in your spreadsheet. Some questions you could ask are these:</p>

<blockquote>
  <p>“What is the average value in column X?”<br />
“How many cells are blank/contain missing values in column X?”<br />
“Sort column X in descending order.”<br />
“Create a line chart of column X.”<br />
“What is the sum of column X for each of the values in column Y?”</p>
</blockquote>

<h2 id="the-concepts-you-will-be-learning">The concepts you will be learning</h2>

<p>Regardless of your experience level, your adventures in spreadsheet land will teach you some valuable concepts that apply to ‘coding for data’:</p>

<ul>
  <li><strong>Formulas can be applied to individual cells or to a bunch of cells.</strong> This is similar to the concept of ‘vectorisation’ where we apply operations to entire arrays.</li>
  <li><strong>Data can be summarised through aggregation.</strong> Pivot tables will teach you what it means to count or sum columns by groups defined in other columns.</li>
  <li><strong>Visualisations are powerful.</strong> Pivot charts will help you understand which types of charts work for different types of data.</li>
  <li><strong>Two data sets can be ‘merged’.</strong> You can learn this through the power of formulas like ‘VLOOKUP’.</li>
</ul>

<p>Once you feel ‘fluent’ in using your spreadsheet application of choice, it’s time to graduate to the land of code!</p>
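
<p>As a small preview of what’s ahead, here’s roughly how three of the concepts above (applying a formula to a whole column, aggregating, and merging) look in code. The language isn’t the point here: the sketch happens to use Python’s standard library, and the rows, column names and prices are all made up. Charts are left out because they need an extra plotting library:</p>

```python
from collections import defaultdict

# Made-up sales data: each dict plays the role of one spreadsheet row.
rows = [
    {"region": "north", "product": "apple", "units": 10},
    {"region": "south", "product": "banana", "units": 4},
    {"region": "north", "product": "banana", "units": 6},
]

# A second 'sheet': the price of each product.
prices = {"apple": 0.5, "banana": 0.25}

# 1. One formula applied to a whole column (the idea behind vectorisation).
units_doubled = [row["units"] * 2 for row in rows]

# 2. Aggregation, like a pivot table: the sum of units for each region.
units_by_region = defaultdict(int)
for row in rows:
    units_by_region[row["region"]] += row["units"]

# 3. Merging, like VLOOKUP: pull each product's price onto its row.
merged = [{**row, "price": prices[row["product"]]} for row in rows]

print(units_doubled)
print(dict(units_by_region))
print(merged)
```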

<h1 id="going-from-spreadsheet-to-code">Going from spreadsheet to code</h1>

<h2 id="languages-to-focus-on">Languages to focus on</h2>

<p>Microsoft Office users might be thinking:</p>

<blockquote>
  <p>I use Excel heavily and I’ve heard about ‘VBA’. I heard its code can be used to automate my Excel reports. Should I learn it?</p>
</blockquote>

<p>No! Avoid VBA. I say this as someone who has written a lot of it in the past. You’ll be better off learning a ‘transferable skill’ — a skill that you can take to your next employer regardless of whether they happen to use Microsoft Office.</p>

<p>Assuming that you have no programming experience, I think that you should focus on <strong>SQL and R</strong>.</p>

<p>Learning SQL is a no-brainer. It’s the language of relational databases which are found everywhere! You must learn it.</p>

<p>But why R? Why not Python? <strong>Because it’s made for data analysis!</strong> As it’s been designed with this specific purpose in mind, it’ll allow you to spend more time analysing your data instead of having to grapple with the abstract aspects of programming languages. For example, R comes with built-in data sets like ‘mtcars’ that you can start analysing straight away. CSV files can be imported in one line by using ‘read.csv()’. You can create some (ugly) histograms using ‘hist()’. You can create scatterplots and more by using ‘plot()’. The point is, these are all built-in features of R which allow you to start analysing your data seconds after you have finished installing it!</p>

<p><strong>I want to be clear:</strong> I’m not saying that you shouldn’t learn Python. I absolutely love Python and it is my language of choice! You should absolutely learn Python. Just don’t learn it now. My ulterior motive is that by getting you to learn R first, you will be able to experience some coding-related successes right away. The hope is that these small successes will keep you, the coding student, motivated to continue to pursue the rewarding craft that is ‘coding for data’.</p>

<h3 id="theres-so-much-i-could-learn-about-these-languages-which-aspects-should-i-focus-on">There’s so much I could learn about these languages! Which aspects should I focus on?</h3>

<p>We want to be pragmatic here and focus our energy on learning things that we’re likely to use in our jobs as analysts. To come up with our list of things to focus on, let’s follow this simple recipe:</p>

<blockquote>
  <ol>
    <li>Make a list of all the spreadsheet-based reports you update on a regular basis. Add to that list all the ad hoc pieces of analysis that you’ve performed using spreadsheets over the last six months.</li>
    <li>Take a five-minute break because this can be tiring!</li>
    <li>Open one of the spreadsheets on your list.</li>
    <li>Set a timer for ten minutes.</li>
    <li>In a second list, take note of some of the formulas you’ve used in the report. Take note of the visualisations you’ve created. Also, take note of any numbers calculated using pivot tables. <strong>Be as broad as possible!</strong> There’s no need to be specific about how you performed a ‘VLOOKUP’ using exact matches in column B, while returning the values in column F. Just take note of the fact that you performed a ‘VLOOKUP’. Next to each formula, <strong>keep a tally of how many times you’ve encountered this formula across your spreadsheets.</strong> Keep going until you run out of time, or move on to the next spreadsheet if you finish before the ten minutes are up.</li>
    <li>Take a five-minute break and move on to the next spreadsheet on your list.</li>
  </ol>
</blockquote>

<p>Keep working through the list until you’ve had enough! <strong>Sort your list</strong> in descending order by the number of times each formula/visualisation/pivot table value appeared across your reports and analysis. This is the order of priority in which you should conduct your learning!</p>
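
<p>If you’d rather not keep the tally by hand, the counting and sorting can be sketched in a few lines of code. This is only an illustration: the file names and features below are made up, and I’ve used Python’s built-in <code class="language-plaintext highlighter-rouge">Counter</code> for the tally:</p>

```python
from collections import Counter

# Hypothetical notes: the features spotted in each spreadsheet reviewed.
spreadsheet_notes = {
    "monthly_sales.xlsx": ["VLOOKUP", "SUMIF", "pivot table sum", "line chart"],
    "ad_hoc_churn.xlsx": ["VLOOKUP", "COUNTIF", "pivot table count"],
    "weekly_kpis.xlsx": ["VLOOKUP", "SUMIF", "line chart"],
}

# Tally how often each feature appears across all spreadsheets.
tally = Counter(
    feature for features in spreadsheet_notes.values() for feature in features
)

# most_common() sorts by descending count: the suggested learning order.
for feature, count in tally.most_common():
    print(f"{count:>2}  {feature}")
```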

<h2 id="how-should-i-go-about-learning-these-things">How should I go about learning these things?</h2>

<p>We will divide our learning into <strong>two twenty-five-minute sessions</strong> (two Pomodoros, for practitioners of the Pomodoro Technique). <strong>Complete these two sessions before work!</strong> We are all worn out after a hard day’s work. The later in the day we leave our sessions, the less likely it is that we will complete them at all.</p>

<h3 id="for-the-first-twenty-five-minutes-we-focus-on-learning-r">For the first twenty-five minutes, we focus on learning R:</h3>

<ul>
  <li>Take the first thing on the list and learn how to do that thing in R. Prioritise learning how to do it using the friendly <strong>dplyr package</strong>. If you can’t find out how to do it using dplyr, then broaden your search to look for how to do that thing using R in general.</li>
  <li>If you have any data sets from work you could use, then use them. If not, use the built-in R data sets or any open-source data sets that you can find online.</li>
  <li>Once your twenty-five minutes are over, ask yourself whether you can apply this skill fluently. If you can, cross this first skill off your list and move on to the next item tomorrow. If you struggled, keep practising the same skill tomorrow.</li>
</ul>

<h3 id="for-the-second-twenty-five-minute-session-we-focus-on-learning-sql">For the second twenty-five-minute session, we focus on learning SQL:</h3>

<ul>
  <li>Work through the free <a href="https://sqlzoo.net/">SQLZOO</a> course.</li>
  <li>Once you’ve completed the course, import some data into <a href="https://sqliteonline.com/">this online SQL environment</a> and work on aggregating and joining your tables. <strong>Be sure not to upload any work-related data!</strong></li>
  <li>Once you can fluently aggregate and join tables, learn to apply some window functions.</li>
  <li>Once you can do all of this fluently, you can stop learning SQL. Replace the twenty-five-minute SQL session with a twenty-five-minute R session.</li>
</ul>
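
<p>To give you a feel for where this session is heading, here’s a small sketch of the three SQL skills mentioned above: aggregating, joining, and a window function. It runs the SQL against a throwaway in-memory SQLite database from Python, purely so that the example is self-contained. The tables and values are made up, and the window function assumes a reasonably recent SQLite (3.25 or later):</p>

```python
import sqlite3

conn = sqlite3.connect(":memory:")

# Two made-up tables: customers and their orders.
conn.executescript("""
    CREATE TABLE customers (customer_id INTEGER PRIMARY KEY, region TEXT);
    CREATE TABLE orders (order_id INTEGER PRIMARY KEY, customer_id INTEGER, amount REAL);
    INSERT INTO customers VALUES (1, 'north'), (2, 'south'), (3, 'north');
    INSERT INTO orders VALUES (1, 1, 20.0), (2, 1, 30.0), (3, 2, 15.0), (4, 3, 5.0);
""")

# Join + aggregate: the total order amount for each region.
totals = conn.execute("""
    SELECT c.region, SUM(o.amount) AS total
    FROM orders AS o
    JOIN customers AS c ON c.customer_id = o.customer_id
    GROUP BY c.region
    ORDER BY c.region
""").fetchall()

# Window function: a running total of each customer's orders.
running = conn.execute("""
    SELECT customer_id, order_id,
           SUM(amount) OVER (PARTITION BY customer_id ORDER BY order_id) AS running_total
    FROM orders
    ORDER BY customer_id, order_id
""").fetchall()

print(totals)
print(running)
conn.close()
```

<p>The join and the <code class="language-plaintext highlighter-rouge">GROUP BY</code> are the SQL cousins of ‘VLOOKUP’ and pivot tables. The window function adds something spreadsheets make awkward: a calculation over related rows that doesn’t collapse them.</p>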

<p>As you become fluent in R and SQL, <strong>look for ways in which you can start using your new skills at work.</strong> Feel how much more power you have over your data now that you know how to code!</p>

<h2 id="ive-completed-my-list-what-should-i-do-now">I’ve completed my list! What should I do now?</h2>
<p>Don’t get stuck doing the basic things. Keep pushing yourself. There is more to life than pulling lists of data and running reports!</p>

<p>Start learning some fancier things. For example, work through <a href="https://r4ds.had.co.nz/introduction.html">this free book</a> by Garrett Grolemund and R legend Hadley Wickham. Find out what <a href="https://www.kaggle.com/">Kaggle</a> is. Subscribe to <a href="https://www.r-bloggers.com/">R-bloggers</a> and learn from your fellow R users.</p>

<p><strong>Work on R daily for a solid six months.</strong> Once you feel like you are fluent in R, start thinking about learning Python. Now that you have two programming languages under your belt, your move to Python will be much easier!</p>

<h1 id="conclusion">Conclusion</h1>

<p>Watching the colleagues I have mentored grow from spreadsheet data analysts into coding data analysts has given me some of the most rewarding moments of my career so far. Sadly, I can’t be there to personally guide you through your transformation! I hope that this guide gives you enough information to take your first step towards <strong>levelling up</strong> your skills and becoming a more powerful data analyst.</p>

<p>You can do it!</p>

<p>Justin</p>]]></content><author><name> Hello, world!&lt;br&gt; My name is Justin.</name></author><category term="[&quot;Coding&quot;, &quot;Data-analysis&quot;, &quot;Learning&quot;]" /><summary type="html"><![CDATA[A recipe to go from spreadsheet to code]]></summary></entry></feed>