<?xml version="1.0" encoding="utf-8"?><feed xmlns="http://www.w3.org/2005/Atom" ><generator uri="https://jekyllrb.com/" version="3.10.0">Jekyll</generator><link href="https://ylam21.github.io/feed.xml" rel="self" type="application/atom+xml" /><link href="https://ylam21.github.io/" rel="alternate" type="text/html" /><updated>2026-05-02T10:24:59+00:00</updated><id>https://ylam21.github.io/feed.xml</id><title type="html">Ondřej Malý</title><subtitle>My Github Pages Website</subtitle><entry><title type="html">Measuring data throughput of get_next_line</title><link href="https://ylam21.github.io/2026/04/27/Measuring-data-throughput-of-get_next_line.html" rel="alternate" type="text/html" title="Measuring data throughput of get_next_line" /><published>2026-04-27T00:00:00+00:00</published><updated>2026-04-27T00:00:00+00:00</updated><id>https://ylam21.github.io/2026/04/27/Measuring%20data%20throughput%20of%20get_next_line</id><content type="html" xml:base="https://ylam21.github.io/2026/04/27/Measuring-data-throughput-of-get_next_line.html"><![CDATA[<h1 id="measuring-data-throughput-of-get_next_line">Measuring data throughput of get_next_line</h1>
<p><strong>NOTE</strong>: This post is primarily for 42 students, since they have probably written their implementation of get_next_line already.<br />
The important fact is that in 42 projects you are <strong>not allowed</strong> to find out the size of your file with any external function (such as ‘stat’ on Linux). To read from and write to a file you are allowed to use the <code class="language-plaintext highlighter-rouge">read</code> and <code class="language-plaintext highlighter-rouge">write</code> functions declared in <code class="language-plaintext highlighter-rouge">unistd.h</code>.<br />
The <code class="language-plaintext highlighter-rouge">get_next_line</code> function should return the next line from a file each time it is called.<br /></p>
<h2 id="reading-from-a-file-with-unknown-size">Reading from a file with unknown size</h2>

<p>Since we do not know the size of the file, the logical approach is to read it in chunks until we reach the end of the file.<br /> In this article I want to figure out the ideal size of those chunks. In our case the BUFFER_SIZE macro defines the size of one chunk, so every call to <code class="language-plaintext highlighter-rouge">read</code> looks like this:<br /></p>
<div class="language-c highlighter-rouge"><div class="highlight"><pre class="highlight"><code><span class="n">read_bytes</span> <span class="o">=</span> <span class="n">read</span><span class="p">(</span><span class="n">fd</span><span class="p">,</span> <span class="n">buffer</span><span class="p">,</span> <span class="n">BUFFER_SIZE</span><span class="p">);</span>
</code></pre></div></div>
<p>In theory, with a file size of TOTAL_SIZE and a buffer size of BUFFER_SIZE, <code class="language-plaintext highlighter-rouge">read</code> should
be called N + 1 times, where <br /> N = TOTAL_SIZE / BUFFER_SIZE (rounded up when the division leaves a remainder). The extra call is the one that returns 0 at the end of the file, after which we return an empty string.<br />
To support this statement, let’s test it with my main loop, which looks like this:<br /></p>
<div class="language-c highlighter-rouge"><div class="highlight"><pre class="highlight"><code><span class="n">t_arena_temp</span> <span class="n">temp</span> <span class="o">=</span> <span class="n">temp_begin</span><span class="p">(</span><span class="n">arena</span><span class="p">);</span>
<span class="n">t_string</span> <span class="n">line</span> <span class="o">=</span> <span class="n">get_next_line_from_fd</span><span class="p">(</span><span class="o">&amp;</span><span class="n">ctx</span><span class="p">);</span>
<span class="k">while</span> <span class="p">(</span><span class="n">line</span><span class="p">.</span><span class="n">size</span><span class="p">)</span>
<span class="p">{</span>
    <span class="c1">// write(STDOUT_FILENO, line.str, line.size);</span>
    <span class="n">temp_end</span><span class="p">(</span><span class="n">temp</span><span class="p">);</span>
    <span class="n">line</span> <span class="o">=</span> <span class="n">get_next_line_from_fd</span><span class="p">(</span><span class="o">&amp;</span><span class="n">ctx</span><span class="p">);</span>
<span class="p">}</span>
<span class="n">temp_end</span><span class="p">(</span><span class="n">temp</span><span class="p">);</span>
</code></pre></div></div>
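<p>The N + 1 estimate above can be sketched as a small helper. This is only an illustration: <code class="language-plaintext highlighter-rouge">expected_read_calls</code> is a hypothetical name that is not part of my project, and it assumes a regular file where every <code class="language-plaintext highlighter-rouge">read</code> returns a full chunk until the data runs out.</p>

```c
#include <assert.h>
#include <stddef.h>

/*
 * Hypothetical helper (not part of the original project): predicts how many
 * read() calls are needed to drain a regular file of total_size bytes.
 * Assumption: every read() returns a full chunk until the data runs out.
 */
size_t expected_read_calls(size_t total_size, size_t buffer_size)
{
    assert(buffer_size > 0);
    size_t data_reads = total_size / buffer_size;
    if (total_size % buffer_size != 0)
        data_reads += 1;       /* trailing partial chunk */
    return data_reads + 1;     /* last read() returns 0 at end of file */
}
```

<p>For example, a 52428800-byte file read with a 4096-byte buffer gives 12800 + 1 = 12801 calls.</p>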
<p>ASIDE:
A short description of my <code class="language-plaintext highlighter-rouge">get_next_line</code> implementation: for memory management I am using an arena allocator, and for string processing I operate on <code class="language-plaintext highlighter-rouge">t_string</code> structs only. They group a pointer to the string with the size/length of that string. The source code for this function is at the very end of this article.<br /><br />
Let’s count how many times the <code class="language-plaintext highlighter-rouge">read</code> function is called:</p>
<div class="language-sh highlighter-rouge"><div class="highlight"><pre class="highlight"><code><span class="nv">$ </span>gcc <span class="nt">-Wall</span> <span class="nt">-Wextra</span> <span class="nt">-DBUFFER_SIZE</span><span class="o">=</span>4096 <span class="nt">-DENABLE_PROFILER</span><span class="o">=</span>1 main.c
<span class="nv">$ </span>./a.out test_files/test_100.txt
</code></pre></div></div>
<div class="language-sh highlighter-rouge"><div class="highlight"><pre class="highlight"><code>filename: test_files/test_100.txt
size: 52428800 bytes

<span class="nt">---</span> PROFILING RESULTS <span class="nt">---</span>
Total <span class="nb">time</span>: 63 milliseconds <span class="o">(</span>CPU freq: 3193934535<span class="o">)</span>
read_in_gnl         : 39005952     cycles <span class="o">[</span>19.18%] | No children blocks  | Hits 12801
get_next_line_loop       : 203198528    cycles <span class="o">[</span>99.92%] | Exclusive: <span class="o">[</span>80.74%] | Hits 1
<span class="nt">-------------------------</span>
</code></pre></div></div>
<p>We get 12801 hits for our <code class="language-plaintext highlighter-rouge">read</code> - that is how many times the function was called.
BUFFER_SIZE is 4096 bytes and our file is 52428800 bytes:<br />
52428800 / 4096 = 12800 … 12800 + 1 = <strong>12801</strong>.<br /><br />
The profiling results also capture the relative time spent in the <code class="language-plaintext highlighter-rouge">read_in_gnl</code> block - <strong>19.18%</strong>.
We can see that the calls to <code class="language-plaintext highlighter-rouge">read</code> carry a noticeable performance penalty.<br />
One would predict that reading the same file with a smaller BUFFER_SIZE gets us more hits for the <code class="language-plaintext highlighter-rouge">read</code> function, and therefore a bigger penalty and more relative time spent in the <code class="language-plaintext highlighter-rouge">read_in_gnl</code> block. Let’s decrease our BUFFER_SIZE to <strong>1024</strong> and then to <strong>64</strong> and see what happens:<br /></p>
<div class="language-sh highlighter-rouge"><div class="highlight"><pre class="highlight"><code><span class="nv">$ </span>gcc <span class="nt">-Wall</span> <span class="nt">-Wextra</span> <span class="nt">-DBUFFER_SIZE</span><span class="o">=</span>1024 <span class="nt">-DENABLE_PROFILER</span><span class="o">=</span>1 main.c
<span class="nv">$ </span>./a.out test_files/test_100.txt
</code></pre></div></div>
<div class="language-sh highlighter-rouge"><div class="highlight"><pre class="highlight"><code>filename: test_files/test_100.txt
size: 52428800 bytes

<span class="nt">---</span> PROFILING RESULTS <span class="nt">---</span>
Total <span class="nb">time</span>: 85 milliseconds <span class="o">(</span>CPU freq: 3193953279<span class="o">)</span>
read_in_gnl         : 110509184    cycles <span class="o">[</span>40.64%] | No children blocks  | Hits 51201
get_next_line       : 271725472    cycles <span class="o">[</span>99.92%] | Exclusive: <span class="o">[</span>59.28%] | Hits 1
<span class="nt">-------------------------</span>
</code></pre></div></div>
<p>Ouch, <strong>40.64%</strong>! Let’s decrease the BUFFER_SIZE even more:</p>
<div class="language-sh highlighter-rouge"><div class="highlight"><pre class="highlight"><code><span class="nv">$ </span>gcc <span class="nt">-Wall</span> <span class="nt">-Wextra</span> <span class="nt">-DBUFFER_SIZE</span><span class="o">=</span>64 <span class="nt">-DENABLE_PROFILER</span><span class="o">=</span>1 main.c
<span class="nv">$ </span>./a.out test_files/test_100.txt
</code></pre></div></div>
<div class="language-sh highlighter-rouge"><div class="highlight"><pre class="highlight"><code>filename: test_files/test_100.txt
size: 52428800 bytes

<span class="nt">---</span> PROFILING RESULTS <span class="nt">---</span>
Total <span class="nb">time</span>: 518 milliseconds <span class="o">(</span>CPU freq: 3193957727<span class="o">)</span>
read_in_gnl         : 1450925024   cycles <span class="o">[</span>87.58%] | No children blocks  | Hits 819201
get_next_line       : 1656475264   cycles <span class="o">[</span>99.99%] | Exclusive: <span class="o">[</span>12.41%] | Hits 1
<span class="nt">-------------------------</span>
</code></pre></div></div>
<p>Now <strong>87.58%</strong>!<br />
Alright, we can clearly see a pattern. <strong>We want as few <code class="language-plaintext highlighter-rouge">read</code> calls as possible to read the entire file faster</strong>. That means increasing BUFFER_SIZE. But by how much?<br />
The BUFFER_SIZE we started with is 4096. Let’s increase it:<br />
To 16384:<br /></p>
<div class="language-sh highlighter-rouge"><div class="highlight"><pre class="highlight"><code><span class="nv">$ </span>gcc <span class="nt">-Wall</span> <span class="nt">-Wextra</span> <span class="nt">-DBUFFER_SIZE</span><span class="o">=</span>16384 <span class="nt">-DENABLE_PROFILER</span><span class="o">=</span>1 main.c
<span class="nv">$ </span>./a.out test_files/test_100.txt
</code></pre></div></div>
<div class="language-sh highlighter-rouge"><div class="highlight"><pre class="highlight"><code>filename: test_files/test_100.txt
size: 52428800 bytes

<span class="nt">---</span> PROFILING RESULTS <span class="nt">---</span>
Total <span class="nb">time</span>: 55 milliseconds <span class="o">(</span>CPU freq: 3193975418<span class="o">)</span>
read_in_gnl         : 22001472     cycles <span class="o">[</span>12.51%] | No children blocks  | Hits 3201
get_next_line       : 175700864    cycles <span class="o">[</span>99.87%] | Exclusive: <span class="o">[</span>87.37%] | Hits 1
<span class="nt">-------------------------</span>
</code></pre></div></div>
<p>To 65536:<br /></p>
<div class="language-sh highlighter-rouge"><div class="highlight"><pre class="highlight"><code><span class="nv">$ </span>gcc <span class="nt">-Wall</span> <span class="nt">-Wextra</span> <span class="nt">-DBUFFER_SIZE</span><span class="o">=</span>65536 <span class="nt">-DENABLE_PROFILER</span><span class="o">=</span>1 main.c
<span class="nv">$ </span>./a.out test_files/test_100.txt
</code></pre></div></div>
<div class="language-sh highlighter-rouge"><div class="highlight"><pre class="highlight"><code>filename: test_files/test_100.txt
size: 52428800 bytes

<span class="nt">---</span> PROFILING RESULTS <span class="nt">---</span>
Total <span class="nb">time</span>: 54 milliseconds <span class="o">(</span>CPU freq: 3193963706<span class="o">)</span>
read_in_gnl         : 15507872     cycles <span class="o">[</span> 8.99%] | No children blocks  | Hits 801
get_next_line       : 172278016    cycles <span class="o">[</span>99.88%] | Exclusive: <span class="o">[</span>90.89%] | Hits 1
<span class="nt">-------------------------</span>
</code></pre></div></div>
<p><strong>8.99%</strong>, much better!</p>
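<p>To relate the raw cycle counts in the outputs above to milliseconds and percentages, here is a minimal sketch. The helper names are hypothetical, and I am assuming the printed times are simply cycles divided by the reported CPU frequency, which is consistent with the numbers shown.</p>

```c
#include <stdint.h>

/* Converts a cycle count to milliseconds, given the CPU frequency in Hz. */
double cycles_to_ms(uint64_t cycles, uint64_t cpu_freq_hz)
{
    return 1000.0 * (double)cycles / (double)cpu_freq_hz;
}

/* Share of one block relative to another, in percent. */
double percent_of(uint64_t part_cycles, uint64_t whole_cycles)
{
    return 100.0 * (double)part_cycles / (double)whole_cycles;
}
```

<p>With the BUFFER_SIZE=4096 run, <code class="language-plaintext highlighter-rouge">cycles_to_ms(203198528, 3193934535)</code> comes out at a little over 63 milliseconds, and <code class="language-plaintext highlighter-rouge">percent_of(39005952, 203198528)</code> at roughly 19.2%. The second value is measured against the loop block rather than the whole program, so it is slightly higher than the 19.18% in the output.</p>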
<h2 id="the-repetition-tester">The Repetition Tester</h2>
<p>To produce more reliable test results I will test my code with a repetition tester.
The repetition tester runs our <code class="language-plaintext highlighter-rouge">get_next_line_loop</code> many times and prints a result whenever a run sets a new (lower) minimum execution time, like this:<br /></p>
<div class="language-sh highlighter-rouge"><div class="highlight"><pre class="highlight"><code><span class="nv">$ </span>gcc <span class="nt">-Wall</span> <span class="nt">-Wextra</span> <span class="nt">-DBUFFER_SIZE</span><span class="o">=</span>4096 <span class="nt">-DENABLE_PROFILER</span><span class="o">=</span>1 reptester/reptest_main.c
<span class="nv">$ </span><span class="nb">sudo</span> ./a.out
</code></pre></div></div>
<div class="language-sh highlighter-rouge"><div class="highlight"><pre class="highlight"><code><span class="nt">---New</span> Test Wave: get_next_line_loop test_files/test_100.txt <span class="nv">BUFFER_SIZE</span><span class="o">=</span>4096 <span class="nt">---</span>
MIN: 195161120 <span class="o">(</span>61ms<span class="o">)</span>   50.000Mib at 818.285Mib/s
MIN: 191886912 <span class="o">(</span>60ms<span class="o">)</span>   50.000Mib at 832.247Mib/s
MIN: 190069728 <span class="o">(</span>60ms<span class="o">)</span>   50.000Mib at 840.204Mib/s
MIN: 184958496 <span class="o">(</span>58ms<span class="o">)</span>   50.000Mib at 863.423Mib/s
MIN: 184048832 <span class="o">(</span>58ms<span class="o">)</span>   50.000Mib at 867.690Mib/s
MIN: 181250880 <span class="o">(</span>57ms<span class="o">)</span>   50.000Mib at 881.085Mib/s
MIN: 179726880 <span class="o">(</span>56ms<span class="o">)</span>   50.000Mib at 888.556Mib/s
	MIN: 179726880 <span class="o">(</span>56ms<span class="o">)</span>   50.000Mib at 888.556Mib/s
	MAX: 200777056 <span class="o">(</span>63ms<span class="o">)</span>   50.000Mib at 795.397Mib/s
	AVG: 183045094 <span class="o">(</span>57ms<span class="o">)</span>   50.000Mib at 872.448Mib/s
<span class="nt">---Testing</span> Done.---
</code></pre></div></div>
<p><strong>NOTE</strong>: the tester reports the file size in mebibytes (it prints the unit as “Mib”). The file is the same as before: 52428800 bytes, which is exactly 50 × 1024 × 1024 bytes.<br />
You can read the Repetition Tester source code <a href="https://github.com/ylam21/haversine/tree/main/src/reptester">here</a>.</p>
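<p>The bookkeeping behind such output can be sketched roughly as follows. This is not the code of the linked repetition tester: <code class="language-plaintext highlighter-rouge">t_rep_stats</code>, <code class="language-plaintext highlighter-rouge">rep_stats_add</code> and <code class="language-plaintext highlighter-rouge">throughput_mib_per_s</code> are hypothetical names, and the sketch only shows how MIN/MAX/AVG and the throughput figure could be derived from per-run cycle counts.</p>

```c
#include <stdint.h>

/* Hypothetical per-wave statistics; start from a zero-initialized struct. */
typedef struct s_rep_stats
{
    uint64_t min_cycles;
    uint64_t max_cycles;
    uint64_t total_cycles;   /* divide by count to get the average */
    uint64_t count;
}   t_rep_stats;

/* Records one run; returns 1 when the run sets a new minimum, else 0. */
int rep_stats_add(t_rep_stats *stats, uint64_t cycles)
{
    int new_min = (stats->count == 0 || cycles < stats->min_cycles);

    if (new_min)
        stats->min_cycles = cycles;
    if (stats->count == 0 || cycles > stats->max_cycles)
        stats->max_cycles = cycles;
    stats->total_cycles += cycles;
    stats->count += 1;
    return new_min;
}

/* Throughput in MiB/s for `bytes` processed in `cycles` at `cpu_freq_hz`. */
double throughput_mib_per_s(uint64_t bytes, uint64_t cycles, uint64_t cpu_freq_hz)
{
    double seconds = (double)cycles / (double)cpu_freq_hz;

    return ((double)bytes / (1024.0 * 1024.0)) / seconds;
}
```

<p>A caller would print a MIN line each time <code class="language-plaintext highlighter-rouge">rep_stats_add</code> returns 1. Taking the CPU frequency from the earlier profiler run (an assumption, since the tester output does not print it), 52428800 bytes in 179726880 cycles works out to between 888 and 889 MiB/s, in line with the best run above.</p>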

<h2 id="the-file-format---characters-per-line">The file format - characters per line</h2>
<p>What if the file format changes?
Until now, we have been testing with a ~50MiB file called <code class="language-plaintext highlighter-rouge">test_100.txt</code>; the ‘100’ in its name specifies the number of characters per line. Let’s introduce a few more test files of similar size, ~50MiB.<br /></p>
<div class="language-sh highlighter-rouge"><div class="highlight"><pre class="highlight"><code><span class="nv">$ </span><span class="nb">ls</span> <span class="nt">-la</span> test_files
total 204812
drwxrwxr-x 2 omaly omaly     4096 Apr 28 20:30 <span class="nb">.</span>
drwxrwxr-x 7 omaly omaly     4096 Apr 30 17:59 ..
<span class="nt">-rwxrwxr-x</span> 1 omaly omaly 52420000 Apr 28 20:38 test_10000.txt
<span class="nt">-rwxrwxr-x</span> 1 omaly omaly 52428000 Apr 28 20:38 test_1000.txt
<span class="nt">-rwxrwxr-x</span> 1 omaly omaly 52428800 Apr 28 20:38 test_100.txt
<span class="nt">-rwxrwxr-x</span> 1 omaly omaly 52428800 Apr 28 20:38 test_10.txt
</code></pre></div></div>
<p>Now we also have files with 10, 1000 and 10000 characters per line. We should expect a big performance drop for the file with the shortest lines, since the <code class="language-plaintext highlighter-rouge">get_next_line</code> function has to process more lines in total for the same amount of data. Like this:<br /></p>
<div class="language-sh highlighter-rouge"><div class="highlight"><pre class="highlight"><code><span class="nv">$ </span>gcc <span class="nt">-Wall</span> <span class="nt">-Wextra</span> <span class="nt">-DBUFFER_SIZE</span><span class="o">=</span>4096 <span class="nt">-DENABLE_PROFILER</span><span class="o">=</span>1 reptester/reptest_main.c
<span class="nv">$ </span><span class="nb">sudo</span> ./a.out
</code></pre></div></div>
<div class="language-sh highlighter-rouge"><div class="highlight"><pre class="highlight"><code><span class="nt">---New</span> Test Wave: get_next_line_loop test_files/test_10.txt <span class="nv">BUFFER_SIZE</span><span class="o">=</span>4096 <span class="nt">---</span>
MIN: 334340544 <span class="o">(</span>105ms<span class="o">)</span>   50.000Mib at 477.647Mib/s
MIN: 329488768 <span class="o">(</span>103ms<span class="o">)</span>   50.000Mib at 484.681Mib/s
MIN: 325036448 <span class="o">(</span>102ms<span class="o">)</span>   50.000Mib at 491.320Mib/s
MIN: 323887456 <span class="o">(</span>101ms<span class="o">)</span>   50.000Mib at 493.063Mib/s
MIN: 323538528 <span class="o">(</span>101ms<span class="o">)</span>   50.000Mib at 493.595Mib/s
	MIN: 323538528 <span class="o">(</span>101ms<span class="o">)</span>   50.000Mib at 493.595Mib/s
	MAX: 338943616 <span class="o">(</span>106ms<span class="o">)</span>   50.000Mib at 471.161Mib/s
	AVG: 328629913 <span class="o">(</span>103ms<span class="o">)</span>   50.000Mib at 485.947Mib/s

<span class="nt">---New</span> Test Wave: get_next_line_loop test_files/test_100.txt <span class="nv">BUFFER_SIZE</span><span class="o">=</span>4096 <span class="nt">---</span>
MIN: 193031424 <span class="o">(</span>60ms<span class="o">)</span>   50.000Mib at 827.310Mib/s
MIN: 188605664 <span class="o">(</span>59ms<span class="o">)</span>   50.000Mib at 846.724Mib/s
MIN: 187684640 <span class="o">(</span>59ms<span class="o">)</span>   50.000Mib at 850.879Mib/s
MIN: 182856704 <span class="o">(</span>57ms<span class="o">)</span>   50.000Mib at 873.344Mib/s
MIN: 180464896 <span class="o">(</span>57ms<span class="o">)</span>   50.000Mib at 884.919Mib/s
MIN: 179265504 <span class="o">(</span>56ms<span class="o">)</span>   50.000Mib at 890.840Mib/s
	MIN: 179265504 <span class="o">(</span>56ms<span class="o">)</span>   50.000Mib at 890.840Mib/s
	MAX: 194754304 <span class="o">(</span>61ms<span class="o">)</span>   50.000Mib at 819.991Mib/s
	AVG: 182217046 <span class="o">(</span>57ms<span class="o">)</span>   50.000Mib at 876.410Mib/s

<span class="nt">---New</span> Test Wave: get_next_line_loop test_files/test_1000.txt <span class="nv">BUFFER_SIZE</span><span class="o">=</span>4096 <span class="nt">---</span>
MIN: 180458304 <span class="o">(</span>57ms<span class="o">)</span>   50.000Mib at 884.952Mib/s
MIN: 175238176 <span class="o">(</span>55ms<span class="o">)</span>   50.000Mib at 911.313Mib/s
MIN: 174996896 <span class="o">(</span>55ms<span class="o">)</span>   50.000Mib at 912.570Mib/s
MIN: 174915616 <span class="o">(</span>55ms<span class="o">)</span>   50.000Mib at 912.994Mib/s
MIN: 170202016 <span class="o">(</span>53ms<span class="o">)</span>   50.000Mib at 938.278Mib/s
MIN: 169782688 <span class="o">(</span>53ms<span class="o">)</span>   50.000Mib at 940.596Mib/s
MIN: 169024128 <span class="o">(</span>53ms<span class="o">)</span>   50.000Mib at 944.817Mib/s
MIN: 168979520 <span class="o">(</span>53ms<span class="o">)</span>   50.000Mib at 945.066Mib/s
MIN: 167222688 <span class="o">(</span>52ms<span class="o">)</span>   50.000Mib at 954.995Mib/s
MIN: 167215104 <span class="o">(</span>52ms<span class="o">)</span>   50.000Mib at 955.039Mib/s
MIN: 167208320 <span class="o">(</span>52ms<span class="o">)</span>   50.000Mib at 955.077Mib/s
MIN: 167036960 <span class="o">(</span>52ms<span class="o">)</span>   50.000Mib at 956.057Mib/s
	MIN: 167036960 <span class="o">(</span>52ms<span class="o">)</span>   50.000Mib at 956.057Mib/s
	MAX: 180458304 <span class="o">(</span>57ms<span class="o">)</span>   50.000Mib at 884.952Mib/s
	AVG: 168942730 <span class="o">(</span>53ms<span class="o">)</span>   50.000Mib at 945.272Mib/s

<span class="nt">---New</span> Test Wave: get_next_line_loop test_files/test_10000.txt <span class="nv">BUFFER_SIZE</span><span class="o">=</span>4096 <span class="nt">---</span>
MIN: 173471520 <span class="o">(</span>54ms<span class="o">)</span>   50.000Mib at 920.594Mib/s	| 26214400 bytes / page fault <span class="o">(</span>total: 2<span class="o">)</span>
MIN: 168896896 <span class="o">(</span>53ms<span class="o">)</span>   50.000Mib at 945.529Mib/s
MIN: 168600224 <span class="o">(</span>53ms<span class="o">)</span>   50.000Mib at 947.192Mib/s
MIN: 167497984 <span class="o">(</span>52ms<span class="o">)</span>   50.000Mib at 953.426Mib/s
MIN: 162627808 <span class="o">(</span>51ms<span class="o">)</span>   50.000Mib at 981.978Mib/s
MIN: 160497920 <span class="o">(</span>50ms<span class="o">)</span>   50.000Mib at 995.009Mib/s
	MIN: 160497920 <span class="o">(</span>50ms<span class="o">)</span>   50.000Mib at 995.009Mib/s
	MAX: 173471520 <span class="o">(</span>54ms<span class="o">)</span>   50.000Mib at 920.594Mib/s
	AVG: 161519449 <span class="o">(</span>51ms<span class="o">)</span>   50.000Mib at 988.716Mib/s

<span class="nt">---Testing</span> Done.---
</code></pre></div></div>
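<p>I wanted to introduce more file formats only after introducing the repetition tester. These tests on more file formats are done so we can see how my <code class="language-plaintext highlighter-rouge">get_next_line</code> function performs overall. The best runs range from roughly 494 MiB/s with 10 characters per line to roughly 995 MiB/s with 10000 characters per line.<br /></p>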
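<p>Another way to compare the four test waves is to look at cycles per byte instead of MiB/s. The helper below is a hypothetical sketch, not something the repetition tester prints; it only divides the best cycle count of a wave by the file size from the <code class="language-plaintext highlighter-rouge">ls</code> listing.</p>

```c
#include <stdint.h>

/* Average number of CPU cycles spent per byte of input in one test wave. */
double cycles_per_byte(uint64_t cycles, uint64_t file_size_bytes)
{
    return (double)cycles / (double)file_size_bytes;
}
```

<p>Using the MIN values above, this gives about 6.17 cycles per byte for <code class="language-plaintext highlighter-rouge">test_10.txt</code>, 3.42 for <code class="language-plaintext highlighter-rouge">test_100.txt</code>, 3.19 for <code class="language-plaintext highlighter-rouge">test_1000.txt</code> and 3.06 for <code class="language-plaintext highlighter-rouge">test_10000.txt</code>. The cost per byte roughly doubles only for the file with very short lines.</p>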
<h2 id="the-ideal-buffer_size">The ideal BUFFER_SIZE</h2>
<p>I want to get back to our topic of increasing BUFFER_SIZE. Let’s stick with the test file <code class="language-plaintext highlighter-rouge">test_100.txt</code>, since 100 characters per line is a somewhat more realistic scenario when processing a text file.<br />
I measured the <strong>data throughput</strong> of <code class="language-plaintext highlighter-rouge">get_next_line</code> for the <code class="language-plaintext highlighter-rouge">test_100.txt</code> file with different BUFFER_SIZE values: <strong><em>[10, 64, 128, 256, 512, 1024, 4096, 16384, 65536, 262144, 1048576].</em></strong><br />
I will add some info about my CPU for better context:</p>
<div class="language-sh highlighter-rouge"><div class="highlight"><pre class="highlight"><code><span class="nv">$ </span>lscpu | <span class="nb">grep </span>cache
L1d cache:                               192 KiB <span class="o">(</span>6 instances<span class="o">)</span> // 32KiB per instance
L1i cache:                               192 KiB <span class="o">(</span>6 instances<span class="o">)</span> // 32KiB per instance
L2 cache:                                6 MiB <span class="o">(</span>6 instances<span class="o">)</span>   // 1MiB per instance
L3 cache:                                16 MiB <span class="o">(</span>1 instance<span class="o">)</span>   // 16MiB per instance
</code></pre></div></div>
<p>And here are the results from the repetition tester put into a chart:<br />
<img src="/assets/images/chart_50mib_file.png" alt="50mib file" /></p>

<p>I assume that since I only raise BUFFER_SIZE up to ~4MiB, all the data can live in my L3 cache, which is why I do not see bigger performance drops.
Let’s try to increase the BUFFER_SIZE even more. For that we need a bigger file, so I created a 1GiB one.<br /></p>
<div class="language-sh highlighter-rouge"><div class="highlight"><pre class="highlight"><code><span class="nv">$ </span><span class="nb">ls</span> <span class="nt">-lha</span> test_files/test_100.txt
<span class="nt">-rwxrwxr-x</span> 1 omaly omaly 1.0G Apr 30 20:01 test_files/test_100.txt
</code></pre></div></div>

<p>This is how it scales up to a BUFFER_SIZE with the value of 2^27.<br />
<img src="/assets/images/chart_1gib_file.png" alt="1gib file" /></p>

<h2 id="closing-thoughts">Closing Thoughts</h2>

<p>I was expecting a bigger drop in data throughput once BUFFER_SIZE went over <strong>16MiB</strong>, since that is the size of my L3 cache. However, it did not occur. At this moment, this is enough testing for me.<br />
From what I have read, the piece of the puzzle responsible for this is <strong>probably</strong> (I do not know for sure) the hardware prefetcher: a hardware component in the processor that fetches data into the cache before it is requested. I prefer to skip this topic since I don’t understand how hardware prefetchers work.<br /></p>

<p>At the very beginning of this article I set out to figure out the ideal BUFFER_SIZE for my <code class="language-plaintext highlighter-rouge">get_next_line</code> function. So, what is the ideal BUFFER_SIZE?<br />
Based on the data, the sweet spot sits between 4096 and 65536 bytes. At 4096 bytes we already escape most of the heavy performance penalty of the read system call (19.18% of the time, compared to 87.58% at 64 bytes). And by stopping at 65536 bytes, the buffer is larger than the 32KiB L1d cache but still fits comfortably inside the 1MiB per-core L2 cache.</p>
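<p>In practice this conclusion can be encoded as a default that is used only when no <code class="language-plaintext highlighter-rouge">-DBUFFER_SIZE=...</code> flag is passed to the compiler. The sketch below is a suggestion under that assumption: the fallback value of 4096 is simply the lower end of the range above, and <code class="language-plaintext highlighter-rouge">buffer_size_in_sweet_spot</code> is a hypothetical helper.</p>

```c
#include <stddef.h>

/* Fallback used only when the compiler is not given -DBUFFER_SIZE=<n>. */
#ifndef BUFFER_SIZE
# define BUFFER_SIZE 4096
#endif

/* Hypothetical check: is a size inside the range the measurements favour? */
int buffer_size_in_sweet_spot(size_t size)
{
    return (size >= 4096 && size <= 65536);
}
```

<p>The range check is only meant for experiments. The measurements in this article come from one machine and one file layout, so other setups may favour different limits.</p>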

<h2 id="extra">Extra</h2>

<p>This is how my <code class="language-plaintext highlighter-rouge">get_next_line_from_fd</code> function is implemented.<br />
I left out the arena allocator implementation.<br /></p>
<div class="language-c highlighter-rouge"><div class="highlight"><pre class="highlight"><code>
<span class="k">typedef</span> <span class="k">struct</span> <span class="n">s_string</span> <span class="n">t_string</span><span class="p">;</span>
<span class="k">struct</span> <span class="n">s_string</span>
<span class="p">{</span>
    <span class="kt">char</span> <span class="o">*</span><span class="n">str</span><span class="p">;</span>
    <span class="kt">size_t</span> <span class="n">size</span><span class="p">;</span>
<span class="p">};</span>

<span class="k">typedef</span> <span class="k">struct</span> <span class="n">s_gnl_ctx</span> <span class="n">t_gnl_ctx</span><span class="p">;</span>
<span class="k">struct</span> <span class="n">s_gnl_ctx</span>
<span class="p">{</span>
    <span class="n">t_arena</span> <span class="o">*</span><span class="n">arena</span><span class="p">;</span>
    <span class="kt">int</span> <span class="n">fd</span><span class="p">;</span>
    <span class="kt">char</span> <span class="o">*</span><span class="n">buf</span><span class="p">;</span>
    <span class="kt">size_t</span> <span class="n">buf_pos</span><span class="p">;</span>
    <span class="kt">size_t</span> <span class="n">buf_len</span><span class="p">;</span>
<span class="p">};</span>

<span class="n">t_string</span> <span class="nf">get_next_line_from_fd</span><span class="p">(</span><span class="n">t_gnl_ctx</span> <span class="o">*</span><span class="n">ctx</span><span class="p">);</span>
<span class="kt">ssize_t</span> <span class="nf">find_nl_idx_in_string</span><span class="p">(</span><span class="n">t_string</span> <span class="n">s</span><span class="p">);</span>
<span class="kt">size_t</span> <span class="nf">string_concat_to_arena</span><span class="p">(</span><span class="n">t_arena</span> <span class="o">*</span><span class="n">arena</span><span class="p">,</span> <span class="n">t_string</span> <span class="n">s</span><span class="p">);</span>

<span class="n">t_string</span> <span class="nf">get_next_line_from_fd</span><span class="p">(</span><span class="n">t_gnl_ctx</span> <span class="o">*</span><span class="n">ctx</span><span class="p">)</span>
<span class="p">{</span>
    <span class="n">t_string</span> <span class="n">line</span> <span class="o">=</span> <span class="p">{.</span><span class="n">str</span> <span class="o">=</span> <span class="p">(</span><span class="kt">char</span><span class="o">*</span><span class="p">)(</span><span class="n">ctx</span><span class="o">-&gt;</span><span class="n">arena</span><span class="o">-&gt;</span><span class="n">base</span> <span class="o">+</span> <span class="n">ctx</span><span class="o">-&gt;</span><span class="n">arena</span><span class="o">-&gt;</span><span class="n">offset</span><span class="p">)};</span>
    <span class="kt">ssize_t</span> <span class="n">nl_idx</span> <span class="o">=</span> <span class="mi">0</span><span class="p">;</span>
    <span class="k">if</span> <span class="p">(</span><span class="n">ctx</span><span class="o">-&gt;</span><span class="n">buf_len</span> <span class="o">!=</span> <span class="mi">0</span><span class="p">)</span>
    <span class="p">{</span>
        <span class="n">nl_idx</span> <span class="o">=</span> <span class="n">find_nl_idx_in_string</span><span class="p">((</span><span class="n">t_string</span><span class="p">){.</span><span class="n">str</span> <span class="o">=</span> <span class="n">ctx</span><span class="o">-&gt;</span><span class="n">buf</span> <span class="o">+</span> <span class="n">ctx</span><span class="o">-&gt;</span><span class="n">buf_pos</span><span class="p">,</span> <span class="p">.</span><span class="n">size</span> <span class="o">=</span> <span class="n">ctx</span><span class="o">-&gt;</span><span class="n">buf_len</span><span class="p">});</span>
        <span class="k">if</span> <span class="p">(</span><span class="n">nl_idx</span> <span class="o">!=</span> <span class="o">-</span><span class="mi">1</span><span class="p">)</span>
        <span class="p">{</span>
            <span class="n">line</span><span class="p">.</span><span class="n">size</span> <span class="o">+=</span> <span class="n">string_concat_to_arena</span><span class="p">(</span><span class="n">ctx</span><span class="o">-&gt;</span><span class="n">arena</span><span class="p">,(</span><span class="n">t_string</span><span class="p">){.</span><span class="n">str</span> <span class="o">=</span> <span class="n">ctx</span><span class="o">-&gt;</span><span class="n">buf</span> <span class="o">+</span> <span class="n">ctx</span><span class="o">-&gt;</span><span class="n">buf_pos</span><span class="p">,</span> <span class="p">.</span><span class="n">size</span> <span class="o">=</span> <span class="n">nl_idx</span> <span class="o">+</span> <span class="mi">1</span><span class="p">});</span>
            <span class="n">ctx</span><span class="o">-&gt;</span><span class="n">buf_len</span> <span class="o">-=</span> <span class="p">(</span><span class="n">nl_idx</span> <span class="o">+</span> <span class="mi">1</span><span class="p">);</span>
            <span class="n">ctx</span><span class="o">-&gt;</span><span class="n">buf_pos</span> <span class="o">+=</span> <span class="p">(</span><span class="n">nl_idx</span> <span class="o">+</span> <span class="mi">1</span><span class="p">);</span>
            <span class="k">return</span> <span class="n">line</span><span class="p">;</span>
        <span class="p">}</span>
        <span class="n">line</span><span class="p">.</span><span class="n">size</span> <span class="o">+=</span> <span class="n">string_concat_to_arena</span><span class="p">(</span><span class="n">ctx</span><span class="o">-&gt;</span><span class="n">arena</span><span class="p">,(</span><span class="n">t_string</span><span class="p">){.</span><span class="n">str</span> <span class="o">=</span> <span class="n">ctx</span><span class="o">-&gt;</span><span class="n">buf</span> <span class="o">+</span> <span class="n">ctx</span><span class="o">-&gt;</span><span class="n">buf_pos</span><span class="p">,</span> <span class="p">.</span><span class="n">size</span> <span class="o">=</span> <span class="n">ctx</span><span class="o">-&gt;</span><span class="n">buf_len</span><span class="p">});</span>
    <span class="p">}</span>
    <span class="cm">/*
    * There is no need to reset 'ctx-&gt;buf_pos' or 'ctx-&gt;buf_len' to zero here.
    * If the 'if (ctx-&gt;buf_len != 0)' block above was skipped, we know that buf_len is already zero.
    * Both of the values are going to be overwritten soon.
    */</span>
    <span class="kt">ssize_t</span> <span class="n">read_bytes</span> <span class="o">=</span> <span class="mi">0</span><span class="p">;</span>
    <span class="k">while</span> <span class="p">(</span><span class="mi">1</span><span class="p">)</span>
    <span class="p">{</span>
        <span class="n">read_bytes</span> <span class="o">=</span> <span class="n">read</span><span class="p">(</span><span class="n">ctx</span><span class="o">-&gt;</span><span class="n">fd</span><span class="p">,</span> <span class="n">ctx</span><span class="o">-&gt;</span><span class="n">buf</span><span class="p">,</span> <span class="n">BUFFER_SIZE</span><span class="p">);</span>
        <span class="k">if</span> <span class="p">(</span><span class="n">read_bytes</span> <span class="o">&gt;</span> <span class="mi">0</span><span class="p">)</span>
        <span class="p">{</span>
            <span class="n">nl_idx</span> <span class="o">=</span> <span class="n">find_nl_idx_in_string</span><span class="p">((</span><span class="n">t_string</span><span class="p">){.</span><span class="n">str</span> <span class="o">=</span> <span class="n">ctx</span><span class="o">-&gt;</span><span class="n">buf</span><span class="p">,</span> <span class="p">.</span><span class="n">size</span> <span class="o">=</span> <span class="p">(</span><span class="kt">size_t</span><span class="p">)</span><span class="n">read_bytes</span><span class="p">});</span>
            <span class="k">if</span> <span class="p">(</span><span class="n">nl_idx</span> <span class="o">!=</span> <span class="o">-</span><span class="mi">1</span><span class="p">)</span>
            <span class="p">{</span>
                <span class="n">line</span><span class="p">.</span><span class="n">size</span> <span class="o">+=</span> <span class="n">string_concat_to_arena</span><span class="p">(</span><span class="n">ctx</span><span class="o">-&gt;</span><span class="n">arena</span><span class="p">,(</span><span class="n">t_string</span><span class="p">){.</span><span class="n">str</span> <span class="o">=</span> <span class="n">ctx</span><span class="o">-&gt;</span><span class="n">buf</span><span class="p">,</span> <span class="p">.</span><span class="n">size</span> <span class="o">=</span> <span class="p">(</span><span class="kt">size_t</span><span class="p">)(</span><span class="n">nl_idx</span> <span class="o">+</span> <span class="mi">1</span><span class="p">)});</span>
                <span class="n">ctx</span><span class="o">-&gt;</span><span class="n">buf_len</span> <span class="o">=</span> <span class="n">read_bytes</span> <span class="o">-</span> <span class="p">(</span><span class="n">nl_idx</span> <span class="o">+</span> <span class="mi">1</span><span class="p">);</span>
                <span class="n">ctx</span><span class="o">-&gt;</span><span class="n">buf_pos</span> <span class="o">=</span> <span class="p">(</span><span class="n">nl_idx</span> <span class="o">+</span> <span class="mi">1</span><span class="p">);</span>
                <span class="k">return</span> <span class="n">line</span><span class="p">;</span>
            <span class="p">}</span>
            <span class="n">line</span><span class="p">.</span><span class="n">size</span> <span class="o">+=</span> <span class="n">string_concat_to_arena</span><span class="p">(</span><span class="n">ctx</span><span class="o">-&gt;</span><span class="n">arena</span><span class="p">,(</span><span class="n">t_string</span><span class="p">){.</span><span class="n">str</span> <span class="o">=</span> <span class="n">ctx</span><span class="o">-&gt;</span><span class="n">buf</span><span class="p">,</span> <span class="p">.</span><span class="n">size</span> <span class="o">=</span> <span class="p">(</span><span class="kt">size_t</span><span class="p">)</span><span class="n">read_bytes</span><span class="p">});</span>
        <span class="p">}</span>
        <span class="k">else</span> <span class="k">if</span> <span class="p">(</span><span class="n">read_bytes</span> <span class="o">==</span> <span class="mi">0</span><span class="p">)</span>
        <span class="p">{</span>
            <span class="c1">// We reached the end of the file.</span>
            <span class="c1">// Set 'ctx-&gt;buf_len' to zero so that the next call to this function</span>
            <span class="c1">// returns a string of size zero.</span>
            <span class="n">ctx</span><span class="o">-&gt;</span><span class="n">buf_len</span> <span class="o">=</span> <span class="mi">0</span><span class="p">;</span>
            <span class="k">return</span> <span class="n">line</span><span class="p">;</span>
        <span class="p">}</span>
        <span class="k">else</span>
        <span class="p">{</span>
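            <span class="c1">// Error: read returned '-1'</span>
            <span class="c1">// Drop any leftover bytes so the next call does not process them twice.</span>
            <span class="n">ctx</span><span class="o">-&gt;</span><span class="n">buf_len</span> <span class="o">=</span> <span class="mi">0</span><span class="p">;</span>
            <span class="k">return</span> <span class="p">(</span><span class="n">t_string</span><span class="p">){</span><span class="mi">0</span><span class="p">};</span>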
        <span class="p">}</span>
    <span class="p">}</span>
<span class="p">}</span>

<span class="kt">ssize_t</span> <span class="nf">find_nl_idx_in_string</span><span class="p">(</span><span class="n">t_string</span> <span class="n">s</span><span class="p">)</span>
<span class="p">{</span>
    <span class="kt">size_t</span> <span class="n">i</span> <span class="o">=</span> <span class="mi">0</span><span class="p">;</span>
    <span class="k">while</span> <span class="p">(</span><span class="n">i</span> <span class="o">&lt;</span> <span class="n">s</span><span class="p">.</span><span class="n">size</span><span class="p">)</span>
    <span class="p">{</span>
        <span class="k">if</span> <span class="p">(</span><span class="n">s</span><span class="p">.</span><span class="n">str</span><span class="p">[</span><span class="n">i</span><span class="p">]</span> <span class="o">==</span> <span class="sc">'\n'</span><span class="p">)</span>
        <span class="p">{</span>
            <span class="k">return</span> <span class="p">(</span><span class="kt">ssize_t</span><span class="p">)</span><span class="n">i</span><span class="p">;</span>
        <span class="p">}</span>
        <span class="n">i</span> <span class="o">+=</span> <span class="mi">1</span><span class="p">;</span>
    <span class="p">}</span>
    <span class="k">return</span> <span class="p">(</span><span class="kt">ssize_t</span><span class="p">)</span><span class="o">-</span><span class="mi">1</span><span class="p">;</span>
<span class="p">}</span>

<span class="kt">size_t</span> <span class="nf">string_concat_to_arena</span><span class="p">(</span><span class="n">t_arena</span> <span class="o">*</span><span class="n">arena</span><span class="p">,</span> <span class="n">t_string</span> <span class="n">s</span><span class="p">)</span>
<span class="p">{</span>
    <span class="kt">char</span> <span class="o">*</span><span class="n">str</span> <span class="o">=</span> <span class="n">ft_arena_push_packed</span><span class="p">(</span><span class="n">arena</span><span class="p">,</span> <span class="n">s</span><span class="p">.</span><span class="n">size</span><span class="p">);</span>
    <span class="k">if</span> <span class="p">(</span><span class="n">str</span><span class="p">)</span>
    <span class="p">{</span>
        <span class="n">memcpy</span><span class="p">(</span><span class="n">str</span><span class="p">,</span> <span class="n">s</span><span class="p">.</span><span class="n">str</span><span class="p">,</span> <span class="n">s</span><span class="p">.</span><span class="n">size</span><span class="p">);</span>
        <span class="k">return</span> <span class="n">s</span><span class="p">.</span><span class="n">size</span><span class="p">;</span>
    <span class="p">}</span>
    <span class="k">return</span> <span class="mi">0</span><span class="p">;</span>
<span class="p">}</span>
</code></pre></div></div>]]></content><author><name></name></author><summary type="html"><![CDATA[Measuring data throughput of get_next_line NOTE: This post is primarily for 42 students, since they have probably written their implementation of get_next_line already. The important fact is that for 42 projects you are not allowed to find out the size of your file with any external function (such as ‘stat’ on Linux). To read and write to a file you are allowed to use read and write functions provided by the unistd.h. The get_next_line function should return the next line from a file each time it is called. Reading from a file with unknown size]]></summary></entry></feed>