<?xml version="1.0" encoding="UTF-8"?><rss xmlns:dc="http://purl.org/dc/elements/1.1/" xmlns:content="http://purl.org/rss/1.0/modules/content/" xmlns:atom="http://www.w3.org/2005/Atom" version="2.0"><channel><title><![CDATA[Untitled Publication]]></title><description><![CDATA[Untitled Publication]]></description><link>https://ales.blaznik.si</link><generator>RSS for Node</generator><lastBuildDate>Sat, 16 May 2026 11:20:45 GMT</lastBuildDate><atom:link href="https://ales.blaznik.si/rss.xml" rel="self" type="application/rss+xml"/><language><![CDATA[en]]></language><ttl>60</ttl><item><title><![CDATA[When Fixtures Aren't Enough: Testing with Large Datasets in the Database]]></title><description><![CDATA[We all know the value of test data. It helps us identify bugs, ensure functionality, and build confidence in our systems. But what happens when your trusty test data libraries start to slow down under the weight of millions of records? This is where ...]]></description><link>https://ales.blaznik.si/when-fixtures-arent-enough-testing-with-large-datasets-in-the-database</link><guid isPermaLink="true">https://ales.blaznik.si/when-fixtures-arent-enough-testing-with-large-datasets-in-the-database</guid><category><![CDATA[Testing]]></category><category><![CDATA[Fixtures]]></category><category><![CDATA[Databases]]></category><dc:creator><![CDATA[Aleš Blaznik]]></dc:creator><pubDate>Sun, 10 Mar 2024 16:30:13 GMT</pubDate><content:encoded><![CDATA[<p>We all know the value of test data. It helps us identify bugs, ensure functionality, and build confidence in our systems. But what happens when your trusty test data libraries start to slow down under the weight of millions of records? This is where exploring data locality and in-database test data generation comes in.</p>
<h3 id="heading-fixture-fatigue-the-slowdown-of-traditional-approaches"><strong>Fixture Fatigue: The Slowdown of Traditional Approaches</strong></h3>
<p>Many developers rely on fixture libraries to create test data. These libraries work well for smaller datasets, but as your data volume grows, so does the time it takes to generate and manage that data. The result is a test suite that slows down significantly, hindering development velocity.</p>
<h3 id="heading-the-power-of-in-database-test-data-generation"><strong>The Power of In-Database Test Data Generation</strong></h3>
<p>Here's where leveraging the power of your database itself becomes advantageous. By generating test data directly within the database, we can take advantage of data locality – the physical proximity of related data – to speed things up. This approach bypasses the overhead of external libraries and allows the database to optimize data access for testing purposes.</p>
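<p>To make that concrete, here's a minimal sketch of the idea using SQLite through Python's standard library, with a deliberately simplified, hypothetical <code>audit_logs</code> schema. A single recursive CTE materializes a large synthetic dataset without a single row ever leaving the database engine:</p>
<pre><code class="lang-python">import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE audit_logs (id INTEGER PRIMARY KEY, user_id INTEGER, type TEXT)")

# Generate 100,000 rows entirely inside the engine: the recursive CTE
# produces a number sequence, and INSERT ... SELECT turns it into rows.
conn.execute("""
    WITH RECURSIVE seq(n) AS (
        SELECT 1 UNION ALL SELECT n + 1 FROM seq LIMIT 100000
    )
    INSERT INTO audit_logs (user_id, type)
    SELECT n % 50, 'synthetic' FROM seq
""")

row_count = conn.execute("SELECT count(*) FROM audit_logs").fetchone()[0]
print(row_count)  # 100000
</code></pre>
<p>No rows cross the application boundary, which is exactly where the speedup over an external fixture library comes from.</p>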
<h3 id="heading-a-real-world-example-testing-with-millions-of-audit-logs"><strong>A Real-World Example: Testing with Millions of Audit Logs</strong></h3>
<p>Let's say you're building a system that generates audit logs, and you want to understand how it behaves when processing millions of entries. Creating such a massive dataset manually, or with a fixture library, is impractical. Here's a quick and effective approach:</p>
<ol>
<li><p><strong>Run the Test Suite:</strong> Start by running your test suite with your existing test data.</p>
</li>
<li><p><strong>Duplicate the Data:</strong> Within the database, duplicate the existing audit log data (generated by the initial test run) to create a larger dataset representing millions of records.</p>
</li>
<li><p><strong>Run the Test Suite Again:</strong> Rerun your test suite with the newly duplicated data.</p>
</li>
<li><p><strong>Evaluate Performance Impact:</strong> Analyze the test execution times. Did the test suite slow down significantly with the larger dataset?</p>
</li>
</ol>
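<p>The four steps above can be sketched end to end. This is an illustration only: SQLite via Python's standard library and a toy two-row seed standing in for a real test run. The key trick is that each <code>INSERT ... SELECT</code> pass doubles the table in place:</p>
<pre><code class="lang-python">import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("""
    CREATE TABLE audit_logs (
        id INTEGER PRIMARY KEY,
        user_id INTEGER,
        type TEXT,
        created_at TEXT
    )
""")

# Step 1: a normal test run leaves a few realistic rows behind.
conn.executemany(
    "INSERT INTO audit_logs (user_id, type, created_at) VALUES (?, ?, ?)",
    [(1, "login", "2024-03-10"), (1, "logout", "2024-03-10")],
)

# Step 2: duplicate in-database. Each pass doubles the table, so
# 20 passes over a tiny seed already exceeds two million rows.
for _ in range(10):  # 10 doublings here: 2 rows become 2,048
    conn.execute(
        "INSERT INTO audit_logs (user_id, type, created_at) "
        "SELECT user_id, type, created_at FROM audit_logs"
    )

# Steps 3-4: rerun the suite against the inflated table and compare timings.
total = conn.execute("SELECT count(*) FROM audit_logs").fetchone()[0]
print(total)  # 2048
</code></pre>
<p>Doubling in a loop is handy when you want to ratchet the volume up gradually and watch where the test suite's timings start to bend.</p>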
<p>This simple process helps identify potential bottlenecks. It can reveal issues like missing indexes or inefficient data loading practices.</p>
<pre><code class="lang-sql">WITH RECURSIVE num_of_iterations AS (
    SELECT 1 AS n
    UNION ALL
    SELECT n + 1 FROM num_of_iterations WHERE n &lt; 1000000
)
INSERT INTO audit_logs (id, user_id, type, data, metadata, created_at)
SELECT gen_random_uuid(), logs.user_id, logs.type, logs.data, logs.metadata, logs.created_at
FROM num_of_iterations
CROSS JOIN audit_logs AS logs;
</code></pre>
<p><em>This SQL snippet uses a</em> <code>WITH RECURSIVE</code> <em>CTE to generate a sequence of numbers from 1 to 1,000,000, then cross joins that sequence with the existing</em> <code>audit_logs</code> <em>table. Every existing row is re-inserted once per number, each copy with a fresh UUID, so a handful of seed rows becomes a dataset of millions that mimics the structure and content of the real data.</em></p>
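<p>A missing index is the classic culprit once the table is inflated, and the query planner can confirm it directly. Here's a minimal sketch with SQLite, whose <code>EXPLAIN QUERY PLAN</code> plays the role PostgreSQL's <code>EXPLAIN</code> would; the table and query are hypothetical:</p>
<pre><code class="lang-python">import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE audit_logs (id INTEGER PRIMARY KEY, user_id INTEGER, type TEXT)")

def plan(query):
    # The last column of each EXPLAIN QUERY PLAN row is the plan detail text.
    return " ".join(row[-1] for row in conn.execute("EXPLAIN QUERY PLAN " + query))

query = "SELECT * FROM audit_logs WHERE user_id = 42"
before = plan(query)  # without an index: a full table SCAN
conn.execute("CREATE INDEX idx_audit_logs_user_id ON audit_logs (user_id)")
after = plan(query)   # with the index: a SEARCH using the index
print(before)
print(after)
</code></pre>
<p>If the plan still says it is scanning after you think you've indexed the column, the larger dataset has just saved you from discovering that in production.</p>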
<h3 id="heading-benefits-of-in-database-test-data-generation"><strong>Benefits of In-Database Test Data Generation</strong></h3>
<ul>
<li><p><strong>Speed:</strong> By leveraging the database's processing power and data locality, you can significantly reduce test execution time.</p>
</li>
<li><p><strong>Efficiency:</strong> No need to manage external libraries or large test data files.</p>
</li>
<li><p><strong>Quick Insights:</strong> Provides a fast way to identify potential performance issues before they become critical.</p>
</li>
</ul>
<p><strong>Remember:</strong> While duplicating data isn't a perfect representation of real-world scenarios, it's a valuable tool for quick performance checks and identifying areas for improvement.</p>
<p>By exploring in-database test data generation, you can overcome the limitations of traditional approaches and ensure your system performs optimally when dealing with large datasets.</p>
]]></content:encoded></item></channel></rss>