<p><strong>Node.js Streams For Fun And Profit</strong> (Florent Biville, 2020-04-16)</p>
<p>I joined the <a href="https://projectriff.io">riff</a> team at Pivotal a year and a half ago.
I have been working for more than a year on <a href="https://projectriff.io">riff</a> invokers.</p>
<p>This probably deserves a blog post of its own but, in short, invokers are responsible for invoking user-defined functions
and for exposing a way to send them inputs and receive their outputs.
The <a href="https://github.com/projectriff/invoker-specification/">riff invocation protocol</a> formally defines the scope of such invokers.</p>
<p>Part of my job has been to update the existing invokers (especially the <a href="https://github.com/projectriff/node-function-invoker">Node.js one</a>) so that they comply with this spec.
As the invocation protocol is a <a href="https://github.com/projectriff/invoker-specification/blob/a41d885fb411dc00e7ea3f7724ede4c435121a62/riff-rpc.proto#L13">streaming-first protocol</a>,
I had to seriously brush up on my knowledge of Node.js streams (narrator's voice: well, learn them from scratch).</p>
<p>I learnt a lot by trial and error, probably more than I care to admit.
This blog post serves as an introduction to Node.js streams.
Hopefully, it also outlines some good practices, and some annoying pitfalls to avoid.</p>
<h2 id="thanks-dear-proofreaders">Thanks, Dear (Proof)Readers</h2>
<p>I would like to thank:</p>
<ul>
<li><a href="https://twitter.com/old_sound">Alvaro Videla</a></li>
<li><a href="https://twitter.com/nicokosi">Nicolas Kosinski</a></li>
<li><a href="https://twitter.com/poledesfetes">Vladimir de Turckheim</a></li>
</ul>
<p>for their various suggestions to improve this post. Thanks ❤️</p>
<h2 id="harder-better-mapper-zipper">Harder, Better, Mapper, Zipper</h2>
<p>Let's create a tiny Node.js library that works with streams and provides
familiar functional operators such as <code class="language-plaintext highlighter-rouge">map</code> and <code class="language-plaintext highlighter-rouge">zip</code>.</p>
<p>First, what is a stream?</p>
<p>Loosely defined, a stream conveys (possibly indefinitely) chunks of data, to which specific operations can be applied.</p>
<p>How does that translate to Node.js exactly?</p>
<h2 id="streams-in-nodejs">Streams in Node.js</h2>
<p>Node.js streams come in two basic flavors: <a href="https://nodejs.org/api/stream.html#stream_readable_streams"><code class="language-plaintext highlighter-rouge">Readable</code></a> and <a href="https://nodejs.org/api/stream.html#stream_writable_streams"><code class="language-plaintext highlighter-rouge">Writable</code></a>.</p>
<ul>
<li><code class="language-plaintext highlighter-rouge">Readable</code> streams can be read from</li>
<li><code class="language-plaintext highlighter-rouge">Writable</code> streams can be written to</li>
</ul>
<p><a href="https://nodejs.org/api/stream.html#stream_readable_pipe_destination_options"><code class="language-plaintext highlighter-rouge">Readable#pipe</code></a> creates a pipeline: chunks read from the source <code class="language-plaintext highlighter-rouge">Readable</code> stream are written
to the destination <code class="language-plaintext highlighter-rouge">Writable</code> stream.</p>
<div class="language-javascript highlighter-rouge"><div class="highlight"><pre class="highlight"><code><span class="kd">const</span> <span class="p">{</span> <span class="nx">Readable</span><span class="p">,</span> <span class="nx">Writable</span> <span class="p">}</span> <span class="o">=</span> <span class="nx">require</span><span class="p">(</span><span class="dl">"</span><span class="s2">stream</span><span class="dl">"</span><span class="p">);</span>
<span class="kd">const</span> <span class="nx">myReadableStream</span> <span class="cm">/* = instantiate Readable stream */</span><span class="p">;</span>
<span class="kd">const</span> <span class="nx">myWritableStream</span> <span class="cm">/* = instantiate Writable stream */</span><span class="p">;</span>
<span class="nx">myReadableStream</span><span class="p">.</span><span class="nx">pipe</span><span class="p">(</span><span class="nx">myWritableStream</span><span class="p">);</span>
</code></pre></div></div>
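<p>The placeholders above are not runnable as such. Below is a minimal runnable sketch of the same idea, assuming a Node.js version that ships <code class="language-plaintext highlighter-rouge">Readable.from</code> (10.17+ or 12.3+). The <code class="language-plaintext highlighter-rouge">pipeAndRecord</code> helper and its recording <code class="language-plaintext highlighter-rouge">Writable</code> are hypothetical, written only for illustration.</p>

```javascript
const { Readable, Writable } = require("stream");

// Hypothetical helper: pipes a finite source into a Writable that records
// every chunk it receives, and resolves once the Writable side has finished.
function pipeAndRecord(items) {
  const received = [];
  const source = Readable.from(items); // Readable.from defaults to object mode
  const recorder = new Writable({
    objectMode: true,
    write(chunk, encoding, callback) {
      received.push(chunk);
      callback(); // signal that this chunk has been handled
    },
  });
  return new Promise((resolve) => {
    // "finish" is emitted once end() has been called and all writes are done
    recorder.on("finish", () => resolve(received));
    source.pipe(recorder);
  });
}

pipeAndRecord(["a", "b", "c"]).then((chunks) => console.log(chunks)); // [ 'a', 'b', 'c' ]
```

<p>By default, <code class="language-plaintext highlighter-rouge">pipe</code> also ends the destination once the source ends, which is what eventually triggers the <code class="language-plaintext highlighter-rouge">finish</code> event.</p>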
<p>What happens here is that the source <code class="language-plaintext highlighter-rouge">Readable</code> stream goes from a paused state to a <a href="https://nodejs.org/api/stream.html#stream_three_states">flowing state</a> after <code class="language-plaintext highlighter-rouge">pipe</code> is called.</p>
<blockquote>
<p>You can manually manage such state transitions with functions like <a href="https://nodejs.org/api/stream.html#stream_readable_pause"><code class="language-plaintext highlighter-rouge">Readable#pause</code></a>
or <a href="https://nodejs.org/api/stream.html#stream_readable_resume"><code class="language-plaintext highlighter-rouge">Readable#resume</code></a> but we are only going to rely on automatic flowing mode from now on.</p>
</blockquote>
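<p>These states can be observed through the <code class="language-plaintext highlighter-rouge">readable.readableFlowing</code> property: <code class="language-plaintext highlighter-rouge">null</code> while no consumer is attached, <code class="language-plaintext highlighter-rouge">true</code> once the stream flows, and <code class="language-plaintext highlighter-rouge">false</code> once it has been explicitly paused. A small sketch, where <code class="language-plaintext highlighter-rouge">observeFlowingStates</code> is a hypothetical helper:</p>

```javascript
const { Readable, Writable } = require("stream");

// Hypothetical helper: records readableFlowing at each step of a
// no consumer -> flowing -> paused -> flowing sequence.
function observeFlowingStates() {
  const source = Readable.from(["a", "b"]);
  const sink = new Writable({
    objectMode: true,
    write(chunk, encoding, callback) {
      callback(); // discard chunks, only the source state matters here
    },
  });
  const states = [source.readableFlowing]; // null: no consumer attached yet
  source.pipe(sink);
  states.push(source.readableFlowing); // true: pipe() switched the source to flowing mode
  source.pause();
  states.push(source.readableFlowing); // false: explicitly paused
  source.resume();
  states.push(source.readableFlowing); // true: flowing again
  return states;
}

console.log(observeFlowingStates()); // [ null, true, false, true ]
```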
<p>A Node.js stream can also encapsulate a <code class="language-plaintext highlighter-rouge">Readable</code> side <strong>and</strong> a <code class="language-plaintext highlighter-rouge">Writable</code> side; such streams are called <a href="https://nodejs.org/api/stream.html#stream_class_stream_duplex"><code class="language-plaintext highlighter-rouge">Duplex</code></a> streams.
If the outputs of a duplex stream are computed from its inputs, then a <a href="https://nodejs.org/api/stream.html#stream_class_stream_transform"><code class="language-plaintext highlighter-rouge">Transform</code></a> stream is the way to go (it is a specialization of the <code class="language-plaintext highlighter-rouge">Duplex</code> type).</p>
<blockquote>
<p>Outputs are <em>read</em>, hence they come from the <code class="language-plaintext highlighter-rouge">Readable</code> side of the <code class="language-plaintext highlighter-rouge">Duplex</code> stream.</p>
<p>Inputs are <em>written</em>, hence they go to the <code class="language-plaintext highlighter-rouge">Writable</code> side of the <code class="language-plaintext highlighter-rouge">Duplex</code> stream.</p>
<p><code class="language-plaintext highlighter-rouge">Transform</code> streams automatically expose chunks from the <code class="language-plaintext highlighter-rouge">Writable</code> side to a user-defined transformation function.
The function results are automatically forwarded to the <code class="language-plaintext highlighter-rouge">Readable</code> side of the <code class="language-plaintext highlighter-rouge">Transform</code> stream.</p>
<p>Note: unfortunately, <code class="language-plaintext highlighter-rouge">Duplex</code> streams do not differentiate <code class="language-plaintext highlighter-rouge">Readable</code> errors from <code class="language-plaintext highlighter-rouge">Writable</code> ones.</p>
</blockquote>
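<p>To make both sides more tangible, here is a sketch that writes inputs to the <code class="language-plaintext highlighter-rouge">Writable</code> side of a <code class="language-plaintext highlighter-rouge">Transform</code> and reads the outputs back from its <code class="language-plaintext highlighter-rouge">Readable</code> side. It relies on the simplified constructor (passing a <code class="language-plaintext highlighter-rouge">transform</code> option instead of subclassing); <code class="language-plaintext highlighter-rouge">upperCaseAll</code> is a hypothetical helper.</p>

```javascript
const { Transform } = require("stream");

// Hypothetical helper: writes `inputs` to the Writable side of an
// upper-casing Transform and collects what comes out of its Readable side.
function upperCaseAll(inputs) {
  const upperCase = new Transform({
    objectMode: true,
    transform(chunk, encoding, callback) {
      callback(null, String(chunk).toUpperCase()); // output derived from input
    },
  });
  return new Promise((resolve, reject) => {
    const outputs = [];
    upperCase.on("data", (chunk) => outputs.push(chunk)); // Readable side: outputs are read
    upperCase.on("end", () => resolve(outputs));
    upperCase.on("error", reject);
    inputs.forEach((input) => upperCase.write(input)); // Writable side: inputs are written
    upperCase.end();
  });
}

upperCaseAll(["riff", "node"]).then((outputs) => console.log(outputs)); // [ 'RIFF', 'NODE' ]
```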
<p><img src="/assets/img/node_streams.svg" alt="Node.js stream family" title="Node.js stream family diagram" /></p>
<p>These compound streams are useful for any pipeline beyond the most basic ones:
they encode intermediate transformations applied to chunks before they reach the final destination <code class="language-plaintext highlighter-rouge">Writable</code> stream.</p>
<div class="language-javascript highlighter-rouge"><div class="highlight"><pre class="highlight"><code><span class="kd">const</span> <span class="p">{</span> <span class="nx">Readable</span><span class="p">,</span> <span class="nx">Transform</span><span class="p">,</span> <span class="nx">Writable</span> <span class="p">}</span> <span class="o">=</span> <span class="nx">require</span><span class="p">(</span><span class="dl">"</span><span class="s2">stream</span><span class="dl">"</span><span class="p">);</span>
<span class="kd">const</span> <span class="nx">myReadableStream</span> <span class="cm">/* = instantiate Readable stream */</span><span class="p">;</span>
<span class="kd">const</span> <span class="nx">myTransformStream1</span> <span class="cm">/* = instantiate Transform stream */</span><span class="p">;</span>
<span class="kd">const</span> <span class="nx">myTransformStream2</span> <span class="cm">/* = instantiate Transform stream */</span><span class="p">;</span>
<span class="kd">const</span> <span class="nx">myTransformStream3</span> <span class="cm">/* = instantiate Transform stream */</span><span class="p">;</span>
<span class="kd">const</span> <span class="nx">myWritableStream</span> <span class="cm">/* = instantiate Writable stream */</span><span class="p">;</span>
<span class="nx">myReadableStream</span>
<span class="p">.</span><span class="nx">pipe</span><span class="p">(</span><span class="nx">myTransformStream1</span><span class="p">)</span>
<span class="p">.</span><span class="nx">pipe</span><span class="p">(</span><span class="nx">myTransformStream2</span><span class="p">)</span>
<span class="p">.</span><span class="nx">pipe</span><span class="p">(</span><span class="nx">myTransformStream3</span><span class="p">)</span>
<span class="p">.</span><span class="nx">pipe</span><span class="p">(</span><span class="nx">myWritableStream</span><span class="p">);</span>
</code></pre></div></div>
<p>The above "fluent" example works because <code class="language-plaintext highlighter-rouge">Readable#pipe</code> returns a reference to the destination stream.
<code class="language-plaintext highlighter-rouge">Transform</code> (or, more generally, <code class="language-plaintext highlighter-rouge">Duplex</code>) streams have two sides, so they can be piped to (<code class="language-plaintext highlighter-rouge">Writable</code> side) and then piped from (<code class="language-plaintext highlighter-rouge">Readable</code> side) with another <code class="language-plaintext highlighter-rouge">pipe</code> call.</p>
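<p>A quick way to convince yourself that <code class="language-plaintext highlighter-rouge">pipe</code> hands back its destination (<code class="language-plaintext highlighter-rouge">pipeReturnsDestination</code> is a hypothetical helper):</p>

```javascript
const { PassThrough, Readable } = require("stream");

// Hypothetical helper: checks that each pipe() call returns its destination,
// which is what makes the chained style possible.
function pipeReturnsDestination() {
  const source = Readable.from(["x"]);
  const middle = new PassThrough({ objectMode: true });
  const last = new PassThrough({ objectMode: true });
  const first = source.pipe(middle); // returns `middle`
  const second = first.pipe(last); // pipes from middle's Readable side, returns `last`
  last.resume(); // drain `last` so this tiny pipeline can run to completion
  return [first === middle, second === last];
}

console.log(pipeReturnsDestination()); // [ true, true ]
```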
<p>However, this is not necessarily the best way to define a <strong>linear</strong> pipeline.
One important limitation is that <code class="language-plaintext highlighter-rouge">pipe</code> offers no help with error handling: errors are not forwarded from one stream to the next, and the other streams are not closed or destroyed automatically when one of them fails.</p>
<blockquote>
<p>Emphasis on linear here. Streams can be piped from and to several times, so you can end up with graph-shaped pipelines.</p>
</blockquote>
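<p>The error-handling limitation can be sketched as follows: an intermediate <code class="language-plaintext highlighter-rouge">Transform</code> fails on its first chunk and, with plain <code class="language-plaintext highlighter-rouge">pipe</code> calls, the destination is left open. This follows the caveat documented for <code class="language-plaintext highlighter-rouge">Readable#pipe</code>; <code class="language-plaintext highlighter-rouge">failingTransform</code> and <code class="language-plaintext highlighter-rouge">failWithPipe</code> are hypothetical helpers, and exact timings may differ between Node.js versions.</p>

```javascript
const { PassThrough, Readable, Transform } = require("stream");

// Hypothetical intermediate step that rejects every chunk it receives.
function failingTransform() {
  return new Transform({
    objectMode: true,
    transform(chunk, encoding, callback) {
      callback(new Error(`cannot process ${chunk}`));
    },
  });
}

// Hypothetical helper: wires source -> failing -> destination with pipe()
// and reports the state of the destination after the failure.
function failWithPipe() {
  const source = Readable.from([1, 2, 3]);
  const failing = failingTransform();
  const destination = new PassThrough({ objectMode: true });
  return new Promise((resolve) => {
    // pipe() does not forward errors, so each stream needs its own listener;
    // without this one, the error would crash the process.
    failing.on("error", (err) => {
      // wait one turn of the event loop, then inspect the destination
      setImmediate(() => resolve({ message: err.message, destinationDestroyed: destination.destroyed }));
    });
    source.pipe(failing).pipe(destination);
  });
}

failWithPipe().then((outcome) => console.log(outcome));
```

<p>Here the destination is neither ended nor destroyed, so cleaning up would require calling <code class="language-plaintext highlighter-rouge">destroy</code> on each stream manually.</p>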
<p>A more robust alternative for linear pipelines is the built-in <code class="language-plaintext highlighter-rouge">pipeline</code> function (available since Node.js 10).
It must be called with:</p>
<ul>
<li>1 <code class="language-plaintext highlighter-rouge">Readable</code> stream (a.k.a. the source)</li>
<li>0..n <code class="language-plaintext highlighter-rouge">Duplex</code> streams (a.k.a. intermediates)</li>
<li>1 <code class="language-plaintext highlighter-rouge">Writable</code> stream (a.k.a. the destination)</li>
</ul>
<div class="language-javascript highlighter-rouge"><div class="highlight"><pre class="highlight"><code><span class="kd">const</span> <span class="p">{</span> <span class="nx">pipeline</span><span class="p">,</span> <span class="nx">Readable</span><span class="p">,</span> <span class="nx">Transform</span><span class="p">,</span> <span class="nx">Writable</span> <span class="p">}</span> <span class="o">=</span> <span class="nx">require</span><span class="p">(</span><span class="dl">"</span><span class="s2">stream</span><span class="dl">"</span><span class="p">);</span>
<span class="kd">const</span> <span class="nx">myReadableStream</span> <span class="cm">/* = instantiate Readable stream */</span><span class="p">;</span>
<span class="kd">const</span> <span class="nx">myTransformStream1</span> <span class="cm">/* = instantiate Transform stream */</span><span class="p">;</span>
<span class="kd">const</span> <span class="nx">myTransformStream2</span> <span class="cm">/* = instantiate Transform stream */</span><span class="p">;</span>
<span class="kd">const</span> <span class="nx">myTransformStream3</span> <span class="cm">/* = instantiate Transform stream */</span><span class="p">;</span>
<span class="kd">const</span> <span class="nx">myWritableStream</span> <span class="cm">/* = instantiate Writable stream */</span><span class="p">;</span>
<span class="nx">pipeline</span><span class="p">(</span>
<span class="nx">myReadableStream</span><span class="p">,</span>
<span class="nx">myTransformStream1</span><span class="p">,</span>
<span class="nx">myTransformStream2</span><span class="p">,</span>
<span class="nx">myTransformStream3</span><span class="p">,</span>
<span class="nx">myWritableStream</span><span class="p">,</span>
<span class="p">(</span><span class="nx">err</span><span class="p">)</span> <span class="o">=></span> <span class="p">{</span> <span class="cm">/* ... */</span> <span class="p">}</span>
<span class="p">);</span>
</code></pre></div></div>
<p>The last argument is a callback, invoked once the pipeline completes, either normally or abnormally (i.e. when an error occurs). In the latter case, the error is passed as the callback's first argument.</p>
<blockquote>
<p><code class="language-plaintext highlighter-rouge">pipeline</code> invokes the completion callback even if some of the streams have their <code class="language-plaintext highlighter-rouge">autoDestroy</code> option set to <code class="language-plaintext highlighter-rouge">false</code>.</p>
</blockquote>
<blockquote>
<p><code class="language-plaintext highlighter-rouge">pipeline</code> actually supports more than streams, but that's out of scope for this article.
Feel free to check <a href="https://nodejs.org/api/stream.html#stream_stream_pipeline_source_transforms_destination_callback">the documentation</a> to learn about other usages.</p>
</blockquote>
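<p>For comparison, here is a failing chain wired with <code class="language-plaintext highlighter-rouge">pipeline</code>, as a sketch under the same assumptions as before (hypothetical helpers, recent Node.js version). The error reaches the completion callback and the destination ends up destroyed.</p>

```javascript
const { PassThrough, pipeline, Readable, Transform } = require("stream");

// Hypothetical intermediate step that rejects every chunk it receives.
function failingTransform() {
  return new Transform({
    objectMode: true,
    transform(chunk, encoding, callback) {
      callback(new Error(`cannot process ${chunk}`));
    },
  });
}

// Hypothetical helper: wires source -> failing -> destination with pipeline()
// and reports the outcome seen by the completion callback.
function failWithPipeline() {
  const source = Readable.from([1, 2, 3]);
  const destination = new PassThrough({ objectMode: true });
  return new Promise((resolve) => {
    pipeline(source, failingTransform(), destination, (err) => {
      // on failure, pipeline() destroys the streams, then calls back with the error
      resolve({
        message: err ? err.message : null,
        destinationDestroyed: destination.destroyed,
      });
    });
  });
}

failWithPipeline().then((outcome) => console.log(outcome));
```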
<p>Now that the general pipeline model is understood, let's dive into the details of how <code class="language-plaintext highlighter-rouge">map</code> works, learning how custom streams are implemented in the process.</p>
<h2 id="you-cant-map-this">You Can't <code class="language-plaintext highlighter-rouge">map</code> This</h2>
<p>Credit where credit is due: I am going to reuse the awesome diagrams from <a href="https://projectreactor.io/">Project Reactor</a>.</p>
<p><img src="/assets/img/mapForFlux.svg" alt="`map` diagram" title="`map` diagram" /></p>
<p>The top of the diagram depicts chunks as they initially come to the stream, as well as
the stream completion signal (marked by the bold vertical line at the end of the sequence).</p>
<p>The <code class="language-plaintext highlighter-rouge">map</code> operation here is in the middle, applying a transformation from circles to squares.</p>
<p>The bottom part of the diagram shows the resulting chunks and how the completion signal is propagated as-is.</p>
<p>In other words, <code class="language-plaintext highlighter-rouge">map</code> applies a transformation function to each element of the stream, in the order the elements arrive.</p>
<p>Let's start with a <a href="https://jasmine.github.io/">Jasmine</a> test:</p>
<div class="language-javascript highlighter-rouge"><div class="highlight"><pre class="highlight"><code><span class="kd">const</span> <span class="p">{</span> <span class="nx">PassThrough</span><span class="p">,</span> <span class="nx">pipeline</span><span class="p">,</span> <span class="nx">Readable</span> <span class="p">}</span> <span class="o">=</span> <span class="nx">require</span><span class="p">(</span><span class="dl">"</span><span class="s2">stream</span><span class="dl">"</span><span class="p">);</span>
<span class="nx">describe</span><span class="p">(</span><span class="dl">"</span><span class="s2">map operator =></span><span class="dl">"</span><span class="p">,</span> <span class="p">()</span> <span class="o">=></span> <span class="p">{</span>
<span class="nx">it</span><span class="p">(</span><span class="dl">"</span><span class="s2">applies transformations to chunks</span><span class="dl">"</span><span class="p">,</span> <span class="p">(</span><span class="nx">done</span><span class="p">)</span> <span class="o">=></span> <span class="p">{</span>
<span class="kd">const</span> <span class="nx">source</span> <span class="o">=</span> <span class="nx">Readable</span><span class="p">.</span><span class="k">from</span><span class="p">([</span><span class="mi">1</span><span class="p">,</span> <span class="mi">2</span><span class="p">,</span> <span class="mi">3</span><span class="p">],</span> <span class="p">{</span> <span class="na">objectMode</span><span class="p">:</span> <span class="kc">true</span> <span class="p">});</span> <span class="c1">// (1)</span>
<span class="kd">const</span> <span class="nx">transformation</span> <span class="o">=</span> <span class="k">new</span> <span class="nx">MapTransform</span><span class="p">((</span><span class="nx">number</span><span class="p">)</span> <span class="o">=></span> <span class="nx">number</span> <span class="o">**</span> <span class="mi">2</span><span class="p">);</span> <span class="c1">// (2)</span>
<span class="kd">const</span> <span class="nx">destination</span> <span class="o">=</span> <span class="k">new</span> <span class="nx">PassThrough</span><span class="p">({</span> <span class="na">objectMode</span><span class="p">:</span> <span class="kc">true</span> <span class="p">});</span> <span class="c1">// (3)</span>
<span class="kd">const</span> <span class="nx">result</span> <span class="o">=</span> <span class="p">[];</span>
<span class="c1">// ??? (4)</span>
<span class="nx">pipeline</span><span class="p">(</span>
<span class="nx">source</span><span class="p">,</span>
<span class="nx">transformation</span><span class="p">,</span>
<span class="nx">destination</span><span class="p">,</span>
<span class="p">(</span><span class="nx">err</span><span class="p">)</span> <span class="o">=></span> <span class="p">{</span> <span class="c1">// (5)</span>
<span class="nx">expect</span><span class="p">(</span><span class="nx">err</span><span class="p">).</span><span class="nx">toBeFalsy</span><span class="p">(</span><span class="dl">'</span><span class="s1">pipeline should successfully complete</span><span class="dl">'</span><span class="p">);</span>
<span class="nx">expect</span><span class="p">(</span><span class="nx">result</span><span class="p">).</span><span class="nx">toEqual</span><span class="p">([</span><span class="mi">1</span><span class="p">,</span> <span class="mi">4</span><span class="p">,</span> <span class="mi">9</span><span class="p">]);</span>
<span class="nx">done</span><span class="p">();</span>
<span class="p">}</span>
<span class="p">);</span>
<span class="p">});</span>
<span class="p">})</span>
</code></pre></div></div>
<p>A few things of note:</p>
<ol>
<li>You can create a <code class="language-plaintext highlighter-rouge">Readable</code> from an iterable source, such as an array or the object returned by a <a href="https://developer.mozilla.org/en-US/docs/Web/JavaScript/Reference/Statements/function*">generator function</a>. Here, the stream will emit each array element in turn.
The <a href="https://nodejs.org/api/stream.html#stream_object_mode"><code class="language-plaintext highlighter-rouge">objectMode</code></a> option configures the stream to accept any JavaScript value as a chunk (except <code class="language-plaintext highlighter-rouge">null</code>, which signals the end of the stream).
By default, chunks are textual or binary (i.e. strings, <code class="language-plaintext highlighter-rouge">Buffer</code> or <code class="language-plaintext highlighter-rouge">Uint8Array</code>).
Quite surprisingly, <code class="language-plaintext highlighter-rouge">Readable.from</code> defaults to object mode, contrary to stream constructors. Although redundant here, the object mode is set explicitly for consistency's sake.</li>
<li><code class="language-plaintext highlighter-rouge">MapTransform</code> does not exist yet; we will figure out its implementation next, but we can assume its constructor accepts a transformation function (here: the square function).
We could pass the <code class="language-plaintext highlighter-rouge">objectMode</code> setting, but let's assume it always operates in object mode.</li>
<li><a href="https://nodejs.org/api/stream.html#stream_class_stream_passthrough"><code class="language-plaintext highlighter-rouge">PassThrough</code></a> is a trivial implementation of <code class="language-plaintext highlighter-rouge">Transform</code> that directly forwards inputs as outputs (in other words, it applies the identity function).</li>
<li>We need to somehow accumulate the observed outputs into <code class="language-plaintext highlighter-rouge">result</code>, more on that soon.</li>
<li>We leverage the completion callback of <code class="language-plaintext highlighter-rouge">pipeline</code> to verify a few things:
<ol>
<li>the pipeline completes successfully</li>
<li>the observed results are consistent with the transformation we intend to apply on the initial chunks</li>
<li>finally, <code class="language-plaintext highlighter-rouge">done</code> notifies the Jasmine test runner that the (asynchronous) test has completed</li>
</ol>
</li>
</ol>
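<p>The object mode defaults mentioned in the first note can be checked through the <code class="language-plaintext highlighter-rouge">readableObjectMode</code> property, available in recent Node.js versions. A minimal sketch, with a hypothetical <code class="language-plaintext highlighter-rouge">objectModeDefaults</code> helper:</p>

```javascript
const { PassThrough, Readable } = require("stream");

// Hypothetical helper: reports whether the Readable side of various
// streams operates in object mode.
function objectModeDefaults() {
  return {
    fromArray: Readable.from([1, 2, 3]).readableObjectMode, // Readable.from defaults to object mode
    readableConstructor: new Readable({ read() {} }).readableObjectMode, // constructors do not
    passThroughDefault: new PassThrough().readableObjectMode,
    passThroughObjectMode: new PassThrough({ objectMode: true }).readableObjectMode,
  };
}

console.log(objectModeDefaults());
// fromArray: true, readableConstructor: false, passThroughDefault: false, passThroughObjectMode: true
```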
<p>For people familiar with the given-when-then test structure, this test may look a bit strange.
Indeed, the order is changed here to given-then-when. This has to do with the asynchronous nature of streams.
We have to set up the expectations (the "then" block) before data starts flowing in, i.e. before <code class="language-plaintext highlighter-rouge">pipeline</code> is called.</p>
<p>How can we be sure the test completes? After all, streams can be infinite.
In this case, <code class="language-plaintext highlighter-rouge">Readable.from</code> reads a finite array and sends a completion signal once the array is fully consumed.
This completion signal is forwarded to all the other (downstream) streams, so we can be confident the <code class="language-plaintext highlighter-rouge">pipeline</code> completion callback is going to be called.
In the worst case, the test will hang until the Jasmine timeout is reached, causing a test failure.</p>
<p>We now need to figure out how to complete the test.</p>
<p>Node.js streams extend <a href="https://nodejs.org/api/events.html#events_events"><code class="language-plaintext highlighter-rouge">EventEmitter</code></a>.
They emit specific events that can be listened to via functions such as <code class="language-plaintext highlighter-rouge">EventEmitter#on(eventType, callback)</code>.
Event listeners are <strong>synchronously</strong> executed in the order they are added (you can tweak the order via alternative functions such as <code class="language-plaintext highlighter-rouge">EventEmitter#prependListener(eventType, callback)</code>).</p>
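<p>These ordering rules can be seen on a plain <code class="language-plaintext highlighter-rouge">EventEmitter</code>, which streams extend. <code class="language-plaintext highlighter-rouge">listenerOrder</code> is a hypothetical helper:</p>

```javascript
const { EventEmitter } = require("events");

// Hypothetical helper: records the order in which listeners run.
function listenerOrder() {
  const emitter = new EventEmitter();
  const calls = [];
  emitter.on("data", (chunk) => calls.push(`first:${chunk}`));
  emitter.on("data", (chunk) => calls.push(`second:${chunk}`));
  emitter.prependListener("data", (chunk) => calls.push(`prepended:${chunk}`)); // jumps the queue
  emitter.emit("data", 42);
  calls.push("after emit"); // emit() runs all listeners synchronously, so this comes last
  return calls;
}

console.log(listenerOrder()); // [ 'prepended:42', 'first:42', 'second:42', 'after emit' ]
```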
<p>Our test needs to observe chunks written to the destination stream.
Technically, the destination could just be a <code class="language-plaintext highlighter-rouge">Writable</code> stream, as this is the only requirement of <code class="language-plaintext highlighter-rouge">pipe</code> and <code class="language-plaintext highlighter-rouge">pipeline</code>.
However, we need to read the chunks that have been written to it, so using a <code class="language-plaintext highlighter-rouge">Transform</code> stream such as <code class="language-plaintext highlighter-rouge">PassThrough</code> definitely helps, as it exposes a <code class="language-plaintext highlighter-rouge">Readable</code> side.</p>
<p>In particular, <code class="language-plaintext highlighter-rouge">Readable</code> streams emit a <a href="https://nodejs.org/api/stream.html#stream_event_data"><code class="language-plaintext highlighter-rouge">data</code> event</a> with the associated chunk of data. That is exactly what we need to
accumulate the results!</p>
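<p>Accumulating <code class="language-plaintext highlighter-rouge">data</code> events until the <code class="language-plaintext highlighter-rouge">end</code> event is a recurring pattern, so it can be wrapped in a small promise-returning helper. This is a sketch; <code class="language-plaintext highlighter-rouge">collect</code> is hypothetical and not part of the <code class="language-plaintext highlighter-rouge">stream</code> module.</p>

```javascript
const { PassThrough, Readable } = require("stream");

// Hypothetical helper: accumulates every chunk of a Readable (side) and
// resolves once the completion signal ("end") has been received.
function collect(readable) {
  return new Promise((resolve, reject) => {
    const chunks = [];
    readable.on("data", (chunk) => chunks.push(chunk)); // also switches to flowing mode
    readable.once("end", () => resolve(chunks));
    readable.once("error", reject);
  });
}

const destination = new PassThrough({ objectMode: true });
Readable.from([1, 2, 3]).pipe(destination);
collect(destination).then((chunks) => console.log(chunks)); // [ 1, 2, 3 ]
```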
<p>Our test now becomes:</p>
<div class="language-javascript highlighter-rouge"><div class="highlight"><pre class="highlight"><code><span class="kd">const</span> <span class="p">{</span> <span class="nx">PassThrough</span><span class="p">,</span> <span class="nx">pipeline</span><span class="p">,</span> <span class="nx">Readable</span> <span class="p">}</span> <span class="o">=</span> <span class="nx">require</span><span class="p">(</span><span class="dl">"</span><span class="s2">stream</span><span class="dl">"</span><span class="p">);</span>
<span class="nx">describe</span><span class="p">(</span><span class="dl">"</span><span class="s2">map operator =></span><span class="dl">"</span><span class="p">,</span> <span class="p">()</span> <span class="o">=></span> <span class="p">{</span>
<span class="nx">it</span><span class="p">(</span><span class="dl">"</span><span class="s2">applies transformations to chunks</span><span class="dl">"</span><span class="p">,</span> <span class="p">(</span><span class="nx">done</span><span class="p">)</span> <span class="o">=></span> <span class="p">{</span>
<span class="kd">const</span> <span class="nx">source</span> <span class="o">=</span> <span class="nx">Readable</span><span class="p">.</span><span class="k">from</span><span class="p">([</span><span class="mi">1</span><span class="p">,</span> <span class="mi">2</span><span class="p">,</span> <span class="mi">3</span><span class="p">],</span> <span class="p">{</span> <span class="na">objectMode</span><span class="p">:</span> <span class="kc">true</span> <span class="p">});</span>
<span class="kd">const</span> <span class="nx">transformation</span> <span class="o">=</span> <span class="k">new</span> <span class="nx">MapTransform</span><span class="p">((</span><span class="nx">number</span><span class="p">)</span> <span class="o">=></span> <span class="nx">number</span> <span class="o">**</span> <span class="mi">2</span><span class="p">);</span>
<span class="kd">const</span> <span class="nx">destination</span> <span class="o">=</span> <span class="k">new</span> <span class="nx">PassThrough</span><span class="p">({</span> <span class="na">objectMode</span><span class="p">:</span> <span class="kc">true</span> <span class="p">});</span>
<span class="kd">const</span> <span class="nx">result</span> <span class="o">=</span> <span class="p">[];</span>
<span class="nx">destination</span><span class="p">.</span><span class="nx">on</span><span class="p">(</span><span class="dl">'</span><span class="s1">data</span><span class="dl">'</span><span class="p">,</span> <span class="p">(</span><span class="nx">chunk</span><span class="p">)</span> <span class="o">=></span> <span class="p">{</span>
<span class="nx">result</span><span class="p">.</span><span class="nx">push</span><span class="p">(</span><span class="nx">chunk</span><span class="p">);</span>
<span class="p">});</span>
<span class="nx">pipeline</span><span class="p">(</span>
<span class="nx">source</span><span class="p">,</span>
<span class="nx">transformation</span><span class="p">,</span>
<span class="nx">destination</span><span class="p">,</span>
<span class="p">(</span><span class="nx">err</span><span class="p">)</span> <span class="o">=></span> <span class="p">{</span>
<span class="nx">expect</span><span class="p">(</span><span class="nx">err</span><span class="p">).</span><span class="nx">toBeFalsy</span><span class="p">(</span><span class="dl">'</span><span class="s1">pipeline should successfully complete</span><span class="dl">'</span><span class="p">);</span>
<span class="nx">expect</span><span class="p">(</span><span class="nx">result</span><span class="p">).</span><span class="nx">toEqual</span><span class="p">([</span><span class="mi">1</span><span class="p">,</span> <span class="mi">4</span><span class="p">,</span> <span class="mi">9</span><span class="p">]);</span>
<span class="nx">done</span><span class="p">();</span>
<span class="p">}</span>
<span class="p">);</span>
<span class="p">});</span>
<span class="p">})</span>
</code></pre></div></div>
<p>The test seems ready. If I execute it, I get:</p>
<div class="language-shell highlighter-rouge"><div class="highlight"><pre class="highlight"><code> <span class="nv">$ </span>npm <span class="nb">test
</span>Failures:
1<span class="o">)</span> map operator <span class="o">=></span> applies transformations to chunks
Message:
ReferenceError: MapTransform is not defined
</code></pre></div></div>
<p>Just to make sure the pipeline is properly set up, let's temporarily replace <code class="language-plaintext highlighter-rouge">MapTransform</code> with <code class="language-plaintext highlighter-rouge">PassThrough</code> in object mode.
In that case, the test should fail because <code class="language-plaintext highlighter-rouge">result</code> will be equal to <code class="language-plaintext highlighter-rouge">[1, 2, 3]</code> and not <code class="language-plaintext highlighter-rouge">[1, 4, 9]</code>.
Let's see:</p>
<div class="language-shell highlighter-rouge"><div class="highlight"><pre class="highlight"><code> <span class="nv">$ </span>npm <span class="nb">test
</span>1<span class="o">)</span> map operator <span class="o">=></span> applies transformations to chunks
Message:
Expected <span class="nv">$[</span>1] <span class="o">=</span> 2 to equal 4.
Expected <span class="nv">$[</span>2] <span class="o">=</span> 3 to equal 9.
</code></pre></div></div>
<p>The test fails as expected; let's focus on the implementation now.</p>
<p><code class="language-plaintext highlighter-rouge">map</code> is an intermediate transformation, directly correlating outputs to inputs.
Hence, <code class="language-plaintext highlighter-rouge">Transform</code> is the ideal choice.</p>
<p>Let's subclass <code class="language-plaintext highlighter-rouge">Transform</code>, then:</p>
<div class="language-javascript highlighter-rouge"><div class="highlight"><pre class="highlight"><code><span class="kd">const</span> <span class="p">{</span> <span class="nx">Transform</span> <span class="p">}</span> <span class="o">=</span> <span class="nx">require</span><span class="p">(</span><span class="dl">"</span><span class="s2">stream</span><span class="dl">"</span><span class="p">);</span>
<span class="kd">class</span> <span class="nx">MapTransform</span> <span class="kd">extends</span> <span class="nx">Transform</span> <span class="p">{</span>
<span class="kd">constructor</span><span class="p">(</span><span class="nx">mapFunction</span><span class="p">)</span> <span class="p">{</span>
<span class="k">super</span><span class="p">({</span> <span class="na">objectMode</span><span class="p">:</span> <span class="kc">true</span> <span class="p">});</span>
<span class="k">this</span><span class="p">.</span><span class="nx">mapFunction</span> <span class="o">=</span> <span class="nx">mapFunction</span><span class="p">;</span>
<span class="p">}</span>
<span class="c1">// ???</span>
<span class="p">}</span>
</code></pre></div></div>
<p><code class="language-plaintext highlighter-rouge">Transform</code> streams need to implement the <a href="https://nodejs.org/api/stream.html#stream_transform_transform_chunk_encoding_callback"><code class="language-plaintext highlighter-rouge">_transform</code> method</a>.
The first parameter is the chunk of data coming from the <code class="language-plaintext highlighter-rouge">Writable</code> side, the second is the encoding (which is irrelevant in object mode) and the third is a callback
that must be called <strong>exactly once</strong>: its first argument is either an error or <code class="language-plaintext highlighter-rouge">null</code>, and its optional second argument is the result to pass on to the <code class="language-plaintext highlighter-rouge">Readable</code> side.</p>
<div class="language-javascript highlighter-rouge"><div class="highlight"><pre class="highlight"><code><span class="kd">const</span> <span class="p">{</span> <span class="nx">Transform</span> <span class="p">}</span> <span class="o">=</span> <span class="nx">require</span><span class="p">(</span><span class="dl">"</span><span class="s2">stream</span><span class="dl">"</span><span class="p">);</span>
<span class="kd">class</span> <span class="nx">MapTransform</span> <span class="kd">extends</span> <span class="nx">Transform</span> <span class="p">{</span>
<span class="kd">constructor</span><span class="p">(</span><span class="nx">mapFunction</span><span class="p">)</span> <span class="p">{</span>
<span class="k">super</span><span class="p">({</span> <span class="na">objectMode</span><span class="p">:</span> <span class="kc">true</span> <span class="p">});</span>
<span class="k">this</span><span class="p">.</span><span class="nx">mapFunction</span> <span class="o">=</span> <span class="nx">mapFunction</span><span class="p">;</span>
<span class="p">}</span>
<span class="nx">_transform</span><span class="p">(</span><span class="nx">chunk</span><span class="p">,</span> <span class="nx">encoding</span><span class="p">,</span> <span class="nx">callback</span><span class="p">)</span> <span class="p">{</span>
<span class="nx">callback</span><span class="p">(</span><span class="kc">null</span><span class="p">,</span> <span class="k">this</span><span class="p">.</span><span class="nx">mapFunction</span><span class="p">(</span><span class="nx">chunk</span><span class="p">));</span>
<span class="p">}</span>
<span class="p">}</span>
</code></pre></div></div>
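<p>To see <code class="language-plaintext highlighter-rouge">MapTransform</code> at work outside of Jasmine, here is a minimal sketch that runs it in a <code class="language-plaintext highlighter-rouge">pipeline</code>. The <code class="language-plaintext highlighter-rouge">mapArray</code> helper and the collecting <code class="language-plaintext highlighter-rouge">Writable</code> are hypothetical additions for illustration. They are not part of the library built in this post.</p>
<div class="language-javascript highlighter-rouge"><div class="highlight"><pre class="highlight"><code>const { pipeline, Readable, Transform, Writable } = require("stream");

// MapTransform as defined above: applies mapFunction to every chunk, in object mode.
class MapTransform extends Transform {
  constructor(mapFunction) {
    super({ objectMode: true });
    this.mapFunction = mapFunction;
  }
  _transform(chunk, encoding, callback) {
    callback(null, this.mapFunction(chunk));
  }
}

// Hypothetical helper: streams `values` through MapTransform and resolves
// with every chunk that reaches the end of the pipeline.
function mapArray(values, mapFunction) {
  return new Promise((resolve, reject) => {
    const results = [];
    const sink = new Writable({
      objectMode: true,
      write(chunk, encoding, callback) {
        results.push(chunk); // accumulate mapped chunks
        callback();
      },
    });
    pipeline(Readable.from(values), new MapTransform(mapFunction), sink, (err) =>
      err ? reject(err) : resolve(results)
    );
  });
}

mapArray([1, 2, 3], (n) => n * 2).then((doubled) => console.log(doubled)); // [ 2, 4, 6 ]
</code></pre></div></div>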
<p>Let’s see if the test passes now:</p>
<div class="language-shell highlighter-rouge"><div class="highlight"><pre class="highlight"><code> <span class="nv">$ </span>npm <span class="nb">test</span>
<span class="o">></span> jasmine
Randomized with seed 30817
Started
<span class="nb">.</span>
1 spec, 0 failures
Finished <span class="k">in </span>0.014 seconds
</code></pre></div></div>
<p>It does!</p>
<p>We could improve a few things, such as accepting asynchronous functions and handling functions that throw.
This is left as an exercise to the reader (hint: <code class="language-plaintext highlighter-rouge">Promise.resolve</code> bridges synchronous and asynchronous functions).</p>
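<p>If you want to compare notes on the exercise, here is one possible sketch. It is not a reference solution. Wrapping the call in <code class="language-plaintext highlighter-rouge">Promise.resolve().then(...)</code> funnels three cases into a single promise: a plain return value, a returned promise, and a synchronous <code class="language-plaintext highlighter-rouge">throw</code>. That way, the callback is still called exactly once. The <code class="language-plaintext highlighter-rouge">AsyncMapTransform</code> name is made up for this example.</p>
<div class="language-javascript highlighter-rouge"><div class="highlight"><pre class="highlight"><code>const { Readable, Transform } = require("stream");

// One possible answer to the exercise (a sketch): mapFunction may return a value,
// return a promise, or throw; all three cases end up in a single promise.
class AsyncMapTransform extends Transform {
  constructor(mapFunction) {
    super({ objectMode: true });
    this.mapFunction = mapFunction;
  }
  _transform(chunk, encoding, callback) {
    Promise.resolve()
      .then(() => this.mapFunction(chunk)) // a synchronous throw becomes a rejection
      .then(
        (result) => callback(null, result), // success: forward the result
        (err) => callback(err) // failure: report the error instead
      );
  }
}

// Usage: an asynchronous mapping function works just like a synchronous one.
Readable.from([1, 2, 3])
  .pipe(new AsyncMapTransform(async (n) => n * 10))
  .on("data", (value) => console.log(value)); // logs 10, 20 and 30
</code></pre></div></div>
<p>Both handlers are passed to the same <code class="language-plaintext highlighter-rouge">then</code> rather than chaining a <code class="language-plaintext highlighter-rouge">catch</code>. This matters because an exception thrown by <code class="language-plaintext highlighter-rouge">callback</code> itself is then not routed back into <code class="language-plaintext highlighter-rouge">callback(err)</code>, which would call it a second time.</p>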
<h2 id="zip-it">Zip it!</h2>
<p><code class="language-plaintext highlighter-rouge">zip</code> is slightly more complex than <code class="language-plaintext highlighter-rouge">map</code>, as it operates on (at least) two streams.
Let’s see it in action (thanks again to <a href="https://projectreactor.io/">Project Reactor</a> for the diagrams):</p>
<p><img src="/assets/img/zip.svg" alt="`zip` diagram" title="`zip` diagram" /></p>
<p><code class="language-plaintext highlighter-rouge">zip</code> pairs up chunks by position: the first chunk of each stream forms the first pair, the second chunks form the second pair, and so on.
Once a pair is formed, a transformation function is applied to it.
<code class="language-plaintext highlighter-rouge">zip</code> completes when the last stream completes.</p>
<p>For simplicity’s sake, our <code class="language-plaintext highlighter-rouge">zip</code> implementation will only pair elements together, without applying any transformation.</p>
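<p>Before dealing with streams, the pairing step itself can be sketched on plain arrays. The <code class="language-plaintext highlighter-rouge">zipArrays</code> function below is hypothetical and only illustrates pairing. Arrays have no notion of completion, so it simply stops at the end of the shorter array.</p>
<div class="language-javascript highlighter-rouge"><div class="highlight"><pre class="highlight"><code>// Hypothetical helper: pairs elements by position, without any transformation.
// Arrays have no notion of completion, so this sketch stops at the shorter array.
function zipArrays(left, right) {
  const length = Math.min(left.length, right.length);
  const pairs = [];
  for (let i = 0; i < length; i++) {
    pairs.push([left[i], right[i]]);
  }
  return pairs;
}

console.log(JSON.stringify(zipArrays([1, 2, 3], ["Un", "Deux", "Trois"])));
// [[1,"Un"],[2,"Deux"],[3,"Trois"]]
</code></pre></div></div>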
<p>Time to express our intent with a test:</p>
<div class="language-javascript highlighter-rouge"><div class="highlight"><pre class="highlight"><code><span class="kd">const</span> <span class="p">{</span> <span class="nx">PassThrough</span><span class="p">,</span> <span class="nx">pipeline</span><span class="p">,</span> <span class="nx">Readable</span> <span class="p">}</span> <span class="o">=</span> <span class="nx">require</span><span class="p">(</span><span class="dl">"</span><span class="s2">stream</span><span class="dl">"</span><span class="p">);</span>
<span class="nx">describe</span><span class="p">(</span><span class="dl">"</span><span class="s2">zip operator =></span><span class="dl">"</span><span class="p">,</span> <span class="p">()</span> <span class="o">=></span> <span class="p">{</span>
<span class="nx">it</span><span class="p">(</span><span class="dl">"</span><span class="s2">pairs chunks from upstream streams</span><span class="dl">"</span><span class="p">,</span> <span class="p">(</span><span class="nx">done</span><span class="p">)</span> <span class="o">=></span> <span class="p">{</span>
<span class="kd">const</span> <span class="nx">upstream1</span> <span class="o">=</span> <span class="nx">Readable</span><span class="p">.</span><span class="k">from</span><span class="p">([</span><span class="mi">1</span><span class="p">,</span> <span class="mi">2</span><span class="p">,</span> <span class="mi">3</span><span class="p">],</span> <span class="p">{</span> <span class="na">objectMode</span><span class="p">:</span> <span class="kc">true</span> <span class="p">});</span> <span class="c1">// (1)</span>
<span class="kd">const</span> <span class="nx">upstream2</span> <span class="o">=</span> <span class="nx">Readable</span><span class="p">.</span><span class="k">from</span><span class="p">([</span><span class="dl">"</span><span class="s2">Un</span><span class="dl">"</span><span class="p">,</span> <span class="dl">"</span><span class="s2">Deux</span><span class="dl">"</span><span class="p">,</span> <span class="dl">"</span><span class="s2">Trois</span><span class="dl">"</span><span class="p">],</span> <span class="p">{</span> <span class="na">objectMode</span><span class="p">:</span> <span class="kc">true</span> <span class="p">});</span> <span class="c1">// (1)</span>
<span class="kd">const</span> <span class="nx">zipSource</span> <span class="o">=</span> <span class="k">new</span> <span class="nx">ZipReadable</span><span class="p">(</span><span class="nx">upstream1</span><span class="p">,</span> <span class="nx">upstream2</span><span class="p">);</span> <span class="c1">// (2)</span>
<span class="kd">const</span> <span class="nx">destination</span> <span class="o">=</span> <span class="k">new</span> <span class="nx">PassThrough</span><span class="p">({</span> <span class="na">objectMode</span><span class="p">:</span> <span class="kc">true</span> <span class="p">});</span> <span class="c1">// (3)</span>
<span class="kd">const</span> <span class="nx">result</span> <span class="o">=</span> <span class="p">[];</span> <span class="c1">// (4)</span>
<span class="nx">destination</span><span class="p">.</span><span class="nx">on</span><span class="p">(</span><span class="dl">'</span><span class="s1">data</span><span class="dl">'</span><span class="p">,</span> <span class="p">(</span><span class="nx">chunk</span><span class="p">)</span> <span class="o">=></span> <span class="p">{</span> <span class="c1">// (4)</span>
<span class="nx">result</span><span class="p">.</span><span class="nx">push</span><span class="p">(</span><span class="nx">chunk</span><span class="p">);</span>
<span class="p">});</span>
<span class="nx">pipeline</span><span class="p">(</span>
<span class="nx">zipSource</span><span class="p">,</span>
<span class="nx">destination</span><span class="p">,</span>
<span class="p">(</span><span class="nx">err</span><span class="p">)</span> <span class="o">=></span> <span class="p">{</span> <span class="c1">// (5)</span>
<span class="nx">expect</span><span class="p">(</span><span class="nx">err</span><span class="p">).</span><span class="nx">toBeFalsy</span><span class="p">(</span><span class="dl">'</span><span class="s1">pipeline should successfully complete</span><span class="dl">'</span><span class="p">);</span>
<span class="nx">expect</span><span class="p">(</span><span class="nx">result</span><span class="p">).</span><span class="nx">toEqual</span><span class="p">([</span>
<span class="p">[</span><span class="mi">1</span><span class="p">,</span> <span class="dl">"</span><span class="s2">Un</span><span class="dl">"</span><span class="p">],</span>
<span class="p">[</span><span class="mi">2</span><span class="p">,</span> <span class="dl">"</span><span class="s2">Deux</span><span class="dl">"</span><span class="p">],</span>
<span class="p">[</span><span class="mi">3</span><span class="p">,</span> <span class="dl">"</span><span class="s2">Trois</span><span class="dl">"</span><span class="p">]</span>
<span class="p">]);</span>
<span class="nx">done</span><span class="p">();</span>
<span class="p">}</span>
<span class="p">);</span>
<span class="p">})</span>
<span class="p">})</span>
</code></pre></div></div>
<p>This is very similar to the previous <code class="language-plaintext highlighter-rouge">map</code> test:</p>
<ol>
<li>we need two streams to read from, hence the creation of two <code class="language-plaintext highlighter-rouge">Readable</code> streams from different arrays.
Note that we could (and, for a production implementation, should) spice up the test a bit by introducing latency, thus making sure we properly wait for chunks to be paired in order.
This could be done with <a href="https://developer.mozilla.org/en-US/docs/Web/JavaScript/Reference/Statements/function*">generator functions</a>
and <a href="https://developer.mozilla.org/en-US/docs/Web/API/WindowOrWorkerGlobalScope/setTimeout"><code class="language-plaintext highlighter-rouge">setTimeout</code></a>.</li>
<li>the next step will be to figure out how to implement <code class="language-plaintext highlighter-rouge">ZipReadable</code>. We can safely assume it accepts two <code class="language-plaintext highlighter-rouge">Readable</code> streams to read chunks from.</li>
<li>same as before, we rely on <code class="language-plaintext highlighter-rouge">PassThrough</code> to receive the resulting chunks. We will use its <code class="language-plaintext highlighter-rouge">Readable</code> side to observe and accumulate the results.</li>
<li>we accumulate the observed resulting chunks in <code class="language-plaintext highlighter-rouge">result</code>, based on the <a href="https://nodejs.org/api/stream.html#stream_event_data"><code class="language-plaintext highlighter-rouge">data</code> event</a> emitted by the <code class="language-plaintext highlighter-rouge">Readable</code> side of the <code class="language-plaintext highlighter-rouge">PassThrough</code> stream</li>
<li>finally, as before, we rely on the completion callback to check two things: that the pipeline completes successfully, and that the resulting chunks are as expected. The callback then notifies Jasmine that the test is done.</li>
</ol>
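<p>Regarding the latency idea from the first item, here is a sketch of what such upstreams could look like. It uses an async generator (<code class="language-plaintext highlighter-rouge">async function*</code>), since each <code class="language-plaintext highlighter-rouge">setTimeout</code>-based delay needs to be awaited. The <code class="language-plaintext highlighter-rouge">sleep</code> and <code class="language-plaintext highlighter-rouge">delayed</code> helpers and the delay values are arbitrary choices for illustration.</p>
<div class="language-javascript highlighter-rouge"><div class="highlight"><pre class="highlight"><code>const { Readable } = require("stream");

// Hypothetical helper: resolves after `ms` milliseconds.
const sleep = (ms) => new Promise((resolve) => setTimeout(resolve, ms));

// Async generator yielding each value after waiting `delayMs` milliseconds.
async function* delayed(values, delayMs) {
  for (const value of values) {
    await sleep(delayMs);
    yield value;
  }
}

// Possible replacements for the upstreams of the zip test (delays are arbitrary).
const upstream1 = Readable.from(delayed([1, 2, 3], 30));
const upstream2 = Readable.from(delayed(["Un", "Deux", "Trois"], 5));
</code></pre></div></div>
<p>With these delays, chunks from <code class="language-plaintext highlighter-rouge">upstream2</code> arrive well before their counterparts from <code class="language-plaintext highlighter-rouge">upstream1</code>. A correct <code class="language-plaintext highlighter-rouge">zip</code> has to handle exactly this situation.</p>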
<p>Let’s run the test:</p>
<div class="language-shell highlighter-rouge"><div class="highlight"><pre class="highlight"><code> <span class="nv">$ </span>npm <span class="nb">test
</span>Failures:
1<span class="o">)</span> zip operator <span class="o">=></span> pairs chunks from upstream streams
Message:
ReferenceError: ZipReadable is not defined
</code></pre></div></div>
<p>Let’s create an implementation that works with two streams for now.
First, what kind of stream should our <code class="language-plaintext highlighter-rouge">ZipReadable</code> be? Let’s go with <code class="language-plaintext highlighter-rouge">Readable</code>, as <code class="language-plaintext highlighter-rouge">ZipReadable</code> acts as a source
built upon two upstream streams.</p>
<div class="language-javascript highlighter-rouge"><div class="highlight"><pre class="highlight"><code><span class="kd">const</span> <span class="p">{</span> <span class="nx">Readable</span> <span class="p">}</span> <span class="o">=</span> <span class="nx">require</span><span class="p">(</span><span class="dl">"</span><span class="s2">stream</span><span class="dl">"</span><span class="p">);</span>
<span class="kd">class</span> <span class="nx">ZipReadable</span> <span class="kd">extends</span> <span class="nx">Readable</span> <span class="p">{</span>
<span class="kd">constructor</span><span class="p">(</span><span class="nx">stream1</span><span class="p">,</span> <span class="nx">stream2</span><span class="p">)</span> <span class="p">{</span>
<span class="k">super</span><span class="p">({</span> <span class="na">objectMode</span> <span class="p">:</span> <span class="kc">true</span> <span class="p">});</span>
<span class="k">this</span><span class="p">.</span><span class="nx">stream1</span> <span class="o">=</span> <span class="nx">stream1</span><span class="p">;</span>
<span class="k">this</span><span class="p">.</span><span class="nx">stream2</span> <span class="o">=</span> <span class="nx">stream2</span><span class="p">;</span>
<span class="p">}</span>
<span class="c1">// ??? (2)</span>
<span class="nx">_startReading</span><span class="p">()</span> <span class="p">{</span>
<span class="k">this</span><span class="p">.</span><span class="nx">stream1</span><span class="p">.</span><span class="nx">on</span><span class="p">(</span><span class="dl">'</span><span class="s1">data</span><span class="dl">'</span><span class="p">,</span> <span class="p">(</span><span class="nx">chunk1</span><span class="p">)</span> <span class="o">=></span> <span class="p">{</span>
<span class="c1">// ??? (1)</span>
<span class="p">});</span>
<span class="k">this</span><span class="p">.</span><span class="nx">stream2</span><span class="p">.</span><span class="nx">on</span><span class="p">(</span><span class="dl">'</span><span class="s1">data</span><span class="dl">'</span><span class="p">,</span> <span class="p">(</span><span class="nx">chunk2</span><span class="p">)</span> <span class="o">=></span> <span class="p">{</span>
<span class="c1">// ??? (1)</span>
<span class="p">});</span>
<span class="p">}</span>
<span class="p">}</span>
</code></pre></div></div>
<ol>
<li>we need to get data from both upstream streams. We chose not to call <code class="language-plaintext highlighter-rouge">_startReading</code> in the constructor:
the goal is to start reading only when a first consumer asks for data.</li>
<li>we somehow need to emit data whenever <code class="language-plaintext highlighter-rouge">ZipReadable</code> is read from</li>
</ol>
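<p>The lazy start in the first item relies on the fact that constructing a <code class="language-plaintext highlighter-rouge">Readable</code> does not call <code class="language-plaintext highlighter-rouge">_read</code>; only a consumer does. The hypothetical <code class="language-plaintext highlighter-rouge">CountingReadable</code> below sketches this by counting <code class="language-plaintext highlighter-rouge">_read</code> calls.</p>
<div class="language-javascript highlighter-rouge"><div class="highlight"><pre class="highlight"><code>const { Readable } = require("stream");

// Sketch: counts _read calls to show that constructing a Readable does not
// trigger _read, whereas attaching a consumer (here, a 'data' listener) does.
class CountingReadable extends Readable {
  constructor() {
    super({ objectMode: true });
    this.readCalls = 0;
  }
  _read() {
    this.readCalls++;
    // Emit a single chunk on the first call, then signal the end of the stream.
    this.push(this.readCalls === 1 ? "hello" : null);
  }
}

const source = new CountingReadable();
console.log(source.readCalls); // 0: no consumer has asked for data yet
source.on("data", (chunk) => console.log(`got ${chunk}`)); // got hello
</code></pre></div></div>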
<p>Let’s first worry about buffering the incoming data:</p>
<div class="language-javascript highlighter-rouge"><div class="highlight"><pre class="highlight"><code><span class="kd">const</span> <span class="p">{</span> <span class="nx">Readable</span> <span class="p">}</span> <span class="o">=</span> <span class="nx">require</span><span class="p">(</span><span class="dl">"</span><span class="s2">stream</span><span class="dl">"</span><span class="p">);</span>
<span class="kd">class</span> <span class="nx">ZipReadable</span> <span class="kd">extends</span> <span class="nx">Readable</span> <span class="p">{</span>
<span class="kd">constructor</span><span class="p">(</span><span class="nx">stream1</span><span class="p">,</span> <span class="nx">stream2</span><span class="p">)</span> <span class="p">{</span>
<span class="k">super</span><span class="p">({</span> <span class="na">objectMode</span> <span class="p">:</span> <span class="kc">true</span> <span class="p">});</span>
<span class="k">this</span><span class="p">.</span><span class="nx">chunks1</span> <span class="o">=</span> <span class="p">[];</span>
<span class="k">this</span><span class="p">.</span><span class="nx">chunks2</span> <span class="o">=</span> <span class="p">[];</span>
<span class="k">this</span><span class="p">.</span><span class="nx">stream1</span> <span class="o">=</span> <span class="nx">stream1</span><span class="p">;</span>
<span class="k">this</span><span class="p">.</span><span class="nx">stream2</span> <span class="o">=</span> <span class="nx">stream2</span><span class="p">;</span>
<span class="p">}</span>
<span class="c1">// ???</span>
<span class="nx">_startReading</span><span class="p">()</span> <span class="p">{</span>
<span class="k">this</span><span class="p">.</span><span class="nx">stream1</span><span class="p">.</span><span class="nx">on</span><span class="p">(</span><span class="dl">'</span><span class="s1">data</span><span class="dl">'</span><span class="p">,</span> <span class="p">(</span><span class="nx">chunk1</span><span class="p">)</span> <span class="o">=></span> <span class="p">{</span>
<span class="k">this</span><span class="p">.</span><span class="nx">chunks1</span><span class="p">.</span><span class="nx">push</span><span class="p">(</span><span class="nx">chunk1</span><span class="p">);</span>
<span class="p">});</span>
<span class="k">this</span><span class="p">.</span><span class="nx">stream2</span><span class="p">.</span><span class="nx">on</span><span class="p">(</span><span class="dl">'</span><span class="s1">data</span><span class="dl">'</span><span class="p">,</span> <span class="p">(</span><span class="nx">chunk2</span><span class="p">)</span> <span class="o">=></span> <span class="p">{</span>
<span class="k">this</span><span class="p">.</span><span class="nx">chunks2</span><span class="p">.</span><span class="nx">push</span><span class="p">(</span><span class="nx">chunk2</span><span class="p">);</span>
<span class="p">});</span>
<span class="p">}</span>
<span class="p">}</span>
</code></pre></div></div>
<p>Nothing too fancy here: each chunk is pushed to the corresponding array.
Custom <code class="language-plaintext highlighter-rouge">Readable</code> streams need to implement <a href="https://nodejs.org/api/stream.html#stream_readable_read_size_1"><code class="language-plaintext highlighter-rouge">Readable#_read</code></a>.
Results are pushed to consumers via <a href="https://nodejs.org/api/stream.html#stream_readable_push_chunk_encoding"><code class="language-plaintext highlighter-rouge">Readable#push</code></a>.</p>
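<p>Here is a minimal sketch of that contract, independent of <code class="language-plaintext highlighter-rouge">zip</code>. The hypothetical <code class="language-plaintext highlighter-rouge">CounterReadable</code> below implements <code class="language-plaintext highlighter-rouge">_read</code> and pushes one integer per call. Once it runs out of integers, it pushes <code class="language-plaintext highlighter-rouge">null</code> to signal the end of the stream.</p>
<div class="language-javascript highlighter-rouge"><div class="highlight"><pre class="highlight"><code>const { Readable } = require("stream");

// Minimal custom Readable (a sketch): emits the integers from 1 to `max`, then ends.
class CounterReadable extends Readable {
  constructor(max) {
    super({ objectMode: true });
    this.current = 0;
    this.max = max;
  }
  _read() {
    this.current++;
    if (this.current > this.max) {
      this.push(null); // pushing null signals the end of the stream
    } else {
      this.push(this.current); // hand the next chunk over to consumers
    }
  }
}

new CounterReadable(3).on("data", (n) => console.log(n)); // logs 1, 2 and 3
</code></pre></div></div>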
<p>Let’s have a crack at it:</p>
<div class="language-javascript highlighter-rouge"><div class="highlight"><pre class="highlight"><code><span class="c1">// DO NOT USE IN PRODUCTION - SEE BELOW FOR DETAILS</span>
<span class="kd">const</span> <span class="p">{</span> <span class="nx">Readable</span> <span class="p">}</span> <span class="o">=</span> <span class="nx">require</span><span class="p">(</span><span class="dl">"</span><span class="s2">stream</span><span class="dl">"</span><span class="p">);</span>
<span class="kd">class</span> <span class="nx">ZipReadable</span> <span class="kd">extends</span> <span class="nx">Readable</span> <span class="p">{</span>
<span class="kd">constructor</span><span class="p">(</span><span class="nx">stream1</span><span class="p">,</span> <span class="nx">stream2</span><span class="p">)</span> <span class="p">{</span>
<span class="k">super</span><span class="p">({</span> <span class="na">objectMode</span> <span class="p">:</span> <span class="kc">true</span> <span class="p">});</span>
<span class="k">this</span><span class="p">.</span><span class="nx">initialized</span> <span class="o">=</span> <span class="kc">false</span><span class="p">;</span>
<span class="k">this</span><span class="p">.</span><span class="nx">stream1</span> <span class="o">=</span> <span class="nx">stream1</span><span class="p">;</span>
<span class="k">this</span><span class="p">.</span><span class="nx">stream2</span> <span class="o">=</span> <span class="nx">stream2</span><span class="p">;</span>
<span class="k">this</span><span class="p">.</span><span class="nx">chunks1</span> <span class="o">=</span> <span class="p">[];</span>
<span class="k">this</span><span class="p">.</span><span class="nx">chunks2</span> <span class="o">=</span> <span class="p">[];</span>
<span class="p">}</span>
<span class="nx">_read</span><span class="p">(</span><span class="nx">size</span><span class="p">)</span> <span class="p">{</span>
<span class="k">if</span> <span class="p">(</span><span class="o">!</span><span class="k">this</span><span class="p">.</span><span class="nx">initialized</span><span class="p">)</span> <span class="p">{</span>
<span class="k">this</span><span class="p">.</span><span class="nx">_startReading</span><span class="p">();</span> <span class="c1">// (1)</span>
<span class="k">this</span><span class="p">.</span><span class="nx">initialized</span> <span class="o">=</span> <span class="kc">true</span><span class="p">;</span>
<span class="p">}</span>
<span class="kd">const</span> <span class="nx">bound</span> <span class="o">=</span> <span class="nb">Math</span><span class="p">.</span><span class="nx">min</span><span class="p">(</span><span class="nx">size</span><span class="p">,</span> <span class="k">this</span><span class="p">.</span><span class="nx">chunks1</span><span class="p">.</span><span class="nx">length</span><span class="p">,</span> <span class="k">this</span><span class="p">.</span><span class="nx">chunks2</span><span class="p">.</span><span class="nx">length</span><span class="p">);</span> <span class="c1">// (2)</span>
<span class="k">if</span> <span class="p">(</span><span class="nx">bound</span> <span class="o">===</span> <span class="mi">0</span><span class="p">)</span> <span class="p">{</span>
<span class="k">return</span><span class="p">;</span>
<span class="p">}</span>
<span class="kd">const</span> <span class="nx">readyChunks1</span> <span class="o">=</span> <span class="k">this</span><span class="p">.</span><span class="nx">chunks1</span><span class="p">.</span><span class="nx">splice</span><span class="p">(</span><span class="mi">0</span><span class="p">,</span> <span class="nx">bound</span><span class="p">);</span> <span class="c1">// (3)</span>
<span class="kd">const</span> <span class="nx">readyChunks2</span> <span class="o">=</span> <span class="k">this</span><span class="p">.</span><span class="nx">chunks2</span><span class="p">.</span><span class="nx">splice</span><span class="p">(</span><span class="mi">0</span><span class="p">,</span> <span class="nx">bound</span><span class="p">);</span> <span class="c1">// (3)</span>
<span class="k">for</span> <span class="p">(</span><span class="kd">let</span> <span class="nx">i</span> <span class="o">=</span> <span class="mi">0</span><span class="p">;</span> <span class="nx">i</span> <span class="o"><</span> <span class="nx">bound</span><span class="p">;</span> <span class="nx">i</span><span class="o">++</span><span class="p">)</span> <span class="p">{</span>
<span class="kd">const</span> <span class="nx">pair</span> <span class="o">=</span> <span class="p">[</span><span class="nx">readyChunks1</span><span class="p">[</span><span class="nx">i</span><span class="p">],</span> <span class="nx">readyChunks2</span><span class="p">[</span><span class="nx">i</span><span class="p">]];</span> <span class="c1">// (4)</span>
<span class="k">this</span><span class="p">.</span><span class="nx">push</span><span class="p">(</span><span class="nx">pair</span><span class="p">);</span> <span class="c1">// (5)</span>
<span class="p">}</span>
<span class="p">}</span>
<span class="nx">_startReading</span><span class="p">()</span> <span class="p">{</span>
<span class="k">this</span><span class="p">.</span><span class="nx">stream1</span><span class="p">.</span><span class="nx">on</span><span class="p">(</span><span class="dl">'</span><span class="s1">data</span><span class="dl">'</span><span class="p">,</span> <span class="p">(</span><span class="nx">chunk1</span><span class="p">)</span> <span class="o">=></span> <span class="p">{</span>
<span class="k">this</span><span class="p">.</span><span class="nx">chunks1</span><span class="p">.</span><span class="nx">push</span><span class="p">(</span><span class="nx">chunk1</span><span class="p">);</span>
<span class="p">});</span>
<span class="k">this</span><span class="p">.</span><span class="nx">stream2</span><span class="p">.</span><span class="nx">on</span><span class="p">(</span><span class="dl">'</span><span class="s1">data</span><span class="dl">'</span><span class="p">,</span> <span class="p">(</span><span class="nx">chunk2</span><span class="p">)</span> <span class="o">=></span> <span class="p">{</span>
<span class="k">this</span><span class="p">.</span><span class="nx">chunks2</span><span class="p">.</span><span class="nx">push</span><span class="p">(</span><span class="nx">chunk2</span><span class="p">);</span>
<span class="p">});</span>
<span class="p">}</span>
<span class="p">}</span>
</code></pre></div></div>
<ol>
<li>upon the first call to <code class="language-plaintext highlighter-rouge">Readable#_read</code> (when <code class="language-plaintext highlighter-rouge">pipeline</code> is called in the test), we start reading data from the upstream sources.
As we do not want to subscribe to the <code class="language-plaintext highlighter-rouge">'data'</code> event multiple times, we guard this initialization with the <code class="language-plaintext highlighter-rouge">this.initialized</code> flag.</li>
<li><code class="language-plaintext highlighter-rouge">size</code> is advisory, so we could simply ignore it, but including it in the bound computation costs little. More on that towards the end of this article.</li>
<li><code class="language-plaintext highlighter-rouge">splice</code> is used here to remove and return the first <code class="language-plaintext highlighter-rouge">bound</code> elements of each array, which also shifts the remaining ones. That way, we do not keep consumed chunks around.</li>
<li>here lies the core logic of <code class="language-plaintext highlighter-rouge">zip</code>: we create a pair (an array) out of the chunks accumulated from the two streams</li>
<li>finally, we publish that pair</li>
</ol>
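<p>This sketch gives a feel for the advisory <code class="language-plaintext highlighter-rouge">size</code> from the second item. It assumes current Node.js behavior, where an object-mode stream passes its <code class="language-plaintext highlighter-rouge">highWaterMark</code> to <code class="language-plaintext highlighter-rouge">_read</code>, counted in objects (16 unless configured otherwise). The hypothetical <code class="language-plaintext highlighter-rouge">SizeProbe</code> below simply records the value it receives.</p>
<div class="language-javascript highlighter-rouge"><div class="highlight"><pre class="highlight"><code>const { Readable } = require("stream");

// Hypothetical probe: records the `size` hint that Node.js passes to _read.
class SizeProbe extends Readable {
  constructor(options = {}) {
    super({ ...options, objectMode: true });
    this.sizes = [];
  }
  _read(size) {
    this.sizes.push(size);
    this.push(null); // end right away: only the hint matters here
  }
}

const probe = new SizeProbe({ highWaterMark: 4 });
probe.on("end", () => console.log(probe.sizes)); // [ 4 ]
probe.resume(); // start flowing so that _read gets called and "end" is emitted
</code></pre></div></div>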
<p>Let’s see if our test is happy:</p>
<div class="language-shell highlighter-rouge"><div class="highlight"><pre class="highlight"><code>Failures:
1<span class="o">)</span> zip operator <span class="o">=></span> pairs chunks from upstream streams
Message:
Error: Timeout - Async <span class="k">function </span>did not <span class="nb">complete </span>within 5000ms <span class="o">(</span><span class="nb">set </span>by jasmine.DEFAULT_TIMEOUT_INTERVAL<span class="o">)</span>
</code></pre></div></div>
<p>Oh no! The test fails.
Looking at the above implementation, this actually makes sense.
When <code class="language-plaintext highlighter-rouge">_read</code> is called the first time, there is no guarantee at all that data has been buffered yet from the upstream sources.</p>
<p>Looking a bit more closely at the <a href="https://nodejs.org/api/stream.html#stream_readable_read_size_1"><code class="language-plaintext highlighter-rouge">Readable#_read</code> documentation</a>, we can read:</p>
<blockquote>
<p>Once the readable._read() method has been called, it will not be called again until more data is pushed through the readable.push() method.</p>
</blockquote>
<p>Aha! That’s exactly the issue we hit! <code class="language-plaintext highlighter-rouge">_read</code> is called for the first time when the pipeline is set up, but no data has arrived yet, so there is nothing to push.
We are then stuck forever: <code class="language-plaintext highlighter-rouge">_read</code> will not be called again, so no further <code class="language-plaintext highlighter-rouge">Readable#push</code> calls can occur.</p>
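<p>The stall is easy to reproduce in isolation. In the sketch below, <code class="language-plaintext highlighter-rouge">_read</code> never pushes. Even with a consumer attached for 50 milliseconds, it is called only once. The <code class="language-plaintext highlighter-rouge">StallingReadable</code> class and the <code class="language-plaintext highlighter-rouge">countReadCalls</code> helper are hypothetical.</p>
<div class="language-javascript highlighter-rouge"><div class="highlight"><pre class="highlight"><code>const { Readable } = require("stream");

// Sketch reproducing the stall: _read returns without pushing, like the first
// ZipReadable attempt does when no chunk has been buffered yet.
class StallingReadable extends Readable {
  constructor() {
    super({ objectMode: true });
    this.readCalls = 0;
  }
  _read() {
    this.readCalls++; // no push: Node.js waits for one before calling _read again
  }
}

// Resolves with the number of _read calls observed after `ms` milliseconds.
function countReadCalls(ms) {
  const stream = new StallingReadable();
  stream.on("data", () => {}); // a consumer is attached and waiting for data
  return new Promise((resolve) => setTimeout(() => resolve(stream.readCalls), ms));
}

countReadCalls(50).then((calls) => console.log(`_read calls: ${calls}`)); // _read calls: 1
</code></pre></div></div>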
<p>Lucky for us, nothing prevents <code class="language-plaintext highlighter-rouge">Readable#push</code>, or even <code class="language-plaintext highlighter-rouge">Readable#_read</code> from being called from elsewhere in the <code class="language-plaintext highlighter-rouge">Readable</code> implementation.</p>
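<p>Here is a minimal sketch of that idea, assuming a single upstream simulated with <code class="language-plaintext highlighter-rouge">setTimeout</code>. The hypothetical <code class="language-plaintext highlighter-rouge">TickingReadable</code> pushes chunks from a timer callback as they "arrive", while <code class="language-plaintext highlighter-rouge">_read</code> only kicks things off. For brevity, it ignores the return value of <code class="language-plaintext highlighter-rouge">push</code>, and with it, backpressure.</p>
<div class="language-javascript highlighter-rouge"><div class="highlight"><pre class="highlight"><code>const { Readable } = require("stream");

// Sketch: chunks arrive on their own schedule (simulated with setTimeout) and are
// pushed from the timer callback rather than from _read. Backpressure is ignored.
class TickingReadable extends Readable {
  constructor(values, delayMs) {
    super({ objectMode: true });
    this.values = values;
    this.delayMs = delayMs;
    this.started = false;
  }
  _read() {
    if (this.started) {
      return; // chunks are pushed from the timer below, not from here
    }
    this.started = true;
    this._scheduleNext(0);
  }
  _scheduleNext(index) {
    setTimeout(() => {
      if (index >= this.values.length) {
        this.push(null); // no more data: end the stream
        return;
      }
      this.push(this.values[index]); // push called outside of _read
      this._scheduleNext(index + 1);
    }, this.delayMs);
  }
}

new TickingReadable(["a", "b", "c"], 10).on("data", (chunk) => console.log(chunk)); // logs a, b and c
</code></pre></div></div>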
<p>Let’s try again (and add a few temporary logs while we’re at it):</p>
<div class="language-javascript highlighter-rouge"><div class="highlight"><pre class="highlight"><code><span class="c1">// DO NOT USE IN PRODUCTION - SEE BELOW FOR DETAILS</span>
<span class="kd">const</span> <span class="p">{</span> <span class="nx">Readable</span> <span class="p">}</span> <span class="o">=</span> <span class="nx">require</span><span class="p">(</span><span class="dl">"</span><span class="s2">stream</span><span class="dl">"</span><span class="p">);</span>
<span class="kd">class</span> <span class="nx">ZipReadable</span> <span class="kd">extends</span> <span class="nx">Readable</span> <span class="p">{</span>
<span class="kd">constructor</span><span class="p">(</span><span class="nx">stream1</span><span class="p">,</span> <span class="nx">stream2</span><span class="p">)</span> <span class="p">{</span>
<span class="k">super</span><span class="p">({</span> <span class="na">objectMode</span> <span class="p">:</span> <span class="kc">true</span> <span class="p">});</span>
<span class="k">this</span><span class="p">.</span><span class="nx">initialized</span> <span class="o">=</span> <span class="kc">false</span><span class="p">;</span>
<span class="k">this</span><span class="p">.</span><span class="nx">waitingForData</span> <span class="o">=</span> <span class="kc">false</span><span class="p">;</span>
<span class="k">this</span><span class="p">.</span><span class="nx">stream1</span> <span class="o">=</span> <span class="nx">stream1</span><span class="p">;</span>
<span class="k">this</span><span class="p">.</span><span class="nx">stream2</span> <span class="o">=</span> <span class="nx">stream2</span><span class="p">;</span>
<span class="k">this</span><span class="p">.</span><span class="nx">chunks1</span> <span class="o">=</span> <span class="p">[];</span>
<span class="k">this</span><span class="p">.</span><span class="nx">chunks2</span> <span class="o">=</span> <span class="p">[];</span>
<span class="p">}</span>
<span class="nx">_read</span><span class="p">(</span><span class="nx">size</span><span class="p">)</span> <span class="p">{</span>
<span class="k">if</span> <span class="p">(</span><span class="o">!</span><span class="k">this</span><span class="p">.</span><span class="nx">initialized</span><span class="p">)</span> <span class="p">{</span>
<span class="nx">console</span><span class="p">.</span><span class="nx">log</span><span class="p">(</span><span class="dl">'</span><span class="s1">Initializing pipeline</span><span class="dl">'</span><span class="p">);</span>
<span class="k">this</span><span class="p">.</span><span class="nx">_startReading</span><span class="p">();</span>
<span class="k">this</span><span class="p">.</span><span class="nx">initialized</span> <span class="o">=</span> <span class="kc">true</span><span class="p">;</span>
<span class="p">}</span>
<span class="kd">const</span> <span class="nx">bound</span> <span class="o">=</span> <span class="nb">Math</span><span class="p">.</span><span class="nx">min</span><span class="p">(</span><span class="nx">size</span><span class="p">,</span> <span class="k">this</span><span class="p">.</span><span class="nx">chunks1</span><span class="p">.</span><span class="nx">length</span><span class="p">,</span> <span class="k">this</span><span class="p">.</span><span class="nx">chunks2</span><span class="p">.</span><span class="nx">length</span><span class="p">);</span>
<span class="k">if</span> <span class="p">(</span><span class="nx">bound</span> <span class="o">===</span> <span class="mi">0</span><span class="p">)</span> <span class="p">{</span>
<span class="nx">console</span><span class="p">.</span><span class="nx">log</span><span class="p">(</span><span class="s2">`Waiting for data, nothing to do for now...`</span><span class="p">);</span>
<span class="k">this</span><span class="p">.</span><span class="nx">waitingForData</span> <span class="o">=</span> <span class="kc">true</span><span class="p">;</span>
<span class="k">return</span><span class="p">;</span>
<span class="p">}</span>
<span class="nx">console</span><span class="p">.</span><span class="nx">log</span><span class="p">(</span><span class="s2">`Data flowing: </span><span class="p">${</span><span class="nx">bound</span><span class="p">}</span><span class="s2"> element(s) from each source to zip!`</span><span class="p">);</span>
<span class="k">this</span><span class="p">.</span><span class="nx">waitingForData</span> <span class="o">=</span> <span class="kc">false</span><span class="p">;</span>
<span class="kd">const</span> <span class="nx">readyChunks1</span> <span class="o">=</span> <span class="k">this</span><span class="p">.</span><span class="nx">chunks1</span><span class="p">.</span><span class="nx">splice</span><span class="p">(</span><span class="mi">0</span><span class="p">,</span> <span class="nx">bound</span><span class="p">);</span>
<span class="kd">const</span> <span class="nx">readyChunks2</span> <span class="o">=</span> <span class="k">this</span><span class="p">.</span><span class="nx">chunks2</span><span class="p">.</span><span class="nx">splice</span><span class="p">(</span><span class="mi">0</span><span class="p">,</span> <span class="nx">bound</span><span class="p">);</span>
<span class="k">for</span> <span class="p">(</span><span class="kd">let</span> <span class="nx">i</span> <span class="o">=</span> <span class="mi">0</span><span class="p">;</span> <span class="nx">i</span> <span class="o"><</span> <span class="nx">bound</span><span class="p">;</span> <span class="nx">i</span><span class="o">++</span><span class="p">)</span> <span class="p">{</span>
<span class="kd">const</span> <span class="nx">pair</span> <span class="o">=</span> <span class="p">[</span><span class="nx">readyChunks1</span><span class="p">[</span><span class="nx">i</span><span class="p">],</span> <span class="nx">readyChunks2</span><span class="p">[</span><span class="nx">i</span><span class="p">]];</span>
<span class="k">this</span><span class="p">.</span><span class="nx">push</span><span class="p">(</span><span class="nx">pair</span><span class="p">);</span>
<span class="p">}</span>
<span class="p">}</span>
<span class="nx">_startReading</span><span class="p">()</span> <span class="p">{</span>
<span class="k">this</span><span class="p">.</span><span class="nx">stream1</span><span class="p">.</span><span class="nx">on</span><span class="p">(</span><span class="dl">'</span><span class="s1">data</span><span class="dl">'</span><span class="p">,</span> <span class="p">(</span><span class="nx">chunk1</span><span class="p">)</span> <span class="o">=></span> <span class="p">{</span>
<span class="nx">console</span><span class="p">.</span><span class="nx">log</span><span class="p">(</span><span class="s2">`Chunk 1 received: </span><span class="p">${</span><span class="nx">chunk1</span><span class="p">}</span><span class="s2">`</span><span class="p">);</span>
<span class="k">this</span><span class="p">.</span><span class="nx">chunks1</span><span class="p">.</span><span class="nx">push</span><span class="p">(</span><span class="nx">chunk1</span><span class="p">);</span>
<span class="k">if</span> <span class="p">(</span><span class="k">this</span><span class="p">.</span><span class="nx">waitingForData</span><span class="p">)</span> <span class="p">{</span>
<span class="nx">console</span><span class="p">.</span><span class="nx">log</span><span class="p">(</span><span class="s2">`Waiting for data, calling with </span><span class="p">${</span><span class="k">this</span><span class="p">.</span><span class="nx">chunks1</span><span class="p">.</span><span class="nx">length</span><span class="p">}</span><span class="s2"> element(s) from first upstream`</span><span class="p">);</span>
<span class="k">this</span><span class="p">.</span><span class="nx">_read</span><span class="p">(</span><span class="k">this</span><span class="p">.</span><span class="nx">chunks1</span><span class="p">.</span><span class="nx">length</span><span class="p">);</span>
<span class="p">}</span>
<span class="p">});</span>
<span class="k">this</span><span class="p">.</span><span class="nx">stream2</span><span class="p">.</span><span class="nx">on</span><span class="p">(</span><span class="dl">'</span><span class="s1">data</span><span class="dl">'</span><span class="p">,</span> <span class="p">(</span><span class="nx">chunk2</span><span class="p">)</span> <span class="o">=></span> <span class="p">{</span>
<span class="nx">console</span><span class="p">.</span><span class="nx">log</span><span class="p">(</span><span class="s2">`Chunk 2 received: </span><span class="p">${</span><span class="nx">chunk2</span><span class="p">}</span><span class="s2">`</span><span class="p">);</span>
<span class="k">this</span><span class="p">.</span><span class="nx">chunks2</span><span class="p">.</span><span class="nx">push</span><span class="p">(</span><span class="nx">chunk2</span><span class="p">);</span>
<span class="k">if</span> <span class="p">(</span><span class="k">this</span><span class="p">.</span><span class="nx">waitingForData</span><span class="p">)</span> <span class="p">{</span>
<span class="nx">console</span><span class="p">.</span><span class="nx">log</span><span class="p">(</span><span class="s2">`Waiting for data, calling with </span><span class="p">${</span><span class="k">this</span><span class="p">.</span><span class="nx">chunks2</span><span class="p">.</span><span class="nx">length</span><span class="p">}</span><span class="s2"> element(s) from second upstream`</span><span class="p">);</span>
<span class="k">this</span><span class="p">.</span><span class="nx">_read</span><span class="p">(</span><span class="k">this</span><span class="p">.</span><span class="nx">chunks2</span><span class="p">.</span><span class="nx">length</span><span class="p">);</span>
<span class="p">}</span>
<span class="p">});</span>
<span class="p">}</span>
<span class="p">}</span>
</code></pre></div></div>
<p>Let’s re-run the test:</p>
<div class="language-shell highlighter-rouge"><div class="highlight"><pre class="highlight"><code> <span class="nv">$ </span>npm <span class="nb">test
</span>Initializing pipeline
Waiting <span class="k">for </span>data, nothing to <span class="k">do for </span>now...
Chunk 1 received: 1
Waiting <span class="k">for </span>data, calling with 1 element<span class="o">(</span>s<span class="o">)</span> from first upstream
Waiting <span class="k">for </span>data, nothing to <span class="k">do for </span>now...
Chunk 2 received: Un
Waiting <span class="k">for </span>data, calling with 1 element<span class="o">(</span>s<span class="o">)</span> from second upstream
Data flowing: 1 element<span class="o">(</span>s<span class="o">)</span> from each <span class="nb">source </span>to zip!
Chunk 1 received: 2
Chunk 2 received: Deux
Chunk 1 received: 3
Chunk 2 received: Trois
Data flowing: 2 element<span class="o">(</span>s<span class="o">)</span> from each <span class="nb">source </span>to zip!
Waiting <span class="k">for </span>data, nothing to <span class="k">do for </span>now...
Failures:
1<span class="o">)</span> zip operator <span class="o">=></span> pairs chunks from upstream streams
Message:
Error: Timeout - Async <span class="k">function </span>did not <span class="nb">complete </span>within 5000ms <span class="o">(</span><span class="nb">set </span>by jasmine.DEFAULT_TIMEOUT_INTERVAL<span class="o">)</span>
</code></pre></div></div>
<p>Hmm, the test still fails, but the implementation seems to behave correctly: pairs are emitted as soon as chunks are available on both sides.
What actually happens is that our <code class="language-plaintext highlighter-rouge">ZipReadable</code> implementation never completes.
Looking again at the <a href="https://nodejs.org/api/stream.html#stream_readable_push_chunk_encoding"><code class="language-plaintext highlighter-rouge">Readable#push</code></a> documentation,
we can see that pushing <code class="language-plaintext highlighter-rouge">null</code> notifies downstream consumers that the stream is done emitting data.</p>
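<p>To see this end-of-stream contract in isolation, here is a minimal sketch built around a hypothetical <code class="language-plaintext highlighter-rouge">ArrayReadable</code> class (it is not part of the library built in this post). It emits the items of an array one by one, then pushes <code class="language-plaintext highlighter-rouge">null</code>; only then do consumers receive the <code class="language-plaintext highlighter-rouge">end</code> event. Removing the <code class="language-plaintext highlighter-rouge">push(null)</code> line should reproduce the “never completes” symptom above.</p>

```javascript
const { Readable } = require("stream");

// Hypothetical example stream: emits the items of an array one by one,
// then pushes `null` to signal that no more data will come.
class ArrayReadable extends Readable {
  constructor(items) {
    super({ objectMode: true });
    this.items = [...items];
  }

  _read() {
    if (this.items.length === 0) {
      this.push(null); // end of stream: consumers will get the 'end' event
      return;
    }
    this.push(this.items.shift()); // one chunk per _read() call
  }
}

const received = [];
let ended = false;
new ArrayReadable([1, 2, 3])
  .on("data", (chunk) => received.push(chunk))
  .on("end", () => {
    ended = true;
    console.log(`received ${JSON.stringify(received)}, then 'end'`);
  });
```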
<p>Now, when should we do that?
If we look at the Reactor diagram of <code class="language-plaintext highlighter-rouge">zip</code> again:</p>
<p><img src="/assets/img/zip.svg" alt="`zip` diagram" title="`zip` diagram" /></p>
<p>… we can see that the completion should be sent when the last stream completes.
<code class="language-plaintext highlighter-rouge">Readable</code> streams notify consumers with the <a href="https://nodejs.org/api/stream.html#stream_event_end"><code class="language-plaintext highlighter-rouge">end</code> event</a> when they are done.
Now that we have got everything figured out, let’s get rid of the logs and fix our implementation:</p>
<div class="language-javascript highlighter-rouge"><div class="highlight"><pre class="highlight"><code><span class="c1">// DO NOT USE IN PRODUCTION - SEE BELOW FOR DETAILS</span>
<span class="kd">const</span> <span class="p">{</span> <span class="nx">Readable</span> <span class="p">}</span> <span class="o">=</span> <span class="nx">require</span><span class="p">(</span><span class="dl">"</span><span class="s2">stream</span><span class="dl">"</span><span class="p">);</span>
<span class="kd">class</span> <span class="nx">ZipReadable</span> <span class="kd">extends</span> <span class="nx">Readable</span> <span class="p">{</span>
<span class="kd">constructor</span><span class="p">(</span><span class="nx">stream1</span><span class="p">,</span> <span class="nx">stream2</span><span class="p">)</span> <span class="p">{</span>
<span class="k">super</span><span class="p">({</span> <span class="na">objectMode</span> <span class="p">:</span> <span class="kc">true</span> <span class="p">});</span>
<span class="k">this</span><span class="p">.</span><span class="nx">initialized</span> <span class="o">=</span> <span class="kc">false</span><span class="p">;</span>
<span class="k">this</span><span class="p">.</span><span class="nx">waitingForData</span> <span class="o">=</span> <span class="kc">false</span><span class="p">;</span>
<span class="k">this</span><span class="p">.</span><span class="nx">endedUpstreamCount</span> <span class="o">=</span> <span class="mi">0</span><span class="p">;</span> <span class="c1">// (1)</span>
<span class="k">this</span><span class="p">.</span><span class="nx">stream1</span> <span class="o">=</span> <span class="nx">stream1</span><span class="p">;</span>
<span class="k">this</span><span class="p">.</span><span class="nx">stream2</span> <span class="o">=</span> <span class="nx">stream2</span><span class="p">;</span>
<span class="k">this</span><span class="p">.</span><span class="nx">chunks1</span> <span class="o">=</span> <span class="p">[];</span>
<span class="k">this</span><span class="p">.</span><span class="nx">chunks2</span> <span class="o">=</span> <span class="p">[];</span>
<span class="p">}</span>
<span class="nx">_read</span><span class="p">(</span><span class="nx">size</span><span class="p">)</span> <span class="p">{</span>
<span class="k">if</span> <span class="p">(</span><span class="o">!</span><span class="k">this</span><span class="p">.</span><span class="nx">initialized</span><span class="p">)</span> <span class="p">{</span>
<span class="k">this</span><span class="p">.</span><span class="nx">_startReading</span><span class="p">();</span>
<span class="k">this</span><span class="p">.</span><span class="nx">initialized</span> <span class="o">=</span> <span class="kc">true</span><span class="p">;</span>
<span class="p">}</span>
<span class="kd">const</span> <span class="nx">bound</span> <span class="o">=</span> <span class="nb">Math</span><span class="p">.</span><span class="nx">min</span><span class="p">(</span><span class="nx">size</span><span class="p">,</span> <span class="k">this</span><span class="p">.</span><span class="nx">chunks1</span><span class="p">.</span><span class="nx">length</span><span class="p">,</span> <span class="k">this</span><span class="p">.</span><span class="nx">chunks2</span><span class="p">.</span><span class="nx">length</span><span class="p">);</span>
<span class="k">if</span> <span class="p">(</span><span class="nx">bound</span> <span class="o">===</span> <span class="mi">0</span><span class="p">)</span> <span class="p">{</span>
<span class="k">this</span><span class="p">.</span><span class="nx">waitingForData</span> <span class="o">=</span> <span class="kc">true</span><span class="p">;</span>
<span class="k">return</span><span class="p">;</span>
<span class="p">}</span>
<span class="k">this</span><span class="p">.</span><span class="nx">waitingForData</span> <span class="o">=</span> <span class="kc">false</span><span class="p">;</span>
<span class="kd">const</span> <span class="nx">readyChunks1</span> <span class="o">=</span> <span class="k">this</span><span class="p">.</span><span class="nx">chunks1</span><span class="p">.</span><span class="nx">splice</span><span class="p">(</span><span class="mi">0</span><span class="p">,</span> <span class="nx">bound</span><span class="p">);</span>
<span class="kd">const</span> <span class="nx">readyChunks2</span> <span class="o">=</span> <span class="k">this</span><span class="p">.</span><span class="nx">chunks2</span><span class="p">.</span><span class="nx">splice</span><span class="p">(</span><span class="mi">0</span><span class="p">,</span> <span class="nx">bound</span><span class="p">);</span>
<span class="k">for</span> <span class="p">(</span><span class="kd">let</span> <span class="nx">i</span> <span class="o">=</span> <span class="mi">0</span><span class="p">;</span> <span class="nx">i</span> <span class="o"><</span> <span class="nx">bound</span><span class="p">;</span> <span class="nx">i</span><span class="o">++</span><span class="p">)</span> <span class="p">{</span>
<span class="kd">const</span> <span class="nx">pair</span> <span class="o">=</span> <span class="p">[</span><span class="nx">readyChunks1</span><span class="p">[</span><span class="nx">i</span><span class="p">],</span> <span class="nx">readyChunks2</span><span class="p">[</span><span class="nx">i</span><span class="p">]];</span>
<span class="k">this</span><span class="p">.</span><span class="nx">push</span><span class="p">(</span><span class="nx">pair</span><span class="p">);</span>
<span class="p">}</span>
<span class="p">}</span>
<span class="nx">_startReading</span><span class="p">()</span> <span class="p">{</span>
<span class="k">this</span><span class="p">.</span><span class="nx">stream1</span><span class="p">.</span><span class="nx">on</span><span class="p">(</span><span class="dl">'</span><span class="s1">end</span><span class="dl">'</span><span class="p">,</span> <span class="p">()</span> <span class="o">=></span> <span class="p">{</span> <span class="c1">// (2)</span>
<span class="k">this</span><span class="p">.</span><span class="nx">endedUpstreamCount</span><span class="o">++</span><span class="p">;</span>
<span class="k">if</span> <span class="p">(</span><span class="k">this</span><span class="p">.</span><span class="nx">endedUpstreamCount</span> <span class="o">===</span> <span class="mi">2</span><span class="p">)</span> <span class="p">{</span> <span class="c1">// (3)</span>
<span class="k">this</span><span class="p">.</span><span class="nx">push</span><span class="p">(</span><span class="kc">null</span><span class="p">);</span>
<span class="p">}</span>
<span class="p">});</span>
<span class="k">this</span><span class="p">.</span><span class="nx">stream2</span><span class="p">.</span><span class="nx">on</span><span class="p">(</span><span class="dl">'</span><span class="s1">end</span><span class="dl">'</span><span class="p">,</span> <span class="p">()</span> <span class="o">=></span> <span class="p">{</span> <span class="c1">// (2)</span>
<span class="k">this</span><span class="p">.</span><span class="nx">endedUpstreamCount</span><span class="o">++</span><span class="p">;</span>
<span class="k">if</span> <span class="p">(</span><span class="k">this</span><span class="p">.</span><span class="nx">endedUpstreamCount</span> <span class="o">===</span> <span class="mi">2</span><span class="p">)</span> <span class="p">{</span> <span class="c1">// (3)</span>
<span class="k">this</span><span class="p">.</span><span class="nx">push</span><span class="p">(</span><span class="kc">null</span><span class="p">);</span>
<span class="p">}</span>
<span class="p">});</span>
<span class="k">this</span><span class="p">.</span><span class="nx">stream1</span><span class="p">.</span><span class="nx">on</span><span class="p">(</span><span class="dl">'</span><span class="s1">data</span><span class="dl">'</span><span class="p">,</span> <span class="p">(</span><span class="nx">chunk1</span><span class="p">)</span> <span class="o">=></span> <span class="p">{</span>
<span class="k">this</span><span class="p">.</span><span class="nx">chunks1</span><span class="p">.</span><span class="nx">push</span><span class="p">(</span><span class="nx">chunk1</span><span class="p">);</span>
<span class="k">if</span> <span class="p">(</span><span class="k">this</span><span class="p">.</span><span class="nx">waitingForData</span><span class="p">)</span> <span class="p">{</span>
<span class="k">this</span><span class="p">.</span><span class="nx">_read</span><span class="p">(</span><span class="k">this</span><span class="p">.</span><span class="nx">chunks1</span><span class="p">.</span><span class="nx">length</span><span class="p">);</span>
<span class="p">}</span>
<span class="p">});</span>
<span class="k">this</span><span class="p">.</span><span class="nx">stream2</span><span class="p">.</span><span class="nx">on</span><span class="p">(</span><span class="dl">'</span><span class="s1">data</span><span class="dl">'</span><span class="p">,</span> <span class="p">(</span><span class="nx">chunk2</span><span class="p">)</span> <span class="o">=></span> <span class="p">{</span>
<span class="k">this</span><span class="p">.</span><span class="nx">chunks2</span><span class="p">.</span><span class="nx">push</span><span class="p">(</span><span class="nx">chunk2</span><span class="p">);</span>
<span class="k">if</span> <span class="p">(</span><span class="k">this</span><span class="p">.</span><span class="nx">waitingForData</span><span class="p">)</span> <span class="p">{</span>
<span class="k">this</span><span class="p">.</span><span class="nx">_read</span><span class="p">(</span><span class="k">this</span><span class="p">.</span><span class="nx">chunks2</span><span class="p">.</span><span class="nx">length</span><span class="p">);</span>
<span class="p">}</span>
<span class="p">});</span>
<span class="p">}</span>
<span class="p">}</span>
</code></pre></div></div>
<ol>
<li>we introduce a counter to keep track of upstream stream completion.</li>
<li>we observe each upstream stream completion and increment the counter when that occurs.</li>
<li>we notify the <code class="language-plaintext highlighter-rouge">zip</code> stream completion when all upstream streams are done.</li>
</ol>
<p>Let’s run the tests:</p>
<div class="language-shell highlighter-rouge"><div class="highlight"><pre class="highlight"><code> <span class="nv">$ </span>npm <span class="nb">test
</span>2 specs, 0 failures
</code></pre></div></div>
<p>Yay, it passes 🥳</p>
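<p>The completion logic boils down to counting <code class="language-plaintext highlighter-rouge">end</code> events. As a minimal sketch, that counting can be pulled out of the class into a hypothetical <code class="language-plaintext highlighter-rouge">whenAllEnded</code> helper (not part of the post’s code). One caveat worth keeping in mind: <code class="language-plaintext highlighter-rouge">end</code> is only emitted once a stream has been fully consumed, which is why the usage below resumes both sources. The example assumes Node.js 12 or later for <code class="language-plaintext highlighter-rouge">Readable.from</code>.</p>

```javascript
const { Readable } = require("stream");

// Hypothetical helper: calls `onAllEnded` once every stream in `streams`
// has emitted 'end' (same counter idea as in ZipReadable above).
function whenAllEnded(streams, onAllEnded) {
  let endedCount = 0;
  streams.forEach((stream) => {
    stream.once("end", () => {
      endedCount++;
      if (endedCount === streams.length) {
        onAllEnded();
      }
    });
  });
}

let completions = 0;
const sources = [Readable.from([1, 2]), Readable.from(["Un", "Deux", "Trois"])];
whenAllEnded(sources, () => {
  completions++;
  console.log("all upstream streams have ended");
});
// 'end' is only emitted once a stream is fully consumed, so drain both sources.
sources.forEach((source) => source.resume());
```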
<p>However, the implementation could definitely be refactored, as there is a lot of duplicated behavior.
It could even be generalized to <em>n</em> upstream sources (the corresponding test is very similar to the one with 2 sources)!</p>
<p>And here we go:</p>
<div class="language-javascript highlighter-rouge"><div class="highlight"><pre class="highlight"><code><span class="c1">// DO NOT USE IN PRODUCTION - SEE BELOW FOR DETAILS</span>
<span class="kd">const</span> <span class="p">{</span> <span class="nx">Readable</span> <span class="p">}</span> <span class="o">=</span> <span class="nx">require</span><span class="p">(</span><span class="dl">"</span><span class="s2">stream</span><span class="dl">"</span><span class="p">);</span>
<span class="kd">class</span> <span class="nx">ZipReadable</span> <span class="kd">extends</span> <span class="nx">Readable</span> <span class="p">{</span>
<span class="kd">constructor</span><span class="p">(...</span><span class="nx">upstreams</span><span class="p">)</span> <span class="p">{</span> <span class="c1">// (1)</span>
<span class="k">super</span><span class="p">({</span> <span class="na">objectMode</span> <span class="p">:</span> <span class="kc">true</span> <span class="p">});</span>
<span class="k">this</span><span class="p">.</span><span class="nx">initialized</span> <span class="o">=</span> <span class="kc">false</span><span class="p">;</span>
<span class="k">this</span><span class="p">.</span><span class="nx">waitingForData</span> <span class="o">=</span> <span class="kc">false</span><span class="p">;</span>
<span class="k">this</span><span class="p">.</span><span class="nx">endedUpstreamCount</span> <span class="o">=</span> <span class="mi">0</span><span class="p">;</span>
<span class="k">this</span><span class="p">.</span><span class="nx">streams</span> <span class="o">=</span> <span class="nx">upstreams</span><span class="p">;</span>
<span class="k">this</span><span class="p">.</span><span class="nx">chunks</span> <span class="o">=</span> <span class="nx">upstreams</span><span class="p">.</span><span class="nx">map</span><span class="p">(()</span> <span class="o">=></span> <span class="p">[]);</span> <span class="c1">// (2)</span>
<span class="p">}</span>
<span class="nx">_read</span><span class="p">(</span><span class="nx">size</span><span class="p">)</span> <span class="p">{</span>
<span class="k">if</span> <span class="p">(</span><span class="o">!</span><span class="k">this</span><span class="p">.</span><span class="nx">initialized</span><span class="p">)</span> <span class="p">{</span>
<span class="k">this</span><span class="p">.</span><span class="nx">_startReading</span><span class="p">();</span>
<span class="k">this</span><span class="p">.</span><span class="nx">initialized</span> <span class="o">=</span> <span class="kc">true</span><span class="p">;</span>
<span class="p">}</span>
<span class="kd">const</span> <span class="nx">bound</span> <span class="o">=</span> <span class="nb">Math</span><span class="p">.</span><span class="nx">min</span><span class="p">(</span><span class="nx">size</span><span class="p">,</span> <span class="p">...</span><span class="k">this</span><span class="p">.</span><span class="nx">chunks</span><span class="p">.</span><span class="nx">map</span><span class="p">(</span><span class="nx">array</span> <span class="o">=></span> <span class="nx">array</span><span class="p">.</span><span class="nx">length</span><span class="p">));</span> <span class="c1">// (3)</span>
<span class="k">if</span> <span class="p">(</span><span class="nx">bound</span> <span class="o">===</span> <span class="mi">0</span><span class="p">)</span> <span class="p">{</span>
<span class="k">this</span><span class="p">.</span><span class="nx">waitingForData</span> <span class="o">=</span> <span class="kc">true</span><span class="p">;</span>
<span class="k">return</span><span class="p">;</span>
<span class="p">}</span>
<span class="k">this</span><span class="p">.</span><span class="nx">waitingForData</span> <span class="o">=</span> <span class="kc">false</span><span class="p">;</span>
<span class="k">this</span><span class="p">.</span><span class="nx">chunks</span>
<span class="p">.</span><span class="nx">map</span><span class="p">(</span><span class="nx">a</span> <span class="o">=></span> <span class="nx">a</span><span class="p">.</span><span class="nx">splice</span><span class="p">(</span><span class="mi">0</span><span class="p">,</span> <span class="nx">bound</span><span class="p">))</span>
<span class="p">.</span><span class="nx">reduce</span><span class="p">((</span><span class="nx">prev</span><span class="p">,</span> <span class="nx">curr</span><span class="p">)</span> <span class="o">=></span> <span class="p">{</span> <span class="c1">// (4)</span>
<span class="kd">const</span> <span class="nx">result</span> <span class="o">=</span> <span class="p">[];</span>
<span class="k">for</span> <span class="p">(</span><span class="kd">let</span> <span class="nx">i</span> <span class="o">=</span> <span class="mi">0</span><span class="p">;</span> <span class="nx">i</span> <span class="o"><</span> <span class="nx">bound</span><span class="p">;</span> <span class="nx">i</span><span class="o">++</span><span class="p">)</span> <span class="p">{</span>
<span class="kd">const</span> <span class="nx">previous</span> <span class="o">=</span> <span class="p">(</span><span class="nb">Array</span><span class="p">.</span><span class="nx">isArray</span><span class="p">(</span><span class="nx">prev</span><span class="p">[</span><span class="nx">i</span><span class="p">]))</span> <span class="p">?</span> <span class="nx">prev</span><span class="p">[</span><span class="nx">i</span><span class="p">]</span> <span class="p">:</span> <span class="p">[</span><span class="nx">prev</span><span class="p">[</span><span class="nx">i</span><span class="p">]];</span>
<span class="nx">result</span><span class="p">.</span><span class="nx">push</span><span class="p">([...</span><span class="nx">previous</span><span class="p">,</span> <span class="nx">curr</span><span class="p">[</span><span class="nx">i</span><span class="p">]]);</span>
<span class="p">}</span>
<span class="k">return</span> <span class="nx">result</span>
<span class="p">})</span>
<span class="p">.</span><span class="nx">forEach</span><span class="p">((</span><span class="nx">pair</span><span class="p">)</span> <span class="o">=></span> <span class="p">{</span>
<span class="k">this</span><span class="p">.</span><span class="nx">push</span><span class="p">(</span><span class="nx">pair</span><span class="p">);</span>
<span class="p">})</span>
<span class="p">}</span>
<span class="nx">_startReading</span><span class="p">()</span> <span class="p">{</span>
<span class="k">this</span><span class="p">.</span><span class="nx">streams</span><span class="p">.</span><span class="nx">forEach</span><span class="p">((</span><span class="nx">stream</span><span class="p">,</span> <span class="nx">index</span><span class="p">)</span> <span class="o">=></span> <span class="p">{</span>
<span class="nx">stream</span><span class="p">.</span><span class="nx">on</span><span class="p">(</span><span class="dl">'</span><span class="s1">end</span><span class="dl">'</span><span class="p">,</span> <span class="p">()</span> <span class="o">=></span> <span class="p">{</span>
<span class="k">this</span><span class="p">.</span><span class="nx">endedUpstreamCount</span><span class="o">++</span><span class="p">;</span>
<span class="k">if</span> <span class="p">(</span><span class="k">this</span><span class="p">.</span><span class="nx">endedUpstreamCount</span> <span class="o">===</span> <span class="k">this</span><span class="p">.</span><span class="nx">streams</span><span class="p">.</span><span class="nx">length</span><span class="p">)</span> <span class="p">{</span> <span class="c1">// (5)</span>
<span class="k">this</span><span class="p">.</span><span class="nx">push</span><span class="p">(</span><span class="kc">null</span><span class="p">);</span>
<span class="p">}</span>
<span class="p">});</span>
<span class="nx">stream</span><span class="p">.</span><span class="nx">on</span><span class="p">(</span><span class="dl">'</span><span class="s1">data</span><span class="dl">'</span><span class="p">,</span> <span class="p">(</span><span class="nx">chunk</span><span class="p">)</span> <span class="o">=></span> <span class="p">{</span>
<span class="kd">const</span> <span class="nx">streamChunks</span> <span class="o">=</span> <span class="k">this</span><span class="p">.</span><span class="nx">chunks</span><span class="p">[</span><span class="nx">index</span><span class="p">];</span>
<span class="nx">streamChunks</span><span class="p">.</span><span class="nx">push</span><span class="p">(</span><span class="nx">chunk</span><span class="p">);</span>
<span class="k">if</span> <span class="p">(</span><span class="k">this</span><span class="p">.</span><span class="nx">waitingForData</span><span class="p">)</span> <span class="p">{</span>
<span class="k">this</span><span class="p">.</span><span class="nx">_read</span><span class="p">(</span><span class="nx">streamChunks</span><span class="p">.</span><span class="nx">length</span><span class="p">);</span>
<span class="p">}</span>
<span class="p">});</span>
<span class="p">});</span>
<span class="p">}</span>
<span class="p">}</span>
</code></pre></div></div>
<ol>
<li>we now use the <a href="https://developer.mozilla.org/en-US/docs/Web/JavaScript/Reference/Functions/rest_parameters">“rest parameter” syntax</a> to accept any number of streams.
We could arguably improve the signature further by requiring two streams and accepting any extra ones through an optional rest parameter.</li>
<li>we just have to create an initial empty array of chunks for every stream.</li>
<li>we compute the current length of each chunk array and use the <a href="https://developer.mozilla.org/en-US/docs/Web/JavaScript/Reference/Operators/Spread_syntax">“spread syntax”</a> to pass these lengths as separate arguments to <code class="language-plaintext highlighter-rouge">Math.min</code>.</li>
<li>finally, after <code class="language-plaintext highlighter-rouge">Array#splice</code> extracts the first <code class="language-plaintext highlighter-rouge">bound</code> elements of each chunk array, these arrays are reduced into tuples (one chunk per upstream stream), which are then published via <code class="language-plaintext highlighter-rouge">Readable#push</code>.</li>
<li>the end-of-stream check now compares the counter against the dynamic number of upstream sources instead of the hardcoded 2 of the previous version.</li>
</ol>
<p>Does the existing test still pass?</p>
<div class="language-shell highlighter-rouge"><div class="highlight"><pre class="highlight"><code> <span class="nv">$ </span>npm <span class="nb">test
</span>2 specs, 0 failures
</code></pre></div></div>
<p>Yes!</p>
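<p>One caveat about step (4): the <code class="language-plaintext highlighter-rouge">reduce</code> call relies on <code class="language-plaintext highlighter-rouge">Array.isArray</code> to tell a partially built tuple apart from a chunk. With object-mode upstreams that emit arrays, a chunk such as <code class="language-plaintext highlighter-rouge">[10, 11]</code> would be spread into the tuple, and with a single upstream the raw chunks would be pushed instead of 1-element tuples. A minimal alternative sketch, using a hypothetical <code class="language-plaintext highlighter-rouge">takeTuples</code> helper (not the post’s code), transposes the spliced arrays by index instead:</p>

```javascript
// Hypothetical helper, not part of the post's ZipReadable: removes the first
// `bound` chunks from every per-stream buffer and groups them by position.
function takeTuples(chunkBuffers, bound) {
  // Array#splice mutates each buffer, as in the implementation above
  const ready = chunkBuffers.map((buffer) => buffer.splice(0, bound));
  const tuples = [];
  for (let i = 0; i < bound; i++) {
    // the i-th tuple holds the i-th ready chunk of every upstream stream
    tuples.push(ready.map((chunks) => chunks[i]));
  }
  return tuples;
}

// Three buffered upstreams; the third one emits arrays as chunks.
const buffers = [[1, 2, 3], ["Un", "Deux"], [[10, 11], [20, 21]]];
const bound = Math.min(...buffers.map((buffer) => buffer.length)); // 2
const tuples = takeTuples(buffers, bound);
console.log(JSON.stringify(tuples)); // [[1,"Un",[10,11]],[2,"Deux",[20,21]]]
console.log(JSON.stringify(buffers)); // [[3],[],[]]
```

<p>Inside <code class="language-plaintext highlighter-rouge">_read</code>, each returned tuple would then be passed to <code class="language-plaintext highlighter-rouge">this.push</code>, exactly like the pairs above.</p>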
<h2 id="one-more-thing">One More Thing</h2>
<p>There is one (albeit very important) aspect of streams I deliberately did not mention here: <a href="https://nodejs.org/es/docs/guides/backpressuring-in-streams/">backpressure</a>.
Backpressure happens when downstream streams cannot keep up with upstream streams: the upstream side produces data faster than the downstream side can consume it.</p>
<p>The good news is that <code class="language-plaintext highlighter-rouge">Readable#pipe</code> handles backpressure “for free” (and so does <code class="language-plaintext highlighter-rouge">pipeline</code>).</p>
<p>That being said, do our custom implementations of <code class="language-plaintext highlighter-rouge">zip</code> and <code class="language-plaintext highlighter-rouge">map</code> <a href="https://nodejs.org/en/docs/guides/backpressuring-in-streams/#rules-to-abide-by-when-implementing-custom-streams">handle backpressure correctly</a>?</p>
<p>Spoiler alert: I’m afraid not.</p>
<p>However, there will be a dedicated blog post about this, with updates to the initial implementations.</p>
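<p>Until then, here is a minimal sketch of the main rule for custom <code class="language-plaintext highlighter-rouge">Readable</code> implementations: <code class="language-plaintext highlighter-rouge">Readable#push</code> returns <code class="language-plaintext highlighter-rouge">false</code> once the internal buffer reaches <code class="language-plaintext highlighter-rouge">highWaterMark</code>, and a well-behaved source should then stop pushing until <code class="language-plaintext highlighter-rouge">_read</code> is called again. The <code class="language-plaintext highlighter-rouge">CounterReadable</code> class is hypothetical and is not the upcoming fix for <code class="language-plaintext highlighter-rouge">zip</code> and <code class="language-plaintext highlighter-rouge">map</code>; it only shows the signal that <code class="language-plaintext highlighter-rouge">ZipReadable</code> currently ignores.</p>

```javascript
const { Readable } = require("stream");

// 1) push() reports backpressure through its return value.
// With an object-mode highWaterMark of 2 and nobody reading, the buffer fills up.
const probe = new Readable({ objectMode: true, highWaterMark: 2, read() {} });
const pushResults = [probe.push("a"), probe.push("b"), probe.push("c")];
console.log(pushResults); // [ true, false, false ]

// 2) Hypothetical counter source that respects that signal: it keeps pushing
// only while push() returns true, then waits for Node.js to call _read() again.
class CounterReadable extends Readable {
  constructor(max) {
    super({ objectMode: true, highWaterMark: 2 });
    this.current = 0;
    this.max = max;
  }

  _read() {
    while (this.current < this.max) {
      this.current++;
      if (!this.push(this.current)) {
        return; // buffer full: stop until the consumer asks for more
      }
    }
    this.push(null); // all numbers have been emitted
  }
}

const counted = [];
new CounterReadable(5)
  .on("data", (n) => counted.push(n))
  .on("end", () => console.log(counted)); // [ 1, 2, 3, 4, 5 ]
```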
<h2 id="going-further">Going Further</h2>
<p>If you notice improvements (other than backpressure-related ones), please send a <a href="https://github.com/fbiville/fbiville.github.io">Pull Request</a> and/or reach out to me on <a href="https://twitter.com/fbiville">Twitter</a>.
Here are a few references worth sharing that helped me on my stream learning journey:</p>
<ul>
<li><a href="https://nodejs.org/api/stream.html">https://nodejs.org/api/stream.html</a>: the official documentation of Node.js streams, including implementation guides</li>
<li><a href="https://github.com/nodejs/help/">https://github.com/nodejs/help/</a>: stuck with something? Open an issue in this repository and Node.js maintainers will help you!</li>
<li><a href="https://www.w3.org/TR/streams-api/">https://www.w3.org/TR/streams-api/</a>: the W3C/WHATWG stream spec (it differs slightly from the Node.js stream API, but many concepts overlap)</li>
<li><a href="https://v8.dev/blog">https://v8.dev/blog</a>: not directly related to streams, but this blog authored by V8 maintainers is a goldmine of information about how V8 works and about new JavaScript features</li>
</ul>Florent BivilleI joined the riff team at Pivotal a year and a half ago. I have been working for more than a year on riff invokers. This probably deserves a blog post on its own, but invokers, in short, have the responsibility of invoking user-defined functions and exposing a way to send inputs and receive outputs. The riff invocation protocol formally defines the scope of such invokers.Hello Jekyll!2019-05-19T00:00:00+00:002019-05-19T00:00:00+00:00https://fbiville.github.io/2019/05/19/Hello_Jekyll_<p>After a few issues with <a href="https://github.com/HubPress/hubpress.io">Hubpress.io</a> (is it even maintained now?), I decided to migrate my blog again and move to Jekyll.</p>
<p>The process was a mix of automatic (<a href="https://pandoc.org/">Pandoc</a>), semi-manual (aided by some good old Bash commands) and purely manual transformations. I even fixed old quirks from the previous Dotclear->Hubpress migration in the process.</p>
<p>The theme I use is, well… minimal, but I do not really need a fancy blog. I got rid of the analytics. I also added a mystery page.</p>
<p>Anyway, my blog is now live and usable again.</p>
<p>Stay tuned for an announcement I have been wanting to make for a while!</p>
<p>In the meantime, long live Jekyll!</p>
<p><img src="/assets/img/jekyll.png" alt="Jekyll" /></p>Florent BivilleAfter a few issues with Hubpress.io (is it even maintained now?), I decided to migrate my blog again and move to Jekyll.hack.commit.push2019-05-19T00:00:00+00:002019-05-19T00:00:00+00:00https://fbiville.github.io/2019/05/19/hack.commit.push<p><a href="https://hack-commit-pu.sh">hack.commit.push</a> est un nouvel événement gratuit autour des projets libres / open-source qui débarque bientÎt à Paris !</p>
<p>Before getting into the details, I wanted to come back to the motivations that led me to co-create it.</p>
<h1 id="tldr"><abbr title="Too Long; Didn't Read">TL;DR</abbr></h1>
<p>Don’t feel like reading everything?
You can go <a href="#save-the-date">straight to the point</a> with the key information.</p>
<h1 id="la-source--hackergarten-paris">The origin: Hackergarten Paris</h1>
<p>The <a href="https://www.meetup.com/Paris-Hackergarten/">Hackergarten Paris</a> meetup brings together contributors to free/open-source projects and people eager to get involved without necessarily knowing where to start.</p>
<p>As explained in <a href="https://fbiville.github.io/2016/09/20/Pourquoi-venir-au-Hackergarten.html">a previous post</a>, the benefits are manifold.</p>
<p>Newcomers are guided in person by someone familiar with the code to change. They can therefore contribute efficiently, gain confidence and also demystify the work involved: <strong>you too</strong> can contribute!</p>
<p>On the project leads’ side, a recent survey by the excellent <a href="https://opencollective.com/">Open Collective</a> initiative sums up, much better than I could, one of the needs that Hackergarten meetups aim to meet.</p>
<blockquote class="twitter-tweet" data-partner="tweetdeck"><p lang="en" dir="ltr">One of the core reasons why the <a href="https://twitter.com/hackcommitpush?ref_src=twsrc%5Etfw">@hackcommitpush</a> conference and the <a href="https://twitter.com/Hackergarten?ref_src=twsrc%5Etfw">@hackergarten</a> meetups exist is perfectly summed up in this <a href="https://twitter.com/opencollect?ref_src=twsrc%5Etfw">@opencollect</a> survey: <a href="https://t.co/qmOEIkKAdr">https://t.co/qmOEIkKAdr</a>. Worth reading and sharing!<br />Looking forward to welcoming contributors on June 15: <a href="https://t.co/skQvuterrd">https://t.co/skQvuterrd</a>! <a href="https://t.co/yYDiHtRHBO">pic.twitter.com/yYDiHtRHBO</a></p>— hack.commit.push (@hackcommitpush) <a href="https://twitter.com/hackcommitpush/status/1129324028735438848?ref_src=twsrc%5Etfw">May 17, 2019</a></blockquote>
<script async="" src="https://platform.twitter.com/widgets.js" charset="utf-8"></script>
<p>Indeed, most free/open-source projects are maintained by people spread all over the planet, and communication usually happens through screens.</p>
<p>The Hackergarten Paris meetup (<a href="http://hackergarten.net/">like those in other cities</a>) therefore brings together, in the same place, people driven by a common topic and helps them move forward in a relaxed and welcoming setting. In short, it reweaves a social bond that tends to get lost between contributors.</p>
<h1 id="hackcommitpush-dans-tout-ça-">Where does hack.commit.push fit in?</h1>
<p>I have excellent memories of my first Hackergarten sessions around 2011-2012. It took place regularly at <a href="https://xebia.com/">Xebia</a> and was organized by <a href="https://twitter.com/mathildelemee">Mathilde</a>, <a href="https://twitter.com/BriceDutheil">Brice</a> and <a href="https://twitter.com/elefevre">Éric</a>.</p>
<p>However, for lack of time, the meetup ended up being organized only during major conferences (Devoxx, etc.).
With the permission of the three organizers mentioned above, I then took over the meetup (late 2015, if memory serves) and relaunched its monthly edition (which continues today: every last Tuesday of the month in <a href="https://www.meetup.com/Paris-Hackergarten/">Paris</a>).</p>
<p>I even tried two or three times to run the Hackergarten during <a href="https://devoxx.fr">Devoxx France</a>, after the conference moved to the Palais des Congrès. For various reasons, it simply did not work: almost nobody joined the session.</p>
<p>Beyond potential improvements to how Devoxx hosts the Hackergarten (the organizers already do a tremendous amount of work), I ended up wondering whether it really made sense to offer a Hackergarten to people who came primarily to attend talks and to network.</p>
<p>This observation gave birth to the idea of <a href="https://hack-commit-pu.sh">hack.commit.push</a>: an event 100% dedicated to contributing to free/open-source projects, in the spirit of the existing Hackergartens!</p>
<h1 id="save-the-date">Save the date</h1>
<p>Organized by <a href="https://twitter.com/aalmiray">Andres</a>, <a href="https://twitter.com/hboutemy">Hervé</a>, <a href="https://twitter.com/mesirii">Michael</a> and yours truly, and supported by contributors such as <a href="https://twitter.com/JessicaGantier">Jessica</a>, <a href="https://twitter.com/dyild">Dilek</a> and <a href="https://twitter.com/kehrlann">Daniel</a>, the first edition is <strong>FREE</strong>, will take place on <strong>June 15</strong> in <strong>Paris</strong> in the very nice premises of <a href="http://www.techandcodefactory.fr/">Tech & Code Factory</a>, and follows directly in the footsteps of the Hackergartens:</p>
<ul>
<li>all participants are welcome, whatever their level in software development and their experience with free/open-source projects</li>
<li>whether it is improving documentation or design, fixing bugs or adding features, every contribution counts!</li>
</ul>
<p>For beginners, we plan to run introductory workshops in the morning (for example: an introduction to Git/GitHub) to help them contribute in the afternoon.</p>
<p>We already have great projects to offer you:</p>
<ul>
<li><a href="https://maven.apache.org/">Apache Maven</a></li>
<li><a href="https://neo4j.com/">Neo4j</a></li>
<li><a href="https://gradle.org/">Gradle</a></li>
<li><a href="https://projectriff.io/">riff</a></li>
<li><a href="https://kubernetes.io/docs/home/">Kubernetes FR docs</a> (many thanks to <a href="https://twitter.com/remyleone">Rémy Leone</a> from <a href="https://www.scaleway.com/en/betas/">Scaleway</a>, by the way)</li>
<li>and many more!</li>
</ul>
<p>Don’t wait any longer: <a href="https://hack-commit-pu.sh/">register</a> and spread the word!</p>
<p>Want to get more involved? Read on!</p>
<h1 id="je-veux-mimpliquer-">I want to get involved!</h1>
<h2 id="je-veux-proposer-un-projet">I want to propose a project</h2>
<p>Your mission, should you choose to accept it, is to kindly guide people of varying levels through their first contributions to your free/open-source project.</p>
<p>Your challenge will be to balance the explanation time needed before people can start contributing (you want to maximize contributor participation) with the actual contribution time (you can set prerequisites to be more efficient, but at the risk of excluding too many participants from the outset).</p>
<p>Still interested? Then feel free to send us, preferably in English, a description of your project and of the contributions that can be made in one day (with any prerequisites for participants): <code class="language-plaintext highlighter-rouge">organization AT hack-commit-pu.sh</code>.</p>
<h2 id="ma-société-veut-sponsoriser">My company wants to sponsor</h2>
<p>We do have various costs to cover, such as the lunch buffet, the closing cocktail and possibly other services if the budget allows.</p>
<p>For your information, we are set up as a non-profit <a href="https://paris-springers.github.io/">association</a>.</p>
<p>Feel free to contact us, preferably in English, at <code class="language-plaintext highlighter-rouge">organization AT hack-commit-pu.sh</code> so that we can send you our sponsorship brochure.</p>
<h2 id="je-veux-animer-un-atelier-dintroduction">I want to run an introductory workshop</h2>
<p>We are keen for less experienced people to be able to take part as well. The goal of the introductory workshops is to cover, in two hours, the fundamentals of technologies useful to the various projects represented during the event.</p>
<p>The most obvious candidate is Git/GitHub.</p>
<p>Feel free to contact us, preferably in English, at <code class="language-plaintext highlighter-rouge">organization AT hack-commit-pu.sh</code> if this opportunity interests you.</p>
<h2 id="je-veux-ĂȘtre-bĂ©nĂ©vole">I want to volunteer</h2>
<p>If you want to join the adventure, feel free to contact us, preferably in English, at <code class="language-plaintext highlighter-rouge">organization AT hack-commit-pu.sh</code>.</p>
<p>If you “only” want to help on the day itself, here is an overview of what you can do:</p>
<ul>
<li>welcoming sponsors/project leads</li>
<li>registering participants</li>
<li>announcing breaks</li>
<li>helping clean up at the end of the day</li>
</ul>
<p>This does not prevent you from taking part in the event itself (you will just have a bit less time to contribute)!</p>
Florent Biville
Why Come to Hackergarten
2016-09-20T00:00:00+00:00
https://fbiville.github.io/2016/09/20/Pourquoi-venir-au-Hackergarten
<p>Let it be known: Open Source software is everywhere. Chances are
you use it directly, or even develop some, in your
professional activity. It is undeniable that you
benefit from it in your daily life, even indirectly.</p>
<h1 id="hackers-we-need-you">Hackers: we need you!</h1>
<p>You may even have reported a bug, or even submitted a fix to an
open-source project you use at work. But apart from these rare
occasions, you have never found the time to contribute on a more
lasting basis.</p>
<p>And yet, here is a goal to be proud of! Becoming
one of the main committers of a visible project (or one on its way to
becoming so) can make a real difference on your résumé and in your
career.</p>
<p>This obviously does not happen overnight, but every first
contribution matters. It can be quite hard to dive
into an unfamiliar codebase without outside help or a specific goal.</p>
<p>Paris Hackergarten is here for you!</p>
<p>It aims to bring together in the same room, for one evening (once
a month), experienced committers (a.k.a. mentors) and motivated
contributors (a.k.a. hackers)!</p>
<p>Everyone gets something out of it:</p>
<ul>
<li>
<p>the mentor sees their project move forward thanks to the contributions</p>
</li>
<li>
<p>the hacker gets familiar with the codebase with the mentor’s help,
and sends their first contributions within a few hours rather than
a few days</p>
</li>
</ul>
<p>During the last session, a pair managed to submit a <a href="https://github.com/apache/maven-shared/pull/13">pull
request</a> to the
Apache Maven project! Yet they started the evening with no prior
knowledge of the codebase. Thanks to Hervé for the mentoring, by the
way!</p>
<p>All hackers are welcome! Don’t hold yourself back thinking
you are not good enough: that’s not true! ;-)</p>
<h1 id="appel-aux-mentors">Calling all mentors</h1>
<p>Would you like to present your project and attract new
contributions?</p>
<p>To do so, two rules apply:</p>
<ol>
<li>
<p>prepare a two-minute presentation to introduce and
"sell" your project to participants</p>
</li>
<li>
<p>have a set of well-defined tasks, ideally doable in
one evening</p>
</li>
</ol>
<p>As for the technology used: no constraints!</p>
<p>I want to stress this point. One might currently think the
meetup is reserved for Java developers; that is not the case!</p>
<p>A Paris Hackergarten session may even be dedicated to iOS
development soon, stay tuned! ;-)</p>
<h1 id="Ă -vos-calendriers-">Mark your calendars!</h1>
<p>We strive to hold the <a href="http://www.meetup.com/Paris-Hackergarten/">Paris
Hackergarten</a> on the last
Tuesday of every month, at Xebia’s offices.</p>
<p>The
<a href="http://www.meetup.com/Paris-Hackergarten/events/231855753/">next one</a>
will therefore take place on September 27; I hope to see you there!</p>
Florent Biville
Rant: The Teletubbies “Documentation” Pitfall
2016-09-19T00:00:00+00:00
https://fbiville.github.io/2016/09/19/Rant-The-Teletubbies-Documentation-Pitfall
<h1 id="disclaimer">Disclaimer</h1>
<p>I am not Uncle Bob’s nephew, but if you have already read Clean Code,
chances are you will not learn much from this post.</p>
<h1 id="typical-example">Typical example</h1>
<p>Let me talk about a coding practice that I find profoundly disturbing.
Take this code, for instance:</p>
<pre><code class="language-{.java}">public SomeResult computeResult(SomeParameter parameter) {
// call nice service to fetch foo
Foo foo = niceService.fetchFoo(parameter);
return new SomeResult(foo);
}
</code></pre>
<p>Basically, we make a trivial call to a service and use what it returns to
instantiate the result we are interested in.</p>
<p>Do we need the comment, though? Obviously, we don’t!</p>
<p>We are just adding noise!</p>
<p>That’s why I call it Teletubbies documentation.</p>
<h1 id="teletu-what">Teletu-what?</h1>
<p>Teletubbies, as you probably already know, is a TV show for very young
children, created by the BBC.</p>
<p>If you know the show, you also know that whenever a Teletubbies
character does something, the following happens:</p>
<ol>
<li>
<p>the character announces what it intends to do</p>
</li>
<li>
<p>the voice-over paraphrases what the character just said</p>
</li>
<li>
<p>the character does it</p>
</li>
<li>
<p>optionally back to step 1</p>
</li>
</ol>
<p>This makes sense for very young children: part of education is based on
repetition.</p>
<h1 id="back-to-our-example">Back to our example</h1>
<p>So whenever I encounter a snippet of code like the one above, I immediately hear
this annoying voice-over that just repeats something we already know.</p>
<p>It is annoying because, well, we are not very young children.</p>
<p>What’s the big deal, you might object?</p>
<p>Well, comments like these can <strong>easily</strong> get out of sync. In the
worst-case scenario, they become misleading.</p>
<p>It leads to situations where you have to confront the current code and
the outdated comment and you cannot really be sure which one describes
what the behavior <strong>should</strong> be.</p>
<p>Comments don’t run: they are just an informal bunch of text and cannot
be changed automatically (at least, not in a 100% reliable way). Their
risk of becoming obsolete is therefore higher.</p>
<p>To rephrase it, comments like this are part of the problem, not the
solution.</p>
<p>Inline comments are just a liability.</p>
<p>The worst part is that they often appear as a whole bunch:</p>
<pre><code class="language-{.java}">public SomeResult computeResult(SomeParameter parameter) {
// call nice service to fetch foo
Foo foo = niceService.fetchFoo(parameter);
// [...] 200 lines with comments+code like that
// hilarity ensues... not
return new SomeResult(foo, ...);
}
</code></pre>
<p>Indeed, the bad side effect of this kind of brain-dead comment is that
it <strong>prevents</strong> the original authors from asking themselves: is the code
readable enough this way? Am I thinking this through? How can I make the
code more self-explanatory?</p>
<p>If you get used to this kind of comment, you will most likely focus
your reading on them and live in the illusion that the method is
readable and well-documented.</p>
<p>I have got some bad news for you: 200 lines of code for a method are NOT
readable at all, no matter how much obsolete poetry you stick in there.</p>
<p>As a general rule of thumb, is it worth writing something down if that
only took you 10 seconds to come up with?</p>
<h1 id="a-not-so-noisy-example">A not-so-noisy example</h1>
<p>Let’s move on to a more interesting example.</p>
<p>It’s not that the first example is rare, but some situations, like
the following one, involve a bit more than
pure noise.</p>
<pre><code class="language-{.java}">public SomeResult computeResult(SomeParameter parameter) {
/*
* call nice service to fetch foo because
* some contextual reasons
*
* fetchFoo may throw in theory but will not
* because the parameter is always valid in
* this particular usecase [...], so no try-catch,
* YOLO
*/
Foo foo = niceService.fetchFoo(parameter);
return new SomeResult(foo);
}
</code></pre>
<p>“Ah! This comment is useful! It explains the implementation
rationale!”, you may say.</p>
<p>While there is some value in these pieces of information, they just do
not belong there.</p>
<p>Let me elaborate.</p>
<h1 id="small-detour-back-to-basics">Small detour: back to basics</h1>
<p>As you already know, in many programming languages, method signatures
look like:</p>
<pre><code class="language-{.java}">public SomeResult computeResult(SomeParameter parameter)
</code></pre>
<p>Ideally, the signature should be explicit enough (especially with
well-defined types, parametricity FTW) to know what the method does. How
the method does it should be relevant only if you have to change
something there.</p>
<p>Everything that follows between curly braces is about <strong>implementation</strong>
details.</p>
<h1 id="back-to-the-example-again">Back to the example again</h1>
<p>However, I would argue that the two pieces of information encoded as an inline
comment above are NOT implementation details, yet they live in the
implementation section.</p>
<p>What are these comment sections about?</p>
<ol>
<li>
<p>the first part describes the intent behind the implementation (or at
least part of it)</p>
</li>
<li>
<p>the second and last part describes (part of) the observable behavior
of the method</p>
</li>
</ol>
<h1 id="intent-documentation">Intent documentation</h1>
<p>Intents are very contextual and temporal.</p>
<p>Decisions, no matter how small, are taken every day and guide the way we
implement things.</p>
<p>These decisions are mostly influenced by temporal factors: the
assumptions made at the time may no longer hold at all in 6 months, 1
year…</p>
<p>Temporal documentation.</p>
<p><strong>TEMPORAL</strong> documentation.</p>
<p>It rings a bell, somehow.</p>
<p>S-C-M! Source Control Management tools like Git, Mercurial and friends.</p>
<p>They play an important part in documentation. Not only do they
intrinsically describe what has changed and when, they should describe
<strong>why</strong> the changes were made.</p>
<p>That’s what <strong>commit messages</strong> are for!</p>
<p>And if you start thinking this way, there will be an additional benefit:
you will keep your commits as small and focused as possible. If the
commit is too big, there is no way you can explain all the important
changes you made ;-)</p>
<p>And if you start to care enough about your changelog, you will get nice,
readable release notes for free!</p>
<h1 id="observable-behavior-documentation">Observable behavior documentation</h1>
<p>If what you describe is part of the observable behavior of the scope you
are modifying, then it is clearly about the contract you implicitly sign
between the code you are implementing and its callers.</p>
<p>The documentation is about the API. API is just a clever name for a set
of accessible signatures. It is not an implementation detail at all, it
should be near the method signature itself:</p>
<pre><code class="language-{.java}">/**
* *describes the nominal observable behaviour here [...]*
*
* fetchFoo may throw in theory but will not
* because the parameter is always valid in this
* particular usecase [...], so no try-catch, YOLO
*/
public SomeResult computeResult(SomeParameter parameter) {
Foo foo = niceService.fetchFoo(parameter);
return new SomeResult(foo);
}
</code></pre>
<h1 id="going-further">Going further</h1>
<p>You could even rewrite the method like this:</p>
<pre><code class="language-{.java}">/**
* *describes the nominal observable behaviour here [...]*
*/
public SomeResult computeResult(SomeParameter parameter) {
try {
Foo foo = niceService.fetchFoo(parameter);
return new SomeResult(foo);
}
catch (MyNiceServiceException e) {
throw new AssertionError("Should not happen", e);
}
}
</code></pre>
<p>Now the assumptions are even more explicit. That even opens an
interesting discussion about the virtues of <a href="https://www.youtube.com/watch?v=57P86oZXjXs">failing
fast</a> :-)</p>
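<p>Here is a small, self-contained sketch of that fail-fast version. <code class="language-plaintext highlighter-rouge">NiceService</code>, <code class="language-plaintext highlighter-rouge">Foo</code>, <code class="language-plaintext highlighter-rouge">SomeParameter</code>, <code class="language-plaintext highlighter-rouge">SomeResult</code> and <code class="language-plaintext highlighter-rouge">MyNiceServiceException</code> are hypothetical types that only reuse the names of the snippet above; <code class="language-plaintext highlighter-rouge">main</code> simulates a healthy service and a misbehaving one.</p>
<pre><code class="language-{.java}">public class FailFastDemo {

    // Hypothetical types reusing the names from the snippet above (not a real library).
    static class MyNiceServiceException extends Exception {
        MyNiceServiceException(String message) { super(message); }
    }

    static final class SomeParameter {
        final String value;
        SomeParameter(String value) { this.value = value; }
    }

    static final class Foo {
        final String value;
        Foo(String value) { this.value = value; }
    }

    static final class SomeResult {
        final Foo foo;
        SomeResult(Foo foo) { this.foo = foo; }
    }

    interface NiceService {
        Foo fetchFoo(SomeParameter parameter) throws MyNiceServiceException;
    }

    private final NiceService niceService;

    FailFastDemo(NiceService niceService) {
        this.niceService = niceService;
    }

    /** Builds a result from the foo fetched for the given (assumed always valid) parameter. */
    public SomeResult computeResult(SomeParameter parameter) {
        try {
            Foo foo = niceService.fetchFoo(parameter);
            return new SomeResult(foo);
        } catch (MyNiceServiceException e) {
            // The "cannot happen" assumption is enforced here: if it ever breaks,
            // we fail fast and keep the original exception as the cause.
            throw new AssertionError("Should not happen", e);
        }
    }

    public static void main(String[] args) {
        FailFastDemo healthy = new FailFastDemo(p -> new Foo("foo for " + p.value));
        // prints "foo for bar"
        System.out.println(healthy.computeResult(new SomeParameter("bar")).foo.value);

        FailFastDemo broken = new FailFastDemo(p -> {
            throw new MyNiceServiceException("invalid parameter: " + p.value);
        });
        try {
            broken.computeResult(new SomeParameter("baz"));
        } catch (AssertionError e) {
            // prints "Should not happen (cause: invalid parameter: baz)"
            System.out.println(e.getMessage() + " (cause: " + e.getCause().getMessage() + ")");
        }
    }
}
</code></pre>
<p>When the “should not happen” case does happen, the caller gets an <code class="language-plaintext highlighter-rouge">AssertionError</code> that still carries the original service exception as its cause, instead of a comment that has silently become wrong.</p>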
<p>One could argue we could do even better. Ideally, method signatures
should be sufficient to tell what the method is doing:
<a href="http://data.tmorris.net/talks/yow-west-2016/1d388b6263e7cbeedfbea224997648daa1d7862d/parametricity.pdf">parametricity</a>
FTW! Hoogle, the Haskell search engine that lets you look up functions by type signature, is probably one of the best illustrations of this.</p>
<p>That requires discipline (especially with languages such as Java, C# et
al), but is not impossible to achieve: try to minimize and contain side
effects, forego nulls… and then types can convey a lot more useful
information!</p>
<p>Yet another interesting discussion!</p>
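<p>As a minimal illustration of the “forego nulls” advice, compare two lookups in a hypothetical <code class="language-plaintext highlighter-rouge">UserDirectory</code>: the <code class="language-plaintext highlighter-rouge">Optional</code>-returning method states in its signature that the email may be missing, whereas <code class="language-plaintext highlighter-rouge">findEmailOrNull</code> needs a comment (or a careful reader) to convey the same thing.</p>
<pre><code class="language-{.java}">import java.util.HashMap;
import java.util.Map;
import java.util.Optional;

// Hypothetical directory, only here to contrast two signatures.
public class UserDirectory {

    private final Map<String, String> emailsByLogin = new HashMap<>();

    public UserDirectory register(String login, String email) {
        emailsByLogin.put(login, email);
        return this;
    }

    // The signature hides the "not found" case: null may come back.
    public String findEmailOrNull(String login) {
        return emailsByLogin.get(login);
    }

    // The signature states the "not found" case: callers must handle an empty Optional.
    public Optional<String> findEmail(String login) {
        return Optional.ofNullable(emailsByLogin.get(login));
    }

    public static void main(String[] args) {
        UserDirectory directory = new UserDirectory().register("fbiville", "florent@example.org");
        System.out.println(directory.findEmail("fbiville").orElse("(none)")); // florent@example.org
        System.out.println(directory.findEmail("unknown").orElse("(none)"));  // (none)
    }
}
</code></pre>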
<h1 id="the-end">The end</h1>
<p>As you can see, caring about documentation is a gateway drug to better
software, clearer releases and happier collaborators.</p>
<p>I personally write comments less than 1% of the time I write code. This
happens when there is a tiny local expression that may seem obscure and
there is no simple way around it.</p>
<p>For the 99+%, there are almost always better places to write the
information you want to convey:</p>
<ul>
<li>
<p>the code itself: it should answer <strong>WHAT</strong> it does, without
ambiguity, else just refactor it (extract meaningful methods,
rename, split expressions… the IDE is your friend). This is the
material that decays the least; rely on it as much as you can!</p>
</li>
<li>
<p>the *-doc (e.g. Javadoc, C# XML documentation comments): the information is about the
observable behavior of the section you are altering</p>
</li>
<li>
<p>the intent: that should justify the commit you are about to push</p>
</li>
</ul>
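<p>A minimal sketch of the first bullet, with a hypothetical <code class="language-plaintext highlighter-rouge">Order</code> type and made-up shipping rules: the inline comment in <code class="language-plaintext highlighter-rouge">shippingCostBefore</code> moves into the name of an extracted method, <code class="language-plaintext highlighter-rouge">qualifiesForFreeShipping</code>, which every caller reads and which is far less likely to drift out of sync than a comment.</p>
<pre><code class="language-{.java}">// Hypothetical domain type and rules, only here to illustrate the refactoring.
public class ShippingRules {

    static final class Order {
        final double totalInEuros;
        final String country;

        Order(double totalInEuros, String country) {
            this.totalInEuros = totalInEuros;
            this.country = country;
        }
    }

    // Before: a comment paraphrases the condition and can silently drift from it.
    static double shippingCostBefore(Order order) {
        // free shipping for orders of 50 euros or more delivered in France
        if (order.totalInEuros >= 50 && "FR".equals(order.country)) {
            return 0;
        }
        return 4.99;
    }

    // After: the same information lives in a method name that callers actually use.
    static double shippingCostAfter(Order order) {
        return qualifiesForFreeShipping(order) ? 0 : 4.99;
    }

    static boolean qualifiesForFreeShipping(Order order) {
        return order.totalInEuros >= 50 && "FR".equals(order.country);
    }

    public static void main(String[] args) {
        Order order = new Order(60, "FR");
        System.out.println(shippingCostBefore(order) + " " + shippingCostAfter(order)); // 0.0 0.0
    }
}
</code></pre>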
<p>Inline comments are (99+%) dead! Long live inline comments!</p>
Florent Biville
Compilers Hate Him! Discover This One Weird Trick with Neo4j Stored Procedures
2016-07-12T00:00:00+00:00
https://fbiville.github.io/2016/07/12/Compilers-hate-him-Discover-this-one-weird-trick-with-Neo4j-stored-procedures
<p>As you probably already know, Neo4j 3.0 finally comes with <a href="https://neo4j.com/docs/java-reference/current/#_calling_procedure">stored
procedures</a>
(let’s call them sprocs from now on).</p>
<p>The cool thing about this is that you can directly interact with sprocs in
Cypher, as <a href="https://twitter.com/mesirii">Michael Hunger</a> explains in
this <a href="https://neo4j.com/blog/intro-user-defined-procedures-apoc/">blog
post</a>.</p>
<h1 id="writing-stored-procedures">Writing stored procedures</h1>
<p>While preparing my Neo4j introduction talk for the latest
<a href="https://www.facebook.com/GoCriteo/photos/pcb.1045385882181102/1045385698847787/?type=3">Criteo
summit</a>
(we’re <a href="http://www.criteo.com/careers/#careers-browser">hiring</a>!), I
started playing around with sprocs.</p>
<p>The process is quite simple:</p>
<ol>
<li>
<p>write some code and annotate it</p>
</li>
<li>
<p>test it with the test harness</p>
</li>
<li>
<p>package the JAR and deploy it to your Neo4j instance (<code class="language-plaintext highlighter-rouge">plugins/</code>)!</p>
</li>
</ol>
<p>Actually, step 3 may repeat itself quite a few times: Neo4j sprocs must
comply with a few rules before your Neo4j server agrees to deploy them.</p>
<h1 id="sproc-rules">Sproc rules</h1>
<p>The rules are detailed in <code class="language-plaintext highlighter-rouge">@org.neo4j.procedure.Procedure</code>
<a href="https://github.com/neo4j/neo4j/blob/3.0/community/kernel/src/main/java/org/neo4j/procedure/Procedure.java#L31">javadoc</a>,
but we can summarize them as follows:</p>
<ul>
<li>
<p>a sproc is a method annotated with <code class="language-plaintext highlighter-rouge">@org.neo4j.procedure.Procedure</code></p>
</li>
<li>
<p>it must return a
<a href="https://docs.oracle.com/javase/8/docs/api/java/util/stream/Stream.html"><code class="language-plaintext highlighter-rouge">java.util.stream.Stream<T></code></a>
where T is a user-defined record type</p>
</li>
<li>
<p>the record type must define public fields</p>
</li>
<li>
<p>these can only be of restricted types</p>
</li>
<li>
<p>if the sproc accepts parameters, they all must be annotated with
<a href="https://github.com/neo4j/neo4j/blob/3.0/community/kernel/src/main/java/org/neo4j/procedure/Name.java"><code class="language-plaintext highlighter-rouge">@org.neo4j.procedure.Name</code></a></p>
</li>
<li>
<p>parameters can only be of specific types</p>
</li>
<li>
<p>the procedure name must be unique (name = package name+method name)</p>
</li>
<li>
<p>injectable types (<code class="language-plaintext highlighter-rouge">GraphDatabaseService</code> et al) must target public
non-static, non-final,
<a href="https://github.com/neo4j/neo4j/blob/3.0/community/kernel/src/main/java/org/neo4j/procedure/Context.java"><code class="language-plaintext highlighter-rouge">@Context</code>-annotated</a>
fields</p>
</li>
</ul>
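<p>To make these rules more tangible, here is a rough sketch that checks three of them (<code class="language-plaintext highlighter-rouge">Stream</code> return type, public record fields, <code class="language-plaintext highlighter-rouge">@Name</code> on every parameter) with plain reflection. <code class="language-plaintext highlighter-rouge">Procedure</code> and <code class="language-plaintext highlighter-rouge">Name</code> are hypothetical local stand-ins for the Neo4j annotations, so that the sketch runs without Neo4j; this is not how Neo4j itself validates procedures, although the missing-<code class="language-plaintext highlighter-rouge">@Name</code> message copies the wording of the Neo4j 3.0.3 log output quoted in this post.</p>
<pre><code class="language-{.java}">import java.lang.annotation.ElementType;
import java.lang.annotation.Retention;
import java.lang.annotation.RetentionPolicy;
import java.lang.annotation.Target;
import java.lang.reflect.Field;
import java.lang.reflect.Method;
import java.lang.reflect.Modifier;
import java.lang.reflect.Parameter;
import java.lang.reflect.ParameterizedType;
import java.lang.reflect.Type;
import java.util.ArrayList;
import java.util.List;
import java.util.Map;
import java.util.stream.Stream;

public class SprocRuleCheck {

    // Hypothetical stand-ins for @org.neo4j.procedure.Procedure and @org.neo4j.procedure.Name.
    @Retention(RetentionPolicy.RUNTIME) @Target(ElementType.METHOD)
    @interface Procedure {}

    @Retention(RetentionPolicy.RUNTIME) @Target(ElementType.PARAMETER)
    @interface Name { String value(); }

    public static class MyRecord {
        public String out;
    }

    public static class MyProcedures {
        @Procedure
        public Stream<MyRecord> doSomething(Map<String, Long> value) { return Stream.empty(); }

        @Procedure
        public Stream<MyRecord> doSomethingElse(@Name("value") Map<String, Long> value) { return Stream.empty(); }
    }

    /** Returns one message per violated rule, for every @Procedure method of the given class. */
    public static List<String> violations(Class<?> procedures) {
        List<String> problems = new ArrayList<>();
        for (Method method : procedures.getDeclaredMethods()) {
            if (!method.isAnnotationPresent(Procedure.class)) {
                continue;
            }
            Type returnType = method.getGenericReturnType();
            if (!(returnType instanceof ParameterizedType)
                    || ((ParameterizedType) returnType).getRawType() != Stream.class) {
                problems.add("`" + method.getName() + "` must return a java.util.stream.Stream<T>");
            } else {
                // The record type T must expose its (non-static) fields publicly.
                Type recordType = ((ParameterizedType) returnType).getActualTypeArguments()[0];
                if (recordType instanceof Class) {
                    for (Field field : ((Class<?>) recordType).getDeclaredFields()) {
                        int modifiers = field.getModifiers();
                        if (!Modifier.isStatic(modifiers) && !Modifier.isPublic(modifiers)) {
                            problems.add("Field `" + field.getName() + "` of the record type must be public");
                        }
                    }
                }
            }
            Parameter[] parameters = method.getParameters();
            for (int i = 0; i < parameters.length; i++) {
                if (!parameters[i].isAnnotationPresent(Name.class)) {
                    problems.add("Argument at position " + i + " in method `" + method.getName()
                            + "` is missing an `@Name` annotation.");
                }
            }
        }
        return problems;
    }

    public static void main(String[] args) {
        violations(MyProcedures.class).forEach(System.out::println);
    }
}
</code></pre>
<p>Running <code class="language-plaintext highlighter-rouge">main</code> prints a single violation, for the first parameter of <code class="language-plaintext highlighter-rouge">doSomething</code>.</p>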
<p>Fortunately, folks at <a href="https://neo4j.com/company/">Neo Technology</a> have
done a wonderful job at error reporting. Neo4j fails fast if any of the
rules is violated and gives a detailed error message.</p>
<p>Here is an example with Neo4j 3.0.3 and a <strong>failing</strong>
attempt to deploy the following sproc:</p>
<pre><code class="language-{.java}">@Procedure
public Stream<MyRecord> doSomething(Map<String, Integer> value) {
// [...]
}
</code></pre>
<p>The following error will be logged (see <code class="language-plaintext highlighter-rouge">logs/neo4j.log</code>):</p>
<div class="language-plaintext highlighter-rouge"><div class="highlight"><pre class="highlight"><code>Caused by: org.neo4j.kernel.api.exceptions.ProcedureException: Argument at position 0 in method `doSomething` is missing an `@Name` annotation.
Please add the annotation, recompile the class and try again.
</code></pre></div></div>
<p>Nice error message! Just add the missing <code class="language-plaintext highlighter-rouge">@Name</code> on the only parameter,
re-compile, package and deploy the JAR again, restart Neo4j and you’re
done!</p>
<h1 id="can-we-do-better">Can we do better?</h1>
<p>The previous example is quite trivial, but this back-and-forth could
potentially be repeated many times, especially when one is not very
familiar with sprocs.</p>
<p>Fortunately for us, most of the errors can be caught at compile time.</p>
<h1 id="eurekaannotation-processing-ftw">@Eureka("annotation processing FTW!")</h1>
<p>Annotations have been around in Java since the end of 2004 (Java 5) and
came together with <code class="language-plaintext highlighter-rouge">apt</code> (now built into <code class="language-plaintext highlighter-rouge">javac</code>), the annotation
processing tool.</p>
<p>In brief (for the long version, read the
<a href="https://www.jcp.org/en/jsr/detail?id=269">spec</a>), the latter allows
user-defined code to introspect a Java program at compile time (original
paper <a href="http://www.bracha.org/mirrors.pdf">here</a>) and possibly:</p>
<ul>
<li>
<p>issue compilation notices/warnings/errors</p>
</li>
<li>
<p>generate static, source and/or bytecode files</p>
</li>
</ul>
<p>(By the way, this means exceptions can be raised at compile-time too!)</p>
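<p>As an illustration of what such a processor can look like, here is a minimal sketch built on the standard <code class="language-plaintext highlighter-rouge">javax.annotation.processing</code> and <code class="language-plaintext highlighter-rouge">javax.tools</code> APIs. It is <strong>not</strong> the code of neo4j-sproc-compiler: it only checks the <code class="language-plaintext highlighter-rouge">@Name</code> rule, and it matches annotations by simple name (<code class="language-plaintext highlighter-rouge">Procedure</code>, <code class="language-plaintext highlighter-rouge">Name</code>) so that the sample source can declare its own hypothetical stand-ins instead of depending on Neo4j. <code class="language-plaintext highlighter-rouge">compileErrors</code> runs <code class="language-plaintext highlighter-rouge">javac</code> in memory with <code class="language-plaintext highlighter-rouge">-proc:only</code>, which requires a JDK, not just a JRE.</p>
<pre><code class="language-{.java}">import java.net.URI;
import java.util.List;
import java.util.Locale;
import java.util.Set;
import java.util.stream.Collectors;
import javax.annotation.processing.AbstractProcessor;
import javax.annotation.processing.RoundEnvironment;
import javax.annotation.processing.SupportedAnnotationTypes;
import javax.lang.model.SourceVersion;
import javax.lang.model.element.Element;
import javax.lang.model.element.ElementKind;
import javax.lang.model.element.ExecutableElement;
import javax.lang.model.element.TypeElement;
import javax.lang.model.element.VariableElement;
import javax.tools.Diagnostic;
import javax.tools.DiagnosticCollector;
import javax.tools.JavaCompiler;
import javax.tools.JavaFileObject;
import javax.tools.SimpleJavaFileObject;
import javax.tools.ToolProvider;

public class NameCheckDemo {
    // Reports a compile error for each parameter of a @Procedure method lacking @Name.
    // Annotations are matched by simple name: these are stand-ins, not Neo4j's types.
    @SupportedAnnotationTypes("*")
    public static class NameCheckProcessor extends AbstractProcessor {
        @Override
        public SourceVersion getSupportedSourceVersion() {
            return SourceVersion.latestSupported();
        }

        @Override
        public boolean process(Set<? extends TypeElement> annotations, RoundEnvironment roundEnv) {
            for (TypeElement annotation : annotations) {
                if (!annotation.getSimpleName().contentEquals("Procedure")) {
                    continue;
                }
                for (Element element : roundEnv.getElementsAnnotatedWith(annotation)) {
                    if (element.getKind() != ElementKind.METHOD) {
                        continue;
                    }
                    ExecutableElement method = (ExecutableElement) element;
                    List<? extends VariableElement> parameters = method.getParameters();
                    for (int i = 0; i < parameters.size(); i++) {
                        if (!hasAnnotationNamed(parameters.get(i), "Name")) {
                            processingEnv.getMessager().printMessage(Diagnostic.Kind.ERROR,
                                    "Argument at position " + i + " in method `" + method.getSimpleName()
                                            + "` is missing an `@Name` annotation.",
                                    parameters.get(i));
                        }
                    }
                }
            }
            return false; // do not claim the annotations
        }

        private static boolean hasAnnotationNamed(Element element, String simpleName) {
            return element.getAnnotationMirrors().stream().anyMatch(
                    m -> m.getAnnotationType().asElement().getSimpleName().contentEquals(simpleName));
        }
    }

    // Compiles `source` in memory with annotation processing only and returns the error messages.
    public static List<String> compileErrors(String className, String source) {
        JavaCompiler compiler = ToolProvider.getSystemJavaCompiler(); // null when no JDK is available
        JavaFileObject file = new SimpleJavaFileObject(
                URI.create("string:///" + className + JavaFileObject.Kind.SOURCE.extension),
                JavaFileObject.Kind.SOURCE) {
            @Override
            public CharSequence getCharContent(boolean ignoreEncodingErrors) {
                return source;
            }
        };
        DiagnosticCollector<JavaFileObject> diagnostics = new DiagnosticCollector<>();
        JavaCompiler.CompilationTask task = compiler.getTask(
                null, null, diagnostics, List.of("-proc:only"), null, List.of(file));
        task.setProcessors(List.of(new NameCheckProcessor()));
        task.call();
        return diagnostics.getDiagnostics().stream()
                .filter(d -> d.getKind() == Diagnostic.Kind.ERROR)
                .map(d -> d.getMessage(Locale.ROOT))
                .collect(Collectors.toList());
    }

    static final String SAMPLE = String.join("\n",
            "import java.util.Map;",
            "import java.util.stream.Stream;",
            "@interface Procedure {}",
            "@interface Name { String value(); }",
            "public class Sprocs {",
            "  @Procedure public Stream<String> doSomething(Map<String, Long> value) { return Stream.empty(); }",
            "  @Procedure public Stream<String> doSomethingElse(@Name(\"value\") Map<String, Long> value) { return Stream.empty(); }",
            "}");

    public static void main(String[] args) {
        compileErrors("Sprocs", SAMPLE).forEach(System.out::println);
    }
}
</code></pre>
<p>The rule violation surfaces as a regular compilation error, before any JAR is packaged or deployed.</p>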
<p>Based on this, I decided to write a little annotation processor on my
way back from Criteo summit (did I mention we are
<a href="http://www.criteo.com/careers/#careers-browser">hiring</a>?).</p>
<p><a href="https://github.com/fbiville/neo4j-sproc-compiler">neo4j-sproc-compiler</a>
is born. And it’s
<a href="https://github.com/neo4j-contrib/neo4j-apoc-procedures/blob/18fe85a3712aa84696cc4dedaf0db659a63e3e7b/pom.xml#L72">used</a>!</p>
<p>If Michael is happy, I am happy:</p>
<p><img src="https://raw.githubusercontent.com/fbiville/fbiville.github.io/master/images/michael-sproc-compiler-feedback.png" alt="michael sproc compiler
feedback" /></p>
<p>(I swear it’s not photoshopped: see the #apoc channel of the
Neo4j-Users <a href="https://neo4j-users.slack.com">Slack</a>, July 1st, 2016.)</p>
<h1 id="neo4j-sproc-compiler-in-action">neo4j-sproc-compiler in action</h1>
<p>While the following screencast features Maven, the annotation processor
is actually agnostic of any build tool. You can use any build tool you
want, or use <code class="language-plaintext highlighter-rouge">javac</code> directly if that floats your boat!</p>
<p><a href="https://asciinema.org/a/79379"><img src="https://asciinema.org/a/79379.svg" alt="asciicast" /></a></p>
<h1 id="conclusion">Conclusion</h1>
<p>Be careful: most, but <strong>not</strong> all, checks can be performed at compile
time. You’ll still need to write some tests and monitor your deployments!</p>
<p>Hopefully, this little utility that I wrote will shorten your
development feedback loop and get your stored procedures harder, better,
stronger and faster.</p>
Florent Biville
New Blog!
2015-05-03T00:00:00+00:00
https://fbiville.github.io/2015/05/03/New-blog
<p>Getting rid of Dotclear was long overdue. It was impractical at best, and I wasted
way too much time polishing the content so that it would not render too
badly.</p>
<h1 id="whats-next">What’s next?</h1>
<p>I need to automate the migration to HubPress, so it will take some more
time before all my blog posts show up here. For now,
<a href="http://florent.biville.net">http://florent.biville.net</a> is still serving my old blog.</p>
<p>It’s just a matter of time before everything is fully set up ;)</p>
Florent Biville
Summer Transfer
2014-10-05T00:00:00+00:00
https://fbiville.github.io/2014/10/05/Transfert-estival
<h1 id="mais-pourquoi-">But why?!</h1>
<p>To get a dictionary every year, of course! (Sorry, my GIMP skills
are still limited.)</p>
<p><img src="/assets/img/rtfv_m.png" alt="Read The F****** Vidal" /></p>
<p>More seriously, leaving Lateral Thoughts, a company in which I was a
partner and where I enjoyed a great deal of autonomy, may raise
questions. For anyone wanting to become a freelancer, Lateral Thoughts
is an ideal place. You can even be an employee there and enjoy
the same benefits (with a lower net pay, obviously). But here is the thing:
while freelancing has been all the rage in our "industry" for several years,
my current path is taking me away from it.</p>
<h1 id="le-déclencheur">The trigger</h1>
<p>A few months ago, I was contacted by a Google recruiter. The
pleasant surprise was followed by huge stress and interview preparation
all the way to the last step, in June: the on-site interview day in Paris.
Although I was ultimately not selected by the final hiring committee
at this last stage, I only take positives from it. A quick
aside: when I see some people criticizing interviews where you are
asked to write code, I quietly chuckle. Try the Google marathon and
then we can talk :)
Back to the matter at hand.
As I was saying, this intense experience taught me an enormous amount:
reading Google’s publications, getting a glimpse of the company for
a few hours, talking with a few engineers… all strengthened my
conviction on one point: I want to be a developer, and nothing else.
The very essence of our profession, as I see it, hit me
square in the face: technology serving actual needs. And when I
say technology, I do not mean the latest trendy framework or the
latest supposedly revolutionary <a href="https://developer.apple.com/swift/">language</a>.
I am thinking instead of algorithms and design (not
the Omniscient Architect’s kind, mind you). Google engineers did not
create the <a href="http://cracking8hacking.com/cracking-hacking/Ebooks/Misc/pdf/The%20Google%20filesystem.pdf">Google
File System</a>
for fun or to talk about it at conferences, but because the
need was glaring. Going back to fundamentals thus revitalized my
interest in development and made me realize the
gap between my day-to-day work, the microcosm I move in and the
day-to-day reality of a company of that scale.</p>
<h1 id="et-pourquoi-pas-freelance-">Why not freelance?</h1>
<p>Working as a quasi-freelancer taught me a lot. It could actually be summed up in one
sentence: you only get what you go after.
A good assignment? Find it yourself (or make sure the one you are on becomes one).
Unhappy with a situation? Act or accept it.
Not everything is rosy either.
Within a group of freelancers or quasi-freelancers like
Lateral Thoughts, everyone, quite naturally, goes their own way and pursues the projects
they want to develop. Where it gets complicated is pooling efforts. No magic: if you need
more brains to co-build your idea, you have to convince people.
It is a fair process, but a draining and sometimes even demotivating one.
Who am I putting in all this effort for? For myself? For Lateral Thoughts?
One of the answers is: "by putting yourself out there, you gain more visibility, and it's
a win for LT too". I actually followed that precept for 2 years, notably around Neo4j:
from Paris to Istanbul by way of Geneva.
Enriching, but exhausting too.
In the end, those Google interviews gave me back a goal that goes beyond my own navel. I got
up close to one of the Web giants, a company that makes (me) dream and that I want to
contribute to.
(I fully own my starry-eyed side.)
In short, Google simply pointed me down the right path. And that path does not go through
freelancing.</p>
<h1 id="larrivĂ©e-Ă -vidal">Joining Vidal</h1>
<p>I had already worked at Vidal as a consultant and knew its technical challenges. The
working environment of our self-organized team is conducive to continuous improvement, and
I fully intend to make good use of it. What motivated me to join them as an employee is the
prospect of focusing on what we do best and becoming impeccable at it (in order of importance):</p>
<ul>
<li>
<p>owning our software, from its creation to production monitoring, including testing</p>
</li>
<li>
<p>becoming faster and faster at maintaining these products</p>
</li>
<li>
<p>daring to make unconventional choices</p>
</li>
</ul>
<p>There is no shortage of ideas, nor of overall motivation. I truly care about our
"Software" team improving collectively.
Nicolas Martignole wrote about <a href="http://www.touilleur-express.fr/2010/03/19/rencontre-avec-des-developpeurs-chez-vidal-software/">Vidal's "Software" team in
2010</a>; roll on 2015!</p>
Florent Biville
Creating a Java Application with Embedded Neo4j
2014-06-17T00:00:00+00:00 2014-06-17T00:00:00+00:00 https://fbiville.github.io/2014/06/17/Creer-une-application-Java-avec-Neo4j-embarque
<h1 id="un-long-discours-">A long speech?</h1>
<p>After knocking you out with <a href="/?post/2014/06/09/Neo4j-sous-le-capot">my previous
article</a> on Neo4j's internal storage and its scalability, today I will keep it fairly
short. Rather than spending a lot of effort explaining good practices for using Neo4j
in Java projects, why not create the <a href="https://github.com/fbiville/maven-embedded-neo4j-archetype">Maven
archetype</a> that does the job?</p>
<h1 id="archetype-maven-">Archetype… Maven?</h1>
<p>Yes, I know, some of you can't stand Maven.</p>
<p>I know there are a few specific Neo4j archetypes for other build tools, such as
<a href="https://github.com/sarmbruster/unmanaged-extension-archetype">the one</a> by
<a href="https://twitter.com/darthvader42">Stefan Armbruster</a> for
<a href="http://www.gradle.org/">Gradle</a>. However, I have not come across any
archetype equivalent to the one I am about to present.</p>
<p>If you think you have found one, feel free to <a href="https://www.twitter.com/fbiville">contact
me</a> and I will list it here.</p>
<h2 id="physiologie">Anatomy</h2>
<p>Let's now take a closer look at the
<a href="https://github.com/fbiville/maven-embedded-neo4j-archetype">archetype</a> created
for the occasion.</p>
<p>It generates projects that include:</p>
<ul>
<li>
<p>neo4j</p>
</li>
<li>
<p>neo4j-kernel (test-jar classifier) for integration tests</p>
</li>
<li>
<p>junit</p>
</li>
<li>
<p>assertj-core</p>
</li>
</ul>
<p><a href="http://joel-costigliola.github.io/assertj/assertj-neo4j.html">assertj-neo4j</a>
is not mature enough yet; I will try to improve it before offering it through the archetype.</p>
<h2 id="contenu">Contents</h2>
<p>If you follow <a href="https://github.com/fbiville/maven-embedded-neo4j-archetype/blob/master/README.md">the
instructions</a>, you will end up with a very simple project that:</p>
<ul>
<li>inserts data with
<a href="http://docs.neo4j.org/chunked/stable/cypher-query-lang.html">Cypher</a></li>
<li>reads data via the Java <a href="http://docs.neo4j.org/chunked/stable/tutorial-traversal-java-api.html">traversal
framework</a></li>
<li>uses EmbeddedDatabaseRule for <a href="http://junit.org/">JUnit</a> tests (this
<a href="https://github.com/junit-team/junit/wiki/Rules">JUnit rule</a> wraps
the use of Neo4j in integration tests through its
<a href="http://docs.neo4j.org/chunked/stable/tutorials-java-unit-testing.html">dedicated
implementation</a>)</li>
</ul>
<h1 id="conclusion">Conclusion</h1>
<p>Another Maven archetype should follow for Neo4j's REST interface. The archetype described
here will soon be released on Maven Central. In the meantime, you can already use it and get
started with Neo4j on solid foundations!</p>
Florent Biville
Neo4j Under The Hood
2014-06-09T00:00:00+00:00 2014-06-09T00:00:00+00:00 https://fbiville.github.io/2014/06/09/Neo4j-sous-le-capot
<h1 id="3615-ma-vie">3615 My Life</h1>
<p>Everything that follows is nothing but a web of lame excuses, you will tell me, but I do
have a few mitigating circumstances for my blog's inactivity (and my absence from the Paris
scene: I have not given any talks there in 6 months).</p>
<p>On a personal note first, I am happy to announce that a lovely wedding ring now adorns the
ring finger of my left hand :-)</p>
<p>On the professional side, although I have been absent "publicly", a lot has happened: my
first <a href="http://www.lateral-thoughts.com/formation-neo4j">Neo4j training</a> took
place, I had the opportunity to work with more clients, and some Neo4j-related projects
are still taking shape (stay tuned!).</p>
<p>By the way, if you would like me to come and talk about Neo4j at your User Group, feel free
to contact me (on <a href="https://twitter.com/fbiville">Twitter</a>, for example).</p>
<h1 id="back-to-business--parlons-de-neo">Back to business: let's talk about Neo</h1>
<h2 id="base-de-données-orientée-graphe-">Graph-oriented database?</h2>
<p><a href="http://www.neo4j.org/">Neo4j</a>, as you will have gathered, is a graph
database. But what exactly does "graph-oriented" mean?</p>
<p>Quoting
<a href="http://fr.wikipedia.org/wiki/Base_de_donn%C3%A9es_orient%C3%A9e_graphe">Wikipedia</a>,
a graph-oriented database (<em>graph database</em>) is a database that uses nodes,
relationships and properties to represent and store data.</p>
<p>This definition may seem innocuous, but note the presence of two verbs (not just one):</p>
<ul>
<li>
<p>represent</p>
</li>
<li>
<p>store</p>
</li>
</ul>
<p>In more technical terms, a graph database therefore offers an API ("represent") exposing
graph-specific vocabulary. Its on-disk records ("store") must also be laid out according to
graph structures.</p>
<p>This second point is fundamental.</p>
<p>Take one of Neo4j's competitors as an example:
<a href="http://thinkaurelius.github.io/titan/">Titan</a>.</p>
<p>Right on its home page, you can read:</p>
<blockquote>
<p>Titan is a scalable graph database […]</p>
<p>Support for various storage backends:</p>
<ul>
<li>
<p>Apache Cassandra</p>
</li>
<li>
<p>Apache HBase</p>
</li>
<li>
<p>Oracle BerkeleyDB</p>
</li>
<li>
<p>Akiban Persistit</p>
</li>
</ul>
</blockquote>
<p>This contradicts the definition I gave you above.</p>
<p>If Titan were a graph database, that would imply that Cassandra, HBase, BerkeleyDB and
Persistit are graph databases too. Yet, until proven otherwise, that is not the case :)</p>
<p>Titan provides a graph-oriented API <strong>layer</strong> on top, delegating
persistence to distributed stores. That does not make it a graph database, just as <a href="https://giraph.apache.org/">Apache
Giraph</a> is "only" a graph-oriented computation API.</p>
<p>"Why does it matter?", you may ask.</p>
<p>Well, a graph database, although it offers many advantages, is intrinsically hard to
distribute, as we will see throughout this article. It is by looking at the lowest layers
of a quintessential graph database like Neo4j that you will understand the trade-offs that
being a graph database implies.</p>
<h2 id="des-liens-et-des-chaĂźnes">Links and chains</h2>
<p>Following the <a href="https://github.com/tinkerpop/blueprints/wiki/Property-Graph-Model">Property
Graph</a> model, Neo4j structures data as nodes connected by relationships.</p>
<ul>
<li>
<p>Each of these entities can be assigned a set of properties (a key [String], a value
[integer, String, array of primitives]).</p>
</li>
<li>
<p>Each relationship must have a type (for example, a "FOLLOWS" or "IS_FRIEND_WITH"
relationship).</p>
</li>
<li>
<p>Since version 2.0, each node can carry an optional (but strongly recommended) notion
called a "label" (a node has 0 to n labels).</p>
</li>
</ul>
<p>Of course, all this information is persisted on disk.</p>
<p>A simple <code class="language-plaintext highlighter-rouge">ls /path/to/neo/data/graph.db</code> will show you,
besides the Lucene index files (legacy: <code class="language-plaintext highlighter-rouge">index</code>
directory, new: <code class="language-plaintext highlighter-rouge">schema</code> directory) and the
transaction logs, the various .db files:</p>
<ul>
<li>
<p><code class="language-plaintext highlighter-rouge">neostore.labeltokenstore.db</code></p>
</li>
<li>
<p><code class="language-plaintext highlighter-rouge">neostore.nodestore.db</code></p>
</li>
<li>
<p><code class="language-plaintext highlighter-rouge">neostore.propertystore.db</code></p>
</li>
<li>
<p><code class="language-plaintext highlighter-rouge">neostore.relationshipstore.db</code></p>
</li>
<li>
<p><code class="language-plaintext highlighter-rouge">neostore.schemastore.db</code></p>
</li>
</ul>
<p>Each of them represents a "store" dedicated to a particular type of data.
Let's go through them one by one, starting with the new ones.</p>
<p>Note that the information below should be taken with a grain of salt: the
<a href="http://neo4j.com/blog/the-neo4j-2-1-0-milestone-1-release-import-and-dense-nodes/">recent
work</a> on dense nodes has probably influenced the format of the files described here.</p>
<h3 id="labeltokenstore"><code class="language-plaintext highlighter-rouge">LabelTokenStore</code></h3>
<p>As you might guess, these files contain the label records. They therefore did not exist
before version 2.0 was released.</p>
<p>These records include:</p>
<ul>
<li>
<p>an internal ID (an int in Java, so up to 2³¹ - 1 [Java 8 did add helper methods that
treat an int as unsigned, from 0 to 2³² - 1, but I digress]). Each of these IDs is
referenced in the neostore.labeltokenstore.db.id file.</p>
</li>
<li>
<p>and a name (precisely the value you assign to the label: "Person" for the Person label),
itself uniquely identified (neostore.labeltokenstore.db.names.id) and stored in
neostore.labeltokenstore.db.names</p>
</li>
</ul>
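<p>To make the aside about int concrete: Java's int remains a signed 32-bit type, and Java 8 only added helper methods (such as <code>Integer.toUnsignedLong</code> and <code>Integer.parseUnsignedInt</code>) that read the same 32 bits as an unsigned value. The class name below is made up for illustration; it simply shows both readings of the same bits.</p>

```java
// Hypothetical helper class: signed vs unsigned readings of a Java int (Java 8+).
class UnsignedIdDemo {
    // Largest signed int: 2^31 - 1.
    static final int MAX_SIGNED_ID = Integer.MAX_VALUE;

    // Reinterprets the 32 bits of an int as an unsigned value in [0, 2^32 - 1].
    static long asUnsigned(int bits) {
        return Integer.toUnsignedLong(bits);
    }

    // Parses an unsigned decimal string (up to 2^32 - 1) back into the same 32 bits.
    static int fromUnsigned(String value) {
        return Integer.parseUnsignedInt(value);
    }
}
```

<p>For instance, <code>asUnsigned(-1)</code> gives 4294967295 (2³² - 1), and <code>fromUnsigned("4294967295")</code> gives back -1: the bits are identical, only their interpretation changes.</p>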
<p>So the neostore.labeltokenstore.db file actually contains only references to the internal
IDs and names, which are stored "alongside". Note that this split into <code class="language-plaintext highlighter-rouge">neostore.db.*</code> files is
found in all the other stores as well.</p>
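<p>The split between records and names can be pictured as a tiny token registry: each distinct label name is stored once and gets an integer ID, and node records only keep that ID. This is a hypothetical in-memory sketch of the indirection (the class and method names are invented), not Neo4j's actual <code>LabelTokenStore</code> code.</p>

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

// Hypothetical in-memory stand-in for a label token store.
class LabelTokenRegistry {
    // name -> ID, playing the role of the "names" side of the store
    private final Map<String, Integer> idByName = new HashMap<>();
    // ID -> name, the ID being the index in this list
    private final List<String> nameById = new ArrayList<>();

    // Returns the existing ID for this label name, or allocates the next one.
    int getOrCreate(String labelName) {
        Integer existing = idByName.get(labelName);
        if (existing != null) {
            return existing;
        }
        int id = nameById.size();
        nameById.add(labelName);
        idByName.put(labelName, id);
        return id;
    }

    // Resolves an ID found in a node record back to its label name.
    String nameOf(int id) {
        return nameById.get(id);
    }
}
```

<p>However many nodes carry the Person label, the string "Person" is stored once; each node only holds a small integer.</p>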
<h3 id="schemastore"><code class="language-plaintext highlighter-rouge">SchemaStore</code></h3>
<p>With the emergence of labels came the notion of a schema. Don't get carried away: Neo4j has
not become a normalized database. It is rather a <em>schema-optional</em> database.</p>
<p>Labels make it possible to group semantically similar nodes (so this is entirely dependent
on the business domain), but nothing prevents those nodes from being completely heterogeneous.
For example, two nodes can share the Person label while having different properties, say,
hair color for one and shoe size for the other.</p>
<p>Now that we have labels at our disposal, we can even define constraints on them: uniqueness
constraints, for example. These constraints are actually called <em>rules</em>, and together
they form the famous schema I mentioned. This support is fairly recent and the underlying
structure is still very simple. A rule consists of:</p>
<ul>
<li>
<p>an internal ID (<code class="language-plaintext highlighter-rouge">neostore.schemastore.db.id</code>)</p>
</li>
<li>
<p>its actual description (<code class="language-plaintext highlighter-rouge">neostore.schemastore.db</code>)</p>
</li>
</ul>
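<p>Conceptually, a uniqueness rule pairs a label with a property key and rejects a second node that carries the same label and the same value. The class below is a hypothetical, in-memory sketch of that check only; it ignores transactions, deletions and indexes, and its names are invented for illustration.</p>

```java
import java.util.HashSet;
import java.util.Map;
import java.util.Set;

// Hypothetical sketch of a uniqueness rule: (label, property key) must be unique.
class UniquenessRule {
    final String label;
    final String propertyKey;
    // Values already claimed by nodes carrying the label.
    private final Set<Object> claimedValues = new HashSet<>();

    UniquenessRule(String label, String propertyKey) {
        this.label = label;
        this.propertyKey = propertyKey;
    }

    // Returns true if the node may be written, false if it would violate the rule.
    boolean accept(Set<String> nodeLabels, Map<String, Object> properties) {
        if (!nodeLabels.contains(label) || !properties.containsKey(propertyKey)) {
            return true; // the rule does not apply to this node
        }
        return claimedValues.add(properties.get(propertyKey));
    }
}
```

<p>Note that the rule only constrains nodes carrying its label: a node without the label, or without the property, is left alone, which is consistent with the schema-optional approach described above.</p>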
<p>So far, I have covered Neo4j's recent additions.</p>
<p>Of course, Neo did not wait for version 2.0 to be a full-fledged graph database. Let's look
at its core components.</p>
<h3 id="propertystore">PropertyStore</h3>
<p>What good would a graph database be without properties on our nodes and relationships?
Not much :-)</p>
<p>These properties (reminder: property = key/value) are not all stored in exactly the same
place, though; it depends on certain criteria:</p>
<ul>
<li>
<p><code class="language-plaintext highlighter-rouge">neostore.propertystore.db.index</code> stores the "key" part of
properties</p>
</li>
<li>
<p><code class="language-plaintext highlighter-rouge">neostore.propertystore.db.arrays</code>, as its name suggests, is
dedicated to properties whose value is an array of primitives or Strings</p>
</li>
<li>
<p><code class="language-plaintext highlighter-rouge">neostore.propertystore.db.strings</code>, for its part, holds
properties whose value is a character string</p>
</li>
<li>
<p>the other properties (boolean, integer) are stored directly
in <code class="language-plaintext highlighter-rouge">neostore.propertystore.db</code></p>
</li>
</ul>
<p>Each set of properties belongs to the relationship or node that holds it, and properties
are represented as singly linked lists.</p>
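<p>A singly linked property chain can be pictured as follows: the owner (node or relationship) holds the ID of its first property record, and each property record holds the ID of the next one. The sketch below is a deliberately simplified, hypothetical model (one key/value per record, records kept in a list, <code>-1</code> meaning "no next property"); Neo4j's real on-disk format differs.</p>

```java
import java.util.ArrayList;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

// Hypothetical, simplified property store: one key/value per record, chained by ID.
class PropertyChainSketch {
    static final long NO_NEXT_PROPERTY = -1;

    static final class PropertyRecord {
        final String key;
        final Object value;
        final long nextProp; // ID of the next property record, or NO_NEXT_PROPERTY

        PropertyRecord(String key, Object value, long nextProp) {
            this.key = key;
            this.value = value;
            this.nextProp = nextProp;
        }
    }

    private final List<PropertyRecord> records = new ArrayList<>();

    // Prepends a property to the chain whose head is firstProp; returns the new head ID.
    long prepend(long firstProp, String key, Object value) {
        records.add(new PropertyRecord(key, value, firstProp));
        return records.size() - 1;
    }

    // Follows the chain from its head, as an owner's "first property" reference would.
    Map<String, Object> readChain(long firstProp) {
        Map<String, Object> result = new LinkedHashMap<>();
        for (long id = firstProp; id != NO_NEXT_PROPERTY; id = records.get((int) id).nextProp) {
            PropertyRecord record = records.get((int) id);
            result.put(record.key, record.value);
        }
        return result;
    }
}
```

<p>Reading all properties of an owner therefore costs one record hop per property, starting from the single reference the owner keeps.</p>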
<h3 id="nodestore-et-relationshipstore">NodeStore and RelationshipStore</h3>
<p>Here it is, the crux of the matter!</p>
<p>Let's start with nodes. Each node consists of:</p>
<ul>
<li>
<p>an "internal" ID (<code class="language-plaintext highlighter-rouge">neostore.nodestore.db.id</code>)</p>
</li>
<li>
<p>references to its labels (<code class="language-plaintext highlighter-rouge">neostore.nodestore.db.labels{,.id}</code>)</p>
</li>
<li>
<p>a reference to its first property (the property's internal ID) and to the first of its
relationships (all in <code class="language-plaintext highlighter-rouge">neostore.nodestore.db</code>)</p>
</li>
</ul>
<p>Conceptually, this could be represented as follows (a slide shamelessly borrowed, many times
over, from Neo Technology):</p>
<p><img src="/assets/img/graph_on_disk.png" alt="graph on disk" /></p>
<p>Everything rests on how relationship records are structured. This is rather intuitive:
relationships are the backbone of the graph.</p>
<p>This central element breaks down as follows:</p>
<ul>
<li>
<p>an "internal" ID (as usual: <code class="language-plaintext highlighter-rouge">neostore.relationshipstore.db.id</code>)</p>
</li>
<li>
<p>its type (<code class="language-plaintext highlighter-rouge">neostore.relationshiptypestore.db.names</code>)</p>
</li>
</ul>
<p>So far, this does not explain what makes it a graph database.</p>
<p>To see that, let's look at the Java code instead (yes, that's the cool thing about
<a href="https://github.com/neo4j/neo4j">open source projects</a> written in languages
we know well):</p>
<pre><code class="language-{.java}">public class RelationshipRecord extends PrimitiveRecord
{
  private long firstNode;
  private long secondNode;
  private int type;
  private long firstPrevRel = 1;
  private long firstNextRel = Record.NO_NEXT_RELATIONSHIP.intValue();
  private long secondPrevRel = 1;
  private long secondNextRel = Record.NO_NEXT_RELATIONSHIP.intValue();
  // [...]
</code></pre>
<p>Let's skip over the formatting worthy of the most seasoned C coders (who's up for a Pull
Request to put the braces back at the end of the line? :P).</p>
<p>What is really interesting here is the notion of <code class="language-plaintext highlighter-rouge">first</code> and
<code class="language-plaintext highlighter-rouge">second</code>. These are the internal references (everything is a
reference at this level) to the records of the relationship's two end nodes: the first one
is the start node and the second one the end node. Since direction can be ignored at query
time (a relationship can be traversed either way), the record treats both ends
symmetrically, each with its own previous/next pointers, hence this neutral naming.</p>
<p>What you should take away from this little piece of code is that, in addition to the
information mentioned earlier, a relationship actually carries:</p>
<ul>
<li>
<p>a reference to its start and end nodes</p>
</li>
<li>
<p>a reference to the previous relationship of the start / end node</p>
</li>
<li>
<p>a reference to the next relationship of the start / end node</p>
</li>
</ul>
<p>A picture is worth a thousand words:</p>
<p><img src="/assets/img/graph_on_disk_bis.png" alt="graph on disk bis" /></p>
<p>This is exactly what I tried to explain: the red arrows symbolize the links carried by
relationship records. Each of these relationships points to the previous/next relationships
of its start and end nodes.</p>
<p>In other words, each node references (green arrow) one element of a doubly linked list of
relationships.</p>
<p>And that is the very nature of the graph!</p>
<p>It is this structure that entitles Neo4j to call itself a graph database.</p>
<ul>
<li>
<p>How do you query data in a graph? With a traversal.</p>
</li>
<li>
<p>How do you traverse in Neo4j? By finding the most relevant starting points possible and
navigating through the lists of relationships/nodes.</p>
</li>
</ul>
<p>Are you starting to see why this kind of database is so well suited to highly connected
data?</p>
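<p>To make these doubly linked relationship chains concrete, here is a simplified, hypothetical in-memory model inspired by the <code>RelationshipRecord</code> fields quoted above: each node keeps the ID of its first relationship, and each relationship keeps previous/next pointers for both of its end nodes. It is not Neo4j's code (no disk, no locking, <code>-1</code> standing for "no relationship", self-loops rejected to keep it short), but walking a node's relationships follows the same principle: at each step, pick the <code>first*</code> or <code>second*</code> pointer depending on which end of the relationship the node is.</p>

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical in-memory model of node/relationship records and their chains.
class RelationshipChains {
    static final long NO_REL = -1; // assumption: -1 marks "no relationship"

    static final class NodeRecord {
        long nextRel = NO_REL; // head of this node's relationship chain
    }

    static final class RelRecord {
        final long firstNode;
        final long secondNode;
        final int type;
        long firstPrevRel = NO_REL;
        long firstNextRel = NO_REL;
        long secondPrevRel = NO_REL;
        long secondNextRel = NO_REL;

        RelRecord(long firstNode, long secondNode, int type) {
            this.firstNode = firstNode;
            this.secondNode = secondNode;
            this.type = type;
        }
    }

    private final List<NodeRecord> nodes = new ArrayList<>();
    private final List<RelRecord> rels = new ArrayList<>();

    long createNode() {
        nodes.add(new NodeRecord());
        return nodes.size() - 1;
    }

    // Creates a relationship and prepends it to the chains of both end nodes.
    long createRelationship(long from, long to, int type) {
        if (from == to) {
            throw new IllegalArgumentException("self-loops are not handled in this sketch");
        }
        long id = rels.size();
        RelRecord rel = new RelRecord(from, to, type);
        rels.add(rel);
        rel.firstNextRel = prependTo(from, id);
        rel.secondNextRel = prependTo(to, id);
        return id;
    }

    // Makes relId the new head of the node's chain and returns the previous head.
    private long prependTo(long node, long relId) {
        NodeRecord record = nodes.get((int) node);
        long oldHead = record.nextRel;
        if (oldHead != NO_REL) {
            RelRecord old = rels.get((int) oldHead);
            if (old.firstNode == node) {
                old.firstPrevRel = relId;
            } else {
                old.secondPrevRel = relId;
            }
        }
        record.nextRel = relId;
        return oldHead;
    }

    // Walks a node's chain: one hop per relationship, O(degree) in the worst case.
    List<Long> relationshipsOf(long node) {
        List<Long> result = new ArrayList<>();
        long current = nodes.get((int) node).nextRel;
        while (current != NO_REL) {
            result.add(current);
            RelRecord rel = rels.get((int) current);
            current = (rel.firstNode == node) ? rel.firstNextRel : rel.secondNextRel;
        }
        return result;
    }
}
```

<p>Note how <code>relationshipsOf</code> has to follow one pointer per relationship: for a node with millions of relationships, that is millions of hops, whatever relationship you are actually looking for.</p>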
<h3 id="quid-des-noeuds-denses-">What about dense nodes?</h3>
<p>Aha, I see I am dealing with seasoned readers ;)</p>
<p>Let's set the scene with two slightly different situations.</p>
<h4 id="situation-n1">Situation #1</h4>
<p>A dense node is a highly connected node. Many examples can be found in everyday life. For
instance, Justin Bieber has 52 million followers on Twitter (huh, I didn't know deafness had
become a mass phenomenon).</p>
<p>Remember, the Justin Bieber node points to its first relationship. If, unluckily, you need
to reach his 52-millionth fan node, you will have to traverse, in the worst case, the entire
doubly linked list of relationships before finding it: in short, O(n)… really not great.</p>
<p>That said, this case remains relatively rare. Let's tweak the example slightly.</p>
<h4 id="situation-n2">Situation #2</h4>
<p>Justin Bieber may have 52 million followers, but he has far fewer people in his family.</p>
<p>If, among this gigantic number of relationships, you are only interested in the family
relationships, you face exactly the same problem as described above… if you are using a
version of Neo4j earlier than 2.1.</p>
<p>Since that version, relationships are also grouped by type, which avoids this pitfall. By
default, a node is considered dense from 50 relationships onwards (see the
<a href="http://docs.neo4j.org/chunked/stable/kernel-configuration.html">dense node
threshold</a> setting).</p>
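<p>The benefit of grouping relationships by type can be illustrated with a toy comparison: one chain holding every relationship of a node versus one chain per type. The class below is a hypothetical sketch that only counts how many relationship entries each lookup touches; it does not reproduce Neo4j 2.1's actual record layout.</p>

```java
import java.util.ArrayList;
import java.util.Collections;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

// Hypothetical comparison: single relationship chain vs. one chain per type.
class RelationshipGroupsSketch {
    static final class Rel {
        final long id;
        final String type;

        Rel(long id, String type) {
            this.id = id;
            this.type = type;
        }
    }

    private final List<Rel> singleChain = new ArrayList<>();
    private final Map<String, List<Rel>> chainsByType = new HashMap<>();
    int visited; // relationship entries touched by the last lookup

    void add(Rel rel) {
        singleChain.add(rel);
        chainsByType.computeIfAbsent(rel.type, t -> new ArrayList<>()).add(rel);
    }

    // Pre-2.1 style: scan every relationship and keep the matching type.
    List<Rel> scanSingleChain(String type) {
        visited = 0;
        List<Rel> matches = new ArrayList<>();
        for (Rel rel : singleChain) {
            visited++;
            if (rel.type.equals(type)) {
                matches.add(rel);
            }
        }
        return matches;
    }

    // Grouped style: jump straight to the chain for that type.
    List<Rel> lookupGrouped(String type) {
        List<Rel> chain = chainsByType.getOrDefault(type, Collections.<Rel>emptyList());
        visited = chain.size();
        return new ArrayList<>(chain);
    }
}
```

<p>With 1,000 FOLLOWS relationships and 3 FAMILY ones, the single-chain scan touches all 1,003 entries to find the family members, while the grouped lookup touches only 3.</p>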
<h4 id="help-je-suis-dans-la-situation-n1">Help! I'm in situation #1!</h4>
<p>If, unfortunately, after exploring every alternative (statistical sampling, etc.), you
conclude that there is no other way: don't panic!</p>
<p>First of all, the Neo teams keep working on this and shipping improvements, so we should see
a few more with v2.2.</p>
<p>Moreover, a simple approach <a href="https://github.com/maxdemarzi/dense">has already been
coded for you</a> by the excellent
<a href="https://twitter.com/maxdemarzi">Max</a> <a href="http://maxdemarzi.com/">de</a>
<a href="https://www.kickstarter.com/projects/1355751798/high-performance-neo4j-video-course">Marzi</a>.</p>
<p>The idea behind his extension is simple: on each new insertion, it spreads the nodes across
levels, and it reads them back transparently.</p>
<p>Here is an example of a structure automatically created by his extension:</p>
<p><img src="/assets/img/dense_nodes.png" alt="dense nodes" /></p>
<p>Just like Justin Bieber, Lady Gaga and Madonna also have many fans (each fan "LIKES" the
artist). Instead of linking fans directly to an artist, the extension introduces layers of
intermediate (dummy) nodes, each grouping a limited number of fans and connected through
"DENSE_LIKES" relationships. Relationships are now spread out, and we can page our read
queries like this:</p>
<pre><code class="language-{.cypher}">MATCH (fan:Fan)-[:DENSE_LIKES*0..5]->()-[:LIKES]->(loved:Artist {name: 'Madonna'})
RETURN fan
</code></pre>
<p>This query means (reading the pattern from right to left):</p>
<div class="language-plaintext highlighter-rouge"><div class="highlight"><pre class="highlight"><code>find the nodes labeled "Artist" and named "Madonna"
that are LIKED by some node (let's call it META)
where 0 to 5 DENSE_LIKES relationships separate the Fan nodes from META
and return those fans
</code></pre></div></div>
<p>Since the query looks for an artist's many fans, without any spreading of the graph we
would be right in situation #1 described earlier. However, this simple approach, combined
with clever use of <a href="http://docs.neo4j.org/chunked/milestone/query-match.html#match-variable-length-relationships">variable-length
paths</a>, makes it possible to fetch only a fraction of the fans without traversing all the
relationships attached to the artist.</p>
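<p>The fan-out idea can be sketched without Neo4j at all. Fans are appended to fixed-size buckets (standing in for the intermediate nodes), each new bucket sits one level deeper than the previous one, and a read only walks the first levels, much like <code>DENSE_LIKES*0..5</code> bounds the traversal depth. This is a hypothetical simplification: the <code>CAPACITY</code> value and the one-bucket-per-level layout are assumptions, not the extension's actual settings or layout.</p>

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical sketch of spreading an artist's fans over chained intermediate buckets.
class FanOutSketch {
    static final int CAPACITY = 3; // assumed maximum number of fans per intermediate node

    // levels.get(d) holds the fans reachable through d intermediate hops.
    private final List<List<String>> levels = new ArrayList<>();

    // Appends a fan to the deepest bucket, opening a new level when that bucket is full.
    void like(String fan) {
        if (levels.isEmpty() || levels.get(levels.size() - 1).size() == CAPACITY) {
            levels.add(new ArrayList<>());
        }
        levels.get(levels.size() - 1).add(fan);
    }

    // Returns only the fans within maxDepth levels, i.e. a bounded "page" of fans.
    List<String> fansUpToDepth(int maxDepth) {
        List<String> fans = new ArrayList<>();
        for (int depth = 0; depth <= maxDepth && depth < levels.size(); depth++) {
            fans.addAll(levels.get(depth));
        }
        return fans;
    }

    int depth() {
        return levels.size();
    }
}
```

<p>With 10 fans and a capacity of 3, the fans end up on 4 levels; reading up to depth 0 returns 3 fans, up to depth 1 returns 6, and the artist never has to enumerate all of them at once.</p>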
<h2 id="neo4j-et-scalabilité">Neo4j and scalability</h2>
<p>Now that the physical file format is a bit clearer, let's look at the upper layers.</p>
<h3 id="architecture">Architecture</h3>
<p>Disk accesses are, of course, kept to a minimum. Two levels of cache come into play.</p>
<h4 id="le-file-buffer-cache">The <em>file buffer cache</em></h4>
<p>As you might expect, the file buffer cache buffers reads and writes of the physical
records (i.e. the files described earlier). The least recently accessed entries are evicted
from the buffer (<a href="http://en.wikipedia.org/wiki/Least_Recently_Used#LRU">LRU</a>).
When possible, this buffer is mapped directly onto the underlying store file
("memory mapping"). This behavior depends on the file system and the OS. Either way, the
sole purpose of this layer is to minimize disk accesses; it introduces no abstraction over
the data being handled.</p>
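<p>The LRU eviction policy mentioned here can be sketched in a few lines of Java with a <code>LinkedHashMap</code> in access order that evicts its eldest entry once a capacity is exceeded. This is a generic illustration of LRU, not Neo4j's file buffer cache or object cache implementation; the capacity is an arbitrary parameter.</p>

```java
import java.util.LinkedHashMap;
import java.util.Map;

// Generic LRU cache sketch (not Neo4j's implementation).
class LruCache<K, V> extends LinkedHashMap<K, V> {
    private static final long serialVersionUID = 1L;
    private final int capacity;

    LruCache(int capacity) {
        super(16, 0.75f, true); // accessOrder = true: get() moves an entry to the most-recent end
        this.capacity = capacity;
    }

    // Called after each insertion: evict the least recently used entry when over capacity.
    @Override
    protected boolean removeEldestEntry(Map.Entry<K, V> eldest) {
        return size() > capacity;
    }
}
```

<p>With a capacity of 2, putting records 1 and 2, reading record 1, then putting record 3 evicts record 2: it is the one that was accessed least recently.</p>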
<h4 id="lobject-cache">The <em>object cache</em></h4>
<p>Also an LRU cache, this is where the data being handled starts to take the shape of the graph
you query via traversals or Cypher. Note that memory at this level is allocated on the host
JVM's heap rather than directly from the underlying host OS. That is why it is often
preferable to deploy Neo4j in isolation, so that your application does not disturb the GC
cycles of your Neo instance, and vice versa.</p>
<h4 id="et-le-reste">and the rest</h4>
<p>From there, the core Java APIs take over, followed by the traversal APIs, Cypher and the
REST APIs!</p>
<p><img src="/assets/img/neo4j_archi.png" alt="neo4j archi" /></p>
<h3 id="gestion-de-la-concurrence">Concurrency management</h3>
<p>Although it belongs to that (non-)family known as NoSQL, Neo4j is something of an exception:
it is ACID-compliant. Indeed, with Neo4j you will find the familiar two-phase commit
transactions. Not being a distributed systems specialist, I invite you to read the many
existing articles on the limits of ACID, the limits of locking and the existing alternatives
("lock-free concurrency", BASE vs ACID): Google is your friend. I will therefore take the
opportunity to move on to the part that interests me most: <em>sharding</em> :)</p>
<h3 id="sharding-dun-graphe-dynamique"><em>Sharding</em> a dynamic graph</h3>
<p>Let's briefly explain the term <em>sharding</em>. <em>Sharding</em> simply means
splitting your data across different instances of a distributed persistence system. For
example, I can decide to store all US postal addresses on my servers in the United States
and my Australian addresses in Sydney. A given instance therefore does not hold all the
data, but the business domain of my application contains notions that split naturally.
That's right! <em>Sharding</em> is a technical solution, admittedly, but a highly
domain-dependent one (as every technical solution should be, but I digress).</p>
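<p>The postal-address example boils down to a routing function driven by a business attribute. A minimal, hypothetical sketch (the class and shard names are invented for illustration):</p>

```java
import java.util.HashMap;
import java.util.Map;

// Hypothetical domain-driven shard router: the country code decides where an address lives.
class AddressShardRouter {
    private final Map<String, String> shardByCountry = new HashMap<>();
    private final String defaultShard;

    AddressShardRouter(String defaultShard) {
        this.defaultShard = defaultShard;
    }

    void assign(String countryCode, String shard) {
        shardByCountry.put(countryCode, shard);
    }

    // Countries without an explicit assignment fall back to the default shard.
    String shardFor(String countryCode) {
        return shardByCountry.getOrDefault(countryCode, defaultShard);
    }
}
```

<p>This works so smoothly only because an address never needs to reach an address stored in another country; graph data rarely splits that cleanly.</p>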
<h4 id="graphe-statique">Static graph</h4>
<p>A static graph is fairly easy to <em>shard</em> (as long as the modeled business domain
allows it): its partitions are easy to detect (this is called "<em>graph clustering</em>"
or "<em>community detection</em>") because they never change.
<a href="http://en.wikipedia.org/wiki/Strongly_connected_component">Some algorithms</a>
are even fairly easy to implement.</p>
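<p>The linked page is about strongly connected components, which a classic algorithm, Tarjan's, finds in a single depth-first pass. Below is a compact Java sketch of it over an adjacency list (nodes numbered 0 to n-1). Keep in mind that strongly connected components are only a crude proxy for the communities a real partitioner would look for. This example illustrates the "easy to implement" remark, not a production sharding strategy, and its recursion limits it to small graphs.</p>

```java
import java.util.ArrayDeque;
import java.util.ArrayList;
import java.util.Arrays;
import java.util.Deque;
import java.util.List;

// Tarjan's strongly connected components (recursive; fine for small graphs).
class TarjanScc {
    private final List<List<Integer>> adjacency;
    private final int[] index;
    private final int[] lowLink;
    private final boolean[] onStack;
    private final Deque<Integer> stack = new ArrayDeque<>();
    private final List<List<Integer>> components = new ArrayList<>();
    private int counter = 0;

    TarjanScc(List<List<Integer>> adjacency) {
        this.adjacency = adjacency;
        int n = adjacency.size();
        index = new int[n];
        lowLink = new int[n];
        onStack = new boolean[n];
        Arrays.fill(index, -1); // -1 means "not visited yet"
    }

    List<List<Integer>> run() {
        for (int v = 0; v < adjacency.size(); v++) {
            if (index[v] == -1) {
                strongConnect(v);
            }
        }
        return components;
    }

    private void strongConnect(int v) {
        index[v] = counter;
        lowLink[v] = counter;
        counter++;
        stack.push(v);
        onStack[v] = true;
        for (int w : adjacency.get(v)) {
            if (index[w] == -1) {
                strongConnect(w);
                lowLink[v] = Math.min(lowLink[v], lowLink[w]);
            } else if (onStack[w]) {
                lowLink[v] = Math.min(lowLink[v], index[w]);
            }
        }
        // v is the root of a component: pop the whole component off the stack.
        if (lowLink[v] == index[v]) {
            List<Integer> component = new ArrayList<>();
            int member;
            do {
                member = stack.pop();
                onStack[member] = false;
                component.add(member);
            } while (member != v);
            components.add(component);
        }
    }
}
```

<p>On a graph where 0 → 1 → 2 → 0 form a cycle, 2 → 3, and 3 ↔ 4 point at each other, the algorithm returns two components: {0, 1, 2} and {3, 4}.</p>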
<h4 id="graphe-dynamique">Dynamic graph</h4>
<p>Dynamic graphs, on the other hand, are a different story. Insertions and deletions happen
all the time, and they necessarily alter the topology of the graph. The goal is therefore to
find a partitioning of the graph into shards such that, at any given time, the number of
inter-shard relationships is minimized. This is all the more critical when shards are far
apart (imagine the network latency of a traversal that starts in a shard hosted in Los
Angeles and ends in a shard in Beijing).</p>
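<p>The quantity a dynamic-graph partitioner tries to keep low is the number of relationships whose two ends live on different shards (often called the edge cut). Computing it for a given assignment is trivial. The hard part described above is maintaining a good assignment while the graph keeps changing. A hypothetical sketch, with node IDs indexing a shard-assignment array:</p>

```java
// Hypothetical helper: counts relationships that cross shard boundaries.
class EdgeCut {
    // edges[i] = {startNode, endNode}; shardOf[node] = shard hosting that node
    static int crossShardRelationships(int[][] edges, int[] shardOf) {
        int cut = 0;
        for (int[] edge : edges) {
            if (shardOf[edge[0]] != shardOf[edge[1]]) {
                cut++; // traversing this relationship requires a network hop
            }
        }
        return cut;
    }
}
```

<p>With the five relationships 0-1, 1-2, 2-3, 3-0 and 0-2, putting nodes {0, 1} and {2, 3} on separate shards cuts 3 relationships, while {0, 2} and {1, 3} cuts 4. A partitioner would prefer the first assignment and would have to keep re-evaluating it as relationships come and go.</p>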
<p><img src="/assets/img/neo4j_shards.png" alt="neo4j shards" /></p>
<p>It is a <a href="http://alexaverbuch.blogspot.fr/2010/04/me-my-names-alex-im-currently.html">research
topic</a> in its own right, and Neo Technology has been working on a shardable system for
several years. Understand the terrible dilemma: because it is graph-oriented all the way
down to its physical layers, Neo4j is both ideal for storing and querying graph-shaped data
and very hard to shard!</p>
<h4 id="une-lueur-despoir-">A glimmer of hope?</h4>
<p>For now, you have to bet on <a href="http://fr.wikipedia.org/wiki/Scalability"><em>vertical
scaling</em></a>: size your machines adequately and everything will be fine. Let me
reassure you further:</p>
<ul>
<li>so far, only a tiny minority of customers have faced data volumes large enough
(<a href="http://docs.neo4j.org/chunked/stable/capabilities-capacity.html">Neo4j's nominal
capacity</a>: 34 billion nodes and relationships) to require distributing the data</li>
<li>some business domains naturally allow their data to be segregated</li>
<li>a first step towards distribution already exists!</li>
</ul>
<h4 id="le-cache-sharding-">Le <em>cache sharding</em> !</h4>
<p>Le titre peut faire peur, mais rassurez-vous, lâidĂ©e est toute simple.
Tout dâabord, cette idĂ©e sâapplique Ă Neo4j en mode <a href="http://docs.neo4j.org/chunked/stable/ha-how.html">High
Availability</a>. En
dâautres termes, cela ne sâapplique quâĂ une instance Neo4j au sein
dâun <em>cluster</em>.</p>
<p>Non seulement vous bĂ©nĂ©ficiez dâune rĂ©plication master/replica, mais
vous pouvez Ă©galement bĂ©nĂ©ficier de <em>sharding</em>. Oui, oui, jâai bien dit
<em>sharding</em>. Malheureusement, pour les raisons évoquées plus haut, il ne
sâagit pas de <em>sharding</em> sur les donnĂ©es Ă proprement parler. Comme le
titre lâĂ©voque, il sâagit de sharding sur le cache.</p>
<p>Comment est-ce possible ? Câest tout simple !</p>
<p>Neo4j's caches are LRU caches: they only keep the most recently used
entries. If there were a way to consistently distribute queries across
the instances of my cluster, the problem would be solved. Query X would
always run on instance A, query Y on instance B… The result of X would
then de facto live in A's caches, and the result of Y in B's caches. My
data would therefore effectively be partitioned by cache.</p>
<p>The problem thus boils down to: how do I consistently distribute the
queries to execute across the instances of my Neo4j cluster? Care to
guess? The solution has been around for ages: a simple load balancer
such as <a href="http://haproxy.1wt.eu/">HAProxy</a> will do the job. This
is called consistent routing (more generally, <a href="http://en.wikipedia.org/wiki/Consistent_hashing"><em>consistent
hashing</em></a>). All you need
is to configure the routing to key on one of the arguments in the body,
or on any header, of the HTTP calls sent to Neo4j (remember: all remote
communication goes through a REST API), and the load balancer will
dispatch your requests exactly where you told it to! Clever, isn't it? A
simple load balancer, a Neo4j cluster (the High Availability edition
provides all the tools you need) and you are ready to face large data
volumes!</p>
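<p>To make the mechanism concrete, here is a minimal sketch, written in Python purely for illustration: it is neither Neo4j's nor HAProxy's code. <code>LRUCache</code>, <code>ConsistentHashRouter</code> and <code>Instance</code> are hypothetical names. Each <code>Instance</code> stands in for one cluster member with its own LRU cache, and the router plays the role of the load balancer, hashing a routing key (here a made-up user identifier, standing in for a request argument or header) onto a hash ring with virtual nodes. Since a given key always maps to the same instance, each result is cached on a single member, so the caches together hold a partition of the working set.</p>

```python
import bisect
import hashlib
from collections import OrderedDict


class LRUCache:
    """Fixed-size cache that evicts the least recently used entry."""

    def __init__(self, capacity):
        self.capacity = capacity
        self.entries = OrderedDict()

    def get(self, key):
        if key not in self.entries:
            return None
        self.entries.move_to_end(key)  # mark as most recently used
        return self.entries[key]

    def put(self, key, value):
        self.entries[key] = value
        self.entries.move_to_end(key)
        if len(self.entries) > self.capacity:
            self.entries.popitem(last=False)  # evict the least recently used entry


def stable_hash(value):
    # md5 yields the same value in every process, unlike Python's built-in hash()
    return int(hashlib.md5(value.encode("utf-8")).hexdigest(), 16)


class ConsistentHashRouter:
    """Hypothetical load balancer: maps routing keys to instances via a hash ring."""

    def __init__(self, instance_names, virtual_nodes=100):
        # several points per instance on the ring smooth out the key distribution
        self.ring = sorted(
            (stable_hash(f"{name}#{i}"), name)
            for name in instance_names
            for i in range(virtual_nodes)
        )
        self.hashes = [h for h, _ in self.ring]

    def route(self, routing_key):
        # first ring point clockwise from the key's hash, wrapping around the ring
        index = bisect.bisect(self.hashes, stable_hash(routing_key)) % len(self.ring)
        return self.ring[index][1]


class Instance:
    """Hypothetical stand-in for one Neo4j HA cluster member and its cache."""

    def __init__(self, name, cache_size):
        self.name = name
        self.cache = LRUCache(cache_size)
        self.hits = 0
        self.misses = 0

    def execute(self, query):
        result = self.cache.get(query)
        if result is None:
            self.misses += 1
            result = f"result of {query}"  # placeholder for an actual disk read
            self.cache.put(query, result)
        else:
            self.hits += 1
        return result


instances = {name: Instance(name, cache_size=100) for name in ("A", "B", "C")}
router = ConsistentHashRouter(list(instances))
queries = [f"user-{n}" for n in range(150)]

for _ in range(2):  # replay the same workload twice
    for query in queries:
        instances[router.route(query)].execute(query)

for instance in instances.values():
    print(instance.name, len(instance.cache.entries), instance.hits, instance.misses)
```

<p>As long as each instance's share of the keys fits in its cache, the second pass is served entirely from cache, and no result is duplicated across members. Consistent hashing is used here rather than a plain <code>hash(key) % n</code> because adding or removing an instance only remaps the keys on the affected arcs of the ring, so most caches stay warm.</p>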
<h1 id="conclusion">Conclusion</h1>
<p>One of the lessons of NOSQL is that every solution has a limited
scope and only applies under certain conditions. I hope this article
has helped you understand the weaknesses, but above all the strengths,
of graph databases and, who knows, that it will make you want to dig
deeper into the subject.</p>
<p>I do not claim to be exhaustive, so if you would like me to cover
other topics in more detail (for example, Cypher), I may devote future
articles to them.</p>
<p><shameless_plug>If you enjoyed this article, I can also come and
talk about it at a User Group in your city, and I give
customizable training courses
on Neo4j, in French! </shameless_plug></p>Florent Biville3615-ma-vie