[doc] Documentation work for the 0.4 release (not finished).

2017-05-28 19:21:12 +02:00
parent 9370f6504e
commit 0146fb0d55
15 changed files with 282 additions and 94 deletions
--- a/docs/tutorial/tut01.rst
+++ b/docs/tutorial/tut01.rst
@ -1,58 +1,91 @@
-Basic concepts
-==============
+Let's get started!
+==================

-To begin with Bonobo, you need to install it in a working python 3.5+ environment:
+To begin with Bonobo, you need to install it in a working Python 3.5+ environment. You'll also need cookiecutter
+to bootstrap your project.

 .. code-block:: shell-session

-    $ pip install bonobo
+    $ pip install bonobo cookiecutter

 See :doc:`/install` for more options.

-Let's write a first data transformation
-:::::::::::::::::::::::::::::::::::::::

-We'll start with the simplest transformation possible.
+Create an empty project
+:::::::::::::::::::::::

-In **Bonobo**, a transformation is a plain old python callable, not more, not less. Let's write one that takes a string
-and uppercases it.
+Your ETL code will live in ETL projects, which are basically a set of files, including Python code, that Bonobo
+can run.
+
+.. code-block:: shell-session
+
+    bonobo init tutorial
+
+This will create a `tutorial` directory (`content description here <https://www.bonobo-project.org/with/cookiecutter>`_).
+
+To run this project, use:
+
+.. code-block:: shell-session
+
+    bonobo run tutorial
+
+
+Write a first transformation
+::::::::::::::::::::::::::::
+
+Open `tutorial/__main__.py`, and delete all the code in it.
+
+A transformation can be any Python callable that has inputs and outputs. The simplest transformations are functions.
+
+Let's write one:

 .. code-block:: python

-    def uppercase(x: str):
+    def transform(x):
        return x.upper()

-Pretty straightforward.
+Easy.

-You could even use :func:`str.upper` directly instead of writing a wrapper, as a type's method (unbound) will take an
-instance of this type as its first parameter (what you'd call `self` in your method).
+.. note::

-The type annotations written here are not used, but can make your code much more readable, and may very well be used as
-validators in the future.
+    This is about the same as :func:`str.upper`, and in the real world, you'd use it directly.

-Let's write two more transformations: a generator to produce the data to be transformed, and something that outputs it,
-because, yeah, feedback is cool.
+Let's write two more transformations for the "extract" and "load" steps. In this example, we'll generate the data from
+scratch, and we'll use stdout to simulate data-persistence.

 .. code-block:: python

-    def generate_data():
+    def extract():
        yield 'foo'
        yield 'bar'
        yield 'baz'

-    def output(x: str):
+    def load(x):
        print(x)

-Once again, you could have skipped the pain of writing this and simply use an iterable to generate the data and the
-builtin :func:`print` for the output, but we'll stick to writing our own transformations for now.
+Bonobo makes no distinction between generators (yielding functions) and regular functions. In all cases, it iterates
+over what is returned, and a regular function is simply seen as a generator that yields only once.
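
The behaviour described above can be pictured with a small plain-Python sketch. It does not use Bonobo and is not Bonobo's actual implementation; `iter_outputs` is a hypothetical helper introduced here only for illustration.

```python
import inspect


def iter_outputs(transformation, *args):
    """Call a transformation and iterate over whatever it produces.

    Hypothetical helper for illustration only, not part of Bonobo.
    """
    result = transformation(*args)
    if inspect.isgenerator(result):
        # A generator function may yield zero, one or many outputs.
        yield from result
    else:
        # A regular function is treated as if it yielded exactly once.
        yield result


def extract():
    yield 'foo'
    yield 'bar'
    yield 'baz'


def transform(x):
    return x.upper()


# Both kinds of callables are consumed through the same loop.
extracted = list(iter_outputs(extract))
transformed = [out for x in extracted for out in iter_outputs(transform, x)]
print(transformed)
```

The caller never needs to know whether a step uses `return` or `yield`, which is the property the paragraph above describes.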

-Let's chain the three transformations together and run the transformation graph:
+.. note::
+
+    Once again, :func:`print` would be used directly in a real-world transformation.
+
+
+Create a transformation graph
+:::::::::::::::::::::::::::::
+
+Bonobo has two main roles:
+
+* Execute the transformations in independent threads.
+* Pass the outputs of one thread to one or more other threads.
+
+To do this, it needs to know what data-flow you want to achieve, and you'll use a :class:`bonobo.Graph` to describe it.

 .. code-block:: python

    import bonobo

-    graph = bonobo.Graph(generate_data, uppercase, output)
+    graph = bonobo.Graph(extract, transform, load)

    if __name__ == '__main__':
        bonobo.run(graph)
@ -64,14 +97,60 @@ Let's chain the three transformations together and run the transformation graph:
        stylesheet = "../_static/graphs.css";

        BEGIN [shape="point"];
-        BEGIN -> "generate_data" -> "uppercase" -> "output";
+        BEGIN -> "extract" -> "transform" -> "load";
    }

-We use the :func:`bonobo.run` helper that hides the underlying object composition necessary to actually run the
-transformations in parallel, because it's simpler.
+.. note::

-Depending on what you're doing, you may use the shorthand helper method, or the verbose one. Always favor the shorter,
-if you don't need to tune the graph or the execution strategy (see below).
+    The `if __name__ == '__main__':` section is not required, unless you want to run it directly using the python
+    interpreter.
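
To give an intuition of what "one thread per transformation, with outputs passed along" can mean, here is a minimal sketch built only on the standard library. It is an assumption-laden illustration, not how Bonobo is implemented: `run_node`, `run_chain` and the `END` sentinel are hypothetical names, and the sketch only handles a linear chain of one-input, one-output functions.

```python
import queue
import threading

END = object()  # sentinel marking the end of the stream (an assumption of this sketch)


def run_node(transformation, inbox, outbox):
    """Apply one transformation to every item read from inbox, writing results to outbox."""
    while True:
        item = inbox.get()
        if item is END:
            # Propagate the end-of-stream marker to the next step and stop.
            outbox.put(END)
            return
        outbox.put(transformation(item))


def run_chain(source, *transformations):
    """Push items from source through each transformation, one thread per step."""
    # One queue in front of each step, plus one to collect the final outputs.
    queues = [queue.Queue() for _ in range(len(transformations) + 1)]
    threads = [
        threading.Thread(target=run_node, args=(t, queues[i], queues[i + 1]))
        for i, t in enumerate(transformations)
    ]
    for thread in threads:
        thread.start()

    for item in source:
        queues[0].put(item)
    queues[0].put(END)

    results = []
    while True:
        item = queues[-1].get()
        if item is END:
            break
        results.append(item)

    for thread in threads:
        thread.join()
    return results


results = run_chain(['foo', 'bar', 'baz'], str.upper, lambda s: s + '!')
print(results)
```

A real graph also needs forks, joins and error handling, which is exactly the kind of plumbing a graph description lets you avoid writing by hand.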
+
+
+Execute the job
+:::::::::::::::
+
+Save `tutorial/__main__.py` and execute your transformation:
+
+.. code-block:: shell-session
+
+    bonobo run tutorial
+
+This example is available in :mod:`bonobo.examples.tutorials.tut01e01`, and you can also run it as a module:
+
+.. code-block:: shell-session
+
+    bonobo run -m bonobo.examples.tutorials.tut01e01
+
+
+Rewrite it using builtins
+:::::::::::::::::::::::::
+
+There is a much simpler way to describe an equivalent graph:
+
+.. code-block:: python
+
+    import bonobo
+
+    graph = bonobo.Graph(
+        ['foo', 'bar', 'baz',],
+        str.upper,
+        print,
+    )
+
+    if __name__ == '__main__':
+        bonobo.run(graph)
+
+Here, a list is used as a shortcut notation for the generator: when an iterable is added to a graph, Bonobo wraps it
+in a generator by itself.
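
Conceptually, wrapping an iterable amounts to something like the following sketch. The `as_generator` helper is hypothetical and only shown for illustration; Bonobo's own wrapping code may differ.

```python
def as_generator(iterable):
    """Hypothetical helper: turn any iterable into a generator function."""
    def generate():
        # Re-iterate the underlying iterable each time the function is called.
        yield from iterable
    return generate


extract = as_generator(['foo', 'bar', 'baz'])
print(list(extract()))
```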
+
+This example is available in :mod:`bonobo.examples.tutorials.tut01e02`, and you can also run it as a module:
+
+.. code-block:: shell-session
+
+    bonobo run -m bonobo.examples.tutorials.tut01e02
+
+You can now jump to the next part (:doc:`tut02`), or read the short summary of the concepts and definitions
+introduced here, below.

 Takeaways
 :::::::::
@ -79,7 +158,7 @@ Takeaways
 ① The :class:`bonobo.Graph` class is used to represent a data-processing pipeline.

 It can represent simple list-like linear graphs, like here, but it can also represent much more complex graphs, with
-branches and cycles.
+forks and joins.

 This is what the graph we defined looks like:

@ -97,10 +176,10 @@ either `return` or `yield` data to send it to the next step. Regular functions (
 each call is guaranteed to return exactly one result, while generators (using `yield`) should be preferred if the
 number of output lines for a given input varies.

-③ The `Graph` instance, or `transformation graph` is then executed using an `ExecutionStrategy`. You did not use it
-directly in this tutorial, but :func:`bonobo.run` created an instance of :class:`bonobo.ThreadPoolExecutorStrategy`
-under the hood (which is the default strategy). Actual behavior of an execution will depend on the strategy chosen, but
-the default should be fine in most of the basic cases.
+③ The `Graph` instance, or `transformation graph`, is executed using an `ExecutionStrategy`. You won't use it
+directly, but :func:`bonobo.run` creates an instance of :class:`bonobo.ThreadPoolExecutorStrategy` under the hood
+(the default strategy). The actual behavior of an execution depends on the strategy chosen, but the default should be
+fine for most cases.

 ④ Before actually executing the `transformations`, the `ExecutorStrategy` instance will wrap each component in an
 `execution context`, whose responsibility is to hold the state of the transformation. It enables to keep the
@ -111,21 +190,22 @@ Concepts and definitions

 * Transformation: a callable that takes input (as call parameters) and returns output(s), either as its return value or
  by yielding values (a.k.a returning a generator).
-* Transformation graph (or Graph): a set of transformations tied together in a :class:`bonobo.Graph` instance, which is a simple
-  directed acyclic graph (also refered as a DAG, sometimes).
-* Node: a transformation within the context of a transformation graph. The node defines what to do with a
-  transformation's output, and especially what other nodes to feed with the output.
+
+* Transformation graph (or Graph): a set of transformations tied together in a :class:`bonobo.Graph` instance, which is
+  a directed acyclic graph (or DAG).
+
+* Node: a graph element, most probably a transformation in a graph.
+
 * Execution strategy (or strategy): a way to run a transformation graph. Its responsibility is mainly to parallelize
  (or not) the transformations, on one or more processes and/or computers, and to set up the right queuing mechanism
  for transformations' inputs and outputs.
+
 * Execution context (or context): a wrapper around a node that holds the state for it. If the node needs state, there
-  are tools available in bonobo to feed it to the transformation using additional call parameters, and so every
-  transformation will be atomic.
+  are tools available in bonobo to feed it to the transformation using additional call parameters, keeping
+  transformations stateless.

 Next
 ::::

-You now know all the basic concepts necessary to build (batch-like) data processors.
-
-Time to jump to the second part: :doc:`tut02`
+Time to jump to the second part: :doc:`tut02`.