Minor fixes and update documentation. Preparing the upcoming 0.2 release.

2017-01-20 20:45:16 +01:00
parent e57ec4a4b3
commit 9dab39a474
67 changed files with 845 additions and 714 deletions
--- a/docs/tutorial/basics.rst
+++ b/docs/tutorial/basics.rst
@ -1,161 +0,0 @@
-Basic concepts
-==============
-
-To begin with Bonobo, you need to install it in a working python 3.5+ environment:
-
-.. code-block:: shell-session
-
-    $ pip install bonobo
-
-See :doc:`/install` for more options.
-
-Let's write a first data transformation
-:::::::::::::::::::::::::::::::::::::::
-
-We'll start with the most simple components we can.
-
-In **Bonobo**, a component is a plain old python callable, not more, not less. Let's write one that takes a string and
-uppercase it.
-
-.. code-block:: python
-
-    def uppercase(x: str):
-        return x.upper()
-
-Pretty straightforward.
-
-You could even use :func:`str.upper` directly instead of writing a wrapper, as a type's method (unbound) will take an
-instance of this type as its first parameter (what you'd call `self` in your method).
-
-The type annotations written here are not used, but can make your code much more readable, and may very well be used as
-validators in the future.
-
-Let's write two more components: a generator to produce the data to be transformed, and something that outputs it,
-because, yeah, feedback is cool.
-
-.. code-block:: python
-
-    def generate_data():
-        yield 'foo'
-        yield 'bar'
-        yield 'baz'
-
-    def output(x: str):
-        print(x)
-
-Once again, you could have skipped the pain of writing this and simply use an iterable to generate the data and the
-builtin :func:`print` for the output, but we'll stick to writing our own components for now.
-
-Let's chain the three components together and run the transformation:
-
-.. code-block:: python
-
-    from bonobo import run
-
-    run(generate_data, uppercase, output)
-
-.. graphviz::
-
-    digraph {
-        rankdir = LR;
-        stylesheet = "../_static/graphs.css";
-
-        BEGIN [shape="point"];
-        BEGIN -> "generate_data" -> "uppercase" -> "output";
-    }
-
-We use the :func:`bonobo.run` helper that hides the underlying object composition necessary to actually run the
-components in parralel, because it's simpler.
-
-Depending on what you're doing, you may use the shorthand helper method, or the verbose one. Always favor the shorter,
-if you don't need to tune the graph or the execution strategy (see below).
-
-Diving in
-:::::::::
-
-Let's rewrite it using the builtin functions :func:`str.upper` and :func:`print` instead of our own wrappers, and expand
-the :func:`bonobo.run()` helper so you see what's inside...
-
-.. code-block:: python
-
-    from bonobo import Graph, ThreadPoolExecutorStrategy
-
-    # Represent our data processor as a simple directed graph of callables.
-    graph = Graph()
-    graph.add_chain(
-        ('foo', 'bar', 'baz'),
-        str.upper,
-        print,
-    )
-
-    # Use a thread pool.
-    executor = ThreadPoolExecutorStrategy()
-
-    # Run the thing.
-    executor.execute(graph)
-
-We also switched our generator for a tuple, **Bonobo** will wrap it as a generator itself if it's not callable but
-iterable.
-
-The shorthand version with builtins would look like this:
-
-.. code-block:: python
-
-    from bonobo import run
-
-    run(
-        ('foo', 'bar', 'baz'),
-        str.upper,
-        print,
-    )
-
-Both methods are strictly equivalent (see :func:`bonobo.run`). When in doubt, prefer the shorter version.
-
-Takeaways
-:::::::::
-
-① The :class:`bonobo.Graph` class is used to represent a data-processing pipeline.
-
-It can represent simple list-like linear graphs, like here, but it can also represent much more complex graphs, with
-branches and cycles.
-
-This is what the graph we defined looks like:
-
-.. graphviz::
-
-    digraph {
-        rankdir = LR;
-        "iter(['foo', 'bar', 'baz'])" -> "str.upper" -> "print";
-    }
-
-
-② `Components` are simple python callables. Whatever can be called can be used as a `component`. Callables can
-either `return` or `yield` data to send it to the next step. Regular functions (using `return`) should be prefered if
-each call is guaranteed to return exactly one result, while generators (using `yield`) should be prefered if the
-number of output lines for a given input varies.
-
-③ The `graph` is then executed using an `ExecutionStrategy`. In this tutorial, we'll only use
-:class:`bonobo.ThreadPoolExecutorStrategy`, which use an underlying `concurrent.futures.ThreadPoolExecutor` to
-schedule calls in a pool of threads, but basically this strategy is what determines the actual behaviour of execution.
-
-④ Before actually executing the `components`, the `ExecutorStrategy` instance will wrap each component in a `context`,
-whose responsibility is to hold the state, to keep the `components` stateless. We'll expand on this later.
-
-Concepts and definitions
-::::::::::::::::::::::::
-
-* Component
-* Graph
-* Executor
-
-.. todo:: Definitions, and substitute vague terms in the page by the exact term defined here
-
-
-Next
-::::
-
-You now know all the basic concepts necessary to build (batch-like) data processors.
-
-If you're confident with this part, let's get to a more real world example, using files and nice console output:
-:doc:`basics2`
-
--- a/docs/tutorial/basics2.rst
+++ b/docs/tutorial/basics2.rst
@ -1,46 +0,0 @@
-Working with files
-==================
-
-Bonobo would not be of any use if the aim was to uppercase small lists of strings. In fact, Bonobo should not be used
-if you don't expect any gain from parralelization of tasks.
-
-Let's take the following graph as an example:
-
-.. graphviz::
-
-    digraph {
-        rankdir = LR;
-        "A" -> "B" -> "C";
-    }
-
-The execution strategy does a bit of under the scene work, wrapping every component in a thread (assuming you're using
-the :class:`bonobo.ThreadPoolExecutorStrategy`), which allows to start running `B` as soon as `A` yielded the first line
-of data, and `C` as soon as `B` yielded the first line of data, even if `A` or `B` still have data to yield.
-
-The great thing is that you generally don't have to think about it. Just be aware that your components will be run in
-parralel, and don't worry too much about blocking components, as they won't block their siblings.
-
-That being said, let's try to write a more real-world like transformation.
-
-Reading a file
-::::::::::::::
-
-There are a few component builders available in **Bonobo** that let you read files. You should at least know about the following:
-
-* :class:`bonobo.FileReader` (aliased as :func:`bonobo.from_file`)
-* :class:`bonobo.JsonFileReader` (aliased as :func:`bonobo.from_json`)
-* :class:`bonobo.CsvFileReader` (aliased as :func:`bonobo.from_csv`)
-
-Reading a file is as simple as using one of those, and for the example, we'll use a text file that was generated using
-Bonobo from the "liste-des-cafes-a-un-euro" dataset made available by Mairie de Paris under the Open Database
-License (ODbL). You can `explore the original dataset <https://opendata.paris.fr/explore/dataset/liste-des-cafes-a-un-euro/information/>`_.
-You'll need the example dataset, available in **Bonobo**'s repository.
-
-.. code-block:: python
-
-    from bonobo import FileReader, run
-
-    run(
-        FileReader('examples/datasets/cheap_coffeeshops_in_paris.txt'),
-        print,
-    )
--- a/docs/tutorial/index.rst
+++ b/docs/tutorial/index.rst
@ -3,12 +3,38 @@ First steps

 We tried hard to make **Bonobo** simple. We use simple python, and we believe it should be simple to learn.

+Tutorial
+::::::::
+
 We strongly advise that, even if you're an advanced python developer, you go through the whole tutorial, for two
 reasons: it should be sufficient to do anything possible with **Bonobo**, and it's a good moment to learn the few
 concepts you'll see everywhere in the software.

+If you're not familiar with python, you should first read :doc:`./python`.
+
 .. toctree::
   :maxdepth: 2

-   basics
-   basics2
+   tut01
+   tut02
+
+Where to go next?
+:::::::::::::::::
+
+When you're done with the tutorial, you may be interested in the following next steps:
+
+Read the :doc:`../reference/examples`
+
+Read about best development practices
+-------------------------------------
+
+* :doc:`../guide/index`
+* :doc:`../guide/purity`
+
+Read about integrating external tools with bonobo
+-------------------------------------------------
+
+* :doc:`../guide/ext/docker`: run transformation graphs in isolated containers.
+* :doc:`../guide/ext/jupyter`: run transformations within jupyter notebooks.
+* :doc:`../guide/ext/selenium`: run
+* :doc:`../guide/ext/sqlalchemy`: everything you need to interact with SQL databases.
--- a/docs/tutorial/python.rst
+++ b/docs/tutorial/python.rst
@ -0,0 +1,16 @@
+Just enough Python for Bonobo
+=============================
+
+This guide is intended to help programmers and enthusiasts grasp the python basics necessary to use Bonobo. It should
+definitely not be considered a general python introduction, nor a deep dive into details.
+
+.. toctree::
+    :maxdepth: 2
+
+    python01
+    python02
+    python03
+    python04
+    python05
+
+
--- a/docs/tutorial/tut01.rst
+++ b/docs/tutorial/tut01.rst
@ -0,0 +1,132 @@
+Basic concepts
+==============
+
+To begin with Bonobo, you need to install it in a working python 3.5+ environment:
+
+.. code-block:: shell-session
+
+    $ pip install bonobo
+
+See :doc:`/install` for more options.
+
+Let's write a first data transformation
+:::::::::::::::::::::::::::::::::::::::
+
+We'll start with the simplest transformation possible.
+
+In **Bonobo**, a transformation is a plain old python callable, not more, not less. Let's write one that takes a string
+and uppercases it.
+
+.. code-block:: python
+
+    def uppercase(x: str):
+        return x.upper()
+
+Pretty straightforward.
+
+You could even use :func:`str.upper` directly instead of writing a wrapper: a method looked up on its type (unbound)
+takes an instance of that type as its first parameter (what you'd call `self` in your method).
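+
+A quick check in plain python (no Bonobo involved) shows that the unbound :func:`str.upper` behaves exactly like the
+`uppercase` wrapper above:
+
```python
def uppercase(x: str):
    return x.upper()


# str.upper, looked up on the type, expects the string as its first argument
# (the ``self`` parameter), so it can be called just like our wrapper.
for value in ('foo', 'bar', 'baz'):
    assert str.upper(value) == uppercase(value) == value.upper()
```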
+
+The type annotations written here are not used (python itself does not enforce them at runtime), but they make your
+code more readable, and may very well be used as validators in the future.
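+
+The snippet below illustrates both halves of that sentence in plain python. `repeat` and `check_annotations` are
+hypothetical names made up for this sketch, and the validator is only a guess at what annotation-based validation could
+look like, not a Bonobo feature:
+
```python
import functools
import inspect


def repeat(x: str):
    return x * 2


# The annotation is only metadata stored on the function object...
assert repeat.__annotations__ == {'x': str}
# ...python does not enforce it: an int goes through without complaint.
assert repeat(21) == 42


def check_annotations(func):
    """Hypothetical validator (not a Bonobo feature): raise TypeError when an
    argument is not an instance of the class it is annotated with."""
    signature = inspect.signature(func)

    @functools.wraps(func)
    def wrapper(*args, **kwargs):
        bound = signature.bind(*args, **kwargs)
        for name, value in bound.arguments.items():
            expected = signature.parameters[name].annotation
            # Only plain classes are checked; other annotations are ignored.
            if isinstance(expected, type) and not isinstance(value, expected):
                raise TypeError('{} should be a {}, got {!r}'.format(name, expected.__name__, value))
        return func(*args, **kwargs)

    return wrapper


checked_repeat = check_annotations(repeat)
assert checked_repeat('ab') == 'abab'
```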
+
+Let's write two more transformations: a generator to produce the data to be transformed, and something that outputs it,
+because, yeah, feedback is cool.
+
+.. code-block:: python
+
+    def generate_data():
+        yield 'foo'
+        yield 'bar'
+        yield 'baz'
+
+    def output(x: str):
+        print(x)
+
+Once again, you could have skipped the pain of writing these and simply used an iterable to generate the data and the
+builtin :func:`print` for the output, but we'll stick to writing our own transformations for now.
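+
+For the record, here is why an iterable is a drop-in replacement for our generator (plain python, no Bonobo involved):
+
```python
def generate_data():
    yield 'foo'
    yield 'bar'
    yield 'baz'


# A generator function and a plain tuple yield the same sequence of values...
assert list(generate_data()) == list(('foo', 'bar', 'baz'))

# ...and the builtin print() writes each value just like our output() wrapper.
for value in ('foo', 'bar', 'baz'):
    print(value)
```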
+
+Let's chain the three transformations together and run the transformation graph:
+
+.. code-block:: python
+
+    import bonobo
+
+    graph = bonobo.Graph(generate_data, uppercase, output)
+
+    if __name__ == '__main__':
+        bonobo.run(graph)
+
+.. graphviz::
+
+    digraph {
+        rankdir = LR;
+        stylesheet = "../_static/graphs.css";
+
+        BEGIN [shape="point"];
+        BEGIN -> "generate_data" -> "uppercase" -> "output";
+    }
+
+We use the :func:`bonobo.run` helper, which hides the underlying object composition necessary to actually run the
+transformations in parallel, because it's simpler.
+
+Depending on what you're doing, you may use this shorthand helper or the more verbose approach. Always favor the
+shorter one if you don't need to tune the graph or the execution strategy (see below).
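+
+To give an intuition of what running a linear chain means, here is a hypothetical, purely sequential sketch in plain
+python. It is not how :func:`bonobo.run` works (Bonobo runs the transformations in parallel), and `run_linear_chain`
+is a made-up name. It assumes each transformation takes one positional argument and either returns one value,
+returns a generator, or returns `None` (nothing is forwarded):
+
```python
import inspect


def uppercase(x: str):
    return x.upper()


def generate_data():
    yield 'foo'
    yield 'bar'
    yield 'baz'


def _apply(transformation, values):
    # Call the transformation on each incoming value and forward its output(s).
    for value in values:
        result = transformation(value)
        if result is None:
            continue  # assumption of this sketch: None means "nothing to forward"
        if inspect.isgenerator(result):
            yield from result  # a generator may produce zero, one or many values
        else:
            yield result


def run_linear_chain(source, *transformations):
    # Hypothetical helper: a callable source is called to get an iterator,
    # anything else is treated as an iterable.
    values = source() if callable(source) else iter(source)
    for transformation in transformations:
        values = _apply(transformation, values)  # lazily stack the stages
    return list(values)  # pulling from the last stage drives the whole chain


print(run_linear_chain(generate_data, uppercase))  # ['FOO', 'BAR', 'BAZ']
```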
+
+Takeaways
+:::::::::
+
+① The :class:`bonobo.Graph` class is used to represent a data-processing pipeline.
+
+It can represent simple list-like linear graphs, like here, but it can also represent much more complex graphs, with
+branches and cycles.
+
+If you replace our functions with the builtins mentioned above (an iterable, :func:`str.upper` and :func:`print`), the
+graph looks like this:
+
+.. graphviz::
+
+    digraph {
+        rankdir = LR;
+        BEGIN [shape="point"];
+        BEGIN -> "iter(['foo', 'bar', 'baz'])" -> "str.upper" -> "print";
+    }
+
+
+② `Transformations` are simple python callables. Whatever can be called can be used as a `transformation`. Callables can
+either `return` or `yield` data to send it to the next step. Regular functions (using `return`) should be preferred if
+each call is guaranteed to return exactly one result, while generators (using `yield`) should be preferred if the
+number of output lines for a given input varies.
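+
+As an illustration, here are two plain python transformations, one of each kind (`split_words` is a made-up example,
+not a Bonobo builtin):
+
```python
def uppercase(x: str):
    # Exactly one output per input: a regular function using ``return``.
    return x.upper()


def split_words(line: str):
    # Zero, one or many outputs per input: a generator using ``yield``.
    for word in line.split():
        yield word


assert uppercase('foo') == 'FOO'
assert list(split_words('foo bar baz')) == ['foo', 'bar', 'baz']
assert list(split_words('')) == []  # an empty line produces no output at all
```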
+
+③ The `Graph` instance, or `transformation graph`, is then executed using an `ExecutionStrategy`. You did not use it
+directly in this tutorial, but :func:`bonobo.run` created an instance of :class:`bonobo.ThreadPoolExecutorStrategy`
+under the hood (which is the default strategy). The actual behavior of an execution depends on the chosen strategy, but
+the default should be fine in most basic cases.
+
+④ Before actually executing the `transformations`, the `ExecutionStrategy` instance will wrap each component in an
+`execution context`, whose responsibility is to hold the state of the transformation. This keeps the
+`transformations` stateless, while still allowing an external state to be added if required. We'll expand on this later.
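+
+Until contexts are covered in detail, the following plain python sketch only illustrates the idea of keeping state
+outside of the transformation. `ExecutionContextSketch` is a hypothetical class and does not reflect Bonobo's actual
+context API:
+
```python
class ExecutionContextSketch:
    """Hypothetical stand-in for an execution context: it owns the state (here,
    a simple call counter) so that the wrapped transformation stays stateless."""

    def __init__(self, transformation):
        self.transformation = transformation
        self.calls = 0  # the state lives in the context, not in the function

    def send(self, value):
        self.calls += 1
        return self.transformation(value)


context = ExecutionContextSketch(str.upper)  # str.upper itself keeps no state
results = [context.send(value) for value in ('foo', 'bar', 'baz')]
print(results, context.calls)  # ['FOO', 'BAR', 'BAZ'] 3
```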
+
+Concepts and definitions
+::::::::::::::::::::::::
+
+* Transformation: a callable that takes input (as call parameters) and returns output(s), either as its return value or
+  by yielding values (a.k.a. returning a generator).
+* Transformation graph (or Graph): a set of transformations tied together in a :class:`bonobo.Graph` instance, which is a
+  simple directed acyclic graph (sometimes referred to as a DAG).
+* Node: a transformation within the context of a transformation graph. The node defines what to do with a
+  transformation's output, and especially which other node(s) to feed with the output.
+* Execution strategy (or strategy): a way to run a transformation graph. Its responsibility is mainly to parallelize
+  (or not) the transformations, on one or more processes and/or computers, and to set up the right queuing mechanism
+  for transformations' inputs and outputs.
+* Execution context (or context): a wrapper around a node that holds its state. If the node needs the state, there
+  are tools available in bonobo to feed it to the transformation using additional call parameters, so that every
+  transformation can stay atomic.
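+
+To make these definitions a bit more concrete, here is a hypothetical, sequential sketch of nodes in a small DAG where
+one node feeds two others. `run_dag` and its dict-based representation are assumptions made for this illustration,
+not the internals of :class:`bonobo.Graph`:
+
```python
from collections import deque


def run_dag(transformations, edges, start, inputs):
    """Hypothetical sequential runner (assumes the graph is acyclic).

    transformations: node name -> callable returning a single value
    edges: node name -> names of the nodes fed with that node's output
    Returns {node name: list of outputs produced by that node}."""
    outputs = {name: [] for name in transformations}
    pending = deque((start, value) for value in inputs)
    while pending:
        name, value = pending.popleft()
        result = transformations[name](value)
        outputs[name].append(result)
        for successor in edges.get(name, []):  # the node decides where output goes
            pending.append((successor, result))
    return outputs


results = run_dag(
    transformations={'strip': str.strip, 'upper': str.upper, 'length': len},
    edges={'strip': ['upper', 'length']},  # 'strip' feeds two nodes
    start='strip',
    inputs=[' foo ', 'bar '],
)
print(results)  # {'strip': ['foo', 'bar'], 'upper': ['FOO', 'BAR'], 'length': [3, 3]}
```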
+
+Next
+::::
+
+You now know all the basic concepts necessary to build (batch-like) data processors.
+
+If you're confident with this part, let's get to a more real-world example, using files and nice console output:
+:doc:`tut02`
+
--- a/docs/tutorial/tut02.rst
+++ b/docs/tutorial/tut02.rst
@ -0,0 +1,63 @@
+Working with files
+==================
+
+Bonobo would not be of any use if the aim were to uppercase small lists of strings. In fact, Bonobo should not be used
+if you don't expect any gain from parallelization/distribution of tasks.
+
+Let's take the following graph as an example:
+
+.. graphviz::
+
+    digraph {
+        rankdir = LR;
+        BEGIN [shape="point"];
+        BEGIN -> "A" -> "B" -> "C";
+    }
+
+The execution strategy does a bit of behind-the-scenes work, wrapping every component in a thread (assuming you're
+using the :class:`bonobo.ThreadPoolExecutorStrategy`), which allows `B` to start running as soon as `A` has yielded
+its first line of data, and `C` as soon as `B` has yielded its first line of data, even if `A` or `B` still have data
+to yield.
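+
+The sketch below mimics that behavior with the standard library only: one thread per stage, connected by FIFO
+queues, so each stage processes a value as soon as the previous one has produced it. `run_pipeline` and `stage` are
+hypothetical names, error handling is left out, and this is not Bonobo's implementation:
+
```python
import queue
import threading

_DONE = object()  # sentinel marking the end of the stream


def stage(transformation, inbox, outbox):
    # Runs in its own thread: handle each value as soon as it arrives.
    while True:
        value = inbox.get()
        if value is _DONE:
            outbox.put(_DONE)  # propagate end-of-stream downstream
            return
        outbox.put(transformation(value))


def run_pipeline(source, *transformations):
    # One queue feeds each stage, plus a final queue collecting the results.
    queues = [queue.Queue() for _ in range(len(transformations) + 1)]
    threads = [
        threading.Thread(
            target=stage,
            args=(transformation, queues[i], queues[i + 1]),
            daemon=True,
        )
        for i, transformation in enumerate(transformations)
    ]
    for thread in threads:
        thread.start()

    # The main thread plays the role of `A`: each value is queued as soon as it
    # is produced, so downstream stages can start before the source is exhausted.
    for value in source:
        queues[0].put(value)
    queues[0].put(_DONE)

    results = []
    while True:
        value = queues[-1].get()
        if value is _DONE:
            break
        results.append(value)
    for thread in threads:
        thread.join()
    return results


print(run_pipeline(('foo ', ' bar', 'baz'), str.strip, str.upper))  # ['FOO', 'BAR', 'BAZ']
```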
+
+The great thing is that you generally don't have to think about it. Just be aware that your components will be run in
+parallel (with the default strategy), and don't worry too much about blocking components, as they won't block their
+siblings when run in bonobo.
+
+That being said, let's try to write a more real-world transformation.
+
+Reading a file
+::::::::::::::
+
+There are a few component builders available in **Bonobo** that let you read files. You should at least know about the
+following:
+
+* :class:`bonobo.io.FileReader`
+* :class:`bonobo.io.JsonReader`
+* :class:`bonobo.io.CsvReader`
+
+Reading a file is as simple as using one of those, and for the example, we'll use a text file that was generated using
+Bonobo from the "liste-des-cafes-a-un-euro" dataset made available by Mairie de Paris under the Open Database
+License (ODbL). You can `explore the original dataset <https://opendata.paris.fr/explore/dataset/liste-des-cafes-a-un-euro/information/>`_.
+You'll need the example dataset, available in **Bonobo**'s repository.
+
+.. literalinclude:: ../../examples/tut02_01_read.py
+    :language: python
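+
+If you are curious about what a file-reading transformation boils down to, here is a hypothetical plain python
+generator that yields a text file line by line. It is not :class:`bonobo.io.FileReader`, whose options are not covered
+here; `read_lines` and the temporary sample file are made up for this sketch:
+
```python
import os
import tempfile


def read_lines(path, encoding='utf-8'):
    # Hypothetical file-reading transformation: a generator, so each line can be
    # sent downstream as soon as it is read instead of loading the whole file.
    with open(path, encoding=encoding) as handle:
        for line in handle:
            yield line.rstrip('\n')


# Made-up sample content, written to a temporary file for the demonstration.
with tempfile.NamedTemporaryFile('w', suffix='.txt', delete=False, encoding='utf-8') as tmp:
    tmp.write('first line\nsecond line\n')

try:
    lines = list(read_lines(tmp.name))
finally:
    os.remove(tmp.name)

print(lines)  # ['first line', 'second line']
```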
+
+Until now, we ran the file directly using our python interpreter, but there are other options, one of them being
+`bonobo run`. This command runs a graph defined in a python file, and replaces the :func:`bonobo.run` helper. That's
+exactly why we call :func:`bonobo.run` in the `if __name__ == '__main__'` block: so that it is only called when the
+file is run directly.
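+
+The `__name__` check works because python sets `__name__` to `'__main__'` only for the file being run as a script;
+when the same code is loaded as a module, `__name__` is the module's name. The sketch below simulates both cases with
+:class:`types.ModuleType` (the module source and the `load` helper are made up for the illustration):
+
```python
import types

SOURCE = '''
calls = []

def run():
    calls.append('run')

if __name__ == '__main__':
    run()
'''


def load(source, name):
    # Execute ``source`` in a fresh module object whose __name__ is ``name``.
    module = types.ModuleType(name)
    exec(source, module.__dict__)
    return module


imported = load(SOURCE, 'tut02_01_read')  # loaded as a module: guard is False
executed = load(SOURCE, '__main__')  # run as a script: guard is True
print(imported.calls, executed.calls)  # [] ['run']
```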
+
+Using the bonobo command line has a few advantages. It will look for one and only one :class:`bonobo.Graph` instance
+defined in the file given as argument, configure an execution strategy and possibly plugins, and execute it. It lets
+you tune the "artifacts" surrounding the transformation graph on the command line (verbosity, plugins, ...), and it
+will also ease the transition to running transformation graphs in containers, as the syntax will be the same. Of
+course, it is not required, and the containerization capabilities are provided by an optional and separate python
+package.
+
+.. code-block:: shell-session
+
+    $ bonobo run examples/tut02_01_read.py
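+
+The "one and only one graph" rule can be pictured with the following hypothetical helper, which scans a module's
+namespace for instances of a stand-in `Graph` class. It only illustrates the rule stated above; it is not the code
+behind `bonobo run`:
+
```python
class Graph:
    """Stand-in for bonobo.Graph, so that this sketch runs without Bonobo."""

    def __init__(self, *chain):
        self.chain = chain


def find_single_graph(namespace):
    # Hypothetical helper: return the only Graph instance found in a module
    # namespace (e.g. ``vars(module)``), or raise if there are zero or several.
    graphs = [value for value in namespace.values() if isinstance(value, Graph)]
    if len(graphs) != 1:
        raise RuntimeError('Expected exactly one Graph instance, found {}.'.format(len(graphs)))
    return graphs[0]


namespace = {'graph': Graph(str.upper, print), 'helper': len}
assert find_single_graph(namespace) is namespace['graph']
```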
+
+
+
+
+