starting to write docs, taking decisions on public api

2016-12-27 13:31:38 +01:00
parent 512e2ab46d
commit 25ad284935
29 changed files with 604 additions and 96 deletions
--- a/docs/_templates/index.html
+++ b/docs/_templates/index.html
@ -1,22 +1,20 @@
 {% extends "layout.html" %}
-{% set title = _('Overview') %}
+{% set title = _('Bonobo — Data processing for humans') %}
 {% block body %}

-    <div style="border: 2px solid red; font-weight: bold;">
-        Migration in progress, things may be broken for now. Please give us some time to finish painting the walls.
+    <div style="border: 2px solid red; font-weight: bold; margin: 1em; padding: 1em">
+        Rewrite in progress, things may be broken for now. Please give us some time to finish painting the walls.
    </div>

-    <h1>{{ _('Welcome to Bonobo\'s Documentation') }}</h1>
-
-    <div style="text-align: center;">
-        <img class="logo" src="{{ pathto('_static/bonobo.png', 1) }}" title="Bonobo"
+    <h1 style="text-align: center">
+        <img class="logo" src="{{ pathto('_static/bonobo.png', 1) }}" title="Bonobo" alt="Bonobo"
             style=" width: 128px; height: 128px;"/>
-    </div>
+    </h1>

    <p>
        {% trans %}
-            Bonobo is a line-by-line data-processing toolkit for python 3.5+ emphasizing simplicity and atomicity of
-            data transformations using a simple directed graph of python callables.
+            <strong>Bonobo</strong> is a line-by-line data-processing toolkit for python 3.5+ emphasizing simple and
+            atomic data transformations defined using a directed graph of plain old python callables.
        {% endtrans %}
    </p>

@ -71,9 +69,8 @@
    <table class="contentstable">
        <tr>
            <td>
-                <p class="biglink"><a class="biglink" href="{{ pathto("tutorial") }}">{% trans %}First steps with
-                    Bonobo{% endtrans %}</a><br/>
-                    <span class="linkdescr">{% trans %}overview of basic features{% endtrans %}</span></p>
+                <p class="biglink"><a class="biglink" href="{{ pathto("tutorial/basics") }}">{% trans %}First steps{% endtrans %}</a><br/>
+                    <span class="linkdescr">{% trans %}quick overview of basic features{% endtrans %}</span></p>
            </td>
            <td>
                {%- if hasdoc('search') %}
--- a/docs/conf.py
+++ b/docs/conf.py
@ -12,8 +12,14 @@ import bonobo
 # -- General configuration ------------------------------------------------

 extensions = [
-    'sphinx.ext.autodoc', 'sphinx.ext.doctest', 'sphinx.ext.intersphinx', 'sphinx.ext.todo', 'sphinx.ext.coverage',
-    'sphinx.ext.ifconfig', 'sphinx.ext.viewcode'
+    'sphinx.ext.autodoc',
+    'sphinx.ext.doctest',
+    'sphinx.ext.intersphinx',
+    'sphinx.ext.todo',
+    'sphinx.ext.coverage',
+    'sphinx.ext.ifconfig',
+    'sphinx.ext.viewcode',
+    'sphinx.ext.graphviz',
 ]

 # Add any paths that contain templates here, relative to this directory.
@ -95,6 +101,8 @@ html_additional_pages = {'index': 'index.html'}
 html_static_path = ['_static']
 html_show_sphinx = False

+graphviz_output_format = 'svg'
+
 # -- Options for HTMLHelp output ------------------------------------------

 # Output file base name for HTML help builder.
--- a/docs/install.rst
+++ b/docs/install.rst
@ -0,0 +1,34 @@
+Installation
+============
+
+
+.. todo::
+
+    better install docs, especially on how to use different fork, etc.
+
+Install with pip
+::::::::::::::::
+
+.. code-block:: shell-session
+
+    $ pip install bonobo
+
+Install from source
+:::::::::::::::::::
+
+.. code-block:: shell-session
+
+    $ pip install git+https://github.com/python-bonobo/bonobo.git@master#egg=bonobo
+
+Editable install
+::::::::::::::::
+
+If you plan on making patches to Bonobo, you should install it as an "editable" package.
+
+
+.. code-block:: shell-session
+
+    $ pip install --editable git+https://github.com/python-bonobo/bonobo.git@master#egg=bonobo
+
+Note: `-e` is the shorthand version of `--editable`.
+
--- a/docs/tutorial/basics.rst
+++ b/docs/tutorial/basics.rst
@ -0,0 +1,146 @@
+First steps - Basic concepts
+============================
+
+To begin with Bonobo, you should first install it:
+
+.. code-block:: shell-session
+
+    $ pip install bonobo
+
+See :doc:`install` if you're looking for more options.
+
+Let's write a first data transformation
+:::::::::::::::::::::::::::::::::::::::
+
+We'll write a simple component that just uppercases everything. In **Bonobo**, a component is a plain old python
+callable, no more, no less.
+
+.. code-block:: python
+
+    def uppercase(x: str):
+        return x.upper()
+
+Ok, this is kind of simple, and you can even use `str.upper` directly instead of writing a wrapper. The type annotations
+are not used, but can make your code much more readable (and may be used as validators in the future).
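To make the two remarks above concrete, here is a minimal plain-Python sketch (it does not import Bonobo). It only illustrates standard Python behaviour: the wrapper and `str.upper` give the same results, and annotations are stored as metadata but not enforced when the function is called.

```python
def uppercase(x: str):
    return x.upper()


words = ['foo', 'bar', 'baz']

# The wrapper and the builtin method give the same results.
assert [uppercase(w) for w in words] == [str.upper(w) for w in words]

# Annotations are kept as metadata on the function object...
annotations = uppercase.__annotations__
print(annotations)

# ...but Python does not check them at call time: bytes are accepted too.
unchecked = uppercase(b'foo')
print(unchecked)
```

Whether Bonobo will later use these annotations as validators is, as the text says, only a possibility.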
+
+To run this, we need two more things: a generator that feeds data, and something that outputs it.
+
+.. code-block:: python
+
+    def generate_data():
+        yield 'foo'
+        yield 'bar'
+        yield 'baz'
+
+    def output(x: str):
+        print(x)
+
+That should do the job. Now, let's chain the three callables together and run them.
+
+.. code-block:: python
+
+    from bonobo import run
+
+    run(generate_data, uppercase, output)
+
+This is the simplest data transformation possible, and we run it using the `run` helper, which hides the underlying object
+composition necessary to actually run the callables in parallel. A more flexible, but slightly more verbose, way to do the
+same thing would be:
+
+.. code-block:: python
+
+    from bonobo import Graph, ThreadPoolExecutorStrategy
+
+    graph = Graph()
+    graph.add_chain(generate_data, uppercase, output)
+
+    executor = ThreadPoolExecutorStrategy()
+    executor.execute(graph)
+
+Depending on what you're doing, you may use the shorthand helper or the verbose form. Favor the shorter one
+unless you need to tune the graph or the execution strategy.
+
+Definitions
+:::::::::::
+
+* Graph
+* Component
+* Executor
+
+.. todo:: Definitions, and substitute vague terms in the page by the exact term defined here
+
+Summary
+:::::::
+
+Let's rewrite this using builtin functions and methods, then explain the few concepts available here:
+
+.. code-block:: python
+
+    from bonobo import Graph, ThreadPoolExecutorStrategy
+
+    # Represent our data processor as a simple directed graph of callables.
+    graph = Graph(
+        (x for x in ('foo', 'bar', 'baz')),
+        str.upper,
+        print,
+    )
+
+    # Use a thread pool.
+    executor = ThreadPoolExecutorStrategy()
+
+    # Run the thing.
+    executor.execute(graph)
+
+Or the shorthand version, which you should prefer if you don't need fine tuning:
+
+.. code-block:: python
+
+    from bonobo import run
+
+    run(
+        iter(['foo', 'bar', 'baz']),
+        str.upper,
+        print,
+    )
+
+Both methods are strictly equivalent (see :func:`bonobo.run`). When in doubt, favour the shorter.
+
+Takeaways
+:::::::::
+
+① The :class:`bonobo.Graph` class is used to represent a data-processing pipeline.
+
+It can represent simple list-like linear graphs, like here, but it can also represent much more complex graphs, with
+branches and cycles.
+
+This is what the graph we defined looks like:
+
+.. graphviz::
+
+    digraph {
+        rankdir = LR;
+        "iter(['foo', 'bar', 'baz'])" -> "str.upper" -> "print";
+    }
+
+
+② Transformations are simple python callables. Whatever can be called can be used as a transformation. Callables can
+either `return` or `yield` data to send it to the next step. Regular functions (using `return`) should be preferred if
+each call is guaranteed to return exactly one result, while generators (using `yield`) should be preferred if the
+number of output lines for a given input varies.
+
+③ The graph is then executed using an `ExecutionStrategy`. For now, let's focus only on
+:class:`bonobo.ThreadPoolExecutorStrategy`, which uses an underlying `concurrent.futures.ThreadPoolExecutor` to
+schedule calls in a pool of threads. More generally, the strategy is what determines how execution actually behaves.
+
+④ Before actually executing the callables, the `ExecutionStrategy` instance will wrap each component in a `context`,
+whose responsibility is to hold the state, to keep the components stateless. We'll expand on this later.
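The difference between `return` and `yield` in takeaway ② can be sketched in plain Python, without Bonobo. The `split_words` component and the `call_component` helper below are hypothetical names introduced for illustration only; they are not part of Bonobo's API and say nothing about how Bonobo dispatches calls internally.

```python
import inspect


def uppercase(x):
    # Exactly one output per input: a regular function using `return`.
    return x.upper()


def split_words(line):
    # A varying number of outputs per input: a generator using `yield`.
    for word in line.split():
        yield word


def call_component(component, value):
    """Hypothetical helper: normalize both styles to a list of outputs."""
    result = component(value)
    if inspect.isgenerator(result):
        return list(result)
    return [result]


outputs = []
for line in ['foo bar', '', 'baz']:
    # An empty line yields nothing, 'foo bar' yields two words.
    for word in call_component(split_words, line):
        outputs.extend(call_component(uppercase, word))

print(outputs)  # ['FOO', 'BAR', 'BAZ']
```

The generator form lets a single input produce zero, one or many lines, which a plain `return` cannot express without returning a container.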
+
+
+Next
+::::
+
+You now know all the basic concepts necessary to build (batch-like) data processors.
+
+If you're confident with this part, let's get to a more real-world example, using files and nice console output.
+
+.. todo:: link to next page
--- a/docs/tutorial/basics2.rst
+++ b/docs/tutorial/basics2.rst
@ -0,0 +1,46 @@
+First steps - Working with files
+================================
+
+Bonobo would not be of any use if the aim was to uppercase small lists of strings. In fact, Bonobo should not be used
+if you don't expect any gain from parallelization of tasks.
+
+Let's take the following graph as an example:
+
+.. graphviz::
+
+    digraph {
+        rankdir = LR;
+        "A" -> "B" -> "C";
+    }
+
+The execution strategy does a bit of behind-the-scenes work, wrapping every component in a thread (assuming you're using
+the :class:`bonobo.ThreadPoolExecutorStrategy`). This lets `B` start running as soon as `A` has yielded its first line
+of data, and `C` as soon as `B` has yielded its first line, even if `A` or `B` still have data to yield.
+
+The great thing is that you generally don't have to think about it. Just be aware that your components will be run in
+parallel, and don't worry too much about blocking components, as they won't block their siblings.
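The idea of running `A`, `B` and `C` concurrently can be sketched with the standard library alone. This is not Bonobo's implementation: it assumes one thread per stage, FIFO queues between stages and an `END` sentinel, all of which are illustrative choices made here.

```python
import queue
import threading

END = object()  # sentinel marking the end of a stream (illustrative choice)


def source(outbox):
    # Stage A: emit lines one by one, then signal the end.
    for value in ('foo', 'bar', 'baz'):
        outbox.put(value)
    outbox.put(END)


def stage(func, inbox, outbox):
    # Stage B: handle each line as soon as it arrives, without
    # waiting for the previous stage to finish.
    while True:
        item = inbox.get()
        if item is END:
            outbox.put(END)
            return
        outbox.put(func(item))


def sink(inbox, results):
    # Stage C: collect whatever comes out of B.
    while True:
        item = inbox.get()
        if item is END:
            return
        results.append(item)


a_to_b, b_to_c = queue.Queue(), queue.Queue()
results = []

threads = [
    threading.Thread(target=source, args=(a_to_b,)),
    threading.Thread(target=stage, args=(str.upper, a_to_b, b_to_c)),
    threading.Thread(target=sink, args=(b_to_c, results)),
]
for thread in threads:
    thread.start()
for thread in threads:
    thread.join()

print(results)
```

Because each stage runs in a single thread and the queues are FIFO, the output order matches the input order in this linear sketch; a blocking call in one stage only delays the stages downstream of it.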
+
+That being said, let's try to write a more realistic transformation.
+
+Reading a file
+::::::::::::::
+
+There are a few component builders available in **Bonobo** that let you read files. You should at least know about the following:
+
+* :class:`bonobo.FileReader` (aliased as :func:`bonobo.from_file`)
+* :class:`bonobo.JsonFileReader` (aliased as :func:`bonobo.from_json`)
+* :class:`bonobo.CsvFileReader` (aliased as :func:`bonobo.from_csv`)
+
+Reading a file is as simple as using one of those, and for the example, we'll use a text file that was generated using
+Bonobo from the "liste-des-cafes-a-un-euro" dataset made available by Mairie de Paris under the Open Database
+License (ODbL). You can `explore the original dataset <https://opendata.paris.fr/explore/dataset/liste-des-cafes-a-un-euro/information/>`_.
+You'll need the example dataset, available in **Bonobo**'s repository.
+
+.. code-block:: python
+
+    from bonobo import FileReader, run
+
+    run(
+        FileReader('examples/datasets/cheap_coffeeshops_in_paris.txt'),
+        print,
+    )