starting to write docs, taking decisions on public api

This commit is contained in:
Romain Dorgueil
2016-12-27 13:31:38 +01:00
parent 512e2ab46d
commit 25ad284935
29 changed files with 604 additions and 96 deletions


@ -1,22 +1,20 @@
{% extends "layout.html" %}
{% set title = _('Overview') %}
{% set title = _('Bonobo — Data processing for humans') %}
{% block body %}
<div style="border: 2px solid red; font-weight: bold;">
Migration in progress, things may be broken for now. Please give us some time to finish painting the walls.
<div style="border: 2px solid red; font-weight: bold; margin: 1em; padding: 1em">
Rewrite in progress, things may be broken for now. Please give us some time to finish painting the walls.
</div>
<h1>{{ _('Welcome to Bonobo\'s Documentation') }}</h1>
<div style="text-align: center;">
<img class="logo" src="{{ pathto('_static/bonobo.png', 1) }}" title="Bonobo"
<h1 style="text-align: center">
<img class="logo" src="{{ pathto('_static/bonobo.png', 1) }}" title="Bonobo" alt="Bonobo"
style=" width: 128px; height: 128px;"/>
</div>
</h1>
<p>
{% trans %}
Bonobo is a line-by-line data-processing toolkit for python 3.5+ emphasizing simplicity and atomicity of
data transformations using a simple directed graph of python callables.
<strong>Bonobo</strong> is a line-by-line data-processing toolkit for python 3.5+ emphasizing simple and
atomic data transformations defined using a directed graph of plain old python callables.
{% endtrans %}
</p>
@ -71,9 +69,8 @@
<table class="contentstable">
<tr>
<td>
<p class="biglink"><a class="biglink" href="{{ pathto("tutorial") }}">{% trans %}First steps with
Bonobo{% endtrans %}</a><br/>
<span class="linkdescr">{% trans %}overview of basic features{% endtrans %}</span></p>
<p class="biglink"><a class="biglink" href="{{ pathto("tutorial/basics") }}">{% trans %}First steps{% endtrans %}</a><br/>
<span class="linkdescr">{% trans %}quick overview of basic features{% endtrans %}</span></p>
</td>
<td>
{%- if hasdoc('search') %}


@ -12,8 +12,14 @@ import bonobo
# -- General configuration ------------------------------------------------
extensions = [
'sphinx.ext.autodoc', 'sphinx.ext.doctest', 'sphinx.ext.intersphinx', 'sphinx.ext.todo', 'sphinx.ext.coverage',
'sphinx.ext.ifconfig', 'sphinx.ext.viewcode'
'sphinx.ext.autodoc',
'sphinx.ext.doctest',
'sphinx.ext.intersphinx',
'sphinx.ext.todo',
'sphinx.ext.coverage',
'sphinx.ext.ifconfig',
'sphinx.ext.viewcode',
'sphinx.ext.graphviz',
]
# Add any paths that contain templates here, relative to this directory.
@ -95,6 +101,8 @@ html_additional_pages = {'index': 'index.html'}
html_static_path = ['_static']
html_show_sphinx = False
graphviz_output_format = 'svg'
# -- Options for HTMLHelp output ------------------------------------------
# Output file base name for HTML help builder.

docs/install.rst Normal file

@ -0,0 +1,34 @@
Installation
============

.. todo::

    Better install docs, especially on how to use a different fork, etc.

Install with pip
::::::::::::::::

.. code-block:: shell-session

    $ pip install bonobo

Install from source
:::::::::::::::::::

.. code-block:: shell-session

    $ pip install git+https://github.com/python-bonobo/bonobo.git@master#egg=bonobo

Editable install
::::::::::::::::

If you plan on making patches to Bonobo, you should install it as an "editable" package.

.. code-block:: shell-session

    $ pip install --editable git+https://github.com/python-bonobo/bonobo.git@master#egg=bonobo

Note: `-e` is the shorthand for `--editable`.

docs/tutorial/basics.rst Normal file

@ -0,0 +1,146 @@
First steps - Basic concepts
============================

To begin with Bonobo, you should first install it:

.. code-block:: shell-session

    $ pip install bonobo

See :doc:`/install` if you're looking for more options.

Let's write a first data transformation
:::::::::::::::::::::::::::::::::::::::

We'll write a simple component that just uppercases everything. In **Bonobo**, a component is a plain old python
callable, nothing more, nothing less.

.. code-block:: python

    def uppercase(x: str):
        return x.upper()

Ok, this is kind of simple, and you could even use `str.upper` directly instead of writing a wrapper. The type
annotations are not used by Bonobo, but they can make your code much more readable (and may be used as validators in
the future).

To run this, we need two more things: a generator that feeds data, and something that outputs it.

.. code-block:: python

    def generate_data():
        yield 'foo'
        yield 'bar'
        yield 'baz'

    def output(x: str):
        print(x)

That should do the job. Now, let's chain the three callables together and run them.

.. code-block:: python

    from bonobo import run

    run(generate_data, uppercase, output)

This is the simplest data transformation possible, and we run it using the `run` helper, which hides the underlying
object composition needed to actually run the callables in parallel. A more flexible, but slightly more verbose, way to
do the same thing would be:

.. code-block:: python

    from bonobo import Graph, ThreadPoolExecutorStrategy

    graph = Graph()
    graph.add_chain(generate_data, uppercase, output)

    executor = ThreadPoolExecutorStrategy()
    executor.execute(graph)

Depending on what you're doing, you may use the shorthand helper or the verbose version. Favor the shorter one unless
you need to tune the graph or the execution strategy.

Definitions
:::::::::::

* Graph
* Component
* Executor

.. todo:: Definitions, and replace vague terms on this page with the exact terms defined here.

Summary
:::::::

Let's rewrite this using builtin functions and methods, then explain the few concepts involved:

.. code-block:: python

    from bonobo import Graph, ThreadPoolExecutorStrategy

    # Represent our data processor as a simple directed graph of callables.
    graph = Graph(
        iter(['foo', 'bar', 'baz']),
        str.upper,
        print,
    )

    # Use a thread pool.
    executor = ThreadPoolExecutorStrategy()

    # Run the thing.
    executor.execute(graph)

Or the shorthand version, which you should prefer if you don't need fine tuning:

.. code-block:: python

    from bonobo import run

    run(
        iter(['foo', 'bar', 'baz']),
        str.upper,
        print,
    )

Both methods are strictly equivalent (see :func:`bonobo.run`). When in doubt, favor the shorter one.

Takeaways
:::::::::

① The :class:`bonobo.Graph` class is used to represent a data-processing pipeline.
It can represent simple list-like linear graphs, like here, but it can also represent much more complex graphs, with
branches and cycles.

This is what the graph we defined looks like:

.. graphviz::

    digraph {
        rankdir = LR;
        "iter(['foo', 'bar', 'baz'])" -> "str.upper" -> "print";
    }

② Transformations are simple python callables. Whatever can be called can be used as a transformation. Callables can
either `return` or `yield` data to send it to the next step. Regular functions (using `return`) should be preferred if
each call is guaranteed to return exactly one result, while generators (using `yield`) should be preferred if the
number of output lines for a given input varies.

③ The graph is then executed using an `ExecutionStrategy`. For now, let's focus only on
:class:`bonobo.ThreadPoolExecutorStrategy`, which uses an underlying `concurrent.futures.ThreadPoolExecutor` to
schedule calls in a pool of threads. The strategy is what determines how the execution actually behaves.

④ Before actually executing the callables, the strategy instance will wrap each component in a `context`, which holds
the state so that the components themselves can stay stateless. We'll expand on this later.
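
To make the `return` versus `yield` distinction from ② concrete, here is a minimal, purely sequential sketch of how a
linear chain of callables could be evaluated line by line. `run_chain_sequentially`, `_outputs` and `split_letters` are
hypothetical names made up for this illustration; this is not Bonobo's API or implementation, and it ignores threads,
contexts and branching.

.. code-block:: python

    import inspect
    from typing import Any, Callable, Iterable, List


    def _outputs(func: Callable[[Any], Any], row: Any) -> Iterable[Any]:
        """Call ``func`` on one row and normalize the result to an iterable of rows."""
        result = func(row)
        if inspect.isgenerator(result):
            # Generators (using ``yield``) may emit zero, one or many rows per input.
            return result
        # Regular functions (using ``return``) emit exactly one row per input.
        return [result]


    def run_chain_sequentially(source: Iterable[Any], *transformations: Callable[[Any], Any]) -> List[Any]:
        """Hypothetical helper: push each row through the whole chain, one row at a time."""
        results: List[Any] = []

        def push(row: Any, remaining: List[Callable[[Any], Any]]) -> None:
            if not remaining:
                results.append(row)  # the row went through every step
                return
            head, *tail = remaining
            for out in _outputs(head, row):
                push(out, tail)

        for row in source:
            push(row, list(transformations))
        return results


    def uppercase(x: str) -> str:
        return x.upper()  # always exactly one output: use return


    def split_letters(x: str):
        for letter in x:  # one output per letter: use yield
            yield letter


    print(run_chain_sequentially(['foo', 'bar'], uppercase, split_letters))
    # ['F', 'O', 'O', 'B', 'A', 'R']

Note how `uppercase` produces exactly one row per input, while `split_letters` fans each input out into several rows.
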

Next
::::

You now know all the basic concepts necessary to build (batch-like) data processors.

If you're confident with this part, let's move on to a more real-world example, using files and nice console output.

.. todo:: link to next page

docs/tutorial/basics2.rst Normal file

@ -0,0 +1,46 @@
First steps - Working with files
================================

Bonobo would not be of much use if the aim were only to uppercase small lists of strings. In fact, Bonobo should not be
used if you don't expect any gain from parallelizing tasks.

Let's take the following graph as an example:

.. graphviz::

    digraph {
        rankdir = LR;
        "A" -> "B" -> "C";
    }

The execution strategy does a bit of work behind the scenes, wrapping every component in a thread (assuming you're
using the :class:`bonobo.ThreadPoolExecutorStrategy`), which allows it to start running `B` as soon as `A` has yielded
the first line of data, and `C` as soon as `B` has yielded the first line of data, even if `A` or `B` still have data to
yield.

The great thing is that you generally don't have to think about it. Just be aware that your components will be run in
parallel, and don't worry too much about blocking components, as they won't block their siblings.
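
The pipelining described above can be illustrated with plain standard-library threads and queues. This is a simplified
sketch, not how :class:`bonobo.ThreadPoolExecutorStrategy` is implemented: `run_pipeline` and `_stage` are hypothetical
helpers, the sketch runs one thread per component, only supports callables that return exactly one row, and has no
error handling beyond making sure downstream stages stop.

.. code-block:: python

    import queue
    import threading
    from typing import Any, Callable, Iterable, List

    _END = object()  # sentinel telling the next stage that no more rows will come


    def _stage(func: Callable[[Any], Any], inbox: queue.Queue, outbox: queue.Queue) -> None:
        """Process rows as soon as they arrive and forward each result downstream."""
        try:
            while True:
                row = inbox.get()
                if row is _END:
                    return
                outbox.put(func(row))
        finally:
            outbox.put(_END)  # always signal downstream, even if func raised


    def run_pipeline(source: Iterable[Any], *funcs: Callable[[Any], Any]) -> List[Any]:
        """Hypothetical sketch: one thread per component, linked by FIFO queues."""
        queues = [queue.Queue() for _ in range(len(funcs) + 1)]
        threads = [
            threading.Thread(target=_stage, args=(func, queues[i], queues[i + 1]))
            for i, func in enumerate(funcs)
        ]
        for thread in threads:
            thread.start()

        # Feed rows one at a time: the first stage can start on the first row
        # while later rows are still being produced.
        for row in source:
            queues[0].put(row)
        queues[0].put(_END)

        results = []
        while True:
            row = queues[-1].get()
            if row is _END:
                break
            results.append(row)
        for thread in threads:
            thread.join()
        return results


    print(run_pipeline(['foo', 'bar', 'baz'], str.upper, lambda s: s + '!'))
    # ['FOO!', 'BAR!', 'BAZ!']

Because each stage has a single thread reading from a FIFO queue, a linear chain like this one keeps rows in order.
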

That being said, let's try to write a more realistic transformation.

Reading a file
::::::::::::::

There are a few component builders available in **Bonobo** that let you read files. You should at least know about the
following:

* :class:`bonobo.FileReader` (aliased as :func:`bonobo.from_file`)
* :class:`bonobo.JsonFileReader` (aliased as :func:`bonobo.from_json`)
* :class:`bonobo.CsvFileReader` (aliased as :func:`bonobo.from_csv`)

Reading a file is as simple as using one of those. For this example, we'll use a text file that was generated using
Bonobo from the "liste-des-cafes-a-un-euro" dataset made available by Mairie de Paris under the Open Database License
(ODbL). You can `explore the original dataset <https://opendata.paris.fr/explore/dataset/liste-des-cafes-a-un-euro/information/>`_.

You'll need the example dataset, available in **Bonobo**'s repository.

.. code-block:: python

    from bonobo import FileReader, run

    run(
        FileReader('examples/datasets/cheap_coffeeshops_in_paris.txt'),
        print,
    )
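
If you're curious about what a file-reading component boils down to, here is a minimal sketch written as a plain
generator, in the same spirit as `generate_data` in the previous chapter. `read_lines` is a hypothetical name and this
is not how :class:`bonobo.FileReader` is implemented; it simply yields one line at a time, without the trailing newline.
The demo writes a small temporary file with made-up content so that the example is self-contained.

.. code-block:: python

    import os
    import tempfile
    from typing import Iterator


    def read_lines(path: str, encoding: str = 'utf-8') -> Iterator[str]:
        """Hypothetical component: yield each line of a text file, without its newline."""
        with open(path, encoding=encoding) as f:
            for line in f:
                yield line.rstrip('\n')


    # Self-contained demo: write a small temporary file, then read it back.
    with tempfile.NamedTemporaryFile('w', suffix='.txt', delete=False, encoding='utf-8') as tmp:
        tmp.write('first line\nsecond line\n')

    try:
        lines = list(read_lines(tmp.name))
    finally:
        os.remove(tmp.name)

    print(lines)  # ['first line', 'second line']

Reading lazily, one line at a time, is what lets downstream components start working before the whole file is read.
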