starting to write docs, taking decisions on public api
23
docs/_templates/index.html
vendored
@@ -1,22 +1,20 @@
 {% extends "layout.html" %}
-{% set title = _('Overview') %}
+{% set title = _('Bonobo — Data processing for humans') %}
 {% block body %}

-<div style="border: 2px solid red; font-weight: bold;">
-Migration in progress, things may be broken for now. Please give us some time to finish painting the walls.
+<div style="border: 2px solid red; font-weight: bold; margin: 1em; padding: 1em">
+Rewrite in progress, things may be broken for now. Please give us some time to finish painting the walls.
 </div>

-<h1>{{ _('Welcome to Bonobo\'s Documentation') }}</h1>
-
-<div style="text-align: center;">
-<img class="logo" src="{{ pathto('_static/bonobo.png', 1) }}" title="Bonobo"
+<h1 style="text-align: center">
+<img class="logo" src="{{ pathto('_static/bonobo.png', 1) }}" title="Bonobo" alt="Bonobo"
 style=" width: 128px; height: 128px;"/>
-</div>
+</h1>

 <p>
 {% trans %}
-Bonobo is a line-by-line data-processing toolkit for python 3.5+ emphasizing simplicity and atomicity of
-data transformations using a simple directed graph of python callables.
+<strong>Bonobo</strong> is a line-by-line data-processing toolkit for python 3.5+ emphasizing simple and
+atomic data transformations defined using a directed graph of plain old python callables.
 {% endtrans %}
 </p>

@@ -71,9 +69,8 @@
 <table class="contentstable">
 <tr>
 <td>
-<p class="biglink"><a class="biglink" href="{{ pathto("tutorial") }}">{% trans %}First steps with
-Bonobo{% endtrans %}</a><br/>
-<span class="linkdescr">{% trans %}overview of basic features{% endtrans %}</span></p>
+<p class="biglink"><a class="biglink" href="{{ pathto("tutorial/basics") }}">{% trans %}First steps{% endtrans %}</a><br/>
+<span class="linkdescr">{% trans %}quick overview of basic features{% endtrans %}</span></p>
 </td>
 <td>
 {%- if hasdoc('search') %}
12
docs/conf.py
@@ -12,8 +12,14 @@ import bonobo
 # -- General configuration ------------------------------------------------

 extensions = [
-    'sphinx.ext.autodoc', 'sphinx.ext.doctest', 'sphinx.ext.intersphinx', 'sphinx.ext.todo', 'sphinx.ext.coverage',
-    'sphinx.ext.ifconfig', 'sphinx.ext.viewcode'
+    'sphinx.ext.autodoc',
+    'sphinx.ext.doctest',
+    'sphinx.ext.intersphinx',
+    'sphinx.ext.todo',
+    'sphinx.ext.coverage',
+    'sphinx.ext.ifconfig',
+    'sphinx.ext.viewcode',
+    'sphinx.ext.graphviz',
 ]

 # Add any paths that contain templates here, relative to this directory.
@@ -95,6 +101,8 @@ html_additional_pages = {'index': 'index.html'}
 html_static_path = ['_static']
 html_show_sphinx = False

+graphviz_output_format = 'svg'
+
 # -- Options for HTMLHelp output ------------------------------------------

 # Output file base name for HTML help builder.
34
docs/install.rst
Normal file
@@ -0,0 +1,34 @@
Installation
============


.. todo::

    better install docs, especially on how to use a different fork, etc.

Install with pip
::::::::::::::::

.. code-block:: shell-session

    $ pip install bonobo

Install from source
:::::::::::::::::::

.. code-block:: shell-session

    $ pip install git+https://github.com/python-bonobo/bonobo.git@master#egg=bonobo

Editable install
::::::::::::::::

If you plan on making patches to Bonobo, you should install it as an "editable" package.


.. code-block:: shell-session

    $ pip install --editable git+https://github.com/python-bonobo/bonobo.git@master#egg=bonobo

Note: `-e` is the shorthand version of `--editable`.
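
Whichever install method is used, one way to check the result from Python is to ask the import system whether it can locate the package. This is only a minimal sketch: it relies on the standard library's `importlib.util.find_spec`, and the helper name `is_installed` is made up for this example.

```python
import importlib.util


def is_installed(module_name):
    """Return True if the import system can locate ``module_name``."""
    return importlib.util.find_spec(module_name) is not None


# Prints True once one of the pip commands above has succeeded.
print('bonobo importable:', is_installed('bonobo'))
```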
146
docs/tutorial/basics.rst
Normal file
@@ -0,0 +1,146 @@
First steps - Basic concepts
============================

To get started with Bonobo, first install it:

.. code-block:: shell-session

    $ pip install bonobo

See :doc:`/install` if you're looking for more options.

Let's write a first data transformation
:::::::::::::::::::::::::::::::::::::::

We'll write a simple component that just uppercases everything. In **Bonobo**, a component is a plain old Python
callable, no more, no less.

.. code-block:: python

    def uppercase(x: str):
        return x.upper()

OK, this is kind of simple, and you could even use `str.upper` directly instead of writing a wrapper. The type annotations
are not used, but they can make your code much more readable (and may be used as validators in the future).

To run this, we need two more things: a generator that feeds data, and something that outputs it.

.. code-block:: python

    def generate_data():
        yield 'foo'
        yield 'bar'
        yield 'baz'

    def output(x: str):
        print(x)

That should do the job. Now, let's chain the three callables together and run them.

.. code-block:: python

    from bonobo import run

    run(generate_data, uppercase, output)

This is the simplest data transformation possible, and we run it using the `run` helper, which hides the underlying object
composition necessary to actually run the callables in parallel. A more flexible, but slightly more verbose, way to do the
same thing would be:

.. code-block:: python

    from bonobo import Graph, ThreadPoolExecutorStrategy

    graph = Graph()
    graph.add_chain(generate_data, uppercase, output)

    executor = ThreadPoolExecutorStrategy()
    executor.execute(graph)

Depending on what you're doing, you may use the shorthand helper or the verbose version. Always favor the shorter one
if you don't need to tune the graph or the execution strategy.
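
To make the data flow concrete, here is a rough sequential sketch in plain Python of what chaining means: every value yielded by the first callable is passed through each following callable, in order. This is not how Bonobo executes a graph (it schedules the callables in threads). `run_sequentially` is a hypothetical helper written for this illustration only, and it assumes that every step after the source returns exactly one value.

```python
def run_sequentially(source, *steps):
    """Feed each value produced by ``source`` through ``steps``, in order.

    Hypothetical helper for illustration; not part of Bonobo.
    """
    results = []
    for value in source():
        for step in steps:
            value = step(value)
        results.append(value)
    return results


def generate_data():
    yield 'foo'
    yield 'bar'
    yield 'baz'


def uppercase(x: str):
    return x.upper()


print(run_sequentially(generate_data, uppercase))  # ['FOO', 'BAR', 'BAZ']
```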

Definitions
:::::::::::

* Graph
* Component
* Executor

.. todo:: Definitions, and substitute vague terms in the page by the exact term defined here

Summary
:::::::

Let's rewrite this using builtin functions and methods, then explain the few concepts involved:

.. code-block:: python

    from bonobo import Graph, ThreadPoolExecutorStrategy

    # Represent our data processor as a simple directed graph of callables.
    graph = Graph(
        (x for x in ('foo', 'bar', 'baz')),
        str.upper,
        print,
    )

    # Use a thread pool.
    executor = ThreadPoolExecutorStrategy()

    # Run the thing.
    executor.execute(graph)

Or the shorthand version, which you should prefer if you don't need fine tuning:

.. code-block:: python

    from bonobo import run

    run(
        iter(['foo', 'bar', 'baz']),
        str.upper,
        print,
    )

Both methods are strictly equivalent (see :func:`bonobo.run`). When in doubt, favour the shorter one.

Takeaways
:::::::::

① The :class:`bonobo.Graph` class is used to represent a data-processing pipeline.

It can represent simple list-like linear graphs, like here, but it can also represent much more complex graphs, with
branches and cycles.

This is what the graph we defined looks like:

.. graphviz::

    digraph {
        rankdir = LR;
        "iter(['foo', 'bar', 'baz'])" -> "str.upper" -> "print";
    }
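
As a mental model for branches, a directed graph can be pictured as a mapping from each node to the nodes it feeds. The sketch below uses a plain dict with invented node names. It is an assumption made for illustration and says nothing about how `bonobo.Graph` stores its nodes.

```python
# Hypothetical adjacency mapping: node name -> names of the nodes it feeds.
edges = {
    'extract': ['uppercase', 'count'],  # a branch: two downstream nodes
    'uppercase': ['print'],
    'count': ['print'],
    'print': [],
}


def downstream(node, edges):
    """Return every node reachable from ``node``, in depth-first order."""
    seen = []
    stack = list(reversed(edges[node]))
    while stack:
        current = stack.pop()
        if current not in seen:  # also guards against cycles
            seen.append(current)
            stack.extend(reversed(edges[current]))
    return seen


print(downstream('extract', edges))  # ['uppercase', 'print', 'count']
```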


② Transformations are simple Python callables. Anything that can be called can be used as a transformation. Callables can
either `return` or `yield` data to send it to the next step. Regular functions (using `return`) should be preferred if
each call is guaranteed to return exactly one result, while generators (using `yield`) should be preferred if the
number of output lines for a given input varies.
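
The difference between the two styles can be sketched with two plain Python callables. The names `double` and `split_words` are invented for this example, and the calls below invoke them directly rather than through Bonobo.

```python
def double(x: int):
    # Exactly one output per input: a regular function using ``return``.
    return x * 2


def split_words(line: str):
    # Zero, one or many outputs per input: a generator using ``yield``.
    for word in line.split():
        yield word


print(double(21))                                       # 42
print(list(split_words('data processing for humans')))  # four words
print(list(split_words('')))                            # [] (no output at all)
```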

③ The graph is then executed using an `ExecutionStrategy`. For now, let's focus only on
:class:`bonobo.ThreadPoolExecutorStrategy`, which uses an underlying `concurrent.futures.ThreadPoolExecutor` to
schedule calls in a pool of threads. More generally, the strategy is what determines how the graph is actually executed.

④ Before actually executing the callables, the `ExecutorStrategy` instance wraps each component in a `context`,
whose responsibility is to hold the state, so that the components themselves stay stateless. We'll expand on this later.


Next
::::

You now know all the basic concepts necessary to build (batch-like) data processors.

If you're confident with this part, let's move on to a more real-world example, using files and nice console output.

.. todo:: link to next page
46
docs/tutorial/basics2.rst
Normal file
@@ -0,0 +1,46 @@
First steps - Working with files
================================

Bonobo would not be of much use if the aim were only to uppercase small lists of strings. In fact, Bonobo should not be used
if you don't expect any gain from parallelizing tasks.

Let's take the following graph as an example:

.. graphviz::

    digraph {
        rankdir = LR;
        "A" -> "B" -> "C";
    }

The execution strategy does a bit of behind-the-scenes work, wrapping every component in a thread (assuming you're using
the :class:`bonobo.ThreadPoolExecutorStrategy`). This allows `B` to start running as soon as `A` has yielded its first line
of data, and `C` as soon as `B` has yielded its first line of data, even if `A` or `B` still have data to yield.

The great thing is that you generally don't have to think about it. Just be aware that your components will run in
parallel, and don't worry too much about blocking components, as they won't block their siblings.
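
This streaming behaviour can be imitated with the standard library alone. The sketch below is an assumption about the general technique (one thread per stage, with queues connecting the stages), not Bonobo's actual implementation. `source_stage`, `stage` and the `END` sentinel are hypothetical names.

```python
import queue
from concurrent.futures import ThreadPoolExecutor

END = object()  # sentinel marking the end of a stream


def source_stage(generator, out_queue):
    """Stage A: push each generated value downstream as soon as it exists."""
    for value in generator():
        out_queue.put(value)
    out_queue.put(END)


def stage(func, in_queue, out_queue):
    """Stages B and C: process values one at a time, as they arrive."""
    while True:
        value = in_queue.get()
        if value is END:
            out_queue.put(END)
            return
        out_queue.put(func(value))


def generate_data():
    yield 'foo'
    yield 'bar'
    yield 'baz'


a_to_b, b_to_c, results = queue.Queue(), queue.Queue(), queue.Queue()

# Leaving the ``with`` block waits for the three stages to finish.
with ThreadPoolExecutor(max_workers=3) as pool:
    pool.submit(source_stage, generate_data, a_to_b)
    pool.submit(stage, str.upper, a_to_b, b_to_c)
    pool.submit(stage, lambda s: s + '!', b_to_c, results)

collected = []
while True:
    item = results.get()
    if item is END:
        break
    collected.append(item)

print(collected)  # ['FOO!', 'BAR!', 'BAZ!']
```

Because each stage only waits on its own input queue, a slow stage delays the stages after it but never prevents the earlier ones from producing more data.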

That being said, let's try to write a more realistic transformation.

Reading a file
::::::::::::::

There are a few component builders available in **Bonobo** that let you read files. You should at least know about the following:

* :class:`bonobo.FileReader` (aliased as :func:`bonobo.from_file`)
* :class:`bonobo.JsonFileReader` (aliased as :func:`bonobo.from_json`)
* :class:`bonobo.CsvFileReader` (aliased as :func:`bonobo.from_csv`)

Reading a file is as simple as using one of those. For this example, we'll use a text file that was generated using
Bonobo from the "liste-des-cafes-a-un-euro" dataset made available by Mairie de Paris under the Open Database
License (ODbL). You can `explore the original dataset <https://opendata.paris.fr/explore/dataset/liste-des-cafes-a-un-euro/information/>`_.
You'll need the example dataset, available in **Bonobo**'s repository.

.. code-block:: python

    from bonobo import FileReader, run

    run(
        FileReader('examples/datasets/cheap_coffeeshops_in_paris.txt'),
        print,
    )
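
For intuition, a line-by-line file source can be written as a plain generator function. The sketch below is not `bonobo.FileReader`: `make_line_reader` is a hypothetical stand-in, and the example writes a small temporary file so that it runs without the example dataset.

```python
import os
import tempfile


def make_line_reader(path):
    """Build a callable that yields the lines of ``path`` without newlines.

    Hypothetical stand-in for illustration; not Bonobo's FileReader.
    """
    def read_lines():
        with open(path, encoding='utf-8') as handle:
            for line in handle:
                yield line.rstrip('\n')
    return read_lines


# Write a tiny sample file so the example is self-contained.
with tempfile.NamedTemporaryFile('w', suffix='.txt', delete=False, encoding='utf-8') as tmp:
    tmp.write('first line\nsecond line\n')

reader = make_line_reader(tmp.name)
lines = list(reader())
os.remove(tmp.name)

print(lines)  # ['first line', 'second line']
```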