more docs, still wip

Romain Dorgueil
2016-12-27 22:05:21 +01:00
parent 4e2e9fa140
commit ad36f9368a
20 changed files with 560 additions and 106 deletions


@ -19,22 +19,64 @@
</p>
<p>
{% trans %}
It was originally created as a programmatic ETL (extract transform load) python 2.7+ library called rdc.etl,
to process tens of millions of pieces of retail stock information, and served this purpose for years.
{% trans history_url=pathto('history') %}
It was originally created as a programmatic ETL (extract, transform, load) library for python 2.7+ (see
<a href="{{ history_url }}">history</a>), but is now much more than that. Of course you can still write ETL jobs within minutes, but
you can also write web crawlers, twitter bots, streaming API endpoints...
{% endtrans %}
</p>
<p>
{% trans %}
Bonobo is a clean full-rewrite of rdc.etl, for python 3.5+, and is now used for many ETL as well as non-ETL
use cases. For example, it's pretty easy to write selenium based web crawlers, or twitter bots. As long as
a use case can be represented as a graph of callables interacting, Bonobo can be used.
As long as your use case can be represented as a graph of interacting callables, Bonobo can be used.
{% endtrans %}
</p>
<h2>Features</h2>
<h2 style="margin-bottom: 0">{% trans %}Documentation{% endtrans %}</h2>
<table class="contentstable">
<tr>
<td>
<p class="biglink"><a class="biglink" href="{{ pathto("tutorial/basics") }}">{% trans %}First steps{% endtrans %}</a><br/>
<span class="linkdescr">{% trans %}quick overview of basic features{% endtrans %}</span></p>
</td>
<td>
{%- if hasdoc('search') %}
<p class="biglink"><a class="biglink" href="{{ pathto("search") }}">{% trans %}
Search{% endtrans %}</a><br/>
<span class="linkdescr">{% trans %}search the documentation{% endtrans %}</span></p>{%- endif %}
</td>
</tr>
<tr>
<td>
<p class="biglink"><a class="biglink" href="{{ pathto("guide/index") }}">{% trans %}
Guides{% endtrans %}</a><br/>
<span class="linkdescr">{% trans %}for a complete overview{% endtrans %}</span>
</p>
</td>
<td>
<p class="biglink"><a class="biglink" href="{{ pathto("reference/index") }}">{% trans %}References{% endtrans %}</a>
<br/>
<span class="linkdescr">{% trans %}all functions, classes, terms{% endtrans %}</span>
</p>
</td>
</tr>
<tr>
<td>
<p class="biglink"><a class="biglink" href="{{ pathto("changes") }}">{% trans %}
Cookbook{% endtrans %}</a><br/>
<span class="linkdescr">{% trans %}examples and recipes{% endtrans %}</span></p>
</td>
<td>
<p class="biglink"><a class="biglink" href="{{ pathto("changes") }}">{% trans %}
Contribute{% endtrans %}</a><br/>
<span class="linkdescr">{% trans %}contributor guide{% endtrans %}</span></p>
</td>
</tr>
</table>
<h2>Features</h2>
<ul>
<li>
@ -63,51 +105,6 @@
</li>
</ul>
<h2 style="margin-bottom: 0">{% trans %}Documentation{% endtrans %}</h2>
<table class="contentstable">
<tr>
<td>
<p class="biglink"><a class="biglink" href="{{ pathto("tutorial/basics") }}">{% trans %}First steps{% endtrans %}</a><br/>
<span class="linkdescr">{% trans %}quick overview of basic features{% endtrans %}</span></p>
</td>
<td>
{%- if hasdoc('search') %}
<p class="biglink"><a class="biglink" href="{{ pathto("search") }}">{% trans %}
Search{% endtrans %}</a><br/>
<span class="linkdescr">{% trans %}search the documentation{% endtrans %}</span></p>{%- endif %}
</td>
</tr>
<tr>
<td>
<p class="biglink"><a class="biglink" href="{{ pathto("contents") }}">{% trans %}
Guides{% endtrans %}</a><br/>
<span class="linkdescr">{% trans %}for a complete overview{% endtrans %}</span>
</p>
</td>
<td>
{%- if hasdoc('genindex') %}
<p class="biglink"><a class="biglink" href="{{ pathto("genindex") }}">{% trans %}References{% endtrans %}</a>
<br/>
<span class="linkdescr">{% trans %}all functions, classes, terms{% endtrans %}</span>
</p>{%- endif %}
</td>
</tr>
<tr>
<td>
<p class="biglink"><a class="biglink" href="{{ pathto("changes") }}">{% trans %}
Cookbook{% endtrans %}</a><br/>
<span class="linkdescr">{% trans %}examples and recipes{% endtrans %}</span></p>
</td>
<td>
<p class="biglink"><a class="biglink" href="{{ pathto("changes") }}">{% trans %}
Contribute{% endtrans %}</a><br/>
<span class="linkdescr">{% trans %}contributor guide{% endtrans %}</span></p>
</td>
</tr>
</table>
<p>{% trans %}
You can also download PDF/EPUB versions of the Bonobo documentation:
<a href="http://readthedocs.org/projects/bonobo/downloads/pdf/stable/">PDF version</a>,

21
docs/_templates/sidebarintro.html vendored Normal file

@ -0,0 +1,21 @@
<h3>About Bonobo</h3>
<p>
Bonobo is a data-processing toolkit for python 3.5+, with emphasis on simplicity, atomicity and testability. Oh,
and performance, too!
</p>
<h3>Other Formats</h3>
<p>
You can download the documentation in other formats as well:
</p>
<ul>
<li><a href="http://jinja.pocoo.org/docs/jinja-docs.pdf">as PDF</a>
<li><a href="http://jinja.pocoo.org/docs/jinja-docs.zip">as zipped HTML</a>
</ul>
<h3>Useful Links</h3>
<ul>
<li><a href="https://bonobo-project.org/">Bonobo project's Website</a></li>
<li><a href="http://pypi.python.org/pypi/bonobo">Bonobo @ PyPI</a></li>
<li><a href="http://github.com/python-bonobo/bonobo">Bonobo @ github</a></li>
</ul>


@ -4,7 +4,7 @@
import sys
import os
sys.path.insert(0, os.path.abspath('..'))
sys.path.insert(0, os.path.abspath('.'))
sys.path.insert(0, os.path.abspath('_themes'))
import bonobo
@ -82,13 +82,23 @@ html_theme_options = {
}
html_sidebars = {
'**': [
'index': [
'sidebarlogo.html',
'navigation.html',
'localtoc.html',
'relations.html',
'sidebarintro.html',
'sourcelink.html',
'searchbox.html',
'sidebarinfos.html',
],
'**': [
'sidebarlogo.html',
'navigation.html',
'localtoc.html',
'relations.html',
'sourcelink.html',
'searchbox.html',
'sidebarinfos.html',
]
}

4
docs/guide/index.rst Normal file

@ -0,0 +1,4 @@
Guides
======
.. todo:: write the fucking doc!

22
docs/history.rst Normal file

@ -0,0 +1,22 @@
History
=======
**Bonobo** is a full rewrite of **rdc.etl**.
**rdc.etl** is a full python 2.7+ ETL library whose development started in 2012 and which was open-sourced in 2013 (see
`first commit <https://github.com/rdcli/rdc.etl/commit/fdbc11c0ee7f6b97322693bd0051d63677b06a93>`_).
Although the first commit in **Bonobo** happened late 2016, it builds on a lot of code, lessons and experience
gained from **rdc.etl**.
It would have been counterproductive to migrate the same codebase:
* a lot of mistakes were impossible to fix in a backward compatible way (for example, transformations were stateful,
making them more complicated to write and impossible to reuse; a lot of effort also went into giving the components
multiple inputs and multiple outputs, although that is useless in 99% of cases, etc.).
* we also wanted to develop something that took advantage of modern python versions, hence the choice of 3.5+.
**rdc.etl** still runs data transformation jobs, in both python 2.7 and 3, and we reuse whatever is possible to
build Bonobo.
You can read


@ -3,9 +3,11 @@ Bonobo
.. toctree::
:maxdepth: 2
:caption: Contents:
install
tutorial/index
guide/index
reference/index
genindex
modindex
* :ref:`genindex`
* :ref:`modindex`
* :ref:`search`


@ -0,0 +1,22 @@
bonobo.compat package
=====================
Submodules
----------
bonobo.compat.pandas module
---------------------------
.. automodule:: bonobo.compat.pandas
:members:
:undoc-members:
:show-inheritance:
Module contents
---------------
.. automodule:: bonobo.compat
:members:
:undoc-members:
:show-inheritance:


@ -0,0 +1,85 @@
bonobo.core package
===================
Subpackages
-----------
.. toctree::
bonobo.core.strategies
Submodules
----------
bonobo.core.bags module
-----------------------
.. automodule:: bonobo.core.bags
:members:
:undoc-members:
:show-inheritance:
bonobo.core.contexts module
---------------------------
.. automodule:: bonobo.core.contexts
:members:
:undoc-members:
:show-inheritance:
bonobo.core.errors module
-------------------------
.. automodule:: bonobo.core.errors
:members:
:undoc-members:
:show-inheritance:
bonobo.core.graphs module
-------------------------
.. automodule:: bonobo.core.graphs
:members:
:undoc-members:
:show-inheritance:
bonobo.core.inputs module
-------------------------
.. automodule:: bonobo.core.inputs
:members:
:undoc-members:
:show-inheritance:
bonobo.core.plugins module
--------------------------
.. automodule:: bonobo.core.plugins
:members:
:undoc-members:
:show-inheritance:
bonobo.core.services module
---------------------------
.. automodule:: bonobo.core.services
:members:
:undoc-members:
:show-inheritance:
bonobo.core.stats module
------------------------
.. automodule:: bonobo.core.stats
:members:
:undoc-members:
:show-inheritance:
Module contents
---------------
.. automodule:: bonobo.core
:members:
:undoc-members:
:show-inheritance:


@ -0,0 +1,38 @@
bonobo.core.strategies package
==============================
Submodules
----------
bonobo.core.strategies.base module
----------------------------------
.. automodule:: bonobo.core.strategies.base
:members:
:undoc-members:
:show-inheritance:
bonobo.core.strategies.executor module
--------------------------------------
.. automodule:: bonobo.core.strategies.executor
:members:
:undoc-members:
:show-inheritance:
bonobo.core.strategies.naive module
-----------------------------------
.. automodule:: bonobo.core.strategies.naive
:members:
:undoc-members:
:show-inheritance:
Module contents
---------------
.. automodule:: bonobo.core.strategies
:members:
:undoc-members:
:show-inheritance:


@ -0,0 +1,22 @@
bonobo.ext.console package
==========================
Submodules
----------
bonobo.ext.console.plugin module
--------------------------------
.. automodule:: bonobo.ext.console.plugin
:members:
:undoc-members:
:show-inheritance:
Module contents
---------------
.. automodule:: bonobo.ext.console
:members:
:undoc-members:
:show-inheritance:


@ -0,0 +1,30 @@
bonobo.ext.jupyter package
==========================
Submodules
----------
bonobo.ext.jupyter.plugin module
--------------------------------
.. automodule:: bonobo.ext.jupyter.plugin
:members:
:undoc-members:
:show-inheritance:
bonobo.ext.jupyter.widget module
--------------------------------
.. automodule:: bonobo.ext.jupyter.widget
:members:
:undoc-members:
:show-inheritance:
Module contents
---------------
.. automodule:: bonobo.ext.jupyter
:members:
:undoc-members:
:show-inheritance:


@ -0,0 +1,46 @@
bonobo.ext package
==================
Subpackages
-----------
.. toctree::
bonobo.ext.console
bonobo.ext.jupyter
Submodules
----------
bonobo.ext.couchdb_ module
--------------------------
.. automodule:: bonobo.ext.couchdb_
:members:
:undoc-members:
:show-inheritance:
bonobo.ext.opendatasoft module
------------------------------
.. automodule:: bonobo.ext.opendatasoft
:members:
:undoc-members:
:show-inheritance:
bonobo.ext.selenium module
--------------------------
.. automodule:: bonobo.ext.selenium
:members:
:undoc-members:
:show-inheritance:
Module contents
---------------
.. automodule:: bonobo.ext
:members:
:undoc-members:
:show-inheritance:


@ -0,0 +1,30 @@
bonobo.io package
=================
Submodules
----------
bonobo.io.file module
---------------------
.. automodule:: bonobo.io.file
:members:
:undoc-members:
:show-inheritance:
bonobo.io.json module
---------------------
.. automodule:: bonobo.io.json
:members:
:undoc-members:
:show-inheritance:
Module contents
---------------
.. automodule:: bonobo.io
:members:
:undoc-members:
:show-inheritance:

21
docs/reference/bonobo.rst Normal file

@ -0,0 +1,21 @@
bonobo package
==============
Subpackages
-----------
.. toctree::
bonobo.compat
bonobo.core
bonobo.ext
bonobo.io
bonobo.util
Module contents
---------------
.. automodule:: bonobo
:members:
:undoc-members:
:show-inheritance:


@ -0,0 +1,62 @@
bonobo.util package
===================
Submodules
----------
bonobo.util.compat module
-------------------------
.. automodule:: bonobo.util.compat
:members:
:undoc-members:
:show-inheritance:
bonobo.util.helpers module
--------------------------
.. automodule:: bonobo.util.helpers
:members:
:undoc-members:
:show-inheritance:
bonobo.util.iterators module
----------------------------
.. automodule:: bonobo.util.iterators
:members:
:undoc-members:
:show-inheritance:
bonobo.util.lifecycle module
----------------------------
.. automodule:: bonobo.util.lifecycle
:members:
:undoc-members:
:show-inheritance:
bonobo.util.time module
-----------------------
.. automodule:: bonobo.util.time
:members:
:undoc-members:
:show-inheritance:
bonobo.util.tokens module
-------------------------
.. automodule:: bonobo.util.tokens
:members:
:undoc-members:
:show-inheritance:
Module contents
---------------
.. automodule:: bonobo.util
:members:
:undoc-members:
:show-inheritance:

13
docs/reference/index.rst Normal file

@ -0,0 +1,13 @@
References
==========
.. todo:: write the fucking doc!
.. toctree::
:maxdepth: 4
bonobo
* :ref:`genindex`
* :ref:`modindex`
* :ref:`search`


@ -0,0 +1,3 @@
bonobo
======


@ -1,29 +1,37 @@
First steps - Basic concepts
============================
Basic concepts
==============
To begin with Bonobo, you should first install it:
To begin with Bonobo, you need to install it in a working python 3.5+ environment:
.. code-block:: shell-session
$ pip install bonobo
See :doc:`install` if you're looking for more options.
See :doc:`/install` for more options.
Let's write a first data transformation
:::::::::::::::::::::::::::::::::::::::
We'll write a simple component that just uppercases everything. In **Bonobo**, a component is a plain old python
callable, no more, no less.
We'll start with the simplest components we can.
In **Bonobo**, a component is a plain old python callable, no more, no less. Let's write one that takes a string and
uppercases it.
.. code-block:: python
def uppercase(x: str):
return x.upper()
Ok, this is kind of simple, and you can even use `str.upper` directly instead of writing a wrapper. The type annotations
are not used, but can make your code much more readable (and may be used as validators in the future).
Pretty straightforward.
To run this, we need two more things: a generator that feeds data, and something that outputs it.
You could even use :func:`str.upper` directly instead of writing a wrapper, as a type's method (unbound) will take an
instance of this type as its first parameter (what you'd call `self` in your method).
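As a quick check of the claim above, here is a minimal sketch in plain python (no **Bonobo** involved) comparing the
wrapper with the unbound method. The `words` list is only illustrative data.

```python
def uppercase(x: str):
    return x.upper()


words = ['foo', 'bar', 'baz']

# Calling the wrapper: the string is passed as a regular argument.
wrapped = [uppercase(word) for word in words]

# Calling the method through the type: the string takes the place of `self`.
direct = [str.upper(word) for word in words]

print(wrapped == direct)
```

Both lists hold the same values, which is why either callable can be used as the component.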
The type annotations written here are not used, but can make your code much more readable, and may very well be used as
validators in the future.
Let's write two more components: a generator to produce the data to be transformed, and something that outputs it,
because, yeah, feedback is cool.
.. code-block:: python
@ -35,7 +43,10 @@ To run this, we need two more things: a generator that feeds data, and something
def output(x: str):
print(x)
That should do the job. Now, let's chain the three callables together and run them.
Once again, you could have skipped the pain of writing this and simply used an iterable to generate the data and the
builtin :func:`print` for the output, but we'll stick to writing our own components for now.
Let's chain the three components together and run the transformation:
.. code-block:: python
@ -43,44 +54,33 @@ That should do the job. Now, let's chain the three callables together and run th
run(generate_data, uppercase, output)
This is the simplest data transformation possible, and we run it using the `run` helper that hides the underlying object
composition necessary to actually run the callables in parallel. The more flexible, but a bit more verbose, way to do the
same thing would be:
.. graphviz::
.. code-block:: python
digraph {
rankdir = LR;
"generate_data" -> "uppercase" -> "output";
}
from bonobo import Graph, ThreadPoolExecutorStrategy
graph = Graph()
graph.add_chain(generate_data, uppercase, output)
executor = ThreadPoolExecutorStrategy()
executor.execute(graph)
We use the :func:`bonobo.run` helper that hides the underlying object composition necessary to actually run the
components in parallel, because it's simpler.
Depending on what you're doing, you may use the shorthand helper method, or the verbose one. Always favor the shorter,
if you don't need to tune the graph or the execution strategy.
if you don't need to tune the graph or the execution strategy (see below).
Definitions
:::::::::::
Diving in
:::::::::
* Graph
* Component
* Executor
.. todo:: Definitions, and substitute vague terms in the page by the exact term defined here
Summary
:::::::
Let's rewrite this using builtin functions and methods, then explain the few concepts available here:
Let's rewrite it using the builtin functions :func:`str.upper` and :func:`print` instead of our own wrappers, and expand
the :func:`bonobo.run()` helper so you see what's inside...
.. code-block:: python
from bonobo import Graph, ThreadPoolExecutorStrategy
# Represent our data processor as a simple directed graph of callables.
graph = Graph(
(x for x in 'foo', 'bar', 'baz'),
graph = Graph()
graph.add_chain(
('foo', 'bar', 'baz'),
str.upper,
print,
)
@ -91,19 +91,22 @@ Let's rewrite this using builtin functions and methods, then explain the few con
# Run the thing.
executor.execute(graph)
Or the shorthand version, that you should prefer if you don't need fine tuning:
We also switched our generator for a tuple: **Bonobo** will wrap it as a generator itself if it's iterable but not
callable.
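To picture what "wrapping an iterable as a generator" could mean, here is a rough sketch in plain python. The
`as_component` helper is hypothetical and written only for illustration; it is not taken from **Bonobo**'s code, whose
actual implementation may differ.

```python
from collections.abc import Iterable


def as_component(obj):
    """Hypothetical helper (not Bonobo's API): return something callable."""
    if callable(obj):
        # Already usable as a component, keep it untouched.
        return obj
    if isinstance(obj, Iterable):
        # Not callable but iterable: expose it through a generator function.
        def generate():
            yield from obj
        return generate
    raise TypeError('Expected a callable or an iterable, got {!r}.'.format(obj))


source = as_component(('foo', 'bar', 'baz'))
values = list(source())
```

Under this assumption, a tuple and a hand-written generator function end up interchangeable as the first element of a
chain.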
The shorthand version with builtins would look like this:
.. code-block:: python
from bonobo import run
run(
iter(['foo', 'bar', 'baz']),
('foo', 'bar', 'baz'),
str.upper,
print,
)
Both methods are strictly equivalent (see :func:`bonobo.run`). When in doubt, favour the shorter.
Both methods are strictly equivalent (see :func:`bonobo.run`). When in doubt, prefer the shorter version.
Takeaways
:::::::::
@ -123,17 +126,26 @@ This is what the graph we defined looks like:
}
Transformations are simple python callables. Whatever can be called can be used as a transformation. Callables can
`Components` are simple python callables. Whatever can be called can be used as a `component`. Callables can
either `return` or `yield` data to send it to the next step. Regular functions (using `return`) should be preferred if
each call is guaranteed to return exactly one result, while generators (using `yield`) should be preferred if the
number of output lines for a given input varies.
③ The graph is then executed using an `ExecutionStrategy`. For now, let's focus only on
③ The `graph` is then executed using an `ExecutionStrategy`. In this tutorial, we'll only use
:class:`bonobo.ThreadPoolExecutorStrategy`, which uses an underlying `concurrent.futures.ThreadPoolExecutor` to
schedule calls in a pool of threads, but basically this strategy is what determines the actual behaviour of execution.
④ Before actually executing the callables, the `ExecutorStrategy` instance will wrap each component in a `context`,
whose responsibility is to hold the state, to keep the components stateless. We'll expand on this later.
④ Before actually executing the `components`, the `ExecutorStrategy` instance will wrap each component in a `context`,
whose responsibility is to hold the state, to keep the `components` stateless. We'll expand on this later.
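The difference between `return` and `yield` components can be sketched without **Bonobo** at all. In the example
below, `split_words` and `call_component` are hypothetical names introduced for illustration, and the loop runs
everything sequentially, which is a simplification and not how an execution strategy schedules calls.

```python
import inspect


def uppercase(x: str):
    # Exactly one output per input: a regular function using `return`.
    return x.upper()


def split_words(line: str):
    # Varying number of outputs per input: a generator using `yield`.
    for word in line.split():
        yield word


def call_component(component, value):
    """Hypothetical helper: call a component and normalize its output to a list."""
    result = component(value)
    if inspect.isgenerator(result):
        return list(result)
    return [result]


outputs = []
for line in ('foo bar', 'baz'):
    for word in call_component(split_words, line):
        outputs.extend(call_component(uppercase, word))

print(outputs)
```

The first input produces two outputs and the second only one, which is the situation where a generator is the better
fit.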
Concepts and definitions
::::::::::::::::::::::::
* Component
* Graph
* Executor
.. todo:: Definitions, and substitute vague terms in the page by the exact term defined here
Next
@ -141,6 +153,6 @@ Next
You now know all the basic concepts necessary to build (batch-like) data processors.
If you're confident with this part, let's get to a more real world example, using files and nice console output.
If you're confident with this part, let's get to a more real world example, using files and nice console output:
:doc:`basics2`
.. todo:: link to next page


@ -1,5 +1,5 @@
First steps - Working with files
================================
Working with files
==================
Bonobo would not be of any use if the aim was to uppercase small lists of strings. In fact, Bonobo should not be used
if you don't expect any gain from parallelization of tasks.

14
docs/tutorial/index.rst Normal file

@ -0,0 +1,14 @@
First steps
===========
We tried hard to make **Bonobo** simple. We use simple python, and we believe it should be simple to learn.
We strongly advise that, even if you're an advanced python developer, you go through the whole tutorial, for two
reasons: it should be sufficient to do anything possible with **Bonobo**, and it's a good moment to learn the few
concepts you'll see everywhere in the software.
.. toctree::
:maxdepth: 2
basics
basics2