Merge branch 'master' of github.com:python-bonobo/bonobo into develop

Romain Dorgueil
2017-05-22 18:47:08 +02:00
6 changed files with 99 additions and 10 deletions


@ -7,11 +7,12 @@ Data-processing for humans.
.. image:: https://img.shields.io/pypi/v/bonobo.svg
:target: https://pypi.python.org/pypi/bonobo
:alt: PyPI
.. image:: https://img.shields.io/pypi/pyversions/bonobo.svg
:target: https://pypi.python.org/pypi/bonobo
:alt: Versions
.. image:: https://readthedocs.org/projects/bonobo/badge/?version=0.3
.. image:: https://readthedocs.org/projects/bonobo/badge/?version=latest
:target: http://docs.bonobo-project.org/
:alt: Documentation

View File

@ -1 +1 @@
__version__ = '0.3.0a1'
__version__ = '0.3.0'

View File

@ -1,6 +1,51 @@
Changelog
=========
v.0.3.0 - 22 may 2017
:::::::::::::::::::::
Features
--------
* ContextProcessors can now be implemented by getting the "yield" value (v = yield x), shortening the teardown-only context processors by one line.
* File-related writers (file, csv, json ...) now return NOT_MODIFIED, making it easier to chain something after them.
* More consistent console output, nodes are now sorted in a topological order before display.
* Graph.add_chain(...) now takes _input and _output parameters the same way, accepting indexes, instances or names (subject to change).
* Graph.add_chain(...) now allows naming a chain, using the _name keyword argument, to easily reference its output later (subject to change).
* New settings module (bonobo.settings) reads the environment for some global configuration (DEBUG and PROFILE, for now).
* New Method subclass of Option allows using Configurable objects as decorators (see bonobo.nodes.filter.Filter for a simple example).
* New Filter transformation in standard library.
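The ``v = yield x`` form mentioned in the first item is a plain Python generator feature. The sketch below is not bonobo's implementation: ``audit_processor`` and ``drive`` are hypothetical names, used only to show how a generator can hand a value out at setup time and receive one back at teardown time.

```python
events = []


def audit_processor():
    # Hypothetical teardown-only processor: it hands out a resource,
    # then receives a value back from its driver when the work is done.
    final_value = yield 'resource'
    events.append(('teardown', final_value))


def drive(processor, work):
    # Hypothetical driver, standing in for whatever runs the processor.
    generator = processor()
    resource = next(generator)      # runs up to the yield: setup phase
    outcome = work(resource)
    try:
        generator.send(outcome)     # resumes after the yield: teardown phase
    except StopIteration:
        pass                        # the generator finished normally
    return outcome


result = drive(audit_processor, lambda resource: resource.upper())
```

The value passed to ``send`` becomes the result of the ``yield`` expression inside the generator, which is why the teardown code can read it without an extra statement.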
Internal features
-----------------
* Better ContextProcessor implementation, which avoids using a decorator on the parent class. Now works with Configurable instances like Option, Service and Method.
* ContextCurrifier replaces the logic that was in NodeExecutionContext to set up and tear down the context stack. Maybe the name is not ideal.
* All builtin transformations are of course updated to use the improved API, and should be 100% backward compatible.
* The "core" package has been dismantled, and its rare remaining members are now in "structs" and "util" packages.
* Standard transformation library has been moved under the bonobo.nodes package. It does not change anything if you used bonobo.* (which you should).
* ValueHolder is now more restrictive and no longer allows using .value.
Miscellaneous
-------------
* Code cleanup, dead code removal, more tests, etc.
* More documentation.
v.0.2.4 - 2 may 2017
::::::::::::::::::::
* Cosmetic release for PyPI package page formatting. Same content as v.0.2.3.
v.0.2.3 - 1 may 2017
:::::::::::::::::::::
* Positional options are now supported, in a backward compatible way. All FileHandler subclasses support their path argument as positional.
* Better transformation lifecycle management (still work needed here).
* Windows continuous integration now works.
* Refactoring the "API" a lot to have a much cleaner first glance at it.
* More documentation, tutorials, and tuning project artifacts.
v.0.2.2 - 28 apr 2017
:::::::::::::::::::::
@ -36,4 +81,4 @@ Initial release
* Input/output MUX DEMUX removed, maybe no need for that in the real world. May come back, but not in 1.0
* Change dependency policy. We need to include only the very basic requirements (and very required). Everything related
to transforms that we may not use (bs, sqla, ...) should be optional dependencies.
* Execution strategies, threaded by default.
* Execution strategies, threaded by default.

View File

@ -3,6 +3,7 @@ F.A.Q.
List of questions that came up about the project, in no particular order.
Too long; didn't read.
----------------------
@ -19,8 +20,22 @@ It's lean manufacturing for data.
.. note::
This is NOT a «big data» tool. We process around 5 millions database lines in around 1 hour with rdc.etl, bonobo
ancestor (algorithms are the same, we still need to run a bit of benchmarks).
This is NOT a «big data» tool, nor a «data analysis» tool. We process around 5 million database lines in around
1 hour with rdc.etl, bonobo's ancestor (algorithms are the same, we still need to run a bit of benchmarks).
What versions of python does bonobo support? Why not more?
----------------------------------------------------------
Bonobo is battle-tested against the latest python 3.5 and python 3.6. It may work well using other patch releases of those
versions, but we cannot guarantee it.
The main reasons for requiring 3.5+:
* Creating a tool that works well under both python 2 and 3 is a lot more work.
* Python 3 is nearly 10 years old. Consider moving on.
* Python 3.5 contains syntactic sugar that makes working with data a lot more convenient.
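The text does not say which syntactic sugar it has in mind. One candidate, offered here only as an illustration, is the extended unpacking added in Python 3.5 (PEP 448); the variable names below are made up for the example.

```python
# Merge two option dicts; later keys win (Python 3.5+).
defaults = {'delimiter': ',', 'encoding': 'utf-8'}
overrides = {'delimiter': ';'}
options = {**defaults, **overrides}

# Concatenate a list and a tuple into a single list of column names.
header = ['id', 'name']
extra = ('email',)
columns = [*header, *extra]
```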
Can a graph contain another graph?
----------------------------------
@ -30,8 +45,14 @@ No, not for now. There are no tools today in bonobo to insert a graph as a subgr
It would be great to allow it, but there are a few design questions behind this, like what node you use as input and
output of the subgraph, etc.
On the other hand, if you don't consider a graph as the container but as the nodes and edges it contains, it's pretty
easy to add a set of nodes and edges to a graph, and thus simulate a subgraph. But there will be more threads, more copies
of the same nodes, so it's not really an acceptable answer for big graphs. If it was possible to use a Graph as a
node, then the problem would be correctly solved.
It is something to be seriously considered post 1.0 (probably way post 1.0).
How would one access contextual data from a transformation? Are there parameter injections like pytest's fixtures?
------------------------------------------------------------------------------------------------------------------
@ -43,20 +64,26 @@ to find a better way to apply it.
To understand how it works today, look at https://github.com/python-bonobo/bonobo/blob/0.3/bonobo/io/csv.py#L63 and class hierarchy.
What is a plugin? Do I need to write one?
-----------------------------------------
Plugins are special classes added to an execution context, used to enhance or change the actual behavior of an execution
in a generic way. You don't need to write plugins to code transformation graphs.
Is there a difference between a transformation node and a regular python function or generator?
-----------------------------------------------------------------------------------------------
No.
Short answer: no.
Transformation callables are just regular callables, and there is nothing that differentiate it from regular python callables.
You can even use some callables both in an imperative programming context and in a transformation graph, no problem.
Longer answer: yes, sometimes, but you should not care. The function-based transformations are plain old python callables. The
class-based transformations can be plain-old-python-objects, but can also subclass Configurable, which brings a lot of
fancy features, like options, service injections, class factories as decorators...
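As a minimal sketch of the "plain old python callables" point, the functions below are ordinary Python and are called imperatively, without bonobo. The names ``extract``, ``transform`` and ``load`` are hypothetical; the same kind of callables could be used as nodes of a transformation graph.

```python
def extract():
    # A regular generator: yields one row at a time.
    yield 'foo'
    yield 'bar'


def transform(value):
    # A regular function: takes a row, returns a new one.
    return value.upper()


def load(value, sink):
    # A regular function with a side effect: stores the row.
    sink.append(value)


# Imperative use: plain function calls, no graph involved.
sink = []
for item in extract():
    load(transform(item), sink)
```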
Why did you include the word «marketing» in a commit message? Why is there a marketing-automation tag on the project? Isn't marketing evil?
-------------------------------------------------------------------------------------------------------------------------------------------
@ -83,6 +110,7 @@ See https://github.com/python-bonobo/bonobo/issues/1
Bonobo is not a replacement for pandas, nor dask, nor luigi, nor airflow... It may be a replacement for Pentaho, Talend
or other data integration suites but targets people more comfortable with code as an interface.
All those references to monkeys hurt my head. Bonobos are not monkeys.
----------------------------------------------------------------------
@ -96,6 +124,7 @@ known primate typing feature.»
See https://github.com/python-bonobo/bonobo/issues/24
Who is behind this?
-------------------
@ -104,6 +133,7 @@ Me (as an individual), and a few great people that helped me along the way. Not
The code, documentation, and surrounding material are created using spare time and may lack a bit of velocity. Feel free
to jump in so we can go faster!
Documentation seriously lacks X, there is a problem in Y...
-----------------------------------------------------------

View File

@ -1,7 +1,16 @@
Installation
============
Bonobo is `available on PyPI <https://pypi.python.org/pypi/bonobo>`_, and it's the easiest solution to get started.
Create an ETL project
:::::::::::::::::::::
If you only want to use Bonobo to code ETLs, your easiest option to get started is to use our
`cookiecutter template <https://github.com/python-bonobo/cookiecutter-bonobo>`_.
Install from PyPI
:::::::::::::::::
You can also install it directly from the `Python Package Index <https://pypi.python.org/pypi/bonobo>`_.
.. code-block:: shell-session

View File

@ -1,8 +1,6 @@
First steps
===========
Bonobo uses simple python and should be quick and easy to learn.
What is Bonobo?
:::::::::::::::
@ -13,10 +11,16 @@ Bonobo *is not* a statistical or data-science tool. If you're looking for a data
Bonobo is a lean manufacturing assembly line for data that lets you focus on the actual work instead of the plumbing.
Bonobo uses simple python and should be quick and easy to learn.
Tutorial
::::::::
Warning: the documentation is still in progress. Although all content here should be accurate, you may feel a lack of
completeness, for which we plead guilty and apologize. If something is blocking you, please come to our
`slack channel <https://bonobo-slack.herokuapp.com/>`_ and complain, we'll figure something out. If there is something
that did not block you but could be a no-go for others, please consider contributing to the docs.
.. toctree::
:maxdepth: 2
@ -43,6 +47,6 @@ Read about integrating external tools with bonobo
* :doc:`../guide/ext/docker`: run transformation graphs in isolated containers.
* :doc:`../guide/ext/jupyter`: run transformations within jupyter notebooks.
* :doc:`../guide/ext/selenium`: run
* :doc:`../guide/ext/selenium`: crawl the web using a real browser and work with the gathered data.
* :doc:`../guide/ext/sqlalchemy`: everything you need to interact with SQL databases.