Merge branch 'filesystem' into 0.2

This commit is contained in:
Romain Dorgueil
2017-04-28 06:33:37 +02:00
30 changed files with 423 additions and 266 deletions

@ -10,6 +10,7 @@ There are a few things that you should know while writing transformations graphs
:maxdepth: 2
purity
services
Third party integrations
::::::::::::::::::::::::

@ -1,21 +1,35 @@
Services and dependencies (draft implementation)
================================================
:Status: Draft implementation
:Stability: Alpha
:Last-Modified: 27 apr 2017
Most probably, you'll want to use external systems within your transformations. Those systems may include databases,
apis (using http, for example), filesystems, etc.
For a start, including those services hardcoded in your transformations can do the job, but you'll pretty soon feel
limited, for two main reasons:
You can start by hardcoding those services. That does the job, at first.
* Hardcoded and tightly linked dependencies make your transformation atoms hard to test.
If you're going a little further than that, you'll feel limited, for a few reasons:
* Hardcoded and tightly linked dependencies make your transformations hard to test, and hard to reuse.
* Processing data on your laptop is great, but being able to do it on different systems (or stages), in different
environments, is more realistic.
environments, is more realistic. You probably want to configure a different database on a staging environment,
preprod environment or production system. Maybe you have similar systems for different clients and want to select
the system at runtime. Etc.
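As a rough illustration of "selecting the system at runtime", here is a minimal sketch in plain Python. Everything in it is an assumption made for the example: the ``APP_ENV`` variable name, the ``select_services`` helper and the placeholder DSN strings are hypothetical and not part of bonobo.

```python
import os

# Hypothetical per-environment service definitions. The values are placeholder
# DSN strings; in practice they would be real service objects.
SERVICES_BY_ENV = {
    'dev': {'primary_sql_database': 'sqlite:///dev.db'},
    'staging': {'primary_sql_database': 'postgresql://staging-host/app'},
    'production': {'primary_sql_database': 'postgresql://prod-host/app'},
}


def select_services(environ=os.environ):
    """Return the services mapping for the environment named by APP_ENV.

    APP_ENV is an assumed variable name; it defaults to 'dev' when unset.
    """
    env = environ.get('APP_ENV', 'dev')
    try:
        return SERVICES_BY_ENV[env]
    except KeyError:
        raise ValueError('Unknown environment: {!r}'.format(env)) from None
```

The transformations themselves never need to know which environment they run in: only the mapping handed over at execution time changes.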
Service injection
:::::::::::::::::
To solve this problem, we introduce a light dependency injection system that basically allows you to define named
dependencies in your transformations, and provide an implementation at runtime.
To solve this problem, we introduce a light dependency injection system. It allows you to define named dependencies
in your transformations, and provide an implementation at runtime.
Class-based transformations
---------------------------
To define a service dependency in a class-based transformation, use :class:`bonobo.config.Service`, a special
descriptor (and subclass of :class:`bonobo.config.Option`) that will hold the service names and act as a marker
for runtime resolution of service instances.
Let's define such a transformation:
@ -24,7 +38,7 @@ Let's define such a transformation:
from bonobo.config import Configurable, Service
class JoinDatabaseCategories(Configurable):
database = Service(default='primary_sql_database')
database = Service('primary_sql_database')
def __call__(self, database, row):
return {
@ -35,28 +49,46 @@ Let's define such a transformation:
This piece of code tells bonobo that your transformation expects a service called "primary_sql_database", that will be
injected into your calls under the parameter name "database".
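To make the idea of name-based injection concrete, here is a self-contained sketch in plain Python. It does not use bonobo and is not how bonobo implements it: ``ServiceMarker`` and ``call_with_services`` are hypothetical names invented for this illustration, and the "database" is a plain dictionary.

```python
class ServiceMarker:
    """Illustrative marker holding a service name (not bonobo.config.Service)."""

    def __init__(self, name):
        self.name = name


class JoinCategories:
    # The attribute name ("database") is the parameter name used at call time,
    # the string is the name looked up in the services mapping.
    database = ServiceMarker('primary_sql_database')

    def __call__(self, database, row):
        return {**row, 'category': database['categories'][row['category_id']]}


def call_with_services(transformation, services, row):
    """Resolve each marker declared on the class, then call the transformation."""
    injected = {
        attr: services[marker.name]
        for attr, marker in vars(type(transformation)).items()
        if isinstance(marker, ServiceMarker)
    }
    return transformation(row=row, **injected)


# A fake "database" service, registered under the name the transformation expects.
services = {'primary_sql_database': {'categories': {1: 'fruits', 2: 'vegetables'}}}
result = call_with_services(JoinCategories(), services, {'id': 42, 'category_id': 2})
```

Swapping the dictionary registered under ``primary_sql_database`` for another object changes what the transformation receives, without touching the transformation itself.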
Function-based transformations
------------------------------
No implementation yet, but expect something similar to the class-based transformation (CBT) API, maybe using a `@Service(...)` decorator.
Execution
---------
Let's see how to execute it:
.. code-block:: python
import bonobo
bonobo.run(
[...extract...],
graph = bonobo.graph(
*before,
JoinDatabaseCategories(),
[...load...],
services={
'primary_sql_database': my_database_service,
}
*after,
)
if __name__ == '__main__':
bonobo.run(
graph,
services={
'primary_sql_database': my_database_service,
}
)
You can pass a dictionary, or any dictionary-like object, as the "services" named argument of the :func:`bonobo.run`
helper. The "dictionary-like" part is the key point here. Bonobo is not a dependency injection container (DIC) library,
and won't become one, so the provided implementation is deliberately basic. You can use a more capable library instead
of the provided stub: as long as it behaves the same way (i.e. implements a dictionary-like interface), the system will use it.
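As an example of what "dictionary-like" could mean, the sketch below builds a read-only mapping that creates each service on first access. ``LazyServices`` and ``make_database`` are hypothetical names for this illustration, and whether a given bonobo version accepts such an object in place of a plain ``dict`` is an assumption that should be checked against the implementation.

```python
from collections.abc import Mapping


class LazyServices(Mapping):
    """Read-only, dictionary-like container that builds each service on first access."""

    def __init__(self, factories):
        self._factories = dict(factories)  # service name -> zero-argument callable
        self._instances = {}               # service name -> instance already built

    def __getitem__(self, name):
        if name not in self._instances:
            # Raises KeyError for unknown names, like a regular dict would.
            self._instances[name] = self._factories[name]()
        return self._instances[name]

    def __iter__(self):
        return iter(self._factories)

    def __len__(self):
        return len(self._factories)


created = []


def make_database():
    """Stand-in factory; a real one would open a connection."""
    created.append('db')
    return object()


services = LazyServices({'primary_sql_database': make_database})
```

Subclassing ``collections.abc.Mapping`` provides ``get``, ``keys``, ``items`` and ``in`` for free, so the object behaves like a read-only dictionary while deferring the cost of creating each service.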
Future
::::::
Future and proposals
::::::::::::::::::::
This is the first proposed implementation and it will evolve, but it looks a lot like how we used bonobo's ancestor in
production.
You can expect to see the following features pretty soon:
May or may not happen, depending on discussions.
* Singleton or prototype based injection (to use spring terminology, see
https://www.tutorialspoint.com/spring/spring_bean_scopes.htm), allowing smart factory usage and efficient sharing of
@ -64,11 +96,43 @@ You can expect to see the following features pretty soon:
* Lazily resolved parameters, optionally overridden by command line or environment, so you can for example override the
database DSN or target filesystem on command line (or with shell environment).
* Pool based locks that ensure that only one (or n) transformations are using a given service at the same time.
* Simple config implementation, using a python file for config (ex: bonobo run ... --services=services_prod.py).
* Default configuration for services, using an optional callable (`def get_services(args): ...`). Maybe tie default
configuration to graph, but not really a fan because this is unrelated to graph logic.
* Default implementation for a service in a transformation or in the descriptor. Maybe not a good idea, because it
tends to push forward multiple instances of the same thing, but we maybe...
A few ideas on how it can be implemented, from the user perspective.
.. code-block:: python
# using call
http = Service('http.client')(requests)
# using more explicit call
http = Service('http.client').set_default_impl(requests)
# using a decorator
@Service('http.client')
def http(self, services):
import requests
return requests
# as a default in a subclass of Service
class HttpService(Service):
def get_default_impl(self, services):
import requests
return requests
# ... then use it as another service
http = HttpService('http.client')
This is under heavy development, let us know what you think (slack may be a good place for this).
This is under development, let us know what you think (slack may be a good place for this).
The basics already work, and you can try it.
Read more
:::::::::
todo: example code.
* See https://github.com/hartym/bonobo-sqlalchemy/blob/work-in-progress/bonobo_sqlalchemy/writers.py#L19 for example usage (work in progress).

@ -1,8 +1,7 @@
Installation
============
Install with pip
::::::::::::::::
Bonobo is `available on PyPI <https://pypi.python.org/pypi/bonobo>`_, and it's the easiest solution to get started.
.. code-block:: shell-session
@ -11,29 +10,61 @@ Install with pip
Install from source
:::::::::::::::::::
If you want to install an unreleased version, you can use git urls with pip. This is useful when using bonobo as a
dependency of your code and you want to try a forked version of bonobo with your software. You can use the git+http
string in your `requirements.txt` file. However, the best option for development on bonobo directly is not this one,
but editable installs (see below).
.. code-block:: shell-session
$ pip install git+https://github.com/python-bonobo/bonobo.git@master#egg=bonobo
$ pip install git+https://github.com/python-bonobo/bonobo.git@0.2#egg=bonobo
Editable install
::::::::::::::::
If you plan on making patches to Bonobo, you should install it as an "editable" package.
If you plan on making patches to Bonobo, you should install it as an "editable" package, which is a really great pip feature.
Pip will clone your repository in a source directory and create a symlink for it in the site-package directory of your
python interpreter.
.. code-block:: shell-session
$ pip install --editable git+https://github.com/python-bonobo/bonobo.git@master#egg=bonobo
$ pip install --editable git+https://github.com/python-bonobo/bonobo.git@0.2#egg=bonobo
Note: `-e` is the shorthand version of `--editable`.
.. note:: You can also use the `-e` flag instead of the long version.
If you can't find the "source" directory, try running this:
.. code-block:: shell-session
$ python -c "import bonobo; print(bonobo.__path__)"
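If you would rather locate a package without importing it, a sketch along these lines may help. It uses only the standard library; ``package_directory`` is a hypothetical helper name, and the example looks up the standard ``json`` package so that it runs without bonobo installed (replace the name with ``bonobo`` in practice).

```python
import importlib.util
import os


def package_directory(name):
    """Return the directory containing a top-level package, or None if not found."""
    spec = importlib.util.find_spec(name)
    if spec is None or spec.origin is None:
        # Not installed, or a namespace package without a single origin file.
        return None
    return os.path.dirname(spec.origin)


location = package_directory('json')
```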
Another option is to have a "local" editable install, which means you create the clone by yourself and make an editable install
from the local clone.
.. code-block:: shell-session
$ git clone git@github.com:python-bonobo/bonobo.git
$ cd bonobo
$ pip install --editable .
You can develop on this clone, but you probably want to add your own repository if you want to push code back and make pull requests.
I usually name the git remote for the main bonobo repository "upstream", and my own repository "origin".
.. code-block:: shell-session
$ git remote rename origin upstream
$ git remote add origin git@github.com:hartym/bonobo.git
Of course, replace my github username with the one you used to fork bonobo. You should be good to go!
Windows support
:::::::::::::::
We had some people report that there are problems on the windows platform, mostly due to terminal features. We're trying
to look into that but we don't have good windows experience, no windows box and not enough energy to provide serious
support there. If you have experience in this domain and you're willing to help, you're more than welcome!
There are problems on the windows platform, mostly due to the fact bonobo was not developed by experienced windows users.
We're trying to look into that but energy available to provide serious support on windows is very limited.
If you have experience in this domain and you're willing to help, you're more than welcome!
.. todo::