Merge remote-tracking branch 'upstream/master'
@@ -74,6 +74,6 @@ class ETLCommand(BaseCommand):
 self.stderr = OutputWrapper(ConsoleOutputPlugin._stderr, ending=CLEAR_EOL + '\n')
 self.stderr.style_func = lambda x: Fore.LIGHTRED_EX + Back.RED + '!' + Style.RESET_ALL + ' ' + x

-self.run(*args, **kwargs)
+self.run(*args, **options)

 self.stdout, self.stderr = _stdout_backup, _stderr_backup
@@ -112,7 +112,7 @@ Extract
 yield 'hello'
 yield 'world'

-This is a first transformation, written as a python generator, that will send some strings, one after the other, to its
+This is a first transformation, written as a `python generator <https://docs.python.org/3/glossary.html#term-generator>`_, that will send some strings, one after the other, to its
 output.

 Transformations that take no input and yields a variable number of outputs are usually called **extractors**. You'll
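Note on the hunk above: the documentation describes an extractor as a transformation that takes no input and yields a variable number of outputs. A minimal sketch in plain Python, with no |bonobo| import; the function name `extract` is an assumption for illustration, since the hunk does not show it:

```python
def extract():
    """Extractor sketch: takes no input, yields values one at a time."""
    yield 'hello'
    yield 'world'


# Consume the generator by hand to see the values in the order they are yielded.
rows = list(extract())
print(rows)
```

Because the function is an ordinary generator, it can be iterated and tested without running a job.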
@@ -44,7 +44,7 @@ Now, we need to write a `writer` transformation, and apply this context processo
 f.write(repr(row) + "\n")

 The `f` parameter will contain the value yielded by the context processors, in order of appearance. You can chain
-multiple context processors. To find about how to implement this, check the |bonobo| guides in the documentation.
+multiple context processors. To find out about how to implement this, check the |bonobo| guides in the documentation.

 Please note that the :func:`bonobo.config.use_context_processor` decorator will modify the function in place, but won't
 modify its behaviour. If you want to call it out of the |bonobo| job context, it's your responsibility to provide
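Note on the hunk above: since the decorator is described as leaving the function's behaviour unchanged, the writer can presumably be exercised outside a job by passing `f` yourself. A minimal sketch under that assumption; the name `write_repr_to_file`, the `*row` signature, and the use of `io.StringIO` as a stand-in file are illustrative and not taken from the hunk, and the decorator is left out so the block has no |bonobo| dependency:

```python
import io


def write_repr_to_file(f, *row):
    # Same body as the line shown in the hunk; `f` is any object with write().
    f.write(repr(row) + "\n")


# Outside a job, the caller supplies `f` explicitly.
buf = io.StringIO()
write_repr_to_file(buf, 'hello')
write_repr_to_file(buf, 'world')
print(buf.getvalue(), end='')
```

An in-memory buffer keeps the check free of filesystem side effects; a real file object opened by the caller would work the same way.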
@@ -144,7 +144,7 @@ Reading from files is done using the same logic as writing, except that you'll p
 def get_graph(**options):
     graph = bonobo.Graph()
     graph.add_chain(
-        bonobo.CsvReader('output.csv'),
+        bonobo.CsvReader('input.csv'),
         ...
     )
     return graph
@@ -2,9 +2,8 @@ Part 4: Services
 ================

 All external dependencies (like filesystems, network clients, database connections, etc.) should be provided to
-transformations as a service. It allows great flexibility, including the ability to test your transformations isolated
-from the external world, and being friendly to the infrastructure people (and if you're one of them, it's also nice to
-treat yourself well).
+transformations as a service. This will allow for great flexibility, including the ability to test your transformations isolated
+from the external world and easily switch to production (being user-friendly for people in system administration).

 In the last section, we used the `fs` service to access filesystems, we'll go even further by switching our `requests`
 call to use the `http` service, so we can switch the `requests` session at runtime. We'll use it to add an http cache,
@@ -24,7 +23,7 @@ Overriding services
 :::::::::::::::::::

 You can override the default services, or define your own services, by providing a dictionary to the `services=`
-argument of :obj:`bonobo.run`:
+argument of :obj:`bonobo.run`. First, let's rewrite get_services:

 .. code-block:: python

@@ -50,8 +49,8 @@ Let's replace the :obj:`requests.get` call we used in the first steps to use the
 def extract_fablabs(http):
     yield from http.get(FABLABS_API_URL).json().get('records')

-Tadaa, done! You're not anymore tied to a specific implementation, but to whatever :obj:`requests` compatible object the
-user want to provide.
+Tadaa, done! You're no more tied to a specific implementation, but to whatever :obj:`requests` -compatible object the
+user wants to provide.

 Adding cache
 ::::::::::::
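Note on the hunk above: because `extract_fablabs` only calls `http.get(...).json()`, any object offering those two methods can stand in for a `requests` session, which is what makes the transformation testable without network access. A minimal sketch; `FakeHTTP`, `FakeResponse`, the URL value, and the sample records are hypothetical test doubles, not part of the tutorial:

```python
FABLABS_API_URL = 'https://example.invalid/fablabs'  # placeholder, not the real endpoint


class FakeResponse:
    """Hypothetical stand-in for a response object: only json() is provided."""

    def __init__(self, payload):
        self._payload = payload

    def json(self):
        return self._payload


class FakeHTTP:
    """Hypothetical stand-in for a requests-compatible session: only get() is provided."""

    def __init__(self, payload):
        self._payload = payload
        self.requested = []  # records every URL passed to get()

    def get(self, url):
        self.requested.append(url)
        return FakeResponse(self._payload)


def extract_fablabs(http):
    # Same body as in the hunk: the session comes from the `http` argument.
    yield from http.get(FABLABS_API_URL).json().get('records')


http = FakeHTTP({'records': [{'name': 'lab-a'}, {'name': 'lab-b'}]})
records = list(extract_fablabs(http))
print(records)
```

The function itself stays unchanged between the test double and a real session; only the object passed as `http` differs.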
@@ -1,9 +1,7 @@
 Part 5: Projects and Packaging
 ==============================

-Until then, we worked with one file managing a job.
-
-Real life often involves more complicated setups, with relations and imports between different files.
+Throughout this tutorial, we have been working with one file managing a job but real life often involves more complicated setups, with relations and imports between different files.

 Data processing is something a wide variety of tools may want to include, and thus |bonobo| does not enforce any
 kind of project structure, as the target structure will be dictated by the hosting project. For example, a `pipelines`
@@ -17,7 +15,7 @@ Imports mechanism
 |bonobo| does not enforce anything on how the python import mechanism work. Especially, it won't add anything to your
 `sys.path`, unlike some popular projects, because we're not sure that's something you want.

-If you want to use imports, you should move your script in a python package, and it's up to you to have it setup
+If you want to use imports, you should move your script into a python package, and it's up to you to have it setup
 correctly.


@@ -36,8 +34,8 @@ your jobs in it. For example, it can be `mypkg.pipelines`.
 Creating a brand new package
 ::::::::::::::::::::::::::::

-Because you're maybe starting a project with the data-engineering part, then you may not have a python package yet. As
-it can be a bit tedious to setup right, there is an helper, using `Medikit <http://medikit.rdc.li/en/latest/>`_, that
+Because you may be starting a project involving some data-engineering, you may not have a python package yet. As
+it can be a bit tedious to setup right, there is a helper, using `Medikit <http://medikit.rdc.li/en/latest/>`_, that
 you can use to create a brand new project:

 .. code-block:: shell-session
@@ -72,7 +70,7 @@ created in this tutorial and extend it):
 * :doc:`/extension/jupyter`
 * :doc:`/extension/sqlalchemy`

-Then, you can either to jump head-first into your code, or you can have a better grasp at all concepts by
+Then, you can either jump head-first into your code, or you can have a better grasp at all concepts by
 :doc:`reading the full bonobo guide </guide/index>`.

 You should also `join the slack community <https://bonobo-slack.herokuapp.com/>`_ and ask all your questions there! No