Part 3: Working with Files
==========================

Writing to the console is nice, but let's be serious: the real world will require us to use files or external services.

Let's see how to use a few builtin writers with both local and remote filesystems.

Filesystems
:::::::::::

In |bonobo|, files are accessed within a **filesystem** service (a `fs FileSystem object
<https://docs.pyfilesystem.org/en/latest/builtin.html>`_).

By default, you'll get an instance of a local filesystem mapped to the current working directory as the `fs` service.
You'll learn more about services in the next step, but for now, let's just use it.

Writing to files
::::::::::::::::

To write to a file, we need an open file handle that stays available for the whole life of the transformation.

We'll use a context processor to do so. A context processor is very much like a
:obj:`contextlib.contextmanager`: |bonobo| uses it to run setup and teardown logic around objects that need to have
the same lifecycle as a job execution.

Let's write one that just handles opening and closing the file:

.. code-block:: python

    def with_opened_file(self, context):
        # The file stays open until the generator is resumed at teardown.
        with open('output.txt', 'w+') as f:
            yield f

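To picture this lifecycle without running a job, the same generator shape can be driven by
:func:`contextlib.contextmanager`. This is only an analogy: |bonobo| drives context processors itself, and the
`opened_file` helper and the `demo.txt` file name below are hypothetical.

```python
import contextlib
import os
import tempfile


@contextlib.contextmanager
def opened_file(path):
    # Setup: runs once, before any row is written.
    with open(path, 'w+') as f:
        # The yielded handle is what the writer receives on each call.
        yield f
    # Teardown: leaving the ``with`` block above closes the file.


path = os.path.join(tempfile.mkdtemp(), 'demo.txt')
with opened_file(path) as f:
    for row in [('foo',), ('bar',)]:
        f.write(repr(row) + '\n')

# The handle is closed once the block exits.
print(f.closed)
```

Everything before the `yield` is setup, everything after it is teardown, and the yielded value is shared by all
calls in between.
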
Now, we need to write a `writer` transformation and apply this context processor to it:

.. code-block:: python

    from bonobo.config import use_context_processor

    @use_context_processor(with_opened_file)
    def write_repr_to_file(f, *row):
        # `f` is the file handle yielded by `with_opened_file`.
        f.write(repr(row))

The `f` parameter receives the value yielded by the context processor. If you chain multiple context processors,
their values are passed in order of appearance.

Please note that the :func:`bonobo.config.use_context_processor` decorator will modify the function in place, but won't
modify its behaviour. If you want to call it outside of a |bonobo| job context, it's your responsibility to provide
the right parameters (here, the opened file).

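As a minimal sketch of such a direct call, for instance in a unit test, an in-memory buffer can stand in for the
opened file. The decorator is left out here only so that the snippet runs without |bonobo| installed; the `buffer`
name is an assumption of this example.

```python
import io


def write_repr_to_file(f, *row):
    # Same body as the transformation above.
    f.write(repr(row))


# Outside of a job, the caller supplies the file-like object itself.
buffer = io.StringIO()
write_repr_to_file(buffer, 'foo', 'bar')
print(buffer.getvalue())
```
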
Using the filesystem
::::::::::::::::::::

We opened the output file using a hardcoded filename and filesystem implementation. Writing flexible jobs includes the
ability to change the load targets at runtime, and |bonobo| suggests using the `fs` service to achieve this with files.

Let's rewrite our context processor to use it.

.. code-block:: python

    def with_opened_file(self, context):
        # Ask the job context for the `fs` service instead of calling `open()` directly.
        with context.get_service('fs').open('output.txt', 'w+') as f:
            yield f

The interface does not change much, but this small change allows the end user to change the filesystem implementation
at runtime, which is great for handling different environments (local development, staging servers, production, ...).

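The effect of swapping the implementation can be sketched with the standard library alone. Everything here is a
stand-in and not |bonobo| API: `DirectoryFS` is a hypothetical class that mimics only the `open()` method of a
filesystem object, a plain dict replaces the job context's service lookup, and `run` drives the generator by hand.

```python
import os
import tempfile


class DirectoryFS:
    """Hypothetical stand-in for a filesystem service rooted at a directory."""

    def __init__(self, root):
        self.root = root

    def open(self, path, mode='r'):
        return open(os.path.join(self.root, path), mode)


def with_opened_file(services):
    # Same shape as the context processor above, with a dict of services.
    with services['fs'].open('output.txt', 'w+') as f:
        yield f


def run(services, rows):
    # Drive the generator by hand: setup, write the rows, then teardown.
    processor = with_opened_file(services)
    f = next(processor)
    for row in rows:
        f.write(repr(row))
    processor.close()  # resumes the generator so the ``with`` block closes the file


dev_root, staging_root = tempfile.mkdtemp(), tempfile.mkdtemp()
run({'fs': DirectoryFS(dev_root)}, [('foo',)])
run({'fs': DirectoryFS(staging_root)}, [('bar',)])
```

Only the `fs` entry changes between the two runs; the processor and the writing loop stay the same.
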
Note that |bonobo| only provides a few services with a default implementation (actually, only `fs` and `http`), but
you can define all the services you want, depending on your system. You'll learn more about this in the next tutorial
chapter.

Using a different filesystem
::::::::::::::::::::::::::::

To change the `fs` implementation, you need to provide your implementation in the dict returned by `get_services()`.

Let's write to a remote location, which will be an Amazon S3 bucket. First, we need to install the driver:

.. code-block:: shell-session

    pip install fs-s3fs

Then, just provide the correct bucket to :func:`bonobo.open_fs`:

.. code-block:: python

    import bonobo

    def get_services(**options):
        return {
            'fs': bonobo.open_fs('s3://bonobo-examples')
        }

.. note::

    You must provide a bucket for which you have write permission, and it's up to you to set up your Amazon
    credentials in such a way that `boto` can access your AWS account.

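One possible way to pick the target per environment is to read the filesystem URL from an environment variable. This
is a sketch under assumptions: the `OUTPUT_FS_URL` variable name and the `resolve_fs_url` helper are hypothetical, and
falling back to `'.'` (the current directory) is a choice made for this example.

```python
import os


def resolve_fs_url(environ=os.environ):
    # OUTPUT_FS_URL is a hypothetical variable name; fall back to the current directory.
    return environ.get('OUTPUT_FS_URL', '.')


def get_services(**options):
    # Imported here so that the helper above stays usable without bonobo installed.
    import bonobo

    return {
        'fs': bonobo.open_fs(resolve_fs_url()),
    }
```

With this in place, setting `OUTPUT_FS_URL` to `s3://bonobo-examples` before running the job would write to the bucket,
and leaving it unset would keep the output local.
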
Using builtin writers
:::::::::::::::::::::

Until now, and to get a better understanding of what happens, we implemented our writers ourselves.

|bonobo| contains writers for a variety of standard file formats, and you're probably better off using the builtin ones.

Let's use a :obj:`bonobo.CsvWriter` instance instead, by replacing our custom transformation in the graph factory
function:

.. code-block:: python

    def get_graph(**options):
        graph = bonobo.Graph()
        graph.add_chain(
            ...
            bonobo.CsvWriter('output.csv'),
        )
        return graph

Reading from files
::::::::::::::::::

Reading from files is done using the same logic as writing, except that you'll probably have only one call to a reader.

Our example application does not include reading from files, but you can read the file we just wrote by using a
:obj:`bonobo.CsvReader` instance.

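A graph factory that reads the file back could look like the sketch below. It assumes the `output.csv` file written in
the previous section, and it uses `print` as the last node only to display each row; neither choice comes from the
example application.

```python
def get_graph(**options):
    # Imported inside the factory so that this module loads even where bonobo is absent.
    import bonobo

    graph = bonobo.Graph()
    graph.add_chain(
        bonobo.CsvReader('output.csv'),  # reads rows from the file through the `fs` service
        print,  # shows each row on the console
    )
    return graph
```
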
Atomic writes
:::::::::::::

.. include:: _todo.rst

Moving forward
::::::::::::::

You now know:

* How to use the filesystem (`fs`) service.
* How to read from files.
* How to write to files.
* How to substitute a service at runtime.

It's now time to jump to :doc:`4-services`.