Work in progress on documentation for 0.6
Transformations that take input and yield nothing are also called **loaders**. Loaders come in
different types, to work with various external systems.
Please note that, as a convenience and because the cost is marginal, most builtin `loaders` will send their
inputs to their output unmodified, so you can easily chain more than one loader, or apply more transformations after a
given loader.
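This pass-through behaviour can be sketched in plain Python (a simplified model, not bonobo's actual code): a loader performs its side effect, then yields its input row unchanged, which is what makes chaining possible.

```python
# Simplified model of a pass-through loader (not bonobo's actual code):
# each loader does its side effect, then yields the row unmodified so
# further loaders or transformations can consume the same rows.

def print_loader(row):
    print(row)            # side effect: "load" to the console
    yield row             # pass the input through unmodified

def make_list_loader(sink):
    def loader(row):
        sink.append(row)  # side effect: "load" into a list
        yield row
    return loader

rows = [("alice",), ("bob",)]
sink = []
list_loader = make_list_loader(sink)

# Chain: print_loader -> list_loader; both see every row.
for row in rows:
    for out1 in print_loader(row):
        for out2 in list_loader(out1):
            pass

# sink now contains both rows, untouched by the chain.
```

In a real bonobo job you would not write these loops yourself; the graph execution takes care of moving rows from node to node.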
Graph Factory
:::::::::::::


You now know:

* How to execute a job file.

* How to read the console output.

**Jump to** :doc:`2-jobs`


Part 2: Writing ETL Jobs
========================

What's an ETL job?
::::::::::::::::::

- data flow, stream processing
- each node, first in, first out
- parallelism

Each node has input rows; each row triggers one call, and each call receives the input row's fields as positional arguments (``*args``).
Each call can produce outputs, sent either using ``return`` or ``yield``.
Each output row is stored internally as a tuple (or a namedtuple-like structure), and within one node's output stream every row must have the same structure (same number of fields).
If you yield something which is not a tuple, bonobo will wrap it in a tuple of one element.
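This calling convention can be sketched with a toy runner (an illustration only, not bonobo's implementation; ``run_node`` and ``ensure_tuple`` are made-up names):

```python
import types

def ensure_tuple(value):
    # Non-tuple outputs are wrapped in a 1-element tuple.
    return value if isinstance(value, tuple) else (value,)

def run_node(node, rows):
    """Toy model: call `node` once per input row, passing the row's
    fields as *args, and collect normalized output tuples."""
    out = []
    for row in rows:
        result = node(*row)  # one call per row, fields as positional args
        if isinstance(result, types.GeneratorType):
            # yield: zero or more outputs per call
            out.extend(ensure_tuple(v) for v in result)
        elif result is not None:
            # return: at most one output per call
            out.append(ensure_tuple(result))
    return out

def double(x):
    yield x * 2  # not a tuple: normalized to (x * 2,)

rows = run_node(double, [(1,), (2,)])
# rows == [(2,), (4,)]
```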
By default, exceptions are not fatal in bonobo. If a call raises an error, bonobo will display the stack trace, increment the ``err`` counter for this node, and move on to the next input row.
Some errors are fatal, though. For example, if you pass a 2-element tuple to a node that takes 3 arguments, bonobo will raise an ``UnrecoverableTypeError`` and exit the current execution.
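The non-fatal error policy can be modeled the same way (again a toy driver, not bonobo's code; the ``stats`` dict stands in for bonobo's per-node counters):

```python
import traceback

def run_tolerant(node, rows):
    """Toy model of the error policy: a failing call is reported, the
    node's ``err`` counter is incremented, and processing continues."""
    stats = {"in": 0, "out": 0, "err": 0}
    outputs = []
    for row in rows:
        stats["in"] += 1
        try:
            outputs.append(node(*row))
            stats["out"] += 1
        except Exception:
            traceback.print_exc()  # display the stack trace...
            stats["err"] += 1      # ...count the error, move to next row
    return outputs, stats

def inverse(x):
    return 1 / x  # raises ZeroDivisionError for x == 0

outputs, stats = run_tolerant(inverse, [(1,), (0,), (4,)])
# stats == {"in": 3, "out": 2, "err": 1}
```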
Let's write one
:::::::::::::::

We'll create a job that does the following:

* Extract all the FabLabs from an open data API
* Apply a bit of formatting
* Geocode the address and normalize it, if we can
* Display it (in the next step, we'll learn about writing the result to a file)

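The steps above can be sketched in plain Python with stub data (the real extractor would call the open data API over HTTP, and a real geocoder would resolve addresses to coordinates; all names and sample rows below are made up for illustration):

```python
# Sketch of the job's shape with stub data; every value here is invented.

def extract():
    # Stand-in for the API call: yield one dict per FabLab.
    yield {"name": "fablab one", "address": "1 rue de Rivoli, Paris"}
    yield {"name": "fablab two", "address": "10 Downing Street, London"}

def format_row(row):
    # A bit of formatting: title-case the name.
    return {**row, "name": row["name"].title()}

def geocode(row):
    # Hypothetical step: a real job would look the address up here,
    # and normalize it if the lookup succeeds.
    return {**row, "coords": None}

def display(row):
    print(row)

# Wire the steps as a simple chain (bonobo would run this as a graph).
results = []
for row in extract():
    row = geocode(format_row(row))
    display(row)
    results.append(row)
```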
Moving forward
::::::::::::::



Part 3: Working with Files
==========================

* Filesystems
* Reading files
* Writing files
* Writing files to S3
* Atomic writes ???

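The atomic-writes item usually refers to the following pattern: write to a temporary file in the target's directory, then rename it over the target, so readers never observe a half-written file. This is a generic sketch of that pattern, not bonobo's API (``atomic_write`` is a made-up name):

```python
import os
import tempfile

def atomic_write(path, data):
    """Write `data` to `path` atomically: readers see either the old
    content or the new content, never a partially written file."""
    directory = os.path.dirname(os.path.abspath(path))
    # Temporary file in the same directory, so the rename never
    # crosses a filesystem boundary.
    fd, tmp_path = tempfile.mkstemp(dir=directory)
    try:
        with os.fdopen(fd, "w") as f:
            f.write(data)
        os.replace(tmp_path, path)  # atomic on POSIX and Windows
    except BaseException:
        os.unlink(tmp_path)  # clean up the temp file on failure
        raise

# Usage:
with tempfile.TemporaryDirectory() as d:
    target = os.path.join(d, "out.txt")
    atomic_write(target, "hello")
    with open(target) as f:
        content = f.read()
# content == "hello"
```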
Moving forward
::::::::::::::



Part 5: Projects and Packaging
==============================

Until now, we have worked with a single file managing one job. But real life is about sets of jobs working together within a project.
Let's see how to move from this single-file layout to a package.

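As a sketch of where we are heading, such a project might be laid out like this (a hypothetical layout; every name below is for illustration only):

```text
myproject/
├── myproject/
│   ├── __init__.py
│   ├── fablabs.py        # the job we wrote in part 2
│   └── files.py          # the file-based jobs from part 3
├── setup.py              # or an equivalent packaging config
└── README.rst
```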
Moving forward
::::::::::::::