Work in progress on documentation for 0.6
Part 2: Writing ETL Jobs
========================
What's an ETL job?
:::::::::::::::::::
- An ETL job is a data flow: rows stream from one node to the next.
- Each node processes its input rows first in, first out.
- Nodes run in parallel.
Each node has input rows; each row triggers one call, and the fields of that row are passed to the call as positional arguments (``*args``).
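
For instance (the node and its field names below are only an illustration, not part of this chapter's job), a two-field input row is unpacked into two positional arguments:

.. code-block:: python

    def uppercase_name(first_name, last_name):
        # Called once per input row: the row ("john", "doe") becomes
        # the call uppercase_name("john", "doe").
        return first_name.upper(), last_name.upper()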
Each call can have outputs, sent either using ``return`` or ``yield``.
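
As a quick sketch (both nodes are hypothetical), ``return`` sends at most one output row per call, while ``yield`` lets one call send several:

.. code-block:: python

    def double(x):
        # One output row per call, sent with return.
        return x, x * 2

    def count_to(n):
        # Several output rows for a single input row, sent with yield.
        for i in range(n):
            yield i,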
Each output row is stored internally as a tuple (or a namedtuple-like structure), and each output row must have the same structure (the same number of fields, i.e. the same tuple length).
If you yield something which is not a tuple, bonobo will wrap it in a tuple of one element.
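
A minimal sketch of both cases (node names made up for the example):

.. code-block:: python

    def as_pair(x):
        # Already a tuple: stored as a two-field row. Every row yielded by
        # this node keeps the same two-field structure.
        yield x, x * 2

    def as_scalar(x):
        # Not a tuple: bonobo wraps it into the one-element row (x * 2,).
        yield x * 2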
By default, exceptions are not fatal in bonobo. If a call raises an error, bonobo will display the stack trace, increment the "err" counter for this node and move on to the next input row.
Some errors are fatal, though. For example, if you pass a 2-element tuple to a node that takes 3 arguments, bonobo will raise an ``UnrecoverableTypeError`` and exit the current execution.
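
A rough illustration of both situations (these nodes are not part of the tutorial job):

.. code-block:: python

    def parse_price(raw_price):
        # float("not a number") raises ValueError: bonobo prints the
        # traceback, increments this node's "err" counter and moves on
        # to the next input row.
        return float(raw_price),

    def add(a, b, c):
        # If an upstream node sends 2-element rows to this 3-argument
        # node, the mismatch is fatal: bonobo raises
        # UnrecoverableTypeError and stops the current execution.
        return a + b + c,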
Let's write one
:::::::::::::::
We'll create a job that does the following (a first sketch follows the list):
* Extract all the FabLabs from an open data API
* Apply a bit of formatting
* Geocode the address and normalize it, if we can
* Display it (in the next step, we'll learn how to write the result to a file).
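
Here is a rough first sketch of that job. The API URL, the record field names and the pass-through ``geocode`` step are placeholders to be filled in as we go; ``bonobo.Graph`` and ``bonobo.run`` are the real building blocks, and ``requests`` is assumed to be installed:

.. code-block:: python

    import bonobo
    import requests

    # Placeholder endpoint: substitute the real open data API URL here.
    API_URL = 'https://example.org/api/fablabs.json'


    def extract_fablabs():
        # One yield per FabLab record returned by the API (assuming the
        # response body is a JSON list of records).
        for record in requests.get(API_URL).json():
            yield record


    def format_fablab(record):
        # A bit of formatting: keep only the fields we care about.
        # The field names are assumptions about the dataset.
        yield record.get('name', '').strip(), record.get('address', '').strip()


    def geocode(name, address):
        # Placeholder for the geocode/normalize step; a real implementation
        # would call a geocoder here and yield coordinates as well.
        yield name, address


    graph = bonobo.Graph(
        extract_fablabs,
        format_fablab,
        geocode,
        print,  # display each row; a writer node will replace this in the next step
    )

    if __name__ == '__main__':
        bonobo.run(graph)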
Moving forward
::::::::::::::