From e4eba5dd9b5b10871e02d9643a4663dadf766ff8 Mon Sep 17 00:00:00 2001
From: Cameron Yick
Date: Sat, 10 Feb 2018 17:16:50 -0500
Subject: [PATCH 1/2] Fix spelling of independent + proper nouns + minor grammar

---
 docs/guide/introduction.rst | 14 +++++++-------
 1 file changed, 7 insertions(+), 7 deletions(-)

diff --git a/docs/guide/introduction.rst b/docs/guide/introduction.rst
index 2e18fa3..7662485 100644
--- a/docs/guide/introduction.rst
+++ b/docs/guide/introduction.rst
@@ -7,10 +7,10 @@ can understand if it could be a good fit for your use cases.
 How it works?
 :::::::::::::

-**Bonobo** is an **Extract Transform Load** framework aimed at coders, hackers, or any other person who's at ease with
+**Bonobo** is an **Extract Transform Load** framework aimed at coders, hackers, or any other people who are at ease with
 terminals and source code files.

-It is a **data streaming** solution, that treat datasets as ordered collections of independant rows, allowing to process
+It is a **data streaming** solution, that treat datasets as ordered collections of independent rows, allowing to process
 them "first in, first out" using a set of transformations organized together in a directed graph.

 Let's take a few examples.
@@ -101,16 +101,16 @@ What is it not?
 |bonobo| is not:

 * A data science, or statistical analysis tool, which need to treat the dataset as a whole and not as a collection of
-  independant rows. If this is your need, you probably want to look at `pandas `_.
+  independent rows. If this is your need, you probably want to look at `pandas `_.

-* A workflow or scheduling solution for independant data-engineering tasks. If you're looking to manage your sets of
-  data processing tasks as a whole, you probably want to look at `airflow `_.
+* A workflow or scheduling solution for independent data-engineering tasks. If you're looking to manage your sets of
+  data processing tasks as a whole, you probably want to look at `Airflow `_.
   Although there is no |bonobo| extension yet that handles that, it does make sense to integrate |bonobo| jobs in an
   airflow (or other similar tool) workflow.

-* A big data solution, `as defined by wikipedia `_. We're aiming at "small
+* A big data solution, `as defined by Wikipedia `_. We're aiming at "small
   scale" data processing, which can be still quite huge for humans, but not for computers. If you don't know whether or
-  not this is sufficient for your needs, it probably means you're not in the "big data" land.
+  not this is sufficient for your needs, it probably means you're not in "big data" land.

 .. include:: _next.rst


From d1b54cb6edd4bf2a60e3f62f982ba079b0e34f31 Mon Sep 17 00:00:00 2001
From: Cameron Yick
Date: Sat, 10 Feb 2018 17:24:46 -0500
Subject: [PATCH 2/2] Fix spelling of independent in other documentation files

---
 docs/guide/graphs.rst       | 5 ++---
 docs/tutorial/0.5/tut01.rst | 3 +--
 2 files changed, 3 insertions(+), 5 deletions(-)

diff --git a/docs/guide/graphs.rst b/docs/guide/graphs.rst
index e59122c..bdfc502 100644
--- a/docs/guide/graphs.rst
+++ b/docs/guide/graphs.rst
@@ -22,12 +22,12 @@ Handling the data-flow this way brings the following properties:
   the order existing at the divergence point wont stay true at the convergence
   point.

-- **Parallelism**: each node run in parallel (by default, using independant
+- **Parallelism**: each node run in parallel (by default, using independent
   threads). This is useful as you don't have to worry about blocking calls.
   If a thread waits for, let's say, a database, or a network service, the other
   nodes will continue handling data, as long as they have input rows available.

-- **Independance**: the rows are independant from each other, making this way
+- **Independence**: the rows are independent from each other, making this way
   of working with data flows good for line-by-line data processing, but
   also not ideal for "grouped" computations (where an output depends on more
   than one line of input data). You can overcome this with rolling windows if
@@ -299,4 +299,3 @@ the CLI, and reading the source you should be able to figure out its usage quite


 .. include:: _next.rst
-
diff --git a/docs/tutorial/0.5/tut01.rst b/docs/tutorial/0.5/tut01.rst
index 97181ac..df26a33 100644
--- a/docs/tutorial/0.5/tut01.rst
+++ b/docs/tutorial/0.5/tut01.rst
@@ -78,7 +78,7 @@ Create a transformation graph

 Amongst other features, Bonobo will mostly help you there with the following:

-* Execute the transformations in independant threads
+* Execute the transformations in independent threads
 * Pass the outputs of one thread to other(s) thread(s) inputs.

 To do this, it needs to know what data-flow you want to achieve, and you'll use a :class:`bonobo.Graph` to describe it.
@@ -200,4 +200,3 @@ Next
 ::::

 Time to jump to the second part: :doc:`tut02`.
-