Data Warehouse

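Data Storage and Querying

  1. BigQuery (Google Cloud): managed data warehouse
  2. Athena (AWS): serverless SQL queries over data stored in S3
  3. Redshift (AWS): managed data warehouse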

Pipeline and Job Scheduling

  1. Apache Airflow (Python): pipeline management and scheduling
  2. mosql (Ruby): streams data from MongoDB to PostgreSQL
  3. luigi (Python): for complex pipelines of batch jobs
  4. stolos (Python): to simplify complex distributed pipelines
  5. Bubbles (Python): easy-to-use pipeline builder
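
What the tools above have in common is that they run jobs in dependency order. The following is a minimal sketch of that idea using only the Python standard library (graphlib, Python 3.9+). It does not use the API of any tool listed here; run_pipeline and the extract/transform/load task names are hypothetical and chosen for illustration only.

```python
from graphlib import TopologicalSorter


def run_pipeline(tasks, dependencies):
    """Run each task once, after all the tasks it depends on.

    tasks: dict mapping task name -> zero-argument callable
    dependencies: dict mapping task name -> set of upstream task names
    Returns the list of task names in the order they were run.
    """
    # Include tasks with no declared dependencies so they still run.
    graph = {name: set(dependencies.get(name, ())) for name in tasks}
    order = []
    # static_order() raises graphlib.CycleError if the graph has a cycle.
    for name in TopologicalSorter(graph).static_order():
        tasks[name]()
        order.append(name)
    return order


# Hypothetical three-step batch job: extract -> transform -> load.
log = []
tasks = {
    "load": lambda: log.append("load"),
    "extract": lambda: log.append("extract"),
    "transform": lambda: log.append("transform"),
}
dependencies = {"transform": {"extract"}, "load": {"transform"}}
order = run_pipeline(tasks, dependencies)
```

Real schedulers add what this sketch leaves out, such as retries, time-based scheduling, and distributed execution.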

Data Visualization

  1. Apache Superset (Apache 2.0; repository formerly named incubator-superset): business intelligence web application.

© 2016-2018, Lei Ma | Created with Sphinx