Difference between revisions of "Data Science"
Kourouklides (talk | contribs) |
Kourouklides (talk | contribs) |
||
(196 intermediate revisions by 7 users not shown) | |||
Line 1: | Line 1: | ||
− | This page contains resources about [https://en.wikipedia.org/wiki/Data_science Data Science], |
+ | This page contains resources about [https://en.wikipedia.org/wiki/Data_science Data Science], '''Data Engineering''' and [https://en.wikipedia.org/wiki/Data_management Data Management]. |
== Subfields and Concepts == |
== Subfields and Concepts == |
||
+ | * Agile Data Science |
||
* [[Machine Learning]] / Data Mining |
* [[Machine Learning]] / Data Mining |
||
− | * Exploratory Data Analysis |
+ | * Exploratory Data Analysis (EDA) |
− | * Data Preparation and Preprocessing |
+ | * Data Preparation and Data Preprocessing |
+ | * Data Fusion and Data Integration |
||
− | * Parallel/Distributed/Concurrent Computing for Machine Learning |
||
− | * Data |
+ | * Data Wrangling / Data Munging |
+ | * Data Scraping |
||
+ | * Data Sampling |
||
+ | * Data Cleaning |
||
* Data Visualization |
* Data Visualization |
||
+ | * Explainable AI (XAI) / Interpretable AI |
||
* Big Data |
* Big Data |
||
+ | * Data Engineering, Data Management and Databases |
||
+ | * High Performance/Parallel/Distributed/Cloud Computing for Machine Learning |
||
+ | * Concurrent/Multi-threaded Computing for Machine Learning |
||
+ | * Synchronous Communication (for Web Services) |
||
+ | ** Representational State Transfer (REST) |
||
+ | ** Remote Procedure Call (RPC) |
||
+ | ** Simple Object Access Protocol (SOAP) |
||
+ | * Asynchronous Communication / Asynchronous Messaging (for Web Services) |
||
+ | ** Message broker/Message bus/Event bus/Integration broker/Interface engine |
||
+ | ** Message queue |
||
+ | ** Asynchronous protocols |
||
+ | *** Advanced Message Queuing Protocol (AMQP) |
||
+ | *** MQ Telemetry Transport (MQTT) |
||
+ | * Messaging patterns |
||
+ | ** Fire-and-Forget / One-Way |
||
+ | ** Request-Response / Request-Reply |
||
+ | ** Publisher-Subscriber |
||
+ | ** Request-Callback |
||
+ | * Software Architecture |
||
+ | ** Monolithic Architecture |
||
+ | ** Microservices Architecture |
||
+ | ** Service-Oriented Architecture (SOA) |
||
+ | * Stream Processing |
||
== Online courses == |
== Online courses == |
||
=== Video Lectures === |
=== Video Lectures === |
||
− | * [https://www. |
+ | * [https://www.coursera.org/learn/competitive-data-science How to Win a Data Science Competition: Learn from Top Kagglers] - Coursera |
− | |||
=== Lecture Notes === |
=== Lecture Notes === |
||
− | * [https:// |
+ | * [https://goo.gl/VSTGUQ Data Science by Ioannis Kourouklides] |
* [https://www.csie.ntu.edu.tw/~cjlin/talks/bigdata-bilbao.pdf When <nowiki> [to use] </nowiki> and When Not to Use Distributed Machine Learning by Chih-Jen Lin] |
* [https://www.csie.ntu.edu.tw/~cjlin/talks/bigdata-bilbao.pdf When <nowiki> [to use] </nowiki> and When Not to Use Distributed Machine Learning by Chih-Jen Lin] |
||
* [https://medium.com/open-machine-learning-course/open-machine-learning-course-topic-1-exploratory-data-analysis-with-pandas-de57880f1a68 Open Machine Learning Course] (Medium) |
* [https://medium.com/open-machine-learning-course/open-machine-learning-course-topic-1-exploratory-data-analysis-with-pandas-de57880f1a68 Open Machine Learning Course] (Medium) |
||
* [http://www.mmds.org/#mooc Mining Massive Datasets by Jure Leskovec, Anand Rajaraman and Jeff Ullman] |
* [http://www.mmds.org/#mooc Mining Massive Datasets by Jure Leskovec, Anand Rajaraman and Jeff Ullman] |
||
+ | * [https://www.systems.ethz.ch/courses/fall2017/hadp Hardware Acceleration for Data Processing by Gustavo Alonso] |
||
+ | * [http://cs109.github.io/2015/ CS109: Data Science] |
||
==Books== |
==Books== |
||
+ | * Newman, S. (2021). ''Building Microservices: Designing Fine-Grained Systems''. 2nd Ed. O'Reilly Media. |
||
− | * Tukey, J. W. (1977). ''Exploratory data analysis''. Addison-Wesley. |
||
− | * |
+ | * Bellemare, A. (2020). ''Building Event-Driven Microservices: Leveraging Organizational Data at Scale''. O'Reilly Media. |
+ | * Richards, M., & Ford, N. (2020). ''Fundamentals of Software Architecture''. O'Reilly Media. |
||
− | * Leskovec, J., Rajaraman, A., & Ullman, J. D. (2014). Mining of massive datasets. Cambridge University Press. ([http://www.mmds.org/ link]) |
||
− | * |
+ | * Dean, A., & Crettaz, V. (2019). ''Event Streams in Action''. Manning. |
+ | * Richardson, C. (2018). ''Microservices Patterns''. Manning Publications. |
||
+ | * Pacheco, V. F. (2018). ''Microservice Patterns and Best Practices''. Packt Publishing. |
||
+ | * De la Torre, C., Wagner, B., & Rousos, M. (2018). ''.NET Microservices: Architecture for Containerized .NET Applications''. Microsoft Corporation. ([https://github.com/dzfweb/microsoft-microservices-book link]) |
||
+ | * Lanaro, G. (2017). ''Python High Performance''. Packt Publishing. |
||
+ | * Wickham, H., & Grolemund, G. (2017). ''R for Data Science''. O'Reilly Media. |
||
+ | * Kleppmann, M. (2017). ''Designing Data-Intensive Applications: The Big Ideas Behind Reliable, Scalable, and Maintainable Systems''. O'Reilly Media. |
||
+ | * VanderPlas, J. (2016). ''Python Data Science Handbook: Essential Tools for Working with Data''. O'Reilly Media. |
||
+ | * Pierfederici, F. (2016). ''Distributed Computing with Python''. Packt Publishing. |
||
+ | * Dunning, T., & Friedman, E. (2016). ''Streaming Architecture: New Designs Using Apache Kafka and MapR Streams.'' O'Reilly Media. |
||
* Nolan, D., & Lang, D. T. (2015). ''Data Science in R: A Case Studies Approach to Computational Reasoning and Problem Solving''. CRC Press. |
* Nolan, D., & Lang, D. T. (2015). ''Data Science in R: A Case Studies Approach to Computational Reasoning and Problem Solving''. CRC Press. |
||
* Elston, S. F. (2015). ''Data Science in the Cloud with Microsoft Azure Machine Learning and R.'' O'Reilly Media, Inc. |
* Elston, S. F. (2015). ''Data Science in the Cloud with Microsoft Azure Machine Learning and R.'' O'Reilly Media, Inc. |
||
* Grus, J. (2015). ''Data Science from Scratch: First Principles with Python''. O'Reilly Media. |
* Grus, J. (2015). ''Data Science from Scratch: First Principles with Python''. O'Reilly Media. |
||
− | * Madhavan, S. (2015). ''Mastering Python for Data Science''. Packt Publishing |
+ | * Madhavan, S. (2015). ''Mastering Python for Data Science''. Packt Publishing. |
+ | * Kale, V. (2015). ''Guide to Cloud Computing for Business and Technology Managers: From Distributed Computing to Cloudware Applications''. CRC Press. |
||
− | * Blum, A., Hopcroft, J., & Kannan, R. (2015). Foundations of Data Science. |
||
− | * |
+ | * Ejsmont, A. (2015). ''Web Scalability for Startup Engineers''. McGraw Hill. |
+ | * Leskovec, J., Rajaraman, A., & Ullman, J. D. (2014). ''Mining of Massive Datasets''. Cambridge University Press. ([http://www.mmds.org/ link]) |
||
− | * Wickham, H., & Grolemund, G. (2017). ''R for Data Science''. O'Reilly Media. |
||
+ | * Zumel, N., Mount, J., & Porzak, J. (2014). ''Practical Data Science with R''. Manning. |
||
+ | * Schutt, R., & O'Neil, C. (2013). ''Doing Data Science: Straight Talk from the Frontline''. O'Reilly Media. |
||
+ | * Videla, A., & Williams, J. J. W. (2012). ''RabbitMQ in Action''. Manning. |
||
+ | * Tukey, J. W. (1977). ''Exploratory Data Analysis''. Addison-Wesley. |
||
+ | |||
+ | ==Scholarly Articles== |
||
+ | * Verbraeken, J., Wolting, M., Katzy, J., Kloppenburg, J., Verbelen, T., & Rellermeyer, J. S. (2020). A Survey on Distributed Machine Learning. ''ACM Computing Surveys (CSUR), 53''(2), 1-33. |
||
+ | * Buchlovsky, P. ... (2019). TF-Replicator: Distributed Machine Learning for Researchers. ''arXiv preprint arXiv:1902.00465.'' |
||
+ | * Kang, D., Emmons, J., Abuzaid, F., Bailis, P., & Zaharia, M. (2017). NoScope: optimizing neural network queries over video at scale. ''Proceedings of the VLDB Endowment, 10''(11), 1586-1597. |
||
+ | * Ribeiro, M. T., Singh, S., & Guestrin, C. (2016). Why Should I Trust You? Explaining the Predictions of Any Classifier. In ''Proceedings of the 22nd [https://en.wikipedia.org/wiki/SIGKDD ACM SIGKDD International Conference on Knowledge Discovery and Data Mining]'' (pp. 1135-1144). |
||
+ | * Xing, E. P., Ho, Q., Xie, P., & Wei, D. (2016). Strategies and principles of distributed machine learning on big data. ''Engineering, 2''(2), 179-195. |
||
+ | * Salloum, S., Dautov, R., Chen, X., Peng, P. X., & Huang, J. Z. (2016). Big data analytics on Apache Spark. ''International Journal of Data Science and Analytics, 1''(3-4), 145-164. |
||
+ | * Sculley, D., Holt, G., Golovin, D., Davydov, E., Phillips, T., Ebner, D., ... & Dennison, D. (2015). Hidden technical debt in machine learning systems. In ''[https://en.wikipedia.org/wiki/Conference_on_Neural_Information_Processing_Systems Advances in Neural Information Processing Systems]'' (pp. 2503-2511). |
||
+ | * Huang, Y., Zhu, F., Yuan, M., Deng, K., Li, Y., Ni, B., ... & Zeng, J. (2015). Telco Churn Prediction with Big Data. In ''Proceedings of the 2015 [https://en.wikipedia.org/wiki/SIGMOD ACM SIGMOD International Conference on Management of Data]'' (pp. 607-618). |
||
+ | * Moritz, P., Nishihara, R., Stoica, I., & Jordan, M. I. (2015). SparkNet: Training Deep Networks in Spark. ''arXiv preprint arXiv:1511.06051''. |
||
+ | * Upadhyaya, S. R. (2013). Parallel approaches to machine learning—A comprehensive survey. ''Journal of Parallel and Distributed Computing, 73''(3), 284-292. |
||
+ | * Sakr, S., Liu, A., Batista, D. M., & Alomari, M. (2011). A survey of large scale data management approaches in cloud environments. ''IEEE Communications Surveys & Tutorials, 13''(3), 311-336. |
||
+ | * Abadi, D. J. (2009). Data management in the cloud: Limitations and opportunities. ''IEEE Data Eng. Bull., 32''(1), 3-12. |
||
==Software== |
==Software== |
||
+ | * [https://www.docker.com/ Docker] (Containers) |
||
+ | * [https://www.anaconda.com/ Anaconda Distribution] - Python |
||
+ | * [https://cython.org/ Cython] - Python |
||
* [https://www.crummy.com/software/BeautifulSoup/bs4/doc/ Beautiful Soup 4] - Python |
* [https://www.crummy.com/software/BeautifulSoup/bs4/doc/ Beautiful Soup 4] - Python |
||
+ | * [https://lxml.de/ lxml] - Python |
||
+ | * [https://selenium-python.readthedocs.io/ Selenium] - Python |
||
+ | * [https://doc.scrapy.org/en/latest/index.html Scrapy] - Python |
||
* [https://github.com/ray-project/ray ray] - Python |
* [https://github.com/ray-project/ray ray] - Python |
||
+ | * [https://docs.python.org/3.4/library/multiprocessing.html multiprocessing] - Python |
||
− | * [https://www.elastic.co/products/elasticsearch Elasticsearch] |
||
+ | * [https://docs.python.org/3.4/library/threading.html threading] - Python |
||
+ | * [https://github.com/ClimbsRocks/auto_ml auto_ml] - Python |
||
+ | * [https://docs.celeryproject.org/en/stable/getting-started/introduction.html Celery] - Python |
||
+ | * [https://www.elastic.co/products/elasticsearch Elasticsearch], [https://www.elastic.co/products/logstash Logstash], [https://www.elastic.co/products/kibana Kibana] (ELK) |
||
* [https://www.mongodb.com/ MongoDB] |
* [https://www.mongodb.com/ MongoDB] |
||
* [http://lucene.apache.org/solr/ Apache Solr] |
* [http://lucene.apache.org/solr/ Apache Solr] |
||
Line 45: | Line 111: | ||
* [https://spark.apache.org/ Apache Spark] |
* [https://spark.apache.org/ Apache Spark] |
||
* [https://hive.apache.org/ Apache Hive] |
* [https://hive.apache.org/ Apache Hive] |
||
− | * [http://kafka.apache.org/ Apache Kafka], which includes [https://www.confluent.io/product/connectors/ Kafka Connect] |
||
* [http://cassandra.apache.org/ Apache Cassandra] |
* [http://cassandra.apache.org/ Apache Cassandra] |
||
* [https://zookeeper.apache.org/ Apache ZooKeeper] |
* [https://zookeeper.apache.org/ Apache ZooKeeper] |
||
Line 52: | Line 117: | ||
* [http://couchdb.apache.org/ Apache CouchDB] |
* [http://couchdb.apache.org/ Apache CouchDB] |
||
* [http://activemq.apache.org/ Apache ActiveMQ] |
* [http://activemq.apache.org/ Apache ActiveMQ] |
||
− | * [ |
+ | * [http://samza.apache.org/ Apache Samza] |
+ | * [https://flink.apache.org/ Apache Flink] |
||
+ | * [http://kafka.apache.org/ Apache Kafka] (which includes [https://www.confluent.io/product/connectors/ Kafka Connect]) - A message broker |
||
+ | * [https://www.rabbitmq.com/ RabbitMQ] - A message broker |
||
+ | * [https://redis.io/ Redis] - An in-memory data store, also usable as a message broker |
||
+ | * [https://spark.apache.org/docs/latest/api/python/index.html pyspark] - Spark Python API |
||
* [http://platanios.org/tensorflow_scala/ tensorflow_scala] - Scala API for TensorFlow |
* [http://platanios.org/tensorflow_scala/ tensorflow_scala] - Scala API for TensorFlow |
||
+ | * [https://github.com/migueldeicaza/TensorFlowSharp TensorFlowSharp] - TensorFlow API for .NET languages |
||
+ | * [https://github.com/yahoo/TensorFlowOnSpark TensorFlowOnSpark] - It brings TensorFlow programs onto Apache Spark clusters |
||
+ | * [https://numba.pydata.org/ Numba] - Python |
||
+ | * [https://graphql.org/ GraphQL] |
||
+ | * [https://www.nginx.com/ nginx] |
||
+ | * [https://dvc.org/ DVC] - Data Version Control |
||
+ | * [https://www.kubeflow.org/ kubeflow] |
||
+ | * [https://akka.io/ Akka] |
||
+ | * [https://www.pykka.org/ Pykka] |
||
+ | * [https://apache.github.io/incubator-heron/ Heron] |
||
+ | * [https://airflow.apache.org/ Apache Airflow] - Workflow Management System |
||
+ | * [http://druid.io/ Druid] |
||
+ | * [https://superset.incubator.apache.org/druid.html Apache Superset] |
||
+ | * [https://github.com/horovod/horovod Horovod] - TensorFlow, Keras, PyTorch, and MXNet |
||
+ | * [https://www.acumos.org/ Acumos AI] |
||
+ | * [https://hopsworks.readthedocs.io/en/0.9/hopsml/hopsML.html HopsML] |
||
+ | * [https://arrow.apache.org/ Apache Arrow] |
||
==See also== |
==See also== |
||
Line 59: | Line 146: | ||
==Other Resources== |
==Other Resources== |
||
+ | ===General=== |
||
+ | *[https://www.slideshare.net/kourouklides/what-is-data-science-99294704/ What is Data Science by Ioannis Kourouklides] - slides |
||
*[https://datascienceguide.github.io/ Data Science Guide] |
*[https://datascienceguide.github.io/ Data Science Guide] |
||
*[http://jadianes.me/data-science-your-way/ Data Science Engineering, your way] |
*[http://jadianes.me/data-science-your-way/ Data Science Engineering, your way] |
||
+ | *[https://www.oreilly.com/ideas/a-manifesto-for-agile-data-science A manifesto for Agile data science] - blog post |
||
+ | *[https://towardsdatascience.com/data-science-project-flow-for-startups-282a93d4508d Data Science Project Flow for Startups] - blog post |
||
*[http://www.cse.ust.hk/~kxmo/LargeML.html Large Scale Machine Learning] - libraries and papers |
*[http://www.cse.ust.hk/~kxmo/LargeML.html Large Scale Machine Learning] - libraries and papers |
||
*[https://www.quora.com/What-are-some-courses-on-large-scale-learning What are some courses on large scale learning?] - Quora |
*[https://www.quora.com/What-are-some-courses-on-large-scale-learning What are some courses on large scale learning?] - Quora |
||
*[https://www.kdnuggets.com/2017/06/7-steps-mastering-data-preparation-python.html 7 Steps to Mastering Data Preparation with Python] - blog post |
*[https://www.kdnuggets.com/2017/06/7-steps-mastering-data-preparation-python.html 7 Steps to Mastering Data Preparation with Python] - blog post |
||
*[https://www.kdnuggets.com/2017/12/baesens-web-scraping-data-science-python.html Web Scraping for Data Science with Python] - blog post |
*[https://www.kdnuggets.com/2017/12/baesens-web-scraping-data-science-python.html Web Scraping for Data Science with Python] - blog post |
||
− | *[https://medium.com/@Petuum/intro-to-distributed-deep-learning-systems-a2e45c6b8e7 Intro to Distributed Deep Learning Systems] - blog post |
||
− | *[https://www.kaggle.com/ekami66/detailed-exploratory-data-analysis-with-python Detailed exploratory data analysis with python] - blog post |
||
*[http://vlad17.github.io/COS513-Blog/ Princeton Commodities Modeling Blog] |
*[http://vlad17.github.io/COS513-Blog/ Princeton Commodities Modeling Blog] |
||
− | *[https://github.com/ajaymache/data-analysis-using-python Exploratory data analysis using Python for used car database taken from Kaggle] - Github |
||
− | *[https://www.kaggle.com/ekami66/detailed-exploratory-data-analysis-with-python Detailed exploratory data analysis with Python] - Kaggle |
||
*[https://github.com/upalr/Python-camp Python-camp] - GitHub |
*[https://github.com/upalr/Python-camp Python-camp] - GitHub |
||
*[http://mtitek.com/big-data.php Big Data: Spark, Hadoop, Hive, ZooKeeper, Solr, Kafka, Nutch, MongoDB, ...] - installation instructions |
*[http://mtitek.com/big-data.php Big Data: Spark, Hadoop, Hive, ZooKeeper, Solr, Kafka, Nutch, MongoDB, ...] - installation instructions |
||
+ | *[https://databricks.com/blog/2016/01/25/deep-learning-with-apache-spark-and-tensorflow.html Deep Learning with Apache Spark and TensorFlow] - blog post |
||
+ | *[https://khartig.wordpress.com/2017/12/30/build-a-simple-chatbot-with-tensorflow-python-and-mongodb/ Build a Simple Chatbot with Tensorflow, Python and MongoDB] - blog post |
||
+ | *[https://plot.ly/python/maps/ Plotly Python Library Maps] |
||
+ | *[https://towardsdatascience.com/5-quick-and-easy-data-visualizations-in-python-with-code-a2284bae952f 5 Quick and Easy Data Visualizations in Python with Code] - blog post |
||
+ | *[https://medium.com/@williamkoehrsen William Koehrsen] - blog |
||
+ | *[http://www.claoudml.co/ ClaoudML] - Free Data Science & Machine Learning Resources |
||
+ | *[https://www.datasciencecentral.com/profiles/blogs/data-science-in-python-pandas-cheat-sheet Data Science in Python: Pandas Cheat Sheet] |
||
+ | *[https://www.elastic.co/webinars/time-series-anomaly-detection-optimizing-machine-learning-jobs-in-elasticsearch Time Series Anomaly Detection: Optimizing your Machine Learning Jobs in Elasticsearch] - webinar |
||
+ | *[https://ai.googleblog.com/2017/04/federated-learning-collaborative.html Federated Learning: Collaborative Machine Learning without Centralized Training Data] - blog post |
||
+ | *[https://www.zurich.ibm.com/snapml/ Snap ML] - IBM |
||
+ | *[https://github.com/vsmolyakov/pyspark pyspark (GitHub)] - collection of resources |
||
+ | *[https://becominghuman.ai/cheat-sheets-for-ai-neural-networks-machine-learning-deep-learning-big-data-678c51b4b463 Cheat Sheets for AI, Neural Networks, Machine Learning, Deep Learning & Big Data] - blog post |
||
+ | *[https://eng.uber.com/peloton/ Peloton: Uber’s Unified Resource Scheduler for Diverse Cluster Workloads] - blog post |
||
+ | *[http://martin.zinkevich.org/rules_of_ml/rules_of_ml.pdf Rules of Machine Learning: Best Practices for ML Engineering] - blog post |
||
+ | *[https://blog.kovalevskyi.com/google-compute-engine-now-has-images-with-pytorch-1-0-0-and-fastai-1-0-2-57c49efd74bb Google Compute Engine Now Has Images With PyTorch 1.0.0 and FastAi 1.0.2] - blog post |
||
+ | *[https://eng.uber.com/michelangelo-pyml/ Michelangelo PyML: Introducing Uber’s Platform for Rapid Python ML Model Development] |
||
+ | *[https://towardsdatascience.com/manage-your-data-science-project-structure-in-early-stage-95f91d4d0600 Manage your Data Science project structure in early stage] - blog post |
||
+ | *[https://medium.com/@rrfd/cookiecutter-data-science-organize-your-projects-atom-and-jupyter-2be7862f487e Cookiecutter Data Science — Organize your Projects — Atom and Jupyter] - blog post |
||
+ | *[https://github.com/SurrealAI/surreal surreal (GitHub)] - code |
||
+ | *[https://github.com/SurrealAI/cloudwise cloudwise (GitHub)] - code |
||
+ | *[https://github.com/SurrealAI/caraml caraml (GitHub)] - code |
||
+ | *[https://github.com/SurrealAI/symphony symphony (GitHub)] - code |
||
+ | *[https://www.analyticsindiamag.com/tensorflow-vs-spark-differ-work-tandem TensorFlow Vs. Spark: How Do They Differ And Work In Tandem With Each Other] - blog post |
||
+ | *[https://github.com/bulutyazilim/awesome-datascience awesome-datascience (GitHub)] |
||
+ | *[https://github.com/siboehm/awesome-learn-datascience awesome-learn-datascience (GitHub)] |
||
+ | *[https://www.logicalclocks.com/blog/when-deep-learning-with-gpus-use-a-cluster-manager When Deep Learning with GPUs, use a Cluster Manager] - blog post |
||
+ | |||
+ | ===Data Annotation & Labelling=== |
||
+ | *[https://appen.com/blog/data-annotation/ What is Data Annotation?] |
||
+ | *[https://www.mturk.com Amazon Mechanical Turk] |
||
+ | *[https://www.cloudfactory.com/ CloudFactory] |
||
+ | *[https://appen.com/ Appen] |
||
+ | *[https://www.alegion.com/ Alegion] |
||
+ | *[https://imerit.net/ iMerit] |
||
+ | *[https://playment.io/ Playment] |
||
+ | *[https://www.rev.com/ Rev] - Transcription from video and audio |
||
+ | *[https://labelbox.com/ Labelbox] |
||
+ | *[https://github.com/diffgram/diffgram diffgram] |
||
+ | *[https://dl.acm.org/citation.cfm?id=1866696 Proceedings of the NAACL HLT 2010 Workshop on Creating Speech and Language Data with Amazon's Mechanical Turk] |
||
+ | *[https://www.cloudfactory.com/data-annotation-tool-guide Data Annotation Tools for Machine Learning (Evolving Guide)] |
||
+ | *[https://github.com/taivop/awesome-data-annotation awesome-data-annotation (GitHub)] |
||
+ | |||
+ | ===EDA=== |
||
+ | *[https://github.com/ajaymache/data-analysis-using-python Exploratory data analysis using Python for used car database taken from Kaggle] - GitHub |
||
+ | *[https://www.kaggle.com/ekami66/detailed-exploratory-data-analysis-with-python Detailed exploratory data analysis with Python] - Kaggle |
||
+ | *[https://www.youtube.com/watch?v=W5WE9Db2RLU Exploratory data analysis in Python - PyCon 2017 (YouTube)] |
||
+ | *[https://medium.com/open-machine-learning-course/open-machine-learning-course-topic-1-exploratory-data-analysis-with-pandas-de57880f1a68 Exploratory Data Analysis with Pandas] - blog post |
||
+ | *[https://www.kaggle.com/randylaosat/simple-exploratory-data-analysis-passnyc Simple Exploratory Data Analysis - PASSNYC] - Kaggle |
||
+ | *[https://www.kaggle.com/moizzz/eda-and-clustering EDA and Clustering] - Kaggle |
||
+ | *[https://medium.com/python-pandemonium/introduction-to-exploratory-data-analysis-in-python-8b6bcb55c190 Introduction to Exploratory Data Analysis in Python] - blog post |
||
+ | *[https://medium.com/open-machine-learning-course/open-machine-learning-course-topic-2-visual-data-analysis-in-python-846b989675cd Visual Data Analysis with Python] - blog post |
||
+ | |||
+ | ===Asynchronous Communication & Microservices=== |
||
+ | *[https://microservices.io/patterns/microservices.html Pattern: Microservice Architecture] |
||
+ | *[https://www.dineshonjava.com/software-architecture-patterns-and-designs/ Software Architecture Patterns and Designs] |
||
+ | *[https://codeblog.dotsandbrackets.com/asynchronous-communication-with-message-queue/ Asynchronous communication with message queue] |
||
+ | *[https://garba.org/article/general/soa/mep.html Message Exchange Patterns (MEPs)] |
||
+ | *[https://flylib.com/books/en/2.365.1/message_exchange_patterns.html Message exchange patterns] |
||
+ | *[https://docs.microsoft.com/en-us/azure/architecture/patterns/category/messaging Messaging patterns] |
||
+ | *[https://medium.com/@mmz.zaeimi/synchronous-vs-asynchronous-communication-in-microservices-integration-f4dd36478fd2 Synchronous vs Asynchronous communication in microservices integration] |
||
+ | *[https://otonomo.io/blog/redis-kafka-or-rabbitmq-which-microservices-message-broker-to-choose/ Redis, Kafka or RabbitMQ: Which MicroServices Message Broker To Choose?] |
||
+ | *[https://dzone.com/articles/akka-streams-and-kafka-streams-where-microservices Akka Streams and Kafka Streams: Where Microservices Meet Fast Data] |
||
+ | *[https://dzone.com/articles/akka-spark-or-kafka-selecting-the-right-streaming Akka, Spark, or Kafka? Selecting the Right Streaming Engine] |
||
+ | *[https://otonomo.io/blog/luigi-airflow-pinball-and-chronos-comparing-workflow-management-systems/ Luigi, Airflow, Pinball, and Chronos: Comparing Workflow Management Systems] |
||
+ | *[https://github.com/kaiwaehner/kafka-streams-machine-learning-examples kafka-streams-machine-learning-examples (GitHub)] - Machine Learning + Kafka Streams Examples (with code) |
||
+ | *[https://aseigneurin.github.io/2018/09/05/realtime-machine-learning-predictions-wth-kafka-and-h2o.html Realtime Machine Learning predictions with Kafka and H2O.ai] - blog post |
||
+ | *[https://tanzu.vmware.com/content/blog/understanding-when-to-use-rabbitmq-or-apache-kafka Understanding When to use RabbitMQ or Apache Kafka] |
||
+ | *[https://www.ververica.com/what-is-stream-processing What is Stream Processing?] |
||
+ | *[https://medium.com/stream-processing/what-is-stream-processing-1eadfca11b97 A Gentle Introduction to Stream Processing] |
||
+ | |||
+ | ===Distributed Systems=== |
||
+ | *[https://blog.docker.com/2016/10/docker-distributed-system-summit-videos-podcast-episodes/ Docker Distributed System Summit videos podcast episodes] |
||
+ | *[https://www.voltdb.com/files/using-docker-simplify-distributed-systems-development/ Using Docker to Simplify Distributed Systems in Development] - video |
||
+ | *[https://medium.com/@harinilabs/day-11-getting-started-with-docker-and-using-it-to-build-deploy-a-distributed-app-1929669064b8 Day 11: Using Docker to build and deploy a distributed app] - blog post with [https://github.com/harinij/100DaysOfCode/tree/master/Day%20011%20-%20Docker%20WebApp code] |
||
+ | *[https://medium.com/@Petuum/intro-to-distributed-deep-learning-systems-a2e45c6b8e7 Intro to Distributed Deep Learning Systems] - blog post |
||
+ | *[https://www.systems.ethz.ch/sites/default/files/parallel-distributed-deep-learning.pdf Parallel and Distributed Deep Learning by Tal Ben-Nun] |
||
+ | *[https://sebastianraschka.com/Articles/2014_multiprocessing.html An introduction to parallel programming using Python's multiprocessing module] - blog post |
||
+ | *[https://www.kdnuggets.com/2015/12/spark-deep-learning-training-with-sparknet.html Spark + Deep Learning: Distributed Deep Neural Network Training with SparkNet] - blog post |
||
+ | *[http://muratbuffalo.blogspot.com/2016/04/petuum-new-platform-for-distributed.html Paper Review. Petuum: A new platform for distributed machine learning on big data] - blog post |
||
+ | *[http://www.cheerml.com/comparison-distributed-ml-platform A comparison of distributed machine learning platform] - blog post |
||
+ | *[https://www.logicalclocks.com/why-you-need-a-distributed-filesystem-for-deep-learning/ Distributed Filesystems for Deep Learning] - blog post |
||
+ | *[https://github.com/tmulc18/Distributed-TensorFlow-Guide Distributed-TensorFlow-Guide (GitHub)] - Distributed TensorFlow basics and examples of training algorithms (with code) |
||
+ | |||
+ | ===Deployment and Production=== |
||
+ | *[https://towardsdatascience.com/how-docker-can-help-you-become-a-more-effective-data-scientist-7fc048ef91d5 How Docker Can Help You Become A More Effective Data Scientist] - blog post |
||
+ | *[https://www.confluent.io/blog/build-deploy-scalable-machine-learning-production-apache-kafka/ How to Build and Deploy Scalable Machine Learning in Production with Apache Kafka] - blog post |
||
+ | *[https://towardsdatascience.com/deploying-deep-learning-models-part-1-an-overview-77b4d01dd6f7 Deploying deep learning models: Part 1 an overview] - blog post |
||
+ | *[https://medium.com/@maheshkkumar/a-guide-to-deploying-machine-deep-learning-model-s-in-production-e497fd4b734a A guide to deploying Machine/Deep Learning model(s) in Production] - blog post |
||
+ | *[https://medium.com/redbus-in/how-to-deploy-scikit-learn-ml-models-d390b4b8ce7a How redBus uses Scikit-Learn ML models to classify customer complaints?] - blog post |
||
+ | *[https://willk.online/deploying-a-keras-deep-learning-model-as-a-web-application-in-p/ Deploying a Keras Deep Learning Model as a Web Application in Python] - blog post |
||
+ | *[https://awesome-docker.netlify.com/ Awesome-docker] - A curated list of Docker resources and projects |
||
+ | *[https://ramitsurana.github.io/awesome-kubernetes/ Awesome-Kubernetes] - A curated list for awesome kubernetes sources |
||
+ | *[https://www.youtube.com/watch?v=zxcvyrhmjbc Michael Herman - Going Serverless with OpenFaaS, Kubernetes, and Python - PyCon 2018 (YouTube)] |
||
+ | *[https://www.youtube.com/watch?v=jbb1dbFaovg Aly Sivji, Joe Jasinski, tathagata dasgupta (t) - Docker for Data Science - PyCon 2018 (YouTube)] |
||
+ | *[https://www.youtube.com/watch?v=kx-048qE-TI Ruben Orduz, Nolan Brubaker - A Python-flavored Introduction to Containers And Kubernetes - PyCon 2018 (YouTube)] |
||
+ | *[https://www.youtube.com/watch?v=nrzLdMWTRMM Miguel Grinberg - Microservices with Python and Flask - PyCon 2017 (YouTube)] |
||
+ | *[https://www.youtube.com/watch?v=EuzoEaE6Cqs Deploy and scale containers with Docker native, open source orchestration - PyCon 2017 (YouTube)] |
||
+ | *[https://www.youtube.com/watch?v=tdIIJuPh3SI Miguel Grinberg - Flask at Scale - PyCon 2016 (YouTube)] |
||
+ | *[https://www.youtube.com/watch?v=GpHMTR7P2Ms Deploying and scaling applications with Docker, Swarm, and a tiny bit of Python magic - PyCon 2016 (YouTube)] |
||
+ | *[https://www.youtube.com/watch?v=ZVaRK10HBjo Jérôme Petazzoni - Introduction to Docker and containers - PyCon 2016 (YouTube)] |
||
+ | *[https://www.youtube.com/watch?v=DIcpEg77gdE Miguel Grinberg - Flask Workshop - PyCon 2015 (YouTube)] |
||
+ | *[https://www.youtube.com/watch?v=YiZkHUbE6N0 Andrew T. Baker - Docker 101: Introduction to Docker - PyCon 2015 (YouTube)] |
||
+ | *[https://www.youtube.com/watch?v=FGrIyBDQLPg Miguel Grinberg: Flask by Example - PyCon 2014 (YouTube)] |
||
+ | *[https://towardsdatascience.com/learn-to-build-machine-learning-services-prototype-real-applications-and-deploy-your-work-to-aa97b2b09e0c Learn to Build Machine Learning Services, Prototype Real Applications, and Deploy your Work to Users] - blog post |
||
+ | *[https://towardsdatascience.com/deploying-keras-deep-learning-models-with-flask-5da4181436a2 Deploying Keras Deep Learning Models with Flask] - blog post |
||
+ | *[https://www.twilio.com/engineering/2012/10/18/open-sourcing-flask-restful Introducing Flask-RESTful] - blog post |
||
+ | *[https://towardsdatascience.com/develop-a-nlp-model-in-python-deploy-it-with-flask-step-by-step-744f3bdd7776 Develop a NLP Model in Python & Deploy It with Flask, Step by Step] - blog post |
||
+ | *[https://www.youtube.com/watch?v=knAFR4u73Es Deploying Machine Learning apps with Docker containers - MUPy 2017] - video |
||
+ | *[https://medium.com/@patrickmichelberger/getting-started-with-anaconda-docker-b50a2c482139 Getting started with Anaconda & Docker] - blog post |
||
+ | *[https://towardsdatascience.com/docker-for-data-science-9c0ce73e8263 Docker for Data Science] - blog post |
||
+ | *[https://becominghuman.ai/docker-for-data-science-part-1-dd41e5ef1d80 Simplified Docker-ing for Data Science — Part 1] - blog post |
||
+ | *[https://www.born2data.com/2017/deeplearning_install-part4.html Deep Learning Installation Tutorial - Part 4: How to install Docker for Deep Learning] - blog post |
||
+ | *[https://towardsdatascience.com/how-to-write-a-production-level-code-in-data-science-5d87bd75ced How to write a production-level code in Data Science?] - blog post |
||
+ | *[https://www.elastic.co/webinars/event-logs-in-elasticsearch-and-machine-learning Web Access Logs in Elasticsearch and Machine Learning] - webinar |
||
+ | *[https://www.youtube.com/watch?v=f3I0izerPvc Deploying Python models to production] - video |
||
+ | *[https://www.youtube.com/watch?v=-UYyyeYJAoQ How to deploy machine learning models into production] - video |
||
+ | *[https://blog.cambridgespark.com/putting-machine-learning-models-into-production-d768560907bd Putting Machine Learning Models into Production] - blog post |
||
+ | *[https://github.com/practicalAI/productionML productionML (GitHub)] - code for creating Production level API services for Machine Learning |
||
+ | *[https://medium.com/kredaro-engineering/ai-tales-building-machine-learning-pipeline-using-kubeflow-and-minio-4b88da30437b AI Tales: Building Machine learning pipeline using Kubeflow and Minio] - blog post |
||
+ | *[https://github.com/ahkarami/Deep-Learning-in-Production Deep-Learning-in-Production (GitHub)] |
||
+ | *[https://medium.com/dataswati-garage/create-a-robust-ai-rest-api-71a8050ce314 Deploy your AI model the hard (and robust) way] - blog post |
Latest revision as of 21:35, 24 November 2020
This page contains resources about Data Science, Data Engineering and Data Management.
Subfields and Concepts
- Agile Data Science
- Machine Learning / Data Mining
- Exploratory Data Analysis (EDA)
- Data Preparation and Data Preprocessing
- Data Fusion and Data Integration
- Data Wrangling / Data Munging
- Data Scraping
- Data Sampling
- Data Cleaning
- Data Visualization
- Explainable AI (XAI) / Interpretable AI
- Big Data
- Data Engineering, Data Management and Databases
- High Performance/Parallel/Distributed/Cloud Computing for Machine Learning
- Concurrent/Multi-threading Computing for Machine Learning
- Synchronous Communication (for Web Services)
  - Representational State Transfer (REST) Protocol
  - Remote Procedure Call (RPC)
  - Simple Object Access Protocol (SOAP)
- Asynchronous Communication / Asynchronous Messaging (for Web Services)
  - Message broker/Message bus/Event bus/Integration broker/Interface engine
  - Message queue
  - Asynchronous protocols
    - Advanced Message Queuing Protocol (AMQP)
    - MQ Telemetry Transport (MQTT)
- Messaging patterns
  - Fire-and-Forget / One-Way
  - Request-Response / Request-Reply
  - Publisher-Subscriber
  - Request-Callback
- Software Architecture
  - Monolithic Architecture
  - Microservices Architecture
  - Service-Oriented Architecture (SOA)
- Stream Processing
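As a rough illustration of the messaging patterns listed above, the following Python sketch imitates them inside a single process using only the standard-library `queue` and `threading` modules. The names `InMemoryBroker`, `request_response`, `request_callback` and `demo` are hypothetical and exist only for this example; a real deployment would normally rely on an actual message broker (for instance one of those listed under Software) rather than in-memory queues.

```python
import queue
import threading
from collections import defaultdict


class InMemoryBroker:
    """Toy in-process stand-in for a message broker (hypothetical, illustration only)."""

    def __init__(self):
        # topic name -> list of per-subscriber queues
        self._subscribers = defaultdict(list)

    def subscribe(self, topic):
        """Publisher-Subscriber: every subscriber gets its own queue for a topic."""
        inbox = queue.Queue()
        self._subscribers[topic].append(inbox)
        return inbox

    def publish(self, topic, message):
        """Fire-and-Forget: hand a copy to each subscriber and return without a reply."""
        for inbox in self._subscribers[topic]:
            inbox.put(message)


def request_response(handler, payload, timeout=1.0):
    """Request-Response: run the handler in a worker thread and block for its reply."""
    reply_box = queue.Queue(maxsize=1)
    worker = threading.Thread(target=lambda: reply_box.put(handler(payload)))
    worker.start()
    reply = reply_box.get(timeout=timeout)  # raises queue.Empty if no reply arrives
    worker.join()
    return reply


def request_callback(handler, payload, callback):
    """Request-Callback: return at once; the callback receives the reply later."""
    worker = threading.Thread(target=lambda: callback(handler(payload)))
    worker.start()
    return worker  # the caller may join() it or simply carry on


def demo():
    broker = InMemoryBroker()
    inbox_a = broker.subscribe("events")
    inbox_b = broker.subscribe("events")
    broker.publish("events", "model-retrained")
    received = [inbox_a.get(timeout=1.0), inbox_b.get(timeout=1.0)]

    doubled = request_response(lambda x: x * 2, 21)

    callback_results = []
    request_callback(lambda x: x + 1, 41, callback_results.append).join()
    return received, doubled, callback_results


if __name__ == "__main__":
    print(demo())
```

The sketch keeps everything in one process so that it runs without external services; the same roles (publisher, subscriber, requester, replier) would be split across separate services when a broker such as RabbitMQ or Apache Kafka sits in between.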
Online courses
Video Lectures
Lecture Notes
- Data Science by Ioannis Kourouklides
- When [to use] and When Not to Use Distributed Machine Learning by Chih-Jen Lin
- Open Machine Learning Course (Medium)
- Mining Massive Datasets by Jure Leskovec, Anand Rajaraman and Jeff Ullman
- Hardware Acceleration for Data Processing by Gustavo Alonso
- CS109: Data Science
Books
- Newman, S. (2021). Building Microservices: Designing Fine-Grained Systems. 2nd Ed. O'Reilly Media.
- Bellemare, A. (2020). Building Event-Driven Microservices: Leveraging Organizational Data at Scale. O'Reilly Media.
- Richards, M., & Ford, N. (2020). Fundamentals of Software Architecture. O'Reilly Media.
- Dean, A., & Crettaz, V. (2019). Event Streams in Action. Manning.
- Richardson, C. (2018). Microservices Patterns. Manning Publications.
- Pacheco, V. F. (2018). Microservice Patterns and Best Practices. Packt Publishing.
- De la Torre, C., Wagner, B., & Rousos, M. (2018). .NET Microservices: Architecture for Containerized .NET Applications. Microsoft Corporation. (link)
- Lanaro, G. (2017). Python High Performance. Packt Publishing.
- Wickham, H., & Grolemund, G. (2017). R for Data Science. O'Reilly Media.
- Kleppmann, M. (2017). Designing Data-Intensive Applications: The Big Ideas Behind Reliable, Scalable, and Maintainable Systems. O'Reilly Media.
- VanderPlas, J. (2016). Python Data Science Handbook: Essential Tools for Working with Data. O'Reilly Media.
- Pierfederici, F. (2016). Distributed Computing with Python. Packt Publishing.
- Dunning, T., & Friedman, E. (2016). Streaming Architecture: New Designs Using Apache Kafka and MapR Streams. O'Reilly Media.
- Nolan, D., & Lang, D. T. (2015). Data Science in R: A Case Studies Approach to Computational Reasoning and Problem Solving. CRC Press.
- Elston, S. F. (2015). Data Science in the Cloud with Microsoft Azure Machine Learning and R. O'Reilly Media, Inc.
- Grus, J. (2015). Data Science from Scratch: First Principles with Python. O'Reilly Media.
- Madhavan, S. (2015). Mastering Python for Data Science. Packt Publishing.
- Kale, V. (2015). Guide to Cloud Computing for Business and Technology Managers: From Distributed Computing to Cloudware Applications. CRC Press.
- Ejsmont, A. (2015). Web Scalability for Startup Engineers. McGraw Hill.
- Leskovec, J., Rajaraman, A., & Ullman, J. D. (2014). Mining of Massive Datasets. Cambridge University Press. (link)
- Zumel, N., Mount, J., & Porzak, J. (2014). Practical Data Science with R. Manning.
- Schutt, R., & O'Neil, C. (2013). Doing Data Science: Straight Talk from the Frontline. O'Reilly Media.
- Videla, A., & Williams, J. J. W. (2012). RabbitMQ in Action. Manning.
- Tukey, J. W. (1977). Exploratory Data Analysis. Addison-Wesley.
Scholarly Articles
- Verbraeken, J., Wolting, M., Katzy, J., Kloppenburg, J., Verbelen, T., & Rellermeyer, J. S. (2020). A Survey on Distributed Machine Learning. ACM Computing Surveys (CSUR), 53(2), 1-33.
- Buchlovsky, P. ... (2019). TF-Replicator: Distributed Machine Learning for Researchers. arXiv preprint arXiv:1902.00465.
- Kang, D., Emmons, J., Abuzaid, F., Bailis, P., & Zaharia, M. (2017). NoScope: optimizing neural network queries over video at scale. Proceedings of the VLDB Endowment, 10(11), 1586-1597.
- Ribeiro, M. T., Singh, S., & Guestrin, C. (2016). Why Should I Trust You? Explaining the Predictions of Any Classifier. In Proceedings of the 22nd ACM SIGKDD International Conference on Knowledge Discovery and Data Mining (pp. 1135-1144).
- Xing, E. P., Ho, Q., Xie, P., & Wei, D. (2016). Strategies and principles of distributed machine learning on big data. Engineering, 2(2), 179-195.
- Salloum, S., Dautov, R., Chen, X., Peng, P. X., & Huang, J. Z. (2016). Big data analytics on Apache Spark. International Journal of Data Science and Analytics, 1(3-4), 145-164.
- Sculley, D., Holt, G., Golovin, D., Davydov, E., Phillips, T., Ebner, D., ... & Dennison, D. (2015). Hidden technical debt in machine learning systems. In Advances in Neural Information Processing Systems (pp. 2503-2511).
- Huang, Y., Zhu, F., Yuan, M., Deng, K., Li, Y., Ni, B., ... & Zeng, J. (2015). Telco Churn Prediction with Big Data. In Proceedings of the 2015 ACM SIGMOD International Conference on Management of Data (pp. 607-618).
- Moritz, P., Nishihara, R., Stoica, I., & Jordan, M. I. (2015). SparkNet: Training Deep Networks in Spark. arXiv preprint arXiv:1511.06051.
- Upadhyaya, S. R. (2013). Parallel approaches to machine learning—A comprehensive survey. Journal of Parallel and Distributed Computing, 73(3), 284-292.
- Sakr, S., Liu, A., Batista, D. M., & Alomari, M. (2011). A survey of large scale data management approaches in cloud environments. IEEE Communications Surveys & Tutorials, 13(3), 311-336.
- Abadi, D. J. (2009). Data management in the cloud: Limitations and opportunities. IEEE Data Eng. Bull., 32(1), 3-12.
Software
- Docker (Containers)
- Anaconda Distribution - Python
- Cython - Python
- Beautiful Soup 4 - Python
- lxml - Python
- Selenium - Python
- Scrapy - Python
- ray - Python
- multiprocessing - Python
- threading - Python
- auto_ml - Python
- Celery - Python
- Elasticsearch, Logstash, Kibana (ELK)
- MongoDB
- Apache Solr
- Apache Hadoop
- Apache HBase
- Apache Spark
- Apache Hive
- Apache Cassandra
- Apache ZooKeeper
- Apache Pig
- Apache Storm
- Apache CouchDB
- Apache ActiveMQ
- Apache Samza
- Apache Flink
- Apache Kafka (which includes Kafka Connect) - A message broker
- RabbitMQ - A message broker
- Redis - An in-memory data store that can also be used as a message broker
- pyspark - Spark Python API
- tensorflow_scala - Scala API for TensorFlow
- TensorFlowSharp - TensorFlow API for .NET languages
- TensorFlowOnSpark - Brings TensorFlow programs to Apache Spark clusters
- Numba - Python
- GraphQL
- nginx
- DVC - Data Version Control
- Kubeflow
- Akka
- Pykka
- Heron
- Apache Airflow - Workflow Management System
- Druid
- Apache Superset
- Horovod - Distributed training for TensorFlow, Keras, PyTorch, and MXNet
- Acumos AI
- HopsML
- Apache Arrow
See also
Other Resources
General
- What is Data Science by Ioannis Kourouklides - slides
- Data Science Guide
- Data Science Engineering, your way
- A manifesto for Agile data science - blog post
- Data Science Project Flow for Startups - blog post
- Large Scale Machine Learning - libraries and papers
- What are some courses on large scale learning? - Quora
- 7 Steps to Mastering Data Preparation with Python - blog post
- Web Scraping for Data Science with Python - blog post
- Princeton Commodities Modeling Blog
- Python-camp - GitHub
- Big Data: Spark, Hadoop, Hive, ZooKeeper, Solr, Kafka, Nutch, MongoDB, ... - installation instructions
- Deep Learning with Apache Spark and TensorFlow - blog post
- Build a Simple Chatbot with Tensorflow, Python and MongoDB - blog post
- Plotly Python Library Maps
- 5 Quick and Easy Data Visualizations in Python with Code - blog post
- William Koehrsen - blog
- ClaoudML - Free Data Science & Machine Learning Resources
- Data Science in Python: Pandas Cheat Sheet
- Time Series Anomaly Detection: Optimizing your Machine Learning Jobs in Elasticsearch - webinar
- Federated Learning: Collaborative Machine Learning without Centralized Training Data - blog post
- Snap ML - IBM
- pyspark (GitHub) - collection of resources
- Cheat Sheets for AI, Neural Networks, Machine Learning, Deep Learning & Big Data - blog post
- Peloton: Uber’s Unified Resource Scheduler for Diverse Cluster Workloads - blog post
- Rules of Machine Learning: Best Practices for ML Engineering - blog post
- Google Compute Engine Now Has Images With PyTorch 1.0.0 and FastAi 1.0.2 - blog post
- Michelangelo PyML: Introducing Uber’s Platform for Rapid Python ML Model Development
- Manage your Data Science project structure in early stage - blog post
- Cookiecutter Data Science — Organize your Projects — Atom and Jupyter - blog post
- surreal (GitHub) - code
- cloudwise (GitHub) - code
- caraml (GitHub) - code
- symphony (GitHub) - code
- TensorFlow Vs. Spark: How Do They Differ And Work In Tandem With Each Other - blog post
- awesome-datascience (GitHub)
- awesome-learn-datascience (GitHub)
- When Deep Learning with GPUs, use a Cluster Manager - blog post
Data Annotation & Labelling
- What is Data Annotation?
- Amazon Mechanical Turk
- CloudFactory
- Appen
- Alegion
- iMerit
- Playment
- Rev - Transcription from video and audio
- Labelbox
- diffgram
- Proceedings of the NAACL HLT 2010 Workshop on Creating Speech and Language Data with Amazon's Mechanical Turk
- Data Annotation Tools for Machine Learning (Evolving Guide)
- awesome-data-annotation (GitHub)
EDA
- Exploratory data analysis using Python for used car database taken from Kaggle - GitHub
- Detailed exploratory data analysis with Python - Kaggle
- Exploratory data analysis in Python - PyCon 2017 (YouTube)
- Exploratory Data Analysis with Pandas - blog post
- Simple Exploratory Data Analysis - PASSNYC - Kaggle
- EDA and Clustering - Kaggle
- Introduction to Exploratory Data Analysis in Python - blog post
- Visual Data Analysis with Python - blog post
Asynchronous Communication & Microservices
- Pattern: Microservice Architecture
- Software Architecture Patterns and Designs
- Asynchronous communication with message queue
- Message Exchange Patterns (MEPs)
- Message exchange patterns
- Messaging patterns
- Synchronous vs Asynchronous communication in microservices integration
- Redis, Kafka or RabbitMQ: Which MicroServices Message Broker To Choose?
- Akka Streams and Kafka Streams: Where Microservices Meet Fast Data
- Akka, Spark, or Kafka? Selecting the Right Streaming Engine
- Luigi, Airflow, Pinball, and Chronos: Comparing Workflow Management Systems
- kafka-streams-machine-learning-examples (GitHub) - Machine Learning + Kafka Streams Examples (with code)
- Realtime Machine Learning predictions with Kafka and H2O.ai - blog post
- Understanding When to use RabbitMQ or Apache Kafka
- What is Stream Processing?
- A Gentle Introduction to Stream Processing
Distributed Systems
- Docker Distributed System Summit videos & podcast episodes
- Using Docker to Simplify Distributed Systems in Development - video
- Day 11: Using Docker to build and deploy a distributed app - blog post with code
- Intro to Distributed Deep Learning Systems - blog post
- Parallel and Distributed Deep Learning by Tal Ben-Nun
- An introduction to parallel programming using Python's multiprocessing module - blog post
- Spark + Deep Learning: Distributed Deep Neural Network Training with SparkNet - blog post
- Paper Review. Petuum: A new platform for distributed machine learning on big data - blog post
- A comparison of distributed machine learning platforms - blog post
- Distributed Filesystems for Deep Learning - blog post
- Distributed-TensorFlow-Guide (GitHub) - Distributed TensorFlow basics and examples of training algorithms (with code)
Deployment and Production
- How Docker Can Help You Become A More Effective Data Scientist - blog post
- How to Build and Deploy Scalable Machine Learning in Production with Apache Kafka - blog post
- Deploying deep learning models: Part 1 an overview - blog post
- A guide to deploying Machine/Deep Learning model(s) in Production - blog post
- How redBus uses Scikit-Learn ML models to classify customer complaints? - blog post
- Deploying a Keras Deep Learning Model as a Web Application in Python - blog post
- Awesome-docker - A curated list of Docker resources and projects
- Awesome-Kubernetes - A curated list for awesome kubernetes sources
- Michael Herman - Going Serverless with OpenFaaS, Kubernetes, and Python - PyCon 2018 (YouTube)
- Aly Sivji, Joe Jasinski, tathagata dasgupta (t) - Docker for Data Science - PyCon 2018 (YouTube)
- Ruben Orduz, Nolan Brubaker - A Python-flavored Introduction to Containers And Kubernetes - PyCon 2018 (YouTube)
- Miguel Grinberg - Microservices with Python and Flask - PyCon 2017 (YouTube)
- Deploy and scale containers with Docker native, open source orchestration - PyCon 2017 (YouTube)
- Miguel Grinberg - Flask at Scale - PyCon 2016 (YouTube)
- Deploying and scaling applications with Docker, Swarm, and a tiny bit of Python magic - PyCon 2016 (YouTube)
- Jérôme Petazzoni - Introduction to Docker and containers - PyCon 2016 (YouTube)
- Miguel Grinberg - Flask Workshop - PyCon 2015 (YouTube)
- Andrew T. Baker - Docker 101: Introduction to Docker - PyCon 2015 (YouTube)
- Miguel Grinberg - Flask by Example - PyCon 2014 (YouTube)
- Learn to Build Machine Learning Services, Prototype Real Applications, and Deploy your Work to Users - blog post
- Deploying Keras Deep Learning Models with Flask - blog post
- Introducing Flask-RESTful - blog post
- Develop a NLP Model in Python & Deploy It with Flask, Step by Step - blog post
- Deploying Machine Learning apps with Docker containers - MUPy 2017 - video
- Getting started with Anaconda & Docker - blog post
- Docker for Data Science - blog post
- Simplified Docker-ing for Data Science — Part 1 - blog post
- Deep Learning Installation Tutorial - Part 4: How to install Docker for Deep Learning - blog post
- How to write a production-level code in Data Science? - blog post
- Web Access Logs in Elasticsearch and Machine Learning - webinar
- Deploying Python models to production - video
- How to deploy machine learning models into production - video
- Putting Machine Learning Models into Production - blog post
- productionML (GitHub) - code for creating Production level API services for Machine Learning
- AI Tales: Building Machine learning pipeline using Kubeflow and Minio - blog post
- Deep-Learning-in-Production (GitHub)
- Deploy your AI model the hard (and robust) way - blog post