Difference between revisions of "Data Science"

From Ioannis Kourouklides
Line 87: Line 87:
   
 
==Other Resources==
  +
===General===
 
*[https://datascienceguide.github.io/ Data Science Guide]
 
*[http://jadianes.me/data-science-your-way/ Data Science Engineering, your way]
Line 109: Line 110:
 
*[https://www.systems.ethz.ch/sites/default/files/parallel-distributed-deep-learning.pdf Parallel and Distributed Deep Learning by Tal Ben-Nun]
 
*[https://sebastianraschka.com/Articles/2014_multiprocessing.html An introduction to parallel programming using Python's multiprocessing module] - blog post
*[https://blog.cambridgespark.com/putting-machine-learning-models-into-production-d768560907bd Putting Machine Learning Models into Production] - blog post
 
 
*[https://www.kdnuggets.com/2015/12/spark-deep-learning-training-with-sparknet.html Spark + Deep Learning: Distributed Deep Neural Network Training with SparkNet] - blog post
 
*[https://www.datasciencecentral.com/profiles/blogs/data-science-in-python-pandas-cheat-sheet Data Science in Python: Pandas Cheat Sheet]
Line 115: Line 115:
 
*[https://www.kaggle.com/moizzz/eda-and-clustering EDA and Clustering] - Kaggle
 
*[https://www.elastic.co/webinars/time-series-anomaly-detection-optimizing-machine-learning-jobs-in-elasticsearch Time Series Anomaly Detection: Optimizing your Machine Learning Jobs in Elasticsearch] - webinar
*[https://www.elastic.co/webinars/event-logs-in-elasticsearch-and-machine-learning Web Access Logs in Elasticsearch and Machine Learning] - webinar
 
*[https://www.youtube.com/watch?v=f3I0izerPvc Deploying Python models to production] - video
 
*[https://www.youtube.com/watch?v=-UYyyeYJAoQ How to deploy machine learning models into production] - video
 
 
*[https://ai.googleblog.com/2017/04/federated-learning-collaborative.html Federated Learning: Collaborative Machine Learning without Centralized Training Data] - blog post
*[https://towardsdatascience.com/learn-to-build-machine-learning-services-prototype-real-applications-and-deploy-your-work-to-aa97b2b09e0c Learn to Build Machine Learning Services, Prototype Real Applications, and Deploy your Work to Users] - blog post
 
*[https://towardsdatascience.com/deploying-keras-deep-learning-models-with-flask-5da4181436a2 Deploying Keras Deep Learning Models with Flask] - blog post
 
*[https://www.twilio.com/engineering/2012/10/18/open-sourcing-flask-restful Introducing Flask-RESTful] - blog post
 
*[https://www.youtube.com/watch?v=knAFR4u73Es Deploying Machine Learning apps with Docker containers - MUPy 2017] - video
 
*[https://medium.com/@patrickmichelberger/getting-started-with-anaconda-docker-b50a2c482139 Getting started with Anaconda & Docker] - blog post
 
*[https://towardsdatascience.com/docker-for-data-science-9c0ce73e8263 Docker for Data Science] - blog post
 
*[https://towardsdatascience.com/how-docker-can-help-you-become-a-more-effective-data-scientist-7fc048ef91d5 How Docker Can Help You Become A More Effective Data Scientist] - blog post
 
*[https://becominghuman.ai/docker-for-data-science-part-1-dd41e5ef1d80 Simplified Docker-ing for Data Science — Part 1] - blog post
 
*[https://www.born2data.com/2017/deeplearning_install-part4.html Deep Learning Installation Tutorial - Part 4: How to install Docker for Deep Learning] - blog post
 
 
*[https://github.com/vsmolyakov/pyspark pyspark (GitHub)] - collection of resources
 
* [https://github.com/tmulc18/Distributed-TensorFlow-Guide Distributed-TensorFlow-Guide (GitHub)] - Distributed TensorFlow basics and examples of training algorithms (with code)
 
*[https://github.com/kaiwaehner/kafka-streams-machine-learning-examples kafka-streams-machine-learning-examples (GitHub)] - Machine Learning + Kafka Streams Examples (with code)
*[https://www.confluent.io/blog/build-deploy-scalable-machine-learning-production-apache-kafka/ How to Build and Deploy Scalable Machine Learning in Production with Apache Kafka] - blog post
 
 
*[https://aseigneurin.github.io/2018/09/05/realtime-machine-learning-predictions-wth-kafka-and-h2o.html Realtime Machine Learning predictions with Kafka and H2O.ai] - blog post
  +
*[https://becominghuman.ai/cheat-sheets-for-ai-neural-networks-machine-learning-deep-learning-big-data-678c51b4b463 Cheat Sheets for AI, Neural Networks, Machine Learning, Deep Learning & Big Data] - blog post
  +
  +
*[https://medium.com/python-pandemonium/introduction-to-exploratory-data-analysis-in-python-8b6bcb55c190 Introduction to Exploratory Data Analysis in Python] - blog post
  +
*[https://eng.uber.com/peloton/ Peloton: Uber’s Unified Resource Scheduler for Diverse Cluster Workloads] - blog post
  +
*[http://martin.zinkevich.org/rules_of_ml/rules_of_ml.pdf Rules of Machine Learning: Best Practices for ML Engineering] - blog post
  +
*[https://blog.kovalevskyi.com/google-compute-engine-now-has-images-with-pytorch-1-0-0-and-fastai-1-0-2-57c49efd74bb Google Compute Engine Now Has Images With PyTorch 1.0.0 and FastAi 1.0.2] - blog post
  +
===Deployment and Production===
  +
*[https://towardsdatascience.com/how-docker-can-help-you-become-a-more-effective-data-scientist-7fc048ef91d5 How Docker Can Help You Become A More Effective Data Scientist] - blog post
  +
*[https://www.confluent.io/blog/build-deploy-scalable-machine-learning-production-apache-kafka/ How to Build and Deploy Scalable Machine Learning in Production with Apache Kafka] - blog post
 
*[https://towardsdatascience.com/deploying-deep-learning-models-part-1-an-overview-77b4d01dd6f7 Deploying deep learning models: Part 1 an overview] - blog post
 
*[https://medium.com/@maheshkkumar/a-guide-to-deploying-machine-deep-learning-model-s-in-production-e497fd4b734a A guide to deploying Machine/Deep Learning model(s) in Production] - blog post
 
*[https://medium.com/redbus-in/how-to-deploy-scikit-learn-ml-models-d390b4b8ce7a How redBus uses Scikit-Learn ML models to classify customer complaints?] - blog post
  +
*[https://willk.online/deploying-a-keras-deep-learning-model-as-a-web-application-in-p/ Deploying a Keras Deep Learning Model as a Web Application in Python] - blog post
*[https://becominghuman.ai/cheat-sheets-for-ai-neural-networks-machine-learning-deep-learning-big-data-678c51b4b463 Cheat Sheets for AI, Neural Networks, Machine Learning, Deep Learning & Big Data] - blog post
 
*[https://towardsdatascience.com/how-to-write-a-production-level-code-in-data-science-5d87bd75ced How to write a production-level code in Data Science?] - blog post
 
 
*[https://awesome-docker.netlify.com/ Awesome-docker] - A curated list of Docker resources and projects
 
*[https://ramitsurana.github.io/awesome-kubernetes/ Awesome-Kubernetes] - A curated list for awesome kubernetes sources
Line 151: Line 146:
 
*[https://www.youtube.com/watch?v=YiZkHUbE6N0 Andrew T. Baker - Docker 101: Introduction to Docker - PyCon 2015 (Youtube)]
 
*[https://www.youtube.com/watch?v=FGrIyBDQLPg Miguel Grinberg: Flask by Example - PyCon 2014 (Youtube)]
  +
*[https://towardsdatascience.com/learn-to-build-machine-learning-services-prototype-real-applications-and-deploy-your-work-to-aa97b2b09e0c Learn to Build Machine Learning Services, Prototype Real Applications, and Deploy your Work to Users] - blog post
*[https://medium.com/python-pandemonium/introduction-to-exploratory-data-analysis-in-python-8b6bcb55c190 Introduction to Exploratory Data Analysis in Python] - blog post
 
  +
*[https://towardsdatascience.com/deploying-keras-deep-learning-models-with-flask-5da4181436a2 Deploying Keras Deep Learning Models with Flask] - blog post
  +
*[https://www.twilio.com/engineering/2012/10/18/open-sourcing-flask-restful Introducing Flask-RESTful] - blog post
  +
*[https://www.youtube.com/watch?v=knAFR4u73Es Deploying Machine Learning apps with Docker containers - MUPy 2017] - video
  +
*[https://medium.com/@patrickmichelberger/getting-started-with-anaconda-docker-b50a2c482139 Getting started with Anaconda & Docker] - blog post
  +
*[https://towardsdatascience.com/docker-for-data-science-9c0ce73e8263 Docker for Data Science] - blog post
 
*[https://towardsdatascience.com/how-docker-can-help-you-become-a-more-effective-data-scientist-7fc048ef91d5 How Docker Can Help You Become A More Effective Data Scientist] - blog post
*[https://eng.uber.com/peloton/ Peloton: Uber’s Unified Resource Scheduler for Diverse Cluster Workloads] - blog post
+
*[https://becominghuman.ai/docker-for-data-science-part-1-dd41e5ef1d80 Simplified Docker-ing for Data Science — Part 1] - blog post
  +
*[https://www.born2data.com/2017/deeplearning_install-part4.html Deep Learning Installation Tutorial - Part 4: How to install Docker for Deep Learning] - blog post
*[https://willk.online/deploying-a-keras-deep-learning-model-as-a-web-application-in-p/ Deploying a Keras Deep Learning Model as a Web Application in Python] - blog post
 
  +
*[https://towardsdatascience.com/how-to-write-a-production-level-code-in-data-science-5d87bd75ced How to write a production-level code in Data Science?] - blog post
*[http://martin.zinkevich.org/rules_of_ml/rules_of_ml.pdf Rules of Machine Learning: Best Practices for ML Engineering] - blog post
 
  +
*[https://www.elastic.co/webinars/event-logs-in-elasticsearch-and-machine-learning Web Access Logs in Elasticsearch and Machine Learning] - webinar
*[https://blog.kovalevskyi.com/google-compute-engine-now-has-images-with-pytorch-1-0-0-and-fastai-1-0-2-57c49efd74bb Google Compute Engine Now Has Images With PyTorch 1.0.0 and FastAi 1.0.2] - blog post
 
  +
*[https://www.youtube.com/watch?v=f3I0izerPvc Deploying Python models to production] - video
  +
*[https://www.youtube.com/watch?v=-UYyyeYJAoQ How to deploy machine learning models into production] - video
  +
*[https://blog.cambridgespark.com/putting-machine-learning-models-into-production-d768560907bd Putting Machine Learning Models into Production] - blog post

Revision as of 02:13, 14 December 2018

This page contains resources about Data Science, including Data Engineering and Data Management.

Subfields and Concepts

  • Machine Learning / Data Mining
  • Exploratory Data Analysis
  • Data Preparation and Data Preprocessing
  • Data Fusion and Data Integration
  • Data Wrangling / Data Munging
  • Data Scraping
  • Data Sampling
  • Data Cleaning
  • High Performance/Parallel/Distributed Computing for Machine Learning
  • Concurrent/Multi-threading Computing for Machine Learning
  • Data Engineering, Data Management and Databases
  • Data Visualization
  • Big Data

Online courses

Video Lectures

Lecture Notes

Books

  • Tukey, J. W. (1977). Exploratory data analysis. Addison-Wesley.
  • Schutt, R., & O'Neil, C. (2013). Doing data science: Straight talk from the frontline. O'Reilly Media.
  • Leskovec, J., Rajaraman, A., & Ullman, J. D. (2014). Mining of massive datasets. Cambridge University Press.
  • Zumel, N., Mount, J., & Porzak, J. (2014). Practical data science with R. Manning.
  • Nolan, D., & Lang, D. T. (2015). Data Science in R: A Case Studies Approach to Computational Reasoning and Problem Solving. CRC Press.
  • Elston, S. F. (2015). Data Science in the Cloud with Microsoft Azure Machine Learning and R. O'Reilly Media, Inc.
  • Grus, J. (2015). Data Science from Scratch: First Principles with Python. O'Reilly Media.
  • Madhavan, S. (2015). Mastering Python for Data Science. Packt Publishing Ltd.
  • VanderPlas, J. (2016). Python Data Science Handbook: Essential Tools for Working with Data. O'Reilly Media.
  • Wickham, H., & Grolemund, G. (2017). R for Data Science. O'Reilly Media.

Scholarly Articles

  • Kang, D., Emmons, J., Abuzaid, F., Bailis, P., & Zaharia, M. (2017). NoScope: optimizing neural network queries over video at scale. Proceedings of the VLDB Endowment, 10(11), 1586-1597.
  • Xing, E. P., Ho, Q., Xie, P., & Wei, D. (2016). Strategies and principles of distributed machine learning on big data. Engineering, 2(2), 179-195.
  • Salloum, S., Dautov, R., Chen, X., Peng, P. X., & Huang, J. Z. (2016). Big data analytics on Apache Spark. International Journal of Data Science and Analytics, 1(3-4), 145-164.
  • Sculley, D., Holt, G., Golovin, D., Davydov, E., Phillips, T., Ebner, D., ... & Dennison, D. (2015). Hidden technical debt in machine learning systems. In Advances in Neural Information Processing Systems (pp. 2503-2511).
  • Huang, Y., Zhu, F., Yuan, M., Deng, K., Li, Y., Ni, B., ... & Zeng, J. (2015). Telco Churn Prediction with Big Data. In Proceedings of the 2015 ACM SIGMOD International Conference on Management of Data (pp. 607-618). ACM.
  • Moritz, P., Nishihara, R., Stoica, I., & Jordan, M. I. (2015). SparkNet: Training Deep Networks in Spark. arXiv preprint arXiv:1511.06051.

Software

See also

Other Resources

General

Deployment and Production