Difference between revisions of "Data Science"

From Ioannis Kourouklides
Line 87: Line 87:
   
 
==Other Resources==
 
==Other Resources==
  +
===General===
 
*[https://datascienceguide.github.io/ Data Science Guide]
 
*[https://datascienceguide.github.io/ Data Science Guide]
 
*[http://jadianes.me/data-science-your-way/ Data Science Engineering, your way]
 
*[http://jadianes.me/data-science-your-way/ Data Science Engineering, your way]
Line 109: Line 110:
 
*[https://www.systems.ethz.ch/sites/default/files/parallel-distributed-deep-learning.pdf Parallel and Distributed Deep Learning by Tal Ben-Nun]
 
*[https://www.systems.ethz.ch/sites/default/files/parallel-distributed-deep-learning.pdf Parallel and Distributed Deep Learning by Tal Ben-Nun]
 
*[https://sebastianraschka.com/Articles/2014_multiprocessing.html An introduction to parallel programming using Python's multiprocessing module] - blog post
 
*[https://sebastianraschka.com/Articles/2014_multiprocessing.html An introduction to parallel programming using Python's multiprocessing module] - blog post
*[https://blog.cambridgespark.com/putting-machine-learning-models-into-production-d768560907bd Putting Machine Learning Models into Production] - blog post
 
 
*[https://www.kdnuggets.com/2015/12/spark-deep-learning-training-with-sparknet.html Spark + Deep Learning: Distributed Deep Neural Network Training with SparkNet] - blog post
 
*[https://www.kdnuggets.com/2015/12/spark-deep-learning-training-with-sparknet.html Spark + Deep Learning: Distributed Deep Neural Network Training with SparkNet] - blog post
 
*[https://www.datasciencecentral.com/profiles/blogs/data-science-in-python-pandas-cheat-sheet Data Science in Python: Pandas Cheat Sheet]
 
*[https://www.datasciencecentral.com/profiles/blogs/data-science-in-python-pandas-cheat-sheet Data Science in Python: Pandas Cheat Sheet]
Line 115: Line 115:
 
*[https://www.kaggle.com/moizzz/eda-and-clustering EDA and Clustering] - Kaggle
 
*[https://www.kaggle.com/moizzz/eda-and-clustering EDA and Clustering] - Kaggle
 
*[https://www.elastic.co/webinars/time-series-anomaly-detection-optimizing-machine-learning-jobs-in-elasticsearch Time Series Anomaly Detection: Optimizing your Machine Learning Jobs in Elasticsearch] - webinar
 
*[https://www.elastic.co/webinars/time-series-anomaly-detection-optimizing-machine-learning-jobs-in-elasticsearch Time Series Anomaly Detection: Optimizing your Machine Learning Jobs in Elasticsearch] - webinar
*[https://www.elastic.co/webinars/event-logs-in-elasticsearch-and-machine-learning Web Access Logs in Elasticsearch and Machine Learning] - webinar
 
*[https://www.youtube.com/watch?v=f3I0izerPvc Deploying Python models to production] - video
 
*[https://www.youtube.com/watch?v=-UYyyeYJAoQ How to deploy machine learning models into production] - video
 
 
*[https://ai.googleblog.com/2017/04/federated-learning-collaborative.html Federated Learning: Collaborative Machine Learning without Centralized Training Data] - blog post
 
*[https://ai.googleblog.com/2017/04/federated-learning-collaborative.html Federated Learning: Collaborative Machine Learning without Centralized Training Data] - blog post
*[https://towardsdatascience.com/learn-to-build-machine-learning-services-prototype-real-applications-and-deploy-your-work-to-aa97b2b09e0c Learn to Build Machine Learning Services, Prototype Real Applications, and Deploy your Work to Users] - blog post
 
*[https://towardsdatascience.com/deploying-keras-deep-learning-models-with-flask-5da4181436a2 Deploying Keras Deep Learning Models with Flask] - blog post
 
*[https://www.twilio.com/engineering/2012/10/18/open-sourcing-flask-restful Introducing Flask-RESTful] - blog post
 
*[https://www.youtube.com/watch?v=knAFR4u73Es Deploying Machine Learning apps with Docker containers - MUPy 2017] - video
 
*[https://medium.com/@patrickmichelberger/getting-started-with-anaconda-docker-b50a2c482139 Getting started with Anaconda & Docker] - blog post
 
*[https://towardsdatascience.com/docker-for-data-science-9c0ce73e8263 Docker for Data Science] - blog post
 
*[https://towardsdatascience.com/how-docker-can-help-you-become-a-more-effective-data-scientist-7fc048ef91d5 How Docker Can Help You Become A More Effective Data Scientist] - blog post
 
*[https://becominghuman.ai/docker-for-data-science-part-1-dd41e5ef1d80 Simplified Docker-ing for Data Science — Part 1] - blog post
 
*[https://www.born2data.com/2017/deeplearning_install-part4.html Deep Learning Installation Tutorial - Part 4: How to install Docker for Deep Learning ] - blog post
 
 
*[https://github.com/vsmolyakov/pyspark pyspark (GitHub)] - collection of resources
 
*[https://github.com/vsmolyakov/pyspark pyspark (GitHub)] - collection of resources
 
* [https://github.com/tmulc18/Distributed-TensorFlow-Guide Distributed-TensorFlow-Guide (GitHub)] - Distributed TensorFlow basics and examples of training algorithms (with code)
 
* [https://github.com/tmulc18/Distributed-TensorFlow-Guide Distributed-TensorFlow-Guide (GitHub)] - Distributed TensorFlow basics and examples of training algorithms (with code)
 
*[https://github.com/kaiwaehner/kafka-streams-machine-learning-examples kafka-streams-machine-learning-examples (GitHub)] - Machine Learning + Kafka Streams Examples (with code)
 
*[https://github.com/kaiwaehner/kafka-streams-machine-learning-examples kafka-streams-machine-learning-examples (GitHub)] - Machine Learning + Kafka Streams Examples (with code)
*[https://www.confluent.io/blog/build-deploy-scalable-machine-learning-production-apache-kafka/ How to Build and Deploy Scalable Machine Learning in Production with Apache Kafka] - blog post
 
 
*[https://aseigneurin.github.io/2018/09/05/realtime-machine-learning-predictions-wth-kafka-and-h2o.html Realtime Machine Learning predictions with Kafka and H2O.ai] - blog post
 
*[https://aseigneurin.github.io/2018/09/05/realtime-machine-learning-predictions-wth-kafka-and-h2o.html Realtime Machine Learning predictions with Kafka and H2O.ai] - blog post
 
*[https://becominghuman.ai/cheat-sheets-for-ai-neural-networks-machine-learning-deep-learning-big-data-678c51b4b463 Cheat Sheets for AI, Neural Networks, Machine Learning, Deep Learning & Big Data] - blog post
  +
 
*[https://medium.com/python-pandemonium/introduction-to-exploratory-data-analysis-in-python-8b6bcb55c190 Introduction to Exploratory Data Analysis in Python] - blog post
  +
*[https://eng.uber.com/peloton/ Peloton: Uber’s Unified Resource Scheduler for Diverse Cluster Workloads] - blog post
 
*[http://martin.zinkevich.org/rules_of_ml/rules_of_ml.pdf Rules of Machine Learning: Best Practices for ML Engineering] - blog post
 
*[https://blog.kovalevskyi.com/google-compute-engine-now-has-images-with-pytorch-1-0-0-and-fastai-1-0-2-57c49efd74bb Google Compute Engine Now Has Images With PyTorch 1.0.0 and FastAi 1.0.2] - blog post
  +
===Deployment and Production===
 
*[https://towardsdatascience.com/how-docker-can-help-you-become-a-more-effective-data-scientist-7fc048ef91d5 How Docker Can Help You Become A More Effective Data Scientist] - blog post
 
*[https://www.confluent.io/blog/build-deploy-scalable-machine-learning-production-apache-kafka/ How to Build and Deploy Scalable Machine Learning in Production with Apache Kafka] - blog post
 
*[https://towardsdatascience.com/deploying-deep-learning-models-part-1-an-overview-77b4d01dd6f7 Deploying deep learning models: Part 1 an overview] - blog post
 
*[https://towardsdatascience.com/deploying-deep-learning-models-part-1-an-overview-77b4d01dd6f7 Deploying deep learning models: Part 1 an overview] - blog post
 
*[https://medium.com/@maheshkkumar/a-guide-to-deploying-machine-deep-learning-model-s-in-production-e497fd4b734a A guide to deploying Machine/Deep Learning model(s) in Production] - blog post
 
*[https://medium.com/@maheshkkumar/a-guide-to-deploying-machine-deep-learning-model-s-in-production-e497fd4b734a A guide to deploying Machine/Deep Learning model(s) in Production] - blog post
 
*[https://medium.com/redbus-in/how-to-deploy-scikit-learn-ml-models-d390b4b8ce7a How redBus uses Scikit-Learn ML models to classify customer complaints?] - blog post
 
*[https://medium.com/redbus-in/how-to-deploy-scikit-learn-ml-models-d390b4b8ce7a How redBus uses Scikit-Learn ML models to classify customer complaints?] - blog post
 
*[https://willk.online/deploying-a-keras-deep-learning-model-as-a-web-application-in-p/ Deploying a Keras Deep Learning Model as a Web Application in Python] - blog post
*[https://becominghuman.ai/cheat-sheets-for-ai-neural-networks-machine-learning-deep-learning-big-data-678c51b4b463 Cheat Sheets for AI, Neural Networks, Machine Learning, Deep Learning & Big Data] - blog post
 
*[https://towardsdatascience.com/how-to-write-a-production-level-code-in-data-science-5d87bd75ced How to write a production-level code in Data Science?] - blog post
 
 
*[https://awesome-docker.netlify.com/ Awesome-docker] - A curated list of Docker resources and projects
 
*[https://awesome-docker.netlify.com/ Awesome-docker] - A curated list of Docker resources and projects
 
*[https://ramitsurana.github.io/awesome-kubernetes/ Awesome-Kubernetes] - A curated list for awesome kubernetes sources
 
*[https://ramitsurana.github.io/awesome-kubernetes/ Awesome-Kubernetes] - A curated list for awesome kubernetes sources
Line 151: Line 146:
 
*[https://www.youtube.com/watch?v=YiZkHUbE6N0 Andrew T. Baker - Docker 101: Introduction to Docker - PyCon 2015 (Youtube)]
 
*[https://www.youtube.com/watch?v=YiZkHUbE6N0 Andrew T. Baker - Docker 101: Introduction to Docker - PyCon 2015 (Youtube)]
 
*[https://www.youtube.com/watch?v=FGrIyBDQLPg Miguel Grinberg: Flask by Example - PyCon 2014 (Youtube)]
 
*[https://www.youtube.com/watch?v=FGrIyBDQLPg Miguel Grinberg: Flask by Example - PyCon 2014 (Youtube)]
 
*[https://towardsdatascience.com/learn-to-build-machine-learning-services-prototype-real-applications-and-deploy-your-work-to-aa97b2b09e0c Learn to Build Machine Learning Services, Prototype Real Applications, and Deploy your Work to Users] - blog post
*[https://medium.com/python-pandemonium/introduction-to-exploratory-data-analysis-in-python-8b6bcb55c190 Introduction to Exploratory Data Analysis in Python] - blog post
 
 
*[https://towardsdatascience.com/deploying-keras-deep-learning-models-with-flask-5da4181436a2 Deploying Keras Deep Learning Models with Flask] - blog post
 
*[https://www.twilio.com/engineering/2012/10/18/open-sourcing-flask-restful Introducing Flask-RESTful] - blog post
 
*[https://www.youtube.com/watch?v=knAFR4u73Es Deploying Machine Learning apps with Docker containers - MUPy 2017] - video
 
*[https://medium.com/@patrickmichelberger/getting-started-with-anaconda-docker-b50a2c482139 Getting started with Anaconda & Docker] - blog post
 
*[https://towardsdatascience.com/docker-for-data-science-9c0ce73e8263 Docker for Data Science] - blog post
 
*[https://towardsdatascience.com/how-docker-can-help-you-become-a-more-effective-data-scientist-7fc048ef91d5 How Docker Can Help You Become A More Effective Data Scientist] - blog post
 
*[https://towardsdatascience.com/how-docker-can-help-you-become-a-more-effective-data-scientist-7fc048ef91d5 How Docker Can Help You Become A More Effective Data Scientist] - blog post
*[https://eng.uber.com/peloton/ Peloton: Uber’s Unified Resource Scheduler for Diverse Cluster Workloads] - blog post
+
*[https://becominghuman.ai/docker-for-data-science-part-1-dd41e5ef1d80 Simplified Docker-ing for Data Science — Part 1] - blog post
 
*[https://www.born2data.com/2017/deeplearning_install-part4.html Deep Learning Installation Tutorial - Part 4: How to install Docker for Deep Learning ] - blog post
*[https://willk.online/deploying-a-keras-deep-learning-model-as-a-web-application-in-p/ Deploying a Keras Deep Learning Model as a Web Application in Python] - blog post
 
 
*[https://towardsdatascience.com/how-to-write-a-production-level-code-in-data-science-5d87bd75ced How to write a production-level code in Data Science?] - blog post
*[http://martin.zinkevich.org/rules_of_ml/rules_of_ml.pdf Rules of Machine Learning: Best Practices for ML Engineering] - blog post
 
 
*[https://www.elastic.co/webinars/event-logs-in-elasticsearch-and-machine-learning Web Access Logs in Elasticsearch and Machine Learning] - webinar
*[https://blog.kovalevskyi.com/google-compute-engine-now-has-images-with-pytorch-1-0-0-and-fastai-1-0-2-57c49efd74bb Google Compute Engine Now Has Images With PyTorch 1.0.0 and FastAi 1.0.2] - blog post
 
 
*[https://www.youtube.com/watch?v=f3I0izerPvc Deploying Python models to production] - video
 
*[https://www.youtube.com/watch?v=-UYyyeYJAoQ How to deploy machine learning models into production] - video
 
*[https://blog.cambridgespark.com/putting-machine-learning-models-into-production-d768560907bd Putting Machine Learning Models into Production] - blog post

Revision as of 02:13, 14 December 2018

This page contains resources about Data Science, including Data Engineering and Data Management.

Subfields and Concepts

  • Machine Learning / Data Mining
  • Exploratory Data Analysis
  • Data Preparation and Data Preprocessing
  • Data Fusion and Data Integration
  • Data Wrangling / Data Munging
  • Data Scraping
  • Data Sampling
  • Data Cleaning
  • High Performance/Parallel/Distributed Computing for Machine Learning
  • Concurrent/Multi-threading Computing for Machine Learning
  • Data Engineering, Data Management and Databases
  • Data Visualization
  • Big Data

Online courses

Video Lectures

Lecture Notes

Books

  • Tukey, J. W. (1977). Exploratory data analysis. Addison-Wesley.
  • Schutt, R., & O'Neil, C. (2013). Doing data science: Straight talk from the frontline. O'Reilly Media.
  • Leskovec, J., Rajaraman, A., & Ullman, J. D. (2014). Mining of massive datasets. Cambridge University Press.
  • Zumel, N., Mount, J., & Porzak, J. (2014). Practical data science with R. Manning.
  • Nolan, D., & Lang, D. T. (2015). Data Science in R: A Case Studies Approach to Computational Reasoning and Problem Solving. CRC Press.
  • Elston, S. F. (2015). Data Science in the Cloud with Microsoft Azure Machine Learning and R. O'Reilly Media, Inc.
  • Grus, J. (2015). Data Science from Scratch: First Principles with Python. O'Reilly Media.
  • Madhavan, S. (2015). Mastering Python for Data Science. Packt Publishing Ltd.
  • VanderPlas, J. (2016). Python Data Science Handbook: Essential Tools for Working with Data. O'Reilly Media.
  • Wickham, H., & Grolemund, G. (2017). R for Data Science. O'Reilly Media.

Scholarly Articles

  • Kang, D., Emmons, J., Abuzaid, F., Bailis, P., & Zaharia, M. (2017). NoScope: optimizing neural network queries over video at scale. Proceedings of the VLDB Endowment, 10(11), 1586-1597.
  • Xing, E. P., Ho, Q., Xie, P., & Wei, D. (2016). Strategies and principles of distributed machine learning on big data. Engineering, 2(2), 179-195.
  • Salloum, S., Dautov, R., Chen, X., Peng, P. X., & Huang, J. Z. (2016). Big data analytics on Apache Spark. International Journal of Data Science and Analytics, 1(3-4), 145-164.
  • Sculley, D., Holt, G., Golovin, D., Davydov, E., Phillips, T., Ebner, D., ... & Dennison, D. (2015). Hidden technical debt in machine learning systems. In Advances in Neural Information Processing Systems (pp. 2503-2511).
  • Huang, Y., Zhu, F., Yuan, M., Deng, K., Li, Y., Ni, B., ... & Zeng, J. (2015). Telco Churn Prediction with Big Data. In Proceedings of the 2015 ACM SIGMOD International Conference on Management of Data (pp. 607-618). ACM.
  • Moritz, P., Nishihara, R., Stoica, I., & Jordan, M. I. (2015). SparkNet: Training Deep Networks in Spark. arXiv preprint arXiv:1511.06051.

Software

See also

Other Resources

General

Deployment and Production