Difference between revisions of "Data Science"

From Ioannis Kourouklides
Jump to navigation Jump to search
Line 118: Line 118:
 
*[https://www.born2data.com/2017/deeplearning_install-part4.html Deep Learning Installation Tutorial - Part 4: How to install Docker for Deep Learning ] - blog post
 
*[https://www.born2data.com/2017/deeplearning_install-part4.html Deep Learning Installation Tutorial - Part 4: How to install Docker for Deep Learning ] - blog post
 
*[https://github.com/vsmolyakov/pyspark pyspark (GitHub)] - collection of resources
 
*[https://github.com/vsmolyakov/pyspark pyspark (GitHub)] - collection of resources
  +
* [https://github.com/tmulc18/Distributed-TensorFlow-Guide Distributed-TensorFlow-Guide (GitHub)] - Distributed TensorFlow basics and examples of training algorithms (with code)
 
*[https://github.com/kaiwaehner/kafka-streams-machine-learning-examples kafka-streams-machine-learning-examples (GitHub)] - Machine Learning + Kafka Streams Examples (with code)
 
*[https://github.com/kaiwaehner/kafka-streams-machine-learning-examples kafka-streams-machine-learning-examples (GitHub)] - Machine Learning + Kafka Streams Examples (with code)
 
*[https://www.confluent.io/blog/build-deploy-scalable-machine-learning-production-apache-kafka/ How to Build and Deploy Scalable Machine Learning in Production with Apache Kafka] - blog post
 
*[https://www.confluent.io/blog/build-deploy-scalable-machine-learning-production-apache-kafka/ How to Build and Deploy Scalable Machine Learning in Production with Apache Kafka] - blog post

Revision as of 19:33, 18 October 2018

This page contains resources about Data Science, including Data Engineering.

Subfields and Concepts

  • Machine Learning / Data Mining
  • Exploratory Data Analysis
  • Data Preparation and Preprocessing
  • High Performance/Parallel/Distributed Computing for Machine Learning
  • Concurrent/Multi-threading Computing for Machine Learning
  • Data Engineering and Databases
  • Data Visualization
  • Big Data

Online courses

Video Lectures

Lecture Notes

Books

  • Tukey, J. W. (1977). Exploratory data analysis. Addison-Wesley.
  • Schutt, R., & O'Neil, C. (2013). Doing data science: Straight talk from the frontline. O'Reilly Media.
  • Leskovec, J., Rajaraman, A., & Ullman, J. D. (2014). Mining of massive datasets. Cambridge University Press. (link)
  • Zumel, N., Mount, J., & Porzak, J. (2014). Practical data science with R. Manning.
  • Nolan, D., & Lang, D. T. (2015). Data Science in R: A Case Studies Approach to Computational Reasoning and Problem Solving. CRC Press.
  • Elston, S. F. (2015). Data Science in the Cloud with Microsoft Azure Machine Learning and R. O'Reilly Media, Inc.
  • Grus, J. (2015). Data Science from Scratch: First Principles with Python. O'Reilly Media.
  • Madhavan, S. (2015). Mastering Python for Data Science. Packt Publishing Ltd.
  • Blum, A., Hopcroft, J., & Kannan, R. (2015). Foundations of Data Science.
  • VanderPlas, J. (2016). Python Data Science Handbook: Essential Tools for Working with Data. O'Reilly Media.
  • Wickham, H., & Grolemund, G. (2017). R for Data Science. O'Reilly Media.

Scholarly Articles

  • Xing, E. P., Ho, Q., Xie, P., & Wei, D. (2016). Strategies and principles of distributed machine learning on big data. Engineering, 2(2), 179-195.
  • Salloum, S., Dautov, R., Chen, X., Peng, P. X., & Huang, J. Z. (2016). Big data analytics on Apache Spark. International Journal of Data Science and Analytics, 1(3-4), 145-164.
  • Huang, Y., Zhu, F., Yuan, M., Deng, K., Li, Y., Ni, B., ... & Zeng, J. (2015). Telco Churn Prediction with Big Data. In Proceedings of the 2015 ACM SIGMOD International Conference on Management of Data (pp. 607-618). ACM.
  • Moritz, P., Nishihara, R., Stoica, I., & Jordan, M. I. (2015). SparkNet: Training Deep Networks in Spark. arXiv preprint arXiv:1511.06051.

Software

See also

Other Resources