Data Science

From Ioannis Kourouklides
Jump to navigation Jump to search
The printable version is no longer supported and may have rendering errors. Please update your browser bookmarks and please use the default browser print function instead.

This page contains resources about Data Science, including Data Engineering and Data Management.

Subfields and Concepts

  • Agile Data Science
  • Machine Learning / Data Mining
  • Exploratory Data Analysis
  • Data Preparation and Data Preprocessing
  • Data Fusion and Data Integration
  • Data Wrangling / Data Munging
  • Data Scraping
  • Data Sampling
  • Data Cleaning
  • High Performance/Parallel/Distributed/Cloud Computing for Machine Learning
  • Concurrent/Multi-threading Computing for Machine Learning
  • Data Engineering, Data Management and Databases
  • Data Visualization
  • Big Data
  • Explainable AI (XAI) / Interpretable AI

Online courses

Video Lectures

Lecture Notes

Books

  • Lanaro, G. (2017). Python High Performance. Packt Publishing Ltd.
  • Wickham, H., & Grolemund, G. (2017). R for Data Science. O'Reilly Media.
  • VanderPlas, J. (2016). Python Data Science Handbook: Essential Tools for Working with Data. O'Reilly Media.
  • Pierfederici, F. (2016). Distributed Computing with Python. Packt Publishing Ltd.
  • Nolan, D., & Lang, D. T. (2015). Data Science in R: A Case Studies Approach to Computational Reasoning and Problem Solving. CRC Press.
  • Elston, S. F. (2015). Data Science in the Cloud with Microsoft Azure Machine Learning and R. O'Reilly Media, Inc.
  • Grus, J. (2015). Data Science from Scratch: First Principles with Python. O'Reilly Media.
  • Madhavan, S. (2015). Mastering Python for Data Science. Packt Publishing Ltd.
  • Leskovec, J., Rajaraman, A., & Ullman, J. D. (2014). Mining of massive datasets. Cambridge University Press. (link)
  • Zumel, N., Mount, J., & Porzak, J. (2014). Practical data science with R. Manning.
  • Schutt, R., & O'Neil, C. (2013). Doing data science: Straight talk from the frontline. O'Reilly Media.
  • Tukey, J. W. (1977). Exploratory data analysis. Addison-Wesley.

Scholarly Articles

  • Verbraeken, J., Wolting, M., Katzy, J., Kloppenburg, J., Verbelen, T., & Rellermeyer, J. S. (2020). A Survey on Distributed Machine Learning. ACM Computing Surveys (CSUR), 53(2), 1-33.
  • Buchlovsky, P. ... (2018). TF-Replicator: Distributed Machine Learning for Researchers. arXiv preprint arXiv:1902.00465.
  • Kang, D., Emmons, J., Abuzaid, F., Bailis, P., & Zaharia, M. (2017). NoScope: optimizing neural network queries over video at scale. Proceedings of the VLDB Endowment, 10(11), 1586-1597.
  • Ribeiro, M. T., Singh, S., & Guestrin, C. (2016). Why Should I Trust You? Explaining the Predictions of Any Classifier. In Proceedings of the 22nd ACM SIGKDD International Conference on Knowledge Discovery and Data Mining (pp. 1135-1144).
  • Xing, E. P., Ho, Q., Xie, P., & Wei, D. (2016). Strategies and principles of distributed machine learning on big data. Engineering, 2(2), 179-195.
  • Salloum, S., Dautov, R., Chen, X., Peng, P. X., & Huang, J. Z. (2016). Big data analytics on Apache Spark. International Journal of Data Science and Analytics, 1(3-4), 145-164.
  • Sculley, D., Holt, G., Golovin, D., Davydov, E., Phillips, T., Ebner, D., ... & Dennison, D. (2015). Hidden technical debt in machine learning systems. In Advances in Neural Information Processing Systems (pp. 2503-2511).
  • Huang, Y., Zhu, F., Yuan, M., Deng, K., Li, Y., Ni, B., ... & Zeng, J. (2015). Telco Churn Prediction with Big Data. In Proceedings of the 2015 ACM SIGMOD International Conference on Management of Data (pp. 607-618).
  • Moritz, P., Nishihara, R., Stoica, I., & Jordan, M. I. (2015). SparkNet: Training Deep Networks in Spark. arXiv preprint arXiv:1511.06051.
  • Upadhyaya, S. R. (2013). Parallel approaches to machine learning—A comprehensive survey. Journal of Parallel and Distributed Computing, 73(3), 284-292.
  • Sakr, S., Liu, A., Batista, D. M., & Alomari, M. (2011). A survey of large scale data management approaches in cloud environments. IEEE Communications Surveys & Tutorials, 13(3), 311-336.
  • Abadi, D. J. (2009). Data management in the cloud: Limitations and opportunities. IEEE Data Eng. Bull., 32(1), 3-12.

Software

See also

Other Resources

General

Deployment and Production