This wiki has no edits or logs made within the last 45 days, therefore it is marked as inactive. If you would like to prevent this wiki from being closed, please start showing signs of activity here. If there are no signs of this wiki being used within the next 15 days, this wiki may be closed per the Dormancy Policy. This wiki will then be eligible for adoption by another user. If not adopted and still inactive 135 days from now, this wiki will become eligible for deletion. Please be sure to familiarize yourself with Miraheze's Dormancy Policy. If you are a bureaucrat, you can go to Special:ManageWiki and uncheck "inactive" yourself. If you have any other questions or concerns, please don't hesitate to ask at Stewards' noticeboard.

Difference between revisions of "Data Science"

From Ioannis Kourouklides
Jump to navigation Jump to search
Line 46: Line 46:
 
* Xing, E. P., Ho, Q., Xie, P., & Wei, D. (2016). Strategies and principles of distributed machine learning on big data. ''Engineering, 2''(2), 179-195.
 
* Xing, E. P., Ho, Q., Xie, P., & Wei, D. (2016). Strategies and principles of distributed machine learning on big data. ''Engineering, 2''(2), 179-195.
 
* Salloum, S., Dautov, R., Chen, X., Peng, P. X., & Huang, J. Z. (2016). Big data analytics on Apache Spark. ''International Journal of Data Science and Analytics, 1''(3-4), 145-164.
 
* Salloum, S., Dautov, R., Chen, X., Peng, P. X., & Huang, J. Z. (2016). Big data analytics on Apache Spark. ''International Journal of Data Science and Analytics, 1''(3-4), 145-164.
* Sculley, D., Holt, G., Golovin, D., Davydov, E., Phillips, T., Ebner, D., ... & Dennison, D. (2015). Hidden technical debt in machine learning systems. In ''Advances in neural information processing systems'' (pp. 2503-2511).
+
* Sculley, D., Holt, G., Golovin, D., Davydov, E., Phillips, T., Ebner, D., ... & Dennison, D. (2015). Hidden technical debt in machine learning systems. In ''[https://en.wikipedia.org/wiki/Conference_on_Neural_Information_Processing_Systems Advances in Neural Information Processing Systems]'' (pp. 2503-2511).
 
* Huang, Y., Zhu, F., Yuan, M., Deng, K., Li, Y., Ni, B., ... & Zeng, J. (2015). Telco Churn Prediction with Big Data. In ''Proceedings of the 2015 ACM SIGMOD International Conference on Management of Data'' (pp. 607-618). ACM.
 
* Huang, Y., Zhu, F., Yuan, M., Deng, K., Li, Y., Ni, B., ... & Zeng, J. (2015). Telco Churn Prediction with Big Data. In ''Proceedings of the 2015 ACM SIGMOD International Conference on Management of Data'' (pp. 607-618). ACM.
 
* Moritz, P., Nishihara, R., Stoica, I., & Jordan, M. I. (2015). SparkNet: Training Deep Networks in Spark. ''arXiv preprint arXiv:1511.06051''.
 
* Moritz, P., Nishihara, R., Stoica, I., & Jordan, M. I. (2015). SparkNet: Training Deep Networks in Spark. ''arXiv preprint arXiv:1511.06051''.

Revision as of 02:06, 14 December 2018

This page contains resources about Data Science, including Data Engineering and Data Management.

Subfields and Concepts

  • Machine Learning / Data Mining
  • Exploratory Data Analysis
  • Data Preparation and Data Preprocessing
  • Data Fusion and Data Integration
  • Data Wrangling / Data Munging
  • Data Scraping
  • Data Sampling
  • Data Cleaning
  • High Performance/Parallel/Distributed Computing for Machine Learning
  • Concurrent/Multi-threading Computing for Machine Learning
  • Data Engineering, Data Management and Databases
  • Data Visualization
  • Big Data

Online courses

Video Lectures

Lecture Notes

Books

  • Tukey, J. W. (1977). Exploratory data analysis. Addison-Wesley.
  • Schutt, R., & O'Neil, C. (2013). Doing data science: Straight talk from the frontline. O'Reilly Media.
  • Leskovec, J., Rajaraman, A., & Ullman, J. D. (2014). Mining of massive datasets. Cambridge University Press. (link)
  • Zumel, N., Mount, J., & Porzak, J. (2014). Practical data science with R. Manning.
  • Nolan, D., & Lang, D. T. (2015). Data Science in R: A Case Studies Approach to Computational Reasoning and Problem Solving. CRC Press.
  • Elston, S. F. (2015). Data Science in the Cloud with Microsoft Azure Machine Learning and R. O'Reilly Media, Inc.
  • Grus, J. (2015). Data Science from Scratch: First Principles with Python. O'Reilly Media.
  • Madhavan, S. (2015). Mastering Python for Data Science. Packt Publishing Ltd.
  • VanderPlas, J. (2016). Python Data Science Handbook: Essential Tools for Working with Data. O'Reilly Media.
  • Wickham, H., & Grolemund, G. (2017). R for Data Science. O'Reilly Media.

Scholarly Articles

  • Kang, D., Emmons, J., Abuzaid, F., Bailis, P., & Zaharia, M. (2017). NoScope: optimizing neural network queries over video at scale. Proceedings of the VLDB Endowment, 10(11), 1586-1597.
  • Xing, E. P., Ho, Q., Xie, P., & Wei, D. (2016). Strategies and principles of distributed machine learning on big data. Engineering, 2(2), 179-195.
  • Salloum, S., Dautov, R., Chen, X., Peng, P. X., & Huang, J. Z. (2016). Big data analytics on Apache Spark. International Journal of Data Science and Analytics, 1(3-4), 145-164.
  • Sculley, D., Holt, G., Golovin, D., Davydov, E., Phillips, T., Ebner, D., ... & Dennison, D. (2015). Hidden technical debt in machine learning systems. In Advances in Neural Information Processing Systems (pp. 2503-2511).
  • Huang, Y., Zhu, F., Yuan, M., Deng, K., Li, Y., Ni, B., ... & Zeng, J. (2015). Telco Churn Prediction with Big Data. In Proceedings of the 2015 ACM SIGMOD International Conference on Management of Data (pp. 607-618). ACM.
  • Moritz, P., Nishihara, R., Stoica, I., & Jordan, M. I. (2015). SparkNet: Training Deep Networks in Spark. arXiv preprint arXiv:1511.06051.

Software

See also

Other Resources