This wiki has no edits or logs made within the last 45 days, therefore it is marked as inactive. If you would like to prevent this wiki from being closed, please start showing signs of activity here. If there are no signs of this wiki being used within the next 15 days, this wiki may be closed per the Dormancy Policy. This wiki will then be eligible for adoption by another user. If not adopted and still inactive 135 days from now, this wiki will become eligible for deletion. Please be sure to familiarize yourself with Miraheze's Dormancy Policy. If you are a bureaucrat, you can go to Special:ManageWiki and uncheck "inactive" yourself. If you have any other questions or concerns, please don't hesitate to ask at Stewards' noticeboard.

Difference between revisions of "Data Science"

From Ioannis Kourouklides
Jump to navigation Jump to search
Line 24: Line 24:
** Message broker/Message bus/Event bus/Integration broker/Interface engine
** Message broker/Message bus/Event bus/Integration broker/Interface engine
** Message queue
** Message queue
** Messaging patterns
*** Fire-and-Forget / One-Way
*** Request-Response
*** Publish-and-Subscribe
** Asynchronous protocols
** Asynchronous protocols
*** Advanced Message Queuing Protocol (AMQP)
*** Advanced Message Queuing Protocol (AMQP)
*** MQ Telemetry Transport (MQTT)
*** MQ Telemetry Transport (MQTT)
* Messaging patterns
** Fire-and-Forget / One-Way
** Request-Response
** Publish-and-Subscribe
** Callback
* Software Architecture
* Software Architecture
** Monolithic Architecture
** Monolithic Architecture

Revision as of 18:10, 24 November 2020

This page contains resources about Data Science, including Data Engineering and Data Management.

Subfields and Concepts

  • Agile Data Science
  • Machine Learning / Data Mining
  • Exploratory Data Analysis (EDA)
  • Data Preparation and Data Preprocessing
  • Data Fusion and Data Integration
  • Data Wrangling / Data Munging
  • Data Scraping
  • Data Sampling
  • Data Cleaning
  • Data Visualization
  • Explainable AI (XAI) / Interpretable AI
  • Big Data
  • Data Engineering, Data Management and Databases
  • High Performance/Parallel/Distributed/Cloud Computing for Machine Learning
  • Concurrent/Multi-threading Computing for Machine Learning
  • Synchronous Communication (for Web Services)
    • Representational State Transfer (REST) Protocol
    • Remote Procedure Call (RPC)
    • Simple Object Access (SOA) Protocol
  • Asynchronous Communication / Asynchronous Messaging (for Web Services)
    • Message broker/Message bus/Event bus/Integration broker/Interface engine
    • Message queue
    • Asynchronous protocols
      • Advanced Message Queuing Protocol (AMQP)
      • MQ Telemetry Transport (MQTT)
  • Messaging patterns
    • Fire-and-Forget / One-Way
    • Request-Response
    • Publish-and-Subscribe
    • Callback
  • Software Architecture
    • Monolithic Architecture
    • Microservices Architecture

Online courses

Video Lectures

Lecture Notes


  • Newman, S. (2021). Building Microservices: Designing Fine-Grained Systems. 2nd Ed. O'Reilly Media.
  • Richardson, C. (2018). Microservices Patterns. Manning Publications.
  • De la Torre C., Wagner, B., & Rousos, M. (2018). .NET Microservices: Architecture for Containerized .NET Applications. Microsoft Corporation. (link)
  • Lanaro, G. (2017). Python High Performance. Packt Publishing Ltd.
  • Wickham, H., & Grolemund, G. (2017). R for Data Science. O'Reilly Media.
  • Kleppmann, M. (2017). Designing Data-Intensive Applications: The Big Ideas Behind Reliable, Scalable, and Maintainable Systems. O'Reilly Media.
  • VanderPlas, J. (2016). Python Data Science Handbook: Essential Tools for Working with Data. O'Reilly Media.
  • Pierfederici, F. (2016). Distributed Computing with Python. Packt Publishing Ltd.
  • Nolan, D., & Lang, D. T. (2015). Data Science in R: A Case Studies Approach to Computational Reasoning and Problem Solving. CRC Press.
  • Elston, S. F. (2015). Data Science in the Cloud with Microsoft Azure Machine Learning and R. O'Reilly Media, Inc.
  • Grus, J. (2015). Data Science from Scratch: First Principles with Python. O'Reilly Media.
  • Madhavan, S. (2015). Mastering Python for Data Science. Packt Publishing Ltd.
  • Leskovec, J., Rajaraman, A., & Ullman, J. D. (2014). Mining of Massive Datasets. Cambridge University Press. (link)
  • Zumel, N., Mount, J., & Porzak, J. (2014). Practical Data Science with R. Manning.
  • Schutt, R., & O'Neil, C. (2013). Doing Data Science: Straight Talk from the Frontline. O'Reilly Media.
  • Tukey, J. W. (1977). Exploratory Data Analysis. Addison-Wesley.

Scholarly Articles

  • Verbraeken, J., Wolting, M., Katzy, J., Kloppenburg, J., Verbelen, T., & Rellermeyer, J. S. (2020). A Survey on Distributed Machine Learning. ACM Computing Surveys (CSUR), 53(2), 1-33.
  • Buchlovsky, P. ... (2018). TF-Replicator: Distributed Machine Learning for Researchers. arXiv preprint arXiv:1902.00465.
  • Kang, D., Emmons, J., Abuzaid, F., Bailis, P., & Zaharia, M. (2017). NoScope: optimizing neural network queries over video at scale. Proceedings of the VLDB Endowment, 10(11), 1586-1597.
  • Ribeiro, M. T., Singh, S., & Guestrin, C. (2016). Why Should I Trust You? Explaining the Predictions of Any Classifier. In Proceedings of the 22nd ACM SIGKDD International Conference on Knowledge Discovery and Data Mining (pp. 1135-1144).
  • Xing, E. P., Ho, Q., Xie, P., & Wei, D. (2016). Strategies and principles of distributed machine learning on big data. Engineering, 2(2), 179-195.
  • Salloum, S., Dautov, R., Chen, X., Peng, P. X., & Huang, J. Z. (2016). Big data analytics on Apache Spark. International Journal of Data Science and Analytics, 1(3-4), 145-164.
  • Sculley, D., Holt, G., Golovin, D., Davydov, E., Phillips, T., Ebner, D., ... & Dennison, D. (2015). Hidden technical debt in machine learning systems. In Advances in Neural Information Processing Systems (pp. 2503-2511).
  • Huang, Y., Zhu, F., Yuan, M., Deng, K., Li, Y., Ni, B., ... & Zeng, J. (2015). Telco Churn Prediction with Big Data. In Proceedings of the 2015 ACM SIGMOD International Conference on Management of Data (pp. 607-618).
  • Moritz, P., Nishihara, R., Stoica, I., & Jordan, M. I. (2015). SparkNet: Training Deep Networks in Spark. arXiv preprint arXiv:1511.06051.
  • Upadhyaya, S. R. (2013). Parallel approaches to machine learning—A comprehensive survey. Journal of Parallel and Distributed Computing, 73(3), 284-292.
  • Sakr, S., Liu, A., Batista, D. M., & Alomari, M. (2011). A survey of large scale data management approaches in cloud environments. IEEE Communications Surveys & Tutorials, 13(3), 311-336.
  • Abadi, D. J. (2009). Data management in the cloud: Limitations and opportunities. IEEE Data Eng. Bull., 32(1), 3-12.


See also

Other Resources


Asynchronous Communication & Microservices Architecture

Data Annotation & Labelling


Distributed Systems

Deployment and Production