Editing Data Science

Latest revision Your text
Line 1: Line 1:
This page contains resources about [https://en.wikipedia.org/wiki/Data_science Data Science], '''Data Engineering''' and [https://en.wikipedia.org/wiki/Data_management Data Management].
+
This page contains resources about [https://en.wikipedia.org/wiki/Data_science Data Science], including '''Data Engineering''' and [https://en.wikipedia.org/wiki/Data_management Data Management].
   
 
== Subfields and Concepts ==
 
== Subfields and Concepts ==
Line 11: Line 11:
 
* Data Sampling
 
* Data Sampling
 
* Data Cleaning
 
* Data Cleaning
* Data Visualization
 
* Explainable AI (XAI) / Interpretable AI
 
* Big Data
 
* Data Engineering, Data Management and Databases
 
 
* High Performance/Parallel/Distributed/Cloud Computing for Machine Learning
 
* High Performance/Parallel/Distributed/Cloud Computing for Machine Learning
 
* Concurrent/Multi-threading Computing for Machine Learning
 
* Concurrent/Multi-threading Computing for Machine Learning
* Synchronous Communication (for Web Services)
+
* Synchronous Communication
  +
** REST protocol
** Representational State Transfer (REST) Protocol
 
  +
* Asynchronous Communication / Asynchronous Messaging (for microservices)
** Remote Procedure Call (RPC)
 
** Simple Object Access Protocol (SOAP)
 
* Asynchronous Communication / Asynchronous Messaging (for Web Services)
 
 
** Message broker/Message bus/Event bus/Integration broker/Interface engine
 
** Message broker/Message bus/Event bus/Integration broker/Interface engine
 
** Message queue
 
** Message queue
  +
* Data Engineering, Data Management and Databases
** Asynchronous protocols
 
  +
* Data Visualization
*** Advanced Message Queuing Protocol (AMQP)
 
  +
* Big Data
*** MQ Telemetry Transport (MQTT)
 
  +
* Explainable AI (XAI) / Interpretable AI
* Messaging patterns
 
** Fire-and-Forget / One-Way
 
** Request-Response / Request-Reply
 
** Publisher-Subscriber
 
** Request-Callback
 
* Software Architecture
 
** Monolithic Architecture
 
** Microservices Architecture
 
** Service-Oriented Architecture (SOA)
 
* Stream Processing
 
   
 
== Online courses ==
 
== Online courses ==
Line 52: Line 37:
   
 
==Books==
 
==Books==
  +
* Lanaro, G. (2017). ''Python High Performance''. Packt Publishing Ltd.
* Newman, S. (2021). ''Building Microservices: Designing Fine-Grained Systems''. 2nd Ed. O'Reilly Media.
 
* Bellemare, A. (2020). ''Building Event-Driven Microservices: Leveraging Organizational Data at Scale''. O'Reilly Media.
 
* Richards, M., & Ford, N. (2020). ''Fundamentals of Software Architecture''. O'Reilly Media.
 
* Dean, A., & Crettaz, V. (2019). ''Event Streams in Action''. Manning.
 
* Richardson, C. (2018). ''Microservices Patterns''. Manning Publications.
 
* Pacheco, V. F. (2018). ''Microservice Patterns and Best Practices''. Packt Publishing.
 
* De la Torre, C., Wagner, B., & Rousos, M. (2018). ''.NET Microservices: Architecture for Containerized .NET Applications''. Microsoft Corporation. ([https://github.com/dzfweb/microsoft-microservices-book link])
 
* Lanaro, G. (2017). ''Python High Performance''. Packt Publishing.
 
 
* Wickham, H., & Grolemund, G. (2017). ''R for Data Science''. O'Reilly Media.
 
* Wickham, H., & Grolemund, G. (2017). ''R for Data Science''. O'Reilly Media.
* Kleppmann, M. (2017). ''Designing Data-Intensive Applications: The Big Ideas Behind Reliable, Scalable, and Maintainable Systems''. O'Reilly Media.
 
 
* VanderPlas, J. (2016). ''Python Data Science Handbook: Essential Tools for Working with Data''. O'Reilly Media.
 
* VanderPlas, J. (2016). ''Python Data Science Handbook: Essential Tools for Working with Data''. O'Reilly Media.
* Pierfederici, F. (2016). ''Distributed Computing with Python''. Packt Publishing.
+
* Pierfederici, F. (2016). ''Distributed Computing with Python''. Packt Publishing Ltd.
* Dunning, T., & Friedman, E. (2016). ''Streaming Architecture: New Designs Using Apache Kafka and MapR Streams''. O'Reilly Media.
 
 
* Nolan, D., & Lang, D. T. (2015). ''Data Science in R: A Case Studies Approach to Computational Reasoning and Problem Solving''. CRC Press.
 
* Nolan, D., & Lang, D. T. (2015). ''Data Science in R: A Case Studies Approach to Computational Reasoning and Problem Solving''. CRC Press.
 
* Elston, S. F. (2015). ''Data Science in the Cloud with Microsoft Azure Machine Learning and R.'' O'Reilly Media, Inc.
 
* Elston, S. F. (2015). ''Data Science in the Cloud with Microsoft Azure Machine Learning and R.'' O'Reilly Media, Inc.
 
* Grus, J. (2015). ''Data Science from Scratch: First Principles with Python''. O'Reilly Media.
 
* Grus, J. (2015). ''Data Science from Scratch: First Principles with Python''. O'Reilly Media.
* Madhavan, S. (2015). ''Mastering Python for Data Science''. Packt Publishing.
+
* Madhavan, S. (2015). ''Mastering Python for Data Science''. Packt Publishing Ltd.
  +
* Leskovec, J., Rajaraman, A., & Ullman, J. D. (2014). Mining of massive datasets. Cambridge University Press. ([http://www.mmds.org/ link])
* Kale, V. (2015). ''Guide to Cloud Computing for Business and Technology Managers: From Distributed Computing to Cloudware Applications''. CRC Press.
 
  +
* Zumel, N., Mount, J., & Porzak, J. (2014). ''Practical data science with R''. Manning.
* Ejsmont, A. (2015). ''Web Scalability for Startup Engineers''. McGraw Hill.
 
  +
* Schutt, R., & O'Neil, C. (2013). ''Doing data science: Straight talk from the frontline''. O'Reilly Media.
* Leskovec, J., Rajaraman, A., & Ullman, J. D. (2014). ''Mining of Massive Datasets''. Cambridge University Press. ([http://www.mmds.org/ link])
 
* Zumel, N., Mount, J., & Porzak, J. (2014). ''Practical Data Science with R''. Manning.
+
* Tukey, J. W. (1977). ''Exploratory data analysis''. Addison-Wesley.
* Schutt, R., & O'Neil, C. (2013). ''Doing Data Science: Straight Talk from the Frontline''. O'Reilly Media.
 
* Videla, A., & Williams, J. J. W. (2012). ''RabbitMQ in Action''. Manning.
 
* Tukey, J. W. (1977). ''Exploratory Data Analysis''. Addison-Wesley.
 
   
 
==Scholarly Articles==
 
==Scholarly Articles==
Line 147: Line 120:
 
==Other Resources==
 
==Other Resources==
 
===General===
 
===General===
*[https://www.slideshare.net/kourouklides/what-is-data-science-99294704/ What is Data Science by Ioannis Kourouklides] - slides
+
*[https://www.slideshare.net/kourouklides/what-is-data-science-99294704/ What is Data Science by Ioannis Kourouklides]
 
*[https://datascienceguide.github.io/ Data Science Guide]
 
*[https://datascienceguide.github.io/ Data Science Guide]
 
*[http://jadianes.me/data-science-your-way/ Data Science Engineering, your way]
 
*[http://jadianes.me/data-science-your-way/ Data Science Engineering, your way]
Line 168: Line 141:
 
*[https://www.elastic.co/webinars/time-series-anomaly-detection-optimizing-machine-learning-jobs-in-elasticsearch Time Series Anomaly Detection: Optimizing your Machine Learning Jobs in Elasticsearch] - webinar
 
*[https://www.elastic.co/webinars/time-series-anomaly-detection-optimizing-machine-learning-jobs-in-elasticsearch Time Series Anomaly Detection: Optimizing your Machine Learning Jobs in Elasticsearch] - webinar
 
*[https://ai.googleblog.com/2017/04/federated-learning-collaborative.html Federated Learning: Collaborative Machine Learning without Centralized Training Data] - blog post
 
*[https://ai.googleblog.com/2017/04/federated-learning-collaborative.html Federated Learning: Collaborative Machine Learning without Centralized Training Data] - blog post
*[https://www.zurich.ibm.com/snapml/ Snap ML] - IBM
 
 
*[https://github.com/vsmolyakov/pyspark pyspark (GitHub)] - collection of resources
 
*[https://github.com/vsmolyakov/pyspark pyspark (GitHub)] - collection of resources
 
*[https://becominghuman.ai/cheat-sheets-for-ai-neural-networks-machine-learning-deep-learning-big-data-678c51b4b463 Cheat Sheets for AI, Neural Networks, Machine Learning, Deep Learning & Big Data] - blog post
 
*[https://becominghuman.ai/cheat-sheets-for-ai-neural-networks-machine-learning-deep-learning-big-data-678c51b4b463 Cheat Sheets for AI, Neural Networks, Machine Learning, Deep Learning & Big Data] - blog post
Line 185: Line 157:
 
*[https://github.com/siboehm/awesome-learn-datascience awesome-learn-datascience (GitHub)]
 
*[https://github.com/siboehm/awesome-learn-datascience awesome-learn-datascience (GitHub)]
 
*[https://www.logicalclocks.com/blog/when-deep-learning-with-gpus-use-a-cluster-manager When Deep Learning with GPUs, use a Cluster Manager] - blog post
 
*[https://www.logicalclocks.com/blog/when-deep-learning-with-gpus-use-a-cluster-manager When Deep Learning with GPUs, use a Cluster Manager] - blog post
  +
*[https://otonomo.io/blog/luigi-airflow-pinball-and-chronos-comparing-workflow-management-systems/ Luigi, Airflow, Pinball, and Chronos: Comparing Workflow Management Systems]
  +
*[https://github.com/kaiwaehner/kafka-streams-machine-learning-examples kafka-streams-machine-learning-examples (GitHub)] - Machine Learning + Kafka Streams Examples (with code)
  +
*[https://aseigneurin.github.io/2018/09/05/realtime-machine-learning-predictions-wth-kafka-and-h2o.html Realtime Machine Learning predictions with Kafka and H2O.ai] - blog post
  +
*[https://otonomo.io/blog/redis-kafka-or-rabbitmq-which-microservices-message-broker-to-choose/ Redis, Kafka or RabbitMQ: Which MicroServices Message Broker To Choose?]
  +
*[https://dzone.com/articles/akka-streams-and-kafka-streams-where-microservices Akka Streams and Kafka Streams: Where Microservices Meet Fast Data]
  +
*[https://dzone.com/articles/akka-spark-or-kafka-selecting-the-right-streaming Akka, Spark, or Kafka? Selecting the Right Streaming Engine]
   
 
===Data Annotation & Labelling===
 
===Data Annotation & Labelling===
Line 190: Line 168:
 
*[https://www.mturk.com Amazon Mechanical Turk]
 
*[https://www.mturk.com Amazon Mechanical Turk]
 
*[https://www.cloudfactory.com/ CloudFactory]
 
*[https://www.cloudfactory.com/ CloudFactory]
*[https://appen.com/ Appen]
+
*[https://www.rev.com/ Rev]
*[https://www.alegion.com/ Alegion]
 
*[https://imerit.net/ iMerit]
 
*[https://playment.io/ Playment]
 
*[https://www.rev.com/ Rev] - Transcription from video and audio
 
*[https://labelbox.com/ Labelbox]
 
*[https://github.com/diffgram/diffgram diffgram]
 
 
*[https://dl.acm.org/citation.cfm?id=1866696 Proceedings of the NAACL HLT 2010 Workshop on Creating Speech and Language Data with Amazon's Mechanical Turk]
 
*[https://dl.acm.org/citation.cfm?id=1866696 Proceedings of the NAACL HLT 2010 Workshop on Creating Speech and Language Data with Amazon's Mechanical Turk]
 
*[https://www.cloudfactory.com/data-annotation-tool-guide Data Annotation Tools for Machine Learning (Evolving Guide)]
 
*[https://www.cloudfactory.com/data-annotation-tool-guide Data Annotation Tools for Machine Learning (Evolving Guide)]
*[https://github.com/taivop/awesome-data-annotation awesome-data-annotation (GitHub)]
 
   
 
===EDA===
 
===EDA===
Line 210: Line 181:
 
*[https://medium.com/python-pandemonium/introduction-to-exploratory-data-analysis-in-python-8b6bcb55c190 Introduction to Exploratory Data Analysis in Python] - blog post
 
*[https://medium.com/python-pandemonium/introduction-to-exploratory-data-analysis-in-python-8b6bcb55c190 Introduction to Exploratory Data Analysis in Python] - blog post
 
*[https://medium.com/open-machine-learning-course/open-machine-learning-course-topic-2-visual-data-analysis-in-python-846b989675cd Visual Data Analysis with Python] - blog post
 
*[https://medium.com/open-machine-learning-course/open-machine-learning-course-topic-2-visual-data-analysis-in-python-846b989675cd Visual Data Analysis with Python] - blog post
 
===Asynchronous Communication & Microservices===
 
*[https://microservices.io/patterns/microservices.html Pattern: Microservice Architecture]
 
*[https://www.dineshonjava.com/software-architecture-patterns-and-designs/ Software Architecture Patterns and Designs]
 
*[https://codeblog.dotsandbrackets.com/asynchronous-communication-with-message-queue/ Asynchronous communication with message queue]
 
*[https://garba.org/article/general/soa/mep.html Message Exchange Patterns (MEPs)]
 
*[https://flylib.com/books/en/2.365.1/message_exchange_patterns.html Message exchange patterns]
 
*[https://docs.microsoft.com/en-us/azure/architecture/patterns/category/messaging Messaging patterns]
 
*[https://medium.com/@mmz.zaeimi/synchronous-vs-asynchronous-communication-in-microservices-integration-f4dd36478fd2 Synchronous vs Asynchronous communication in microservices integration]
 
*[https://otonomo.io/blog/redis-kafka-or-rabbitmq-which-microservices-message-broker-to-choose/ Redis, Kafka or RabbitMQ: Which MicroServices Message Broker To Choose?]
 
*[https://dzone.com/articles/akka-streams-and-kafka-streams-where-microservices Akka Streams and Kafka Streams: Where Microservices Meet Fast Data]
 
*[https://dzone.com/articles/akka-spark-or-kafka-selecting-the-right-streaming Akka, Spark, or Kafka? Selecting the Right Streaming Engine]
 
*[https://otonomo.io/blog/luigi-airflow-pinball-and-chronos-comparing-workflow-management-systems/ Luigi, Airflow, Pinball, and Chronos: Comparing Workflow Management Systems]
 
*[https://github.com/kaiwaehner/kafka-streams-machine-learning-examples kafka-streams-machine-learning-examples (GitHub)] - Machine Learning + Kafka Streams Examples (with code)
 
*[https://aseigneurin.github.io/2018/09/05/realtime-machine-learning-predictions-wth-kafka-and-h2o.html Realtime Machine Learning predictions with Kafka and H2O.ai] - blog post
 
*[https://tanzu.vmware.com/content/blog/understanding-when-to-use-rabbitmq-or-apache-kafka Understanding When to use RabbitMQ or Apache Kafka]
 
*[https://www.ververica.com/what-is-stream-processing What is Stream Processing?]
 
*[https://medium.com/stream-processing/what-is-stream-processing-1eadfca11b97 A Gentle Introduction to Stream Processing]
 
   
 
=== Distributed Systems===
 
=== Distributed Systems===
Line 240: Line 193:
 
*[http://www.cheerml.com/comparison-distributed-ml-platform A comparison of distributed machine learning platform] - blog post
 
*[http://www.cheerml.com/comparison-distributed-ml-platform A comparison of distributed machine learning platform] - blog post
 
*[https://www.logicalclocks.com/why-you-need-a-distributed-filesystem-for-deep-learning/ Distributed Filesystems for Deep Learning] - blog post
 
*[https://www.logicalclocks.com/why-you-need-a-distributed-filesystem-for-deep-learning/ Distributed Filesystems for Deep Learning] - blog post
  +
*[https://www.zurich.ibm.com/snapml/ Snap ML] - IBM
 
*[https://github.com/tmulc18/Distributed-TensorFlow-Guide Distributed-TensorFlow-Guide (GitHub)] - Distributed TensorFlow basics and examples of training algorithms (with code)
 
*[https://github.com/tmulc18/Distributed-TensorFlow-Guide Distributed-TensorFlow-Guide (GitHub)] - Distributed TensorFlow basics and examples of training algorithms (with code)
   

