This page contains resources about [https://en.wikipedia.org/wiki/Data_science Data Science], including '''Data Engineering''' and [https://en.wikipedia.org/wiki/Data_management Data Management].
 
== Subfields and Concepts ==
* Agile Data Science
* [[Machine Learning]] / Data Mining
* Exploratory Data Analysis (EDA)
* Data Preparation and Data Preprocessing
* Data Fusion and Data Integration
* Data Sampling
* Data Cleaning
* Data Visualization
* Explainable AI (XAI) / Interpretable AI
* Big Data
* Data Engineering, Data Management and Databases
* High Performance/Parallel/Distributed/Cloud Computing for Machine Learning
* Concurrent/Multi-threading Computing for Machine Learning
* Synchronous Communication (for Web Services)
** Representational State Transfer (REST) Protocol
** Remote Procedure Call (RPC)
** Simple Object Access Protocol (SOAP)
* Asynchronous Communication / Asynchronous Messaging (for Web Services)
** Message broker/Message bus/Event bus/Integration broker/Interface engine
** Message queue
** Asynchronous protocols
*** Advanced Message Queuing Protocol (AMQP)
*** MQ Telemetry Transport (MQTT)
* Messaging patterns
** Fire-and-Forget / One-Way
** Request-Response / Request-Reply
** Publisher-Subscriber
** Request-Callback
* Software Architecture
** Monolithic Architecture
** Microservices Architecture
** Service-Oriented Architecture (SOA)
* Stream Processing
 
== Online courses ==
 
==Books==
* Newman, S. (2021). ''Building Microservices: Designing Fine-Grained Systems''. 2nd Ed. O'Reilly Media.
* Bellemare, A. (2020). ''Building Event-Driven Microservices: Leveraging Organizational Data at Scale''. O'Reilly Media.
* Richards, M., & Ford, N. (2020). ''Fundamentals of Software Architecture''. O'Reilly Media.
* Dean, A., & Crettaz, V. (2019). ''Event Streams in Action''. Manning.
* Richardson, C. (2018). ''Microservices Patterns''. Manning Publications.
* Pacheco, V. F. (2018). ''Microservice Patterns and Best Practices''. Packt Publishing.
* De la Torre, C., Wagner, B., & Rousos, M. (2018). ''.NET Microservices: Architecture for Containerized .NET Applications''. Microsoft Corporation. ([https://github.com/dzfweb/microsoft-microservices-book link])
* Lanaro, G. (2017). ''Python High Performance''. Packt Publishing.
* Wickham, H., & Grolemund, G. (2017). ''R for Data Science''. O'Reilly Media.
* Kleppmann, M. (2017). ''Designing Data-Intensive Applications: The Big Ideas Behind Reliable, Scalable, and Maintainable Systems''. O'Reilly Media.
* VanderPlas, J. (2016). ''Python Data Science Handbook: Essential Tools for Working with Data''. O'Reilly Media.
* Pierfederici, F. (2016). ''Distributed Computing with Python''. Packt Publishing Ltd.
* Dunning, T., & Friedman, E. (2016). ''Streaming Architecture: New Designs Using Apache Kafka and MapR Streams''. O'Reilly Media.
* Nolan, D., & Lang, D. T. (2015). ''Data Science in R: A Case Studies Approach to Computational Reasoning and Problem Solving''. CRC Press.
* Elston, S. F. (2015). ''Data Science in the Cloud with Microsoft Azure Machine Learning and R''. O'Reilly Media.
* Grus, J. (2015). ''Data Science from Scratch: First Principles with Python''. O'Reilly Media.
* Madhavan, S. (2015). ''Mastering Python for Data Science''. Packt Publishing Ltd.
* Kale, V. (2015). ''Guide to Cloud Computing for Business and Technology Managers: From Distributed Computing to Cloudware Applications''. CRC Press.
* Ejsmont, A. (2015). ''Web Scalability for Startup Engineers''. McGraw Hill.
* Zumel, N., Mount, J., & Porzak, J. (2014). ''Practical Data Science with R''. Manning.
* Leskovec, J., Rajaraman, A., & Ullman, J. D. (2014). ''Mining of Massive Datasets''. Cambridge University Press. ([http://www.mmds.org/ link])
* Schutt, R., & O'Neil, C. (2013). ''Doing Data Science: Straight Talk from the Frontline''. O'Reilly Media.
* Videla, A., & Williams, J. J. W. (2012). ''RabbitMQ in Action''. Manning.
* Tukey, J. W. (1977). ''Exploratory Data Analysis''. Addison-Wesley.
 
==Scholarly Articles==
==Other Resources==
===General===
*[https://www.slideshare.net/kourouklides/what-is-data-science-99294704/ What is Data Science by Ioannis Kourouklides] - slides
*[https://datascienceguide.github.io/ Data Science Guide]
*[http://jadianes.me/data-science-your-way/ Data Science Engineering, your way]
*[http://vlad17.github.io/COS513-Blog/ Princeton Commodities Modeling Blog]
*[https://github.com/upalr/Python-camp Python-camp] - GitHub
*[http://mtitek.com/big-data.php Big Data: Spark, Hadoop, Hive, ZooKeeper, Solr, Kafka, Nutch, MongoDB, ...] - installation instructions
*[https://databricks.com/blog/2016/01/25/deep-learning-with-apache-spark-and-tensorflow.html Deep Learning with Apache Spark and TensorFlow] - blog post
*[https://khartig.wordpress.com/2017/12/30/build-a-simple-chatbot-with-tensorflow-python-and-mongodb/ Build a Simple Chatbot with Tensorflow, Python and MongoDB] - blog post
*[https://plot.ly/python/maps/ Plotly Python Library Maps]
*[https://towardsdatascience.com/5-quick-and-easy-data-visualizations-in-python-with-code-a2284bae952f 5 Quick and Easy Data Visualizations in Python with Code] - blog post
*[https://medium.com/@williamkoehrsen William Koehrsen] - blog
*[http://www.claoudml.co/ ClaoudML] - Free Data Science & Machine Learning Resources
*[https://www.datasciencecentral.com/profiles/blogs/data-science-in-python-pandas-cheat-sheet Data Science in Python: Pandas Cheat Sheet]
*[https://www.elastic.co/webinars/time-series-anomaly-detection-optimizing-machine-learning-jobs-in-elasticsearch Time Series Anomaly Detection: Optimizing your Machine Learning Jobs in Elasticsearch] - webinar
*[https://ai.googleblog.com/2017/04/federated-learning-collaborative.html Federated Learning: Collaborative Machine Learning without Centralized Training Data] - blog post
*[https://www.zurich.ibm.com/snapml/ Snap ML] - IBM
*[https://github.com/vsmolyakov/pyspark pyspark (GitHub)] - collection of resources
*[https://becominghuman.ai/cheat-sheets-for-ai-neural-networks-machine-learning-deep-learning-big-data-678c51b4b463 Cheat Sheets for AI, Neural Networks, Machine Learning, Deep Learning & Big Data] - blog post
*[https://eng.uber.com/peloton/ Peloton: Uber’s Unified Resource Scheduler for Diverse Cluster Workloads] - blog post
*[https://github.com/SurrealAI/caraml caraml (GitHub)] - code
*[https://github.com/SurrealAI/symphony symphony (GitHub)] - code
*[https://www.analyticsindiamag.com/tensorflow-vs-spark-differ-work-tandem TensorFlow Vs. Spark: How Do They Differ And Work In Tandem With Each Other] - blog post
*[https://github.com/bulutyazilim/awesome-datascience awesome-datascience (GitHub)]
*[https://github.com/siboehm/awesome-learn-datascience awesome-learn-datascience (GitHub)]
*[https://www.logicalclocks.com/blog/when-deep-learning-with-gpus-use-a-cluster-manager-for-your-gpus/ When Deep Learning with GPUs, use a Cluster Manager] - blog post
 
===Data Annotation & Labelling===
*[https://appen.com/blog/data-annotation/ What is Data Annotation?]
*[https://www.mturk.com Amazon Mechanical Turk]
*[https://www.cloudfactory.com/ CloudFactory]
*[https://appen.com/ Appen]
*[https://www.alegion.com/ Alegion]
*[https://imerit.net/ iMerit]
*[https://playment.io/ Playment]
*[https://www.rev.com/ Rev] - Transcription from video and audio
*[https://labelbox.com/ Labelbox]
*[https://github.com/diffgram/diffgram diffgram]
*[https://dl.acm.org/citation.cfm?id=1866696 Proceedings of the NAACL HLT 2010 Workshop on Creating Speech and Language Data with Amazon's Mechanical Turk]
*[https://www.cloudfactory.com/data-annotation-tool-guide Data Annotation Tools for Machine Learning (Evolving Guide)]
*[https://github.com/taivop/awesome-data-annotation awesome-data-annotation (GitHub)]
 
===EDA===
*[https://github.com/ajaymache/data-analysis-using-python Exploratory data analysis using Python for used car database taken from Kaggle] - GitHub
*[https://www.kaggle.com/ekami66/detailed-exploratory-data-analysis-with-python Detailed exploratory data analysis with Python] - Kaggle
*[https://www.youtube.com/watch?v=W5WE9Db2RLU Exploratory data analysis in Python - PyCon 2017 (YouTube)]
*[https://medium.com/open-machine-learning-course/open-machine-learning-course-topic-1-exploratory-data-analysis-with-pandas-de57880f1a68 Exploratory Data Analysis with Pandas] - blog post
*[https://www.kaggle.com/randylaosat/simple-exploratory-data-analysis-passnyc Simple Exploratory Data Analysis - PASSNYC] - Kaggle
*[https://www.kaggle.com/moizzz/eda-and-clustering EDA and Clustering] - Kaggle
*[https://medium.com/python-pandemonium/introduction-to-exploratory-data-analysis-in-python-8b6bcb55c190 Introduction to Exploratory Data Analysis in Python] - blog post
*[https://medium.com/open-machine-learning-course/open-machine-learning-course-topic-2-visual-data-analysis-in-python-846b989675cd Visual Data Analysis with Python] - blog post
 
===Asynchronous Communication & Microservices===
*[https://microservices.io/patterns/microservices.html Pattern: Microservice Architecture]
*[https://www.dineshonjava.com/software-architecture-patterns-and-designs/ Software Architecture Patterns and Designs]
*[https://codeblog.dotsandbrackets.com/asynchronous-communication-with-message-queue/ Asynchronous communication with message queue]
*[https://garba.org/article/general/soa/mep.html Message Exchange Patterns (MEPs)]
*[https://flylib.com/books/en/2.365.1/message_exchange_patterns.html Message exchange patterns]
*[https://docs.microsoft.com/en-us/azure/architecture/patterns/category/messaging Messaging patterns]
*[https://medium.com/@mmz.zaeimi/synchronous-vs-asynchronous-communication-in-microservices-integration-f4dd36478fd2 Synchronous vs Asynchronous communication in microservices integration]
*[https://otonomo.io/blog/redis-kafka-or-rabbitmq-which-microservices-message-broker-to-choose/ Redis, Kafka or RabbitMQ: Which MicroServices Message Broker To Choose?]
*[https://dzone.com/articles/akka-streams-and-kafka-streams-where-microservices Akka Streams and Kafka Streams: Where Microservices Meet Fast Data]
*[https://dzone.com/articles/akka-spark-or-kafka-selecting-the-right-streaming Akka, Spark, or Kafka? Selecting the Right Streaming Engine]
*[https://otonomo.io/blog/luigi-airflow-pinball-and-chronos-comparing-workflow-management-systems/ Luigi, Airflow, Pinball, and Chronos: Comparing Workflow Management Systems]
*[https://github.com/kaiwaehner/kafka-streams-machine-learning-examples kafka-streams-machine-learning-examples (GitHub)] - Machine Learning + Kafka Streams Examples (with code)
*[https://aseigneurin.github.io/2018/09/05/realtime-machine-learning-predictions-wth-kafka-and-h2o.html Realtime Machine Learning predictions with Kafka and H2O.ai] - blog post
*[https://tanzu.vmware.com/content/blog/understanding-when-to-use-rabbitmq-or-apache-kafka Understanding When to use RabbitMQ or Apache Kafka]
*[https://www.ververica.com/what-is-stream-processing What is Stream Processing?]
*[https://medium.com/stream-processing/what-is-stream-processing-1eadfca11b97 A Gentle Introduction to Stream Processing]
 
===Distributed Systems===
*[https://blog.docker.com/2016/10/docker-distributed-system-summit-videos-podcast-episodes/ Docker Distributed System Summit videos podcast episodes]
*[https://www.voltdb.com/files/using-docker-simplify-distributed-systems-development/ Using Docker to Simplify Distributed Systems in Development] - video
*[https://medium.com/@harinilabs/day-11-getting-started-with-docker-and-using-it-to-build-deploy-a-distributed-app-1929669064b8 Day 11: Using Docker to build and deploy a distributed app] - blog post with [https://github.com/harinij/100DaysOfCode/tree/master/Day%20011%20-%20Docker%20WebApp code]
*[https://medium.com/@Petuum/intro-to-distributed-deep-learning-systems-a2e45c6b8e7 Intro to Distributed Deep Learning Systems] - blog post
*[https://www.systems.ethz.ch/sites/default/files/parallel-distributed-deep-learning.pdf Parallel and Distributed Deep Learning by Tal Ben-Nun]
*[https://sebastianraschka.com/Articles/2014_multiprocessing.html An introduction to parallel programming using Python's multiprocessing module] - blog post
*[https://www.kdnuggets.com/2015/12/spark-deep-learning-training-with-sparknet.html Spark + Deep Learning: Distributed Deep Neural Network Training with SparkNet] - blog post
*[http://muratbuffalo.blogspot.com/2016/04/petuum-new-platform-for-distributed.html Paper Review. Petuum: A new platform for distributed machine learning on big data] - blog post
*[http://www.cheerml.com/comparison-distributed-ml-platform A comparison of distributed machine learning platform] - blog post
*[https://www.logicalclocks.com/why-you-need-a-distributed-filesystem-for-deep-learning/ Distributed Filesystems for Deep Learning] - blog post
*[https://github.com/tmulc18/Distributed-TensorFlow-Guide Distributed-TensorFlow-Guide (GitHub)] - Distributed TensorFlow basics and examples of training algorithms (with code)
 
===Deployment and Production===
*[https://blog.cambridgespark.com/putting-machine-learning-models-into-production-d768560907bd Putting Machine Learning Models into Production] - blog post
*[https://github.com/practicalAI/productionML productionML (GitHub)] - code for creating Production level API services for Machine Learning
*[https://medium.com/kredaro-engineering/ai-tales-building-machine-learning-pipeline-using-kubeflow-and-minio-4b88da30437b AI Tales: Building Machine learning pipeline using Kubeflow and Minio] - blog post
*[https://github.com/ahkarami/Deep-Learning-in-Production Deep-Learning-in-Production (GitHub)]