Data Science
Kourouklides (talk | contribs) |
Revision as of 16:49, 7 August 2018
This page contains resources about Data Science, including Data Engineering.
Subfields and Concepts
- Machine Learning / Data Mining
- Exploratory Data Analysis
- Data Preparation and Preprocessing
- High Performance/Parallel/Distributed Computing for Machine Learning
- Concurrent/Multi-threading Computing for Machine Learning
- Data Engineering and Databases
- Data Visualization
- Big Data
Online courses
Video Lectures
- How to Win a Data Science Competition: Learn from Top Kagglers - Coursera
- Exploratory data analysis in Python by Chloe Mawer and Jonathan Whitmore - PyCon 2017
Lecture Notes
- What is Data Science by Ioannis Kourouklides
- When [to use] and When Not to Use Distributed Machine Learning by Chih-Jen Lin
- Open Machine Learning Course (Medium)
- Mining Massive Datasets by Jure Leskovec, Anand Rajaraman and Jeff Ullman
- Hardware Acceleration for Data Processing by Gustavo Alonso
Books
- Tukey, J. W. (1977). Exploratory data analysis. Addison-Wesley.
- Schutt, R., & O'Neil, C. (2013). Doing data science: Straight talk from the frontline. O'Reilly Media.
- Leskovec, J., Rajaraman, A., & Ullman, J. D. (2014). Mining of massive datasets. Cambridge University Press.
- Zumel, N., Mount, J., & Porzak, J. (2014). Practical data science with R. Manning.
- Nolan, D., & Lang, D. T. (2015). Data Science in R: A Case Studies Approach to Computational Reasoning and Problem Solving. CRC Press.
- Elston, S. F. (2015). Data Science in the Cloud with Microsoft Azure Machine Learning and R. O'Reilly Media, Inc.
- Grus, J. (2015). Data Science from Scratch: First Principles with Python. O'Reilly Media.
- Madhavan, S. (2015). Mastering Python for Data Science. Packt Publishing Ltd.
- Blum, A., Hopcroft, J., & Kannan, R. (2015). Foundations of Data Science.
- VanderPlas, J. (2016). Python Data Science Handbook: Essential Tools for Working with Data. O'Reilly Media.
- Wickham, H., & Grolemund, G. (2017). R for Data Science. O'Reilly Media.
Software
- Anaconda Distribution - Python
- Beautiful Soup 4 - Python
- ray - Python
- Elasticsearch
- MongoDB
- Apache Solr
- Apache Hadoop
- Apache HBase
- Apache Spark
- Apache Hive
- Apache Kafka, which includes Kafka Connect
- Apache Cassandra
- Apache ZooKeeper
- Apache Pig
- Apache Storm
- Apache CouchDB
- Apache ActiveMQ
- RabbitMQ
- tensorflow_scala - Scala API for TensorFlow
- TensorFlowSharp - TensorFlow API for .NET languages
- TensorFlowOnSpark - It brings TensorFlow programs onto Apache Spark clusters
- multiprocessing - Python
- threading - Python
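
As a rough illustration of the `threading` entry above, the following minimal sketch splits a word count across threads. The function name `count_words` and the parameter `n_threads` are hypothetical, chosen only for this example. In standard CPython, threads mainly help with I/O-bound work; for CPU-bound work the `multiprocessing` module listed above is usually the better fit.

```python
import threading


def count_words(lines, n_threads=4):
    """Count whitespace-separated words in `lines` using several threads.

    Hypothetical helper for illustration only; not taken from any of the
    resources listed on this page.
    """
    # One result slot per thread, so no lock is needed.
    totals = [0] * n_threads

    def worker(idx):
        # Thread `idx` handles every n_threads-th line, starting at idx.
        totals[idx] = sum(len(line.split()) for line in lines[idx::n_threads])

    threads = [threading.Thread(target=worker, args=(i,)) for i in range(n_threads)]
    for t in threads:
        t.start()
    for t in threads:
        t.join()  # wait for every worker before combining the results
    return sum(totals)


if __name__ == "__main__":
    print(count_words(["a b c", "d e", "f"], n_threads=2))
```

Each thread writes only to its own slot in `totals`, which avoids shared-state races without a lock; the partial counts are combined after all threads have been joined.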
See also
Other Resources
- Data Science Guide
- Data Science Engineering, your way
- Large Scale Machine Learning - libraries and papers
- What are some courses on large scale learning? - Quora
- 7 Steps to Mastering Data Preparation with Python - blog post
- Web Scraping for Data Science with Python - blog post
- Intro to Distributed Deep Learning Systems - blog post
- Princeton Commodities Modeling Blog
- Exploratory data analysis using Python for used car database taken from Kaggle - GitHub
- Detailed exploratory data analysis with Python - Kaggle
- Python-camp - GitHub
- Big Data: Spark, Hadoop, Hive, ZooKeeper, Solr, Kafka, Nutch, MongoDB, ... - installation instructions
- Deep Learning with Apache Spark and TensorFlow - blog post
- Build a Simple Chatbot with Tensorflow, Python and MongoDB - blog post
- Visual Data Analysis with Python - blog post
- Exploratory Data Analysis with Pandas - blog post
- Plotly Python Library Maps
- 5 Quick and Easy Data Visualizations in Python with Code - blog post
- William Koehrsen - blog
- ClaoudML - Free Data Science & Machine Learning Resources
- Parallel and Distributed Deep Learning by Tal Ben-Nun
- An introduction to parallel programming using Python's multiprocessing module - blog post
- Putting Machine Learning Models into Production - blog post
- Spark + Deep Learning: Distributed Deep Neural Network Training with SparkNet - blog post
- Data Science in Python: Pandas Cheat Sheet
- Simple Exploratory Data Analysis - PASSNYC - Kaggle
- EDA and Clustering - Kaggle
- Time Series Anomaly Detection: Optimizing your Machine Learning Jobs in Elasticsearch - webinar
- Web Access Logs in Elasticsearch and Machine Learning - webinar
- Deploying Python models to production - video