Data Science Resources

If you are aware of good online resources that you think we should include, please email us.

The following websites are useful for people interested in machine learning with Python.

Minnesota Supercomputing Institute

The Minnesota Supercomputing Institute has a YouTube channel with a number of tutorials on it. In addition, they regularly offer tutorials and other events. Check out their webpage for current offerings.

Tutorials available on their YouTube channel include:

Python

Below we provide a selection of resources for Python, a general-purpose programming language that has become popular in data science. One reason for that popularity is the number of Python packages that implement mathematical, statistical, and machine learning functions. Another attractive feature is the widespread use of Jupyter notebooks, which allow interactive programming in Python (as well as other languages). Jupyter notebooks are also an easy way to share code and create tutorials for various programming tasks. Of course, Python programs can also be written in the standard editors used for other programming languages. There will be more on this below.

General Python background

Getting Started with Python

The first task in getting started with Python is to get access. You can do this by installing it on a machine you have access to, such as your personal computer, or you can get access through a computing facility in your organization.


Access Python without installing it on your personal machine

Option 1: Google Colab
This puts you directly into a Python notebook, which links to an introductory video and to a new Jupyter notebook. The sidebar of this notebook links to examples and other material. If you just want to see the webpage, you can hit the cancel button at the bottom of the notebook. Colab lets you start coding in Python right away. The environment provides access to Python's packages and to GPUs, but it is not intended for running compute-intensive programs.

Option 2: MSI Jupyter Notebook Server
Those who have an account at the Minnesota Supercomputing Institute can go directly to the Jupyter notebook server. You must have an MSI account to use MSI resources. You must also connect either from a machine on a UMN network or via VPN. See the MSI website for additional documentation. You can perform moderate-size tasks using this option, but for programs that use more resources, you should create and submit jobs via the batch system at MSI.

Option 3: Code in the Cloud from Anaconda
Anaconda also provides an environment that you can use to install and use Python and its packages. If you just want to have access to a Jupyter notebook and play around with learning and light use of Python, this could be another possibility. You will have to create an account with Anaconda.
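
Whichever hosted option you choose, it can help to confirm what the session provides before starting real work. The notebook cell below is a minimal sketch that uses only the standard library. The function name `describe_environment` is hypothetical, and treating the presence of the `nvidia-smi` tool on the search path as a sign of an attached NVIDIA GPU is an assumption, not a guarantee.

```python
import os
import platform
import shutil
import sys


def describe_environment():
    """Return a small dictionary describing the current Python session."""
    return {
        "python_version": platform.python_version(),
        "executable": sys.executable,
        "operating_system": platform.system(),
        "cpu_count": os.cpu_count(),
        # Heuristic only: nvidia-smi on PATH suggests, but does not prove, a usable GPU.
        "nvidia_smi_found": shutil.which("nvidia-smi") is not None,
    }


# Print one "key: value" line per entry.
for key, value in describe_environment().items():
    print(f"{key}: {value}")
```

The same cell runs unchanged on a personal machine, so it can also be used to compare a hosted environment with a local installation.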

Install Python on your personal machine

We recommend installing Anaconda, as it installs Python together with other regularly used packages for scientific computing and data science. It also helps keep all your packages compatible with one another. As with R, MATLAB, and other languages, much of Python's functionality comes from its packages.

The Anaconda website also provides a webpage for Getting started with Anaconda.

Anaconda takes a fair amount of space on the disk and time to install, so if you have an older computer with limited disk space or a limited internet connection, you should consider lighter-weight options such as Miniconda.

Alternatively, you can download Python from python.org.  In that case, you will probably want to use a package manager such as pip, which comes with Python if you install it from python.org. This gives you more control but requires more understanding of Python, its packages, and how they are managed. 
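
To see which commonly used packages an installation already has, whether it came from Anaconda, Miniconda, or python.org, you can query the installed package metadata from Python itself. This is a minimal sketch using the standard library's `importlib.metadata` (Python 3.8 or later). The helper name `installed_versions` and the package names checked are illustrative choices, not a list recommended by this page.

```python
from importlib.metadata import PackageNotFoundError, version


def installed_versions(package_names):
    """Map each distribution name to its installed version, or None if absent."""
    found = {}
    for name in package_names:
        try:
            found[name] = version(name)
        except PackageNotFoundError:
            found[name] = None
    return found


# Example package names; replace them with the ones you care about.
for name, ver in installed_versions(["numpy", "pandas", "scikit-learn"]).items():
    print(f"{name}: {ver if ver is not None else 'not installed'}")
```

A package reported as not installed can be added with `conda install` under Anaconda or Miniconda, or with `pip install` under a python.org installation.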

Learning to program in Python

Python 3 and Jupyter notebooks are relatively easy to learn, even for those unfamiliar with them. Learning Python 3 will help you advance your knowledge of data analytics, as many big data platforms and data mining/machine learning projects require a working knowledge of Python. The following resources can help you learn Python and get started in using Python for machine learning.

Running Jupyter Notebooks

If you have installed Anaconda, you can run Anaconda Navigator and then click on the Jupyter notebook icon to start the Jupyter notebook. (Once you become familiar with Jupyter notebooks, you may want to run JupyterLab, a more advanced notebook interface.)

You can also log onto the MSI Jupyter notebook server. 

General Machine Learning

Deep Learning with Python

  • Deep Learning, by Goodfellow, Bengio, and Courville. An online book that gives a good, although demanding, introduction to deep learning.
  • Neural Networks and Deep Learning, by Nielsen. A more introductory online book for deep learning.
  • Dive into Deep Learning, by Zhang, Lipton, Li, and Smola. This interactive online book includes concepts, exercises, and code.
  • Practical Deep Learning for Coders. This interactive online book also includes concepts, exercises, and code. It now has a Part 2, From Deep Learning Foundations to Stable Diffusion.
  • Deep Learning. This website features instructors Yann LeCun and Alfredo Canziani teaching a course on deep learning in Spring 2020 at the NYU Center for Data Science. It includes YouTube videos, slides, and Jupyter notebooks. The course concerns the latest techniques in deep learning and representation learning.
  • Welcome to the UvA Deep Learning Tutorials! The University of Amsterdam has a set of Python notebook tutorials for deep learning. These tutorial notebooks are fairly self-contained and are accompanied by videos available on YouTube.