Data science workflow repository to explore and guide you through data science tasks using command-line tools.
 
Gerardo Marx Chávez-Campos 398b6a62e5 initial commit 4 years ago

Data Science Workflow

This repository explores, through examples, how to use the command line efficiently and productively for data science tasks. You will learn to obtain, scrub, explore, and model your data.

Introduction

In these examples you will learn how to: (i) run Docker containers, (ii) use the command line, and (iii) run a basic application.

Docker

Let us introduce Docker, the first platform we will use for data science. Docker is a tool that allows developers, sysadmins, and data scientists to easily deploy their applications in a sandbox (called a container) that runs on a host operating system, e.g. Linux. The key benefit of Docker is that it lets users package an application with all of its dependencies into a standardized unit for software development. Unlike virtual machines, containers do not have high overhead and hence make more efficient use of the underlying system and resources.1

Installing and using the Docker image

Docker pull

We recommend that you create a new directory, navigate to this new directory, and then run the following when you’re on macOS or Linux:

$ docker run --rm -it -v "$(pwd)":/data datascienceworkshops/data-science-at-the-command-line

Or the following when you’re on Windows and using the command line:

$ docker run --rm -it -v "%cd%":/data datascienceworkshops/data-science-at-the-command-line

Or the following when you’re using Windows PowerShell:

$ docker run --rm -it -v ${PWD}:/data datascienceworkshops/data-science-at-the-command-line

In the above commands, the -v option tells Docker to map the current directory on the host to the /data directory inside the container, so /data is the place to move data in and out of the container.
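
As a minimal sketch of the preparation steps on macOS or Linux, the snippet below creates a working directory, places a small file in it, and prints the docker command without running it. The directory name ds-workflow and the file sample.csv are hypothetical and chosen only for illustration; they are not part of this repository.

```shell
#!/bin/sh
# Hypothetical directory name, used only for illustration.
workdir="ds-workflow"

# Create a fresh working directory and move into it.
mkdir -p "$workdir"
cd "$workdir"

# Put a small CSV file in the directory that will be mounted at /data.
printf 'id,value\n1,10\n2,20\n' > sample.csv

# Print (without running) the command that mounts this directory at /data.
# Quoting "$(pwd)" keeps paths that contain spaces intact.
echo "docker run --rm -it -v \"$(pwd)\":/data datascienceworkshops/data-science-at-the-command-line"
```

After starting the container with the printed command, listing /data inside it should show sample.csv, and any file written to /data from inside the container should appear in ds-workflow on the host.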

Notes


  1. Docker for beginners, https://docker-curriculum.com/