Data science workflow repository that explores and guides you through data science tasks using command-line tools.

# Data Science Workflow #
This repository uses examples to show how to use the command line efficiently and productively for data science tasks: obtaining, scrubbing, exploring, and modeling your data.
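The four steps named above can be sketched as a chain of small shell functions joined by pipes. This is a minimal illustration, not the repository's own code: the inline CSV, the function names `obtain`, `scrub`, `explore`, and `model`, and the "model" itself (a per-city mean) are hypothetical stand-ins, assuming Bash and standard tools (`printf`, `tail`, `tr`, `cut`, `sort`, `uniq`, `awk`).

```shell
#!/usr/bin/env bash
set -euo pipefail

# Obtain: a hypothetical inline dataset stands in for a real download.
obtain() {
  printf '%s\n' \
    'city,temp_c' \
    'Paris, 21' \
    'Berlin,19' \
    'paris,23' \
    'Madrid,30'
}

# Scrub: drop the header row, remove stray spaces, lowercase city names.
scrub() {
  tail -n +2 | tr -d ' ' | tr '[:upper:]' '[:lower:]'
}

# Explore: count how many rows each city has.
explore() {
  cut -d, -f1 | sort | uniq -c
}

# Model: a deliberately trivial stand-in, the mean temperature per city.
model() {
  awk -F, '{ sum[$1] += $2; n[$1]++ }
           END { for (c in sum) printf "%s,%.1f\n", c, sum[c] / n[c] }' | sort
}

obtain | scrub | explore
obtain | scrub | model
```

Running the script prints a row count per city, followed by the mean temperature per city (`berlin,19.0`, `madrid,30.0`, `paris,22.0`). In a real workflow, `obtain` would typically fetch data over the network, for example with `curl`.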
# Introduction #
Through these examples you will learn how to: (*i*) run Docker containers, (*ii*) use the command line, and (*iii*) run a basic application.
## Docker ##
Let us start with Docker, the first tool in our data science workflow. Docker is a tool that allows developers, sysadmins, or data scientists to easily deploy their applications in a sandbox (**called containers**) that runs on a host *operating system, i.e. Linux*. The key benefit of Docker is that it allows users to package an application with all of its dependencies into a standardized unit for software development. Unlike virtual machines, containers have low overhead and hence make more efficient use of the underlying system and resources.[^1]
### Installing and using the Docker image ###
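Start by pulling the image from Docker Hub with `docker pull`; the `docker run` commands below also pull it automatically the first time if it is not yet available locally.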
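If you prefer to pull explicitly, the sketch below, assuming Bash, downloads the image only when it is not already present locally. The helper name `ensure_image` is hypothetical; `docker image inspect` and `docker pull` are standard Docker CLI commands.

```shell
#!/usr/bin/env bash
# Image name taken from the docker run commands in this README.
IMAGE="datascienceworkshops/data-science-at-the-command-line"

# Hypothetical helper: pull $IMAGE unless it is already available locally.
ensure_image() {
  if ! command -v docker >/dev/null 2>&1; then
    echo "docker not found; install Docker first" >&2
    return 1
  fi
  # docker image inspect exits non-zero when the image is not present locally.
  if ! docker image inspect "$IMAGE" >/dev/null 2>&1; then
    docker pull "$IMAGE"
  fi
}
```

Call it once before the `docker run` command for your platform, for example `ensure_image && echo "image ready"`.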
We recommend that you create a new directory, navigate to this new directory, and then run the following when you’re on macOS or Linux:
``` shell
$ docker run --rm -it -v "$(pwd)":/data datascienceworkshops/data-science-at-the-command-line
```
Or the following when you’re on Windows and using the Command Prompt (`cmd.exe`):
``` shell
$ docker run --rm -it -v "%cd%":/data datascienceworkshops/data-science-at-the-command-line
```
Or the following when you’re using Windows PowerShell:
``` shell
$ docker run --rm -it -v "${PWD}:/data" datascienceworkshops/data-science-at-the-command-line
```
In the commands above, `--rm` removes the container when you exit, `-it` gives you an interactive terminal session, and `-v` instructs Docker to map the current directory to the `/data` directory inside the container, so `/data` is the place to get data in and out of the Docker container.
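To make the mapping concrete: with the current directory mounted at `/data`, a file at `raw/data.csv` under that directory on the host is visible inside the container as `/data/raw/data.csv`, and anything the container writes under `/data` appears back in the host directory. The sketch below computes that translation; `host_to_container` is a hypothetical helper, not part of Docker, and assumes Bash and the `/data` mount point used above.

```shell
#!/usr/bin/env bash
# Mount point used by the docker run commands above.
MOUNT_POINT="/data"

# Hypothetical helper (not a Docker command): given the host directory that
# was mounted with -v and a path inside it, print the matching container path.
host_to_container() {
  local host_dir="${1%/}" host_path="$2"   # drop a trailing slash from the dir
  case "$host_path" in
    "$host_dir"/*)
      # Replace the host directory prefix with the mount point.
      printf '%s%s\n' "$MOUNT_POINT" "${host_path#"$host_dir"}"
      ;;
    *)
      echo "not inside the mounted directory: $host_path" >&2
      return 1
      ;;
  esac
}
```

For example, `host_to_container "$(pwd)" "$(pwd)/raw/data.csv"` prints `/data/raw/data.csv`, which is the path to use inside the container.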
# Notes #
- [ ] Make a container with Ubuntu 18.04
- [ ] Packages to install: csvkit,
[^1]: Docker for Beginners, https://docker-curriculum.com/.