

Overview

Using publicly available data from Airbnb (available via Kaggle.com), we illustrate what a reproducible workflow may look like in practice.

Check out our GitHub repository for all the details on how to clone the project and run it. Alternatively, continue reading below.

We’ve crafted this project to:

  • run platform-independently (Mac, Linux, Windows),
  • run across a diverse set of software programs (Stata, Python, R),
  • produce an entire (mock) paper, including modules that
    • download data from Kaggle,
    • prepare data for analysis,
    • run a simple analysis,
    • produce a paper with output tables and figures.

How to run it

Dependencies

  • Install Python.

    • Anaconda is recommended. Download Anaconda.
    • Check availability: type anaconda --version in the command line.
  • Install the Kaggle package.

    • Follow the Kaggle API instructions for installation and setup.
  • Install automation tools.

    • GNU Make: already installed on Mac and Linux. Windows users should download and install Make for Windows.
    • Windows users only: make Make available via the command line.
      • Right-click on Computer.
      • Go to Properties, and click Advanced System Settings.
      • Choose Environment Variables, select Path under System variables, and click Edit.
      • Add the bin folder of your Make installation.
    • Check availability: type make --version in the command line.
  • Install Stata.

    • Make Stata available via the command line. Follow the instructions for adding Stata to your path.
    • Check availability: type $STATA_BIN --version in the command line.
  • Install Perl.

    • Perl is already installed on Mac and Linux. Windows users should download Perl for Windows.
    • Make sure Perl is available via the command line.
    • Check availability: type perl -v in the command line.
  • Install LyX.

    • LyX is an open-source document processor based on LaTeX. Download LyX.
    • Make sure LyX is available via the command line.
    • Check availability: type $LYX_BIN in the command line.
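If you prefer to check all dependencies in one go, the Python sketch below does so. It assumes the executables are called python, kaggle, make, and perl, and that the environment variables STATA_BIN and LYX_BIN hold the name or full path of the Stata and LyX executables. These names are assumptions, so adjust them to your installation. The helper check_dependencies is hypothetical and only tests whether each program can be found, not whether it runs correctly.

```python
import os
import shutil

# Command names assumed to be on PATH after following the steps above.
# Some systems expose Python only as "python3"; adjust if needed.
PATH_TOOLS = ["python", "kaggle", "make", "perl"]

# Stata and LyX are referenced via environment variables in this guide
# ($STATA_BIN, $LYX_BIN); we assume each holds the executable's name or path.
ENV_TOOLS = ["STATA_BIN", "LYX_BIN"]


def check_dependencies(env=None):
    """Return a dict mapping each dependency to True if it can be found."""
    env = os.environ if env is None else env
    status = {}
    for tool in PATH_TOOLS:
        # shutil.which searches PATH (and PATHEXT on Windows) like a shell does.
        status[tool] = shutil.which(tool) is not None
    for var in ENV_TOOLS:
        value = env.get(var)
        # The variable must be set and point at an executable program.
        status[var] = bool(value) and shutil.which(value) is not None
    return status


if __name__ == "__main__":
    for name, found in check_dependencies().items():
        print(f"{name:10s} {'OK' if found else 'MISSING'}")
```

Any entry reported as MISSING points to the installation step above that still needs attention.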

Run it

Open your command line tool:

  • Check whether your present working directory is tisem-airbnb by typing pwd in the terminal.

    • If not, type cd yourpath/tisem-airbnb to change into it.
  • Type make in the command line.
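The two steps above can also be scripted. The Python sketch below checks that the working folder is named tisem-airbnb and contains a makefile, then runs make there. It assumes make is on your path; is_project_root and run_workflow are hypothetical helper names, not part of the repository.

```python
import subprocess
from pathlib import Path

PROJECT_DIR_NAME = "tisem-airbnb"  # name of the cloned repository folder


def is_project_root(path):
    """True if `path` is the tisem-airbnb folder and contains a makefile."""
    path = Path(path)
    # GNU Make looks for "makefile" and "Makefile" (among others).
    has_makefile = any((path / name).is_file() for name in ("makefile", "Makefile"))
    return path.name == PROJECT_DIR_NAME and has_makefile


def run_workflow(path="."):
    """Run `make` in the project folder, refusing to start anywhere else."""
    path = Path(path).resolve()  # resolve "." so that .name is meaningful
    if not is_project_root(path):
        raise SystemExit(f"{path} is not the {PROJECT_DIR_NAME} folder; cd there first.")
    # Equivalent to typing `make` in that folder; check=True raises on failure.
    subprocess.run(["make"], cwd=path, check=True)
```

Calling run_workflow() from inside the project folder is then equivalent to typing make yourself.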

Directory structure

Make sure the makefile is in the present working directory. The directory structure of the Airbnb project is shown below.

├── data
├── gen
│   ├── analysis
│   │   ├── input
│   │   ├── output
│   │   │   ├── figure
│   │   │   ├── log
│   │   │   └── table
│   │   └── temp
│   ├── data_preparation
│   │   ├── audit
│   │   │   ├── figure
│   │   │   ├── log
│   │   │   └── table
│   │   ├── input
│   │   ├── output
│   │   │   ├── figure
│   │   │   ├── log
│   │   │   └── table
│   │   └── temp
│   └── paper
│       ├── input
│       ├── output
│       └── temp
└── src
    ├── analysis
    ├── data_preparation
    └── paper
  • gen: all generated files, such as tables, figures, and logs.
    • Three parts: data_preparation, analysis, and paper.
    • audit: logs, tables, and figures produced by the audit program (data_preparation only), stored in three sub-folders: figure, log, and table.
    • temp: temporary files, such as intermediate datasets. These files may be deleted at the end.
    • output: results, with figures in the sub-folder figure, log files in log, and tables in table.
    • input: all temporary input files.
  • data: all raw data.
  • src: all source code.
    • Three parts: data_preparation, analysis, and paper (including TeX files).
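If some of these folders are missing on your machine, a short script can recreate the tree above. The Python sketch below is one way to do it, assuming it is run from the project root; project_dirs and create_structure are hypothetical helper names, and the folder list is transcribed from the tree rather than read from the repository.

```python
from pathlib import Path

# Sub-folders that hold figures, logs, and tables.
RESULT_DIRS = ["figure", "log", "table"]


def project_dirs():
    """Relative paths of every folder in the directory tree above."""
    dirs = ["data"]
    for part in ["analysis", "data_preparation", "paper"]:
        dirs.append(f"src/{part}")
        base = f"gen/{part}"
        dirs += [f"{base}/input", f"{base}/temp", f"{base}/output"]
        if part != "paper":  # gen/paper/output has no sub-folders in the tree
            dirs += [f"{base}/output/{d}" for d in RESULT_DIRS]
        if part == "data_preparation":  # only data_preparation has an audit folder
            dirs += [f"{base}/audit/{d}" for d in RESULT_DIRS]
    return dirs


def create_structure(root="."):
    """Create all folders under `root`; existing folders are left untouched."""
    for rel in project_dirs():
        # parents=True creates intermediate folders; exist_ok=True makes reruns safe.
        (Path(root) / rel).mkdir(parents=True, exist_ok=True)
```

Running create_structure() twice is harmless, because exist_ok=True skips folders that already exist.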