Importing Data & Project-Oriented Workflows

class: title-slide, center, middle

.top-left[
<img src="images/uo-logo2.png" width="100%" />
]

.top-right[
<img src="images/psy-logo.png" width="100%" />
]

# Importing Data & Project-Oriented Workflows

UO R Bootcamp 2025

---

# Importing data

Importing data in R has two challenging aspects:

1. You need to call a function that works with a particular data format (e.g., `.csv`, `.txt`, `.sav`)

2. You need to tell R where to look for the data

---
# Importing data

.pull-left[

.center[
### {readr}

`read_csv()`, `read_tsv()`, `read_delim()`, `read_fwf()`, etc...
]
]

.pull-right[

.center[
### {rio}

`import()`
]
]

---

# Importing data

.pull-left[

.center[
### {readr}

`read_csv()`, `read_tsv()`, `read_delim()`, `read_fwf()`, etc...
]
]

.pull-right[

.center[
### {rio}

`import()`

✔️
]
]

---
# Importing data

### `rio::import()`

With `{rio}`, we just call `import()` and under the hood it calls the right read function given the file's extension (`.csv`, `.txt`, `.sav`, `.xlsx`).
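
As a rough illustration of that dispatch, the sketch below writes the same small data frame to a `.csv` and a `.tsv` file in a temporary folder and reads both back with the same `import()` call. The data frame `demo_df` and the file names are made up for this example, and it assumes `{rio}` is installed.

``` r
library(rio)

# A tiny, made-up data frame used only for illustration
demo_df <- data.frame(id = 1:3, score = c(2.5, 3.0, 4.5))

# Write it to two different formats in a temporary folder
csv_path <- file.path(tempdir(), "demo.csv")
tsv_path <- file.path(tempdir(), "demo.tsv")
write.csv(demo_df, csv_path, row.names = FALSE)
write.table(demo_df, tsv_path, sep = "\t", row.names = FALSE, quote = FALSE)

# The same function reads both; rio picks the reader from the file extension
from_csv <- import(csv_path)
from_tsv <- import(tsv_path)

str(from_csv)
str(from_tsv)
```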

We'll get some practice with this in a few minutes.

---
# Project-oriented workflows

When R looks for a file, it has a starting point. This is called the **working directory**.

Your current working directory is displayed in the Console pane and in the Files pane. Let's take a look in RStudio.

--
***

If you ever get lost, you can print your working directory with `getwd()`.

If you are working in a `.Rmd` document, R by default uses the folder where that `.Rmd` file lives as the working directory.
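
To make the idea of a "starting point" concrete, here is a minimal sketch of how a relative path is resolved against the working directory. The path `data/my_file.csv` is hypothetical and only there for illustration.

``` r
# The working directory is the starting point for every relative path
wd <- getwd()
wd

# "data/my_file.csv" is a hypothetical relative path used only for illustration.
# R resolves it by appending it to the working directory:
relative_path <- file.path("data", "my_file.csv")
full_path <- file.path(wd, relative_path)
full_path

# file.exists() reports whether R can find a file at that location
file.exists(relative_path)
```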

---

# Project-oriented workflows

``` r
getwd()
```

```
## [1] "/Users/ishryock/Documents/GitHub/summeRbootcamp2025/static/slides"
```

For example, I created these slides in a `.Rmd` document that lives in this folder on my computer ☝️

---
class: split-three
# Project-oriented workflows

The best way to simplify issues with working directories is to use <u>**RStudio Projects**</u>.

--
***

.column[.content[.center[
<br><br><br><br><br><br><br>

### Step 1
<img src="images/create_project1.png" width="90%" />
]]]

.column[.content[.center[
<br><br><br><br><br><br><br>

### Step 2
<img src="images/create_project2.png" width="90%" />

]]]

.column[.content[.center[
<br><br><br><br><br><br><br>

### Step 3
<img src="images/create_project3.png" width="90%" />

]]]

---
# Project-oriented workflows

When you create a Project in RStudio, it is associated with a folder somewhere on your computer.

It will automatically create a `.Rproj` file in that folder, which will keep track of the "top level" of your project.

--
***
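
One way to check that a session really started at the project's top level is to look for the `.Rproj` file in the working directory. This is only a sketch: the sub-folder name `data` is an assumption, so adjust it to your own project.

``` r
# When a project is opened through its .Rproj file, the working directory
# should be the project's top-level folder, so the .Rproj file is listed here
rproj_files <- list.files(getwd(), pattern = "\\.Rproj$")
rproj_files

# TRUE if the working directory looks like the top level of an RStudio Project
length(rproj_files) > 0

# Everything inside the project can then be reached with relative paths,
# e.g. the contents of a sub-folder named "data" (assumed to exist)
list.files("data")
```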

---
class: yourturn
# Your turn 1

Q1. Load the `{rio}` package.

Q2. Run the following code to import the data called `pragmatic_scales_data.csv`. Why do you get an error? Where is this file saved? *Hint*: Look through the folder(s) in the Files pane.

``` r
ps_data <- import("pragmatic_scales_data.csv")
```

Q3. Fix the error in the code above to import the data.

Q4. Remember that `{rio}` is flexible with file types: `rio::import()` calls the right function under the hood to read in the file based on its extension. Use `{rio}` to import `pragmatic_scales_data.sav` (an SPSS file type) and save it to a new object named `ps_data_2`.

---
class: solution
# Solution

.panelset[
.panel[.panel-name[Q1]

``` r
library(rio)
```
]

.panel[.panel-name[Q2]

``` r
ps_data <- import("06_pragmatic_scales_data.csv")
```

```
## Error: No such file: 06_pragmatic_scales_data.csv
```

***

The file *pragmatic_scales_data.csv* is saved in the *data* folder, so we need to tell R to look in that folder.
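
A quick way to confirm where a file lives before importing it is `file.exists()` together with `list.files()`. The sketch below uses a made-up project folder (`demo_project`) and file (`demo.csv`) in a temporary location, not the workshop data.

``` r
# Simulate a small project in a temporary folder (hypothetical layout)
proj <- file.path(tempdir(), "demo_project")
dir.create(file.path(proj, "data"), recursive = TRUE, showWarnings = FALSE)
write.csv(data.frame(x = 1:3), file.path(proj, "data", "demo.csv"),
          row.names = FALSE)

# The file is not at the top level of the project...
file.exists(file.path(proj, "demo.csv"))          # FALSE

# ...it is inside the data/ sub-folder
file.exists(file.path(proj, "data", "demo.csv"))  # TRUE

# List every file in the project, including those in sub-folders
list.files(proj, recursive = TRUE)
```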
]

.panel[.panel-name[Q3]

``` r
ps_data <- import("data/06_pragmatic_scales_data.csv")
```

]

.panel[.panel-name[Q4]

``` r
ps_data_2 <- import("data/06_pragmatic_scales_data.sav")
```

]
]

---
# Exporting data

You can also use `{rio}` to export your data using `export()`.

--
***

Here are the arguments you will need to use for `export()`:

``` r
export(x, file)
```

`x` is the data frame in your RStudio Environment that you want to export.

`file` is the path and file name for the resulting file; its extension determines the output format.

--
***

For example, let's say I want to export `ps_data` as an `.xlsx` file and put it into the `data/` subdirectory.

``` r
export(ps_data, "data/ps_data.xlsx")
```
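
To check that an export worked, you can read the file straight back in. The round trip below is a minimal sketch with a made-up data frame (`demo_df`) written to a temporary folder; it uses a `.csv` file to keep the example lightweight and assumes `{rio}` is installed.

``` r
library(rio)

# A tiny, made-up data frame used only for illustration
demo_df <- data.frame(id = 1:3, score = c(2.5, 3.0, 4.5))

# Export to a temporary location so no real files are overwritten
out_path <- file.path(tempdir(), "demo_export.csv")
export(demo_df, out_path)

# Confirm the file was created, then read it back in
file.exists(out_path)
demo_back <- import(out_path)
str(demo_back)
```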

---
class: yourturn
# Your turn 2

1. Look through the Files pane and find the file `another_data_set.csv`. Make note of what subdirectory it is saved in. Import the data and save to an object called `another_df`.

1. Now export the data you just imported and save it into the `data/` directory. Make sure the name of the resulting file is `another_data_set`, and export it as a `.xlsx` file.

1. One of your colleagues insists you send them a `.sav` file so that they can run the analyses in SPSS. Make another copy of `another_data_set` in the `data/` subdirectory that is in the `.sav` format.

---
class: solution
# Solution

.panelset[
.panel[.panel-name[Q1]

``` r
another_df <- import("data/more_data/another_data_set.csv")
```

]

.panel[.panel-name[Q2]

``` r
export(another_df, "data/another_data_set.xlsx")
```
]

.panel[.panel-name[Q3]

``` r
export(another_df, "data/another_data_set.sav")
```

]
]

---
# Viewing data

Now that your data is loaded in R, you'll want to take a look at it. There are a few different ways to do that, each offering different information.

--
***

### `View()`

One way is to click on the `View` button in the environment pane.

You should see `ps_data` in the environment pane with a little data table icon at the far right. Click on that icon.

You'll notice that this ran `View(ps_data)` in the console. We could have typed this directly ourselves instead. Note the capital `V` in `View()`.

---
# Viewing data

.panelset[
.panel[.panel-name[`head()`]

You can also see just the first six rows of a data frame with `head()`, which is especially helpful for large data sets.

``` r
head(ps_data)
```

```
##   subid   item correct  age condition
## 1   M22  faces       1 2.00     Label
## 2   M22 houses       1 2.00     Label
## 3   M22  pasta       0 2.00     Label
## 4   M22   beds       0 2.00     Label
## 5   T22   beds       0 2.13     Label
## 6   T22  faces       0 2.13     Label
```
]

.panel[.panel-name[`tail()`]

`tail()` is the complement to `head()`, displaying just the final six rows from a data frame.

``` r
tail(ps_data)
```

```
##      subid   item correct  age condition
## 583 MSCH84  pasta       1 2.83  No Label
## 584 MSCH84   beds       0 2.83  No Label
## 585 MSCH85  faces       0 2.69  No Label
## 586 MSCH85 houses       0 2.69  No Label
## 587 MSCH85  pasta       0 2.69  No Label
## 588 MSCH85   beds       0 2.69  No Label
```
]

.panel[.panel-name[`str()`]

We saw `str()` when we first introduced data frames. It's worth mentioning again because it is so useful after importing data: it shows how your variables were read in (i.e., their types).

``` r
str(ps_data)
```

```
## 'data.frame':	588 obs. of  5 variables:
##  $ subid    : chr  "M22" "M22" "M22" "M22" ...
##  $ item     : chr  "faces" "houses" "pasta" "beds" ...
##  $ correct  : int  1 1 0 0 0 0 1 1 0 0 ...
##  $ age      : num  2 2 2 2 2.13 2.13 2.13 2.13 2.32 2.32 ...
##  $ condition: chr  "Label" "Label" "Label" "Label" ...
```

]

.panel[.panel-name[`glimpse()`]

`glimpse()` is very similar to `str()`, but it is a tidyverse function and shows you a little more of the raw data.

``` r
glimpse(ps_data)
```

```
## Rows: 588
## Columns: 5
## $ subid     <chr> "M22", "M22", "M22", "M22", "T22", "T22", "T22", "T22", "T17…
## $ item      <chr> "faces", "houses", "pasta", "beds", "beds", "faces", "houses…
## $ correct   <int> 1, 1, 0, 0, 0, 0, 1, 1, 0, 0, 0, 0, 0, 1, 1, 1, 0, 0, 1, 1, …
## $ age       <dbl> 2.00, 2.00, 2.00, 2.00, 2.13, 2.13, 2.13, 2.13, 2.32, 2.32, …
## $ condition <chr> "Label", "Label", "Label", "Label", "Label", "Label", "Label…
```

]
]
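
Both `head()` and `tail()` accept an `n` argument if six rows is not what you want. The sketch below uses `mtcars`, a data set that ships with R, as a stand-in for your own data.

``` r
# mtcars ships with R; it stands in for your own data here
first_rows <- head(mtcars, n = 3)  # first 3 rows instead of the default 6
last_rows <- tail(mtcars, n = 3)   # last 3 rows instead of the default 6

first_rows
last_rows

# Number of rows and columns in the full data frame
dim(mtcars)
```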

---
# Viewing data

A `tibble` is much like the data frame in base R, but **it has nicer printing methods**.

As a result, you only have to call a `tibble` to see much of the information you would be interested in.
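
As a small sketch of the difference, the code below converts `mtcars` (a data frame that ships with R) to a tibble. It assumes the `{tibble}` package is installed; the object name `cars_tbl` is made up for this example.

``` r
library(tibble)

# mtcars is a plain data frame that ships with R
class(mtcars)

# as_tibble() keeps the same columns but changes the class (and the printing);
# note that row names are dropped by default
cars_tbl <- as_tibble(mtcars)
class(cars_tbl)

# Printing shows the dimensions, the column types, and only the first rows
cars_tbl
```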

---
# Viewing data

``` r
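# Convert the imported data frame to a tibble (needs the tidyverse loaded)
ps_data <- as_tibble(ps_data)

# Calling the tibble by name prints a compact summary
ps_data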
```

```
## # A tibble: 588 × 5
##    subid item   correct   age condition
##    <chr> <chr>    <int> <dbl> <chr>    
##  1 M22   faces        1  2    Label    
##  2 M22   houses       1  2    Label    
##  3 M22   pasta        0  2    Label    
##  4 M22   beds         0  2    Label    
##  5 T22   beds         0  2.13 Label    
##  6 T22   faces        0  2.13 Label    
##  7 T22   houses       1  2.13 Label    
##  8 T22   pasta        1  2.13 Label    
##  9 T17   pasta        0  2.32 Label    
## 10 T17   faces        0  2.32 Label    
## # ℹ 578 more rows
```

---
class: yourturn

# Your turn 3

1. Take a look at `another_df`, which should be in your Global Environment. Click the "View" button in the Environment pane and also use `View()` in your Console.

2. Now look at some summary information about `another_df` using `str()` and `glimpse()`. *Hint*: You will need to load the `{tidyverse}` package first in order to use `glimpse()`.

3. Lastly, find the number of rows and columns in `another_df` using `nrow()` and `ncol()`, respectively. Make sure your answers match the summary information given to you above.

---
class: solution

# Solution

.panelset[
.panel[.panel-name[Q1]

``` r
View(another_df)
```
]

.panel[.panel-name[Q2]

``` r
library(tidyverse)

str(another_df)
```

```
## 'data.frame':	32 obs. of  4 variables:
##  $ subid  : chr  "A001" "A001" "A001" "A001" ...
##  $ stimuli: chr  "A" "B" "C" "D" ...
##  $ correct: int  0 0 1 0 1 1 1 1 0 0 ...
##  $ age    : num  2.5 2.5 2.5 2.5 2.75 2.75 2.75 2.75 3.6 3.6 ...
```

``` r
glimpse(another_df)
```

```
## Rows: 32
## Columns: 4
## $ subid   <chr> "A001", "A001", "A001", "A001", "B002", "B002", "B002", "B002"…
## $ stimuli <chr> "A", "B", "C", "D", "A", "B", "C", "D", "A", "B", "C", "D", "A…
## $ correct <int> 0, 0, 1, 0, 1, 1, 1, 1, 0, 0, 1, 1, 1, 1, 1, 1, 0, 0, 0, 0, 1,…
## $ age     <dbl> 2.50, 2.50, 2.50, 2.50, 2.75, 2.75, 2.75, 2.75, 3.60, 3.60, 3.…
```
]

.panel[.panel-name[Q3]

``` r
nrow(another_df)
```

```
## [1] 32
```

``` r
ncol(another_df)
```

```
## [1] 4
```

]
]

---

## The Art of File Naming

Replace `paper_final_FINAL_REALLY.docx` with a consistent pattern:

```
YYYY-MM-DD_ProjectName_DocumentType.ext
```

**Examples**

```
2025-01-15_DissertationProposal.docx
2025-02-03_SurveyData_Raw.csv
2025-03-10_LitReview_Submitted.pdf
```
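
If you save files from R, the pattern can be built programmatically. The helper below, `make_file_name()`, is hypothetical (it is not part of any package) and is just one possible sketch of the convention.

``` r
# Hypothetical helper: builds "YYYY-MM-DD_ProjectName_DocumentType.ext"
make_file_name <- function(project, doc_type, ext, date = Sys.Date()) {
  # Format the date as an ISO-style stamp, e.g. "2025-02-03"
  date_stamp <- format(as.Date(date), "%Y-%m-%d")
  paste0(date_stamp, "_", project, "_", doc_type, ".", ext)
}

# Reproduce one of the example names above
make_file_name("SurveyData", "Raw", "csv", date = "2025-02-03")
# [1] "2025-02-03_SurveyData_Raw.csv"
```

Leaving out `date` stamps the file name with today's date.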

---

## Why this helps

- **Chronological ordering**: files sort by date automatically
- **Clarity**: purpose is obvious from the name
- **Searchability**: easier to filter/find by date or keywords
- **Version control**: clean history even without Git/cloud
  - Pro tip: Implement a method for backing up your data **before** it's lost
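
The first and third points can be sketched in a few lines of R, using the example file names from the previous slide. Because the date comes first and is written year-month-day, plain alphabetical sorting is also chronological.

``` r
# The example file names, deliberately out of order
files <- c(
  "2025-03-10_LitReview_Submitted.pdf",
  "2025-01-15_DissertationProposal.docx",
  "2025-02-03_SurveyData_Raw.csv"
)

# Alphabetical sorting puts them in date order
sort(files)

# Filter by date: everything from February 2025
grep("^2025-02", files, value = TRUE)

# Filter by keyword
grep("SurveyData", files, value = TRUE)
```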

---

## Collaboration & versions

- Add initials for the last editor when useful:  
  `2025-04-18_InterviewAnalysis_JS.docx`
- Prefer **cloud version history** (Drive/OneDrive) instead of making duplicates

---

## Structuring Research Projects

Use a numbered, reproducible layout. Everything lives under the root **ProjectName**.

```text
ProjectName/
├─ 01_Data/
│  ├─ Raw/           # original, unmodified data
│  ├─ Processed/     # cleaned and transformed data
│  └─ Metadata/      # codebooks, data dictionaries
├─ 02_Scripts/
│  ├─ Data_Cleaning/ # code to process raw data
│  ├─ Analysis/      # statistical analysis scripts
│  └─ Visualization/ # code for figures and tables
├─ 03_Output/
│  ├─ Tables/        # generated results tables
│  ├─ Figures/       # plots, diagrams, visualizations
│  └─ Models/        # saved statistical models
├─ 04_Documents/
│  ├─ Drafts/        # working papers, draft manuscripts
│  ├─ Final/         # submitted versions
│  └─ Feedback/      # reviewer comments, notes
└─ README.txt        # project overview, navigation guide
```

> Within each folder, apply the same **file naming** convention.
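
A layout like this can be created from R in one go. The sketch below builds the skeleton inside a temporary folder so nothing is written to your real files; the `root` path is a placeholder to swap for your own project location.

``` r
# Hypothetical project root inside a temporary folder
root <- file.path(tempdir(), "ProjectName")

# Sub-folders from the layout above
folders <- c(
  "01_Data/Raw", "01_Data/Processed", "01_Data/Metadata",
  "02_Scripts/Data_Cleaning", "02_Scripts/Analysis", "02_Scripts/Visualization",
  "03_Output/Tables", "03_Output/Figures", "03_Output/Models",
  "04_Documents/Drafts", "04_Documents/Final", "04_Documents/Feedback"
)

# recursive = TRUE also creates the parent folders (e.g. 01_Data/)
for (folder in folders) {
  dir.create(file.path(root, folder), recursive = TRUE, showWarnings = FALSE)
}

# Add an empty README at the top level
file.create(file.path(root, "README.txt"))

# Check the top-level folders that were created
list.dirs(root, full.names = FALSE, recursive = FALSE)
```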

---

## Why this structure works

- **Reproducibility**: trace from raw data → outputs
- **Clarity**: inputs, code, and results are separated
- **Collaboration**: teammates know where things live
- **Publication**: easier to prep data/code for journals

---
class: yourturn, center, middle
# Q & A

---
class: yourturn, center, middle
# Break!