• I just finished my first project for ST 558: Data Science for Statisticians. The purpose of the project was to get familiar with object-oriented programming, functions, and loops in R. I learned the technicalities of functions: how to define one, how to pass and read arguments, and how to set default values for certain arguments. I also learned about if/else conditional statements, the %in% operator and its syntax, and how to implement them within a function. I also learned about manipulating data to extract precisely the information asked for in the question, which included filtering, arranging, and summarizing data, and combining the results into a list to return from a function. I also got familiar with extracting certain characters from strings using the substr() function in R, splitting data on a character within the string, and creating new columns from the modified strings.

  • One thing I did differently was the extraction of the year from the data. While there are many ways to build a four-digit year from its last two digits, I added a condition that checked whether the number was greater than 22, in which case the year belonged to the 20th century and ‘19’ was prepended to it; if the number was less than 22, the year belonged to the 21st century and ‘20’ was prepended instead. The whole string was then converted to a number using the as.numeric() function.

  • I also learned how to plot different graphs from different data under different filtering and sorting conditions. I designed plotting functions for custom classes (‘county’ and ‘state’), so that when a plot is requested for an object of one of those classes, the corresponding function is called (with certain default values) to draw the graph using the ggplot2 library. The function also lets the user pass arguments to plot very specific information (in the case of the ‘state’ class).
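
The year conversion described in the list above could look something like the sketch below. This is not the project's actual code: the function name `expand_year`, the `cutoff` argument, and the sample ID `"EDU010187D"` are made up for illustration, and treating a value of exactly 22 as a 21st-century year is an assumption, since the post only covers values above or below 22.

```r
# Hypothetical helper: turn a two-digit year into a four-digit numeric year.
# Values above `cutoff` get the prefix "19"; everything else gets "20".
# Assumption: a value equal to the cutoff (22) is treated as 2022.
expand_year <- function(yy, cutoff = 22) {
  yy_num <- as.integer(yy)
  # Re-pad to two characters so that 5 becomes "05" rather than "5"
  yy_chr <- sprintf("%02d", yy_num)
  prefix <- ifelse(yy_num > cutoff, "19", "20")
  as.numeric(paste0(prefix, yy_chr))
}

# Made-up ID string whose 8th and 9th characters hold the year
two_digits <- substr("EDU010187D", 8, 9)

year_from_id <- expand_year(two_digits)
years <- expand_year(c("87", "05", "21"))
```

Because `ifelse()` and `paste0()` are vectorized, the same helper works on a single value or on a whole column when creating a new year column in a data frame.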

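Class-specific plotting of the kind described above is usually done with R's S3 dispatch: `plot()` looks at the class of its argument and calls the matching `plot.<class>` method. Below is a minimal sketch under stated assumptions, not the project's implementation. The data, the column names, and the `var_name` argument are invented for the example, and it assumes the ggplot2 package is installed.

```r
library(ggplot2)

# Made-up data; the column names are assumptions for this sketch
state_data <- data.frame(
  state = rep(c("NC", "VA", "SC"), each = 2),
  year = rep(c(1990, 2000), times = 3),
  enrollment = c(100, 120, 90, 95, 60, 80)
)

# Tag the data frame with a custom class so plot() can dispatch on it
class(state_data) <- c("state", class(state_data))

# S3 method: plot() calls this for objects whose class includes "state".
# `var_name` (hypothetical) has a default, so plot(state_data) works as is.
plot.state <- function(x, var_name = "enrollment", ...) {
  df <- x
  class(df) <- "data.frame"  # drop the custom class before summarizing

  # Mean of the chosen column for each state
  means <- aggregate(df[[var_name]], by = list(state = df$state), FUN = mean)
  names(means)[2] <- "mean_value"

  ggplot(means, aes(x = state, y = mean_value)) +
    geom_col() +
    labs(x = "State", y = paste("Mean", var_name))
}

# Dispatches to plot.state() and returns a ggplot object
p <- plot(state_data)
```

A `plot.county` method could be defined the same way with its own defaults, and printing `p` (or calling `plot(state_data)` at the console) draws the chart.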
