
han-yan01/DS4002P1


DS4002P1

Section 1: Software and platform section

For this project we used RStudio and Python. RStudio was used for most of the data processing. First, it was used to clean the data and append derived columns, such as the daily percent change in price, for later analysis. It was then used to compute sentiment and append compound scores using the VADER package. Finally, RStudio was used to run the final regression, and ggplot2 was used for visualization. RStudio for Windows was used for the regression and visualization; all other processes were run on macOS. Python was used to extract dates from the URLs in the raw data using regular expressions.

Section 2: A Map of your documentation.

├── DATA
│   ├── LCID Stock Trends .gsheet
│   ├── LCID_Stock_Trends.csv
│   ├── LCID_initial_raw_query.xlsx
│   ├── LCID_updated_with_date.xlsx
│   ├── NVDA_Stock_Trends.csv
│   ├── NVDA_initial_raw_query.xlsx
│   ├── NVDA_with_dates.xlsx
│   ├── Sentiment_score.xlsx
│   ├── daily_stock_percent_changes.csv
│   ├── lcid_stock_data_sentimentVpercChange.xlsx
│   ├── nvda_stock_data_sentimentVpercChange.xlsx
│   └── Data Appendix.pdf
├── OUTPUT
│   ├── Analysis_plan_final.png
│   ├── Compound_Sentiment_Score.png
│   ├── regression_results.png
│   ├── Sentiment_by_date.png
│   └── Stock_Information_Comparison.png
└── SCRIPTS
    ├── R
    │   ├── R.Rproj
    │   ├── correlationLm.R
    │   ├── desktop.ini
    │   ├── lcid_stock_data.csv
    │   ├── mergingdailypercents.Rmd
    │   └── nvda_stock_data.csv
    └── rawDataProcess
        ├── extractDatesFromUrl.py
        ├── sentimentVsScoreClean.py
        └── sentiment_score.Rmd

 

Section 3: Instructions for reproducing your results.

This project takes four base data files and applies sentiment analysis and linear regression to them to produce its results.

Raw data collection:

NVDA_initial_raw_query.xlsx: original raw data query from databar.ai. Link to data: https://databar.ai/public-table-full/9419fc15-9bd5-4050-9253-155b4dc56111

LCID_initial_raw_query.xlsx: original raw data query from databar.ai. Link to data: https://databar.ai/public-table-full/361fe669-c6b6-4f89-b7cc-9561d7bc2314

Data processing:

  • To extract date information from the raw data, run extractDatesFromUrl.py on NVDA_initial_raw_query.xlsx and LCID_initial_raw_query.xlsx.
  • The resulting files are NVDA_with_dates.xlsx and LCID_updated_with_date.xlsx.
  • To combine the sentiment scores and the daily percent change by date for each stock, run sentimentVsScoreClean.py on NVDA_with_dates.xlsx and LCID_updated_with_date.xlsx.
  • The resulting files are nvda_stock_data_sentimentVpercChange.xlsx and lcid_stock_data_sentimentVpercChange.xlsx.
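The combine-by-date step performed by sentimentVsScoreClean.py can be sketched as follows. This is not the script itself. It is a minimal standard-library illustration that assumes several articles can share a publication date, that their compound scores are averaged per date, and that only dates that also have a daily percent change are kept (an inner join). The function name `combine_by_date` and the input shapes are hypothetical.

```python
from collections import defaultdict
from statistics import mean


def combine_by_date(article_scores, daily_changes):
    """Join per-article sentiment to daily percent changes (hypothetical helper).

    article_scores: iterable of (date, compound_score) pairs, one per article.
    daily_changes:  dict mapping date -> daily percent change for one stock.
    Returns a list of (date, mean_compound, percent_change), sorted by date.
    """
    # Group the compound scores of all articles published on the same date.
    by_date = defaultdict(list)
    for date, compound in article_scores:
        by_date[date].append(compound)

    # Keep only dates that also have a price change (inner join on date).
    rows = []
    for date in sorted(by_date):
        if date in daily_changes:
            rows.append((date, mean(by_date[date]), daily_changes[date]))
    return rows
```

For example, two articles on 2023-10-02 with scores 0.5 and 0.25 collapse to one row with a mean score of 0.375, and an article date with no matching price change is dropped.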

First, take the base data files NVDA_initial_raw_query.xlsx and LCID_initial_raw_query.xlsx. Running extractDatesFromUrl.py pulls the date of each text entry from its URL; these dates are needed for merging later on. The resulting datasets are stored in the DATA folder as LCID_updated_with_date.xlsx and NVDA_with_dates.xlsx.
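One way to pull a date out of a URL with a regular expression is sketched below. It assumes, hypothetically, that the URLs embed the date as `YYYY/MM/DD` or `YYYY-MM-DD`; the actual pattern in extractDatesFromUrl.py may differ, and `DATE_IN_URL` and `date_from_url` are illustrative names.

```python
import re
from datetime import date

# Assumed URL layout: year, month and day separated by "/" or "-",
# e.g. ".../2023/10/02/..." or "...-2023-10-02-...".
# The lookarounds stop the pattern from matching inside longer digit runs.
DATE_IN_URL = re.compile(r"(?<!\d)(\d{4})[/-](\d{2})[/-](\d{2})(?!\d)")


def date_from_url(url):
    """Return the first YYYY/MM/DD-style date found in the URL, or None."""
    match = DATE_IN_URL.search(url)
    if match is None:
        return None
    year, month, day = (int(g) for g in match.groups())
    try:
        return date(year, month, day)
    except ValueError:  # e.g. "2023/13/45" fits the pattern but is not a real date
        return None
```

Returning `datetime.date` objects rather than strings makes the later merge-on-date steps insensitive to formatting differences between files.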

Second, run sentiment_score.Rmd in RStudio to append the VADER compound sentiment scores to the text data and to combine the NVDA sentiment scores with the LCID sentiment scores by published date. The resulting file is named Sentiment_score.xlsx.
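For reference, VADER's compound score is the sum of the valence scores of the words in a text (after VADER's rule-based adjustments), squashed into the range [-1, 1]. The sketch below reproduces only that final normalization step as it appears in the reference Python implementation (vaderSentiment), where `alpha` defaults to 15. The lexicon lookup and rule adjustments are omitted, and `vader_normalize` is an illustrative name.

```python
import math


def vader_normalize(score, alpha=15):
    """Map an unbounded summed valence to the [-1, 1] range of VADER's compound score."""
    norm_score = score / math.sqrt(score * score + alpha)
    # Clamp to [-1, 1] as a guard, mirroring the reference implementation.
    return max(-1.0, min(1.0, norm_score))
```

A summed valence of 1 maps to 1 / sqrt(1 + 15) = 0.25, while large positive or negative sums approach 1 or -1. The reference implementation then rounds the compound score to four decimal places.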

NVDA_Stock_Trends.csv and LCID_Stock_Trends.csv must be cleaned to create daily_stock_percent_changes.csv. First, their columns must be converted to date and numeric types so the analysis can run. Then the daily percent price change for each stock is computed as (close - open) / open × 100: the opening price is subtracted from the closing price, the difference is divided by the opening price, and the result is multiplied by 100. This daily percent change is stored as a new column, and the two stocks' datasets are merged on date. Finally, the merged dataset is exported as daily_stock_percent_changes.csv.
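The percent-change calculation and the merge on date can be sketched with Python's standard library, although the project itself performs this step in R. The column names `Date`, `Open`, and `Close` are assumptions (common in downloaded price histories), and `daily_percent_changes` and `merge_on_date` are hypothetical helpers.

```python
import csv
import io


def daily_percent_changes(csv_text):
    """Return {date: percent change} from CSV text with Date, Open and Close columns.

    Column names are assumed; adjust them to match the real files.
    """
    changes = {}
    for row in csv.DictReader(io.StringIO(csv_text)):
        # CSV fields arrive as text, so convert prices to numbers before arithmetic.
        open_price = float(row["Open"])
        close_price = float(row["Close"])
        changes[row["Date"]] = (close_price - open_price) / open_price * 100
    return changes


def merge_on_date(nvda, lcid):
    """Inner join: keep only dates present for both tickers, sorted by date."""
    return [(d, nvda[d], lcid[d]) for d in sorted(nvda.keys() & lcid.keys())]
```

For example, an open of 100 and a close of 102 gives a daily change of 2.0 percent. Dates are kept as ISO `YYYY-MM-DD` strings here so that sorting them as text also sorts them chronologically.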

Analysis Phase:

  • Correlation analysis and linear regression:
  • Run correlationLm.R. Before running it, make sure nvda_stock_data_sentimentVpercChange.xlsx and lcid_stock_data_sentimentVpercChange.xlsx are loaded into your R workspace.
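For a single predictor, the quantities this analysis reports can be computed by hand: Pearson's r plus the ordinary least-squares slope and intercept, which match what R's `cor(x, y)` and `lm(y ~ x)` return. This sketch assumes, without confirmation from the script, that correlationLm.R regresses daily percent change (y) on compound sentiment (x); `pearson_and_ols` is a hypothetical name.

```python
from statistics import mean


def pearson_and_ols(x, y):
    """Return (r, slope, intercept) for y = intercept + slope * x fitted by OLS."""
    if len(x) != len(y) or len(x) < 2:
        raise ValueError("need two equal-length sequences with at least 2 points")
    mx, my = mean(x), mean(y)
    # Centered cross-product and sums of squares.
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    if sxx == 0 or syy == 0:
        raise ValueError("x and y must both vary")
    slope = sxy / sxx
    intercept = my - slope * mx
    r = sxy / (sxx * syy) ** 0.5
    return r, slope, intercept
```

For x = [1, 2, 3, 4] and y = [3, 5, 7, 9], the function returns r = 1.0, slope = 2.0, and intercept = 1.0, because the points lie exactly on y = 1 + 2x.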
