Improve doc.
179
README.md
@@ -1,107 +1,154 @@
|
|||||||
# rstat - Reddit Stock Ticker Analyzer Tool
|
# rstat - Reddit Stock Analyzer Tool
|
||||||
|
|
||||||
This is a command-line tool to analyze stock ticker mentions across a predefined list of subreddits. It scrapes posts and comments, counts the number of times each ticker is mentioned, fetches the ticker's market capitalization, and will calculate a sentiment score for each mention.
|
An installable command-line tool that scans Reddit for stock ticker mentions, scores the sentiment of each mention, and prints a summary report.
|
||||||
|
|
||||||
## Features
|
## Key Features
|
||||||
|
|
||||||
* Scans a user-defined list of subreddits from a JSON configuration file.
|
* **Persistent Storage:** Scraped data is stored in a local SQLite database (`reddit_stocks.db`), so you can track trends over time.
|
||||||
* Identifies stock tickers (e.g., `$AAPL`, `TSLA`) in Reddit posts and comments.
|
* **Deep Scanning:** Analyzes both post titles and comments from a user-defined list of subreddits.
|
||||||
* Fetches market capitalization for each identified ticker using the Yahoo Finance API.
|
* **Sentiment Analysis:** Uses NLTK's VADER engine to calculate a sentiment score (Bullish, Bearish, or Neutral) for each mention.
|
||||||
* Summarizes the findings in a clear, command-line-based report.
|
* **Financial Data:** Enriches ticker data by fetching market capitalization from Yahoo Finance, with intelligent caching to minimize API calls.
|
||||||
* (Future) Performs sentiment analysis on each mention.
|
* **Data Quality:** Utilizes a configurable blacklist and smart filtering to ignore common words and reduce false positives (e.g., "YOLO", "CEO", "A").
|
||||||
|
* **Automatic Cleanup:** Purges stored mentions of any ticker you add to the blacklist, so earlier false positives do not linger in the database.
|
||||||
|
* **Installable Command:** Packaged with `setuptools`, allowing you to install the tool and run it from anywhere on your system using the `rstat` command.
|
||||||
|
* **Flexible Reporting:** The final report can be customized using command-line arguments to control the number of results shown.
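
The Bullish, Bearish, and Neutral labels in the feature list could be derived from VADER's compound score. The following is a minimal sketch, not rstat's actual code: the function name `classify_sentiment` is hypothetical, and the ±0.05 cutoffs are the thresholds conventionally suggested for VADER, assumed here rather than taken from the project.

```python
def classify_sentiment(compound: float, threshold: float = 0.05) -> str:
    """Map a VADER compound score in [-1, 1] to a label.

    In practice the score would come from NLTK, e.g.
    SentimentIntensityAnalyzer().polarity_scores(text)["compound"].
    The threshold value is an assumption, not rstat's confirmed setting.
    """
    if compound >= threshold:
        return "Bullish"
    if compound <= -threshold:
        return "Bearish"
    return "Neutral"


if __name__ == "__main__":
    for score in (0.62, -0.41, 0.01):
        print(score, classify_sentiment(score))
```

Keeping the cutoff as a parameter makes it easy to widen the Neutral band if short Reddit comments turn out to be noisy.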
|
||||||
|
|
||||||
## Installation
|
## Project Structure
|
||||||
|
|
||||||
Follow these steps to set up the project and its dependencies on your local machine.
|
```
|
||||||
|
reddit_stock_analyzer/
|
||||||
|
├── .env # Your secret API keys
|
||||||
|
├── requirements.txt # Project dependencies
|
||||||
|
├── setup.py # Installation script for the tool
|
||||||
|
├── subreddits.json # Configuration for which subreddits to scan
|
||||||
|
├── rstat_tool/ # The main source code package
|
||||||
|
│ ├── __init__.py
|
||||||
|
│ ├── main.py # Main entry point and CLI logic
|
||||||
|
│ ├── database.py # All SQLite database functions
|
||||||
|
│ ├── sentiment_analyzer.py
|
||||||
|
│ ├── setup_nltk.py # One-time NLTK setup script
|
||||||
|
│ └── ticker_extractor.py
|
||||||
|
└── ...
|
||||||
|
```
|
||||||
|
|
||||||
### 1. Clone the Repository
|
## Setup and Installation
|
||||||
|
|
||||||
First, clone this repository to your local machine (or simply download and create the files as described).
|
Follow these steps to set up the project on your local machine.
|
||||||
|
|
||||||
|
### 1. Prerequisites
|
||||||
|
* Python 3.7+
|
||||||
|
* Git
|
||||||
|
|
||||||
|
### 2. Clone the Repository
|
||||||
```bash
|
```bash
|
||||||
git clone <your-repository-url>
|
git clone <your-repository-url>
|
||||||
cd rstat
|
cd reddit_stock_analyzer
|
||||||
```
|
```
|
||||||
|
|
||||||
### 2. Set Up a Python Virtual Environment
|
### 3. Set Up a Python Virtual Environment
|
||||||
|
It is highly recommended to use a virtual environment to manage dependencies.
|
||||||
It is highly recommended to use a virtual environment to manage project-specific dependencies, preventing conflicts with your global Python installation.
|
|
||||||
|
|
||||||
**On macOS / Linux:**
|
**On macOS / Linux:**
|
||||||
|
|
||||||
```bash
|
```bash
|
||||||
# Create a virtual environment named 'venv'
|
python3 -m venv .venv
|
||||||
python3 -m venv venv
|
source .venv/bin/activate
|
||||||
|
|
||||||
# Activate the virtual environment
|
|
||||||
source venv/bin/activate
|
|
||||||
```
|
```
|
||||||
*You will know it's active when you see `(venv)` at the beginning of your terminal prompt.*
|
|
||||||
|
|
||||||
**On Windows:**
|
**On Windows:**
|
||||||
|
|
||||||
```bash
|
```bash
|
||||||
# Create a virtual environment named 'venv'
|
python -m venv .venv
|
||||||
python -m venv venv
|
.\.venv\Scripts\activate
|
||||||
|
|
||||||
# Activate the virtual environment
|
|
||||||
.\venv\Scripts\activate
|
|
||||||
```
|
```
|
||||||
*You will know it's active when you see `(venv)` at the beginning of your command prompt.*
|
|
||||||
|
|
||||||
### 3. Install Dependencies
|
|
||||||
|
|
||||||
Once your virtual environment is activated, install the required Python libraries using the `requirements.txt` file.
|
|
||||||
|
|
||||||
|
### 4. Install Dependencies
|
||||||
```bash
|
```bash
|
||||||
pip install -r requirements.txt
|
pip install -r requirements.txt
|
||||||
```
|
```
|
||||||
|
|
||||||
|
### 5. Configure Reddit API Credentials
|
||||||
|
The tool needs API access to read data from Reddit.
|
||||||
|
|
||||||
|
1. Go to the [Reddit Apps preferences page](https://www.reddit.com/prefs/apps) and create a new "script" app.
|
||||||
|
2. Create a file named `.env` in the root of the project directory.
|
||||||
|
3. Add your credentials to the `.env` file like this:
|
||||||
|
|
||||||
|
```
|
||||||
|
REDDIT_CLIENT_ID=your_client_id_from_reddit
|
||||||
|
REDDIT_CLIENT_SECRET=your_client_secret_from_reddit
|
||||||
|
REDDIT_USER_AGENT=A custom user agent string (e.g., python:rstat:v1.0)
|
||||||
|
```
|
||||||
|
**IMPORTANT:** Never commit your `.env` file to version control.
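
Before the first scan it can help to confirm that all three credentials are present. The sketch below parses `.env`-style text using only the standard library; the helper names `parse_env` and `missing_keys` are hypothetical, and rstat itself may load the file differently (for example through a dotenv library).

```python
# The three variable names come from the README; everything else is illustrative.
REQUIRED_KEYS = ("REDDIT_CLIENT_ID", "REDDIT_CLIENT_SECRET", "REDDIT_USER_AGENT")


def parse_env(text: str) -> dict:
    """Parse KEY=VALUE lines, skipping blanks and '#' comments."""
    values = {}
    for line in text.splitlines():
        line = line.strip()
        if not line or line.startswith("#") or "=" not in line:
            continue
        key, _, value = line.partition("=")
        values[key.strip()] = value.strip()
    return values


def missing_keys(values: dict) -> list:
    """Return the required keys that are absent or empty."""
    return [key for key in REQUIRED_KEYS if not values.get(key)]


if __name__ == "__main__":
    sample = "REDDIT_CLIENT_ID=abc123\nREDDIT_USER_AGENT=python:rstat:v1.0\n"
    print(missing_keys(parse_env(sample)))
```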
|
||||||
|
|
||||||
|
### 6. Set Up NLTK
|
||||||
|
Run the included setup script **once** to download the required `vader_lexicon` for sentiment analysis.
|
||||||
|
```bash
|
||||||
|
python rstat_tool/setup_nltk.py
|
||||||
|
```
|
||||||
|
|
||||||
|
### 7. Build and Install the `rstat` Command
|
||||||
|
Install the tool in "editable" mode. This creates the `rstat` command in your virtual environment and links it to your source code. Any changes you make to the code will be immediately available.
|
||||||
|
|
||||||
|
```bash
|
||||||
|
pip install -e .
|
||||||
|
```
|
||||||
|
The installation is now complete.
|
||||||
|
|
||||||
|
---
|
||||||
|
|
||||||
## Configuration
|
## Configuration
|
||||||
|
|
||||||
Before running the tool, you need to configure the list of subreddits you want to analyze.
|
### Subreddits
|
||||||
|
Modify the `subreddits.json` file to define which communities the tool should scan.
|
||||||
1. Open the `subreddits.json` file.
|
|
||||||
2. Modify the list of strings to include your desired subreddits.
|
|
||||||
|
|
||||||
**Example `subreddits.json`:**
|
|
||||||
```json
|
```json
|
||||||
{
|
{
|
||||||
"subreddits": [
|
"subreddits": [
|
||||||
"wallstreetbets",
|
"wallstreetbets",
|
||||||
"stocks",
|
"stocks",
|
||||||
"investing",
|
"investing",
|
||||||
"pennystocks"
|
"options"
|
||||||
]
|
]
|
||||||
}
|
}
|
||||||
```
|
```
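
A configuration in this shape could be read and validated roughly as follows. This is a sketch under assumptions: the function name `load_subreddits` is hypothetical, and the whitespace stripping and de-duplication are illustrative choices, not confirmed rstat behavior.

```python
import json


def load_subreddits(raw_json: str) -> list:
    """Return subreddit names from a JSON config string, de-duplicated in order."""
    data = json.loads(raw_json)
    if not isinstance(data, dict) or not isinstance(data.get("subreddits"), list):
        raise ValueError('expected an object with a "subreddits" list')
    names = []
    for name in data["subreddits"]:
        cleaned = str(name).strip()
        if cleaned and cleaned not in names:
            names.append(cleaned)
    return names


if __name__ == "__main__":
    config = '{"subreddits": ["wallstreetbets", "stocks", "investing", "options"]}'
    print(load_subreddits(config))
```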
|
||||||
|
|
||||||
## Usage
|
### Ticker Blacklist (Advanced)
|
||||||
|
To improve data quality, you can add common words that are mistaken for tickers to the `COMMON_WORDS_BLACKLIST` set inside the `rstat_tool/ticker_extractor.py` file. The tool will automatically clean the database of these tickers on the next run.
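
To illustrate how a blacklist of this kind filters candidates, here is a minimal sketch. Only the name `COMMON_WORDS_BLACKLIST` comes from the README; the regular expression, the sample entries, and the function `extract_tickers` are assumptions and may differ from what `ticker_extractor.py` actually does.

```python
import re

# Sample entries only; the real set in ticker_extractor.py is presumably larger.
COMMON_WORDS_BLACKLIST = {"YOLO", "CEO", "A"}

# Assumed pattern: an optional '$' followed by 1-5 uppercase letters as a whole word.
TICKER_PATTERN = re.compile(r"\$?\b([A-Z]{1,5})\b")


def extract_tickers(text: str) -> set:
    """Return uppercase candidates that are not on the blacklist."""
    candidates = TICKER_PATTERN.findall(text)
    return {symbol for symbol in candidates if symbol not in COMMON_WORDS_BLACKLIST}


if __name__ == "__main__":
    print(extract_tickers("YOLO on $TSLA, the CEO said A lot about AAPL"))
```

A pattern this loose still accepts any short uppercase word, which is why the blacklist matters: each new entry removes one recurring false positive.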
|
||||||
To run the tool, execute the `main.py` script from the root directory of the project, passing the path to your configuration file as an argument.
|
|
||||||
|
|
||||||
Make sure your virtual environment is activated before running the script.
|
|
||||||
|
|
||||||
```bash
|
|
||||||
python main.py subreddits.json
|
|
||||||
```
|
|
||||||
|
|
||||||
### Expected Output
|
|
||||||
|
|
||||||
The tool will first confirm the loaded subreddits and then proceed with its analysis, printing the results directly to the terminal.
|
|
||||||
|
|
||||||
```
|
|
||||||
Loading configuration...
|
|
||||||
Successfully loaded 4 subreddits: wallstreetbets, stocks, investing, pennystocks
|
|
||||||
------------------------------
|
|
||||||
Testing market data functionality...
|
|
||||||
Market Cap for AAPL: $2,912,488,124,416
|
|
||||||
------------------------------
|
|
||||||
Next up: Integrating the Reddit API to find tickers...
|
|
||||||
```
|
|
||||||
|
|
||||||
---
|
---
|
||||||
|
|
||||||
This `README.md` provides a clear and concise guide for anyone (including your future self) to get the project up and running quickly.
|
## Usage
|
||||||
|
|
||||||
We are now ready to move on to the next implementation step. Shall we proceed with integrating the Reddit API using PRAW?
|
Once installed, you can run the tool from any directory using the `rstat` command.
|
||||||
|
|
||||||
|
### Basic Usage
|
||||||
|
Run an analysis using the default settings (scans 25 posts, 100 comments/post, shows top 20 tickers).
|
||||||
|
|
||||||
|
```bash
|
||||||
|
rstat subreddits.json
|
||||||
|
```
|
||||||
|
|
||||||
|
### Advanced Usage with Arguments
|
||||||
|
Use command-line arguments to control the scan and the report.
|
||||||
|
|
||||||
|
```bash
|
||||||
|
# Scan only 10 posts, 50 comments per post, and show a report of the top 5 tickers
|
||||||
|
rstat subreddits.json --posts 10 --comments 50 --limit 5
|
||||||
|
```
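
The flags above map naturally onto `argparse`. The sketch below assumes the defaults quoted under Basic Usage (25 posts, 100 comments per post, top 20 tickers); the function name `build_parser`, the positional argument name `config`, and the help strings are illustrative, not taken from `main.py`.

```python
import argparse


def build_parser() -> argparse.ArgumentParser:
    """Build a parser mirroring the documented rstat options (hypothetical layout)."""
    parser = argparse.ArgumentParser(
        prog="rstat",
        description="Scan subreddits for stock ticker mentions.",
    )
    parser.add_argument("config", help="path to the subreddits JSON file")
    parser.add_argument("--posts", type=int, default=25,
                        help="number of posts to scan")
    parser.add_argument("--comments", type=int, default=100,
                        help="number of comments to scan per post")
    parser.add_argument("--limit", type=int, default=20,
                        help="number of tickers to show in the report")
    return parser


if __name__ == "__main__":
    args = build_parser().parse_args(
        ["subreddits.json", "--posts", "10", "--comments", "50", "--limit", "5"]
    )
    print(args.config, args.posts, args.comments, args.limit)
```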
|
||||||
|
|
||||||
|
### Getting Help
|
||||||
|
To see all available arguments and their descriptions:
|
||||||
|
```bash
|
||||||
|
rstat --help
|
||||||
|
```
|
||||||
|
|
||||||
|
### Example Output
|
||||||
|
|
||||||
|
```
|
||||||
|
--- Top 5 Tickers by Mention Count ---
|
||||||
|
Ticker | Mentions | Bullish | Bearish | Neutral | Market Cap
|
||||||
|
---------------------------------------------------------------------------
|
||||||
|
TSLA | 183 | 95 | 48 | 40 | $580.45B
|
||||||
|
NVDA | 155 | 110 | 15 | 30 | $1.15T
|
||||||
|
AAPL | 98 | 50 | 21 | 27 | $2.78T
|
||||||
|
SPY | 76 | 30 | 35 | 11 | N/A
|
||||||
|
AMD | 62 | 45 | 8 | 9 | $175.12B
|
||||||
|
```
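
The Market Cap column abbreviates large values and falls back to `N/A` when no figure is available. A formatter along these lines would produce that style; the function name `format_market_cap` and the exact rounding are assumptions, not the tool's confirmed implementation.

```python
from typing import Optional


def format_market_cap(value: Optional[float]) -> str:
    """Abbreviate a dollar amount with T/B/M suffixes, or return 'N/A'."""
    if not value:
        return "N/A"  # covers None and 0, e.g. when no market cap was fetched
    for threshold, suffix in ((1e12, "T"), (1e9, "B"), (1e6, "M")):
        if value >= threshold:
            return f"${value / threshold:.2f}{suffix}"
    return f"${value:,.0f}"


if __name__ == "__main__":
    for cap in (580_450_000_000, 1_150_000_000_000, None):
        print(format_market_cap(cap))
```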
|