This project demonstrates an automated equity research workflow for Brazilian energy sector companies, specifically Taesa and Engie. It leverages LlamaIndex for building a retrieval-augmented generation (RAG) pipeline, Google's Gemini Pro as the Large Language Model (LLM) for analysis and summarization, and LlamaExtract (via LlamaCloud) for structured data extraction from PDF financial reports.
The workflow ingests quarterly financial reports (PDFs), extracts key financial and operational data, creates granular vector indexes, and then uses a multi-step LlamaIndex Workflow to:
- Generate individual financial summaries for Taesa and Engie.
- Perform a comparative analysis.
- Produce a final equity research memo in a structured Pydantic format.
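At a glance, the orchestration can be sketched with plain `asyncio` coroutines standing in for the LlamaIndex `Workflow` steps. This is a dependency-free illustration only: the function names, fields, and return values are invented here, not the project's actual API.

```python
import asyncio
from dataclasses import dataclass

# Dependency-free sketch of the three analysis steps the LlamaIndex
# Workflow orchestrates; all names and values are illustrative only.

@dataclass
class CompanySummary:
    company: str
    highlights: str

async def summarize(company: str) -> CompanySummary:
    # In the real project this queries the company's vector index via Gemini.
    return CompanySummary(company=company, highlights=f"{company} quarterly highlights")

async def compare(a: CompanySummary, b: CompanySummary) -> str:
    # In the real project this is an LLM call over both summaries.
    return f"{a.company} vs {b.company}"

async def run_pipeline() -> dict:
    # The two company summaries are independent, so they can run
    # concurrently; the comparison and final memo depend on both.
    taesa, engie = await asyncio.gather(summarize("Taesa"), summarize("Engie"))
    comparison = await compare(taesa, engie)
    return {"comparison": comparison, "memo": f"Equity memo: {comparison}"}

result = asyncio.run(run_pipeline())  # result["comparison"] == "Taesa vs Engie"
```

The real workflow adds retrieval, prompting, and Pydantic-validated outputs at each step, but the dependency structure is the same: two parallel summaries feeding a comparison, feeding a memo.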
- Automated Data Extraction: Uses `LlamaExtract` with Pydantic schemas to pull structured data from PDF reports.
- Granular Indexing: Creates fine-grained nodes in a LlamaIndex `VectorStoreIndex` for precise data retrieval.
- Multi-Step Analysis Workflow: Employs a LlamaIndex `Workflow` to orchestrate asynchronous analysis steps:
  - Individual company summaries.
  - Comparative analysis.
  - Final memo generation.
- Structured Output: Uses Pydantic models for all LLM outputs, ensuring consistent and parsable results.
- LLM Integration: Leverages Google Gemini Pro for text generation and structured data prediction.
- Customizable: Prompts, Pydantic schemas, and modeling assumptions can be adapted for different companies or sectors.
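To illustrate the structured-output idea, here is a stdlib `dataclass` standing in for one of the project's Pydantic models. The actual schemas live in `scripts/output_schemas.py`; the field names below are hypothetical, chosen only to show the shape of a validated, serializable record.

```python
from dataclasses import asdict, dataclass, field

# Stand-in for a Pydantic output model; the real schemas are defined in
# scripts/output_schemas.py and these field names are hypothetical.
@dataclass
class CompanySummarySketch:
    company: str
    net_revenue_brl_mm: float
    ebitda_brl_mm: float
    key_risks: list = field(default_factory=list)

summary = CompanySummarySketch(
    company="Taesa",
    net_revenue_brl_mm=600.0,
    ebitda_brl_mm=450.0,
    key_risks=["regulatory review", "interest-rate exposure"],
)
# A plain dict like this is what ends up serialized to JSON in output/.
record = asdict(summary)
```

Pydantic adds validation and LLM-side structured prediction on top of this, which is why the project uses it rather than bare dataclasses.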
    Equity-Research-llama/
    ├── .env                        # Environment variables (API keys) - !! GITIGNORE !!
    ├── .venv/                      # Python virtual environment - !! GITIGNORE !!
    ├── data/
    │   ├── reference/
    │   │   └── modeling_assumptions.txt  # Key assumptions for financial modeling
    │   ├── release-engie/
    │   │   └── ENGIE_REPORT.pdf    # Engie's PDF financial report
    │   └── release-taese/
    │       └── TAESA_REPORT.pdf    # Taesa's PDF financial report
    ├── notebooks/                  # Jupyter notebooks for experimentation (if any)
    ├── output/
    │   ├── extracted_documents/    # JSON outputs from LlamaExtract
    │   ├── final_equity_memo_YYYYMMDD_HHMMSS.json  # Final generated memo
    │   └── indexes/                # Persisted LlamaIndex vector stores
    │       ├── engie/
    │       └── taesa/
    ├── reference/                  # Reference materials, original notebooks
    ├── scripts/
    │   ├── data_extractor.py       # Script for LlamaExtract data extraction
    │   ├── index_builder.py        # Script for creating granular LlamaIndex indexes
    │   ├── equity_analyzer_agent.py  # Main script for running the analysis workflow
    │   └── output_schemas.py       # Pydantic models for structured LLM outputs
    ├── README.md                   # This file
    ├── requirements.txt            # Python dependencies
    └── tasks.md                    # Project task tracking
1. Clone the repository (or initialize it if you already have the files):

        # If cloning an existing repo:
        # git clone https://github.com/arthur0211/llama-equity-analyst-brasil.git
        # cd llama-equity-analyst-brasil

2. Create and activate a Python virtual environment:

        python -m venv .venv
        # On Windows
        .venv\Scripts\activate
        # On macOS/Linux
        source .venv/bin/activate

3. Install dependencies:

        pip install -r requirements.txt

    (Note: `requirements.txt` will be generated in a later step. For now, ensure you have installed the packages mentioned in `tasks.md`.)

4. Set up API keys. Create a `.env` file in the project root with your API keys:

        GEMINI_API_KEY="YOUR_GOOGLE_GEMINI_API_KEY"
        LLAMA_CLOUD_API_KEY="YOUR_LLAMA_CLOUD_API_KEY"

    Replace `YOUR_GOOGLE_GEMINI_API_KEY` and `YOUR_LLAMA_CLOUD_API_KEY` with your actual keys.

5. Place the data files:
    - Put Taesa's PDF report in `data/release-taese/` (e.g., `TAESA-Release-1T25.pdf`).
    - Put Engie's PDF report in `data/release-engie/` (e.g., `250507-Release-de-Resultados-1T25.pdf`).
    - Update `data/reference/modeling_assumptions.txt` if needed.
The workflow consists of three main script executions:
1. Extract data (LlamaExtract). This script uses LlamaExtract to parse the PDF reports and save the structured data as JSON. Run it for each company, typically by modifying the script to point to the correct PDF and output path:

        python -m scripts.data_extractor

    (Until the script is parameterized, review `scripts/data_extractor.py` before each run to ensure it is configured for the desired company and PDF.)

2. Build indexes. This script takes the extracted JSON data and builds granular LlamaIndex vector stores:

        python -m scripts.index_builder

3. Run the equity analyzer workflow. This script loads the built indexes and runs the multi-step analysis to generate the final equity memo:

        python -m scripts.equity_analyzer_agent

    The output memo is saved in the `output/` directory.
- Parameterize the scripts (`data_extractor.py`, `index_builder.py`, `equity_analyzer_agent.py`) using `argparse` for easier execution with different files and settings.
- Refine prompts for the LLM steps to improve the quality and detail of generated summaries and analyses.
- Expand Pydantic schemas to capture more detailed financial or operational metrics.
- Implement more sophisticated error handling and logging.
- Add unit and integration tests.
- Explore different LLM models or LlamaIndex components.
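The `argparse` parameterization suggested above could look roughly like this for the extraction script; the flag names are hypothetical suggestions, not the script's current interface.

```python
import argparse

def build_parser() -> argparse.ArgumentParser:
    # Hypothetical CLI for scripts/data_extractor.py; flag names are
    # suggestions, not the script's current interface.
    parser = argparse.ArgumentParser(
        description="Extract structured data from a PDF financial report."
    )
    parser.add_argument("--company", choices=["taesa", "engie"], required=True,
                        help="Which company's schema and paths to use")
    parser.add_argument("--pdf", required=True,
                        help="Path to the PDF report")
    parser.add_argument("--out-dir", default="output/extracted_documents",
                        help="Directory for the extracted JSON")
    return parser

args = build_parser().parse_args(
    ["--company", "taesa", "--pdf", "data/release-taese/TAESA_REPORT.pdf"]
)
```

With something like this in place, each run no longer requires editing the script to switch companies, which also makes the three-script pipeline easier to automate.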
Contributions, issues, and feature requests are welcome.
(This is a placeholder for a more detailed contributing guide if the project becomes more collaborative).