Commit 659c8cd

refactor: Update image description minimum word threshold in get_content_of_website_optimized
1 parent 9ee9887 commit 659c8cd

8 files changed, +71 -16 lines changed

CONTRIBUTORS.md

+31
@@ -0,0 +1,31 @@
+# Contributors to Crawl4AI
+
+We would like to thank the following people for their contributions to Crawl4AI:
+
+## Core Team
+
+- [Unclecode](https://github.com/unclecode) - Project Creator and Main Developer
+- [Nasrin](https://github.com/ntohidi) - Project Manager and Developer
+
+## Community Contributors
+
+- [Aravind Karnam](https://github.com/aravindkarnam) - Developed textual description extraction feature
+- [FractalMind](https://github.com/FractalMind) - Created the first official Docker Hub image and fixed Dockerfile errors
+- [ketonkss4](https://github.com/ketonkss4) - Identified Selenium's new capabilities, helping reduce dependencies
+
+## Other Contributors
+
+- [Gokhan](https://github.com/gkhngyk)
+- [Shiv Kumar](https://github.com/shivkumar0757)
+- [QIN2DIM](https://github.com/QIN2DIM)
+
+
+## Acknowledgements
+
+We also want to thank all the users who have reported bugs, suggested features, or helped in any other way to make Crawl4AI better.
+
+---
+
+If you've contributed to Crawl4AI and your name isn't on this list, please [open a pull request](https://github.com/unclecode/crawl4ai/pulls) with your name, link, and contribution, and we'll review it promptly.
+
+Thank you all for your contributions!

README.md

+15-1
@@ -1,4 +1,4 @@
-# Crawl4AI v0.2.75 🕷️🤖
+# Crawl4AI v0.2.7765 🕷️🤖
 
 [![GitHub Stars](https://img.shields.io/github/stars/unclecode/crawl4ai?style=social)](https://github.com/unclecode/crawl4ai/stargazers)
 [![GitHub Forks](https://img.shields.io/github/forks/unclecode/crawl4ai?style=social)](https://github.com/unclecode/crawl4ai/network/members)
@@ -10,6 +10,8 @@ Crawl4AI simplifies web crawling and data extraction, making it accessible for l
 
 ## Try it Now!
 
+✨ Play around with this [![Open In Colab](https://colab.research.google.com/assets/colab-badge.svg)](https://colab.research.google.com/drive/1sJPAmeLj5PMrg2VgOwMJ2ubGIcK0cJeX?usp=sharing)
+
 ✨ visit our [Documentation Website](https://crawl4ai.com/mkdocs/)
 
 ✨ Check [Demo](https://crawl4ai.com/mkdocs/demo)
@@ -31,6 +33,18 @@ Crawl4AI simplifies web crawling and data extraction, making it accessible for l
 - 🎯 CSS selector support
 - 📝 Passes instructions/keywords to refine extraction
 
+# Crawl4AI
+
+## 🌟 Shoutout to Contributors of v0.2.76!
+
+A big thank you to the amazing contributors who've made this release possible:
+
+- [@aravindkarnam](https://github.com/aravindkarnam) for the new image description feature
+- [@FractalMind](https://github.com/FractalMind) for our official Docker Hub image
+- [@ketonkss4](https://github.com/ketonkss4) for helping streamline our Selenium setup
+
+Your contributions are driving Crawl4AI forward! 🚀
+
 ## Cool Examples 🚀
 
 ### Quick Start

docs/examples/llm_extraction_openai_pricing.py

+2-1
@@ -21,7 +21,8 @@ class OpenAIModelFee(BaseModel):
 url=url,
 word_count_threshold=1,
 extraction_strategy= LLMExtractionStrategy(
-provider= "openai/gpt-4o", api_token = os.getenv('OPENAI_API_KEY'),
+# provider= "openai/gpt-4o", api_token = os.getenv('OPENAI_API_KEY'),
+provider= "groq/llama-3.1-70b-versatile", api_token = os.getenv('GROQ_API_KEY'),
 schema=OpenAIModelFee.model_json_schema(),
 extraction_type="schema",
 instruction="From the crawled content, extract all mentioned model names along with their "\
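
The hunk above switches providers by commenting one line out and adding another. As an alternative, the choice could be made at runtime from whichever API key is present. The following is only a sketch: `pick_provider` and `PROVIDERS` are hypothetical names that are not part of Crawl4AI, and the provider strings and environment variable names are copied from the diff.

```python
import os
from typing import Mapping, Optional, Tuple

# (provider string, environment variable holding its API key), in preference order.
# Both pairs come from the diff above; the helper itself is hypothetical.
PROVIDERS = [
    ("groq/llama-3.1-70b-versatile", "GROQ_API_KEY"),
    ("openai/gpt-4o", "OPENAI_API_KEY"),
]


def pick_provider(env: Optional[Mapping[str, str]] = None) -> Tuple[str, str]:
    """Return (provider, api_token) for the first provider whose key is set."""
    env = os.environ if env is None else env
    for provider, key_name in PROVIDERS:
        token = env.get(key_name)
        if token:  # skip unset or empty keys
            return provider, token
    names = ", ".join(key_name for _, key_name in PROVIDERS)
    raise RuntimeError(f"No API key found; set one of: {names}")
```

The returned pair could then be passed as the `provider` and `api_token` arguments shown in the diff, so the example file would not need editing when the provider changes.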

docs/md/changelog.md

+19
@@ -1,5 +1,24 @@
 # Changelog
 
+# Changelog
+
+## [v0.2.76] - 2024-08-02
+
+Major improvements in functionality, performance, and cross-platform compatibility! 🚀
+
+- 🐳 **Docker enhancements**: Significantly improved Dockerfile for easy installation on Linux, Mac, and Windows.
+- 🌐 **Official Docker Hub image**: Launched our first official image on Docker Hub for streamlined deployment.
+- 🔧 **Selenium upgrade**: Removed dependency on ChromeDriver, now using Selenium's built-in capabilities for better compatibility.
+- 🖼️ **Image description**: Implemented ability to generate textual descriptions for extracted images from web pages.
+-**Performance boost**: Various improvements to enhance overall speed and performance.
+
+A big shoutout to our amazing community contributors:
+- [@aravindkarnam](https://github.com/aravindkarnam) for developing the textual description extraction feature.
+- [@FractalMind](https://github.com/FractalMind) for creating the first official Docker Hub image and fixing Dockerfile errors.
+- [@ketonkss4](https://github.com/ketonkss4) for identifying Selenium's new capabilities, helping us reduce dependencies.
+
+Your contributions are driving Crawl4AI forward! 🙌
+
 ## [v0.2.75] - 2024-07-19
 
 Minor improvements for a more maintainable codebase:

docs/md/index.md

+1-1
@@ -1,4 +1,4 @@
-# Crawl4AI v0.2.75
+# Crawl4AI v0.2.76
 
 Welcome to the official documentation for Crawl4AI! 🕷️🤖 Crawl4AI is an open-source Python library designed to simplify web crawling and extract useful information from web pages. This documentation will guide you through the features, usage, and customization of Crawl4AI.
 

docs/md/installation.md

+2
@@ -8,6 +8,8 @@ There are three ways to use Crawl4AI:
 
 ## Option 1: Library Installation
 
+You can try this Colab for a quick start: [![Open In Colab](https://colab.research.google.com/assets/colab-badge.svg)](https://colab.research.google.com/drive/1sJPAmeLj5PMrg2VgOwMJ2ubGIcK0cJeX#scrollTo=g1RrmI4W_rPk)
+
 Crawl4AI offers flexible installation options to suit various use cases. Choose the option that best fits your needs:
 
 - **Default Installation** (Basic functionality):

docs/md/introduction.md

-12
@@ -20,18 +20,6 @@ Crawl4AI is designed to simplify the process of crawling web pages and extractin
 - **🎯 CSS Selector Support**: Extract specific content using CSS selectors.
 - **📝 Instruction/Keyword Refinement**: Pass instructions or keywords to refine the extraction process.
 
-## Recent Changes (v0.2.5) 🌟
-
-- **New Hooks**: Added six important hooks to the crawler:
-- 🟢 `on_driver_created`: Called when the driver is ready for initializations.
-- 🔵 `before_get_url`: Called right before Selenium fetches the URL.
-- 🟣 `after_get_url`: Called after Selenium fetches the URL.
-- 🟠 `before_return_html`: Called when the data is parsed and ready.
-- 🟡 `on_user_agent_updated`: Called when the user changes the user agent, causing the driver to reinitialize.
-- **New Example**: Added an example in [`quickstart.py`](https://github.com/unclecode/crawl4ai/blob/main/docs/examples/quickstart.py) in the example folder under the docs.
-- **Improved Semantic Context**: Maintaining the semantic context of inline tags (e.g., abbreviation, DEL, INS) for improved LLM-friendliness.
-- **Dockerfile Update**: Updated Dockerfile to ensure compatibility across multiple platforms.
-
 Check the [Changelog](https://github.com/unclecode/crawl4ai/blob/main/CHANGELOG.md) for more details.
 
 ## Power and Simplicity of Crawl4AI 🚀

setup.py

+1-1
@@ -25,7 +25,7 @@
 
 setup(
 name="Crawl4AI",
-version="0.2.74",
+version="0.2.76",
 description="🔥🕷️ Crawl4AI: Open-source LLM Friendly Web Crawler & Scrapper",
 long_description=open("README.md", encoding="utf-8").read(),
 long_description_content_type="text/markdown",
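
This commit bumps the version string in three places: `setup.py` and `docs/md/index.md` now read 0.2.76, while the README heading reads v0.2.7765. A small check along the following lines could flag such a mismatch before release. It is a minimal sketch: `extract_version`, `check_versions`, and `is_consistent` are hypothetical helpers, not part of the repository, and the sample strings are the added lines from the hunks above.

```python
import re
from typing import Dict, Optional

# Matches dotted versions such as "0.2.76" or "v0.2.76"; at least one dot is
# required, so the "4" in "Crawl4AI" is not mistaken for a version.
_VERSION_RE = re.compile(r"\bv?(\d+(?:\.\d+)+)")


def extract_version(line: str) -> Optional[str]:
    """Return the first dotted version found in the line, or None."""
    match = _VERSION_RE.search(line)
    return match.group(1) if match else None


def check_versions(lines_by_file: Dict[str, str]) -> Dict[str, Optional[str]]:
    """Map each file name to the version found in its version-bearing line."""
    return {name: extract_version(line) for name, line in lines_by_file.items()}


def is_consistent(versions: Dict[str, Optional[str]]) -> bool:
    """True when every file reports the same version string."""
    return len(set(versions.values())) == 1


if __name__ == "__main__":
    # Added lines copied from this commit's hunks.
    added = {
        "README.md": "# Crawl4AI v0.2.7765 🕷️🤖",
        "docs/md/index.md": "# Crawl4AI v0.2.76",
        "setup.py": 'version="0.2.76",',
    }
    versions = check_versions(added)
    for name, version in versions.items():
        print(f"{name}: {version}")
    print("consistent:", is_consistent(versions))
```

In practice the lines would be read from the files themselves; they are inlined here to keep the sketch self-contained.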
