TeleGraphite: Fast and Reliable Telegram Channel Scraper
Summary
TeleGraphite is a powerful Python tool designed for scraping public Telegram channels efficiently. It allows users to fetch posts, download media, and export all data into structured JSON files. This makes it an ideal solution for data collection, analysis, and archiving Telegram channel content.
Introduction
TeleGraphite is a Python-based scraper that fetches posts from public Telegram channels and exports them to JSON. It can also download media and organize the collected content per channel, which makes it useful for researchers, data analysts, or anyone who needs to archive specific channel information.
Key features include:
- Fetching posts from multiple Telegram channels.
- Saving posts as JSON files, along with contact details extracted from them, such as emails, phone numbers, and links.
- Downloading and saving media files, such as photos, documents, and videos.
- Deduplicating posts to prevent saving duplicate content.
- Options to run once or continuously with a specified interval.
- Filtering posts by keywords or content type (text-only, media-only).
- Scheduling fetching at specific days and times for automated data collection.
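Two of the features above, contact extraction and deduplication, can be illustrated with a minimal sketch. This is not TeleGraphite's actual implementation: the regular expressions are deliberately simple, and the post fields `channel` and `post_id` are hypothetical names chosen for the example.

```python
import re

# Deliberately simple patterns for illustration; production matching needs more care.
EMAIL_RE = re.compile(r"[A-Za-z0-9._%+-]+@[A-Za-z0-9.-]+\.[A-Za-z]{2,}")
LINK_RE = re.compile(r"https?://\S+")
PHONE_RE = re.compile(r"\+?\d[\d\s-]{7,}\d")


def extract_contacts(text):
    """Return emails, links, and phone-like strings found in a post's text."""
    return {
        "emails": EMAIL_RE.findall(text),
        "links": LINK_RE.findall(text),
        "phones": PHONE_RE.findall(text),
    }


def deduplicate(posts):
    """Keep the first occurrence of each (channel, post_id) pair, preserving order.

    The "channel" and "post_id" keys are assumptions made for this sketch.
    """
    seen = set()
    unique = []
    for post in posts:
        key = (post["channel"], post["post_id"])
        if key not in seen:
            seen.add(key)
            unique.append(post)
    return unique
```

Keying on the channel and post identifier rather than on the text means an edited post is still treated as the same post; a content hash would be the alternative if identical text across channels should count as a duplicate.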
Installation
Getting started with TeleGraphite is straightforward. You can install it either from source or using pip.
From Source
# Clone the repository
git clone https://github.com/hamodywe/telegram-scraper-TeleGraphite.git
cd telegram-scraper-TeleGraphite
# Install the package
pip install -e .
Using pip
pip install telegraphite
Before using TeleGraphite, make sure you have a Telegram API application set up (API ID and API hash, obtainable from my.telegram.org), a .env file containing those credentials, and a channels.txt file listing the Telegram channels you want to scrape.
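The exact layout of the two input files is not spelled out here, so the following is only a sketch of how such files are commonly read. It assumes that .env holds `KEY=VALUE` lines with the hypothetical keys `API_ID` and `API_HASH`, and that channels.txt lists one channel per line with `#` marking comments; check the repository for the format TeleGraphite actually expects.

```python
def parse_env(text):
    """Parse KEY=VALUE lines, ignoring blank lines and '#' comments."""
    values = {}
    for line in text.splitlines():
        line = line.strip()
        if not line or line.startswith("#") or "=" not in line:
            continue
        key, _, value = line.partition("=")
        # Drop surrounding whitespace and optional quotes around the value.
        values[key.strip()] = value.strip().strip("'\"")
    return values


def parse_channels(text):
    """Return channel names, one per line, skipping blanks and '#' comments."""
    channels = []
    for line in text.splitlines():
        name = line.strip()
        if name and not name.startswith("#"):
            channels.append(name)
    return channels


if __name__ == "__main__":
    # Assumed key names and file layout; adjust to the project's real format.
    env = parse_env("API_ID=12345\nAPI_HASH=0123456789abcdef\n")
    print(env["API_ID"], parse_channels("@example_channel\n# skipped\n"))
```

Both functions take the file contents as a string, so they can be fed from `pathlib.Path(".env").read_text()` and `pathlib.Path("channels.txt").read_text()`.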
Examples
TeleGraphite offers a flexible command-line interface for various scraping scenarios.
Basic Usage
Fetch posts once and exit:
telegraphite once
Fetch posts continuously with a 1-hour interval:
telegraphite continuous --interval 3600
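The difference between the two modes can be sketched as a small loop. This is an illustration of the run-once versus fixed-interval pattern, not TeleGraphite's internal code; `fetch`, `sleep`, and `max_runs` are hypothetical parameters introduced here, with `max_runs` existing only so the loop can be exercised without running forever.

```python
import time


def run(fetch, interval=None, sleep=time.sleep, max_runs=None):
    """Call fetch() once, or repeatedly with `interval` seconds between runs.

    interval=None mimics a single run; max_runs=None means loop indefinitely.
    Returns the number of times fetch() was called.
    """
    runs = 0
    while True:
        fetch()
        runs += 1
        if interval is None or (max_runs is not None and runs >= max_runs):
            return runs
        sleep(interval)


if __name__ == "__main__":
    # Single run, analogous in spirit to the "once" mode.
    run(lambda: print("fetching..."))
```

Passing `interval=3600` corresponds to the one-hour interval in the command above.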
Advanced Options
Fetch 20 posts from each channel and save to a custom directory:
telegraphite once --limit 20 --data-dir custom_data
Fetch only posts containing specific keywords:
telegraphite once --keywords announcement important news
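A keyword filter of this kind can be sketched in a few lines. The matching rule below (case-insensitive substring match, keeping a post if any keyword appears) is an assumption made for illustration; the tool's actual matching rules may differ.

```python
def matches_keywords(text, keywords):
    """True if text contains at least one keyword (case-insensitive substring match)."""
    lowered = text.lower()
    return any(keyword.lower() in lowered for keyword in keywords)


def filter_posts(texts, keywords):
    """Keep only the post texts that mention at least one keyword."""
    return [text for text in texts if matches_keywords(text, keywords)]


if __name__ == "__main__":
    sample = ["Important: maintenance tonight", "Good morning everyone"]
    print(filter_posts(sample, ["announcement", "important", "news"]))
```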
Run continuously on specific days and times:
telegraphite continuous --days monday wednesday friday --times 09:00 18:00
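The day-and-time check behind a schedule like this can be sketched as follows. It is a minimal illustration under the assumption that a run is due when the current weekday is listed and the current `HH:MM` matches one of the listed times; `is_due` and `DAY_NAMES` are hypothetical names, not part of TeleGraphite.

```python
from datetime import datetime

# Indexed by datetime.weekday(), where Monday is 0.
DAY_NAMES = ["monday", "tuesday", "wednesday", "thursday", "friday", "saturday", "sunday"]


def is_due(now, days, times):
    """True if `now` falls on a listed weekday at a listed HH:MM time."""
    day_ok = DAY_NAMES[now.weekday()] in {day.lower() for day in days}
    time_ok = now.strftime("%H:%M") in set(times)
    return day_ok and time_ok


if __name__ == "__main__":
    schedule_days = ["monday", "wednesday", "friday"]
    schedule_times = ["09:00", "18:00"]
    print(is_due(datetime.now(), schedule_days, schedule_times))
```

A scheduler built on this check would need to poll at least once per minute and guard against firing twice within the same minute.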
You can also use a YAML configuration file for more complex setups, allowing you to define filters, schedules, and other options.
Why use TeleGraphite?
TeleGraphite combines post fetching, media downloads, deduplication, keyword and content-type filtering, and scheduled runs in a single command-line tool. Whether you need to archive historical posts, monitor ongoing discussions, or extract specific data points such as contact information, it handles collection and JSON export without custom scripting. That combination makes it well suited to automated data acquisition from public Telegram channels.
Links
- GitHub Repository: hamodywe/telegram-scraper-TeleGraphite
- Telegram API Application: my.telegram.org