How we crawl

We begin our initial crawl with the 25 largest German-speaking channels that we have identified as covering political topics. We found these channels by comparing datasets from previous projects. Each channel was then classified through a manual content analysis of its 20 most recent messages: if these messages discussed at least two distinct political topics, the channel was classified as relevant.
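The classification itself was done by hand, but the decision rule applied to the coders' results is simple enough to state as code. The sketch below assumes each message has been coded with the political topic it discusses (or `None` if it is not political); the function name and the topic labels are hypothetical.

```python
from typing import Optional, Sequence


def is_politically_relevant(
    message_topics: Sequence[Optional[str]],
    sample_size: int = 20,
    min_distinct_topics: int = 2,
) -> bool:
    """Apply the relevance rule to manually coded messages.

    message_topics: one entry per message, ordered oldest to newest; each entry
        is the political topic a coder assigned (hypothetical labels) or None.
    Only the `sample_size` most recent messages are considered.
    """
    recent = list(message_topics)[-sample_size:]
    # Count distinct political topics, ignoring non-political messages.
    distinct = {topic for topic in recent if topic is not None}
    return len(distinct) >= min_distinct_topics
```

For example, a channel whose last 20 messages are all about one topic is not classified as relevant, while one that also touches a second topic is.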

The crawl goes back to the beginning of 2025. If the crawler detects a forwarded message from a channel that is not yet in our dataset, we add that channel as long as it has more than 10,000 followers*. The newly added channel is also crawled in full, back to January 2025. If this turns up further new channels, they are subjected to the same relevance and inclusion checks.
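The inclusion criteria, including the exception for Russian-language channels described in the footnote, can be sketched as below. This covers only the follower threshold and the date cutoff, not the relevance check; the `Channel` structure, the use of ISO 639-1 language codes, and all function names are assumptions for illustration.

```python
from dataclasses import dataclass
from datetime import datetime, timezone

CRAWL_START = datetime(2025, 1, 1, tzinfo=timezone.utc)  # crawl back to the start of 2025
DEFAULT_MIN_FOLLOWERS = 10_000
RUSSIAN_MIN_FOLLOWERS = 50_000  # stricter threshold for primarily Russian-language channels


@dataclass
class Channel:
    channel_id: str
    followers: int
    primary_language: str  # assumed ISO 639-1 code, e.g. "de" or "ru"


def min_followers(channel: Channel) -> int:
    """Follower threshold that a channel must exceed to be added."""
    if channel.primary_language == "ru":
        return RUSSIAN_MIN_FOLLOWERS
    return DEFAULT_MIN_FOLLOWERS


def should_add(channel: Channel, known_ids: set) -> bool:
    """Add a channel found via a forward if it is new and has MORE than the threshold."""
    return channel.channel_id not in known_ids and channel.followers > min_followers(channel)


def in_crawl_window(message_date: datetime) -> bool:
    """Keep only messages posted on or after the start of 2025."""
    return message_date >= CRAWL_START
```

Note that the comparison is strict ("more than"), so a channel with exactly 10,000 followers would not be added under this reading.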

This snowball process continues until all relevant channels have been added and no new ones are detected. At that point, the initial crawl is complete.
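The snowball procedure is essentially a breadth-first traversal of the graph of forwarded messages. The sketch below is one way to express it, not our actual crawler: `forwarded_sources` (crawl a channel back to January 2025 and return the channels its forwards came from) and `qualifies` (the relevance and inclusion checks) are hypothetical callbacks.

```python
from collections import deque
from typing import Callable, Iterable, List


def snowball_crawl(
    seeds: Iterable[str],
    forwarded_sources: Callable[[str], Iterable[str]],
    qualifies: Callable[[str], bool],
) -> List[str]:
    """Breadth-first snowball crawl starting from the seed channels.

    Returns channel IDs in the order they were added to the dataset.
    """
    dataset: List[str] = []
    seen: set = set()
    queue: deque = deque()

    # The manually selected seed channels are added unconditionally.
    for cid in seeds:
        if cid not in seen:
            seen.add(cid)
            dataset.append(cid)
            queue.append(cid)

    # Terminates once no newly discovered channel qualifies.
    while queue:
        cid = queue.popleft()
        for source in forwarded_sources(cid):
            if source in seen:
                continue
            seen.add(source)  # each channel is checked only once
            if qualifies(source):
                dataset.append(source)
                queue.append(source)  # crawl it in turn
    return dataset
```

Marking rejected channels as seen keeps the initial crawl finite; a channel that later grows past the threshold would only be picked up by a subsequent crawl.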

From then on, we crawl regularly, at least once per week.

In addition to messages, we also routinely collect channel metadata, such as follower counts.

*Exception: If a channel's primary language is Russian, we only crawl it if it has more than 50,000 followers. Otherwise, the large number of major Russian-language channels would exceed our crawler's capacity.