Talk pages have been a key part of how wikis work for years. Realtime chat apps and services are probably changing this dynamic somewhat, but talk pages are still used, and most of their history is still recorded.

I started up an IPython Notebook to take a look at some of the connections between different users on Wikidata over the years. Below you'll find a few representations of these connections, as well as notable things I spotted along the way, the generating code, the SQL query and more!

The data

MediaWiki maintains links tables for all pages, so getting all of the current links out of Wikidata is very easy. I made use of the Wikimedia Cloud Quarry service to run this query and host a CSV of the results.

SELECT
  SUBSTRING_INDEX(page_title, '/', 1) AS t1,
  pl_from_namespace AS t1ns,
  SUBSTRING_INDEX(pl_title, '/', 1) AS t2,
  pl_namespace AS t2ns
FROM pagelinks, page
WHERE pl_namespace IN (3,5)
  AND pl_from_namespace IN (3,5)
  AND page_id = pl_from
  AND page_title != pl_title
GROUP BY t1, t2

I then loaded this data directly into an IPython Notebook and did some cleaning, such as removing all IP addresses. I then spent quite some time applying more filtering and twiddling knobs to try and get some graphics out that are easy to look at. The first attempts looked like solid blobs, as you can see in this tweet.
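As a rough idea of what the loading and IP address cleaning step could look like, here is a minimal sketch using only the Python standard library. It is not the code from the Notebook: the function names are made up for illustration, the sample rows are invented, and it assumes the CSV has a header row using the column aliases from the query (t1, t1ns, t2, t2ns).

```python
import csv
import io
import ipaddress


def is_ip_address(title: str) -> bool:
    """Return True if a page title parses as an IPv4 or IPv6 address."""
    try:
        ipaddress.ip_address(title)
        return True
    except ValueError:
        return False


def load_user_links(csv_file):
    """Read (t1, t1ns, t2, t2ns) rows and drop any that involve an IP address."""
    # Assumption: the header row matches the aliases used in the SQL query.
    reader = csv.DictReader(csv_file)
    return [
        (row["t1"], row["t2"])
        for row in reader
        if not is_ip_address(row["t1"]) and not is_ip_address(row["t2"])
    ]


# Tiny made-up sample in the same column layout as the query result.
sample = io.StringIO(
    "t1,t1ns,t2,t2ns\n"
    "Alice,3,Bob,3\n"
    "192.0.2.17,3,Alice,3\n"
    "Bob,3,2001:DB8:0:0:0:0:0:1,3\n"
)
links = load_user_links(sample)
print(links)  # [('Alice', 'Bob')]
```

With a real export you would pass an opened file instead of the in-memory sample. The resulting list of pairs is the sort of thing that can then be filtered further before drawing anything.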

You can find a copy of the Notebook on notebooksharing.space.


addshore | December 12, 2021 at 9:23 am | Tags: graph, IPython, notebook, quarry, Wikidata | Categories: Posts, Tech | URL: https://wp.me/p5ZEV0-27m