mozilla

Contributing to Talk: visualization of commenters


#1

Hello Coral Community,

Since I currently have some extra time, I would like to pursue an idea that has been in my head for a while.

Like most people (I guess), I often find myself skimming the comments under news articles. What I find very interesting is checking the comments under political articles, especially when it comes to international affairs. The reason for this interest is not really to gain some enlightenment (as these comments tend to be of a very low quality and real discussions are rare), but to compare the poles of opinion. It would be really exciting to see how the commenters cluster on such polarizing topics, what kind of views/opinions they share, and in what way they differ. Such a visualization could possibly reveal organized troll attacks (which have been rather frequent for years in my country of origin).

The most straightforward way to gain such insights is to use the information from likes (thumbs up). A more sophisticated approach would be to use some basic natural language processing tools to extract information, such as the main theses, from the comments. I am willing to put some effort into contributing to a solution that could visualise such findings. However, I am not sure where to start or whether something similar has already been implemented. After having a look at the GitHub project, there seem to be quite a few repos. Another issue is that to start such an analysis, it would be crucial to have some initial data to test/verify ideas and approaches.
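
To make the likes-based idea concrete, here is a minimal sketch in Python. It is only an illustration under assumptions, not an existing Talk feature: the input format (pairs of user and liked comment id), the function names, the sample data, and the 0.5 threshold are all hypothetical. It links two commenters when the sets of comments they liked overlap strongly (Jaccard similarity) and reports the resulting groups.

```python
from collections import defaultdict
from itertools import combinations


def jaccard(a, b):
    """Jaccard similarity of two sets (0.0 when both are empty)."""
    union = a | b
    return len(a & b) / len(union) if union else 0.0


def cluster_by_likes(likes, threshold=0.5):
    """Group users whose liked-comment sets overlap strongly.

    likes: iterable of (user, comment_id) pairs (hypothetical input format).
    Returns a list of sets of users: the connected components of the graph
    that links two users when their Jaccard similarity >= threshold.
    """
    liked = defaultdict(set)
    for user, comment_id in likes:
        liked[user].add(comment_id)

    # Union-find structure over users, used to merge linked users.
    parent = {u: u for u in liked}

    def find(u):
        while parent[u] != u:
            parent[u] = parent[parent[u]]  # path halving
            u = parent[u]
        return u

    for u, v in combinations(liked, 2):
        if jaccard(liked[u], liked[v]) >= threshold:
            parent[find(u)] = find(v)

    groups = defaultdict(set)
    for u in liked:
        groups[find(u)].add(u)
    # Sort for a deterministic result.
    return sorted(groups.values(), key=lambda g: sorted(g))


# Made-up sample data: two camps that like disjoint sets of comments.
sample_likes = [
    ("ann", "c1"), ("ann", "c2"),
    ("bob", "c1"), ("bob", "c2"), ("bob", "c3"),
    ("cat", "c7"), ("cat", "c8"),
    ("dan", "c8"), ("dan", "c7"),
]
clusters = cluster_by_likes(sample_likes)
print(clusters)
```

The pairwise comparison is quadratic in the number of users, so this would only serve for small experiments; for real data a proper clustering or graph library would probably be a better fit.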

So my many questions can be summarized as: does the Coral Community have any suggestions for a good starting point for working on such a visualization tool?

Any help would be appreciated. Many thanks!


#2

Hi Kelskjr

This sounds great! Some starting points:

Our comment platform Talk only has one repo:

And the docs are here:

https://coralproject.github.io/talk/

Your approach sounds like it has something in common with Pol.is, Rawr, Opinary, and Nick Diakopoulos’s work.

However, the major issue in what you’ve asked so far is that we don’t have any comment data you can use - we don’t host the comments, and each newsroom runs their own servers and owns their own data.

You could contact newsrooms that use Talk - currently Civil Beat, Brisbane Times, Washington Post, Wall Street Journal, Página 12 (in Spanish), Estadao (in Portuguese) - and ask about their data, but their terms of use might preclude that. Otherwise, you could find comment data from other platforms on the web.

I hope that helps!

Andrew


#3

@andrew_coral, thanks for the interesting links!

Yes, indeed the issue is that some initial data would be necessary. My plan was to play with it locally, see what different visualization techniques look like, and then, eventually, develop a plugin.

So far my attempts to get some data from publishers have failed. I will check some of the newsrooms you suggested. Do you think I might find the right contacts in this forum?

Cheers!


#4

You could try using the Reddit dataset:

Does that give you enough to get started?


#5

Cool, I will try the Reddit datasets. Many thanks!


#6

Great suggestions. You can already see you are hamstrung by very limited data. Look for data where there are “dislikes” and other additional tag options. Otherwise, your results will be intrinsically flawed.

As an example, see the research on Google search data regarding covert racism in the US. Its focus was on the use of the n-word, whose meaning depends on the culture and the identity of the speaker. The word “like” is even worse, often meaning something else. In my case it means relevant, follow-up, valid, or even like.

I am working on a sort of imposed solution to the problem. There are a host of reasons the industry is hesitant to address this. It is being addressed using AI, but human resources are also needed. Yours is not a trivial task, but I believe the solution must be reasonably simple.

You will note I frequently disagree on feature and moderation issues. It’s pointless to support or debate my views without a viable solution to the very real issues their positions mitigate.

Myself, I propose a very robust selection of tags (likes) that accommodates standard tagging for logical fallacies and bias types. Further, I want this to drill down into the content as opposed to the entire post. In Coral speak that would be a plugin. Cuz everything is.

I chose the Coral Project based on its development stack and scalability. My own needs are better reflected by forum software, but I see that as a lesser issue. The trick is to use Coral without forking Talk.


#7

“The reason for this interest is not really to gain some enlightenment (as these comments tend to be of a very low quality and real discussions are rare),”

That suggests you don’t read The Intercept. In fact, I have seen some great analysis in conservative publications as well as libertarian and so-called conspiracy sites.