Abstract:
The enormous success of crowdsourced portals such as Wikipedia, Stack Overflow, Quora and
GitHub has motivated researchers to discern the underlying dynamics of knowledge-building
and collective intelligence on these portals. Although collaborative knowledge-building portals
are known to be better than expert-driven knowledge repositories, there is limited research on
their knowledge-building dynamics. The motivation of the contributors to these collaborative
portals is unclear and the relationship between knowledge acquisition on online platforms and
the fundamentals of knowledge-building is yet to be defined. This gap in research can be
attributed to the lack of an efficient standard data representation format and of proper
tools and libraries for analysing knowledge-building dynamics. The massive size of the datasets
of collaborative knowledge-building portals makes their analysis computationally demanding. Furthermore, the extensive
programming knowledge required to perform the analysis is a barrier for researchers without
a background in computer science who wish to study collective intelligence using these portals.
The aim of this thesis is to propose and organize resources that facilitate the research and
analysis of data from crowdsourced portals. It describes a range of libraries and toolkits to
efficiently mine, parse and analyse the unstructured datasets of collaborative knowledge-building
portals. The Knowledge Data Analysis and processing Platform (KDAP), for instance, is an
easy-to-use programming toolkit that provides high-level operations to analyse knowledge data.
The Knowledge Markup Language (Knol-ML), a standard representation format developed for
the datasets of revision-based collaborative portals, is designed to reduce storage space and processing time. The
libraries built using the proposed toolkits can efficiently process the massive amounts of data
from crowdsourced portals such as Wikipedia and Stack Overflow. A data dump of various
collaborative knowledge-building portals in the Knol-ML format is included in the thesis. Proof
of concept is established by accurately analysing newly proposed research questions with
the toolkits presented. Researchers are expected to be able to perform benchmark analysis of the
open-source library enabled by the Knol-ML format.