The Research Project
Project: Utilizing NLP models
Summarizing scientific literature
UN 17 Sustainable Development Goals
At the dawn of the 21st century, the world is facing challenging tensions in all aspects of society: economic, political, technological, social, and environmental. In 2015, the United Nations and its 193 member states pledged to end poverty, protect the planet, and ensure prosperity over the next fifteen years by implementing the 17 Sustainable Development Goals. Leaders in government, business, and civil society have identified a myriad of similar challenges.
Business and management research can do much more to contribute to meeting these challenges by discovering management processes and systems to improve collective work at the organizational and national levels.
These could include the responsible use of financial resources, accounting methods for assessing societal impacts, innovative products and services to meet the needs of the base of the pyramid, sustainable marketing and supply chain, logistics to reach currently inaccessible regions, strategies for economic growth and significant innovation, attention to both wealth creation and wealth distribution, to name a few.
CAIDEMA, together with other companies, is developing an Artificial Intelligence engine that will help business and management research contribute more to meeting these challenges by discovering management processes and systems that improve collective work at the organizational and national levels.
Contributing to a better world should be the ultimate goal of science, and there is an acute need for large-scale help in digesting scientific literature. In 2018, the total number of published scientific articles was estimated at 2.52 million and the number of scientific journals at around 30 thousand. With such vast amounts of new information, in addition to the enormous amount that already exists, it is virtually impossible for any individual to be fully up to date on the latest developments within most scientific fields.
Furthermore, the literature itself, and the information it conveys, is typically heterogeneous. Studies within the same field of research may well point in different directions (some show a positive relationship, while others show a negative or non-significant relationship). This ambiguity presents added challenges to developing useful multi-document summaries of the information.
Lastly, studies have shown that a large proportion of the research articles produced at research institutions around the world are not read by a sufficiently large number of people outside a narrow circle of subject-oriented professionals or academics.
The valuable knowledge that society spends large amounts of resources to make possible is thus not utilized to the extent one could wish for, which is not optimal for any of the parties involved.
To gain a clearer overall picture of the true direction within a research field, it is necessary to synthesize many comparable studies through review techniques and statistical methods in the form of systematic reviews and meta-analyses.
A greater focus on systematic reviews and meta-analyses could, if not rectify, then at least alleviate the problems of fuzzy research and low readership: the direction of findings within the individual research areas would be much clearer, and the amount of literature that lay people need to read in order to stay fairly up to date within a given research area would be significantly reduced.
The focus of this paper is to construct a corpus of systematic reviews and meta-analyses that examine factors for improving corporate performance, with a special focus on human resource management issues, and to explore how NLP can be used to summarize that literature.
To begin with, the corpus for this experiment will be based on 80 to 100 systematic reviews and meta-analyses (SR/MA). Each SR/MA will be deconstructed into the original 20-40 scientific articles on which it is based. This means that the project corpus will consist of roughly 2500-3000 scientific articles in total.
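The size estimate can be checked with simple arithmetic, sketched below. The function name `corpus_size_bounds` is introduced here purely for illustration, and the calculation assumes that no primary article is shared between two SR/MA (a simplification; any overlap would lower the totals).

```python
def corpus_size_bounds(n_reviews, articles_per_review):
    """Return (low, midpoint, high) article counts for two (min, max) ranges.

    Assumes no primary article is shared between reviews, which is an
    illustrative simplification rather than a property of the real corpus.
    """
    low = n_reviews[0] * articles_per_review[0]
    high = n_reviews[1] * articles_per_review[1]
    mid_reviews = sum(n_reviews) / 2
    mid_articles = sum(articles_per_review) / 2
    return low, int(mid_reviews * mid_articles), high


low, mid, high = corpus_size_bounds((80, 100), (20, 40))
print(low, mid, high)  # 1600 2700 4000
```

Under these assumptions the extremes span 1,600 to 4,000 articles, and the midpoint of 2,700 falls inside the 2500-3000 range given above.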
The scientific domain
The scientific domain is microeconomics and (business) management. If a sufficiently large number of articles exists, the focal area will be narrowed down even further to organizational development and perhaps Human Resources Management (HRM), and to the relationship between the focal area (e.g. HRM) and corporate performance (e.g. team diversity and corporate performance). The focal area will be narrowed down as much as possible to keep the domain-specific vocabulary intact, but not so much that the corpus falls below an adequate size.
Sources of scientific text
Peer-reviewed systematic reviews and meta-analyses are found and retrieved using databases such as Business Source Complete, Web of Science, Scopus, APA PsycInfo, Google Scholar, and Microsoft Academic, and search engines such as Semantic Scholar and Iris.AI.
Each summarization method, whether extractive or abstractive, has its advantages and disadvantages. We will use both extractive and abstractive approaches to perform the summarizations. Furthermore, we will explore whether a combination of the two approaches can produce a summary that is both more accurate and more readable. Accuracy is especially important when summarizing scientific articles.
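As a rough illustration of the extractive approach, the sketch below scores each sentence by the average frequency of its non-stopword terms and keeps the top-scoring sentences in their original order. This is a simple frequency baseline chosen for illustration, not the method used in the project; the function names, the small stopword list, and the regex-based sentence splitter are all assumptions.

```python
import re
from collections import Counter

# Tiny illustrative stopword list (an assumption; a real system would use a fuller one).
STOPWORDS = {"the", "a", "an", "of", "and", "or", "to", "in", "is", "are",
             "that", "this", "for", "on", "with", "as", "by", "it", "was", "were"}


def split_sentences(text):
    """Naive splitter: break after '.', '!' or '?' followed by whitespace."""
    return [s.strip() for s in re.split(r"(?<=[.!?])\s+", text) if s.strip()]


def tokenize(sentence):
    """Lowercased word tokens with stopwords removed."""
    return [w for w in re.findall(r"[a-z0-9]+", sentence.lower())
            if w not in STOPWORDS]


def extractive_summary(text, n_sentences=2):
    """Return the n highest-scoring sentences, kept in document order."""
    sentences = split_sentences(text)
    freqs = Counter(w for s in sentences for w in tokenize(s))

    def score(sentence):
        tokens = tokenize(sentence)
        # Average term frequency, so long sentences are not favoured automatically.
        return sum(freqs[t] for t in tokens) / len(tokens) if tokens else 0.0

    ranked = sorted(range(len(sentences)),
                    key=lambda i: score(sentences[i]), reverse=True)
    chosen = sorted(ranked[:n_sentences])  # restore original order
    return " ".join(sentences[i] for i in chosen)


text = (
    "Team diversity is related to corporate performance. "
    "The weather was pleasant during data collection. "
    "Several studies report that team diversity improves performance."
)
print(extractive_summary(text, n_sentences=2))
```

On this toy input the off-topic middle sentence is dropped because its terms occur only once. An abstractive model would instead generate new wording, which is where the accuracy concern raised above becomes most pressing.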
After assembling the corpus, the next step will be to test different summarization methods and models. We intend to use several language models, such as GPT-3, BERT, XLNet, and T5, to perform the summarizations.
Finally, the resulting summaries will be evaluated against several measures, such as ROUGE (Recall-Oriented Understudy for Gisting Evaluation) and FFCI (Faithfulness, Focus, Coverage, and Inter-Sentential Coherence), to see how well they perform on the different sub-measures in comparison with a reference summary, namely a human-written summary of the original systematic review or meta-analysis.
Project presented at the SciNLP conference 2021
Rynes et al. 2002; Sanders et al. 2008; Carless, Rasiah & Irmer 2009; Tenhiälä et al. 2016; Tourish 2019.