JUMS meets …Estevan Vilar
JUMS does not only publish – we also do research.
We regularly meet JUMS authors, as well as professors and scientists, to talk about their theses and ask them for their most important tips on thesis writing.
Today we meet Estevan Vilar, an ESCP Europe undergraduate. His bachelor thesis “Word Embedding, Neural Networks and Text Classification: what is the State-of-the-Art?” was published in the 10th edition of JUMS.
Profile: Estevan Vilar
Title of the thesis:
Word Embedding, Neural Networks and Text Classification: what is the State-of-the-Art?
Type of thesis, University:
Bachelor thesis, ESCP Europe
Master of Philosophy in Development Studies, University of Cambridge
Incoming activity: PhD in Management, ETH Zurich: “Organising in the Age of Artificial Intelligence”
JUMS: Dear Estevan, you mentioned that your thesis aims to make the world of machine learning accessible to a broader audience than just computer science students and experts. How did you come up with the topic of your work?
I have had a passion for technology since an early age, but to understand it, I need to get hands-on with it. In recent years, public discourse has also been increasingly filled with words such as “automation”, “machine learning”, “artificial intelligence” and “big data”, which triggered my curiosity about the opportunities and challenges these concepts bring. As these technologies evolve, more significant questions will arise, especially what makes human intelligence different from other forms, including artificial intelligence. I believe the capacity for language, as a cognitive tool, is part of the answer, together with free will.
My first goal for the thesis was therefore to get my hands on the technology and to understand how close machines were to apprehending language the way humans do. Natural language processing (NLP) thus became the field I wanted to learn about. Daniel Esser and Eldad Louw from LogMeIn pointed me towards neural networks in particular. I therefore set out to understand the state-of-the-art neural networks applied to NLP, and to sentiment classification problems in particular.
The goal of making it accessible to a broader audience emerged from my personal experience. My background is in management, and I noticed I lacked the knowledge to think critically about technological advances in the field. My experience, I believe, is part of a broader phenomenon in which decision-makers, such as managers or politicians, are increasingly distant from the technology they have to reflect on. If I could help close this gap with an accessible manuscript, my work would also better reflect the learning process I had gone through myself.
JUMS: You compared two architectures used for text classification and found that one is easier to train and yields better results than the other. Could you briefly explain how you proceeded in the analysis and how you approached the two architectures methodologically?
In the end, my analysis was straightforward, but it reflects a business (cost-benefit) approach rather than a computer scientist's. A computer scientist would probably look at pure performance (accuracy), at least in the first instance. So, in my literature review, I first identified the two models that, at the time, achieved the best accuracy on a sentiment classification task across various data sets.
Then came the broad managerial question: “For a given accuracy, what steps and resources are needed to achieve it?” Steps and resources raise questions such as: “How many parameters need to be fine-tuned to reach this accuracy?”, “How sensitive is the accuracy to each parameter?” and “How long does the neural network need to train, given my computing power?” I did not constrain my models by the amount of data needed to reach that level of accuracy. However, that is an important question to answer as well: “How much data is necessary to train the network?”
In a business context, this analysis must be carried out at a “macro” level and translated into an actual cost-benefit study in monetary terms. Every step, from acquiring and processing the data to the returns from an improved product, must be integrated into the analysis. For instance, a study carried out by Microsoft, on a different prediction technique, looked at the impact of a 0.1% increase in accuracy on advertising revenue: the answer runs into hundreds of millions of dollars.
In my thesis, I carried out the analysis at a very micro level, just around model selection, and without a monetary component.
JUMS: Can you briefly summarise the results you achieved in your thesis?
I tried to answer several questions. On the one you mentioned earlier: from the literature review, I had identified two models, namely the Convolutional Neural Network (CNN) and the Long Short-Term Memory network (LSTM). I then benchmarked the CNN against a hybrid combining CNN and LSTM and concluded that, on the data set I tested and for a sentiment classification task, the CNN was more appropriate. However, it is essential to note that the choice of model depends heavily on the nature of the task and the data set it is used on.
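To make the “resources” side of such a comparison concrete, here is a minimal Python sketch that counts the trainable weights of a CNN classifier and of a CNN-LSTM hybrid using the standard per-layer formulas. It is an illustration only, not the code or configuration from the thesis: every layer size (100-dimensional embeddings, 100 filters, kernel widths 3 to 5, a 100-unit LSTM, two output classes) is a hypothetical placeholder, and the embedding layer, which both models would share, is left out of the totals.

```python
"""Rough trainable-parameter count for two text classifiers.

All layer sizes are hypothetical placeholders, not the configurations
benchmarked in the thesis. The embedding layer is shared by both models
and is therefore left out of the totals.
"""


def conv1d_params(in_channels: int, filters: int, kernel_size: int) -> int:
    # Each filter spans kernel_size tokens x in_channels features, plus one bias.
    return (kernel_size * in_channels + 1) * filters


def lstm_params(input_dim: int, hidden: int) -> int:
    # Four gates, each with input weights, recurrent weights and one bias vector
    # (Keras convention; PyTorch stores two bias vectors per gate).
    return 4 * ((input_dim + hidden) * hidden + hidden)


def dense_params(in_dim: int, out_dim: int) -> int:
    # Fully connected output layer: weights plus one bias per output.
    return (in_dim + 1) * out_dim


def cnn_classifier(embed_dim=100, filters=100, kernel_sizes=(3, 4, 5), classes=2) -> int:
    # Parallel convolution branches, each max-pooled over time (no weights),
    # then concatenated and fed to the output layer.
    conv = sum(conv1d_params(embed_dim, filters, k) for k in kernel_sizes)
    return conv + dense_params(filters * len(kernel_sizes), classes)


def cnn_lstm_classifier(embed_dim=100, filters=100, kernel_size=3, hidden=100, classes=2) -> int:
    # One convolution whose feature sequence feeds an LSTM;
    # the LSTM's last hidden state goes to the output layer.
    return (
        conv1d_params(embed_dim, filters, kernel_size)
        + lstm_params(filters, hidden)
        + dense_params(hidden, classes)
    )


if __name__ == "__main__":
    print(f"CNN:      {cnn_classifier():,} trainable parameters")
    print(f"CNN-LSTM: {cnn_lstm_classifier():,} trainable parameters")
```

With these made-up sizes the hybrid actually ends up with slightly fewer weights (110,702 against 120,902), a reminder that parameter count is only one proxy for cost: an LSTM processes tokens one step at a time, which parallelises less well than convolutions and typically lengthens training.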
JUMS: At the end of your work, you call for the democratisation of programming languages. If you could wish for three changes that would help achieve this goal, what would they be?
As I mentioned at the beginning of the interview, an increasing number of questions will arise from the deployment of artificial intelligence. These include questions about the very nature of the human species as opposed to machines. Also, as autonomous cars and advanced personal assistants are deployed, ethical issues arise. One is what kind of moral decisions machines will make; the trolley problem, for instance, illustrates well the dilemmas autonomous cars face. Another is the purpose of the technology: is it merely meant to predict behaviour, or also to influence it? Cambridge Analytica and Pokémon Go are examples of deployments that influence behaviour: the former in the virtual sphere for political purposes, the latter in the real world for commercial purposes. Therefore, the democratisation of programming languages must be fully embedded in a broader debate on the ethics of artificial intelligence. A functioning democracy goes hand in hand with an educated electorate, and if the technology is deployed, people must be able to make informed decisions about it. That requires an understanding of the limits, opportunities and threats the technology brings.
We do not all need to be neuroscientists or psychologists to interact with human beings, just as we do not all need to be computer scientists to interact with machines. However, we all have a rough idea of how human beings function; the same is not true of machines.
That being said, on the democratisation of programming languages specifically, I would wish for the following three changes:
– The integration of programming logic, if not programming languages directly, at an early stage of education programmes. This logic must also be developed alongside creativity and followed up at later stages of education.
– Efforts by governments and firms to reduce the gender bias in computer science and, more generally, in STEM subjects.
– Investment by firms in training their employees in programming languages, to avoid the generational gap that the first change would otherwise create.
JUMS: In addition to your Bachelor’s degree, you also have a Master of Philosophy in Development Studies and a MicroMasters in Data and Economics. Why do you think this combination of data, economics and ethics is a valuable knowledge base?
If we look back at history, we find that economics, data and philosophy were once one. Adam Smith, who is considered the founding father of political economy, wrote both The Wealth of Nations and The Theory of Moral Sentiments. With his acute sense of observation, he could gather the necessary “data” on a small scale to describe the productive benefits of the division of labour, but also the contradictions that come with it. Unfortunately, this latter dimension of Smith’s work is often forgotten. Nowadays, the scale and scope of data have grown, and whilst this brings technical challenges, it is not the full story. In his obituary essay for Alfred Marshall, John Maynard Keynes wrote that “an economist must be mathematician, historian, statesman, philosopher – in some degree … [He must] study the present in the light of the past for the purposes of the future”.
Over the last half-century, economics has become highly technical at the expense of its other dimensions; this is one of the reasons I chose development studies rather than pure economics. In my opinion, one cannot consider questions about the economy without at least being aware that economics, like the other social sciences, is value-laden. Others would find this proposition debatable.
JUMS: What would you recommend to other students in advance of their Bachelor’s or Master’s thesis? What do they have to pay attention to and what were your personal challenges?
As with any piece of work, I find it fundamental that people enjoy what they are doing. For a thesis, the challenge is thus to identify a subject that you will enjoy investigating and learning about. A Bachelor’s or Master’s thesis can be a three- to six-month commitment, and it can quickly become a lonely journey. Without genuine interest, the work becomes a burden.
Supervision is also essential. I was lucky to have not one but three supervisors who guided me through the process. Therefore, once students have a rough idea of their subject, finding a supervisor who is interested in the topic is critical.
JUMS: How did you come up with the idea of submitting your thesis to JUMS?
It relates to the previous question. My supervisor, Professor Markus Bick, suggested sending the manuscript to JUMS. He was very helpful throughout my year at ESCP Europe in Berlin and afterwards. Also, the opportunity to be published was in line with my goal of providing an accessible manuscript on machine learning for a “business audience”. Finally, the feedback JUMS gave me was constructive and helped me understand how to improve my future work.
JUMS: At the end of the conversation, we always ask our interviewees to complete a short sentence: “Writing a thesis meant for me …”
… planting the seeds to pursue a career in academia.
JUMS: Thank you, dear Estevan, for the interesting insights and tips you have shared with our readers. We wish you the best of luck for the future!