Python word cloud from an HTML page
From wikiluntti
Revision as of 22:07, 18 August 2021
Introduction
Analyze HTML tables using word clouds.
Theory
Fetching the table
Scraping the table data is easiest with Pandas. BeautifulSoup is another good option.
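As a minimal sketch of the Pandas route, the helper below reads every table on a page with pandas.read_html and counts the words in its text cells. The function names read_tables and count_words are hypothetical, chosen only for this illustration, and the sketch assumes that an HTML parser such as lxml is installed alongside Pandas.

```python
import re
from collections import Counter


def read_tables(source):
    """Return every <table> found in `source` as a list of DataFrames.

    `source` may be a URL, a file path, or a file-like object.
    Assumption: pandas and an HTML parser such as lxml are installed.
    """
    import pandas as pd  # imported lazily so count_words works without pandas

    return pd.read_html(source)


def count_words(cells):
    """Count lower-cased words in an iterable of cell values.

    Non-string cells (numbers, NaN, None) are skipped.
    """
    counts = Counter()
    for cell in cells:
        if isinstance(cell, str):
            # [^\W\d_]+ matches runs of Unicode letters, including ä and ö
            counts.update(re.findall(r"[^\W\d_]+", cell.lower()))
    return counts
```

For a page address url, a call such as count_words(read_tables(url)[0].to_numpy().ravel()) would then give the word frequencies of the first table on the page. Counting raw words is only a starting point, because inflected forms of the same Finnish word are still counted separately at this stage.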
Linguistic analysis
The text is in Finnish, so the Voikko morphological analyzer is used to lemmatize the words, that is, to reduce each word to its base form.
sudo apt install -y voikko-fi python-libvoikko
pip3 install libvoikko
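Once the packages are installed, lemmatization could look roughly like the sketch below. It relies on the analyze method of libvoikko.Voikko, which returns a list of analyses with a BASEFORM entry. The helper names lemmatize and lemmatize_all are hypothetical, and taking the first analysis is a simplifying assumption, since an ambiguous word can have several.

```python
def lemmatize(word, analyzer):
    """Return the base form of `word`, or the word itself if it is unknown.

    `analyzer` is any object with an analyze(word) method that returns a
    list of dicts containing a "BASEFORM" key, as libvoikko.Voikko does.
    """
    analyses = analyzer.analyze(word)
    if analyses:
        # A word can have several analyses; this sketch simply takes the first.
        return analyses[0].get("BASEFORM", word)
    return word


def lemmatize_all(words, language="fi"):
    """Lemmatize a list of words with Voikko.

    Assumption: libvoikko and a Finnish dictionary (voikko-fi) are installed.
    """
    from libvoikko import Voikko  # imported lazily; needs the native library

    voikko = Voikko(language)
    try:
        return [lemmatize(word, voikko) for word in words]
    finally:
        voikko.terminate()  # release the native resources
```

Falling back to the original word keeps names and unknown words in the word cloud instead of silently dropping them.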
References
https://data.solita.fi/finnish-stemming-and-lemmatization-in-python/