Those wings... I want them too.
Apparently, I'll never make a habit of writing diary posts regularly without going offline for long stretches, even for the sake of improving my English. This time I got deeply absorbed in family concerns, which are mercilessly devouring my time. It's good that I managed to finish the OpenDataScience course; things could have gone the other way.
And this course was another activity that took almost all my time in November and December. Usually I stop writing because of participation in competitions, and to some extent mlcourse.ai was a kind of "competition" too.
It had a rating, and the names of participants with top-100 (out of 2000+) results were published on the course site. The list is still there, and I'm 5th (yes, I'm bragging):
https://mlcourse.ai/rating
Kaggle InClass competitions were among the activities that earned rating points, but in general the structure of the course was more complex. The activities that gave credits toward the rating were:
Assignments, most of which were Jupyter Notebooks with tasks to write code, compute the answers, and fill in a Google Form. It was the main, but in my opinion the simplest, part of the course. I made mistakes only a couple of times and lost ~3-4 points out of ~100.
The first two assignments, devoted to Exploratory Data Analysis with Pandas and to Visual Analysis, were easy, so I managed to complete the similar lessons from Kaggle Learn at the same time:
https://www.kaggle.com/learn/pandas
https://www.kaggle.com/learn/data-visualisation
The 3rd, 5th, and 7th assignments, on Decision Trees & k-NN, Ensembles of Algorithms, and Unsupervised Learning respectively, covered deeper ML topics but weren't too difficult either.
The 4th assignment, on Linear Models, was connected to a competition and had two parts. The first was again to fill the answers into a Google Form; the second, the "freeride", was to improve the created model and beat the competition baselines. That wasn't easy.
The 6th, Feature Engineering, and the 10th, Gradient Boosting, were competitions too and had only "freeride" parts. I'll cover the competitions separately.
Though the 8th topic was devoted to the Vowpal Wabbit library, in the corresponding assignment we implemented our own version of logistic regression and used it to predict the tags of StackOverflow questions. It required solid mathematical skills, and the organizers considered it the hardest assignment. For me, however, the competitions were harder. (Maybe because I love Math.)
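Out of curiosity, here's roughly what such a from-scratch logistic regression looks like. This is only my minimal sketch, with batch gradient descent on a tiny made-up dataset, not the actual course implementation (which worked with sparse text features and much bigger data):

```python
import math

def sigmoid(z):
    """Map a raw score to a probability in (0, 1)."""
    return 1.0 / (1.0 + math.exp(-z))

def train_logreg(X, y, lr=0.1, epochs=1000):
    """Fit weights and bias by batch gradient descent on the log loss.

    X is a list of feature lists, y is a list of 0/1 labels.
    """
    n_features = len(X[0])
    w = [0.0] * n_features
    b = 0.0
    m = len(X)
    for _ in range(epochs):
        grad_w = [0.0] * n_features
        grad_b = 0.0
        for xi, yi in zip(X, y):
            # Prediction error for this sample: p - y
            p = sigmoid(sum(wj * xj for wj, xj in zip(w, xi)) + b)
            err = p - yi
            for j, xj in enumerate(xi):
                grad_w[j] += err * xj
            grad_b += err
        # Average gradients and take a step
        w = [wj - lr * gj / m for wj, gj in zip(w, grad_w)]
        b -= lr * grad_b / m
    return w, b

def predict_proba(w, b, x):
    return sigmoid(sum(wj * xj for wj, xj in zip(w, x)) + b)
```

On a one-feature toy set like `X = [[0], [1], [2], [3]]`, `y = [0, 0, 1, 1]`, it learns a decision boundary between 1 and 2, which is all one can ask of four points.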
In contrast, the 9th assignment, on Time Series prediction, was short and simple. It was possibly the easiest task in the course, though the corresponding articles were long and dense.
Competition "Alice" (not to be confused with Yandex's voice assistant; nothing in common), the 4th assignment:
https://www.kaggle.com/c/catch-me-if-you-can-intruder-detection-through-webpage-session-tracking2
The task was to distinguish the web sessions of a specific user ("Alice") from the sessions of all other users. Each session was represented by a row in the dataset, containing up to 10 sites along with the times they were visited. It was assumed that a session lasted 30 minutes or until ten sites were visited. If the user visited more sites within 30 minutes, or kept visiting them over a longer period, those visits were considered to belong to the next session. That's not a very correct way to compose a dataset, in my opinion, but it was given as is.
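To show what I mean by that session rule, here's a small Python sketch of how such rows could be produced from a raw visit log. This is my own illustration of the cutting rule, not the organizers' actual preprocessing code:

```python
from datetime import datetime, timedelta

MAX_SITES = 10               # a session holds at most 10 visits
MAX_SPAN = timedelta(minutes=30)  # ...and lasts at most 30 minutes

def split_sessions(visits):
    """Cut a time-sorted list of (timestamp, site) pairs into sessions.

    A new session starts once the current one already has 10 sites,
    or once the next visit falls more than 30 minutes after the
    session's first visit.
    """
    sessions = []
    current = []
    for ts, site in visits:
        if current and (len(current) == MAX_SITES
                        or ts - current[0][0] > MAX_SPAN):
            sessions.append(current)
            current = []
        current.append((ts, site))
    if current:
        sessions.append(current)
    return sessions
```

So twelve visits within half an hour become one full session of 10 sites plus a "next session" of 2, which is exactly the somewhat artificial splitting I was complaining about.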
I spent plenty of time and effort trying to beat all the specified baselines, but finally managed to. In the first two competitions there were additional credits awarded to those who reached the top-10 places, but I decided not to participate in that race. Due to my gender, I'm not so fond of "dick contests" of all kinds.
Competition "Medium", the 6th assignment:
https://www.kaggle.com/c/how-good-is-your-medium-article
The task was to predict the number of "claps" (likes) that an article gathered on the blogging platform Medium. Again, I beat all the required baselines but didn't try to go higher.
What confused me in this competition is that the "author" feature turned out to be the most useful, and the organizers recommended using it. Such a model works well only for known authors; for someone new it will show poor results. In my opinion, it would be better to make predictions from the characteristics of the article itself, after "anonymization". But the dataset was provided in that form, and I had to work with it as it was.
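A toy sketch of why the "author" feature bothers me: a predictor built on per-author averages (hypothetical numbers and names, of course, not the real Medium data) has nothing better than the global mean to offer for an author it has never seen:

```python
from statistics import mean

# Hypothetical training pairs of (author, claps) standing in for the real dataset
train = [("alice", 120), ("alice", 100), ("bob", 10), ("bob", 16), ("carol", 55)]

global_mean = mean(claps for _, claps in train)

# Average claps per known author
by_author = {}
for author, claps in train:
    by_author.setdefault(author, []).append(claps)
author_mean = {a: mean(cs) for a, cs in by_author.items()}

def predict(author):
    # Known author: their own average; unseen author: global mean fallback
    return author_mean.get(author, global_mean)
```

For "alice" this predicts her personal average, but for any newcomer it degenerates to one number for everybody, and no article content is looked at.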
Competition "Flight delays", the 10th assignment:
https://www.kaggle.com/c/flight-delays-fall-2018
In this competition we had to predict whether a flight would be delayed by more than 15 minutes. The organizers claimed it was a simple task, but for me it wasn't so trivial, though, yes, much easier than the other two competitions.
Project "Alice", or an individual project. In this activity we had to create a data analysis project, either on the data from the "Alice" competition or on our own dataset. In the first case, the analysis followed a step-by-step procedure prepared by the organizers; in the second, we worked on our own according to a specified plan. I chose the second option and analysed a dataset with the results of the Stack Overflow Developer Survey 2018. Here's my research:
Developer Career Satisfaction.
Tutorial. A Jupyter Notebook that teaches some ML-related topic or shows how to use some library. Initially I thought I wouldn't have enough time to write a tutorial, but when I finished the individual project, there were several days left before the deadline. I remembered that I had encountered interesting information about NumPy in different places and quickly composed it into a tutorial:
NumPy tutorial
That's it. For now, I'm not going to participate in any course or competition in the next few months. That makes me upset, because there are so many opportunities, and I'm busy.
By the way, there are three PyLadies communities in our country now. Besides PyLadies Spb, PyLadies Msk (ta-dah!) and PyLadies Kazan have appeared (Telegram chat links):


@topics: english writing skills, Data Science, Machine Learning