Here’s a function: f(x). It’s expensive to evaluate, not necessarily an analytic expression, and you don’t know its derivative.
Your task: find the global minimum.
This is, for sure, a difficult task, more difficult than many other optimization problems in machine learning. Gradient descent, for one, has access to a function’s derivatives and takes advantage of mathematical shortcuts for faster evaluation.
Alternatively, in some optimization scenarios the function is cheap to evaluate. If we can get results for hundreds of variants of an input x in a few seconds, a simple grid search can be employed with good results.
Or, an entire host of non-conventional, non-gradient optimization methods can be used, like particle swarm optimization or simulated annealing. …
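To make that concrete, here’s a minimal sketch using SciPy’s dual_annealing, a simulated-annealing-style global optimizer that needs nothing but function evaluations. The toy function and bounds below are invented for illustration:

```python
import numpy as np
from scipy.optimize import dual_annealing

# A toy stand-in for the expensive black box: we may only query its value,
# never its derivative. (Function and bounds are invented for illustration.)
def f(x):
    return np.sin(3 * x[0]) + 0.1 * x[0] ** 2  # wiggly, with many local minima

# Simulated-annealing-style global search over a bounded domain.
result = dual_annealing(f, bounds=[(-5.0, 5.0)], seed=42)
print(result.x, result.fun)  # approximate global minimizer and its value
```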
2021 is here, and deep learning is as active as ever; research in the field is accelerating rapidly. There are, of course, many more deep learning advancements that are fascinating and exciting. To me, though, the five presented here demonstrate a central undercurrent in ongoing deep learning research: how necessary is the sheer size of deep learning models?
tl;dr: GrowNet applies gradient boosting to shallow neural networks. It has been rising in popularity, yielding superior results in classification, regression, and ranking. It may signal a research direction favoring larger ensembles of shallower networks on non-specialized data (i.e., data that isn’t images or sequences).
Gradient boosting has become very popular in recent years, rivalling neural networks in performance. The idea is to build an ensemble of weak (simple) learners, where each corrects the mistakes of the previous. For instance, an ideal 3-model gradient boosting ensemble may look like this, where the real label of the example is…
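As a rough illustration of that residual-correcting idea, here’s a minimal boosting loop over shallow networks on a toy regression task. Note that this is the generic residual-fitting scheme, not GrowNet’s exact architecture, which also feeds each learner the previous learner’s hidden features and adds a corrective step:

```python
import numpy as np
from sklearn.neural_network import MLPRegressor

# Toy regression data.
rng = np.random.default_rng(0)
X = rng.uniform(-3, 3, size=(500, 1))
y = np.sin(X[:, 0]) + 0.1 * rng.standard_normal(500)

learners, lr = [], 0.5
residual = y.copy()
for _ in range(3):  # a 3-model ensemble of shallow (one-hidden-layer) networks
    net = MLPRegressor(hidden_layer_sizes=(8,), max_iter=2000, random_state=0)
    net.fit(X, residual)            # each learner fits what the ensemble still gets wrong
    learners.append(net)
    residual -= lr * net.predict(X)

prediction = sum(lr * net.predict(X) for net in learners)
print("train MSE:", np.mean((prediction - y) ** 2))
```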
All machine learning models originate in the computer lab. They’re initialized, trained, tested, redesigned, trained again, fine-tuned, and tested yet again before they are deployed.
Afterwards, they fulfill their duty in epidemiological modelling, stock trading, shopping item recommendation, and cyber attack detection, among many other purposes. Unfortunately, success in the lab may not always mean success in the real world — even if the model does well on the test data.
This big problem, that machine learning models developed on a computer to serve a purpose in the real world can often fail once deployed, has seen little research or discussion. …
The model had been training across several sessions, over many days, for an image recognition competition. It was relatively simple, and it initially scored about 0.9 AUC (the competition metric, which ranges from 0 to 1). I didn’t expect much from it at all.
That’s why I quite literally jumped out of my seat when I began the usual routine of loading the model weights and training for several epochs:
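For context, the routine itself is nothing exotic. A minimal sketch of such a resume-training workflow in Keras might look like the following; the data, model, and file name are placeholders, not the actual competition setup:

```python
import numpy as np
import tensorflow as tf

# Stand-in data and model; the real setup was an image model scored by AUC.
X = np.random.rand(256, 32, 32, 3).astype("float32")
y = np.random.randint(0, 2, size=256)

model = tf.keras.Sequential([
    tf.keras.layers.Flatten(input_shape=(32, 32, 3)),
    tf.keras.layers.Dense(1, activation="sigmoid"),
])
model.compile(optimizer="adam", loss="binary_crossentropy", metrics=["AUC"])
model.fit(X, y, epochs=1)
model.save("model.h5")  # end of one training session

# Next session: reload the saved model and continue where we left off.
model = tf.keras.models.load_model("model.h5")
model.fit(X, y, epochs=5)
```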
The A/B test is revered across the marketing and experimentation landscapes as the gold standard of testing. While there are other testing methods, like multi-armed bandits, the simplicity of the A/B test usually makes it the default.
Indeed, the A/B test is incredibly simple. It’s essentially what you did in your elementary school science fair experiment: do something to one group, do another thing to another group, and see the difference in results. It’s not really an algorithm so much as it is the process of science.
But that simplicity can be very misleading, because the test is often presented with relatively simple examples. …
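For reference, the textbook-simple version of an A/B analysis fits in a few lines; all of the numbers below are invented:

```python
import numpy as np
from scipy import stats

# Invented conversion data: 1 = converted, 0 = didn't.
rng = np.random.default_rng(0)
group_a = rng.binomial(1, 0.10, size=5000)  # control, ~10% conversion
group_b = rng.binomial(1, 0.12, size=5000)  # variant, ~12% conversion

# A two-sample t-test on the conversion indicators; at this sample size
# it closely approximates the usual two-proportion z-test.
t_stat, p_value = stats.ttest_ind(group_a, group_b)
print(f"lift: {group_b.mean() - group_a.mean():.3f}, p-value: {p_value:.4f}")
```

A small p-value suggests the difference between groups is unlikely to be chance; the complications the article goes on to discuss start once the real world stops looking like this clean setup.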
If you’ve been keeping up with Kaggle news, you may be familiar with the Mechanisms of Action competition by the Laboratory for Innovation Science at Harvard, which recently closed. I’m proud to say that my partner, Andy Wang, and I managed to place in the top 4%: 152nd out of 4,373 teams.
What’s interesting, though, is that we’re relatively new to Kaggle competitions. In terms of machine learning, we’re not exactly professionals; we’re both students who have picked up Python and machine learning from online courses and tutorials.
We didn’t get gold, of course. That’s reserved for the top 10 teams in the entire competition, and it’s astronomically difficult to reach. The solutions are usually incredibly intricate and complex. For instance, this is part of the 7-model solution the first-place team came up with…
Neural networks are notoriously bad at handling tabular data: their spotlight has primarily been on specialized forms of data like images, text, and sound. The convolutional neural network is one such example.
By converting non-image data, or even sequential data, into an image, convolutional neural networks can exploit their special properties of being computationally efficient and locally focused. Furthermore, they are able to leverage the unique insights and nonlinearities of unsupervised learning.
Research into convolutional neural networks (CNNs) has risen dramatically in recent years as computing power and data have become increasingly available. CNNs, however, are used almost exclusively for image data, and why not? …
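To illustrate the tabular-to-image idea in its barest form, here’s a naive sketch: reshape each row of features into a small grid and feed it to a tiny CNN. Published methods use smarter layouts that place related features near each other so the convolution’s locality actually means something; the data here is synthetic:

```python
import numpy as np
import tensorflow as tf

# Synthetic tabular data: 64 features per row, reshaped into an 8x8 "image".
X = np.random.rand(1000, 64).astype("float32")
y = np.random.randint(0, 2, size=1000)
X_img = X.reshape(-1, 8, 8, 1)  # height, width, one channel

model = tf.keras.Sequential([
    tf.keras.layers.Conv2D(16, 3, activation="relu", input_shape=(8, 8, 1)),
    tf.keras.layers.Flatten(),
    tf.keras.layers.Dense(1, activation="sigmoid"),
])
model.compile(optimizer="adam", loss="binary_crossentropy", metrics=["accuracy"])
model.fit(X_img, y, epochs=3, batch_size=32)
```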
Take a moment to locate the nearest major city around you.
If you were to travel there now, which mode of transportation would you use? You may take a car, a bus, or a train. Perhaps you’ll ride a bike, or even purchase an airplane ticket.
Regardless of how you get to the location, you incorporate probability into the decision-making process. Perhaps there’s a 70 percent chance of rain, or some chance of a car crash that causes a traffic jam. If your bike tire is old, it may pop; this is certainly a large probabilistic factor.
On the other hand, there are deterministic costs (for instance, the cost of gas or an airplane ticket) as well as deterministic rewards, like the substantially faster travel time of an airplane (depending on the location) or perhaps a free Uber/Lyft ride. …
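As a toy illustration of weighing probabilistic factors against deterministic costs, consider an expected-cost comparison; every number below is invented:

```python
# Toy expected-cost comparison between two travel options.
# All probabilities, costs, and penalties are invented for illustration.

def expected_cost(base_cost, delay_prob, delay_penalty):
    """Deterministic cost plus the probability-weighted cost of a delay."""
    return base_cost + delay_prob * delay_penalty

car = expected_cost(base_cost=15, delay_prob=0.70, delay_penalty=20)    # rain, traffic
plane = expected_cost(base_cost=120, delay_prob=0.05, delay_penalty=50)

print(f"car: {car:.2f}, plane: {plane:.2f}")  # pick the lower expected cost
```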
Deep learning has a big problem: it needs to gobble up massive quantities of data before it can generalize reasonably well and become useful. This is in fact one of the limitations of deep learning that restricts its applications in many fields where data is either scarce or difficult to obtain.
Contrast this with humans, who, while often portrayed as the losing side of a human-versus-machine intelligence battle, can learn complex concepts with only a few training examples. A baby who has no idea what a cat or a dog is can learn to classify them after seeing a few images of a cat and a few images of a dog. …
Deep neural networks have a big problem — they’re constantly hungry for data. When there is too little data — an amount that would be acceptable for other algorithms — deep neural networks have tremendous difficulty generalizing. This phenomenon highlights a gap between human and machine cognition; humans can learn complex patterns with few training examples (albeit at a slower rate).
While research in self-supervised learning is growing, developing setups in which explicit labels are completely unnecessary (the labels are cleverly found in the training data itself), its use cases are restricted.
Semi-supervised learning, another quickly growing field, utilizes latent variables learned through unsupervised training to boost the performance of supervised learning. This is an important concept, but its scope is limited to use cases where there is a relatively large ratio of unlabeled to labeled data, and where the unlabeled data is compatible with the labeled data. …
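To sketch that recipe, here’s a minimal two-stage example in Keras: an autoencoder learns latent variables from plentiful unlabeled data, and its encoder is then reused as a feature extractor for a small supervised task. All of the data here is synthetic:

```python
import numpy as np
import tensorflow as tf

# Lots of unlabeled data, few labeled examples (all synthetic here).
X_unlabeled = np.random.rand(10000, 20).astype("float32")
X_labeled = np.random.rand(200, 20).astype("float32")
y_labeled = np.random.randint(0, 2, size=200)

# Stage 1 (unsupervised): an autoencoder learns latent variables
# from the unlabeled data alone.
inputs = tf.keras.Input(shape=(20,))
latent = tf.keras.layers.Dense(8, activation="relu")(inputs)
decoded = tf.keras.layers.Dense(20)(latent)
autoencoder = tf.keras.Model(inputs, decoded)
autoencoder.compile(optimizer="adam", loss="mse")
autoencoder.fit(X_unlabeled, X_unlabeled, epochs=5, batch_size=64, verbose=0)

# Stage 2 (supervised): reuse the learned encoder on the small labeled set.
encoder = tf.keras.Model(inputs, latent)
clf = tf.keras.Sequential([encoder, tf.keras.layers.Dense(1, activation="sigmoid")])
clf.compile(optimizer="adam", loss="binary_crossentropy", metrics=["accuracy"])
clf.fit(X_labeled, y_labeled, epochs=10, batch_size=16)
```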