Six members of Facebook AI Research (FAIR) have used the popular Transformer neural network architecture to create an end-to-end object detection system. This approach simplifies the creation of object detection models and reduces the need for handcrafted components. The model, known as the Detection Transformer (DETR), can detect the objects in an image in a single pass, predicting all of them at once.
DETR is the first object detection framework to successfully integrate the Transformer architecture as a central building block of the detection pipeline, according to a FAIR blog post. The authors added that Transformers could revolutionize computer vision as they have natural language processing (NLP) in recent years, and help close the gap between NLP and computer vision.
"DETR directly predicts (in parallel) the final set of detections by combining a common CNN with a transformer architecture," reads a FAIR paper published Wednesday alongside an open source release of DETR. "The new model is conceptually simple and does not require a specialized library, unlike many other modern detectors."
Created by Google researchers in 2017, the Transformer network architecture was originally intended as a way to improve machine translation, but it has since become a cornerstone of machine learning, underpinning some of the most popular pretrained state-of-the-art language models, such as Google's BERT, Facebook's RoBERTa, and many others. Speaking to VentureBeat, Google AI chief Jeff Dean and other AI experts said that Transformer-based language models were a major trend in 2019 and are expected to remain one in 2020.
Transformers use attention functions instead of recurrent neural networks to predict what comes next in a sequence. When applied to object detection, a Transformer can eliminate steps in building a model, such as the need to create spatial anchors and custom layers.
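To make "attention functions" more concrete, here is a minimal sketch of scaled dot-product attention, the core operation of the original 2017 Transformer, written in plain Python. It is an illustration under simplifying assumptions (a single attention head, no learned projection weights, nested lists standing in for matrices) and is not DETR's actual implementation; the helper names `softmax`, `matmul`, `transpose`, and `attention` are hypothetical. Every query scores every key in one matrix product, so all positions are handled in parallel rather than one step at a time as in a recurrent network.

```python
import math
from typing import List

Matrix = List[List[float]]


def softmax(row: List[float]) -> List[float]:
    # Subtract the max before exponentiating for numerical stability.
    m = max(row)
    exps = [math.exp(x - m) for x in row]
    total = sum(exps)
    return [e / total for e in exps]


def transpose(m: Matrix) -> Matrix:
    return [list(col) for col in zip(*m)]


def matmul(a: Matrix, b: Matrix) -> Matrix:
    # Plain (n x k) @ (k x p) matrix product.
    return [
        [sum(a[i][k] * b[k][j] for k in range(len(b))) for j in range(len(b[0]))]
        for i in range(len(a))
    ]


def attention(q: Matrix, k: Matrix, v: Matrix) -> Matrix:
    """Scaled dot-product attention: softmax(Q K^T / sqrt(d_k)) V."""
    d_k = len(k[0])
    # Score every query against every key in one step (no recurrence).
    scores = matmul(q, transpose(k))
    scaled = [[s / math.sqrt(d_k) for s in row] for row in scores]
    # Each query's weights over the keys sum to 1.
    weights = [softmax(row) for row in scaled]
    # Each output row is a weighted average of the value rows.
    return matmul(weights, v)


if __name__ == "__main__":
    # All-zero keys give equal scores, so the query averages the two values.
    print(attention([[1, 1]], [[0, 0], [0, 0]], [[2, 4], [6, 8]]))  # [[4.0, 6.0]]
```

In DETR's decoder, a fixed set of learned "object queries" attends to the CNN's image features in this general way, which is what lets the model emit its detections together instead of building them from anchors.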
DETR achieves results comparable to Faster R-CNN, an object detection model created primarily by Microsoft Research that, according to arXiv, has received nearly 10,000 citations since its introduction in 2015. The DETR researchers ran experiments on the COCO object detection dataset as well as on datasets related to panoptic segmentation, the type of object detection that paints regions of an image instead of drawing a bounding box.
One major problem the authors encountered: DETR works better on large objects than on small ones. "Current detectors required several years of improvements to cope with similar issues, and we expect future work to successfully address them for DETR," the authors wrote.
DETR is the latest Facebook AI initiative to look to a language model solution to solve a computer vision challenge. Earlier this month, Facebook introduced the Hateful Memes dataset and challenge to encourage the creation of multimodal AI that can recognize when an image and its accompanying text in a meme violate Facebook policy. In related news, the Wall Street Journal reported earlier this week that an internal investigation in 2018 found that Facebook's recommendation algorithms "exploit the human brain's attraction to divisiveness," but executives largely ignored the analysis.