Airfeed

Top papers for you

April 6, 2023
Training-Free Layout Control with Cross-Attention Guidance
Computer Vision and Pattern Recognition

TLDR:
Researchers have developed a technique, called layout guidance, which allows for robust layout control without requiring training and can be applied to both generated and real images.

Summary:

  • Diffusion-based image generators often fail to correctly interpret instructions that specify spatial layout.
  • Researchers propose a technique called "layout guidance" to achieve layout control without training.
  • The technique manipulates cross-attention layers to steer image reconstruction in the desired direction.
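
As a rough illustration of the idea in the summary above, the sketch below scores how much of one text token's cross-attention mass falls inside a target bounding box. The function name `layout_energy`, the squared-complement form of the score, and the toy 4x4 attention map are assumptions made for illustration; they are not details given in this digest.

```python
def layout_energy(attn, box):
    """Return a score in [0, 1]; 0 means all attention lies inside the box.

    attn: 2D list of non-negative attention weights for one text token.
    box:  (row_start, row_end, col_start, col_end), end-exclusive.
    """
    r0, r1, c0, c1 = box
    total = sum(sum(row) for row in attn)
    if total == 0:
        return 1.0  # no attention anywhere: treat as fully off-target
    inside = sum(attn[r][c] for r in range(r0, r1) for c in range(c0, c1))
    return (1.0 - inside / total) ** 2


# Toy 4x4 attention map (hypothetical) with all mass in the top-left corner.
attn = [
    [4.0, 3.0, 0.0, 0.0],
    [2.0, 1.0, 0.0, 0.0],
    [0.0, 0.0, 0.0, 0.0],
    [0.0, 0.0, 0.0, 0.0],
]
top_left = (0, 2, 0, 2)
bottom_right = (2, 4, 2, 4)

print(layout_energy(attn, top_left))      # 0.0
print(layout_energy(attn, bottom_right))  # 1.0
```

In a training-free guidance loop, a score like this would be lowered by nudging the sampling process rather than by updating model weights; how the paper does this exactly is not covered here.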

Why is this important?

  • This technique allows for robust layout control without requiring training, making it more accessible.
  • It can be extended to editing the layout and context of real images.

April 7, 2023
Text Semantics to Image Generation: A method of building facades design base on Stable Diffusion model
Computer Vision and Pattern Recognition, Artificial Intelligence

TLDR:
Researchers propose a multi-network approach that combines the Stable Diffusion model, LoRA, and ControlNet to make text-driven generation of building facade images more controllable for architectural design.

Summary:

  • The paper proposes a method that combines multiple networks (Stable Diffusion, LoRA, and ControlNet) to generate building facade images from text.
  • LoRA training reduces the need to fine-tune the full Stable Diffusion model.
  • Adding ControlNet makes the text-driven generation of facade images more controllable.
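
To make the LoRA point in the summary above concrete, the sketch below compares the number of trainable parameters for a full weight matrix W with the number for a rank-r update W + BA, and applies such an update to a vector. The 768x768 layer size and rank 4 are hypothetical values chosen for illustration, not figures from the paper.

```python
def full_params(d_out, d_in):
    """Parameters trained when the whole weight matrix W is fine-tuned."""
    return d_out * d_in


def lora_params(d_out, d_in, rank):
    """Parameters trained by LoRA: B is d_out x rank, A is rank x d_in."""
    return d_out * rank + rank * d_in


def matvec(M, v):
    """Multiply a matrix (list of rows) by a vector."""
    return [sum(m * vi for m, vi in zip(row, v)) for row in M]


def lora_forward(W, A, B, x):
    """Compute (W + B @ A) @ x without forming B @ A; W stays frozen."""
    base = matvec(W, x)
    update = matvec(B, matvec(A, x))
    return [b + u for b, u in zip(base, update)]


d_out, d_in, rank = 768, 768, 4  # hypothetical layer size and rank
print(full_params(d_out, d_in))        # 589824
print(lora_params(d_out, d_in, rank))  # 6144
```

With these assumed sizes, the low-rank update trains roughly 1% of the parameters of the full matrix.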

Why is this important?

  • This method enhances the controllability of generated image content in architectural design.
  • The results provide a foundation for further studies on the generation of architectural images.

April 6, 2023
G-Eval: NLG Evaluation using GPT-4 with Better Human Alignment
Computation and Language, Artificial Intelligence

TLDR:
G-Eval is a framework that uses GPT-4 and chain-of-thoughts to accurately assess the quality of NLG outputs, outperforming previous methods and providing a new way to improve NLG systems.

Summary:

  • Measuring the quality of text produced by natural language generation (NLG) systems is difficult.
  • Using large language models as evaluators can be more effective than reference-based metrics.
  • The G-Eval framework uses GPT-4 with chain-of-thought prompting to assess the quality of NLG outputs.
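
The general pattern of using a language model as an evaluator can be sketched as follows: build a prompt that states a criterion and step-by-step evaluation instructions, send it to a model, and parse a score from the reply. This is a minimal sketch under stated assumptions, not G-Eval's actual implementation: the prompt wording, the 1-5 scale, and all function names are hypothetical, and `fake_llm` is a stand-in so the example runs without any model access.

```python
import re


def build_eval_prompt(criterion, steps, source, output):
    """Assemble a prompt: criterion, numbered reasoning steps, then the texts."""
    numbered = "\n".join(f"{i}. {s}" for i, s in enumerate(steps, start=1))
    return (
        f"Evaluation criterion: {criterion}\n"
        f"Evaluation steps:\n{numbered}\n\n"
        f"Source text:\n{source}\n\n"
        f"Generated text:\n{output}\n\n"
        "Score (1-5):"
    )


def parse_score(reply, low=1, high=5):
    """Return the first integer in [low, high] found in the reply, else None."""
    for match in re.findall(r"\d+", reply):
        value = int(match)
        if low <= value <= high:
            return value
    return None


def evaluate(llm, criterion, steps, source, output):
    """Score one output; `llm` is any callable mapping a prompt to a reply."""
    return parse_score(llm(build_eval_prompt(criterion, steps, source, output)))


def fake_llm(prompt):
    """Hypothetical stand-in for a real model call."""
    return "Score: 4"


steps = ["Read the source text.", "Check that the summary follows it logically."]
score = evaluate(fake_llm, "Coherence", steps, "A long article.", "A short summary.")
print(score)  # 4
```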

Why is this important?

  • Accurately evaluating the quality of NLG outputs is important for improving NLG systems.
  • G-Eval provides a new framework that outperforms previous methods in text summarization and dialogue generation.

April 7, 2023
Lift3D: Synthesize 3D Training Data by Lifting 2D GAN to 3D Generative Radiance Field
Computer Vision and Pattern Recognition

TLDR:
The paper proposes Lift3D, a new method for generating high-quality 3D training data using an inverted 2D-to-3D generation framework, which can improve the performance of 3D vision tasks.

Summary:

  • The paper proposes a new method for generating 3D training data for computer vision tasks.
  • The method, called Lift3D, uses an inverted 2D-to-3D generation framework.
  • Lift3D can generate high-resolution and photorealistic 3D objects with accurate 3D annotations.

Why is this important?

  • Generating 3D training data is crucial for improving the performance of 3D vision tasks.
  • Lift3D provides a more effective and accurate way to generate 3D training data compared to existing methods.

April 6, 2023
InstantBooth: Personalized Text-to-Image Generation without Test-Time Finetuning
Computer Vision and Pattern Recognition

TLDR:
InstantBooth is a novel approach for fast and efficient personalized text-to-image generation without the need for test-time finetuning.

Summary:

  • InstantBooth is a new approach for personalized text-to-image generation.
  • It eliminates the need for test-time finetuning and can generate images instantly.
  • It extracts the general concept of input images through a learnable image encoder and keeps fine details of identity using adapter layers.

Why is this important?

  • Existing personalization approaches require heavy test-time finetuning for each concept, which is time-consuming and difficult to scale.
  • InstantBooth eliminates the need for test-time finetuning, making personalized image generation faster and more efficient.

April 7, 2023
From Retrieval to Generation: Efficient and Effective Entity Set Expansion
Computation and Language, Information Retrieval

TLDR:
This paper proposes a new generative framework, GenExpan, for efficient and effective Entity Set Expansion (ESE) using a pre-trained language model.

Summary:

  • Entity Set Expansion (ESE) is the task of expanding a small seed entity set to include more entities of the same class.
  • Existing ESE methods are inefficient and have poor scalability.
  • This paper proposes a new generative ESE framework called GenExpan that uses a pre-trained language model to generate new entities.
  • GenExpan is efficient and achieves better performance than previous ESE methods.

Why is this important?

  • ESE is an important task for expanding entity sets and improving natural language processing.
  • The proposed GenExpan framework offers a more efficient and effective solution for ESE compared to existing methods.

April 7, 2023
Revisiting Automated Prompting: Are We Actually Doing Better?
Computation and Language, Machine Learning

TLDR:
This paper suggests that automated prompting does not consistently outperform manual prompts in improving the performance of Large Language Models in downstream tasks.

Summary:

  • Large Language Models (LLMs) are strong few-shot learners, and prompting can significantly improve their performance on downstream tasks.
  • Attempts to automate the human-led prompting process do not consistently outperform simple manual prompts.
  • Fine-tuning and manual prompts should be used as baselines in this line of research.

Why is this important?

  • Large Language Models have the potential to revolutionize many fields, but their performance can be further improved with prompting.
  • Automated prompting can save time and effort, but this paper suggests that manual prompts should still be used as a baseline to compare the performance of automated methods.


Papers 🤝 open source

April 6, 2023
AMS-DRL: Learning Multi-Pursuit Evasion for Safe Targeted Navigation of Drones
Robotics, Artificial Intelligence

TLDR:
The paper proposes a novel approach, AMS-DRL, for safe navigation of drones in the presence of adversarial physical attacks from multiple pursuers, which outperforms baselines in extensive simulations and physical experiments.

Summary:

  • The paper proposes AMS-DRL, an approach that trains adversarial neural networks to learn from the actions of multiple pursuers and adapt quickly to their behavior, so that a drone can avoid attacks and reach its target.
  • The approach guarantees convergence by ensuring a Nash equilibrium among agents, based on game-theoretic analysis.
  • The method outperforms baselines with higher navigation success rates, as shown in extensive simulations and physical experiments.

Why is this important?

  • The proposed approach provides a solution to the challenge of safe navigation of drones in the presence of adversarial physical attacks from multiple pursuers.
  • The method can be applied to real-world scenarios where drones need to evade multiple attackers to reach their targets.

April 6, 2023
Self-Supervised Video Similarity Learning
Computer Vision and Pattern Recognition, Machine Learning

TLDR:
S²VS is a self-supervised video similarity learning approach that achieves state-of-the-art performance on multiple tasks without any labeled data, reducing the cost and effort required for training.

Summary:

  • S²VS is a video similarity learning approach with self-supervision.
  • This method performs video similarity learning without any labeled data.
  • The approach uses instance-discrimination with task-tailored augmentations and the InfoNCE loss to achieve state-of-the-art performance on multiple retrieval and detection tasks.
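
The InfoNCE loss named in the summary above can be sketched for a single anchor as the negative log-softmax of the positive pair's similarity among all candidates. The temperature of 0.1, the 2D toy embeddings, and the function names are assumptions for illustration; the paper's augmentations and batch construction are not reproduced here.

```python
import math


def dot(u, v):
    return sum(a * b for a, b in zip(u, v))


def normalize(v):
    """Scale a vector to unit length so dot products are cosine similarities."""
    norm = math.sqrt(dot(v, v))
    return [a / norm for a in v]


def info_nce(anchor, positive, negatives, temperature=0.1):
    """InfoNCE for one anchor: -log softmax of the positive's similarity."""
    a = normalize(anchor)
    candidates = [normalize(positive)] + [normalize(n) for n in negatives]
    logits = [dot(a, c) / temperature for c in candidates]
    m = max(logits)  # subtract the max for numerical stability
    log_sum = m + math.log(sum(math.exp(l - m) for l in logits))
    return log_sum - logits[0]


# Hypothetical embeddings: the positive is an augmented view of the anchor.
anchor = [1.0, 0.0]
positive = [0.9, 0.1]
negatives = [[0.0, 1.0], [-1.0, 0.0]]

good = info_nce(anchor, positive, negatives)
bad = info_nce(anchor, negatives[0], [positive, negatives[1]])
print(good < bad)  # True: the loss is lower when the true positive is closest
```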

Why is this important?

  • S²VS achieves state-of-the-art performance on multiple retrieval and detection tasks without the need for labeled data.
  • This approach can greatly reduce the cost and effort required to train video similarity models.

April 6, 2023
Graph Collaborative Signals Denoising and Augmentation for Recommendation
Information Retrieval, Artificial Intelligence, Machine Learning

TLDR:
The paper proposes a new graph adjacency matrix and user-item interaction matrix to enhance graph-based recommendation systems by incorporating user-user and item-item correlations.

Summary:

  • Graph collaborative filtering (GCF) is a technique for recommendation systems.
  • The adjacency matrix used in GCF can be noisy for users and items with abundant interactions and insufficient for those with scarce interactions.
  • A new graph adjacency matrix is proposed that incorporates user-user and item-item correlations, as well as a properly designed user-item interaction matrix.
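
The structure described in the summary above can be sketched as a block matrix. A standard user-item graph places the interaction matrix R in the off-diagonal blocks and zeros on the diagonal; an augmented version fills the diagonal blocks with user-user and item-item correlations. Using raw co-occurrence counts as the correlation measure is an assumption made here for illustration, as are the function names and the toy matrix; the paper's actual construction is not described in this digest.

```python
def transpose(M):
    return [list(col) for col in zip(*M)]


def cooccurrence(R):
    """Entry (i, j) counts the interactions shared by rows i and j of R."""
    n = len(R)
    C = [[sum(a * b for a, b in zip(R[i], R[j])) for j in range(n)]
         for i in range(n)]
    for i in range(n):
        C[i][i] = 0  # ignore self-correlation
    return C


def build_adjacency(R, with_correlations=False):
    """Block matrix [[UU, R], [R^T, II]]; UU and II are zero by default."""
    n_users, n_items = len(R), len(R[0])
    Rt = transpose(R)
    if with_correlations:
        UU, II = cooccurrence(R), cooccurrence(Rt)
    else:
        UU = [[0] * n_users for _ in range(n_users)]
        II = [[0] * n_items for _ in range(n_items)]
    top = [UU[u] + R[u] for u in range(n_users)]
    bottom = [Rt[i] + II[i] for i in range(n_items)]
    return top + bottom


# Toy data (hypothetical): 2 users, 3 items; both users interacted with item 1.
R = [[1, 1, 0],
     [0, 1, 1]]
A = build_adjacency(R)
A_aug = build_adjacency(R, with_correlations=True)
for row in A_aug:
    print(row)
```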

Why is this important?

  • The proposed method improves the recommendation system by enhancing the user-item interaction matrix and incorporating user-user and item-item correlations.
  • The method can improve recommendations both for users with abundant interactions and for users with scarce interactions.

That's all for today, see you tomorrow! :)
