[1]Mapping Degeneration Meets Label Evolution: Learning Infrared Small Target Detection with Single Point Supervision
paper:https://arxiv.org/abs/2304.01484
code:https://github.com/xinyiying/lesps
[2]Multi-view Adversarial Discriminator: Mine the Non-causal Factors for Object Detection in Unseen Domains
paper:https://arxiv.org/abs/2304.02950
[3]Continual Detection Transformer for Incremental Object Detection
paper:https://arxiv.org/abs/2304.03110
[4]DetCLIPv2: Scalable Open-Vocabulary Object Detection Pre-training via Word-Region Alignment
paper:https://arxiv.org/abs/2304.04514
[5]Benchmarking the Physical-world Adversarial Robustness of Vehicle Detection
paper:https://arxiv.org/abs/2304.05098
3D Object Detection
[1]Hierarchical Supervision and Shuffle Data Augmentation for 3D Semi-Supervised Object Detection
paper:https://arxiv.org/abs/2304.01464
code:https://github.com/azhuantou/hssda
[2]Curricular Object Manipulation in LiDAR-based Object Detection
paper:https://arxiv.org/abs/2304.04248
code:https://github.com/zzy816/com
Human-Object Interaction Detection (HOI Detection)
[1]Instant-NVR: Instant Neural Volumetric Rendering for Human-object Interactions from Monocular RGBD Stream
paper:https://arxiv.org/abs/2304.03184
[2]Relational Context Learning for Human-Object Interaction Detection
paper:https://arxiv.org/abs/2304.04997
Anomaly Detection
[1]Robust Outlier Rejection for 3D Registration with Variational Bayes
paper:https://arxiv.org/abs/2304.01514
code:https://github.com/jiang-hb/vbreg
[2]Video Event Restoration Based on Keyframes for Video Anomaly Detection
paper:https://arxiv.org/abs/2304.05112
Semantic Segmentation
[1]DiGA: Distil to Generalize and then Adapt for Domain Adaptive Semantic Segmentation
paper:https://arxiv.org/abs/2304.02222
code:https://github.com/fy-vision/diga
[2]Exploiting the Complementarity of 2D and 3D Networks to Address Domain-Shift in 3D Semantic Segmentation
paper:https://arxiv.org/abs/2304.02991
code:https://github.com/cvlab-unibo/mm2d3d
[3]Continual Semantic Segmentation with Automatic Memory Sample Selection
paper:https://arxiv.org/abs/2304.05015
Depth Estimation
[1]EGA-Depth: Efficient Guided Attention for Self-Supervised Multi-Camera Depth Estimation
paper:https://arxiv.org/abs/2304.03369
[2]DualRefine: Self-Supervised Depth and Pose Estimation Through Iterative Epipolar Sampling and Refinement Toward Equilibrium
paper:https://arxiv.org/abs/2304.03560
code:https://github.com/antabangun/dualrefine
Human Parsing/Human Pose Estimation
[1]A2J-Transformer: Anchor-to-Joint Transformer Network for 3D Interacting Hand Pose Estimation from a Single RGB Image
paper:https://arxiv.org/abs/2304.03635
code:https://github.com/changlongjianggit/a2j-transformer
[2]Monocular 3D Human Pose Estimation for Sports Broadcasts using Partial Sports Field Registration
paper:https://arxiv.org/abs/2304.04437
code:https://github.com/tobibaum/partialsportsfieldreg_3dhpe
[3]DeFeeNet: Consecutive 3D Human Motion Prediction with Deviation Feedback
paper:https://arxiv.org/abs/2304.04496
Video Processing
[1]BiFormer: Learning Bilateral Motion Estimation via Bilateral Transformer for 4K Video Frame Interpolation
paper:https://arxiv.org/abs/2304.02225
code:https://github.com/junheum/biformer
[1]Learning Personalized High Quality Volumetric Head Avatars from Monocular RGB Videos
paper:https://arxiv.org/abs/2304.01436
[2]StyleGAN Salon: Multi-View Latent Optimization for Pose-Invariant Hairstyle Transfer
paper:https://arxiv.org/abs/2304.02744
[3]GANHead: Towards Generative Animatable Neural Head Avatars
paper:https://arxiv.org/abs/2304.03950
Object Tracking
[1]Trace and Pace: Controllable Pedestrian Animation via Guided Trajectory Diffusion
paper:https://arxiv.org/abs/2304.01893
[2]Unsupervised Sampling Promoting for Stochastic Human Trajectory Prediction
paper:https://arxiv.org/abs/2304.04298
code:https://github.com/viewsetting/unsupervised_sampling_promoting
[1]Improving Image Recognition by Retrieving from Web-Scale Image-Text Data
paper:https://arxiv.org/abs/2304.05173
Person Re-Identification/Detection
[1]PartMix: Regularization Strategy to Learn Part Discovery for Visible-Infrared Person Re-identification
paper:https://arxiv.org/abs/2304.01537
[2]Shape-Erased Feature Learning for Visible-Infrared Person Re-Identification
paper:https://arxiv.org/abs/2304.04205
code:https://github.com/jiawei151/sgiel_vireid
Image/Video Captioning
[1]Cross-Domain Image Captioning with Discriminative Finetuning
paper:https://arxiv.org/abs/2304.01662
code:https://github.com/facebookresearch/EGG
[1]Topology-Guided Multi-Class Cell Context Generation for Digital Pathology
paper:https://arxiv.org/abs/2304.02255
[2]Deep Prototypical-Parts Ease Morphological Kidney Stone Identification and are Competitively Robust to Photometric Perturbations
paper:https://arxiv.org/abs/2304.04077
code:https://github.com/danielf29/prototipical_parts
[3]Coherent Concept-based Explanations in Medical Image and Its Application to Skin Lesion Diagnosis
paper:https://arxiv.org/abs/2304.04579
code:https://github.com/cristianopatricio/coherent-cbe-skin
Image Generation/Image Synthesis
[1]Toward Verifiable and Reproducible Human Evaluation for Text-to-Image Generation
paper:https://arxiv.org/abs/2304.01816
[2]Few-shot Semantic Image Synthesis with Class Affinity Transfer
paper:https://arxiv.org/abs/2304.02321
Point Cloud
[1]MEnsA: Mix-up Ensemble Average for Unsupervised Multi Target Domain Adaptation on 3D Point Clouds
paper:https://arxiv.org/abs/2304.01554
code:https://github.com/sinashish/mensa_mtda
Scene Reconstruction/View Synthesis/Novel View Synthesis
[1]Lift3D: Synthesize 3D Training Data by Lifting 2D GAN to 3D Generative Radiance Field
paper:https://arxiv.org/abs/2304.03526
[2]POEM: Reconstructing Hand in a Point Embedded Multi-view Stereo
paper:https://arxiv.org/abs/2304.04038
code:https://github.com/lixiny/poem
[3]Neural Residual Radiance Fields for Streamably Free-Viewpoint Videos
paper:https://arxiv.org/abs/2304.04452
[1]Towards Unified Scene Text Spotting based on Sequence Generation
paper:https://arxiv.org/abs/2304.03435
Neural Network Structure Design
[1]SMPConv: Self-moving Point Representations for Continuous Convolution
paper:https://arxiv.org/abs/2304.02330
code:https://github.com/sangnekim/smpconv
CNN
[1]VNE: An Effective Method for Improving Deep Representation by Manipulating Eigenvalue Distribution
paper:https://arxiv.org/abs/2304.01434
code:https://github.com/jaeill/CVPR23-VNE
Transformer
[1]METransformer: Radiology Report Generation by Transformer with Multiple Learnable Expert Tokens
paper:https://arxiv.org/abs/2304.02211
[2]MethaneMapper: Spectral Absorption aware Hyperspectral Transformer for Methane Detection
paper:https://arxiv.org/abs/2304.02767
[3]Visual Dependency Transformers: Dependency Tree Emerges from Reversed Attention
paper:https://arxiv.org/abs/2304.03282
code:https://github.com/dingmyu/dependencyvit
[4]Slide-Transformer: Hierarchical Vision Transformer with Local Self-Attention
paper:https://arxiv.org/abs/2304.04237
code:https://github.com/leaplabthu/slide-transformer
Graph Neural Networks (GNN)
[1]Adversarially Robust Neural Architecture Search for Graph Neural Networks
paper:https://arxiv.org/abs/2304.04168
Normalization/Regularization (Batch Normalization)
[1]Delving into Discrete Normalizing Flows on SO(3) Manifold for Probabilistic Rotation Modeling
paper:https://arxiv.org/abs/2304.03937
Model Training/Generalization
[1]Re-thinking Model Inversion Attacks Against Deep Neural Networks
paper:https://arxiv.org/abs/2304.01669
[2]Improved Test-Time Adaptation for Domain Generalization
paper:https://arxiv.org/abs/2304.04494
Long-Tailed Distribution
[1]Long-Tailed Visual Recognition via Self-Heterogeneous Integration with Knowledge Excavation
paper:https://arxiv.org/abs/2304.01279
code:https://github.com/jinyan-06/shike
Visual Representation Learning
[1]HNeRV: A Hybrid Neural Representation for Videos
paper:https://arxiv.org/abs/2304.02633
code:https://github.com/haochen-rye/hnerv
Multi-Modal Learning
[1]Detecting and Grounding Multi-Modal Media Manipulation
paper:https://arxiv.org/abs/2304.02556
code:https://github.com/rshaojimmy/multimodal-deepfake
[2]Learning Instance-Level Representation for Large-Scale Multi-Modal Pretraining in E-commerce
paper:https://arxiv.org/abs/2304.02853
[3]Vita-CLIP: Video and text adaptive CLIP via Multimodal Prompting
paper:https://arxiv.org/abs/2304.03307
code:https://github.com/talalwasim/vita-clip
Vision-Language
[1]Learning to Name Classes for Vision and Language Models
paper:https://arxiv.org/abs/2304.01830
[2]VLPD: Context-Aware Pedestrian Detection via Vision-Language Semantic Self-Supervision
paper:https://arxiv.org/abs/2304.03135
code:https://github.com/lmy98129/vlpd
[3]CrowdCLIP: Unsupervised Crowd Counting via Vision-Language Model
paper:https://arxiv.org/abs/2304.04231
code:https://github.com/dk-liang/crowdclip
[4]Improving Vision-and-Language Navigation by Generating Future-View Image Semantics
paper:https://arxiv.org/abs/2304.04907
Scene Graph Generation
[1]Devil's on the Edges: Selective Quad Attention for Scene Graph Generation
paper:https://arxiv.org/abs/2304.03495
Visual Reasoning/Visual Question Answering (VQA)
[1]Language Models are Causal Knowledge Extractors for Zero-shot Video Question Answering
paper:https://arxiv.org/abs/2304.03754
[1]Weakly supervised segmentation with point annotations for histopathology images via contrast-based variational model
paper:https://arxiv.org/abs/2304.03572
[2]Token Boosting for Robust Self-Supervised Visual Transformer Pre-training
paper:https://arxiv.org/abs/2304.04175
[3]SOOD: Towards Semi-Supervised Oriented Object Detection
paper:https://arxiv.org/abs/2304.04515
code:https://github.com/hamperdredes/sood
[4]Defending Against Patch-based Backdoor Attacks on Self-Supervised Learning
paper:https://arxiv.org/abs/2304.01482
code:https://github.com/ucdvision/patchsearch
Neural Network Interpretability
[1]Gradient-based Uncertainty Attribution for Explainable Bayesian Deep Learning
paper:https://arxiv.org/abs/2304.04824
Image Counting
[1]Density Map Distillation for Incremental Object Counting
paper:https://arxiv.org/abs/2304.05255
Others
[1]Bridging the Gap between Model Explanations in Partially Annotated Multi-label Classification
paper:https://arxiv.org/abs/2304.01804
code:https://github.com/youngwk/bridgegapexplanationpamc
[2]Knowledge Combination to Learn Rotated Detection Without Rotated Annotation
paper:https://arxiv.org/abs/2304.02199
[3]CloSET: Modeling Clothed Humans on Continuous Surface with Explicit Template Decomposition
paper:https://arxiv.org/abs/2304.03167
[4]DC2: Dual-Camera Defocus Control by Learning to Refocus
paper:https://arxiv.org/abs/2304.0328