While much of the work in the design of convolutional networks over the last five years has revolved around the empirical investigation of the importance of depth, filter sizes, and number of feature channels, recent studies have shown that branching, i.e., splitting the computation along parallel but distinct threads and then aggregating their outputs, represents a promising new dimension for significant improvements in performance. To combat the complexity of design choices in multi-branch architectures, prior work has adopted simple strategies, such as a fixed branching factor, the same input being fed to all parallel branches, and an additive combination of the outputs produced by all branches at aggregation points. In this work we remove these predefined choices and propose an algorithm to learn the connections between branches in the network. Instead of being chosen a priori by the human designer, the multi-branch connectivity is learned simultaneously with the weights of the network by optimizing a single loss function defined with respect to the end task. We demonstrate our approach on the problem of multi-class image classification using four different datasets, where it yields consistently higher accuracy than the state-of-the-art ResNeXt multi-branch network given the same learning capacity.
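The core idea, learning connectivity jointly with the weights under a single loss, can be illustrated with a minimal toy sketch. This is not the paper's actual method or API: it simply replaces a fixed additive aggregation of parallel branches with soft, learnable gates (one logit per branch, squashed through a sigmoid), which would receive gradients from the same loss as the branch weights. All names and the linear-branch setup below are illustrative assumptions.

```python
import numpy as np

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

rng = np.random.default_rng(0)

# Hypothetical toy setup: three parallel "branches" (simple linear maps)
# whose outputs meet at an aggregation point.  Instead of a fixed additive
# combination, each branch j carries a learnable gate g_j = sigmoid(theta_j);
# in training, theta would be updated by the same loss gradient as W, so the
# connectivity pattern is learned end-to-end rather than fixed a priori.
dim, n_branches = 4, 3
W = rng.normal(size=(n_branches, dim, dim))   # per-branch weights
theta = np.zeros(n_branches)                  # gate logits (g = 0.5 at init)

def forward(x, W, theta):
    gates = sigmoid(theta)                      # soft connectivity mask
    branch_outs = np.einsum('bij,j->bi', W, x)  # each branch computes W_b @ x
    return gates @ branch_outs                  # gated aggregation of branches

x = rng.normal(size=dim)
y = forward(x, W, theta)
```

With all logits at zero every gate is 0.5, so the aggregation starts close to the conventional additive combination; training could then push individual gates toward 0 or 1, pruning or keeping branch connections.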