scholarly-entity-usage-detection/usage-classificator/annotations_method.csv at master · michaelfaerber/scholarly-entity-usage-detection · GitHub

;doc_id;relation;used;ner;sentence;pre_sentence;post_sentence;section_name;section_index
30875;10203151008a20b32ce089f7f9d580005c2426cf;Method;0;convolutional layer activations;In particular for image retrieval , Babenko et al . and Gong et al . concurrently propose the use of Fully Connected ( FC ) layer activations as descriptors , while convolutional layer activations are later shown to have superior performance .;Using CNN layer activations as off - the - shelf image descriptors appears very effective and is adopted in many tasks .;Generalization to other tasks is attained by CNN activations , at least up to some extent .;Introduction;1
97801;4087ebc37a1650dbb5d8205af0850bee74f3784b;Method;1;weight initialization;A poor weight initialization may take longer to train and / or converge to sub - optimal solutions .;Optimal parameter initialization remains a crucial problem for neural network training .;Here , we propose a method of weight re - initialization by repeated annealing and injection of noise in the training process .;Abstract;1
54406;220a0b46840a2a1421c62d3d343397ab087a3f17;Method;0;Spatio - temporal filters;Spatio - temporal filters .;Of course spatial pyramids are widely used in other areas of computer vision and have recently been used in deep neural networks to learn generative image models .;Burt and Adelson lay out the theory of spatio - temporal models for motion estimation and Heeger provides a computational embodiment .;Related Work;2
103733;435259c5f3cffd75ef837a8e638cc8f6244e25c4;Method;0;sliding - window strategy;A naive approach follows a sliding - window strategy , where regions defined by the window are processed independently .;Originally designed for image recognition and classification , CNNs are now commonly used in semantic image segmentation .;As explained before , this technique presents two main drawbacks : reduction of segmentation accuracy and low efficiency .;Methods;4
17186;0a053f55804eee01f3c8b4138a1d3364d5bc45ac;Method;0;Neural LP;IRN and Neural LP explore multi - step relations by using an RNN controller with attention over an external memory .;Hence , recent works have proposed approaches for injecting multi - step paths such as random walks through sequences of triples during training , further improving performance on KBC tasks .;Compared to RL - based approaches , it is hard to interpret the traversal paths , and these models can be computationally expensive to access the entire graph in memory .;Knowledge Base Completion;15
38388;15cc54ed7b1582b2efd71bedf28b23634d82991b;Method;1;MMD - rep - b;[ reference ] shows that : When PICO was used , the hinge , MMD - rbf and MMD - rep methods were sensitive to the choices of while MMD - rep - b was robust .;Fig .;For hinge and MMD - rbf , higher may result in better FID scores and less diverged cases over 16 learning rate combinations .;Experiments;37
27104;0ee850dd6640a96531ac5ad21da5438db04d8b3c;Method;1;deep neural ranking models;In particular , we use classic unsupervised IR models as a weak supervision signal for training deep neural ranking models .;To overcome this issue , in this paper , we propose to leverage large amounts of unsupervised data to infer “ noisy ” or “ weak ” labels and use that signal for learning supervised models as if we had the ground truth labels .;Weak supervision here refers to a learning approach that creates its own training data by heuristically retrieving documents for a large query set .;Introduction;1
25448;0d5fa5be4bfe085de8f88dbee1c3b2a6e5ab9ee2;Method;1;gradual feature fusion steps;With proposed gradual feature fusion steps and cascade label guidance structure , we produce decent prediction results .;[ reference ] and [ reference ] show the visual results of ICNet on Cityscapes .;Intriguingly , output of the ‘ sub4 ’ branch can already capture most of semantically meaningful objects .;Visual Improvement;24
12675;071b16f25117fb6133480c6259227d54fc2a5ea0;Method;0;long short - term memory ( LSTM ) unit;This gated unit is similar to a long short - term memory ( LSTM ) unit proposed earlier by , sharing with it the ability to better model and learn long - term dependencies .;The gated hidden unit is an alternative to the conventional simple units such as an element - wise .;This is made possible by having computation paths in the unfolded RNN for which the product of derivatives is close to 1 .;Recurrent Neural Network;23
72738;2e10643c3759f97b673ff8c297778c0b6c20032b;Method;1;distributed word representation;Comparing with traditional models , this suggests such a simple use of a distributed word representation may not give us an advantage to text classification .;One of the most obvious facts one could observe from table [ reference ] and figure [ reference ] is that the bag - of - means model performs worse in every case .;However , our experiments does not speak for any other language processing tasks or use of word2vec in any other way .;Discussion;12
107712;46018a894d533813d67322827ca51f78aed6d59e;Method;1;basic model;The segmentation results on two subjects from our validation set , produced by different variations of the basic model can be viewed in Figure [ reference ] .;However , we can study the impact these features have on predictions by visualizing the segmentation results of different models .;As shown in the figure , the two - phase training procedure allows the model to learn from a more realistic distribution of labels and thus removes false positives produced by the model which trains with one training phase .;The TwoPathCNN architecture;14
4393;027f9695189355d18ec6be8e48f3d23ea25db35d;Method;0;Concrete distribution;Gumbel - Softmax ( Jang , Gu , and Poole 2017 ) ( or Concrete distribution [ reference ] ) is a method of utilizing discrete random variables in a network .;;Since it approximates one - hot vectors sampled from a categorical distribution by making them continuous , gradients of model parameters can be calculated using the reparameterization trick and the standard backpropagation .;Gumbel - Softmax;6
13557;07c4fc48ad7b7d1a417b0bb72d0ae2d4efc5aa83;Method;0;convolution window size;subsection : Filter dilation and convolution window size;In particular , in our experiments we always alternate and .;Filter dilation , as introduced in , is a technique for aggregating multiscale information across considerably larger receptive fields in convolution operations , while avoiding an explosion in parameter count for the convolution kernels .;Filter dilation and convolution window size;5
5922;03184ac97ebf0724c45a29ab49f2a8ce59ac2de3;Method;1;Attribute representations;Attribute representations are defined as a vector per class , or a column of the ( class attribute ) matrix .;Discrete vs Continuous Attributes .;These vectors ( 85 - dim for AWA , 312 - dim for CUB ) can either model the presence / absence ( ) or the confidence level ( ) of each attribute .;Experimental Results;13
34427;12f008bea798a05ebfa2864ec026999cb375bcd9;Method;1;AS Reader;Lastly , on CBT - CN the GA Reader with the qe - comm feature outperforms all previously published single models except the NSE , and AS Reader trained on a larger corpus .;For CBT - NE , GA Reader with the qe - comm feature outperforms all previous single and ensemble models except the AS Reader trained on the much larger BookTest Corpus bajgar2016embracing .;For each of the 4 datasets on which GA achieves the top performance , we conducted one - sample proportion tests to test whether GA is significantly better than the second - best baseline .;Performance Comparison;11
4893;02a5b7a41ffa8518eb3b7cae9914a2bd2bbc886b;Method;1;Min - max strategy;To initialise SiamMask , we extract the axis - aligned bounding box ( Min - max strategy , Figure 3 ) from the mask provided in the first frame .;For each measure C ∈ { J , F } , three statistics are considered : mean C M , recall C O , and decay C D , which informs us about the gain / loss of performance over time [ reference ] .;Similarly to most VOS methods , in case of multiple objects in the same video ( DAVIS - 2017 ) we simply perform multiple inferences .;Evaluation for semi - supervised VOS;12
61151;25c108a56e4cb757b62911639a40e9caf07f1b4f;Method;1;shallow version;The structure of our model is a shallow version of the ResNet where the first seven ResNet blocks are used , i.e. , from conv1 to res3c .;All faces are labelled with bounding boxes and five landmarks .;We use this model in scale - forecast network and LRN .;Setup and Implementation Details;9
27028;0ecd4fdce541317b38124967b5c2a259d8f43c91;Method;1;Markovian;The RAM is therefore a relatively compact representation of the game state , and in contrast to the game screen , it is also Markovian .;The Atari 2600 has only bits of random access memory , which must hold the complete internal state of a game : location of game entities , timers , health indicators , etc .;The purpose of our RAM - based agent is to investigate whether features generated from the RAM affect performance differently from features generated from game screens .;RAM - based Feature Generation;42
79855;33a8d0a35390fde736744d4a0dd20dff7961c777;Method;1;dropout technique;In between intermediate layers , we use batch normalization and dropout technique to prevent overfitting along with norm regularization .;Output of layer is then passed to two fully connected layers with again output dimensions and finally connects to a softmax layer for computing class probabilities .;We set depending upon the dataset size ( towards higher for larger dataset ) and for setting hidden dimension .;Experiment and Results;12
52443;207e0ac5301a3c79af862951b70632ed650f74f7;Method;1;deep learning based model;In addition , deep learning based model is also compared .;Note that XQDA can be considered as hybrid between metric learning and subspace learning .;For fair comparison , whenever possible ( i.e. code is available and features can be replaced ) , we compare with these methods using the same LOMO features .;Fully Supervised Learning Results;12
46567;1bc072002d97808340b312b69427baf2dc9fcb8e;Method;1;feature transformation methods;To get our DNNs efficiently work , we propose to leverage three feature transformation methods , i.e. , factorisation machines ( FMs ) , restricted Boltzmann machines ( RBMs ) and;To tackle the issue , we propose two novel models using deep neural networks ( DNNs ) to automatically learn effective patterns from categorical feature interactions and make predictions of users ’ ad clicks .;denoising auto - encoders ( DAEs ) .;Deep Learning over Multi - field Categorical Data;0
102438;42f20d37f4eba56284a941d5f9f58609ee650de0;Method;0;assumed kernel;On the other hand , when the assumed kernel is sharper than the true kernel , high frequency ringing artifacts will appear .;Most of SISR methods actually favor for such case .;;Blur kernel .;5
33378;1109b663453e78a59e4f66446d71720ac58cec25;Method;1;localization approach;Our localization approach won the 2013 ILSVRC competition and significantly outperformed all 2012 and 2013 approaches .;The scheme we propose involves substantial modifications to networks designed for classification , but clearly demonstrate that ConvNets are capable of these more challenging tasks .;The detection model was among the top performers during the competition , and ranks first in post - competition results .;Discussion;15
98917;40eb1e54cb5382dfd3b7efd16dc7df826262ea52;Method;1;box estimation network;The center residual predicted by the box estimation network is combined with the previous center residual from the T - Net and the masked points ’ centroid to recover an absolute center ( Eq .;We take a “ residual ” approach for box center estimation .;[ reference ] ) .;Amodal 3D Box Estimation PointNet;12
104535;4402c6c8445f17f4161e0f64573b7e28df1ca180;Method;1;relu function;Compared with the sigmoidal family , relu function has the advantages of sparsity and efficient gradient , which is possible to gain more benefits on multi - field categorical data .;x ) .;Figure 5 compares these activation functions on FNN , IPNN and OPNN .;1 ) Embedding Layer :;14
17068;0a053f55804eee01f3c8b4138a1d3364d5bc45ac;Method;1;embedding - based models;We compare against RL - based methods , embedding - based models ( including DistMult , ComplEx and ConvE ) and recent work in logical rules ( NeuralLP ) .;We use HITS@1 , 3 and mean reciprocal rank ( MRR ) as the evaluation metrics for WN18RR , and use mean average precision ( MAP ) for NELL995 , where HITS@ computes the percentage of the desired entities being ranked among the top - list , and MRR computes an average of the reciprocal rank of the desired entities .;For all the baseline methods , we used the implementation released by the corresponding authors with their best - reported hyperparameter settings .;Knowledge Base Completion;10
72024;2dad7e558a1e2982d0d42042021f4cde4af04abf;Method;1;dilated models;Also , the dilated models outperform their regular counterparts , Vanilla ( did n’t converge , omitted ) , LSTM and GRU , without increasing the model complexity .;To the best of our knowledge , the dilated GRU with 1.27 BPC achieves the best result among models of similar sizes without layer normalizations .;;Language modeling;17
72901;2e4c06dd00c4c09ad5ac6be883cc66c19d88ea79;Method;1;augmented MBCE loss;In this work , we enforce both inputs to be in the range for simplicity and improved performance , and compute the augmented MBCE loss as follows : where is the standard cross - entropy loss with sigmoid function .;"The motivation behind this design is to maintain flexibility to handle different input formats ; the input is usually binary , but the input can be binary , real - valued , or both .";At inference time , we use the reconstructed output for link prediction and disregard the output .;Link Prediction;3
29656;0fbd17a4f791e04bbf8f240f7c48c178900e30a6;Method;1;smooth loss;We used Focal loss with and smooth loss for classification and bbox regression , respectively .;We optimized subnet with Adam starting from learning rate 1e - 5 and is decreased by a factor of 0.1 in plateaux .;We obtained final proposals using NMS with a threshold of 0.3 .;Person Detection Subnet :;15
7767;051b3763c2ad4e4271db712b0e9a4cfe298d05db;Method;1;f - warp layer;Unlike Spatial Transformer , f - warp layer is not fully constrained and is a relaxed version of it as the flow field is not parameterized .;Our network uses the proposed f - warp layer to displace each channel of the given vector - valued feature according to the provided flow field .;While transformation in FlowNet2 and SPyNet is limited to images , our decider network is a more generic warping network that warps high - level CNN features .;Related Work;2
18807;0a3a003457f5d7758a42a0e4b7278b39a86ed0bd;Method;1;meta - batch strategy;Using our HT meta - batch strategy , hard tasks are sampled every time after running meta - batches , , the failure classes used for sampling hard tasks are from tasks .;The size of meta - batch is set to ( tasks ) due to the memory limit .;"The number of hard task is selected for different settings by validation : and hard tasks respectively for the 1 - shot and 5 - shot experiments on the miniImageNet dataset ; and respectively , and hard tasks for the 1 - shot , 5 - shot and 10 - shot experiments on the FC100 dataset .";Implementation details;17
93702;3d734edc41c13fb4da2c3709e8255b004d083962;Method;1;MULTI - SCALE INFORMATION LEARNING NETWORK STRUCTURE;section : PROPOSED MULTI - SCALE INFORMATION LEARNING NETWORK STRUCTURE;Furthermore , the output of the dilated convolution can keep the same size with its input , so we can fuse these different scale information through concatenation operator easily .;The configuration of our proposed single image SR network structure is outlined in Fig .;PROPOSED MULTI - SCALE INFORMATION LEARNING NETWORK STRUCTURE;3
99486;41232a69c0f8d4b993e6c6e00b16c223442c962f;Method;1;sequence - to - sequence;The dual - attention sequence - to - sequence framework is then proposed to force the generation conditioned on both the source text and the extracted fact descriptions .;To avoid generating fake facts in a summary , we leverage open information extraction and dependency parse technologies to extract actual fact descriptions from the source text .;Experiments on the Gigaword benchmark dataset demonstrate that our model can greatly reduce fake summaries by 80 % .;Faithful to the Original : Fact Aware Neural Abstractive Summarization;0
103516;434bf475addfb580707208618f99c8be0c55cf95;Method;0;sparse feature representation;This helps creating a sparse feature representation .;When initializing the weights uniformly , half of the weights are negative .;Another positive aspect is the relatively cheap computation .;Rectified Linear Unit;6
71595;2d876ed1dd2c58058d7197b734a8e4d349b8f231;Method;0;cuDNN library ’s RNN primitives;It is also important to note that the cuDNN library ’s RNN primitives do not natively support any form of recurrent dropout .;"Note that the softmax , over a vocabulary size of only 10 , 000 words , is relatively small ; for tasks with larger vocabularies , the softmax would likely dominate computation time .";That is , running an LSTM that uses a state - of - the - art regularization scheme at cuDNN - like speeds would likely require an entirely custom kernel .;Language Modeling;6
86649;36c3972569a6949ecca90bfa6f8e99883e092845;Method;1;Detectron models;Second , we choose models trained with different settings , , the tweaked up - down model trained on the VQA dataset with / without data augmentation and models trained with image features extracted from different Detectron models with / without data augmentation .;As can be seen from Fig [ reference ] , the performance plateaus at 70.96 % .;As can be seen , this ensembling strategy is much more effective than the previous one .;Model Ensembling;8
53325;20cc4bfdb648fd7947c71252589fc867d4d16933;Method;1;PC;To qualitatively study the improvement in localization due to PC , we obtain samples from the CUB - 200 - 2011 dataset and visualize the localization regions returned from Grad - CAM for both the baseline and PC - trained VGG - 16 model .;Change in Class - Activation Mapping :;As shown in Figure 3 , PC models provide tighter , more accurate localization around the target object , whereas sometimes the baseline model has localization driven by image artifacts .;Improvement in Localization Ability;15
35823;143a3186c368544ded00a444be33153420baa254;Method;1;pretraining baseline;"The pretraining baseline in the main text trained a single network on all tasks , which we referred to as "" pretraining on all tasks "" .";;To evaluate the model , as with MAML , we fine - tuned this model on each test task using K examples .;C.1 . Multi - task baselines;23
28073;0f0cab9235bbf185acdd4f9713fd111ca50effca;Method;0;LogEig layer;The final LogEig layer endows elements in Riemannian manifold with a Lie Group structure so that matrices can be flattened and standard euclidean operations can be applied .;As discussed earlier , SPD matrices lie on Riemannian manifold .;If be input matrix , be output matrix , the LogEig layer applied in - th layer is defined as where is an eigenvalue decomposition and is an element - wise matrix operation .;Log Eigenvalue Layer ( LogEig );13
15174;0899bb0f3d5425c88b358638bb8556729720c8db;Method;1;pinhole camera model;The pinhole camera model yields the distance estimate with synthetic rendering distance and focal lengths , of the test sensor and synthetic views .;At test time , we compute the ratio between the detected bounding box diagonal and the corresponding codebook diagonal , i.e. at similar orientation .;It follows that with principal points and bounding box centers .;Projective Distance Estimation .;20
47483;1c0e8c3fb143eb5eb5af3026eae7257255fcf814;Method;1;Overfeat - 7;We also compare WSDDN to the SPP - net [ reference ] which uses the Overfeat - 7 [ reference ] with a 4 - level spatial pyramid pooling layer { 6 × 6 , 3 × 3 , 2 × 2 , 1 × 1 } for supervised object detection .;WS - DDN S and M improves 8 and 7 points over VGG - F and VGG - M - 1024 respectively .;While they do not perform fine - tuning , they include a spatial pooling layer .;Classification Results;13
33740;128c727ac06fcc50f1735cb222a441eee6adcab6;Method;0;linear state - of - the - art models;We further show that previous linear state - of - the - art models , RESCAL , DistMult , ComplEx and SimplE , are all special cases of our model .;As well as being fully expressive , TuckER ’s number of parameters grows linearly with respect to embedding dimension as the number of entities or relations in a knowledge graph increases .;Future work might include exploring various means of softly regularizing the model other than dropout and finding a way to incorporate background knowledge on individual relation properties into the existing model .;Conclusion;19
43136;19839ffab4c30db1556d7fd9275d1344a6e3fa46;Method;1;0.2 dropout;We apply 0.5 dropout to the word embeddings and character CNN outputs and 0.2 dropout to all hidden layers and feature embeddings .;Training Details During training , we use the categorical cross - entropy as objective , with Adam optimizer initial learning rate 0.001 .;In the LSTMs , we employ variational dropout masks that are shared across timesteps , with 0.4 dropout rate .;Hyperparameters;16
63118;27c761258329eddb90b64d52679ff190cb4527b5;Method;1;recurrent convolution networks;This research demonstrates two modified and improved segmentation models , one using recurrent convolution networks , and another using recurrent residual convolutional networks .;Thus , it is important to design efficient DCNN architectures for segmentation tasks which can ensure better performance with less number of network parameters .;To accomplish our goals , the proposed models are evaluated on different modalities of medical imagining as shown in Fig .;I. INTRODUCTION;2
38702;15e07c1344e97e46ade2ee0a57017371fa05fe12;Method;1;Margin - based scoring;subsection : Margin - based scoring;We next describe our scoring method inspired by this idea in Section [ reference ] , and discuss our candidate generation and filtering strategy in Section [ reference ] .;In order to account for the relative scale of cosine similarity and provide a globally consistent measure , our proposed scoring function considers the margin between the cosine of a given candidate and the average cosine of its nearest neighbors in both directions as follows : where denotes the nearest neighbors of in the other language , and analogously for .;Margin - based scoring;5
96752;3febb2bed8865945e7fddc99efd791887bb7e14f;Method;1;top LSTM layer;More specifically , we learn a linear combination of the vectors stacked above each input word for each end task , which markedly improves performance over just using the top LSTM layer .;Unlike previous approaches for learning contextualized word vectors Peters2017SemisupervisedST , McCann2017LearnedIT , ELMo representations are deep , in the sense that they are a function of all of the internal layers of the biLM .;Combining the internal states in this manner allows for very rich word representations .;Introduction;1
103164;43428880d75b3a14257c3ee9bda054e61eb869c0;Method;na;Nvidia M40 GPU;[ reference ] . All models are implemented in Torch [ reference ] and trained on a single Nvidia M40 GPU except for WMT'14 EnglishFrench for which we use a multi - GPU setup on a single machine .;Besides dropout on the embeddings and the decoder output , we also apply dropout to the input of the convolutional blocks;"We train on up to eight GPUs synchronously by maintaining copies of the model on each card and split the batch so that each worker computes 1 / 8 - th of the gradients ; at the end we sum the gradients via Nvidia NCCL .";Model Parameters and Optimization;14
37417;15212fa4d30863ea1f9bd9591eee03848278242d;Method;0;JaTeCS framework;A drawback of JaDCI is thus that , for the researcher wishing to replicate the results of or simply wishing to use JaDCI , a substantive effort in installing and properly configuring the entire JaTeCS framework is thus needed .;JaTeCS is a complex package , since it makes available many functionalities for text analytics research .;In this paper we present PyDCI , a new implementation of the DCI method written in Python and built on top of the SciPy stack and scikit - learn toolkit .;Introduction;1
4396;027f9695189355d18ec6be8e48f3d23ea25db35d;Method;0;GumbelSoftmax;GumbelSoftmax is known to have an advantage over score - functionbased gradient estimators such as REINFORCE [ reference ] which suffer from high variance and slow convergence;Since it approximates one - hot vectors sampled from a categorical distribution by making them continuous , gradients of model parameters can be calculated using the reparameterization trick and the standard backpropagation .;[ reference ] .;Gumbel - Softmax;6
105366;4508f81033c9a7cec785ce4d16f1193920c1b341;Method;1;neural sequence models;Table [ reference ] lists recent results of various neural sequence models on the Wikipedia dataset .;At each step we sample a batch of sequences of 500 characters each , use the first 100 characters as the minimum context and predict the latter 400 characters .;All the results except for the ByteNet result are obtained using some variant of the LSTM recurrent neural network .;Character Prediction;14
1673;01125e3c68edb420b8d884ff53fb38d9fbe4f2b8;Method;1;volumetric regression network;[ t ] 1 [ t ] 0.98 [ t ] 1 We train our volumetric regression network using the sigmoid cross entropy loss function : where is the corresponding sigmoid output at voxel of the regressed volume .;The second hourglass is used to refine this output , and has an identical structure to that of the first one .;At test time , and given an input 2D image , the network regresses a 3D volume from which the outer 3D facial mesh is recovered .;Volumetric Regression Networks;7
37113;1518039b5001f1836565215eb047526b3ac7f462;Method;0;variable - length encoding of words;The main difference to other compression algorithms , such as Huffman encoding , which have been proposed to produce a variable - length encoding of words for NMT [ reference ] , is that our symbol sequences are still interpretable as subword units , and that the network can generalize to translate and produce new words ( unseen at training time ) on the basis of these subword units .;In practice , we increase efficiency by indexing all pairs , and updating data structures incrementally .;Figure 1 shows a toy example of learned BPE operations .;Byte Pair Encoding ( BPE );6
73713;2e942d19333651bf6012374ea9e78d6937fd33ac;Method;0;deformable part models;Besides the boosted cascade methods , several studies apply deformable part models ( DPM ) for face detection .;After that , numerous of works have focused on developing more advanced features and more powerful classifiers .;The DPM methods detect faces by modeling the relationship of deformable facial parts .;Related Work;2
63830;27e4b65121d3c88643d86dc91a9bdafdf223b988;Method;0;log - linear extractive summarization model;Since these corpora are too small to train large neural networks on , namas trained their models on the Gigaword corpus , but combined it with an additional log - linear extractive summarization model with handcrafted features , that is trained on the DUC 2003 corpus .;The DUC corpus comes in two parts : the 2003 corpus consisting of 624 document , summary pairs and the 2004 corpus consisting of 500 pairs .;They call the original neural attention model the ABS model , and the combined model ABS + .;DUC Corpus;10
14526;0875fc92cce33df5cf7df169590dbf0ca00d2652;Method;0;dynamic sentence representation;Given the caption representation from the language model , , the operator outputs a dynamic sentence representation at each step through a weighted sum using alignment probabilities : The corresponding alignment probability for the word in the caption is obtained using the caption representation and the current hidden state of the generative model : where , , and are the learned model parameters of the alignment model .;[ reference ] ) , with and initialized to learned biases : The function is used to compute the alignment between the input caption and intermediate image generative steps bahdanau_mt .;The function of Eq .;Image Model : the Conditional DRAW Network;5
34465;12f008bea798a05ebfa2864ec026999cb375bcd9;Method;1;information filters;Our model design is backed up by an ablation study showing statistically significant improvements of using Gated Attention as information filters .;Our model achieves the state - of - the - art performance on several large - scale benchmark datasets with more than 4 % improvements over competitive baselines .;We also showed empirically that multiplicative gating is superior to addition and concatenation operations for implementing gated - attentions , though a theoretical justification remains part of future research goals .;Conclusion;15
43993;1a0912bb76777469295bb2c059faee907e7f3258;Method;1;deconv layer;The keypoint head consists of a stack of eight 3×3 512 - d conv layers , followed by a deconv layer and 2× bilinear upscaling , producing an output resolution of 56×56 .;We adopt the ResNet - FPN variant , and the keypoint head architecture is similar to that in Figure 4 ( right ) .;We found that a relatively high resolution output ( compared to masks ) is required for keypoint - level localization accuracy .;Mask R - CNN for Human Pose Estimation;11
37382;15212fa4d30863ea1f9bd9591eee03848278242d;Method;1;Distributional Correspondence Indexing;This paper introduces PyDCI , a new implementation of Distributional Correspondence Indexing ( DCI ) written in Python .;;DCI is a transfer learning method for cross - domain and cross - lingual text classification for which we had provided an implementation ( here called JaDCI ) built on top of JaTeCS , a Java framework for text classification .;Revisiting Distributional Correspondence Indexing : A Python Reimplementation and New Experiments;0
63076;27c761258329eddb90b64d52679ff190cb4527b5;Method;1;pixel - based approach;However , to switch from the patch - based approach to the pixel - based approach that works with the entire image , we must be aware of the class imbalance problem .;In this work , we have evaluated the proposed approaches on both patch - based and entire image - based approaches .;In the case of semantic segmentation , the image backgrounds are assigned a label and the foreground regions are assigned a target class .;I. INTRODUCTION;2
12385;07045f87709d0b7b998794e9fa912c0aba912281;Method;1;stochastic gradient descent implementation of Caffe;The input images and their corresponding segmentation maps are used to train the network with the stochastic gradient descent implementation of Caffe .;;Due to the unpadded convolutions , the output image is smaller than the input by a constant border width .;Training;3
58974;23f5854b38a15c2ae201e751311665f7995b5e10;Method;1;neural generative model;We propose a neural generative model with multinomial conditional likelihood .;Vaes generalize linear latent - factor models and enable us to explore non - linear probabilistic latent - variable models , powered by neural networks , on large - scale recommendation datasets .;Despite being widely used in language modeling and economics [ reference ][ reference ] , multinomial likelihoods appear less studied in the collaborative filtering literature , particularly within the context of latent - factor models .;INTRODUCTION;2
512;0012de6bec1f25599e4f02517637e531a71909b9;Method;1;network predictions;The network predictions , which consist of two volumes having the same resolution as the original input data , are processed through a soft - max layer which outputs the probability of each voxel to belong to foreground and to background .;;In medical volumes such as the ones we are processing in this work , it is not uncommon that the anatomy of interest occupies only a very small region of the scan .;Dice loss layer;3
98871;40eb1e54cb5382dfd3b7efd16dc7df826262ea52;Method;0;depth map;Given a 2D image region ( and its corresponding 3D frustum ) , several methods might be used to obtain 3D location of the object : One straightforward solution is to directly regress 3D object locations ( e.g. , by 3D bounding box ) from a depth map using 2D CNNs .;;However , this problem is not easy as occluding objects and background clutter is common in natural scenes ( as in Fig .;3D Instance Segmentation;8
5059;02e85d62fbd8249a046d00ac10e39546511b2a51;Method;0;multi - atlas label propagation approach;Both works incorporate morphological and contextual features to better capture the heterogeneity of lesions . also incorporate brain structure segmentation results obtained from a multi - atlas label propagation approach ( ) to provide strong tissue - class priors to the Random Forests .;This framework was adopted in multiple works , with representative pipelines for brain tumors by and TBI by .;additionally use a Markov Random Field ( MRF ) to incorporate spatial regularization .;Related Work;2
49377;1d8653d9fca853a8e3727fa7d8f5ec0631cad08f;Method;1;Eigen v3;Eigen v3 was used for the fast sparse tensor operations , using the provided CSC and CSR formats .;We used Torch7 to implement SAM , DAM , NTM , DNC and SDNC .;All benchmarks were run on a Linux desktop running Ubuntu 14.04.1 with 32GiB of RAM and an Intel Xeon E5 - 1650 3.20GHz processor with power scaling disabled .;Benchmarking details;33
42836;18d62040534012818abb90e37eade5dab6dca716;Method;0;neural sequence - to - sequence models;We also show that our classifier can be used to improve the performance of neural sequence - to - sequence models for generating questions for reading comprehension .;We construct and release a dataset of 25 , 100 publicly available questions classified into well - formed and non - wellformed categories and report an accuracy of 70.7 % on the test set .;;Identifying Well - formed Natural Language Questions;0
79362;33998aff64ce51df8dee45989cdca4b6b1329ec4;Method;0;multi - head attention;[ reference ] . Applying multi - head attention multiplies the storage and parameter requirements by a factor of K , while the individual heads ' computations are fully independent and can be parallelized .;This complexity is on par with the baseline methods such as Graph Convolutional Networks ( GCNs );As opposed to GCNs , our model allows for ( implicitly ) assigning different importances to nodes of a same neighborhood , enabling a leap in model capacity .;COMPARISONS TO RELATED WORK;5
15170;0899bb0f3d5425c88b358638bb8556729720c8db;Method;1;ResNet50 backbone;We also train RetinaNet with ResNet50 backbone which is slower but more accurate .;We finetune SSD with VGG16 base using object recordings on black background from different viewpoints which are provided in the training datasets of LineMOD and T - LESS .;Multiple objects are copied in a scene at random orientation , scale and translation .;Training the Object Detector .;19
9761;05ee231749c9ce97f036c71c1d2d599d660a8c81;Method;1;MatConvNet;The network is trained using stochastic gradient descent with momentum , implemented in MatConvNet;Training details .;[ reference ] .;Implementation details;8
76498;303fef411f235e6d1125a40af1e93224f498a4d5;Method;0;generalization of Ngram language models;However , their focus was to improve the generalization of Ngram language models via a sparse plus low - rank approximation .;In language modeling , hutchinson2011low , hutchinson2012sparse have previously considered the problem from a matrix rank perspective .;By contrast , as neural language models already generalize well , we focus on a high - rank neural language model that improves expressiveness without sacrificing generalization .;Related work;20
40689;16cd50316e41cbb1d9dfeafeb524b31654cef37a;Method;1;LSTM - LM setup;subsection : LSTM - LM setup;While the RNN - LM estimates a probability for unknown words , we take a different approach in rescoring : The number of out - of - set words is recorded for each hypothesis and a penalty for them is estimated for them when optimizing the relative weights for all model scores ( acoustic , LM , pronunciation ) , using the SRILM nbest - optimize tool .;After obtaining good results with RNN - LMs we also explored the LSTM recurrent network architecture for language modeling , inspired by recent work showing gains over RNN - LMs for conversational speech recognition .;LSTM - LM setup;11
85161;3652c2d20f198dc39ad159eba55d08341c56d628;Method;1;dimensional spatial maps;More precisely , our scheme approximates the kernel map of defined in ( [ reference ] ) at layer by finite - dimensional spatial maps , where is a set of coordinates related to , and is a positive integer controlling the quality of the approximation .;In this section , we show that when the coordinate sets are two - dimensional regular grids , a natural approximation for the multilayer convolutional kernel consists of a sequence of spatial convolutions with learned filters , pointwise non - linearities , and pooling operations , as illustrated in Figure [ reference ] .;Consider indeed two images represented at layer by image feature maps and , respectively .;Training Invariant Convolutional Kernel Networks;12
49862;1e21b925b65303ef0299af65e018ec1e1b9b8d60;Method;0;feed - forward methods;While initially style transfer was obtained by a slow optimization process styletransfer , recently , the emphasis was put on feed - forward methods ulyanov16texture , Johnson2016Perceptual .;We do not employ style losses in our method .;There are many links between style transfer and our work : both are unsupervised and generate a sample under constancy given an input sample .;Related work;2
56755;231af7dc01a166cac3b5b01ca05778238f796e41;Method;na;ADAM procedure;This framework allows to show convergence for gradient descent methods beyond stochastic gradient like for the ADAM procedure where current learning parameters are memorized and updated .;;The random processes and may track the current learning status for the fast and slow iterate , respectively .;Comments .;36
73809;2e942d19333651bf6012374ea9e78d6937fd33ac;Method;0;position - sensitive RoI pooling;In the original R - FCN work , global average pooling is adopted to aggregate the features after position - sensitive RoI pooling into a single dimension .;;This operation leads to the uniform contribution of each position of the face .;Position - Sensitive Average Pooling;5
91504;3b1b94441010615195a5c404409ce2416860508c;Method;1;Stochastic gradient Descent;We use Stochastic gradient Descent ( SGD ) with mini - batches of 100 image - QA pairs .;Note that is a regularization term , where .;The attributes , internal textual representation , external knowledge embedding size , word embedding size and hidden state size are all 256 in all experiments .;An Answer Generation Model with Multiple Inputs;12
11517;063ad0349f05c8aacbbb653ffcf01047a293fa30;Method;1;BoW representation;Separating the left and the right context ( LR - Left - Right ) for BoW representation , does not improve the performance .;We also experimented with adding tri - grams but it did not have a positive effect on the overall scores .;Left - right pooling of dense embeddings performed weakly in comparison with other representations and therefore their results were omitted .;Results;26
78182;325093f2c5b33d7507c10aa422e96aa5b10a33f1;Method;0;Squeeze - and - Excitation Networks;We introduce a novel and unified layer that replaces the commonly used succession of batch normalization ( BN ) and nonlinear activation layers ( Act ) , which are integral with modern deep learning architectures like ResNet , ResNeXt , Inception - ResNet , WideResNet , Squeeze - and - Excitation Networks , DenseNet , .;In this work , we focus on increasing the memory efficiency of the training process of modern network architectures in order to further leverage performance of deep neural networks in tasks like image classification and semantic segmentation .;Our solution is coined InPlace - ABN and proposes to merge batch normalization and activation layers in order to enable in - place computation , using only a single memory buffer for storing the results ( see illustration in Figure [ reference ] ) .;Introduction;1
17901;0a34fe39e9938ae8c813a81ae6d2d3a325600e5c;Method;1;global alignment;In fact , it was recently claimed that a global alignment is both more robust and far faster to warp than non - parametric transformations .;Such aligned faces are then further processed in systems for face recognition , emotion recognition , age and gender estimation , and more .;This paper focuses on such global transformations , showing how they can be estimated quickly and accurately using a deep neural network .;Related work;2
43352;19fd2c2c9d4eecb3cf1befa8ac845a860083e8e7;Method;1;distributed framework;We applied our distributed framework for RL , known as Gorila ( General Reinforcement Learning Architecture ) , to create a massively distributed version of the DQN algorithm .;As in DistBelief , the parameters of the Q - network may also be distributed over many machines .;We applied Gorila DQN to 49 games on the Atari 2600 platform .;Introduction;1
19320;0a6d7e8e61c54c796f53120fdb86a25177e00998;Method;0;Universal Schema approach;For example , the Universal Schema approach factorizes a 2D unfolding of the tensor ( a matrix of entity pairs vs. relations ) while extend this also to other pairs .;Pairwise interaction models were also considered to improve prediction performances .;In the Neural Tensor Network ( NTN ) model , combine linear transformations and multiple bilinear forms of subject and object embeddings to jointly feed them into a nonlinear neural layer .;Related Work;11
100246;41d08fb733f3e50ac183490f84d6377dffccf350;Method;1;point cloud generation network;Using the point cloud generation network trained by either EMD or CD to is enough to outperform 3D - R2N2 's result .;The predicted volume is concatenated with the registered occupancy as the 3D conv network 's input .;The maximum performance as reported in the main paper is obtained by feeding both network 's prediction into the post processing network .;Implementation details;20
65844;2a94c84383ee3de5e6211d43d16e7de387f68878;Method;0;Stacked Hourglass networks;There are recent methods exploiting lateral / skip connections that associate low - level feature maps across resolutions and semantic levels , including U - Net [ reference ] and SharpMask [ reference ] for segmentation , Recombinator networks [ reference ] for face detection , and Stacked Hourglass networks [ reference ] for keypoint estimation .;SSD [ reference ] and MS - CNN [ reference ] predict objects at multiple layers of the feature hierarchy without combining features or scores .;Ghiasi et al .;Related Work;3
42487;1822ca8db58b0382b0c64f310840f0f875ea02c0;Method;0;CoGAN );To overcome this problem , Liu and Tuzel [ reference ] propose a coupled generative adversarial network ( CoGAN ) by employing weight - sharing networks to learn a joint distribution across domains .;The main drawback of [ reference ] is that it requires pairs of corresponding images as training data .;"More recently , CycleGAN [ reference ] introduces cycle consistency based on "" pix2pix "" framework in [ reference ] to learn the image trans -";Related Work;3
11492;063ad0349f05c8aacbbb653ffcf01047a293fa30;Method;1;LSTM - Final;paragraph : Final output state ( LSTM - Final ) :;Representations for a location ( ) are obtained using one of the following two approaches :;is the output embedding of the bidirectional LSTM .;Final output state ( LSTM - Final ) :;21
95487;3e95925d2bca43223453010ff8516a492287ce19;Method;1;local modules;In developing this encoder , we seek to better model rare slot - value pairs by sharing parameters between each slot through global modules and learning slot - specific features through local modules .;This problem is amplified in joint tracking , due to the accumulation of turn - level errors .;The global - locally self - attentive encoder consists of a bidirectional LSTM Hochreiter1997Long , which captures temporal relationships within the sequence , followed by a self - attention layer to compute the summary of the sequence .;Global - Locally Self - Attentive Encoder;3
83023;3580d8a5e7584e98d547ebfed900749d347f6714;Method;1;dual attention;The model consists of field - gating encoder and description generator with dual attention .;We propose a structure - aware seq2seq architecture to encode both the content and the structure of a table for table - to - text generation .;We add a field gate to the encoder LSTM unit to incorporate the field information .;Conclusions;18
97148;3febb2bed8865945e7fddc99efd791887bb7e14f;Method;0;first biLM layer;However , unlike WSD , accuracies using the first biLM layer are higher than the top layer , consistent with results from deep biLSTMs in multi - task training Sgaard2016DeepML , joint - many - iclr07 and MT Belinkov2017WhatDN .;Similar to WSD , the biLM representations are competitive with carefully tuned , task specific biLSTMs Ling2015FindingFI , Ma2016EndtoendSL .;CoVe POS tagging accuracies follow the same pattern as those from the biLM , and just like for WSD , the biLM achieves higher accuracies than the CoVe encoder .;What information is captured by the biLM ’s representations ?;12
49252;1d8653d9fca853a8e3727fa7d8f5ec0631cad08f;Method;0;memory architectures;Although we have focused on a specific MANN ( SAM ) , which is closely related to the NTM , the approach taken here is general and can be applied to many differentiable memory architectures , such as Memory Networks .;We have demonstrated that you can train neural networks with large memories via a sparse read and write scheme that makes use of efficient data structures within the network , and obtain significant speedups during training .;It should be noted that there are multiple possible routes toward scalable memory architectures .;Discussion;18
5777;03184ac97ebf0724c45a29ab49f2a8ce59ac2de3;Method;0;Error Correcting Output Codes;In order to collect relationships independently of data , compressed sensing uses random projections whereas Error Correcting Output Codes builds embeddings inspired from information theory .;These relationships can be collected separately from the data , learned from the data or derived from side information .;WSABIE uses images with their corresponding labels to learn an embedding of the labels , and CCA maximizes the correlation between two different data modalities .;Related Work;2
8759;052443e1709c0f7d3432cca7c451534eea76b7ca;Method;0;interpolation - based methods;To address this problem , the SR literature proposes interpolation - based methods [ reference ] , reconstruction - based methods [ reference ][ reference ][ reference ][ reference ][ reference ] , and learning - based methods [ reference ][ reference ][ reference ][ reference ][ reference ][ reference ][ reference ] .;SR is heavily ill - posed since multiple HR patches could correspond to the same LR image patch .;The example - based SR [ reference ] uses prior knowledge under the form of corresponding pairs of LR - HR image patches extracted internally from the input LR image or from external images .;Introduction;2
54908;223319a93dcf3912bbc1e5f949e5ab4d53906e62;Method;0;feature extractor layer Syn Numbers;MNIST - M : top feature extractor layer Syn Numbers;MNIST →;→ SVHN : last hidden layer of the label predictor;Experiments;7
82586;357776cd7ee889af954f0dfdbaee71477c09ac18;Method;1;VAT vat;On the MNIST dataset with 100 and 1000 labels , the performance of AAEs is significantly better than VAEs , on par with VAT vat and CatGAN catgan , but is outperformed by the Ladder networks ladder and the ADGM adgm .;The results of semi - supervised classification experiments on MNIST and SVHN datasets are reported in Table [ reference ] .;We also trained a supervised AAE model on all the available labels , and obtained the error rate of .;Semi - Supervised Adversarial Autoencoders;9
59198;23f5854b38a15c2ae201e751311665f7995b5e10;Method;0;elbo;The regularization view of the elbo in Eq . 6;Information - theoretic connection with vae .;resembles maximum - entropy discrimination;RELATED WORK;11
76352;303fef411f235e6d1125a40af1e93224f498a4d5;Method;0;low - rank language models;Empirically , our high - rank language model outperforms conventional low - rank language models on several benchmarks , as shown in Section [ reference ] .;For example , semantic meanings might not be those bases since a few hundred meanings may not be enough to cover everyday meanings , not to mention niche meanings in specialized domains .;We also provide evidences in Section [ reference ] to support our hypothesis that learning a high - rank language model is important .;Hypothesis : Natural Language is High - Rank;9
69421;2cb8497f9214735ffd1bd57db645794459b8ff41;Method;1;recurrent neural network architectures;These models draw on recent developments for incorporating attention mechanisms into recurrent neural network architectures .;We demonstrate the efficacy of our new corpora by building novel deep learning models for reading comprehension .;This allows a model to focus on the aspects of a document that it believes will help it answer a question , and also allows us to visualises its inference process .;Introduction;1
86837;3729a9a140aa13b3b26210d333fd19659fc21471;Method;1;bi - LSTMs;We set the dimensionality of the embeddings and the hidden states in the bi - LSTMs to 100 .;;At each training epoch , we trained our model in the order of POS tagging , chunking , dependency parsing , semantic relatedness , and textual entailment .;Training Details;19
76899;309acdd149f5f0ea12acb103b36bb59e6e631671;Method;1;unimodal probabilistic PCA model;Our baseline method using a single unimodal probabilistic PCA model outperforms almost every method in most action types , with the exception of Sanzari , which it still outperforms on average across the entire dataset .;Table [ reference ] shows a comparison between our approach and competing approaches using Protocol # 1 .;The mixture model improves on this again , offering a mm improvement over Sanzari , our closest competitor .;Human3.6 M dataset :;15
94985;3e79a574d776c46bbe6d34f41b1e83b5d0f698f2;Method;1;Bi - directional RNN - CRF structures;Bi - directional RNN - CRF structures , and in particular BiLSTM - CRFs , have achieved the state of the art in the literature for sequence labelling tasks , including POS - tagging and NER .;;We compare S - LSTM - CRF with BiLSTM - CRF for sequence labelling , using the same settings as decided on the movie review development experiments for both BiLSTMs and S - LSTMs .;Final Results for Sequence Labelling;12
82174;35502af359aa60ae8047df172e29503cfb29c3f9;Method;1;von Mises Fisher;Since the Epanechnikov profile is not differentiable at the boundary , we use the squared exponential kernel adapted to vectors on the sphere : which can be viewed as a natural extension of the Gaussian to spherical data ( known as the von Mises Fisher ( vMF ) distribution ) .;The gradient of can be elegantly computed as the difference between and the mean of all data points with , hence the name “ mean - shift ” for performing gradient ascent .;In our experiments we set the bandwidth based on the margin so that .;Details of Recurrent Mean Shift Grouping;24
80856;34cf90fcbf83025666c5c86ec30ac58b632b27b0;Method;0;hierarchical convolution;Typical convolutional neural networks model context information through hierarchical convolution and pooling .;Visual context is an important component to assist visual - related tasks , such as object recognition and object detection .;For person ReID task , the most important visual cues are visual attribute knowledge , such as clothes color and types .;Multi - scale Context - aware Network;4
8678;0523e14247d74c4505cd5e32e1f0495f291ec432;Method;1;Student - T Mixture model;We also report the log - likelihood results of two mixture models : the GMM and the Student - T Mixture model , from [ 2 ] .;The first method is the RNADE model , a new deep density estimation technique which is an extension of the NADE model for real valued data [ 16 , 15 ] . EoRNADE , which stands for ensemble of RNADE models , is currently the state of the art .;Overall we see that the Deep GMM has a strong performance .;Title;0
50820;1e7678467b1807777dcd9be557b79328ce9419a8;Method;0;evo - toy;evo - toy shows the evaluation of the accuracy with the iterations in both of these cases , averaged across 100 runs .;fig :;It is clear that pairing the data - augmented pairs in one batch accelerates the convergence of this model .;Data - augmented batches : toy model;36
48620;1d5d0a41b720bc51fd568cf78f8aa4ec5af4f802;Method;0;end - to - end 3D detector;Promising directions of future work include combining the 2D detector and the PointFusion network into a single end - to - end 3D detector , as well as extending our model with a temporal component to perform joint detection and tracking in video and point cloud streams .;We show that with the same architecture and hyper - parameters , our method is able to perform on par or better than methods that hold dataset and sensor - specific assumptions on two drastically different datasets .;;Conclusions and Future Work;24
62136;2742a33946e20dd33140b8d6e80d5fd04fced1b2;Method;1;Local Geometric Descriptors;3DMatch : Learning Local Geometric Descriptors from RGB - D Reconstructions;section :;;Learning Local Geometric Descriptors from RGB - D Reconstructions;0
13652;07c4fc48ad7b7d1a417b0bb72d0ae2d4efc5aa83;Method;0;left - padded convolutions;The model in ( Extended Neural GPU ) used a recurrent stack of gated convolutional layers , while the model in ( ByteNet ) did away with recursion and used left - padded convolutions in the decoder .;Fully convolutional neural machine translation without this bottleneck was first achieved in and .;This idea , introduced in WaveNet , significantly improves efficiency of the model .;Related Work;10
39286;15e81c8d1c21f9e928c72721ac46d458f3341454;Method;0;computationally intensive neural network;This process is not parallelizable , and , in the case of neural MT models , it is particularly slow because a computationally intensive neural network is used to generate each token .;Both model families use autoregressive decoders that operate one step at a time : they generate each token conditioned on the sequence of tokens previously generated .;While several recently proposed models avoid recurrence at train time by leveraging convolutions kalchbrenner2016neural , gehring2017convolutional , kaiser2017depthwise or self - attention vaswani2017attention as more - parallelizable alternatives to recurrent neural networks ( RNNs ) , use of autoregressive decoding makes it impossible to take full advantage of parallelism during inference .;Introduction;1
29325;0fa88943665de1176b0fc6de4ed7469b40cdb08c;Method;0;heuristic repelling regularizer;This is similar to the heuristic repelling regularizer in [ reference ] and the batch normalization based regularizer in [ reference ] , but is derived in a more principled way .;As it is discussed in Section 3 , the kernel provides a repulsive force to produce an amount of variability required for generating samples from p ( x ) .;We take the bandwidth to be h;Model Setup;7
74671;2f0c30d6970da9ee9cf957350d9fa1025a1becb4;Method;0;repetitive modules;Secondly , Aligned - Inception - ResNet consists of repetitive modules , whose design is simpler than the original Inception - ResNet architectures .;Firstly , Aligned - Inception - ResNet does not have the feature alignment problem , by proper padding in convolutional and pooling layers .;The Aligned - Inception - ResNet model is pre - trained on ImageNet - 1 K classification .;Details of Aligned - Inception - ResNet;15
89958;3a28fe49e7a856ddd60d134696a891ed7bca5962;Method;1;GT;We unroll the Conv - LSTM to 5 time steps in consideration of the memory limitation and train the network with GT of each sampled frames .;We try to incrementally stack Conv - LSTM layers to the network , however , due to difficulties in training multiple RNNs , experiments show that stacking two or more Conv - LSTMs is not beneficial .;Fig .;Multi - frame Feature Aggregation;15
48037;1cf6bc0866226c1f8e282463adc8b75d92fba9bb;Method;0;image captioning models;Several early papers about VQA directly adapt the image captioning models to solve the VQA problem by generating the answer using a recurrent LSTM network conditioned on the CNN output .;VQA is related to image captioning .;But these models ’ performance is still limited .;Related work;2
67169;2aec8d465e9a74c27f956ed1136f3e8a3ba0a833;Method;1;MemNet;We also use two CNN - based denoising methods , i.e. , RED30 and MemNet , for further comparison .;Tables [ reference ] and [ reference ] report the PSNR results on BSD68 and Set12 datasets , respectively .;Their PSNR results on BSD68 dataset with noise level 50 are 26.34dB and 26.35dB , respectively .;Experiments on AWGN Removal;15
86114;36a03f648b40d209ce361550dbe1c823ddb715b5;Method;0;multiview CNN;[ reference ] , multiview CNN ( MultiView ) [ reference ] , DISCO [ reference ] , Hand3D [ reference ] , DeepHand [ reference ] , lie - x group based method ( Lie - X ) [ reference ] , improved DeepPrior ( DeepPrior ++ );[ reference ] , local surface normals ( LSN );[ reference ] , region ensemble network ( REN - 4×6×6 [ reference ] , REN - 9×6×6 [ reference ] ) , CrossingNets [ reference ] , pose - guided REN ( Pose - REN );Comparison with state - of - the - art methods;16
14997;0899bb0f3d5425c88b358638bb8556729720c8db;Method;na;descriptionplural;MultilayerPerceptron , first= MLP ( MLP ), plural = MLPs , descriptionplural =;MLP , description =;MultilayerPerceptrons , firstplural= MLP ( MLP );Implicit 3D Orientation Learning for 6D Object Detection from RGB Images;0
24989;0d467adaf936b112f570970c5210bdb3c626a717;Method;1;FlowNet2 family;One can observe that the FlowNet2 family outperforms the best and fastest existing methods by large margins .;Endpoint error vs. runtime evaluations for Sintel are provided in Figure [ reference ] .;Depending on the type of application , a FlowNet2 variant between 8 to 140 frames per second can be used .;Speed and Performance on Public Benchmarks;11
22945;0c9ae806059196007938f24d0327a4237ed6adf5;Method;1;Adam optimiser;We use the Adam optimiser with learning rate .;;For IIC , the main and auxiliary heads are trained by maximising e :;Training .;19
105527;450e9676991b91e6b5eba3f77ac95dd0d3d6b655;Method;0;deep network architectures;This process is similar to that of the original deep network architectures such as VGG and ResNet .;That is , the dimension slowly increases in input - side layers and sharply increases in output - side layers .;The visual illustrations of additive and multiplicative PyramidNets are shown in Figure [ reference ] .;Feature Map Dimension Configuration;3
13200;074b6fe0cc6848fb86a6703d1c52074494177c79;Method;1;source model;We can begin by simply learning a source model that can perform the task on the source data .;The goal is to learn a model that can correctly predict the label for the target data .;For - way classification with a cross - entropy loss , this corresponds to where denotes the softmax function .;Cycle - Consistent Adversarial Domain Adaption;3
50632;1e7678467b1807777dcd9be557b79328ce9419a8;Method;1;uniform batch sampling;"With uniform batch sampling , one epoch corresponds to two passes over the training set ; with RA and , one epoch corresponds to of the images of the training set .";The batch size is set to and an epoch is defined as a fixed number of iterations .;All classification baselines are trained using this longer schedule for a fair comparison .;Base architecture and training settings .;22
27887;0f0a25d3be0d50a134f6f68e6a82bd8a2f668882;Method;1;pre - trained models;Our code and pre - trained models can be accessed at .;In future , we plan to explore unsupervised network learning approaches .;;Conclusion;16
71280;2d83dbf4c8eabc6bdef3326c4a30d5f33ffc944e;Method;1;question - only model;Since the question - only model ( 50.39 % ) achieves a competitive result to the joint model ( 57.75 % ) , while the image - only model gets a poor accuracy ( 28.13 % ) ( see Table 2 in ) .;The empirical results seem to support this idea .;Eventually , we chose model ( b ) as the best performance and relative simplicity .;Alternative Models;18
22607;0c769c19d894e0dbd6eb314781dc1db3c626df57;Method;1;IDNet;Moreover , by discarding the pedestrian proposal network in our framework and training the remaining net to classify identities with Softmax loss from cropped pedestrian images , we get another baseline re - i d method ( IDNet ) .;Each feature representation is used in conjunction with a specific distance metric , including Euclidean , Cosine similarity , KISSME , and XQDA , where KISSME and XQDA are trained on our dataset .;This training scheme has been exploited in to learn discriminative re - i d feature representations .;Experiment Settings;10
21490;0be9ca65ad318ee3729928882ef2c403d4b6d24e;Method;0;Recurrent Highway Network;Let represent an abstract function of an RNN , which might be the Elman network , the Long Short - Term Memory ( LSTM ) , the Recurrent Highway Network ( RHN ) , or any other RNN variant .;We define at timestep as a zero vector : .;In this research , we stack three LSTM layers based on merityRegOpt because they achieved high performance .;RNN Language Model;2
77895;31e5dab321066712cdc8b30943f7950066840ee1;Method;1;linearized and anonymized AMR graph;Following konstas2017neural , the input sequence is a linearized and anonymized AMR graph .;;Linearization is used to convert the graph into a sequence : The depth - first traversal of the graph defines the indexing between nodes and tokens in the sequence .;Sequential AMRs;5
4015;0217fb2a54a4f324ddf82babc6ec6692a3f6194f;Method;0;adversarial discriminator network D;This generator is trained by playing against an adversarial discriminator network D that aims to distinguish between samples from the true data distribution P data and the generator 's distribution P G .;∼ P noise ( z ) into a sample G ( z ) .;So for a given generator , the optimal discriminator is D ( x ) =;Background : Generative Adversarial Networks;4
80165;34273979fd2a62fd7b49ee6d14a925864ff94e74;Method;0;Inference Machines;ross2011learning proposes Inference Machines which ditch the belief propagation algorithm altogether and instead train a series of regressors to output the correct marginals by passing messages on a graph .;Our work differs by not being rooted in loopy BP , and instead learning all parts of a general message passing algorithm .;wei2016convolutional applies this idea to pose estimation using a series of convolutional layers and deng2016structure;Related work;14
73342;2e57bccb74bcb46cbc5b4225b62679023ed1f9da;Method;0;window - based memory;[ reference ] use attention over window - based memory , which encodes a window of words around entity candidates , by leveraging an end - to - end memory network;Hill et al .;[ reference ] . Meanwhile , given the same entity candidate can appear multiple times in a passage , Kadlec et al .;RELATED WORK;3
33780;12db83e66e50152e170d5009c425c925ad2e2c2a;Method;0;negation detection;State - of - the - art systems for RTE so far relied heavily on engineered NLP pipelines , extensive manual creation of features , as well as various external resources and specialized subcomponents such as negation detection ( e.g. [ reference ][ reference ][ reference ][ reference ] . Despite the success of neural networks for paraphrase detection ( e.g. [ reference ][ reference ][ reference ] , end - to - end differentiable neural architectures failed to get close to acceptable performance for RTE due to the lack of large high - quality datasets .;This task is important since many natural language processing ( NLP ) problems , such as information extraction , relation extraction , text summarization or machine translation , rely on it explicitly or implicitly and could benefit from more accurate RTE systems [ reference ] .;An end - to - end differentiable solution to RTE is desirable , since it avoids specific assumptions about the underlying language .;INTRODUCTION;2
79414;33998aff64ce51df8dee45989cdca4b6b1329ec4;Method;1;graph convolutional models;We also directly compare our model against GCNs [ reference ] , as well as graph convolutional models utilising higher - order Chebyshev filters [ reference ] , and the MoNet model presented in [ reference ] .;[ reference ] , the iterative classification algorithm ( ICA ) [ reference ] and Planetoid [ reference ] .;Inductive learning For the inductive learning task , we compare against the four different supervised GraphSAGE inductive methods presented in [ reference ] . These provide a variety of approaches to aggregating features within a sampled neighborhood : GraphSAGE - GCN ( which extends a graph convolution - style operation to the inductive setting ) , GraphSAGE - mean ( taking the elementwise mean value of feature vectors ) , GraphSAGE - LSTM ( aggregating by feeding the neighborhood features into an LSTM ) and GraphSAGE - pool ( taking the elementwise maximization operation of feature vectors transformed by a shared nonlinear multilayer perceptron ) .;STATE - OF - THE - ART METHODS;8
22972;0c9ae806059196007938f24d0327a4237ed6adf5;Method;1;iid_imgclus_semisup;This explicitly validates the quality of our unsupervised learning method , as we beat even the supervised state - of - the - art ( t : iid_imgclus_semisup ) .;For semi - supervised learning , we establish a new state - of - the - art on STL10 out of all reported methods by finetuning a network trained in an entirely unsupervised fashion with the IIC objective ( recall labels in semi - supervised overclustering are used for evaluation and do not influence the network parameters ) .;Given that the bulk of parameters within semi - supervised overclustering are trained unsupervised ( i.e. all network parameters ) , it is unsurprising that f : imgclus_variation shows a 90 % drop in the number of available labels for STL10 ( decreasing the amount of labelled data available from 5000 to 500 over 10 classes ) barely impacts performance , costing just 10 % drop in accuracy .;Semi - supervised learning analysis .;22
65939;2a94c84383ee3de5e6211d43d16e7de387f68878;Method;0;py - faster - rcnn 3;Our code is a reimplementation of py - faster - rcnn 3 using Caffe2 .;[ reference ];[ reference ];Experiments on Object Detection;8
90969;3a8d537bcec370d37990d39eab01c729496ad057;Method;0;spherical Gaussians;Note that and are also spherical Gaussians and contains the generative parameters .;The conditional distribution over each local latent code ( ) is defined as follows : where the first local code is simply :;The ( occupancy ) probability for one voxel can then be calculated by ,;Model Objective : Variational + Latent Loss;5
102446;42f20d37f4eba56284a941d5f9f58609ee650de0;Method;1;downsamplers;Existing literatures have considered two types of downsamplers , including direct downsampler and bicubic downsampler .;;In this paper , we consider the bicubic downsampler since when k is delta kernel and the noise level is zero , Eqn .;Downsampler .;7
35194;13ea9a2ed134a9e238d33024fba34d3dd6a010e0;Method;0;deeper architecture;When using CaffeNet as the backbone , we directly replace the original FC7 layer with the Eigenlayer , in case that one might argue that the performance gain is brought by deeper architecture .;We mainly use two networks pre - trained on ImageNet as backbones , , CaffeNet and ResNet - 50 .;When using ResNet - 50 as the backbone , we have to insert the Eigenlayer before the last FC layer because ResNet has no hidden FC layer and the influence of adding a layer into a 50 - layer architecture can be neglected .;Datasets and Settings;8
106558;45b559e6271570598602fcf9777ed6f2f2d133e6;Method;1;deepest configuration;The deepest configuration , WDX , has 14 weight layers : 10 convolutional and 4 fully connected .;Table [ reference ] shows the configurations of the deep CNNs .;As in , we omit the Rectified Linear Unit ( ReLU ) layers following every convolutional and fully connected layer .;Very Deep Convolutional Networks;3
21461;0be9ca65ad318ee3729928882ef2c403d4b6d24e;Method;1;stacked RNNs;In this study , we propose Direct Output Connection ( DOC ) as a generalization of MoS. For stacked RNNs , DOC computes the probability distributions from the middle layers including input embeddings .;/ abs - 1711 - 03953 proposed Mixture of Softmaxes ( MoS ) , which increases the rank of the matrix by combining multiple probability distributions computed from the encoded fixed - length vector .;In addition to raising the rank , the proposed method helps weaken the vanishing gradient problem in backpropagation because DOC provides a shortcut connection to the output .;Introduction;1
46990;1bea6bbdb4aed87fff5390d42934a1d9b0a7bec4;Method;0;Google NLP pipeline;The text has been run through a Google NLP pipeline .;A news article is usually associated with a few ( e.g. , 3–5 ) bullet points and each of them highlights one aspect of its content .;It is tokenized , lowercased , and named entity recognition and coreference resolution have been run .;The Reading Comprehension Task;2
76150;303065c44cf847849d04da16b8b1d9a120cef73a;Method;1;camera model operation;[ reference ] , the cost function is expressed as Note that the texture reconstruction and landmarks constraint terms of this cost function are non - linear due to the camera model operation .;By introducing the additive incremental updates on the parameters of Eq .;We need to linearize them around using first order Taylor series expansion at .;Gauss - Newton Optimization;8
448;0012de6bec1f25599e4f02517637e531a71909b9;Method;0;level - sets;Post - processing approaches such as connected components analysis normally yield no improvement and therefore , more recent works , propose to use the network predictions in combination with Markov random fields , voting strategies or more traditional approaches such as level - sets .;Such segmentations are obtained by only considering local context and therefore are prone to failure , especially in challenging modalities such as ultrasound , where a high number of mis - classified voxel are to be expected .;Patch - wise approaches also suffer from efficiency issues .;Introduction and Related Work;1
75136;2f95ba08a8f5a97d1a767f3a2490c686ee8f762d;Method;1;Adam gradient descent;We train for 100 epochs using Adam gradient descent with a fixed learning rate of on mini - batches of 64 examples .;In all experiments , the dimensionality of the hidden layers in the MTGAE architecture is fixed at - 256 - 128 - 256 - .;We implement the MTGAE architecture using Keras on top of the GPU - enabled TensorFlow backend .;Empirical Evaluation;2
15190;0899bb0f3d5425c88b358638bb8556729720c8db;Method;0;6D localization algorithms;The SIXD challenge is an attempt to make fair comparisons between 6D localization algorithms by prohibiting the use of test scene pixels .;Symmetric object views are often individually treated or ignored .;We follow these strict evaluation guidelines , but treat the harder problem of 6D detection where it is unknown which of the considered objects are present in the scene .;Test Conditions;24
59558;2451db113552afb6d9ad15ef4009ec4133d28f74;Method;0;matrix square root normalization of global covariance pooling;Very recent works have demonstrated that matrix square root normalization of global covariance pooling plays a key role in achieving state - of - the - art performance in both large - scale visual recognition and challenging FGVC .;One particular kind of structured layer is concerned with global covariance pooling after the last convolution layer , which has shown impressive improvement over the classical first - order pooling , successfully used in FGVC , visual question answering and video action recognition .;CUDA support Scalability to multi - GPUs Large - scale ( LS ) or Small - scale ( SS ) EIG algorithm BP of EIG limited G DeNet SVD algorithm BP of SVD limited Improved B - CNN Newton - Schulz Iter .;Introduction;1
15778;0985497d1de3ffd11713e75289cc2ad55836623d;Method;0;answer representation;Since answer candidates usually have similar boundary words , if we compute the answer representation based on the boundary probabilities , it ’s difficult to model the real difference among different answer candidates .;We can see that the boundary and content probabilities capture different aspects of the answer .;On the contrary , with the content probabilities , we pay more attention to the content part of the answer , which can provide more distinguishable information for verifying the correct answer .;Necessity of the Content Model;18
98416;40b4596a0ae4f4ff065f3f13f36db39543e50068;Method;0;distribution perspective;Second , from the distribution perspective , synthetic and real data suffers a considerable distribution mismatch , which makes the model biased to synthetic domain .;First , from the perspective of representation , since the model is trained on synthetic images , the convolutional filters tend to overfit to synthetic style images , making them incompetent to extract informative features for real images .;To overcome such problems , we propose R eality;Introduction;1
94208;3dd2f70f48588e9bb89f1e5eec7f0d8750dd920a;Method;1;piecewise training;Multi - task training ( fourth column per group ) improves mAP over piecewise training ( third column per group ) .;1 ( i.e. , setting Table 6 .;λ = 0 ) .;Does multi - task training help ?;20
65640;2a69ddbafb23c63e5e22401664bea229daaeb7d6;Method;1;Res2Net based method;The Res2Net based method outperforms its counterparts by 1.7 on AP and 2.4 on AP@IoU=0.5 .;The performance of instance segmentation on MS COCO dataset is shown in Table [ reference ] .;The performance gains on objects with different sizes are also demonstrated .;Instance Segmentation;25
68445;2c03df8b48bf3fa39054345bafabfeff15bfd11d;Method;1;deeper nets;Next we describe our deeper nets for ImageNet .;Deeper Bottleneck Architectures .;Because of concerns on the training time that we can afford , we modify the building block as a bottleneck design [ reference ] .;Experiments 4.1 . ImageNet Classification;10
104284;4402c6c8445f17f4161e0f64573b7e28df1ca180;Method;1;DEEP LEARNING;section : III . DEEP LEARNING FOR CTR ESTIMATION;In this paper , we demonstrate the way our PNN models learn local dependencies and high - order feature interactions .;We take CTR estimation in online advertising [ reference ] as a working example to formulate our model and explore the performance on various metrics .;III . DEEP LEARNING FOR CTR ESTIMATION;4
48590;1d5d0a41b720bc51fd568cf78f8aa4ec5af4f802;Method;1;DSS;Deep Sliding Shapes ( DSS ) generates 3D regions using a proposal network and then processes them using a 3D convolutional network , which is prohibitively slow .;Comparison with other methods We compare our model with three approaches from the current state of the art .;Our model outperforms DSS by 3 % mAP while being 15 times faster .;Evaluation on SUN - RGBD;23
9630;05ee231749c9ce97f036c71c1d2d599d660a8c81;Method;0;D - dimensional template descriptor;The final D - dimensional template descriptor is obtained by reducing dimensionality using a fully - connected layer , followed by batch normalization ( BN ) and L2 - normalization .;The descriptors are aggregated into a single fixed - length vector using the GhostVLAD layer .;( 1 ) Take any number of images as input , and output a fixed - length template descriptor to represent the input image set .;Set - based face recognition;4
73509;2e57bccb74bcb46cbc5b4225b62679023ed1f9da;Method;1;bidirectional - GRU encoders;We use bidirectional - GRU encoders to extract the query representation M quer and the passage representation M doc , given a query and a passage .;Memory :;We compute the similarity matrix Table 4 : Small and large random graph in the Graph Reachability dataset .;SQuAD Dataset;9
75403;30180f66d5b4b7c0367e4b43e2b55367b72d6d2a;Method;0;unit normalized average encodings;Similarly , train a linear SVM for , using the unit normalized average encodings for media in as positive features and a large feature set as negatives .;The large negative set contains one feature encoding for many subject identities , so this set is very likely to be disjoint with the probe template .;Finally , let be notation for evaluating the SVM functional margin ( e.g. ) trained on , and evaluated using the unit normalized average media encoding in template .;Template Adaptation;3
65904;2a94c84383ee3de5e6211d43d16e7de387f68878;Method;0;common head classifier;This advantage is analogous to that of using a featurized image pyramid , where a common head classifier can be applied to features computed at any image scale .;The good performance of sharing parameters indicates that all levels of our pyramid share similar semantic levels .;With the above adaptations , RPN can be naturally trained and tested with our FPN , in the same fashion as in [ reference ] .;Feature Pyramid Networks for RPN;6
59705;2451db113552afb6d9ad15ef4009ec4133d28f74;Method;1;Matrix Square Root;subsection : Matrix Square Root and Forward Propagation;As the output of our meta - layer is a symmetric matrix , we concatenate its upper triangular entries forming an - dimensional vector , submitted to the subsequent layer of the ConvNet .;Square roots of matrices , particularly covariance matrices which are symmetric positive ( semi ) definite ( SPD ) , find applications in a variety of fields including computer vision , medical imaging and chemical physics .;Matrix Square Root and Forward Propagation;5
7735;051b3763c2ad4e4271db712b0e9a4cfe298d05db;Method;0;Ranjan;A compact network termed SPyNet from Ranjan is inspired from spatial pyramid .;Our model uses a more efficient architecture containing 30 times fewer parameters than FlowNet2 while the performance is on par with it .;Nevertheless , the accuracy is far below FlowNet2 .;Related Work;2
68687;2c03df8b48bf3fa39054345bafabfeff15bfd11d;Method;1;sibling 1×1 convolutional layers;This RPN ends with two sibling 1×1 convolutional layers for binary classification ( cls ) and box regression ( reg ) , as in [ reference ] .;Unlike the way in [ reference ] that is category - agnostic , our RPN for localization is designed in a per - class form .;"The cls and reg layers are both in a per - class form , in contrast to [ reference ] . Specifically , the cls layer has a 1000 - d output , and each dimension is binary logistic regression for predicting being or not being an object class ; the reg layer has a 1000×4 - d output consisting of box regressors for 1000 classes .";C. ImageNet Localization;20
95773;3f3a483402a3a2b800cf2c86506a37f6ef1a5332;Method;1;Non - linear Mapping;Non - linear Mapping .;The latter three are intersections over union / minimum / maximum of the two detection boxes , respectively , and .;We augment the feature representation by appending quadratic and exponential terms .;Pairwise Probabilities;6
62791;2788a2461ed0067e2f7aaa63c449a24a237ec341;Method;1;Fast - RCNN detector;Experiment is conducted based on the Fast - RCNN detector .;;The model is initialized by the ImageNet classification models , and then fine - tuned on the object detection data .;Experiment Settings;14
59861;2451db113552afb6d9ad15ef4009ec4133d28f74;Method;1;improved B - CNN;We observe that , in any case , the forward backward time taken by single meta - layer of improved B - CNN is significant as GPU unfriendly SVD or EIG can not be avoided , even though the forward computation is very efficient when NS iteration is used .;The authors of improved B - CNN also proposed two other implementations , i.e. , FP by NS iteration plus BP by SVD and FP by SVD plus BP by Lyapunov ( Lyap . ) , which take 15.31 ( 2.09 ) and 12.21 ( 11.19 ) , respectively .;Tab .;Evaluation with AlexNet on ImageNet;14
77493;31ae4873da19b1e28eca8787a17f49bba08627e5;Method;0;RoI - pooling layer;The RoI - pooling layer crops and resizes to generate a fixed size feature vector for each object proposal .;Given the feature map , the RoI - pooling layer is used to project the object proposals onto the feature space .;These feature vectors are then passed through fully connected layers .;Overview of Fast - RCNN;5
90965;3a8d537bcec370d37990d39eab01c729496ad057;Method;1;re - parameterization trick;To deal with the stochasticity of the latent variables , which , in VAE models , are typically assumed to be Gaussian distributed , we use the re - parameterization trick in order to back - propagate through the operation of sampling the Gaussian variables .;The inference and generative parameters are then jointly trained by optimizing Equation [ reference ] using back - propagation and stochastic gradient ascent .;We refer the reader to for a much more detailed explanation .;Model Objective : Variational + Latent Loss;5
61959;26fe009b958e8728382d9d764bd7153632f0b869;Method;0;classifier setup;The overall supervised model uses these shortcut - stacked encoders to encode two input sentences into two vectors , and then we use a classifier over the vector combination to label the relationship between these two sentences as that of entailment , contradiction , or neutral ( similar to the classifier setup of snli : emnlp2015 and conneau2017supervised ) .;It is basically a stacked ( multi - layered ) bidirectional LSTM - RNN with shortcut connections ( feeding all previous layers ’ outputs and word embeddings to each layer ) and word embedding fine - tuning .;Our simple shortcut - stacked encoders achieve strong improvements over existing encoders due to its multi - layered and shortcut - connected properties , on both matched and mismatched evaluation settings for multi - domain natural language inference , as well as on the original SNLI dataset .;Introduction and Background;1
30537;100c730003033151c0f78ed1aab23df3e9bd5283;Method;0;question and answer representations;The Neural Answer Selection Model ( Figure [ reference ] ) is a supervised model that learns the question and answer representations and predicts their relatedness .;This is a classification task where we treat each training data point as a triple while predicting for the unlabelled question - answer pair .;It employs two different LSTMs to embed raw question inputs and answer inputs .;Neural Answer Selection Model;4
89698;3a28fe49e7a856ddd60d134696a891ed7bca5962;Method;0;causal modeling idea;On the other hand , according to the causal modeling idea proposed by , if one wonders whether there is a bias in bounding - box based annotations , he must figure out corresponding counterfactual : would the performance still be identical or even improved what if we had NOT applied bounding - box based annotations ?;However , in our opinion , what impacts the performance of small - scale objects other than perceptive fields may reside in the very initial phase of machine learning pipeline , which is to say , the annotation phase .;Motivated by above insight and counterfactual argument , we aim to address the scale variation problem with an alternative annotation , by simply locating the somatic topological line of each pedestrian as illustrated in Fig .;Introduction;1
15143;0899bb0f3d5425c88b358638bb8556729720c8db;Method;1;Phong model;Therefore , while keeping reconstruction targets clean , we randomly apply additional augmentations to the input training views : ( 1 ) rendering with random light positions and randomized diffuse and specular reflection ( simple Phong model in OpenGL ) , ( 2 ) inserting random background images from the Pascal VOC dataset , ( 3 ) varying image contrast , brightness , Gaussian blur and color distortions , ( 4 ) applying occlusions using random object masks or black squares .;The goal is that the trained encoder treats the differences to real camera images as just another irrelevant variation .;Fig .;Learning 3D Orientation from Synthetic Object Views;15
85312;3652c2d20f198dc39ad159eba55d08341c56d628;Method;0;normalized linear kernel;The second one is a composition of the Gaussian kernel — which is p.d.— , with feature maps of a normalized linear kernel in .;The first one is obviously p.d . : .;This composition is p.d .;Positive Definiteness of;29
21415;0be9ca65ad318ee3729928882ef2c403d4b6d24e;Method;1;High - Rank Language Model;document : Direct Output Connection for a High - Rank Language Model;bibliography : References;This paper proposes a state - of - the - art recurrent neural network ( RNN ) language model that combines probability distributions computed not only from a final RNN layer but also from middle layers .;Direct Output Connection for a High - Rank Language Model;0
15033;0899bb0f3d5425c88b358638bb8556729720c8db;Method;0;self - supervised way;Instead , it is trained to encode 3D model views in a self - supervised way , overcoming the need of a large pose - annotated dataset .;Finally , the AAE does not require any real pose - annotated training data .;A schematic overview of the approach is shown in Fig [ reference ] .;Introduction;1
20930;0b5aef2894d3248fb5ecc955d50501f0aa276036;Method;0;bimodal;Similarly to bimodal fusion ( sec : bimodal ) , after trimodal fusion we pass the fused features through to incorporate contextual information in them , which yields where , is scalar for , , , and is the context - aware trimodal feature vector .;So , we define the fused features as where , is scalar for and .;;Trimodal fusion;19
34532;12f008bea798a05ebfa2864ec026999cb375bcd9;Method;1;addition;We also showed empirically that multiplicative gating is superior to addition and concatenation operations for implementing gated - attentions , though a theoretical justification remains part of future research goals .;Our model design is backed up by an ablation study showing statistically significant improvements of using Gated Attention as information filters .;Analysis of document and query attentions in intermediate layers of the reader further reveals that the model iteratively attends to different aspects of the query to arrive at the final answer .;Conclusion;15
74578;2f0c30d6970da9ee9cf957350d9fa1025a1becb4;Method;1;RPN network;The RPN network is trained separately as in the first stage of the procedure in .;To facilitate the ablation experiments on VOC , we follow and utilize pre - trained and fixed RPN proposals for the training of Faster R - CNN and R - FCN , without feature sharing between the region proposal and the object detection networks .;For COCO , joint training as in is performed and feature sharing is enabled for training .;Experiment Setup and Implementation;9
14350;07f3f736d90125cb2b04e7408782af411c67dd5a;Method;1;matching model;"The matching score of two short - texts are calculated with an MLP with the embedding of the two documents as input ; DeepMatch : We take the matching model in and train it on our datasets with 3 hidden layers and 1 , 000 hidden nodes in the first hidden layer ;";We first represent each short - text as the sum of the embedding of the words it contains .;uRAE + MLP :;Competitor Methods;19
33147;1109b663453e78a59e4f66446d71720ac58cec25;Method;0;ConvNet;For example Osadchy et al . describe a ConvNet for simultaneous face detection and pose estimation .;Several authors have also proposed to train ConvNets to directly predict the instantiation parameters of the objects to be located , such as the position relative to the viewing window , or the pose of the object .;Faces are represented by a 3D manifold in the nine - dimensional output space .;Introduction;1
98852;40eb1e54cb5382dfd3b7efd16dc7df826262ea52;Method;1;frustum proposal;our system for 3D object detection consists of three modules : frustum proposal , 3D instance segmentation , and 3D amodal bounding box estimation .;[ reference ] ,;We will introduce each module in the following subsections .;3D Detection with Frustum PointNets;6
66748;2aec8d465e9a74c27f956ed1136f3e8a3ba0a833;Method;0;cascade of shrinkage fields;With the aid of unrolled half quadratic splitting ( HQS ) techniques , Schmidt et al . proposed a cascade of shrinkage fields ( CSF ) framework to learn stage - wise inference parameters .;Generally speaking , the methods above only learn the prior parameters in a discriminative manner , while the inference parameters are stage - invariant .;Chen et al . further proposed a trainable nonlinear reaction diffusion ( TNRD ) model through discriminative learning of a compact gradient descent inference step .;MAP Inference Guided Discriminative Learning;3
29219;0fa88943665de1176b0fc6de4ed7469b40cdb08c;Method;0;Stein 's method;Stein variational gradient descent ( SVGD ) [ reference ] ) is a general purpose Bayesian inference algorithm motivated by Stein 's method [ reference ][ reference ] and kernelized Stein discrepancy;;[ reference ][ reference ] .;STEIN VARIATIONAL GRADIENT DESCENT ( SVGD );4
26518;0e8753f550350e53824358ca3f0f8cfd2f2dc2f7;Method;1;row and column permutation;After a row and column permutation , the original MovieLens 10 M matrix is partitioned in blocks , where is the matrix that we use for our experiments ( Figure [ reference ] ) .;The resulting matrix has observed values that correspond to the ratings that a user has given to a movie .;We treat as the users feature matrix , as the movies feature matrix and discard the remaining matrix .;Movielens dataset;9
57009;2329a46590b2036d508097143e65c1b77e571e8c;Method;1;speech systems;Deep Speech also handles challenging noisy environments better than widely used , state - of - the - art commercial speech systems .;Our system , called Deep Speech , outperforms previously published results on the widely studied Switchboard Hub5’00 , achieving 16.0 % error on the full test set .;;Deep Speech : Scaling up end - to - end speech recognition;0
40650;16cd50316e41cbb1d9dfeafeb524b31654cef37a;Method;1;RNN LMs;We have experimented with both RNN LMs and LSTM LMs , and describe the details in the following two sections .;The N - best hypotheses are then rescored using a combination of the large N - gram LM and several neural net LMs .;;LM Rescoring and System Combination;9
37294;1518039b5001f1836565215eb047526b3ac7f462;Method;1;character bigram segmentation;We show performance gains over the baseline with both BPE segmentation , and a simple character bigram segmentation .;We introduce a variant of byte pair encoding for word segmentation , which is capable of encoding open vocabularies with a compact symbol vocabulary of variable - length subword units .;Our analysis shows that not only out - of - vocabulary words , but also rare in - vocabulary words are translated poorly by our baseline NMT system , and that reducing the vocabulary size of subword models can actually improve performance .;Conclusion;13
51373;1ea6b2f67a3a7f044209aae0d0fd1cb14a1e9e06;Method;1;dimensional RNNs;In this paper we advance two - dimensional RNNs and apply them to large - scale modeling of natural images .;.16;The resulting PixelRNNs are composed of up to twelve , fast two - dimensional Long Short - Term Memory ( LSTM ) layers .;Introduction;1
67969;2bb9f0768fac9622a0be446df69daf75a954d5ac;Method;0;greedy search process;The greedy search process applies rules in a manually defined order .;"op2 "" Korea "" ) , then determine if they are aligned by checking if name is the tail concept of country .";The results are mutually exclusive which means once a graph fragment is aligned by one rule , it can not be realigned .;JAMR Aligner .;4
13688;07c4fc48ad7b7d1a417b0bb72d0ae2d4efc5aa83;Method;1;ByteNet - like architecture;Our experimental results allow us to draw the following conclusions : Depthwise separable convolutions are strictly superior to regular convolutions in a ByteNet - like architecture , resulting in models that are more accurate while requiring fewer parameters and being computationally cheaper to train and run .;The parameter count ( and computation cost ) of the different types of convolution operations used was already presented in Table [ reference ] .;Using sub - separable convolutions with groups of size 16 instead of full depthwise separable convolutions results in a performance dip , which may indicate that higher separability ( i.e. groups as small as possible , tending to full depthwise separable convolutions ) is preferable in this setup , this further confirming the advantages of depthwise separable convolutions .;Experiments;11
87423;372bc106c61e7eb004835e85bbfee997409f176a;Method;0;joint embedding space learning;Our work is related to the prior works in multi - modal learning , including joint embedding space learning and multi - modal Boltzmann machines .;Our work extended GAN to dealing with joint distributions of images .;These approaches can be used for generating corresponding samples in different domains only when correspondence annotations are given during training .;Related Work;6
78513;325093f2c5b33d7507c10aa422e96aa5b10a33f1;Method;1;forward computation;We reconstruct necessary quantities for the backward pass by inverting the forward computation from the storage buffer , and manage to free up almost 50 % of the memory needed in conventional BN + Act implementations at little additional computational costs .;In this work we have presented InPlace - ABN , which is a novel , computationally efficient fusion of batch normalization and activation layers , targeting memory - optimization for modern deep neural networks during training time .;In contrast to state - of - the - art checkpointing attempts , our method is reconstructing discarded buffers backwards during the backward pass , thus allowing us to encapsulate BN + Act as self - contained layer , which is easy to implement and deploy in virtually all modern deep learning frameworks .;Conclusions;12
42536;1822ca8db58b0382b0c64f310840f0f875ea02c0;Method;0;embedding learning;This guarantees that the augmented dataset generally supports a better characterization of the class distributions during embedding learning .;In the first aspect , the fake images fill up the gaps between real data points and marginally expand the class borders in the feature space .;The second aspect , on the other hand , supports the usage of supervised learning [ reference ] , a different mechanism from [ reference ] which leverages unlabeled GAN images for regularization .;Camera - aware Image - Image Translation;6
84329;360cfa09b2f7c8e10b1831d899c5a51aefa1883e;Method;0;CTC - LSTM models;These features were also used in the original work on CTC - LSTM models for speech recognition .;The input features for these experiments were 123 dimensional FBANK features ( 40 + energy + + ) .;The CTC layer itself was trained on the 61 label set .;CTC setup;12
103884;435259c5f3cffd75ef837a8e638cc8f6244e25c4;Method;1;T1 -;Based on this initial automatic segmentation , manual editing was then performed by an experienced neuro - radiologist , to correct segmentation errors in both T1 - and T2 - weighted MR images .;Instead of starting from scratch , an initial automatic segmentation for 6 - month subjects was generated with the guidance from follow - up 24 - month scans with high tissue contrast , using the publicly - available iBEAT tool .;Geometric defects were also removed with the help of surface rendering , using ITK - SNAP .;Ground truth generation;12
25058;0d467adaf936b112f570970c5210bdb3c626a717;Method;0;stacked network architecture;The idea of the stacked network architecture is that the estimated flow field is gradually improved by every network in the stack .;;This improvement has been quantitatively shown in the paper .;Intermediate Results in Stacked Networks;26
50402;1e7678467b1807777dcd9be557b79328ce9419a8;Method;0;image storage platform;For instance , an image storage platform is likely to perform some classification of the input images , aside from detecting copies or instances of the same object .;In settings where billion of images have to be treated , it is of interest to obtain image embeddings suitable for more than one recognition task .;An embedding relevant to all these tasks advantageously reduces both the computing time per image and storage space .;Introduction;1
56584;231af7dc01a166cac3b5b01ca05778238f796e41;Method;1;second order differential equation;A second order differential equation describes the learning dynamics of Adam as an HBF system .;Then we described Adam stochastic optimization as a heavy ball with friction ( HBF ) dynamics , which shows that Adam converges and that Adam tends to find flat minima while avoiding small local minima .;Via this differential equation , the convergence of GANs trained with TTUR to a stationary local Nash equilibrium can be extended to Adam .;Conclusion;14
73981;2ebfc12285f5d426e0d0e8d2befa1af27f99a56e;Method;0;Euclidean distance function;In , the authors study the limiting average outward flux of the gradient of a Euclidean distance function to a 2D or 3D object boundary .;Traditional methods : Many early skeleton detection algorithms are based on gradient intensity maps .;The skeleton is associated with those locations where an energy principle is violated , where there is a net inward flux .;Related Work;2
46420;1bb5520bbc168e54c553758a76c6d953933bd8eb;Method;0;ES;The NES family of black - box optimization algorithms use parameterised probability distributions over the search space , instead of an explicit population ( i.e. , a conventional ES [ 27 ] ) .;"Therefore , we instead evolve the policy using a variant for Natural Evolution Strategies ( NES ; [ 25 , 26 ] ) , called Separable NES ( SNES ; [ 19 ] ) .";Typically , the distribution is a multivariate Gaussian parameterised by mean µ and covariance matrix;Title;0
74011;2ebfc12285f5d426e0d0e8d2befa1af27f99a56e;Method;0;output residual units;Ke et al . present a side - output residual network ( SRN ) , which leverages the output residual units to fit the errors between the ground - truth and the side - outputs .;This leads to improved skeleton localization , scale prediction , and better overall performance .;By cascading residual units in a deep - to - shallow manner , SRN can effectively detect the skeleton at different scales .;Related Work;2
67930;2bb9f0768fac9622a0be446df69daf75a954d5ac;Method;0;paragraph;paragraph : AMR Parsers .;section : Related Work;AMR parsing maps a natural language sentence into its AMR graph .;AMR Parsers .;3
91849;3b1d8eb163ffff598c2faa0d9d7cf933857a359f;Method;1;input encoding layer;To determine the overall inference relationship between the premise and the hypothesis , we need to explore a composition layer to compose the local inference vectors ( and ) collected above : Here , we also use BiLSTMs as building blocks for the composition layer , but the responsibility of BiLSTMs in the inference composition layer is completely different from that in the input encoding layer .;In this component , we introduce knowledge - enriched inference composition .;The BiLSTMs here read local inference vectors ( and ) and learn to judge the types of local inference relationship and distinguish crucial local inference vectors for overall sentence - level inference relationship .;Knowledge - Enhanced Inference Composition;8
1093;0095c269e7d0c990249312687fc43521019809c4;Method;1;end - to - end deep architecture;In this paper , we propose an end - to - end deep architecture to capture the strong interaction information of sentence pair .;;Experiments on two large scale text matching tasks demonstrate the efficacy of our proposed model and its superiority to competitor models .;Conclusion and Future Work;35
28572;0f810eb4777fd05317951ebaa7a3f5835ee84cf4;Method;0;dimensional feature map;In LFA the approximate action - value function is a linear combination of state - action features , where is an - dimensional feature map and is a parameter vector .;Function approximation methods achieve generalisation by approximating the value function by a parameterised functional form .;;Reinforcement Learning;3
15562;08d55271589f989d90a7edce3345f78f2468a7e0;Method;0;cascade score generation unit;There is no explicit supervision signals for the cascade score generation unit in training .;;So another problem arises : is it better to use human - defined scores instead of letting the network learn itself ?;Quality by QAN VS . quality by human;15
73377;2e57bccb74bcb46cbc5b4225b62679023ed1f9da;Method;0;last - word vector representation;Typically , the initial state s 1 is the last - word vector representation of query by an RNN .;The internal state is denoted as s which is a vector representation of the question state .;The t - th time step of the internal state is represented by s t .;REASONING NETWORKS;4
21510;0be9ca65ad318ee3729928882ef2c403d4b6d24e;Method;0;probability distributions;MoS computes multiple probability distributions from the hidden state of final RNN layer and regards the weighted average of the probability distributions as the final distribution .;/ abs - 1711 - 03953 proposed Mixture of Softmaxes ( MoS ) .;In this study , we propose Direct Output Connection ( DOC ) , which is a generalization method of MoS. DOC computes probability distributions from the middle layers in addition to the final layer .;Proposed Method : Direct Output Connection;4
9719;05ee231749c9ce97f036c71c1d2d599d660a8c81;Method;1;trained network;The classification layer is discarded after training and the trained network is used to extract a single fixed - length template representation for the input face images .;We use the one - versus - all logistic regression loss as empirically we found that it converges faster and outperforms cross - entropy loss .;Training with degraded images .;Network training;7
44393;1a2599e467e855f845dcbf9282f8bdbd97b85708;Method;1;concatenative and parametric baseline systems;We also compare to the original Tacotron that predicts linear spectrograms and uses Griffin - Lim to synthesize audio , as well as concatenative and parametric baseline systems , both of which have been used in production at Google .;In order to better isolate the effect of using mel spectrograms as features , we compare to a WaveNet conditioned on linguistic features with similar modifications to the WaveNet architecture as introduced above .;We find that the proposed system significantly outpeforms all other TTS systems , and results in an MOS comparable to that of the ground truth audio .;Evaluation;8
51409;1ea6b2f67a3a7f044209aae0d0fd1cb14a1e9e06;Method;1;conditioning scheme;Figure [ reference ] ( Left ) illustrates the conditioning scheme .;The generation proceeds row by row and pixel by pixel .;Each pixel is in turn jointly determined by three values , one for each of the color channels Red , Green and Blue ( RGB ) .;Generating an Image Pixel by Pixel;3
34349;12f008bea798a05ebfa2864ec026999cb375bcd9;Method;0;attention - sum module;The contextual representations in GA readers , namely the embeddings of words in the document , are iteratively refined across hops until reaching a final attention - sum module kadlec2016text which maps the contextual representations in the last hop to a probability distribution over candidate answers .;Multi - hop architectures mimic the multi - step comprehension process of human readers , and have shown promising results in several recent models for text comprehension sordoni2016iterative , kumar2015ask , shen2016reasonet .;The attention mechanism has been introduced recently to model human focus , leading to significant improvement in machine translation and image captioning bahdanau2014neural , mnih2014recurrent .;Gated - Attention Reader;3
88432;3861ae2a6bdd2a759c2d901a6583e63a216bc2fc;Method;1;NVIDIA K80 GPUs;We trained all our networks on NVIDIA K80 GPUs with a batch containing roughly 25 , 000 source and target tokens .;Weights for word embeddings were tied to corresponding entries in the final softmax layer inan2016tying , press2016using .;;Training Details;6
41404;1713d05f9d5861cac4d5ec73151667cb03a42bfc;Method;0;machine translation model;The code learning model can not be jointly trained with the machine translation model as it takes far more iterations for the coding layer to converge to one - hot vectors .;As the reconstructed embeddings are not identical to the original embeddings , the model parameters other than the embedding matrix have to be retrained again .;;Code Learning with Gumbel - Softmax;4
62262;2742a33946e20dd33140b8d6e80d5fd04fced1b2;Method;0;reconstruction algorithms;Each of the reconstruction datasets are captured in different environments with different local geometries at varying scales and built with different reconstruction algorithms .;[ reference ] . 54 scenes are used for training and 8 scenes for testing .;;Learning From Reconstructions;4
2956;01959ef569f74c286956024866c1d107099199f7;Method;1;drag - and - drop interface;Subjects used a drag - and - drop interface to create the scenes .;( right ) .;Each object could be flipped horizontally and scaled .;APPENDIX VI : ABSTRACT SCENES DATASET;24
82941;3580d8a5e7584e98d547ebfed900749d347f6714;Method;0;Vanilla attention mechanism;[ reference ][ reference ][ reference ] . Vanilla attention mechanism is proposed to encode the semantic relevance between the encoder states { h t } L t=1 and and the decoder state s t at time;a t is the attention vector which is widely used in many applications;The attention vector is usually represented by the weighted sum of encoder hidden states .;Description Decoder with Dual Attention;6
46353;1bb5520bbc168e54c553758a76c6d953933bd8eb;Method;1;Evolution;Evolution has discovered efficient feedforward pathways for recognizing certain objects in the blink of an eye .;Once trained , the CNN never changes its weights or filters during evaluation .;However , an expert ornithologist , asked to classify a bird belonging to one of two very similar species , may have to think for more than a few milliseconds before answering [ 16 , 17 ] , implying that several feedforward evaluations are performed , where each evaluation tries to elicit different information from the image .;Title;0
73750;2e942d19333651bf6012374ea9e78d6937fd33ac;Method;0;hard example mining algorithms;also use hard example mining algorithms to boost the performance of face detection .;In , the authors proposed an on - line hard example mining ( OHEM ) algorithm to improve the object detection performance .;;Related Work;2
47078;1bea6bbdb4aed87fff5390d42934a1d9b0a7bec4;Method;1;relabeling;We report both results ( with and without relabeling ) for future reference .;We attempt to relabel the entity markers based on their first occurrence in the passage and question and find that this step can make training converge faster as well bring slight gains .;All of our models are run on a single GPU ( GeForce GTX TITAN X ) , with roughly a runtime of 3 hours per epoch for CNN , and 12 hours per epoch for Daily Mail .;Training Details;9
40779;16cd50316e41cbb1d9dfeafeb524b31654cef37a;Method;1;weight estimation;The weight estimation is done using an expectation - maximization algorithm based on aligning the reference words to the confusion networks , and maximizing the weighted probability of the correct word at each alignment position .;A second variant of the search procedure that can give lower error ( as measured on the devset ) estimates the best system weights for each incremental combination candidate .;To avoid overfitting , the weights for an - way combination are smoothed hierarchically , i.e. , interpolated with the weights from the - way system that preceded it .;System Combination;14
33783;12db83e66e50152e170d5009c425c925ad2e2c2a;Method;0;differentiable neural architectures;State - of - the - art systems for RTE so far relied heavily on engineered NLP pipelines , extensive manual creation of features , as well as various external resources and specialized subcomponents such as negation detection ( e.g. [ reference ][ reference ][ reference ][ reference ] . Despite the success of neural networks for paraphrase detection ( e.g. [ reference ][ reference ][ reference ] , end - to - end differentiable neural architectures failed to get close to acceptable performance for RTE due to the lack of large high - quality datasets .;This task is important since many natural language processing ( NLP ) problems , such as information extraction , relation extraction , text summarization or machine translation , rely on it explicitly or implicitly and could benefit from more accurate RTE systems [ reference ] .;An end - to - end differentiable solution to RTE is desirable , since it avoids specific assumptions about the underlying language .;INTRODUCTION;2
13781;07cca2bdd0dc2fee02889e17789748eba9d06ffa;Method;0;ML attempts;Most of the previous ML attempts possess several limitations in each of these three previously mentioned aspects .;These ML approaches could be broken down into three aspects ( or phases ) : data collection ( sample size selected , duration of study and granularity of data ) , selection of variables for inference ( or combination of variables ) , and method of inference ( the details of the learning algorithm used ) .;Accordingly , these limitations could be arranged into three categories : data collection , variable selection , and methodrelated issues .;Title;0
2473;0171bdeb1c6e333287be655c667cfba5edb89b76;Method;1;multi - scale and / or multicrop testing;We note that many models ( including ours ) start to get saturated on this dataset after using multi - scale and / or multicrop testing .;ResNeXt is the foundation of our entries to the ILSVRC 2016 classification task , in which we achieved 2 nd place .;We had a single - model top - 1 / top - 5 error rates of 17.7% / 3.7 % using the multi - scale dense testing in [ reference ] , on par with Inception - ResNet - v2 's single - model results of 17.8% / 3.7 % that adopts multi - scale , multi - crop testing .;Experiments on ImageNet - 1 K;11
46462;1bb5520bbc168e54c553758a76c6d953933bd8eb;Method;0;SNES;Once all of the candidate policies have been evaluated , SNES updates its distribution parameters ( µ , Σ ) according the natural gradient calculated from the sampled fitness values , F .;︸︸ ︷ λL2∥θi∥2 ( 11 ) where λL2 is a regularization parameter .;As SNES repeatedly updates the distribution over the course of many generations , the expected fitness of the distribution improves , until the stopping criterion is met when no improvement is made for several consecutive epochs .;Title;0
100056;41d08fb733f3e50ac183490f84d6377dffccf350;Method;1;Point Set Prediction Network;section : Point Set Prediction Network;See Sec 4.4 .;The task of building a network for point set prediction is new .;Point Set Prediction Network;8
250;000f90380d768a85e2316225854fc377c079b5c4;Method;1;FRRN B;"We report the results of our FRRNs for two settings : FRRN A trained on quarter - resolution ( 256 × 512 ) Cityscapes images ; and FRRN B trained on half - resolution ( 512 × 1024 ) images .";Annotations for the test set remain private and comparison to other methods is performed via a dedicated evaluation server .;We then upsample our predictions using bilinear interpolation in order to report scores at the full image resolution of 1024 × 2048 pixels .;Experimental Evaluation;7
107377;46018a894d533813d67322827ca51f78aed6d59e;Method;1;feature hierarchies;In this work , we apply this approach to learn feature hierarchies adapted specifically to the task of brain tumor segmentation that combine information across MRI modalities .;Deep neural networks have been shown to excel at learning such feature hierarchies .;Specifically , we investigate several choices for training Convolutional Neural Networks ( CNNs ) , which are Deep Neural Networks ( DNNs ) adapted to image data .;Introduction;1
98245;40b0fced8bc45f548ca7f79922e62478d2043220;Method;1;convnet;We will use the term receptive field , abbreviated rf , to refer to the set of input pixels that are path - connected to a particular unit in the convnet .;We use the activations of each layer as features , referred to as convn , pooln , or fcn for the th convolutional , pooling , or fully connected layer , respectively .;;Preliminaries;6
39945;1672ffebacadf849188668f24bcd377a19ae4051;Method;0;MLP methods;The use of fixed - length vector will be a bottleneck , which brings difficulty for Embedding & MLP methods to capture user ’s diverse interests effectively from rich historical behaviors .;In this way , user features are compressed into a fixed - length representation vector , in regardless of what candidate ads are .;In this paper , we propose a novel model : Deep Interest Network ( DIN ) which tackles this challenge by designing a local activation unit to adaptively learn the representation of user interests from historical behaviors with respect to a certain ad .;Deep Interest Network for Click - Through Rate Prediction;0
13379;074b6fe0cc6848fb86a6703d1c52074494177c79;Method;1;learning rate;For feature space adaptation we use learning rate 1e - 5 and train for max 200 epochs over the data .;For training the source task net model , we use learning rate 1e - 4 and train for 100 epochs over the data with batch size 128 .;For pixel space adaptation we train our generators and discriminators with equal weighting on all losses , use batch size 100 , learning rate 2e - 4 ( default from CycleGAN ) , and trained for 50 epochs .;Hyperparameters .;14
25891;0dcde9f2c5149f0e4c806db7b4cc4915bed077da;Method;1;QR factorisation;In practice , it is efficient to avoid explicit calculation of the inverse in Equation ( [ reference ] ) and instead use QR factorisation to solve the following set of linear equations for the unknown variables in :;It is known to often be useful to regularise such problems , and instead solve the following ridge regression problem : where is a hyper - parameter and is the identity matrix .;Above we mentioned two algorithms , and Algorithm 2 is simply to form and solve Eqn .;Stage 2 Training : Classifier Weights;6
79834;33a8d0a35390fde736744d4a0dd20dff7961c777;Method;1;GCNN layer;However among the two , we prefer increasing the depth of GCNN model because the first choice leads to increase in the breadth of the GCNN layer ( see footnote [ reference ] about in Section [ reference ] ) and based on the current understanding of deep learning theory , increasing the depth is favored more over the breadth .;Both these choices increases model complexity and thus would require more data samples to reach satisfying results .;For cases where graph node features are missing , it is a common practice to take node degree as a node feature .;;11
25279;0d5fa5be4bfe085de8f88dbee1c3b2a6e5ab9ee2;Method;1;Image Cascade Network;section : Image Cascade Network;We note when a good - accuracy fast image semantic - segmentation framework comes into existence , video segmentation will also be benefited .;We start by analyzing computation time budget of different components on the high performance segmentation framework PSPNet with experimental statistics .;Image Cascade Network;8
63700;27e4b65121d3c88643d86dc91a9bdafdf223b988;Method;1;one - hot representations;For continuous features such as TF and IDF , we convert them into categorical values by discretizing them into a fixed number of bins , and use one - hot representations to indicate the bin number they fall into .;We therefore create additional look - up based embedding matrices for the vocabulary of each tag - type , similar to the embeddings for words .;This allows us to map them into an embeddings matrix like any other tag - type .;Capturing Keywords using Feature - rich Encoder;4
61078;25c108a56e4cb757b62911639a40e9caf07f1b4f;Method;0;fitting model;Finally the best fitting model with a specific is used .;Here denotes selected scale numbers of six main scales from to and the scale selection is determined by the threshold of each component .;;Scale - forecast Network;4
77273;3112d2d95d66b3d54a72c55072647aab937e410e;Method;1;Conditional Copy model;Within the Conditional Copy model we compute p ( z;For the Joint Copy model , one attention distribution is not normalized , and is normalized along with all the output - word probabilities .;t |ŷ 1:t−1 , s );B. Generation Model Details;17
85221;3652c2d20f198dc39ad159eba55d08341c56d628;Method;1;finite - dimensional approximations;Given two images feature maps and , we start by approximating by replacing and by their finite - dimensional approximations provided by ( B ) : Then , we use the finite - dimensional approximation of the Gaussian kernel involving and where is defined in ( [ reference ] ) and is defined similarly by replacing by .;This is sufficient for our purpose since we have previously assumed ( B ) for the zeroth layer .;Finally , we approximate the remaining Gaussian kernel by uniform sampling on , following Section [ reference ] .;Approximation principles .;20
92763;3c78c6df5eb1695b6a399e346dde880af27d1016;Method;1;TF - IDF ranking;"We find both TF - IDF ranking and the sum objective to be effective ; even without changing the model we achieve state - of - the - art results .";As shown in Table [ reference ] , our implementation of this approach outperforms the results reported by triviaqa significantly , likely because we are not subsampling the data .;Using our refined model increases the gain by another 4 points .;TriviaQA Web;17
46445;1bb5520bbc168e54c553758a76c6d953933bd8eb;Method;0;stochastic policies;However , π could be extended to stochastic policies .;": O → A is a deterministic policy ; given an observation it will always output the same action .";Algorithm 1 TRAIN DASNET ( M , µ , Σ , p , n ) 1 : while True do 2 : images ⇐ NEXTBATCH ( n ) 3 : for i;Title;0
95996;3f3a483402a3a2b800cf2c86506a37f6ef1a5332;Method;0;people detector output;ROI are either based on a ground truth ( GT ROI ) or on the people detector output ( det ROI ) .;This corresponds to unary only performance .;Results on WAF .;Multi Person Pose Estimation;15
58947;23f5854b38a15c2ae201e751311665f7995b5e10;Method;0;information bottleneck principle;The resulting model and learning algorithm has information - theoretic connections to maximum entropy discrimination and the information bottleneck principle .;Remarkably , there is an efficient way to tune the parameter using annealing .;Empirically , we show that the proposed approach significantly outperforms several state - of - the - art baselines , including two recently - proposed neural network approaches , on several real - world datasets .;ABSTRACT;1
59886;2451db113552afb6d9ad15ef4009ec4133d28f74;Method;1;trace;[ reference ] ( bottom rows ) , where iSQRT - COV ( trace ) indicates pre - normalization by trace .;Here we compare them in Tab .;We can see that pre - normalization by trace produces 0.3 % lower error rate than that by Frobenius norm , while taking similar time with the latter .;Evaluation with AlexNet on ImageNet;14
17909;0a34fe39e9938ae8c813a81ae6d2d3a325600e5c;Method;1;pose estimation;Unlike our proposed pose estimation , they regress poses by using iterative methods which involve computationally costly face rendering .;Some recently addressed faces in particular , though their methods are designed to estimate 2D landmarks along with 3D face shapes .;We regress 6DoF directly from image intensities without such rendering steps .;Related work;2
24063;0d0101e65e52ae0cec38bcd13c6a9d631979c577;Method;0;fractal network ’s nested substructure;Modular building blocks of other designs szegedy2015inception , liao2015competitive resemble special cases of a fractal network ’s nested substructure .;Deep supervision not only arises automatically , but also drives a type of student - teacher learning ba2014dodeep , urban2016dodeepsfollowup internal to the network .;For fractal networks , simplicity of training mirrors simplicity of design .;Introduction;1
98112;4087ebc37a1650dbb5d8205af0850bee74f3784b;Method;0;SDE;"The SDE dynamics are governed by a "" noise scale "" g ≈ N / B for the learning rate , N the training dataset size , and B the batch size .";The authors argued that the SGD algorithm can be derived through Euler - Maruyama discretization of a Stochastic Differential Equation ( SDE ) .;They conclude that a higher noise scale prevents SGD from settling into sharper minima .;Related Work;3
71437;2d876ed1dd2c58058d7197b734a8e4d349b8f231;Method;1;fo - pooling;"The function may also include an output gate : Or the recurrence relation may include an independent input and forget gate : We term these three options f - pooling , fo - pooling , and ifo - pooling respectively ; in each case we initialize or to zero .";The simplest option , which term “ dynamic average pooling ” , uses only a forget gate : where denotes elementwise multiplication .;Although the recurrent parts of these functions must be calculated for each timestep in sequence , their simplicity and parallelism along feature dimensions means that , in practice , evaluating them over even long sequences requires a negligible amount of computation time .;Model;2
89136;39978ba7c83333475d6825d0ff897692933895fc;Method;1;filtering stage;Therefore , back - propagation through this filtering stage can also be performed in O ( N ) time .;In terms of permutohedral lattice operations , this can be accomplished by only reversing the order of the separable filters in the blur stage , while building the permutohedral lattice , splatting , and slicing in the same way as in the forward pass .;Following [ reference ] , we use two Gaussian kernels , a spatial kernel and a bilateral kernel .;Message Passing;8
3833;020a9aba95bce75dca08e3c499efc9e100f1cbb6;Method;1;Atari emulator;The task consists of choosing actions in an Atari emulator based on raw images of the screen .;We evaluate our approach on 14 games from the Arcade Learning Environment .;Previous work has tackled this task using Q - learning with epsilon - greedy exploration , as well as Monte Carlo tree search and policy gradient methods .;Experimental Results;8
51211;1e7a36c4d4f96b29e3edf51b6eb61f8e16217704;Method;1;Character level language models;Character level language models can be compared with word level language models by converting bits per character to perplexity .;Character level models do have an inherent advantage of being able to capture subword language information , motivating their use on traditionally word - level tasks .;In this case , we model the data at the UTF - 8 byte level .;WikiText - 2;12
59513;2451db113552afb6d9ad15ef4009ec4133d28f74;Method;1;iterative matrix square root normalization method;Towards addressing this problem , we propose an iterative matrix square root normalization method for fast end - to - end training of global covariance pooling networks .;However , existing methods depend heavily on eigendecomposition ( EIG ) or singular value decomposition ( SVD ) , suffering from inefficient training due to limited support of EIG and SVD on GPU .;At the core of our method is a meta - layer designed with loop - embedded directed graph structure .;Towards Faster Training of Global Covariance Pooling Networks by Iterative Matrix Square Root Normalization;0
27475;0ee850dd6640a96531ac5ad21da5438db04d8b3c;Method;1;weak supervision source;An interesting observation here is that in both collections , these two initializations converge when the models exceed the performance of the weak supervision source , which is BM25 in our experiments .;When enough training data is fed to the network , initializing with pre - trained embedding and random values converge to the same performance .;This suggests that the convergence occurs when accurate representations are learned by the networks , regardless of the initialization .;Results and Discussion;12
83971;3600aac8edc5bc015e69f2ffa893c21b6d4e1057;Method;1;anchor - level attention mechanism;As discussed in Section [ reference ] , we apply anchor - level attention mechanism to enhance the facial parts .;;We compare our FAN baseline with and without attention in Table [ reference ] for the WiderFace val dataset .;Attention mechanism;13
4283;023cc7f9f3544436553df9548a7d0575bb309c2e;Method;0;CUDA implementations;While it is possible to have a 10 speed up for char - CNN by using more recent CUDA implementations of convolutions , fastText takes less than a minute to train on these datasets .;Table [ reference ] shows that methods using convolutions are several orders of magnitude slower than fastText .;The GRNNs method of tang2015document takes around 12 hours per epoch on CPU with a single thread .;Training time .;9
93245;3d18ce183b5a5b4dcaa1216e30b774ef49eaa46f;Method;1;generic face alignment algorithm;[ reference ] that even as a generic face alignment algorithm , 3DDFA still demonstrates competitive performance on the common set and state - of - the - art performance on the challenging set .;It can be seen in Table .;;Medium Pose Face Alignment;23
98260;40b0fced8bc45f548ca7f79922e62478d2043220;Method;1;fc 7 features;Nearest neighbors are computed using fc 7 features .;We approach this difficult task in the style of SIFT flow : we retrieve near neighbors using a coarse similarity measure , and then compute dense correspondences on which we impose an MRF smoothness prior which finally allows all images to be warped into alignment .;Since we are specifically testing the quality of alignment , we use the same nearest neighbors for convnet or conventional features , and we compute both types of features at the same locations , the grid of convnet rf centers in the response to a single image .;Intraclass alignment;8
15438;08d55271589f989d90a7edce3345f78f2468a7e0;Method;1;score generation parts;We design different score generation parts start at different feature maps .;In quality aware network ( QAN ) , quality generation part is a convolution neural network .;We use QAN split at Pool4 as an instance .;Details of quality generation part;7
96949;3febb2bed8865945e7fddc99efd791887bb7e14f;Method;1;forward CNN - BIG - LSTM;After training for 10 epochs on the 1B Word Benchmark Chelba2014OneBW , the average forward and backward perplexities is 39.7 , compared to 30.0 for the forward CNN - BIG - LSTM .;In contrast , traditional word embedding methods only provide one layer of representation for tokens in a fixed vocabulary .;Generally , we found the forward and backward perplexities to be approximately equal , with the backward value slightly lower .;Pre - trained bidirectional language model architecture;7
49331;1d8653d9fca853a8e3727fa7d8f5ec0631cad08f;Method;0;temporal memory linkage;The two innovations proposed by this model are a new approach to tracking memory freeness ( dynamic memory allocation ) and a mechanism for associating memories together ( temporal memory linkage ) .;Recently proposed a novel MANN the Differentiable Neural Computer ( DNC ) .;We demonstrate here that the approaches enumerated in the paper can be adapted to new models by outlining a sparse version of this model , the Sparse Differentiable Neural Computer ( SDNC ) , which learns with similar data efficiency while retaining the computational advantages of sparsity .;Sparse Differentiable Neural Computer;30
90928;3a8d537bcec370d37990d39eab01c729496ad057;Method;1;formulated loss function;Furthermore , our approach makes use of a well - formulated loss function that circumvents the instability involved with adversarial learning while still being able to produce higher - quality samples .;In contrast to prior work , our approach , which is derived from a variational Bayesian perspective view of learning , naturally allows for joint training of all model parameters .;;Related Work;2
51997;2019ede61cc0be14859908312e18458a7c79908f;Method;0;generation systems;Traditionally , generation systems relied on rules and hand - crafted specifications;;"[ reference ][ reference ][ reference ][ reference ][ reference ] . Generation is divided into modular , yet highly interdependent , decisions : ( 1 ) content planning defines which parts of the input fields or meaning representations should be selected ; ( 2 ) sentence planning determines which selected fields are to be dealt with in each output sentence ; and ( 3 ) surface realization generates those sentences .";Related Work;3
89577;39dba6f22d72853561a4ed684be265e179a39e4f;Method;1;C ++ implementation;A C ++ implementation of deep LSTM with the configuration from the previous section on a single GPU processes a speed of approximately 1 , 700 words per second .;;This was too slow for our purposes , so we parallelized our model using an 8 - GPU machine .;Parallelization;8
65177;29c19276b8fff231717c3e342cb24144d2b77726;Method;1;token and subtoken - level representations;We evaluated token and subtoken - level representations for neural network - based part - of - speech tagging across 22 languages and proposed a novel multi - task bi - LSTM with auxiliary loss .;;The auxiliary loss is effective at improving the accuracy of rare words .;Conclusions;12
40598;16cd50316e41cbb1d9dfeafeb524b31654cef37a;Method;0;LSTM system;For the LSTM system , the conversation - side i - vector is appended to each frame of input .;A 100 - dimensional i - vector is generated for each conversation side .;For convolutional networks , this approach is inappropriate because we do not expect to see spatially contiguous patterns in the input .;Speaker Adaptive Modeling;7
23587;0cb8f50580cc69191144bd503e268451ce966fa6;Method;1;Matrix Multiplication;Matrix Multiplication : We started with the message function used in GG - NN which is defined by the equation .;;Edge Network : To allow vector valued edge features we propose the message function where is a neural network which maps the edge vector to a matrix .;Message Functions;7
15374;08d55271589f989d90a7edce3345f78f2468a7e0;Method;0;images ’ feature generator;Due to mutual benefit between the two parts during training , performance is improved significantly by jointly optimizing images aggregation parameter and images ’ feature generator .;In QAN , score is automatically learned and quality generation unit is joint trained with feature generation unit .;;Related work;2
103047;43428880d75b3a14257c3ee9bda054e61eb869c0;Method;0;state representations;, x m ) of m elements and returns state representations;The encoder RNN processes an input sequence x = ( x 1 , . . .;z =;Recurrent Sequence to Sequence Learning;4
25908;0dcde9f2c5149f0e4c806db7b4cc4915bed077da;Method;1;Filters;subsection : Stage 1 Design : Filters and Pooling;"We found that local and / or global contrast enhancement only diminished performance ; CIFAR - 10 : convert from 3 channels to 4 by adding a conversion to greyscale from the raw RGB ; apply ZCA whitening to each channel of each image , as in .";Since our objective here was to train only a single layer of the network , we did not seek to train the network to find filters optimised for the training set .;Stage 1 Design : Filters and Pooling;10
77302;3112d2d95d66b3d54a72c55072647aab937e410e;Method;0;Record Types layer;: Possible Record Types layer ( ReLU ) MLP into R 500 , which predicts one of the 39 relation types ( or ε ) with a linear decoder layer and softmax .;;The bidirectional LSTM model uses a single layer with 500 units in each direction , which are concatenated .;Appendix Player Types;22
83741;35ff11e0a5e465c810a30b022b26a9d577a434ce;Method;0;sequential RNNs;The problem of understanding neural network models in NLP has been previously studied for sequential RNNs .;;shi_16 showed that sequence - to - sequence neural translation models capture a certain degree of syntactic knowledge of the source language , such as voice ( active or passive ) and tense information , as a by - product of the translation objective .;Related Work;12
57492;2393447b8b0b79046afea1c88a8ed3949338949e;Method;1;512 - dimensional ELMo PetersNIGCLZ18;We used the pre - trained uncased 300 - dimensional GloVe PenningtonSM14 and the original 512 - dimensional ELMo PetersNIGCLZ18 .;The numbers of shared encoding blocks , modeling blocks for question , modeling blocks for passages , and decoder blocks were 3 , 2 , 5 , and 8 , respectively .;We used the spaCy tokenizer , and all words were lowercased except the input for ELMo .;Model configurations .;32
76124;303065c44cf847849d04da16b8b1d9a120cef73a;Method;1;error term;The overall cost function of the proposed 3DMM formulation consists of a texture - based term , an optional error term based on sparse 2D landmarks and optional regularization terms on the parameters .;;Texture reconstruction cost .;Cost Function;7
33362;1109b663453e78a59e4f66446d71720ac58cec25;Method;0;segmentation based methods;Combined with our method , we may observe similar improvements as seen here between traditional dense methods and segmentation based methods .;suggest that detection accuracy drops when using dense sliding window as opposed to selective search which discards unlikely object locations hence reducing false positives .;It should also be noted that we did not fine tune on the detection validation set as NEC and UvA did .;Detection;14
107239;45fdc73a239e9c6ea65e98c96f6a2d6dc35d6f72;Method;1;Adam learning rate optimizer;CNNs and QCNNs are trained with the Adam learning rate optimizer and vanilla hyperparameters during epochs .;A dropout of and a regularization of are used across all the layers , except the input and output ones .;Then , a fine - tuning process of epochs is performed with a standard and a learning rate of .;Models architectures;10
13036;072fd0b8d471f183da0ca9880379b3bb29031b6a;Method;0;discriminator architecture;The discriminator architecture is : C64 - C128 - C256 - C512;;After the last layer , a convolution is applied to map to a 1 - dimensional output , followed by a Sigmoid function .;Discriminator architectures;23
82088;35502af359aa60ae8047df172e29503cfb29c3f9;Method;1;instance proposal method;As a final test of our method , we also train it to produce semantic labels which are combined with our instance proposal method to recognize the detected proposals .;;For semantic segmentation which is a k - way classification problem , we train a model using cross - entropy loss alongside our embedding loss .;Semantic Instance Detection;15
68468;2c03df8b48bf3fa39054345bafabfeff15bfd11d;Method;1;stack of 6n layers;Then we use a stack of 6n layers with 3×3 convolutions on the feature maps of sizes { 32 , 16 , 8 } respectively , with 2n layers for each feature map size .;The first layer is 3×3 convolutions .;The numbers of filters are { 16 , 32 , 64 } respectively .;CIFAR - 10 and Analysis;12
53499;2116b2eaaece4af9c28c32af2728f3d49b792cf9;Method;1;Improving neural networks by preventing co - adaptation of feature detectors;document : Improving neural networks by preventing co - adaptation of feature detectors;This is clearly demonstrated by the fact that using non - convolutional higher layers with a lot of parameters leads to a big improvement with dropout but makes things worse without dropout .;When a large feedforward neural network is trained on a small training set , it typically performs poorly on held - out test data .;Improving neural networks by preventing co - adaptation of feature detectors;0
81056;34cf90fcbf83025666c5c86ec30ac58b632b27b0;Method;1;DGD;Compared with the similar multi - class person identification network DGD , the Rank - 1 identification rate improves by 1.63 % using our fusion model on the labeled dataset .;Compared with metric learning methods , such as the state - of - the - art approach DNS , the proposed fusion model improves the Rank - 1 identification rate by 11.66 % and 13.29 % on the labeled and detected datasets respectively .;It should be noted that we only use the labeled sets for training , while the DGD is trained on both the labeled and detected datasets .;Comparison with State - of - the - art Methods;11
63458;27c761258329eddb90b64d52679ff190cb4527b5;Method;1;ResU - Net models;The experimental results show that the proposed models outperform the U - Net and ResU - Net models with same number of network parameters .;Table III shows the summary of how well the proposed models performed against equivalent U - Net and ResU - Net models .;Furthermore , many models struggle to define the class boundary properly during segmentation tasks [ reference ] .;5 ) Lung Segmentation;14
49129;1d8653d9fca853a8e3727fa7d8f5ec0631cad08f;Method;1;dense - approximation;This usage definition is incorporated within Dense Access Memory ( DAM ) , a dense - approximation to SAM that is used for experimental comparison in Section [ reference ] .;The first definition is a time - discounted sum of write weights where is the discount factor .;The second usage definition , used by SAM , is simply the number of time - steps since a non - negligible memory access : .;Write;8
49064;1d8653d9fca853a8e3727fa7d8f5ec0631cad08f;Method;1;content - based read operations;By thresholding memory modifications to a sparse subset , and using efficient data structures for content - based read operations , our model is optimal in space and time with respect to memory size , while retaining end - to - end gradient based optimization .;In this paper , we present a MANN named SAM ( sparse access memory ) .;To test whether the model is able to learn with this sparse approximation , we examined its performance on a selection of synthetic and natural tasks : algorithmic tasks from the NTM work , Babi reasoning tasks used with Memory Networks and Omniglot one - shot classification .;Introduction;1
49327;1d8653d9fca853a8e3727fa7d8f5ec0631cad08f;Method;0;Differentiable Neural Computer;Recently proposed a novel MANN the Differentiable Neural Computer ( DNC ) .;;The two innovations proposed by this model are a new approach to tracking memory freeness ( dynamic memory allocation ) and a mechanism for associating memories together ( temporal memory linkage ) .;Sparse Differentiable Neural Computer;30
37698;15ca7adccf5cd4dc309cdcaa6328f4c429ead337;Method;1;distance transform;Then we treat the occupied cells as the zero level set of a surface , and apply a distance transform to build a 3D distance field , which is stored in a 3D array indexed by , where , and is the resolution of the distance field .;Given a mesh ( or point cloud ) , we first convert it into a binary occupancy grid representation , where the binary occupancy value in each grid is determined by whether it intersects with any mesh surface ( or contains any sample point ) .;We denote the distance value at by .;Input 3D Fields;7
28391;0f2f4edb7599de34c97f680cf356943e57088345;Method;1;Torch7;The network is trained using Torch7 and for optimization we use rmsprop with a learning rate of 2.5e - 4 .;We avoid translation augmentation of the image since location of the target person is the critical cue determining who should be annotated by the network .;Training takes about 3 days on a 12 GB NVIDIA TitanX GPU .;Training Details;7
48432;1d5d0a41b720bc51fd568cf78f8aa4ec5af4f802;Method;1;Qi;We process the input point clouds using a variant of the PointNet architecture by Qi .;;PointNet pioneered the use of a symmetric function ( max - pooling ) to achieve permutation invariance in the processing of unordered 3D point cloud sets .;Point Cloud Network;4
52332;207e0ac5301a3c79af862951b70632ed650f74f7;Method;0;generalised eigen - problem;( [ reference ] ) can be done by solving the following generalised eigen - problem : If is non - singular , eigenvectors can be computed corresponding to the largest eigenvalues of .;The optimisation of Eq .;Using them as the columns , the projection matrix can project the original data into a dimensional discriminative subspace where the classes become maximally separable .;Foley - Sammon Transform;5
97069;3febb2bed8865945e7fddc99efd791887bb7e14f;Method;1;SQuAD;Table [ reference ] compares these alternatives for SQuAD , SNLI and SRL .;The choice of the regularization parameter is also important , as large values such as effectively reduce the weighting function to a simple average over the layers , while smaller values ( e.g. , ) allow the layer weights to vary .;Including representations from all layers improves overall performance over just using the last layer , and including contextual representations from the last layer improves performance over the baseline .;Alternate layer weighting schemes;10
25248;0d5fa5be4bfe085de8f88dbee1c3b2a6e5ab9ee2;Method;0;Encoder - decoder structures;Encoder - decoder structures can combine the high - level semantic information from later layers with the spatial information from earlier ones .;DeepLab and used dilated convolution to enlarge the receptive field for dense labeling .;Multi - scale feature ensembles are also used in .;High Quality Semantic Segmentation;5
43804;1a0912bb76777469295bb2c059faee907e7f3258;Method;1;FPN backbone;Faster R - CNN with an FPN backbone extracts RoI features from different levels of the feature pyramid according to their scale , but otherwise the rest of the approach is similar to vanilla ResNet .;FPN uses a top - down architecture with lateral connections to build an in - network feature pyramid from a single - scale input .;Using a ResNet - FPN backbone for feature extraction with Mask R - CNN gives excellent gains in both accuracy and speed .;Mask R - CNN;4
77918;31e5dab321066712cdc8b30943f7950066840ee1;Method;1/na;tai2015improved;We use the Child - Sum variant introduced by tai2015improved , which processes the tree in a bottom - up pass .;TreeLSTMs assume tree - structured input , so AMR graphs must be preprocessed to respect this constraint : reentrancies , which play an essential role in AMR , must be removed , thereby transforming the graphs into trees .;When visiting a node , the hidden states of its children are summed up in a single vector which is then passed into recurrent gates .;TreeLSTM Encoders;8
75534;30180f66d5b4b7c0367e4b43e2b55367b72d6d2a;Method;1;joint Bayesian similarity embedding;In this study , we implement triplet loss similarity embedding , joint Bayesian similarity embedding and 2D alignment , and use these alternative feature encodings as input to template adaptation .;The state of the art approaches on LFW and YouTubeFaces often augment a very deep CNN encoding with metric learning for improved verification scores or 2D alignment to better align facial bounding boxes .;In this study , we seek to answer whether these alternative strategies will provide improved performance over using CNN encoding only or CNN encoding with template adaptation .;Analysis of Alternatives;7
23715;0cb8f50580cc69191144bd503e268451ce966fa6;Method;0;first order approximation;Motivated as a first order approximation of the graph laplacian methods , propose the following layer - wise propagation rule : Here where is the real valued adjacency matrix for an undirected graph .;;Adding the identity matrix corresponds to adding self loops to the graph .;The special case of Kipf and Welling ( 2016 );19
59312;23f5854b38a15c2ae201e751311665f7995b5e10;Method;1;NDCG@100;We set the weights on all the 0 's to 1 and tune the weights on all the 1 's in the click matrix among { 2 , 5 , 10 , 30 , 50 , 100 } , as well as the latent representation dimension K ∈ { 100 , 200 } by evaluating NDCG@100 on validation users .;"We train wmf with alternating least squares ; this generally leads to better performance than with SGD .";Slim [ reference ] : a linear model which learns a sparse item - to - item similarity matrix by solving a constrained ℓ 1 - regularized optimization problem .;Baselines;17
15635;0985497d1de3ffd11713e75289cc2ad55836623d;Method;1;boundary - based MRC models;First , we follow the boundary - based MRC models to find an answer candidate for each passage by identifying the start and end position of the answer ( Figure [ reference ] ) .;The overall framework of our model is demonstrated in Figure [ reference ] , which consists of three modules .;Second , we model the meanings of the answer candidates extracted from those passages and use the content scores to measure the quality of the candidates from a second perspective .;Introduction;1
40481;16cd50316e41cbb1d9dfeafeb524b31654cef37a;Method;0;feed - forward MLPs;In comparison to the standard feed - forward MLPs or DNNs that first demonstrated breakthrough performance on conversational speech recognition , these acoustic models have the ability to model a large amount of acoustic context with temporal invariance , and in the case of convolutional models , with frequency invariance as well .;Surprisingly , this is the case for both acoustic modeling and language modeling .;In language modeling , recurrent models appear to improve over classical N - gram models through the use of an unbounded word history , as well as the generalization ability of continuous word representations .;Introduction;1
49395;1d8653d9fca853a8e3727fa7d8f5ec0631cad08f;Method;1;NTMs;For large external memories we observed improvements in empirical run - time and memory overhead by up to three orders of magnitude over vanilla NTMs , while maintaining near - identical data efficiency and performance .;We also tested several of these tasks scaled to longer sequences via curriculum learning .;Further , in Supplementary [ reference ] we demonstrate the generality of our approach by describing how to construct a sparse version of the recently published Differentiable Neural Computer .;Introduction;1
52099;2019ede61cc0be14859908312e18458a7c79908f;Method;1;KN;Our baseline is an interpolated Kneser - Ney ( KN ) language model and we use the KenLM toolkit to train 5 - gram models without pruning [ reference ] .;;We also learn a KN language model over templates .;Baseline;16
64977;28eceb438da0b841bbd3d02684dbfa263838ed60;Method;1;pixelwise semantic layouts;We have presented a direct approach to photographic image synthesis conditioned on pixelwise semantic layouts .;;Images are synthesized by a convolutional network trained end - to - end with a regression loss .;Conclusion;18
107213;45fdc73a239e9c6ea65e98c96f6a2d6dc35d6f72;Method;0;symbol sequences representation;This probability results in a symbol sequences representation , with in the latent space .;First , a softmax is applied at each timestep , or frame , providing a probability of emitting each symbol at that timestep .;A blank symbol is introduced as an extra label to allow the classifier to deal with the unknown alignment .;Connectionist Temporal Classification;7
85940;36a03f648b40d209ce361550dbe1c823ddb715b5;Method;0;Hybrid methods;Hybrid methods are proposed to combine the generative and discriminative approach .;[ reference ] improved their preceding work [ reference ] by utilizing recent network architecture , data augmentation , and better initial hand localization .;Oberweger et al .;Related works;4
27463;0ee850dd6640a96531ac5ad21da5438db04d8b3c;Method;1;uniform weights;Also , regardless of the embedding method , learning weights helps models to get better performance compared to the fixed weightings , with either IDF or uniform weights .;This supports the hypothesis that with the embedding vector representation the neural networks learn an embedding that is based on the interactions of query and documents that tends to be tuned better to the corresponding ranking task .;Although weight learning can significantly affect the performance , it has less impact than learning embeddings .;Results and Discussion;12
10404;060ff1aad5619a7d6d6cdfaf8be5da29bff3808c;Method;1;LISA;Future work will explore improving LISA ’s parsing accuracy , developing better training techniques and adapting to more tasks .;LISA out - performs the state - of - the - art on two benchmark SRL datasets , including out - of - domain .;;Conclusion;13
619;007ab5528b3bd310a80d553cccad4b78dc496b02;Method;1;Character embedding layer;Character embedding layer is responsible for mapping each word to a high - dimensional vector space .;;Let and represent the words in the input context paragraph and query , respectively .;1 . Character Embedding Layer .;3
68273;2c03df8b48bf3fa39054345bafabfeff15bfd11d;Method;1;deeper model;The existence of this constructed solution indicates that a deeper model should produce no higher training error than its shallower counterpart .;There exists a solution by construction to the deeper model : the added layers are identity mapping , and the other layers are copied from the learned shallower model .;But experiments show that our current solvers on hand are unable to find solutions that are comparably good or better than the constructed solution ( or unable to do so in feasible time ) .;Introduction;2
35678;143a3186c368544ded00a444be33153420baa254;Method;1;non - convolutional network;In order to also provide a fair comparison against memory - augmented neural networks [ reference ] and to test the flexibility of MAML , we also provide results for a non - convolutional network .;For MiniImagenet , we used 32 filters per layer to reduce overfitting , as done by [ reference ] .;For this , we use a network with 4 hidden layers with sizes 256 , 128 , 64 , 64 , each including batch normalization and ReLU nonlinearities , followed by a linear layer and softmax .;Classification;15
94543;3e58fbb8cb96880e018ca18a60e2d86e3cb0c10a;Method;1;joint configuration inference;"We optimize in two sequential steps : 1 ) generate joint partition set based on joint candidates ; 2 ) conduct joint configuration inference in each joint partition locally , which reduces the joint configuration complexity and overcomes the drawback of bottom - up approaches .";( [ reference ] ) .;After getting joint partition according to Eqn .;Local Greedy Inference for Pose Estimation;9
71874;2dad7e558a1e2982d0d42042021f4cde4af04abf;Method;1;d - layer DilatedRNN;Among this subset , the d - layer DilatedRNN with dilation rate { M0 , ⋯ , M - d1 } achieves the smallest ¯d .;For the RNNs in this subset , there are d different dilations , 1=s1≤s2≤⋯≤sd = m , and where ni is any arbitrary positive integer .;The proof is motivated by , and is given in appendix [ reference ] .;( Parameter Efficiency of DilatedRNN ) .;11
92317;3b9732bb07dc99bde5e1f9f75251c6ea5039373e;Method;1;fully - connected linear layer;Finally , a fully - connected linear layer projects to the output of the network , i.e. , the Q - values .;All these layers are separated by Rectifier Linear Units ( ReLu ) .;The optimization employed to train the network is RMSProp ( with momentum parameter ) .;Network Architecture;20
76876;309acdd149f5f0ea12acb103b36bb59e6e631671;Method;1;Gaussian blurring;Following , the objective or cost function minimized at each stage is the squared distance between the generated fusion maps of the layer , and ground - truth belief maps generated by Gaussian blurring the sparse ground - truth locations of each landmark;;For end - to - end training the total loss is the sum over all layers .;The Objective and Training;13
33571;128c727ac06fcc50f1735cb222a441eee6adcab6;Method;0;kazemi2018simple;Similarly to ComplEx , kazemi2018simple show that SimplE is fully expressive with entity and relation embeddings of size , with representing the number of true facts .;As shown in , ComplEx is fully expressive with the bound on entity and relation embedding dimensionality of for achieving full expressiveness .;The authors further prove other models are not fully expressive :;Bound on Embedding Dimensionality for Full Expressiveness;9
98606;40b4596a0ae4f4ff065f3f13f36db39543e50068;Method;1;spatial - aware splitting layer;To perform the spatial - aware adaptation , we therefore build a spatial - aware splitting layer , in which we recover the location for each activation in the original image coordinate , and split them into different domain classifiers according to the region it comes from .;Note that , when training semantic segmentation model , due to the high resolution of input image , the input image is usually randomly cropped to fit the GPU memory .;During the test phase , the two newly added modules can be removed , and one can perform semantic segmentation as the same as for the conventional semantic segmentation models .;Network Overview;6
81964;35502af359aa60ae8047df172e29503cfb29c3f9;Method;0;Mean - shift;Mean - shift and closely related algorithms use kernel density estimation to approximate the probability density from a set of samples and then perform clustering on the input data by assigning or moving each sample to the nearest mode ( local maxima ) .;[ reference ] ) which operates recurrently on the embedding space in order to congeal the embedding vectors into a small number of instance labels .;From our perspective , the advantages of this approach are ( 1 ) the final instance labels ( modes ) live in the same embedding space as the initial data , ( 2 ) the recurrent dynamics of the clustering process depend smoothing on the input allowing for easy backpropagation , ( 3 ) the behavior depends on a single parameter , the kernel bandwidth , which is easily interpretable and can be related to the margin used for the embedding loss .;Recurrent Mean - Shift Grouping;8
27911;0f0cab9235bbf185acdd4f9713fd111ca50effca;Method;1;Covariance Pooling;document : Covariance Pooling for Facial Expression Recognition;bibliography : References;Classifying facial expressions into different categories requires capturing regional distortions of facial landmarks .;Covariance Pooling for Facial Expression Recognition;0
30703;100c730003033151c0f78ed1aab23df3e9bd5283;Method;1;deterministic counterpart;The optimisation of the lower bound on one hand maximises the conditional log - likelihood ( that the deterministic counterpart cares about ) and on the other hand minimises the KL - divergence ( that regularises the gradients ) .;the parameters of latent distributions .;Hence , each update of the lower bound actually keeps the gradients w.r.t . from swinging heavily .;Discussion;10
59623;2451db113552afb6d9ad15ef4009ec4133d28f74;Method;1;Newton - Schulz ( NS ) iteration;The pre - normalization guarantees convergence of Newton - Schulz ( NS ) iteration , while post - compensation plays a key role in achieving state - of - the - art performance with prevalent deep ConvNet architectures , e.g. ResNet .;The design of sandwiching Newton - Schulz iteration using pre - normalization by Frobenius norm or trace and post - compensation is essential , which , as far as we know , did not appear in previous literature ( e.g. in or ) .;The main differences between our method and other related works are summarized in Tab .;Introduction;1
19126;0a49b4de21363d86599d4a058aaf4f5aed019495;Method;1;bi - scale;When the decoder is a multilayer recurrent neural network ( including a stacked network as well as the proposed bi - scale network ) , the decoder outputs multiple hidden vectors– for layers , at a time .;;This allows an extra degree of freedom in the soft - alignment mechanism ( in Eq .;Multilayer Decoder and Soft - Alignment Mechanism;19
57599;2393447b8b0b79046afea1c88a8ed3949338949e;Method;0;neural re - ranker;Earlier pipeline models find a small number of relevant passages with a TF - IDF based ranker and pass them to a neural reader ChenFWB17 , GardnerC18 , while more recent pipeline models use a neural re - ranker to more accurately select the relevant passages WangAAAI2018 , NishidaSOAT18 .;The simplest approach is to concatenate the passages and find the answer from the concatenated one as in WangYWCZ17 .;Also , non - pipelined models ( including ours ) consider all the provided passages and find the answer by comparing scores between passages TanWYDLZ18 , WuWLHWLLL18 .;Multi - passage RC .;47
27236;0ee850dd6640a96531ac5ad21da5438db04d8b3c;Method;1;point - wise;We define three different ranking models : one point - wise and two pair - wise models .;;We introduce the architecture of these models and explain how we train them using weak supervision signals .;Ranking Architectures;5
15120;0899bb0f3d5425c88b358638bb8556729720c8db;Method;1;Augmented Autoencoder;subsection : Augmented Autoencoder;Finally , it allows us to bridge the domain gap between simulated and real data .;;Augmented Autoencoder;14
32899;10fd174fefd5e36a523805e4c2d2fbf1d12a3ae8;Method;0;’ architectures;A key component which allowed for more expressive models was the introduction of ‘ ‘ content ’ ’ based attention in , and ‘ ‘ computer - like ’ ’ architectures such as the Neural Turing Machine or Memory Networks .;This is most notable in the massive adoption of LSTMs in a variety of tasks such as speech , translation or learning programs .;Our work takes the metalearning paradigm of , where an LSTM learnt to learn quickly from data presented sequentially , but we treat the data as a set .;Memory Augmented Neural Networks;8
50135;1e5b9e512c01e244287fe7afb05e03c96d5c1cd0;Method;1;concatenation of previous layer encodings;As is also typical , we can have multiple such layers ( l ) that feed into each other through the concatenation of previous layer encodings .;a BiLSTM :;The last layer l has both forward ( f l c , 1 , ... , f l c , n ) and backward (;Sentence - based Character Model;5
86341;36b1ba4287c4884df27dd684c4c7f66f32e943db;Method;0;matrix of convolutional filters;The result is reshaped to generate a matrix of convolutional filters .;The hypernetwork is a fully connected layer ( denotes filter length and the number of filters per relation , i.e. output channels of the convolution ) that is applied to the relation embedding .;Whilst the overall dimensionality of the filter set is , the rank is restricted to to encourage parameter sharing between relations .;Scoring Function and Model Architecture;5
14028;07cca2bdd0dc2fee02889e17789748eba9d06ffa;Method;11;walk mode;As could be noted from Table 6 , almost all modes are followed by a walk mode .;The matrix contains the different probabilities of switching between every two modes , which is a good indication of the natural flow of modal mixes .;Therefore , the algorithm then segments the track into several stages , where every two different modal stages are separated by a walk stage .;Title;0
101822;42764b57d0794b63487a295ce8c07eeb6961477e;Method;1;ImageNet pre - trained models;We use the ImageNet pre - trained models ( , VGG - 16 ) to initialize the shared convolutional layers and the corresponding 4096 - d fc layers .;Hyper - parameters for training .;The extra layers are initialized randomly as in .;Implementation Details;9
79321;33998aff64ce51df8dee45989cdca4b6b1329ec4;Method;1;GRAPH ATTENTIONAL LAYER;section : GRAPH ATTENTIONAL LAYER;In this section , we will present the building block layer used to construct arbitrary graph attention networks ( through stacking this layer ) , and directly outline its theoretical and practical benefits and limitations compared to prior work in the domain of neural graph processing .;We will start by describing a single graph attentional layer , as the sole layer utilized throughout all of the GAT architectures used in our experiments .;GRAPH ATTENTIONAL LAYER;4
21376;0b8759d61e93b809df16d9fe9010d2a2d7241c74;Method;1;Inverse of Block Matrices;appendix : Inverse of Block Matrices;bibliography : References;;Inverse of Block Matrices;18
85190;3652c2d20f198dc39ad159eba55d08341c56d628;Method;1;learning scheme;We start by making assumptions on the input data , and then present the learning scheme and its approximation principles .;We have now all the tools in hand to build our convolutional kernel network .;;Approximating the Multilayer Convolutional Kernel;17
26089;0e37c8f19eefeb0c20d92f5cb4df4153077c116b;Method;1;DFT and inverse DFT layers;In the following sections of the paper , we will propagate signals and their gradients through DFT and inverse DFT layers .;2.1 Conjugate symmetry constraints;In these layers , we will represent the frequency domain in the complex field .;Title;0
17432;0a1dc95e4c884a91bd141df8133d1b4961178123;Method;0;larger models;[ reference ] . Several factors are of central importance in this progress : ( i ) the efficient training implementation on modern powerful GPUs [ reference ] , ( ii ) the proposal of the Rectified Linear Unit ( ReLU ) [ reference ] which makes convergence much faster while still presents good quality [ reference ] , and ( iii ) the easy access to an abundance of data ( like ImageNet [ reference ] ) for training larger models .;They have also been successfully applied to other computer vision fields , such as object detection [ reference ] , [ reference ] , [ reference ] , face recognition [ reference ] , and pedestrian detection;Our method also benefits from these progresses .;Convolutional Neural Networks;5
83753;35ff11e0a5e465c810a30b022b26a9d577a434ce;Method;0;probabilistic context - free grammar formalism;Extensive prior work on phrase - structure parsing typically employs the probabilistic context - free grammar formalism , with lexicalized and nonterminal augmentations .;Through the stack - only ablation we demonstrate that the RNNG composition function is crucial to obtaining state - of - the - art parsing performance .;The conjecture that fine - grained nonterminal rules and labels can be discovered given weaker bracketing structures was based on several studies .;Related Work;12
93141;3d18ce183b5a5b4dcaa1216e30b774ef49eaa46f;Method;0;2D face alignment methods;Therefore 3D evaluation can be degraded to 2D evaluation which also makes it possible to compare 3DDFA with other 2D face alignment methods .;Considering the recent achievements in 3D face reconstruction which can construct a 3D face from 2D landmarks , we assume that a 3D model can be accurately fitted if sufficient 2D landmarks are provided .;However , AFLW is not suitable for evaluating this task since only visible landmarks lead to serious ambiguity in 3D shape , as reflected by the fake good alignment phenomenon in Fig .;Datasets;18
64685;289e91654f6da968d625481ef21f52892052d4fc;Method;0;Decomposable Attention Model;Following this work , Parikh et al . DATTENTION integrated the attention mechanism into this framework , called Decomposable Attention Model .;Wang et al . llstm proposed a “ matching - aggregation ” framework to perform the interaction in Natural Language Inference .;Then Wang et al .;Interaction Mechanism;28
71256;2d83dbf4c8eabc6bdef3326c4a30d5f33ffc944e;Method;1;pretrained image captioning model;Notice that the pretrained image captioning model is not part of training .;If is not a number or yes or no , and appeared at least once in the generated caption , then update .;This simple procedure improves around of the test - dev overall accuracy ( for Other type ) .;Postprocessing;13
72796;2e4c06dd00c4c09ad5ac6be883cc66c19d88ea79;Method;1;shared representation of latent node embeddings;We present a novel densely connected autoencoder architecture capable of learning a shared representation of latent node embeddings from both local graph topology and available explicit node features for LPNC .;Therefore , models capable of exploiting topological structures of graphs have been shown to achieve superior predictive performances on many LPNC tasks .;The resulting autoencoder models are useful for many applications across multiple domains , including analysis of metabolic networks for drug - target interaction , bibliographic networks , social networks such as Facebook (;Introduction;1
14523;0875fc92cce33df5cf7df169590dbf0ca00d2652;Method;0;intermediate image generative steps;[ reference ] ) , with and initialized to learned biases : The function is used to compute the alignment between the input caption and intermediate image generative steps bahdanau_mt .;Formally , an image is generated by iteratively computing the following set of equations for ( see Fig .;Given the caption representation from the language model , , the operator outputs a dynamic sentence representation at each step through a weighted sum using alignment probabilities : The corresponding alignment probability for the word in the caption is obtained using the caption representation and the current hidden state of the generative model : where , , and are the learned model parameters of the alignment model .;Image Model : the Conditional DRAW Network;5
92315;3b9732bb07dc99bde5e1f9f75251c6ea5039373e;Method;1;Rectifier Linear Units;All these layers are separated by Rectifier Linear Units ( ReLu ) .;This is followed by a fully - connected hidden layer of 512 units .;Finally , a fully - connected linear layer projects to the output of the network , i.e. , the Q - values .;Network Architecture;20
77411;31ae4873da19b1e28eca8787a17f49bba08627e5;Method;0;Overfeat detector;Building on the sliding - window paradigm of the Overfeat detector , other computationally - efficient approaches have emerged such as YOLO , SSD and DenseBox .;For example , Fast - RCNN shares the convolutions across different region proposals to provide speed - up , Faster - RCNN and R - FCN incorporate region proposal generation in the framework leading to a completely end - to - end version .;Thorough comparisons among these methods are discussed in .;Related Work;2
95293;3e7f54801c886ea2061650fd24fc481e39be152f;Method;1;iterative error feedback;We compare our model to three state - of - the - art methods : random forests , random tree walks ( RTW ) , and iterative error feedback ( IEF ) .;;One of our primary goals is to achieve viewpoint invariance .;Comparison with State - of - the - Art;11
53180;20cc4bfdb648fd7947c71252589fc867d4d16933;Method;1;conditional probability distributions;"Since the conditional probability distribution over N classes is an element within R N on the unit simplex , we can consider the Euclidean distance to be a metric of "" confusion "" between two conditional probability distributions .";;Analogous to the previous setting , we define the Euclidean Confusion D EC ( · , · ) for a pair of inputs x 1 , x 2 with model parameters θ as :;Euclidean Distance as Confusion;6
103429;434bf475addfb580707208618f99c8be0c55cf95;Method;1;ANNs;ANNs differ , as they are trained on the data with less need for manual interference .;The presented approach uses Artificial Neural Networks ( ANN ) .;Convolutional Neural Networks are a special kind of ANN and have been shown to work well as feature extractor when using images as input and are real - time capable .;Introduction;1
77128;3112d2d95d66b3d54a72c55072647aab937e410e;Method;0;Joint Copy model;We note here that the key distinction for our purposes between the Joint Copy model and the Conditional Copy model is that the latter conditions on whether there is a copy or not , and so in p copy the source records compete only with each other .;Letting r ( y t;In the Joint Copy model , however , the source records also compete with words that can not be copied .;Conditional Copy;8
10970;061c05faf3d68a7bdade9d4debeab369e2f9746c;Method;1;Kantorovich - Rubinstein duality;On the contrary , the WGAN introduces the Lipschitz constraint from the Kantorovich - Rubinstein duality of the EM distance but it is not proved in if the density of samples generated by WGAN is consistent with that of real data .;Under this regularity condition , we have proved in Theorem [ reference ] that the density of generated samples matches the underlying data density .;Here we assert that the WGAN also models an underlying Lipschitz density .;Comparison with Wasserstein GAN;21
97710;40193e7ba0fbd7153a1fe15e95563463b67c71f3;Method;1;Iterative Closest Points;Following [ reference ] , we first employ the Iterative Closest Points ( ICP ) algorithm to find the corresponding nearest points between the reconstructed 3D face and the ground truth point cloud .;The VRN - Guided and PRNet are not compared because of the mis - match of point cloud between them and our method .;We then calculate the NME normalized by the face bounding box size .;3D face reconstruction;16
46072;1b9472907f5b7a1815c98b4562dce6c46dd2cf34;Method;0;Age distribution learning;"Age distribution learning has made other notable progress in age estimation ; here , the researchers defined a new loss function to penalize the difference between estimated age distributions and the ground truth age labels .";Since this siamese CNN has only a single output neuron , comparisons between the input image and multiple , carefully selected anchor images are required to compute the rank .;Recent research has also shown that training a multi - task CNN for various face analysis tasks , including face detection , gender prediction , age estimation , etc . , can improve the overall performance across different tasks compared to a single - task CNN ranjan2017all by sharing lower - layer parameters .;CNN Architectures for Age Estimation;4
18334;0a3a003457f5d7758a42a0e4b7278b39a86ed0bd;Method;1;CONV layers;We first randomly initialize a feature extractor ( e.g. CONV layers in ResNets ) and a classifier ( e.g. the last FC layer in ResNets ) , and then optimize them by gradient descent as follows , where denotes the following empirical loss , e.g. cross - entropy loss , and denotes the learning rate .;For instance , for miniImageNet , there are totally classes in the training split of and each class contains samples used to pre - train a - class classifier .;In this phase , the feature extractor is learned .;DNN training on large - scale data;5
63138;27c761258329eddb90b64d52679ff190cb4527b5;Method;0;Random architecture;In addition , CNNs based segmentation methods based on FCN provide superior performance for natural image segmentation [ reference ] . One of the image patch - based architectures is called Random architecture , which is very computationally intensive and contains around 134.5 M network parameters .;Another solution to this problem is proposed by He et al . , a deep residual model that overcomes the problem utilizing an identity mapping to facilitate the training process [ reference ] .;The main drawback of this approach is that a large number of pixel overlap and the same convolutions are performed many times .;II . RELATED WORK;3
44008;1a0912bb76777469295bb2c059faee907e7f3258;Method;1;box - only;Adding the mask branch to the box - only ( i.e. , Faster R - CNN ) or keypoint - only versions consistently improves these tasks .;More ablations of multi - task learning on minival are in Table 5 .;However , adding the keypoint branch reduces the box / mask AP slightly , suggesting that while keypoint detection benefits from multitask training , it does not in turn help the other tasks .;Main Results and Ablations :;12
50066;1e5b9e512c01e244287fe7afb05e03c96d5c1cd0;Method;0;sentence - level context sensitive encodings;Morphosyntactic tagging accuracy has seen dramatic improvements through the adoption of recurrent neural networks - specifically BiLSTMs [ reference ][ reference ] to create sentence - level context sensitive encodings of words .;;"A successful recipe is to first create an initial context insensitive word representation , which usually has three main parts : 1 ) A dynamically trained word embedding ; 2 ) a fixed pre - trained word - embedding , induced from a large corpus ; and 3 ) a sub - word character model , which itself is usually the final state of a recurrent model that ingests one character at a time .";Introduction;2
34175;12e20e4ea572dbe476fd894c5c9a9930cf250dd2;Method;0;self - matching attention layer;Compared to , which achieves state - of - the - art result on the SQuAD test set , our model does n’t contain the self - matching attention layer which is stuck with high computational complexity .;;Our MEMEN was trained with NVIDIA Titan X GPU , and the training process of the 3 - hops model took roughly 5 hours on a single GPU .;Speed and Efficiency;14
39852;165ef2b5f86b9b2c68b652391db5ece8c5a0bc7e;Method;0;direct maximum likelihood learning;Compared to the objective in ( [ reference ] ) for direct maximum likelihood learning , the above objective does not involve the global partition function .;and are written as : Thus the optimization for piecewise training is to minimize the negative log likelihood with regularization :;To calculate the gradient of the above objective , we only need to calculate the gradient and .;Piecewise training of CRFs;13
46422;1bb5520bbc168e54c553758a76c6d953933bd8eb;Method;0;covariance matrix;Typically , the distribution is a multivariate Gaussian parameterised by mean µ and covariance matrix;The NES family of black - box optimization algorithms use parameterised probability distributions over the search space , instead of an explicit population ( i.e. , a conventional ES [ 27 ] ) .;Each epoch a generation is sampled from the distribution , which is then updated the direction of the natural gradient of the expected fitness of the distribution .;Title;0
58354;23c141141f4f63c061d3cce14c71893959af5721;Method;1;thin stack;We propose a space - efficient stack representation inspired by the zipper technique that we call thin stack .;Such a naïve implementation would also require copying a largely unchanged stack at each timestep , since each shift or reduce operation writes only one new representation to the top of the stack .;For each input sentence , we represent the stack with a single matrix .;Representing the stack efficiently;13
34623;1329206dbdb0a2b9e23102e1340c17bd2b2adcf5;Method;0;pose - normalized representation;Farrell et al . proposed a pose - normalized representation using poselets .;Several approaches are based on detecting and extracting features from certain parts of objects .;Deformable part models were used in for part localization .;Fine - grained categorization;4
48102;1cf6bc0866226c1f8e282463adc8b75d92fba9bb;Method;1;sequence - based LSTM;As noted in , the simple BOW model performs roughly as well if not better than the sequence - based LSTM for the VQA task .;Other question representations , such as an LSTM , can also be used , however , BOW has fewer parameters yet has shown good performance .;Specifically , we compute where represents the BOW weights for word vectors , and is the bias term .;Word Guided Spatial Attention in One - Hop Model;4
50435;1e7678467b1807777dcd9be557b79328ce9419a8;Method;0;trunk architecture;Most computer vision architectures designed for a wide range of tasks leverage a trunk architecture initially designed for classification , such as Residual networks .;;An improvement on the trunk architecture eventually translates to better accuracies in other tasks , as shown on the detection task of the LSVRC’15 challenge .;Image classification .;3
54635;220a0b46840a2a1421c62d3d343397ab087a3f17;Method;1;spatio - temporal filtering;Early methods for optical flow used analytic spatio - temporal features but , at the time , did not produce good results and the general line of spatio - temporal filtering decayed .;Some of the filters may also be separable .;The difference from early work is that our approach suggests the need for a large filter bank of varied filters .;Discussion and Future Work;15
10864;061c05faf3d68a7bdade9d4debeab369e2f9746c;Method;0;f - GAN;In addition , presented to analyze the GAN from information theoretical perspective , and they seek to minimize the variational estimate of f - divergence , and show that the classic GAN is included as a special case of f - GAN .;However , it still needs to assume the discriminator has infinite modeling capacity to prove the result in a non - parametric fashion , and its generalizability of producing new data out of training examples is unknown without theoretical proof or empirical evidence .;In contrast , InfoGAN proposed another information - theoretic GAN to learn disentangled representations capturing various latent concepts and factors in generating samples .;Related Work;5
48829;1d696a1beb42515ab16f3a9f6f72584a41492a03;Method;1;Joint Bayesian model;To verify the improvements , we learn the Joint Bayesian model for face verification based on each of the four - dimensional feature vectors ( neural activations );;FC - n for in the DeepID2;High - performance of DeepID2 + nets;4
93950;3dd2f70f48588e9bb89f1e5eec7f0d8750dd920a;Method;0;SPPnets;The RoI layer is simply the special - case of the spatial pyramid pooling layer used in SPPnets [ reference ] in which there is only one pyramid level .;Pooling is applied independently to each feature map channel , as in standard max pooling .;We use the pooling sub - window calculation given in [ reference ] .;The RoI pooling layer;7
11598;06b4d8409837dce9d6eb919efd1debdaecc40d01;Method;0;deep learning ( DL );Much attention has been given to a resurgence of neural networks , deep learning ( DL ) in particular , which can be of unsupervised , supervised , or a hybrid form .;;Significant performance gain has been observed , especially in the presence of large amount of training data , when deep learning techniques are used for image classification and speech recognition .;Introduction;1
103461;434bf475addfb580707208618f99c8be0c55cf95;Method;0;local filter layers;Convolutional , pooling , local filter layers and one fully connected layer are used to achieve an accuracy of 99.2 % on the CKP set .;The created network consists of five layers with a total of 65k neurons .;To avoid overfitting the dropout method was used .;Related Work;2
86306;36b1ba4287c4884df27dd684c4c7f66f32e943db;Method;1;sparse tensor;Interestingly , we find that the differences in moving from ConvE to HypER in fact bring the factorization and convolutional approaches together , since the 1D convolution process is equivalent to multiplication by a highly sparse tensor with tied weights ( see Figure [ reference ] ) .;This gives HypER more expressive power , while also reducing parameters .;The multiplication of this “ convolutional tensor ” ( defined by the relation embedding and hypernetwork ) and other weights gives an implicit relation matrix , corresponding to those in e.g. RESCAL , DistMult and ComplEx .;Related Work;2
55264;228db5326a10cd67605ce103a7948207a65feeb1;Method;1;IMG + BOW;On the larger COCO - QA data set , the proposed two - layer SANs significantly outperform the best baselines from ( IMG - CNN ) and ( IMG + BOW and 2 - VIS + BLSTM ) by 5.1 % and 6.6 % in accuracy ( Table .;[ reference ] ) , i.e. , our SAN ( 2 , LSTM ) outperforms the IMG - CNN , the 2 - VIS + BLSTM , the Ask - Your - Neurons approach and the Multi - World by , , and absolute in accuracy , respectively .;[ reference ] ) .;Results and analysis;13
25470;0d5fa5be4bfe085de8f88dbee1c3b2a6e5ab9ee2;Method;0;DeepLab;It is more efficient and accurate than modern segmentation frameworks , such as FCN and DeepLab .;ICNet still performs satisfyingly regarding common thing and stuff understanding .;Compared to our baseline model , it achieves 5.4 times speedup .;COCO - Stuff;27
34269;12f008bea798a05ebfa2864ec026999cb375bcd9;Method;1;compositional operators;The effectiveness of multiplicative interaction is demonstrated by an ablation study , and by comparing to alternative compositional operators for implementing the gated - attention .;Did What dataset .;;Gated - Attention Readers for Text Comprehension;0
10852;061c05faf3d68a7bdade9d4debeab369e2f9746c;Method;0;recurrent generative model;presented to train a recurrent generative model by using adversarial training to unroll gradient - based optimizations to create high quality images .;On the contrary , proposed to use a Laplacian pyramid to produce high - quality images by iteratively adding multiple layers of noises at different resolutions .;In addition to designing different GAN networks , research efforts have been made to train the GAN by different criteria .;Related Work;5
82487;357776cd7ee889af954f0dfdbaee71477c09ac18;Method;1;Likelihood Analysis;section : Likelihood Analysis of Adversarial Autoencoders;d highlights the images generated by walking along the swiss roll axis in the latent space .;The experiments presented in the previous sections have only demonstrated qualitative results .;Likelihood Analysis of Adversarial Autoencoders;7
52370;207e0ac5301a3c79af862951b70632ed650f74f7;Method;1;nearest - neighbour;To this end , we first construct a - nearest - neighbour ( - nn ) graph across camera views with vertices , where each vertex represents a unlabelled data point .;That is , each person in one view can correspond to multiple people in another view depending on their visual similarity in the learned discriminative subspace parameterised by .;is then computed as the weight matrix of using a heat kernel .;Semi - supervised Learning;9
8356;052282998bc24db695891f755a00e3cebd3fd796;Method;1;dueling architecture and prioritised dueling;We compare Reactor against are : DQN mnih15human , Double DQN van2016deep , DQN with prioritised experience replay schaul2015prioritized , dueling architecture and prioritised dueling wang2015dueling , ACER wang2017sample , A3C mnih2016asynchronous , and Rainbow rainbow .;Tables [ reference ] & [ reference ] compare versions of our algorithm , with several other state - of - art algorithms across 57 Atari games for a fixed random seed across all games bellemare2013arcade .;Each algorithm was exposed to 200 million frames of experience , or 500 million frames when followed by , and the same pre - processing pipeline including 4 action repeats was used as in the original DQN paper mnih15human .;Comparing to prior work;19
101953;42764b57d0794b63487a295ce8c07eeb6961477e;Method;1;fc;The first fc layer ( with ReLU ) reduces the dimension to 256 , followed by the second fc layer that regresses a pixel - wise mask .;We append two extra fully - connected ( fc ) layers to this feature for each box .;This mask , of a pre - defined spatial resolution of ( we use ) , is parameterized by an - dimensional vector .;Regressing Mask - level Instances;5
99694;41951953579a0e3620f0235e5fcb80b930e6eee3;Method;1;face verification models;DeepID2 features are learned from the face images of identities randomly sampled from CelebFaces + ( referred to as CelebFaces + A ) , while the remaining face images of identities ( referred to as CelebFaces + B ) are used for the following feature selection and learning the face verification models ( Joint Bayesian ) .;and LFW are mutually exclusive .;When learning DeepID2 on CelebFaces + A , CelebFaces + B is used as a validation set to decide the learning rate , training epochs , and hyperparameter .;Experiments;4
104991;44da806ae67ae9885592492202b3dc5f50182cc8;Method;0;scale expansion algorithm;The detail of scale expansion algorithm is summarized in Algorithm [ reference ] .;Thanks to the “ progressive ” expansion procedure , these boundary conflicts will not affect the final detections and the performances .;In the pseudocode , are the intermediate results .;Progressive Scale Expansion Algorithm;5
12541;071b16f25117fb6133480c6259227d54fc2a5ea0;Method;0;mechanism of attention;Intuitively , this implements a mechanism of attention in the decoder .;The probability , or its associated energy , reflects the importance of the annotation with respect to the previous hidden state in deciding the next state and generating .;The decoder decides parts of the source sentence to pay attention to .;Decoder : General Description;5
53433;20cc4bfdb648fd7947c71252589fc867d4d16933;Method;1;KL;While KL - divergence might seem to be a reasonable choice to design a loss function for optimizing the distance between conditional probability distributions , in Section 3.1 , we show that it is infeasible to train a neural network when using KL - divergence as a regularizer .;"If we wish to confuse the class outputs of the classifier for the pair x 1 and x 2 , we should learn parameters θ that bring these conditional probability distributions "" closer "" under some distance metric , that is , make the predictions for x 1 and x 2 similar .";Therefore , we introduce the Euclidean Distance between distributions as a metric for confusion in Sections 3.2 and 3.3 and describe neural network training with this metric in Section 3.4 .;Method;4
50283;1e5b9e512c01e244287fe7afb05e03c96d5c1cd0;Method;1;joint model;The proposed model trains word and character models independently while training a joint model on top .;Impact of the Meta - BiLSTM Model Combination;Here we investigate the part - of - speech tagging performance of the joint model compared with the word and character models on their own ( using hyperparameters from in 4.1 ) .;Impact of the Sentence - based Character Model;18
81217;34f63959ea4a13a05948274a1558c6854a051150;Method;0;multi - layer bidirectional Transformer;For example , BERT is based on a multi - layer bidirectional Transformer , and is trained on plain text for masked word prediction and next sentence prediction tasks .;These are neural network language models trained on text data using unsupervised objectives .;To apply a pre - trained model to specific NLU tasks , we often need to fine - tune , for each task , the model with additional task - specific layers using task - specific training data .;Introduction;1
60700;258ec208f9c55371a67ebac68aa51bd7f7800a7b;Method;1;shallow networks;Meanwhile , we can see that compared to shallow networks , using significantly deeper networks does improve the deblocking performance .;Compared to AR - CNN , the 30 - layer network exceeds it by 0.37dB and 0.44dB on compression quality of 10 and 20 .;;JPEG deblocking;19
9573;05ee231749c9ce97f036c71c1d2d599d660a8c81;Method;0;T - embedding;As revealed by [ reference ] , image retrieval encoding methods like Fisher Vector encoding and T - embedding increase the separation between descriptors extracted from related and unrelated image patches .;Although common aggregation strategies , such as average pooling and max - pooling , are able to aggregate face descriptors to produce a compact template representation [ reference ][ reference ][ reference ] and currently achieves the state - of - the - art results [ reference ] , we seek a better solution in this paper .;We therefore expect a similar encoding to be beneficial for face recognition , including both verification and identification tasks .;Introduction;2
53323;20cc4bfdb648fd7947c71252589fc867d4d16933;Method;1;Class - Activation Mapping;Change in Class - Activation Mapping :;As summarized in Table 4 , we observe an average 3.4 % improvement across five different networks , implying better localization accuracy .;To qualitatively study the improvement in localization due to PC , we obtain samples from the CUB - 200 - 2011 dataset and visualize the localization regions returned from Grad - CAM for both the baseline and PC - trained VGG - 16 model .;Improvement in Localization Ability;15
2918;01959ef569f74c286956024866c1d107099199f7;Method;1;Stanford part - of - speech ( POS;We cast this comparison in terms of nouns , verbs , and adjectives by extracting all words from the caption data ( MS COCO captions for real images and captions collected by us for abstract scenes ) using the Stanford part - of - speech ( POS ) [ reference ] tagger;One method for determining whether the information captured by questions & answers is different from the information captured by captions is to measure some of the differences in the word distributions from the two datasets .;[ reference ] .;APPENDIX OVERVIEW;19
50176;1e5b9e512c01e244287fe7afb05e03c96d5c1cd0;Method;1;encoding models;While each of the three models - character , word and meta - are trained with their own loss functions , it should be emphasized that training is synchronous in the sense that the meta - BiLSTM model is trained in tandem with the two encoding models , and not after those models have converged .;Though we use all three losses to update the models , only the meta - BiLSTM layer is used for model selection and test - time prediction .;Since accuracy from the meta - BiLSTM model on the development set determines the best parameters , training is not completely independent .;Training Schema;9
32063;10a36dea0167511b66deca65fdca978aa9afdb11;Method;1;train2014;Here train2014 and val2014 are the standard splits of the image set in the COCO dataset .;There are in total 248 , 349 pairs in train2014 and 121 , 512 pairs in val2014 , for 123 , 287 images overall in the training set .;To generate the training set and validation set for our model , we first randomly split the images of COCO val2014 into 70 % subset A and 30 % subset B. To avoid potential overfitting , questions sharing the same image will be placed into the same split .;Experiments;3
100982;424561d8585ff8ebce7d5d07de8dbf7aae5e7270;Method;0;Non - approximate joint training;( iii ) Non - approximate joint training .;This solver is included in our released Python code .;As discussed above , the bounding boxes predicted by RPN are also functions of the input .;Sharing Features for RPN and Fast R - CNN;8
22982;0c9ae806059196007938f24d0327a4237ed6adf5;Method;1;6 - label variant;We test both the 6 - label variant ( roads and cars , vegetation and trees , buildings and clutter ) and a 3 - label variant ( Potsdam - 3 ) formed by merging each of the 3 pairs .;Potsdam is divided into 8550 RGBIR px satellite images , of which 3150 are unlabelled .;All segmentation training and testing sets will be released with our code .;Datasets .;24
99878;41b38da2f4137c957537908f9cb70cbd2fac8bc1;Method;0;canonical correlation analysis;Features are fused using canonical correlation analysis and classified with SVM .;Turan and Lam extract features from the eye and mouth regions using local phase quantization and pyramid of HOG descriptors .;In our previous work , we used the variations in Euclidean distances between landmark pairs as spatial features , which gives slightly worse results than handling horizontal and vertical distances independently .;Related Work;2
102746;432d8cba544bf7b09b0455561fea098177a85db1;Method;1;variational_autoencoder;We can then optimize this lower bound with respect to and using the reparameterization trick introduced by variational_autoencoder and reparam_paper to get a Monte - Carlo estimate of the gradient .;I.e. , where Likewise the full - data log likelihood is lower bounded by the sum of the terms over the whole dataset .;;Variational Autoencoder;4
73807;2e942d19333651bf6012374ea9e78d6937fd33ac;Method;0;R - FCN work;In the original R - FCN work , global average pooling is adopted to aggregate the features after position - sensitive RoI pooling into a single dimension .;;This operation leads to the uniform contribution of each position of the face .;Position - Sensitive Average Pooling;5
89759;3a28fe49e7a856ddd60d134696a891ed7bca5962;Method;0;Resnet;Rather than using a single downstream classifier , the fused deep neural network ( F - DNN + SS ) method uses a derivation of the Faster R - CNN framework fusing multiple parallel classifiers including Resnet and Googlenet using soft - rejection , and further incorporates pixel - wise semantic segmentation in a post - processing manner to suppress background proposals .;Similarly , proposes a unified multi - scale convolutional neural network ( MS - CNN ) , which performs detection at multiple intermediate layers to match objects of different scales , as well as an upsampling operation to prevent insufficient resolution of feature maps for handling small instances .;Simultaneous Detection & Segmentation RCNN ( SDS - RCNN ) improves object detection by using semantic segmentation as a strong cue , infusing the segmentation masks on top of shared feature maps as a reinforcement to the pedestrian detector .;Multi - scale Object Detection;3
94104;3dd2f70f48588e9bb89f1e5eec7f0d8750dd920a;Method;1;semantic - segmentation method;"SegDeepM is trained on VOC12 trainval plus segmentation annotations ; it is designed to boost R - CNN accuracy by using a Markov random field to reason over R - CNN detections and segmentations from the O 2 P [ 1 ] semantic - segmentation method .";On VOC10 , SegDeepM [ reference ] achieves a higher mAP than Fast R - CNN ( 67.2 % vs. 66.1 % ) .;Fast R - CNN can be swapped into SegDeepM in place of R - CNN , which may lead to better results .;VOC 2010 and 2012 results;15
86910;3729a9a140aa13b3b26210d333fd19659fc21471;Method;1;five - layer bi - LSTMs;We investigated our JMT results without using the vertical connections in the five - layer bi - LSTMs .;;More concretely , when constructing the input vectors , we do not use the bi - LSTM hidden states of the previous layers .;Vertical connections;32
50467;1e7678467b1807777dcd9be557b79328ce9419a8;Method;0;image feature extractors;After the emergence of convolutional neural networks ( CNNs ) for large - scale classification on ImageNet , it has become apparent that CNNs trained on classification datasets are very competitive image feature extractors for various vision tasks , including instance retrieval .;Local image descriptors are traditionally aggregated to global image descriptors suited for matching in an inverted database , as in the seminal bag - of - words model .;;Image search : from local features to CNN .;4
20536;0b5519f76fc8e31ecf9931f00184aee86694e3a4;Method;1;spatially variant filtering;We leverage predictive filter flow for targeting three specific image reconstruction tasks which can be framed as performing spatially variant filtering over local image patches .;We address these by directly predicting flows from image data .;Non - Uniform Blind Motion Blur Removal is an extremely challenging yet practically significant task of removing blur caused by object motion or camera shake on a blurry photo .;Related Work;2
10644;06150e6e69a379c27e1d0100fcd7660f073cbacf;Method;1;LCDF detector;The LCDF detector ( which uses orthogonal splits ) improves accuracy over ACF with oblique splits by an additional 1 % MR .;Results of the LDCF detector on the INRIA dataset are given in the last row of Table 1 .;Training time is significantly faster , and indeed , is only ∼1 minute longer than for the original ACF detector .;Title;0
32105;10a36dea0167511b66deca65fdca978aa9afdb11;Method;1;word model;From the performance of BOW in Table [ reference ] , we can see that a good word model is crucial to the accuracy , as BOW model alone could achieve closely to 48 % , even without looking at the image content .;The learning rate for the word embedding layer should be much higher than the learning rate of softmax layer to learn a good word embedding .;Model parameters to tune .;Training Details;5
56009;22aab110058ebbd198edb1f1e7b4f69fb13c0613;Method;1;gradient norm clipping;We tried gradient norm clipping ( both the global variant typically used in recurrent networks , and a local version where the clipping value is determined on a per - parameter basis ) but found this did not alleviate instability .;We also experimented with Spectrally Normalizing these MLPs , and with providing these ( and the linear projections ) with a bias at their output , but did not notice any benefit .;;Negative Results;29
3981;0217fb2a54a4f324ddf82babc6ec6692a3f6194f;Method;0;disentangled representations;In addition , prior research attempted to learn disentangled representations using supervised data .;[ reference ] have been able to learn representations using probabilistic inference over Bayesian programs , which achieved convincing one - shot learning results on the OMNI dataset .;"One class of such methods trains a subset of the representation to match the supplied label using supervised learning : bilinear models [ reference ] separate style and content ; multi - view perceptron [ reference ] separate face identity and view point ; and Yang et al .";Related Work;3
48482;1d5d0a41b720bc51fd568cf78f8aa4ec5af4f802;Method;1;spatial anchor scheme;It generalizes any 3D shapes with N reference points , and it works well with our spatial anchor scheme : we can predict the spatial offsets instead of the absolute locations of the corners .;-2;;3D box parameterization;10
52490;207e0ac5301a3c79af862951b70632ed650f74f7;Method;1;subspace learning methods;Comparing Table [ reference ] with Table [ reference ] , it is apparent that the performance of all three compared subspace learning methods , kCCA , kLFDA , and XQDA degrades drastically .;"This dataset has only 100 pairs or 200 training samples ; with only one third of them labelled , the SSS problem becomes the most acute than any experiment we conducted before .";In contrast , the performance of our method decrease much more gracefully from 29.80 % to 24.70 % on Rank 1 .;Semi - supervised Learning Results;13
74428;2f0c30d6970da9ee9cf957350d9fa1025a1becb4;Method;0;inverse STN method;The inverse STN method replaces the expensive feature warping by efficient transformation parameter propagation .;STN has shown successes in small scale image classification problems .;The offset learning in deformable convolution can be considered as an extremely light - weight spatial transformer in STN .;In Context of Related Works;7
73836;2e942d19333651bf6012374ea9e78d6937fd33ac;Method;0;Face R - CNN;Our training hyper - parameters are similar to Face R - CNN .;;Different from Face R - CNN , we initialize our network with the pre - trained weights of 101 - layer ResNet trained on ImageNet .;Implementation Details;8
37594;15212fa4d30863ea1f9bd9591eee03848278242d;Method;1;Cosine;In any case , the method seems to reach a plateau for higher values of , allowing the Cosine variant to reach new peaks of classification accuracy of 0.839 ( when ) in MDS , and 0.840 ( when ) in Webis - CLS - 10 .;"In this case , and in contrast with JaDCI , classification accuracy increases noticeably when more pivots are taken into account ; this might be a side effect of the modifications discussed in Section [ reference ] .";Regarding the efficiency of the method , PyDCI exhibits a quasi - linear trend in time complexity , e.g. , when the number of pivots is doubled , the execution time is roughly doubled too .;Effectiveness vs. Efficiency Trade - off;8
33804;12db83e66e50152e170d5009c425c925ad2e2c2a;Method;1;attentive neural network;In contrast , we are proposing an attentive neural network that is capable of reasoning over entailments of pairs of words and phrases by processing the hypothesis conditioned on the premise .;Bowman et al . 's LSTM encodes the premise and hypothesis as dense fixed - length vectors whose concatenation is subsequently used in a multi - layer perceptron ( MLP ) for classification .;Our contributions are threefold : ( i );INTRODUCTION;2
84222;360cfa09b2f7c8e10b1831d899c5a51aefa1883e;Method;0;piecewise - linear activations;Standard tanh activations are less used in feedforward networks because they do not work as well as piecewise - linear activations when training deeper networks .;[ reference ] ) , as follows :;The adoption of ReLU - based neurons , that have shown to be effective in improving such limitations , was not so common in the past for RNNs .;ReLU activations;5
64794;28eceb438da0b841bbd3d02684dbfa263838ed60;Method;0;composite loss functions;"Dosovitskiy and Brox [ reference ] introduced a family of composite loss functions for image synthesis , which combine regression over the activations of a fixed "" perceiver "" network with a GAN loss .";Our model , loss , and problem setting are different , enabling synthesis of sharper higher - resolution images of scenes without 3D models .;Networks trained using these composite loss functions were applied to synthesize preimages that induce desired excitation patterns in image classification models [ reference ] and images that excite specific elements in such models [ reference ] .;Related Work;6
54302;220a0b46840a2a1421c62d3d343397ab087a3f17;Method;1;flow update;Instead of the standard minimization of an objective function at each pyramid level , we train one deep network per level to compute the flow update .;This estimates large motions in a coarse - to - fine approach by warping one image of a pair at each pyramid level by the current flow estimate and computing an update to the flow .;"Unlike the recent FlowNet approach , the networks do not need to deal with large motions ; these are dealt with by the pyramid .";Optical Flow Estimation using a Spatial Pyramid Network;0
80933;34cf90fcbf83025666c5c86ec30ac58b632b27b0;Method;0;person representation;At last , the features of global full body and local body parts are concatenated to be a 256 - dimension feature as the final person representation .;The Dropout is adopted after each FC layer to prevent overfitting .;;Feature Extraction and Fusion;6
9168;052443e1709c0f7d3432cca7c451534eea76b7ca;Method;1;H;The Improved A + ( IA ) method is 0.9dB better than the baseline A + method by using 5 techniques ( A , H , R , C , E ) .;The seven ways to improve A + are summarized in Fig . 8 .;;Improved A + ( IA );15
36542;14ad9d060c1e8f0449e697ee189ac346353fbfbc;Method;0;Character level word embedding;"CE : Character embedding ; CLWE : Character level word embedding ; CNN : convolution neural network ; CRF : Conditional random field ; LSTM : long short - term memory ; MTL : Multi - task learning ; MTM : Multi - task model ; NER : Named entity recognition ; NLP : Natural language processing ; PMC : PubMed Central ; STM :";"Biomedical named entity recognition ;";"Single - task model ; RNN :";Abbreviations;21
94957;3e79a574d776c46bbe6d34f41b1e83b5d0f698f2;Method;1;external techniques;The result suggests that external techniques such as attention can play orthogonal roles compared with internal recurrent structures , therefore benefiting both BiLSTMs and S - LSTMs .;Attention leads to improved accuracies for both BiLSTM and S - LSTM in classification , with S - LSTM still outperforming BiLSTM significantly .;Similar observations are found using external CRF layers for sequence labelling .;Development Experiments;10
89895;3a28fe49e7a856ddd60d134696a891ed7bca5962;Method;0;Conv - LSTM layer;Specifically , for a video sequence , convolutional layers for representation are shared by each frame to extract spatial features , then multi - layer features of each frame are taken as input to the Conv - LSTM layer .;Conv - LSTM is incorporated as a means of propagating frame - level information across time .;At each time step , it refines output features on the basis of the state and input , extracts additional temporal cues from the input , and updates the state .;Multi - frame Temporal Feature Aggregation;11
79438;33998aff64ce51df8dee45989cdca4b6b1329ec4;Method;1;exponential linear unit;The first layer consists of K = 8 attention heads computing F = 8 features each ( for a total of 64 features ) , followed by an exponential linear unit ( ELU ) [ reference ] nonlinearity .;Its architectural hyperparameters have been optimized on the Cora dataset and are then reused for Citeseer .;The second layer is used for classification : a single attention head that computes C features ( where C is the number of classes ) , followed by a softmax activation .;EXPERIMENTAL SETUP;9
59282;23f5854b38a15c2ae201e751311665f7995b5e10;Method;0;1 - hidden - layer mlp generative model;the overall architecture for a Mult - vae pr / Mult - dae with 1 - hidden - layer mlp generative model would be [ I → 600 → 200;As a concrete example , recall I is the total number of items ,;→ 600 → I ] .;Experimental setup;16
7920;051b3763c2ad4e4271db712b0e9a4cfe298d05db;Method;0;noise augmentation;No noise augmentation was performed but we introduced image mirroring to improve the diversity of the training set .;We also fine - tuned LiteFlowNet on a mixture of Sintel clean and final training data ( LiteFlowNet - ft ) using the generalized Charbonnier loss .;LiteFlowNet - ft outperforms FlowNet2 - ft - sintel and EpicFlow for Sintel final testing set .;Results;7
96548;3f45d73a7b8d10a59a68688c11950e003f4852fc;Method;1;BIF );The third feature called gBiCov is a combination of Biologically Inspired Features ( BIF ) and Covariance descriptors .;The other feature is proposed in , which applied the HSV , and Lab color feature , as well as a texture feature extracted by LBP .;We applied both the direct Cosine similarity measure and the XQDA algorithm to compare the four different kinds of features , resulting in the CMC curves shown in Fig .;Comparison of Features;13
26009;0dcde9f2c5149f0e4c806db7b4cc4915bed077da;Method;0;least squares regression training;In closing , following acceptance of this paper , we became aware of a newly published paper that combines convolutional feature extraction with least squares regression training of classifier weights to obtain good results for the NORB dataset .;These methods can be easily adapted for use with the convolutional front - end , if , for example , additional batches of training data become available , or if the problem involves online learning .;The three main differences between the method of the current paper and the method of are as follows .;Discussion and Conclusions;18
72054;2dad7e558a1e2982d0d42042021f4cde4af04abf;Method;1;vanilla RNN cells;All models use vanilla RNN cells with hidden state size 20 .;We compare the DilatedRNN models with different numbers of layers on the noisy MNIST task .;The number of dilations starts at one .;Discussion;19
98295;40b0fced8bc45f548ca7f79922e62478d2043220;Method;1;SVM parameter;0.45 0.45 Cross validation scores for cat keypoint classification as a function of the SVM parameter .;Each plot is a 2D histogram of the locations of the maximum responses of a classifier in a 21 by 21 pixel rectangle taken around a ground truth keypoint .;"In ( a ) , we plot mean accuracy against for five different convnet features ; in ( b ) we plot the same for SIFT features of different sizes .";Keypoint classification;9
72023;2dad7e558a1e2982d0d42042021f4cde4af04abf;Method;0;layer normalizations;To the best of our knowledge , the dilated GRU with 1.27 BPC achieves the best result among models of similar sizes without layer normalizations .;Although Zoneout , LayerNorm HM - LSTM and HyperNetworks outperform the DilatedRNN models , they apply batch or layer normalizations as regularization .;Also , the dilated models outperform their regular counterparts , Vanilla ( did n’t converge , omitted ) , LSTM and GRU , without increasing the model complexity .;Language modeling;17
79917;33a8d0a35390fde736744d4a0dd20dff7961c777;Method;1;deep graphlet kernel;For graphlet kernel ( GK ) , we chose graphlets size and for deep graph kernels ( DGK ) , we report the best classification accuracy obtained among : deep graphlet kernel , deep shortest path kernel and deep Weisfeiler - Lehman kernel .;For Weisfeiler - Lehman ( WL ) kernel , we chose height of subtree kernel from .;For Multiscale Laplacian Graph ( MLG ) kernel , we chose and parameter of the algorithm from , radius size from , and level number from .;Experiment and Results;12
5104;02e85d62fbd8249a046d00ac10e39546511b2a51;Method;1;parallel convolutional pathways;We employ parallel convolutional pathways for multi - scale processing , a solution to efficiently incorporate both local and contextual information which greatly improves segmentation results .;We exploit the utilization of small kernels , a design approach previously found beneficial in 2D networks ( ) that impacts 3D CNNs even more , and present adopted solutions that enable training deeper networks .;We demonstrate the generalization capabilities of our system , which without significant modifications outperforms the state - of - the - art on a variety of challenging segmentation tasks , with top ranking results in two MICCAI challenges , ISLES and BRATS .;Contributions;3
16087;09879f7956dddc2a9328f5c1472feeb8402bcbcf;Method;0;DCGAN DBLP;Inspired by the texture generation work by DBLP : conf / nips / GatysEB15 , theis2015generative and extrapolation test with DCGAN DBLP : journals / corr;;/ RadfordMC15 , we also evaluate the statistics captured by our model by generating images twice or ten times as large as present in the dataset .;Extrapolation;19
17503;0a1dc95e4c884a91bd141df8133d1b4961178123;Method;0;linear convolution;The ReLU can be equivalently considered as a part of the second operation ( Non - linear mapping ) , and the first operation ( Patch extraction and representation ) becomes purely linear convolution .;4 .;Here W 3 corresponds to c filters of a size n 2 × f 3 × f 3 , and B 3 is a c - dimensional vector .;Reconstruction;11
24389;0d24a0695c9fc669e643bad51d4e14f056329dec;Method;0;RNNs );Recent work has shown that recurrent neural networks ( RNNs ) can deliver excellent performance in many such tasks when trained to predict the next output token given the input and previous tokens .;In many important applications of machine learning , the task is to develop a system that produces a sequence of discrete tokens given an input .;This approach has been applied successfully in machine translation sutskever2014sequence , bahdanau2015neural , caption generation kiros2014unifying , donahue2015long , vinyals2015show , xu2015show , karpathy2015deep , and speech recognition chorowski2015attention , chan2015listen .;Introduction;1
98478;40b4596a0ae4f4ff065f3f13f36db39543e50068;Method;0;covariance matrix alignment;Conventional methods include asymmetric metric learning , subspace interpolation , geodesic flow kernel , subspace alignment , covariance matrix alignment , .;In computer vision , domain adaptation has been widely studied as an image classification problem in computer vision .;Recent works aim to improve the domain adaptability of deep neural networks , including .;Related Works;2
29562;0fbd17a4f791e04bbf8f240f7c48c178900e30a6;Method;0;PartFCN;The authors used separate PoseFCN and PartFCN to obtain both part masks and locations and fused them with fully - connected CRFs .;Joint part segmentation and keypoint detection given human detections approach were proposed by Xia et al .;This provides more consistent predictions by eliminating irrelevant detections .;Top - down;6
11797;06b4d8409837dce9d6eb919efd1debdaecc40d01;Method;1;CNN with max - margin objective;( 2 ) the proposed DSN with softmax loss ( DSN - Softmax ) , ( 3 ) CNN with max - margin objective ( CNN - SVM ) , and ( 4 ) the proposed DSN with max - margin objective ( DSN - SVM ) .;Figure [ reference ] ( a ) and ( b ) show results from four methods , namely : ( 1 ) conventional CNN with softmax loss ( CNN - Softmax ) ,;DSN - Softmax and DSN - SVM outperform both their competing CNN algorithms;MNIST;11
91491;3b1b94441010615195a5c404409ce2416860508c;Method;1;encoder LSTM;We employ an encoder LSTM to take the semantic information from image and the question , while using a decoder LSTM to generate the answer .;The log - likelihood of the generated answer can be written as : where is the probability of generating given image information , question and previous words .;Weights are shared between the encoder and decoder LSTM .;An Answer Generation Model with Multiple Inputs;12
74753;2f92b10acf7c405e55c74c1043dabd9ded1b1800;Method;0;neural approaches;The retrieval and preparation of contextually relevant information from knowledge sources is a complex research topic by itself , and there are several statistical Manning:2008 and more recently neural approaches mitra2017neural as well as approaches based on reinforcement learning nogueira2017 .;An important question that remains to be answered is : given some text that is to be understood , what supplementary knowledge should be incorporated ?;Rather than learning both how to incorporate relevant information and which information is relevant , we use a heuristic retrieval mechanism ( § [ reference ] ) and focus on the integration model .;External Knowledge as Supplementary Text Inputs;2
50973;1e7a36c4d4f96b29e3edf51b6eb61f8e16217704;Method;0;sequence density estimators;Recurrent neural networks ( RNNs ) are powerful sequence density estimators that can use long contexts to make predictions .;;They have achieved tremendous success in ( conditional ) sequence modelling tasks such as language modelling , machine translation and speech recognition .;Introduction;1
83627;35ff11e0a5e465c810a30b022b26a9d577a434ce;Method;1;( stack - only ) RNNG;The last row of each table reports the performance of a novel variant of the ( stack - only ) RNNG with attention , to be presented in § [ reference ] .;"We trained each ablation from scratch , and compared these models on three tasks : English phrase - structure parsing ( labeled ) , Table [ reference ] ; dependency parsing , Table [ reference ] , by converting parse output to Stanford dependencies using the tool by kong_14 ; and language modeling , Table [ reference ] .";Discussion .;Ablated RNNGs;4
79896;33a8d0a35390fde736744d4a0dd20dff7961c777;Method;1;GK;, Graphlet Kernel ( GK ) , Weisfeiler - Lehman Sub - tree Kernel ( WL ) , Deep Graph Kernels ( DGK ) and Multiscale Laplacian Graph Kernels ( MLK ) .;We adopted state - of - art graphs kernels for comparison namely : Random Walk ( RW ) , Shortest Path Kernel ( SP );Baselines Settings :;Experiment and Results;12
31596;10232946dd5fe3cdcbce72fe40e75d852561518b;Method;1;over - regularization;For a larger value of ( designated by ( 3 ) in the figure ) , we can clearly observe the effect of over - regularization .;The size of that achieved the best validation performance is designated by [ reference ] . Especially for CIFAR - 10 , the virtual adversarial examples with of size ( 2 ) are on the verge of total corruption .;"In fact , the virtual adversarial examples generated by the models trained with this range of are very far from the clean image , and we observe that the algorithm implemented with this large an did the unnecessary work of smoothing the output distribution over the set of images that are "" unnatural .";Virtual Adversarial Examples Produced by the Model Trained with Different Choices of;16
85760;36973330ae638571484e1f68aaf455e3e6f18ae9;Method;1;multi - scale approach;The other one is the multi - scale approach which utilizes multi - scale image pyramids for each image , denoted as “ Fast R - CNN multi - scale ” .;For fair comparison with our SAF R - CNN , the input image of “ Fast R - CNN single - scale ” is resized to 800 pixels on the shortest side .;The same five scales of 480 , 576 , 688 , 864 and 1200 are adopted to construct the input image pyramid as specified in .;Comparisons with R - CNN , Fast R - CNN , and Faster R - CNN;24
3474;0209389b8369aaa2a08830ac3b2036d4901ba1f1;Method;1;annotation tool;We then ask annotators to bring the synthesized images into correspondence with the surface using our annotation tool , and for every image estimate the geodesic distance between the correct surface point , and the point estimated by human annotators : where measures the geodesic distance between two surface points .;In particular , we provide annotators with synthetic images generated through the exact same surface model as the one we use in our ground - truth annotation , exploiting the rendering system and textures of .;For any image , we annotate and estimate the error only on a randomly sampled set of surface points and interpolate the errors on the remainder of the surface .;Accuracy of human annotators;4
49662;1db9bd18681b96473f3c82b21edc9240b44dc329;Method;1;categorical sampling;Across all of the presented experiments , we use categorical sampling during decoding with a tempered PixelRecursiveSuperResolution .;;We adjust the concentration of the distribution we sample from with a temperature by which we divide the logits for the channel intensities .;Inference;10
25664;0dab72129b4458d9e3dbf1f109848c2d6d7af8a8;Method;1;sequence embedding models;In addition , we experiment with two simple sequence embedding models : a plain RNN and an LSTM RNN .;Our baseline sentence embedding model simply sums the embeddings of the words in each sentence .;The word embeddings for all of the models are initialized with the 300d reference GloVe vectors ( 840B token version , ? ) and fine - tuned as part of training .;Sentence embeddings and NLI;11
79862;33a8d0a35390fde736744d4a0dd20dff7961c777;Method;1;GCAPS - CNN code;Our GCAPS - CNN code and data will be made available at Github .;Average classification accuracy based on fold cross validation error is reported for each dataset .;Datasets :;Experiment and Results;12
89576;39dba6f22d72853561a4ed684be265e179a39e4f;Method;1;minibatch;Most sentences are short ( e.g. , length 20 - 30 ) but some sentences are long ( e.g. , length 100 ) , so a minibatch of 128 randomly chosen training sentences will have many short sentences and few long sentences , and as a result , much of the computation in the minibatch is wasted .;Different sentences have different lengths .;To address this problem , we made sure that all sentences in a minibatch are roughly of the same length , yielding a 2x speedup .;Training details;7
82175;35502af359aa60ae8047df172e29503cfb29c3f9;Method;1;vMF;Since the Epanechnikov profile is not differentiable at the boundary , we use the squared exponential kernel adapted to vectors on the sphere : which can be viewed as a natural extension of the Gaussian to spherical data ( known as the von Mises Fisher ( vMF ) distribution ) .;The gradient of can be elegantly computed as the difference between and the mean of all data points with , hence the name “ mean - shift ” for performing gradient ascent .;In our experiments we set the bandwidth based on the margin so that .;Details of Recurrent Mean Shift Grouping;24
85955;36a03f648b40d209ce361550dbe1c823ddb715b5;Method;0;discriminative methods;Conventional discriminative methods are mostly based on random forests .;By contrast , the discriminative models do not require body templates and they directly estimate the positions of body joints .;Shotton et al .;Related works;4
98324;40b0fced8bc45f548ca7f79922e62478d2043220;Method;1;nearest neighbor matching;We augment our SVM detectors with a spherical Gaussian prior over candidate locations constructed by nearest neighbor matching .;For SIFT , we consider features within twice the bin size from the ground truth keypoint to be positives , while samples that are at least four times the bin size away are negatives .;The mean of each Gaussian is taken to be the location of the keypoint in the nearest neighbor in the training set found using cosine similarity on pool 5 features , and we use a fixed standard deviation of 22 pixels .;Keypoint prediction;10
25926;0dcde9f2c5149f0e4c806db7b4cc4915bed077da;Method;1;Classifier projection weights;subsection : Stage 2 Design : Classifier projection weights;The hyper - parameters we used for each dataset are shown in Table [ reference ] .;To construct the matrix we use the method proposed by .;Stage 2 Design : Classifier projection weights;11
90412;3a61d5fbc8d99310965fd91b12527d1cd69d7116;Method;0;Hourglass - 104;Second , we train CornerNet with Hourglass - 54 instead of Hourglass - 104 .;First , we train CornerNet - Saccade with Hourglass - 104 instead of Hourglass - 54 .;For the second experiment , due to limited resources , we train both networks with a batch size of 15 on four 1080Ti GPUs and we follow the training details in CornerNet .;CornerNet - Saccade Results;17
26722;0ecd4fdce541317b38124967b5c2a259d8f43c91;Method;1;Const agent;"The Const agent selects a single fixed action throughout an episode ; our results reflect the highest score achieved by any single action within each game .";The Random agent picks a random action on every frame .;"The Perturb agent selects a fixed action with probability 0.95 and otherwise acts uniformly randomly ; for each game , we report the performance of the best policy of this type .";Evaluation Methodology;14
35132;13ea9a2ed134a9e238d33024fba34d3dd6a010e0;Method;1;relaxation training;But after Step 3 , , relaxation training , shifts away from the eigen state .;After Step 1 and Step 2 , the weight vectors are orthogonal , , in an eigen state .;So the training procedure enters another iteration of “ restraint and relaxation ” .;Training SVDNet;5
56512;231af7dc01a166cac3b5b01ca05778238f796e41;Method;1;orig;[ reference ] shows the FID during learning with the original learning method ( orig ) and with TTUR .;Fig .;The original training method is faster at the beginning , but TTUR eventually achieves better performance .;DCGAN on Image Data .;11
55917;22aab110058ebbd198edb1f1e7b4f69fb13c0613;Method;1;cross - replica BatchNorm;We employ cross - replica BatchNorm ioffe2015batchnorm in G , where batch statistics are aggregated across all devices , rather than a single device as in standard implementations .;We use an exponential moving average of the weights of G at sampling time , with a decay rate set to 0.9999 .;Spectral Normalization miyato2018spectral is used in both G and D , following SA - GAN zhang2018sagan .;Experimental Details;18
17578;0a1dc95e4c884a91bd141df8133d1b4961178123;Method;1;cuda - convnet package;We implement our model using the cuda - convnet package [ reference ] .;Although we use a fixed image size in training , the convolutional neural network can be applied on images of arbitrary sizes during testing .;We have also tried the Caffe package [ reference ] and observed similar performance .;Training;13
14698;0891ed6ed64fb461bc03557b28c686f87d880c9a;Method;1;transition - based algorithm;The transition - based algorithm likewise surpasses the best previously published results in several languages , although it performs less well than the LSTM - CRF model .;Experiments in English , Dutch , German , and Spanish show that we are able to obtain state - of - the - art NER performance with the LSTM - CRF model in Dutch , German , and Spanish , and very near the state - of - the - art in English without any hand - engineered features or gazetteers ( § [ reference ] ) .;;Introduction;1
55888;22aab110058ebbd198edb1f1e7b4f69fb13c0613;Method;1;class embedding;It uses a simpler variant of skip - conditioning : instead of first splitting into chunks , we concatenate the entire with the class embedding , and pass the resulting vector to each residual block through skip connections .;The BigGAN - deep model ( Figure [ reference ] ) differs from BigGAN in several aspects .;"BigGAN - deep is based on residual blocks with bottlenecks he2016resnets , which incorporate two additional convolutions : the first reduces the number of channels by a factor of before the more expensive convolutions ; the second produces the required number of output channels .";Architectural details;17
20998;0b5aef2894d3248fb5ecc955d50501f0aa276036;Method;1;CHFusion;subsubsection : Context - Aware Hierarchical Fusion ( CHFusion );Wherein our method learns the weights automatically using a neural network ( Equation 1 , 2 and 3 ) .;The results of this experiment are shown in table : chfusion .;Context - Aware Hierarchical Fusion ( CHFusion );36
78793;32a93598e8a338496f04a0ace81b0768c2ef059d;Method;1;identically - sized model;In fact , greedy decoding with this fine - tuned model has similar performance ( ) as beam search with the original model ( ) , allowing for faster decoding even with an identically - sized model .;( Baseline Seq - Inter ) .;We hypothesize that sequence - level knowledge distillation is effective because it allows the student network to only model relevant parts of the teacher distribution ( i.e. around the teacher ’s mode ) instead of ‘ wasting ’ parameters on trying to model the entire space of translations .;Results and Discussion;13
9867;05ee231749c9ce97f036c71c1d2d599d660a8c81;Method;1;average - pooling of SENet - 50 features;The currently best performing method [ reference ] is the same as our SE baseline ( i.e. average - pooling of SENet - 50 features trained for single - image classification ) but trained on a much larger training set , MS - Celeb - 1 M dataset [ reference ] , and then fine - tuned on VGGFace2 .;In this section , our best networks , SE - GV - 3 and SE - GV - 4 - g1 , are compared against the state - of - the - art on the IJB - A and IJB - B datasets .;From Tables 3 and 4 , and Figure 3 , it is clear our GhostVLAD network ( SE - GV - 4 - g1 ) convincingly outperforms previous methods and sets the new state - of - the - art for both identification and verification on both IJB - A and IJB - B datasets .;Comparison with state - of - the - art;13
67748;2b0d7e51efd004fe3847f54863540c79312f3546;Method;1;per - pixel regression;We use mIoU for semantic segmentation , error of per - pixel regression ( normalized to image size ) for instance segmentation , and disparity error for depth estimation .;We plot the performance of all baselines for the tasks of semantic segmentation , instance segmentation , and depth estimation .;To convert errors to performance measures , we use 1 − instance error and 1 / disparity error .;Conclusion;13
98231;40b0fced8bc45f548ca7f79922e62478d2043220;Method;0;convnet representation;Szegedy et al . show counterintuitive properties of the convnet representation , and suggest that individual feature channels may not be more semantically meaningful than other bases in feature space .;Zeiler and Fergus provide several heuristic visualizations suggesting coarse localization ability .;A concurrent work compares convnet features with SIFT in a standard descriptor matching task .;Deep learning;5
101056;424561d8585ff8ebce7d5d07de8dbf7aae5e7270;Method;1;EB setting;For EdgeBoxes ( EB ) , we generate the proposals by the default EB setting tuned for 0.7 IoU. SS;For Selective Search ( SS ) , we generate about 2000 proposals by the “ fast ” mode .;has an mAP of 58.7 % and EB has an mAP of 58.6 % under the Fast R - CNN framework .;Experiments on PASCAL VOC;11
72956;2e4c06dd00c4c09ad5ac6be883cc66c19d88ea79;Method;0;Structural Deep Network Embedding;Our work is inspired by recent successful applications of autoencoder architectures for collaborative filtering that outperform popular matrix factorization methods , and is related to Structural Deep Network Embedding ( SDNE ) for link prediction .;DeepWalk , LINE , and node2vec do not support external node / edge features .;Similar to SDNE , our models rely on the autoencoder to learn non - linear node embeddings from local graph neighborhoods .;Related Work;5
18567;0a3a003457f5d7758a42a0e4b7278b39a86ed0bd;Method;1;HT meta - batch learning curriculum;In this paper , we show that our novel MTL trained with HT meta - batch learning curriculum achieves the top performance for tackling few - shot learning problems .;;The key operations of MTL on pre - trained DNN neurons proved highly efficient for adapting learning experience to the unseen task .;Conclusions;13
54068;218b80da3eb15ae35267d280dcc4a806d515334a;Method;1;seq2seq model;Instead of progressively correcting a sentence with the same seq2seq model as introduced in Section [ reference ] , round - way correction corrects a sentence through a right - to - left seq2seq model and a left - to - right seq2seq model successively , as shown in Figure [ reference ] .;Based on the idea of multi - round correction , we further propose an advanced fluency boost inference approach : round - way error correction .;The motivation of round - way error correction is straightforward .;Round - way error correction;10
79616;33a8d0a35390fde736744d4a0dd20dff7961c777;Method;1;graph convolution operation;Similar to the original capsule idea proposed in , this is achieved by replacing the scalar output of a graph convolution operation with a small vector output containing higher order statistical information per feature .;To address this limitation , we propose to improve upon the basic graph convolution operation by introducing the notion of graph capsules which encapsulate more information about nodes in a local neighborhood , where the local neighborhood is defined in the same way as in the standard GCCN model .;Another source of inspiration for our proposed GCAPS - CNN model comes from one of the most successful graph kernels – the Weisfeiler - Lehman;Introduction;1
92895;3d18ce183b5a5b4dcaa1216e30b774ef49eaa46f;Method;0;face model;Face alignment , which fits a face model to an image and extracts the semantic meanings of facial pixels , has been an important topic in CV community .;;However , most algorithms are designed for faces in small to medium poses ( below ) , lacking the ability to align faces in large poses up to .;Face Alignment Across Large Poses : A 3D Solution;0
27253;0ee850dd6640a96531ac5ad21da5438db04d8b3c;Method;1;max - margin loss function;We have tried different pair - wise loss functions and empirically found that the model learned based on the hinge loss ( max - margin loss function ) performs better than the others .;During the inference , we treat the trained model as a point - wise scoring function to score query - document pairs .;Hinge loss is a linear loss that penalizes examples that violate the margin constraint .;Ranking Architectures;5
102510;42f20d37f4eba56284a941d5f9f58609ee650de0;Method;1;convolutional filter;In particular , we also learn the models for noise - free degradation , namely SRMDNF , by removing the connection of the noise level map in the first convolutional filter and fine - tuning with new training data .;We separately learn models for each scale factor .;It is worth pointing out that neither residual learning nor bicubicly interpolated LR image is used for the network design due to the following reasons .;Proposed Network;10
73928;2ebfc12285f5d426e0d0e8d2befa1af27f99a56e;Method;0;” representation;” representation has two major advantages over previous approaches .;This “ image context flux;First , it explicitly encodes the relative position of skeletal pixels to semantically meaningful entities , such as the image points in their spatial context , and hence also the implied object boundaries .;DeepFlux for Skeletons in the Wild;0
21078;0b8759d61e93b809df16d9fe9010d2a2d7241c74;Method;1;Deep Network Learning;document : Fast and Accurate Deep Network Learning by Exponential Linear Units ( ELUs );This indicates that ReLU networks continuously try to correct the bias shift introduced by previous weight updates while this effect is much less prominent in ELU networks .;We introduce the “ exponential linear unit ” ( ELU ) which speeds up learning in deep neural networks and leads to higher classification accuracies .;Fast and Accurate Deep Network Learning by Exponential Linear Units ( ELUs );0
42560;1822ca8db58b0382b0c64f310840f0f875ea02c0;Method;0;Vanilla version;Vanilla version .;[ reference ] .;In the vanilla version , each sample in the new training set belongs to a single identity .;Training with CamStyle;8
60436;258ec208f9c55371a67ebac68aa51bd7f7800a7b;Method;1;filter kernels;Different from segmentation , the filter kernels in our network only eliminate the corruptions , which is usually not sensitive to the orientation of image contents in low level restoration tasks .;To achieve even better results , we propose to process a corrupted image on multiple orientations .;Therefore , we can rotate and mirror flip the kernels and perform forward multiple times , and then average the output to achieve an ensemble of multiple tests .;Testing;8
9675;05ee231749c9ce97f036c71c1d2d599d660a8c81;Method;1;aggregation block;The key component of the aggregation block is our GhostVLAD trainable aggregation layer , which given N D F - dimensional face descriptors computes a single;;It is based on the NetVLAD [ reference ] layer which implements an encoding similar to VLAD encoding [ reference ] , while being differentiable and thus fully - trainable .;GhostVLAD : NetVLAD with ghost clusters;6
86893;3729a9a140aa13b3b26210d333fd19659fc21471;Method;1;JMT ”;Table [ reference ] shows the results of “ JMT ” with and without the shortcut connections .;Our JMT model feeds the word representations into all of the bi - LSTM layers , which is called the shortcut connection .;The results without the shortcut connections are shown in the column of “ w / o SC ” .;Shortcut connections;28
28661;0f810eb4777fd05317951ebaa7a3f5835ee84cf4;Method;1;value - based RL algorithm;The agent is trained on the augmented reward using any value - based RL algorithm with LFA .;As in the MBIE - EB algorithm , this bonus is added to the reward .;At each timestep our algorithm performs updates for at most estimators , one for each feature .;( - Exploration Bonus ) .;13
91871;3b1d8eb163ffff598c2faa0d9d7cf933857a359f;Method;1;Hypernymy;Hypernymy : It takes the value if one word is a ( direct or indirect ) hypernym of the other word in WordNet , where is the number of edges between the two words in hierarchies , and otherwise .;For example , [ wet , dry ] = .;Note that we ignore pairs in the hierarchy which have more than 8 edges in between .;Lexical Semantic Relations;11
59824;2451db113552afb6d9ad15ef4009ec4133d28f74;Method;1;Plain - COV;With one single iteration , our method outperforms Plain - COV by .;Plain - COV indicates simple covariance pooling without any normalization .;As iteration number grows , the error rate of iSQRT - COV gradually declines .;Evaluation with AlexNet on ImageNet;14
53129;20cc4bfdb648fd7947c71252589fc867d4d16933;Method;0;pairwise ranking scheme;Parikh and Grauman [ reference ] developed a pairwise ranking scheme for relative attribute learning .;[ reference ] introduced a Siamese neural network for handwriting recognition .;Subsequently , pairwise neural network models have become common for attribute modeling [ reference ][ reference ][ reference ][ reference ] .;Related Work;3
24547;0d24a0695c9fc669e643bad51d4e14f056329dec;Method;0;curriculum learning scheme;The approach also relies on a curriculum learning scheme .;However , REINFORCE is known to have very high variance and does not exploit the availability of the ground - truth like the critic network does .;Standard value - based RL algorithms like SARSA and OLPOMDP have also been applied to structured prediction maes2009structured .;Related Work;12
107174;45fdc73a239e9c6ea65e98c96f6a2d6dc35d6f72;Method;0;quaternion convolution;This section defines the internal quaternion representation ( Section [ reference ] ) , the quaternion convolution ( Section [ reference ] ) , a proper parameter initialization ( Section [ reference ] ) , and the connectionist temporal classification ( Section [ reference ] ) .;;;Quaternion convolutional neural networks;3
25451;0d5fa5be4bfe085de8f88dbee1c3b2a6e5ab9ee2;Method;1;Quantitative Analysis;subsubsection : Quantitative Analysis;It manifests that our different - resolution information is properly made use of in this framework .;To further understand accuracy gain in each branch , we quantitatively analyze the predicted label maps based on connected components .;Quantitative Analysis;25
59126;23f5854b38a15c2ae201e751311665f7995b5e10;Method;1;approximate Bayesian inference;In Section 2.2 , we introduced maximum marginal likelihood estimation of vaes using approximate Bayesian inference under a non - linear generative model ( Eq . 1 ) .;;We now describe our work from the perspective of learning autoencoders .;A taxonomy of autoencoders;9
84830;364da079f91a6cb385997be990af06e9ddf6e888;Method;0;RCV1;RCV1 is a corpus of Reuters news articles as described in LYRL04 .;;RCV1 has 103 topic categories in a hierarchy , and one document may be associated with more than one topic .;RCV1 : topic categorization;18
64139;28703eef8fe505e8bd592ced3ce52a597097b031;Method;0;back - propagation through time;Running this procedure from to is known as back - propagation through time ( BPTT ) .;, where is the loss at step , deriving , for instance , from the score .;In determining the total computational cost of back - propagation here , first note that in the worst case there is one violation at each time - step , which leads to independent , incorrect sequences .;Backward : Merge Sequences;7
22611;0c769c19d894e0dbd6eb314781dc1db3c626df57;Method;1;person search framework;We first compare our proposed person search framework ( with or without using unlabeled identities ) with other baseline combinations that break down the problem into separate detection and re - identification tasks .;;The results are summarized in Table [ reference ] .;Comparison with Detection and Re - ID;11
56303;231af7dc01a166cac3b5b01ca05778238f796e41;Method;1;stochastic approximations;Thus , the gradients and are stochastic approximations to the true gradients .;If the true gradients are and , then we can define and with random variables and .;Consequently , we analyze convergence of GANs by two time - scale stochastic approximations algorithms .;Two Time - Scale Update Rule for GANs;2
34870;139768cf7714beb9309efba734460f8562c60c78;Method;0;RASP;The required POS tags were generated with RASP [ reference ] , using the CLAWS2 tagset .;[ reference ] .;;Pattern Extraction;9
31961;1023b20d226bd0af9fdf0fd1847accefbfa5ec84;Method;1;MenNN;In named entity prediction our best single model with accuracy of 68.6 % performs 2 % absolute better than the MenNN with self supervision , the averaging ensemble performs 4 % absolute better than the best previous result .;Fusing multiple models then gives a significant further increase in accuracy on both CNN and Daily Mail datasets .. CBT .;In common noun prediction our single models is 0.4 % absolute better than MenNN however the ensemble improves the performance to 69 % which is 6 % absolute better than MenNN .;Results;20
98229;40b0fced8bc45f548ca7f79922e62478d2043220;Method;0;heuristic visualizations;Zeiler and Fergus provide several heuristic visualizations suggesting coarse localization ability .;Several recent works have attempted to analyze and explain this overwhelming success .;Szegedy et al . show counterintuitive properties of the convnet representation , and suggest that individual feature channels may not be more semantically meaningful than other bases in feature space .;Deep learning;5
90196;3a61d5fbc8d99310965fd91b12527d1cd69d7116;Method;0;region pooling step;On the other hand , one - stage detectors remove the region pooling step of two - stage detectors .;R - FCN replaces the expensive fully connected sub - detection network with a fully convolutional network , and Light - Head R - CNN reduces the cost in R - FCN by applying separable convolution to reduce the number of channels in the feature maps before RoI pooling .;Efficient Network Architectures .;Related Work;2
102102;42e80c73867bff9eaff6beceb8730fc1276283b9;Method;1;unsupervised tuning procedure;Having done that , we adjust the weights of the underlying log - linear model through a novel unsupervised tuning procedure ( Section [ reference ] ) .;This initial phrase - table is then extended by incorporating subword information , addressing one of the main limitations of previous unsupervised SMT systems ( Section [ reference ] ) .;Finally , we further improve the system by jointly refining two models in opposite directions ( Section [ reference ] ) .;Principled unsupervised SMT;3
91069;3a8d537bcec370d37990d39eab01c729496ad057;Method;1;RBF kernel;We train a Support Vector Machine with an RBF kernel for classification using these “ pre - trained ” embeddings .;We evaluate the features learned on the ModelNet dataset by concatenating both the global latent variable with the local latent layers , creating a single feature vector .;Table [ reference ] shows the performance of previous state - of - the - art supervised and unsupervised methods in shape classification on both variants of the ModelNet dataset .;Shape Classification;13
85869;36a03f648b40d209ce361550dbe1c823ddb715b5;Method;0;2D convolutional neural networks;Most of the existing deep learning - based methods for 3D hand and human pose estimation from a single depth map are based on a common framework that takes a 2D depth map and directly regresses the 3D coordinates of keypoints , such as hand or human body joints , via 2D convolutional neural networks ( CNNs ) .;;The first weakness of this approach is the presence of perspective distortion in the 2D depth map .;Abstract;1
18903;0a49b4de21363d86599d4a058aaf4f5aed019495;Method;0;segmentation steps;Furthermore , these segmentation steps are often tuned or designed separately from the ultimate objective of translation quality , potentially contributing to a suboptimal quality .;This word segmentation procedure can be as simple as tokenization followed by some punctuation normalization , but also can be as complicated as morpheme segmentation requiring a separate model to be trained in advance .;Based on this observation and analysis , in this paper , we ask ourselves and the readers a question which should have been asked much earlier : Is it possible to do character - level translation without any explicit segmentation ?;Motivation;4
47440;1c0e8c3fb143eb5eb5af3026eae7257255fcf814;Method;1;SSW;Next , we compare the detection performances with two popular object proposal methods , SSW [ reference ] and EB;Object proposals .;[ reference ] .;Detection results;12
97850;4087ebc37a1650dbb5d8205af0850bee74f3784b;Method;1;adversarial regularization scheme;Additionally , we discuss how this can be combined with a new adversarial regularization scheme recently proposed in [ reference ] , as well as prior work [ reference ] in order to obtain ensembles of models at no additional cost .;The idea of using batch size to control the noise in a simple cyclical schedule was recently proposed in [ reference ] . Here , we build upon this work by studying different cyclical annealing strategies for a wide range of problems .;In summary , our contributions are as follows :;Introduction;2
90186;3a61d5fbc8d99310965fd91b12527d1cd69d7116;Method;0;low - level vision algorithm;SPP and Fast R - CNN address this by applying a ConvNet fully convolutionally on the image and extracting features directly from the feature maps for each RoI. Faster R - CNN further improves efficiency by replacing the low - level vision algorithm with a region proposal network .;Repeatedly applying a ConvNet to the RoIs introduces many redundant computations .;R - FCN replaces the expensive fully connected sub - detection network with a fully convolutional network , and Light - Head R - CNN reduces the cost in R - FCN by applying separable convolution to reduce the number of channels in the feature maps before RoI pooling .;Related Work;2
82165;35502af359aa60ae8047df172e29503cfb29c3f9;Method;0;multivariate kernels;There are two commonly used multivariate kernels in mean shift algorithm .;;The first , Epanechnikov kernel , has the following profile where is the volume of the unit - dimensional sphere .;Details of Recurrent Mean Shift Grouping;24
27224;0ee850dd6640a96531ac5ad21da5438db04d8b3c;Method;1;pseudo - labeler;The goal is to train a ranking model given the scores / ranking generated by the pseudo - labeler as a weak supervision signal .;Note that we can generate as much as training data as we need with almost no cost .;In the following section , we formally present a set of neural network - based ranking models that can leverage the given weak supervision signal in order to learn accurate representations and ranking for the ad - hoc retrieval task .;Weak Supervision for Ranking;3
28480;0f2f4edb7599de34c97f680cf356943e57088345;Method;na;4;On MPII there is over a 2 % average accuracy improvement across all joints , with as much as a 4 - 5 % improvement on more difficult joints like the knees and ankles .;The final network architecture achieves a significant improvement on the state - of - the - art for two standard pose estimation benchmarks ( FLIC and MPII Human Pose ) .;;Introduction;1
72297;2dc32f9e0a7870b272a2a51082202a9fa52fb854;Method;0;cross - entropy loss function;LAPGAN uses a cross - entropy loss function to encourage the output images to respect the data distribution of training datasets .;On the contrary , our LapSRN is a super - resolution model that predicts a particular HR image based on the given LR image .;In contrast , we use the Charbonnier penalty function to penalize the deviation of the prediction from the ground truth sub - band residuals .;Laplacian pyramid .;6
37387;15212fa4d30863ea1f9bd9591eee03848278242d;Method;1;JaDCI;DCI is a transfer learning method for cross - domain and cross - lingual text classification for which we had provided an implementation ( here called JaDCI ) built on top of JaTeCS , a Java framework for text classification .;This paper introduces PyDCI , a new implementation of Distributional Correspondence Indexing ( DCI ) written in Python .;PyDCI is a stand - alone version of DCI that exploits scikit - learn and the SciPy stack .;Revisiting Distributional Correspondence Indexing : A Python Reimplementation and New Experiments;0
96016;3f3a483402a3a2b800cf2c86506a37f6ef1a5332;Method;1;Deep learning method;Deep learning method is outperformed both for PCP ( 84.7 vs. 80.7 % ) and AOP ( 86.5 vs. 84.9 % ) measures .;[ reference ] shows that DeepCuts outperform all prior methods .;This is remarkable , as DeepCuts reason about part interactions across several people , whereas primarily focuses on the single - person case and handles multi - person scenes akin to .;Multi Person Pose Estimation;15
13553;07c4fc48ad7b7d1a417b0bb72d0ae2d4efc5aa83;Method;1;stack super - separable convolutions;To avoid making a bottleneck of this kind , we use stack super - separable convolutions in layer with co - prime .;Note that a super - separable convolution does n’t allow channels in separate groups to exchange information .;In particular , in our experiments we always alternate and .;Super - separable convolutions;4
86395;36b1ba4287c4884df27dd684c4c7f66f32e943db;Method;0;YAGO3;YAGO3 - 10 is a subset of YAGO3 , containing entities which have a minimum of 10 relations each .;WN18RR is a subset of WN18 , created by dettmers2017convolutional by removing the inverse relations from WN18 .;;Datasets;10
33421;121e30c48546e671dc5e16c694c5e69b392cf8fb;Method;1;shuffling method;Our shuffling method improves the performance of all models , and achieves new state of the art results on both datasets .;In addition , we present results for finetuned models , with and without the Partial Shuffle .;Our method does not require any additional parameters or hyper - parameters , and runs in less than th of a second per epoch on the Penn Treebank dataset .;Results;3
104595;44c5dec4d1295d34f052d3243d8e08f14a3c0990;Method;1;positional encoding scheme;Concretely , it consists of a segment - level recurrence mechanism and a novel positional encoding scheme .;As a solution , we propose a novel neural architecture , Transformer - XL , that enables Transformer to learn dependency beyond a fixed length without disrupting temporal coherence .;Our method not only enables capturing longer - term dependency , but also resolves the problem of context fragmentation .;Transformer - XL : Attentive Language Models Beyond a Fixed - Length Context;0
7188;04957e40d47ca89d38653e97f728883c0ad26e5d;Method;0;region - wise classifier;Due to the success of the R - CNN [ reference ] architecture , the two - stage formulation of the detection problems , by combining a proposal detector and a region - wise classifier has become predominant in the recent past .;;To reduce redundant CNN computations in the R - CNN , the SPP - Net [ reference ] and Fast - RCNN [ reference ] introduced the idea of region - wise feature extraction , significantly speeding up the overall detector .;Related Work;3
9409;05357b8c05b5bc020e871fc330a88910c3177e4d;Method;1;weighted version;( [ reference ] ) to a weighted version , as in Eq .;To solve this problem , we change the loss in Eq .;( [ reference ] ) .;Online instance classifier refinement;5
90915;3a8d537bcec370d37990d39eab01c729496ad057;Method;0;T - L network;Of the few unsupervised neural - based approaches that exist , the T - L network is one of the most important , combining a convolutional autoencoder with an image regressor to encode a unified vector representation of a given 2D image .;Some approaches have attempted to minimize human involvement by developing weakly - supervised schemes , making use of image silhouettes to conduct 3D object reconstruction .;However , one fundamental issue with the T - L Network is its three - phase training procedure , since jointly training the system components proves to be too difficult .;Related Work;2
22776;0c9ae806059196007938f24d0327a4237ed6adf5;Method;1;classification function;IIC is a generic clustering algorithm that directly trains a randomly initialised neural network into a classification function , end - to - end and without any labels .;In this paper , we introduce Invariant Information Clustering ( IIC ) , a method that addresses this issue in a more principled manner .;It involves a simple objective function , which is the mutual information between the function ’s classifications for paired data samples .;Introduction;1
12799;072fd0b8d471f183da0ca9880379b3bb29031b6a;Method;0;L2 regression;Several other papers have also used GANs for image - to - image mappings , but only applied the GAN unconditionally , relying on other terms ( such as L2 regression ) to force the output to be conditioned on the input .;The image - conditional models have tackled image prediction from a normal map , future frame prediction , product photo generation , and image generation from sparse annotations ( c.f . for an autoregressive approach to the same problem ) .;These papers have achieved impressive results on inpainting , future state prediction , image manipulation guided by user constraints , style transfer , and superresolution .;Related work;2
22392;0c36c988acc9ec239953ff1b3931799af388ef70;Method;1;ResNet_v1_101;Specifically , ResNet_v1_101 trained on ImageNet - 128w is used for Faster RCNN feature extraction .;Mini batch size is set to 1 considering memory consumption .;It is helpful to freeze the first two blocks in the training stage as data size of WIDER FACE is not so large .;Implementation Details;8
89535;39dba6f22d72853561a4ed684be265e179a39e4f;Method;1;data transformation;We found this simple data transformation to greatly improve the performance of the LSTM .;This way , is in close proximity to , is fairly close to , and so on , a fact that makes it easy for SGD to “ establish communication ” between the input and the output .;;The model;2
6129;0373b97580cdfd0b69f165e1a946bae62da95dce;Method;0;Residual Networks;In Residual Networks , the gradients and features learned in earlier layers are passed back and forth between the layers via the identity transformations .;The identity transformation , is used to reduce the dimensions of to match those of .;Exponential Linear Unit ( ELU ) alleviates the vanishing gradient problem and also speeds up learning in deep neural networks which leads to higher classification accuracies .;Background;2
4484;027f9695189355d18ec6be8e48f3d23ea25db35d;Method;1;SST experiments;90.7 53.7 Table 2 : Results of SST experiments .;89.4 52.3 Gumbel Tree - LSTM ( Ours );The bottom section contains results of RvNN - based models .;Natural Language Inference;10
27263;0ee850dd6640a96531ac5ad21da5438db04d8b3c;Method;0;ranking function;This model learns a ranking function which predicts the probability of document to be ranked higher than given .;The third architecture is based on a pair - wise scenario during both training and inference ( Figure [ reference ] ) .;Similar to the rank model , each training instance has five elements : .;Ranking Architectures;5
104262;4402c6c8445f17f4161e0f64573b7e28df1ca180;Method;0;FM );[ reference ] and factorization machines ( FM ) [ reference ] are widely used in industrial applications .;[ reference ] . From the modeling perspective , linear logistic regression ( LR ) [ reference ] , [ reference ] and non - linear gradient boosting decision trees ( GBDT );However , these models are limited in mining high - order latent patterns or learning quality feature representations .;II . RELATED WORK;3
39093;15e1af79939dbf90790b03d8aa02477783fb1d0f;Method;1;Matconvnet package;Specifically , the Matconvnet package is used .;We adopt the CNN re - ID baseline used in .;During training , We use the ResNet - 50 model and modify the fully - connected layer to have 751 , 702 and 1 , 367 neurons for Market - 1501 , DukeMTMC - reID and CUHK03 , respectively .;Implementation Details;14
5475;02e85d62fbd8249a046d00ac10e39546511b2a51;Method;0;DKM;DKM is supported by an NIHR Senior Investigator Award .;VFJN is supported by a Health Foundation / Academy of Medical Sciences Clinician Scientist Fellowship .;We gratefully acknowledge the support of NVIDIA Corporation with the donation of two Titan X GPUs for our research .;Acknowledgements;31
609;007ab5528b3bd310a80d553cccad4b78dc496b02;Method;1;Character Embedding Layer;Character Embedding Layer maps each word to a vector space using character - level CNNs .;Our machine comprehension model is a hierarchical multi - stage process and consists of six layers ( Figure [ reference ] ) :;Word Embedding Layer maps each word to a vector space using a pre - trained word embedding model .;Model;2
52203;207e0ac5301a3c79af862951b70632ed650f74f7;Method;0;matrix regularisation;With the number of training samples much smaller than the feature dimension , the existing methods thus face the classic small sample size ( SSS ) problem and have to resort to dimensionality reduction techniques and / or matrix regularisation , which lead to loss of discriminative power .;Typically a person ’s appearance is represented using features of thousands of dimensions , whilst only hundreds of training samples are available due to the difficulties in collecting matched training images .;In this work , we propose to overcome the SSS problem in re - i d distance metric learning by matching people in a discriminative null space of the training data .;Learning a Discriminative Null Space for Person Re - identification;0
106007;4543052aeaf52fdb01fced9b3ccf97827582cef5;Method;1;stacked U - Nets;To demonstrate the advantages of DU - Net , we first compare it with traditional stacked U - Nets .;;This experiment is done on the MPII validation set .;DU - Net vs. Stacked U - Nets;10
56734;231af7dc01a166cac3b5b01ca05778238f796e41;Method;0;martingale difference sequence w.r.t;More precisely , is a martingale difference sequence w.r.t .;Assumptions on the additive noise : and are martingale difference sequence with second moments bounded by .;increasing - fields satisfying for and a given constant .;Assumptions .;33
9320;05357b8c05b5bc020e871fc330a88910c3177e4d;Method;0;Cibis;Cibis trained a multi - fold MIL detector by alternatively relabelling instances and retraining classifier .;For example , Wang relaxed the MIL restraints into a differentiable loss function and optimized it by SGD to speed up training and improve results .;Recently , some researchers combined CNN and MIL to train an end - to - end network for WSOD .;Related work;2
14866;0891ed6ed64fb461bc03557b28c686f87d880c9a;Method;0;letter - based representations;Finally , there is currently a lot of interest in models for NER that use letter - based representations .;ratinov2009design quantitatively compare several approaches for NER and build their own supervised model using a regularized average perceptron and aggregating context information .;gillick2015multilingual model the task of sequence - labeling as a sequence to sequence learning problem and incorporate character - based representations into their encoder model .;Related Work;19
86125;36a03f648b40d209ce361550dbe1c823ddb715b5;Method;0;classification - guided approach;[ reference ] , classification - guided approach ( Cls - Guide ) [ reference ] , 3DCNN based method ( 3DCNN ) [ reference ] , occlusion aware based method ( Occlusion ) [ reference ] , and hallucinating heat distribution method ( HeatDist );[ reference ] , global - to - local prediction method ( Global - to - Local );[ reference ] . Some reported results of previous works [ reference ] are calculated by prediction labels available online .;Comparison with state - of - the - art methods;16
66829;2aec8d465e9a74c27f956ed1136f3e8a3ba0a833;Method;1;denoising model;In this work , we take a tunable noise level map M as input to make the denoising model flexible to noise levels .;The denoiser should introduce no visual artifacts in controlling the trade - off between noise reduction and detail preservation .;To improve the efficiency of the denoiser , a reversible downsampling operator is introduced to reshape the input image of size into four downsampled sub - images of size .;Proposed Fast and Flexible Discriminative CNN Denoiser;5
71795;2dad7e558a1e2982d0d42042021f4cde4af04abf;Method;1;regular RNN;The DilatedRNN improves significantly on the performance of a regular RNN , LSTM , or GRU with far fewer parameters .;We empirically validate the DilatedRNN in multiple RNN settings on a variety of sequential learning tasks , including long - term memorization , pixel - by - pixel classification of handwritten digits ( with permutation and noise ) , character - level language modeling , and speaker identification with raw audio waveforms .;Many studies have shown that vanilla RNN cells perform poorly in these learning tasks .;Introduction;1
78702;32a93598e8a338496f04a0ace81b0768c2ef059d;Method;0;one - hot distribution;The sequence - level negative log - likelihood for NMT then involves matching the one - hot distribution over all complete sequences , where is the observed sequence .;First , consider the sequence - level distribution specified by the model over all possible sequences , for any length .;Of course , this just shows that from a negative log likelihood perspective , minimizing word - level NLL and sequence - level NLL are equivalent in this model .;Sequence - Level Knowledge Distillation;7
97557;40193e7ba0fbd7153a1fe15e95563463b67c71f3;Method;0;3D face regressor model;[ f , t , Π , α s , α exp ] to regress for the 3D face regressor model .;Putting them together , we have in total 62 parameters α =;;3D morphable model;8
5492;02e85d62fbd8249a046d00ac10e39546511b2a51;Method;1;Shallow + ” model;Deeper models ( and the “ Shallow + ” model in Sec [ reference ] ) use the weight initialisation scheme of .;The weights of our shallow , 5 - layers networks are initialized by sampling from a normal distribution and their initial learning rate is set to .;The scheme increases the signal ’s variance in our settings , which leads to RMSProm decreasing the effective learning rate .;Additional Details on Network Configurations;33
62763;2788a2461ed0067e2f7aaa63c449a24a237ec341;Method;1;RE - 0;We observe that , 1 ) all erasing schemes outperform the baseline , 2 ) RE - R achieves approximately equal performance to RE - M , and 3 ) both RE - R and RE - M are superior to RE - 0 and RE - 255 .;Table [ reference ] presents the result with different erasing values on CIFAR10 using ResNet18 ( pre - act ) .;If not specified , we use RE - R in the following experiment .;Classification Evaluation;12
104671;44c5dec4d1295d34f052d3243d8e08f14a3c0990;Method;0;context representations;Existing works range from ones where context representations are manually defined mikolov2012context , ji2015document , wang2015larger to others that rely on document - level topics learned from data dieng2016topicrnn , wang2017topic .;To capture the long - range context in language modeling , a line of work directly feeds a representation of the wider context into the network as an additional input .;More broadly , in generic sequence modeling , how to capture long - term dependency has been a long - standing research problem .;Related Work;2
26189;0e37c8f19eefeb0c20d92f5cb4df4153077c116b;Method;0;filter structure;4.1 Leveraging filter structure;Hence , this only affects the way the solution space is explored by the optimization procedure .;This idea exploits the observation that CNN filters have a very characteristic structure that reappears across data sets and problem domains .;Title;0
68659;2c03df8b48bf3fa39054345bafabfeff15bfd11d;Method;1;mAP@.5;The accuracy is evaluated by mAP@.5 .;The ImageNet Detection ( DET ) task involves 200 object categories .;Our object detection algorithm for ImageNet DET is the same as that for MS COCO in Table 9 .;ImageNet Detection;19
14843;0891ed6ed64fb461bc03557b28c686f87d880c9a;Method;0;fixed - depth decision trees;In the CoNLL - 2002 shared task , carreras2002named obtained among the best results on both Dutch and Spanish by combining several small fixed - depth decision trees .;;Next year , in the CoNLL - 2003 Shared Task , florian2003named obtained the best score on German by combining the output of four diverse classifiers .;Related Work;19
14505;0875fc92cce33df5cf7df169590dbf0ca00d2652;Method;1;caption sentence representation;We obtain the caption sentence representation by first transforming each word to an - dimensional vector representation , using the Bidirectional RNN .;Let be the input caption , represented as a sequence of 1 - of - K encoded words , where is the size of the vocabulary and is the length of the sequence .;In a Bidirectional RNN , the two LSTMs hochreiter_lstm with forget gates;Language Model : the Bidirectional Attention RNN;4
12951;072fd0b8d471f183da0ca9880379b3bb29031b6a;Method;0;Color histogram matching;Color histogram matching is a common problem in image processing , and PixelGANs may be a promising lightweight solution .;For example , the bus in Figure [ reference ] is painted gray when the net is trained with an L1 loss , but becomes red with the PixelGAN loss .;Using a PatchGAN is sufficient to promote sharp outputs , and achieves good FCN - scores , but also leads to tiling artifacts .;From PixelGANs to PatchGANs to ImageGANs;13
76736;309acdd149f5f0ea12acb103b36bb59e6e631671;Method;0;convex relaxation;Representing human 3D pose as a linear combination of a sparse set of 3D bases , pretrained using 3D mocap data , has also proved a popular approach for articulated human motion , while propose a convex relaxation to jointly estimate the coefficients of the sparse representation and the camera viewpoint and enforce limb length constraints .;For instance , defined a generative model based on the assumption that complex shape variations can be decomposed into a mixture of primitive shape variations and achieve competitive results .;Although these approaches can reconstruct 3D pose from a single image , their best results come from imposing temporal smoothness on the reconstructions of a video sequence .;Related Work;2
11514;063ad0349f05c8aacbbb653ffcf01047a293fa30;Method;1;location masking;As we can see , the n - gram representation with location masking achieves slightly better results over the left - right context .;It also shows the results of logistic regression based models versus LSTM models .;N - grams include unigrams and bigrams .;Results;26
107593;46018a894d533813d67322827ca51f78aed6d59e;Method;0;pre - output concatenation;[ Cascaded architecture , using pre - output concatenation , which is an architecture with properties similar to that of learning using a limited number of mean - field inference iterations in a CRF ( MFCascadeCNN ) . ];[ Cascaded architecture , using local pathway concatenation ( LocalCascadeCNN ) . ];;Cascaded architectures;6
52146;2019ede61cc0be14859908312e18458a7c79908f;Method;0;Template Kneser - Ney baseline;This is a total improvement of nearly 15 BLEU over the Template Kneser - Ney baseline .;Global conditioning on the fields improves the model by over 7 BLEU and adding words gives an additional 1.3 BLEU .;Similar observations are made for ROUGE + 15 and NIST + 2.8 .;The more , the better;22
64916;28eceb438da0b841bbd3d02684dbfa263838ed60;Method;0;intermediate feature layers;One of the drawbacks of this approach is that all intermediate feature layers are at full image resolution and have a high memory footprint .;This can be viewed as a full - resolution counterpart to the CRN , based on dilating the filters instead of scaling the feature maps .;Thus the ratio of capacity ( number of parameters ) to memory footprint is much lower than in the CRN .;Baselines;12
54134;218b80da3eb15ae35267d280dcc4a806d515334a;Method;0;AMU;AMU16 : SMT - based GEC systems junczys2014amu , junczys2016phrase developed by AMU .;AMU14 and;NUS14 , NUS16 , NUS17 and;Experimental results;14
82377;357776cd7ee889af954f0dfdbaee71477c09ac18;Method;0;re - parametrization trick;We can use the same re - parametrization trick of vae for back - propagation through the encoder network .;In this case , the stochasticity in comes from both the data - distribution and the randomness of the Gaussian distribution at the output of the encoder .;Universal approximator posterior : Adversarial autoencoders can be used to train the as the universal approximator of the posterior .;Adversarial Autoencoders;3
102575;42f20d37f4eba56284a941d5f9f58609ee650de0;Method;1;specific model;Using ImageNet dataset to train the specific model with bicubic degradation , SRResNet performs slightly better than SRMDNF on scale factor 4 .;In particular , SRMDNF achieves the best overall quantitative results .;To further compare with other methods such as VDSR , we also have trained a SRMDNF model ( for scale factor 3 ) which operates on Y channel with 291 training images .;Experiments on Bicubic Degradation;14
47634;1c7e078611c9df412e6eb3a356f31a0da0c1f99c;Method;0;2D object detection methods;Recently , 2D object detection methods are used as template matching and augmented for 6D pose estimation , especially with deep learning - based object detectors .;In 6D pose estimation , a template is usually obtained by rendering the corresponding 3D model .;Template - based methods are useful in detecting texture - less objects .;RELATED WORK;2
102974;43428880d75b3a14257c3ee9bda054e61eb869c0;Method;0;Convolutional Sequence to Sequence Learning;Convolutional Sequence to Sequence Learning;;;Title;0
72311;2dc32f9e0a7870b272a2a51082202a9fa52fb854;Method;0;LapSRN;On the other hand , all the convolutional filters for feature extraction , upsampling , and residual prediction layers in the LapSRN are jointly trained in an end - to - end , deeply supervised fashion .;Also , the sub - networks in LAPGAN are independently trained .;Third , LAPGAN applies convolutions on the upsampled images , so the speed depends on the size of HR images .;Laplacian pyramid .;6
28536;0f810eb4777fd05317951ebaa7a3f5835ee84cf4;Method;0;compressed representation of the history of visited states;The most promising proposals instead compute generalised counts from a compressed representation of the history of visited states – for example , by constructing a visit - density model over the state space and deriving a “ pseudocount ” , or by using locality - sensitive hashing to cluster states and counting the occurrences in each cluster .;It soon becomes infeasible , for example , to do so by storing the entire history of visited states and comparing each new state to those in the history .;This paper presents a new count - based exploration algorithm that is feasible in environments with large state - action spaces .;Introduction;1
40156;1672ffebacadf849188668f24bcd377a19ae4051;Method;0;indicator function;PReLU is a commonly used activation function where is one dimension of the input of activation function and is an indicator function which controls to switch between two channels of and . in the second channel is a learning parameter .;;Here we refer to as the control function .;Data Adaptive Activation Function;14
6351;03a5b2aac53443e6078f0f63b35d4f95d6d54c5d;Method;0;edge - based;Recent popular SISR methods can be classified into edge - based , image statistics - based and patch - based methods .;The goal of SISR methods is to recover a HR image from a single LR input image .;A detailed review of more generic SISR methods can be found in .;Related Work;2
105795;4543052aeaf52fdb01fced9b3ccf97827582cef5;Method;1;iterative design;Third , to further improve the efficiency , we investigate an iterative design that may reduce the model size to one half .;Compared with the naive implementation , this strategy makes it possible to train a very deep DU - Net ( actually , deeper ) .;More specifically , the output of the first pass of the DU - Net is used as the input of the second pass , where detection or regression loss is applied as supervision .;Introduction;1
82296;357776cd7ee889af954f0dfdbaee71477c09ac18;Method;0;gmmn;GMMN ) gmmn use a moment matching cost function to learn the data distribution .;weighted autoencoders yuri use a recognition network to predict the posterior distribution over the latent variables , generative adversarial networks ( GAN ) gan use an adversarial training procedure to directly shape the output distribution of the network via back - propagation and generative moment matching networks (;In this paper , we propose a general approach , called an adversarial autoencoder ( AAE ) that can turn an autoencoder into a generative model .;Introduction;1
99224;40eb1e54cb5382dfd3b7efd16dc7df826262ea52;Method;1;RGB detector;2D detection APs of our RGB detector are also provided in Tab .;we report the 3D AP curves of our Frustum PointNets on SUN - RGBD val set .;[ reference ] for reference .;Visualizations for SUN - RGBD ( Sec 5.1 );43
4239;023cc7f9f3544436553df9548a7d0575bb309c2e;Method;1;Huffman coding tree;In order to improve our running time , we use a hierarchical softmax based on the Huffman coding tree .;More precisely , the computational complexity is where is the number of classes and the dimension of the text representation .;During training , the computational complexity drops to .;Hierarchical softmax;3
5227;02e85d62fbd8249a046d00ac10e39546511b2a51;Method;1;11 - layers;In this section we present a series of experiments in order to analyze the impact of each of the main contributions and to justify the choices made in the design of the proposed 11 - layers , multi - scale 3D CNN architecture , referred to as the DeepMedic .;;Starting from the CNN baseline as discussed in Sec .;Analysis of Network Architecture;10
24421;0d24a0695c9fc669e643bad51d4e14f056329dec;Method;1;RL methodology;"The contributions of the paper can be summarized as follows : 1 ) we describe how RL methodology like the actor - critic approach can be applied to supervised learning problems with structured outputs ; and 2 ) we investigate the performance and behavior of the new method on both a synthetic task and a real - world task of machine translation , demonstrating the improvements over maximum - likelihood and REINFORCE brought by the actor - critic training .";We show that some of the techniques recently developed in deep RL , such as having a target network , may also be beneficial for sequence prediction .;;Introduction;1
44277;1a2599e467e855f845dcbf9282f8bdbd97b85708;Method;0;Intermediate Feature Representation;subsection : Intermediate Feature Representation;Our proposed system consists of two components , shown in Figure [ reference ] :;In this work we choose a low - level acoustic representation : mel - frequency spectrograms , to bridge the two components .;Intermediate Feature Representation;3
40229;1672ffebacadf849188668f24bcd377a19ae4051;Method;1;LR model;Obviously , all the deep networks beat LR model significantly , which indeed demonstrates the power of deep learning .;The influence of random initialization on AUC is less than 0.0002 .;PNN and DeepFM with specially designed structures preform better than Wide & Deep .;Result from model comparison on Amazon Dataset and MovieLens Dataset;19
39618;165ef2b5f86b9b2c68b652391db5ece8c5a0bc7e;Method;1;pyramid pooling;The traditional pyramid pooling ( in a sliding manner ) on the feature map is able to capture information from background regions of different sizes .;In our approach , to encode rich background information , we construct multi - scale networks and apply sliding pyramid pooling on feature maps .;Incorporating general pairwise ( or high - order ) potentials usually involves expensive inference , which brings challenges for CRF learning .;Introduction;2
61923;26fe009b958e8728382d9d764bd7153632f0b869;Method;1;Shortcut - Stacked sentence encoders;Our Shortcut - Stacked sentence encoders achieve strong improvements over existing encoders on matched and mismatched multi - domain natural language inference ( top non - ensemble single - model result in the EMNLP RepEval 2017;The overall supervised model uses the above encoder to encode two input sentences into two vectors , and then uses a classifier over the vector combination to label the relationship between these two sentences as that of entailment , contradiction , or neural .;Shared Task ) .;Shortcut - Stacked Sentence Encoders for Multi - Domain Inference;0
98259;40b0fced8bc45f548ca7f79922e62478d2043220;Method;1;MRF smoothness prior;We approach this difficult task in the style of SIFT flow : we retrieve near neighbors using a coarse similarity measure , and then compute dense correspondences on which we impose an MRF smoothness prior which finally allows all images to be warped into alignment .;To test this , we use convnet features for the task of aligning different instances of the same class .;Nearest neighbors are computed using fc 7 features .;Intraclass alignment;8
5012;02e85d62fbd8249a046d00ac10e39546511b2a51;Method;1;3D fully connected Conditional Random Field;For post - processing of the network ’s soft segmentation , we use a 3D fully connected Conditional Random Field which effectively removes false positives .;In order to incorporate both local and larger contextual information , we employ a dual pathway architecture that processes the input images at multiple scales simultaneously .;Our pipeline is extensively evaluated on three challenging tasks of lesion segmentation in multi - channel MRI patient data with traumatic brain injuries , brain tumors , and ischemic stroke .;Efficient Multi - Scale 3D CNN with fully connected CRF for Accurate Brain Lesion Segmentation;0
69173;2c761495cf3dd320e229586f80f868be12360d4e;Method;1;SGD iterations;Our model is trained for 30k SGD iterations using a mini - batch of 6 images , momentum of 0.9 , an initialize learning rate ( LR ) of and ” polynomial ” learning rate policy .;For training , we use mini - batch SGD with momentum .;All layers are trained with L2 - regularization ( weight decay of ) .;Semantic Segmentation;17
41270;1713d05f9d5861cac4d5ec73151667cb03a42bfc;Method;0;neural - based models;Moreover , as the demand for low - latency neural computation for mobile platforms increases , some neural - based models are expected to run on mobile devices .;When other components of the neural network are also large , the model may fail to fit into GPU memory during training .;Thus , it is becoming more important to compress the size of NLP models for deployment to devices with limited memory or storage capacity .;Introduction;1
62294;2742a33946e20dd33140b8d6e80d5fd04fced1b2;Method;1;feature descriptor;[ reference ] . Given a 30×30×30 TDF voxel grid of a local 3D patch around an interest point , we use eight convolutional layers ( each with a rectified linear unit activation function for nonlinearity ) and a pooling layer to compute a 512 - dimensional feature representation , which serves as the feature descriptor .;3DMatch is a standard 3D ConvNet , inspired by AlexNet;Since the dimensions of the initial input voxel grid are small , we only include one layer of pooling to avoid a substantial loss of information .;Network Architecture;8
95335;3e7f54801c886ea2061650fd24fc481e39be152f;Method;0;RTW;( RTW ) .;Figure [ reference ] shows examples of qualitative results from frontal and top down views for Shotton et al . and random tree walk;For the top - down view , we show only 8 joints on the upper body ( i.e. head , neck , left shoulder , right shoulder , left elbow , right elbow , left hand , and right hand ) as the lower body joints are almost always occluded .;Comparison with State - of - the - Art;11
71308;2d83dbf4c8eabc6bdef3326c4a30d5f33ffc944e;Method;0;explicit attention mechanism;Based on this , we argue that MRN is an implicit attention model without explicit attention mechanism .;Since MRN does not depend on a few attention parameters ( e.g . ) , our visualization method shows a higher resolution than others .;;Qualitative Analysis;23
39305;15e81c8d1c21f9e928c72721ac46d458f3341454;Method;0;Autoregressive NMT without RNNs;paragraph : Autoregressive NMT without RNNs;Choosing to factorize the machine translation output distribution autoregressively enables straightforward maximum likelihood training with a cross - entropy loss applied at each decoding step : This loss provides direct supervision for each conditional probability prediction .;Since the entire target translation is known at training time , the calculation of later conditional probabilities ( and their corresponding losses ) does not depend on the output words chosen during earlier decoding steps .;Autoregressive NMT without RNNs;5
62004;26fe009b958e8728382d9d764bd7153632f0b869;Method;1;Entailment Classifier;subsection : Entailment Classifier;Our experiments ( Section [ reference ] ) demonstrate that our enhancements of the stacked - biRNN with shortcut connections provide significant gains on top of this baseline ( for both SNLI and Multi - NLI ) .;After we obtain the vector representation for the premise and hypothesis sentence , we apply three matching methods to the two vectors for these two vectors and then concatenate these three match vectors ( based on the heuristic matching presented in mou2015natural ) .;Entailment Classifier;4
58283;23c141141f4f63c061d3cce14c71893959af5721;Method;0;shift - reduce parser;A shift - reduce parser accepts a sequence of input tokens and consumes transitions , where each specifies one step of the parsing process .;The formalism is widely used in natural language parsing .;In general a parser may also generate these transitions on the fly as it reads the tokens .;Background : Shift - reduce parsing;4
142;000f90380d768a85e2316225854fc377c079b5c4;Method;0;Feed - Forward Networks;Feed - Forward Networks .;;Until recently , the majority of feedforward networks , such as the VGG - variants [ reference ] , were composed of a linear sequence of layers .;Network Architectures for Segmentation;4
4845;02a5b7a41ffa8518eb3b7cae9914a2bd2bbc886b;Method;1;fully - convolutional baselines;the two fully - convolutional baselines .;w.r.t .;Interestingly , the gap significantly widens when considering mAP at the higher accuracy regime of 0.7 IOU : + 41.6 and + 18.4 respectively .;Evaluation for visual object tracking;11
99432;41232a69c0f8d4b993e6c6e00b16c223442c962f;Method;1;Gate Analysis;subsection : Gate Analysis;We leave it as our future work .;As shown in Table [ reference ] , FTSum achieves much higher ROUGE scores than FTSum .;Gate Analysis;15
49721;1db9bd18681b96473f3c82b21edc9240b44dc329;Method;1;attention mechanism aiayn;Here , we use the Image Transformer in an encoder - decoder configuration , connecting the encoder and decoder through an attention mechanism aiayn .;Following PixelRecursiveSuperResolution , in our experimental setup we enlarge an pixel image four - fold to , a process that is massively underspecified : the model has to generate aspects such as texture of hair , makeup , skin and sometimes even gender that can not possibly be recovered from the source image .;For the encoder , we use embeddings for RGB intensities for each pixel in the 8 image and add dimensional positional encodings for each row and width position .;Image Super - Resolution;14
57809;23ae5fa0e8d581b184a8749d764d2ded128fd87e;Method;0;pooling operations;The most basic version of this approach would not involve combining learned pooling operations , but simply learning pooling operations in the form of the values in “pooling filters”.;"These pooling layers remain distinct from convolution layers since pooling is performed separately within each channel ; this channel isolation also means that even the option that introduces the largest number of parameters still introduces far fewer parameters than a convolution layer would introduce .";One step further brings us to what we refer to as tree pooling , in which we learn pooling filters and also learn to responsively combine those learned filters .;Tree pooling;8
91630;3b1b94441010615195a5c404409ce2416860508c;Method;0;VIS + BOW;VIS + BOW performs multinomial logistic regression based on image features and a BOW vector obtained by summing all the word vectors of the question .;GUESS simply selects the modal answer from the training set for each of 4 question types .;VIS + LSTM has one LSTM to encode the image and question , while 2 - VIS + BLSTM has two image feature inputs , at the start and the end .;Results on DAQURA;18
29016;0f84a81f431b18a78bd97f59ed4b9d8eda390970;Method;1;conv1 - conv12;It has 12 convolutional layers ( conv1 - conv12 ) and was trained for iterations with batches of samples each , starting with a learning rate of and dividing it by after every iterations .;To test if the architectures performing best on CIFAR - 10 also apply to larger datasets , we trained an upscaled version of the All - CNN - B network ( which is also similar to the architecture proposed by Lin_2014 ) .;A weight decay of was used in all layers .;Classification of Imagenet;9
14019;07cca2bdd0dc2fee02889e17789748eba9d06ffa;Method;1;moving window;3c and 5 illustrate this process , where a moving window classifies each 3 - sized instance moving segment - by - segment along the track .;Figs .;Table 5 Confusion matrix for classification of instances of 3 segments .;Title;0
34982;13ea9a2ed134a9e238d33024fba34d3dd6a010e0;Method;1;fully connected;We view each weight vector within a fully connected ( FC ) layer in a convolutional neuron network ( CNN ) as a projection basis .;This paper proposes the SVDNet for retrieval problems , with focus on the application of person re - identification ( re - ID ) .;It is observed that the weight vectors are usually highly correlated .;SVDNet for Pedestrian Retrieval;0
93046;3d18ce183b5a5b4dcaa1216e30b774ef49eaa46f;Method;1;z - score normalization;Since the parameter range varies significantly , we conduct z - score normalization before training .;In this work , we discuss two baselines and propose a novel cost function .;;Cost Function;7
74931;2f92b10acf7c405e55c74c1043dabd9ded1b1800;Method;1;BiLSTM readers;In particular , we have shown that relatively simple task architectures ( e.g. , based on simple BiLSTM readers ) can become competitive with state of the art , task - specific architectures when augmented with our reading architecture .;Our results show that embedding refinement using both the system ’s text inputs , as well as supplementary text from external background knowledge can yield large improvements .;Our analysis demonstrates that our model learns to exploit provided background knowledge in a semantically appropriate way .;Conclusion;22
54443;220a0b46840a2a1421c62d3d343397ab087a3f17;Method;0;convolutional image filters;The spatial term uses the FoE model from , while the data term replaces traditional derivative filters with a set of learned convolutional image filters .;They formulate a classical flow problem with a data term and a spatial term .;With limited training data and a small set of filters , it did not fully show the full promise of learning flow .;Related Work;2
20197;0b544dfe355a5070b60986319a3f51fb45d1348e;Method;1;gradient - based algorithm;In our case , as the output of the decoder , starting from the input , is differentiable , we can use a gradient - based algorithm to estimate the model parameters .;The two components of the proposed RNN Encoder – Decoder are jointly trained to maximize the conditional log - likelihood where is the set of the model parameters and each is an ( input sequence , output sequence ) pair from the training set .;Once the RNN Encoder – Decoder is trained , the model can be used in two ways .;RNN Encoder – Decoder;4
16113;09da677bdbba113374d8fe4bb15ecfbdb4c8fe40;Method;1;HORNN framework;By revealing the equivalence of the state - of - the - art Residual Network ( ResNet ) and Densely Convolutional Network ( DenseNet ) within the HORNN framework , we find that ResNet enables feature re - usage while DenseNet enables new features exploration which are both important for learning good representations .;In this work , we present a simple , highly efficient and modularized Dual Path Network ( DPN ) for image classification which presents a new topology of connection paths internally .;To enjoy the benefits from both path topologies , our proposed Dual Path Network shares common features while maintaining the flexibility to explore new features through dual path architectures .;Dual Path Networks;0
81317;34f63959ea4a13a05948274a1558c6854a051150;Method;1;stochastic prediction dropout;During training , we apply stochastic prediction dropout liu2018san before the above averaging operation .;A one - layer classifier is used to determine the relation at each step : At last , we utilize all of the outputs by averaging the scores : Each is a probability distribution over all the relations .;During decoding , we average all outputs to improve robustness .;Pairwise Text Classification Output :;12
47106;1bea6bbdb4aed87fff5390d42934a1d9b0a7bec4;Method;0;natural language understanding approaches;There does not seem to be much useful headroom for exploring more sophisticated natural language understanding approaches on this dataset .;We believe that the neural - net system already achieves near - optimal performance on all the single - sentence and unambiguous cases .;;Per - category Performance;13
1894;0116899fce00ffa4afee08b505300bb3968faf9f;Method;0;hierarchical RL;To our knowledge , this is the first work that considers the information leaking in GAN framework for better training generators and combines hierarchical RL to address long text generation problems .;As such , the MANAGER module can be also viewed as a spy that leaks information from the discriminator to better guide the generator .;;Related Work;3
59918;2451db113552afb6d9ad15ef4009ec4133d28f74;Method;0;element - wise square root normalization;DeepO P. Since improved B - CNN is identical to MPN - COV if element - wise square root normalization and normalization are neglected , its unsatisfactory performance suggests that , after matrix square root normalization , further element - wise square root normalization and normalization hurt large - scale ImageNet classification .;All matrix square root normalization methods except improved B - CNN outperform B - CNN and;This is consistent with the observation in , where after matrix power normalization , additional normalization by Frobenius norm or matrix norm makes performance decline .;Evaluation with AlexNet on ImageNet;14
54085;218b80da3eb15ae35267d280dcc4a806d515334a;Method;1;MaxMatch;Being consistent with the official evaluation metrics , we use MaxMatch ( M ) dahlmeier - ng:2012:NAACL - HLT for CoNLL - 2014 and use GLEU napoles2015ground for JFLEG evaluation .;CoNLL - 2014 test set contains 1 , 312 sentences , while JFLEG test set has 747 sentences .;It is notable that the original annotations for CoNLL - 2014 dataset are from 2 human annotators , which are later enriched by that contains 10 human expert annotations for each test sentence .;Dataset and evaluation;12
92970;3d18ce183b5a5b4dcaa1216e30b774ef49eaa46f;Method;0;cascaded regression;By utilizing the feedback characteristic that the the output ( landmark positions ) of the regression has an influence on the input ( features at landmarks ) , the cascaded regression cascades a list of weak regressors to reduce the alignment error progressively and reaches the state of the art .;Recently , the regression based method , which maps the discriminative features around landmarks to the desired landmark positions , has been proposed .;Besides traditional models , convolutional neutral network ( CNN ) has also been employed in face alignment recently .;Related Works;2
90614;3a7895b17db0cda7bbf86bcda52c46a3e03b6ded;Method;0;Speaker GRU;subsubsection : Speaker Update ( Speaker GRU ) :;The main purpose of this module is to ensure that the model is aware of the speaker of each utterance and handle it accordingly .;Speaker usually frames the response based on the context , which is the preceding utterances in the conversation .;Speaker Update ( Speaker GRU ) :;11
26193;0e37c8f19eefeb0c20d92f5cb4df4153077c116b;Method;0;spectral parametrization;Figure 5 : Optimization of CNNs via spectral parametrization .;Pooling ( 5 ) 5 × 5 4.8 ( b ) Speedup factors .;All experiments include data augmentation .;Title;0
13901;07cca2bdd0dc2fee02889e17789748eba9d06ffa;Method;0;MANOVA;The former is used in multivariate analysis of variance ( MANOVA ) to test whether there are differences between the means of identified groups of subjects on a combination of dependent variables ( Everitt & Dunn , 1991 ) .;In order to assess the discriminability of the different IVs two statistical measures are introduced : the Wilks ’ Lambda K and the BetweenGroups F.;Wilk ’s Lambda is a statistic that takes into consideration both the differences between groups and the cohesiveness or homogeneity within groups ( Klecka , 1980 ) .;Title;0
19021;0a49b4de21363d86599d4a058aaf4f5aed019495;Method;1;beamsearch;We use beamsearch to approximately find the most likely translation given a source sentence .;;The beam widths are and respectively for the subword - level and character - level decoders .;Decoding and Evaluation;18
76524;303fef411f235e6d1125a40af1e93224f498a4d5;Method;0;SGVB estimator;Except for chung2016hierarchical , these structures all define a continuous latent variable for each step of the RNN computation , and rely on the SGVB estimator kingma2013auto to optimize a variational lower bound of the log - likelihood .;Finally , several previous works have tried to introduce latent variables into sequence modeling bayer2014learning , gregor2015draw , chung2015recurrent , gan2015deep , fraccaro2016sequential , chung2016hierarchical .;Since exact integration is infeasible , these models can not estimate the likelihood ( perplexity ) exactly at test time .;Related work;20
80040;34273979fd2a62fd7b49ee6d14a925864ff94e74;Method;1;recurrent relational reasoning module;We develop a recurrent relational reasoning module , which constitutes our main contribution .;This paper considers many - step relational reasoning , a challenging task for deep learning architectures .;We show that it is a powerful architecture for many - step relational reasoning on three varied datasets , achieving state - of - the - art results on bAbI and Sudoku .;Introduction;1
77036;3112d2d95d66b3d54a72c55072647aab937e410e;Method;0;copy - based models;While the use of copy - based models and additional reconstruction terms in the training loss can lead to improvements in BLEU and in our proposed extractive evaluations , current models are still quite far from producing human - level output , and are significantly worse than templated systems in terms of content selection and realization .;Our experiments indicate that neural systems are quite good at producing fluent outputs and generally score well on standard word - match metrics , but perform quite poorly at content selection and at capturing long - term structure .;Overall , we believe this problem of data - to - document generation highlights important remaining challenges in neural generation systems , and the use of extractive evaluation reveals significant issues hidden by standard automatic metrics .;Introduction;2
59134;23f5854b38a15c2ae201e751311665f7995b5e10;Method;0;δ distribution;is a δ distribution with mass only at the output of д ϕ ( x u ) .;There are two key distinctions of note : ( 1 ) The autoencoder ( and denoising autoencoder ) effectively optimizes the first term in the vae objective ( Eq . 5 and Eq . 6 ) using a delta variational distribution;Contrast this to the vae , where the learning is done using a variational distribution , i.e. , д ϕ ( x u ) outputs the parameters ( mean and variance ) of a Gaussian distribution .;A taxonomy of autoencoders;9
37661;15ca7adccf5cd4dc309cdcaa6328f4c429ead337;Method;0;3D shape descriptors;3D shape descriptors lie at the core of shape analysis and a large variety of shape descriptors have been designed in the past few decades .;;3D shapes can be converted into 2D images and represented by descriptors of the converted images .;3D Shape Descriptors .;3
31014;10203151008a20b32ce089f7f9d580005c2426cf;Method;0;min - hash;Then , query images are chosen via min - hash and spatial verification , as in .;It uses Hessian affine local features , RootSIFT descriptors , and a fine vocabulary of 16 M visual words .;Image retrieval based on BoW is used to collect images of the objects / landmarks .;BoW and 3D reconstruction;8
97807;4087ebc37a1650dbb5d8205af0850bee74f3784b;Method;1;Bayesian perspective of neural network training;We implement this through a cyclical batch size schedule motivated by a Bayesian perspective of neural network training .;Here , we propose a method of weight re - initialization by repeated annealing and injection of noise in the training process .;We evaluate our methods through extensive experiments on tasks in language modeling , natural language inference , and image classification .;Abstract;1
19483;0a78873e41615798d09391d9f40d41666b8c9beb;Method;0;total knee arthroplasty;"For example , consider the starting intervention span ” underwent conventional total knee arthroplasty ” ; there is only one intervention in the span but some annotators assigned the surgical label to all five tokens while others opted for only ” total knee arthroplasty .";As a proxy for formally enumerating these entities , we observe that a large majority of starting spans only contain a single target relevant to the subspan labeling task , and so identifying repetition between the starting spans is sufficient .;By analyzing repetition at the level of the starting spans , we can compute agreement without concern for the confounds of slight misalignments or differences in length of the subspans .;Repetition;11
88826;38d7920f0e8a3a672ea37c8612b2b2947b9ba9d1;Method;1;SSDLite version;It obtains an mAP of 34.1 % compared to the SSDLite version which obtained 22.1 % .;We also show results for Faster - RCNN trained with MobileNetV2 .;This again highlights the importance of image pyramids ( and SNIPER training ) as we can improve the performance of the detector by 12 % .;Comparison with State - of - the - art;14
95372;3e7f54801c886ea2061650fd24fc481e39be152f;Method;1;viewpoint invariant model;We introduced a viewpoint invariant model that estimates 3D human pose from a single depth image .;;Our model is formulated as a deep discriminative model that attends to glimpses in the input .;Conclusion;13
12206;06c5b86b638b2f3572b9cdd9ef0be4740b16781b;Method;1;multi - sentiment - resource attention mechanism;After obtaining the coupled word embedding , we propose a multi - sentiment - resource attention mechanism to help select the crucial sentiment - resource - relevant context words to build the sentiment - specific sentence representation .;;Concretely , we use the three kinds of sentiment resource words as attention sources to attend to the context words respectively , which is beneficial to capture different sentiment - relevant context words corresponding to different types of sentiment sources .;Multi - sentiment - resource Attention Module;4
89752;3a28fe49e7a856ddd60d134696a891ed7bca5962;Method;0;MS - CNN;Similarly , proposes a unified multi - scale convolutional neural network ( MS - CNN ) , which performs detection at multiple intermediate layers to match objects of different scales , as well as an upsampling operation to prevent insufficient resolution of feature maps for handling small instances .;SA - FastRCNN develops a divide - and - conquer strategy based on Fast - RCNN that uses multiple built - in subnetworks to adaptively detect pedestrians across scales .;Rather than using a single downstream classifier , the fused deep neural network ( F - DNN + SS ) method uses a derivation of the Faster R - CNN framework fusing multiple parallel classifiers including Resnet and Googlenet using soft - rejection , and further incorporates pixel - wise semantic segmentation in a post - processing manner to suppress background proposals .;Multi - scale Object Detection;3
10509;06150e6e69a379c27e1d0100fcd7660f073cbacf;Method;1;overcomplete representation;The result is an overcomplete representation well suited for use with orthogonal trees .;Next , rather than applying global decorrelation , which would be computationally prohibitive for sliding window detection with a nonlinear classifier1 , we instead propose to apply an efficient local decorrelation transform .;1Global decorrelation coupled with a linear classifier is efficient as the two linear operations can be merged .;Title;0
51014;1e7a36c4d4f96b29e3edf51b6eb61f8e16217704;Method;0;vanilla RNN;In a vanilla RNN , it is difficult to allow inputs to greatly affect the hidden state vector without erasing information from the past hidden state .;However , if the RNN has flexible input - dependent transition functions , the tree will be able to grow wider more quickly , giving the RNN the flexibility to represent more probability distributions .;However , an RNN with a transition function mapping dependent on the input would allow the relative values of to vary with each possible input , without overwriting the contribution from the previous hidden state , allowing for more long term information to be stored .;Input - dependent transition functions;2
75596;30180f66d5b4b7c0367e4b43e2b55367b72d6d2a;Method;1;average feature encodings;This strategy performed consistently worse than computing average feature encodings .;Finally , we also note that we ran experiments computing average media encodings , computing the margins for each encoding , then averaging the margins .;;Fusion Study;10
68268;2c03df8b48bf3fa39054345bafabfeff15bfd11d;Method;0;deep model;Unexpectedly , such degradation is not caused by overfitting , and adding more layers to a suitably deep model leads to higher training error , as reported in [ reference ][ reference ] and thoroughly verified by our experiments .;When deeper networks are able to start converging , a degradation problem has been exposed : with the network depth increasing , accuracy gets saturated ( which might be unsurprising ) and then degrades rapidly .;Fig .;Introduction;2
57992;23ae5fa0e8d581b184a8749d764d2ded128fd87e;Method;0;m - view;m - view;s - view top - 1 m - view top - 5;;Classification results;12
107177;45fdc73a239e9c6ea65e98c96f6a2d6dc35d6f72;Method;0;Quaternion internal representation;subsection : Quaternion internal representation;This section defines the internal quaternion representation ( Section [ reference ] ) , the quaternion convolution ( Section [ reference ] ) , a proper parameter initialization ( Section [ reference ] ) , and the connectionist temporal classification ( Section [ reference ] ) .;The QCNN is a quaternion extension of well - known real - valued and complex - valued deep convolutional networks ( CNN ) .;Quaternion internal representation;4
95189;3e7f54801c886ea2061650fd24fc481e39be152f;Method;1;body part detector;As a result , we can train a body part detector to be invariant to viewpoint changes .;Our core contribution is as follows : we leverage depth data to embed local patches into a learned viewpoint invariant feature space .;To provide richer context , we also introduce recurrent connections to enable our model to reason on past actions and guide downstream global pose estimation ( see Figure [ reference ] ) .;Model;3
58640;23dcfda130aada27c158c0b5f394cac489c9c795;Method;0;face detectors;Detector arrays were once a popular method when frontal face detection had increased success , the idea was to train multiple face detectors for different head poses .;In the classic literature we can discern Appearance Template Models which seek to compare test images with a set of pose exemplars .;Recently , facial landmark detectors which have become very accurate , have been popular for the task of pose estimation .;RELATED WORK;2
55559;2298490e82ff3fd03a3a28bd9c9f307bd897a753;Method;1;bias initialization;In contrast to focal loss , our approach does n’t need a specialized bias initialization .;While the final model evaluated on test - dev adopts ResNeXt - 101 .;;Network Setting :;16
27090;0ee850dd6640a96531ac5ad21da5438db04d8b3c;Method;0;auto - encoders Baldi:2012;Other generative approaches such as language modeling in NLP , and , more recently , various flavors of auto - encoders Baldi:2012 and generative adversarial networks Goodfellow:2014 in computer vision have shown a promise in building more accurate models .;For example in NLP , various methods for learning distributed word representations , e.g. , word2vec Mikolov:2013 , GloVe Pennington:2014 , and sentence representations , e.g. , paragraph vectors Le:2014 and skip - thought Kiros:2015 have been shown very useful to pre - train word embeddings that are then used for other tasks such as sentence classification , sentiment analysis , etc .;Despite the advances in computer vision , speech recognition , and NLP tasks using unsupervised deep neural networks , such advances have not been observed in core information retrieval ( IR ) problems , such as ranking .;Introduction;1
29048;0f84a81f431b18a78bd97f59ed4b9d8eda390970;Method;0;Fergus;For higher layers of our network the method of Zeiler and Fergus fails to produce sharp , recognizable , image structure .;Interestingly , the very first layer of the network does not learn the usual Gabor filters , but higher layers do .;This is in agreement with the fact that lower layers learn general features with limited amount of invariance , which allows to reconstruct a single pattern that activates them .;Deconvolution;10
96991;3febb2bed8865945e7fddc99efd791887bb7e14f;Method;1;local inference layer;Our baseline , the ESIM sequence model from Chen2017EnhancedLF , uses a biLSTM to encode the premise and hypothesis , followed by a matrix attention layer , a local inference layer , another biLSTM inference composition layer , and finally a pooling operation before the output layer .;The Stanford Natural Language Inference ( SNLI ) corpus snliemnlp2015 provides approximately 550 K hypothesis / premise pairs .;Overall , adding ELMo to the ESIM model improves accuracy by an average of 0.7 % across five random seeds .;Evaluation;8
10590;06150e6e69a379c27e1d0100fcd7660f073cbacf;Method;0;naive brute - force approach;Given the high complexity of each step , a naive brute - force approach for training is infeasible .;Finally computing z = wᵀx over all n training examples and d projections is O ( ndm2 ) .;Speedup :;Title;0
80567;346578304ff943b97b3efb1171ecd902cb4f6081;Method;1;single step gradient method;We use the single step gradient method as in [ reference ] ) , and batch normalization [ reference ] ) was used in each of the generator layers .;We use convolutional transpose layers [ reference ] ) for G and strided convolutions for D except for the input of G and the last layer of D.;The different discriminators were trained with varying dropout rates from [ reference ] . Variations in the discriminators were effected in two ways .;A.8 EXPERIMENTAL SETUP;22
5119;02e85d62fbd8249a046d00ac10e39546511b2a51;Method;1;CNN component;The main contributions of our work are within the CNN component which we describe first in the following .;Our proposed lesion segmentation method consists of two main components , a 3D CNN that produces highly accurate , soft segmentation maps , and a fully connected 3D CRF that imposes regularization constraints on the CNN output and produces the final hard segmentation labels .;;Method;4
65386;2a69ddbafb23c63e5e22401664bea229daaeb7d6;Method;0;handcrafted representations of global contrast;Early approaches utilize handcrafted representations of global contrast or multi - scale region features .;Precisely locating the salient object regions in an image requires an understanding of both large - scale context information for the determination of object saliency , as well as small - scale features to localize object boundaries accurately .;Li et al .;Salient object detection .;7
48166;1cf6bc0866226c1f8e282463adc8b75d92fba9bb;Method;0;deep features;, we see that models based on deep features significantly outperform the Multi - World approach based on hand - crafted features .;[ reference ];Modeling the question only with either the LSTM model or Question BOW model does equally well in comparison , indicating the the question text contains important prior information for predicting the answer .;Results on DAQUAR;11
81138;34cf90fcbf83025666c5c86ec30ac58b632b27b0;Method;1;fusion of full - body and body - part identity discriminative features;In this work , we have studied the problem of person ReID in three levels : 1 ) a multi - scale context - aware network to capture the context knowledge for pedestrian feature learning , 2 ) three novel constraints on STN for effective latent parts localization and body - part feature representation , 3 ) the fusion of full - body and body - part identity discriminative features for powerful pedestrian representation .;;We have validated the effectiveness of the proposed method on current large - scale person ReID datasets .;Conclusion;15
55252;228db5326a10cd67605ce103a7948207a65feeb1;Method;1;question models;[ reference ] to [ reference ] show that the two - layer SAN gives the best results across all data sets and the two kinds of question models in the SAN , LSTM and CNN , give similar performance .;The experimental results in Table .;For example , on DAQUAR - ALL ( Table .;Results and analysis;13
37077;1518039b5001f1836565215eb047526b3ac7f462;Method;0;multilingual algorithms;For multilingual segmentation tasks , multilingual algorithms have been proposed;For speech recognition , phone - level language models have been used [ reference ] . [ reference ] investigate subword language models , and propose to use syllables .;[ reference ] .;Related Work;5
59326;23f5854b38a15c2ae201e751311665f7995b5e10;Method;1;user - level subsampling;We change the ( user , item ) entry subsampling strategy in SGD training in the original paper to the user - level subsampling as we did with Mult - vae pr and Mult - dae .;Collaborative denoising autoencoder ( cdae ) [ reference ] : augments the standard denoising autoencoder by adding a per - user latent factor to the input .;We generally find that this leads to more stable convergence and better performance .;Baselines;17
45595;1abf6491d1b0f6e8af137869a01843931996a562;Method;0;structure training / inference;In our on going work , we are exploring combining our technique with structure training / inference as done in .;Given the simplicity and ease of training , we find these results very encouraging .;;Conclusion;10
95228;3e7f54801c886ea2061650fd24fc481e39be152f;Method;0;pose prediction;To achieve this , we refine our previous pose prediction by learning correction offsets ( i.e. feedback ) denoted by .;Ultimately , our goal is to predict the location of the joint corresponding to each visible human body part .;Furthermore , we only learn correction offsets for joints that are visible .;Multi - Task Loss;5
9502;05357b8c05b5bc020e871fc330a88910c3177e4d;Method;0;spatial regulariser;Notice that not only uses the weighted pooling as we stated in Section [ reference ] , but also combines the objectness measure of EdgeBoxes and the spatial regulariser , which is much complicated than our basic MIDN .;Specially , our methods achieves much better performance than the method by Bilen and Vedaldi using the same CNN model .;We believe that our performance can be improved by choosing better basic detection network , like the complete network in and using the context information .;Comparison with other methods;15
98033;4087ebc37a1650dbb5d8205af0850bee74f3784b;Method;0;cyclical batch size schedules;The primary limitation of our work is that cyclical batch size schedules introduce another hyper - parameter that requires manual tuning .;We performed an extensive variety of experiments on different tasks in order to comprehensively test the algorithm .;We note that this is also true for cyclical learning rate schedules , and hope to address this using second order methods [ reference ] as part of future work .;Limitations;10
26769;0ecd4fdce541317b38124967b5c2a259d8f43c91;Method;0;bandit algorithm;UCT uses a variation of UCB1 , a bandit algorithm , to choose which child node to visit next .;The UCT algorithm , developed by kocsis_06 , deals with the exploration - exploitation dilemma by treating each node of a search tree as a multi - armed bandit problem .;A common practice is to apply a - step random simulation at the end of each leaf node to obtain an estimate from a longer trajectory .;UCT : Upper Confidence Bounds Applied to Trees .;19
47859;1c7e078611c9df412e6eb3a356f31a0da0c1f99c;Method;0;loss functions;Two new loss functions are introduced for rotation estimation , with the ShapeMatch - Loss designed for symmetric objects .;The 3D rotation is predicted by regressing to a quaternion representation .;As a result , PoseCNN is able to handle occlusion and symmetric objects in cluttered scenes .;CONCLUSIONS;19
63214;27c761258329eddb90b64d52679ff190cb4527b5;Method;1;deep residual model;Inspired by the deep residual model [ reference ] , RCNN [ reference ] , and UNet [ reference ] , we propose two models for segmentation tasks which are named RU - Net and R2U - Net .;;These two approaches utilize the strengths of all three recently developed deep learning models .;III . RU - NET AND R2U - NET ARCHITECTURES;4
46772;1bc072002d97808340b312b69427baf2dc9fcb8e;Method;1;regularisation technique;For example , the L2 regularisation for FNN in Figure [ reference ] is On the other hand , dropout is a technique which becomes a popular and effective regularisation technique for deep learning during the recent years .;To prevent overfitting , the widely used L2 regularisation term is added to the loss function .;We also implement this regularisation and compare them in our experiment .;Regularisation;6
52868;20926884a62778a2bf3f9f3c56f30976749ad763;Method;0;bounded linear operator;We assume there exists a bounded linear operator : ΓΦ→HrHx such that = ⟨fYt , ⁢kx ( ⁢Ψ ( r ), ⋅ ) ⟩Hx⟨fYt , ⁢ΓΦkr ( r , ⋅ ) ⟩Hx .;Let : Φ→XY be an invertible representation function , and let Ψ be its inverse .;We further assume that the Hilbert - Schmidt norm ( operator norm ) ∥ΓΦ∥⁢HS of ΓΦ is bounded by KΦ .;.;68
87607;37a18be8c599b781cc28b6a62d8f11e8a6a75169;Method;1;asymmetrically large encoder;We follow the encoder - decoder structure of CNN , with asymmetrically large encoder to extract deep image features , and the decoder part reconstructs dense segmentation masks .;In this work , we describe our semantic segmentation approach for volumetric 3D brain tumor segmentation from multimodal 3D MRIs , which won the BraTS 2018 challenge .;We also add the variational autoencoder ( VAE ) branch to the network to reconstruct the input images jointly with segmentation in order to regularize the shared encoder .;Introduction;1
58871;23dcfda130aada27c158c0b5f394cac489c9c795;Method;0;ResNet50;We do however use a more powerful backbone network in ResNet50 .;We do not use any other supervisory information which might improve the performance of the network such as 2D landmark annotations .;We show performance of the same network on both the AFLW test - set and AFW .;AFLW and AFW Benchmarking;12
90878;3a8d537bcec370d37990d39eab01c729496ad057;Method;0;image classification methods;Early efforts often combined simple image classification methods with hand - crafted shape descriptors , requiring intensive effort on the side of the human data annotator .;3D object recognition is a well - studied problem in the computer vision literature .;However , ever since the ImageNet contest of 2012 , deep convolutional networks ( ConvNets ) have swept the vision industry , becoming nearly ubiquitous in countless applications .;Related Work;2
14683;0891ed6ed64fb461bc03557b28c686f87d880c9a;Method;0;Unsupervised learning;Unsupervised learning from unannotated corpora offers an alternative strategy for obtaining better generalization from small amounts of supervision .;Unfortunately , language - specific resources and features are costly to develop in new languages and new domains , making NER a challenge to adapt .;However , even systems that have relied extensively on unsupervised features have used these to augment , rather than replace , hand - engineered features ( e.g. , knowledge about capitalization patterns and character classes in a particular language ) and specialized knowledge resources ( e.g. , gazetteers ) .;Introduction;1
850;007ab5528b3bd310a80d553cccad4b78dc496b02;Method;1;Bi - Directional Attention Flow;In this paper we introduce the Bi - Directional Attention Flow ( BiDAF ) network , a multi - stage hierarchical process that represents the context at different levels of granularity and uses bi - directional attention flow mechanism to obtain a query - aware context representation without early summarization .;Typically these methods use attention to focus on a small portion of the context and summarize it with a fixed - size vector , couple attentions temporally , and / or often form a uni - directional attention .;Our experimental evaluations show that our model achieves the state - of - the - art results in Stanford Question Answering Dataset ( SQuAD ) and CNN / DailyMail cloze test .;Bi - Directional Attention Flow for Machine Comprehension;0
35577;143a3186c368544ded00a444be33153420baa254;Method;0;policy gradient algorithms;Practical implementations of this method may also use a variety of improvements recently proposed for policy gradient algorithms , including state or action - dependent baselines and trust regions [ reference ] .;This algorithm has the same structure as Algorithm 2 , with the principal difference being that steps 5 and 8 require sampling trajectories from the environment corresponding to task T i .;;Algorithm 3 MAML for Reinforcement Learning;11
39472;15e81c8d1c21f9e928c72721ac46d458f3341454;Method;1;fertility inference function;As described above , we supervise the fertility predictions at train time by using a fixed aligner as a fertility inference function .;;We use the fast_alignhttps: // github.com / clab / fast_align implementation of IBM Model 2 for this purpose , with default parameters dyer2013simple .;Fertility supervision during training;32
50832;1e7678467b1807777dcd9be557b79328ce9419a8;Method;0;AutoAugment data augmentation;Note , the AutoAugment data augmentation does not transfer well to the retrieval tasks .;Overall , the different elements employed in our architecture ( RA and the layers specific to Multigrain ) still give a significant improvement over simply using the activations , and is competitive with the state of the art for the same resolution / complexity .;This can be explained by their specificity to Imagenet classification .;Additional results and ablation study for Multigrain in retrieval;39
44818;1a67622ca58aa851afe36ad6c6e78f9fb9d691d2;Method;0;Hierarchical Softmax;section : Hierarchical Softmax;To speed the training time , Hierarchical Softmax [ reference ][ reference ] can be used to approximate the probability distribution .;Given that u k ∈ V , calculating Pr ( u k | Φ ( vj ) ) in line 3 is not feasible .;Hierarchical Softmax;14
77999;31e5dab321066712cdc8b30943f7950066840ee1;Method;1;Seq;The sequential encoder ( Seq ) is a re - implementation of konstas2017neural .;Table [ reference ] shows the comparison on the development split of the LDC2015E86 dataset between sequential , tree and graph encoders .;We test both approaches of stacking structural and sequential components : structure on top of sequence ( SeqTreeLSTM and SeqGCN ) , and sequence on top of structure ( TreeLSTMSeq and GCNSeq ) .;Experiments;13
24806;0d467adaf936b112f570970c5210bdb3c626a717;Method;0;3D convolutional network;Following FlowNet , several papers have studied optical flow estimation with CNNs : featuring a 3D convolutional network , an unsupervised learning objective , carefully designed rotationally invariant architectures , or a pyramidal approach based on the coarse - to - fine idea of variational methods .;Their model , dubbed FlowNet , takes a pair of images as input and outputs the flow field .;None of these methods significantly outperforms the original FlowNet .;Related Work;2
32422;10f62af29c3fc5e2572baddca559ffbfd6be8787;Method;0;algebraic operation;"Traditional sentence modeling uses the bag - of - words model which often suffers from the curse of dimensionality ; others use composition based methods instead , e.g. , an algebraic operation over semantic word vectors to produce the semantic sentence vector .";As one of the core steps in NLP , sentence modeling aims at representing sentences as meaningful features for tasks such as sentiment classification .;However , such methods may not perform well due to the loss of word order information .;Introduction;1
59759;2451db113552afb6d9ad15ef4009ec4133d28f74;Method;0;Pre - normalization;paragraph : BP of Pre - normalization;According to the chain rules ( omitted hereafter for simplicity ) of matrix backpropagation and after some manipulations , , we can derive The final step of this layer is concerned with the partial derivative with respect to , which is given by;Note that here we need to combine the gradient of the loss function with respect to , backpropagated from the post - compensation layer .;BP of Pre - normalization;11
63262;27c761258329eddb90b64d52679ff190cb4527b5;Method;0;pictorial representation;The pictorial representation of the unfolded RCL layers with respect to time - step is shown in Fig 5 .;Finally , the last architecture is U - Net with recurrent convolution layers with residual connectivity as shown in Fig . 4 ( d ) , which is named R2U - Net .;Here t=2;III . RU - NET AND R2U - NET ARCHITECTURES;4
38747;15e07c1344e97e46ade2ee0a57017371fa05fe12;Method;1;distance function;The best results are achieved by the ratio function , which outperforms the distance function by a small but consistent margin of 0.3 - 0.5 points .;More importantly , our proposed margin - based scoring brings large improvements when using both the distance and the ratio functions , outperforming cosine similarity by more than 10 points in all cases .;Interestingly , the retrieval strategy has a very small effect in both cases , suggesting that the proposed scoring is more robust than cosine , yet both bidirectional variants still give marginally better results than forward and backward .;BUCC mining task;8
86842;3729a9a140aa13b3b26210d333fd19659fc21471;Method;1;gradient clipping method;"We used mini - batch stochastic gradient descent and empirically found it effective to use a gradient clipping method with growing clipping values for the different tasks ; concretely , we employed the simple function : , where is the number of bi - LSTM layers involved in each task , and is the maximum value .";At each training epoch , we trained our model in the order of POS tagging , chunking , dependency parsing , semantic relatedness , and textual entailment .;We applied our successive regularization to our model , along with L2 - norm regularization and dropout dropout2014ver .;Training Details;19
38201;15cc54ed7b1582b2efd71bedf28b23634d82991b;Method;0;spectral normalization method;Moreover , the RBF - B kernel managed to stabilize the MMD - GAN training for various configurations of the spectral normalization method .;In Appendix [ reference ] , we showed the proposed generalized power iteration ( Section [ reference ] ) imposes a stronger Lipschitz constraint than the method in , and benefited MMD - GAN training using the repulsive loss .;In Appendix [ reference ] , we showed the gradient penalty can also be used with the repulsive loss .;Quantitative Analysis;9
20327;0b544dfe355a5070b60986319a3f51fb45d1348e;Method;1;CSLM;The best performance was achieved when we used both CSLM and the phrase scores from the RNN Encoder – Decoder .;As expected , adding features computed by neural networks consistently improves the performance over the baseline performance .;This suggests that the contributions of the CSLM and the RNN Encoder – Decoder are not too correlated and that one can expect better results by improving each method independently .;Quantitative Analysis;13
62616;2788a2461ed0067e2f7aaa63c449a24a237ec341;Method;0;binary belief network;In addition , Adaptive dropout is proposed where the dropout probability for each hidden neuron is estimated through a binary belief network .;Later , Wan et al . propose a generalization of dropout approach , DropConnect , which instead randomly selects weights to zero during training .;Stochastic Pooling randomly selects activation from a multinomial distribution during training , which is parameter free and can be applied with other regularization techniques .;Related Work;2
33949;12db83e66e50152e170d5009c425c925ad2e2c2a;Method;1;general sequence models;The models presented here are general sequence models , requiring no appeal to Natural Language - specific processing beyond tokenization , and are therefore a suitable target for transfer learning through pre - training the recurrent systems on other corpora , and conversely , applying the models trained on this corpus to other entailment tasks .;Extending these models with attention over the premise provides further improvements to the predictive abilities of the system , resulting in a new state - of - the - art accuracy for recognizing entailment on the Stanford Natural Language Inference corpus .;Future work will focus on such transfer learning tasks , as well as scaling the methods presented here to larger units of text ( e.g. paragraphs and entire documents ) using hierarchical attention mechanisms .;CONCLUSION;13
8573;0523e14247d74c4505cd5e32e1f0495f291ec432;Method;0;GMMs;With this in mind , it is easy to generalize GMMs in a multi - layered fashion .;⇡ iN x|bi , AiATi .;Instead of sampling one transformation from a set , we can sample a path of transformations in a network of k layers , see Figure 1 ( c ) .;Title;0
3968;0217fb2a54a4f324ddf82babc6ec6692a3f6194f;Method;0;ladder network;Another intriguing line of work consists of the ladder network [ reference ] , which has achieved spectacular results on a semi - supervised variant of the MNIST dataset .;A lot of promising recent work originates from the Skip - gram model [ reference ] , which inspired the skip - thought vectors [ reference ] and several techniques for unsupervised feature learning of images [ reference ] .;More recently , a model based on the VAE has achieved even better semi - supervised results on MNIST [ reference ] . GANs [ reference ] have been used by Radford et al .;Related Work;3
52808;20926884a62778a2bf3f9f3c56f30976749ad763;Method;1;neighbors;neighbors has very good within - sample results on Jobs , because evaluation is performed over the randomized component , but suffers heavily in generalizing out of sample , as expected .;nearest;;Results;20
885;0095c269e7d0c990249312687fc43521019809c4;Method;0;advanced ( RNN , CNN ) components;These models first encode two sequences with some basic ( Neural Bag - of - words , BOW ) or advanced ( RNN , CNN ) components of neural networks separately , and then compute the matching score based on the distributed vectors of two sentences .;Some early works focus on sentence level interactions , such as ARC - I , CNTN and so on .;In this paradigm , two sentences have no interaction until arriving final phase .;Weak interaction Models;2
61684;26c8d040bef85ad6dde55a8f71af936fb38356ad;Method;0;attention - based recurrent networks;We introduce extensions to attention - based recurrent networks that make them applicable to speech recognition .;This basic idea significantly extends the applicability range of end - to - end training methods , for instance , making it possible to construct networks with external memory .;Learning to recognize speech can be viewed as learning to generate a sequence ( transcription ) given another sequence ( speech ) .;Introduction;1
76532;303fef411f235e6d1125a40af1e93224f498a4d5;Method;0;thresholding operation;As an exception , chung2016hierarchical utilizes Bernoulli latent variables to model the hierarchical structure in language , where the Bernoulli sampling is replaced by a thresholding operation at test time to give perplexity estimation .;Moreover , for discrete data , the variational lower bound is usually too loose to yield a competitive approximation compared to standard auto - regressive models .;;Related work;20
69008;2c761495cf3dd320e229586f80f868be12360d4e;Method;0;FastEval14k;FastEval14k consists of 14000 images with labels from 6000 classes ( subset of 18291 classes from JFT - 300 M ) .;FastEval14k ’ .;Unlike labels in JFT - 300 M , the images in FastEval14k are densely annotated and there are around 37 labels per image on average .;Monitoring Training Progress;6
51822;1f76b7b071f3e65c97d09720f88d6b0ad9f07e8f;Method;1;1001 - layer ResNet;This effect is particularly obvious when training the 1001 - layer ResNet .;Ease of optimization .;Fig .;Analysis;9
90011;3a61d5fbc8d99310965fd91b12527d1cd69d7116;Method;0;single - stage detectors;Keypoint - based CornerNet achieves state of the art accuracy among single - stage detectors .;Keypoint - based methods are a relatively new paradigm in object detection , eliminating the need for anchor boxes and offering a simplified detection framework .;However , this accuracy comes at high processing cost .;CornerNet - Lite : Efficient Keypoint Based Object Detection;0
32126;10a36dea0167511b66deca65fdca978aa9afdb11;Method;0;VQA models;Visual features from CNN already have implicit attention and selectivity over the image region , thus the resulting class activation maps are similar to the maps generated by the attention mechanisms of the VQA models in .;The example in lower part of Figure [ reference ] shows the heatmaps generated by two different questions and answers .;;Understanding the Visual QA model;6
101500;424aef7340ee618132cc3314669400e23ad910ba;Method;1;baseline network;We observe direct consequences of this mechanism qualitatively in the Penn Treebank in different ways : First , we notice that the probability of generating the < unk > token with our proposed network ( VD - LSTM + REAL ) is significantly lower compared to the baseline network ( VD - LSTM ) across many words .;One important feature of our framework that leads to better word predictions is the explicit mechanism to assign probabilities to words not merely according to the observed output statistics , but also considering the metric similarity between words .;This could be explained by noting the fact that the < unk > token is an aggregated token rather than a specific word , and it is often not expected to be close to specific words in the word embedding space .;QUALITATIVE RESULTS;10
20653;0b5519f76fc8e31ecf9931f00184aee86694e3a4;Method;1;offline optimization;First , our CNN model learns a direct inverse mapping from blurry patch to its clear counterpart based on the learned image distribution , whereas only estimates the blur kernel for the patch and uses an offline optimization for non - blind deblurring , resulting in some artifacts such as ringing .;We attribute our better performance to two reasons .;Second , our CNN architecture is higher fidelity than the one used in , as ours outputs full - resolution result and learns internally to minimize artifacts , e.g. , aliasing and ringing effect .;Non - Uniform Motion Blur Removal;11
14445;0875fc92cce33df5cf7df169590dbf0ca00d2652;Method;0;inference and generative capabilities;The challenging nature of this task has motivated recent approaches to exploit the inference and generative capabilities of deep neural networks .;Statistical natural image modelling remains a fundamental problem in computer vision and image understanding .;Previously studied deep generative models of images often defined distributions that were restricted to being either unconditioned or conditioned on classification labels .;Introduction;1
21726;0be9ca65ad318ee3729928882ef2c403d4b6d24e;Method;0;dropout techniques;Apart from dropout techniques , DBLP : journals / corr / InanKS16 and press - wolf:2017:EACLshort proposed the word tying method ( WT ) , which unifies word embeddings ( in Equation [ reference ] ) with the weight matrix to compute probability distributions ( in Equation [ reference ] ) .;journals / corr / MelisDB17 used black - box optimization to find appropriate hyperparameters for RNN language models and demonstrated that the standard LSTM with proper regularizations can outperform other architectures .;In addition to quantitative evaluation , DBLP : journals / corr / InanKS16 provided a theoretical justification for WT and proposed the augmented loss technique ( AL ) , which computes an objective probability based on word embeddings .;Related Work;17
105838;4543052aeaf52fdb01fced9b3ccf97827582cef5;Method;0;network quantization approaches;Recently , network quantization approaches offer an efficient solution to reduce the size of network through cutting down high precision operations and operands .;Training deep neural networks usually consumes a large amount of computational resources , which makes it hard to deploy on mobile devices .;In the recent binarized convolutional landmark localizer ( BCLL ) architecture , XNOR - Net was utilized for network binarization .;Related Work;2
8300;052282998bc24db695891f755a00e3cebd3fd796;Method;1;Agent architecture;subsection : Agent architecture;We used simple gradient magnitude estimates as priorities , corresponding to a mean absolute TD error along a sequence for Retrace , as defined in ( [ reference ] ) for the classical RL case , and total variation in the distributional Retrace case .;In order to improve CPU utilization we decoupled acting from learning .;Agent architecture;16
63953;28703eef8fe505e8bd592ced3ce52a597097b031;Method;0;Seq2seq;Seq2seq builds on deep neural language modeling and inherits its remarkable accuracy in estimating local , next - word distributions .;Sequence - to - Sequence ( seq2seq ) modeling has rapidly become an important general - purpose NLP tool that has proven effective for many text - generation and sequence - labeling tasks .;In this work , we introduce a model and beam - search training scheme , based on the work of daume05learning , that extends seq2seq to learn global sequence scores .;Sequence - to - Sequence Learning as Beam - Search Optimization;0
19629;0abb49fe138e8fb7332c26b148a48d0db39724fc;Method;1;probabilistic form of averaging;Instead , we use a probabilistic form of averaging .;’s predictions which we found to degrade performance ( see Section [ reference ] ) .;In this , the activations in each region are weighted by the probability ( see Eqn .;Probabilistic Weighting at Test Time;4
882;0095c269e7d0c990249312687fc43521019809c4;Method;0;ARC - I;Some early works focus on sentence level interactions , such as ARC - I , CNTN and so on .;;These models first encode two sequences with some basic ( Neural Bag - of - words , BOW ) or advanced ( RNN , CNN ) components of neural networks separately , and then compute the matching score based on the distributed vectors of two sentences .;Weak interaction Models;2
13770;07cca2bdd0dc2fee02889e17789748eba9d06ffa;Method;0;Bayesian Networks;"Examples of these studies use Decision Trees ( Manzoni , Maniloff , Kloeckl , & Ratti , 2011 ; Reddy et al . , 2010 ; Zheng , Chen , Li , Xie , & Ma , 2010 ) , Bayesian Networks ( Stenneth , Wolfson , Yu , & Xu , 2011 ) , Fuzzy Logic ( Schüssler & Axhausen , 2009 ) , Hierarchical Conditional Random Fields ( Liao et al . , 2007 ) , and Support Vector Machines ( SVMs ) ( Zheng , Liu , Wang , & Xie , 2008 ) .";On the other hand , ML approaches attempt to do the inference based on learning from existing data , possibly combined with similar logical assumptions .;These ML approaches could be broken down into three aspects ( or phases ) : data collection ( sample size selected , duration of study and granularity of data ) , selection of variables for inference ( or combination of variables ) , and method of inference ( the details of the learning algorithm used ) .;Title;0
24055;0d0101e65e52ae0cec38bcd13c6a9d631979c577;Method;0;ResNet variants;ResNet variants;Residual networks he2015deep , or ResNets , lead a recent and dramatic increase in both depth and accuracy of convolutional neural networks , facilitated by constraining the network to learn residuals .;he2015deep , he2016identity , huang2016stochasticdepth and related architectures;Introduction;1
1650;01125e3c68edb420b8d884ff53fb38d9fbe4f2b8;Method;0;discretization;Given that the error of state - of - the - art methods is of the order of a few mms , we conclude that discretization by produces negligible error .;[ reference ] .;Given our volumetric facial representation , the problem of regressing the 3D coordinates of all vertices of a facial scan is reduced to one of 3D binary volume segmentation .;Proposed volumetric representation;6
25879;0dcde9f2c5149f0e4c806db7b4cc4915bed077da;Method;1;classifier stage;"— we set each column to have a in a single row , corresponding to the label class for each training vector , and all other entries to be zero ; , of size be the real - valued input weights matrix for the classifier stage ; , of size be the real - valued output weights matrix for the classifier stage ; the function be the activation function of each hidden - unit ; for example , may be the logistic sigmoid , , or a squarer , ; , of size , contain the hidden - unit activations that occur due to each feature vector ; is applied termwise to each element in the matrix .";"Let : , of size , contain each length feature vector ; be an indicator matrix of size , which numerically represents the labels of each training vector , where there are classes";;Stage 2 Architecture : Classifier;4
78433;325093f2c5b33d7507c10aa422e96aa5b10a33f1;Method;1;atrous ( dilated ) convolutions;DeepLabV3 is exploiting atrous ( dilated ) convolutions in a cascaded way for capturing contextual information , together with crop - level features encoding global context ( close in spirit to PSPNet ’s global feature ) .;We chose to adopt the recently introduced DeepLabV3 segmentation approach as head , and evaluate its performance with body networks from § [ reference ] .;We follow the parameter choices suggested in , assembling the head as 4 parallel Conv blocks with output channels each and dilation rates ( with x8 downsampled crop sizes from the body ) and kernel sizes , respectively .;Semantic Segmentation;10
48566;1d5d0a41b720bc51fd568cf78f8aa4ec5af4f802;Method;0;global architecture;The dense architecture has a clear advantage over the global architecture as shown in Table [ reference ] :;Global vs. dense;dense and dense - no - i m outperforms global and global - no - i m , respectively , by large margins .;Overview;22
13985;07cca2bdd0dc2fee02889e17789748eba9d06ffa;Method;0;kernel functions;Another advantage of SVMs and kernel functions is that the selected kernel could be applied directly to the data without the need for a feature extraction process .;A k - fold cross validation on the training data of value 3 is performed to assess the quality of the model ( the accuracy rate for classification ) .;This is particularly important in problems where a lot of structure of the data is lost by the feature extraction process ( e.g. the sequence of a GPS trajectory ’s movements : such as the way a car can move fast , stop for traffic and then move again ) .;Title;0
4636;02a5b7a41ffa8518eb3b7cae9914a2bd2bbc886b;Method;1;object trackers;Like conventional object trackers , it relies on a simple bounding box initialisation ( blue ) and operates online .;Our proposed method aims at distilling the best from the two tasks of object tracking and video object segmentation .;Differently from state - of - the - art trackers such as ECO [ reference ] ( red ) , SiamMask ( green ) is able to produce binary segmentation masks , which can more accurately describe the target object .;Init;3
39345;15e81c8d1c21f9e928c72721ac46d458f3341454;Method;0;Encoder Stack;subsection : Encoder Stack;[ reference ] , the model is composed of the following four modules : an encoder stack , a decoder stack , a newly added fertility predictor ( details in [ reference ] ) , and a translation predictor for token decoding .;Similar to the autoregressive Transformer , both the encoder and decoder stacks are composed entirely of feed - forward networks ( MLPs ) and multi - head attention modules .;Encoder Stack;11
88417;3861ae2a6bdd2a759c2d901a6583e63a216bc2fc;Method;1;learning rate warm - up strategy;We also use the learning rate warm - up strategy for Adam wherein the learning rate takes on the form : for the all parameters except and for .;As in vaswani2017attention , we used the Adam optimizer kingma2014adam with and .;This corresponds to the warm - up strategy used for the original Transformer network except that we use a larger peak learning rate for to compensate for their bounds .;Training Details;6
66260;2ad7cef781f98fd66101fa4a78e012369d064830;Method;0;fixed - size feature representation;We argue that it is more desirable to come with a compact , fixed - size feature representation at the video level , irrespective of the varied length of the videos .;Besides , such a set - based representation would incur O ( n ) space complexity per video face example , which demands a lot of memory storage and confronts efficient indexing .;Such a representation would allow direct , constant - time computation of the similarity or distance without the need for frame - to - frame matching .;Introduction;2
92699;3c78c6df5eb1695b6a399e346dde880af27d1016;Method;1;ReLU activations;As before , we pass the concatenated output through a linear layer with ReLU activations .;In this case we do not use query - to - context attention and we set if .;This layer is applied residually , so this output is additionally summed with the input .;Model;5
86355;36b1ba4287c4884df27dd684c4c7f66f32e943db;Method;0;relation - agnostic sparse 4 moded tensor;Further , rather than duplicating entries of within , we can generalize to a relation - agnostic sparse 4 moded tensor by replacing entries with - dimensional strands of .;Linearity allows the product to be considered separately as generating a matrix for each relation .;Thus , the HypER model can be described explicitly as tensor multiplication of and with a core tensor , where is heavily constrained in terms of its number of free variables .;Understanding HypER as Tensor Factorization;6
61042;25c108a56e4cb757b62911639a40e9caf07f1b4f;Method;0;YOLO;RPN and YOLO have fixed size of the input scale , and proposals for all scales are generated in the final layer by using multiple classifiers .;A multi - scale detector takes one shot for the image and generates detection results across all scales .;However , it is not easy to detect objects in various scales based on the final feature map .;Related work;2
44833;1a67622ca58aa851afe36ad6c6e78f9fb9d691d2;Method;1;ASGD );This allows us to use asynchronous version of stochastic gradient descent ( ASGD ) , in the multi - worker case .;This results in a long tail of infrequent vertices , therefore , the updates that affect Φ will be sparse in nature .;Given that our updates are sparse and we do not acquire a lock to access the model shared parameters , ASGD will achieve an optimal rate of convergence;Parallelizability;16
11105;061c05faf3d68a7bdade9d4debeab369e2f9746c;Method;0;Batch - normalization layers;Batch - normalization layers were added in both networks between convolutional layers , and fully connected layers were removed from these networks .;Compared with the conventional CNNs , maxpooling layers were replaced with strided convolutions in both networks , and fractionally - strided convolutions were used in the generator network to upsample feature maps across layers to finer resolutions .;However , unlike the DCGAN , the LS - GAN model ( unconditional version in Section [ reference ] ) did not use a sigmoid layer as the output for the loss function network .;Architectures;34
59346;23f5854b38a15c2ae201e751311665f7995b5e10;Method;1;hybrid NeuCF model;In particular , we compare with the hybrid NeuCF model which gives the best performance in He et al .;The authors kindly provided the two datasets ( ML - 1 M and Pinterest ) used in the original paper , as well as the training / test split , therefore we separately compare with ncf on these two relatively smaller datasets in the empirical study .;[ reference ] , both with and without pre - training .;Baselines;17
55283;228db5326a10cd67605ce103a7948207a65feeb1;Method;1;LSTM Q + I;[ reference ] , Q + I vs Question , and LSTM Q + I vs LSTM Q.;No , as shown in Table .;Our results demonstrate clearly the positive impact of using multiple attention layers .;Results and analysis;13
6827;04640006606ddfb9d6aa4ce8f55855b1f23ec7ed;Method;0;wide - dropout;[ wide - dropout ];basic_c.pdf;[ scale=0.4 ] images /;Introduction;1
9402;05357b8c05b5bc020e871fc330a88910c3177e4d;Method;1;forward process;In each training iteration , after the forward process of SGD , we can get a set of proposal scores .;Suppose the label vector for proposal is .;Then we can obtain the supervision of refinement time according to .;Online instance classifier refinement;5
51488;1ea6b2f67a3a7f044209aae0d0fd1cb14a1e9e06;Method;0;biasing process;"Then , in the biasing process , for each layer in the conditional PixelRNN , one simply maps the conditioning map into a map that is added to the input - to - state map of the corresponding layer ; this is performed using a unmasked convolution .";In the upsampling process , one uses a convolutional network with deconvolutional layers to construct an enlarged feature map of size , where is the number of features in the output map of the upsampling network .;The larger image is then generated as usual .;Multi - Scale PixelRNN;11
1077;0095c269e7d0c990249312687fc43521019809c4;Method;0;MQA;The reason may be that MQA is a relative simple task , which requires less reasoning abilities , compared with RTE task .;Additionally , LC - LSTMs is superior to TC - LSTMs .;Moreover , the parameters of LC - LSTMs are less than TC - LSTMs , which ensures the former can avoid suffering from overfitting on a relatively smaller corpus .;Results;33
68072;2bb9f0768fac9622a0be446df69daf75a954d5ac;Method;1;JAMR parser;To have a better understanding of our aligner , we conduct ablation test by removing the semantic matching and oracle parser tuning respectively and retrain the JAMR parser on the newswire proportion .;;The results are shown in Table [ reference ] .;Ablation;21
37408;15212fa4d30863ea1f9bd9591eee03848278242d;Method;0;pivot - based feature - transfer domain adaptation method;Distributional Correspondence Indexing ( DCI ) is a pivot - based feature - transfer domain adaptation method for cross - domain and cross - lingual text classification .;;"DCI was first described in , and later improved and extended in ; it was formerly implemented in Java as part of the JaTeCS";Introduction;1
9807;05ee231749c9ce97f036c71c1d2d599d660a8c81;Method;1;Ext - GV - S (- gG;To disambiguate various network configurations , we name the networks as Ext - GV - S (- gG ) , where Ext is the feature extractor network ( Res for ResNet - 50 or SE for SENet - 50 );= 256 or D F = 128 .;, S is the size of image sets used during training , and G is the number of ghost clusters ( if zero , the suffix is dropped ) .;Networks , deployment and baselines;11
71754;2dad7e558a1e2982d0d42042021f4cde4af04abf;Method;0;WaveNet abandoned RNN structures;For efficient sequential training , WaveNet abandoned RNN structures , proposing instead the dilated causal convolutional neural network ( CNN ) architecture , which provides significant advantages in working directly with raw audio waveforms .;"The problem of vanishing and exploding gradients is mitigated by LSTM and GRU memory gates ; other partial solutions include gradient clipping , orthogonal and unitary weight optimization , and skip connections across multiple timestamps .";However , the length of dependencies captured by a dilated CNN is limited by its kernel size , whereas an RNN ’s autoregressive modeling can , in theory , capture potentially infinitely long dependencies with a small number of parameters .;Introduction;1
25368;0d5fa5be4bfe085de8f88dbee1c3b2a6e5ab9ee2;Method;1;GPU cards;GPU cards under CUDA 7.5 and CUDNN V5 .;TitanX;Our testing uses only one card .;Implementation Details;15
71935;2dad7e558a1e2982d0d42042021f4cde4af04abf;Method;1;stack Vanilla;The common baselines include single - layer RNNs ( denoted as Vanilla RNN , LSTM , and GRU ) , multi - layer RNNs ( denoted as stack Vanilla , stack LSTM , and stack GRU ) , and Vanilla RNN with regular skip connections ( denoted as Skip Vanilla ) .;Three RNN cells , Vanilla , LSTM and GRU cells , are combined with the DilatedRNN , which we refer to as dilated Vanilla , dilated LSTM and dilated GRU , respectively .;Additional baselines will be specified in the corresponding subsections .;Experiments;14
23574;0cb8f50580cc69191144bd503e268451ce966fa6;Method;1;GG - NN model;We began our exploration of MPNNs around the GG - NN model which we believe to be a strong baseline .;;We focused on trying different message functions , output functions , finding the appropriate input representation , and properly tuning hyperparameters .;MPNN Variants;6
8291;052282998bc24db695891f755a00e3cebd3fd796;Method;1;density evaluation;It is based on AVL trees velskii1976avl and can execute sampling , insertion , deletion and density evaluation in time .;We implemented an algorithm satisfying the above constraints and called it Contextual Priority Tree ( CPT ) .;We describe CPT in detail in the Appendix in Section [ reference ] .;.;15
62637;2788a2461ed0067e2f7aaa63c449a24a237ec341;Method;0;Fast - RCNN detection;Recently , Wang et al . learn an adversary with Fast - RCNN detection to create hard examples on the fly by blocking some feature maps spatially .;The combination of random cropping and Random Erasing can produce more various training data .;Instead of generating occlusion examples in feature space , Random Erasing generates images from the original images with very little computation which is in effect , computationally free and does not require any extra parameters learning .;Related Work;2
52155;2019ede61cc0be14859908312e18458a7c79908f;Method;1;Beam search;"Beam search generates many ngram lookups for Kneser - Ney which requires many random memory accesses ; while neural models perform scoring through matrix - matrix products , an operation which is more local and can be performed in a block parallel manner where modern graphic processors shine [ reference ] . Table 4 shows generations for different variants of our model based on the Wikipedia table in Figure 1 .";Our model is also several times faster than the baseline , requiring only about 200 ms per sentence with K = 5 .;First of all , comparing the reference to the fact table reveals that our training data is not perfect .;Sentence decoding;24
69896;2d294bde112b892068636f3a48300b3c033d98da;Method;0;Template fitting methods;Template fitting methods match faces by constructing shape templates .;Conventional face alignment methods can be classified as two categories : template fitting and regression - based .;Cootes et al . proposed a typical template fitting method named Active Appearance Model ( AAM ) , which minimizes the texture residual to estimate the shape .;Conventional Face Alignment;3
91691;3b1b94441010615195a5c404409ce2416860508c;Method;1;A + C + S - K - LSTM model;We have also tested on the VQA test - dev and test - standard consisting of 60 , 864 and 244 , 302 questions ( for which ground truth answers are not published ) using our final A + C + S - K - LSTM model , and evaluated them on the VQA evaluation server .;More questions that require common - sense knowledge to answer can be found in the supplementary materials .;Table [ reference ] shows the server reported results .;Results on VQA;20
59996;2451db113552afb6d9ad15ef4009ec4133d28f74;Method;1;shrinkage principle of robust covariance estimation;Recall that , for covariance pooling ConvNets , we face the problem of small sample of large dimensionality , and matrix square root is consistent with general shrinkage principle of robust covariance estimation .;, we observe that iSQRT - COV with ResNet computing approximate square root performs better than MPN - COV which can obtain exact square root by EIG .;Hence , we conjecture that approximate matrix square root may be a better robust covariance estimator than the exact square root .;Fast Convergence of iSQRT - COV Network;16
54860;223319a93dcf3912bbc1e5f949e5ab4d53906e62;Method;0;domain discriminator;For example , we can set the architecture of the domain discriminator to be the layer - by - layer concatenation of two replicas of the label predictor followed by a two layer non - linear perceptron aimed to learn the XOR - function .;We assume that the family of domain classifiers is rich enough to contain the symmetric difference hypothesis set of : It is not an unrealistic assumption as we have a freedom to pick whichever we want .;Given the assumption holds , one can easily show that training the is closely related to the estimation of .;Relation to - distance;6
60193;258ec208f9c55371a67ebac68aa51bd7f7800a7b;Method;1;hidden layers of convolutional filters;Their network accepts an image as the input and produces an entire image as the output through four hidden layers of convolutional filters .;Their framework is the same as the recent Fully Convolutional Neural Networks ( FCN ) for semantic image segmentation and other tasks such as super - resolution , although their network is not as deep as today ’s models .;The weights are learned by minimizing the difference between the output and the clean image .;Introduction;1
24188;0d0101e65e52ae0cec38bcd13c6a9d631979c577;Method;1;element - wise means;"With element - wise means , this is trivial ; each computes the mean of only its active inputs .";As with dropout , signals may need appropriate rescaling .;In experiments , we train with dropout and a mixture model of local and global sampling for drop - path .;Regularization via Drop - path;4
46441;1bb5520bbc168e54c553758a76c6d953933bd8eb;Method;0;πθi;Then each of the p samples drawn from the SNES search distribution ( with mean µ and covariance Σ ) representing the parameters , θi , of a candidate policy , πθi , undergoes n trials , one for each image in the batch .;Each generation starts by selecting a subset of n images from X at random .;During a trial , the image is presented to the Maxout net T times .;Title;0
54055;218b80da3eb15ae35267d280dcc4a806d515334a;Method;0;single - round inference;As we discuss in Section [ reference ] , some sentences with multiple grammatical errors usually can not be perfectly corrected through normal seq2seq inference which makes only single - round inference .;;Fortunately , neural GEC is different from NMT : its source and target language are the same .;Multi - round error correction;9
98573;40b4596a0ae4f4ff065f3f13f36db39543e50068;Method;1;divergence based loss;In this work , we deploy an - divergence based loss used in the DANN model .;Domain Adversarial Training : Aligning two distributions has been widely studied in the literature .;Specifically , let us denote as a domain classifier , which is used to predict which domain an input pixel - level feature comes from , where denotes the source domain , and denotes the target domain .;Spatial - Aware Adaptation;5
36015;14908a18ff831005b6b4fc953ce61e1b4e7b54ee;Method;1;weighted uniform random sampler;The scores are used as weights in a weighted uniform random sampler , and from this , we sampled 5 , 000 tweets to be labeled .;This way , positive predictions for sentiment categories are weighted by how much they would contribute towards balancing all of the class distributions .;We found that overall , the method produced tweets with more emotion .;Active Learning;14
4035;0217fb2a54a4f324ddf82babc6ec6692a3f6194f;Method;1;auxiliary distribution Q ( c|x;Fortunately we can obtain a lower bound of it by defining an auxiliary distribution Q ( c|x ) to approximate P ( c|x ) :;"In practice , the mutual information term I ( c ; G ( z , c ) ) is hard to maximize directly as it requires access to the posterior P ( c|x ) .";This technique of lower bounding mutual information is known as Variational Information Maximization;Variational Mutual Information Maximization;6
80169;34273979fd2a62fd7b49ee6d14a925864ff94e74;Method;0;recurrent node update;introduces a recurrent node update for the same domain .;wei2016convolutional applies this idea to pose estimation using a series of convolutional layers and deng2016structure;There is rich literature on combining symbolic reasoning and logic with sub - symbolic distributed representations which goes all the way back to the birth of the idea of parallel distributed processing mcculloch1943logical .;Related work;14
99282;41232a69c0f8d4b993e6c6e00b16c223442c962f;Method;1;dependency parse technologies;To avoid generating fake facts in a summary , we leverage open information extraction and dependency parse technologies to extract actual fact descriptions from the source text .;While previous abstractive summarization approaches usually focus on the improvement of informativeness , we argue that faithfulness is also a vital prerequisite for a practical abstractive summarization system .;The dual - attention sequence - to - sequence framework is then proposed to force the generation conditioned on both the source text and the extracted fact descriptions .;Faithful to the Original : Fact Aware Neural Abstractive Summarization;0
95622;3e95925d2bca43223453010ff8516a492287ce19;Method;0;parameterless dot product attention;luong2015effective analyze various attention techniques and highlight the effectiveness of the simple , parameterless dot product attention .;Bahdanau2014NeuralMT propose attentional sequence to sequence models for neural machine translation .;Similar models have also proven successful in tasks such as summarization .;Related Work;13
98841;40eb1e54cb5382dfd3b7efd16dc7df826262ea52;Method;0;CNN based methods;However , these CNN based methods still require quantitization of point clouds with certain voxel resolution .;design more efficient 3D CNN or neural network architectures that exploit sparsity in point cloud .;Recently , a few works propose a novel type of network architectures ( PointNets ) that directly consumes raw point clouds without converting them to other formats .;Deep Learning on Point Clouds;4
42527;1822ca8db58b0382b0c64f310840f0f875ea02c0;Method;1;visualization;Barnes - Hut t - SNE [ reference ] visualization on Market - 1501 .;We use the same architecture Figure 4 .;We randomly select real training images of 700 identities to train the re - ID model and visualize the real samples ( R , dots ) and their fake ( style - transferred ) samples ( F , triangles ) of a rest 20 identities .;Camera - aware Image - Image Translation;6
69654;2cf6a8389135f682b0cb727a07f4e77c097d5434;Method;0;RRQR;Specifically , the Rank Revealing QR ( RRQR ) selects the subsets that give the best conditional sub - matrix .;One category relies on the assumption that the data points lie in one or multiple low - dimensional subspaces .;Unfortunately , this method has suboptimal properties , as it is not assured to find the globally optimum in polynomial time .;Introduction;1
61461;269c7aeca29dae51dca8208815f1c4c81bd471c2;Method;1;HFA based methods;The major advancements of the proposed approach over the HFA based methods are described in the following aspects : Firstly , the proposed approach revises the decomposition of in the HFA based methods to the multiplication of hidden components and , which is more intuitive and concise to model the unrelated components with less extra hyper - parameters .;Specifically , given a feature , the HFA based methods decompose the as , where is the mean feature regarding identity - related component , is the additional noise and are the transformation matrices for identity - related component and age - related component respectively .;Secondly , we explicitly project the identity features on a hypersphere to match the cosine similarity measurement for effectively combining the improvement strategies based on the Softmax loss and the margin of decision boundaries .;Discussion;5
20540;0b5519f76fc8e31ecf9931f00184aee86694e3a4;Method;0;iterative solver;"These methods are computationally expensive as an iterative solver is required for deconvolution after estimating the blur kernel ; and the deep learning approach can not generalize well to novel motion kernels .";leverage prior information about smooth motion by selecting from a predefine discretized set of linear blur kernels .;Compression Artifact Reduction is of significance as lossy image compression is ubiquitous for reducing the size of images transmitted over the web and recorded on data storage media .;Related Work;2
29548;0fbd17a4f791e04bbf8f240f7c48c178900e30a6;Method;0;Top - down methods;Top - down methods first detect people ( typically using a top performing , off - the - shelf object detector ) and then run a single person pose estimation;;( SPPN ) method per person to get the final pose predictions .;Top - down;6
74273;2f0c30d6970da9ee9cf957350d9fa1025a1becb4;Method;0;primitive bounding box based feature extraction;For another example , while object detection has seen significant and rapid progress recently , all approaches still rely on the primitive bounding box based feature extraction .;Because different locations may correspond to objects with different scales or deformation , adaptive determination of scales or receptive field sizes is desirable for visual recognition with fine localization , , semantic segmentation using fully convolutional networks .;This is clearly sub - optimal , especially for non - rigid objects .;Introduction;1
4730;02a5b7a41ffa8518eb3b7cae9914a2bd2bbc886b;Method;0;tracking system;[ reference ] propose to use , as a fundamental building block of a tracking system , an offlinetrained fully - convolutional Siamese network that compares an exemplar image z against a ( larger ) search image x to obtain a dense response map .;Bertinetto et al .;z and x are , respectively , a w×h crop centered on the target object and a larger crop centered on the last estimated position of the target .;Fully - convolutional Siamese networks;6
49066;1d8653d9fca853a8e3727fa7d8f5ec0631cad08f;Method;1;end gradient based optimization;By thresholding memory modifications to a sparse subset , and using efficient data structures for content - based read operations , our model is optimal in space and time with respect to memory size , while retaining end - to - end gradient based optimization .;In this paper , we present a MANN named SAM ( sparse access memory ) .;To test whether the model is able to learn with this sparse approximation , we examined its performance on a selection of synthetic and natural tasks : algorithmic tasks from the NTM work , Babi reasoning tasks used with Memory Networks and Omniglot one - shot classification .;Introduction;1
65424;2a69ddbafb23c63e5e22401664bea229daaeb7d6;Method;1;CNN model;This dimension changes filters from the single - branch to multi - branch and improves the representation ability of a CNN model .;The dimension cardinality indicates the number of groups within a filter .;In our design , we can replace the 3 3 convolution with the 3 3 group convolution , where indicates the number of groups .;Dimension cardinality .;11
71628;2d876ed1dd2c58058d7197b734a8e4d349b8f231;Method;0;strongly - typed recurrent neural networks;Quasi - recurrent neural networks are related to several such recently described models , especially the strongly - typed recurrent neural networks ( T - RNN ) introduced by .;Exploring alternatives to traditional RNNs for sequence tasks is a major area of current research .;While the motivation and constraints described in that work are different , ’s concepts of “ learnware ” and “ firmware ” parallel our discussion of convolution - like and pooling - like subcomponents .;Related Work;8
102883;432d8cba544bf7b09b0455561fea098177a85db1;Method;1;Exponential , Gaussian , Uniform or Laplacian distribution;Datasets consist of samples from either an Exponential , Gaussian , Uniform or Laplacian distribution with equal probability .;We generated a collection of synthetic - D datasets each containing samples .;Means and variances are sampled from and respectively .;Simple 1 - D Distributions;16
8413;052282998bc24db695891f755a00e3cebd3fd796;Method;1;Insertion;Insertion , deletion , sampling and probability query operations can be done in O ( ln ( n ) ) time .;Similarly , probabilities of arbitrary keys can be queried by traversing the tree from the root node towards the child node of an interest while maintaining a product of probabilities at each branching point .;The suggested algorithm has the desired property that it becomes a simple proportional sampling algorithm once all the priorities are known .;Contextual priority tree;26
50204;1e5b9e512c01e244287fe7afb05e03c96d5c1cd0;Method;1;IMS;For morphology , the top system for most languages ( IMS ) used its own segmentation [ reference ] . For the evaluation , we used the official evaluation script [ reference ] .;Most of the top performing systems for part - of - speech tagging used as input UDPipe to obtain the segmentation for the input data .;;Data Sets;12
25740;0dab72129b4458d9e3dbf1f109848c2d6d7af8a8;Method;0;domain - general semantic representations;This suggests that NLI is an ideal testing ground for theories of semantic representation , and that training for NLI tasks can provide rich domain - general semantic representations .;Natural languages are powerful vehicles for reasoning , and nearly all questions about meaningfulness in language can be reduced to questions of entailment and contradiction in context .;To date , however , it has not been possible to fully realize this potential due to the limited nature of existing NLI resources .;Conclusion;14
96584;3f45d73a7b8d10a59a68688c11950e003f4852fc;Method;0;general model;Note that the above methods all trained a general model independent of camera views .;This indicates that the new feature helps to reduce intra - class variations , so that the same person can be recognized at a higher rank .;A research in show that the performance can be improved by utilizing the camera network information .;Experiments on QMUL GRID;15
5271;02e85d62fbd8249a046d00ac10e39546511b2a51;Method;1;DeepMedic ”;[ b ] 0.5 The final version of the proposed network architecture , referred to as “ DeepMedic ” , is built by extending the Deep + model with a second convolutional pathway that is identical to the first one .;;Two hidden layers are added for combining the multi - scale features before the classification layer , resulting in a deep network of 11 - layers ( cf .;Effect of the Multi - Scale Dual Pathway;14
56424;231af7dc01a166cac3b5b01ca05778238f796e41;Method;1;cumulants;These equalities of expectations are used to describe distributions by moments or cumulants , where are polynomials of the data .;The equality holds except for a non - measurable set if and only if for a basis spanning the function space in which and live .;We generalize these polynomials by replacing by the coding layer of an inception model in order to obtain vision - relevant features .;Performance Measure .;8
85936;36a03f648b40d209ce361550dbe1c823ddb715b5;Method;0;region ensemble network;[ reference ][ reference ] proposed a region ensemble network to accurately estimate the 3D coordinates of hand keypoints and Chen et al .;Guo et al .;[ reference ] improved this network by iteratively refining the estimated pose .;Related works;4
4027;0217fb2a54a4f324ddf82babc6ec6692a3f6194f;Method;0;generator distribution G ( z;To cope with the problem of trivial codes , we propose an information - theoretic regularization : there should be high mutual information between latent codes c and generator distribution G ( z , c ) .;P G ( x ) .;"Thus I ( c ; G ( z , c ) ) should be high .";Mutual Information for Inducing Latent Codes;5
87805;37b685caf39b38b07af60eacf1a7d7ada2122372;Method;1;graph laplacians;Since different shapes give rise to different nearest neighbor graphs on their point clouds , the eigenbases we get for the graph laplacians are not directly comparable .;The issue of information sharing across shapes is more challenging .;We synchronize all these laplacians by applying a functional map in the spectral domain to align them to a common canonical space .;Introduction;1
24982;0d467adaf936b112f570970c5210bdb3c626a717;Method;1;EPE;Among non - stereo methods we obtain the best EPE on KITTI2012 and the first rank on the KITTI2015 benchmark .;Fine - tuning on a combination of the KITTI2012 and KITTI2015 training sets reduces the error roughly by a factor of ( FlowNet2 - ft - kitti ) .;This shows how well and elegantly the learning approach can integrate the prior of the driving scenario .;Speed and Performance on Public Benchmarks;11
5864;03184ac97ebf0724c45a29ab49f2a8ce59ac2de3;Method;1;second layer weights;Here , we pre - train the first layer weights using standard Word2Vec on Wikipedia , and fine - tune the second layer weights using a negative - sampling objective only on the fine - grained text corpus .;We achieve the best results in our experiments using our novel variant of the CBOW formulation .;These weights correspond to the final output embedding .;Learning Label Embeddings from Text;9
48158;1cf6bc0866226c1f8e282463adc8b75d92fba9bb;Method;1;encoding phase;The image CNN feature vector is shown at each time step of the encoding phase .;Neural - Image - QA : uses an LSTM to encode the question and then decode the hidden information into the answer .;Question LSTM : only shows the question to the LSTM to predict the answer without any image information .;Results on DAQUAR;11
48429;1d5d0a41b720bc51fd568cf78f8aa4ec5af4f802;Method;0;fusion sub - components;Below , we go into the details of our point cloud and fusion sub - components .;[ reference ] D ) , in which we use a dense spatial anchor mechanism to improve the 3D box predictions and two scoring functions to select the best predictions .;;PointFusion;3
12563;071b16f25117fb6133480c6259227d54fc2a5ea0;Method;0;stemming;We do not apply any other special preprocessing , such as lowercasing or stemming , to the data .;Any word not included in the shortlist is mapped to a special token ( ) .;( a ) ( b ) ( c ) ( d );Dataset;8
6964;04640006606ddfb9d6aa4ce8f55855b1f23ec7ed;Method;0;WRNs;Due to this fact , we hereafter restrict our attention to only WRNs with convolutions so as to be also consistent with other methods .;Based on the above , blocks with comparable number of parameters turned out to give more or less the same results .;;Type of convolutions in a block;8
62152;2742a33946e20dd33140b8d6e80d5fd04fced1b2;Method;0;commodity range sensing technologies;With the rise of commodity range sensing technologies , this research has become paramount to many applications including object pose estimation , object retrieval , 3D reconstruction , and camera localization .;Matching 3D geometry has a long history starting in the early days of computer graphics and vision .;However , matching local geometric features in lowresolution , noisy , and partial 3D data is still a challenging task as shown in Fig . 1 .;Introduction;2
47115;1bea6bbdb4aed87fff5390d42934a1d9b0a7bec4;Method;0;an - gram model;"According to the first study on this dataset , a language model ( an - gram model or a recurrent neural network ) with local context is sufficient for predicting verbs or prepositions ; however , for named entities or common nouns , it improves performance to scan through the whole paragraph to make predictions .";The questions are also categorized by the type of the missing word : named entity , common noun , preposition or verb .;So far , the best published results are reported by window - based memory networks .;Related Tasks;14
89061;39978ba7c83333475d6825d0ff897692933895fc;Method;0;unrolling message passing algorithms;Many of these ideas can be traced back to [ reference ] , which proposes unrolling message passing algorithms as simpler operations that could be performed within a CNN .;[ reference ] , in particular proposes an approach based on learning messages .;In a different setup , Krähenbühl and Koltun [ reference ] demonstrated automatic parameter tuning of dense CRF when a modified mean - field algorithm is used for inference .;Related Work;3
102644;42f20d37f4eba56284a941d5f9f58609ee650de0;Method;1;SRMD;From Figure [ reference ] , we can see that VDSR and SelfEx both tend to produce over - smoothed results , whereas SRMD can recover sharp image with better intensity and gradient statistics of clean images .;In comparison , SRMD can not only remove the unsatisfying artifacts but also produce sharp edges .;;Experiments on Real Images;17
1605;01125e3c68edb420b8d884ff53fb38d9fbe4f2b8;Method;0;CNN - based depth estimation;CNN - based depth estimation .;Given training data like the one produced by , then we believe that our method has the capacity to learn finer facial details , too .;Our work has been inspired by the work of who showed that a CNN can be directly trained to regress from pixels to depth values using as input a single image .;Closely related work;3
85501;36973330ae638571484e1f68aaf455e3e6f18ae9;Method;0;multi - order context representation;A multi - order context representation was used in to exploit co - occurrence contexts of different objects .;Nam et al . introduced an efficient feature transform that removes correlations in local image neighborhoods by extending the features of to ACF .;Cai et al .;Related Work;2
49762;1db9bd18681b96473f3c82b21edc9240b44dc329;Method;1;bicubic down - sampled version;We take a bicubic down - sampled version of our high resolution sample , find the nearest low resolution input image in the training data for that sample , and calculate the MS - SSIM score between the high resolution sample and the corresponding high resolution image in the training data .;We quantify that our models are more effective than exemplar based Super Resolution techniques like Nearest Neighbors , which perform a naive look - up of the training data to find the high resolution output .;On average , we get a MS - SSIM score of , on samples from the validation set , which shows that our models do n’t merely learn to copy training images but generate high - quality images by adding synthesized details on the low resolution input image .;CelebA;15
35695;143a3186c368544ded00a444be33153420baa254;Method;1;Siamese nets;1 - shot 5 - shot 1 - shot 5 - shot MANN , no conv [ reference ] 82.8 % 94.9 % -- MAML , no conv ( ours ) 89.7 ± 1.1 % 97.5 ± 0.6 % -- Siamese nets [ reference ] 97.3 % 98.4 % 88.2 % 97.0 % matching nets [ reference ] 98.1 % 98.9 % 93.8 % 98.5 % neural statistician [ reference ] 98.1 % 99.5 % 93.2 % 98.1 % memory mod .;Accuracy Omniglot [ reference ];[ reference ] 98.4 % 99.6 % 95.0 % 98.6 % MAML ( ours );Classification;15
17304;0a053f55804eee01f3c8b4138a1d3364d5bc45ac;Method;1;RL - based method;We observe that M - Walk outperforms the other RL - based method ( MINERVA ) .;The results are reported in Table [ reference ] .;However , it is still worse than the embedding - based methods .;Knowledge Graph Link Prediction;32
29507;0fbd17a4f791e04bbf8f240f7c48c178900e30a6;Method;0;HG;Hourglass blocks , ( HG ) developed by Newell et al . , are basically convolution - deconvolution structures with residual connections .;Wei et al . were inspired by the pose machines and used CNNs as feature extractors in pose machines .;Newell et al . stacked HG blocks to obtain an iterative refinement process and showed its effectiveness on single person pose estimation .;Single Person Pose Estimation;3
1821;0116899fce00ffa4afee08b505300bb3968faf9f;Method;0;MAN - AGER;Next , given the goal embedding produced by the MAN - AGER , the WORKER first encodes current generated words with another LSTM , then combines the output of the LSTM and the goal embedding to take a final action at current state .;We thus call it a leakage of information from D.;As such , the guiding signals from D are not only available to G at the end in terms of the scalar reward signals , but also available in terms of a goal embedding vector during the generation process to guide G how to get improved .;Introduction;2
943;0095c269e7d0c990249312687fc43521019809c4;Method;0;gating units;where the gating units and determine which memory units are affected by the inputs through , and which memory cells are written to the hidden units .;We define a tightly coupled - LSTMs units as follows .;is an affine transformation which depends on parameters of the network and .;Tightly Coupled - LSTMs ( TC - LSTMs );8
94990;3e79a574d776c46bbe6d34f41b1e83b5d0f698f2;Method;1;BiLSTM - CRF;We compare S - LSTM - CRF with BiLSTM - CRF for sequence labelling , using the same settings as decided on the movie review development experiments for both BiLSTMs and S - LSTMs .;Bi - directional RNN - CRF structures , and in particular BiLSTM - CRFs , have achieved the state of the art in the literature for sequence labelling tasks , including POS - tagging and NER .;For the latter , we decide the number of recurrent steps on the respective development sets for sequence labelling .;Final Results for Sequence Labelling;12
20981;0b5aef2894d3248fb5ecc955d50501f0aa276036;Method;1;early - fusion;On the other hand , we compared our model with early fusion ( early - fusion ) for aforementioned feature sets ( UFE ) .;Although other modalities contribute to improve the performance of multimodal classifiers , that contribution is little in compare to the textual modality .;Our fusion mechanism consistently outperforms early fusion for all combination of modalities .;Hierarchical Fusion ( HFusion );35
91614;3b1b94441010615195a5c404409ce2416860508c;Method;1;vggNet;We also implement a baseline model VggNet + ft - LSTM , which applies a vggNet that has been fine - tuned on the COCO dataset , based on the task of image - attributes classification .;The CNN is a pre - trained ( on ImageNet ) VggNet model from which we extract the coefficients of the last fully connected layer .;We also present results from a series of cut down versions of our approach for comparison .;Results on DAQURA;18
6542;03a5b2aac53443e6078f0f63b35d4f95d6d54c5d;Method;1;python / theano implementation;For methods which use convolutions including our own , a python / theano implementation is used to improve the efficiency based on the Matlab codes provided in .;We evaluate the run time of other methods from the Matlab codes provided by and .;The results are presented in Fig .;Run time evaluations;14
59562;2451db113552afb6d9ad15ef4009ec4133d28f74;Method;0;multi - GPUs Large - scale;CUDA support Scalability to multi - GPUs Large - scale ( LS ) or Small - scale ( SS ) EIG algorithm BP of EIG limited G DeNet SVD algorithm BP of SVD limited Improved B - CNN Newton - Schulz Iter .;Very recent works have demonstrated that matrix square root normalization of global covariance pooling plays a key role in achieving state - of - the - art performance in both large - scale visual recognition and challenging FGVC .;BP by Lyapunov equation ( SCHUR or EIG required );Introduction;1
101681;42764b57d0794b63487a295ce8c07eeb6961477e;Method;0;masking layers;Sharing convolutional features among mask - level proposals is enabled by using masking layers .;Using mask - level region proposals , instance - aware semantic segmentation can be addressed based on the R - CNN philosophy , as in R - CNN , SDS , and Hypercolumn .;All these methods rely on computationally expensive mask proposal methods .;Related Work;2
87763;37b685caf39b38b07af60eacf1a7d7ada2122372;Method;1;parameterizing kernels;To enable the prediction of vertex functions on them by convolutional neural networks , we resort to spectral CNN method that enables weight sharing by parameterizing kernels in the spectral domain spanned by graph laplacian eigenbases .;Compared with images that are 2D grids , shape graphs are irregular and nonisomorphic data structures .;Under this setting , our network , named SyncSpecCNN , strive to overcome two key challenges : how to share coefficients and conduct multi - scale analysis in different parts of the graph for a single shape , and how to share information across related but different shapes that may be represented by very different graphs .;SyncSpecCNN : Synchronized Spectral CNN for 3D Shape Segmentation;0
46637;1bc072002d97808340b312b69427baf2dc9fcb8e;Method;1;Factorisation Machine supported Neural Network;We introduce two types of deep learning models , called Factorisation Machine supported Neural Network ( FNN ) and Sampling - based Neural Network ( SNN ) .;In this paper , we take ad CTR estimation as a working example to study deep learning over a large multi - field categorical feature space by using embedding methods in both supervised and unsupervised fashions .;Specifically , FNN with a supervised - learning embedding layer using factorisation machines is proposed to efficiently reduce the dimension from sparse features to dense continuous features .;Introduction;1
73646;2e942d19333651bf6012374ea9e78d6937fd33ac;Method;1;region - based methods;Face detection has achieved great success using the region - based methods .;;In this report , we propose a region - based face detector applying deep networks in a fully convolutional fashion , named Face R - FCN .;Detecting Faces Using Region - based Fully Convolutional Networks;0
26617;0ecd4fdce541317b38124967b5c2a259d8f43c91;Method;0;general competency;While general competency remains the long - term goal for artificial intelligence , ALE proposes an achievable stepping stone : techniques for general competency across the gamut of Atari 2600 games .;Ideally , agents designed in this fashion are evaluated on the testing games only once , with no possibility for subsequent modifications to the algorithm .;We believe this represents a goal that is attainable in a short time - frame yet formidable enough to require new technological breakthroughs .;Introduction;1
81044;34cf90fcbf83025666c5c86ec30ac58b632b27b0;Method;1;EDM;: For the CUHK03 dataset , we compare our method with many existing approaches , including Filter Pair Neural Networks ( FPNN ) , Improved Deep Learning Architecture ( IDLA ) , Cross - view Quadratic Discriminant Analysis ( XQDA ) , PSD constrained asymmetric metric learning ( denoted as MLAPG ) , Sample - Specific SVM ( SS ) , Single image and Cross image representation ( SI - CI ) , Embedding Deep Metric ( EDM ) , Domain Guided Dropout ( DGD ) , DNS , S - LSTM and Gate - SCNN .;CUHK03;On this dataset , we conduct experiments on both the detected and the labeled datasets .;Comparison with State - of - the - art Methods;11
84991;364da079f91a6cb385997be990af06e9ddf6e888;Method;1;seq;The seq2 - bow - CNN for Elec in the table is the same except that the regions sizes of seq - convolution layers are 3 and 4 .;This third layer is a bow - convolution layer with one region of variable size that takes one - hot vectors with - gram vocabulary as input to learn document embedding .;On both datasets , performance is improved over seq2 - CNN .;Performance results;19
49366;1d8653d9fca853a8e3727fa7d8f5ec0631cad08f;Method;1;SDNC;The results are described in Supplementary [ reference ] and demonstrate the SDNC is capable of learning competitively .;In order to compare the models on an interesting task we ran the DNC and SDNC on the Babi task ( this task is described more fully in the main text ) .;In particular , it achieves the best report result on the Babi task .;Results;32
57616;2393447b8b0b79046afea1c88a8ed3949338949e;Method;0;word level PasunuruB18 , LiXLG18 , GehrmannDR18;"Most methods select content at the sentence level HsuLLMTS18 , ChenB18 and the word level PasunuruB18 , LiXLG18 , GehrmannDR18 ; our model incorporates content selection at the passage level in the combined attention .";In particular , content selection approaches , which decide what to summarize , have recently been used with abstractive models .;Query - based abstractive summarization has been rarely studied .;Abstractive summarization .;49
71848;2dad7e558a1e2982d0d42042021f4cde4af04abf;Method;1;DilatedRNN structure;One is the proposed DilatedRNN structure with layers and ( equation ( [ reference ] ) ) .;Consider two RNN architectures .;The other is a regular - layer RNN with skip connections ( equation ( [ reference ] ) ) .;( Mean Recurrent Length ) .;8
64282;28703eef8fe505e8bd592ced3ce52a597097b031;Method;1;BSO models;For all experiments , we trained both seq2seq and BSO models with mini - batch Adagrad ( using batches of size 64 ) , and we renormalized all gradients so they did not exceed 5 before updating parameters .;Finally , it has been established that dropout regularization improves the performance of LSTMs , and in our experiments we run beam search under dropout .;We did not extensively tune learning - rates , but we found initial rates of 0.02 for the encoder and decoder LSTMs , and a rate of 0.1 or 0.2 for the final linear layer ( i.e. , the layer tasked with making word - predictions at each time - step ) to work well across all the tasks we considered .;Methodology;10
27245;0ee850dd6640a96531ac5ad21da5438db04d8b3c;Method;1;conceptual architecture;The conceptual architecture of the model is illustrated in Figure [ reference ] .;We consider the mean squared error as the loss function for a given batch of training instances : where denotes the query and the corresponding retrieved document in the training instance , i.e. in the batch .;Rank model : In this model , similar to the previous one , the goal is to learn a scoring function for a given pair of query and document with the set of model parameters .;Ranking Architectures;5
20260;0b544dfe355a5070b60986319a3f51fb45d1348e;Method;0;CGM );The difference with our model is that they used a convolutional - gram model ( CGM ) for the encoder and the hybrid of an inverse CGM and a recurrent neural network for the decoder .;In their paper , they proposed a similar model that consists of an encoder and decoder .;They , however , evaluated their model on rescoring the - best list proposed by the conventional SMT system and computing the perplexity of the gold standard translations .;Related Approaches : Neural Networks in Machine Translation;8
98179;40b0fced8bc45f548ca7f79922e62478d2043220;Method;1;convnet features;In this paper , we provide evidence that convnet features perform at least as well as conventional ones , even in the regime of point - to - point correspondence , and we show considerable performance improvement in certain settings , including category - level keypoint prediction .;Or do large receptive fields mean that correspondence is effectively pooled away , making this a task better suited for hand - engineered features ?;;Introduction;1
86738;3729a9a140aa13b3b26210d333fd19659fc21471;Method;1;sentence - level representation;We compute the sentence - level representation as the element - wise maximum values across all of the word - level representations in the fourth layer : This max - pooling technique has proven effective in text classification tasks lai2015maxpooling .;Now it is required to obtain the sentence - level representation rather than the word - level representation used in the first three tasks .;To model the semantic relatedness between and , we follow tai2015treelstm .;Semantic Task : Semantic relatedness;7
46758;1bc072002d97808340b312b69427baf2dc9fcb8e;Method;1;RBM;With the sampled units , we can train an RBM via contrastive divergence and a DAE via SGD with unsupervised approaches to largely reduce the data dimension with high recovery performance .;( c ) are unsampled and thus ignored when pre - training the data instance .;The real - value dense vector is used as the input of the further layers in SNN .;Sampling - based Neural Networks ( SNN );5
5433;02e85d62fbd8249a046d00ac10e39546511b2a51;Method;0;locally connected random fields;The capabilities of DeepMedic and the employed CRF for capturing 3D patterns exceed those of 2D networks and locally connected random fields , models that have been commonly used in previous work .;The design of the proposed system is well suited for processing medical volumes thanks to its generic 3D nature .;At the same time , our system is very efficient at inference time , which allows its adoption in a variety of research and clinical settings .;Discussion and Conclusion;30
85578;36973330ae638571484e1f68aaf455e3e6f18ae9;Method;0;scale - aware weighting layer;A scale - aware weighting layer is designed to perform a gate function which is defined over the sizes of object proposals and used to combine the detection results from two sub - networks .;As the extracted features from large - size and small - size pedestrians exhibit significant differences , SAF R - CNN incorporates two sub - networks , focusing on the detection of large - size and small - size pedestrians , respectively .;Intuitively , the weights for the two sub - networks should satisfy the following constraints .;Scale - aware Weighting;7
36658;151313065d71b49dbf07289c002c887d7b5a0a6b;Method;0;Product - based Neural Network;Feature interaction is studied in , by introducing a product layer between embedding layer and fully - connected layer , and proposing the Product - based Neural Network ( PNN ) .;This model pre - trains FM before applying DNN , thus limited by the capability of FM .;As noted in , PNN and FNN , like other deep models , capture little low - order feature interactions , which are also essential for CTR prediction .;Introduction;1
65759;2a94c84383ee3de5e6211d43d16e7de387f68878;Method;0;Single Shot Detector;The Single Shot Detector ( SSD ) [ reference ] is one of the first attempts at using a ConvNet 's pyramidal feature hierarchy as if it were a featurized image pyramid ( Fig . 1 ( c ) ) .;The high - resolution maps have low - level features that harm their representational capacity for object recognition .;Ideally , the SSD - style pyramid would reuse the multi - scale feature maps from different layers computed in the forward pass and thus come free of cost .;Introduction;2
24488;0d24a0695c9fc669e643bad51d4e14f056329dec;Method;0;unbiased estimate;Since this gradient expression is an expectation , it is trivial to build an unbiased estimate for it : where are random samples from .;Intuitively , this equality corresponds to increasing the probability of actions that give high values , and decreasing the probability of actions that give low values .;By replacing with a parametric estimate one can obtain a biased estimate with relatively low variance .;Actor - Critic for Sequence Prediction;6
17910;0a34fe39e9938ae8c813a81ae6d2d3a325600e5c;Method;0;iterative methods;Unlike our proposed pose estimation , they regress poses by using iterative methods which involve computationally costly face rendering .;Some recently addressed faces in particular , though their methods are designed to estimate 2D landmarks along with 3D face shapes .;We regress 6DoF directly from image intensities without such rendering steps .;Related work;2
19336;0a6d7e8e61c54c796f53120fdb86a25177e00998;Method;0;negative sampling procedure;Another direction would be to develop a more intelligent negative sampling procedure , to generate more informative negatives with respect to the positive sample from which they have been sampled .;For example , the use of pairwise embeddings together with complex numbers might lead to improved results in many situations that involve non - compositionality .;It would reduce the number of negatives required to reach good performance , thus accelerating training time .;Conclusion;12
72742;2e10643c3759f97b673ff8c297778c0b6c20032b;Method;1;character - level convolutional networks;This article offers an empirical study on character - level convolutional networks for text classification .;;We compared with a large number of traditional and deep learning models using several large - scale datasets .;Conclusion and Outlook;13
37907;15ca7adccf5cd4dc309cdcaa6328f4c429ead337;Method;1;probing filter;The DotProduct layer computes the dot product between the probing filter weights and the signals from the Sensor layer .;The Sensor layer is responsible for collecting the signals ( the values in the input fields ) at the probing points in the forward pass , and updating the probing point locations in the backward pass .;The Gaussian layer is a utility layer that transforms distance field into a representation that is more friendly for numerical computation .;Field Probing Layers;8
50106;1e5b9e512c01e244287fe7afb05e03c96d5c1cd0;Method;0;recurrent layer;The idea of using a recurrent layer over characters to induce a complementary view of a word has occurred in numerous papers .;[ reference ] , [ reference ] and [ reference ] extended the work of Chen et al . to a structured prediction setting , the latter two use again a mix of word and sub - word features .;Perhaps the earliest is Santos and Zadrozny ( 2014 ) who compare character - based LSTM encodings to traditional word - based embeddings .;Related Work;3
30033;0ff9ea8409c932baf3c0302c89ede79add1431aa;Method;1;softmax - style normalization;We next describe how softmax - style normalization can be performed at the local or global level .;Note that the score is linear in the parameters .;;Transition System;3
13357;074b6fe0cc6848fb86a6703d1c52074494177c79;Method;1;image discriminators;This yields learned parameters for the image transformations , and , image discriminators , and , as well as an initial setting of the task model , , which is trained using pixel transformed source images and the corresponding source pixel labels .;Next , we perform pixel - level adaptation using our image space GAN losses together with semantic consistency and cycle consistency losses .;Finally , we perform feature space adaptation in order to update the target semantic model , , to have features which are aligned between the source images mapped into target style and the real target images .;Implementation Details;12
8365;052282998bc24db695891f755a00e3cebd3fd796;Method;0;Rainbow;Additionally , unlike Rainbow , Reactor does not use Noisy Networks fortunato2017noisy , which was reported to have contributed to the performance gains .;( see Figure [ reference ] , right ) .;When evaluating under the no - op starts regime ( Table [ reference ] ) , Reactor outperforms all methods except for Rainbow .;Comparing to prior work;19
41531;1713d05f9d5861cac4d5ec73151667cb03a42bfc;Method;1;coding schemes;The assignment of codes is more balanced for larger coding schemes .;” for the first component , while the subcode “ 1 ” is only used by 5 % of the words .;In any coding scheme , even the most unpopular codeword is used by about 1000 words .;Analysis of Code Efficiency;11
82727;357776cd7ee889af954f0dfdbaee71477c09ac18;Method;1;Batch - normalization batch;Batch - normalization batch did not help in the case of the MNIST dataset .;In the case of MNIST with 100 labels , the test error after the first epochs is 16.50 % , after 50 epochs is 3.40 % , after 500 epochs is 2.21 % and after 5000 epochs is 1.80 % .;;MNIST;18
2712;01959ef569f74c286956024866c1d107099199f7;Method;1;human models;"The dataset contains 20 "" paperdoll "" human models [ reference ] spanning genders , races , and ages with 8 different expressions .";To attract researchers interested in exploring the high - level reasoning required for VQA , but not the low - level vision tasks , we create a new abstract scenes dataset [ reference ] , [ reference ] , [ reference ] , [ reference ] containing 50 K scenes .;The limbs are adjustable to allow for continuous pose variations .;VQA DATASET COLLECTION;4
30627;100c730003033151c0f78ed1aab23df3e9bd5283;Method;0;pair - specific representation;The latter adds an attention model to learn pair - specific representation for prediction on the basis of the vanilla LSTM .;[ reference ] ) on the independent question and answer representations which are the last state outputs and from the question and answer LSTM models .;Moreover , LSTM + Att is the deterministic counterpart of NASM , which has the same neural network architecture as NASM .;Dataset & Setup for Answer Sentence Selection;8
29546;0fbd17a4f791e04bbf8f240f7c48c178900e30a6;Method;0;associative vector embeddings;Newell et al . extended their SHG idea by outputting associative vector embeddings which can be thought as tags representing each keypoint ’s group .;This model runs in realtime .;They group keypoints with similar tags into individual people .;Bottom - up;5
62255;2742a33946e20dd33140b8d6e80d5fd04fced1b2;Method;1;descriptor;We can use the correspondences from these challenging registrations to train a powerful descriptor that can be used for other tasks where the aforementioned domain knowledge is unavailable .;Second , reconstructions can leverage domain knowledge such as temporal information and well - engineered global optimization methods , which can facilitate wide baseline registrations ( loop closures ) .;Third , by learning from multiple reconstruction datasets , we can optimize 3DMatch to generalize and robustly match local geometry in real - world partial 3D data under a variety of conditions .;Learning From Reconstructions;4
48328;1d5d0a41b720bc51fd568cf78f8aa4ec5af4f802;Method;0;stereo;Hence , many current real - world systems either use stereo or augment their sensor stack with lidar and radar .;Methods for 3D box regression from a single image , even including recent deep learning methods such as , still have relatively low accuracy especially in depth estimates at longer ranges .;The lidar - radar mixed - sensor setup is particularly popular in self - driving cars and is typically handled by a multi - stage pipeline , which preprocesses each sensor modality separately and then performs a late fusion or decision - level fusion step using an expert - designed tracking system such as a Kalman filter .;Introduction;1
16959;0a053f55804eee01f3c8b4138a1d3364d5bc45ac;Method;0;optimal policy;And defines the long - term reward of taking action at state and then following the optimal policy thereafter .;In M - Walk , it is used as a prior to bias the MCTS search .;The objective is to learn a policy that maximizes the terminal rewards , i.e. , correctly identifies the target node with high probability .;Graph Walking as a Markov Decision Process;2
71974;2dad7e558a1e2982d0d42042021f4cde4af04abf;Method;1;skip vanilla;Further , we observe significant performance differences between stack Vanilla and skip vanilla , which is consistent with the findings in that RNNs can better model long - term dependencies and achieves good results when recurrent skip connections added .;However , the performance improvements of dilated GRU and LSTM over both the single - and multi - layer ones are marginal , which might be because the task is too simple .;Nevertheless , the dilated vanilla has yet another significant performance gain over the skip Vanilla , which is consistent with our argument in section [ reference ] , that the DilatedRNN has a much more balanced memory over a wide range of time periods than RNNs with the regular skips .;Pixel - by - pixel MNIST;16
63820;27e4b65121d3c88643d86dc91a9bdafdf223b988;Method;0;bag - of - embeddings representation;We believe the bidirectional RNN we used to model the source captures richer contextual information of every word than the bag - of - embeddings representation used by namas and chopra in their convolutional attentional encoders , which might explain our superior performance .;- lvt5k - 1sent outperforms the state - of - the - art model of with statistically significant improvement on Rouge - 1 .;Further , explicit modeling of important information such as multiple source sentences , word - level linguistic features , using the switch mechanism to point to source words when needed , and hierarchical attention , solve specific problems in summarization , each boosting performance incrementally .;Gigaword Corpus;9
42632;1822ca8db58b0382b0c64f310840f0f875ea02c0;Method;1;Euclidean distance;In testing , we extract the output of the Pool - 5 layer as image descriptor ( 2 , 048 - dim ) and use the Euclidean distance to compute the similarity between images .;The learning rate is divided by 10 after 40 epochs , we train 50 epochs in total .;Training CNN with CamStyle .;Experiment Settings;11
27865;0f0a25d3be0d50a134f6f68e6a82bd8a2f668882;Method;0;transfer learning approaches;SwiDeN. This shows that the complexity involved in cross - depiction recognition can not be addressed merely by employing typical transfer learning approaches such as fine - tuning .;However , its performance for ‘ Art ’ is relatively lower compared to;In spite of the class - bias induced by the splits provided by Cai et al . , we compared the performance of our C4 - S SwiDeN architecture against that of the multi - attribute part - graph model proposed by Wu et al . .;Results;15
24648;0d24a0695c9fc669e643bad51d4e14f056329dec;Method;0;log - likelihood training;When beam - search is used , the ranking of the compared approaches is the same , but the margin between the proposed methods and log - likelihood training becomes smaller .;Surprisingly , the best performing method is REINFORCE with critic , with an additional BLEU point advantage over the actor - critic .;The final performances of the actor - critic and the REINFORCE - critic with greedy search are also and BLEU points respectively better than what ranzato2015sequence report for their MIXER approach .;IWSLT 2014 with a convolutional encoder;16
7309;04957e40d47ca89d38653e97f728883c0ad26e5d;Method;0;quality detectors;At inference , the quality of the hypotheses is sequentially improved , by applications of the same cascade procedure , and higher quality detectors are only required to operate on higher quality hypotheses .;, this guarantees a sequence of effectively trained detectors of increasing quality .;This enables high quality object detection , as suggested by Figure 1 ( c ) and ( d ) .;Cascaded Detection;10
59051;23f5854b38a15c2ae201e751311665f7995b5e10;Method;1;logistic;We compare multinomial likelihood with Gaussian and logistic in Section 4 .;where σ ( x ) = 1 /( 1 + exp ( −x ) ) is the logistic function .;;Model;4
71405;2d876ed1dd2c58058d7197b734a8e4d349b8f231;Method;1;QRNN variants;We describe QRNN variants tailored to several natural language tasks , including document - level sentiment classification , language modeling , and character - level machine translation .;Like RNNs , QRNNs allow the output to depend on the overall order of elements in the sequence .;These models outperform strong LSTM baselines on all three tasks while dramatically reducing computation time .;Introduction;1
8157;052282998bc24db695891f755a00e3cebd3fd796;Method;0;Prioritized DQN;Specifically , for a replay buffer of size , prioritized experience replay samples transition with probability , and applies weighted importance - sampling with to correct for the prioritization bias , where Prioritized DQN significantly increases both the sample - efficiency and final performance over DQN on the Atari 2600 benchmarks schaul2015prioritized .;Drawing inspiration from prioritized sweeping moore1993prioritized , prioritized experience replay replaces the uniform sampling with prioritized sampling proportional to the absolute TD error schaul16prioritized .;;Prioritized experience replay;4
86351;36b1ba4287c4884df27dd684c4c7f66f32e943db;Method;1;HypER as;subsection : Understanding HypER as Tensor Factorization;For high level intuition , Figure [ reference ] shows a t - SNE plot of a subject entity embedding before and after it has been transformed by relation - specific filters , i.e. prior to combining it with the object embeddings .;Having described the HypER architecture , we can view it as a series of tensor operations by considering the hypernetwork and weight matrix as tensors and respectively .;Understanding HypER as Tensor Factorization;6
47035;1bea6bbdb4aed87fff5390d42934a1d9b0a7bec4;Method;1;question embedding;In this step , the goal is to compare the question embedding and all the contextual embeddings , and select the pieces of information that are relevant to the question .;We choose to use Gated Recurrent Unit ( GRU ) in our experiments because it performs similarly but is computationally cheaper than LSTM .;We compute a probability distribution depending on the degree of relevance between word ( in its context ) and the question and then produce an output vector which is a weighted combination of all contextual embeddings : is used in a bilinear term , which allows us to compute a similarity between and more flexibly than with just a dot product .;End - to - end Neural Network;5
3300;01c824989d24a8cae214c3156edd9d4492faa579;Method;1;stacked design;This further confirms that our stacked design can significantly improve the image quality over GAN without stacking .;With votes for each evaluated model , our AMT workers got error rate for samples from SGAN and for samples from DCGAN .;;Visual Turing test;12
44274;1a2599e467e855f845dcbf9282f8bdbd97b85708;Method;0;neural vocoder;Char2Wav describes yet another similar approach to end - to - end TTS using a neural vocoder .;However , unlike our system , its naturalness has not been shown to rival that of human speech .;However , they use different intermediate representations ( traditional vocoder features ) and their model architecture differs significantly .;Introduction;1
10300;060ff1aad5619a7d6d6cdfaf8be5da29bff3808c;Method;1;SA;Here there is little difference between any of the models , with LISA models tending to perform slightly better than SA .;Both parsers are correct on 26 % of sentences .;Both parsers make mistakes on the majority of sentences ( 57 % ) , difficult sentences where SA also performs the worst .;Analysis;12
25310;0d5fa5be4bfe085de8f88dbee1c3b2a6e5ab9ee2;Method;1;Light weighted CNNs;"Light weighted CNNs ( green dotted box ) are adopted in higher resolution branches ; different - branch output feature maps are fused by cascade - feature - fusion unit ( Sec .";Thus we can safely limit the number of parameters in both middle and bottom branches .;[ reference ] ) and trained with cascade label guidance ( Sec .;Network Architecture;10
43862;1a0912bb76777469295bb2c059faee907e7f3258;Method;0;FCIS;This includes MNC [ reference ] and FCIS [ reference ] , the winners of the COCO 2015 and 2016 segmentation challenges , respectively .;All instantiations of our model outperform baseline variants of previous state - of - the - art models .;Without bells and whistles , Mask R - CNN with ResNet - 101 - FPN backbone outperforms FCIS +++ [ reference ] , which includes multi - scale train / test , horizontal flip test , and online hard example mining ( OHEM );Main Results;7
93092;3d18ce183b5a5b4dcaa1216e30b774ef49eaa46f;Method;1;MFF;On the face region , we fit a 3DMM through the Multi - Features Framework ( MFF ) , see Fig .;The depth estimation of a face image can be conducted on the face region and external region respectively , with different requirements of accuracy .;[ reference ] .;3D Image Meshing;12
49247;1d8653d9fca853a8e3727fa7d8f5ec0631cad08f;Method;1;sparse read and write scheme;We have demonstrated that you can train neural networks with large memories via a sparse read and write scheme that makes use of efficient data structures within the network , and obtain significant speedups during training .;Scaling memory systems is a pressing research direction due to potential for compelling applications with large amounts of memory .;Although we have focused on a specific MANN ( SAM ) , which is closely related to the NTM , the approach taken here is general and can be applied to many differentiable memory architectures , such as Memory Networks .;Discussion;18
79233;33998aff64ce51df8dee45989cdca4b6b1329ec4;Method;0;masked self - attentional layers;We present graph attention networks ( GATs ) , novel neural network architectures that operate on graph - structured data , leveraging masked self - attentional layers to address the shortcomings of prior methods based on graph convolutions or their approximations .;;By stacking layers in which nodes are able to attend over their neighborhoods ' features , we enable ( implicitly ) specifying different weights to different nodes in a neighborhood , without requiring any kind of costly matrix operation ( such as inversion ) or depending on knowing the graph structure upfront .;ABSTRACT;1
15851;09879f7956dddc2a9328f5c1472feeb8402bcbcf;Method;1;Real NVP;document : Density estimation using Real NVP;We observe that , although the faces are changed as to respect the new attributes , several properties remain unchanged like position and background .;Unsupervised learning of probabilistic models is a central yet challenging problem in machine learning .;Density estimation using Real NVP;0
87321;372bc106c61e7eb004835e85bbfee997409f176a;Method;1;joint distribution learning;A perfect joint distribution learning should render two identical images .;We then compared the transformed image with the image generated by .;Hence , we used the ratios of agreed pixels between 10 K pairs of images generated by each network ( 10 K randomly sampled ) as the performance metric .;Experiments;4
76119;303065c44cf847849d04da16b8b1d9a120cef73a;Method;1;optimization procedures;To this end , herein , we first formulate the cost function and then present two optimization procedures .;We propose to fit the 3DMM on an input image using Gauss - Newton iterative optimization .;;Model Fitting;6
75514;30180f66d5b4b7c0367e4b43e2b55367b72d6d2a;Method;1;VGG - Face only;This evaluation compares the baseline approach of VGG - Face only with the proposed approach of VGG - Face encoding with probe and gallery template adaptation .;Figure [ reference ] shows the overall evaluation results on IJB - A.;These results show that identification performance is slightly improved for rank 1 and rank 10 retrieval , however there are large performance improvements for the 1:N DET for identification and the 1:1 DET for verification .;IJB - A Evaluation;6
39744;165ef2b5f86b9b2c68b652391db5ece8c5a0bc7e;Method;1;contextual CNN potentials;Therefore , our system takes advantage of both contextual CNN potentials and the traditional smoothness potentials to improve the final system .;Hence we also apply the dense CRF method , as in many other recent methods , in the prediction refinement step .;More details are described in Sec .;Pairwise potential functions;7
28261;0f2f4edb7599de34c97f680cf356943e57088345;Method;0;multi - stage training;Their method requires multi - stage training and the weights are shared across each iteration .;A set of predictions is included with the input , and each pass through the network further refines these predictions .;Wei et al .;Related Work;2
61529;269c7aeca29dae51dca8208815f1c4c81bd471c2;Method;1;stochastic gradient descent ( SGD ) algorithm;All models are trained with Caffe framework and optimized with stochastic gradient descent ( SGD ) algorithm .;For the factor , we empirically selected an optimal value 0.01 to balance the two losses .;Training batch size is set to 512 and the number of iterations is set to 21 epochs .;Implementation Details;10
8920;052443e1709c0f7d3432cca7c451534eea76b7ca;Method;1;context specific regressors;To each anchor we assign the closest patches and instead of training one regressor as A + would , we train 4 context specific regressors .;We keep the standard A + pipeline with 1024 anchors and 0.5 million training patches ( A +( 0.5 m ) ) .;For each context we compute a regressor using the 1024 patches closest to both anchor and context centroid , in a 10 to 1 contribution .;Reasoning with context ( R );14
34563;1329206dbdb0a2b9e23102e1340c17bd2b2adcf5;Method;1;fine grained categorization system;By developing a novel deep part detection scheme , we propose an end - to - end fine grained categorization system which requires no knowledge of object bounding box at test time , and can achieve performance rivaling previously reported methods requiring the ground truth bounding box at test time to filter false positive detections .;"The Poselet and DPM methods have previously been utilized to obtain part localizations with a modest degree of success ; methods generally report adequate part localization only when given a known bounding box at test time .";The recent success of convolutional networks , like , on the ImageNet Challenge has inspired further work on applying deep convolutional features to related image classification and detection tasks .;Introduction;1
49151;1d8653d9fca853a8e3727fa7d8f5ec0631cad08f;Method;1;ANN indexes;We considered two types of ANN indexes : FLANN ’s randomized k - d tree implementation that arranges the datapoints in an ensemble of structured ( randomized k - d ) trees to search for nearby points via comparison - based search , and one that uses locality sensitive hash ( LSH ) functions that map points into buckets with distance - preserving guarantees .;However there are no gradients with respect to the ANN as its function is fixed .;We used randomized k - d trees for small word sizes and LSHs for large word sizes .;Approximate nearest neighbors;11
76709;309acdd149f5f0ea12acb103b36bb59e6e631671;Method;1;3D human pose model;Information captured by the 3D human pose model is embedded in the CNN architecture as an additional layer that lifts 2D landmark coordinates into 3D while imposing that they lie on the space of physically plausible poses .;We propose a novel CNN architecture that learns to combine the image appearance based predictions provided by convolutional - pose - machine style 2D landmark detectors , with the geometric 3D skeletal information encoded in a novel pretrained model of 3D human pose .;The advantage of integrating the output proposed by the 2D landmark location predictors – based purely on image appearance – with the 3D pose predicted by a probabilistic model , is that the 2D landmark location estimates are improved by guaranteeing that they satisfy the anatomical 3D constraints encapsulated in the human 3D pose model .;Introduction;1
26550;0e8753f550350e53824358ca3f0f8cfd2f2dc2f7;Method;1;nuclear proximal operators;The associated convex non - smooth optimization problem is solved with a well - posed iterative ADMM scheme , which alternates between nuclear proximal operators and approximate solutions of linear systems .;As an application , our matrix completion model offers a new recommendation algorithm that combines the traditional collaborative filtering and content - based filtering tasks into one unified model .;Artificial and real data experiments are conducted to study and validate the proposed matrix recovery model , suggesting that in real - life applications where the number of available matrix entries ( ratings ) is usually low and information about products and people taste is available , our model would outperform the standard matrix completion approaches .;Conclusion;10
27713;0f0a25d3be0d50a134f6f68e6a82bd8a2f668882;Method;0;deep networks of interest;Instead of learning from scratch , a common paradigm is to utilize pre - trained CNNs as a starting point while constructing deep networks of interest .;;We follow a similar paradigm in our approach .;Motivation;4
83745;35ff11e0a5e465c810a30b022b26a9d577a434ce;Method;0;linear recurrent models;In another work , li:15 investigated the importance of recursive tree structures ( as opposed to linear recurrent models ) in four different tasks , including sentiment and semantic relation classification .;Our experiment on the importance of composition function was motivated by vinyals:2015 and wiseman_16 , who achieved competitive parsing accuracy without explicit composition .;Their findings suggest that recursive tree structures are beneficial for tasks that require identifying long - range relations , such as semantic relationship classification , with no conclusive advantage for sentiment classification and discourse parsing .;Related Work;12
6122;0373b97580cdfd0b69f165e1a946bae62da95dce;Method;0;ResNets;With increasing depth , ResNets give better function approximation capabilities as they gain more parameters .;This improves the accuracy of deeper networks .;The authors ’ hypothesis is that the plain deeper networks give worse function approximation because the gradients vanish when they are propagated through many layers .;Background;2
90921;3a8d537bcec370d37990d39eab01c729496ad057;Method;0;system components;However , one fundamental issue with the T - L Network is its three - phase training procedure , since jointly training the system components proves to be too difficult .;Of the few unsupervised neural - based approaches that exist , the T - L network is one of the most important , combining a convolutional autoencoder with an image regressor to encode a unified vector representation of a given 2D image .;The 3D - GAN offers a way to train 3D object models in an adversarial learning scheme .;Related Work;2
84959;364da079f91a6cb385997be990af06e9ddf6e888;Method;0;predictive ) component;To see this point , in Table [ reference ] we show the text regions from the test set , which did not appear in the training data , either entirely or partially as bi - grams , and yet whose embedded features have large values in the heavily - weighted ( predictive ) component thus contributing to the prediction .;By contrast , one strength of CNN is that - grams ( or text regions of size ) can contribute to accurate prediction even if they did not appear in the training data , as long as ( some of ) their constituent words did , because input of embedding is the constituent words of the region .;There are many more of these , and we only show a small part of them that fit certain patterns .;Why is CNN effective ?;24
63813;27e4b65121d3c88643d86dc91a9bdafdf223b988;Method;1;ABS + model;The table shows that , despite this fact , our model outperforms the ABS + model of namas with statistical significance .;The reason we did not evaluate our best validation models here is that this test set consisted of only 1 sentence from the source document , and did not include NLP annotations , which are needed in our best models .;In addition , our models exhibit better abstractive ability as shown by the src .;Gigaword Corpus;9
93912;3dd2f70f48588e9bb89f1e5eec7f0d8750dd920a;Method;0;log loss;Like R - CNN , training is a multi - stage pipeline that involves extracting features , fine - tuning a network with log loss , training SVMs , and finally fitting bounding - box regressors .;SPPnet also has notable drawbacks .;Features are also written to disk .;2 .;4
6240;0373b97580cdfd0b69f165e1a946bae62da95dce;Method;1;AWS g2.2xlarge instance;These two models are trained on a AWS g2.2xlarge instance ( which has a single GPU ) with a mini batch - size of 128 .;In our model , we add an ELU activation function just before the global average pooling layer .;We use a weight decay of 0.0001 and a momentum of 0.9 , and adopt the weight initialization in and BN but with no dropout .;CIFAR - 10 Analysis;11
3262;01c824989d24a8cae214c3156edd9d4492faa579;Method;1;fc3;Encoder : For all datasets we use a small CNN with two convolutional layers as our encoder : conv1 - pool1 - conv2 - pool2 - fc3 - fc4 , where fc3 is a fully connected layer and fc4 outputs classification scores before softmax .;Readers may refer to our code repository for more details about experimental setup , hyper - parameters , etc .;On CIFAR - 10 we apply horizontal flipping to train the encoder .;Experiments;8
36639;151313065d71b49dbf07289c002c887d7b5a0a6b;Method;0;FTRL;Despite their simplicity , generalized linear models , such as FTRL , have shown decent performance in practice .;Even for easy - to - understand interactions , it seems unlikely for experts to model them exhaustively , especially when the number of features is large .;However , a linear model lacks the ability to learn feature interactions , and a common practice is to manually include pairwise feature interactions in its feature vector .;Introduction;1
24826;0d467adaf936b112f570970c5210bdb3c626a717;Method;0;variational approach;As a remedy , out - of - the - box optimization can be applied to the network predictions as a postprocessing operation , for example , optical flow estimates can be refined with a variational approach .;Convolutional networks trained for per - pixel prediction tasks often produce noisy or blurry results .;In some cases , this refinement can be approximated by neural networks : Chen & Pock formulate reaction diffusion model as a CNN and apply it to image denoising , deblocking and superresolution .;Related Work;2
50290;1e5b9e512c01e244287fe7afb05e03c96d5c1cd0;Method;1;ContextSensitive Character Encodings;section : Concatenation Strategies for the ContextSensitive Character Encodings;The examples show that the combined model has significantly higher accuracy compared with either the character and word models individually .;The proposed model bases a token encoding on both the forward and the backward character representations of both the first and last character in the token ( see Equation 1 ) .;Concatenation Strategies for the ContextSensitive Character Encodings;19
63570;27c761258329eddb90b64d52679ff190cb4527b5;Method;1;RRCNN;Let 's consider that the output of the RRCNN - block is + 1 and can be calculated as follows :;In the case of R2U - Net , the final outputs of the RCNN unit are passed through the residual unit that is shown Fig . 4 ( d ) .;Here , represents the input samples of the RRCNN - block .;III . RU - NET AND R2U - NET ARCHITECTURES;4
94852;3e79a574d776c46bbe6d34f41b1e83b5d0f698f2;Method;1;recurrent state transition process;The backward LSTM component follows the same recurrent state transition process as described in Eq 1 .;σ is the sigmoid function .;Starting from an initial state h n + 1 , which is a model parameter , it reads the input x n , x n−1 , . .;Baseline BiLSTM;5
27070;0ee850dd6640a96531ac5ad21da5438db04d8b3c;Method;0;Ranking model;KEYWORDS Ranking model , weak supervision , deep neural network , deep learning , ad - hoc retrieval;Our findings also suggest that supervised neural ranking models can greatly benefit from pre - training on large amounts of weakly labeled data that can be easily obtained from unsupervised IR models .;compat=1.11 ternary compat;Neural Ranking Models with Weak Supervision;0
51709;1f76b7b071f3e65c97d09720f88d6b0ad9f07e8f;Method;1;chain rule of backpropagation;Denoting the loss function as , from the chain rule of backpropagation we have :;( [ reference ] ) also leads to nice backward propagation properties .;Eqn .;Analysis of Deep Residual Networks;2
79734;33a8d0a35390fde736744d4a0dd20dff7961c777;Method;0;polynomial form;Then the most general form of a GCNN layer output function equipped with polynomial filters is given by Equation ( [ reference ] ) , In Equation ( [ reference ] ) , is defined as a graph convolution filter of polynomial form with degree .;Laplacian and be a node feature matrix .;While are learning weight parameters where each .;Graph Capsule CNN Model;3
15050;0899bb0f3d5425c88b358638bb8556729720c8db;Method;0;photo - realistic modeling;However , photo - realistic modeling is always imperfect and requires much effort .;It is suitable for simple environments and performs well if jointly trained with a relatively small amount of real annotated images .;;Photo - Realistic Rendering;4
4514;027f9695189355d18ec6be8e48f3d23ea25db35d;Method;1;SST - 5 model;We also see that the performance of our SST - 5 model is on par with that of the current state - of - the - art model [ reference ] , which is pretrained on large parallel datasets and uses character n - gram embeddings alongside word embeddings , even though our model does not utilize external resources other than GloVe vectors and only uses wordlevel representations .;byte - mLSTM ( Radford , Jozefowicz , and Sutskever 2017 ) , where a byte - level language model trained on the large product review dataset is used to obtain sentence representations .;The authors of [ reference ] stated that utilizing pretraining and character n - gram embeddings improves validation accuracy by 2.8 % ( SST - 2 ) or 1.7 % ( SST - 5 ) .;Sentiment Analysis;12
22092;0c278ecf472f42ec1140ca2f1a0a3dd60cbe5c48;Method;0;model - free algorithms;Conversely , there are not many model - free approaches with proven sample - complexity bounds [ e.g. ,][] Strehl06 , but there are multiple model - free algorithms for exploration that actually work in large domains [ e.g. ,][] Stadie15 , Bellemare16 , Ostrovski17 , Plappert18 , Burda19 .;Importantly , notice that it is not clear how to apply these traditional algorithms such as R - Max and E to large domains where function approximation is required .;Among these algorithms , the use of pseudo - counts through density models is the closest to ours Bellemare16 , Ostrovski17 .;Related Work;15
38079;15cc54ed7b1582b2efd71bedf28b23634d82991b;Method;0;Spectral Normalization;subsection : Spectral Normalization in Discriminator;However , the computational cost of RBF - B kernel is relatively low .;Without any Lipschitz constraints , the discriminator may simply increase the magnitude of its outputs to minimize the discriminator loss , causing unstable training .;Spectral Normalization in Discriminator;6
18188;0a3a003457f5d7758a42a0e4b7278b39a86ed0bd;Method;0;data generator;Several methods propose to learn a data generator e.g. conditioned on Gaussian noise .;Data augmentation is a classic technique to increase the amount of available data and thus also useful for few - shot learning .;However , the generation models often underperform when trained on few - shot data .;Introduction;1
10472;06150e6e69a379c27e1d0100fcd7660f073cbacf;Method;1;orthogonal trees;Next , in § 5 , we demonstrate that orthogonal trees trained with locally decorrelated features are efficient and effective .;We introduce the baseline in § 3 and in § 4 show that use of oblique trees improves results but at considerable computational expense .;Experiments and results are presented in § 6 .;Title;0
91444;3b1b94441010615195a5c404409ce2416860508c;Method;0;attribute - based representation;Figure [ reference ] summarises how this is achieved : given an image , an attribute - based representation ( in Section [ reference ] ) is first generated and it will used as one of input sources of our VQA - LSTM model .;The novelty lies in the fact that this is achieved by representing both of these disparate forms of information as text before combining them .;The second input source are those captions generated in section [ reference ] .;A VQA Model with External Knowledge;9
49600;1db9bd18681b96473f3c82b21edc9240b44dc329;Method;0;encoder and decoder;For both the encoder and decoder , the Image Transformer uses stacks of self - attention and position - wise feed - forward layers , similar to aiayn .;While doing so , it consumes the previously generated pixels and the input image representation generated by the encoder .;In addition , the decoder uses an attention mechanism to consume the encoder representation .;Self - Attention;5
75258;30180f66d5b4b7c0367e4b43e2b55367b72d6d2a;Method;0;algorithm;In fact , recent studies have shown that while algorithm performance for near frontal recognition is equal to or better than humans , performance of automated systems at the extremes of illumination and pose are still well behind human performance .;However , the imagery in LFW was constructed with a well known near - frontal selection bias , which means evaluations are not predictive of performance for large in - the - wild pose variation .;The IJB - A dataset was created to provide the newest and most challenging dataset for both verification and identification .;Introduction;1
77555;31ae4873da19b1e28eca8787a17f49bba08627e5;Method;1;classification layers;This feature vector is then passed through classification layers to compute the loss .;For each sliding window , we drop out the values in all channels whose spatial locations are covered by the window and generate a new feature vector for the region proposal .;Based on the loss of all the windows , we select the one with the highest loss .;Adversarial Spatial Dropout for Occlusion;7
47673;1c7e078611c9df412e6eb3a356f31a0da0c1f99c;Method;1;semantic labeling;Compared to recent 6D pose estimation methods that resort to object detection with bounding boxes , semantic labeling provides richer information about the objects and handles occlusions better .;In order to detect objects in images , we resort to semantic labeling , where the network classifies each image pixel into an object class .;The embedding step of the semantic labeling branch , as shown in Fig .;Semantic Labeling;5
100785;424561d8585ff8ebce7d5d07de8dbf7aae5e7270;Method;0;CPMC;Widely used object proposal methods include those based on grouping super - pixels ( , Selective Search , CPMC , MCG ) and those based on sliding windows ( , objectness in windows , EdgeBoxes ) .;Comprehensive surveys and comparisons of object proposal methods can be found in .;Object proposal methods were adopted as external modules independent of the detectors ( , Selective Search object detectors , R - CNN , and Fast R - CNN ) .;Related Work;2
74790;2f92b10acf7c405e55c74c1043dabd9ded1b1800;Method;0;reading comprehension models;This is reminiscent of the ( soft ) attention mechanism used in reading comprehension models ( e.g. , Cheng2016 , wang2017gated ) .;Pooling over lemma - occurrences effectively connects different text passages ( even across texts ) that are otherwise disconnected , mitigating the problems arising from long - distance dependencies .;However , our setup is more general as it allows for the connection of multiple passages ( via pooling ) at once and is able to deal with multiple inputs which is necessary to make use of additional input texts such as relevant background knowledge .;Refined Word Embeddings ( );5
106466;45b559e6271570598602fcf9777ed6f2f2d133e6;Method;1;deep convolutional network architecture;First , we introduce a very deep convolutional network architecture with up to 14 weight layers .;In this paper we propose a number of architectural advances in CNNs for LVCSR .;There are multiple convolutional layers before each pooling layer , with small 3 3 kernels , inspired by the VGG Imagenet 2014 architecture .;Very Deep Multilingual Convolutional Neural Networks for LVCSR;0
86922;3729a9a140aa13b3b26210d333fd19659fc21471;Method;1;character - gram embeddings;"The pre - training of the character - gram embeddings is also effective ; for example , without the pre - training , the POS accuracy drops from 97.52 % to 97.38 % and the chunking accuracy drops from 95.65 % to 95.14 % .";These results clearly show that jointly using the pre - trained word and character - gram embeddings is helpful in improving the results .;;Character - gram embeddings;35
98474;40b4596a0ae4f4ff065f3f13f36db39543e50068;Method;0;asymmetric metric learning;Conventional methods include asymmetric metric learning , subspace interpolation , geodesic flow kernel , subspace alignment , covariance matrix alignment , .;In computer vision , domain adaptation has been widely studied as an image classification problem in computer vision .;Recent works aim to improve the domain adaptability of deep neural networks , including .;Related Works;2
75323;30180f66d5b4b7c0367e4b43e2b55367b72d6d2a;Method;0;joint Bayesian metric learning;DeepID2 + and DeepID3 extended the inception architecture to include joint Bayesian metric learning and multi - task learning for both identification and verification .;DeepFace uses a deep network coupled with 3D alignment , to normalize facial pose by warping facial landmarks to a canonical position prior to encoding .;These top performing convolutional network architectures have interesting common properties .;Related Work;2
45553;1abf6491d1b0f6e8af137869a01843931996a562;Method;1;DeepLab Baseline;We start with using DeepLab Baseline , and try to add pool6 to it .;Since we have reproduced both network architecture on VOC2012 , we want to see how does global context , normalization , and early or late fusion affect performance .;It improves from 64.92 % to 67.49 % by adding pool6 with normalization .;Combining Local and Global Features;9
103913;435259c5f3cffd75ef837a8e638cc8f6244e25c4;Method;1;LateFusion_Ensemble;Finally , the third method , referred to as LateFusion_Ensemble , is an ensemble of 10 semi - dense CNNs , each performing a late fusion of modalities in different paths ( Fig .;The second , denoted EarlyFusion_Ensemble , consists of an ensemble of early - fusion CNNs , trained with different subjects .;[ reference ] ) and trained with different subjects .;Results;17
69644;2cf6a8389135f682b0cb727a07f4e77c097d5434;Method;1;ALM and equivalent derivations;Based on this observation , we propose a speedup solver ( via ALM and equivalent derivations ) to highly reduce the computational cost , theoretically from to .;Actually , data size is generally much larger than feature length , i.e. .;Extensive experiments on ten benchmark datasets verify that our method not only outperforms state of the art methods , but also runs 10 , 000 + times faster than the most related method .;10 , 000 + Times Accelerated Robust Subset Selection ( ARSS );0
102063;42e80c73867bff9eaff6beceb8730fc1276283b9;Method;0;artetxe2018robust or adversarial training;Both methods build upon the recent work on unsupervised cross - lingual embedding mappings , which independently train word embeddings in two languages and learn a linear transformation to map them to a shared space through self - learning artetxe2017learning , artetxe2018robust or adversarial training conneau2018word .;More recently , the task got a renewed interest after the concurrent work of artetxe2018unmt and lample2018unsupervised on unsupervised NMT which , for the first time , obtained promising results in standard machine translation benchmarks using monolingual corpora only .;The resulting cross - lingual embeddings are used to initialize a shared encoder for both languages , and the entire system is trained using a combination of denoising autoencoding , back - translation and , in the case of lample2018unsupervised , adversarial training .;Related work;2
65318;2a69ddbafb23c63e5e22401664bea229daaeb7d6;Method;0;multi - scale feature representation ability;As designed , CNNs are equipped with basic multi - scale feature representation ability since the input information follows a coarse - to - fine fashion .;Recent years have witnessed numerous backbone networks , achieving state - of - the - art performance in various vision tasks with stronger multi - scale representations .;The AlexNet stacks filters sequentially and achieves significant performance gain over traditional methods for visual recognition .;Backbone Networks;3
45976;1b9472907f5b7a1815c98b4562dce6c46dd2cf34;Method;0;ordinal regression models;Also , ordinal regression models are common choices for text message advertising and various recommender systems .;Along with age estimation niu2016ordinal , popular applications for ordinal regression include predicting the progression of various diseases , such as Alzheimer ’s disease doyle2014predicting , Crohn ’s disease , artery disease , and kidney disease .;While the field of machine learning field developed many powerful algorithms for predictive modeling , most algorithms were designed for classification tasks .;Introduction;1
105301;4508f81033c9a7cec785ce4d16f1193920c1b341;Method;1;Recurrent ByteNets;subsection : Recurrent ByteNets;For the sake of a more complete analysis , we include two recurrent ByteNet variants ( which we do not evaluate in the experiments ) .;The ByteNet is composed of two stacked encoder and decoder networks where the decoder network dynamically adapts to the output length .;Recurrent ByteNets;12
102483;42f20d37f4eba56284a941d5f9f58609ee650de0;Method;1;dimensionality stretching strategy;The proposed dimensionality stretching strategy is schematically illustrated in Figure [ reference ] .;;Suppose the inputs consist of a blur kernel of size , a noise level and an LR image of size , where denotes the number of channels .;Dimensionality Stretching;9
70747;2d2a22f1f9eae9188f3d43254daa2d5b7f3a2470;Method;0;joint update;We can write the joint update for all as Restrict the update to define a contraction mapping in the Euclidean metric .;Let , and .;This means that there is some such that for any , or in other words , We can immediately see that this implies that for each by letting be the elementary vector that is all zero except for a 1 in position and letting be the all zeros vector .;Contraction Map Example;35
27215;0ee850dd6640a96531ac5ad21da5438db04d8b3c;Method;0;learning approach;In this paper , we refer to weak supervision as a learning approach that automatically creates its own training data using an existing unsupervised approach , which differs from imprecise data coming from external observations ( e.g. , click - through data ) or noisy human - labeled data .;In general , weak supervision refers to learning from training data in which the labels are imprecise .;We focus on query - dependent ranking as a core IR task .;Weak Supervision for Ranking;3
41819;1751668492bac56f0ae2b6410417515ab3215945;Method;1;ver 3.5.0;For English ( PTB - WSJ ) , we first convert the treebank into Stanford Dependencies ( SD ) using Stanford CoreNLP ( ver 3.8.0 ) , and then apply two well - known dependency parsers : Stanford Parser ( ver 3.5.0 ) and Parsey McParseface ( SyntaxNet ) .;To test the hypothesis , we consider three settings in dependency parsing of English and French : using POS tags predicted by the baseline model , using POS tags predicted by the AT model , and using gold POS tags .;For French ( UD ) , we use Parsey Universal from SyntaxNet .;Sentence - level & Downstream Analysis;26
84814;364da079f91a6cb385997be990af06e9ddf6e888;Method;1;MMRB14;We also tested NB - LM , which first appeared ( but without performance report ) as NBSVM in WM12 and later with a small modification produced performance that exceeds state - of - the - art supervised methods on IMDB ( which we experimented with ) in MMRB14 .;;We experimented with the MMRB14 version , which generates binary bag - of -;NB - LM;13
97798;4087ebc37a1650dbb5d8205af0850bee74f3784b;Method;0;Cyclical Batch Size Schedules;Parameter Re - Initialization through Cyclical Batch Size Schedules;;;Title;0
91710;3b1b94441010615195a5c404409ce2416860508c;Method;0;RNN - based VQA approach;Secondly , in this paper we have shown that it is possible to extend the state - of - the - art RNN - based VQA approach so as to incorporate the large volumes of information required to answer general , open - ended , questions about images .;Indeed , at the time of submitting this paper , our image captioning model outperforms the state - of - the - art on several captioning datasets .;The knowledge bases which are currently available do not contain much of the information which would be beneficial to this process , but nonetheless can still be used to significantly improve performance on questions requiring external knowledge ( such as ’;Conclusions;21
79601;33a8d0a35390fde736744d4a0dd20dff7961c777;Method;1;capsules;It is inspired by the notion of capsules developed in : capsules are new types of neurons which encapsulate more information in a local pool operation ( e.g. , a convolution operation in a CNN ) by computing a small vector of highly informative outputs rather than just taking a scalar output .;In particular , we propose a new model , referred to as Graph Capsule Convolution Neural Networks ( GCAPS - CNN ) .;Our graph capsule idea is quite general and can be employed in any version of GCNN model either design for solving graph semi - supervised problem or doing sequence learning on graphs via Graph Convolution Recurrent Neural Network models ( GCRNNs ) .;Introduction;1
58404;23c141141f4f63c061d3cce14c71893959af5721;Method;0;SPINN model;Though this experiment only covers SPINN - PI - NT , the results should be similar for the full SPINN model : most of the computation involved in running SPINN is involved in populating the buffer , applying the composition function , and manipulating the buffer and the stack , with the low - dimensional tracking and parsing components adding only a small additional load .;With a large but practical batch size of 512 , the largest on which we tested the TreeRNN , our model is about 25 faster than the standard CPU implementation , and about 4 slower than the RNN baseline .;;Inference speed;17
75592;30180f66d5b4b7c0367e4b43e2b55367b72d6d2a;Method;0;average fusion;The default strategy is average fusion such that .;We explore strategies based on winner take all ( ) , template weighted fusion ( and an experiment using the SVM geometric margin ( e.g. ) , as suggested in .;Results show that the strategy of computing a weighted average with of probe and gallery templates is the best strategy .;Fusion Study;10
79728;33a8d0a35390fde736744d4a0dd20dff7961c777;Method;1;General GCNN Model;General GCNN Model :;When used , we will use to denote the dimension of hidden ( latent ) variables / feature space .;We start by describing a general GCNN model before presenting our Graph Capsule CNN model .;Graph Capsule CNN Model;3
63225;27c761258329eddb90b64d52679ff190cb4527b5;Method;0;Recurrent Convolutional Layers;The operations of the Recurrent Convolutional Layers ( RCL ) are performed with respect to the discrete time steps that are expressed according to the RCNN;The recurrent residual convolutional operations can be demonstrated mathematically according to the improved - residual networks in [ reference ] .;[ reference ] . Let 's consider the input sample in the ℎ layer of the residual RCNN ( RRCNN ) block and a pixel located at ( , ) in an input sample on the k th feature map in the RCL .;III . RU - NET AND R2U - NET ARCHITECTURES;4
21831;0c278ecf472f42ec1140ca2f1a0a3dd60cbe5c48;Method;1;deep RL algorithm;Finally , we extend these ideas to a deep RL algorithm and show that it achieves state - of - the - art performance in Atari 2600 games .;We use this result to introduce an algorithm that performs as well as some theoretically sample - efficient approaches .;Count - BasedExplorationwiththeSuccessorRepresentation;Title;0
47269;1c0e8c3fb143eb5eb5af3026eae7257255fcf814;Method;0;pLSA;[ reference ] propose an iterative technique that applies a latent semantic clustering via latent Semantic Analysis ( pLSA ) on the windows of positive samples and selects the most discriminative cluster for each class based on its classification performance .;Wang et al .;Bilen et al .;Related Work;3
103143;43428880d75b3a14257c3ee9bda054e61eb869c0;Method;1;ROUGE - 2;We evaluate on the DUC - 2004 test data comprising 500 article - title pairs [ reference ] and report three variants of recall - based ROUGE [ reference ] , namely , ROUGE - 1 ( unigrams ) , ROUGE - 2 ( bigrams ) , and ROUGE - L ( longest - common substring ) .;We train on the Gigaword corpus [ reference ] and pre - process it identically to [ reference ] resulting in 3.8 M training examples and 190 K for validation .;We also evaluate on a Gigaword test set of 2000 pairs which is identical to the one used by [ reference ] and we report F1 ROUGE similar to prior work .;3;13
104782;44c5dec4d1295d34f052d3243d8e08f14a3c0990;Method;0;vanilla Transformers;Specifically , Transformer - XL significantly outperforms a contemporary method using vanilla Transformers , suggesting the advantage of Transformer - XL is generalizable to modeling short sequences .;Although Transformer - XL is mainly designed to better capture longer - term dependency , it dramatically improves the single - model SoTA from 23.7 to 21.8 .;We also report the results on word - level Penn Treebank in Table [ reference ] .;Main Results;8
101963;42764b57d0794b63487a295ce8c07eeb6961477e;Method;0;RoI;For the RoI pooling layer , its inputs are a predicted box and the convolutional feature map , both being functions of .;( [ reference ] ) lies on the spatial transform of a predicted box that determines RoI pooling .;In Fast R - CNN , the box proposals are pre - computed and fixed , and the backpropagation of RoI pooling layer in only involves .;End - to - End Training;7
14864;0891ed6ed64fb461bc03557b28c686f87d880c9a;Method;0;aggregating context information;ratinov2009design quantitatively compare several approaches for NER and build their own supervised model using a regularized average perceptron and aggregating context information .;eisenstein2011structured use Bayesian nonparametrics to construct a database of named entities in an almost unsupervised setting .;Finally , there is currently a lot of interest in models for NER that use letter - based representations .;Related Work;19
105818;4543052aeaf52fdb01fced9b3ccf97827582cef5;Method;1;quantized DU - Net;By choosing appropriate quantization bit - widths for weights , inputs and gradients , quantized DU - Net achieves 75 % training memory saving with comparable performance .;Different from previous efforts of quantizing only the model parameters , we are the first to quantize their inputs and gradients for better training efficiency on landmark localization tasks .;Exhaustive experiments are performed to validate DU - Net in different aspects .;Introduction;1
93504;3d5d9d8e74b215609eabba80ef79a35ebf460e49;Method;0;Cycle / Bicycle;Cycle / Bicycle is trained on pseudo paired data generated by CycleGAN .;Therefore , the variations take place in global color difference .;The quality of the pseudo paired data is not uniformly ideal .;Qualitative Evaluation;8
98525;40b4596a0ae4f4ff065f3f13f36db39543e50068;Method;0;semantic segmentation models;Such strategy has also been widely exploited for learning semantic segmentation models .;Our method is motivated by the common practice in computer vision community of initializing network weights using a network pretrained on large - scale image dataset such as ImageNet .;Recall that the original ImageNet model is pretrained on real images , so we propose to employ a distillation loss to guide the semantic segmentation models to behave like the pretrained real style model .;Target Guided Distillation;4
24128;0d0101e65e52ae0cec38bcd13c6a9d631979c577;Method;0;ResNet variant;In work subsequent to our own , investigate a ResNet variant with explicit skip connections .;Highway networks srivastava2015highway and ResNet he2015deep , he2016identity offer additional twists in the form of parameterized pass - through and gating .;These methods share distinction as the only other designs demonstrated to scale to hundreds of layers and beyond .;Related Work;2
76283;303fef411f235e6d1125a40af1e93224f498a4d5;Method;0;conditional factor;Despite the huge variety of models , as a density estimation problem , language modeling mostly relies on a universal auto - regressive factorization of the joint probability and then models each conditional factor using different approaches .;As a fundamental task in natural language processing , statistical language modeling has gone through significant development from traditional Ngram language models to neural language models in the last decade bengio2003neural , mnih2007three , mikolov2010recurrent .;Specifically , given a corpus of tokens , the joint probability factorizes as where is referred to as the context of the conditional probability hereafter .;Introduction;1
27069;0ee850dd6640a96531ac5ad21da5438db04d8b3c;Method;1;supervised neural ranking models;Our findings also suggest that supervised neural ranking models can greatly benefit from pre - training on large amounts of weakly labeled data that can be easily obtained from unsupervised IR models .;Our experiments indicate that employing proper objective functions and letting the networks to learn the input representation based on weakly supervised data leads to impressive performance , with over 13 % and 35 % MAP improvements over the BM25 model on the Robust and the ClueWeb collections .;KEYWORDS Ranking model , weak supervision , deep neural network , deep learning , ad - hoc retrieval;Neural Ranking Models with Weak Supervision;0
13838;07cca2bdd0dc2fee02889e17789748eba9d06ffa;Method;0;CDR;"Most of these studies have been carried out in complex urban study areas using either : mobile applications on smart phones ( Manzoni et al . , 2011 ; Stenneth et al . , 2011 ) ; strictly GPS devices alone ( Chung & Shalaby , 2005 ; Liao et al . , 2007 ; Schüssler & Axhausen , 2009 ; Stopher , Clifford et al . , 2008a ; Zheng et al . , 2010 ) or integrated with other devices , such as accelerometers ( Reddy et al . , 2010 ) , or ; others through mobile phone call detail records ( CDR ) ( Wang , Calabrese , Di Lorenzo , & Ratti , 2010 ) .";Limitations of data collection for inferring the transportation mode Over the last decade , a plethora of studies have attempted to infer the transportation mode from GPS data collected by travel surveys .;Such studies resulted in collecting a large amount of diverse data to test different approaches .;Title;0
13004;072fd0b8d471f183da0ca9880379b3bb29031b6a;Method;0;commodity tool;Nonetheless , they demonstrate the promise of our approach as a generic commodity tool for image - to - image translation problems .;Note that these applications are creative projects , were not obtained in controlled , scientific conditions , and may rely on some modifications to the pix2pix code we released .;;Community - driven Research;16
70226;2d294bde112b892068636f3a48300b3c033d98da;Method;1;left eye model;Moreover , for the right eye cluster , the right eye model improves the accuracy more significantly than the left eye model .;Taking the left eye model as an example , it additionally reduces the errors of landmarks of right eye , mouth , and chin , which is due to the correlations among different facial parts .;It can be concluded that each shape prediction layer emphasizes on the corresponding cluster respectively .;Analysis of Shape Prediction Layers;21
27828;0f0a25d3be0d50a134f6f68e6a82bd8a2f668882;Method;1;Art ’ sub - network;For the ‘ Art ’ sub - network , we used a learning rate scaled by a factor of since the base network ( VGG ) is primarily trained for non - Art images .;The same training procedure and hyperparameters as in the baseline were used for training SwiDeN architectures C1 - S , C2 - S … C5 - S ( Section [ reference ] ) with the exception of the the learning rates for the depictive style sub - networks .;The learning rate was stepped down by a factor of when the validation accuracy plateaued .;SwiDeN : training;12
34473;12f008bea798a05ebfa2864ec026999cb375bcd9;Method;1;adaptive gradients;We used stochastic gradient descent with ADAM updates for optimization , which combines classical momentum and adaptive gradients kingma2014adam .;Our model was implemented using the Theano 2016arXiv160502688short and Lasagne Python libraries .;The batch size was 32 and the initial learning rate was which was halved every epoch after the second epoch .;Implementation Details;18
19311;0a6d7e8e61c54c796f53120fdb86a25177e00998;Method;0;eigen - decomposition;The eigen - decomposition in the complex domain as taught today in linear algebra courses came 40 years later .;In the early age of spectral theory in linear algebra , complex numbers were not used for matrix factorization and mathematicians mostly focused on bi - linear forms .;Similarly , most of the existing approaches for tensor factorization were based on decompositions in the real domain , such as the Canonical Polyadic ( CP ) decomposition .;Related Work;11
72716;2e10643c3759f97b673ff8c297778c0b6c20032b;Method;0;large models;All ConvNets in the figure are the large models with thesaurus augmentation respectively .;Each of these plots is computed by taking the difference between errors on comparison model and our character - level ConvNet model , then divided by the comparison model error .;Character - level ConvNet is an effective method .;Discussion;12
42911;18d62040534012818abb90e37eade5dab6dca716;Method;1;question generation model;We augment their system by training a discriminative reranker with the model score of the question generation model and the well - formedness probability of our classifier as features to optimize BLEU score between the selected question from the - best list and the reference question on the development set .;Their current best model selects the top ranked question from the - best list produced by the decoder as the output .;We then use this reranker to select the best question from the - best list of the test set .;Improving Question Generation;9
12692;071b16f25117fb6133480c6259227d54fc2a5ea0;Method;1;bidirectional recurrent neural network ( BiRNN );First , the forward states of the bidirectional recurrent neural network ( BiRNN ) are computed : where is the word embedding matrix . , are weight matrices .;The model takes a source sentence of 1 - of - K coded word vectors as input and outputs a translated sentence of 1 - of - K coded word vectors where and are the vocabulary sizes of source and target languages , respectively . and respectively denote the lengths of source and target sentences .;and are the word embedding dimensionality and the number of hidden units , respectively .;Encoder;26
55524;2298490e82ff3fd03a3a28bd9c9f307bd897a753;Method;0;unit region approximation;An alternative choice is directly using as the measurement of different attributes , but the new problem is can reach to infinite in theory and the unit region approximation can not be implemented .;Note that all the examples with larger than the division point have the same gradient norm , which makes the distinguishing of examples with different attributes impossible if depending on the gradient norm .;To conveniently apply GHM on regression loss , we first modify the traditional loss into a more elegant form : This loss shares similar property with loss :;GHM - R Loss;13
58805;23dcfda130aada27c158c0b5f394cac489c9c795;Method;1;multi - loss ResNet50 networks;We present two multi - loss ResNet50 networks with different regression coefficients of and trained on the 300W - LP dataset .;We expect that this gap will be closed when more data is available .;For BIWI we also present a multi - loss ResNet50 ( ) trained on AFLW .;Fine - Grained Pose Estimation on the AFLW2000 and BIWI Datasets;10
82186;35502af359aa60ae8047df172e29503cfb29c3f9;Method;1;mean shift module;We can observe that the mean shift module produces sharper distributions , driving the similarity between positive pairs to 1 making it trivial to identify instances .;We plot the distribution of pairwise similarities for positive and negative pairs during forward propagation through 10 iterations .;;Details of Recurrent Mean Shift Grouping;24
49570;1db9bd18681b96473f3c82b21edc9240b44dc329;Method;0;training models;Another , popular direction of research in image generation is training models with an adversarial loss gan .;These modifications are readily applicable to our model , which we plan to evaluate in future work .;Typically , in this regime a generator network is trained in opposition to a discriminator network trying to determine if a given image is real or generated .;Background;2
16295;09da677bdbba113374d8fe4bb15ecfbdb4c8fe40;Method;1;grouped convolution layer;To enhance the leaning capacity of each micro - block , we use the grouped convolution layer in the second layer as the ResNeXt .;The output of the last convolutional layer is split into two parts : the first part is element - wisely added to the residual path , and the second part is concatenated with the densly connected path .;Considering that the residual networks are more wildly used than the densely connected networks in practice , we choose the residual network as the backbone and add a thin densely connected path to build the dual path network .;Dual Path Networks;6
88242;380b2c78d21ae6c43d418b6f0cb0222d5293d345;Method;0;transition - based dependency parser;lstmacl15 presented stack LSTMs and used them to implement a transition - based dependency parser .;;The parser uses a greedy learning strategy which potentially provides very high parsing speed while still achieving state - of - the - art results .;Conclusions;9
107042;45e8ef229fae18b0a2ab328037d8e520866c3c81;Method;1;Ours - 29;A larger model with our pyramid module ( Ours - 29 , ×1664d ) achieves test error , which is the state - of - the - art result on CIFAR - 10 .;Our method with similar or less model size ( Ours - 28 - 9 and Ours - 29 , ×864d ×1664d ) achieve better results .;;Experiments on CIFAR - 10 Image Classification;15
15441;08d55271589f989d90a7edce3345f78f2468a7e0;Method;1;convolution part;In order to generate a quality score , the convolution part contains a 2 - stride pooling layer and a final pooling layer with kernel size .;[ reference ] , the output spatial of Pool4 layer is .;A fully connected layer is followed by the final pooling layer to generate the original quality score .;Details of quality generation part;7
102104;42e80c73867bff9eaff6beceb8730fc1276283b9;Method;1;self - learning;So as to build our initial phrase - table , we follow artetxe2018usmt and learn n - gram embeddings for each language independently , map them to a shared space through self - learning , and use the resulting cross - lingual embeddings to extract and score phrase pairs .;;More concretely , we train our n - gram embeddings using phrase2vechttps: // github.com / artetxem / phrase2vec , a simple extension of skip - gram that applies the standard negative sampling loss of mikolov2013distributed to bigram - context and trigram - context pairs in addition to the usual word - context pairs .;Initial phrase - table;4
75777;302207c149bdf7beb6e46e4d4afbd2fa9ac02c64;Method;1;StarGAN;To alleviate this problem , we introduce a mask vector that allows StarGAN to ignore unspecified labels and focus on the explicitly known label provided by a particular dataset .;Mask Vector .;In StarGAN , we use an - dimensional one - hot vector to represent , with being the number of datasets .;Training with Multiple Datasets;5
45542;1abf6491d1b0f6e8af137869a01843931996a562;Method;1;FCN - 32s network;We use the FCN - 32s network with the parameters found in PASCAL - Context .;We do not use the geometric categories during training .;Instead of using two stages of learning as done in , we combine the feature directly from different layers for learning .;Combining Local and Global Features;9
63416;27c761258329eddb90b64d52679ff190cb4527b5;Method;1;ADAM optimization technique;We used the ADAM optimization technique with a learning rate of 2×10 - 4 and binary cross entropy loss .;In this implementation , this dataset is preprocessed with mean subtraction and normalized according to the standard deviation .;In addition , we also calculated MSE error during the training and validation phase .;4 ) Skin Cancer Lesion Segmentation;13
50544;1e7678467b1807777dcd9be557b79328ce9419a8;Method;1;pooled feature map;Setting this exponent as increases the contrast of the pooled feature map and focuses on the salient features of the image .;Formally , the GeM embedding is given by where is a parameter .;GeM is a generalization of the average pooling commonly used in classification networks ( ) and of spatial max - pooling layer ( ) .;Image retrieval;11
81917;35502af359aa60ae8047df172e29503cfb29c3f9;Method;0;Hough voting;predict an embedding based on scene depth and direction towards the instance center ( like Hough voting ) .;train a regressor that predicts the distance to the contour centerline for boundary detection , while predict the distance transform of the instance masks which is then post - processed with watershed transform to generate segments .;Finally , we note that these ideas are related to work on using embedding for solving pairwise clustering problems .;Related Work;2
103887;435259c5f3cffd75ef837a8e638cc8f6244e25c4;Method;1;ITK - SNAP;Geometric defects were also removed with the help of surface rendering , using ITK - SNAP .;Based on this initial automatic segmentation , manual editing was then performed by an experienced neuro - radiologist , to correct segmentation errors in both T1 - and T2 - weighted MR images .;For example , if a hole / handle was found on the surface , the neuro - radiologist first localized the related slices and then checked the segmentation maps of both T1w and T2w images , in order to determine whether to fill the hole or cut the handle .;Ground truth generation;12
81034;34cf90fcbf83025666c5c86ec30ac58b632b27b0;Method;1;Deep Learning Architecture;: For the CUHK03 dataset , we compare our method with many existing approaches , including Filter Pair Neural Networks ( FPNN ) , Improved Deep Learning Architecture ( IDLA ) , Cross - view Quadratic Discriminant Analysis ( XQDA ) , PSD constrained asymmetric metric learning ( denoted as MLAPG ) , Sample - Specific SVM ( SS ) , Single image and Cross image representation ( SI - CI ) , Embedding Deep Metric ( EDM ) , Domain Guided Dropout ( DGD ) , DNS , S - LSTM and Gate - SCNN .;CUHK03;On this dataset , we conduct experiments on both the detected and the labeled datasets .;Comparison with State - of - the - art Methods;11
64188;28703eef8fe505e8bd592ced3ce52a597097b031;Method;1;LSTM layers;For all word - ordering experiments we use 2 - layer encoder and decoder LSTMs , each with 256 hidden units , and dropout with a rate of 0.2 between LSTM layers .;For experiments , we use the same PTB dataset ( with the standard training , development , and test splits ) and evaluation procedure as in zhang15discriminative and later work , with performance reported in terms of BLEU score with the correctly ordered sentences .;We use simple 0 / 1 costs in defining the function .;Word Ordering;12
54058;218b80da3eb15ae35267d280dcc4a806d515334a;Method;1;multi - round model inference;The characteristic allows us to edit a sentence more than once through multi - round model inference , which motivates our fluency boost inference .;Fortunately , neural GEC is different from NMT : its source and target language are the same .;As Figure [ reference ] ( b ) shows , fluency boost inference allows a sentence to be incrementally edited through multi - round seq2seq inference as long as the sentence ’s fluency can be improved .;Multi - round error correction;9
92994;3d18ce183b5a5b4dcaa1216e30b774ef49eaa46f;Method;0;regression based 3DMM fitting;Recently , regression based 3DMM fitting , which estimates the model parameters by regressing the features at landmark positions , has been proposed to improve the efficiency .;3DMM can cover arbitrary poses but suffers from the one - minute - per - image computation cost .;However , since the features at landmarks may be self - occluded as in 2D methods , the fitting algorithm is no longer pose - invariant and suffers from the three problems in Section [ reference ] .;Related Works;2
29551;0fbd17a4f791e04bbf8f240f7c48c178900e30a6;Method;0;SPPN ) method;( SPPN ) method per person to get the final pose predictions .;Top - down methods first detect people ( typically using a top performing , off - the - shelf object detector ) and then run a single person pose estimation;Since a SPPN model is run for each person instance , top - down methods are extremely slow , however , each pose estimator can focus on an instance and perform fine localization .;Top - down;6
81220;34f63959ea4a13a05948274a1558c6854a051150;Method;1;pre - trained model;To apply a pre - trained model to specific NLU tasks , we often need to fine - tune , for each task , the model with additional task - specific layers using task - specific training data .;For example , BERT is based on a multi - layer bidirectional Transformer , and is trained on plain text for masked word prediction and next sentence prediction tasks .;For example , bert2018 shows that BERT can be fine - tuned this way to create state - of - the - art models for a range of NLU tasks , such as question answering and natural language inference .;Introduction;1
22067;0c278ecf472f42ec1140ca2f1a0a3dd60cbe5c48;Method;1;pseudo - count based methods;DQN + SR , when using the - norm of the SR , exhibits performance comparable to pseudo - count based methods , despite not being the best results we obtained .;These results also support the claim that the norm of the SR can be used to generate exploration bonuses .;We also revisited the results presented in Section [ reference ] to evaluate the impact of the different norms in Sarsa + SR .;On the Mismatch between the Used Norms;14
40508;16cd50316e41cbb1d9dfeafeb524b31654cef37a;Method;1;convolutional neural net ( CNN ) and long - short - term memory ( LSTM ) models;Section [ reference ] describes the convolutional neural net ( CNN ) and long - short - term memory ( LSTM ) models .;Section [ reference ] describes our measurement of human performance .;Section [ reference ] describes our implementation of i - vector adaptation .;Introduction;1
55739;22aab110058ebbd198edb1f1e7b4f69fb13c0613;Method;0;high - resolution GANs;karras2018progan trains high - resolution GANs in the single - class setting by training a single model across a sequence of increasing resolutions .;ProGAN;In conditional GANs mirza2014conditional class information can be fed into the model in various ways .;Background;2
57207;2329a46590b2036d508097143e65c1b77e571e8c;Method;0;open source speech recognition software;That system was built using Kaldi , state - of - the - art open source speech recognition software .;( DNN - HMM FSH ) achieves 19.9 % WER when trained on the Fisher 2000 hour corpus .;We include this result to demonstrate that Deep Speech , when trained on a comparable amount of data is competitive with the best existing ASR systems .;Conversational speech : Switchboard Hub5’00 ( full );13
13266;074b6fe0cc6848fb86a6703d1c52074494177c79;Method;1;weak classification model;Our modified version , which uses the source labels to train a weak classification model which can be used to enforce semantic consistency before and after translation , resolves this issue and produces strong performance .;However , the MNIST - like image has mismatched semantics .;Ablation : No Cycle Consistency .;Digit Adaptation;5
5295;02e85d62fbd8249a046d00ac10e39546511b2a51;Method;0;3D version;This allows developing 2D variants with increased width , depth and size of training batch with similar requirements as the 3D version , which are valid candidates for model selection in practical scenarios .;so does the memory required .;We assess various configurations and present some representatives in Table [ reference ] along with their performance .;Processing 3D in comparison to 2D Context;15
15420;08d55271589f989d90a7edce3345f78f2468a7e0;Method;1;propagation process;The gradients back propagated through set pooling unit can be formulated as follows , So we can formulate propagation process of the final loss as Where is the dimension of images ’ representation .;Keeping this in mind , we consider the set pooling operation .;We discuss how a quality score is automatically learned by this back propagation process .;Training QAN without quality supervision;5
38087;15cc54ed7b1582b2efd71bedf28b23634d82991b;Method;1;spectral normalization;We propose a generalized power iteration method to directly estimate the spectral norm of a convolution kernel ( see Appendix [ reference ] for details ) and applied spectral normalization to the discriminator in all experiments .;However , to estimate the spectral norm of a convolution kernel , reshaped the kernel into a matrix .;In Appendix [ reference ] , we explore using gradient penalty to impose the Lipschitz constraint ( ) for the proposed repulsive loss .;Spectral Normalization in Discriminator;6
98248;40b0fced8bc45f548ca7f79922e62478d2043220;Method;1;HOGgles;In Figure [ reference ] , we perform a nonparametric reconstruction of images from features in the spirit of HOGgles .;, we provide a novel visual investigation of the effective pooling regions of convnet features .;Rather than paired dictionary learning , however , we simply replace patches with averages of their top - nearest neighbors in a convnet feature space .;Feature visualization;7
59807;2451db113552afb6d9ad15ef4009ec4133d28f74;Method;1;Newton - Schulz iterations;In the first part of experiments , we analyze , with AlexNet architecture , the design choices of our iSQRT - COV method , including the number of Newton - Schulz iterations , time and memory usage , and behaviors of different pre - normalization methods .;;We select AlexNet because it runs faster with shallower depth , and the results can extrapolate to deeper networks which mostly follow its architecture design .;Evaluation with AlexNet on ImageNet;14
65132;29c19276b8fff231717c3e342cb24144d2b77726;Method;1;mean log frequency;Figure [ reference ] shows absolute improvements in accuracy of bi - LSTM over mean log frequency , for different language families .;In order to evaluate the effect of modeling sub - token information , we examine accuracy rates at different frequency rates .;We see that especially for Slavic and non - Indoeuropean languages , having high morphologic complexity , most of the improvement is obtained in the Zipfian tail .;Rare words;8
39888;165ef2b5f86b9b2c68b652391db5ece8c5a0bc7e;Method;0;multi - scale and sliding pooling;We start from a baseline setting of our FeatMap - Net ( “ FullyConvNet Baseline ” in the result table ) , for which multi - scale and sliding pooling is removed .;We present the results of adding different components of FeatMap - Net in Table [ reference ] .;This baseline setting is the conventional fully convolution network for segmentation , which can be considered as our implementation of the FCN method in .;Component Evaluation;16
30600;100c730003033151c0f78ed1aab23df3e9bd5283;Method;1;perplexity;Since is intractable in the NVDM , we use the variational lower bound ( which is an upper bound on perplexity ) to compute the perplexity following mnih2014neural .;In document modelling , perplexity is computed by , where is the number of documents , represents the length of the th document and is the log probability of the words in the document .;While all the baseline models listed in Table [ reference ] apply discrete latent variables , here NVDM employs a continuous stochastic document representation .;Experiments on Document Modelling;7
57551;2393447b8b0b79046afea1c88a8ed3949338949e;Method;1;answer decoder;Also , the ranker shares the question - passages reader with the answer decoder , and this sharing contributed to the improvements over the ranker trained without the answer decoder .;Our ranker improved the initial ranking provided by Bing by a significant margin .;This result is similar to those reported in NishidaSOAT18 .;Does our joint learning improve the passage re - ranking performance ?;40
99850;41b38da2f4137c957537908f9cb70cbd2fac8bc1;Method;0;Geometric and appearance - based features;Geometric and appearance - based features are commonly used in facial expression recognition .;As the technology advances , vision systems will be able to sense subtle emotions and sentiments that humans can not .;In this study , we focus on spatial features , which are a type of geometric feature .;Introduction;1
25315;0d5fa5be4bfe085de8f88dbee1c3b2a6e5ab9ee2;Method;1;50 + layers;Even for PSPNet with 50 + layers , inference time and memory are 18ms and 0.6 GB for the large images in Cityscapes .;Although the top branch is based on a full segmentation backbone , the input resolution is low , resulting in limited computation .;Because weights and computation ( in 17 layers ) can be shared between low - and medium - branches , only 6ms is spent to construct the fusion map .;Network Architecture;10
70559;2d2a22f1f9eae9188f3d43254daa2d5b7f3a2470;Method;1;propagation model;For all the GGS - NNs in this section we used the simpler variant in which and share a single propagation model .;For bAbI task 4 , 15 , 16 , 18 , 19 , we used GG - NN with the size of node vectors set to , , , and respectively .;For shortest path and Eulerian circuit tasks , we used .;More Training Details .;16
5907;03184ac97ebf0724c45a29ab49f2a8ce59ac2de3;Method;1;weakly - supervised Word2Vec ( );For our proposed weakly - supervised Word2Vec ( ) , we use the same embedding dimensions as the plain Word2Vec ( ) .;We cross - validate the skip - window size and embedding dimensions .;For BoW , we download the Wikipedia articles that correspond to each class and build the vocabulary by omitting least - and most - frequently occurring words .;Experimental Setting;12
77298;3112d2d95d66b3d54a72c55072647aab937e410e;Method;0;bidirectional LSTM model;These vectors are concatenated ( into a vector in R 500 ) and fed into either a convolutional model or a bidirectional LSTM model .;Each model consumes the words in the sentence , which are embedded in R 200 , as well as the distances of each word in the sentence from both the entity - word - span and the number - word - spans ( as described above ) , which are each embedded in R 100 .;The convolutional model uses 600 total filters , with 200 filters for kernels of width 2 , 3 , and 5 , respectively , a ReLU nonlinearity , and maxpooling .;C. Information Extraction Details;18
78617;32a93598e8a338496f04a0ace81b0768c2ef059d;Method;0;cutting - edge hardware;The sheer size of the models requires cutting - edge hardware for training and makes using the models on standard setups very challenging .;Zhou2016 obtained state - of - the - art results on English French with a - layer LSTM with units per layer .;This issue of excessively large networks has been observed in several other domains , with much focus on fully - connected and convolutional networks for multi - class classification .;Introduction;1
115;000f90380d768a85e2316225854fc377c079b5c4;Method;0;CRFs;Most commonly , conditional random fields ( CRFs ) [ reference ] are applied on the network output;Many approaches apply smoothing operations to the output of a CNN in order to obtain more consistent predictions .;[ reference ][ reference ][ reference ][ reference ][ reference ] . More recently , some papers approximate the mean - field inference of CRFs using specialized network architectures;Related Work;3
35053;13ea9a2ed134a9e238d33024fba34d3dd6a010e0;Method;0;feature embedding;To take advantage of both the feature learning and similarity learning , Zheng and Geng combine the contrastive loss and the identification loss to improve the discriminative ability of the learned feature embedding , following the success in face verification .;To deal with spatial misalignment , Zheng propose the PoseBox structure similar to the pictorial structure to learn pose invariant embeddings .;This paper adopts the classification mode , which is shown to produce competitive accuracy without losing efficiency potentials .;Related Work;2
63141;27c761258329eddb90b64d52679ff190cb4527b5;Method;0;RNN );The performance of FCN has improved with recurrent neural networks ( RNN ) , which are fine - tuned on very large datasets [ reference ] . Semantic image segmentation with DeepLab is one of the state - of - the - art performing methods;The main drawback of this approach is that a large number of pixel overlap and the same convolutions are performed many times .;[ reference ] . SegNet consists of two parts , one is the encoding network which is a 13 - layer VGG16 network [ reference ] , and the corresponding decoding network uses pixel - wise classification layers .;II . RELATED WORK;3
22471;0c769c19d894e0dbd6eb314781dc1db3c626df57;Method;0;sampling strategies;Different sampling strategies could significantly impact the convergence rate and quality , but finding efficient sampling strategies becomes much more difficult as increases .;However , they are not efficient as only several data samples are compared at each time , and there are potential input combinations , where is the number of images .;Another approach is learning to classify identities with the Softmax loss function , which effectively compares all the samples at the same time .;Introduction;1
20716;0b5519f76fc8e31ecf9931f00184aee86694e3a4;Method;1;2D t - SNE embedding;Finally , in order to summarize the spatially varying structure of the filters , we use the 2D t - SNE embedding to assign a color to each centroid ( as given by the reference color chart shown top - left ) , and visualize the nearest centroid for the filter at each filter location in the third row grid in Fig .;PCA reveals smooth , symmetric harmonic structure for super - resolution with some intriguing vertical and horizontal features .;[ reference ] .;Visualization and Analysis;14
16508;09ec60f2eea5d43792b2bc9da63b1c9b7719f666;Method;0;analytically - differentiable rendering layer;With the projection parameter , 3D shape , and texture , a novel analytically - differentiable rendering layer is designed to reconstruct the original input face .;Two decoders serve as the nonlinear 3DMM to map from the shape and texture parameters to the 3D shape and texture , respectively .;The entire network is end - to - end trainable with only weak supervision .;Nonlinear 3D Face Morphable Model;0
54599;220a0b46840a2a1421c62d3d343397ab087a3f17;Method;1;temporal filters;The temporal filters show a clear derivative - like structure in time .;We also observe filters that spatially resemble second derivative or Gabor filters .;Note that these filters are very different from those reported in ( Sup .;Visualization of Learned Filters .;13
74756;2f92b10acf7c405e55c74c1043dabd9ded1b1800;Method;0;integration model;Rather than learning both how to incorporate relevant information and which information is relevant , we use a heuristic retrieval mechanism ( § [ reference ] ) and focus on the integration model .;The retrieval and preparation of contextually relevant information from knowledge sources is a complex research topic by itself , and there are several statistical Manning:2008 and more recently neural approaches mitra2017neural as well as approaches based on reinforcement learning nogueira2017 .;In the next section , we turn to the question of how to leverage the retrieved supplementary knowledge ( encoded as text ) in a NLU system .;External Knowledge as Supplementary Text Inputs;2
58969;23f5854b38a15c2ae201e751311665f7995b5e10;Method;0;Vaes;Vaes generalize linear latent - factor models and enable us to explore non - linear probabilistic latent - variable models , powered by neural networks , on large - scale recommendation datasets .;Here , we extend variational autoencoders ( vaes ) [ reference ][ reference ] to collaborative filtering for implicit feedback .;We propose a neural generative model with multinomial conditional likelihood .;INTRODUCTION;2
23219;0ca2bd0e40a8f0a57665535ae1c31561370ad183;Method;1;COPY operation;The COPY operation , which simply performs ( c t ,;For example , in the case of the COPY operation , we do not need to compute any of these values and thus can save computations .;h t ) ← ( c t−1 , h t−1 ) , implements the observation that an upper layer should keep its state unchanged until it receives the summarized input from the lower layer .;THE PROPOSED MODEL;5
102107;42e80c73867bff9eaff6beceb8730fc1276283b9;Method;1;skip - gram;More concretely , we train our n - gram embeddings using phrase2vechttps: // github.com / artetxem / phrase2vec , a simple extension of skip - gram that applies the standard negative sampling loss of mikolov2013distributed to bigram - context and trigram - context pairs in addition to the usual word - context pairs .;So as to build our initial phrase - table , we follow artetxe2018usmt and learn n - gram embeddings for each language independently , map them to a shared space through self - learning , and use the resulting cross - lingual embeddings to extract and score phrase pairs .;Having done that , we map the embeddings to a cross - lingual space using VecMap with identical initialization artetxe2018robust , which builds an initial solution by aligning identical words and iteratively improves it through self - learning .;Initial phrase - table;4
78018;31e5dab321066712cdc8b30943f7950066840ee1;Method;1;tree;The best performing model is GCNSeq , both as a tree and as a graph encoder , with the latter obtaining the highest results .;The results show a clear advantage of tree and graph encoders over the sequential encoder .;Table [ reference ] shows the comparison between our best sequential ( Seq ) , tree ( GCNSeq without reentrancies , henceforth called Tree ) and graph encoders ( GCNSeq with reentrancies , henceforth called Graph ) on the test set of LDC2015E86 and LDC2017T10 .;Experiments;13
71805;2dad7e558a1e2982d0d42042021f4cde4af04abf;Method;1;dilated recurrent skip connection;"The main ingredients of the DilatedRNN are its dilated recurrent skip connection and its use of exponentially increasing dilation ; these will be discussed in the following two subsections respectively .";;;Dilated Recurrent Neural Networks;2
102637;42f20d37f4eba56284a941d5f9f58609ee650de0;Method;0;self - similarity based method SelfEx;For image “ Chip ” which contains repetitive structures , a self - similarity based method SelfEx is also included for comparison .;“ Cat ” which is corrupted by compression artifacts , Waifu2x is also used for comparison .;It can be observed from the visual results that SRMD can produce much more visually plausible HR images than the competing methods .;Experiments on Real Images;17
97701;40193e7ba0fbd7153a1fe15e95563463b67c71f3;Method;0;2DASL;"We believe more "" in - the - wild "" face images used for training ensures better performance of 2DASL .";• to 90 • ) , 2DASL achieves 0.2 lower NME than PRNet .;;Dense face alignment;15
79942;33a8d0a35390fde736744d4a0dd20dff7961c777;Method;1;strong graph kernels;It again show a consistent performance gain of accuracy ( highest being on PTC dataset ) on many bioinformatic datasets when compared against with strong graph kernels .;Our GCAPS - CNN is also very competitive with state - of - art graph kernel methods .;While other considered deep learning methods are not even close enough to beat graph kernels on many of these datasets .;Experiment and Results;12
54585;220a0b46840a2a1421c62d3d343397ab087a3f17;Method;0;learning of residual flow;the warping function and learning of residual flow .;–;By using the warping function directly , the convnet does not need to learn it .;Model Size;12
47908;1cf6bc0866226c1f8e282463adc8b75d92fba9bb;Method;1;spatial attention architecture;We propose a novel spatial attention architecture that aligns words with image patches in the first hop , and obtain improved results by adding a second attention hop which considers the whole question to choose visual evidence based on the results of the first hop .;Our Spatial Memory Network stores neuron activations from different spatial regions of the image in its memory , and uses the question to choose relevant regions for computing the answer , a process of which constitutes a single “ hop ” in the network .;To better understand the inference process learned by the network , we design synthetic questions that specifically require spatial inference and visualize the attention weights .;Ask , Attend and Answer : Exploring Question - Guided Spatial Attention for Visual Question Answering;0
27230;0ee850dd6640a96531ac5ad21da5438db04d8b3c;Method;1;base neural network model;Then , we describe the architecture of the base neural network model shared by different ranking models .;In this section , we first introduce our ranking models .;Finally , we discuss the three input layer architectures used in our neural rankers to encode ( query , candidate document ) pairs .;Neural Ranking Models;4
14391;07f3f736d90125cb2b04e7408782af411c67dd5a;Method;1;unfolding - RAE;Nevertheless , our generic matching models still manage to perform reasonably well , achieving an accuracy and F1 score close to the best performer in 2008 based on hand - crafted features , but still significantly lower than the state - of - the - art ( 76.8% / 83.6 % ) , achieved with unfolding - RAE and other features designed for this task .;As stated earlier , our model is not specially tailored for modeling synonymy , and generally requires instances to work favorably .;;Experiment III : Paraphrase Identification;22
34356;12f008bea798a05ebfa2864ec026999cb375bcd9;Method;0;neural network readers;However , existing neural network readers are restricted to either attend to tokens hermann2015teaching , chen2016thorough or entire sentences weston2014memory , with the assumption that certain sub - parts of the document are more important than others .;As an example , human readers are able to keep the question in mind during multiple passes of reading , to successively mask away information irrelevant to the query .;In contrast , we propose a finer - grained model which attends to components of the semantic representation being built up by the GRU .;Gated - Attention Reader;3
61441;269c7aeca29dae51dca8208815f1c4c81bd471c2;Method;1;multi - task based learning algorithm;In this part , we describe a multi - task based learning algorithm to jointly learn these features .;According to Equation [ reference ] , feature output from the last FC layer is decomposed into and .;An overview of the proposed CNN model is illustrated in Figure [ reference ] .;Multi - Task Learning;4
51209;1e7a36c4d4f96b29e3edf51b6eb61f8e16217704;Method;0;Character level models;Character level models do have an inherent advantage of being able to capture subword language information , motivating their use on traditionally word - level tasks .;From the perspective of training , character level language models must model longer range dependencies , and must learn a more complex non - linear fit to capture joint dependencies between characters .;Character level language models can be compared with word level language models by converting bits per character to perplexity .;WikiText - 2;12
92577;3c1d781f2dab8da12e3cb0e4d7abfb440a340a09;Method;0;probability distribution strategies;We also experiment with majority voting and averaging the probability distribution strategies for ensemble models using the same set of models as our weighted averaging ensemble method ( as described above ) .;Formally , we replace Equations [ reference ] and [ reference ] in the paper with the following equations respectively : Our final ensemble model , DR - BiLSTM ( Ensemble ) is the combination of the following 6 models : tanh - Projection , DR - BiLSTM ( with 1 round of dependent reading ) , DR - BiLSTM ( with 3 rounds of dependent reading ) , and 3 DR - BiLSTMs with different initialization seeds .;Figure [ reference ] shows the behavior of the majority voting strategy with different number of models .;Ensemble Strategy Study;18
90451;3a61d5fbc8d99310965fd91b12527d1cd69d7116;Method;0;C version;On the other hand , CornerNet - Squeeze is implemented in Python and still faster than the C version of YOLOv3 .;YOLOv3 is implemented in C and also provides a Python API , which adds a 10ms overhead to the inference time .;There is a potential speed - up if we implement CornerNet - Squeeze purely in C. Ablation Study;CornerNet - Squeeze Results;18
74385;2f0c30d6970da9ee9cf957350d9fa1025a1becb4;Method;0;Category - Aware RPN;Category - Aware RPN is almost the same as the region proposal network in , except that the 2 - class ( object or not );A following softmax layer then outputs the per - pixel probabilities .;convolutional classifier is replaced by a - class convolutional classifier .;Deformable ConvNets;5
61475;269c7aeca29dae51dca8208815f1c4c81bd471c2;Method;1;age regression task;Though we train the identity - related component with a loss function similar to A - Softmax , the proposed algorithm takes advantage of the age information to explicitly train age - related component with an additional age regression task ( Equation [ reference ] ) .;SphereFace introduces A - Softmax loss to learn the angular margin between identities for GFR .;To intuitively investigate the impact by introducing such additional age regression task , we construct a toy example to compare features learned by Softmax , A - Softmax and our proposed algorithm .;Discussion;5
73987;2ebfc12285f5d426e0d0e8d2befa1af27f99a56e;Method;0;unsupervised fashion;A novel definition of an appearance medial axis transform ( AMAT ) has been proposed in , to detect symmetry in the wild in a purely bottom up , unsupervised fashion .;Lee et al . improve the approach in by using a deformable disc model , which can detect curved and tapered symmetric parts .;In , the authors present an unconventional method based on joint co - skeletonization and co - segmentation .;Related Work;2
59028;23f5854b38a15c2ae201e751311665f7995b5e10;Method;0;multinomial logit choice model;This multinomial likelihood is commonly used in language models , e.g. , latent Dirichlet allocation [ reference ] , and economics , e.g. , multinomial logit choice model;The log - likelihood for user u ( conditioned on the latent representation ) is :;[ reference ] . It is also used in the cross - entropy loss 2 for multi - class classification .;Model;4
676;007ab5528b3bd310a80d553cccad4b78dc496b02;Method;1;bidirectional LSTM layer;For the end index of the answer phrase , we pass to another bidirectional LSTM layer and obtain .;We obtain the probability distribution of the start index over the entire paragraph by where is a trainable weight vector .;Then we use to obtain the probability distribution of the end index in a similar manner : Training .;6 . Output Layer .;8
2475;0171bdeb1c6e333287be655c667cfba5edb89b76;Method;1;multi - scale , multi - crop testing;We had a single - model top - 1 / top - 5 error rates of 17.7% / 3.7 % using the multi - scale dense testing in [ reference ] , on par with Inception - ResNet - v2 's single - model results of 17.8% / 3.7 % that adopts multi - scale , multi - crop testing .;We note that many models ( including ours ) start to get saturated on this dataset after using multi - scale and / or multicrop testing .;We had an ensemble result of 3.03 %;Experiments on ImageNet - 1 K;11
26812;0ecd4fdce541317b38124967b5c2a259d8f43c91;Method;1;Inter - Algorithm Normalization;subsubsection : Inter - Algorithm Normalization;"For example , humans often play games without seeking to maximize score ; humans also benefit from prior knowledge that is difficult to incorporate into domain - independent agents .";A third alternative is to normalize using the scores achieved by the algorithms themselves .;Inter - Algorithm Normalization;26
9294;05357b8c05b5bc020e871fc330a88910c3177e4d;Method;1;alternatively training strategy;Compared with the alternatively training strategy , we demonstrate that our method can not only reduce the training time , but also boost the performance .;The proposed network is end - to - end trainable .;Our method achieves significantly better results over previous state - of - the - art methods on the challenging PASCAL VOC 2007 and 2012 benchmarks for weakly supervised object detection .;Introduction;1
7733;051b3763c2ad4e4271db712b0e9a4cfe298d05db;Method;0;compact network;A compact network termed SPyNet from Ranjan is inspired from spatial pyramid .;Our model uses a more efficient architecture containing 30 times fewer parameters than FlowNet2 while the performance is on par with it .;Nevertheless , the accuracy is far below FlowNet2 .;Related Work;2
107194;45fdc73a239e9c6ea65e98c96f6a2d6dc35d6f72;Method;0;real - valued matrices representation of quaternions;For this computation , the Hamilton product is computed using the real - valued matrices representation of quaternions .;In a QCNN , the convolution of a quaternion filter matrix with a quaternion vector is performed .;Let be a quaternion weight filter matrix , and the quaternion input vector .;Quaternion - valued convolution;5
55093;228db5326a10cd67605ce103a7948207a65feeb1;Method;0;maximum entropy language model;Different from , the approach proposed in first used a CNN to detect words given the images , then used a maximum entropy language model to generate a list of caption candidates , and finally used a deep multimodal similarity model ( DMSM ) to re - rank the candidates .;The method proposed in went one step further to use an attention mechanism in the caption generation process .;Instead of using a RNN or a LSTM , the DMSM uses a CNN to model the semantics of captions .;Related Work;2
25037;0d467adaf936b112f570970c5210bdb3c626a717;Method;0;Chairs;While Chairs are similar to Sintel , UCF101 is fundamentally different and contains much more small displacements .;We compute optical flow using LDOF and compare the flow magnitude distribution to the synthetic datasets we use for training and benchmarking , this is shown in Figure [ reference ] .;To create a training dataset similar to UCF101 , following , we generated our ChairsSDHom;The ChairsSDHom Dataset;22
4424;027f9695189355d18ec6be8e48f3d23ea25db35d;Method;1;leaf transformation;which we call leaf transformation .;Our basic model applies an affine transformation to each x i to obtain the initial hidden and cell state :;In Eq .;Gumbel Tree - LSTM;7
78490;325093f2c5b33d7507c10aa422e96aa5b10a33f1;Method;0;single - scale approach;This setting leads to the highest reported single - scale score of 53.12 % on validation data so far , significantly outperforming the LSUN 2017 segmentation winner ’s single - scale approach of 51.59 % .;Due to the increase of object classes ( 19 for Cityscapes and 65 for Vistas ) , we used minibatches of 12 crops at ( with InPlace - ABNsync ) , increased the initial learning rate to and trained for 90 epochs .;As also listed in Table [ reference ] , their approach additionally used hybrid dilated convolutions , applied an inverse frequency weighting for correcting training data class imbalance as well as pretrained on Cityscapes .;Semantic Segmentation;10
17339;0a1dc95e4c884a91bd141df8133d1b4961178123;Method;0;example - based;To learn the prior , recent state - of - the - art methods mostly adopt the example - based [ reference ] strategy .;Such a problem is typically mitigated by constraining the solution space by strong prior information .;These methods either exploit internal similarities of the same image [ reference ] , [ reference ] , [ reference ] , [ reference ] , [ reference ] , or learn mapping functions from external low - and high - resolution exemplar pairs [ reference ] , [ reference ] , [ reference ] , [ reference ] , [ reference ] , [ reference ] , [ reference ] , [ reference ] , [ reference ] , [ reference ] , [ reference ] , [ reference ] ,;INTRODUCTION;2
26582;0ecd4fdce541317b38124967b5c2a259d8f43c91;Method;0;AI techniques;We illustrate the promise of ALE by developing and benchmarking domain - independent agents designed using well - established AI techniques for both reinforcement learning and planning .;Most importantly , it provides a rigorous testbed for evaluating and comparing approaches to these problems .;In doing so , we also propose an evaluation methodology made possible by ALE , reporting empirical results on over 55 different games .;The Arcade Learning Environment : An Evaluation Platform for General Agents;0
70408;2d2a22f1f9eae9188f3d43254daa2d5b7f3a2470;Method;1;graph - based neural network model;We will show how the GNN framework can be adapted to these settings , leading to a novel graph - based neural network model that we call Gated Graph Sequence Neural Networks ( GGS - NNs ) .;In these cases , the challenge is how to learn features on the graph that encode the partial output sequence that has already been produced ( e.g. , the path so far if outputting a path ) and that still needs to be produced ( e.g. , the remaining path ) .;We illustrate aspects of this general model in experiments on bAbI tasks weston2015towards and graph algorithm learning tasks that illustrate the capabilities of the model .;Introduction;1
82416;357776cd7ee889af954f0dfdbaee71477c09ac18;Method;0;mixture 10 Gaussians;In contrast , the VAE exhibit systematic differences from the mixture 10 Gaussians indicating that the VAE emphasizes matching the modes of the distribution as discussed above ( Figure [ reference ] d ) .;The adversarial autoencoder successfully matched the aggregated posterior with the prior distribution ( Figure [ reference ] b ) .;An important difference between VAEs and adversarial autoencoders is that in VAEs , in order to back - propagate through the KL divergence by Monte - Carlo sampling , we need to have access to the exact functional form of the prior distribution .;Relationship to Variational Autoencoders;4
93342;3d5d9d8e74b215609eabba80ef79a35ebf460e49;Method;1;cross - domain mapping;Given a pair of unaligned images , we first perform a cross - domain mapping to obtain intermediate results by swapping the attribute vectors from both images .;To handle unpaired datasets , we propose a cross - cycle consistency loss using the disentangled representations .;We can then reconstruct the original input image pair by applying the cross - domain mapping one more time and use the proposed cross - cycle consistency loss to enforce the consistency between the original and the reconstructed images .;Introduction;1
52369;207e0ac5301a3c79af862951b70632ed650f74f7;Method;1;cross - view correspondence matrix;To utilise the unlabelled data , we use their projections to build a cross - view correspondence matrix which captures the identity relationship for the unlabelled people across views .;Then is projected to the lower - dimensional subspace through and becomes .;Note , since the data are unlabelled , the true cross - view correspondence relationship is unknown .;Semi - supervised Learning;9
80217;34273979fd2a62fd7b49ee6d14a925864ff94e74;Method;0;recurrent steps;One of the main differences between the relational network and our proposed model , aside from the recurrent steps , is that we encode the sentences and question together .;To test which parts of the proposed model is important to solving the bAbI tasks we perform ablation experiments .;We ablate the model in two ways to test how important this is .;bAbI ablation experiments;19
12742;072fd0b8d471f183da0ca9880379b3bb29031b6a;Method;0;label maps;We demonstrate that this approach is effective at synthesizing photos from label maps , reconstructing objects from edge maps , and colorizing images , among other tasks .;This makes it possible to apply the same generic approach to problems that traditionally would require very different loss formulations .;Indeed , since the release of the pix2pix software associated with this paper , a large number of internet users ( many of them artists ) have posted their own experiments with our system , further demonstrating its wide applicability and ease of adoption without the need for parameter tweaking .;Image - to - Image Translation with Conditional Adversarial Networks;0
78431;325093f2c5b33d7507c10aa422e96aa5b10a33f1;Method;1;body networks;We chose to adopt the recently introduced DeepLabV3 segmentation approach as head , and evaluate its performance with body networks from § [ reference ] .;Segmentation approach .;DeepLabV3 is exploiting atrous ( dilated ) convolutions in a cascaded way for capturing contextual information , together with crop - level features encoding global context ( close in spirit to PSPNet ’s global feature ) .;Semantic Segmentation;10
522;0012de6bec1f25599e4f02517637e531a71909b9;Method;1;multinomial logistic loss;Using this formulation we do not need to assign weights to samples of different classes to establish the right balance between foreground and background voxels , and we obtain results that we experimentally observed are much better than the ones computed through the same network trained optimising a multinomial logistic loss with sample re - weighting ( Fig .;This formulation of Dice can be differentiated yielding the gradient computed with respect to the - th voxel of the prediction .;[ reference ] ) .;Dice loss layer;3
1403;00b1cdc5bd77bf27f9b1ca630365eeeb456913b4;Method;0;Bonanza engine;"They also performed a large - scale optimization based on minimax search regulated by expert game logs ; this formed part of the Bonanza engine that won the 2013 World Computer Shogi Championship .";Kaneko and Hoki trained the weights of a shogi evaluation function comprising a million features , by learning to select expert human moves during alpha - beta search .;Giraffe evaluated positions by a neural network that included mobility maps and attack and defend maps describing the lowest valued attacker and defender of each square .;Prior Work on Computer Chess and Shogi;4
43568;19fd2c2c9d4eecb3cf1befa8ac845a860083e8e7;Method;1;single GPU DQN;Using human starts Gorila DQN outperformed single GPU DQN on 41 out of 49 games given roughly one half of the training time of single GPU DQN .;Figure [ reference ] shows the normalized scores under the human starts evaluation .;On 22 of the games Gorila DQN obtained double the score of single GPU DQN , and on 11 games Gorila DQN ’s score was 5 times higher .;Results;13
87960;37b685caf39b38b07af60eacf1a7d7ada2122372;Method;1;precomputed functional map;paragraph : Initialization by precomputed functional map;Such a maneuver is more friendly to differentiation and easier to train .;Given the huge optimization space and the non - convex objective , a good starting point helps to avoid optimization from getting stuck in bad local minima .;Initialization by precomputed functional map;18
20652;0b5519f76fc8e31ecf9931f00184aee86694e3a4;Method;1;inverse mapping;First , our CNN model learns a direct inverse mapping from blurry patch to its clear counterpart based on the learned image distribution , whereas only estimates the blur kernel for the patch and uses an offline optimization for non - blind deblurring , resulting in some artifacts such as ringing .;We attribute our better performance to two reasons .;Second , our CNN architecture is higher fidelity than the one used in , as ours outputs full - resolution result and learns internally to minimize artifacts , e.g. , aliasing and ringing effect .;Non - Uniform Motion Blur Removal;11
64022;28703eef8fe505e8bd592ced3ce52a597097b031;Method;0;DAgger;DAgger is a similar approach , which differs in terms of how training examples are generated and aggregated , and there have additionally been important refinements to this style of training over the past several years .;Thus , SEARN explicitly targets the mismatch between oracular training and non - oracular ( often greedy ) test - time inference by training on the output of the model ’s own policy .;When it comes to training RNNs , SEARN / DAgger has been applied under the name “ scheduled sampling ” , which involves training an RNN to generate the ’ st token in a target sequence after consuming either the true ’ th token , or , with probability that increases throughout training , the predicted ’ th token .;Related Work;2
31863;1023b20d226bd0af9fdf0fd1847accefbfa5ec84;Method;0;forward network;This time the last hidden state of the forward network is concatenated with the last hidden state of the backward network to form the query embedding , that is .;The query encoder is implemented by another bidirectional GRU network .;The word embedding function is implemented in a usual way as a look - up table .;Model instance details;9
82657;357776cd7ee889af954f0dfdbaee71477c09ac18;Method;1;probabilistic autoencoders;In this paper , we proposed to use the GAN framework as a variational inference algorithm for both discrete and continuous latent variables in probabilistic autoencoders .;;Our method called the adversarial autoencoder ( AAE ) , is a generative autoencoder that achieves competitive test likelihoods on real - valued MNIST and Toronto Face datasets .;Conclusion;12
43474;19fd2c2c9d4eecb3cf1befa8ac845a860083e8e7;Method;1;off - policy RL algorithm;The learner applies an off - policy RL algorithm such as DQN to this minibatch of experience , in order to generate a gradient vector .;For each learner update , a minibatch of experience tuples is sampled from either a local or global experience replay memory ( see above ) .;"The gradients are communicated to the parameter server ; and the parameters of the Q - network are updated periodically from the parameter server .";Distributed Architecture;7
86913;3729a9a140aa13b3b26210d333fd19659fc21471;Method;1;random strategy;We see that the scores of the semantic tasks drop by the random strategy .;Table [ reference ] shows the results of training our model by randomly shuffling the order of the tasks for each epoch in the column of “ Random ” .;In our preliminary experiments , we have found that constructing the mini - batch samples from different tasks also hampers the effectiveness of our model , which also supports our hypothesis .;Order of training;33
64621;289e91654f6da968d625481ef21f52892052d4fc;Method;1;char - based models;We observed the following from the Table [ reference ] : Word - based models are better than char - based models in Kanshan - Cup dataset .;Table [ reference ] gives the performance of our model and baselines over two different datasets with respect to Precision , Recall@5 and .;That may be because in Chinese the words can offer more supervisions than characters and the question tagging task needs more word supervision .;Performance Comparison;24