Table 2. Number of shared and unshared parameters in each model
| | (A) Retailer S | (B) Retailer L | (C) Grocery S | (D) Grocery L |
|---|---:|---:|---:|---:|
| Vocabulary size | 2,138 | 2,160 | 8,765 | 9,521 |
| #Total parameters | 1,702,490 | 1,710,960 | 4,253,885 | 4,544,945 |
| #Shared parameters | 879,360 | 879,360 | 879,360 | 879,360 |
| Encoder layer | 406,528 | 406,528 | 406,528 | 406,528 |
| Decoder layer | 472,832 | 472,832 | 472,832 | 472,832 |
| #Unshared parameters | 823,130 | 831,600 | 3,374,525 | 3,665,585 |
| Encoder embedding layer | 273,664 | 276,480 | 1,121,920 | 1,218,688 |
| Decoder embedding layer | 273,664 | 276,480 | 1,121,920 | 1,218,688 |
| Output layer | 275,802 | 278,640 | 1,130,685 | 1,228,209 |
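The vocabulary-dependent rows can be reproduced from the vocabulary size V alone: each embedding row equals 128·V and the output row equals 129·V. This is consistent with a 128-dimensional embedding and an output projection that has a bias term, although the table does not state this directly. The sketch below assumes that dimension (`EMBED_DIM` and `parameter_counts` are hypothetical names, not from the source). It takes the shared encoder and decoder counts from the table and recomputes the unshared and total counts for each model.

```python
# Minimal sketch: recompute Table 2 from the vocabulary sizes.
# ASSUMPTION: embedding dimension d = 128 and an output layer with bias
# (d*V weights + V biases). Both are inferred from the table's arithmetic,
# not stated in the source.

EMBED_DIM = 128  # hypothetical name for the inferred embedding size

SHARED_ENCODER = 406_528  # "Encoder layer" row of Table 2
SHARED_DECODER = 472_832  # "Decoder layer" row of Table 2

VOCAB_SIZES = {
    "(A) Retailer S": 2_138,
    "(B) Retailer L": 2_160,
    "(C) Grocery S": 8_765,
    "(D) Grocery L": 9_521,
}


def parameter_counts(vocab_size: int, d: int = EMBED_DIM) -> dict[str, int]:
    """Split one model's parameters into shared and vocabulary-dependent parts."""
    embedding = d * vocab_size              # one d-dim vector per token
    output = d * vocab_size + vocab_size    # weight matrix plus bias vector
    unshared = 2 * embedding + output       # encoder emb + decoder emb + output
    shared = SHARED_ENCODER + SHARED_DECODER
    return {
        "shared": shared,
        "embedding": embedding,
        "output": output,
        "unshared": unshared,
        "total": shared + unshared,
    }


if __name__ == "__main__":
    for name, v in VOCAB_SIZES.items():
        c = parameter_counts(v)
        print(f"{name}: unshared={c['unshared']:,} total={c['total']:,}")
```

Under this assumption, every additional vocabulary entry adds 128 + 128 + 129 = 385 parameters. For example, going from Retailer S to Retailer L adds 22 entries, and 22 × 385 = 8,470, which matches the difference in total parameters (1,710,960 − 1,702,490).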