Paper-Weekly11-FiD-ICL
LLMs are capable of few-shot in-context learning (ICL), i.e., performing a new task by prepending a few demonstrations to the test input. However, the concatenated demonstrations are often excessively long and incur additional computation. Inspired by Fusion-in-Decoder (FiD), which fuses multiple passages and outperforms concatenation-based models, the authors apply the same technique to ICL.
They draw connections between open-domain QA (Chen and Yih, 2020) and ICL, since both problems task a model with reading long context and making a prediction based on the context (answer a relevant question vs. infer about a new input).
Evaluation method: models are trained to perform ICL on a mixture of tasks using one selected fusion method, and are then evaluated on held-out tasks.
Performance: FiD-ICL outperforms both concatenation-based and ensemble-based ICL.
Implementation details:
During the meta-training stage:
- Randomly sample a task T
- Sample k support examples, $\{(x_i^{(s)}, y_i^{(s)})\}_{i=1}^{k}$
- Sample m query examples, $\{(x_i^{(q)}, y_i^{(q)})\}_{i=1}^{m}$
- Update the model parameters to minimize the loss of generating the correct target sequences, using the selected fusion method.
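The sampling steps above can be sketched as follows. This is a minimal illustration rather than the authors' code: the `tasks` dictionary, the `sample_episode` helper, and the choice to keep support and query examples disjoint are all assumptions made for the example, and the parameter update itself is only indicated in a comment.

```python
import random

def sample_episode(tasks, k, m, rng):
    """Sample one meta-training episode: a task, k support and m query examples.

    `tasks` maps a task name to a list of (input, target) pairs (hypothetical format).
    """
    task_name = rng.choice(sorted(tasks))       # sample a task T
    examples = tasks[task_name]
    picked = rng.sample(examples, k + m)        # assumption: support and query are disjoint
    support, query = picked[:k], picked[k:]
    return task_name, support, query

# Toy data, purely illustrative.
tasks = {
    "sentiment": [(f"review {i}", "positive" if i % 2 else "negative") for i in range(10)],
    "topic": [(f"headline {i}", "sports" if i % 2 else "politics") for i in range(10)],
}

rng = random.Random(0)
task_name, support, query = sample_episode(tasks, k=4, m=2, rng=rng)

# A training step would now fuse `support` with each query input (early,
# intermediate, or late fusion) and minimize the loss on the query targets.
```

Fixing the random seed, as done here, makes the sampled episode reproducible across runs.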
Three fusion methods
Early Fusion: Concatenation-based ICL
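As a rough sketch of early fusion, all demonstrations and the query are joined into one input string before encoding. The `Input:`/`Output:` template and the `build_concat_input` helper are hypothetical; the actual templates used in the paper may differ.

```python
def build_concat_input(support, query_input, sep="\n\n"):
    """Concatenate k demonstrations and the query into a single model input."""
    demos = [f"Input: {x}\nOutput: {y}" for x, y in support]
    # The query goes last, with its output left empty for the model to generate.
    return sep.join(demos + [f"Input: {query_input}\nOutput:"])

support = [("great movie", "positive"), ("dull plot", "negative")]
prompt = build_concat_input(support, "fine acting")
```

The input length, and therefore the encoding cost, grows with every added demonstration because the whole string is processed as one sequence.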
Intermediate Fusion: Fusion-in-Decoder (FiD)
The support examples and the query are encoded separately by the same encoder layers in the transformer model. The representations produced by the last encoder layer are then concatenated and sent to the decoder layers.
Computation cost grows linearly with the number of shots. This differs from FiD in open-domain QA, where the question and each paragraph are first concatenated and then encoded.
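The toy sketch below illustrates the idea under stated assumptions. `toy_encode` is a stand-in for the shared encoder (it just emits one small vector per whitespace token), and `self_attention_pairs` counts token pairs as a crude proxy for encoder self-attention cost. Neither comes from the paper.

```python
def toy_encode(tokens, dim=4):
    """Stand-in for the shared encoder: one vector per token (hypothetical)."""
    return [[float((len(tok) + j) % 7) for j in range(dim)] for tok in tokens]

def fid_fuse(support, query_input):
    """Encode each support example and the query separately, then concatenate."""
    segments = [f"{x} {y}".split() for x, y in support] + [query_input.split()]
    encoded = [toy_encode(seg) for seg in segments]    # independent encoder passes
    # Concatenate along the sequence axis; the decoder attends over all of it.
    return [vec for seg in encoded for vec in seg]

def self_attention_pairs(lengths, separate):
    """Count token pairs attended over in the encoder (crude cost proxy)."""
    if separate:
        return sum(n * n for n in lengths)   # each segment attends only to itself
    total = sum(lengths)
    return total * total                     # one long concatenated sequence

support = [("great movie", "positive"), ("dull plot", "negative")]
fused = fid_fuse(support, "fine acting")

lengths = [10, 10, 10, 10]                   # four segments of 10 tokens each
cost_fid = self_attention_pairs(lengths, separate=True)
cost_concat = self_attention_pairs(lengths, separate=False)
```

With four equal-length segments, the separate-encoding count is 400 pairs against 1600 for a single concatenated sequence, which is the linear-versus-quadratic gap in the number of shots.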
Late Fusion: Ensemble-based ICL
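This is essentially voting over the results of multiple ICL runs. The authors implement it as separate 1-shot ICL runs, then sum the class probabilities across runs and take the argmax.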
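A minimal sketch of this late-fusion rule, assuming each 1-shot run returns a dictionary of class probabilities over the same label set. The `ensemble_predict` helper and the example probabilities are made up for illustration.

```python
def ensemble_predict(per_shot_probs):
    """Sum class probabilities over 1-shot runs and return the argmax label."""
    labels = list(per_shot_probs[0])
    totals = {label: sum(p[label] for p in per_shot_probs) for label in labels}
    return max(totals, key=totals.get), totals

# One probability distribution per 1-shot ICL run (illustrative numbers).
per_shot_probs = [
    {"positive": 0.6, "negative": 0.4},
    {"positive": 0.3, "negative": 0.7},
    {"positive": 0.7, "negative": 0.3},
]
prediction, totals = ensemble_predict(per_shot_probs)
```

Here the summed scores are roughly 1.6 for `positive` and 1.4 for `negative`, so the ensemble predicts `positive` even though one of the three runs disagrees.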
Limitations of this paper
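- It only works with encoder-decoder architectures, since an encoder is needed to perform the fusion, so it does not apply to large-scale GPT-style models.
- It is still inferior to fine-tuning, and the improvement is not universal; whether T0 has an influence here is unclear. The experiments also do not cover larger parameter scales or settings with more shots.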