Woof to Husky’Log!

This is Husky’Log, a beginner explorer in the AI world. I write about my projects, thoughts, and more. I hope to develop the habit of writing high-quality blog posts.

How to Write a Good Letter of Recommendation?

When requesting a recommendation letter from a professor, it’s quite common nowadays to be asked to provide a timeline, a work summary, or even a draft of the letter. Professors are busy (you may not imagine how many emails they need to answer in a day), so it’s a reasonable request. Besides, this is actually good news, since you get to shape the content of the recommendation letter yourself. Therefore, you need to learn how to write a good recommendation letter, even as a student. Here I have collected several sources and my notes on them; I hope they are helpful. ...

October 22, 2024 · 10 min · 1994 words · Benhao Huang

Paper Reading: Cheating Popular LLM Benchmarks

Anti-cheating has long been a critical consideration when designing the rules for leaderboards, but this remains unexplored in the context of LLM benchmarks (Zheng et al., 2024: Zheng, X., Pang, T., Du, C., Liu, Q., Jiang, J., Lin, M. (2024). Cheating Automatic LLM Benchmarks: Null Models Achieve High Win Rates. Retrieved from http://arxiv.org/abs/2410.07137).

Introduction

There are many well-known LLM benchmarks, such as AlpacaEval 2.0 (Dubois et al., 2024: Dubois, Y., Galambosi, B., Liang, P., Hashimoto, T. (2024). Length-Controlled AlpacaEval: A Simple Way to Debias Automatic Evaluators. https://doi.org/10.48550/arXiv.2404.04475), Arena-Hard-Auto (Li et al., 2024: Li, T., Chiang, W., Frick, E., Dunlap, L., Wu, T., Zhu, B., Gonzalez, J., Stoica, I. (2024). From Crowdsourced Data to High-Quality Benchmarks: Arena-Hard and BenchBuilder Pipeline. https://doi.org/10.48550/arXiv.2406.11939; Zheng et al., 2023: Zheng, L., Chiang, W., Sheng, Y., Zhuang, S., Wu, Z., Zhuang, Y., Lin, Z., Li, Z., Li, D., Xing, E., Zhang, H., Gonzalez, J., Stoica, I. (2023). Judging LLM-as-a-Judge with MT-Bench and Chatbot Arena. https://doi.org/10.48550/arXiv.2306.05685), and MT-Bench (Zheng et al., 2023). They are widely used in the research community to evaluate the performance of LLMs. ...

October 11, 2024 · 8 min · 1663 words · Benhao Huang