
ISSUE #6: First open-source ChatGPT alternative got released


The first open-source ChatGPT alternative got released! Together released a 20B ChatGPT-style model! πŸ—£ It is an instruction-tuned large language model, fine-tuned for chat from EleutherAI's GPT-NeoX-20B on over 43 million instructions, and released under an Apache-2.0 license! πŸ’₯πŸ’₯

Check out the demo on Hugging Face!!
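
If you want to try it outside the demo, the weights can be loaded like any other Hugging Face checkpoint. A minimal sketch, assuming the `togethercomputer/GPT-NeoXT-Chat-Base-20B` checkpoint id and the `<human>:`/`<bot>:` prompt format from the model card (you will need a lot of GPU memory even in half precision):

```python
# Minimal sketch: load Together's chat model with transformers
# (assumes the "togethercomputer/GPT-NeoXT-Chat-Base-20B" checkpoint and a large enough GPU).
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer

model_id = "togethercomputer/GPT-NeoXT-Chat-Base-20B"
tokenizer = AutoTokenizer.from_pretrained(model_id)
model = AutoModelForCausalLM.from_pretrained(
    model_id, torch_dtype=torch.float16, device_map="auto"
)

# The model card uses a "<human>: ... <bot>:" dialogue format.
prompt = "<human>: What is the capital of France?\n<bot>:"
inputs = tokenizer(prompt, return_tensors="pt").to(model.device)
outputs = model.generate(**inputs, max_new_tokens=64, do_sample=True, temperature=0.7)
print(tokenizer.decode(outputs[0], skip_special_tokens=True))
```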

News & Announcements πŸ“£

Google Cloud brings generative AI to developers, businesses, and governments. Google launched a Generative AI toolkit for Vertex AI to bring PaLM and image generation models to customers. Additionally, Google Cloud announced a PaLM API, similar to the GPT-3 API, which will be available in the coming months.

Brave (the browser) announced a new AI Summarizer that answers your searches directly, powered by Hugging Face models.

After FLAN-T5, Google trained and released FLAN-UL2, a 20-billion-parameter transformer that outperforms FLAN-T5 by a relative ~4%.
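
If you want to kick the tires, the checkpoint is on the Hub as `google/flan-ul2` and works with the standard seq2seq classes. A minimal sketch, keeping in mind that a 20B model realistically needs `device_map="auto"` plus half precision or 8-bit loading:

```python
# Minimal sketch: run FLAN-UL2 with transformers (checkpoint "google/flan-ul2").
# At 20B parameters, half precision plus device_map="auto" (or 8-bit loading) is advisable.
import torch
from transformers import AutoModelForSeq2SeqLM, AutoTokenizer

model_id = "google/flan-ul2"
tokenizer = AutoTokenizer.from_pretrained(model_id)
model = AutoModelForSeq2SeqLM.from_pretrained(
    model_id, torch_dtype=torch.bfloat16, device_map="auto"
)

prompt = "Answer the following question: What is the boiling point of water in Celsius?"
inputs = tokenizer(prompt, return_tensors="pt").to(model.device)
outputs = model.generate(**inputs, max_new_tokens=32)
print(tokenizer.decode(outputs[0], skip_special_tokens=True))
```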

DeepSpeed Inference added automatic tensor parallelism for Hugging Face models for better latency in multi-GPU environments.
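
Here is a rough sketch of what that looks like with a Hugging Face causal LM; the `tensor_parallel` argument is the newer spelling of the older `mp_size`, and `gpt2` is just a small placeholder model:

```python
# Rough sketch: let DeepSpeed Inference shard a Hugging Face model across the
# available GPUs automatically. Launch with: deepspeed --num_gpus 2 run.py
import os
import torch
import deepspeed
from transformers import AutoModelForCausalLM, AutoTokenizer

model_id = "gpt2"  # placeholder; swap in your own Hugging Face causal LM
local_rank = int(os.getenv("LOCAL_RANK", "0"))
world_size = int(os.getenv("WORLD_SIZE", "1"))

tokenizer = AutoTokenizer.from_pretrained(model_id)
model = AutoModelForCausalLM.from_pretrained(model_id, torch_dtype=torch.float16)

# Automatic tensor parallelism: weights are split across `world_size` GPUs.
# Older DeepSpeed releases call this argument `mp_size` instead of `tensor_parallel`.
ds_model = deepspeed.init_inference(
    model,
    tensor_parallel={"tp_size": world_size},
    dtype=torch.float16,
)

inputs = tokenizer("DeepSpeed makes multi-GPU inference", return_tensors="pt").to(f"cuda:{local_rank}")
outputs = ds_model.generate(**inputs, max_new_tokens=32)
print(tokenizer.decode(outputs[0], skip_special_tokens=True))
```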

MosaicML announced an improved BERT architecture that accelerates pre-training and probably changed the game: the team achieves BERT-base (2018) performance on GLUE with just $22 worth of compute.

The Open Assistant initiative has published its first β€œofficial” alpha models (12B) on Hugging Face, already showing impressive performance.

Tutorials & Demos πŸ“

Diffusers added ControlNet to allow controlled text-to-image generation. Check out my latest blog post on how to deploy ControlNet on Inference Endpoints.
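
A minimal sketch of the new pipeline, using the `lllyasviel/sd-controlnet-canny` checkpoint and a Canny edge map as the conditioning image (the checkpoint ids and example image here follow the diffusers docs):

```python
# Minimal sketch: controlled text-to-image generation with diffusers + ControlNet,
# assuming the "lllyasviel/sd-controlnet-canny" checkpoint and a Canny edge map as condition.
import cv2
import numpy as np
import torch
from PIL import Image
from diffusers import ControlNetModel, StableDiffusionControlNetPipeline
from diffusers.utils import load_image

# Build the conditioning image: Canny edges of an example photo.
image = load_image(
    "https://huggingface.co/datasets/huggingface/documentation-images/resolve/main/diffusers/input_image_vermeer.png"
)
edges = cv2.Canny(np.array(image), 100, 200)
canny_image = Image.fromarray(np.stack([edges] * 3, axis=-1))

controlnet = ControlNetModel.from_pretrained("lllyasviel/sd-controlnet-canny", torch_dtype=torch.float16)
pipe = StableDiffusionControlNetPipeline.from_pretrained(
    "runwayml/stable-diffusion-v1-5", controlnet=controlnet, torch_dtype=torch.float16
).to("cuda")

result = pipe("a futuristic city at night, highly detailed", image=canny_image, num_inference_steps=30)
result.images[0].save("controlnet_out.png")
```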

Stanford did a nice session about AI Safety, RLHF, and Self-Supervision with Jared Kaplan from Anthropic.

Stas investigated which optimizer provides the best performance and speed for training transformers.

Hugging Face has published a blog on how to fine-tune 20B LLMs with RLHF on a 24GB consumer GPU.
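
The short version: load the base model in 8-bit and train only LoRA adapters via `peft`, then plug the result into `trl`. A rough sketch of the model setup, with LoRA hyperparameters that are assumptions rather than the blog's exact values:

```python
# Rough sketch: prepare a 20B model for RLHF-style fine-tuning on a single 24GB GPU
# by loading it in 8-bit and training only LoRA adapters (LoRA settings below are assumptions).
from peft import LoraConfig, get_peft_model, prepare_model_for_int8_training
from transformers import AutoModelForCausalLM, AutoTokenizer

model_id = "EleutherAI/gpt-neox-20b"
tokenizer = AutoTokenizer.from_pretrained(model_id)

# 8-bit weights keep the 20B base model within a 24GB card.
model = AutoModelForCausalLM.from_pretrained(model_id, load_in_8bit=True, device_map="auto")
# Newer peft versions rename this helper to prepare_model_for_kbit_training.
model = prepare_model_for_int8_training(model)

# LoRA: train a few million adapter parameters instead of all 20B.
lora_config = LoraConfig(r=16, lora_alpha=32, lora_dropout=0.05, bias="none", task_type="CAUSAL_LM")
model = get_peft_model(model, lora_config)
model.print_trainable_parameters()
# From here, the model can be plugged into trl (e.g. a PPO trainer with a value head) for RLHF.
```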

Reads & Papers πŸ“š

NVlabs published Prismer, a Vision-Language Model with Multi-Modal Experts, achieving state-of-the-art performance on visual question answering and image captioning. Check out the demo to test it.

Meet in the Middle proposes a new pre-training paradigm for language models and shows impressive results: a 2.7B model outperforms Codex 12B.

IBM and Stanford propose UDAPDR: Unsupervised Domain Adaptation via LLM Prompting and Distillation of Rerankers to improve domain adaptation under annotation constraints and domain shifts.

AWS published an insightful blog post about training large language models on Amazon SageMaker.

Stitch Fix shared their learnings and improvements from integrating LLMs to generate product categories and product descriptions.

Dennis took a close look at Cohere's summarization endpoint.
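
For reference, a minimal sketch of calling that endpoint with the Python SDK; the parameter names and values below are best-effort assumptions, so check Cohere's docs before relying on them:

```python
# Minimal sketch: call Cohere's summarization endpoint via the Python SDK
# (requires a COHERE_API_KEY; parameter values here are illustrative assumptions).
import os
import cohere

co = cohere.Client(os.environ["COHERE_API_KEY"])

text = (
    "Large language models are increasingly used to summarize long documents. "
    "They can condense reports, articles, and transcripts into a few sentences."
)

response = co.summarize(text=text, length="short", format="paragraph", temperature=0.3)
print(response.summary)
```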


I hope you enjoyed this newsletter. πŸ€— If you have any questions or are interested in collaborating, feel free to contact me on Twitter or LinkedIn.

See you next week πŸ‘‹πŸ»πŸ‘‹πŸ»

πŸ—žοΈ Stay updated with bi-weekly Transformers & Cloud News and Insights delivered to your inbox