tgoop.com/CodeProgrammer/3876
Create:
Last Update:
Last Update:
GPU by hand ✍️ I drew this to show how a GPU speeds up an array operation of 8 elements in parallel over 4 threads in 2 clock cycles. Read more
CPU
• It has one core.
• Its global memory has 120 locations (0-119).
• To use the GPU, it needs to copy data from the global memory to the GPU.
• After GPU is done, it will copy the results back.
GPU
• It has four cores to run four threads (0-3).
• It has a register file of 28 locations (0-27)
• This register file has four banks (0-3).
• All threads share the same register file.
• But they must read/write using the four banks.
• Each bank allows 2 reads (Read 0, Read 1) and 1 write in a single clock cycle.
#AIEngineering #MachineLearning #DeepLearning #LLMs #RAG #MLOps #Python #GitHubProjects #AIForBeginners #ArtificialIntelligence #NeuralNetworks #OpenSourceAI #DataScienceCareers