Wish me luck with writing, rewriting, incorporating all the changes, editing, and linguistically polishing the rest of the paper by tomorrow afternoon.
At least it is only 4 pages but packed with information.
(Oh come on, you are such a drama queen. But it is my first real dive into the academic world. All the way. My thesis and my ex-job shared-task escapades do not count.)
("Semantic circle about concepts": what the hell does it even mean?)
Ahahaha, I am not the only one 🙄
David invented "semantic-rationale tracking" as a way of saying that we provide, in the answer, the evidence the information is based on. Hilarious.
MIT made a statement about the MIT exams paper, basically saying: the authors obtained the exams dataset without consent, so the paper didn't turn out well. It would have been a great paper otherwise.
Nobody said anything about GPT-4 grading GPT-4's own answers. Go MIT.
I never really liked it, and there were many cases where it didn't make sense to me, but I kept trying to fit my code into it because handbooks and courses everywhere said I should.
I taught myself to code, and I had no idea what was right and what was wrong. I just had a vague feeling that functional programming covered most of my needs, and why should I build OOP architectures from scratch when it is a hell of a lot easier to do it with functions?
https://twitter.com/meaningness/status/1672999331618766848?s=46&t=tJA3XyJ6UXEb-9Si1PfeJQ
@tdietterich Yes, the good original idea was that a software object should correspond one-to-one with a real-world object; and some version of OOP makes sense for that.
In the SIMULA papers, they got it right at first, but the confusion started nearly immediately:…
Of course, deep learning libraries are all built on OOP because
(1) datasets, models, trainers, learning rate schedules, etc. are indeed all real-world objects
(2) they need to be reusable and adaptable
(3) they are heavily optimized as well
but that doesn't mean you have to build every structure from scratch every time.
You build a transformer out of pure Torch when you are at uni. Then you may build some small custom object on top of stuff that already exists. Or you have to rewrite the code for deployment (but machine learning engineers are a different breed). Or you are writing open-source libraries.
Most of those cases are outside the scope of an AI researcher's work. My dirty little scripts serve me just fine (rough sketch below).
(Of course you have to know OOP, how to read it, and how to write it. But that doesn't mean you have to use it everywhere.)
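To be concrete, here is a minimal sketch of what I mean; the model, the names, and the shapes are all toy illustrations, not code from any real project. The idea: reuse torch's OOP building blocks, glue them together with plain functions, and skip the custom class hierarchy.

```python
import torch
import torch.nn as nn

def make_model(vocab_size: int = 1000, d_model: int = 128) -> nn.Module:
    # Compose existing nn.Modules instead of writing a class hierarchy from scratch.
    return nn.Sequential(
        nn.Embedding(vocab_size, d_model),
        nn.TransformerEncoderLayer(d_model=d_model, nhead=4, batch_first=True),
        nn.Linear(d_model, vocab_size),
    )

def train_step(model: nn.Module, batch: torch.Tensor, targets: torch.Tensor,
               optimizer: torch.optim.Optimizer) -> float:
    # A plain function is enough for a research script; no Trainer class needed.
    optimizer.zero_grad()
    logits = model(batch)
    loss = nn.functional.cross_entropy(
        logits.reshape(-1, logits.size(-1)), targets.reshape(-1)
    )
    loss.backward()
    optimizer.step()
    return loss.item()

# Usage: a couple of lines of glue, and that's the whole "architecture".
model = make_model()
optimizer = torch.optim.AdamW(model.parameters(), lr=3e-4)
batch = torch.randint(0, 1000, (8, 16))     # (batch, seq_len) of token ids
targets = torch.randint(0, 1000, (8, 16))   # toy next-token targets, same shape
print(train_step(model, batch, targets, optimizer))
```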
Everybody liked how I wrote the paper and now I need to write two more papers 😁
(I have to write them anyway ofc.)
I have a lot of privileges, but the biggest of them is (I think) that I do not work in software development (in any kind of development really, apart from maybe games). It seems terribly boring.
David just found an error in the calculations of the project we have been working on since mid-December, and now the results are absolutely INSANE (which is exactly what we had expected).
I feel like I am going through an MDMA trip climax in the middle of the day.
Mb it's too soon, too early.
But I loved the feeling.
We just got a new server and I spent literally two hours trying to configure it so that I could work with remote files and a remote interpreter from my local machine. I mean, PyCharm 🙄🙄🙄
Also, I had been fighting it, trying everything, and finally realized that I didn't have permissions for the project folder ahahaha
David is a genius, I already told you that. His ideas are elegant. Simple. But nobody came up with them before him.
Wrt why the original implementation isn't working: einsum is just not optimized enough for matrix multiplication. Threading, batched multiplication, and the CPU/GPU loading optimizations in torch's matrix multiplication beat whatever advantage we get in operation count from einsum. We multiply six matrices at once using einsum, and it is always beaten by three consecutive matrix multiplication operations, as in the original transformer.
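You can see the effect with a quick standalone timing sketch. Everything here is made up for illustration: the shapes, the four-matrix chain (so that the matmul side is exactly three consecutive products), and the einsum string are not the project's actual six-matrix setup, but the comparison is the same.

```python
import time
import torch

# Toy batched matrix chain; the real project's shapes differ (assumption).
B, n, d = 32, 128, 64
A1 = torch.randn(B, n, d)
A2 = torch.randn(B, d, d)
A3 = torch.randn(B, d, d)
A4 = torch.randn(B, d, n)

def fused_einsum() -> torch.Tensor:
    # The whole chain expressed as a single einsum call.
    return torch.einsum("bnd,bde,bef,bfm->bnm", A1, A2, A3, A4)

def consecutive_matmul() -> torch.Tensor:
    # The same product as three consecutive batched matmuls,
    # each dispatched to torch's heavily optimized kernels.
    return A1 @ A2 @ A3 @ A4

def bench(fn, iters: int = 100) -> float:
    fn()  # warm-up
    t0 = time.perf_counter()
    for _ in range(iters):
        fn()
    return (time.perf_counter() - t0) / iters

# Sanity check: both paths compute the same product (up to float error).
assert torch.allclose(fused_einsum(), consecutive_matmul(), rtol=1e-2, atol=1e-2)
print("fused einsum       :", bench(fused_einsum))
print("3 consecutive @ ops:", bench(consecutive_matmul))
```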