We've evaluated pretty much every library on any list you can find online. NLTK is alright, though results tend to be average at best and installation has been a pain on occasion. Gensim is far more limited, though good at what it does. Ultimately, we gravitated towards spaCy, because it's so much better in almost every way imaginable, from installation and ease of use to model deployment and GPU support, and considering the massive amount of analysis it does by default, it maintains spectacular runtime performance. It's probably one of our top 3 most used libraries alongside TF. (No PyTorch here, for production reasons.)

Our team used to have more people writing R, but earlier this year we got to a point where it became more trouble than it was worth, and we dropped it entirely. Performance was generally worse, and there wasn't anything it was providing us that Python didn't have. There was plenty of stuff in Python we weren't about to leave behind, because there was no R equivalent. I will say R Shiny was nice to have, but after we discovered Plotly's Dash framework it was incredibly easy to let go. That's just our team though; everyone should do whatever is most effective for them.

Depends on what kind of data you have and what you want to use it for. It's really about choosing a database model (e.g., relational vs. document store) and then choosing a particular implementation (e.g., postgres vs mysql, or mongodb vs couchdb). You mentioned CSVs, so I assume you are working with columnar data. You also mentioned the data is "small" (which in this case I'll take to mean "fits in memory" and "homogeneous records"). So, unless you have an exotic use case, your best bet is probably a relational database. Popular relational databases include sqlite, mysql, and postgres. Choosing between them depends largely on your operational requirements (what your application will be doing with your DB, and what your expectations for data consistency and performance are). Local or on a server? One or multiple users? Fully ACID? Fully SQL compliant? High write/read ratio? Distributed? etc. As one commenter mentions, a key/value store is possible in postgres but not the others. I've never used postgres for key/value, but theoretically that's pretty awesome.

When I interview potential interns, I'm looking for:
– evidence of solid programming skills: in particular, one of Python or Java for building systems, plus one of Python or R for modelling, plus reasonably strong SQL.
– a clear history of quantitative work with real data (here's where most computer science undergrads fall down; "traditional" scientists, social scientists and economists tend to fare well here).
– demonstrated experience with machine learning (you don't need to get fancy here; if you really know how linear regression and k-means work, you're going to be useful).
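To make the "small CSV data in a relational database" suggestion from the database comment above a bit more concrete, here is a minimal sketch using Python's built-in sqlite3 and csv modules. The sample data, the table name `scores`, and the helper `load_csv` are all made up for illustration, and it assumes the CSV has a header row and fits in memory. It's one way to do it, not the only one.

```python
import csv
import io
import sqlite3

# Hypothetical toy data standing in for a small CSV file on disk.
SAMPLE_CSV = "name,team,score\nana,red,10\nbo,red,20\ncy,blue,30\n"


def load_csv(conn, table, csv_text):
    """Create `table` from the CSV header row and insert every data row.

    Assumes the header contains plain column names and that `table`
    is a trusted identifier (it is interpolated into the SQL).
    """
    reader = csv.reader(io.StringIO(csv_text))
    header = next(reader)
    columns = ", ".join(f'"{name}"' for name in header)
    placeholders = ", ".join("?" for _ in header)
    conn.execute(f"CREATE TABLE {table} ({columns})")
    # Values are passed as parameters, so they are never spliced into the SQL.
    conn.executemany(f"INSERT INTO {table} VALUES ({placeholders})", reader)
    conn.commit()


# An in-memory database: local, single user, nothing to install.
conn = sqlite3.connect(":memory:")
load_csv(conn, "scores", SAMPLE_CSV)

# CSV values arrive as text, so cast before aggregating.
rows = conn.execute(
    "SELECT team, AVG(CAST(score AS REAL)) FROM scores "
    "GROUP BY team ORDER BY team"
).fetchall()
print(rows)
```

If the answers to the operational questions change (multiple users, a server, heavy writes), the same SQL mostly carries over to mysql or postgres, with a different driver and connection setup.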