
What Else Should Data Scientists Learn?

Data science is a complex job. Statistics and machine learning are the core of the job, but neglecting other skills can leave you with weaknesses that hinder your ability to work effectively. On the other hand, developing basic competence in other skills can make you much more effective individually, and a joy to work with on a team. Besides statistics and machine learning, what other skills should data scientists practice regularly, and what are some good resources for learning and practicing them?

Software Engineering

Data scientists often write code that is used only for a short period of time, often with themselves as the only user. If the code becomes too complex and unwieldy, it can easily be thrown out and rewritten because no one is relying on it. Working this way, you won't learn what it takes to write high-quality software as part of a large development team: code that is reliable, scalable, and changeable. Professional software engineers are thoughtful about managing the complexity of projects so that changes can be made quickly, safely, and locally. If you practice these skills, you'll write better software with fewer bugs, and it pays off even if you are the only user. Don't listen to people who say good software engineering practices aren't worth the extra time; you'll move faster writing good code than you would writing sloppy code.

  • Robert C. Martin, Clean Code: A Handbook of Agile Software Craftsmanship, 2008
  • Erich Gamma et al., Design Patterns: Elements of Reusable Object-Oriented Software, 1994
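As a minimal sketch of what managing complexity can look like in everyday data science code, one habit is to pull logic out of a one-off notebook cell into a small, pure function with a test, so that later changes stay local and can be checked quickly. The function name, the latency data, and the choice of statistics here are hypothetical, chosen only for illustration.

```python
import statistics


def summarize_latencies(latencies_ms: list[float]) -> dict[str, float]:
    """Return summary statistics for a list of latencies in milliseconds.

    Hypothetical example: the function is pure (no file I/O, no globals),
    so it can be tested in isolation and changed in one place.
    """
    if not latencies_ms:
        # Fail loudly instead of returning misleading numbers.
        raise ValueError("latencies_ms must not be empty")
    return {
        "mean": statistics.mean(latencies_ms),
        "median": statistics.median(latencies_ms),
        "max": max(latencies_ms),
    }


def test_summarize_latencies() -> None:
    # A small test pins down the expected behavior before any refactoring.
    result = summarize_latencies([10.0, 20.0, 30.0])
    assert result == {"mean": 20.0, "median": 20.0, "max": 30.0}


if __name__ == "__main__":
    test_summarize_latencies()
    print("ok")
```

If the summary later needs a new statistic, the change happens inside one function, and rerunning the test shows right away whether the existing behavior still holds.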

Scientific Record Keeping

We use reproducible approaches to manage our software and our models, and we should do the same for our thoughts and ideas. Ideally, every thought, belief, assumption, and decision that may affect the outcome of a piece of research would be tracked and logged. Keeping a record of not only what you're doing but also what you're thinking as you work has a number of uses. It serves a legal purpose in establishing intellectual property claims. It reminds you of what happened years later, after you've forgotten. It is onboarding material for new team members getting up to speed on what has already been done. It can help resolve doubts about the veracity of your research. It can help you diagnose problems when something goes wrong, including problems with your beliefs or your reasoning. And writing your thoughts down shapes and clarifies them.

I typically keep records as Markdown notes under version control. There are also specialized tools called electronic lab notebooks, but I've never found one that I can strongly recommend.

Technical Writing

It doesn't matter how great your ideas are if they aren't communicated clearly in writing. A verbal update in a meeting is no substitute: the meeting doesn't necessarily include everyone who will ever be interested in the idea, and realistically, people can't absorb new ideas all at once at the speed of conversation. Make sure everything you do gets documented in a report or summary that is stored somewhere people will search for it. These reports are invaluable to the organization, both as a record of what was done and as a way to store and retrieve institutional knowledge. They are also invaluable to a future you who no longer remembers anything about a two-year-old project that suddenly becomes important again.

  • Robert Barrass, Scientists Must Write: A Guide to Better Writing for Scientists, Engineers and Students, 2002
  • Practice loop: write about your work -> ask colleagues for feedback, not just other data scientists -> improve your document -> repeat for every project

Project Management

Project management is a core life skill that everyone uses all the time. Making dinner, buying clothes, going on vacation, or maintaining a vehicle are all projects that require planning and executing multiple steps. But not everyone is good at project management. Once a project involves many steps, significant uncertainty, competition for resources, and multiple people, managing it requires a more professional skill set. There are many approaches and frameworks for managing projects, each with advantages and disadvantages in different scenarios. If you aspire to be a professional project manager, you should practice using many different frameworks. Most people, though, would benefit from choosing one and getting good at it. Use it at work, at home, and for vacation planning; your new skills and tools will help you in many aspects of your life. I recommend learning Kanban as a starting point because of its relative simplicity and its emphasis on minimizing multitasking.

  • Joakim Sundén and Marcus Hammarberg, Kanban in Action, 2014

Product Management

What is the long-term vision for the product? Where is the product headed, and why is that the best direction? What do the customers need? What is the broad strategy for realizing the product vision?

These questions are critically important to the data science team.

Business

Teaching and Learning

Fair hiring practices, Diversity and Inclusion, Rights in the workplace